Hi Team,

Any updates on DRILL-4520 <https://issues.apache.org/jira/browse/DRILL-4520>? Also, I came across DRILL-4653 <https://issues.apache.org/jira/browse/DRILL-4653>: in the 1.9.0 release it is listed under improvements, but its current status is Reopened. Can someone confirm this?
Regards,
*Anup Tiwari*

On Fri, Mar 25, 2016 at 3:13 PM, Shankar Mane <[email protected]> wrote:

> I have opened a JIRA for the same. You can track it here:
> https://issues.apache.org/jira/browse/DRILL-4520
>
> For the time being I am using the older approach, which is *Hive*.
>
> *Below are the steps I am following:*
> 1. Drill - create a flattened table from the HDFS JSON logs. But as I
>    mentioned in the earlier mail, there are bugs/exceptions, so I am
>    SKIPPING this step.
> 2. Repeat step 1 using Hive instead. Here I am using Hive 1.2.1 and
>    json-serde-1.3.7.jar:
>    a) Define the schema for the input JSON by creating a Hive external
>       table (using the JSON SerDe).
>    b) Define the Hive Parquet table schema, which is filled from the Hive
>       external table. All fields in the Parquet table are declared as
>       STRING. This step is similar to creating a Parquet table in Drill.
>    c) Hive creates the table and stores all the data on HDFS in Parquet
>       format.
>    d) Later, use this HDFS path in Drill.
> 3. From here onward, I use *Drill* for all further queries, which works
>    fine.
>
> *My questions are:*
> 1. Is this the right approach - creating the Parquet table with Hive and
>    then using it from Drill?
> 2. Will Parquet tables created by Hive versus Drill differ in any way or
>    cause inconsistency?
>
> We know that Drill discovers the schema on the fly, whereas Hive does not;
> in Hive we need to define the schema explicitly. So I can say here that:
> 1. Hive explicitly converts data into the predefined data types, whereas
>    Drill doesn't unless we cast. For example, a column with mixed data
>    types is explicitly converted into the predefined type in Hive, but in
>    Drill it doesn't work, either as-is or with casting.
>
> Please suggest any alternative way.
>
> regards,
> shankar
>
> > For CTAS, I also hit an NPE when the storage format was Parquet (the
> > default).
> > With storage format as JSON, I hit this error:
> >
> > Error: SYSTEM ERROR: IllegalArgumentException: You tried to read a
> > [readText()] type when you are using a field reader of type
> > [UnionListReader].
> >
> > Since this is still an experimental feature, I'm not sure if anyone has
> > tried out CTAS previously. Could you open a JIRA for this? Or let me
> > know if you want me to open one instead.
> >
> > And since you mention that queries without CTAS work fine, can you
> > create views instead and query those (I tried this and it works fine)?
> >
> > On Fri, Mar 18, 2016 at 1:29 PM, Shankar Mane <[email protected]> wrote:
> >
> > @Abhishek:
> >
> > Some events in the 150 GB JSON file differ in structure, like the ones
> > below. I would say only about 0.1% of events (per 150 GB JSON file) are
> > like this.
> >
> > And yes, union works perfectly, but only when we use SELECT statements.
> >
> > Could you please change your SELECT query to a CTAS? I am getting
> > null pointer exceptions.
> >
> > On 19 Mar 2016 01:35, "Abhishek Girish" <[email protected]> wrote:
> >
> > > Hello Shankar,
> > >
> > > From the sample data you shared, it looks like you have JSON documents
> > > which differ considerably in schema / structure. This isn't supported
> > > by default.
> > >
> > > You could try turning on the UNION type (an experimental feature):
> > >
> > > > set `exec.enable_union_type` = true;
> > > +-------+----------------------------------+
> > > |  ok   |             summary              |
> > > +-------+----------------------------------+
> > > | true  | exec.enable_union_type updated.  |
> > > +-------+----------------------------------+
> > > 1 row selected (0.193 seconds)
> > >
> > > > select
> > > >   `timestamp`,
> > > >   sessionid,
> > > >   gameid,
> > > >   ajaxUrl,
> > > >   ajaxData
> > > > from dfs.`/tmp/test1.json` t;
> > > +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > > |   timestamp    |           sessionid            |                                            gameid                                            |        ajaxUrl        | ajaxData  |
> > > +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > > | 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                                         | /player/updatebonus1  | null      |
> > > | 1457771458873  | D18104E8CA3071C7A8F4E141B127   | https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043  | []                    | null      |
> > > | 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                                         | /player/updatebonus2  | null      |
> > > +----------------+--------------------------------+----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > > 3 rows selected (0.36 seconds)
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Fri, Mar 18, 2016 at 12:02 PM, Shankar Mane <[email protected]> wrote:
> > >
> > > > Guys,
> > > >
> > > > 1. I am stuck in the middle of somewhere. Could you please help me
> > > >    resolve the error below?
> > > > 2. I am running the query on Drill 1.6.0, in a cluster, on JSON log
> > > >    data (a 150 GB log file, 1 JSON document per line).
> > > >
> > > > I have extracted 3 lines from the logs for test purposes. Please find
> > > > those lines below.
> > > > -- ------------------------------------------- *test.json* -------------------------------------------------
> > > >
> > > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> > > > {"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot Qualifier 1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> > > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> > > >
> > > > -- ------------------------------------------------ *Query* --------------------------------------------
> > > >
> > > > select
> > > >   `timestamp`,
> > > >   sessionid,
> > > >   gameid,
> > > >   ajaxUrl,
> > > >   ajaxData
> > > > from dfs.`/tmp/test.json` t;
> > > >
> > > > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when
> > > > you are using a ValueWriter of type NullableVarCharWriterImpl.
> > > >
> > > > File /tmp/test.json
> > > > Record 2
> > > > Fragment 0:0
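[Editor's note] Abhishek's suggestion in the thread above (use a view instead of CTAS, with the union type enabled) could be sketched roughly as below. This is only a sketch: the view name `v_test_json` and the `dfs.tmp` workspace are illustrative, and it assumes the sample file path `/tmp/test.json` from the thread.

```sql
-- Enable the experimental UNION type for the session, as shown above.
ALTER SESSION SET `exec.enable_union_type` = true;

-- Create a view over the raw JSON instead of materializing it with CTAS,
-- since the SELECT works but the CTAS hits an NPE. View name and
-- workspace are illustrative.
CREATE OR REPLACE VIEW dfs.tmp.`v_test_json` AS
SELECT
  `timestamp`,
  sessionid,
  gameid,
  ajaxUrl,
  ajaxData
FROM dfs.`/tmp/test.json`;

-- Downstream queries then read through the view.
SELECT `timestamp`, sessionid FROM dfs.tmp.`v_test_json`;
```

This keeps the schema-on-read (and the union handling) in the read path rather than forcing Drill's Parquet writer to pick a single type per column, which is where the CTAS failures reported here occur.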
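[Editor's note] The Hive workaround described in Shankar's Mar 25 mail (steps a-d) could look roughly like the following. This is a hypothetical sketch: the table names, column list, and HDFS paths are illustrative, and it assumes the OpenX JSON SerDe class shipped in json-serde-1.3.7.jar.

```sql
-- Assumes json-serde-1.3.7.jar (OpenX JSON SerDe) is available to Hive.
ADD JAR json-serde-1.3.7.jar;

-- Step (a): external table over the raw JSON logs (illustrative schema,
-- path is hypothetical).
CREATE EXTERNAL TABLE raw_json_logs (
  ajaxUrl      STRING,
  ajaxData     STRING,
  sessionid    STRING,
  `timestamp`  STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/logs/json/';

-- Steps (b)-(c): Parquet table with every field declared as STRING,
-- filled from the external table; Hive writes the Parquet files to the
-- table's HDFS location.
CREATE TABLE logs_parquet STORED AS PARQUET AS
SELECT ajaxUrl, ajaxData, sessionid, `timestamp`
FROM raw_json_logs;
```

Step (d) is then just pointing Drill's dfs storage plugin at the resulting HDFS directory (for example the Hive warehouse path for `logs_parquet`, whatever that is in your deployment) and querying the Parquet files directly.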
