Hi Team,

Any updates on DRILL-4520
<https://issues.apache.org/jira/browse/DRILL-4520>? Also, I came across
DRILL-4653 <https://issues.apache.org/jira/browse/DRILL-4653>; in the 1.9.0
release it is listed under improvements, but its current status is
Reopened. Can someone confirm this?

Regards,
*Anup Tiwari*

On Fri, Mar 25, 2016 at 3:13 PM, Shankar Mane <[email protected]>
wrote:

> I have opened a JIRA for the same. You can track it here:
> https://issues.apache.org/jira/browse/DRILL-4520
>
> For the time being I am using the older approach, which is *Hive*.
>
> *Below are the steps I am following:*
> 1. Drill - creating a flattened table from the HDFS JSON logs. But as I
> mentioned in my earlier mail, there are bugs/exceptions, so I am SKIPPING
> this step.
> 1. Repeating step 1 using Hive instead. Here I am using Hive 1.2.1 and
> json-serde-1.3.7.jar.
>
>    - a) Define the schema for the input JSON by creating a Hive external
>    table (using the JSON SerDe).
>    - b) Define the Hive Parquet table schema, which can be filled from the
>    external table. Here all fields in the Parquet table are marked as
>    STRING. This step is similar to creating a Parquet table in Drill.
>    - c) Hive will create the table and store all the data in HDFS, in
>    Parquet format.
>    - d) Later, use this HDFS path in Drill.
>
> 2. From here onward I use *Drill* to process further queries, which works
> fine.
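>
> The steps above might be sketched roughly as follows (the table names,
> column names, HDFS paths, and SerDe jar name are hypothetical placeholders,
> not taken from my actual setup):
>
> ```sql
> -- a) External table defining the schema of the raw JSON logs
> ADD JAR json-serde-1.3.7-jar-with-dependencies.jar;
>
> CREATE EXTERNAL TABLE logs_json (
>   `timestamp` BIGINT,
>   sessionid   STRING,
>   gameid      STRING,
>   ajaxurl     STRING,
>   ajaxdata    STRING
> )
> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
> LOCATION '/data/logs/json/';
>
> -- b) Parquet table with all fields marked as STRING
> CREATE TABLE logs_parquet (
>   `timestamp` STRING,
>   sessionid   STRING,
>   gameid      STRING,
>   ajaxurl     STRING,
>   ajaxdata    STRING
> )
> STORED AS PARQUET
> LOCATION '/data/logs/parquet/';
>
> -- c) Hive fills the Parquet table and stores the files in HDFS
> INSERT OVERWRITE TABLE logs_parquet
> SELECT CAST(`timestamp` AS STRING), sessionid, gameid, ajaxurl, ajaxdata
> FROM logs_json;
>
> -- d) Later, query the same HDFS path from Drill:
> --    SELECT * FROM dfs.`/data/logs/parquet/`;
> ```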
>
> *Here my questions are:*
> 1. Is it the right approach to create the Parquet table using Hive and
> then use it from Drill?
> 2. Will Parquet tables created by Hive versus Drill differ in any way or
> cause inconsistency?
>
>
> We know that Drill discovers the schema on the fly, whereas Hive does not:
> in Hive, we need to explicitly define the schema. So I can say here that -
> 1. Hive explicitly converts data into the predefined data types, whereas
> Drill doesn't unless we cast.
> For example: a column with different data types would be explicitly
> converted into the predefined data type in Hive, but in Drill it doesn't
> work, either as-is or with casting.
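>
> For concreteness, the kind of cast meant here would look like this in
> Drill (the path and column name are illustrative; as noted above, this
> still fails when a column mixes data types):
>
> ```sql
> SELECT CAST(gameid AS VARCHAR) AS gameid
> FROM dfs.`/tmp/test.json` t;
> ```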
>
> Please suggest any alternative way or workaround.
> Regards,
> Shankar
>
>
>
> For CTAS, I also hit an NPE when the storage format was Parquet (the default).
>
> With storage format as JSON, I hit this error:
> Error: SYSTEM ERROR: IllegalArgumentException: You tried to read a
> [readText()] type when you are using a field reader of type
> [UnionListReader].
>
> Since this is still an experimental feature, I'm not sure if anyone has
> tried out CTAS with it before. Could you open a JIRA for this? Or let me
> know if you want me to open one instead.
>
> And since you mention that queries without CTAS work fine, can you create
> views instead and query those (I tried this and it works fine)?
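>
> A sketch of that workaround (the view name is hypothetical, and dfs.tmp is
> assumed to be a writable workspace in your storage configuration):
>
> ```sql
> -- Enable the experimental UNION type for this session
> ALTER SESSION SET `exec.enable_union_type` = true;
>
> -- Wrap the raw JSON file in a view instead of using CTAS
> CREATE VIEW dfs.tmp.test1_view AS
> SELECT `timestamp`, sessionid, gameid, ajaxUrl, ajaxData
> FROM dfs.`/tmp/test1.json` t;
>
> -- Query the view as you would the table
> SELECT * FROM dfs.tmp.test1_view;
> ```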
>
> On Fri, Mar 18, 2016 at 1:29 PM, Shankar Mane <[email protected]>
> wrote:
>
> > @Abhishek:
> >
> > Some events in the 150 GB JSON file are like this, where they differ in
> > structure. I would say only about 0.1% of events (per 150 GB JSON file)
> > are such events.
> >
> > And yes, UNION works perfectly, but only when we use SELECT statements.
> >
> > Could you please change your SELECT query to a CTAS? I am getting
> > null pointer exceptions.
> > On 19 Mar 2016 01:35, "Abhishek Girish" <[email protected]>
> > wrote:
> >
> > > Hello Shankar,
> > >
> > > From the sample data you shared, it looks like you have JSON documents
> > > which differ considerably in schema / structure. This isn't supported
> > > by default.
> > >
> > > You could try turning on UNION type (an experimental feature).
> > >
> > > > set `exec.enable_union_type` = true;
> > > +-------+----------------------------------+
> > > |  ok   |             summary              |
> > > +-------+----------------------------------+
> > > | true  | exec.enable_union_type updated.  |
> > > +-------+----------------------------------+
> > > 1 row selected (0.193 seconds)
> > >
> > > > select
> > > > `timestamp`,
> > > > sessionid,
> > > > gameid,
> > > > ajaxUrl,
> > > > ajaxData
> > > > from dfs.`/tmp/test1.json` t;
> > >
> > > +----------------+---------------------------------+-----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > > |   timestamp    |            sessionid            |                                            gameid                                             |        ajaxUrl        | ajaxData  |
> > > +----------------+---------------------------------+-----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > > | 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                                          | /player/updatebonus1  | null      |
> > > | 1457771458873  | D18104E8CA3071C7A8F4E141B127    | https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043   | []                    | null      |
> > > | 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3   | null                                                                                          | /player/updatebonus2  | null      |
> > > +----------------+---------------------------------+-----------------------------------------------------------------------------------------------+-----------------------+-----------+
> > >
> > > 3 rows selected (0.36 seconds)
> > >
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Fri, Mar 18, 2016 at 12:02 PM, Shankar Mane <[email protected]>
> > > wrote:
> > >
> > > > Guys,
> > > >
> > > >
> > > >    1. I am stuck; could you please help me resolve the error below.
> > > >    2. I am running the query on Drill 1.6.0 on a cluster, over JSON
> > > >    log data (a 150 GB log file, one JSON document per line).
> > > >
> > > >
> > > > I have extracted just 3 lines from the logs for testing; please find
> > > > those lines below.
> > > >
> > > >
> > > > -- ------------------------------------------- *test.json*
> > > > -------------------------------------------------
> > > >
> > > >
> > > >
> > > >
> > > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> > > > {"gameId":"https://daemon2.com/tournDetails.do?type=myGames&userId=1556148&jsonp_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"25070010105301000009","id":56312439,"a":0,"st":1457771400000,"e":"0.0","j":0,"n":"Loot Qualifier 1","tc":94,"et":0,"syst":1457771456,"rc":145770000,"s":5,"t":10000,"tk":false,"prnId":56311896,"jc":10000,"tp":"100000.0","ro":145400000,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> > > > {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> > > >
> > > > -- ------------------------------------------------ *Query*
> > > > --------------------------------------------
> > > >
> > > >
> > > > select
> > > > `timestamp`,
> > > > sessionid,
> > > > gameid,
> > > > ajaxUrl,
> > > > ajaxData
> > > > from dfs.`/tmp/test.json` t
> > > > ;
> > > >
> > > >
> > > >
> > > > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when
> > you
> > > > are using a ValueWriter of type NullableVarCharWriterImpl.
> > > >
> > > > File  /tmp/test.json
> > > > Record  2
> > > > Fragment 0:0
> > > >
> > >
> >
>
