Jacques - thanks for your reply. It is very much appreciated

I was able to create the view, but when accessing the one column in the view
I got an IllegalArgumentException. I tried casting the values but kept
coming back to the same error.

One thing I did notice, though it may not be the issue: when I describe the
view, the column I converted to JSON has a DATA_TYPE of ANY.
My transcript is below:


CREATE VIEW logView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
hive.production_logs_hcc_24

0: jdbc:drill:drillbit=drill.apixio.com> use dfs.`tmp`;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | Default schema changed to [dfs.tmp]  |
+-------+--------------------------------------+
1 row selected (0.353 seconds)
0: jdbc:drill:drillbit=drill.apixio.com> CREATE VIEW logView AS SELECT
CONVERT_FROM(line, 'JSON') AS line FROM hive.production_logs_hcc_24;
+-------+----------------------------------------------------------+
|  ok   |                         summary                          |
+-------+----------------------------------------------------------+
| true  | View 'logView' created successfully in 'dfs.tmp' schema  |
+-------+----------------------------------------------------------+
1 row selected (0.272 seconds)

0: jdbc:drill:drillbit=drill.apixio.com> describe logView;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| line         | ANY        | YES          |
+--------------+------------+--------------+
1 row selected (0.125 seconds)

0: jdbc:drill:drillbit=drill.apixio.com> select t.line.level from logView t
limit 10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.

Fragment 0:0

[Error Id: 655d0dbd-4d8f-433f-96a6-7ab85efb0b8b on hadoop-data-drill:31010]
(state=,code=0)
0: jdbc:drill:drillbit=drill.apixio.com> select t.line from logView t limit
10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.

Fragment 0:0

[Error Id: fee663a6-b75c-414c-9d47-39b1e2437558 on hadoop-data-drill:31010]
(state=,code=0)
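
Speculating a bit: that writer error usually shows up when the same JSON
field changes type from one log line to the next, e.g. hypothetical lines
like

    {"level": "EVENT", "unixtime": 1408600000}
    {"level": "EVENT", "unixtime": "1408600001"}

where unixtime is first a number and then a string, so the reader picks one
writer for the field and a later line disagrees. For JSON files Drill has a
session option that reads every field as VARCHAR:

    ALTER SESSION SET `store.json.all_text_mode` = true;

but I don't know whether that option applies to CONVERT_FROM(..., 'JSON')
on a hive column.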


_____________

john o schneider
[email protected]
408-203-7891


On Wed, Jul 29, 2015 at 12:20 PM, Jacques Nadeau <[email protected]> wrote:

> Drill doesn't implicitly cast from VarChar -> Map.  However, just create a
> Drill view:
>
> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc
>
> Then just query newView.
>
> Then 'select t.line.level from newView' should work just fine.
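>
> For example, something like this (assuming dfs.tmp is a writable
> workspace):
>
> USE dfs.`tmp`;
> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc;
> SELECT t.line.level FROM newView t LIMIT 10;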
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <[email protected]>
> wrote:
>
> > I've seen examples from drill queries on HBase (all columns are strings)
> > access each column as a json object as in:
> >
> > select t.column_a.xField, t.column_b.yField from some_hbase_table t
> >
> > I'd like to do the same using my hive tables where I have production
> > logging
> >
> > My production log tables have the following schema
> >
> > CREATE EXTERNAL TABLE logging_master_schema(
> >  line string
> > ) partitioned by (month string, week string, day string);
> >
> > each "line" is a log line in json; we use the partitions
> > to narrow down the lines we crunch in hive queries, where we
> > use function calls like get_json_object(line, '$.x.y')
> > to crack our logs.
> >
> > A trivial example of a hive query would be:
> > SELECT get_json_object(line, '$.app.hcc.event_name'),
> >        get_json_object(line, '$.unixtime'),
> >        line FROM production_logs_hcc WHERE year='2014' and month='08' and
> > day='21'
> >        and get_json_object(line, '$.level') = 'EVENT'
> >   ORDER BY get_json_object(line, '$.unixtime')
> >        DESC;
> >
> > I am trying to figure out now if there's a way we can use
> > drill to access hive and do the same kind of queries where
> > drill will treat the 'line' column as json as in this example:
> >
> > select line.unixtime as ts,
> >        line.app.hcc.event_name as evt
> > from hive.production_logs_hcc
> > where year='2014' and month='08' and day='21'
> >       and line.level = 'EVENT'
> >       order by ts desc
> >
> > so I started with the simplest test:
> >
> > > select t.line from hive.production_logs_hcc t --> works fine
> >
> > > select t.line.level from hive.production_logs_hcc t --> VARCHAR
> > assertion error
> >
> > bummer, so it seems that where a column in hbase is implicitly json, a
> > string column in hive is not, AND no amount of attempts to cast the log
> > line to a varchar and then access it as json makes any difference... (I
> > tried this when I realized that the hive string type had no drill sql
> > analog)
> >
> > Am I just out of luck here: hbase columns are json, but hive text
> > columns aren't?
> >
> > Thanks for any help / explanation
> > _____________
> >
> > john o schneider
> > [email protected]
> > 408-203-7891
> >
>
