Hi All,
I'm trying to use case statements to manage a heterogeneous stream of JSON
objects, as shown in the example from
https://drill.apache.org/blog/2015/11/23/drill-1.3-released/
but I'm not getting any love yet. Drill 1.1 -> 1.3 is chock full of
goodness, and case statements will help me with the last real hurdles I have
using Drill with my logs.
Would you please review the tests I created below and tell me if I'm just
missing something obvious?
Thanks
/jos
## first test: a file with two lines; the `user` field is a string in the
## first line and a map in the second
## first let's just select all records; I expect this to barf since there
## are two schemas
0: jdbc:drill:zk=local> select * from
dfs.`/Users/jos/work/drill/casetest.json` t ;
Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.
File /Users/jos/work/drill/casetest.json
Record 2
Fragment 0:0
[Error Id: 1385aea5-68cb-4775-ae17-fad6b4901ea6 on 10.0.1.9:31010]
(state=,code=0)
## now let's use a case statement to sort out the schemas; I don't expect
## this to barf, but barf it does. Seems like this should have worked;
## what am I missing?
0: jdbc:drill:zk=local> select case when is_map(t.user_info.`user`) then
'map' else 'string' end from dfs.`/Users/jos/work/drill/casetest.json` t ;
Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.
File /Users/jos/Downloads/2015-11-30-bad-3.json
Record 2
Fragment 0:0
[Error Id: 872a5347-93dd-49ae-a55c-e861b807b4a6 on 10.0.1.9:31010]
(state=,code=0)
0: jdbc:drill:zk=local>
## data I used is this
## casetest.json has two lines in it
{"level":"EVENT","time":1448844983160,"user_info":{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":"[email protected]"}}
{"level":"EVENT","time":1448844983160,"user_info":{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":{"id":"[email protected]","roles":null,"isNotadmins":true,"iscoders":true}}}
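As a sanity check outside Drill, the mixed types in casetest.json can be confirmed with a small script. This is only a sketch: `user_kind` is a hypothetical helper name, and the `user@example.com` values are placeholders standing in for the redacted addresses in the data above.

```python
import json

# The two records from casetest.json. The e-mail values are placeholders
# (assumption): the real addresses are redacted in the original data.
RECORDS = [
    '{"level":"EVENT","time":1448844983160,"user_info":'
    '{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":"user@example.com"}}',
    '{"level":"EVENT","time":1448844983160,"user_info":'
    '{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":'
    '{"id":"user@example.com","roles":null,"isNotadmins":true,"iscoders":true}}}',
]


def user_kind(line):
    """Classify user_info.user in one JSON line as 'map', 'string', or other."""
    record = json.loads(line)
    user = record.get("user_info", {}).get("user")
    if isinstance(user, dict):
        return "map"
    if isinstance(user, str):
        return "string"
    # Anything else (missing field, number, list) is reported by type name.
    return type(user).__name__


kinds = [user_kind(line) for line in RECORDS]
print(kinds)
```

The script reports a different kind for each record, which is the same string-versus-map split the case statement is meant to distinguish.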
## now let's see if a case statement will work on any structure at all
## new test file with the same line in it twice
## select * works as expected
0: jdbc:drill:zk=local> select * from
dfs.`/Users/jos/work/drill/testcase2.json` t ;
+-------+------+-----------+
| level | time | user_info |
+-------+------+-----------+
| EVENT | 1448844983160 | {"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":{"id":"[email protected]","isNotadmins":true,"iscoders":true}} |
| EVENT | 1448844983160 | {"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":{"id":"[email protected]","isNotadmins":true,"iscoders":true}} |
+-------+------+-----------+
2 rows selected (1.701 seconds)
## now let's try to use the same file in a case statement
## it doesn't work, but we get different, more puzzling errors this time
0: jdbc:drill:zk=local> select case when is_map(t.user_info.`user`) then
'map' else 'string' end from dfs.`/Users/jos/work/drill/testcase2.json` t ;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to
materialize incoming schema. Errors:
Error in expression at index -1. Error: Missing function implementation:
[is_map(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--.
Error in expression at index -1. Error: Failure composing If Expression.
All conditions must return a boolean type. Condition was of Type NULL..
Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: c3a7f989-4d93-48c0-9a16-a38dd195314c on 10.19.220.63:31010]
(state=,code=0)
0: jdbc:drill:zk=local>
## data I used for this test
## testcase2.json has two lines in it
{"level":"EVENT","time":1448844983160,"user_info":{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":{"id":"[email protected]","roles":null,"isNotadmins":true,"iscoders":true}}}
{"level":"EVENT","time":1448844983160,"user_info":{"session":"9OOLJ8HEGEQ0sTCVSXsK9ddJWVpFM5wM","user":{"id":"[email protected]","roles":null,"isNotadmins":true,"iscoders":true}}}
_____________
john o schneider
[email protected]
408-203-7891