Ted,

Agree fully, it definitely seems like a reproducible bug that should be filed 
and resolved.


—Andries



On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote:

> Andries,
> 
> That sounds like a reasonable suggestion, but the real problem is that it
> appears that having the field initially and then having the field be
> missing is OK, but if it is missing first and then present Drill blows a
> gasket.
> 
> I think it looks like a bug.  Very good and simple demo.
> 
> 
> 
> On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> A couple of things to try that I have found useful in the past.
>> 
>> Pending if you want inner or outer joins, you may want to look at using
>> predicates to eliminate records that are not relevant to the join and can
>> complicate the work Drill has to do.
>> 
>> ie. add predicate "orders.cool is not null”
>> 
>> Not only does it filter out the records that are not of interest (and can
>> cause other challenges), but normally if you can apply predicates to
>> queries to reduce the working set that Drill has to join it can
>> substantially improve the performance for large data sets. Joins tend to be
>> one of the more expensive operators in any execution engine, where
>> predicates tend to be a much easier operation to execute at large scale.
>> 
>> —Andries
>> 
>> 
>> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> wrote:
>> 
>>> I can confirm that this is reproducible:
>>> 
>>> orders/111.json:
>>> 
>>> {
>>>  "tax" : 10,
>>>  "id" : 111,
>>>  "cust_id" : 333,
>>>  "total" : 12,
>>>  "demo" :10
>>> }
>>> 
>>> orders/222.json:
>>> 
>>> {
>>>  "cool": 20,
>>>  "id" : 222,
>>>  "cust_id" : 111,
>>>  "total" : 12
>>> }
>>> 
>>> 1st query:
>>> 
>>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
>>> . . . . . . . . . . . . . . . . . . . > FROM
>>> `maprfs.cmatta`.`test/customers/*.json` customers,
>>> . . . . . . . . . . . . . . . . . . . >
>>> `maprfs.cmatta`.`test/orders/*.json` orders
>>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
>> orders.cust_id
>>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
>>> +------+-------+
>>> |  id  | cool  |
>>> +------+-------+
>>> | 333  | null  |
>>> +------+-------+
>>> 1 row selected (0.258 seconds)
>>> 
>>> Now change orders/111.json by moving the cool field from 222.json to
>>> 111.json:
>>> 
>>> {
>>>  "cool": 20,
>>>  "tax" : 10,
>>>  "id" : 111,
>>>  "cust_id" : 333,
>>>  "total" : 12,
>>>  "demo" :10
>>> }
>>> 
>>> And removing cool from orders/222.json:
>>> 
>>> {
>>>  "id" : 222,
>>>  "cust_id" : 111,
>>>  "total" : 12
>>> }
>>> 
>>> Re-run the query:
>>> 
>>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
>>> . . . . . . . . . . . . . . . . . . . > FROM
>>> `maprfs.cmatta`.`test/customers/*.json` customers,
>>> . . . . . . . . . . . . . . . . . . . >
>>> `maprfs.cmatta`.`test/orders/*.json` orders
>>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
>> orders.cust_id
>>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
>>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
>>> java.lang.IllegalStateException: Failure while reading vector.
>>> Expected vector class of
>>> org.apache.drill.exec.vector.NullableIntVector but was holding vector
>>> class org.apache.drill.exec.vector.NullableVarCharVector.
>>> 
>>> Fragment 0:0
>>> 
>>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on
>> se-node11.se.lab:31010]
>>>       at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>>>       at
>> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>>>       at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>>>       at sqlline.SqlLine.print(SqlLine.java:1583)
>>>       at sqlline.Commands.execute(Commands.java:852)
>>>       at sqlline.Commands.sql(Commands.java:751)
>>>       at sqlline.SqlLine.dispatch(SqlLine.java:738)
>>>       at sqlline.SqlLine.begin(SqlLine.java:612)
>>>       at sqlline.SqlLine.start(SqlLine.java:366)
>>>       at sqlline.SqlLine.main(SqlLine.java:259)
>>> 
>>> ​
>>> 
>>> Chris Matta
>>> [email protected]
>>> 215-701-3146
>>> 
>>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]>
>> wrote:
>>> 
>>>> Yes.
>>>> 
>>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]>
>>>> wrote:
>>>> 
>>>>> Just to clarify, you run the *exact same query* once and it works, then
>>>>> you remove say the “cool” field from orders/222.json and put it in
>>>>> orders/111.json and the next time the same query returns that error?
>>>>> ​
>>>>> 
>>>>> Chris Matta
>>>>> [email protected]
>>>>> 215-701-3146
>>>>> 
>>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]>
>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> In my use case I have several JSON documents that I need to query
>> using a
>>>>>> join.
>>>>>> The structure of each document can vary a lot (some fields a present
>> or
>>>>>> not
>>>>>> in documents)
>>>>>> 
>>>>>> Sometimes the following exception is raised:
>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>>> java.lang.IllegalStateException: Failure while reading vector.
>> Expected
>>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but was
>>>>>> holding vector class
>> org.apache.drill.exec.vector.NullableVarCharVector.
>>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
>>>>>> 192.168.99.13:31010]
>>>>>> 
>>>>>> The queries:
>>>>>> 
>>>>>> Following query works:
>>>>>> -----
>>>>>> SELECT customers.id, orders.demo
>>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
>>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
>>>>>> WHERE customers.id = orders.cust_id
>>>>>> AND customers.country = 'FRANCE'
>>>>>> -----
>>>>>> 
>>>>>> Following query FAILS:
>>>>>> -----
>>>>>> SELECT customers.id, orders.cool
>>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
>>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
>>>>>> WHERE customers.id = orders.cust_id
>>>>>> AND customers.country = 'FRANCE'
>>>>>> -----
>>>>>> 
>>>>>> 
>>>>>> The documents:
>>>>>> 
>>>>>> Here the files:
>>>>>> 
>>>>>> ./customers/333.json
>>>>>> {
>>>>>> "id" : 333,
>>>>>> "name" : "Dave Smith",
>>>>>> "country" : "FRANCE"
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> ./orders/111.json
>>>>>> {
>>>>>> "tax" : 10,
>>>>>> "id" : 111,
>>>>>> "cust_id" : 333,
>>>>>> "total" : 12,
>>>>>> "demo" :10
>>>>>> }
>>>>>> 
>>>>>> ./orders/222.json
>>>>>> {
>>>>>> "cool":20,
>>>>>> "id" : 222,
>>>>>> "cust_id" : 111,
>>>>>> "total" : 12
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> To reproduce the bug you may have to change the document (add/remove
>>>>>> cool,
>>>>>> tax fields)
>>>>>> 
>>>>>> It looks like the schema is not "updated" on the fly in some case.
>>>>>> 
>>>>>> Any idea how to workaround? Is that bug?
>>>>>> 
>>>>>> Regards
>>>>>> Tug
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 

Reply via email to