Ted, Agree fully, it definitely seems like a reproducible bug that should be filed and resolved.
—Andries On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote: > Andries, > > That sounds like a reasonable suggestion, but the real problem is that it > appears that having the field initially and then having the field be > missing is OK, but if it is missing first and then present Drill blows a > gasket. > > I think it looks like a bug. Very good and simple demo. > > > > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht < > [email protected]> wrote: > >> A couple of things to try that I have found useful in the past. >> >> Pending if you want inner or outer joins, you may want to look at using >> predicates to eliminate records that are not relevant to the join and can >> complicate the work Drill has to do. >> >> ie. add predicate "orders.cool is not null” >> >> Not only does it filter out the records that are not of interest (and can >> cause other challenges), but normally if you can apply predicates to >> queries to reduce the working set that Drill has to join it can >> substantially improve the performance for large data sets. Joins tend to be >> one of the more expensive operators in any execution engine, where >> predicates tend to be a much easier operation to execute at large scale. >> >> —Andries >> >> >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> wrote: >> >>> I can confirm that this is reproducible: >>> >>> orders/111.json: >>> >>> { >>> "tax" : 10, >>> "id" : 111, >>> "cust_id" : 333, >>> "total" : 12, >>> "demo" :10 >>> } >>> >>> orders/222.json: >>> >>> { >>> "cool": 20, >>> "id" : 222, >>> "cust_id" : 111, >>> "total" : 12 >>> } >>> >>> 1st query: >>> >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool >>> . . . . . . . . . . . . . . . . . . . > FROM >>> `maprfs.cmatta`.`test/customers/*.json` customers, >>> . . . . . . . . . . . . . . . . . . . > >>> `maprfs.cmatta`.`test/orders/*.json` orders >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = >> orders.cust_id >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE'; >>> +------+-------+ >>> | id | cool | >>> +------+-------+ >>> | 333 | null | >>> +------+-------+ >>> 1 row selected (0.258 seconds) >>> >>> Now change orders/111.json by moving the cool field from 222.json to >>> 111.json: >>> >>> { >>> "cool": 20, >>> "tax" : 10, >>> "id" : 111, >>> "cust_id" : 333, >>> "total" : 12, >>> "demo" :10 >>> } >>> >>> And removing cool from orders/222.json: >>> >>> { >>> "id" : 222, >>> "cust_id" : 111, >>> "total" : 12 >>> } >>> >>> Re-run the query: >>> >>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool >>> . . . . . . . . . . . . . . . . . . . > FROM >>> `maprfs.cmatta`.`test/customers/*.json` customers, >>> . . . . . . . . . . . . . . . . . . . > >>> `maprfs.cmatta`.`test/orders/*.json` orders >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = >> orders.cust_id >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE'; >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: >>> java.lang.IllegalStateException: Failure while reading vector. >>> Expected vector class of >>> org.apache.drill.exec.vector.NullableIntVector but was holding vector >>> class org.apache.drill.exec.vector.NullableVarCharVector. >>> >>> Fragment 0:0 >>> >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on >> se-node11.se.lab:31010] >>> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) >>> at >> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) >>> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) >>> at sqlline.SqlLine.print(SqlLine.java:1583) >>> at sqlline.Commands.execute(Commands.java:852) >>> at sqlline.Commands.sql(Commands.java:751) >>> at sqlline.SqlLine.dispatch(SqlLine.java:738) >>> at sqlline.SqlLine.begin(SqlLine.java:612) >>> at sqlline.SqlLine.start(SqlLine.java:366) >>> at sqlline.SqlLine.main(SqlLine.java:259) >>> >>> >>> >>> Chris Matta >>> [email protected] >>> 215-701-3146 >>> >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]> >> wrote: >>> >>>> Yes. >>>> >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]> >>>> wrote: >>>> >>>>> Just to clarify, you run the *exact same query* once and it works, then >>>>> you remove say the “cool” field from orders/222.json and put it in >>>>> orders/111.json and the next time the same query returns that error? >>>>> >>>>> >>>>> Chris Matta >>>>> [email protected] >>>>> 215-701-3146 >>>>> >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]> >> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> In my use case I have several JSON documents that I need to query >> using a >>>>>> join. >>>>>> The structure of each document can vary a lot (some fields a present >> or >>>>>> not >>>>>> in documents) >>>>>> >>>>>> Sometimes the following exception is raised: >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: >>>>>> java.lang.IllegalStateException: Failure while reading vector. >> Expected >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but was >>>>>> holding vector class >> org.apache.drill.exec.vector.NullableVarCharVector. >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on >>>>>> 192.168.99.13:31010] >>>>>> >>>>>> The queries: >>>>>> >>>>>> Following query works: >>>>>> ----- >>>>>> SELECT customers.id, orders.demo >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders >>>>>> WHERE customers.id = orders.cust_id >>>>>> AND customers.country = 'FRANCE' >>>>>> ----- >>>>>> >>>>>> Following query FAILS: >>>>>> ----- >>>>>> SELECT customers.id, orders.cool >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders >>>>>> WHERE customers.id = orders.cust_id >>>>>> AND customers.country = 'FRANCE' >>>>>> ----- >>>>>> >>>>>> >>>>>> The documents: >>>>>> >>>>>> Here the files: >>>>>> >>>>>> ./customers/333.json >>>>>> { >>>>>> "id" : 333, >>>>>> "name" : "Dave Smith", >>>>>> "country" : "FRANCE" >>>>>> } >>>>>> >>>>>> >>>>>> ./orders/111.json >>>>>> { >>>>>> "tax" : 10, >>>>>> "id" : 111, >>>>>> "cust_id" : 333, >>>>>> "total" : 12, >>>>>> "demo" :10 >>>>>> } >>>>>> >>>>>> ./orders/222.json >>>>>> { >>>>>> "cool":20, >>>>>> "id" : 222, >>>>>> "cust_id" : 111, >>>>>> "total" : 12 >>>>>> } >>>>>> >>>>>> >>>>>> To reproduce the bug you may have to change the document (add/remove >>>>>> cool, >>>>>> tax fields) >>>>>> >>>>>> It looks like the schema is not "updated" on the fly in some case. >>>>>> >>>>>> Any idea how to workaround? Is that bug? >>>>>> >>>>>> Regards >>>>>> Tug >>>>>> >>>>> >>>>> >>>> >> >>
