A couple of things to try that I have found useful in the past. Pending if you want inner or outer joins, you may want to look at using predicates to eliminate records that are not relevant to the join and can complicate the work Drill has to do.
ie. add predicate "orders.cool is not null” Not only does it filter out the records that are not of interest (and can cause other challenges), but normally if you can apply predicates to queries to reduce the working set that Drill has to join it can substantially improve the performance for large data sets. Joins tend to be one of the more expensive operators in any execution engine, where predicates tend to be a much easier operation to execute at large scale. —Andries On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> wrote: > I can confirm that this is reproducible: > > orders/111.json: > > { > "tax" : 10, > "id" : 111, > "cust_id" : 333, > "total" : 12, > "demo" :10 > } > > orders/222.json: > > { > "cool": 20, > "id" : 222, > "cust_id" : 111, > "total" : 12 > } > > 1st query: > > 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool > . . . . . . . . . . . . . . . . . . . > FROM > `maprfs.cmatta`.`test/customers/*.json` customers, > . . . . . . . . . . . . . . . . . . . > > `maprfs.cmatta`.`test/orders/*.json` orders > . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id > . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE'; > +------+-------+ > | id | cool | > +------+-------+ > | 333 | null | > +------+-------+ > 1 row selected (0.258 seconds) > > Now change orders/111.json by moving the cool field from 222.json to > 111.json: > > { > "cool": 20, > "tax" : 10, > "id" : 111, > "cust_id" : 333, > "total" : 12, > "demo" :10 > } > > And removing cool from orders/222.json: > > { > "id" : 222, > "cust_id" : 111, > "total" : 12 > } > > Re-run the query: > > : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool > . . . . . . . . . . . . . . . . . . . > FROM > `maprfs.cmatta`.`test/customers/*.json` customers, > . . . . . . . . . . . . . . . . . . . > > `maprfs.cmatta`.`test/orders/*.json` orders > . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id > . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE'; > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > java.lang.IllegalStateException: Failure while reading vector. > Expected vector class of > org.apache.drill.exec.vector.NullableIntVector but was holding vector > class org.apache.drill.exec.vector.NullableVarCharVector. > > Fragment 0:0 > > [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on se-node11.se.lab:31010] > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) > at > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > at sqlline.SqlLine.print(SqlLine.java:1583) > at sqlline.Commands.execute(Commands.java:852) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:738) > at sqlline.SqlLine.begin(SqlLine.java:612) > at sqlline.SqlLine.start(SqlLine.java:366) > at sqlline.SqlLine.main(SqlLine.java:259) > > > > Chris Matta > [email protected] > 215-701-3146 > > On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]> wrote: > >> Yes. >> >> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]> >> wrote: >> >>> Just to clarify, you run the *exact same query* once and it works, then >>> you remove say the “cool” field from orders/222.json and put it in >>> orders/111.json and the next time the same query returns that error? >>> >>> >>> Chris Matta >>> [email protected] >>> 215-701-3146 >>> >>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]> wrote: >>> >>>> Hello, >>>> >>>> In my use case I have several JSON documents that I need to query using a >>>> join. >>>> The structure of each document can vary a lot (some fields a present or >>>> not >>>> in documents) >>>> >>>> Sometimes the following exception is raised: >>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: >>>> java.lang.IllegalStateException: Failure while reading vector. Expected >>>> vector class of org.apache.drill.exec.vector.NullableIntVector but was >>>> holding vector class org.apache.drill.exec.vector.NullableVarCharVector. >>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on >>>> 192.168.99.13:31010] >>>> >>>> The queries: >>>> >>>> Following query works: >>>> ----- >>>> SELECT customers.id, orders.demo >>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, >>>> dfs.`/Users/tgrall/working/orders/*.json` orders >>>> WHERE customers.id = orders.cust_id >>>> AND customers.country = 'FRANCE' >>>> ----- >>>> >>>> Following query FAILS: >>>> ----- >>>> SELECT customers.id, orders.cool >>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, >>>> dfs.`/Users/tgrall/working/orders/*.json` orders >>>> WHERE customers.id = orders.cust_id >>>> AND customers.country = 'FRANCE' >>>> ----- >>>> >>>> >>>> The documents: >>>> >>>> Here the files: >>>> >>>> ./customers/333.json >>>> { >>>> "id" : 333, >>>> "name" : "Dave Smith", >>>> "country" : "FRANCE" >>>> } >>>> >>>> >>>> ./orders/111.json >>>> { >>>> "tax" : 10, >>>> "id" : 111, >>>> "cust_id" : 333, >>>> "total" : 12, >>>> "demo" :10 >>>> } >>>> >>>> ./orders/222.json >>>> { >>>> "cool":20, >>>> "id" : 222, >>>> "cust_id" : 111, >>>> "total" : 12 >>>> } >>>> >>>> >>>> To reproduce the bug you may have to change the document (add/remove >>>> cool, >>>> tax fields) >>>> >>>> It looks like the schema is not "updated" on the fly in some case. >>>> >>>> Any idea how to workaround? Is that bug? >>>> >>>> Regards >>>> Tug >>>> >>> >>> >>
