Thanks for your help. If I use * I have another exception:
-- org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Hash join does not support schema changes Fragment 0:0 [Error Id: 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010] -- Creating the JIRA as we speak... Tug On Mon, Jun 22, 2015 at 8:41 PM, Hanifi Gunes <[email protected]> wrote: > This is a bug in handling schema change. I would not expect this to happen > in case your second query had select * rather than an explicit projection, > select t.cool. Can you file a JIRA? > > On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht < > [email protected]> wrote: > > > Ted, > > > > Agree fully, it definitely seems like a reproducible bug that should be > > filed and resolved. > > > > > > —Andries > > > > > > > > On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote: > > > > > Andries, > > > > > > That sounds like a reasonable suggestion, but the real problem is that > it > > > appears that having the field initially and then having the field be > > > missing is OK, but if it is missing first and then present Drill blows > a > > > gasket. > > > > > > I think it looks like a bug. Very good and simple demo. > > > > > > > > > > > > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht < > > > [email protected]> wrote: > > > > > >> A couple of things to try that I have found useful in the past. > > >> > > >> Pending if you want inner or outer joins, you may want to look at > using > > >> predicates to eliminate records that are not relevant to the join and > > can > > >> complicate the work Drill has to do. > > >> > > >> ie. add predicate "orders.cool is not null” > > >> > > >> Not only does it filter out the records that are not of interest (and > > can > > >> cause other challenges), but normally if you can apply predicates to > > >> queries to reduce the working set that Drill has to join it can > > >> substantially improve the performance for large data sets. Joins tend > > to be > > >> one of the more expensive operators in any execution engine, where > > >> predicates tend to be a much easier operation to execute at large > scale. > > >> > > >> —Andries > > >> > > >> > > >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> > wrote: > > >> > > >>> I can confirm that this is reproducible: > > >>> > > >>> orders/111.json: > > >>> > > >>> { > > >>> "tax" : 10, > > >>> "id" : 111, > > >>> "cust_id" : 333, > > >>> "total" : 12, > > >>> "demo" :10 > > >>> } > > >>> > > >>> orders/222.json: > > >>> > > >>> { > > >>> "cool": 20, > > >>> "id" : 222, > > >>> "cust_id" : 111, > > >>> "total" : 12 > > >>> } > > >>> > > >>> 1st query: > > >>> > > >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, > > orders.cool > > >>> . . . . . . . . . . . . . . . . . . . > FROM > > >>> `maprfs.cmatta`.`test/customers/*.json` customers, > > >>> . . . . . . . . . . . . . . . . . . . > > > >>> `maprfs.cmatta`.`test/orders/*.json` orders > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = > > >> orders.cust_id > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = > > 'FRANCE'; > > >>> +------+-------+ > > >>> | id | cool | > > >>> +------+-------+ > > >>> | 333 | null | > > >>> +------+-------+ > > >>> 1 row selected (0.258 seconds) > > >>> > > >>> Now change orders/111.json by moving the cool field from 222.json to > > >>> 111.json: > > >>> > > >>> { > > >>> "cool": 20, > > >>> "tax" : 10, > > >>> "id" : 111, > > >>> "cust_id" : 333, > > >>> "total" : 12, > > >>> "demo" :10 > > >>> } > > >>> > > >>> And removing cool from orders/222.json: > > >>> > > >>> { > > >>> "id" : 222, > > >>> "cust_id" : 111, > > >>> "total" : 12 > > >>> } > > >>> > > >>> Re-run the query: > > >>> > > >>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, > > orders.cool > > >>> . . . . . . . . . . . . . . . . . . . > FROM > > >>> `maprfs.cmatta`.`test/customers/*.json` customers, > > >>> . . . . . . . . . . . . . . . . . . . > > > >>> `maprfs.cmatta`.`test/orders/*.json` orders > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = > > >> orders.cust_id > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = > > 'FRANCE'; > > >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > > >>> java.lang.IllegalStateException: Failure while reading vector. > > >>> Expected vector class of > > >>> org.apache.drill.exec.vector.NullableIntVector but was holding vector > > >>> class org.apache.drill.exec.vector.NullableVarCharVector. > > >>> > > >>> Fragment 0:0 > > >>> > > >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on > > >> se-node11.se.lab:31010] > > >>> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) > > >>> at > > >> > > > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > > >>> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > > >>> at sqlline.SqlLine.print(SqlLine.java:1583) > > >>> at sqlline.Commands.execute(Commands.java:852) > > >>> at sqlline.Commands.sql(Commands.java:751) > > >>> at sqlline.SqlLine.dispatch(SqlLine.java:738) > > >>> at sqlline.SqlLine.begin(SqlLine.java:612) > > >>> at sqlline.SqlLine.start(SqlLine.java:366) > > >>> at sqlline.SqlLine.main(SqlLine.java:259) > > >>> > > >>> > > >>> > > >>> Chris Matta > > >>> [email protected] > > >>> 215-701-3146 > > >>> > > >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]> > > >> wrote: > > >>> > > >>>> Yes. > > >>>> > > >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected] > > > > >>>> wrote: > > >>>> > > >>>>> Just to clarify, you run the *exact same query* once and it works, > > then > > >>>>> you remove say the “cool” field from orders/222.json and put it in > > >>>>> orders/111.json and the next time the same query returns that > error? > > >>>>> > > >>>>> > > >>>>> Chris Matta > > >>>>> [email protected] > > >>>>> 215-701-3146 > > >>>>> > > >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]> > > >> wrote: > > >>>>> > > >>>>>> Hello, > > >>>>>> > > >>>>>> In my use case I have several JSON documents that I need to query > > >> using a > > >>>>>> join. > > >>>>>> The structure of each document can vary a lot (some fields a > present > > >> or > > >>>>>> not > > >>>>>> in documents) > > >>>>>> > > >>>>>> Sometimes the following exception is raised: > > >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > > ERROR: > > >>>>>> java.lang.IllegalStateException: Failure while reading vector. > > >> Expected > > >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but > > was > > >>>>>> holding vector class > > >> org.apache.drill.exec.vector.NullableVarCharVector. > > >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on > > >>>>>> 192.168.99.13:31010] > > >>>>>> > > >>>>>> The queries: > > >>>>>> > > >>>>>> Following query works: > > >>>>>> ----- > > >>>>>> SELECT customers.id, orders.demo > > >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, > > >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders > > >>>>>> WHERE customers.id = orders.cust_id > > >>>>>> AND customers.country = 'FRANCE' > > >>>>>> ----- > > >>>>>> > > >>>>>> Following query FAILS: > > >>>>>> ----- > > >>>>>> SELECT customers.id, orders.cool > > >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, > > >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders > > >>>>>> WHERE customers.id = orders.cust_id > > >>>>>> AND customers.country = 'FRANCE' > > >>>>>> ----- > > >>>>>> > > >>>>>> > > >>>>>> The documents: > > >>>>>> > > >>>>>> Here the files: > > >>>>>> > > >>>>>> ./customers/333.json > > >>>>>> { > > >>>>>> "id" : 333, > > >>>>>> "name" : "Dave Smith", > > >>>>>> "country" : "FRANCE" > > >>>>>> } > > >>>>>> > > >>>>>> > > >>>>>> ./orders/111.json > > >>>>>> { > > >>>>>> "tax" : 10, > > >>>>>> "id" : 111, > > >>>>>> "cust_id" : 333, > > >>>>>> "total" : 12, > > >>>>>> "demo" :10 > > >>>>>> } > > >>>>>> > > >>>>>> ./orders/222.json > > >>>>>> { > > >>>>>> "cool":20, > > >>>>>> "id" : 222, > > >>>>>> "cust_id" : 111, > > >>>>>> "total" : 12 > > >>>>>> } > > >>>>>> > > >>>>>> > > >>>>>> To reproduce the bug you may have to change the document > (add/remove > > >>>>>> cool, > > >>>>>> tax fields) > > >>>>>> > > >>>>>> It looks like the schema is not "updated" on the fly in some case. > > >>>>>> > > >>>>>> Any idea how to workaround? Is that bug? > > >>>>>> > > >>>>>> Regards > > >>>>>> Tug > > >>>>>> > > >>>>> > > >>>>> > > >>>> > > >> > > >> > > > > >
