Great. On Mon, Jun 22, 2015 at 11:45 AM, Tugdual Grall <[email protected]> wrote:
> Thanks for your help. > > If I use * I have another exception: > > -- > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > org.apache.drill.exec.exception.SchemaChangeException: Hash join does not > support schema changes Fragment 0:0 [Error Id: > 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010] > -- > > Creating the JIRA as we speak... > > Tug > > > On Mon, Jun 22, 2015 at 8:41 PM, Hanifi Gunes <[email protected]> wrote: > > > This is a bug in handling schema change. I would not expect this to > happen > > in case your second query had select * rather than an explicit > projection, > > select t.cool. Can you file a JIRA? > > > > On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht < > > [email protected]> wrote: > > > > > Ted, > > > > > > Agree fully, it definitely seems like a reproducible bug that should be > > > filed and resolved. > > > > > > > > > —Andries > > > > > > > > > > > > On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> > wrote: > > > > > > > Andries, > > > > > > > > That sounds like a reasonable suggestion, but the real problem is > that > > it > > > > appears that having the field initially and then having the field be > > > > missing is OK, but if it is missing first and then present Drill > blows > > a > > > > gasket. > > > > > > > > I think it looks like a bug. Very good and simple demo. > > > > > > > > > > > > > > > > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht < > > > > [email protected]> wrote: > > > > > > > >> A couple of things to try that I have found useful in the past. > > > >> > > > >> Pending if you want inner or outer joins, you may want to look at > > using > > > >> predicates to eliminate records that are not relevant to the join > and > > > can > > > >> complicate the work Drill has to do. > > > >> > > > >> ie. add predicate "orders.cool is not null” > > > >> > > > >> Not only does it filter out the records that are not of interest > (and > > > can > > > >> cause other challenges), but normally if you can apply predicates to > > > >> queries to reduce the working set that Drill has to join it can > > > >> substantially improve the performance for large data sets. Joins > tend > > > to be > > > >> one of the more expensive operators in any execution engine, where > > > >> predicates tend to be a much easier operation to execute at large > > scale. > > > >> > > > >> —Andries > > > >> > > > >> > > > >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> > > wrote: > > > >> > > > >>> I can confirm that this is reproducible: > > > >>> > > > >>> orders/111.json: > > > >>> > > > >>> { > > > >>> "tax" : 10, > > > >>> "id" : 111, > > > >>> "cust_id" : 333, > > > >>> "total" : 12, > > > >>> "demo" :10 > > > >>> } > > > >>> > > > >>> orders/222.json: > > > >>> > > > >>> { > > > >>> "cool": 20, > > > >>> "id" : 222, > > > >>> "cust_id" : 111, > > > >>> "total" : 12 > > > >>> } > > > >>> > > > >>> 1st query: > > > >>> > > > >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, > > > orders.cool > > > >>> . . . . . . . . . . . . . . . . . . . > FROM > > > >>> `maprfs.cmatta`.`test/customers/*.json` customers, > > > >>> . . . . . . . . . . . . . . . . . . . > > > > >>> `maprfs.cmatta`.`test/orders/*.json` orders > > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = > > > >> orders.cust_id > > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = > > > 'FRANCE'; > > > >>> +------+-------+ > > > >>> | id | cool | > > > >>> +------+-------+ > > > >>> | 333 | null | > > > >>> +------+-------+ > > > >>> 1 row selected (0.258 seconds) > > > >>> > > > >>> Now change orders/111.json by moving the cool field from 222.json > to > > > >>> 111.json: > > > >>> > > > >>> { > > > >>> "cool": 20, > > > >>> "tax" : 10, > > > >>> "id" : 111, > > > >>> "cust_id" : 333, > > > >>> "total" : 12, > > > >>> "demo" :10 > > > >>> } > > > >>> > > > >>> And removing cool from orders/222.json: > > > >>> > > > >>> { > > > >>> "id" : 222, > > > >>> "cust_id" : 111, > > > >>> "total" : 12 > > > >>> } > > > >>> > > > >>> Re-run the query: > > > >>> > > > >>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, > > > orders.cool > > > >>> . . . . . . . . . . . . . . . . . . . > FROM > > > >>> `maprfs.cmatta`.`test/customers/*.json` customers, > > > >>> . . . . . . . . . . . . . . . . . . . > > > > >>> `maprfs.cmatta`.`test/orders/*.json` orders > > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = > > > >> orders.cust_id > > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = > > > 'FRANCE'; > > > >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > > > >>> java.lang.IllegalStateException: Failure while reading vector. > > > >>> Expected vector class of > > > >>> org.apache.drill.exec.vector.NullableIntVector but was holding > vector > > > >>> class org.apache.drill.exec.vector.NullableVarCharVector. > > > >>> > > > >>> Fragment 0:0 > > > >>> > > > >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on > > > >> se-node11.se.lab:31010] > > > >>> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) > > > >>> at > > > >> > > > > > > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > > > >>> at > sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > > > >>> at sqlline.SqlLine.print(SqlLine.java:1583) > > > >>> at sqlline.Commands.execute(Commands.java:852) > > > >>> at sqlline.Commands.sql(Commands.java:751) > > > >>> at sqlline.SqlLine.dispatch(SqlLine.java:738) > > > >>> at sqlline.SqlLine.begin(SqlLine.java:612) > > > >>> at sqlline.SqlLine.start(SqlLine.java:366) > > > >>> at sqlline.SqlLine.main(SqlLine.java:259) > > > >>> > > > >>> > > > >>> > > > >>> Chris Matta > > > >>> [email protected] > > > >>> 215-701-3146 > > > >>> > > > >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected] > > > > > >> wrote: > > > >>> > > > >>>> Yes. > > > >>>> > > > >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta < > [email protected] > > > > > > >>>> wrote: > > > >>>> > > > >>>>> Just to clarify, you run the *exact same query* once and it > works, > > > then > > > >>>>> you remove say the “cool” field from orders/222.json and put it > in > > > >>>>> orders/111.json and the next time the same query returns that > > error? > > > >>>>> > > > >>>>> > > > >>>>> Chris Matta > > > >>>>> [email protected] > > > >>>>> 215-701-3146 > > > >>>>> > > > >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall < > [email protected]> > > > >> wrote: > > > >>>>> > > > >>>>>> Hello, > > > >>>>>> > > > >>>>>> In my use case I have several JSON documents that I need to > query > > > >> using a > > > >>>>>> join. > > > >>>>>> The structure of each document can vary a lot (some fields a > > present > > > >> or > > > >>>>>> not > > > >>>>>> in documents) > > > >>>>>> > > > >>>>>> Sometimes the following exception is raised: > > > >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > > > ERROR: > > > >>>>>> java.lang.IllegalStateException: Failure while reading vector. > > > >> Expected > > > >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector > but > > > was > > > >>>>>> holding vector class > > > >> org.apache.drill.exec.vector.NullableVarCharVector. > > > >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on > > > >>>>>> 192.168.99.13:31010] > > > >>>>>> > > > >>>>>> The queries: > > > >>>>>> > > > >>>>>> Following query works: > > > >>>>>> ----- > > > >>>>>> SELECT customers.id, orders.demo > > > >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, > > > >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders > > > >>>>>> WHERE customers.id = orders.cust_id > > > >>>>>> AND customers.country = 'FRANCE' > > > >>>>>> ----- > > > >>>>>> > > > >>>>>> Following query FAILS: > > > >>>>>> ----- > > > >>>>>> SELECT customers.id, orders.cool > > > >>>>>> FROM dfs.`/Users/tgrall/working/customers/*.json` customers, > > > >>>>>> dfs.`/Users/tgrall/working/orders/*.json` orders > > > >>>>>> WHERE customers.id = orders.cust_id > > > >>>>>> AND customers.country = 'FRANCE' > > > >>>>>> ----- > > > >>>>>> > > > >>>>>> > > > >>>>>> The documents: > > > >>>>>> > > > >>>>>> Here the files: > > > >>>>>> > > > >>>>>> ./customers/333.json > > > >>>>>> { > > > >>>>>> "id" : 333, > > > >>>>>> "name" : "Dave Smith", > > > >>>>>> "country" : "FRANCE" > > > >>>>>> } > > > >>>>>> > > > >>>>>> > > > >>>>>> ./orders/111.json > > > >>>>>> { > > > >>>>>> "tax" : 10, > > > >>>>>> "id" : 111, > > > >>>>>> "cust_id" : 333, > > > >>>>>> "total" : 12, > > > >>>>>> "demo" :10 > > > >>>>>> } > > > >>>>>> > > > >>>>>> ./orders/222.json > > > >>>>>> { > > > >>>>>> "cool":20, > > > >>>>>> "id" : 222, > > > >>>>>> "cust_id" : 111, > > > >>>>>> "total" : 12 > > > >>>>>> } > > > >>>>>> > > > >>>>>> > > > >>>>>> To reproduce the bug you may have to change the document > > (add/remove > > > >>>>>> cool, > > > >>>>>> tax fields) > > > >>>>>> > > > >>>>>> It looks like the schema is not "updated" on the fly in some > case. > > > >>>>>> > > > >>>>>> Any idea how to workaround? Is that bug? > > > >>>>>> > > > >>>>>> Regards > > > >>>>>> Tug > > > >>>>>> > > > >>>>> > > > >>>>> > > > >>>> > > > >> > > > >> > > > > > > > > >
