Thanks for your help.

If I use * I have another exception:

--
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
org.apache.drill.exec.exception.SchemaChangeException: Hash join does not
support schema changes Fragment 0:0 [Error Id:
0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
--

Creating the JIRA as we speak...

Tug


On Mon, Jun 22, 2015 at 8:41 PM, Hanifi Gunes <[email protected]> wrote:

> This is a bug in handling schema change. I would not expect this to happen
> in case your second query had select * rather than an explicit projection,
> select t.cool. Can you file a JIRA?
>
> On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht <
> [email protected]> wrote:
>
> > Ted,
> >
> > Agree fully, it definitely seems like a reproducible bug that should be
> > filed and resolved.
> >
> >
> > —Andries
> >
> >
> >
> > On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote:
> >
> > > Andries,
> > >
> > > That sounds like a reasonable suggestion, but the real problem is that
> it
> > > appears that having the field initially and then having the field be
> > > missing is OK, but if it is missing first and then present Drill blows
> a
> > > gasket.
> > >
> > > I think it looks like a bug.  Very good and simple demo.
> > >
> > >
> > >
> > > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > >> A couple of things to try that I have found useful in the past.
> > >>
> > >> Pending if you want inner or outer joins, you may want to look at
> using
> > >> predicates to eliminate records that are not relevant to the join and
> > can
> > >> complicate the work Drill has to do.
> > >>
> > >> ie. add predicate "orders.cool is not null”
> > >>
> > >> Not only does it filter out the records that are not of interest (and
> > can
> > >> cause other challenges), but normally if you can apply predicates to
> > >> queries to reduce the working set that Drill has to join it can
> > >> substantially improve the performance for large data sets. Joins tend
> > to be
> > >> one of the more expensive operators in any execution engine, where
> > >> predicates tend to be a much easier operation to execute at large
> scale.
> > >>
> > >> —Andries
> > >>
> > >>
> > >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]>
> wrote:
> > >>
> > >>> I can confirm that this is reproducible:
> > >>>
> > >>> orders/111.json:
> > >>>
> > >>> {
> > >>>  "tax" : 10,
> > >>>  "id" : 111,
> > >>>  "cust_id" : 333,
> > >>>  "total" : 12,
> > >>>  "demo" :10
> > >>> }
> > >>>
> > >>> orders/222.json:
> > >>>
> > >>> {
> > >>>  "cool": 20,
> > >>>  "id" : 222,
> > >>>  "cust_id" : 111,
> > >>>  "total" : 12
> > >>> }
> > >>>
> > >>> 1st query:
> > >>>
> > >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id,
> > orders.cool
> > >>> . . . . . . . . . . . . . . . . . . . > FROM
> > >>> `maprfs.cmatta`.`test/customers/*.json` customers,
> > >>> . . . . . . . . . . . . . . . . . . . >
> > >>> `maprfs.cmatta`.`test/orders/*.json` orders
> > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> > >> orders.cust_id
> > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country =
> > 'FRANCE';
> > >>> +------+-------+
> > >>> |  id  | cool  |
> > >>> +------+-------+
> > >>> | 333  | null  |
> > >>> +------+-------+
> > >>> 1 row selected (0.258 seconds)
> > >>>
> > >>> Now change orders/111.json by moving the cool field from 222.json to
> > >>> 111.json:
> > >>>
> > >>> {
> > >>>  "cool": 20,
> > >>>  "tax" : 10,
> > >>>  "id" : 111,
> > >>>  "cust_id" : 333,
> > >>>  "total" : 12,
> > >>>  "demo" :10
> > >>> }
> > >>>
> > >>> And removing cool from orders/222.json:
> > >>>
> > >>> {
> > >>>  "id" : 222,
> > >>>  "cust_id" : 111,
> > >>>  "total" : 12
> > >>> }
> > >>>
> > >>> Re-run the query:
> > >>>
> > >>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id,
> > orders.cool
> > >>> . . . . . . . . . . . . . . . . . . . > FROM
> > >>> `maprfs.cmatta`.`test/customers/*.json` customers,
> > >>> . . . . . . . . . . . . . . . . . . . >
> > >>> `maprfs.cmatta`.`test/orders/*.json` orders
> > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> > >> orders.cust_id
> > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country =
> > 'FRANCE';
> > >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> > >>> java.lang.IllegalStateException: Failure while reading vector.
> > >>> Expected vector class of
> > >>> org.apache.drill.exec.vector.NullableIntVector but was holding vector
> > >>> class org.apache.drill.exec.vector.NullableVarCharVector.
> > >>>
> > >>> Fragment 0:0
> > >>>
> > >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on
> > >> se-node11.se.lab:31010]
> > >>>       at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> > >>>       at
> > >>
> >
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> > >>>       at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> > >>>       at sqlline.SqlLine.print(SqlLine.java:1583)
> > >>>       at sqlline.Commands.execute(Commands.java:852)
> > >>>       at sqlline.Commands.sql(Commands.java:751)
> > >>>       at sqlline.SqlLine.dispatch(SqlLine.java:738)
> > >>>       at sqlline.SqlLine.begin(SqlLine.java:612)
> > >>>       at sqlline.SqlLine.start(SqlLine.java:366)
> > >>>       at sqlline.SqlLine.main(SqlLine.java:259)
> > >>>
> > >>> ​
> > >>>
> > >>> Chris Matta
> > >>> [email protected]
> > >>> 215-701-3146
> > >>>
> > >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]>
> > >> wrote:
> > >>>
> > >>>> Yes.
> > >>>>
> > >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Just to clarify, you run the *exact same query* once and it works,
> > then
> > >>>>> you remove say the “cool” field from orders/222.json and put it in
> > >>>>> orders/111.json and the next time the same query returns that
> error?
> > >>>>> ​
> > >>>>>
> > >>>>> Chris Matta
> > >>>>> [email protected]
> > >>>>> 215-701-3146
> > >>>>>
> > >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]>
> > >> wrote:
> > >>>>>
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> In my use case I have several JSON documents that I need to query
> > >> using a
> > >>>>>> join.
> > >>>>>> The structure of each document can vary a lot (some fields a
> present
> > >> or
> > >>>>>> not
> > >>>>>> in documents)
> > >>>>>>
> > >>>>>> Sometimes the following exception is raised:
> > >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM
> > ERROR:
> > >>>>>> java.lang.IllegalStateException: Failure while reading vector.
> > >> Expected
> > >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but
> > was
> > >>>>>> holding vector class
> > >> org.apache.drill.exec.vector.NullableVarCharVector.
> > >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
> > >>>>>> 192.168.99.13:31010]
> > >>>>>>
> > >>>>>> The queries:
> > >>>>>>
> > >>>>>> Following query works:
> > >>>>>> -----
> > >>>>>> SELECT customers.id, orders.demo
> > >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> > >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> > >>>>>> WHERE customers.id = orders.cust_id
> > >>>>>> AND customers.country = 'FRANCE'
> > >>>>>> -----
> > >>>>>>
> > >>>>>> Following query FAILS:
> > >>>>>> -----
> > >>>>>> SELECT customers.id, orders.cool
> > >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> > >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> > >>>>>> WHERE customers.id = orders.cust_id
> > >>>>>> AND customers.country = 'FRANCE'
> > >>>>>> -----
> > >>>>>>
> > >>>>>>
> > >>>>>> The documents:
> > >>>>>>
> > >>>>>> Here the files:
> > >>>>>>
> > >>>>>> ./customers/333.json
> > >>>>>> {
> > >>>>>> "id" : 333,
> > >>>>>> "name" : "Dave Smith",
> > >>>>>> "country" : "FRANCE"
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> ./orders/111.json
> > >>>>>> {
> > >>>>>> "tax" : 10,
> > >>>>>> "id" : 111,
> > >>>>>> "cust_id" : 333,
> > >>>>>> "total" : 12,
> > >>>>>> "demo" :10
> > >>>>>> }
> > >>>>>>
> > >>>>>> ./orders/222.json
> > >>>>>> {
> > >>>>>> "cool":20,
> > >>>>>> "id" : 222,
> > >>>>>> "cust_id" : 111,
> > >>>>>> "total" : 12
> > >>>>>> }
> > >>>>>>
> > >>>>>>
> > >>>>>> To reproduce the bug you may have to change the document
> (add/remove
> > >>>>>> cool,
> > >>>>>> tax fields)
> > >>>>>>
> > >>>>>> It looks like the schema is not "updated" on the fly in some case.
> > >>>>>>
> > >>>>>> Any idea how to workaround? Is that bug?
> > >>>>>>
> > >>>>>> Regards
> > >>>>>> Tug
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Reply via email to