This is a bug in handling schema change. I would not expect this to happen
in case your second query had select * rather than an explicit projection,
select t.cool. Can you file a JIRA?

On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht <
[email protected]> wrote:

> Ted,
>
> Agree fully, it definitely seems like a reproducible bug that should be
> filed and resolved.
>
>
> —Andries
>
>
>
> On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote:
>
> > Andries,
> >
> > That sounds like a reasonable suggestion, but the real problem is that it
> > appears that having the field initially and then having the field be
> > missing is OK, but if it is missing first and then present Drill blows a
> > gasket.
> >
> > I think it looks like a bug.  Very good and simple demo.
> >
> >
> >
> > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <
> > [email protected]> wrote:
> >
> >> A couple of things to try that I have found useful in the past.
> >>
> >> Pending if you want inner or outer joins, you may want to look at using
> >> predicates to eliminate records that are not relevant to the join and
> can
> >> complicate the work Drill has to do.
> >>
> >> ie. add predicate "orders.cool is not null”
> >>
> >> Not only does it filter out the records that are not of interest (and
> can
> >> cause other challenges), but normally if you can apply predicates to
> >> queries to reduce the working set that Drill has to join it can
> >> substantially improve the performance for large data sets. Joins tend
> to be
> >> one of the more expensive operators in any execution engine, where
> >> predicates tend to be a much easier operation to execute at large scale.
> >>
> >> —Andries
> >>
> >>
> >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> wrote:
> >>
> >>> I can confirm that this is reproducible:
> >>>
> >>> orders/111.json:
> >>>
> >>> {
> >>>  "tax" : 10,
> >>>  "id" : 111,
> >>>  "cust_id" : 333,
> >>>  "total" : 12,
> >>>  "demo" :10
> >>> }
> >>>
> >>> orders/222.json:
> >>>
> >>> {
> >>>  "cool": 20,
> >>>  "id" : 222,
> >>>  "cust_id" : 111,
> >>>  "total" : 12
> >>> }
> >>>
> >>> 1st query:
> >>>
> >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id,
> orders.cool
> >>> . . . . . . . . . . . . . . . . . . . > FROM
> >>> `maprfs.cmatta`.`test/customers/*.json` customers,
> >>> . . . . . . . . . . . . . . . . . . . >
> >>> `maprfs.cmatta`.`test/orders/*.json` orders
> >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> >> orders.cust_id
> >>> . . . . . . . . . . . . . . . . . . . > AND customers.country =
> 'FRANCE';
> >>> +------+-------+
> >>> |  id  | cool  |
> >>> +------+-------+
> >>> | 333  | null  |
> >>> +------+-------+
> >>> 1 row selected (0.258 seconds)
> >>>
> >>> Now change orders/111.json by moving the cool field from 222.json to
> >>> 111.json:
> >>>
> >>> {
> >>>  "cool": 20,
> >>>  "tax" : 10,
> >>>  "id" : 111,
> >>>  "cust_id" : 333,
> >>>  "total" : 12,
> >>>  "demo" :10
> >>> }
> >>>
> >>> And removing cool from orders/222.json:
> >>>
> >>> {
> >>>  "id" : 222,
> >>>  "cust_id" : 111,
> >>>  "total" : 12
> >>> }
> >>>
> >>> Re-run the query:
> >>>
> >>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id,
> orders.cool
> >>> . . . . . . . . . . . . . . . . . . . > FROM
> >>> `maprfs.cmatta`.`test/customers/*.json` customers,
> >>> . . . . . . . . . . . . . . . . . . . >
> >>> `maprfs.cmatta`.`test/orders/*.json` orders
> >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> >> orders.cust_id
> >>> . . . . . . . . . . . . . . . . . . . > AND customers.country =
> 'FRANCE';
> >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> >>> java.lang.IllegalStateException: Failure while reading vector.
> >>> Expected vector class of
> >>> org.apache.drill.exec.vector.NullableIntVector but was holding vector
> >>> class org.apache.drill.exec.vector.NullableVarCharVector.
> >>>
> >>> Fragment 0:0
> >>>
> >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on
> >> se-node11.se.lab:31010]
> >>>       at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> >>>       at
> >>
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> >>>       at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> >>>       at sqlline.SqlLine.print(SqlLine.java:1583)
> >>>       at sqlline.Commands.execute(Commands.java:852)
> >>>       at sqlline.Commands.sql(Commands.java:751)
> >>>       at sqlline.SqlLine.dispatch(SqlLine.java:738)
> >>>       at sqlline.SqlLine.begin(SqlLine.java:612)
> >>>       at sqlline.SqlLine.start(SqlLine.java:366)
> >>>       at sqlline.SqlLine.main(SqlLine.java:259)
> >>>
> >>> ​
> >>>
> >>> Chris Matta
> >>> [email protected]
> >>> 215-701-3146
> >>>
> >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]>
> >> wrote:
> >>>
> >>>> Yes.
> >>>>
> >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Just to clarify, you run the *exact same query* once and it works,
> then
> >>>>> you remove say the “cool” field from orders/222.json and put it in
> >>>>> orders/111.json and the next time the same query returns that error?
> >>>>> ​
> >>>>>
> >>>>> Chris Matta
> >>>>> [email protected]
> >>>>> 215-701-3146
> >>>>>
> >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]>
> >> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> In my use case I have several JSON documents that I need to query
> >> using a
> >>>>>> join.
> >>>>>> The structure of each document can vary a lot (some fields a present
> >> or
> >>>>>> not
> >>>>>> in documents)
> >>>>>>
> >>>>>> Sometimes the following exception is raised:
> >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM
> ERROR:
> >>>>>> java.lang.IllegalStateException: Failure while reading vector.
> >> Expected
> >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but
> was
> >>>>>> holding vector class
> >> org.apache.drill.exec.vector.NullableVarCharVector.
> >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
> >>>>>> 192.168.99.13:31010]
> >>>>>>
> >>>>>> The queries:
> >>>>>>
> >>>>>> Following query works:
> >>>>>> -----
> >>>>>> SELECT customers.id, orders.demo
> >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> >>>>>> WHERE customers.id = orders.cust_id
> >>>>>> AND customers.country = 'FRANCE'
> >>>>>> -----
> >>>>>>
> >>>>>> Following query FAILS:
> >>>>>> -----
> >>>>>> SELECT customers.id, orders.cool
> >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> >>>>>> WHERE customers.id = orders.cust_id
> >>>>>> AND customers.country = 'FRANCE'
> >>>>>> -----
> >>>>>>
> >>>>>>
> >>>>>> The documents:
> >>>>>>
> >>>>>> Here the files:
> >>>>>>
> >>>>>> ./customers/333.json
> >>>>>> {
> >>>>>> "id" : 333,
> >>>>>> "name" : "Dave Smith",
> >>>>>> "country" : "FRANCE"
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> ./orders/111.json
> >>>>>> {
> >>>>>> "tax" : 10,
> >>>>>> "id" : 111,
> >>>>>> "cust_id" : 333,
> >>>>>> "total" : 12,
> >>>>>> "demo" :10
> >>>>>> }
> >>>>>>
> >>>>>> ./orders/222.json
> >>>>>> {
> >>>>>> "cool":20,
> >>>>>> "id" : 222,
> >>>>>> "cust_id" : 111,
> >>>>>> "total" : 12
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> To reproduce the bug you may have to change the document (add/remove
> >>>>>> cool,
> >>>>>> tax fields)
> >>>>>>
> >>>>>> It looks like the schema is not "updated" on the fly in some case.
> >>>>>>
> >>>>>> Any idea how to workaround? Is that bug?
> >>>>>>
> >>>>>> Regards
> >>>>>> Tug
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Reply via email to