Andries,

That sounds like a reasonable suggestion, but the real problem is that it
appears that having the field initially and then having the field be
missing is OK, but if it is missing first and then present Drill blows a
gasket.

I think it looks like a bug.  Very good and simple demo.



On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> A couple of things to try that I have found useful in the past.
>
> Pending if you want inner or outer joins, you may want to look at using
> predicates to eliminate records that are not relevant to the join and can
> complicate the work Drill has to do.
>
> ie. add predicate "orders.cool is not null”
>
> Not only does it filter out the records that are not of interest (and can
> cause other challenges), but normally if you can apply predicates to
> queries to reduce the working set that Drill has to join it can
> substantially improve the performance for large data sets. Joins tend to be
> one of the more expensive operators in any execution engine, where
> predicates tend to be a much easier operation to execute at large scale.
>
> —Andries
>
>
> On Jun 22, 2015, at 7:19 AM, Christopher Matta <cma...@mapr.com> wrote:
>
> > I can confirm that this is reproducible:
> >
> > orders/111.json:
> >
> > {
> >   "tax" : 10,
> >   "id" : 111,
> >   "cust_id" : 333,
> >   "total" : 12,
> >   "demo" :10
> > }
> >
> > orders/222.json:
> >
> > {
> >   "cool": 20,
> >   "id" : 222,
> >   "cust_id" : 111,
> >   "total" : 12
> > }
> >
> > 1st query:
> >
> > 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> > . . . . . . . . . . . . . . . . . . . > FROM
> > `maprfs.cmatta`.`test/customers/*.json` customers,
> > . . . . . . . . . . . . . . . . . . . >
> > `maprfs.cmatta`.`test/orders/*.json` orders
> > . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> orders.cust_id
> > . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> > +------+-------+
> > |  id  | cool  |
> > +------+-------+
> > | 333  | null  |
> > +------+-------+
> > 1 row selected (0.258 seconds)
> >
> > Now change orders/111.json by moving the cool field from 222.json to
> > 111.json:
> >
> > {
> >   "cool": 20,
> >   "tax" : 10,
> >   "id" : 111,
> >   "cust_id" : 333,
> >   "total" : 12,
> >   "demo" :10
> > }
> >
> > And removing cool from orders/222.json:
> >
> > {
> >   "id" : 222,
> >   "cust_id" : 111,
> >   "total" : 12
> > }
> >
> > Re-run the query:
> >
> > : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> > . . . . . . . . . . . . . . . . . . . > FROM
> > `maprfs.cmatta`.`test/customers/*.json` customers,
> > . . . . . . . . . . . . . . . . . . . >
> > `maprfs.cmatta`.`test/orders/*.json` orders
> > . . . . . . . . . . . . . . . . . . . > WHERE customers.id =
> orders.cust_id
> > . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> > java.lang.IllegalStateException: Failure while reading vector.
> > Expected vector class of
> > org.apache.drill.exec.vector.NullableIntVector but was holding vector
> > class org.apache.drill.exec.vector.NullableVarCharVector.
> >
> > Fragment 0:0
> >
> > [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on
> se-node11.se.lab:31010]
> >        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> >        at
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> >        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> >        at sqlline.SqlLine.print(SqlLine.java:1583)
> >        at sqlline.Commands.execute(Commands.java:852)
> >        at sqlline.Commands.sql(Commands.java:751)
> >        at sqlline.SqlLine.dispatch(SqlLine.java:738)
> >        at sqlline.SqlLine.begin(SqlLine.java:612)
> >        at sqlline.SqlLine.start(SqlLine.java:366)
> >        at sqlline.SqlLine.main(SqlLine.java:259)
> >
> > ​
> >
> > Chris Matta
> > cma...@mapr.com
> > 215-701-3146
> >
> > On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <tugd...@gmail.com>
> wrote:
> >
> >> Yes.
> >>
> >> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <cma...@mapr.com>
> >> wrote:
> >>
> >>> Just to clarify, you run the *exact same query* once and it works, then
> >>> you remove say the “cool” field from orders/222.json and put it in
> >>> orders/111.json and the next time the same query returns that error?
> >>> ​
> >>>
> >>> Chris Matta
> >>> cma...@mapr.com
> >>> 215-701-3146
> >>>
> >>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <tugd...@gmail.com>
> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> In my use case I have several JSON documents that I need to query
> using a
> >>>> join.
> >>>> The structure of each document can vary a lot (some fields a present
> or
> >>>> not
> >>>> in documents)
> >>>>
> >>>> Sometimes the following exception is raised:
> >>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>>> java.lang.IllegalStateException: Failure while reading vector.
> Expected
> >>>> vector class of org.apache.drill.exec.vector.NullableIntVector but was
> >>>> holding vector class
> org.apache.drill.exec.vector.NullableVarCharVector.
> >>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
> >>>> 192.168.99.13:31010]
> >>>>
> >>>> The queries:
> >>>>
> >>>> Following query works:
> >>>> -----
> >>>> SELECT customers.id, orders.demo
> >>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> >>>>      dfs.`/Users/tgrall/working/orders/*.json` orders
> >>>> WHERE customers.id = orders.cust_id
> >>>> AND customers.country = 'FRANCE'
> >>>> -----
> >>>>
> >>>> Following query FAILS:
> >>>> -----
> >>>> SELECT customers.id, orders.cool
> >>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> >>>>      dfs.`/Users/tgrall/working/orders/*.json` orders
> >>>> WHERE customers.id = orders.cust_id
> >>>> AND customers.country = 'FRANCE'
> >>>> -----
> >>>>
> >>>>
> >>>> The documents:
> >>>>
> >>>> Here the files:
> >>>>
> >>>> ./customers/333.json
> >>>> {
> >>>>  "id" : 333,
> >>>>  "name" : "Dave Smith",
> >>>>  "country" : "FRANCE"
> >>>> }
> >>>>
> >>>>
> >>>> ./orders/111.json
> >>>> {
> >>>> "tax" : 10,
> >>>> "id" : 111,
> >>>> "cust_id" : 333,
> >>>> "total" : 12,
> >>>> "demo" :10
> >>>> }
> >>>>
> >>>> ./orders/222.json
> >>>> {
> >>>> "cool":20,
> >>>> "id" : 222,
> >>>> "cust_id" : 111,
> >>>> "total" : 12
> >>>> }
> >>>>
> >>>>
> >>>> To reproduce the bug you may have to change the document (add/remove
> >>>> cool,
> >>>> tax fields)
> >>>>
> >>>> It looks like the schema is not "updated" on the fly in some case.
> >>>>
> >>>> Any idea how to workaround? Is that bug?
> >>>>
> >>>> Regards
> >>>> Tug
> >>>>
> >>>
> >>>
> >>
>
>

Reply via email to