Great.

On Mon, Jun 22, 2015 at 11:45 AM, Tugdual Grall <[email protected]> wrote:

> Thanks for your help.
>
> If I use * I have another exception:
>
> --
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> org.apache.drill.exec.exception.SchemaChangeException: Hash join does not
> support schema changes Fragment 0:0 [Error Id:
> 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
> --
>
> Creating the JIRA as we speak...
>
> Tug
>
>
> On Mon, Jun 22, 2015 at 8:41 PM, Hanifi Gunes <[email protected]> wrote:
>
> > This is a bug in handling schema change. I would not expect this to happen
> > in case your second query had select * rather than an explicit projection,
> > select t.cool. Can you file a JIRA?
> >
> > On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht <[email protected]> wrote:
> >
> > > Ted,
> > >
> > > Agree fully, it definitely seems like a reproducible bug that should be
> > > filed and resolved.
> > >
> > >
> > > —Andries
> > >
> > >
> > >
> > > > On Jun 22, 2015, at 9:14 AM, Ted Dunning <[email protected]> wrote:
> > >
> > > > Andries,
> > > >
> > > > > That sounds like a reasonable suggestion, but the real problem is that it
> > > > > appears that having the field initially and then having the field be
> > > > > missing is OK, but if it is missing first and then present Drill blows a
> > > > > gasket.
> > > >
> > > > I think it looks like a bug.  Very good and simple demo.
> > > >
> > > >
> > > >
> > > > > On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <[email protected]> wrote:
> > > >
> > > >> A couple of things to try that I have found useful in the past.
> > > >>
> > > > >> Depending on whether you want inner or outer joins, you may want to look at
> > > > >> using predicates to eliminate records that are not relevant to the join and
> > > > >> can complicate the work Drill has to do.
> > > > >>
> > > > >> i.e. add the predicate "orders.cool is not null"
> > > >>
> > > > >> Not only does it filter out the records that are not of interest (and can
> > > > >> cause other challenges), but normally if you can apply predicates to
> > > > >> queries to reduce the working set that Drill has to join it can
> > > > >> substantially improve the performance for large data sets. Joins tend to be
> > > > >> one of the more expensive operators in any execution engine, whereas
> > > > >> predicates tend to be a much easier operation to execute at large scale.
> > > >>
> > > >> —Andries
> > > >>
> > > >>
> > > > >> On Jun 22, 2015, at 7:19 AM, Christopher Matta <[email protected]> wrote:
> > > >>
> > > >>> I can confirm that this is reproducible:
> > > >>>
> > > >>> orders/111.json:
> > > >>>
> > > >>> {
> > > >>>  "tax" : 10,
> > > >>>  "id" : 111,
> > > >>>  "cust_id" : 333,
> > > >>>  "total" : 12,
> > > >>>  "demo" :10
> > > >>> }
> > > >>>
> > > >>> orders/222.json:
> > > >>>
> > > >>> {
> > > >>>  "cool": 20,
> > > >>>  "id" : 222,
> > > >>>  "cust_id" : 111,
> > > >>>  "total" : 12
> > > >>> }
> > > >>>
> > > >>> 1st query:
> > > >>>
> > > > >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> > > > >>> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
> > > > >>> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
> > > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
> > > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> > > >>> +------+-------+
> > > >>> |  id  | cool  |
> > > >>> +------+-------+
> > > >>> | 333  | null  |
> > > >>> +------+-------+
> > > >>> 1 row selected (0.258 seconds)
> > > >>>
> > > > >>> Now change orders/111.json by moving the cool field from 222.json to
> > > > >>> 111.json:
> > > >>>
> > > >>> {
> > > >>>  "cool": 20,
> > > >>>  "tax" : 10,
> > > >>>  "id" : 111,
> > > >>>  "cust_id" : 333,
> > > >>>  "total" : 12,
> > > >>>  "demo" :10
> > > >>> }
> > > >>>
> > > >>> And removing cool from orders/222.json:
> > > >>>
> > > >>> {
> > > >>>  "id" : 222,
> > > >>>  "cust_id" : 111,
> > > >>>  "total" : 12
> > > >>> }
> > > >>>
> > > >>> Re-run the query:
> > > >>>
> > > > >>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> > > > >>> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
> > > > >>> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
> > > > >>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
> > > > >>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> > > > >>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> > > > >>> java.lang.IllegalStateException: Failure while reading vector.
> > > > >>> Expected vector class of
> > > > >>> org.apache.drill.exec.vector.NullableIntVector but was holding vector
> > > > >>> class org.apache.drill.exec.vector.NullableVarCharVector.
> > > > >>>
> > > > >>> Fragment 0:0
> > > > >>>
> > > > >>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on se-node11.se.lab:31010]
> > > > >>>       at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> > > > >>>       at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> > > > >>>       at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> > > >>>       at sqlline.SqlLine.print(SqlLine.java:1583)
> > > >>>       at sqlline.Commands.execute(Commands.java:852)
> > > >>>       at sqlline.Commands.sql(Commands.java:751)
> > > >>>       at sqlline.SqlLine.dispatch(SqlLine.java:738)
> > > >>>       at sqlline.SqlLine.begin(SqlLine.java:612)
> > > >>>       at sqlline.SqlLine.start(SqlLine.java:366)
> > > >>>       at sqlline.SqlLine.main(SqlLine.java:259)
> > > >>>
> > > >>> ​
> > > >>>
> > > >>> Chris Matta
> > > >>> [email protected]
> > > >>> 215-701-3146
> > > >>>
> > > > >>> On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <[email protected]> wrote:
> > > >>>
> > > >>>> Yes.
> > > >>>>
> > > > >>>> On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <[email protected]> wrote:
> > > >>>>
> > > > >>>>> Just to clarify, you run the *exact same query* once and it works, then
> > > > >>>>> you remove say the “cool” field from orders/222.json and put it in
> > > > >>>>> orders/111.json and the next time the same query returns that error?
> > > >>>>> ​
> > > >>>>>
> > > >>>>> Chris Matta
> > > >>>>> [email protected]
> > > >>>>> 215-701-3146
> > > >>>>>
> > > > >>>>> On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> Hello,
> > > >>>>>>
> > > > >>>>>> In my use case I have several JSON documents that I need to query using a
> > > > >>>>>> join.
> > > > >>>>>> The structure of each document can vary a lot (some fields are present or
> > > > >>>>>> not in documents).
> > > >>>>>>
> > > >>>>>> Sometimes the following exception is raised:
> > > > >>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > > > >>>>>> java.lang.IllegalStateException: Failure while reading vector. Expected
> > > > >>>>>> vector class of org.apache.drill.exec.vector.NullableIntVector but was
> > > > >>>>>> holding vector class org.apache.drill.exec.vector.NullableVarCharVector.
> > > > >>>>>> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
> > > > >>>>>> 192.168.99.13:31010]
> > > >>>>>>
> > > >>>>>> The queries:
> > > >>>>>>
> > > >>>>>> Following query works:
> > > >>>>>> -----
> > > >>>>>> SELECT customers.id, orders.demo
> > > >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> > > >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> > > >>>>>> WHERE customers.id = orders.cust_id
> > > >>>>>> AND customers.country = 'FRANCE'
> > > >>>>>> -----
> > > >>>>>>
> > > >>>>>> Following query FAILS:
> > > >>>>>> -----
> > > >>>>>> SELECT customers.id, orders.cool
> > > >>>>>> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
> > > >>>>>>     dfs.`/Users/tgrall/working/orders/*.json` orders
> > > >>>>>> WHERE customers.id = orders.cust_id
> > > >>>>>> AND customers.country = 'FRANCE'
> > > >>>>>> -----
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> The documents:
> > > >>>>>>
> > > > >>>>>> Here are the files:
> > > >>>>>>
> > > >>>>>> ./customers/333.json
> > > >>>>>> {
> > > >>>>>> "id" : 333,
> > > >>>>>> "name" : "Dave Smith",
> > > >>>>>> "country" : "FRANCE"
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> ./orders/111.json
> > > >>>>>> {
> > > >>>>>> "tax" : 10,
> > > >>>>>> "id" : 111,
> > > >>>>>> "cust_id" : 333,
> > > >>>>>> "total" : 12,
> > > >>>>>> "demo" :10
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>> ./orders/222.json
> > > >>>>>> {
> > > >>>>>> "cool":20,
> > > >>>>>> "id" : 222,
> > > >>>>>> "cust_id" : 111,
> > > >>>>>> "total" : 12
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>> To reproduce the bug you may have to change the documents (add/remove the
> > > > >>>>>> cool, tax fields).
> > > > >>>>>>
> > > > >>>>>> It looks like the schema is not "updated" on the fly in some cases.
> > > > >>>>>>
> > > > >>>>>> Any idea how to work around it? Is that a bug?
> > > >>>>>>
> > > >>>>>> Regards
> > > >>>>>> Tug
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
