Yeah, it's definitely a bug. Could you please file a jira?

On Thu, Mar 10, 2016 at 6:19 PM, Jiang Wu <[email protected]> wrote:

> Here are the complete examples to illustrate the problem:
>
> 0: jdbc:drill:zk=local> select id, t.batters.batter from
> dfs.`c:\tmp\sample.json` t;
> +----+--------+
> | id | EXPR$1 |
> +----+--------+
> | 0001 |
> [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> Food"}] |
> | 0002 | [{"id":"2001","type":"Regular"}] |
> | 0003 | [{"id":"3001","type":"Regular"},{"id":"3002","type":"Chocolate"}]
> |
> +----+--------+
> 3 rows selected (0.174 seconds)
>
>
> 0: jdbc:drill:zk=local> select id, t.batters.batter.id from
> dfs.`c:\tmp\sample.json` t;
> +-------+---------+
> |  id   | EXPR$1  |
> +-------+---------+
> | 0001  | 1001    | <-- OK
> | 0002  | 1002    | <-- Wrong, 1002 is not associated with 0002
> | 0003  | 1003    | <-- Wrong, 1003 is not associated with 0003
> +-------+---------+
> 3 rows selected (0.182 seconds)
>
> -- Jiang
>
>
>
> -----Original Message-----
> From: Jiang Wu [mailto:[email protected]]
> Sent: Thursday, March 10, 2016 6:16 PM
> To: [email protected]
> Subject: RE: Question on nested JSON behavior
>
> One problem with the current behavior is that the results are not
> correlated to the JSON row object.  It looks like the code simply picks the
> first N values from the union of all values across all rows.  The N is the
> number of rows in the result.
>
> For example, if I give this query:
>
> 0: jdbc:drill:zk=local> select id, t.batters.batter.id from
> dfs.`c:\tmp\sample.json` t;
> +-------+---------+
> |  id   | EXPR$1  |
> +-------+---------+
> | 0001  | 1001    |
> | 0002  | 1002    |
> | 0003  | 1003    |
> +-------+---------+
> 3 rows selected (0.182 seconds)
>
> The above cannot be correct because "1001", "1002", and "1003" never
> appears in the the row with id being "0001".
>
> Returning an error will be better than returning the wrong answer.
>
> -- Jiang
>
> -----Original Message-----
> From: Steven Phillips [mailto:[email protected]]
> Sent: Thursday, March 10, 2016 5:41 PM
> To: user <[email protected]>
> Subject: Re: Question on nested JSON behavior
>
> I am surprised that you are getting that result. I would have expected the
> query to fail. Since batter is an array, you should specify the index of
> the array if yo want to access lower level elements.
>
> A way to access all of the sub-fields of a repeated map is something we've
> discussed, but never implemented. Until it is implemented, I think the
> correct behavior is for this query to fail.
>
> On Thu, Mar 10, 2016 at 4:42 PM, Neeraja Rentachintala <
> [email protected]> wrote:
>
> > Actually I agree with Jiang. The result does seem unintuitive. If it
> > is a file with just a list, it does still make sense to return the ids
> > in that list as an array unless the user has configured Drill to
> > automatically flatten the first level.
> > Does anyone know how does the other systems behave for this use case?
> > (for
> > ex: Mongo)
> >
> >
> >
> > On Thu, Mar 10, 2016 at 4:21 PM, Nathan Griffith
> > <[email protected]>
> > wrote:
> >
> > > Hi Jiang,
> > >
> > > Think of it this way: If you had a file that was just the list:
> > >
> > > {"id":"1001","type":"Regular"}
> > > {"id":"1002","type":"Chocolate"}
> > > {"id":"1003","type":"Blueberry"}
> > > {"id":"1004","type":"Devil's Food"}
> > >
> > > What would you like it to return when you query:
> > >
> > > select id from dfs.`/path/to/sample_file.json`;
> > >
> > > ?
> > >
> > > When you enter the query that you're asking about, you're indicating
> > > exactly that structure of data. Does this explanation make sense?
> > >
> > > Best,
> > > Nathan
> > >
> > > On Thu, Mar 10, 2016 at 4:07 PM, Jiang Wu <[email protected]>
> > wrote:
> > > > Drill version: 1.4.0.  Assuming 3 JSON objects with the following
> > > structure:
> > > >
> > > >   {  ...
> > > >     "batters":
> > > >       {
> > > >         "batter":
> > > >           [
> > > >             { "id": "1001", "type": "Regular" },
> > > >             { "id": "1002", "type": "Chocolate" },
> > > >             { "id": "1003", "type": "Blueberry" },
> > > >             { "id": "1004", "type": "Devil's Food" }
> > > >           ]
> > > >       },
> > > >     ...
> > > >   }
> > > >
> > > > Now running a few sample queries against the above data:
> > > >
> > > >
> > > > A)      select "batters" returns expected results, which are the
> values
> > > of "batters" from each row.
> > > >
> > > > 0: jdbc:drill:zk=local> select batters from
> > > > dfs.`c:\tmp\sample.json`;
> > > > +---------+
> > > > | batters |
> > > > +---------+
> > > > |
> > >
> > {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> > te"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > > Food"}]} |
> > > > | {"batter":[{"id":"1001","type":"Regular"}]} |
> > > > |
> > >
> > {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> > te"}]}
> > > |
> > > > +---------+
> > > > 3 rows selected (0.243 seconds)
> > > >
> > > >
> > > > B)      select "batters.batter" also returns the expected results,
> > which
> > > are the array values for "batters.batter" from each row.
> > > >
> > > >
> > > > 0: jdbc:drill:zk=local> select t.batters.batter from
> > > dfs.`c:\tmp\sample.json` t;
> > > > +--------+
> > > > | EXPR$0 |
> > > > +--------+
> > > > |
> > >
> > [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id"
> > :"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > > Food"}] |
> > > > | [{"id":"1001","type":"Regular"}] |
> > > > | [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}
> > > > | ] |
> > > > +--------+
> > > > 3 rows selected (0.198 seconds)
> > > >
> > > >
> > > > C)  select "batters.batter.id" returns something unexpected:
> > > >
> > > > 0: jdbc:drill:zk=local> select t.batters.batter.id from
> > > dfs.`c:\tmp\sample.json` t;
> > > > +---------+
> > > > | EXPR$0  |
> > > > +---------+
> > > > | 1001    |
> > > > | 1002    |
> > > > | 1003    |
> > > > +---------+
> > > >
> > > > The above result doesn't make sense.  The result looks like the 3
> > values
> > > from row 1. Should the result be the following instead?
> > > >
> > > > +---------+
> > > > | EXPR$0  |
> > > > +---------+
> > > > | [1001, 1002, 1003, 1004]    |
> > > > | [1001]    |
> > > > | [1001, 1002]    |
> > > > +---------+
> > > >
> > > > Any hints on what is happening here?  Thanks.
> > > >
> > > > -- Jiang
> > > >
> > >
> >
>

Reply via email to