For the those who are curious, here is what Mongo returns for the same
conceptual query:
> db.t1.find({},{"batters":1, "_id":0});
{ "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id" :
"1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" }, { "id"
: "1004", "type" : "Devil's Food" } ] } }
{ "batters" : { "batter" : [ { "id" : "2001", "type" : "Regular" } ] } }
{ "batters" : { "batter" : [ { "id" : "3001", "type" : "Regular" }, { "id" :
"3002", "type" : "Chocolate" } ] } }
> db.t1.find({},{"batters.batter":1, "_id":0});
{ "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id" :
"1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" }, { "id"
: "1004", "type" : "Devil's Food" } ] } }
{ "batters" : { "batter" : [ { "id" : "2001", "type" : "Regular" } ] } }
{ "batters" : { "batter" : [ { "id" : "3001", "type" : "Regular" }, { "id" :
"3002", "type" : "Chocolate" } ] } }
> db.t1.find({},{"batters.batter.id":1, "_id":0});
{ "batters" : { "batter" : [ { "id" : "1001" }, { "id" : "1002" }, { "id" :
"1003" }, { "id" : "1004" } ] } }
{ "batters" : { "batter" : [ { "id" : "2001" } ] } }
{ "batters" : { "batter" : [ { "id" : "3001" }, { "id" : "3002" } ] } }
Each returned document is always a JSON document from the root. Hence, no
ambiguity in the last case. It is returning the inner most array with each
array element having a single map key.
To produce the same conceptual results, all values in the inner most array will
need to returned for each row.
-- Jiang
-----Original Message-----
From: Neeraja Rentachintala [mailto:[email protected]]
Sent: Thursday, March 10, 2016 4:43 PM
To: [email protected]
Subject: Re: Question on nested JSON behavior
Actually I agree with Jiang. The result does seem unintuitive. If it is a file
with just a list, it does still make sense to return the ids in that list as an
array unless the user has configured Drill to automatically flatten the first
level.
Does anyone know how does the other systems behave for this use case? (for
ex: Mongo)
On Thu, Mar 10, 2016 at 4:21 PM, Nathan Griffith <[email protected]>
wrote:
> Hi Jiang,
>
> Think of it this way: If you had a file that was just the list:
>
> {"id":"1001","type":"Regular"}
> {"id":"1002","type":"Chocolate"}
> {"id":"1003","type":"Blueberry"}
> {"id":"1004","type":"Devil's Food"}
>
> What would you like it to return when you query:
>
> select id from dfs.`/path/to/sample_file.json`;
>
> ?
>
> When you enter the query that you're asking about, you're indicating
> exactly that structure of data. Does this explanation make sense?
>
> Best,
> Nathan
>
> On Thu, Mar 10, 2016 at 4:07 PM, Jiang Wu <[email protected]> wrote:
> > Drill version: 1.4.0. Assuming 3 JSON objects with the following
> structure:
> >
> > { ...
> > "batters":
> > {
> > "batter":
> > [
> > { "id": "1001", "type": "Regular" },
> > { "id": "1002", "type": "Chocolate" },
> > { "id": "1003", "type": "Blueberry" },
> > { "id": "1004", "type": "Devil's Food" }
> > ]
> > },
> > ...
> > }
> >
> > Now running a few sample queries against the above data:
> >
> >
> > A) select "batters" returns expected results, which are the values
> of "batters" from each row.
> >
> > 0: jdbc:drill:zk=local> select batters from
> > dfs.`c:\tmp\sample.json`;
> > +---------+
> > | batters |
> > +---------+
> > |
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> te"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> Food"}]} |
> > | {"batter":[{"id":"1001","type":"Regular"}]} |
> > |
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> te"}]}
> |
> > +---------+
> > 3 rows selected (0.243 seconds)
> >
> >
> > B) select "batters.batter" also returns the expected results, which
> are the array values for "batters.batter" from each row.
> >
> >
> > 0: jdbc:drill:zk=local> select t.batters.batter from
> dfs.`c:\tmp\sample.json` t;
> > +--------+
> > | EXPR$0 |
> > +--------+
> > |
> [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id"
> :"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> Food"}] |
> > | [{"id":"1001","type":"Regular"}] |
> > | [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]
> > | |
> > +--------+
> > 3 rows selected (0.198 seconds)
> >
> >
> > C) select "batters.batter.id" returns something unexpected:
> >
> > 0: jdbc:drill:zk=local> select t.batters.batter.id from
> dfs.`c:\tmp\sample.json` t;
> > +---------+
> > | EXPR$0 |
> > +---------+
> > | 1001 |
> > | 1002 |
> > | 1003 |
> > +---------+
> >
> > The above result doesn't make sense. The result looks like the 3
> > values
> from row 1. Should the result be the following instead?
> >
> > +---------+
> > | EXPR$0 |
> > +---------+
> > | [1001, 1002, 1003, 1004] |
> > | [1001] |
> > | [1001, 1002] |
> > +---------+
> >
> > Any hints on what is happening here? Thanks.
> >
> > -- Jiang
> >
>