Steven, are you referring to the fact that he has two map keys with the
same name?

If so, is that technically invalid json?

I generally agree with this post [1].

http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243

If others agree, I think it would be appropriate for Hao to file a JIRA to
make sure we follow and check this convention.

J

On Wed, Jan 21, 2015 at 5:07 PM, Steven Phillips <[email protected]>
wrote:

> Hao,
>
> It looks to me like your json file is actually invalid, and it seems like
> Drill should be throwing an exception.
>
> On Wed, Jan 21, 2015 at 5:04 PM, Hao Zhu <[email protected]> wrote:
>
> > I figured out the differences after getting the json file from Andries.
> >
> > My json file is like:
> > {“entities”:{xxx},“entities”:{yyy}... }
> >
> > Andries' json file is like:
> > {“entities”:{xxx}}
> > {“entities”:{yyy}}
> > ...
> >
> > So basically Andries' json file is contains multiple json files.
> >
> > Fortunately, we can use the same SQL to get the same results:
> > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | []         |
> > | [{"text":"GoPatriots"}] |
> > | [{"text":"aaa"}] |
> > | []         |
> > | [{"text":"bbb"}] |
> > +------------+
> > 5 rows selected (0.136 seconds)
> > 0: jdbc:drill:> with tmp as
> > . . . . . . . > (
> > . . . . . . . > select flatten(t.entities.hashtags) as c from
> > dfs.tmp.`z2.json` t
> > . . . . . . . > )
> > . . . . . . . > select tmp.c.text from tmp;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | GoPatriots |
> > | aaa        |
> > | bbb        |
> > +------------+
> > 3 rows selected (0.115 seconds)
> >
> > Thanks,
> > Hao
> >
> >
> >
> >
> >
> >
> > On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
> > [email protected]> wrote:
> >
> > > In my case it returns the empty records when flatten is not used.
> > >
> > > 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as hashtags
> > > from `twitter.json` t limit 10;
> > > +------------+
> > > |  hashtags  |
> > > +------------+
> > > | []         |
> > > | [{"text":"SportsNews","indices":[0,11]}] |
> > > | []         |
> > > | [{"text":"SportsNews","indices":[0,11]}] |
> > > | []         |
> > > | []         |
> > > | []         |
> > > | []         |
> > > | []         |
> > > | [{"text":"CARvsSEA","indices":[36,45]}] |
> > > +------------+
> > >
> > > On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote:
> > >
> > > > Actually not due to flatten, if you directly query the file, it will
> > only
> > > > show the non-null values.
> > > >
> > > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
> > > > +------------+
> > > > |   EXPR$0   |
> > > > +------------+
> > > > | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
> > > > +------------+
> > > > 1 row selected (0.123 seconds)
> > > >
> > > > Thanks,
> > > > Hao
> > > >
> > > > On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
> > > > [email protected]> wrote:
> > > >
> > > >> Very interesting, flatten seems to bypass empty records. Not sure if
> > > that
> > > >> is an ideal result for all use cases, but certainly usable in this
> > case.
> > > >>
> > > >> Thanks
> > > >> —Andries
> > > >>
> > > >>
> > > >> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote:
> > > >>
> > > >>> I can also fetch non-null values for attached json file which
> > contains
> > > 6
> > > >> "entities", 3 of them are null, 3 of them have "text" value.
> > > >>>
> > > >>> Could you share your complete "twitter.json"?
> > > >>>
> > > >>> 0: jdbc:drill:> with tmp as
> > > >>> . . . . . . . > (
> > > >>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> > > >> dfs.tmp.`z.json` t
> > > >>> . . . . . . . > )
> > > >>> . . . . . . . > select tmp.c.text from tmp;
> > > >>> +------------+
> > > >>> |   EXPR$0   |
> > > >>> +------------+
> > > >>> | GoPatriots |
> > > >>> | aaa        |
> > > >>> | bbb        |
> > > >>> +------------+
> > > >>> 3 rows selected (0.122 seconds)
> > > >>>
> > > >>> Thanks,
> > > >>> Hao
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> > > >> [email protected]> wrote:
> > > >>> The sample data I posted only has 1 element, but some records have
> > > >> multiple elements in them.
> > > >>>
> > > >>> Interestingly enough though
> > > >>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
> 10;
> > > >>>
> > > >>> Produces
> > > >>> +------------+
> > > >>> |   EXPR$0   |
> > > >>> +------------+
> > > >>> | null       |
> > > >>> | SportsNews |
> > > >>> | null       |
> > > >>> | SportsNews |
> > > >>> | null       |
> > > >>> | null       |
> > > >>> | null       |
> > > >>> | null       |
> > > >>> | null       |
> > > >>> | CARvsSEA   |
> > > >>> +——————+
> > > >>>
> > > >>>
> > > >>> And
> > > >>>
> > > >>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
> > > >>>
> > > >>> +------------+
> > > >>> |   EXPR$0   |
> > > >>> +------------+
> > > >>> | {"indices":[]} |
> > > >>> | {"text":"SportsNews","indices":[90,99]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"text":"SportsNews","indices":[90,99]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"indices":[]} |
> > > >>> | {"text":"CARvsSEA","indices":[90,99]} |
> > > >>> +——————+
> > > >>>
> > > >>> Strange part is that there is no indices map in the hashtags array,
> > so
> > > >> no idea why it shows up when pointing to the first lament in an
> empty
> > > array.
> > > >>>
> > > >>>
> > > >>> —Andries
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
> > > >>>
> > > >>>> I just noticed that the result of "hashtags" is just an array with
> > > >> only 1
> > > >>>> element.
> > > >>>> So take your example:
> > > >>>> [root@maprdemo tmp]# cat d.json
> > > >>>> {
> > > >>>> "entities": {
> > > >>>>  "trends": [],
> > > >>>>  "symbols": [],
> > > >>>>  "urls": [],
> > > >>>>  "hashtags": [],
> > > >>>>  "user_mentions": []
> > > >>>> },
> > > >>>> "entities": {
> > > >>>>  "trends": [1,2,3],
> > > >>>>  "symbols": [4,5,6],
> > > >>>>  "urls": [7,8,9],
> > > >>>>  "hashtags": [
> > > >>>>    {
> > > >>>>      "text": "GoPatriots",
> > > >>>>      "indices": []
> > > >>>>    }
> > > >>>>  ],
> > > >>>>  "user_mentions": []
> > > >>>> }
> > > >>>> }
> > > >>>>
> > > >>>> Now we can do this to achieve the results:
> > > >>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
> t ;
> > > >>>> +------------+
> > > >>>> |   EXPR$0   |
> > > >>>> +------------+
> > > >>>> | [{"text":"GoPatriots"}] |
> > > >>>> +------------+
> > > >>>> 1 row selected (0.09 seconds)
> > > >>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
> > > >> dfs.tmp.`d.json` t ;
> > > >>>> +------------+
> > > >>>> |   EXPR$0   |
> > > >>>> +------------+
> > > >>>> | GoPatriots |
> > > >>>> +------------+
> > > >>>> 1 row selected (0.108 seconds)
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Hao
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> > > >>>> [email protected]> wrote:
> > > >>>>
> > > >>>>> When I run the query on a larger dataset it actually show the
> empty
> > > >>>>> records.
> > > >>>>>
> > > >>>>> select t.entities.hashtags from `twitter.json` t limit 10;
> > > >>>>>
> > > >>>>> +------------+
> > > >>>>> |   EXPR$0   |
> > > >>>>> +------------+
> > > >>>>> | []         |
> > > >>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > > >>>>> | []         |
> > > >>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > > >>>>> | []         |
> > > >>>>> | []         |
> > > >>>>> | []         |
> > > >>>>> | []         |
> > > >>>>> | []         |
> > > >>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> > > >>>>> +------------+
> > > >>>>> 10 rows selected (2.899 seconds)
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> However having the output as maps is not very useful, unless i
> can
> > > >> filter
> > > >>>>> out the records with empty arrays and then drill deeper into the
> > ones
> > > >> with
> > > >>>>> data in the arrays.
> > > >>>>>
> > > >>>>> BTW: Hao I would have expiated your query to return both rows,
> one
> > > >> with an
> > > >>>>> empty array as above and the other with the array data.
> > > >>>>>
> > > >>>>>
> > > >>>>> —Andries
> > > >>>>>
> > > >>>>>
> > > >>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> I am not sure if below is expected behavior.
> > > >>>>>> If we only select "hashtags", and it will return only 1 row
> > ignoring
> > > >> the
> > > >>>>>> null value.
> > > >>>>>> However then if we try to get "hashtags.text", it fails...which
> > > >> means it
> > > >>>>> is
> > > >>>>>> still trying to read the NULL value.
> > > >>>>>> I am thinking it may confuse the SQL developers.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
> > t ;
> > > >>>>>> +------------+
> > > >>>>>> |   EXPR$0   |
> > > >>>>>> +------------+
> > > >>>>>> | [{"text":"GoPatriots"}] |
> > > >>>>>> +------------+
> > > >>>>>> 1 row selected (0.109 seconds)
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
> > > >> dfs.tmp.`d.json` t ;
> > > >>>>>>
> > > >>>>>> Query failed: Query failed: Failure while running fragment.,
> > > >>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be
> > > >> cast to
> > > >>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > > >>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > > >>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Error: exception while executing query: Failure while executing
> > > >> query.
> > > >>>>>> (state=,code=0)
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Hao
> > > >>>>>>
> > > >>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> > > >>>>>> [email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> Now try on hashtags with the following:
> > > >>>>>>>
> > > >>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
> > > >> `/twitter.json` t
> > > >>>>>>> where t.entities.hashtags is not null limit 10;
> > > >>>>>>>
> > > >>>>>>> Query failed: Query failed: Failure while running fragment.,
> > > >>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> be
> > > >> cast to
> > > >>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > > >>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > > >>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Error: exception while executing query: Failure while executing
> > > >> query.
> > > >>>>>>> (state=,code=0)
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> {
> > > >>>>>>> "entities": {
> > > >>>>>>> "trends": [],
> > > >>>>>>> "symbols": [],
> > > >>>>>>> "urls": [],
> > > >>>>>>> "hashtags": [],
> > > >>>>>>> "user_mentions": []
> > > >>>>>>> },
> > > >>>>>>> "entities": {
> > > >>>>>>> "trends": [1,2,3],
> > > >>>>>>> "symbols": [4,5,6],
> > > >>>>>>> "urls": [7,8,9],
> > > >>>>>>> "hashtags": [
> > > >>>>>>>   {
> > > >>>>>>>     "text": "GoPatriots",
> > > >>>>>>>     "indices": []
> > > >>>>>>>   }
> > > >>>>>>> ],
> > > >>>>>>> "user_mentions": []
> > > >>>>>>> }
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> The issue seems to be that if some records have arrays with
> maps
> > in
> > > >> them
> > > >>>>>>> and others are empty.
> > > >>>>>>>
> > > >>>>>>> —Andries
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]>
> wrote:
> > > >>>>>>>
> > > >>>>>>>> Seems it works for below json file:
> > > >>>>>>>> {
> > > >>>>>>>> "entities": {
> > > >>>>>>>> "trends": [],
> > > >>>>>>>> "symbols": [],
> > > >>>>>>>> "urls": [],
> > > >>>>>>>> "hashtags": [
> > > >>>>>>>>   {
> > > >>>>>>>>     "text": "GoPatriots",
> > > >>>>>>>>     "indices": [
> > > >>>>>>>>       83,
> > > >>>>>>>>       94
> > > >>>>>>>>     ]
> > > >>>>>>>>   }
> > > >>>>>>>> ],
> > > >>>>>>>> "user_mentions": []
> > > >>>>>>>> },
> > > >>>>>>>> "entities": {
> > > >>>>>>>> "trends": [1,2,3],
> > > >>>>>>>> "symbols": [4,5,6],
> > > >>>>>>>> "urls": [7,8,9],
> > > >>>>>>>> "hashtags": [
> > > >>>>>>>>   {
> > > >>>>>>>>     "text": "GoPatriots",
> > > >>>>>>>>     "indices": []
> > > >>>>>>>>   }
> > > >>>>>>>> ],
> > > >>>>>>>> "user_mentions": []
> > > >>>>>>>> }
> > > >>>>>>>> }
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> as
> > t
> > > >> where
> > > >>>>>>>> t.entities.urls is not null;
> > > >>>>>>>> +------------+
> > > >>>>>>>> |   EXPR$0   |
> > > >>>>>>>> +------------+
> > > >>>>>>>> | [7,8,9]    |
> > > >>>>>>>> +------------+
> > > >>>>>>>> 1 row selected (0.139 seconds)
> > > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> as
> > t
> > > >> where
> > > >>>>>>>> t.entities.urls is null;
> > > >>>>>>>> +------------+
> > > >>>>>>>> |   EXPR$0   |
> > > >>>>>>>> +------------+
> > > >>>>>>>> +------------+
> > > >>>>>>>> No rows selected (0.158 seconds)
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Hao
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
> > [email protected]>
> > > >>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> I believe that this works if the array contains homogeneous
> > > >> primitive
> > > >>>>>>>>> types. In your example, it appears from the error, the array
> > > field
> > > >>>>>>> 'member'
> > > >>>>>>>>> contained maps for at least one record.
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> > > >> [email protected]>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
> > > >>>>>>>>>>
> > > >>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> > > >>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> > > >>>>>>>>>> Query failed: Query stopped., Failure while trying to
> > > materialize
> > > >>>>>>>>> incoming schema.  Errors:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Error in expression at index -1.  Error: Missing function
> > > >>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
> > expression:
> > > >>>>>>> --UNKNOWN
> > > >>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> > > >>>>>>> bullseye-3:31010 ]
> > > >>>>>>>>>>
> > > >>>>>>>>>> Error: exception while executing query: Failure while
> > executing
> > > >>>>> query.
> > > >>>>>>>>> (state=,code=0)
> > > >>>>>>>>>>
> > > >>>>>>>>>> ​
> > > >>>>>>>>>>
> > > >>>>>>>>>> Chris Matta
> > > >>>>>>>>>> [email protected]
> > > >>>>>>>>>> 215-701-3146
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
> > > [email protected]
> > > >>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> repeated_count('entities.urls') > 0
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> > > >>>>>>>>>>> [email protected]> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> How do you filter out records with an empty array in
> drill?
> > > >>>>>>>>>>>> i.e some records have "url":[]  and some will have an
> array
> > > >> with
> > > >>>>> data
> > > >>>>>>>>> in
> > > >>>>>>>>>>>> it. When trying to read records with data in the array
> drill
> > > >> fails
> > > >>>>>>> due
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>> records missing any data in the array. Trying a filter
> > with/*
> > > >> where
> > > >>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying url
> > is
> > > >> not
> > > >>>>>>>>> null.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Note some of the arrays contains maps, using twitter data
> as
> > > an
> > > >>>>>>>>> example
> > > >>>>>>>>>>>> below. Some records have an empty array with “hashtags”:[]
> > > and
> > > >>>>>>> others
> > > >>>>>>>>>>> will
> > > >>>>>>>>>>>> look similar to what is listed below.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> "entities": {
> > > >>>>>>>>>>>> "trends": [],
> > > >>>>>>>>>>>> "symbols": [],
> > > >>>>>>>>>>>> "urls": [],
> > > >>>>>>>>>>>> "hashtags": [
> > > >>>>>>>>>>>>   {
> > > >>>>>>>>>>>>     "text": "GoPatriots",
> > > >>>>>>>>>>>>     "indices": [
> > > >>>>>>>>>>>>       83,
> > > >>>>>>>>>>>>       94
> > > >>>>>>>>>>>>     ]
> > > >>>>>>>>>>>>   }
> > > >>>>>>>>>>>> ],
> > > >>>>>>>>>>>> "user_mentions": []
> > > >>>>>>>>>>>> },
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks
> > > >>>>>>>>>>>> —Andries
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>>
> > > >>> <z.json>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
>  Steven Phillips
>  Software Engineer
>
>  mapr.com
>

Reply via email to