Hao,

It looks to me like your json file is actually invalid, and it seems like
Drill should be throwing an exception.

On Wed, Jan 21, 2015 at 5:04 PM, Hao Zhu <[email protected]> wrote:

> I figured out the differences after getting the json file from Andries.
>
> My json file is like:
> {“entities”:{xxx},“entities”:{yyy}... }
>
> Andries' json file is like:
> {“entities”:{xxx}}
> {“entities”:{yyy}}
> ...
>
> So basically Andries' json file is contains multiple json files.
>
> Fortunately, we can use the same SQL to get the same results:
> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
> +------------+
> |   EXPR$0   |
> +------------+
> | []         |
> | [{"text":"GoPatriots"}] |
> | [{"text":"aaa"}] |
> | []         |
> | [{"text":"bbb"}] |
> +------------+
> 5 rows selected (0.136 seconds)
> 0: jdbc:drill:> with tmp as
> . . . . . . . > (
> . . . . . . . > select flatten(t.entities.hashtags) as c from
> dfs.tmp.`z2.json` t
> . . . . . . . > )
> . . . . . . . > select tmp.c.text from tmp;
> +------------+
> |   EXPR$0   |
> +------------+
> | GoPatriots |
> | aaa        |
> | bbb        |
> +------------+
> 3 rows selected (0.115 seconds)
>
> Thanks,
> Hao
>
>
>
>
>
>
> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
> [email protected]> wrote:
>
> > In my case it returns the empty records when flatten is not used.
> >
> > 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as hashtags
> > from `twitter.json` t limit 10;
> > +------------+
> > |  hashtags  |
> > +------------+
> > | []         |
> > | [{"text":"SportsNews","indices":[0,11]}] |
> > | []         |
> > | [{"text":"SportsNews","indices":[0,11]}] |
> > | []         |
> > | []         |
> > | []         |
> > | []         |
> > | []         |
> > | [{"text":"CARvsSEA","indices":[36,45]}] |
> > +------------+
> >
> > On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote:
> >
> > > Actually not due to flatten, if you directly query the file, it will
> only
> > > show the non-null values.
> > >
> > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
> > > +------------+
> > > 1 row selected (0.123 seconds)
> > >
> > > Thanks,
> > > Hao
> > >
> > > On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > >> Very interesting, flatten seems to bypass empty records. Not sure if
> > that
> > >> is an ideal result for all use cases, but certainly usable in this
> case.
> > >>
> > >> Thanks
> > >> —Andries
> > >>
> > >>
> > >> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote:
> > >>
> > >>> I can also fetch non-null values for attached json file which
> contains
> > 6
> > >> "entities", 3 of them are null, 3 of them have "text" value.
> > >>>
> > >>> Could you share your complete "twitter.json"?
> > >>>
> > >>> 0: jdbc:drill:> with tmp as
> > >>> . . . . . . . > (
> > >>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> > >> dfs.tmp.`z.json` t
> > >>> . . . . . . . > )
> > >>> . . . . . . . > select tmp.c.text from tmp;
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | GoPatriots |
> > >>> | aaa        |
> > >>> | bbb        |
> > >>> +------------+
> > >>> 3 rows selected (0.122 seconds)
> > >>>
> > >>> Thanks,
> > >>> Hao
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> > >> [email protected]> wrote:
> > >>> The sample data I posted only has 1 element, but some records have
> > >> multiple elements in them.
> > >>>
> > >>> Interestingly enough though
> > >>> select t.entities.hashtags[0].`text` from `twitter.json` t limit 10;
> > >>>
> > >>> Produces
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | null       |
> > >>> | SportsNews |
> > >>> | null       |
> > >>> | SportsNews |
> > >>> | null       |
> > >>> | null       |
> > >>> | null       |
> > >>> | null       |
> > >>> | null       |
> > >>> | CARvsSEA   |
> > >>> +——————+
> > >>>
> > >>>
> > >>> And
> > >>>
> > >>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
> > >>>
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | {"indices":[]} |
> > >>> | {"text":"SportsNews","indices":[90,99]} |
> > >>> | {"indices":[]} |
> > >>> | {"text":"SportsNews","indices":[90,99]} |
> > >>> | {"indices":[]} |
> > >>> | {"indices":[]} |
> > >>> | {"indices":[]} |
> > >>> | {"indices":[]} |
> > >>> | {"indices":[]} |
> > >>> | {"text":"CARvsSEA","indices":[90,99]} |
> > >>> +——————+
> > >>>
> > >>> Strange part is that there is no indices map in the hashtags array,
> so
> > >> no idea why it shows up when pointing to the first lament in an empty
> > array.
> > >>>
> > >>>
> > >>> —Andries
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
> > >>>
> > >>>> I just noticed that the result of "hashtags" is just an array with
> > >> only 1
> > >>>> element.
> > >>>> So take your example:
> > >>>> [root@maprdemo tmp]# cat d.json
> > >>>> {
> > >>>> "entities": {
> > >>>>  "trends": [],
> > >>>>  "symbols": [],
> > >>>>  "urls": [],
> > >>>>  "hashtags": [],
> > >>>>  "user_mentions": []
> > >>>> },
> > >>>> "entities": {
> > >>>>  "trends": [1,2,3],
> > >>>>  "symbols": [4,5,6],
> > >>>>  "urls": [7,8,9],
> > >>>>  "hashtags": [
> > >>>>    {
> > >>>>      "text": "GoPatriots",
> > >>>>      "indices": []
> > >>>>    }
> > >>>>  ],
> > >>>>  "user_mentions": []
> > >>>> }
> > >>>> }
> > >>>>
> > >>>> Now we can do this to achieve the results:
> > >>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> > >>>> +------------+
> > >>>> |   EXPR$0   |
> > >>>> +------------+
> > >>>> | [{"text":"GoPatriots"}] |
> > >>>> +------------+
> > >>>> 1 row selected (0.09 seconds)
> > >>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
> > >> dfs.tmp.`d.json` t ;
> > >>>> +------------+
> > >>>> |   EXPR$0   |
> > >>>> +------------+
> > >>>> | GoPatriots |
> > >>>> +------------+
> > >>>> 1 row selected (0.108 seconds)
> > >>>>
> > >>>> Thanks,
> > >>>> Hao
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> > >>>> [email protected]> wrote:
> > >>>>
> > >>>>> When I run the query on a larger dataset it actually show the empty
> > >>>>> records.
> > >>>>>
> > >>>>> select t.entities.hashtags from `twitter.json` t limit 10;
> > >>>>>
> > >>>>> +------------+
> > >>>>> |   EXPR$0   |
> > >>>>> +------------+
> > >>>>> | []         |
> > >>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>>>> | []         |
> > >>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>>>> | []         |
> > >>>>> | []         |
> > >>>>> | []         |
> > >>>>> | []         |
> > >>>>> | []         |
> > >>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> > >>>>> +------------+
> > >>>>> 10 rows selected (2.899 seconds)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> However having the output as maps is not very useful, unless i can
> > >> filter
> > >>>>> out the records with empty arrays and then drill deeper into the
> ones
> > >> with
> > >>>>> data in the arrays.
> > >>>>>
> > >>>>> BTW: Hao I would have expiated your query to return both rows, one
> > >> with an
> > >>>>> empty array as above and the other with the array data.
> > >>>>>
> > >>>>>
> > >>>>> —Andries
> > >>>>>
> > >>>>>
> > >>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
> > >>>>>
> > >>>>>> I am not sure if below is expected behavior.
> > >>>>>> If we only select "hashtags", and it will return only 1 row
> ignoring
> > >> the
> > >>>>>> null value.
> > >>>>>> However then if we try to get "hashtags.text", it fails...which
> > >> means it
> > >>>>> is
> > >>>>>> still trying to read the NULL value.
> > >>>>>> I am thinking it may confuse the SQL developers.
> > >>>>>>
> > >>>>>>
> > >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
> t ;
> > >>>>>> +------------+
> > >>>>>> |   EXPR$0   |
> > >>>>>> +------------+
> > >>>>>> | [{"text":"GoPatriots"}] |
> > >>>>>> +------------+
> > >>>>>> 1 row selected (0.109 seconds)
> > >>>>>>
> > >>>>>>
> > >>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
> > >> dfs.tmp.`d.json` t ;
> > >>>>>>
> > >>>>>> Query failed: Query failed: Failure while running fragment.,
> > >>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be
> > >> cast to
> > >>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > >>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>>>>>
> > >>>>>>
> > >>>>>> Error: exception while executing query: Failure while executing
> > >> query.
> > >>>>>> (state=,code=0)
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Hao
> > >>>>>>
> > >>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> > >>>>>> [email protected]> wrote:
> > >>>>>>
> > >>>>>>> Now try on hashtags with the following:
> > >>>>>>>
> > >>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
> > >> `/twitter.json` t
> > >>>>>>> where t.entities.hashtags is not null limit 10;
> > >>>>>>>
> > >>>>>>> Query failed: Query failed: Failure while running fragment.,
> > >>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be
> > >> cast to
> > >>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > >>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Error: exception while executing query: Failure while executing
> > >> query.
> > >>>>>>> (state=,code=0)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> {
> > >>>>>>> "entities": {
> > >>>>>>> "trends": [],
> > >>>>>>> "symbols": [],
> > >>>>>>> "urls": [],
> > >>>>>>> "hashtags": [],
> > >>>>>>> "user_mentions": []
> > >>>>>>> },
> > >>>>>>> "entities": {
> > >>>>>>> "trends": [1,2,3],
> > >>>>>>> "symbols": [4,5,6],
> > >>>>>>> "urls": [7,8,9],
> > >>>>>>> "hashtags": [
> > >>>>>>>   {
> > >>>>>>>     "text": "GoPatriots",
> > >>>>>>>     "indices": []
> > >>>>>>>   }
> > >>>>>>> ],
> > >>>>>>> "user_mentions": []
> > >>>>>>> }
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> The issue seems to be that if some records have arrays with maps
> in
> > >> them
> > >>>>>>> and others are empty.
> > >>>>>>>
> > >>>>>>> —Andries
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Seems it works for below json file:
> > >>>>>>>> {
> > >>>>>>>> "entities": {
> > >>>>>>>> "trends": [],
> > >>>>>>>> "symbols": [],
> > >>>>>>>> "urls": [],
> > >>>>>>>> "hashtags": [
> > >>>>>>>>   {
> > >>>>>>>>     "text": "GoPatriots",
> > >>>>>>>>     "indices": [
> > >>>>>>>>       83,
> > >>>>>>>>       94
> > >>>>>>>>     ]
> > >>>>>>>>   }
> > >>>>>>>> ],
> > >>>>>>>> "user_mentions": []
> > >>>>>>>> },
> > >>>>>>>> "entities": {
> > >>>>>>>> "trends": [1,2,3],
> > >>>>>>>> "symbols": [4,5,6],
> > >>>>>>>> "urls": [7,8,9],
> > >>>>>>>> "hashtags": [
> > >>>>>>>>   {
> > >>>>>>>>     "text": "GoPatriots",
> > >>>>>>>>     "indices": []
> > >>>>>>>>   }
> > >>>>>>>> ],
> > >>>>>>>> "user_mentions": []
> > >>>>>>>> }
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as
> t
> > >> where
> > >>>>>>>> t.entities.urls is not null;
> > >>>>>>>> +------------+
> > >>>>>>>> |   EXPR$0   |
> > >>>>>>>> +------------+
> > >>>>>>>> | [7,8,9]    |
> > >>>>>>>> +------------+
> > >>>>>>>> 1 row selected (0.139 seconds)
> > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as
> t
> > >> where
> > >>>>>>>> t.entities.urls is null;
> > >>>>>>>> +------------+
> > >>>>>>>> |   EXPR$0   |
> > >>>>>>>> +------------+
> > >>>>>>>> +------------+
> > >>>>>>>> No rows selected (0.158 seconds)
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Hao
> > >>>>>>>>
> > >>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
> [email protected]>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> I believe that this works if the array contains homogeneous
> > >> primitive
> > >>>>>>>>> types. In your example, it appears from the error, the array
> > field
> > >>>>>>> 'member'
> > >>>>>>>>> contained maps for at least one record.
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> > >> [email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
> > >>>>>>>>>>
> > >>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> > >>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> > >>>>>>>>>> Query failed: Query stopped., Failure while trying to
> > materialize
> > >>>>>>>>> incoming schema.  Errors:
> > >>>>>>>>>>
> > >>>>>>>>>> Error in expression at index -1.  Error: Missing function
> > >>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
> expression:
> > >>>>>>> --UNKNOWN
> > >>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> > >>>>>>> bullseye-3:31010 ]
> > >>>>>>>>>>
> > >>>>>>>>>> Error: exception while executing query: Failure while
> executing
> > >>>>> query.
> > >>>>>>>>> (state=,code=0)
> > >>>>>>>>>>
> > >>>>>>>>>> ​
> > >>>>>>>>>>
> > >>>>>>>>>> Chris Matta
> > >>>>>>>>>> [email protected]
> > >>>>>>>>>> 215-701-3146
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
> > [email protected]
> > >>>
> > >>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> repeated_count('entities.urls') > 0
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> > >>>>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> How do you filter out records with an empty array in drill?
> > >>>>>>>>>>>> i.e some records have "url":[]  and some will have an array
> > >> with
> > >>>>> data
> > >>>>>>>>> in
> > >>>>>>>>>>>> it. When trying to read records with data in the array drill
> > >> fails
> > >>>>>>> due
> > >>>>>>>>>>> to
> > >>>>>>>>>>>> records missing any data in the array. Trying a filter
> with/*
> > >> where
> > >>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying url
> is
> > >> not
> > >>>>>>>>> null.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Note some of the arrays contains maps, using twitter data as
> > an
> > >>>>>>>>> example
> > >>>>>>>>>>>> below. Some records have an empty array with “hashtags”:[]
> > and
> > >>>>>>> others
> > >>>>>>>>>>> will
> > >>>>>>>>>>>> look similar to what is listed below.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> "entities": {
> > >>>>>>>>>>>> "trends": [],
> > >>>>>>>>>>>> "symbols": [],
> > >>>>>>>>>>>> "urls": [],
> > >>>>>>>>>>>> "hashtags": [
> > >>>>>>>>>>>>   {
> > >>>>>>>>>>>>     "text": "GoPatriots",
> > >>>>>>>>>>>>     "indices": [
> > >>>>>>>>>>>>       83,
> > >>>>>>>>>>>>       94
> > >>>>>>>>>>>>     ]
> > >>>>>>>>>>>>   }
> > >>>>>>>>>>>> ],
> > >>>>>>>>>>>> "user_mentions": []
> > >>>>>>>>>>>> },
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks
> > >>>>>>>>>>>> —Andries
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >>> <z.json>
> > >>
> > >>
> >
> >
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com

Reply via email to