I can also fetch non-null values for attached json file which contains 6
"entities", 3 of them are null, 3 of them have "text" value.

Could you share your complete "twitter.json"?

0: jdbc:drill:> with tmp as
. . . . . . . > (
. . . . . . . > select flatten(t.entities.hashtags) as c from
dfs.tmp.`z.json` t
. . . . . . . > )
. . . . . . . > select tmp.c.text from tmp;
+------------+
|   EXPR$0   |
+------------+
| GoPatriots |
| aaa        |
| bbb        |
+------------+
3 rows selected (0.122 seconds)

Thanks,
Hao



On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
[email protected]> wrote:

> The sample data I posted only has 1 element, but some records have
> multiple elements in them.
>
> Interestingly enough though
> select t.entities.hashtags[0].`text` from `twitter.json` t limit 10;
>
> Produces
> +------------+
> |   EXPR$0   |
> +------------+
> | null       |
> | SportsNews |
> | null       |
> | SportsNews |
> | null       |
> | null       |
> | null       |
> | null       |
> | null       |
> | CARvsSEA   |
> +——————+
>
>
> And
>
> select t.entities.hashtags[0] from `twitter.json` t limit 10;
>
> +------------+
> |   EXPR$0   |
> +------------+
> | {"indices":[]} |
> | {"text":"SportsNews","indices":[90,99]} |
> | {"indices":[]} |
> | {"text":"SportsNews","indices":[90,99]} |
> | {"indices":[]} |
> | {"indices":[]} |
> | {"indices":[]} |
> | {"indices":[]} |
> | {"indices":[]} |
> | {"text":"CARvsSEA","indices":[90,99]} |
> +——————+
>
> Strange part is that there is no indices map in the hashtags array, so no
> idea why it shows up when pointing to the first lament in an empty array.
>
>
> —Andries
>
>
>
>
>
>
> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
>
> > I just noticed that the result of "hashtags" is just an array with only 1
> > element.
> > So take your example:
> > [root@maprdemo tmp]# cat d.json
> > {
> > "entities": {
> >   "trends": [],
> >   "symbols": [],
> >   "urls": [],
> >   "hashtags": [],
> >   "user_mentions": []
> > },
> > "entities": {
> >   "trends": [1,2,3],
> >   "symbols": [4,5,6],
> >   "urls": [7,8,9],
> >   "hashtags": [
> >     {
> >       "text": "GoPatriots",
> >       "indices": []
> >     }
> >   ],
> >   "user_mentions": []
> > }
> > }
> >
> > Now we can do this to achieve the results:
> > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | [{"text":"GoPatriots"}] |
> > +------------+
> > 1 row selected (0.09 seconds)
> > 0: jdbc:drill:> select t.entities.hashtags[0].text from dfs.tmp.`d.json`
> t ;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | GoPatriots |
> > +------------+
> > 1 row selected (0.108 seconds)
> >
> > Thanks,
> > Hao
> >
> >
> >
> > On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> > [email protected]> wrote:
> >
> >> When I run the query on a larger dataset it actually show the empty
> >> records.
> >>
> >> select t.entities.hashtags from `twitter.json` t limit 10;
> >>
> >> +------------+
> >> |   EXPR$0   |
> >> +------------+
> >> | []         |
> >> | [{"text":"SportsNews","indices":[0,11]}] |
> >> | []         |
> >> | [{"text":"SportsNews","indices":[0,11]}] |
> >> | []         |
> >> | []         |
> >> | []         |
> >> | []         |
> >> | []         |
> >> | [{"text":"CARvsSEA","indices":[36,45]}] |
> >> +------------+
> >> 10 rows selected (2.899 seconds)
> >>
> >>
> >>
> >> However having the output as maps is not very useful, unless i can
> filter
> >> out the records with empty arrays and then drill deeper into the ones
> with
> >> data in the arrays.
> >>
> >> BTW: Hao I would have expiated your query to return both rows, one with
> an
> >> empty array as above and the other with the array data.
> >>
> >>
> >> —Andries
> >>
> >>
> >> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
> >>
> >>> I am not sure if below is expected behavior.
> >>> If we only select "hashtags", and it will return only 1 row ignoring
> the
> >>> null value.
> >>> However then if we try to get "hashtags.text", it fails...which means
> it
> >> is
> >>> still trying to read the NULL value.
> >>> I am thinking it may confuse the SQL developers.
> >>>
> >>>
> >>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> >>> +------------+
> >>> |   EXPR$0   |
> >>> +------------+
> >>> | [{"text":"GoPatriots"}] |
> >>> +------------+
> >>> 1 row selected (0.109 seconds)
> >>>
> >>>
> >>> 0: jdbc:drill:> select t.entities.hashtags.text from dfs.tmp.`d.json`
> t ;
> >>>
> >>> Query failed: Query failed: Failure while running fragment.,
> >>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast
> to
> >>> org.apache.drill.exec.vector.complex.MapVector [
> >>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>>
> >>>
> >>> Error: exception while executing query: Failure while executing query.
> >>> (state=,code=0)
> >>>
> >>> Thanks,
> >>> Hao
> >>>
> >>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> >>> [email protected]> wrote:
> >>>
> >>>> Now try on hashtags with the following:
> >>>>
> >>>> drilldemo:5181> select t.entities.hashtags.`text` from
> `/twitter.json` t
> >>>> where t.entities.hashtags is not null limit 10;
> >>>>
> >>>> Query failed: Query failed: Failure while running fragment.,
> >>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast
> to
> >>>> org.apache.drill.exec.vector.complex.MapVector [
> >>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>>
> >>>>
> >>>> Error: exception while executing query: Failure while executing query.
> >>>> (state=,code=0)
> >>>>
> >>>>
> >>>> {
> >>>> "entities": {
> >>>>  "trends": [],
> >>>>  "symbols": [],
> >>>>  "urls": [],
> >>>>  "hashtags": [],
> >>>>  "user_mentions": []
> >>>> },
> >>>> "entities": {
> >>>>  "trends": [1,2,3],
> >>>>  "symbols": [4,5,6],
> >>>>  "urls": [7,8,9],
> >>>>  "hashtags": [
> >>>>    {
> >>>>      "text": "GoPatriots",
> >>>>      "indices": []
> >>>>    }
> >>>>  ],
> >>>>  "user_mentions": []
> >>>> }
> >>>> }
> >>>>
> >>>> The issue seems to be that if some records have arrays with maps in
> them
> >>>> and others are empty.
> >>>>
> >>>> —Andries
> >>>>
> >>>>
> >>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote:
> >>>>
> >>>>> Seems it works for below json file:
> >>>>> {
> >>>>> "entities": {
> >>>>>  "trends": [],
> >>>>>  "symbols": [],
> >>>>>  "urls": [],
> >>>>>  "hashtags": [
> >>>>>    {
> >>>>>      "text": "GoPatriots",
> >>>>>      "indices": [
> >>>>>        83,
> >>>>>        94
> >>>>>      ]
> >>>>>    }
> >>>>>  ],
> >>>>>  "user_mentions": []
> >>>>> },
> >>>>> "entities": {
> >>>>>  "trends": [1,2,3],
> >>>>>  "symbols": [4,5,6],
> >>>>>  "urls": [7,8,9],
> >>>>>  "hashtags": [
> >>>>>    {
> >>>>>      "text": "GoPatriots",
> >>>>>      "indices": []
> >>>>>    }
> >>>>>  ],
> >>>>>  "user_mentions": []
> >>>>> }
> >>>>> }
> >>>>>
> >>>>>
> >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t
> where
> >>>>> t.entities.urls is not null;
> >>>>> +------------+
> >>>>> |   EXPR$0   |
> >>>>> +------------+
> >>>>> | [7,8,9]    |
> >>>>> +------------+
> >>>>> 1 row selected (0.139 seconds)
> >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t
> where
> >>>>> t.entities.urls is null;
> >>>>> +------------+
> >>>>> |   EXPR$0   |
> >>>>> +------------+
> >>>>> +------------+
> >>>>> No rows selected (0.158 seconds)
> >>>>>
> >>>>> Thanks,
> >>>>> Hao
> >>>>>
> >>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <[email protected]>
> >> wrote:
> >>>>>
> >>>>>> I believe that this works if the array contains homogeneous
> primitive
> >>>>>> types. In your example, it appears from the error, the array field
> >>>> 'member'
> >>>>>> contained maps for at least one record.
> >>>>>>
> >>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <[email protected]
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Trying that locally did not work for me (drill 0.7.0):
> >>>>>>>
> >>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> >>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> >>>>>>> Query failed: Query stopped., Failure while trying to materialize
> >>>>>> incoming schema.  Errors:
> >>>>>>>
> >>>>>>> Error in expression at index -1.  Error: Missing function
> >>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full expression:
> >>>> --UNKNOWN
> >>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> >>>> bullseye-3:31010 ]
> >>>>>>>
> >>>>>>> Error: exception while executing query: Failure while executing
> >> query.
> >>>>>> (state=,code=0)
> >>>>>>>
> >>>>>>> ​
> >>>>>>>
> >>>>>>> Chris Matta
> >>>>>>> [email protected]
> >>>>>>> 215-701-3146
> >>>>>>>
> >>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <[email protected]>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> repeated_count('entities.urls') > 0
> >>>>>>>>
> >>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> How do you filter out records with an empty array in drill?
> >>>>>>>>> i.e some records have "url":[]  and some will have an array with
> >> data
> >>>>>> in
> >>>>>>>>> it. When trying to read records with data in the array drill
> fails
> >>>> due
> >>>>>>>> to
> >>>>>>>>> records missing any data in the array. Trying a filter with/*
> where
> >>>>>>>>> "url":[0] is not null */ fails, also fails if applying url is not
> >>>>>> null.
> >>>>>>>>>
> >>>>>>>>> Note some of the arrays contains maps, using twitter data as an
> >>>>>> example
> >>>>>>>>> below. Some records have an empty array with “hashtags”:[]  and
> >>>> others
> >>>>>>>> will
> >>>>>>>>> look similar to what is listed below.
> >>>>>>>>>
> >>>>>>>>> "entities": {
> >>>>>>>>>  "trends": [],
> >>>>>>>>>  "symbols": [],
> >>>>>>>>>  "urls": [],
> >>>>>>>>>  "hashtags": [
> >>>>>>>>>    {
> >>>>>>>>>      "text": "GoPatriots",
> >>>>>>>>>      "indices": [
> >>>>>>>>>        83,
> >>>>>>>>>        94
> >>>>>>>>>      ]
> >>>>>>>>>    }
> >>>>>>>>>  ],
> >>>>>>>>>  "user_mentions": []
> >>>>>>>>> },
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> —Andries
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Attachment: z.json
Description: application/json

Reply via email to