Re: Filter out empty arrays in JSON

Christopher Matta Thu, 05 Feb 2015 19:39:57 -0800

You missed something:

select t.entities.hashtags from dfs.twitter./nflt where
t.entities.hashtags[0].*text* is not null limit 10;


Add the .text to the end of the where.


Chris Matta
[email protected]
215-701-3146

On Thu, Feb 5, 2015 at 10:05 PM, Andries Engelbrecht <
[email protected]> wrote:

> Chris
>
> Thanks, but I tried it before without success.
>
> I still get this on Drill 0.7.
>
> select t.entities.hashtags from dfs.twitter.`/nfl` t where
> t.entities.hashtags[0] is not null limit 10;
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> Query failed: Query failed: Failure while running fragment., Failure while
> trying to materialize incoming schema.  Errors:
>
> Error in expression at index 0.  Error: Missing function implementation:
> [isnotnull(MAP-REQUIRED)].  Full expression: null.. [
> 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
> [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
>
>
> Mine seems finicky about even using the first element in the array
>
> select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10;
>
> Query failed: Query failed: Failure while running fragment., index: 4,
> length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on
> drilldemo:31010 ]
> [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ]
>
>
> However flatten seems to work. (as long as you don’t have thousands of
> small json files)
>
> select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10;
>
> +------------+
> |   EXPR$0   |
> +------------+
> | {"text":"PantherPride","indices":["12","25"]} |
> | {"text":"KeepPounding","indices":["48","61"]} |
> | {"text":"vegas","indices":["107","113"]} |
> | {"text":"nfl","indices":["23","27"]} |
> | {"text":"Cowboys","indices":["7","15"]} |
> | {"text":"Cowboys","indices":["7","15"]} |
> | {"text":"Ebay","indices":["52","57"]} |
> | {"text":"NFL","indices":["58","62"]} |
> | {"text":"Christmas","indices":["63","73"]} |
> | {"text":"Gift","indices":["74","79"]} |
> +------------+
>
>
> —Andries
>
>
>
>
> On Feb 5, 2015, at 6:41 PM, Christopher Matta <[email protected]> wrote:
>
> > Andreas, I think I may have solved your issue:
> >
> >> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE
> t.entities.hashtags[0].text is not null limit 10;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > |
> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}]
> > |
> > |
> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}]
> > |
> > |
> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}]
> > |
> > | [{"text":"NotOneDime","indices":[14,25]}] |
> > | [{"text":"BlackFriday","indices":[43,55]}] |
> > | [{"text":"JRStudio","indices":[93,102]}] |
> > |
> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}]
> > |
> > | [{"text":"BlackFriday","indices":[0,12]}] |
> > |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> > |
> > |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> > |
> > +------------+
> > 10 rows selected (1.697 seconds)
> >
> > I noticed when only extracting the first text item in the hashtag array (
> > t.entities.hashtags[0].text) it was returning null for those tweets that
> > didn’t have hashtags, so filtering out that seems to work.
> > 
> >
> > Chris Matta
> > [email protected]
> > 215-701-3146
> >
> > On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <
> [email protected]>
> > wrote:
> >
> >> As Aditya commented before this will work if the lists only contain
> scalars
> >>
> >> repeated_count('entities.urls') > 0
> >>
> >> If the lists contain maps unfortunately this is not available today.
> There
> >> is an enhancement request open for this feature. I have marked it for a
> fix
> >> in 0.9 as it is more of a feature request than a bug and we are working
> on
> >> closing a large number of bugs for 0.8 before we get to issues like
> this.
> >>
> >> https://issues.apache.org/jira/browse/DRILL-1650
> >>
> >> -Jason
> >>
> >> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht <
> >> [email protected]> wrote:
> >>
> >>> Unfortunately it seems that with larger data sets the use of flatten
> >> seems
> >>> to produce an error.
> >>>
> >>> Any other options to filter out JSON records (objects) where an array
> is
> >>> empty?
> >>>
> >>> Thanks
> >>> —Andries
> >>>
> >>>
> >>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht <
> >>> [email protected]> wrote:
> >>>
> >>>> It was interesting to see flatten bypass the records with an empty
> >> array.
> >>>>
> >>>> Unfortunately some arrays are much more complex than the example here,
> >>> and it is still useful to have the ability to filter out ones that are
> >>> empty.
> >>>>
> >>>> —Andries
> >>>>
> >>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <[email protected]> wrote:
> >>>>
> >>>>> I figured out the differences after getting the json file from
> >> Andries.
> >>>>>
> >>>>> My json file is like:
> >>>>> {“entities”:{xxx},“entities”:{yyy}... }
> >>>>>
> >>>>> Andries' json file is like:
> >>>>> {“entities”:{xxx}}
> >>>>> {“entities”:{yyy}}
> >>>>> ...
> >>>>>
> >>>>> So basically Andries' json file is contains multiple json files.
> >>>>>
> >>>>> Fortunately, we can use the same SQL to get the same results:
> >>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
> >>>>> +------------+
> >>>>> |   EXPR$0   |
> >>>>> +------------+
> >>>>> | []         |
> >>>>> | [{"text":"GoPatriots"}] |
> >>>>> | [{"text":"aaa"}] |
> >>>>> | []         |
> >>>>> | [{"text":"bbb"}] |
> >>>>> +------------+
> >>>>> 5 rows selected (0.136 seconds)
> >>>>> 0: jdbc:drill:> with tmp as
> >>>>> . . . . . . . > (
> >>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> >>>>> dfs.tmp.`z2.json` t
> >>>>> . . . . . . . > )
> >>>>> . . . . . . . > select tmp.c.text from tmp;
> >>>>> +------------+
> >>>>> |   EXPR$0   |
> >>>>> +------------+
> >>>>> | GoPatriots |
> >>>>> | aaa        |
> >>>>> | bbb        |
> >>>>> +------------+
> >>>>> 3 rows selected (0.115 seconds)
> >>>>>
> >>>>> Thanks,
> >>>>> Hao
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> In my case it returns the empty records when flatten is not used.
> >>>>>>
> >>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as
> >> hashtags
> >>>>>> from `twitter.json` t limit 10;
> >>>>>> +------------+
> >>>>>> |  hashtags  |
> >>>>>> +------------+
> >>>>>> | []         |
> >>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>> | []         |
> >>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>> | []         |
> >>>>>> | []         |
> >>>>>> | []         |
> >>>>>> | []         |
> >>>>>> | []         |
> >>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> >>>>>> +------------+
> >>>>>>
> >>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote:
> >>>>>>
> >>>>>>> Actually not due to flatten, if you directly query the file, it
> will
> >>> only
> >>>>>>> show the non-null values.
> >>>>>>>
> >>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
> >>>>>>> +------------+
> >>>>>>> |   EXPR$0   |
> >>>>>>> +------------+
> >>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
> >>>>>>> +------------+
> >>>>>>> 1 row selected (0.123 seconds)
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Hao
> >>>>>>>
> >>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure
> >> if
> >>>>>> that
> >>>>>>>> is an ideal result for all use cases, but certainly usable in this
> >>> case.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> —Andries
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> I can also fetch non-null values for attached json file which
> >>> contains
> >>>>>> 6
> >>>>>>>> "entities", 3 of them are null, 3 of them have "text" value.
> >>>>>>>>>
> >>>>>>>>> Could you share your complete "twitter.json"?
> >>>>>>>>>
> >>>>>>>>> 0: jdbc:drill:> with tmp as
> >>>>>>>>> . . . . . . . > (
> >>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> >>>>>>>> dfs.tmp.`z.json` t
> >>>>>>>>> . . . . . . . > )
> >>>>>>>>> . . . . . . . > select tmp.c.text from tmp;
> >>>>>>>>> +------------+
> >>>>>>>>> |   EXPR$0   |
> >>>>>>>>> +------------+
> >>>>>>>>> | GoPatriots |
> >>>>>>>>> | aaa        |
> >>>>>>>>> | bbb        |
> >>>>>>>>> +------------+
> >>>>>>>>> 3 rows selected (0.122 seconds)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Hao
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>> The sample data I posted only has 1 element, but some records
> have
> >>>>>>>> multiple elements in them.
> >>>>>>>>>
> >>>>>>>>> Interestingly enough though
> >>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
> >> 10;
> >>>>>>>>>
> >>>>>>>>> Produces
> >>>>>>>>> +------------+
> >>>>>>>>> |   EXPR$0   |
> >>>>>>>>> +------------+
> >>>>>>>>> | null       |
> >>>>>>>>> | SportsNews |
> >>>>>>>>> | null       |
> >>>>>>>>> | SportsNews |
> >>>>>>>>> | null       |
> >>>>>>>>> | null       |
> >>>>>>>>> | null       |
> >>>>>>>>> | null       |
> >>>>>>>>> | null       |
> >>>>>>>>> | CARvsSEA   |
> >>>>>>>>> +——————+
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And
> >>>>>>>>>
> >>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
> >>>>>>>>>
> >>>>>>>>> +------------+
> >>>>>>>>> |   EXPR$0   |
> >>>>>>>>> +------------+
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"indices":[]} |
> >>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} |
> >>>>>>>>> +——————+
> >>>>>>>>>
> >>>>>>>>> Strange part is that there is no indices map in the hashtags
> >> array,
> >>> so
> >>>>>>>> no idea why it shows up when pointing to the first lament in an
> >> empty
> >>>>>> array.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> —Andries
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> I just noticed that the result of "hashtags" is just an array
> >> with
> >>>>>>>> only 1
> >>>>>>>>>> element.
> >>>>>>>>>> So take your example:
> >>>>>>>>>> [root@maprdemo tmp]# cat d.json
> >>>>>>>>>> {
> >>>>>>>>>> "entities": {
> >>>>>>>>>> "trends": [],
> >>>>>>>>>> "symbols": [],
> >>>>>>>>>> "urls": [],
> >>>>>>>>>> "hashtags": [],
> >>>>>>>>>> "user_mentions": []
> >>>>>>>>>> },
> >>>>>>>>>> "entities": {
> >>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>> "hashtags": [
> >>>>>>>>>> {
> >>>>>>>>>>   "text": "GoPatriots",
> >>>>>>>>>>   "indices": []
> >>>>>>>>>> }
> >>>>>>>>>> ],
> >>>>>>>>>> "user_mentions": []
> >>>>>>>>>> }
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> Now we can do this to achieve the results:
> >>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
> >> t
> >>> ;
> >>>>>>>>>> +------------+
> >>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>> +------------+
> >>>>>>>>>> | [{"text":"GoPatriots"}] |
> >>>>>>>>>> +------------+
> >>>>>>>>>> 1 row selected (0.09 seconds)
> >>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
> >>>>>>>> dfs.tmp.`d.json` t ;
> >>>>>>>>>> +------------+
> >>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>> +------------+
> >>>>>>>>>> | GoPatriots |
> >>>>>>>>>> +------------+
> >>>>>>>>>> 1 row selected (0.108 seconds)
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Hao
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> >>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> When I run the query on a larger dataset it actually show the
> >>> empty
> >>>>>>>>>>> records.
> >>>>>>>>>>>
> >>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10;
> >>>>>>>>>>>
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | []         |
> >>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> 10 rows selected (2.899 seconds)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> However having the output as maps is not very useful, unless i
> >> can
> >>>>>>>> filter
> >>>>>>>>>>> out the records with empty arrays and then drill deeper into
> the
> >>> ones
> >>>>>>>> with
> >>>>>>>>>>> data in the arrays.
> >>>>>>>>>>>
> >>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows,
> >> one
> >>>>>>>> with an
> >>>>>>>>>>> empty array as above and the other with the array data.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> —Andries
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I am not sure if below is expected behavior.
> >>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row
> >>> ignoring
> >>>>>>>> the
> >>>>>>>>>>>> null value.
> >>>>>>>>>>>> However then if we try to get "hashtags.text", it
> fails...which
> >>>>>>>> means it
> >>>>>>>>>>> is
> >>>>>>>>>>>> still trying to read the NULL value.
> >>>>>>>>>>>> I am thinking it may confuse the SQL developers.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
> >> dfs.tmp.`d.json`
> >>> t ;
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> | [{"text":"GoPatriots"}] |
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> 1 row selected (0.109 seconds)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
> >>>>>>>> dfs.tmp.`d.json` t ;
> >>>>>>>>>>>>
> >>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> >>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> >> be
> >>>>>>>> cast to
> >>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> >>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Error: exception while executing query: Failure while
> executing
> >>>>>>>> query.
> >>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Hao
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> >>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Now try on hashtags with the following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
> >>>>>>>> `/twitter.json` t
> >>>>>>>>>>>>> where t.entities.hashtags is not null limit 10;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> >> be
> >>>>>>>> cast to
> >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> >>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Error: exception while executing query: Failure while
> >> executing
> >>>>>>>> query.
> >>>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>> "hashtags": [],
> >>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>> },
> >>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>>  "text": "GoPatriots",
> >>>>>>>>>>>>>  "indices": []
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>> ],
> >>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The issue seems to be that if some records have arrays with
> >>> maps in
> >>>>>>>> them
> >>>>>>>>>>>>> and others are empty.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> —Andries
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]>
> >> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Seems it works for below json file:
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>  "text": "GoPatriots",
> >>>>>>>>>>>>>>  "indices": [
> >>>>>>>>>>>>>>    83,
> >>>>>>>>>>>>>>    94
> >>>>>>>>>>>>>>  ]
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>> },
> >>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>  "text": "GoPatriots",
> >>>>>>>>>>>>>>  "indices": []
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> >>> as t
> >>>>>>>> where
> >>>>>>>>>>>>>> t.entities.urls is not null;
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> | [7,8,9]    |
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> 1 row selected (0.139 seconds)
> >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> >>> as t
> >>>>>>>> where
> >>>>>>>>>>>>>> t.entities.urls is null;
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>> No rows selected (0.158 seconds)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Hao
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
> >>> [email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous
> >>>>>>>> primitive
> >>>>>>>>>>>>>>> types. In your example, it appears from the error, the
> array
> >>>>>> field
> >>>>>>>>>>>>> 'member'
> >>>>>>>>>>>>>>> contained maps for at least one record.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> >>>>>>>> [email protected]>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members`
> from
> >>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> >>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to
> >>>>>> materialize
> >>>>>>>>>>>>>>> incoming schema.  Errors:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Error in expression at index -1.  Error: Missing function
> >>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
> >>> expression:
> >>>>>>>>>>>>> --UNKNOWN
> >>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> >>>>>>>>>>>>> bullseye-3:31010 ]
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Error: exception while executing query: Failure while
> >>> executing
> >>>>>>>>>>> query.
> >>>>>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Chris Matta
> >>>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>> 215-701-3146
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
> >>>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> >>>>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in
> >> drill?
> >>>>>>>>>>>>>>>>>> i.e some records have "url":[]  and some will have an
> >> array
> >>>>>>>> with
> >>>>>>>>>>> data
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array
> >>> drill
> >>>>>>>> fails
> >>>>>>>>>>>>> due
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter
> >>> with/*
> >>>>>>>> where
> >>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying
> >> url
> >>> is
> >>>>>>>> not
> >>>>>>>>>>>>>>> null.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter
> data
> >>> as
> >>>>>> an
> >>>>>>>>>>>>>>> example
> >>>>>>>>>>>>>>>>>> below. Some records have an empty array with
> >> “hashtags”:[]
> >>>>>> and
> >>>>>>>>>>>>> others
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>> look similar to what is listed below.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>  "text": "GoPatriots",
> >>>>>>>>>>>>>>>>>>  "indices": [
> >>>>>>>>>>>>>>>>>>    83,
> >>>>>>>>>>>>>>>>>>    94
> >>>>>>>>>>>>>>>>>>  ]
> >>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>>>>>> },
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>>>> —Andries
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <z.json>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >>>
> >>
>
>

Re: Filter out empty arrays in JSON

Reply via email to