I can also fetch non-null values for attached json file which contains 6 "entities", 3 of them are null, 3 of them have "text" value.
Could you share your complete "twitter.json"? 0: jdbc:drill:> with tmp as . . . . . . . > ( . . . . . . . > select flatten(t.entities.hashtags) as c from dfs.tmp.`z.json` t . . . . . . . > ) . . . . . . . > select tmp.c.text from tmp; +------------+ | EXPR$0 | +------------+ | GoPatriots | | aaa | | bbb | +------------+ 3 rows selected (0.122 seconds) Thanks, Hao On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < [email protected]> wrote: > The sample data I posted only has 1 element, but some records have > multiple elements in them. > > Interestingly enough though > select t.entities.hashtags[0].`text` from `twitter.json` t limit 10; > > Produces > +------------+ > | EXPR$0 | > +------------+ > | null | > | SportsNews | > | null | > | SportsNews | > | null | > | null | > | null | > | null | > | null | > | CARvsSEA | > +——————+ > > > And > > select t.entities.hashtags[0] from `twitter.json` t limit 10; > > +------------+ > | EXPR$0 | > +------------+ > | {"indices":[]} | > | {"text":"SportsNews","indices":[90,99]} | > | {"indices":[]} | > | {"text":"SportsNews","indices":[90,99]} | > | {"indices":[]} | > | {"indices":[]} | > | {"indices":[]} | > | {"indices":[]} | > | {"indices":[]} | > | {"text":"CARvsSEA","indices":[90,99]} | > +——————+ > > Strange part is that there is no indices map in the hashtags array, so no > idea why it shows up when pointing to the first lament in an empty array. > > > —Andries > > > > > > > On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote: > > > I just noticed that the result of "hashtags" is just an array with only 1 > > element. > > So take your example: > > [root@maprdemo tmp]# cat d.json > > { > > "entities": { > > "trends": [], > > "symbols": [], > > "urls": [], > > "hashtags": [], > > "user_mentions": [] > > }, > > "entities": { > > "trends": [1,2,3], > > "symbols": [4,5,6], > > "urls": [7,8,9], > > "hashtags": [ > > { > > "text": "GoPatriots", > > "indices": [] > > } > > ], > > "user_mentions": [] > > } > > } > > > > Now we can do this to achieve the results: > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ; > > +------------+ > > | EXPR$0 | > > +------------+ > > | [{"text":"GoPatriots"}] | > > +------------+ > > 1 row selected (0.09 seconds) > > 0: jdbc:drill:> select t.entities.hashtags[0].text from dfs.tmp.`d.json` > t ; > > +------------+ > > | EXPR$0 | > > +------------+ > > | GoPatriots | > > +------------+ > > 1 row selected (0.108 seconds) > > > > Thanks, > > Hao > > > > > > > > On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < > > [email protected]> wrote: > > > >> When I run the query on a larger dataset it actually show the empty > >> records. > >> > >> select t.entities.hashtags from `twitter.json` t limit 10; > >> > >> +------------+ > >> | EXPR$0 | > >> +------------+ > >> | [] | > >> | [{"text":"SportsNews","indices":[0,11]}] | > >> | [] | > >> | [{"text":"SportsNews","indices":[0,11]}] | > >> | [] | > >> | [] | > >> | [] | > >> | [] | > >> | [] | > >> | [{"text":"CARvsSEA","indices":[36,45]}] | > >> +------------+ > >> 10 rows selected (2.899 seconds) > >> > >> > >> > >> However having the output as maps is not very useful, unless i can > filter > >> out the records with empty arrays and then drill deeper into the ones > with > >> data in the arrays. > >> > >> BTW: Hao I would have expiated your query to return both rows, one with > an > >> empty array as above and the other with the array data. > >> > >> > >> —Andries > >> > >> > >> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote: > >> > >>> I am not sure if below is expected behavior. > >>> If we only select "hashtags", and it will return only 1 row ignoring > the > >>> null value. > >>> However then if we try to get "hashtags.text", it fails...which means > it > >> is > >>> still trying to read the NULL value. > >>> I am thinking it may confuse the SQL developers. > >>> > >>> > >>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ; > >>> +------------+ > >>> | EXPR$0 | > >>> +------------+ > >>> | [{"text":"GoPatriots"}] | > >>> +------------+ > >>> 1 row selected (0.109 seconds) > >>> > >>> > >>> 0: jdbc:drill:> select t.entities.hashtags.text from dfs.tmp.`d.json` > t ; > >>> > >>> Query failed: Query failed: Failure while running fragment., > >>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast > to > >>> org.apache.drill.exec.vector.complex.MapVector [ > >>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>> > >>> > >>> Error: exception while executing query: Failure while executing query. > >>> (state=,code=0) > >>> > >>> Thanks, > >>> Hao > >>> > >>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < > >>> [email protected]> wrote: > >>> > >>>> Now try on hashtags with the following: > >>>> > >>>> drilldemo:5181> select t.entities.hashtags.`text` from > `/twitter.json` t > >>>> where t.entities.hashtags is not null limit 10; > >>>> > >>>> Query failed: Query failed: Failure while running fragment., > >>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast > to > >>>> org.apache.drill.exec.vector.complex.MapVector [ > >>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>> > >>>> > >>>> Error: exception while executing query: Failure while executing query. > >>>> (state=,code=0) > >>>> > >>>> > >>>> { > >>>> "entities": { > >>>> "trends": [], > >>>> "symbols": [], > >>>> "urls": [], > >>>> "hashtags": [], > >>>> "user_mentions": [] > >>>> }, > >>>> "entities": { > >>>> "trends": [1,2,3], > >>>> "symbols": [4,5,6], > >>>> "urls": [7,8,9], > >>>> "hashtags": [ > >>>> { > >>>> "text": "GoPatriots", > >>>> "indices": [] > >>>> } > >>>> ], > >>>> "user_mentions": [] > >>>> } > >>>> } > >>>> > >>>> The issue seems to be that if some records have arrays with maps in > them > >>>> and others are empty. > >>>> > >>>> —Andries > >>>> > >>>> > >>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote: > >>>> > >>>>> Seems it works for below json file: > >>>>> { > >>>>> "entities": { > >>>>> "trends": [], > >>>>> "symbols": [], > >>>>> "urls": [], > >>>>> "hashtags": [ > >>>>> { > >>>>> "text": "GoPatriots", > >>>>> "indices": [ > >>>>> 83, > >>>>> 94 > >>>>> ] > >>>>> } > >>>>> ], > >>>>> "user_mentions": [] > >>>>> }, > >>>>> "entities": { > >>>>> "trends": [1,2,3], > >>>>> "symbols": [4,5,6], > >>>>> "urls": [7,8,9], > >>>>> "hashtags": [ > >>>>> { > >>>>> "text": "GoPatriots", > >>>>> "indices": [] > >>>>> } > >>>>> ], > >>>>> "user_mentions": [] > >>>>> } > >>>>> } > >>>>> > >>>>> > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t > where > >>>>> t.entities.urls is not null; > >>>>> +------------+ > >>>>> | EXPR$0 | > >>>>> +------------+ > >>>>> | [7,8,9] | > >>>>> +------------+ > >>>>> 1 row selected (0.139 seconds) > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t > where > >>>>> t.entities.urls is null; > >>>>> +------------+ > >>>>> | EXPR$0 | > >>>>> +------------+ > >>>>> +------------+ > >>>>> No rows selected (0.158 seconds) > >>>>> > >>>>> Thanks, > >>>>> Hao > >>>>> > >>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <[email protected]> > >> wrote: > >>>>> > >>>>>> I believe that this works if the array contains homogeneous > primitive > >>>>>> types. In your example, it appears from the error, the array field > >>>> 'member' > >>>>>> contained maps for at least one record. > >>>>>> > >>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <[email protected] > > > >>>>>> wrote: > >>>>>> > >>>>>>> Trying that locally did not work for me (drill 0.7.0): > >>>>>>> > >>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from > >>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > >>>>>>> Query failed: Query stopped., Failure while trying to materialize > >>>>>> incoming schema. Errors: > >>>>>>> > >>>>>>> Error in expression at index -1. Error: Missing function > >>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full expression: > >>>> --UNKNOWN > >>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > >>>> bullseye-3:31010 ] > >>>>>>> > >>>>>>> Error: exception while executing query: Failure while executing > >> query. > >>>>>> (state=,code=0) > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Chris Matta > >>>>>>> [email protected] > >>>>>>> 215-701-3146 > >>>>>>> > >>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <[email protected]> > >>>> wrote: > >>>>>>> > >>>>>>>> repeated_count('entities.urls') > 0 > >>>>>>>> > >>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > >>>>>>>> [email protected]> wrote: > >>>>>>>> > >>>>>>>>> How do you filter out records with an empty array in drill? > >>>>>>>>> i.e some records have "url":[] and some will have an array with > >> data > >>>>>> in > >>>>>>>>> it. When trying to read records with data in the array drill > fails > >>>> due > >>>>>>>> to > >>>>>>>>> records missing any data in the array. Trying a filter with/* > where > >>>>>>>>> "url":[0] is not null */ fails, also fails if applying url is not > >>>>>> null. > >>>>>>>>> > >>>>>>>>> Note some of the arrays contains maps, using twitter data as an > >>>>>> example > >>>>>>>>> below. Some records have an empty array with “hashtags”:[] and > >>>> others > >>>>>>>> will > >>>>>>>>> look similar to what is listed below. > >>>>>>>>> > >>>>>>>>> "entities": { > >>>>>>>>> "trends": [], > >>>>>>>>> "symbols": [], > >>>>>>>>> "urls": [], > >>>>>>>>> "hashtags": [ > >>>>>>>>> { > >>>>>>>>> "text": "GoPatriots", > >>>>>>>>> "indices": [ > >>>>>>>>> 83, > >>>>>>>>> 94 > >>>>>>>>> ] > >>>>>>>>> } > >>>>>>>>> ], > >>>>>>>>> "user_mentions": [] > >>>>>>>>> }, > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thanks > >>>>>>>>> —Andries > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>> > >>>> > >> > >> > >
z.json
Description: application/json
