Steven, are you referring to the fact that he has two map keys with the same name?
If so, is that technically invalid json? I generally agree with this post [1]. http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243 If others agree, I think it would be appropriate for Hao to file a JIRA to make sure we follow and check this convention. J On Wed, Jan 21, 2015 at 5:07 PM, Steven Phillips <[email protected]> wrote: > Hao, > > It looks to me like your json file is actually invalid, and it seems like > Drill should be throwing an exception. > > On Wed, Jan 21, 2015 at 5:04 PM, Hao Zhu <[email protected]> wrote: > > > I figured out the differences after getting the json file from Andries. > > > > My json file is like: > > {“entities”:{xxx},“entities”:{yyy}... } > > > > Andries' json file is like: > > {“entities”:{xxx}} > > {“entities”:{yyy}} > > ... > > > > So basically Andries' json file is contains multiple json files. > > > > Fortunately, we can use the same SQL to get the same results: > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t; > > +------------+ > > | EXPR$0 | > > +------------+ > > | [] | > > | [{"text":"GoPatriots"}] | > > | [{"text":"aaa"}] | > > | [] | > > | [{"text":"bbb"}] | > > +------------+ > > 5 rows selected (0.136 seconds) > > 0: jdbc:drill:> with tmp as > > . . . . . . . > ( > > . . . . . . . > select flatten(t.entities.hashtags) as c from > > dfs.tmp.`z2.json` t > > . . . . . . . > ) > > . . . . . . . > select tmp.c.text from tmp; > > +------------+ > > | EXPR$0 | > > +------------+ > > | GoPatriots | > > | aaa | > > | bbb | > > +------------+ > > 3 rows selected (0.115 seconds) > > > > Thanks, > > Hao > > > > > > > > > > > > > > On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht < > > [email protected]> wrote: > > > > > In my case it returns the empty records when flatten is not used. > > > > > > 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as hashtags > > > from `twitter.json` t limit 10; > > > +------------+ > > > | hashtags | > > > +------------+ > > > | [] | > > > | [{"text":"SportsNews","indices":[0,11]}] | > > > | [] | > > > | [{"text":"SportsNews","indices":[0,11]}] | > > > | [] | > > > | [] | > > > | [] | > > > | [] | > > > | [] | > > > | [{"text":"CARvsSEA","indices":[36,45]}] | > > > +------------+ > > > > > > On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote: > > > > > > > Actually not due to flatten, if you directly query the file, it will > > only > > > > show the non-null values. > > > > > > > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t; > > > > +------------+ > > > > | EXPR$0 | > > > > +------------+ > > > > | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | > > > > +------------+ > > > > 1 row selected (0.123 seconds) > > > > > > > > Thanks, > > > > Hao > > > > > > > > On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < > > > > [email protected]> wrote: > > > > > > > >> Very interesting, flatten seems to bypass empty records. Not sure if > > > that > > > >> is an ideal result for all use cases, but certainly usable in this > > case. > > > >> > > > >> Thanks > > > >> —Andries > > > >> > > > >> > > > >> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote: > > > >> > > > >>> I can also fetch non-null values for attached json file which > > contains > > > 6 > > > >> "entities", 3 of them are null, 3 of them have "text" value. > > > >>> > > > >>> Could you share your complete "twitter.json"? > > > >>> > > > >>> 0: jdbc:drill:> with tmp as > > > >>> . . . . . . . > ( > > > >>> . . . . . . . > select flatten(t.entities.hashtags) as c from > > > >> dfs.tmp.`z.json` t > > > >>> . . . . . . . > ) > > > >>> . . . . . . . > select tmp.c.text from tmp; > > > >>> +------------+ > > > >>> | EXPR$0 | > > > >>> +------------+ > > > >>> | GoPatriots | > > > >>> | aaa | > > > >>> | bbb | > > > >>> +------------+ > > > >>> 3 rows selected (0.122 seconds) > > > >>> > > > >>> Thanks, > > > >>> Hao > > > >>> > > > >>> > > > >>> > > > >>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < > > > >> [email protected]> wrote: > > > >>> The sample data I posted only has 1 element, but some records have > > > >> multiple elements in them. > > > >>> > > > >>> Interestingly enough though > > > >>> select t.entities.hashtags[0].`text` from `twitter.json` t limit > 10; > > > >>> > > > >>> Produces > > > >>> +------------+ > > > >>> | EXPR$0 | > > > >>> +------------+ > > > >>> | null | > > > >>> | SportsNews | > > > >>> | null | > > > >>> | SportsNews | > > > >>> | null | > > > >>> | null | > > > >>> | null | > > > >>> | null | > > > >>> | null | > > > >>> | CARvsSEA | > > > >>> +——————+ > > > >>> > > > >>> > > > >>> And > > > >>> > > > >>> select t.entities.hashtags[0] from `twitter.json` t limit 10; > > > >>> > > > >>> +------------+ > > > >>> | EXPR$0 | > > > >>> +------------+ > > > >>> | {"indices":[]} | > > > >>> | {"text":"SportsNews","indices":[90,99]} | > > > >>> | {"indices":[]} | > > > >>> | {"text":"SportsNews","indices":[90,99]} | > > > >>> | {"indices":[]} | > > > >>> | {"indices":[]} | > > > >>> | {"indices":[]} | > > > >>> | {"indices":[]} | > > > >>> | {"indices":[]} | > > > >>> | {"text":"CARvsSEA","indices":[90,99]} | > > > >>> +——————+ > > > >>> > > > >>> Strange part is that there is no indices map in the hashtags array, > > so > > > >> no idea why it shows up when pointing to the first lament in an > empty > > > array. > > > >>> > > > >>> > > > >>> —Andries > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote: > > > >>> > > > >>>> I just noticed that the result of "hashtags" is just an array with > > > >> only 1 > > > >>>> element. > > > >>>> So take your example: > > > >>>> [root@maprdemo tmp]# cat d.json > > > >>>> { > > > >>>> "entities": { > > > >>>> "trends": [], > > > >>>> "symbols": [], > > > >>>> "urls": [], > > > >>>> "hashtags": [], > > > >>>> "user_mentions": [] > > > >>>> }, > > > >>>> "entities": { > > > >>>> "trends": [1,2,3], > > > >>>> "symbols": [4,5,6], > > > >>>> "urls": [7,8,9], > > > >>>> "hashtags": [ > > > >>>> { > > > >>>> "text": "GoPatriots", > > > >>>> "indices": [] > > > >>>> } > > > >>>> ], > > > >>>> "user_mentions": [] > > > >>>> } > > > >>>> } > > > >>>> > > > >>>> Now we can do this to achieve the results: > > > >>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` > t ; > > > >>>> +------------+ > > > >>>> | EXPR$0 | > > > >>>> +------------+ > > > >>>> | [{"text":"GoPatriots"}] | > > > >>>> +------------+ > > > >>>> 1 row selected (0.09 seconds) > > > >>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from > > > >> dfs.tmp.`d.json` t ; > > > >>>> +------------+ > > > >>>> | EXPR$0 | > > > >>>> +------------+ > > > >>>> | GoPatriots | > > > >>>> +------------+ > > > >>>> 1 row selected (0.108 seconds) > > > >>>> > > > >>>> Thanks, > > > >>>> Hao > > > >>>> > > > >>>> > > > >>>> > > > >>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < > > > >>>> [email protected]> wrote: > > > >>>> > > > >>>>> When I run the query on a larger dataset it actually show the > empty > > > >>>>> records. > > > >>>>> > > > >>>>> select t.entities.hashtags from `twitter.json` t limit 10; > > > >>>>> > > > >>>>> +------------+ > > > >>>>> | EXPR$0 | > > > >>>>> +------------+ > > > >>>>> | [] | > > > >>>>> | [{"text":"SportsNews","indices":[0,11]}] | > > > >>>>> | [] | > > > >>>>> | [{"text":"SportsNews","indices":[0,11]}] | > > > >>>>> | [] | > > > >>>>> | [] | > > > >>>>> | [] | > > > >>>>> | [] | > > > >>>>> | [] | > > > >>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | > > > >>>>> +------------+ > > > >>>>> 10 rows selected (2.899 seconds) > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> However having the output as maps is not very useful, unless i > can > > > >> filter > > > >>>>> out the records with empty arrays and then drill deeper into the > > ones > > > >> with > > > >>>>> data in the arrays. > > > >>>>> > > > >>>>> BTW: Hao I would have expiated your query to return both rows, > one > > > >> with an > > > >>>>> empty array as above and the other with the array data. > > > >>>>> > > > >>>>> > > > >>>>> —Andries > > > >>>>> > > > >>>>> > > > >>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote: > > > >>>>> > > > >>>>>> I am not sure if below is expected behavior. > > > >>>>>> If we only select "hashtags", and it will return only 1 row > > ignoring > > > >> the > > > >>>>>> null value. > > > >>>>>> However then if we try to get "hashtags.text", it fails...which > > > >> means it > > > >>>>> is > > > >>>>>> still trying to read the NULL value. > > > >>>>>> I am thinking it may confuse the SQL developers. > > > >>>>>> > > > >>>>>> > > > >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` > > t ; > > > >>>>>> +------------+ > > > >>>>>> | EXPR$0 | > > > >>>>>> +------------+ > > > >>>>>> | [{"text":"GoPatriots"}] | > > > >>>>>> +------------+ > > > >>>>>> 1 row selected (0.109 seconds) > > > >>>>>> > > > >>>>>> > > > >>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from > > > >> dfs.tmp.`d.json` t ; > > > >>>>>> > > > >>>>>> Query failed: Query failed: Failure while running fragment., > > > >>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be > > > >> cast to > > > >>>>>> org.apache.drill.exec.vector.complex.MapVector [ > > > >>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > > > >>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > > > >>>>>> > > > >>>>>> > > > >>>>>> Error: exception while executing query: Failure while executing > > > >> query. > > > >>>>>> (state=,code=0) > > > >>>>>> > > > >>>>>> Thanks, > > > >>>>>> Hao > > > >>>>>> > > > >>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < > > > >>>>>> [email protected]> wrote: > > > >>>>>> > > > >>>>>>> Now try on hashtags with the following: > > > >>>>>>> > > > >>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from > > > >> `/twitter.json` t > > > >>>>>>> where t.entities.hashtags is not null limit 10; > > > >>>>>>> > > > >>>>>>> Query failed: Query failed: Failure while running fragment., > > > >>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot > be > > > >> cast to > > > >>>>>>> org.apache.drill.exec.vector.complex.MapVector [ > > > >>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > > > >>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Error: exception while executing query: Failure while executing > > > >> query. > > > >>>>>>> (state=,code=0) > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> { > > > >>>>>>> "entities": { > > > >>>>>>> "trends": [], > > > >>>>>>> "symbols": [], > > > >>>>>>> "urls": [], > > > >>>>>>> "hashtags": [], > > > >>>>>>> "user_mentions": [] > > > >>>>>>> }, > > > >>>>>>> "entities": { > > > >>>>>>> "trends": [1,2,3], > > > >>>>>>> "symbols": [4,5,6], > > > >>>>>>> "urls": [7,8,9], > > > >>>>>>> "hashtags": [ > > > >>>>>>> { > > > >>>>>>> "text": "GoPatriots", > > > >>>>>>> "indices": [] > > > >>>>>>> } > > > >>>>>>> ], > > > >>>>>>> "user_mentions": [] > > > >>>>>>> } > > > >>>>>>> } > > > >>>>>>> > > > >>>>>>> The issue seems to be that if some records have arrays with > maps > > in > > > >> them > > > >>>>>>> and others are empty. > > > >>>>>>> > > > >>>>>>> —Andries > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> > wrote: > > > >>>>>>> > > > >>>>>>>> Seems it works for below json file: > > > >>>>>>>> { > > > >>>>>>>> "entities": { > > > >>>>>>>> "trends": [], > > > >>>>>>>> "symbols": [], > > > >>>>>>>> "urls": [], > > > >>>>>>>> "hashtags": [ > > > >>>>>>>> { > > > >>>>>>>> "text": "GoPatriots", > > > >>>>>>>> "indices": [ > > > >>>>>>>> 83, > > > >>>>>>>> 94 > > > >>>>>>>> ] > > > >>>>>>>> } > > > >>>>>>>> ], > > > >>>>>>>> "user_mentions": [] > > > >>>>>>>> }, > > > >>>>>>>> "entities": { > > > >>>>>>>> "trends": [1,2,3], > > > >>>>>>>> "symbols": [4,5,6], > > > >>>>>>>> "urls": [7,8,9], > > > >>>>>>>> "hashtags": [ > > > >>>>>>>> { > > > >>>>>>>> "text": "GoPatriots", > > > >>>>>>>> "indices": [] > > > >>>>>>>> } > > > >>>>>>>> ], > > > >>>>>>>> "user_mentions": [] > > > >>>>>>>> } > > > >>>>>>>> } > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` > as > > t > > > >> where > > > >>>>>>>> t.entities.urls is not null; > > > >>>>>>>> +------------+ > > > >>>>>>>> | EXPR$0 | > > > >>>>>>>> +------------+ > > > >>>>>>>> | [7,8,9] | > > > >>>>>>>> +------------+ > > > >>>>>>>> 1 row selected (0.139 seconds) > > > >>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` > as > > t > > > >> where > > > >>>>>>>> t.entities.urls is null; > > > >>>>>>>> +------------+ > > > >>>>>>>> | EXPR$0 | > > > >>>>>>>> +------------+ > > > >>>>>>>> +------------+ > > > >>>>>>>> No rows selected (0.158 seconds) > > > >>>>>>>> > > > >>>>>>>> Thanks, > > > >>>>>>>> Hao > > > >>>>>>>> > > > >>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < > > [email protected]> > > > >>>>> wrote: > > > >>>>>>>> > > > >>>>>>>>> I believe that this works if the array contains homogeneous > > > >> primitive > > > >>>>>>>>> types. In your example, it appears from the error, the array > > > field > > > >>>>>>> 'member' > > > >>>>>>>>> contained maps for at least one record. > > > >>>>>>>>> > > > >>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < > > > >> [email protected]> > > > >>>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): > > > >>>>>>>>>> > > > >>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from > > > >>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > > > >>>>>>>>>> Query failed: Query stopped., Failure while trying to > > > materialize > > > >>>>>>>>> incoming schema. Errors: > > > >>>>>>>>>> > > > >>>>>>>>>> Error in expression at index -1. Error: Missing function > > > >>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full > > expression: > > > >>>>>>> --UNKNOWN > > > >>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > > > >>>>>>> bullseye-3:31010 ] > > > >>>>>>>>>> > > > >>>>>>>>>> Error: exception while executing query: Failure while > > executing > > > >>>>> query. > > > >>>>>>>>> (state=,code=0) > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> Chris Matta > > > >>>>>>>>>> [email protected] > > > >>>>>>>>>> 215-701-3146 > > > >>>>>>>>>> > > > >>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < > > > [email protected] > > > >>> > > > >>>>>>> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>>> repeated_count('entities.urls') > 0 > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > > > >>>>>>>>>>> [email protected]> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> How do you filter out records with an empty array in > drill? > > > >>>>>>>>>>>> i.e some records have "url":[] and some will have an > array > > > >> with > > > >>>>> data > > > >>>>>>>>> in > > > >>>>>>>>>>>> it. When trying to read records with data in the array > drill > > > >> fails > > > >>>>>>> due > > > >>>>>>>>>>> to > > > >>>>>>>>>>>> records missing any data in the array. Trying a filter > > with/* > > > >> where > > > >>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying url > > is > > > >> not > > > >>>>>>>>> null. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Note some of the arrays contains maps, using twitter data > as > > > an > > > >>>>>>>>> example > > > >>>>>>>>>>>> below. Some records have an empty array with “hashtags”:[] > > > and > > > >>>>>>> others > > > >>>>>>>>>>> will > > > >>>>>>>>>>>> look similar to what is listed below. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> "entities": { > > > >>>>>>>>>>>> "trends": [], > > > >>>>>>>>>>>> "symbols": [], > > > >>>>>>>>>>>> "urls": [], > > > >>>>>>>>>>>> "hashtags": [ > > > >>>>>>>>>>>> { > > > >>>>>>>>>>>> "text": "GoPatriots", > > > >>>>>>>>>>>> "indices": [ > > > >>>>>>>>>>>> 83, > > > >>>>>>>>>>>> 94 > > > >>>>>>>>>>>> ] > > > >>>>>>>>>>>> } > > > >>>>>>>>>>>> ], > > > >>>>>>>>>>>> "user_mentions": [] > > > >>>>>>>>>>>> }, > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Thanks > > > >>>>>>>>>>>> —Andries > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>> > > > >>>>> > > > >>> > > > >>> > > > >>> <z.json> > > > >> > > > >> > > > > > > > > > > > > -- > Steven Phillips > Software Engineer > > mapr.com >
