You missed something: select t.entities.hashtags from dfs.twitter./nflt where t.entities.hashtags[0].*text* is not null limit 10;
Add the .text to the end of the where. Chris Matta [email protected] 215-701-3146 On Thu, Feb 5, 2015 at 10:05 PM, Andries Engelbrecht < [email protected]> wrote: > Chris > > Thanks, but I tried it before without success. > > I still get this on Drill 0.7. > > select t.entities.hashtags from dfs.twitter.`/nfl` t where > t.entities.hashtags[0] is not null limit 10; > > Error: exception while executing query: Failure while executing query. > (state=,code=0) > Query failed: Query failed: Failure while running fragment., Failure while > trying to materialize incoming schema. Errors: > > Error in expression at index 0. Error: Missing function implementation: > [isnotnull(MAP-REQUIRED)]. Full expression: null.. [ > 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > > > Mine seems finicky about even using the first element in the array > > select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10; > > Query failed: Query failed: Failure while running fragment., index: 4, > length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on > drilldemo:31010 ] > [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ] > > > However flatten seems to work. (as long as you don’t have thousands of > small json files) > > select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10; > > +------------+ > | EXPR$0 | > +------------+ > | {"text":"PantherPride","indices":["12","25"]} | > | {"text":"KeepPounding","indices":["48","61"]} | > | {"text":"vegas","indices":["107","113"]} | > | {"text":"nfl","indices":["23","27"]} | > | {"text":"Cowboys","indices":["7","15"]} | > | {"text":"Cowboys","indices":["7","15"]} | > | {"text":"Ebay","indices":["52","57"]} | > | {"text":"NFL","indices":["58","62"]} | > | {"text":"Christmas","indices":["63","73"]} | > | {"text":"Gift","indices":["74","79"]} | > +------------+ > > > —Andries > > > > > On Feb 5, 2015, at 6:41 PM, Christopher Matta <[email protected]> wrote: > > > Andreas, I think I may have solved your issue: > > > >> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE > t.entities.hashtags[0].text is not null limit 10; > > +------------+ > > | EXPR$0 | > > +------------+ > > | > [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}] > > | > > | > [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}] > > | > > | > [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}] > > | > > | [{"text":"NotOneDime","indices":[14,25]}] | > > | [{"text":"BlackFriday","indices":[43,55]}] | > > | [{"text":"JRStudio","indices":[93,102]}] | > > | > [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}] > > | > > | [{"text":"BlackFriday","indices":[0,12]}] | > > | > [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] > > | > > | > [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] > > | > > +------------+ > > 10 rows selected (1.697 seconds) > > > > I noticed when only extracting the first text item in the hashtag array ( > > t.entities.hashtags[0].text) it was returning null for those tweets that > > didn’t have hashtags, so filtering out that seems to work. > > > > > > Chris Matta > > [email protected] > > 215-701-3146 > > > > On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse < > [email protected]> > > wrote: > > > >> As Aditya commented before this will work if the lists only contain > scalars > >> > >> repeated_count('entities.urls') > 0 > >> > >> If the lists contain maps unfortunately this is not available today. > There > >> is an enhancement request open for this feature. I have marked it for a > fix > >> in 0.9 as it is more of a feature request than a bug and we are working > on > >> closing a large number of bugs for 0.8 before we get to issues like > this. > >> > >> https://issues.apache.org/jira/browse/DRILL-1650 > >> > >> -Jason > >> > >> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht < > >> [email protected]> wrote: > >> > >>> Unfortunately it seems that with larger data sets the use of flatten > >> seems > >>> to produce an error. > >>> > >>> Any other options to filter out JSON records (objects) where an array > is > >>> empty? > >>> > >>> Thanks > >>> —Andries > >>> > >>> > >>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht < > >>> [email protected]> wrote: > >>> > >>>> It was interesting to see flatten bypass the records with an empty > >> array. > >>>> > >>>> Unfortunately some arrays are much more complex than the example here, > >>> and it is still useful to have the ability to filter out ones that are > >>> empty. > >>>> > >>>> —Andries > >>>> > >>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <[email protected]> wrote: > >>>> > >>>>> I figured out the differences after getting the json file from > >> Andries. > >>>>> > >>>>> My json file is like: > >>>>> {“entities”:{xxx},“entities”:{yyy}... } > >>>>> > >>>>> Andries' json file is like: > >>>>> {“entities”:{xxx}} > >>>>> {“entities”:{yyy}} > >>>>> ... > >>>>> > >>>>> So basically Andries' json file is contains multiple json files. > >>>>> > >>>>> Fortunately, we can use the same SQL to get the same results: > >>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t; > >>>>> +------------+ > >>>>> | EXPR$0 | > >>>>> +------------+ > >>>>> | [] | > >>>>> | [{"text":"GoPatriots"}] | > >>>>> | [{"text":"aaa"}] | > >>>>> | [] | > >>>>> | [{"text":"bbb"}] | > >>>>> +------------+ > >>>>> 5 rows selected (0.136 seconds) > >>>>> 0: jdbc:drill:> with tmp as > >>>>> . . . . . . . > ( > >>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from > >>>>> dfs.tmp.`z2.json` t > >>>>> . . . . . . . > ) > >>>>> . . . . . . . > select tmp.c.text from tmp; > >>>>> +------------+ > >>>>> | EXPR$0 | > >>>>> +------------+ > >>>>> | GoPatriots | > >>>>> | aaa | > >>>>> | bbb | > >>>>> +------------+ > >>>>> 3 rows selected (0.115 seconds) > >>>>> > >>>>> Thanks, > >>>>> Hao > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht < > >>>>> [email protected]> wrote: > >>>>> > >>>>>> In my case it returns the empty records when flatten is not used. > >>>>>> > >>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as > >> hashtags > >>>>>> from `twitter.json` t limit 10; > >>>>>> +------------+ > >>>>>> | hashtags | > >>>>>> +------------+ > >>>>>> | [] | > >>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>> | [] | > >>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>> | [] | > >>>>>> | [] | > >>>>>> | [] | > >>>>>> | [] | > >>>>>> | [] | > >>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | > >>>>>> +------------+ > >>>>>> > >>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote: > >>>>>> > >>>>>>> Actually not due to flatten, if you directly query the file, it > will > >>> only > >>>>>>> show the non-null values. > >>>>>>> > >>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t; > >>>>>>> +------------+ > >>>>>>> | EXPR$0 | > >>>>>>> +------------+ > >>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | > >>>>>>> +------------+ > >>>>>>> 1 row selected (0.123 seconds) > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Hao > >>>>>>> > >>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < > >>>>>>> [email protected]> wrote: > >>>>>>> > >>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure > >> if > >>>>>> that > >>>>>>>> is an ideal result for all use cases, but certainly usable in this > >>> case. > >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> —Andries > >>>>>>>> > >>>>>>>> > >>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote: > >>>>>>>> > >>>>>>>>> I can also fetch non-null values for attached json file which > >>> contains > >>>>>> 6 > >>>>>>>> "entities", 3 of them are null, 3 of them have "text" value. > >>>>>>>>> > >>>>>>>>> Could you share your complete "twitter.json"? > >>>>>>>>> > >>>>>>>>> 0: jdbc:drill:> with tmp as > >>>>>>>>> . . . . . . . > ( > >>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from > >>>>>>>> dfs.tmp.`z.json` t > >>>>>>>>> . . . . . . . > ) > >>>>>>>>> . . . . . . . > select tmp.c.text from tmp; > >>>>>>>>> +------------+ > >>>>>>>>> | EXPR$0 | > >>>>>>>>> +------------+ > >>>>>>>>> | GoPatriots | > >>>>>>>>> | aaa | > >>>>>>>>> | bbb | > >>>>>>>>> +------------+ > >>>>>>>>> 3 rows selected (0.122 seconds) > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Hao > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < > >>>>>>>> [email protected]> wrote: > >>>>>>>>> The sample data I posted only has 1 element, but some records > have > >>>>>>>> multiple elements in them. > >>>>>>>>> > >>>>>>>>> Interestingly enough though > >>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit > >> 10; > >>>>>>>>> > >>>>>>>>> Produces > >>>>>>>>> +------------+ > >>>>>>>>> | EXPR$0 | > >>>>>>>>> +------------+ > >>>>>>>>> | null | > >>>>>>>>> | SportsNews | > >>>>>>>>> | null | > >>>>>>>>> | SportsNews | > >>>>>>>>> | null | > >>>>>>>>> | null | > >>>>>>>>> | null | > >>>>>>>>> | null | > >>>>>>>>> | null | > >>>>>>>>> | CARvsSEA | > >>>>>>>>> +——————+ > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> And > >>>>>>>>> > >>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10; > >>>>>>>>> > >>>>>>>>> +------------+ > >>>>>>>>> | EXPR$0 | > >>>>>>>>> +------------+ > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"indices":[]} | > >>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} | > >>>>>>>>> +——————+ > >>>>>>>>> > >>>>>>>>> Strange part is that there is no indices map in the hashtags > >> array, > >>> so > >>>>>>>> no idea why it shows up when pointing to the first lament in an > >> empty > >>>>>> array. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> —Andries > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote: > >>>>>>>>> > >>>>>>>>>> I just noticed that the result of "hashtags" is just an array > >> with > >>>>>>>> only 1 > >>>>>>>>>> element. > >>>>>>>>>> So take your example: > >>>>>>>>>> [root@maprdemo tmp]# cat d.json > >>>>>>>>>> { > >>>>>>>>>> "entities": { > >>>>>>>>>> "trends": [], > >>>>>>>>>> "symbols": [], > >>>>>>>>>> "urls": [], > >>>>>>>>>> "hashtags": [], > >>>>>>>>>> "user_mentions": [] > >>>>>>>>>> }, > >>>>>>>>>> "entities": { > >>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>> "hashtags": [ > >>>>>>>>>> { > >>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>> "indices": [] > >>>>>>>>>> } > >>>>>>>>>> ], > >>>>>>>>>> "user_mentions": [] > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> Now we can do this to achieve the results: > >>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` > >> t > >>> ; > >>>>>>>>>> +------------+ > >>>>>>>>>> | EXPR$0 | > >>>>>>>>>> +------------+ > >>>>>>>>>> | [{"text":"GoPatriots"}] | > >>>>>>>>>> +------------+ > >>>>>>>>>> 1 row selected (0.09 seconds) > >>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from > >>>>>>>> dfs.tmp.`d.json` t ; > >>>>>>>>>> +------------+ > >>>>>>>>>> | EXPR$0 | > >>>>>>>>>> +------------+ > >>>>>>>>>> | GoPatriots | > >>>>>>>>>> +------------+ > >>>>>>>>>> 1 row selected (0.108 seconds) > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Hao > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < > >>>>>>>>>> [email protected]> wrote: > >>>>>>>>>> > >>>>>>>>>>> When I run the query on a larger dataset it actually show the > >>> empty > >>>>>>>>>>> records. > >>>>>>>>>>> > >>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10; > >>>>>>>>>>> > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [] | > >>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> 10 rows selected (2.899 seconds) > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> However having the output as maps is not very useful, unless i > >> can > >>>>>>>> filter > >>>>>>>>>>> out the records with empty arrays and then drill deeper into > the > >>> ones > >>>>>>>> with > >>>>>>>>>>> data in the arrays. > >>>>>>>>>>> > >>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows, > >> one > >>>>>>>> with an > >>>>>>>>>>> empty array as above and the other with the array data. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> —Andries > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> > wrote: > >>>>>>>>>>> > >>>>>>>>>>>> I am not sure if below is expected behavior. > >>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row > >>> ignoring > >>>>>>>> the > >>>>>>>>>>>> null value. > >>>>>>>>>>>> However then if we try to get "hashtags.text", it > fails...which > >>>>>>>> means it > >>>>>>>>>>> is > >>>>>>>>>>>> still trying to read the NULL value. > >>>>>>>>>>>> I am thinking it may confuse the SQL developers. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from > >> dfs.tmp.`d.json` > >>> t ; > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> | [{"text":"GoPatriots"}] | > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> 1 row selected (0.109 seconds) > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from > >>>>>>>> dfs.tmp.`d.json` t ; > >>>>>>>>>>>> > >>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., > >>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot > >> be > >>>>>>>> cast to > >>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ > >>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Error: exception while executing query: Failure while > executing > >>>>>>>> query. > >>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Hao > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < > >>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Now try on hashtags with the following: > >>>>>>>>>>>>> > >>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from > >>>>>>>> `/twitter.json` t > >>>>>>>>>>>>> where t.entities.hashtags is not null limit 10; > >>>>>>>>>>>>> > >>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., > >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot > >> be > >>>>>>>> cast to > >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ > >>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Error: exception while executing query: Failure while > >> executing > >>>>>>>> query. > >>>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> { > >>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>> "hashtags": [], > >>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>> }, > >>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>> { > >>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>> "indices": [] > >>>>>>>>>>>>> } > >>>>>>>>>>>>> ], > >>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>> } > >>>>>>>>>>>>> } > >>>>>>>>>>>>> > >>>>>>>>>>>>> The issue seems to be that if some records have arrays with > >>> maps in > >>>>>>>> them > >>>>>>>>>>>>> and others are empty. > >>>>>>>>>>>>> > >>>>>>>>>>>>> —Andries > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> > >> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Seems it works for below json file: > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>> "indices": [ > >>>>>>>>>>>>>> 83, > >>>>>>>>>>>>>> 94 > >>>>>>>>>>>>>> ] > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> ], > >>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>> }, > >>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>> "indices": [] > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> ], > >>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` > >>> as t > >>>>>>>> where > >>>>>>>>>>>>>> t.entities.urls is not null; > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> | [7,8,9] | > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> 1 row selected (0.139 seconds) > >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` > >>> as t > >>>>>>>> where > >>>>>>>>>>>>>> t.entities.urls is null; > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>> No rows selected (0.158 seconds) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Hao > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < > >>> [email protected]> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous > >>>>>>>> primitive > >>>>>>>>>>>>>>> types. In your example, it appears from the error, the > array > >>>>>> field > >>>>>>>>>>>>> 'member' > >>>>>>>>>>>>>>> contained maps for at least one record. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < > >>>>>>>> [email protected]> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` > from > >>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > >>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to > >>>>>> materialize > >>>>>>>>>>>>>>> incoming schema. Errors: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Error in expression at index -1. Error: Missing function > >>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full > >>> expression: > >>>>>>>>>>>>> --UNKNOWN > >>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > >>>>>>>>>>>>> bullseye-3:31010 ] > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Error: exception while executing query: Failure while > >>> executing > >>>>>>>>>>> query. > >>>>>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Chris Matta > >>>>>>>>>>>>>>>> [email protected] > >>>>>>>>>>>>>>>> 215-701-3146 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < > >>>>>> [email protected] > >>>>>>>>> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > >>>>>>>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in > >> drill? > >>>>>>>>>>>>>>>>>> i.e some records have "url":[] and some will have an > >> array > >>>>>>>> with > >>>>>>>>>>> data > >>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array > >>> drill > >>>>>>>> fails > >>>>>>>>>>>>> due > >>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter > >>> with/* > >>>>>>>> where > >>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying > >> url > >>> is > >>>>>>>> not > >>>>>>>>>>>>>>> null. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter > data > >>> as > >>>>>> an > >>>>>>>>>>>>>>> example > >>>>>>>>>>>>>>>>>> below. Some records have an empty array with > >> “hashtags”:[] > >>>>>> and > >>>>>>>>>>>>> others > >>>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>> look similar to what is listed below. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>>>>>> "indices": [ > >>>>>>>>>>>>>>>>>> 83, > >>>>>>>>>>>>>>>>>> 94 > >>>>>>>>>>>>>>>>>> ] > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> ], > >>>>>>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>>>>>> }, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>> —Andries > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> <z.json> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>> > >>> > >>> > >> > >
