Chris
Thanks, but I tried it before without success.
I still get this on Drill 0.7.
select t.entities.hashtags from dfs.twitter.`/nfl` t where
t.entities.hashtags[0] is not null limit 10;
Error: exception while executing query: Failure while executing query.
(state=,code=0)
Query failed: Query failed: Failure while running fragment., Failure while
trying to materialize incoming schema. Errors:
Error in expression at index 0. Error: Missing function implementation:
[isnotnull(MAP-REQUIRED)]. Full expression: null.. [
0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
[ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
Mine seems finicky about even using the first element in the array
select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10;
Query failed: Query failed: Failure while running fragment., index: 4, length:
4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on
drilldemo:31010 ]
[ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ]
However flatten seems to work. (as long as you don’t have thousands of small
json files)
select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10;
+------------+
| EXPR$0 |
+------------+
| {"text":"PantherPride","indices":["12","25"]} |
| {"text":"KeepPounding","indices":["48","61"]} |
| {"text":"vegas","indices":["107","113"]} |
| {"text":"nfl","indices":["23","27"]} |
| {"text":"Cowboys","indices":["7","15"]} |
| {"text":"Cowboys","indices":["7","15"]} |
| {"text":"Ebay","indices":["52","57"]} |
| {"text":"NFL","indices":["58","62"]} |
| {"text":"Christmas","indices":["63","73"]} |
| {"text":"Gift","indices":["74","79"]} |
+------------+
—Andries
On Feb 5, 2015, at 6:41 PM, Christopher Matta <[email protected]> wrote:
> Andreas, I think I may have solved your issue:
>
>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE
>> t.entities.hashtags[0].text is not null limit 10;
> +------------+
> | EXPR$0 |
> +------------+
> |
> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}]
> |
> |
> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}]
> |
> |
> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}]
> |
> | [{"text":"NotOneDime","indices":[14,25]}] |
> | [{"text":"BlackFriday","indices":[43,55]}] |
> | [{"text":"JRStudio","indices":[93,102]}] |
> |
> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}]
> |
> | [{"text":"BlackFriday","indices":[0,12]}] |
> |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> |
> |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> |
> +------------+
> 10 rows selected (1.697 seconds)
>
> I noticed when only extracting the first text item in the hashtag array (
> t.entities.hashtags[0].text) it was returning null for those tweets that
> didn’t have hashtags, so filtering out that seems to work.
>
>
> Chris Matta
> [email protected]
> 215-701-3146
>
> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <[email protected]>
> wrote:
>
>> As Aditya commented before this will work if the lists only contain scalars
>>
>> repeated_count('entities.urls') > 0
>>
>> If the lists contain maps unfortunately this is not available today. There
>> is an enhancement request open for this feature. I have marked it for a fix
>> in 0.9 as it is more of a feature request than a bug and we are working on
>> closing a large number of bugs for 0.8 before we get to issues like this.
>>
>> https://issues.apache.org/jira/browse/DRILL-1650
>>
>> -Jason
>>
>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht <
>> [email protected]> wrote:
>>
>>> Unfortunately it seems that with larger data sets the use of flatten
>> seems
>>> to produce an error.
>>>
>>> Any other options to filter out JSON records (objects) where an array is
>>> empty?
>>>
>>> Thanks
>>> —Andries
>>>
>>>
>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht <
>>> [email protected]> wrote:
>>>
>>>> It was interesting to see flatten bypass the records with an empty
>> array.
>>>>
>>>> Unfortunately some arrays are much more complex than the example here,
>>> and it is still useful to have the ability to filter out ones that are
>>> empty.
>>>>
>>>> —Andries
>>>>
>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <[email protected]> wrote:
>>>>
>>>>> I figured out the differences after getting the json file from
>> Andries.
>>>>>
>>>>> My json file is like:
>>>>> {“entities”:{xxx},“entities”:{yyy}... }
>>>>>
>>>>> Andries' json file is like:
>>>>> {“entities”:{xxx}}
>>>>> {“entities”:{yyy}}
>>>>> ...
>>>>>
>>>>> So basically Andries' json file is contains multiple json files.
>>>>>
>>>>> Fortunately, we can use the same SQL to get the same results:
>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
>>>>> +------------+
>>>>> | EXPR$0 |
>>>>> +------------+
>>>>> | [] |
>>>>> | [{"text":"GoPatriots"}] |
>>>>> | [{"text":"aaa"}] |
>>>>> | [] |
>>>>> | [{"text":"bbb"}] |
>>>>> +------------+
>>>>> 5 rows selected (0.136 seconds)
>>>>> 0: jdbc:drill:> with tmp as
>>>>> . . . . . . . > (
>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
>>>>> dfs.tmp.`z2.json` t
>>>>> . . . . . . . > )
>>>>> . . . . . . . > select tmp.c.text from tmp;
>>>>> +------------+
>>>>> | EXPR$0 |
>>>>> +------------+
>>>>> | GoPatriots |
>>>>> | aaa |
>>>>> | bbb |
>>>>> +------------+
>>>>> 3 rows selected (0.115 seconds)
>>>>>
>>>>> Thanks,
>>>>> Hao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> In my case it returns the empty records when flatten is not used.
>>>>>>
>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as
>> hashtags
>>>>>> from `twitter.json` t limit 10;
>>>>>> +------------+
>>>>>> | hashtags |
>>>>>> +------------+
>>>>>> | [] |
>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>> | [] |
>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>> | [] |
>>>>>> | [] |
>>>>>> | [] |
>>>>>> | [] |
>>>>>> | [] |
>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
>>>>>> +------------+
>>>>>>
>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote:
>>>>>>
>>>>>>> Actually not due to flatten, if you directly query the file, it will
>>> only
>>>>>>> show the non-null values.
>>>>>>>
>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
>>>>>>> +------------+
>>>>>>> | EXPR$0 |
>>>>>>> +------------+
>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
>>>>>>> +------------+
>>>>>>> 1 row selected (0.123 seconds)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hao
>>>>>>>
>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure
>> if
>>>>>> that
>>>>>>>> is an ideal result for all use cases, but certainly usable in this
>>> case.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> —Andries
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I can also fetch non-null values for attached json file which
>>> contains
>>>>>> 6
>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value.
>>>>>>>>>
>>>>>>>>> Could you share your complete "twitter.json"?
>>>>>>>>>
>>>>>>>>> 0: jdbc:drill:> with tmp as
>>>>>>>>> . . . . . . . > (
>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
>>>>>>>> dfs.tmp.`z.json` t
>>>>>>>>> . . . . . . . > )
>>>>>>>>> . . . . . . . > select tmp.c.text from tmp;
>>>>>>>>> +------------+
>>>>>>>>> | EXPR$0 |
>>>>>>>>> +------------+
>>>>>>>>> | GoPatriots |
>>>>>>>>> | aaa |
>>>>>>>>> | bbb |
>>>>>>>>> +------------+
>>>>>>>>> 3 rows selected (0.122 seconds)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Hao
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
>>>>>>>> [email protected]> wrote:
>>>>>>>>> The sample data I posted only has 1 element, but some records have
>>>>>>>> multiple elements in them.
>>>>>>>>>
>>>>>>>>> Interestingly enough though
>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
>> 10;
>>>>>>>>>
>>>>>>>>> Produces
>>>>>>>>> +------------+
>>>>>>>>> | EXPR$0 |
>>>>>>>>> +------------+
>>>>>>>>> | null |
>>>>>>>>> | SportsNews |
>>>>>>>>> | null |
>>>>>>>>> | SportsNews |
>>>>>>>>> | null |
>>>>>>>>> | null |
>>>>>>>>> | null |
>>>>>>>>> | null |
>>>>>>>>> | null |
>>>>>>>>> | CARvsSEA |
>>>>>>>>> +——————+
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And
>>>>>>>>>
>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
>>>>>>>>>
>>>>>>>>> +------------+
>>>>>>>>> | EXPR$0 |
>>>>>>>>> +------------+
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"indices":[]} |
>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} |
>>>>>>>>> +——————+
>>>>>>>>>
>>>>>>>>> Strange part is that there is no indices map in the hashtags
>> array,
>>> so
>>>>>>>> no idea why it shows up when pointing to the first lament in an
>> empty
>>>>>> array.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> —Andries
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I just noticed that the result of "hashtags" is just an array
>> with
>>>>>>>> only 1
>>>>>>>>>> element.
>>>>>>>>>> So take your example:
>>>>>>>>>> [root@maprdemo tmp]# cat d.json
>>>>>>>>>> {
>>>>>>>>>> "entities": {
>>>>>>>>>> "trends": [],
>>>>>>>>>> "symbols": [],
>>>>>>>>>> "urls": [],
>>>>>>>>>> "hashtags": [],
>>>>>>>>>> "user_mentions": []
>>>>>>>>>> },
>>>>>>>>>> "entities": {
>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>> "hashtags": [
>>>>>>>>>> {
>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>> "indices": []
>>>>>>>>>> }
>>>>>>>>>> ],
>>>>>>>>>> "user_mentions": []
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Now we can do this to achieve the results:
>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
>> t
>>> ;
>>>>>>>>>> +------------+
>>>>>>>>>> | EXPR$0 |
>>>>>>>>>> +------------+
>>>>>>>>>> | [{"text":"GoPatriots"}] |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (0.09 seconds)
>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
>>>>>>>> dfs.tmp.`d.json` t ;
>>>>>>>>>> +------------+
>>>>>>>>>> | EXPR$0 |
>>>>>>>>>> +------------+
>>>>>>>>>> | GoPatriots |
>>>>>>>>>> +------------+
>>>>>>>>>> 1 row selected (0.108 seconds)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Hao
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> When I run the query on a larger dataset it actually show the
>>> empty
>>>>>>>>>>> records.
>>>>>>>>>>>
>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10;
>>>>>>>>>>>
>>>>>>>>>>> +------------+
>>>>>>>>>>> | EXPR$0 |
>>>>>>>>>>> +------------+
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [] |
>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
>>>>>>>>>>> +------------+
>>>>>>>>>>> 10 rows selected (2.899 seconds)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However having the output as maps is not very useful, unless i
>> can
>>>>>>>> filter
>>>>>>>>>>> out the records with empty arrays and then drill deeper into the
>>> ones
>>>>>>>> with
>>>>>>>>>>> data in the arrays.
>>>>>>>>>>>
>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows,
>> one
>>>>>>>> with an
>>>>>>>>>>> empty array as above and the other with the array data.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> —Andries
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am not sure if below is expected behavior.
>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row
>>> ignoring
>>>>>>>> the
>>>>>>>>>>>> null value.
>>>>>>>>>>>> However then if we try to get "hashtags.text", it fails...which
>>>>>>>> means it
>>>>>>>>>>> is
>>>>>>>>>>>> still trying to read the NULL value.
>>>>>>>>>>>> I am thinking it may confuse the SQL developers.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
>> dfs.tmp.`d.json`
>>> t ;
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> | EXPR$0 |
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> | [{"text":"GoPatriots"}] |
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> 1 row selected (0.109 seconds)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
>>>>>>>> dfs.tmp.`d.json` t ;
>>>>>>>>>>>>
>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
>> be
>>>>>>>> cast to
>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Error: exception while executing query: Failure while executing
>>>>>>>> query.
>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Hao
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Now try on hashtags with the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
>>>>>>>> `/twitter.json` t
>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
>> be
>>>>>>>> cast to
>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error: exception while executing query: Failure while
>> executing
>>>>>>>> query.
>>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> {
>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>> "hashtags": [],
>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>> },
>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>> {
>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>> "indices": []
>>>>>>>>>>>>> }
>>>>>>>>>>>>> ],
>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>> }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue seems to be that if some records have arrays with
>>> maps in
>>>>>>>> them
>>>>>>>>>>>>> and others are empty.
>>>>>>>>>>>>>
>>>>>>>>>>>>> —Andries
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]>
>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Seems it works for below json file:
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>> "indices": [
>>>>>>>>>>>>>> 83,
>>>>>>>>>>>>>> 94
>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>> },
>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>> "indices": []
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
>>> as t
>>>>>>>> where
>>>>>>>>>>>>>> t.entities.urls is not null;
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> | EXPR$0 |
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> | [7,8,9] |
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> 1 row selected (0.139 seconds)
>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
>>> as t
>>>>>>>> where
>>>>>>>>>>>>>> t.entities.urls is null;
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> | EXPR$0 |
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>> No rows selected (0.158 seconds)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Hao
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous
>>>>>>>> primitive
>>>>>>>>>>>>>>> types. In your example, it appears from the error, the array
>>>>>> field
>>>>>>>>>>>>> 'member'
>>>>>>>>>>>>>>> contained maps for at least one record.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to
>>>>>> materialize
>>>>>>>>>>>>>>> incoming schema. Errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Error in expression at index -1. Error: Missing function
>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full
>>> expression:
>>>>>>>>>>>>> --UNKNOWN
>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
>>>>>>>>>>>>> bullseye-3:31010 ]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while
>>> executing
>>>>>>>>>>> query.
>>>>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chris Matta
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> 215-701-3146
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in
>> drill?
>>>>>>>>>>>>>>>>>> i.e some records have "url":[] and some will have an
>> array
>>>>>>>> with
>>>>>>>>>>> data
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array
>>> drill
>>>>>>>> fails
>>>>>>>>>>>>> due
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter
>>> with/*
>>>>>>>> where
>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying
>> url
>>> is
>>>>>>>> not
>>>>>>>>>>>>>>> null.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter data
>>> as
>>>>>> an
>>>>>>>>>>>>>>> example
>>>>>>>>>>>>>>>>>> below. Some records have an empty array with
>> “hashtags”:[]
>>>>>> and
>>>>>>>>>>>>> others
>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>> look similar to what is listed below.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>>>>>> "indices": [
>>>>>>>>>>>>>>>>>> 83,
>>>>>>>>>>>>>>>>>> 94
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>>>>>> },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> —Andries
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <z.json>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>