When I run the query on a larger dataset it actually show the empty records.

select t.entities.hashtags from `twitter.json` t limit 10;

+------------+
|   EXPR$0   |
+------------+
| []         |
| [{"text":"SportsNews","indices":[0,11]}] |
| []         |
| [{"text":"SportsNews","indices":[0,11]}] |
| []         |
| []         |
| []         |
| []         |
| []         |
| [{"text":"CARvsSEA","indices":[36,45]}] |
+------------+
10 rows selected (2.899 seconds)



However having the output as maps is not very useful, unless i can filter out 
the records with empty arrays and then drill deeper into the ones with data in 
the arrays.

BTW: Hao I would have expiated your query to return both rows, one with an 
empty array as above and the other with the array data.


—Andries


On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:

> I am not sure if below is expected behavior.
> If we only select "hashtags", and it will return only 1 row ignoring the
> null value.
> However then if we try to get "hashtags.text", it fails...which means it is
> still trying to read the NULL value.
> I am thinking it may confuse the SQL developers.
> 
> 
> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> +------------+
> |   EXPR$0   |
> +------------+
> | [{"text":"GoPatriots"}] |
> +------------+
> 1 row selected (0.109 seconds)
> 
> 
> 0: jdbc:drill:> select t.entities.hashtags.text from dfs.tmp.`d.json` t ;
> 
> Query failed: Query failed: Failure while running fragment.,
> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
> org.apache.drill.exec.vector.complex.MapVector [
> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> 
> 
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 
> Thanks,
> Hao
> 
> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> Now try on hashtags with the following:
>> 
>> drilldemo:5181> select t.entities.hashtags.`text` from `/twitter.json` t
>> where t.entities.hashtags is not null limit 10;
>> 
>> Query failed: Query failed: Failure while running fragment.,
>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
>> org.apache.drill.exec.vector.complex.MapVector [
>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>> 
>> 
>> Error: exception while executing query: Failure while executing query.
>> (state=,code=0)
>> 
>> 
>> {
>> "entities": {
>>   "trends": [],
>>   "symbols": [],
>>   "urls": [],
>>   "hashtags": [],
>>   "user_mentions": []
>> },
>> "entities": {
>>   "trends": [1,2,3],
>>   "symbols": [4,5,6],
>>   "urls": [7,8,9],
>>   "hashtags": [
>>     {
>>       "text": "GoPatriots",
>>       "indices": []
>>     }
>>   ],
>>   "user_mentions": []
>> }
>> }
>> 
>> The issue seems to be that if some records have arrays with maps in them
>> and others are empty.
>> 
>> —Andries
>> 
>> 
>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote:
>> 
>>> Seems it works for below json file:
>>> {
>>> "entities": {
>>>   "trends": [],
>>>   "symbols": [],
>>>   "urls": [],
>>>   "hashtags": [
>>>     {
>>>       "text": "GoPatriots",
>>>       "indices": [
>>>         83,
>>>         94
>>>       ]
>>>     }
>>>   ],
>>>   "user_mentions": []
>>> },
>>> "entities": {
>>>   "trends": [1,2,3],
>>>   "symbols": [4,5,6],
>>>   "urls": [7,8,9],
>>>   "hashtags": [
>>>     {
>>>       "text": "GoPatriots",
>>>       "indices": []
>>>     }
>>>   ],
>>>   "user_mentions": []
>>> }
>>> }
>>> 
>>> 
>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
>>> t.entities.urls is not null;
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | [7,8,9]    |
>>> +------------+
>>> 1 row selected (0.139 seconds)
>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
>>> t.entities.urls is null;
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> +------------+
>>> No rows selected (0.158 seconds)
>>> 
>>> Thanks,
>>> Hao
>>> 
>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <[email protected]> wrote:
>>> 
>>>> I believe that this works if the array contains homogeneous primitive
>>>> types. In your example, it appears from the error, the array field
>> 'member'
>>>> contained maps for at least one record.
>>>> 
>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <[email protected]>
>>>> wrote:
>>>> 
>>>>> Trying that locally did not work for me (drill 0.7.0):
>>>>> 
>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
>>>>> Query failed: Query stopped., Failure while trying to materialize
>>>> incoming schema.  Errors:
>>>>> 
>>>>> Error in expression at index -1.  Error: Missing function
>>>> implementation: [repeated_count(MAP-REPEATED)].  Full expression:
>> --UNKNOWN
>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
>> bullseye-3:31010 ]
>>>>> 
>>>>> Error: exception while executing query: Failure while executing query.
>>>> (state=,code=0)
>>>>> 
>>>>> ​
>>>>> 
>>>>> Chris Matta
>>>>> [email protected]
>>>>> 215-701-3146
>>>>> 
>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <[email protected]>
>> wrote:
>>>>> 
>>>>>> repeated_count('entities.urls') > 0
>>>>>> 
>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> How do you filter out records with an empty array in drill?
>>>>>>> i.e some records have "url":[]  and some will have an array with data
>>>> in
>>>>>>> it. When trying to read records with data in the array drill fails
>> due
>>>>>> to
>>>>>>> records missing any data in the array. Trying a filter with/* where
>>>>>>> "url":[0] is not null */ fails, also fails if applying url is not
>>>> null.
>>>>>>> 
>>>>>>> Note some of the arrays contains maps, using twitter data as an
>>>> example
>>>>>>> below. Some records have an empty array with “hashtags”:[]  and
>> others
>>>>>> will
>>>>>>> look similar to what is listed below.
>>>>>>> 
>>>>>>> "entities": {
>>>>>>>   "trends": [],
>>>>>>>   "symbols": [],
>>>>>>>   "urls": [],
>>>>>>>   "hashtags": [
>>>>>>>     {
>>>>>>>       "text": "GoPatriots",
>>>>>>>       "indices": [
>>>>>>>         83,
>>>>>>>         94
>>>>>>>       ]
>>>>>>>     }
>>>>>>>   ],
>>>>>>>   "user_mentions": []
>>>>>>> },
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> —Andries
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 

Reply via email to