The sample data I posted only has 1 element, but some records have multiple 
elements in them.

Interestingly enough though
select t.entities.hashtags[0].`text` from `twitter.json` t limit 10;

Produces
+------------+
|   EXPR$0   |
+------------+
| null       |
| SportsNews |
| null       |
| SportsNews |
| null       |
| null       |
| null       |
| null       |
| null       |
| CARvsSEA   |
+——————+


And

select t.entities.hashtags[0] from `twitter.json` t limit 10;

+------------+
|   EXPR$0   |
+------------+
| {"indices":[]} |
| {"text":"SportsNews","indices":[90,99]} |
| {"indices":[]} |
| {"text":"SportsNews","indices":[90,99]} |
| {"indices":[]} |
| {"indices":[]} |
| {"indices":[]} |
| {"indices":[]} |
| {"indices":[]} |
| {"text":"CARvsSEA","indices":[90,99]} |
+——————+

Strange part is that there is no indices map in the hashtags array, so no idea 
why it shows up when pointing to the first lament in an empty array.


—Andries






On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:

> I just noticed that the result of "hashtags" is just an array with only 1
> element.
> So take your example:
> [root@maprdemo tmp]# cat d.json
> {
> "entities": {
>   "trends": [],
>   "symbols": [],
>   "urls": [],
>   "hashtags": [],
>   "user_mentions": []
> },
> "entities": {
>   "trends": [1,2,3],
>   "symbols": [4,5,6],
>   "urls": [7,8,9],
>   "hashtags": [
>     {
>       "text": "GoPatriots",
>       "indices": []
>     }
>   ],
>   "user_mentions": []
> }
> }
> 
> Now we can do this to achieve the results:
> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> +------------+
> |   EXPR$0   |
> +------------+
> | [{"text":"GoPatriots"}] |
> +------------+
> 1 row selected (0.09 seconds)
> 0: jdbc:drill:> select t.entities.hashtags[0].text from dfs.tmp.`d.json` t ;
> +------------+
> |   EXPR$0   |
> +------------+
> | GoPatriots |
> +------------+
> 1 row selected (0.108 seconds)
> 
> Thanks,
> Hao
> 
> 
> 
> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> When I run the query on a larger dataset it actually show the empty
>> records.
>> 
>> select t.entities.hashtags from `twitter.json` t limit 10;
>> 
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | []         |
>> | [{"text":"SportsNews","indices":[0,11]}] |
>> | []         |
>> | [{"text":"SportsNews","indices":[0,11]}] |
>> | []         |
>> | []         |
>> | []         |
>> | []         |
>> | []         |
>> | [{"text":"CARvsSEA","indices":[36,45]}] |
>> +------------+
>> 10 rows selected (2.899 seconds)
>> 
>> 
>> 
>> However having the output as maps is not very useful, unless i can filter
>> out the records with empty arrays and then drill deeper into the ones with
>> data in the arrays.
>> 
>> BTW: Hao I would have expiated your query to return both rows, one with an
>> empty array as above and the other with the array data.
>> 
>> 
>> —Andries
>> 
>> 
>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
>> 
>>> I am not sure if below is expected behavior.
>>> If we only select "hashtags", and it will return only 1 row ignoring the
>>> null value.
>>> However then if we try to get "hashtags.text", it fails...which means it
>> is
>>> still trying to read the NULL value.
>>> I am thinking it may confuse the SQL developers.
>>> 
>>> 
>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | [{"text":"GoPatriots"}] |
>>> +------------+
>>> 1 row selected (0.109 seconds)
>>> 
>>> 
>>> 0: jdbc:drill:> select t.entities.hashtags.text from dfs.tmp.`d.json` t ;
>>> 
>>> Query failed: Query failed: Failure while running fragment.,
>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
>>> org.apache.drill.exec.vector.complex.MapVector [
>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>> 
>>> 
>>> Error: exception while executing query: Failure while executing query.
>>> (state=,code=0)
>>> 
>>> Thanks,
>>> Hao
>>> 
>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
>>> [email protected]> wrote:
>>> 
>>>> Now try on hashtags with the following:
>>>> 
>>>> drilldemo:5181> select t.entities.hashtags.`text` from `/twitter.json` t
>>>> where t.entities.hashtags is not null limit 10;
>>>> 
>>>> Query failed: Query failed: Failure while running fragment.,
>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
>>>> org.apache.drill.exec.vector.complex.MapVector [
>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>> 
>>>> 
>>>> Error: exception while executing query: Failure while executing query.
>>>> (state=,code=0)
>>>> 
>>>> 
>>>> {
>>>> "entities": {
>>>>  "trends": [],
>>>>  "symbols": [],
>>>>  "urls": [],
>>>>  "hashtags": [],
>>>>  "user_mentions": []
>>>> },
>>>> "entities": {
>>>>  "trends": [1,2,3],
>>>>  "symbols": [4,5,6],
>>>>  "urls": [7,8,9],
>>>>  "hashtags": [
>>>>    {
>>>>      "text": "GoPatriots",
>>>>      "indices": []
>>>>    }
>>>>  ],
>>>>  "user_mentions": []
>>>> }
>>>> }
>>>> 
>>>> The issue seems to be that if some records have arrays with maps in them
>>>> and others are empty.
>>>> 
>>>> —Andries
>>>> 
>>>> 
>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote:
>>>> 
>>>>> Seems it works for below json file:
>>>>> {
>>>>> "entities": {
>>>>>  "trends": [],
>>>>>  "symbols": [],
>>>>>  "urls": [],
>>>>>  "hashtags": [
>>>>>    {
>>>>>      "text": "GoPatriots",
>>>>>      "indices": [
>>>>>        83,
>>>>>        94
>>>>>      ]
>>>>>    }
>>>>>  ],
>>>>>  "user_mentions": []
>>>>> },
>>>>> "entities": {
>>>>>  "trends": [1,2,3],
>>>>>  "symbols": [4,5,6],
>>>>>  "urls": [7,8,9],
>>>>>  "hashtags": [
>>>>>    {
>>>>>      "text": "GoPatriots",
>>>>>      "indices": []
>>>>>    }
>>>>>  ],
>>>>>  "user_mentions": []
>>>>> }
>>>>> }
>>>>> 
>>>>> 
>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
>>>>> t.entities.urls is not null;
>>>>> +------------+
>>>>> |   EXPR$0   |
>>>>> +------------+
>>>>> | [7,8,9]    |
>>>>> +------------+
>>>>> 1 row selected (0.139 seconds)
>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
>>>>> t.entities.urls is null;
>>>>> +------------+
>>>>> |   EXPR$0   |
>>>>> +------------+
>>>>> +------------+
>>>>> No rows selected (0.158 seconds)
>>>>> 
>>>>> Thanks,
>>>>> Hao
>>>>> 
>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <[email protected]>
>> wrote:
>>>>> 
>>>>>> I believe that this works if the array contains homogeneous primitive
>>>>>> types. In your example, it appears from the error, the array field
>>>> 'member'
>>>>>> contained maps for at least one record.
>>>>>> 
>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Trying that locally did not work for me (drill 0.7.0):
>>>>>>> 
>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
>>>>>>> Query failed: Query stopped., Failure while trying to materialize
>>>>>> incoming schema.  Errors:
>>>>>>> 
>>>>>>> Error in expression at index -1.  Error: Missing function
>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full expression:
>>>> --UNKNOWN
>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
>>>> bullseye-3:31010 ]
>>>>>>> 
>>>>>>> Error: exception while executing query: Failure while executing
>> query.
>>>>>> (state=,code=0)
>>>>>>> 
>>>>>>> ​
>>>>>>> 
>>>>>>> Chris Matta
>>>>>>> [email protected]
>>>>>>> 215-701-3146
>>>>>>> 
>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <[email protected]>
>>>> wrote:
>>>>>>> 
>>>>>>>> repeated_count('entities.urls') > 0
>>>>>>>> 
>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>>> How do you filter out records with an empty array in drill?
>>>>>>>>> i.e some records have "url":[]  and some will have an array with
>> data
>>>>>> in
>>>>>>>>> it. When trying to read records with data in the array drill fails
>>>> due
>>>>>>>> to
>>>>>>>>> records missing any data in the array. Trying a filter with/* where
>>>>>>>>> "url":[0] is not null */ fails, also fails if applying url is not
>>>>>> null.
>>>>>>>>> 
>>>>>>>>> Note some of the arrays contains maps, using twitter data as an
>>>>>> example
>>>>>>>>> below. Some records have an empty array with “hashtags”:[]  and
>>>> others
>>>>>>>> will
>>>>>>>>> look similar to what is listed below.
>>>>>>>>> 
>>>>>>>>> "entities": {
>>>>>>>>>  "trends": [],
>>>>>>>>>  "symbols": [],
>>>>>>>>>  "urls": [],
>>>>>>>>>  "hashtags": [
>>>>>>>>>    {
>>>>>>>>>      "text": "GoPatriots",
>>>>>>>>>      "indices": [
>>>>>>>>>        83,
>>>>>>>>>        94
>>>>>>>>>      ]
>>>>>>>>>    }
>>>>>>>>>  ],
>>>>>>>>>  "user_mentions": []
>>>>>>>>> },
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> —Andries
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

Reply via email to