I just noticed that the result of "hashtags" is just an array with only 1
element.
So take your example:
[root@maprdemo tmp]# cat d.json
{
"entities": {
   "trends": [],
   "symbols": [],
   "urls": [],
   "hashtags": [],
   "user_mentions": []
 },
"entities": {
   "trends": [1,2,3],
   "symbols": [4,5,6],
   "urls": [7,8,9],
   "hashtags": [
     {
       "text": "GoPatriots",
       "indices": []
     }
   ],
   "user_mentions": []
 }
}

Now we can do this to achieve the results:
0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
+------------+
|   EXPR$0   |
+------------+
| [{"text":"GoPatriots"}] |
+------------+
1 row selected (0.09 seconds)
0: jdbc:drill:> select t.entities.hashtags[0].text from dfs.tmp.`d.json` t ;
+------------+
|   EXPR$0   |
+------------+
| GoPatriots |
+------------+
1 row selected (0.108 seconds)

Thanks,
Hao



On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
[email protected]> wrote:

> When I run the query on a larger dataset it actually show the empty
> records.
>
> select t.entities.hashtags from `twitter.json` t limit 10;
>
> +------------+
> |   EXPR$0   |
> +------------+
> | []         |
> | [{"text":"SportsNews","indices":[0,11]}] |
> | []         |
> | [{"text":"SportsNews","indices":[0,11]}] |
> | []         |
> | []         |
> | []         |
> | []         |
> | []         |
> | [{"text":"CARvsSEA","indices":[36,45]}] |
> +------------+
> 10 rows selected (2.899 seconds)
>
>
>
> However having the output as maps is not very useful, unless i can filter
> out the records with empty arrays and then drill deeper into the ones with
> data in the arrays.
>
> BTW: Hao I would have expiated your query to return both rows, one with an
> empty array as above and the other with the array data.
>
>
> —Andries
>
>
> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
>
> > I am not sure if below is expected behavior.
> > If we only select "hashtags", and it will return only 1 row ignoring the
> > null value.
> > However then if we try to get "hashtags.text", it fails...which means it
> is
> > still trying to read the NULL value.
> > I am thinking it may confuse the SQL developers.
> >
> >
> > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | [{"text":"GoPatriots"}] |
> > +------------+
> > 1 row selected (0.109 seconds)
> >
> >
> > 0: jdbc:drill:> select t.entities.hashtags.text from dfs.tmp.`d.json` t ;
> >
> > Query failed: Query failed: Failure while running fragment.,
> > org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
> > org.apache.drill.exec.vector.complex.MapVector [
> > 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >
> >
> > Error: exception while executing query: Failure while executing query.
> > (state=,code=0)
> >
> > Thanks,
> > Hao
> >
> > On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> > [email protected]> wrote:
> >
> >> Now try on hashtags with the following:
> >>
> >> drilldemo:5181> select t.entities.hashtags.`text` from `/twitter.json` t
> >> where t.entities.hashtags is not null limit 10;
> >>
> >> Query failed: Query failed: Failure while running fragment.,
> >> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
> >> org.apache.drill.exec.vector.complex.MapVector [
> >> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>
> >>
> >> Error: exception while executing query: Failure while executing query.
> >> (state=,code=0)
> >>
> >>
> >> {
> >> "entities": {
> >>   "trends": [],
> >>   "symbols": [],
> >>   "urls": [],
> >>   "hashtags": [],
> >>   "user_mentions": []
> >> },
> >> "entities": {
> >>   "trends": [1,2,3],
> >>   "symbols": [4,5,6],
> >>   "urls": [7,8,9],
> >>   "hashtags": [
> >>     {
> >>       "text": "GoPatriots",
> >>       "indices": []
> >>     }
> >>   ],
> >>   "user_mentions": []
> >> }
> >> }
> >>
> >> The issue seems to be that if some records have arrays with maps in them
> >> and others are empty.
> >>
> >> —Andries
> >>
> >>
> >> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]> wrote:
> >>
> >>> Seems it works for below json file:
> >>> {
> >>> "entities": {
> >>>   "trends": [],
> >>>   "symbols": [],
> >>>   "urls": [],
> >>>   "hashtags": [
> >>>     {
> >>>       "text": "GoPatriots",
> >>>       "indices": [
> >>>         83,
> >>>         94
> >>>       ]
> >>>     }
> >>>   ],
> >>>   "user_mentions": []
> >>> },
> >>> "entities": {
> >>>   "trends": [1,2,3],
> >>>   "symbols": [4,5,6],
> >>>   "urls": [7,8,9],
> >>>   "hashtags": [
> >>>     {
> >>>       "text": "GoPatriots",
> >>>       "indices": []
> >>>     }
> >>>   ],
> >>>   "user_mentions": []
> >>> }
> >>> }
> >>>
> >>>
> >>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
> >>> t.entities.urls is not null;
> >>> +------------+
> >>> |   EXPR$0   |
> >>> +------------+
> >>> | [7,8,9]    |
> >>> +------------+
> >>> 1 row selected (0.139 seconds)
> >>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t where
> >>> t.entities.urls is null;
> >>> +------------+
> >>> |   EXPR$0   |
> >>> +------------+
> >>> +------------+
> >>> No rows selected (0.158 seconds)
> >>>
> >>> Thanks,
> >>> Hao
> >>>
> >>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <[email protected]>
> wrote:
> >>>
> >>>> I believe that this works if the array contains homogeneous primitive
> >>>> types. In your example, it appears from the error, the array field
> >> 'member'
> >>>> contained maps for at least one record.
> >>>>
> >>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Trying that locally did not work for me (drill 0.7.0):
> >>>>>
> >>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> >>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> >>>>> Query failed: Query stopped., Failure while trying to materialize
> >>>> incoming schema.  Errors:
> >>>>>
> >>>>> Error in expression at index -1.  Error: Missing function
> >>>> implementation: [repeated_count(MAP-REPEATED)].  Full expression:
> >> --UNKNOWN
> >>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> >> bullseye-3:31010 ]
> >>>>>
> >>>>> Error: exception while executing query: Failure while executing
> query.
> >>>> (state=,code=0)
> >>>>>
> >>>>> ​
> >>>>>
> >>>>> Chris Matta
> >>>>> [email protected]
> >>>>> 215-701-3146
> >>>>>
> >>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <[email protected]>
> >> wrote:
> >>>>>
> >>>>>> repeated_count('entities.urls') > 0
> >>>>>>
> >>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> How do you filter out records with an empty array in drill?
> >>>>>>> i.e some records have "url":[]  and some will have an array with
> data
> >>>> in
> >>>>>>> it. When trying to read records with data in the array drill fails
> >> due
> >>>>>> to
> >>>>>>> records missing any data in the array. Trying a filter with/* where
> >>>>>>> "url":[0] is not null */ fails, also fails if applying url is not
> >>>> null.
> >>>>>>>
> >>>>>>> Note some of the arrays contains maps, using twitter data as an
> >>>> example
> >>>>>>> below. Some records have an empty array with “hashtags”:[]  and
> >> others
> >>>>>> will
> >>>>>>> look similar to what is listed below.
> >>>>>>>
> >>>>>>> "entities": {
> >>>>>>>   "trends": [],
> >>>>>>>   "symbols": [],
> >>>>>>>   "urls": [],
> >>>>>>>   "hashtags": [
> >>>>>>>     {
> >>>>>>>       "text": "GoPatriots",
> >>>>>>>       "indices": [
> >>>>>>>         83,
> >>>>>>>         94
> >>>>>>>       ]
> >>>>>>>     }
> >>>>>>>   ],
> >>>>>>>   "user_mentions": []
> >>>>>>> },
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> —Andries
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Reply via email to