Dechang provided method by using drill flatten function with mongdb plugin 
works fine in my test env by using the json document Yousuf provided.

Details as below:

I am using drill 1.8 and mongodb 3.2.9

> db.test.find()
{ "_id" : ObjectId("584f3bad7161b4e2b9fa0a7d"), "twitter_entities" : { 
"hashtags" : [ { "text" : "snow", "indices" : [ 66, 71 ] }, { "text" : 
"skiing", "indices" : [ 73, 80 ] } ], "trends" : [ ], "urls" : [ { "url" : 
"http://xxxxxxx";, "expanded_url" : "http://xxxxx";, "display_url" : "xxxxx", 
"indices" : [ 71, 93 ] } ] } }



0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> use mongo.test;
+-------+-----------------------------------------+
|  ok   |                 summary                 |
+-------+-----------------------------------------+
| true  | Default schema changed to [mongo.test]  |
+-------+-----------------------------------------+
1 row selected (0.376 seconds)
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select t1.c1.text, 
count(t1.c1.text) from (select
. . . . . . . . . . . . . . . . . . . . . . .> 
flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group
. . . . . . . . . . . . . . . . . . . . . . .> by t1.c1.text;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| skiing  | 1       |
| snow    | 1       |
+---------+---------+
2 rows selected (0.372 seconds)





The only thing difference is that I noticed in the original document Yousuf 
provided, for key/value pair "display_url": xxxxx”, “ is missing before xxxxx” 
and I was not able to insert/import into mongodb, after I added “ 
—>"display_url": “xxxxx”, everything works fine.

Regards,

Kathleen

On 12/13/16, 6:30 AM, "yousuf" <[email protected]> wrote:

>Hi Dechang,
>
>The query didn't work with mongodb.
>
>
>use mongo.test;
>
>select t1.c1.text, count(t1.c1.text) from (select 
>flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group 
>by t1.c1.text;
>+---------+---------+
>| EXPR$0  | EXPR$1  |
>+---------+---------+
>+---------+---------+
>
>Kind Regards
>
>Yousuf
>
>
>
>
>
>
>-------- Forwarded Message --------
>Subject:       Re: nested array flatten and then group by - Apache drill
>Date:  Mon, 12 Dec 2016 10:32:53 -0800
>From:  Dechang Gu <[email protected]>
>Reply-To:      [email protected]
>To:    [email protected]
>
>
>
>AFAIK,  for MongoDB plugin, we have very limited tests.
>Anyway, I  tried the following query on your sample json file:
>
>0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
>from (select flatten(tbl.twitter_entities.hashtags) as c1 from
>dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;
>
>+---------+---------+
>
>| EXPR$0  | EXPR$1  |
>
>+---------+---------+
>
>| skiing  | 1       |
>
>| snow    | 1       |
>
>+---------+---------+
>
>2 rows selected (0.542 seconds)
>
>
>is this the group/count you want to retrieve?
>
>
>HTH,
>Dechang
>
>On Sun, Dec 11, 2016 at 12:10 AM, yousuf <[email protected]> wrote:
>
>> Hi,
>>
>> How can I retrieve hashtags[n].text values as flatten from the following
>> json document (storage mongo). My final goal is to group text with count.
>>
>> Apache drill version : 1.8.0
>> MongoDB version: 3.2
>>
>> I've tried several examples but my final result is either empty array eg
>> [] or blank.
>>
>> {
>>     "twitter_entities": {
>>         "hashtags": [{
>>             "text": "snow",
>>             "indices": [
>>                 66,
>>                 71
>>             ]
>>         }, {
>>             "text": "skiing",
>>             "indices": [
>>                 73,
>>                 80
>>             ]
>>         }],
>>         "trends": [],
>>         "urls": [{
>>             "url": "http://xxxxxxx";,
>>             "expanded_url": "http://xxxxx";,
>>             "display_url": xxxxx",
>>             "indices": [
>>                 71,
>>                 93
>>             ]
>>         }]
>>     }
>> }
>>
>> Any help is appreciated
>> Regards
>> Yousuf
>>
>>
>>
>

Reply via email to