Hi Shahab,

Do you mean queries with GROUP BY and aggregation functions? Once you register the JSON dataset as a table, you can query it like a regular table. You can join it with other tables and do aggregations. Is that what you were asking for? If not, can you give me a more concrete example?
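
As a rough, untested sketch of what such queries could look like against the `people` table from your code quoted below: it assumes a HiveContext on Spark 1.1, and that Hive's `size()` function and `LATERAL VIEW explode(...)` are usable through it. The object name, app name, local master setting, and the aliases `t` and `attr` are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical standalone driver; in spark-shell, `sc` already exists.
object JsonAggregationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("json-aggregation-sketch") // illustrative name
      .setMaster("local[2]")                 // assumption: local test run
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Same record as in the quoted code, written on one line.
    val json =
      """{"attributes": [{"data": {"gender": "woman"}, "section": "Economy", "collectApp": "web", "id": 1409064792512}]}"""

    // Infer the schema from the JSON and expose it as a table.
    val people = hiveContext.jsonRDD(sc.parallelize(json :: Nil))
    people.registerTempTable("people")

    // Count the entries of the "attributes" array for each record.
    hiveContext.sql("SELECT size(attributes) FROM people")
      .collect()
      .foreach(println)

    // Flatten the array into one row per entry, then aggregate per section.
    hiveContext.sql(
      """SELECT attr.section, COUNT(*)
        |FROM people LATERAL VIEW explode(attributes) t AS attr
        |GROUP BY attr.section""".stripMargin)
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```

The first query answers "how many attributes does each record have"; the second treats each array entry as its own row, which is usually what is needed before grouping on a nested field.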

Thanks,

Yin

On Mon, Oct 13, 2014 at 10:50 AM, shahab <shahab.mok...@gmail.com> wrote:

> Thanks Yin. I tried HiveQL and it solved that problem. But now I have a
> second query requirement:
> Since you are the main developer behind the JSON-Spark integration (I saw
> your presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is it
> possible to perform aggregation-style queries,
> for example counting the number of attributes (considering that attributes
> in the schema are represented as an "array"), or any other type of
> aggregation?
>
> best,
> /Shahab
>
> On Mon, Oct 13, 2014 at 4:01 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>
>> Hi Shahab,
>>
>> Can you try to use HiveContext? It should work in 1.1. For SQLContext,
>> this issue was not fixed in 1.1 and you need to use the master branch at
>> the moment.
>>
>> Thanks,
>>
>> Yin
>>
>> On Sun, Oct 12, 2014 at 5:20 PM, shahab <shahab.mok...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Apparently it is possible to query nested JSON using Spark SQL, but,
>>> mainly due to the lack of proper documentation/examples, I did not
>>> manage to get it working. I would appreciate it if you could point me
>>> to an example or help with this issue.
>>>
>>> Here is my code:
>>>
>>> val anotherPeopleRDD = sc.parallelize(
>>>   """{
>>>     "attributes": [
>>>       {
>>>         "data": {
>>>           "gender": "woman"
>>>         },
>>>         "section": "Economy",
>>>         "collectApp": "web",
>>>         "id": 1409064792512
>>>       }
>>>     ]
>>>   }""" :: Nil)
>>>
>>> val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
>>>
>>> anotherPeople.registerTempTable("people")
>>>
>>> val query_people = sqlContext.sql("select attributes[0].collectApp from people")
>>>
>>> query_people.foreach(println)
>>>
>>> But instead of getting "web" printed, I get the following:
>>>
>>> [[web,[woman],1409064792512, Economy]]
>>>
>>> thanks,
>>>
>>> /shahab