Hi Shahab,

Do you mean queries with GROUP BY and aggregation functions? Once you
register the JSON dataset as a table, you can query it like a regular
table. You can join it with other tables and do aggregations. Is that
what you were asking for? If not, can you give me a more concrete example?
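
As a rough, untested sketch of what such queries could look like: this assumes Spark 1.1 with a HiveContext and reuses the sample record from your earlier message. The object name, the table name "people", and the choice of Hive's size() and LATERAL VIEW explode() are illustrative assumptions, not the only way to do this.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver object, used only to make the sketch self-contained.
object JsonAggregationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-agg-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Assumption: HiveContext is used, since HiveQL handled the nested access in 1.1.
    val hiveContext = new HiveContext(sc)

    // The sample record from the thread, one JSON object per RDD element.
    val json =
      """{"attributes": [{"data": {"gender": "woman"}, "section": "Economy",
        | "collectApp": "web", "id": 1409064792512}]}""".stripMargin
    val people = hiveContext.jsonRDD(sc.parallelize(json :: Nil))
    people.registerTempTable("people")

    // Number of elements in the "attributes" array for each row (Hive's size() function).
    hiveContext.sql("SELECT size(attributes) FROM people")
      .collect().foreach(println)

    // Flatten the array into one row per element, then group and count by a nested field.
    hiveContext.sql(
      """SELECT attr.section, COUNT(*)
        |FROM people LATERAL VIEW explode(attributes) a AS attr
        |GROUP BY attr.section""".stripMargin)
      .collect().foreach(println)

    sc.stop()
  }
}
```

The first query counts array elements per record; the second turns each array element into its own row so that ordinary GROUP BY aggregations apply to the nested fields.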



On Mon, Oct 13, 2014 at 10:50 AM, shahab <shahab.mok...@gmail.com> wrote:

> Thanks Yin.  I tried HiveQL and it solved that problem. But now I have a
> second query requirement:
> Since you are the main developer behind the JSON-Spark integration (I saw your
> presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is it
> possible to perform aggregation queries,
> for example counting the number of attributes (given that attributes is
> represented as an array in the schema), or any other type of aggregation?
> best,
> /Shahab
> On Mon, Oct 13, 2014 at 4:01 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>> Hi Shahab,
>> Can you try to use HiveContext? It should work in 1.1. For SQLContext,
>> this issue was not fixed in 1.1 and you need to use the master branch at the
>> moment.
>> Thanks,
>> Yin
>> On Sun, Oct 12, 2014 at 5:20 PM, shahab <shahab.mok...@gmail.com> wrote:
>>> Hi,
>>>  Apparently it is possible to query nested JSON using Spark SQL, but,
>>> mainly due to the lack of proper documentation/examples, I did not manage to
>>> make it work. I would appreciate it if you could point me to any example or
>>> help with this issue.
>>> Here is my code:
>>>   val anotherPeopleRDD = sc.parallelize(
>>>        """{
>>>     "attributes": [
>>>         {
>>>             "data": {
>>>                 "gender": "woman"
>>>             },
>>>             "section": "Economy",
>>>             "collectApp": "web",
>>>             "id": 1409064792512
>>>         }
>>>     ]
>>> }""" :: Nil)
>>>   val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
>>>   anotherPeople.registerTempTable("people")
>>>    val query_people = sqlContext.sql("select attributes[0].collectApp from people")
>>>    query_people.foreach(println)
>>> But instead of getting "web" printed, I am getting the following:
>>> [[web,[woman],1409064792512, Economy]]
>>> thanks,
>>> /shahab
