Hi Shahab,

Do you mean queries with GROUP BY and aggregate functions? Once you
register the JSON dataset as a table, you can query it like a
regular table: you can join it with other tables and run aggregations. Is that
what you were asking for? If not, can you give me a more concrete example?
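
As a rough illustration of the kind of query described above, here is a
minimal sketch. It assumes Spark 1.1 with a HiveContext, so that the Hive
built-ins size() and explode() are available. The object name, the method
names, and the aliases t, attr, n and cnt are made up for this example. It
counts the attributes per record and then groups the exploded attributes by
section.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper object; assumes the Spark 1.1 API (jsonRDD, registerTempTable).
object JsonAggregationSketch {

  // Registers the JSON records as the temp table "people".
  def register(hiveContext: HiveContext, json: RDD[String]): Unit =
    hiveContext.jsonRDD(json).registerTempTable("people")

  // Number of elements in the "attributes" array of each record,
  // using the Hive built-in size().
  def attributeCounts(hiveContext: HiveContext): Array[Int] =
    hiveContext.sql("SELECT size(attributes) AS n FROM people")
      .collect()
      .map(_.getInt(0))

  // Flattens the array with LATERAL VIEW explode(), then aggregates per section.
  def countsBySection(hiveContext: HiveContext): Array[(String, Long)] =
    hiveContext.sql(
      """SELECT attr.section, count(*) AS cnt
        |FROM people LATERAL VIEW explode(attributes) t AS attr
        |GROUP BY attr.section""".stripMargin)
      .collect()
      .map(row => (row.getString(0), row.getLong(1)))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-agg-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // One JSON object per RDD element, same shape as the record in this thread.
    val json = sc.parallelize(
      """{"attributes": [{"data": {"gender": "woman"}, "section": "Economy",
        | "collectApp": "web", "id": 1409064792512}]}""".stripMargin :: Nil)

    register(hiveContext, json)
    attributeCounts(hiveContext).foreach(println)
    countsBySection(hiveContext).foreach(println)

    sc.stop()
  }
}
```

The explode() form is the more general of the two: once each array element is
its own row, any ordinary GROUP BY or join can be applied to it.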

Thanks,

Yin

On Mon, Oct 13, 2014 at 10:50 AM, shahab <shahab.mok...@gmail.com> wrote:

> Thanks Yin. I tried HiveQL and it solved that problem. But now I have a
> second query requirement.
> Since you are the main developer behind the JSON-Spark integration (I saw your
> presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is it
> possible to perform aggregation queries,
> for example counting the number of attributes (given that attributes are
> represented as an "array" in the schema), or any other type of aggregation?
>
> best,
> /Shahab
>
> On Mon, Oct 13, 2014 at 4:01 PM, Yin Huai <huaiyin....@gmail.com> wrote:
>
>> Hi Shahab,
>>
>> Can you try using HiveContext? It should work in 1.1. For SQLContext,
>> this issue was not fixed in 1.1, and you need to use the master branch at the
>> moment.
>>
>> Thanks,
>>
>> Yin
>>
>> On Sun, Oct 12, 2014 at 5:20 PM, shahab <shahab.mok...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Apparently it is possible to query nested JSON using Spark SQL, but,
>>> mainly due to a lack of proper documentation and examples, I did not manage to
>>> get it working. I would appreciate it if you could point me to an example or
>>> help with this issue.
>>>
>>> Here is my code:
>>>
>>>   val anotherPeopleRDD = sc.parallelize(
>>>     """{
>>>     "attributes": [
>>>         {
>>>             "data": {
>>>                 "gender": "woman"
>>>             },
>>>             "section": "Economy",
>>>             "collectApp": "web",
>>>             "id": 1409064792512
>>>         }
>>>     ]
>>> }""" :: Nil)
>>>
>>>   val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
>>>   anotherPeople.registerTempTable("people")
>>>
>>>   val query_people = sqlContext.sql("select attributes[0].collectApp from people")
>>>   query_people.foreach(println)
>>>
>>> But instead of getting "web" printed, I get the following:
>>>
>>> [[web,[woman],1409064792512, Economy]]
>>>
>>>
>>>
>>> thanks,
>>>
>>> /shahab
>>>
>>>
>>>
>>
>
