Thanks Yin. I tried HiveQL and it solved that problem. But now I have a
second query requirement:
Since you are the main developer behind the JSON-Spark integration (I saw
your presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is
it possible to perform aggregation-type queries,
for example counting the number of attributes (given that "attributes" in
the schema is an array), or any other type of aggregation?
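For example, I imagine something along these lines (just a sketch on my part; I am guessing that Hive's `size()` and `LATERAL VIEW explode` would be available through HiveContext, but I have not verified it):

```sql
-- count the elements of the attributes array per record
SELECT size(attributes) FROM people;

-- or explode the array and aggregate over the individual elements
SELECT COUNT(*) FROM people LATERAL VIEW explode(attributes) t AS attr;
```

Is something like this supported, or is there another recommended way to aggregate over nested arrays?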

best,
/Shahab

On Mon, Oct 13, 2014 at 4:01 PM, Yin Huai <huaiyin....@gmail.com> wrote:

> Hi Shahab,
>
> Can you try to use HiveContext? It should work in 1.1. For SQLContext,
> this issue was not fixed in 1.1 and you need to use the master branch at
> the moment.
>
> Thanks,
>
> Yin
>
> On Sun, Oct 12, 2014 at 5:20 PM, shahab <shahab.mok...@gmail.com> wrote:
>
>> Hi,
>>
>> Apparently it is possible to query nested JSON using Spark SQL, but,
>> mainly due to the lack of proper documentation/examples, I did not manage
>> to make it work. I would appreciate it if you could point me to any
>> example or help with this issue.
>>
>> Here is my code:
>>
>>   val anotherPeopleRDD = sc.parallelize(
>>     """{
>>       "attributes": [
>>         {
>>           "data": { "gender": "woman" },
>>           "section": "Economy",
>>           "collectApp": "web",
>>           "id": 1409064792512
>>         }
>>       ]
>>     }""" :: Nil)
>>
>>   val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
>>   anotherPeople.registerTempTable("people")
>>   val query_people = sqlContext.sql("select attributes[0].collectApp from people")
>>   query_people.foreach(println)
>>
>> But instead of getting "web" printed out, I am getting the following:
>>
>> [[web,[woman],1409064792512, Economy]]
>>
>>
>>
>> thanks,
>>
>> /shahab
>>
>>
>>
>
