Re: Using Spark to analyze complex JSON

Tobias Pfeiffer Wed, 21 May 2014 20:13:26 -0700

Hi,

As far as I understand, if you create an RDD with a relational
structure from your JSON, you should already be able to do much of
that today. For example, take lift-json's deserializer and do
something like


  val json_table: RDD[MyCaseClass] = json_data.flatMap { json =>
    implicit val formats = net.liftweb.json.DefaultFormats // needed by extractOpt
    json.extractOpt[MyCaseClass]
  }

Then I guess you can use Spark SQL on that. (Something like your
likes[2] query probably won't work, though.)
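To make the "relational structure" idea concrete without a Spark cluster, here is a minimal sketch. It models the nested record from the quoted message as plain Scala case classes: the nested object becomes a nested case class and the array becomes a List. It then expresses the three example queries as collection operations. `Location`, `Person`, and `JsonTableSketch` are hypothetical names, and a local `Seq` stands in for the `RDD[MyCaseClass]` above. The sketch shows the shape of the data, not Spark SQL itself.

```scala
// Hypothetical case classes mirroring the nested JSON record:
// the nested "location" object -> Location, the "likes" array -> List[String].
case class Location(x: Double, y: Double)
case class Person(name: String, location: Location, likes: List[String])

object JsonTableSketch {
  // Stand-in for an RDD[Person]; a plain Seq so the sketch runs without Spark.
  val jsonTable: Seq[Person] = Seq(
    Person("Nick", Location(241.6, -22.5), List("ice cream", "dogs", "Vanilla Ice"))
  )

  // SELECT DISTINCT name FROM json_table
  def distinctNames: Seq[String] = jsonTable.map(_.name).distinct

  // SELECT MAX(location.x) FROM json_table
  def maxX: Double = jsonTable.map(_.location.x).max

  // SELECT likes[2] FROM json_table WHERE name = "Nick"
  // Zero-based index, so element 2 is the third entry; lift avoids an
  // exception when a row has fewer than three likes.
  def thirdLike: Seq[String] =
    jsonTable.filter(_.name == "Nick").flatMap(_.likes.lift(2))

  def main(args: Array[String]): Unit = {
    println(distinctNames) // List(Nick)
    println(maxX)          // 241.6
    println(thirdLike)     // List(Vanilla Ice)
  }
}
```

Once the data has this shape, the array lookup is just indexing into a field. RDDs offer similar operations such as `map`, `filter`, `flatMap`, and `distinct`.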

Regards
Tobias


On Thu, May 22, 2014 at 5:32 AM, Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
> Looking forward to that update!
>
> Given a table of JSON objects like this one:
>
> {
>    "name": "Nick",
>    "location": {
>       "x": 241.6,
>       "y": -22.5
>    },
>    "likes": ["ice cream", "dogs", "Vanilla Ice"]
> }
>
> It would be SUPER COOL if we could query that table in a way as natural
> as this:
>
> SELECT DISTINCT name
> FROM json_table;
>
> SELECT MAX(location.x)
> FROM json_table;
>
> SELECT likes[2] -- Ice Ice Baby
> FROM json_table
> WHERE name = "Nick";
>
> Of course, this is just a hand-wavy suggestion of how I’d like to be able to
> query JSON (particularly that last example) using SQL. I’m interested in
> seeing what y’all come up with.
>
> A large part of what my team does is make it easy for analysts to explore
> and query JSON data using SQL. We have a fairly complex home-grown process
> to do that and are looking to replace it with something more out of the box.
> So if you’d like more input on how users might use this feature, I’d be glad
> to chime in.
>
> Nick
>
>
>
> On Wed, May 21, 2014 at 11:21 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>>
>> You can already extract fields from JSON data using Hive UDFs. We have an
>> intern working on better native support this summer. We will be sure to
>> post updates once there is a working prototype.
>>
>> Michael
>>
>>
>> On Tue, May 20, 2014 at 6:46 PM, Nick Chammas <nicholas.cham...@gmail.com>
>> wrote:
>>>
>>> The Apache Drill home page has an interesting heading: "Liberate Nested
>>> Data".
>>>
>>> Is there any current or planned functionality in Spark SQL or Shark to
>>> enable SQL-like querying of complex JSON?
>>>
>>> Nick
>>>
>>>
>>> ________________________________
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>
