Re: JSON Usage

2016-04-17 Thread Benjamin Kim
Hyukjin, This is what I did so far. I haven't used Dataset yet, and maybe I don't need to. var df: DataFrame = null for(message <- messages) { val bodyRdd = sc.parallelize(message.getBody() :: Nil) val fileDf = sqlContext.read.json(bodyRdd) .select(
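Benjamin's snippet is cut off, so the rest of his loop isn't shown. One alternative to creating a DataFrame per message inside a loop and accumulating it in a `var` is to gather all message bodies first and read them with a single call. This is a minimal sketch that assumes Spark 2.2 or later (`SparkSession`, and `DataFrameReader.json` accepting a `Dataset[String]`). In the Spark 1.6 API used in the thread, `sqlContext.read.json(sc.parallelize(bodies))` plays the same role. The object name, helper name, and sample bodies are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadAllBodies {
  // Reads many JSON strings into one DataFrame with a single read.json call.
  // Assumes Spark 2.2+, where DataFrameReader.json accepts a Dataset[String].
  def bodiesToDf(spark: SparkSession, bodies: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.json(bodies.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-all-bodies")
      .getOrCreate()

    // Hypothetical stand-ins for the message bodies; with the AWS SDK they
    // could come from something like messages.asScala.map(_.getBody).toSeq
    val bodies = Seq(
      """{"id": 1, "path": "a.json"}""",
      """{"id": 2, "path": "b.json"}""")

    val df = bodiesToDf(spark, bodies)
    df.printSchema() // one schema inferred across all bodies
    df.show()
    spark.stop()
  }
}
```

Reading once also lets Spark infer a single schema across all bodies instead of one schema per message.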

Re: JSON Usage

2016-04-17 Thread Hyukjin Kwon
Hi! Personally, I don't think it necessarily needs to be a Dataset for your goal. Just select the data under "s3" from the DataFrame loaded by sqlContext.read.json(). You can call printSchema() to check the nested schema and then select the data. Also, I guess (from your code) you are trying to
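Hyukjin's suggestion (inspect the nested schema with printSchema(), then select the "s3" data) might look like the sketch below. It assumes the message bodies are S3 event notifications with a top-level "Records" array, which the thread does not confirm. Because "Records" is an array, `explode` turns it into one row per record before the nested fields are selected by dotted path. It uses the Spark 2.2+ API; the object name, helper name, and sample body are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object SelectS3Fields {
  // Returns (bucket, key) pairs from bodies ASSUMED to look like
  // {"Records":[{"s3":{"bucket":{"name":...},"object":{"key":...}}}]}
  def bucketsAndKeys(spark: SparkSession, bodies: Seq[String]): Array[(String, String)] = {
    import spark.implicits._
    val df = spark.read.json(bodies.toDS())

    // Print the inferred nested schema to see which paths exist.
    df.printSchema()

    df.select(explode(col("Records")).as("record")) // one row per array element
      .select(
        col("record.s3.bucket.name").as("bucket"),
        col("record.s3.object.key").as("key"))
      .as[(String, String)] // tuple fields are matched to columns by position
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-s3-fields")
      .getOrCreate()

    // Hypothetical sample body.
    val body =
      """{"Records":[{"s3":{"bucket":{"name":"my-bucket"},"object":{"key":"logs/a.json"}}}]}"""

    bucketsAndKeys(spark, Seq(body)).foreach(println)
    spark.stop()
  }
}
```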

Re: JSON Usage

2016-04-15 Thread Benjamin Kim
Holden, If I were to use Datasets, then I would essentially do this: val receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl) val messages = sqs.receiveMessage(receiveMessageRequest).getMessages() for (message <- messages.asScala) { val files =

Re: JSON Usage

2016-04-14 Thread Holden Karau
You could certainly use RDDs for that. You might also find it easier to use a Dataset: select the fields you need to construct the URL to fetch, and then use the map function. On Thu, Apr 14, 2016 at 12:01 PM, Benjamin Kim wrote: > I was wondering what would be the best way
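Holden's suggestion can be sketched with a typed Dataset: keep only the fields needed for the URL, view them as a case class, and map each record to a URL. Everything specific here is an assumption for illustration: the flat {"bucket": ..., "key": ...} body shape, the `FileRef` case class, and the S3 virtual-hosted-style URL format (keys are not URL-encoded). The `fetch` helper shows one way to download a URL's bytes with `java.net.URL` (`readAllBytes` needs Java 9+); it is not called in the demo to avoid network access. Spark 2.2+ API.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type holding only the fields needed to build a URL.
case class FileRef(bucket: String, key: String)

object BuildUrls {
  // Hypothetical URL scheme (S3 virtual-hosted style); keys are not URL-encoded.
  def toUrl(ref: FileRef): String =
    s"https://${ref.bucket}.s3.amazonaws.com/${ref.key}"

  // Select the needed fields, view them as FileRef, and map each to a URL.
  def urls(spark: SparkSession, bodies: Seq[String]): Dataset[String] = {
    import spark.implicits._
    spark.read.json(bodies.toDS())
      .select("bucket", "key")
      .as[FileRef]
      .map(ref => toUrl(ref))
  }

  // Downloads one URL's bytes (InputStream.readAllBytes requires Java 9+).
  def fetch(url: String): Array[Byte] = {
    val in = new java.net.URL(url).openStream()
    try in.readAllBytes()
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("build-urls")
      .getOrCreate()

    // Hypothetical flat bodies; real messages may nest these fields.
    val bodies = Seq(
      """{"bucket": "my-bucket", "key": "logs/a.json"}""",
      """{"bucket": "my-bucket", "key": "logs/b.json"}""")

    // Print the URLs only; fetch is not called here to avoid network access.
    urls(spark, bodies).collect().foreach(println)
    spark.stop()
  }
}
```

To run the downloads on the executors, the URL Dataset could be mapped through `fetch` in a further step. If you prefer RDDs, as the original question suggested, the same per-record function works on `df.rdd.map(...)`; the Dataset version mainly saves you from pulling fields out of `Row` objects by hand.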

JSON Usage

2016-04-14 Thread Benjamin Kim
I was wondering what would be the best way to use JSON in Spark/Scala. I need to look up the values of fields in a collection of records to form a URL and download the file at that location. I was thinking an RDD would be perfect for this. I just want to hear from others who might have more experience