Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Ted Yu
Looking at SQLContext.scala (in master branch), jsonFile() returns DataFrame directly:

  def jsonFile(path: String, samplingRatio: Double): DataFrame =

FYI

On Sun, May 3, 2015 at 2:14 AM, ayan guha wrote:
> Yes it is possible. You need to use jsonfile method on SQL context and
> then create a d

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Dean Wampler
Note that each JSON object has to be on a single line in the files.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler
http://polyglotprogramming.com
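
To make the one-object-per-line layout concrete, here is a minimal sketch in plain Scala (no Spark needed) that writes such a file. The file name people.json and the name/age fields are made up for illustration and do not come from the thread. It is meant to be pasted into the Scala REPL or spark-shell.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical sample records; the field names are invented for this sketch.
val records = Seq(
  """{"name": "Alice", "age": 30}""",
  """{"name": "Bob", "age": 25}""",
  """{"name": "Carol", "age": 41}"""
)

// One complete JSON object per line: no pretty-printing across lines,
// no enclosing array, no commas between records.
val path = Paths.get("people.json")
Files.write(path, records.mkString("\n").getBytes(StandardCharsets.UTF_8))

// Read the file back and check that every line is an object on its own.
val lines = new String(Files.readAllBytes(path), StandardCharsets.UTF_8).split("\n").toList
assert(lines.forall(l => l.startsWith("{") && l.endsWith("}")))
```

A pretty-printed file, where a single object spans several lines, would not satisfy this layout.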

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread ayan guha
Yes, it is possible. You need to use the jsonFile method on the SQL context and then create a dataframe from the rdd. Then register it as a table. Should be 3 lines of code, thanks to Spark. You may want to watch a few YouTube videos, especially the ones on unifying pipelines.

On 3 May 2015 19:02, "Jai" wrote:
> Hi,
>
> I am noob to
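
The "3 lines of code" could look roughly like the sketch below, written against the Spark 1.3-era SQLContext API discussed in this thread. Since jsonFile() returns a DataFrame directly, there is no separate RDD-to-DataFrame step here. The path people.json, the table name, the name/age columns, and the local master setting are all assumptions for illustration. The file is assumed to hold one JSON object per line, and on a cluster the path has to be readable by every worker (for example, on HDFS).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// App name and local master are placeholders for this sketch.
val conf = new SparkConf().setAppName("json-sql-sketch").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// 1. Load the JSON file (hypothetical path); the schema is inferred from the data.
val people = sqlContext.jsonFile("people.json")

// 2. Register the DataFrame as a temporary table (hypothetical table name).
people.registerTempTable("people")

// 3. Query it with SQL; the name and age columns are assumed to exist in the data.
val names = sqlContext.sql("SELECT name FROM people WHERE age > 26")

names.show()
sc.stop()
```

In spark-shell, sc and sqlContext are already defined, so only the three numbered steps are needed. In later Spark releases the load step is written as sqlContext.read.json(path) instead of jsonFile.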