Thank you! Databricks rules!
On Fri, Jul 4, 2014 at 1:58 PM, Michael Armbrust <mich...@databricks.com> wrote:

>> sqlContext.jsonFile("data.json") <---- Is this already available in the
>> master branch?
>
> Yes, and it will be available in the soon-to-come 1.0.1 release.
>
>> But the question about using a combination of resources (memory
>> processing & disk processing) still remains.
>
> This code should work just fine off of disk. I would not recommend trying
> to cache the JSON data in memory, as it is heavily nested and this is a
> place where the columnar storage code does not do great. Instead, maybe
> try converting it to Parquet and reading that data from disk
> (tweets.saveAsParquetFile(...);
> sqlContext.parquetFile(...).registerAsTable(...)). You should see improved
> compression and much better performance for queries that only read some of
> the columns. You could also just pull out the relevant columns and cache
> only that data in memory:
>
> sqlContext.jsonFile("data.json").registerAsTable("allTweets")
> sql("SELECT text FROM allTweets").registerAsTable("tweetText")
> sqlContext.cacheTable("tweetText")
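For anyone following along, the two options Michael suggests can be sketched end to end. This is a rough sketch against the Spark 1.0-era API used in this thread (jsonFile, saveAsParquetFile, registerAsTable); the file paths and table names are placeholders, not anything from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical setup; assumes a local Spark 1.0.x installation.
val sc = new SparkContext(new SparkConf().setAppName("tweets"))
val sqlContext = new SQLContext(sc)

// Load the heavily nested JSON. This reads from disk, so the
// dataset does not need to fit in memory.
val tweets = sqlContext.jsonFile("data.json")

// Option 1: convert to Parquet and query the columnar files on disk.
// Queries that touch only a few columns read far less data.
tweets.saveAsParquetFile("tweets.parquet")
sqlContext.parquetFile("tweets.parquet").registerAsTable("tweets")

// Option 2: project out only the relevant column and cache just that.
tweets.registerAsTable("allTweets")
sqlContext.sql("SELECT text FROM allTweets").registerAsTable("tweetText")
sqlContext.cacheTable("tweetText")
```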