Re: Saving very large data sets as Parquet on S3

2014-10-24 Thread Haoyuan Li
Daniel,

Currently, having Tachyon will at least help on the input part in this case.

Haoyuan

On Fri, Oct 24, 2014 at 2:01 PM, Daniel Mahler wrote:
> I am trying to convert some json logs to Parquet and save them on S3.
> In principle this is just
>
> import org.apache.spark._
> val sqlContext
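A hedged sketch of what the Tachyon suggestion could look like in practice: stage the JSON input in Tachyon and read it through Tachyon's Hadoop-compatible URI scheme, so repeated scans of the input are served from memory rather than from S3. The master host, port, and path below are placeholder assumptions, not details from the thread, and the Tachyon client jar would need to be on the Spark classpath:

    // Scala, spark-shell (Spark 1.x) with a Tachyon deployment.
    // 19998 is Tachyon's default master port; host and path are hypothetical.
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    // Read the staged JSON input from Tachyon instead of s3n://.
    val data = sqlContext.jsonFile("tachyon://tachyon-master:19998/logs/*/*")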

Fwd: Saving very large data sets as Parquet on S3

2014-10-24 Thread Daniel Mahler
I am trying to convert some JSON logs to Parquet and save them on S3. In principle this is just:

    import org.apache.spark._
    val sqlContext = new sql.SQLContext(sc)
    val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
    data.registerAsTable("data")
    data.saveAsParquetFile("s3n://target/path")

Th
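For reference, a minimal self-contained version of the snippet above, as it would run in a Spark 1.x spark-shell (which already provides sc). The bucket and path names are placeholders, not the ones from the thread:

    // Scala, spark-shell (Spark 1.x); sc is the shell-provided SparkContext.
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Infer the schema from a small sample of the JSON records; the second
    // argument is the sampling ratio used for schema inference.
    val data = sqlContext.jsonFile("s3n://source-bucket/logs/*/*", 10e-8)

    // Register the SchemaRDD so it can also be queried with SQL.
    data.registerAsTable("data")

    // Write the data set back out to S3 as Parquet.
    data.saveAsParquetFile("s3n://target-bucket/logs-parquet")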

Saving very large data sets as Parquet on S3

2014-10-20 Thread Daniel Mahler
I am trying to convert some JSON logs to Parquet and save them on S3. In principle this is just:

    import org.apache.spark._
    val sqlContext = new sql.SQLContext(sc)
    val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
    data.registerAsTable("data")
    data.saveAsParquetFile("s3n://target/path")

Th
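One detail worth noting in the snippet above: the second argument to jsonFile is the sampling ratio Spark SQL uses when inferring a schema from the JSON records, so 10e-8 (that is, 1e-7) asks it to inspect only a tiny fraction of a very large input. A sketch of the trade-off, with placeholder paths:

    // Scala, spark-shell (Spark 1.x). A tiny sampling ratio makes schema
    // inference fast but can miss fields that occur only in rare records;
    // omitting the argument scans everything (the default ratio is 1.0).
    val fastGuess = sqlContext.jsonFile("s3n://source-bucket/logs/*/*", 10e-8)
    val fullScan  = sqlContext.jsonFile("s3n://source-bucket/logs/*/*")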