RE: Error building a self-contained Spark app

2016-03-04 Thread Jelez Raditchkov
OK, this is what I have:

    object SQLHiveContextSingleton {
      @transient private var instance: HiveContext = _
      def getInstance(sparkContext: SparkContext): HiveContext = {
        synchronized {
          if (instance == null || sparkContext.isStopped) {
            instance = new …
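Since the archive preview cuts the message off, here is a plausible completion of the snippet, following the singleton pattern from the Spark Streaming programming guide (the new HiveContext(sparkContext) call and the tail of getInstance are assumptions):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    object SQLHiveContextSingleton {
      // Driver-side HiveContext, created lazily and reused across batches.
      @transient private var instance: HiveContext = _

      def getInstance(sparkContext: SparkContext): HiveContext = {
        synchronized {
          // Rebuild the context if it was never created or its SparkContext stopped.
          if (instance == null || sparkContext.isStopped) {
            instance = new HiveContext(sparkContext) // assumed: the truncated line ends here
          }
          instance
        }
      }
    }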

FW: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread Jelez Raditchkov
From: je...@hotmail.com
To: yuzhih...@gmail.com
Subject: RE: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
Date: Fri, 4 Mar 2016 14:09:20 -0800

The code below is from the sources; is this what you're asking about?

    class …
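For context, the pattern in the subject line comes from the Spark Streaming programming guide; a minimal sketch of how it is used inside a streaming job (the DStream and the one-column schema are hypothetical):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.dstream.DStream

    def handleWords(stream: DStream[String]): Unit = {
      stream.foreachRDD { rdd =>
        // Returns the existing SQLContext, or lazily creates one on the driver.
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        import sqlContext.implicits._
        val df = rdd.toDF("word") // hypothetical single-column schema
        df.registerTempTable("words")
      }
    }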

Best way to merge files from streaming jobs

2016-03-04 Thread Jelez Raditchkov
My streaming job creates files on S3. The problem is that those files end up very small if I just write them to S3 directly, which is why I use coalesce() to reduce the number of files and make them larger. However, coalesce() shuffles data, and my job's processing time ends up higher than …
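A minimal sketch of the setup being described, with the trade-off in comments (the bucket, path, and partition count are hypothetical; coalesce(n) narrows partitions without a full shuffle, while repartition(n) forces one):

    import org.apache.spark.streaming.dstream.DStream

    def writeBatches(stream: DStream[String]): Unit = {
      stream.foreachRDD { (rdd, time) =>
        // Fewer output partitions => fewer, larger files per batch on S3.
        // The cost: less write parallelism, so each batch can take longer.
        rdd.coalesce(4) // hypothetical partition count
           .saveAsTextFile(s"s3n://my-bucket/output/batch-${time.milliseconds}")
      }
    }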

How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

2016-03-04 Thread Jelez Raditchkov
What is the best approach to using getOrCreate for a streaming job with HiveContext? For SQLContext, the recommended approach seems to be getOrCreate: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations

    val sqlContext = …
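One caveat, from my reading of the Spark 1.6 API rather than from this thread: SQLContext.getOrCreate instantiates a plain SQLContext when none exists, so a job that needs Hive support usually falls back to a hand-rolled singleton like the SQLHiveContextSingleton shown in the first message above:

    import org.apache.spark.streaming.dstream.DStream

    def refreshTables(stream: DStream[String]): Unit = {
      stream.foreachRDD { rdd =>
        // Reuses the singleton object from the first message above.
        val hiveContext = SQLHiveContextSingleton.getInstance(rdd.sparkContext)
        hiveContext.sql("SHOW TABLES").show() // hypothetical query, just to exercise the context
      }
    }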

S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

2016-03-04 Thread Jelez Raditchkov
I'm working on a streaming job that writes to S3 with DirectParquetOutputCommitter. I need to use partitionBy() and hence SaveMode.Append. Apparently, when using SaveMode.Append, Spark automatically falls back to the default Parquet output committer and ignores DirectParquetOutputCommitter. My problems are: 1. the …
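For reference, this is roughly how the direct committer was enabled in the Spark 1.x era (a sketch under stated assumptions: the SparkContext, DataFrame, partition column, and S3 path are hypothetical, and the committer's package moved between releases):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writePartitioned(sc: SparkContext, df: DataFrame): Unit = {
      // Opt in to the direct committer; in Spark 1.6 the class lived at
      // org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
      // (earlier releases used org.apache.spark.sql.parquet.DirectParquetOutputCommitter).
      sc.hadoopConfiguration.set(
        "spark.sql.parquet.output.committer.class",
        "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter")

      df.write
        .partitionBy("date")   // hypothetical partition column
        .mode(SaveMode.Append) // Append is what makes Spark ignore the direct committer
        .parquet("s3n://my-bucket/output")
    }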