Ok this is what I have:
object SQLHiveContextSingleton {
  @transient private var instance: HiveContext = _
  def getInstance(sparkContext: SparkContext): HiveContext = {
    synchronized {
      if (instance == null || sparkContext.isStopped) {
        instance = new HiveContext(sparkContext)
      }
      instance
    }
  }
}
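The recreate-on-stop logic above can be exercised without Spark. A minimal sketch, assuming a stand-in `Ctx` class in place of HiveContext and a boolean flag in place of `sparkContext.isStopped` (all names here are hypothetical):

```scala
// Stand-in for HiveContext: Ctx and the `stopped` flag are hypothetical,
// used only to exercise the lazy-recreate singleton pattern above.
class Ctx(val id: Int)

object CtxSingleton {
  @transient private var instance: Ctx = _
  private var created = 0

  def getInstance(stopped: Boolean): Ctx = synchronized {
    // Build a new context only when none exists or the old one was stopped
    if (instance == null || stopped) {
      created += 1
      instance = new Ctx(created)
    }
    instance
  }
}

val a = CtxSingleton.getInstance(stopped = false) // creates the first context
val b = CtxSingleton.getInstance(stopped = false) // reuses it
val c = CtxSingleton.getInstance(stopped = true)  // stopped, so a new one is built
println((a.id, b.id, c.id))
```

The `synchronized` block matters in streaming jobs because multiple output operations can race to create the context in the same JVM.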
From: je...@hotmail.com
To: yuzhih...@gmail.com
Subject: RE: How to get the singleton instance of SQLContext/HiveContext: val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
Date: Fri, 4 Mar 2016 14:09:20 -0800
The code below is from the sources; is this what you asked for?
class
My streaming job is creating files on S3. The problem is that those files end up very small if I just write them to S3 directly. This is why I use coalesce() to reduce the number of files and make them larger.
However, coalesce() shuffles data and my job processing time ends up higher than
What is the best approach to use getOrCreate for a streaming job with HiveContext? It seems for SQLContext the recommended approach is to use getOrCreate:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations

val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
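For the streaming case, the guide's getOrCreate pattern can be combined with the singleton object from earlier in the thread. A sketch only, not a tested implementation: `dstream`, the RDD's element type (assumed to be a case class so toDF() works), the "date" partition column, and the S3 path are all assumptions.

```scala
import org.apache.spark.sql.SaveMode

dstream.foreachRDD { rdd =>
  // Reuse (or lazily recreate) one HiveContext per JVM instead of one per batch
  val hiveContext = SQLHiveContextSingleton.getInstance(rdd.sparkContext)
  import hiveContext.implicits._
  val df = rdd.toDF()
  df.write.mode(SaveMode.Append).partitionBy("date").parquet("s3a://bucket/prefix")
}
```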
Working on a streaming job with DirectParquetOutputCommitter to S3. I need to use partitionBy and hence SaveMode.Append.
Apparently when using SaveMode.Append, Spark automatically defaults to the default parquet output committer and ignores DirectParquetOutputCommitter.
My problems are:
1. the