I think I'm getting close to finding the reason:

When I initialize the SparkContext, the following code is executed:
def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             serializer, conf, jsc, profiler_cls):
    self.environment = environment or {}
    # java gateway must have been launched at this point.
    if conf is not None and conf._jconf is not None:
        # conf has been initialized in JVM properly, so use conf directly. This represents the
        # scenario that JVM has been launched before SparkConf is created (e.g. SparkContext is
        # created and then stopped, and we create a new SparkConf and new SparkContext again)
        self._conf = conf
    else:
        self._conf = SparkConf(_jvm=SparkContext._jvm)

So I can see that the only way my SparkConf will be used is if it was built with a live _jvm (so that its _jconf is not None).
I've used spark-submit to submit my job and printed the _jvm object, but it is None, which explains why my SparkConf object is ignored.
I've tried running exactly the same code on Spark 2.0.1 and it worked! My SparkConf object had a valid _jvm object.
Does anybody know what changed? Or whether I got something wrong?
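
In case it helps, here is a minimal sketch of the workaround I'm considering: launch the Java gateway before building the SparkConf, so that SparkConf picks up SparkContext._jvm and gets a JVM-backed _jconf. Note that this relies on the private SparkContext._ensure_initialized helper, so it may not be a supported approach:

from pyspark import SparkConf, SparkContext

# Launch the Java gateway first so that SparkContext._jvm is set
# (uses a private helper, so this may not be a supported approach).
SparkContext._ensure_initialized()

# SparkConf() now sees SparkContext._jvm and builds a JVM-backed _jconf,
# so the "conf._jconf is not None" branch above should be taken.
spark_conf = SparkConf().set("spark.executor.memory", "3G")
sc = SparkContext(conf=spark_conf)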

Thanks :)

Sidney Feiner   /  SW Developer
M: +972.528197720  /  Skype: sidney.feiner.startapp


From: Sidney Feiner
Sent: Thursday, January 26, 2017 9:26 AM
To: user@spark.apache.org
Subject: [PySpark 2.1.0] - SparkContext not properly initialized by SparkConf

Hey, I'm pasting a question I asked on Stack Overflow without getting any answers :(
I hope somebody here knows the answer, thanks in advance!
Link to the post: https://stackoverflow.com/questions/41847113/pyspark-2-1-0-sparkcontext-not-properly-initialized-by-sparkconf
I'm migrating my PySpark application from Spark 1.6 to 2.1.0 and I've run into a problem.
I build my SparkConf object dynamically from configurations in a file, and on Spark 1.6 the app would run with the correct configs.
But now, when I open the Spark UI, I can see that NONE of those configs are loaded into the SparkContext. Here's my code:
spark_conf = SparkConf().setAll(
    filter(lambda x: x[0].startswith('spark.'), conf_dict.items())
)
sc = SparkContext(conf=spark_conf)
I've also added a print before initializing the SparkContext to make sure the 
SparkConf has all the relevant configs:
[print("{0}: {1}".format(key, value)) for (key, value) in spark_conf.getAll()]
And this outputs all the configs I need:
* spark.app.name: MyApp
* spark.akka.threads: 4
* spark.driver.memory: 2G
* spark.streaming.receiver.maxRate: 25
* spark.streaming.backpressure.enabled: true
* spark.executor.logs.rolling.maxRetainedFiles: 7
* spark.executor.memory: 3G
* spark.cores.max: 24
* spark.executor.cores: 4
* spark.streaming.blockInterval: 350ms
* spark.memory.storageFraction: 0.2
* spark.memory.useLegacyMode: false
* spark.memory.fraction: 0.8
* spark.executor.logs.rolling.time.interval: daily
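To double-check outside the Spark UI, something like the following sketch (just illustrative, not part of my original code) dumps the settings the running SparkContext actually ended up with, using the spark_conf object built above:

# Illustrative check: print the settings the live SparkContext picked up.
sc = SparkContext(conf=spark_conf)
for key, value in sorted(sc.getConf().getAll()):
    print("{0}: {1}".format(key, value))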
I submit my job with the following:
/usr/local/spark/bin/spark-submit --conf spark.driver.host=i-${HOSTNAME} --master spark://i-${HOSTNAME}:7077 /path/to/main/file.py /path/to/config/file
Does anybody know why my SparkContext doesn't get initialized with my SparkConf?
Thanks :)


