Hi Alex,

Thanks a lot for helping out with this.

You're correct, but it doesn't seem that it's the interpreter-settings.json for the Spark interpreter that is being used. It's conf/interpreter.json. In this file, both 0.8.2 and 0.9.0 have

```partial-json
"spark": {
  "id": "spark",
  "name": "spark",
  "group": "spark",
  "properties": {
    "SPARK_HOME": {
      "name": "SPARK_HOME",
      "value": "",
      "type": "string",
      "description": "Location of spark distribution"
    },
    "master": {
      "name": "master",
      "value": "local[*]",
      "type": "string",
      "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
    },
```

That "master" should be "spark.master". By adding an explicit spark.master property with the value "local[*]" I can use all cores as expected. Without it, printing sc.master gives "local"; with it, printing sc.master gives "local[*]".

My conclusion is that conf/interpreter.json isn't in sync with the interpreter-settings.json for the Spark interpreter.
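For reference, the added entry ends up looking roughly like this in conf/interpreter.json. This is a sketch modelled on the schema of the neighbouring entries above; I created the property through the interpreter settings page, so the exact JSON Zeppelin generates may differ:

```partial-json
"spark.master": {
  "name": "spark.master",
  "value": "local[*]",
  "type": "string",
  "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
}
```

After restarting the Spark interpreter with this in place, sc.master reports "local[*]".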
Best regards,
Patrik Iselind

On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:

> Spark master is set to `local[*]` by default. Here is the corresponding
> piece from interpreter-settings.json for the Spark interpreter:
>
>     "master": {
>       "envName": "MASTER",
>       "propertyName": "spark.master",
>       "defaultValue": "local[*]",
>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>       "type": "string"
>     },
>
> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>
> PI> Hi Jeff,
>
> PI> I've tried the release from http://zeppelin.apache.org/download.html,
> PI> both in a Docker container and without one. They both have the same
> PI> issue as previously described.
>
> PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps
> PI> using some environment variable?
>
> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>
> PI> Best Regards,
> PI> Patrik Iselind
>
> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>
> PI> Hi Patrik,
>
> PI> Do you mind trying the 0.9.0-preview? It might be an issue of the
> PI> docker container.
>
> PI> http://zeppelin.apache.org/download.html
>
> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sunday, May 10, 2020 at 2:30 AM:
>
> PI> Hello Jeff,
>
> PI> Thank you for looking into this for me.
>
> PI> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb,
> PI> built 6 weeks ago), I still see the same issue. My image has the digest
> PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>
> PI> If it's not on the tip of master, could you guys please release a
> PI> newer 0.9.0 image?
>
> PI> Best Regards,
> PI> Patrik Iselind
>
> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
> PI> This might be a bug in 0.8. I tried it in 0.9 (master branch), and it
> PI> works for me.
>
> PI>     print(sc.master)
> PI>     print(sc.defaultParallelism)
>
> PI>     ---
> PI>     local[*]
> PI>     8
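(A side note while comparing versions: printing the effective SparkConf may be the quickest way to see which master value actually wins. A sketch; it assumes only that sc is the SparkContext Zeppelin injects into the paragraph:)

```Zeppelin paragraph
%pyspark

# What the live SparkContext was actually built with.
print(sc.master)

# The same key as recorded in the effective SparkConf; the second
# argument is only a fallback in case the key were ever unset.
print(sc.getConf().get("spark.master", "<not set>"))

# All explicitly set properties, to spot where an override comes from.
print(sc.getConf().toDebugString())
```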
> PI> Patrik Iselind <patrik....@gmail.com> wrote on Saturday, May 9, 2020 at 8:34 PM:
>
> PI> Hi,
>
> PI> First comes some background, then I have some questions.
>
> PI> Background
>
> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker
> PI> file looks like this:
>
> PI> ```Dockerfile
> PI> FROM apache/zeppelin:0.8.2
>
> PI> # Install some tools
> PI> RUN apt-get -y update &&\
> PI>     DEBIAN_FRONTEND=noninteractive \
> PI>     apt -y install vim python3-pip
>
> PI> RUN python3 -m pip install -U pyspark
>
> PI> ENV PYSPARK_PYTHON python3
> PI> ENV PYSPARK_DRIVER_PYTHON python3
> PI> ```
>
> PI> When I run a paragraph like this
>
> PI> ```Zeppelin paragraph
> PI> %pyspark
>
> PI> print(sc)
> PI> print()
> PI> print(dir(sc))
> PI> print()
> PI> print(sc.master)
> PI> print()
> PI> print(sc.defaultParallelism)
> PI> ```
>
> PI> I get the following output
>
> PI> ```output
> PI> <SparkContext master=local appName=Zeppelin>
>
> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
> PI> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> PI> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
> PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version',
> PI> 'wholeTextFiles']
>
> PI> local
>
> PI> 1
> PI> ```
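(Side note: the 1 at the end of the output above is sc.defaultParallelism, and with master=local that also means every RDD created without an explicit numSlices gets a single partition. A quick way to see the effect; the sketch only uses calls visible in the dir() listing above plus RDD.getNumPartitions:)

```Zeppelin paragraph
%pyspark

# With master=local this prints 1: one core, one default partition.
print(sc.defaultParallelism)

# An RDD created without an explicit numSlices follows
# defaultParallelism, so it also ends up with a single partition.
rdd = sc.parallelize(range(100))
print(rdd.getNumPartitions())
```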
> PI> This is even though the "master" property in the interpreter is set
> PI> to "local[*]". I'd like to use all cores on my machine. To do that I
> PI> have to explicitly create the "spark.master" property in the Spark
> PI> interpreter with the value "local[*]". Then I get
>
> PI> ```new output
> PI> <SparkContext master=local[*] appName=Zeppelin>
>
> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
> PI> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> PI> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
> PI> 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version',
> PI> 'wholeTextFiles']
>
> PI> local[*]
>
> PI> 8
> PI> ```
>
> PI> This is what I want.
>
> PI> The Questions
>
> PI> 1. Why is the "master" property not used in the created SparkContext?
> PI> 2. How do I add the spark.master property to the docker image?
>
> PI> Any hint or support you can provide would be greatly appreciated.
>
> PI> Yours Sincerely,
> PI> Patrik Iselind
>
> PI> --
> PI> Best Regards
> PI>
> PI> Jeff Zhang
>
> PI> --
> PI> Best Regards
> PI>
> PI> Jeff Zhang
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
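P.S. For the archives, to answer my own second question above: one way to bake the workaround into a derived image might be to copy a pre-edited conf/interpreter.json over the shipped one. A sketch; the /zeppelin path is an assumption based on the apache/zeppelin:0.8.2 image and should be verified against ZEPPELIN_HOME in your container:

```Dockerfile
FROM apache/zeppelin:0.8.2

# interpreter.json is a local copy of conf/interpreter.json that already
# contains the explicit "spark.master" property set to "local[*]".
# NOTE: /zeppelin/conf is an assumed location -- check ZEPPELIN_HOME in
# your image before relying on it.
COPY interpreter.json /zeppelin/conf/interpreter.json
```

Alex's interpreter-settings.json snippet above also shows "envName": "MASTER", so an `ENV MASTER local[*]` line might work as well, but given that conf/interpreter.json seems to take precedence, I would verify that it actually takes effect.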