So after adding the quotes in both SparkInterpreterLauncher
and interpreter.sh, the interpreter is still failing with the same
"Unrecognized option" error.
The weird thing is that if I copy the command Zeppelin supposedly executes
(as it is printed to the log) and run it directly in a shell, the
interpreter process runs properly. So my guess is that the forked
process command that gets created is not really identical to the one that
is logged.
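
To illustrate what I suspect (just a minimal sketch, not Zeppelin's actual
launcher code; show_args and LOGGED are names I made up for the demo): when
the logged line is pasted into a shell, the quotes are interpreted and the
whole quoted span stays one argument, but if the forked command is rebuilt by
word-splitting a flat string, the quote characters stay literal and the
option is cut in two:

#!/usr/bin/env bash
# Print each argument the callee receives on its own bracketed line.
show_args() { printf 'argv: [%s]\n' "$@"; }

# Pasted into a shell: quotes are stripped, the java options stay grouped.
show_args --conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"
# argv: [--conf]
# argv: [spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2]

# Rebuilt from a flat string (as a launcher might do): word splitting keeps
# the literal quotes and -DmyOtherParam=2 becomes a separate argument.
LOGGED='--conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"'
show_args $LOGGED
# argv: [--conf]
# argv: ["spark.driver.extraJavaOptions=-DmyParam=1]
# argv: [-DmyOtherParam=2"]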

This is what my command looks like (censored a bit):

/usr/local/spark/bin/spark-submit
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
--driver-class-path :/zeppelin/local-repo/spark/*:/zeppelin/interpreter/spark/*:::/zeppelin/interpreter/zeppelin-interpreter-shaded-0.10.0-SNAPSHOT.jar:/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar:/etc/hadoop/conf

*--driver-java-options " -DSERVICENAME=zeppelin_docker
-Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///zeppelin/conf/log4j.properties
-Dlog4j.configurationFile=file:///zeppelin/conf/log4j2.properties
-Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-shared_process--zeppelin-test-spark3-7d74d5df4-2g8x5.log" *
--conf spark.driver.host=10.135.120.245
--conf "spark.dynamicAllocation.minExecutors=1"
--conf "spark.shuffle.service.enabled=true"
--conf "spark.sql.parquet.int96AsTimestamp=true"
--conf "spark.ui.retainedTasks=10000"
--conf "spark.executor.heartbeatInterval=600s"
--conf "spark.ui.retainedJobs=100"
--conf "spark.sql.ui.retainedExecutions=10"
--conf "spark.hadoop.cloneConf=true"
--conf "spark.debug.maxToStringFields=200000"
--conf "spark.executor.memory=70g"
--conf "spark.executor.extraClassPath=../mysql-connector-java-8.0.18.jar:../guava-19.0.jar"
--conf "spark.hadoop.fs.permissions.umask-mode=000"
--conf "spark.memory.storageFraction=0.1"
--conf "spark.scheduler.mode=FAIR"
--conf "spark.sql.adaptive.enabled=true"
--conf "spark.master=mesos://zk://zk003:2181,zk004:2181,zk006:2181,/mesos-zeppelin"
--conf "spark.driver.memory=15g"
--conf "spark.io.compression.codec=lz4"
--conf "spark.executor.uri=
https://artifactory.company.com/artifactory/static/spark/spark-dist/spark-3.1.2.2-hadoop-2.7-zulu";
-
-conf "spark.ui.retainedStages=500"
--conf "spark.mesos.uris=
https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/mysql-connector-java-8.0.18.jar,https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/guava-19.0.jar";

--conf "spark.driver.maxResultSize=8g"
*--conf "spark.executor.extraJavaOptions=-DSERVICENAME=Zeppelin
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015
-XX:-OmitStackTraceInFastThrow -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=55745
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -verbose:gc
-Dlog4j.configurationFile=/etc/config/log4j2-executor-config.xml
-XX:+UseG1GC -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintFlagsFinal -XX:+PrintReferenceGC
-XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark
-XX:+PrintStringDeduplicationStatistics -XX:+UseStringDeduplication
-XX:InitiatingHeapOccupancyPercent=35
-Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128" *
--conf "spark.dynamicAllocation.enabled=true"
--conf "spark.default.parallelism=1200"
--conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"
--conf "spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
--conf "spark.app.name=zeppelin_docker_spark3"
--conf "spark.shuffle.service.port=7337"
--conf "spark.memory.fraction=0.75"
--conf "spark.mesos.coarse=true"
--conf "spark.ui.port=4041"
--conf "spark.dynamicAllocation.executorIdleTimeout=60s"
--conf "spark.sql.shuffle.partitions=1200"
--conf "spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS"
--conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=120s"
--conf "spark.network.timeout=1200s"
--conf "spark.cores.max=600"
--conf "spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
--conf "spark.worker.timeout=150000"
*--conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost=proxy.service.consul
-Dhttps.proxyPort=3128
-Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionPassword=2eebb22277
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://proxysql-backend.service.consul.company.com:6033/hms?useSSL=false&databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true
-Djavax.jdo.option.ConnectionUserName=hms_rw" *
--conf "spark.files.overwrite=true"
/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar
10.135.120.245
36419
spark-shared_process :



*Error: Unrecognized option:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015*

Will continue tackling it...

On Thu, Jul 8, 2021 at 4:49 PM Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks Lior for the investigation.
>
>
> Lior Chaga <lio...@taboola.com> wrote on Thursday, July 8, 2021 at 8:31 PM:
>
>> Ok, I think I found the issue. It's not only that the quotation marks are
>> missing from the --conf param; they are also missing from
>> the --driver-java-options value, which is concatenated into
>> the INTERPRETER_RUN_COMMAND in interpreter.sh.
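>>
>> As an illustration (just a minimal sketch, not the real interpreter.sh;
>> DRIVER_JAVA_OPTS and the array variant are made up here), assuming the run
>> command is built as one flat string and then expanded unquoted:
>>
>> #!/usr/bin/env bash
>> DRIVER_JAVA_OPTS='-DmyParam=1 -DmyOtherParam=2'
>>
>> # Flat string + unquoted expansion: the shell re-splits on whitespace, so
>> # -DmyOtherParam=2 escapes the --driver-java-options value.
>> INTERPRETER_RUN_COMMAND="spark-submit --driver-java-options ${DRIVER_JAVA_OPTS}"
>> # one bracketed line per argument; -DmyOtherParam=2 ends up on its own:
>> printf '[%s]\n' ${INTERPRETER_RUN_COMMAND}
>>
>> # Building the command as an array keeps the whole value as one argument:
>> INTERPRETER_RUN_COMMAND=(spark-submit --driver-java-options "${DRIVER_JAVA_OPTS}")
>> # now "-DmyParam=1 -DmyOtherParam=2" stays a single argument:
>> printf '[%s]\n' "${INTERPRETER_RUN_COMMAND[@]}"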
>>
>> I will fix it in my build, but I would like confirmation that this is
>> indeed the issue (and that I'm not missing anything), so I can open a
>> pull request.
>>
>> On Thu, Jul 8, 2021 at 3:05 PM Lior Chaga <lio...@taboola.com> wrote:
>>
>>> I'm trying to run Zeppelin using the local Spark interpreter.
>>> Basically everything works, but if I try to set
>>> `spark.driver.extraJavaOptions` or `spark.executor.extraJavaOptions`
>>> to a value containing several arguments, I get an exception.
>>> For instance, when providing `-DmyParam=1 -DmyOtherParam=2`, I'd get:
>>> Error: Unrecognized option: -DmyOtherParam=2
>>>
>>> I noticed that the spark-submit command looks as follows:
>>>
>>> spark-submit --class
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 
>>> --driver-class-path
>>> ....   *--conf spark.driver.extraJavaOptions=-DmyParam=1
>>> -DmyOtherParam=2*
>>>
>>> So I tried to patch SparkInterpreterLauncher to add quotation marks
>>> (like in the example from the Spark documentation -
>>> https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties
>>> )
>>>
>>> I see that the quotation marks were added: *--conf
>>> "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"*
>>> But I still get the same error.
>>>
>>> Any idea how I can make it work?
>>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>
