Hello folks!

I am running CDH 5.15.0 with parcels and installed Spark 2 as per https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html. I also updated the alternatives so that spark-shell, spark-submit and pyspark call spark2-shell, spark2-submit and pyspark2. I have not installed Spark 1.6. I installed Livy manually on one of the nodes, and I can successfully use it with Scala code.
Now I am trying to use it with pyspark. My first test was with Postman, sending requests directly. I sent:

{ "kind": "pyspark", "proxyUser": "spark" }

The session starts up fine on the cluster and I can eventually see the Spark UI come up, but the log contains:

18/09/12 15:52:59 INFO driver.RSCDriver: Connecting to: controller.lama.nuc:10001
18/09/12 15:52:59 INFO driver.RSCDriver: Starting RPC server...
18/09/12 15:53:00 INFO rpc.RpcServer: Connected to the port 10000
18/09/12 15:53:00 WARN rsc.RSCConf: Your hostname, worker005.lama.nuc, resolves to a loopback address, but we couldn't find any external IP address!
18/09/12 15:53:00 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
18/09/12 15:53:00 INFO driver.RSCDriver: Received job request 4db9216d-355d-4a41-9365-32968f05e0a7
18/09/12 15:53:00 INFO driver.RSCDriver: SparkContext not yet up, queueing job request.
18/09/12 15:53:00 ERROR repl.PythonInterpreter: Process has died with 1
18/09/12 15:53:00 ERROR repl.PythonInterpreter: Traceback (most recent call last):
  File "/yarn/nm/usercache/livy/appcache/application_1535188013308_0051/container_1535188013308_0051_01_000001/tmp/3015653701235928503", line 643, in <module>
    sys.exit(main())
  File "/yarn/nm/usercache/livy/appcache/application_1535188013308_0051/container_1535188013308_0051_01_000001/tmp/3015653701235928503", line 533, in main
    exec('from pyspark.shell import sc', global_dict)
  File "<string>", line 1, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/shell.py", line 38, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 292, in _ensure_initialized
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 47, in launch_gateway
  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'PYSPARK_GATEWAY_SECRET'

My livy-env.sh contains:

export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/conf/yarn-conf
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export SPARK_CONF_DIR=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/conf
export LIVY_LOG_DIR=/var/log/livy

My livy.conf contains:

livy.repl.jars = hdfs:///livy/repl/commons-codec-1.9.jar,hdfs:///livy/repl/livy-core_2.11-0.4.0-SNAPSHOT.jar,hdfs:///livy/repl/livy-repl_2.11-0.4.0-SNAPSHOT.jar,hdfs:///some/custom/library-i-wrote-myself.jar
livy.rsc.jars = hdfs:///livy/rsc/livy-api-0.4.0-SNAPSHOT.jar,hdfs:///livy/rsc/livy-rsc-0.4.0-SNAPSHOT.jar,hdfs:///livy/rsc/netty-all-4.0.29.Final.jar
livy.rsc.rpc.server.address = 192.168.42.200
livy.server.recovery.state-store.url = worker001.lama.nuc:2181,worker002.lama.nuc:2181,worker003.lama.nuc:2181
livy.server.recovery.state-store = zookeeper
livy.spark.deploy-mode = cluster
livy.spark.master = yarn

What am I missing?

Regards
Jens
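PS: in case it helps anyone reproduce this, here is the same session-creation request as a small Python script instead of Postman. This is only a sketch: I am assuming Livy's default server port 8998, and "controller.lama.nuc" is simply the host I run Livy on, adjust both for your setup.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint: Livy's REST API listens on port 8998 by default;
# controller.lama.nuc is my Livy host (adjust as needed).
LIVY_URL = "http://controller.lama.nuc:8998"

# The exact body I send from Postman.
payload = {"kind": "pyspark", "proxyUser": "spark"}

def create_session(base_url=LIVY_URL):
    """POST /sessions to ask Livy for a new interactive pyspark session."""
    req = Request(
        base_url + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Livy answers with the new session's metadata (id, state, ...) as JSON.
    with urlopen(req) as resp:
        return json.load(resp)
```

Polling GET /sessions/{id} afterwards shows the session going from "starting" to "idle", which matches what I see in the Spark UI before the interpreter dies.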