Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-08 Thread bsikander
Qiao, Richard wrote:
> Comparing #1 and #3, my understanding of “submitted” is “the jar is
> submitted to executors”. With this concept, you may define your own
> status.

In SparkLauncher, SUBMITTED means that the Driver was able to acquire cores from the Spark cluster and the Launcher is waiting for …

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-08 Thread bsikander
Qiao, Richard wrote:
> For your question of example, the answer is yes.

Perfect. I am assuming that this is true for Spark standalone/YARN/Mesos.
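
Since the thread is about polling an application's WAITING/RUNNING state programmatically, here is a minimal sketch of one way to do it from Python against a standalone master. Note this is not the SparkLauncher approach discussed above (SparkLauncher is a Java API); it polls the master web UI's JSON endpoint instead. The master address, the /json path, and the field names (activeapps, completedapps, id, state) are assumptions to verify against your Spark version:

    # Minimal sketch: poll the standalone master's JSON endpoint until
    # the application leaves the WAITING state.
    import json
    import time
    from urllib.request import urlopen

    MASTER_UI = "http://master:8080/json/"  # assumption: default UI port

    def app_state(app_id):
        data = json.load(urlopen(MASTER_UI))
        apps = data.get("activeapps", []) + data.get("completedapps", [])
        for app in apps:
            if app["id"] == app_id:
                return app["state"]  # e.g. WAITING, RUNNING, FINISHED
        return "UNKNOWN"

    while app_state("app-20171208120000-0001") == "WAITING":
        time.sleep(5)

On YARN or Mesos the same idea applies against their respective REST APIs; within a running application, SparkContext's status tracker covers per-job state instead.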

Best way of shipping self-contained pyspark jobs with 3rd-party dependencies

2017-12-08 Thread Sergey Zhemzhitsky
Hi PySparkers,

What is currently the best way of shipping self-contained PySpark jobs with 3rd-party dependencies? There are some open JIRA issues [1], [2], corresponding PRs [3], [4], and articles [5], [6], [7] on setting up the Python environment with conda and virtualenv …
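
For the simplest case (pure-Python dependencies), one common approach is to vendor the packages into a zip and ship it with --py-files. A minimal sketch, with hypothetical names (deps.zip, job.py, and requests as the example dependency):

    # Build and submit (shell steps shown as comments):
    #   pip install -t deps/ requests
    #   (cd deps && zip -r ../deps.zip .)
    #   spark-submit --py-files deps.zip job.py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("self-contained-job").getOrCreate()
    sc = spark.sparkContext

    def use_dependency(x):
        # Import inside the function so resolution happens on the
        # executors, where --py-files has put deps.zip on sys.path.
        import requests  # assumption: vendored into deps.zip
        return x + len(requests.__name__)

    print(sc.parallelize(range(4)).map(use_dependency).collect())
    spark.stop()

This breaks down for packages with native extensions (numpy, pandas, etc.), which is the gap the conda/virtualenv JIRAs and PRs cited above aim to close by shipping a whole interpreter environment to the executors.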

UDF issues with spark

2017-12-08 Thread Afshin, Bardia
Using the pyspark CLI on Spark 2.1.1, I'm getting out-of-memory errors when running a UDF over a record set with a count of 10, mapping each record to the same value (arbitrary, for testing purposes). This is on Amazon EMR, release label 5.6.0, with the following hardware specs: m4.4xlarge, 32 vCPU, 64 GiB …
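
The message doesn't include the code, but a minimal stand-in for the described setup could look like the sketch below. The UDF, row count, and memoryOverhead value are all hypothetical; spark.yarn.executor.memoryOverhead is the Spark 2.1-era setting for the off-heap room that Python UDF workers need on YARN/EMR, and raising it is a common first step for PySpark out-of-memory errors:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("udf-oom-repro")
             # Assumption: give the Python workers, which live outside
             # the executor JVM heap, more off-heap headroom.
             .config("spark.yarn.executor.memoryOverhead", "4096")
             .getOrCreate())

    # Hypothetical stand-in for the reported case: map every record to
    # the same constant value through a Python UDF.
    to_constant = udf(lambda _: "same-value", StringType())

    df = spark.range(10000000).withColumn("mapped", to_constant("id"))
    print(df.count())
    spark.stop()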