Dear all,

We are using Zeppelin, and I have added
export PYTHONPATH=/home/clash/sparks/spark-1.6.1-bin-hadoop12/python
to zeppelin-env.sh.
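
For completeness, here is roughly what I assume the relevant zeppelin-env.sh
entries should look like (the py4j zip under python/lib must also be on
PYTHONPATH; the 0.9 version below is my assumption for the 1.6.1
distribution):

export SPARK_HOME=/home/clash/sparks/spark-1.6.1-bin-hadoop12
# the py4j source zip is needed by pyspark to talk to the JVM
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip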
But each time I want to use pyspark, for example with this program:

%pyspark
from pyspark import SparkContext
# sc below is the SparkContext that Zeppelin pre-creates for the interpreter
logFile = "hiv.data"
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

It runs fine the first time, but when I run it a second time I get this
error:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-4018989172273347075.py", line 238, in <module>
    sc.setJobGroup(jobGroup, "Zeppelin")
  File "/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/pyspark/context.py", line 876, in setJobGroup
    self._jsc.setJobGroup(groupId, description, interruptOnCancel)
AttributeError: 'NoneType' object has no attribute 'setJobGroup'

I need to rm /tmp/zeppelin_pyspark-4018989172273347075.py and restart
Zeppelin to make it work again.
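
From the traceback it looks like the variable sc is still defined, but the
JVM-side context behind it has gone away, so self._jsc is None. A quick check
along these lines (just a sketch; _jsc is a private attribute of
SparkContext) should confirm whether that is what happens on the second run:

%pyspark
# sanity check (sketch): _jsc is the JVM-side object backing sc;
# the AttributeError above implies it has become None
print sc._jsc is None

If this prints True on the second run, the context was stopped between runs,
which would suggest the temp file itself is not the real problem.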
Does anyone have an idea why this happens?

Thanks
