Hello folks,
I am a newbie running Spark on a small Cloudera CDH 5.5.1 cluster at
our lab. I am trying to use the PySpark shell for the first time, and am
attempting to duplicate the documentation example of creating an RDD, which I
called "lines", from a text file.
I placed a text file called Warehouse.java in this HDFS location:
[rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
-rw-r--r-- 3 rtaylor supergroup 1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
[rtaylor@bigdatann ~]$
I then invoked sc.textFile() in the PySpark shell. That did not work; see below.
Apparently a class is not found? I don't know why that would be the case. Any
guidance would be very much appreciated.
The Cloudera Manager for the cluster says that Spark is operating in the
"green", for whatever that is worth.
- Ron Taylor
>>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
>>>
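
One detail worth checking separately from the class error: the hadoop fs -ls
listing shows the file under /user/rtaylor/Spark on HDFS, while the call above
uses a file:// (local filesystem) URI under /user/taylor. Below is a minimal
sketch of a call that matches the listing. It assumes the cluster's default
filesystem is HDFS, so that an hdfs:/// URI with no host resolves to the
configured namenode. The names HDFS_PATH, with_scheme and load_lines are
hypothetical helpers made up for illustration, not part of PySpark. This may
well be unrelated to the NoClassDefFoundError, which the trace shows is raised
inside SparkContext.withScope.

```python
# Path copied from the `hadoop fs -ls` listing in this post.
HDFS_PATH = "/user/rtaylor/Spark/Warehouse.java"


def with_scheme(path, scheme="hdfs"):
    """Prefix an absolute path with a URI scheme, e.g. hdfs:///user/...

    Hypothetical helper; it only builds a string and does not touch the cluster.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: %r" % path)
    return "%s://%s" % (scheme, path)


def load_lines(sc, path=HDFS_PATH):
    """Create an RDD of lines from a text file.

    `sc` is the SparkContext that the PySpark shell provides.
    Assumption: the default filesystem is HDFS, so no namenode host is given.
    """
    return sc.textFile(with_scheme(path))
```

In the PySpark shell this would be used as lines = load_lines(sc), which is
the same as calling sc.textFile() directly with the hdfs:/// form of the path.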