Hello Ronald,

Since you have placed the file under HDFS, you might want to change the path name to:
val lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")

(Note the triple slash — with "hdfs://user/..." the "user" segment would be read as the namenode host — and that the HDFS listing below shows the directory is /user/rtaylor, not /user/taylor.)

Sent from my iPhone
Pardon the dumb thumb typos :)

> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:
>
> Hello folks,
>
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at
> our lab. I am trying to use the PySpark shell for the first time, and am
> attempting to duplicate the documentation example of creating an RDD, which
> I called "lines", using a text file.
>
> I placed a text file called Warehouse.java in this HDFS location:
>
> [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
> -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
> [rtaylor@bigdatann ~]$
>
> I then invoked sc.textFile() in the PySpark shell. That did not work. See
> below. Apparently a class is not found? I don't know why that would be the
> case. Any guidance would be very much appreciated.
>
> The Cloudera Manager for the cluster says that Spark is operating in the
> "green", for whatever that is worth.
>
> - Ron Taylor
>
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
>     at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
>     at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
> >>>
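The path confusion in the thread above can be illustrated without a cluster. In a Hadoop-style URI, the segment immediately after "//" is the authority (the namenode host:port), not part of the path, which is why "hdfs://user/taylor/..." does not point at /user/taylor. A minimal sketch using Python's standard urllib.parse shows the same authority/path split:

```python
from urllib.parse import urlparse

# With a double slash, "user" is parsed as the authority (host),
# so the path loses its /user prefix.
u = urlparse("hdfs://user/taylor/Spark/Warehouse.java")
print(u.netloc)  # 'user' -- would be treated as the namenode host
print(u.path)    # '/taylor/Spark/Warehouse.java'

# With a triple slash, the authority is empty (Hadoop then falls back
# to the default filesystem from core-site.xml) and the full path survives.
u = urlparse("hdfs:///user/rtaylor/Spark/Warehouse.java")
print(u.netloc)  # ''
print(u.path)    # '/user/rtaylor/Spark/Warehouse.java'
```

(The NoClassDefFoundError itself is a separate issue — a class failed to initialize on the driver — but getting the URI right is the first step before re-testing.)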