Hello Ronald,

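Since you have placed the file in HDFS, you might want to change the path to 
use the hdfs scheme instead of file, and to match the directory in your 
listing (/user/rtaylor, not /user/taylor). Note the three slashes: with only 
two, "user" would be read as the NameNode host rather than as part of the path.

lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")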
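
To see why the exact spelling of the URI matters, here is a small sketch in 
plain Python (no Spark needed). It uses the standard library's urllib.parse 
as a stand-in for how a Hadoop-style path splits into scheme, authority, and 
path. split_uri is a hypothetical helper name, and Hadoop's own parser may 
differ in details, so treat this as an illustration only.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (scheme, authority, path) for a URI string."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


# Three spellings of the same intended file location.
candidates = [
    "file:///user/taylor/Spark/Warehouse.java",    # local filesystem, not HDFS
    "hdfs://user/taylor/Spark/Warehouse.java",     # "user" becomes the authority
    "hdfs:///user/rtaylor/Spark/Warehouse.java",   # empty authority, full path
]

for uri in candidates:
    scheme, authority, path = split_uri(uri)
    print("scheme=%r authority=%r path=%r" % (scheme, authority, path))
```

With an empty authority, the path should be resolved against the cluster's 
configured default filesystem, assuming that default points at your HDFS 
NameNode.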


> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:
> 
> 
> Hello folks,
> 
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at 
> our lab. I am trying to use the PySpark shell for the first time, and am 
> attempting to duplicate the documentation example of creating an RDD, which 
> I called "lines", using a text file.
> 
> I placed a text file called Warehouse.java in this HDFS location:
> 
> [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
> -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 
> /user/rtaylor/Spark/Warehouse.java
> [rtaylor@bigdatann ~]$ 
> 
> I then invoked sc.textFile() in the PySpark shell. That did not work. See 
> below. Apparently a class is not found? Don't know why that would be the 
> case. Any guidance would be very much appreciated.
> 
> The Cloudera Manager for the cluster says that Spark is operating in the 
> "green", for whatever that is worth.
> 
>  - Ron Taylor
> 
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
>     at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
>     at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
> 
> >>> 
