RDDOperationScope is in the spark-core_2.1x jar file:

    7148 Mon Feb 29 09:21:32 PST 2016 org/apache/spark/rdd/RDDOperationScope.class
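As a quick sanity check before digging into the classpath, you can confirm that a given jar actually contains the compiled class by scanning its entries (a jar is just a zip archive). This is only a sketch; the CDH jar path in the comment is a guess at the parcel layout and should be adjusted to wherever your installation puts the Spark core jar:

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully-qualified name."""
    # org.apache.spark.rdd.RDDOperationScope -> org/apache/spark/rdd/RDDOperationScope.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Hypothetical CDH location -- adjust for your parcel version:
# jar_contains_class(
#     "/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.5.0-cdh5.5.1.jar",
#     "org.apache.spark.rdd.RDDOperationScope")
```

Note that "the jar contains the class" and "the jar is on the driver/executor classpath" are separate questions; a `NoClassDefFoundError` for `RDDOperationScope$` with the jar present can also mean the class failed static initialization (e.g. a Jackson version conflict), which is worth ruling out.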
Can you check whether the spark-core jar is in the classpath? FYI

On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:

> Hi Jules, folks,
>
> I have tried "hdfs://<HDFS filepath>" as well as "file://<local Linux filepath>", and several variants. Every time, I get the same msg - NoClassDefFoundError. See below. Why do I get such a msg if the problem is simply that Spark cannot find the text file? Doesn't the error msg indicate some other source of the problem?
>
> I may be missing something in the error report; I am a Java person, not a Python programmer. But doesn't it look like a call to a Java class - something associated with "o9.textFile" - is failing? If so, how do I fix this?
>
> Ron
>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>
> Ronald C. Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
> Richland, WA 99352
> phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
> web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048
>
> From: Jules Damji [mailto:dmat...@comcast.net]
> Sent: Sunday, February 28, 2016 10:07 PM
> To: Taylor, Ronald C
> Cc: user@spark.apache.org; ronald.taylo...@gmail.com
> Subject: Re: a basic question on first use of PySpark shell and example, which is failing
>
> Hello Ronald,
>
> Since you have placed the file under HDFS, you might want to change the path name to:
>
>     val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:
>
> Hello folks,
>
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at our lab. I am trying to use the PySpark shell for the first time, and am attempting to duplicate the documentation example of creating an RDD, which I called "lines", using a text file.
>
> I placed a text file called Warehouse.java in this HDFS location:
>
>     [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
>     -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
>     [rtaylor@bigdatann ~]$
>
> I then invoked sc.textFile() in the PySpark shell. That did not work. See below. Apparently a class is not found? Don't know why that would be the case. Any guidance would be very much appreciated.
>
> The Cloudera Manager for the cluster says that Spark is operating in the "green", for whatever that is worth.
> - Ron Taylor
>
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
>     at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
>     at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
> >>>
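A side note on the path suggestion in the thread, separate from the classpath question: a URI of the form "hdfs://user/taylor/..." makes HDFS treat "user" as the namenode host, because the two slashes introduce an authority component; with three slashes ("hdfs:///user/...") the authority is empty and the cluster's default namenode (fs.defaultFS) is used. The standard-library urlparse shows the difference:

```python
from urllib.parse import urlparse

# Two slashes: the first path component becomes the authority (namenode host).
# Three slashes: empty authority, so the default namenode is used and the
# full /user/... path survives intact.
for uri in ("hdfs://user/rtaylor/Spark/Warehouse.java",
            "hdfs:///user/rtaylor/Spark/Warehouse.java"):
    parts = urlparse(uri)
    print(f"{uri!r}: authority={parts.netloc!r} path={parts.path!r}")
```

So, assuming the file really lives at /user/rtaylor/Spark/Warehouse.java as the `hadoop fs -ls` output shows, the PySpark call (Python, not the Scala `val` form quoted above) would presumably be `lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")` once the underlying NoClassDefFoundError is resolved.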