Hi Ian,

On Mon, Apr 14, 2014 at 11:30 AM, Ian Bonnycastle <ibo...@gmail.com> wrote:
>     val sc = new SparkContext("spark://<masternodeip>:7077",
>                               "Simple App", "/usr/local/pkg/spark",
>              List("target/scala-2.10/simple-project_2.10-1.0.jar"))

Hmmm... does /usr/local/pkg/spark exist on all the worker nodes? (I
haven't particularly tried using the sparkHome argument myself, nor
have I traced through the code to see exactly what it does, but...).
I'd try setting the "sparkHome" argument to null and seeing if that
helps. (It has been working for me without it.) Since you're already
listing your app's jar file there, you don't need to explicitly call
addJar().
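
For what it's worth, this is roughly what I mean (just a sketch -- the
master URL and jar path are the ones from your snippet, and null simply
means "don't set sparkHome"):

    val sc = new SparkContext("spark://<masternodeip>:7077",
                              "Simple App",
                              null,
                              List("target/scala-2.10/simple-project_2.10-1.0.jar"))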

Note that the class that isn't being found is not a Spark class, it's
a class from your app (SimpleApp$$anonfun$3). That's most probably the
class that implements the closure you're passing as an argument to the
reduceByKey() method. Although I can't really explain why the same
isn't happening for the closure you're passing to map()...
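
To illustrate, here's a rough sketch of the usual wordcount shape -- I'm
only guessing at your structure, and the hdfs:// paths are placeholders,
but it shows where those anonymous classes come from:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // needed for reduceByKey on (K, V) RDDs

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://<masternodeip>:7077", "Simple App",
                                  null,
                                  List("target/scala-2.10/simple-project_2.10-1.0.jar"))
        val counts = sc.textFile("hdfs://<namenode>/input.txt")    // placeholder path
          .flatMap(line => line.split(" "))   // each of these lambdas compiles to
          .map(word => (word, 1))             // its own SimpleApp$$anonfun$N class,
          .reduceByKey(_ + _)                 // and all of them must reach the workers
        counts.saveAsTextFile("hdfs://<namenode>/counts")          // placeholder path
      }
    }

Whichever of those classes happened to be numbered $$anonfun$3 is the
one the executors can't load.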

Sorry I can't be more helpful.

> I still get the error, though, with ClassNotFoundException, unless I'm not
> understanding how to run the sc.addJar. I find it a little weird, too, that
> the Spark platform has trouble finding its own code. And why is it only
> occurring with the reduceByKey function? I have no problems with any
> other code running except for that. (BTW, I don't use <masternodeip> in my
> code above... I just removed it for security purposes.)
>
> Thanks,
>
> Ian
>
>
>
> On Mon, Apr 14, 2014 at 12:45 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>>
>> Hi Ian,
>>
>> When you run your packaged application, are you adding its jar file to
>> the SparkContext (by calling the addJar() method)?
>>
>> That will distribute the code to all the worker nodes. The failure
>> you're seeing seems to indicate the worker nodes do not have access to
>> your code.
>>
>> On Mon, Apr 14, 2014 at 9:17 AM, Ian Bonnycastle <ibo...@gmail.com> wrote:
>> > Good afternoon,
>> >
>> > I'm attempting to get the wordcount example working, and I keep getting
>> > an
>> > error in the "reduceByKey(_ + _)" call. I've scoured the mailing lists,
>> > and
>> > haven't been able to find a surefire solution, unless I'm missing
>> > something
>> > big. I did find something close, but it didn't appear to work in my
>> > case.
>> > The error is:
>> >
>> > org.apache.spark.SparkException: Job aborted: Task 2.0:3 failed 4 times
>> > (most recent failure: Exception failure:
>> > java.lang.ClassNotFoundException:
>> > SimpleApp$$anonfun$3)
>> >         at
>> >
>> > org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>
>> --
>> Marcelo
>
>



-- 
Marcelo
