When I replaced the TextDelimitedIndexedDatasetReader declaration with
TDIndexedDatasetReader, I no longer get the NoSuchMethodError and the
process continues.
But now I run into another error. Any idea what is going on here? Is it
because I am trying to print the contents of the Matrix at the end and I
should process them differently? The full log is below, with a sketch of
what I suspect the fix is after it.
2014-09-26 13:36:30,605 DEBUG Logging$class - Task 5's epoch is 3
2014-09-26 13:36:30,605 DEBUG Logging$class - Fetching outputs for shuffle
2, reduce 0
2014-09-26 13:36:30,605 DEBUG Logging$class - Fetching map output location
for shuffle 2, reduce 0 took 0 ms
2014-09-26 13:36:30,605 INFO Logging$class - maxBytesInFlight: 50331648,
targetRequestSize: 10066329
2014-09-26 13:36:30,605 INFO Logging$class - Getting 1 non-empty blocks
out of 1 blocks
2014-09-26 13:36:30,605 INFO Logging$class - Started 0 remote fetches in 0
ms
2014-09-26 13:36:30,606 DEBUG Logging$class - Got local block shuffle_2_0_0
2014-09-26 13:36:30,606 DEBUG Logging$class - Got local blocks in 1 ms ms
2014-09-26 13:36:30,662 ERROR Logging$class - Exception in task ID 5
java.io.NotSerializableException: org.apache.mahout.math.DenseVector
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-09-26 13:36:30,667 DEBUG Logging$class - parentName: , name:
TaskSet_4, runningTasks: 0
2014-09-26 13:36:30,669 WARN Logging$class - Lost TID 5 (task 4.0:0)
2014-09-26 13:36:30,671 ERROR Logging$class - Task 4.0:0 had a not
serializable result: java.io.NotSerializableException:
org.apache.mahout.math.DenseVector; not retrying
2014-09-26 13:36:30,672 INFO Logging$class - Removed TaskSet 4.0, whose
tasks have all completed, from pool
2014-09-26 13:36:30,676 INFO Logging$class - Cancelling stage 4
2014-09-26 13:36:30,680 DEBUG Logging$class - After removal of stage 5,
remaining stages = 1
2014-09-26 13:36:30,680 INFO Logging$class - Failed to run reduce at
SparkEngine.scala:72
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.mahout.math.DenseVector
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-09-26 13:36:30,681 DEBUG Logging$class - Removing running stage 4
2014-09-26 13:36:30,682 DEBUG Logging$class - Removing pending status for
stage 4
2014-09-26 13:36:30,682 DEBUG Logging$class - After removal of stage 4,
remaining stages = 0
2014-09-26 13:36:30,683 DEBUG FileSystem - Starting clear of FileSystem
cache with 1 elements.
2014-09-26 13:36:30,684 DEBUG FileSystem - Removing filesystem for file:///
2014-09-26 13:36:30,684 DEBUG FileSystem - Removing filesystem for file:///
2014-09-26 13:36:30,684 DEBUG FileSystem - Done clearing cache
2014-09-26 13:36:30,685 DEBUG Logging$class - Shutdown hook called
Disconnected from the target VM, address: '127.0.0.1:53897', transport:
'socket'
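
Looking at the trace, the failing write is Spark's default JavaSerializer
(JavaSerializationStream above) serializing the task result, and
org.apache.mahout.math.DenseVector is not java.io.Serializable. So I
suspect the context needs Kryo configured with Mahout's registrator. A
minimal sketch of what I think the setup should look like; the registrator
class name is my guess from the sparkbindings module, so treat it as an
assumption:

import org.apache.spark.{SparkConf, SparkContext}

// Serialize task results with Kryo so Mahout vectors survive the trip
// back to the driver. The registrator class name is an assumption.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("cooccurrence-test")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator",
    "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
val sc = new SparkContext(conf)

With that in place, collecting and printing the matrix at the end should no
longer go through Java serialization.
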
On Fri, Sep 12, 2014 at 7:05 PM, Pat Ferrel <[email protected]> wrote:
> True but a bit daunting to get started.
>
> Here is a translation to Scala.
> https://gist.github.com/pferrel/9cfee8b5723bb2e2a22c
>
> It uses the MahoutDriver and IndexedDataset and is compiled in
> org.apache.mahout.examples, a package I created, so you’ll need to add the
> right imports if you use it somewhere else. As a bonus it uses Spark's
> parallel writing to part files, and you can add command line parsing quite
> easily.
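>
> Roughly, the read step looks like this (a sketch, not the gist itself: the
> Schema key name and reader constructor are assumptions, and an implicit
> Mahout distributed context is assumed in scope; readElementsFrom is the
> signature from your NoSuchMethodError):
>
> import com.google.common.collect.HashBiMap
> import org.apache.mahout.drivers.{Schema, TextDelimitedIndexedDatasetReader}
>
> // Read "user,item" lines into an IndexedDataset, starting with no
> // pre-existing row IDs. The "delim" key name is a guess.
> val readSchema = new Schema("delim" -> ",")
> val reader = new TextDelimitedIndexedDatasetReader(readSchema)
> val views = reader.readElementsFrom("article_views.txt",
>   HashBiMap.create[String, Int]())
> // views.matrix feeds the cooccurrence analysis; writing the result as an
> // RDD of text lines is what produces the part-0000N files below.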
>
> article_views.txt:
> pat,article1
> pat,article2
> pat,article3
> frank,article3
> frank,article4
> joe-bob,article10
> joe-bob,article11
>
> indicators/part-00000
> article2 article1:3.819085009768877 article3:1.046496287529096
> article3 article2:1.046496287529096 article4:1.046496287529096
> article1:1.046496287529096
> article11 article10:3.819085009768877
> article4 article3:1.046496287529096
> article10 article11:3.819085009768877
> article1 article2:3.819085009768877 article3:1.046496287529096
>
> The search using frank’s history will return article2, article3 (filtered
> out), article4 (filtered out), and article1, as you’d expect.
>
> Oh, and I was wrong about the bug; it works from the current repo.
>
> You still need to get the right jars on the classpath when running from
> the command line.
>
> On Sep 12, 2014, at 9:04 AM, Peter Wolf <[email protected]> wrote:
>
> I'm new here, but I just wanted to add that Scala is extremely cool. I've
> moved to Scala wherever possible in my work. It's really nice, and well
> worth the effort to learn. Scala has put the joy back into programming.
>
> Instead of trying to call Scala from Java, perhaps you might enjoy writing
> your stuff in Scala.
>
> On Fri, Sep 12, 2014 at 11:53 AM, Pat Ferrel <[email protected]>
> wrote:
>
> > #1 I’m glad to see someone using this. I haven’t tried calling Scala from
> > Java and would expect a fair amount of difficulty with it. Scala generates
> > synthetic classes and methods to implement its features (anonymous
> > functions, traits, implicits), and you have to guess at what those will
> > look like from Java. Maybe you could try the Scala community.
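> >
> > For example, here is what a Scala default argument turns into for a Java
> > caller (an illustrative sketch, not the actual Mahout source):
> >
> > import com.google.common.collect.{BiMap, HashBiMap}
> >
> > trait ElementReader {
> >   // Scala-side signature; the second parameter has a default value.
> >   def readElementsFrom(
> >       source: String,
> >       existingRowIDs: BiMap[String, Int] = HashBiMap.create()): String =
> >     source
> > }
> >
> > // scalac compiles this to the real method plus a synthetic
> > // readElementsFrom$default$2() that produces the default, so a Java
> > // caller has to invoke both and pass every argument explicitly. Jars
> > // built against a different Scala or Mahout version then fail at
> > // runtime with a NoSuchMethodError like the one below.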
> >
> > IntelliJ will auto-convert Java to Scala when you paste it into a .scala
> > file. For some reason yours doesn’t seem to work, but I’ve seen it work
> > pretty well.
> >
> > I started to convert your code and it pointed out a bug in mine, a bad
> > value in the default schema. I’d be interested in helping with this as a
> > way to work out the kinks in creating drivers.
> >
> > Are you interested in this, or are you set on using Java? Either way I’ll
> > post a gist of your code using the MahoutDriver as the template and
> > converted to Scala. It’ll take me a few minutes.
> >
> > On Sep 12, 2014, at 6:46 AM, Frank Scholten <[email protected]>
> > wrote:
> >
> > Hi all,
> >
> > I am trying out the new spark-itemsimilarity code, but I am new to Scala
> > and have a hard time calling certain methods from Java.
> >
> > Here is a Gist with a Java main that runs the cooccurrence analysis:
> >
> > https://gist.github.com/frankscholten/d373c575ad721dd0204e
> >
> > When I run this I get an exception:
> >
> > Exception in thread "main" java.lang.NoSuchMethodError:
> > org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;
> >
> > What do I have to do here to use the Scala readers from Java?
> >
> > Cheers,
> >
> > Frank
> >
> >
>
>