The Spark version is included via a Maven classifier; the sbt line should be

libraryDependencies += "org.apache.mahout" % "mahout-spark_2.11" % "0.13.1-SNAPSHOT" classifier "spark_2.1"

On Tue, Oct 3, 2017 at 2:55 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I’m the aforementioned pferrel
>
> @Hoa, thanks for that reference, I forgot I had that example. First, don’t
> use the Hadoop part of Mahout; it is not supported and will be deprecated.
> The Spark version of cooccurrence will be supported. You will find it in the
> SimilarityAnalysis object.
>
> If you go back to the last release you should be able to make
> https://github.com/pferrel/3-input-cooc work with version updates to
> Mahout-0.13.0 and dependencies. To use the latest master of Mahout, there
> are the problems listed below.
>
> I’m having a hard time building with sbt using the mahout-spark module
> when I build the latest Mahout master with `mvn clean install`. This puts
> the mahout-spark module in the local ~/.m2 maven cache. The structure
> doesn’t match what SBT expects the path and filenames to be.
>
> The build.sbt `libraryDependencies` line *should* IMO be:
> `"org.apache.mahout" %% "mahout-spark-2.1" % "0.13.1-SNAPSHOT"`
>
> This is parsed by sbt to yield the path of:
> org/apache/mahout/mahout-spark-2.1/0.13.1-SNAPSHOT/mahout-spark-2.1_2.11-0.13.1-SNAPSHOT.jar
>
> Unfortunately, the outcome of `mvn clean install` currently is (I think):
> org/apache/mahout/mahout-spark/0.13.1-SNAPSHOT/mahout-spark-0.13.1-SNAPSHOT-spark_2.1.jar
>
> I can’t find a way to make SBT parse that structure and name.
>
>
> On Oct 2, 2017, at 11:02 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
>
> Code pointer:
> https://github.com/rawkintrevo/cylons/tree/master/eigenfaces
>
> However, I build Mahout (0.13.1-SNAPSHOT) locally with
>
> mvn clean install -Pscala-2.11,spark-2.1,viennacl-omp -DskipTests
>
> That's how maven was able to pick those up.
>
>
> On Fri, Sep 22, 2017 at 10:06 PM, Hoa Nguyen <h...@insightdatascience.com> wrote:
>
> > Hey all,
> >
> > Thanks for the offers of help. I've been able to narrow down some of the
> > problems to version incompatibility and I just wanted to give an update.
> > Just to backtrack a bit, my initial goal was to run Mahout on a
> > distributed cluster, whether that was running Hadoop Map Reduce or Spark.
> >
> > I started out trying to get it to run on Spark, with which I have some
> > familiarity, but that didn't seem to work. While the error messages seem to
> > indicate there weren't enough resources on the workers ("WARN
> > scheduler.TaskSchedulerImpl: Initial job has not accepted any resources;
> > check your cluster UI to ensure that workers are registered and have
> > sufficient memory"), I'm pretty sure that wasn't the case, not only because
> > it's a 4-node cluster of m4.xlarges, but also because I was able to run
> > another, simpler Spark batch job on that same distributed cluster.
> >
> > After a bit of wrangling, I was able to narrow down some of the issues. It
> > turns out I was kind of blindly using this repo
> > https://github.com/pferrel/3-input-cooc as a guide without fully realizing
> > that it was from several years ago and based on Mahout 0.10.0, Scala 2.10
> > and Spark 1.1.1. That is significantly different from my environment, which
> > has Mahout 0.13.0 and Spark 2.1.1 installed, which also means I have to use
> > Scala 2.11. After modifying the build.sbt file to account for those
> > versions, I now have compile-time type mismatch issues that I'm just not
> > savvy enough to fix (see attached screenshot if you're interested).
> >
> > Anyway, the good news is that I was finally able to get Mahout code running
> > on Hadoop map-reduce, but also only after a bit of wrangling. It turned out
> > my instances were running Ubuntu 14 and apparently that doesn't play well
> > with Hadoop 2.7.4, which prevented me from running any sample Mahout code
> > (from here: https://github.com/apache/mahout/tree/master/examples/bin) that
> > relied on map-reduce. Those problems went away after I installed Hadoop
> > 2.8.1 instead. Now I'm able to get the shell scripts running on a
> > distributed Hadoop cluster (yay!).
> >
> > Anyway, if anyone has more recent and working Spark Scala code that uses
> > Mahout that they can point me to, I'd appreciate it.
> >
> > Many thanks!
> > Hoa
> >
> > On Fri, Sep 22, 2017 at 1:09 AM, Trevor Grant <trevor.d.gr...@gmail.com>
> > wrote:
> >
> >> Hi Hoa,
> >>
> >> A few things could be happening here; I haven't run across that specific
> >> error.
> >>
> >> 1) Spark 2.x - Mahout 0.13.0: Mahout 0.13.0 WILL run on Spark 2.x, however
> >> you need to build from source (not the binaries). You can do this by
> >> downloading the Mahout source or cloning the repo and building with:
> >> mvn clean install -Pspark-2.1,scala-2.11 -DskipTests
> >>
> >> 2) Have you set up Spark with Kryo serialization? How you do this depends
> >> on whether you're in the shell/Zeppelin or using spark-submit.
> >>
> >> However, for both of these cases, it shouldn't have even run locally afaik,
> >> so the fact it did tells me you have probably gotten this far?
> >>
> >> Assuming you've done 1 and 2, can you share some code? I'll see if I can
> >> recreate on my end.
> >>
> >> Thanks!
> >>
> >> tg
> >>
> >> On Thu, Sep 21, 2017 at 9:37 PM, Hoa Nguyen <h...@insightdatascience.com>
> >> wrote:
> >>
> >>> I apologize in advance if this is too much of a newbie question but I'm
> >>> having a hard time running any Mahout example code in a distributed Spark
> >>> cluster. The code runs as advertised when Spark is running locally on one
> >>> machine but the minute I point Spark to a cluster and master URL, I can't
> >>> get it to work, drawing the error: "WARN scheduler.TaskSchedulerImpl:
> >>> Initial job has not accepted any resources; check your cluster UI to
> >>> ensure that workers are registered and have sufficient memory"
> >>>
> >>> I know my Spark cluster is configured and working correctly because I ran
> >>> non-Mahout code and it runs on a distributed cluster fine. What am I doing
> >>> wrong? The only thing I can think of is that my Spark version is too
> >>> recent -- 2.1.1 -- for the Mahout version I'm using -- 0.13.0. Is that it
> >>> or am I doing something else wrong?
> >>>
> >>> Thanks for any advice,
> >>> Hoa
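
For reference, the classifier-based dependency line at the top of this message could sit in a build.sbt roughly as follows. This is only a minimal sketch, not a verified build: it assumes Mahout 0.13.1-SNAPSHOT was installed into the local ~/.m2 cache with `mvn clean install` as described in the thread, and the Scala patch version and the "provided" scope on Spark are illustrative assumptions that do not come from the thread.

```scala
// build.sbt: a sketch under the assumptions stated above, not a tested build.

// Assumed patch release; the thread only says Scala 2.11.
scalaVersion := "2.11.11"

// Let sbt resolve artifacts from the local Maven repository (~/.m2/repository),
// which is where `mvn clean install` puts the locally built Mahout snapshot.
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  // Spark 2.1.1 is the version mentioned in the thread; "provided" assumes the
  // cluster (for example via spark-submit) supplies Spark at runtime.
  "org.apache.spark" %% "spark-core" % "2.1.1" % "provided",

  // A single % is used because the Scala suffix (_2.11) is spelled out in the
  // artifact name; the classifier picks the jar built for Spark 2.1.
  // Coordinates are copied from the reply above and not independently checked.
  "org.apache.mahout" % "mahout-spark_2.11" % "0.13.1-SNAPSHOT" classifier "spark_2.1"
)
```

Using a plain `%` with an explicit classifier, rather than `%%`, keeps sbt from appending its own Scala suffix, which is the naming mismatch described earlier in the thread.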