Mihai, good morning,
With 0.10 (code name Samsara) you have two choices: (1) the embedded API -- nest it in your Java/Scala application; or (2) algebra/Scala scripting with the spark-shell [1] -- a more R-like experience. With the embedded API you have to write your own application, set up the proper Mahout dependencies and imports, and take care of context (session) creation [2,3] -- this probably takes more time to get going, but it is how I use it. Also, the Scala IDEA plugin is more useful there to guide you through the syntax, whereas in the shell you'd have to do some additional handwaving to make IDEA as useful. Up to you. Use the head of the 0.10.x branch + Spark 1.2.x (h2o is broken there, but I assume you don't care about h2o).

After you are done with this boilerplate, you are ready to code:

(1) Since you are currently trying to use DRM, I assume you have it persisted somewhere. Mahout's DRM persistence format is compatible throughout -- it is the native persistence format for the DrmLike type in Samsara as well. So first we load it [4].
(2) Then we ask dssvd to compute what we need, with the parameters we need [5].
(3) Then we save any required product back to DFS wherever we need it [6].

See the dependency and pipeline sketches after the references below.

Please do not let the number of references discourage you; I am just trying to be exhaustively helpful. The bottom line is that the experience should be no more complicated than the R experience (and in some aspects may even exceed it).

[1] http://mahout.apache.org/users/sparkbindings/play-with-shell.html
[2] create context: http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfd
[3] distributed imports: http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfd
[4] loading from DFS: http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfe -- this is a bit dated; the name has changed to drmDfsRead
[5] dssvd invocation: http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pf17
[6] saving back to DFS: http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pff -- this has also changed, I think to `dfsWrite`, to be consistent with conventions.
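Since the embedded-API route means managing the Mahout/Spark dependencies yourself, here is a minimal sbt sketch of what that setup might look like. The artifact ids and version numbers are my assumptions -- verify them against Maven Central for your environment:

    // hypothetical build.sbt fragment -- artifact ids/versions are assumptions
    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // Samsara in-core algebra + distributed decompositions (dssvd lives here)
      "org.apache.mahout" % "mahout-math-scala_2.10" % "0.10.1",
      // Spark bindings for the distributed DSL
      "org.apache.mahout" % "mahout-spark_2.10" % "0.10.1",
      // Spark itself -- 1.2.x per the recommendation above
      "org.apache.spark" % "spark-core_2.10" % "1.2.2"
    )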
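And to make steps (1)-(3) concrete, a minimal end-to-end sketch with the embedded API. In the spark-shell the imports and the implicit context are already prepared for you, so only the load/dssvd/write lines apply there. The paths, master url, Int row-key cast, and dssvd parameters are illustrative assumptions, not prescriptions:

    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.math.decompositions._
    import org.apache.mahout.sparkbindings._

    // context (session) creation [2,3]; master url and app name are assumptions
    implicit val ctx = mahoutSparkContext(masterUrl = "local[4]", appName = "dssvd-example")

    // (1) load the persisted DRM [4]; assuming Int row keys, as with your
    // IntWritable-keyed sequence file
    val drmA = drmDfsRead("/path/to/input/drm").asInstanceOf[CheckpointedDrm[Int]]

    // (2) dssvd [5]: rank k, oversampling p, power iterations q
    val (drmU, drmV, s) = dssvd(drmA, k = 100, p = 15, q = 1)

    // (3) persist whichever products you need back to DFS [6];
    // s (the singular values) comes back as a small in-core vector
    drmU.dfsWrite("/path/to/output/U")
    drmV.dfsWrite("/path/to/output/V")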
On Wed, Jun 10, 2015 at 11:46 PM, Mihai Dascalu <[email protected]> wrote:

> Ok, you convinced me :) But can you please help me with an example or some
> documentation? I only found fragments of code (and only Scala, not Java
> Spark).
>
> How should I create the input matrix, invoke dssvd, as well as configure
> processors/memory?
>
> Also, are there some specific dependencies of versions? Should I wait for
> the next release?
>
> Thanks a lot and have a great day!
> Mihai
>
> On Jun 10, 2015, at 23:57, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > Hadoop has its own guava. This is some dependency clash at runtime, for
> > sure. Other than that no idea. MR is being phased out. Why don't u try
> > spark version in upcoming .10.2?
> > On Jun 10, 2015 12:58 PM, "Mihai Dascalu" <[email protected]> wrote:
> >
> >> Hi!
> >>
> >> After upgrading to Mahout 0.10.1, I have a runtime exception in the
> >> following Hadoop code in which I create the input matrix for performing
> >> SSVD:
> >>
> >> // prepare output matrix
> >> 81: final Configuration conf = new Configuration();
> >>
> >> 83: SequenceFile.Writer writer =
> >>         SequenceFile.createWriter(conf,
> >>             Writer.file(new Path(path + "/" + outputFileName)),
> >>             Writer.keyClass(IntWritable.class),
> >>             Writer.valueClass(VectorWritable.class));
> >>
> >> while in the console we have:
> >> ...
> >> [Loaded org.apache.hadoop.util.StringInterner from
> >> file:/Users/mihaidascalu/Dropbox%20(Personal)/Workspace/Eclipse/ReaderBenchDev/lib/Mahout/mahout-mr-0.10.1-job.jar]
> >> ...
> >> java.lang.VerifyError: (class: com/google/common/collect/Interners,
> >> method: newWeakInterner signature: ()Lcom/google/common/collect/Interner;)
> >> Incompatible argument to function
> >>         at org.apache.hadoop.util.StringInterner.<clinit>(StringInterner.java:48)
> >>         at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2293)
> >>         at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
> >>         at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
> >>         at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
> >>         at org.apache.hadoop.io.SequenceFile.getDefaultCompressionType(SequenceFile.java:234)
> >>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:264)
> >>         at services.semanticModels.LSA.CreateInputMatrix.parseCorpus(CreateInputMatrix.java:83)
> >>         at services.semanticModels.LSA.CreateInputMatrix.main(CreateInputMatrix.java:197)
> >>
> >> Any suggestions? I tried adding guava-14.0.1.jar as a dependency, but it
> >> did not fix it.
> >>
> >> Thanks and have a great day!
> >> Mihai
> >
