PPS: The shell/Spark tutorial I mentioned is actually being developed in MAHOUT-1542. As it stands, I believe its core is now complete.
On Wed, May 14, 2014 at 5:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> PS: a Spark shell with all the proper imports is also supported natively in
> Mahout (mahout spark-shell command). See M-1489 for specifics. There's also
> a tutorial somewhere, but I suspect it has not yet been finished/published
> via a public link. Again, you need trunk to use the Spark shell there.
>
>
> On Wed, May 14, 2014 at 12:43 AM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>
>> Hi Xiangrui,
>> Thanks for the response. I tried a few ways to include the mahout-math jar
>> while launching the Spark shell, but with no success. Can you please point
>> out what I am doing wrong?
>>
>> 1. Exported mahout-math.jar in CLASSPATH and PATH.
>> 2. Tried launching the Spark shell with: MASTER=spark://<HOSTNAME>:<PORT>
>> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar
>> spark-0.9.0/bin/spark-shell
>>
>> After launching, I checked the environment details in the web UI. It looks
>> like the mahout-math jar is included:
>> spark.jars /home/hduser/installations/work-space/mahout-math-0.7.jar
>>
>> Then I try:
>> scala> import org.apache.mahout.math.VectorWritable
>> <console>:10: error: object mahout is not a member of package org.apache
>> import org.apache.mahout.math.VectorWritable
>>
>> scala> val raw = sc.sequenceFile(path, classOf[Text],
>> classOf[VectorWritable])
>> <console>:12: error: not found: type Text
>> val data =
>> sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000",
>> classOf[Text], classOf[VectorWritable])
>>
>> ^
>> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>>
>> Thanks
>> Stuti
>>
>>
>>
>> -----Original Message-----
>> From: Xiangrui Meng [mailto:men...@gmail.com]
>> Sent: Wednesday, May 14, 2014 11:56 AM
>> To: user@spark.apache.org
>> Subject: Re: How to use Mahout VectorWritable in Spark.
>>
>> You need
>>
>> > val raw = sc.sequenceFile(path, classOf[Text],
>> > classOf[VectorWritable])
>>
>> to load the data.
>> After that, you can do
>>
>> > val data = raw.values.map(_.get)
>>
>> to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar`
>> when you launch spark-shell to include mahout-math.
>>
>> Best,
>> Xiangrui
>>
>> On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com>
>> wrote:
>> > Hi All,
>> >
>> > I am very new to Spark and trying to play around with MLlib, hence
>> > apologies for the basic question.
>> >
>> > I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
>> > compare performance. The initial data size was 10 GB. Mahout converts
>> > the data into a sequence file <Text,VectorWritable>, which is used for
>> > KMeans clustering. The sequence file created was ~6 GB in size.
>> >
>> > Now I want to know whether the Mahout sequence file can be used in
>> > Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
>> > may be used here. Hence I tried to read my sequence file as below, but
>> > I am getting this error:
>> >
>> > Command on Spark shell:
>> >
>> > scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>> >
>> > <console>:12: error: not found: type VectorWritable
>> >
>> > val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>> >
>> > Here I have 2 questions:
>> >
>> > 1. Mahout has “Text” as the key, but Spark prints “not found: type Text”,
>> > hence I changed it to String. Is this correct?
>> >
>> > 2. How will VectorWritable be found in Spark? Do I need to include the
>> > Mahout jar in the classpath, or is there another option?
>> >
>> > Please suggest.
>> >
>> > Regards,
>> > Stuti Awasthi
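
For reference, the suggestions in this thread can be put together as a minimal sketch. It assumes a spark-shell session in which `sc` is already defined and the mahout-math jar is actually on the shell's classpath; the `path` value is a hypothetical placeholder. The "not found: type Text" error above is consistent with a missing import of Hadoop's `Text` class, so the sketch imports it explicitly. Copying each vector into a plain array is an addition that is not part of the original replies: Hadoop record readers reuse Writable instances, so caching the raw objects directly can give misleading results.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// Assumes the mahout-math jar is on the shell's classpath.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

// Hypothetical path; substitute the location of your own sequence file.
val path = "/path/to/KMeans_dataset_seq/part-r-00000"

// Keys are Hadoop Text and values are Mahout VectorWritable, as written by Mahout.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Hadoop reuses Writable instances between records, so copy each vector
// into a fresh Array[Double] before caching or collecting.
val data = raw.map { case (_, vw) =>
  val v = vw.get()
  Array.tabulate(v.size())(i => v.get(i))
}

data.cache()
println(data.count())
```

As far as I know, MLlib's `KMeans.train` in Spark 0.9 accepts an `RDD[Array[Double]]` like `data` above, while later releases expect MLlib `Vector` values, so check the API of the version you are running before passing the RDD on.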