You need

    val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

    val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when you
launch spark-shell to include mahout-math.

Best,
Xiangrui

On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I am very new to Spark and trying to play around with MLlib, hence apologies
> for the basic question.
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to see
> the performance. The initial data size was 10 GB. Mahout converts the data
> into a Sequence File <Text,VectorWritable>, which is used for KMeans
> clustering. The Sequence File created was ~6 GB in size.
>
> Now I wanted to know if I can use the Mahout Sequence File in Spark MLlib
> for KMeans. I have read that SparkContext.sequenceFile may be used here.
> Hence I tried to read my sequence file as below, but I am getting an error:
>
> Command on Spark Shell:
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> <console>:12: error: not found: type VectorWritable
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> Here I have 2 questions:
>
> 1. Mahout has “Text” as the key, but Spark prints “not found: type:Text”,
>    hence I changed it to String. Is this correct?
> 2. How will VectorWritable be found in Spark? Do I need to include the
>    Mahout jar in the classpath, or is there any other option?
>
> Please suggest.
>
> Regards,
> Stuti Awasthi
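
For reference, the steps from the reply can be put together as one spark-shell session. This is only a sketch under several assumptions that are not in the thread: Spark 1.0 or later (where MLlib's KMeans takes an RDD of `org.apache.spark.mllib.linalg.Vector`), the shell started with the mahout-math jar on the classpath, and vectors small enough to copy into dense arrays. The values of `k` and `maxIterations` are placeholders.

```scala
// Run inside spark-shell; `sc` is the SparkContext the shell provides.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Path taken from the question.
val path = "/KMeans_dataset_seq/part-r-00000"

// Keys are Hadoop Text, values are Mahout VectorWritable.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Copy each Mahout vector into a fresh MLlib dense vector.
// Copying also avoids problems from Hadoop reusing the same
// Writable instance across records when the RDD is cached.
val points = raw.values.map { vw =>
  val v = vw.get
  Vectors.dense(Array.tabulate(v.size)(i => v.get(i)))
}.cache()

// Placeholder settings (assumptions, not from the thread).
val k = 10
val maxIterations = 20
val model = KMeans.train(points, k, maxIterations)
```

If the Mahout vectors are very sparse, building MLlib sparse vectors instead of dense ones would likely use much less memory; the dense copy is used here only to keep the sketch short.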