You need

> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

> val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar`
when you launch spark-shell to include mahout-math.
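
Putting the steps above together, a spark-shell session might look like the
sketch below. It is not a tested recipe. It assumes the shell was started
with the mahout-math jar on the classpath and that an MLlib version with
`org.apache.spark.mllib.linalg.Vectors` is in use. The conversion to MLlib
vectors and the KMeans parameters (k = 10, 20 iterations) are illustrative
assumptions, not part of the original reply.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// Assumes mahout-math is on the classpath so VectorWritable can be resolved.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Path taken from the question; adjust it to your dataset.
val path = "/KMeans_dataset_seq/part-r-00000"

// Keys are Hadoop Text, values are Mahout VectorWritable.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Hadoop record readers reuse Writable objects, so copy each Mahout vector
// into a fresh dense array before caching. Dense copies can be large for
// very sparse data.
val points = raw.values.map { vw =>
  val v = vw.get
  Vectors.dense(Array.tabulate(v.size)(i => v.get(i)))
}.cache()

// k = 10 and maxIterations = 20 are placeholder values.
val model = KMeans.train(points, 10, 20)
```

The two imports are what resolve the "not found: type" errors for Text and
VectorWritable. The key class has to stay Text, because that is what Mahout
wrote into the file.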

Best,
Xiangrui

On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I am very new to Spark and am trying out MLlib, so apologies for the
> basic question.
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
> compare performance. The initial data size was 10 GB. Mahout converts the
> data into a Sequence File <Text,VectorWritable>, which is used for KMeans
> clustering. The Sequence File created was ~6 GB in size.
>
> I would like to know whether the Mahout Sequence File can be used as
> input to KMeans in Spark MLlib. I have read that SparkContext.sequenceFile
> may be used here, so I tried to read my sequence file as below, but I am
> getting an error:
>
> Command on Spark Shell:
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> <console>:12: error: not found: type VectorWritable
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> Here I have 2 questions:
>
> 1. Mahout has “Text” as the key, but Spark prints “not found: type Text”,
> so I changed it to String. Is this correct?
>
> 2. How will VectorWritable be found in Spark? Do I need to include the
> Mahout jar in the classpath, or is there another option?
>
> Please suggest.
>
> Regards,
> Stuti Awasthi