Adding the user list back for the second question.
Thanks a lot. That worked.

> Now I just have one more doubt. In the ALS algorithm, I want to store the
> generated model in a CSV file. model.save(path) saves the model in Parquet
> format.
>
> And when I try to save the MatrixFactorizationModel as a text file, I get
> this error:
>
>     value saveAsTextFile is not a member of
>     org.apache.spark.mllib.recommendation.MatrixFactorizationModel
>
> Please suggest a way to store the recommendation model in a text file.
>
> On Mon, Mar 21, 2016 at 12:19 PM, Akhil Das <[email protected]> wrote:
>
>> This [Ljava.lang.String;@153371c3 is an array of strings (the column
>> values from file1), and similarly the next one is also an array of
>> strings.
>>
>> You can access column1 this way:
>>
>>     val column1 = joined.map(data => data._2._1(0))
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Mar 21, 2016 at 12:15 PM, Shishir Anshuman
>> <[email protected]> wrote:
>>
>>> I did as you said. After joining the two RDDs, when I print the data,
>>> I get the following output:
>>>
>>>     (key-data,([Ljava.lang.String;@153371c3,[Ljava.lang.String;@7a3e7586))
>>>
>>> Now how do I proceed with the replacement?
>>>
>>> On Mon, Mar 21, 2016 at 11:47 AM, Akhil Das <[email protected]> wrote:
>>>
>>>> What you should be doing is a join, something like this:
>>>>
>>>>     // Create a key-value pair, the key being column1
>>>>     val rdd1 = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")))
>>>>
>>>>     // Create a key-value pair, the key being column2
>>>>     val rdd2 = sc.textFile(file2).map(x => (x.split(",")(1), x.split(",")))
>>>>
>>>>     // Now join the datasets
>>>>     val joined = rdd1.join(rdd2)
>>>>
>>>>     // Now do the replacement
>>>>     val replaced = joined.map(...)
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Mar 21, 2016 at 10:31 AM, Shishir Anshuman
>>>> <[email protected]> wrote:
>>>>
>>>>> I have stored the contents of two CSV files in separate RDDs.
>>>>>
>>>>> file1.csv format: (column1, column2, column3)
>>>>> file2.csv format: (column1, column2)
>>>>>
>>>>> column1 of file1 and column2 of file2 contain similar data. I want to
>>>>> compare the two columns and, if a match is found:
>>>>>
>>>>> - Replace the data at column1 (file1) with column1 (file2)
>>>>>
>>>>> For this reason, I am not using a normal RDD.
>>>>>
>>>>> I am still new to Apache Spark, so any suggestion will be greatly
>>>>> appreciated.
>>>>>
>>>>> On Mon, Mar 21, 2016 at 10:09 AM, Prem Sure <[email protected]> wrote:
>>>>>
>>>>>> Any specific reason you would like to use collectAsMap only? You
>>>>>> could probably move to a normal RDD instead of a pair RDD.
>>>>>>
>>>>>> On Monday, March 21, 2016, Mark Hamstra <[email protected]> wrote:
>>>>>>
>>>>>>> You're not getting what Ted is telling you. Your `dict` is an
>>>>>>> RDD[String] -- i.e. it is a collection of a single value type,
>>>>>>> String. But `collectAsMap` is only defined for PairRDDs, which have
>>>>>>> key-value pairs as their data elements. Both a key and a value are
>>>>>>> needed to collect into a Map[K, V].
>>>>>>>
>>>>>>> On Sun, Mar 20, 2016 at 8:19 PM, Shishir Anshuman
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Yes, I have included that class in my code. I guess it's something
>>>>>>>> to do with the RDD format; I'm not able to figure out the exact
>>>>>>>> reason.
>>>>>>>> On Fri, Mar 18, 2016 at 9:27 AM, Ted Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It is defined in:
>>>>>>>>> core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
>>>>>>>>>
>>>>>>>>> On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I am using the following code snippet in Scala:
>>>>>>>>>>
>>>>>>>>>>     val dict: RDD[String] = sc.textFile("path/to/csv/file")
>>>>>>>>>>     val dict_broadcast = sc.broadcast(dict.collectAsMap())
>>>>>>>>>>
>>>>>>>>>> On compiling, it generates this error:
>>>>>>>>>>
>>>>>>>>>>     scala:42: value collectAsMap is not a member of org.apache.spark.rdd.RDD[String]
>>>>>>>>>>     val dict_broadcast=sc.broadcast(dict.collectAsMap())
>>>>>>>>>>                                          ^
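For anyone who lands on this thread with the same compile error: collectAsMap comes from PairRDDFunctions, so it only becomes available once the RDD's elements are key-value tuples, exactly as Mark and Ted describe above. A minimal sketch of that fix, assuming sc is the spark-shell SparkContext and assuming the first CSV column is the lookup key and the second is the value (the column positions are an assumption, not something stated in the thread):

    import org.apache.spark.rdd.RDD

    val dict: RDD[String] = sc.textFile("path/to/csv/file")

    // Turn each line into a (key, value) tuple; the RDD is now a pair RDD,
    // so collectAsMap (from PairRDDFunctions) resolves.
    val dictMap = dict.map { line =>
      val cols = line.split(",")
      (cols(0), cols(1))   // assumed: column1 is the key, column2 the value
    }.collectAsMap()

    val dict_broadcast = sc.broadcast(dictMap)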

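Two of the thread's remaining open ends can also be sketched, with the caveat that both sketches are illustrative rather than anything confirmed in the messages above.

First, the replacement step that Akhil left as joined.map(...): after rdd1.join(rdd2), each element is (key, (columnsFromFile1, columnsFromFile2)), so the file1 row can be rewritten with file2's matching value. This assumes every line splits cleanly on commas and that column1 of file1 / column2 of file2 really is the join key:

    // Rebuild each file1 line, overwriting its column1 with column1 of the
    // matching file2 line, then emit it as a CSV string again.
    val replaced = joined.map { case (_, (file1Cols, file2Cols)) =>
      val updated = file1Cols.clone()   // don't mutate the joined arrays in place
      updated(0) = file2Cols(0)
      updated.mkString(",")
    }

Second, on storing the ALS model as text rather than Parquet: saveAsTextFile is an RDD method, not a model method, but MatrixFactorizationModel exposes its user and product factors as RDDs, so one option is to write those out directly. A sketch, with the helper name and output paths made up for illustration (saveModelAsCsv, userPath and productPath are hypothetical, not Spark API):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // Dump each factor RDD as "id,f1,f2,...,fk" lines; together with the
    // rank, this is essentially everything the model holds.
    def saveModelAsCsv(model: MatrixFactorizationModel,
                       userPath: String,        // hypothetical output directory
                       productPath: String): Unit = {
      model.userFeatures
        .map { case (id, features) => s"$id,${features.mkString(",")}" }
        .saveAsTextFile(userPath)
      model.productFeatures
        .map { case (id, features) => s"$id,${features.mkString(",")}" }
        .saveAsTextFile(productPath)
    }

A model written this way can later be rebuilt by re-parsing the two outputs into RDD[(Int, Array[Double])] and calling the public MatrixFactorizationModel constructor with the original rank, though the built-in model.save/load pair remains the simpler round trip.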