Adding the user list back for the second question.
Thanks a lot. That worked.

> Now I just have one more doubt. In the ALS algorithm, I want to store the
> generated model in a CSV file. model.save(path) saves the model in Parquet
> format.
>
> And when I try to save the MatrixFactorizationModel as a text file, I get
> this error:
>
>     value saveAsTextFile is not a member of
>     org.apache.spark.mllib.recommendation.MatrixFactorizationModel
>
> Please suggest a way to store the recommendation model in a text file.
>
> On Mon, Mar 21, 2016 at 12:19 PM, Akhil Das <[email protected]> wrote:
>
>> This [Ljava.lang.String;@153371c3 is an array of strings (the column
>> values from file1), and similarly the next one is also an array of
>> strings.
>>
>> You can access column1 this way:
>>
>>     val column1 = joined.map(data => data._2._1(0))
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Mar 21, 2016 at 12:15 PM, Shishir Anshuman
>> <[email protected]> wrote:
>>
>>> I did as you said. After joining the two RDDs, when I print the data,
>>> I get the following output:
>>>
>>>     (key-data,([Ljava.lang.String;@153371c3,[Ljava.lang.String;@7a3e7586))
>>>
>>> Now how do I proceed with the replacement?
>>>
>>> On Mon, Mar 21, 2016 at 11:47 AM, Akhil Das <[email protected]> wrote:
>>>
>>>> What you should be doing is a join, something like this:
>>>>
>>>>     // Create a key-value pair, the key being column1
>>>>     val rdd1 = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")))
>>>>
>>>>     // Create a key-value pair, the key being column2
>>>>     val rdd2 = sc.textFile(file2).map(x => (x.split(",")(1), x.split(",")))
>>>>
>>>>     // Now join the datasets
>>>>     val joined = rdd1.join(rdd2)
>>>>
>>>>     // Now do the replacement
>>>>     val replaced = joined.map(...)
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Mar 21, 2016 at 10:31 AM, Shishir Anshuman
>>>> <[email protected]> wrote:
>>>>
>>>>> I have stored the contents of two CSV files in separate RDDs.
>>>>>
>>>>> file1.csv format: (column1, column2, column3)
>>>>> file2.csv format: (column1, column2)
>>>>>
>>>>> column1 of file1 and column2 of file2 contain similar data. I want to
>>>>> compare the two columns and, if a match is found:
>>>>>
>>>>> - Replace the data at column1 (file1) with column1 (file2)
>>>>>
>>>>> For this reason, I am not using a normal RDD.
>>>>>
>>>>> I am still new to Apache Spark, so any suggestion will be greatly
>>>>> appreciated.
>>>>>
>>>>> On Mon, Mar 21, 2016 at 10:09 AM, Prem Sure <[email protected]> wrote:
>>>>>
>>>>>> Any specific reason you would like to use collectAsMap only? You
>>>>>> could probably move to a normal RDD instead of a pair RDD.
>>>>>>
>>>>>> On Monday, March 21, 2016, Mark Hamstra <[email protected]> wrote:
>>>>>>
>>>>>>> You're not getting what Ted is telling you. Your `dict` is an
>>>>>>> RDD[String] -- i.e. it is a collection of a single value type,
>>>>>>> String. But `collectAsMap` is only defined for PairRDDs, which have
>>>>>>> key-value pairs as their data elements. Both a key and a value are
>>>>>>> needed to collect into a Map[K, V].
>>>>>>>
>>>>>>> On Sun, Mar 20, 2016 at 8:19 PM, Shishir Anshuman
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Yes, I have included that class in my code. I guess it's something
>>>>>>>> to do with the RDD format; I'm not able to figure out the exact
>>>>>>>> reason.
>>>>>>>> On Fri, Mar 18, 2016 at 9:27 AM, Ted Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It is defined in:
>>>>>>>>> core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
>>>>>>>>>
>>>>>>>>> On Thu, Mar 17, 2016 at 8:55 PM, Shishir Anshuman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I am using the following code snippet in Scala:
>>>>>>>>>>
>>>>>>>>>>     val dict: RDD[String] = sc.textFile("path/to/csv/file")
>>>>>>>>>>     val dict_broadcast = sc.broadcast(dict.collectAsMap())
>>>>>>>>>>
>>>>>>>>>> On compiling, it generates this error:
>>>>>>>>>>
>>>>>>>>>>     scala:42: value collectAsMap is not a member of org.apache.spark.rdd.RDD[String]
>>>>>>>>>>     val dict_broadcast=sc.broadcast(dict.collectAsMap())
>>>>>>>>>>                                          ^
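For anyone who lands on this thread with the same compile error: collectAsMap comes from PairRDDFunctions, so it only becomes available once the RDD's elements are key-value tuples, exactly as Mark and Ted describe above. A minimal sketch of that fix, assuming sc is the spark-shell SparkContext and assuming the first CSV column is the lookup key and the second is the value (the column positions are an assumption, not something stated in the thread):

    import org.apache.spark.rdd.RDD

    val dict: RDD[String] = sc.textFile("path/to/csv/file")

    // Turn each line into a (key, value) tuple; the RDD is now a pair RDD,
    // so collectAsMap (from PairRDDFunctions) resolves.
    val dictMap = dict.map { line =>
      val cols = line.split(",")
      (cols(0), cols(1))   // assumed: column1 is the key, column2 the value
    }.collectAsMap()

    val dict_broadcast = sc.broadcast(dictMap)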

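Two of the thread's remaining open ends can also be sketched, with the caveat that both sketches are illustrative rather than anything confirmed in the messages above.

First, the replacement step that Akhil left as joined.map(...): after rdd1.join(rdd2), each element is (key, (columnsFromFile1, columnsFromFile2)), so the file1 row can be rewritten with file2's matching value. This assumes every line splits cleanly on commas and that column1 of file1 / column2 of file2 really is the join key:

    // Rebuild each file1 line, overwriting its column1 with column1 of the
    // matching file2 line, then emit it as a CSV string again.
    val replaced = joined.map { case (_, (file1Cols, file2Cols)) =>
      val updated = file1Cols.clone()   // don't mutate the joined arrays in place
      updated(0) = file2Cols(0)
      updated.mkString(",")
    }

Second, on storing the ALS model as text rather than Parquet: saveAsTextFile is an RDD method, not a model method, but MatrixFactorizationModel exposes its user and product factors as RDDs, so one option is to write those out directly. A sketch, with the helper name and output paths made up for illustration (saveModelAsCsv, userPath and productPath are hypothetical, not Spark API):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // Dump each factor RDD as "id,f1,f2,...,fk" lines; together with the
    // rank, this is essentially everything the model holds.
    def saveModelAsCsv(model: MatrixFactorizationModel,
                       userPath: String,        // hypothetical output directory
                       productPath: String): Unit = {
      model.userFeatures
        .map { case (id, features) => s"$id,${features.mkString(",")}" }
        .saveAsTextFile(userPath)
      model.productFeatures
        .map { case (id, features) => s"$id,${features.mkString(",")}" }
        .saveAsTextFile(productPath)
    }

A model written this way can later be rebuilt by re-parsing the two outputs into RDD[(Int, Array[Double])] and calling the public MatrixFactorizationModel constructor with the original rank, though the built-in model.save/load pair remains the simpler round trip.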