Re: how to implement ALS with csv file? getting error while calling Rating class

2016-03-08 Thread Nick Pentreath
As I mentioned, that *train* method returns the user and item factor
RDDs rather than an ALSModel instance, so you first need to construct the
model yourself. This is exactly why it's marked as *DeveloperApi*: it is
not user-friendly and not strictly part of the ML pipeline approach.

If you really want to use it, this should work:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.ml.recommendation.ALS.Rating
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
val sc = new SparkContext(conf)
val sql = new SQLContext(sc)
// Name,Value1,Value2.
val rdd = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))
val als = new ALS()
// Note: the training params here have not been synced up with the
// params of the ALS instance above.
val (userFactors, itemFactors) = ALS.train(rdd)

// Convert the factor RDDs to DataFrames with the column names ALSModel expects.
import sql.implicits._
val userDF = userFactors.toDF("id", "features")
val itemDF = itemFactors.toDF("id", "features")
val model = new ALSModel(als.uid, als.getRank, userDF, itemDF)
  .setParent(als)
  .setUserCol("user")
  .setItemCol("item")

val pred = model.transform(rdd.toDF("user", "item", "rating"))
pred.show()  // show() prints the rows itself and returns Unit


Note that you will need to be careful to sync up parameters between the
ALS.train call, the ALS instance, and the ALSModel. Note also that ml.ALS
only supports *transform* (which makes predictions for a set of user and
item columns in a DataFrame), and doesn't yet support the other predict
methods available in mllib.ALS.
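One way to keep the parameters in sync is to configure the ALS instance once and read the values back from it when calling ALS.train. The following is only a sketch, assuming the Spark 1.6-era DeveloperApi signature of ALS.train (named parameters rank, maxIter, regParam, implicitPrefs, alpha, nonnegative); the app name and the parameter values are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating

// Hypothetical app name; in spark-shell, reuse the existing `sc` instead.
val conf = new SparkConf().setAppName("ALSParamSync").setMaster("local[2]")
val sc = new SparkContext(conf)

val ratings = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))

// Configure the estimator once; these values are illustrative only.
val als = new ALS().setRank(5).setMaxIter(5).setRegParam(0.1)

// Pass the estimator's params explicitly so that train() does not
// silently fall back to its own defaults.
val (userFactors, itemFactors) = ALS.train(
  ratings,
  rank = als.getRank,
  maxIter = als.getMaxIter,
  regParam = als.getRegParam,
  implicitPrefs = als.getImplicitPrefs,
  alpha = als.getAlpha,
  nonnegative = als.getNonnegative)
```

With this arrangement, als.getRank is also the right value to hand to the ALSModel constructor, since the factor vectors were trained with that rank.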


On Mon, 7 Mar 2016 at 21:25 Shishir Anshuman wrote:

> Hello Nick,
>
> I used *ml* instead of *mllib* for ALS and Rating. But now it gives me an
> error while using *predict()* from
> *org.apache.spark.mllib.recommendation.MatrixFactorizationModel*.
>
> I have attached the code and the error screenshot.
>
> Thank you.
>
> On Mon, Mar 7, 2016 at 12:40 PM, Nick Pentreath wrote:
>
>> As you've pointed out, Rating requires user and item ids in Int form. So
>> you will need to map String user ids to integers.
>>
>> See this thread for example:
>> https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
>> .
>>
>> There is a DeveloperApi method
>> in org.apache.spark.ml.recommendation.ALS that takes Rating with generic
>> type (can be String) for user id and item id. However that is a little more
>> involved, and for larger scale data will be a lot less efficient.
>>
>> Something like this for example:
>>
>> import org.apache.spark.ml.recommendation.ALS
>> import org.apache.spark.ml.recommendation.ALS.Rating
>>
>> val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
>> val sc = new SparkContext(conf)
>> // Name,Value1,Value2.
>> val rdd = sc.parallelize(Seq(
>>   Rating[String]("foo", "1", 4.0f),
>>   Rating[String]("foo", "2", 2.0f),
>>   Rating[String]("bar", "1", 5.0f),
>>   Rating[String]("bar", "3", 1.0f)
>> ))
>> val (userFactors, itemFactors) = ALS.train(rdd)
>>
>>
>> As you can see, you just get the factor RDDs back, and if you want an
>> ALSModel you will have to construct it yourself.
>>
>>
>> On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman wrote:
>>
>>> I am new to Apache Spark, and I want to implement the Alternating Least
>>> Squares algorithm. The data set is stored in a CSV file in the format:
>>> *Name,Value1,Value2*.
>>>
>>> When I read the CSV file, I get a
>>> *java.lang.NumberFormatException.forInputString* error because the
>>> Rating class needs the parameters in the format: *(user: Int, product:
>>> Int, rating: Double)* and the first column of my file contains *Name*.
>>>
>>> Please suggest a way to overcome this issue.
>>>
>>
>


Re: how to implement ALS with csv file? getting error while calling Rating class

2016-03-07 Thread Kevin Mellott
If you are using DataFrames, then you can also specify the schema when
loading, as an alternative solution. I've found Spark-CSV to be a very
useful library when working with CSV data.

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader
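As a rough sketch of what specifying the schema at load time can look like: the example below assumes Spark 1.x with the spark-csv package on the classpath (data source name "com.databricks.spark.csv"). The column names, the temporary sample file, and the app name are all made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Hypothetical app name; in spark-shell, reuse the existing `sc` instead.
val sc = new SparkContext(
  new SparkConf().setAppName("CsvWithSchema").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

// Write a small sample file in the Name,Value1,Value2 layout (illustrative data).
val path = Files.createTempFile("ratings", ".csv")
Files.write(path, "foo,1,4.0\nfoo,2,2.0\nbar,1,5.0\nbar,3,1.0\n".getBytes("UTF-8"))

// Declare the column types up front so the name column stays a String
// and is never parsed as a number.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("item", IntegerType),
  StructField("rating", DoubleType)))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .schema(schema)
  .load(path.toString)

df.printSchema()
```

The name column still has to be mapped to integer ids before it can be used with the Int-based Rating class; the explicit schema only avoids the parse error while loading.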


On Mon, Mar 7, 2016 at 1:10 AM, Nick Pentreath wrote:

> As you've pointed out, Rating requires user and item ids in Int form. So
> you will need to map String user ids to integers.
>
> See this thread for example:
> https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
> .
>
> There is a DeveloperApi method
> in org.apache.spark.ml.recommendation.ALS that takes Rating with generic
> type (can be String) for user id and item id. However that is a little more
> involved, and for larger scale data will be a lot less efficient.
>
> Something like this for example:
>
> import org.apache.spark.ml.recommendation.ALS
> import org.apache.spark.ml.recommendation.ALS.Rating
>
> val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
> val sc = new SparkContext(conf)
> // Name,Value1,Value2.
> val rdd = sc.parallelize(Seq(
>   Rating[String]("foo", "1", 4.0f),
>   Rating[String]("foo", "2", 2.0f),
>   Rating[String]("bar", "1", 5.0f),
>   Rating[String]("bar", "3", 1.0f)
> ))
> val (userFactors, itemFactors) = ALS.train(rdd)
>
>
> As you can see, you just get the factor RDDs back, and if you want an
> ALSModel you will have to construct it yourself.
>
>
> On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman wrote:
>
>> I am new to Apache Spark, and I want to implement the Alternating Least
>> Squares algorithm. The data set is stored in a CSV file in the format:
>> *Name,Value1,Value2*.
>>
>> When I read the CSV file, I get a
>> *java.lang.NumberFormatException.forInputString* error because the
>> Rating class needs the parameters in the format: *(user: Int, product:
>> Int, rating: Double)* and the first column of my file contains *Name*.
>>
>> Please suggest a way to overcome this issue.
>>
>


Re: how to implement ALS with csv file? getting error while calling Rating class

2016-03-06 Thread Nick Pentreath
As you've pointed out, Rating requires user and item ids in Int form. So
you will need to map String user ids to integers.

See this thread for example:
https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
.
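A minimal sketch of the String-to-Int mapping, assuming the set of distinct names is small enough to collect to the driver and broadcast (for very large id sets a join against the id RDD would be the alternative). The sample rows, the app name, and the training parameters (rank 5, 5 iterations, lambda 0.01) are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical app name; in spark-shell, reuse the existing `sc` instead.
val conf = new SparkConf().setAppName("ALSStringIdMapping").setMaster("local[2]")
val sc = new SparkContext(conf)

// Illustrative rows in the Name,Value1,Value2 layout from the question.
val lines = sc.parallelize(Seq("foo,1,4.0", "foo,2,2.0", "bar,1,5.0", "bar,3,1.0"))
val parsed = lines.map(_.split(",")).map(a => (a(0), a(1).toInt, a(2).toDouble))

// Give every distinct name an integer id and collect the lookup table.
val nameToId: Map[String, Int] = parsed.map(_._1).distinct()
  .zipWithIndex()          // RDD[(String, Long)]
  .mapValues(_.toInt)
  .collectAsMap()
  .toMap
val nameToIdBc = sc.broadcast(nameToId)

// Build mllib Ratings with the Int ids that the Rating class requires.
val ratings = parsed.map { case (name, item, rating) =>
  Rating(nameToIdBc.value(name), item, rating)
}

val model = ALS.train(ratings, 5, 5, 0.01)

// Look the name up again when asking for a prediction.
val prediction = model.predict(nameToId("foo"), 1)
```

Keeping nameToId around (or its inverse) is what lets the integer ids in the model output be translated back to the original names.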

There is a DeveloperApi method
in org.apache.spark.ml.recommendation.ALS that takes Rating with generic
type (can be String) for user id and item id. However that is a little more
involved, and for larger scale data will be a lot less efficient.

Something like this for example:

import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating

val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
val sc = new SparkContext(conf)
// Name,Value1,Value2.
val rdd = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))
val (userFactors, itemFactors) = ALS.train(rdd)


As you can see, you just get the factor RDDs back, and if you want an
ALSModel you will have to construct it yourself.
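For intuition on what the factors are good for without an ALSModel: in a matrix factorization model the predicted rating for a (user, item) pair is the dot product of the two factor vectors. The sketch below uses plain Scala collections and made-up factor values rather than real training output; with the actual RDDs you would first collect or join them.

```scala
// Hypothetical factor vectors keyed by id, standing in for the contents
// of the userFactors and itemFactors RDDs.
val userFactorMap: Map[String, Array[Float]] = Map(
  "foo" -> Array(1.0f, 2.0f),
  "bar" -> Array(0.5f, 0.0f))
val itemFactorMap: Map[String, Array[Float]] = Map(
  "1" -> Array(2.0f, 1.0f),
  "2" -> Array(0.0f, 1.0f))

// Predicted rating = dot product of the user and item factor vectors.
// Returns None when either id was not seen during training.
def predict(user: String, item: String): Option[Float] =
  for {
    u <- userFactorMap.get(user)
    v <- itemFactorMap.get(item)
  } yield u.zip(v).map { case (a, b) => a * b }.sum

println(predict("foo", "1"))  // Some(4.0)
println(predict("baz", "1"))  // None
```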


On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman wrote:

> I am new to Apache Spark, and I want to implement the Alternating Least
> Squares algorithm. The data set is stored in a CSV file in the format:
> *Name,Value1,Value2*.
>
> When I read the CSV file, I get a
> *java.lang.NumberFormatException.forInputString* error because the Rating
> class needs the parameters in the format: *(user: Int, product: Int,
> rating: Double)* and the first column of my file contains *Name*.
>
> Please suggest a way to overcome this issue.
>


how to implement ALS with csv file? getting error while calling Rating class

2016-03-06 Thread Shishir Anshuman
I am new to Apache Spark, and I want to implement the Alternating Least
Squares algorithm. The data set is stored in a CSV file in the format:
*Name,Value1,Value2*.

When I read the CSV file, I get a
*java.lang.NumberFormatException.forInputString* error because the Rating
class needs the parameters in the format: *(user: Int, product: Int,
rating: Double)* and the first column of my file contains *Name*.

Please suggest a way to overcome this issue.