Please remember to copy the user list next time. I might not be able
to respond quickly. There are many others who can help or who can
benefit from the discussion. Thanks! -Xiangrui

On Tue, Mar 17, 2015 at 12:04 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
> Great, Xiangrui. It works now.
>
> Sorry that I needed to bug you :)
>
> Jay
>
>
> On Mar 17, 2015, at 11:48 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Please check this section in the user guide:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection
>>
>> You need `import sqlContext.implicits._` to use `toDF()`.
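>>
>> For reference, a minimal sketch of what that import enables, assuming a
>> SQLContext named `sqlContext` built from the active SparkContext (the `sc`
>> and `purchase` names follow the snippets later in this thread):
>>
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>> import sqlContext.implicits._  // brings toDF() into scope for RDDs of tuples and case classes
>>
>> val ratings = purchase.map(_.split(',') match {
>>   case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
>> })
>> val df = ratings.toDF("user", "item", "rate")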
>>
>> -Xiangrui
>>
>> On Mon, Mar 16, 2015 at 2:34 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>>> Hi Xiangrui,
>>> Thanks a lot for the quick reply.
>>>
>>> I am still facing an issue.
>>>
>>> I tried the code snippet that you suggested:
>>>
>>> val ratings = purchase.map { line =>
>>> line.split(',') match { case Array(user, item, rate) =>
>>> (user.toInt, item.toInt, rate.toFloat)
>>> }.toDF("user", "item", "rate”)}
>>>
>>> For this, I got the error below:
>>>
>>> error: ';' expected but '.' found.
>>> [INFO] }.toDF("user", "item", "rate”)}
>>> [INFO]  ^
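>>>
>>> A likely cause of that parse error: the closing quote in "rate”) is a
>>> typographic curly quote rather than a straight ", and the outer map { ... }
>>> block in the suggested snippet is missing its closing brace. A version of
>>> the same logic that parses (it still needs `import sqlContext.implicits._`
>>> in scope for toDF to resolve, as noted earlier in the thread):
>>>
>>> val ratings = purchase.map { line =>
>>>   line.split(',') match {
>>>     case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
>>>   }
>>> }.toDF("user", "item", "rate")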
>>>
>>> When I tried the code below:
>>>
>>> val ratings = purchase.map ( line =>
>>>    line.split(',') match { case Array(user, item, rate) =>
>>>    (user.toInt, item.toInt, rate.toFloat)
>>>    }).toDF("user", "item", "rate")
>>>
>>>
>>> error: value toDF is not a member of org.apache.spark.rdd.RDD[(Int, Int,
>>> Float)]
>>> [INFO] possible cause: maybe a semicolon is missing before `value toDF'?
>>> [INFO]     }).toDF("user", "item", "rate")
>>>
>>>
>>>
>>> I have looked at the document that you shared and tried the following code:
>>>
>>> case class Record(user: Int, item: Int, rate: Double)
>>> val ratings = purchase.map(_.split(','))
>>>   .map(r => Record(r(0).toInt, r(1).toInt, r(2).toDouble))
>>>   .toDF("user", "item", "rate")
>>>
>>> For this, I got the error below:
>>>
>>> error: value toDF is not a member of org.apache.spark.rdd.RDD[Record]
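>>>
>>> Both "value toDF is not a member" errors above usually point to the same
>>> thing: `import sqlContext.implicits._` is not in scope where toDF() is
>>> called. A sketch of the two lines that would typically precede the snippet
>>> above, assuming a SQLContext created from the same SparkContext and the
>>> case class defined outside the method that uses it:
>>>
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> import sqlContext.implicits._  // brings toDF() into scope for RDD[Record]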
>>>
>>>
>>> Appreciate your help!
>>>
>>> Thanks,
>>> Jay
>>>
>>>
>>> On Mar 16, 2015, at 11:35 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> Try this:
>>>
>>> val ratings = purchase.map { line =>
>>> line.split(',') match { case Array(user, item, rate) =>
>>> (user.toInt, item.toInt, rate.toFloat)
>>> }.toDF("user", "item", "rate")
>>>
>>> Doc for DataFrames:
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html
>>>
>>> -Xiangrui
>>>
>>> On Mon, Mar 16, 2015 at 9:08 AM, jaykatukuri <jkatuk...@apple.com> wrote:
>>>
>>> Hi all,
>>> I am trying to use the new ALS implementation under
>>> org.apache.spark.ml.recommendation.ALS.
>>>
>>>
>>>
>>> The new method to invoke for training seems to be
>>> `override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel`.
>>>
>>> How do I create a DataFrame object from a ratings data set that is on HDFS?
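>>>
>>> One possible route, sketched here for reference (the path, case class, and
>>> column types are placeholders): read the text file from HDFS as an RDD,
>>> parse it into a case class, and build the DataFrame explicitly:
>>>
>>> case class PurchaseRating(user: Int, item: Int, rating: Float)
>>>
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> val lines = sc.textFile("hdfs:///path/to/ratings.csv")
>>> val parsed = lines.map(_.split(','))
>>>   .map(a => PurchaseRating(a(0).toInt, a(1).toInt, a(2).toFloat))
>>> val df = sqlContext.createDataFrame(parsed)  // or parsed.toDF() once sqlContext.implicits._ is imported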
>>>
>>>
>>> By contrast, the method in the old ALS implementation under
>>> org.apache.spark.mllib.recommendation.ALS was:
>>> def train(
>>>     ratings: RDD[Rating],
>>>     rank: Int,
>>>     iterations: Int,
>>>     lambda: Double,
>>>     blocks: Int,
>>>     seed: Long
>>>   ): MatrixFactorizationModel
>>>
>>> My code to run the old ALS train method is below:
>>>
>>> "val sc = new SparkContext(conf)
>>>
>>>    val pfile = args(0)
>>>    val purchase=sc.textFile(pfile)
>>>   val ratings = purchase.map(_.split(',') match { case Array(user, item,
>>> rate) =>
>>>       Rating(user.toInt, item.toInt, rate.toInt)
>>>   })
>>>
>>> val model = ALS.train(ratings, rank, numIterations, 0.01)"
>>>
>>>
>>> Now, for the new ALS fit method, I am trying to run the code below, but I am
>>> getting a compilation error:
>>>
>>> val als = new ALS()
>>>   .setRank(rank)
>>>   .setRegParam(regParam)
>>>   .setImplicitPrefs(implicitPrefs)
>>>   .setNumUserBlocks(numUserBlocks)
>>>   .setNumItemBlocks(numItemBlocks)
>>>
>>> val sc = new SparkContext(conf)
>>>
>>> val pfile = args(0)
>>> val purchase = sc.textFile(pfile)
>>> val ratings = purchase.map(_.split(',') match {
>>>   case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toInt)
>>> })
>>>
>>> val model = als.fit(ratings.toDF())
>>>
>>> I get an error that the method toDF() is not a member of
>>> org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
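>>>
>>> For reference, a sketch of the full flow against the new API that should
>>> compile, assuming a SQLContext built from the same SparkContext (rank and
>>> regParam are illustrative, and as far as I recall the ml ALS defaults
>>> expect user/item/rating column names, which toDF() on an RDD of ALS.Rating
>>> produces):
>>>
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.ml.recommendation.ALS
>>> import org.apache.spark.ml.recommendation.ALS.Rating
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sc = new SparkContext(conf)
>>> val sqlContext = new SQLContext(sc)
>>> import sqlContext.implicits._  // needed for toDF() on the ratings RDD
>>>
>>> val purchase = sc.textFile(args(0))
>>> val ratings = purchase.map(_.split(',') match {
>>>   case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toFloat)
>>> })
>>>
>>> val als = new ALS()
>>>   .setRank(rank)
>>>   .setRegParam(regParam)
>>>
>>> val model = als.fit(ratings.toDF())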
>>>
>>> Appreciate the help!
>>>
>>> Thanks,
>>> Jay
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
