Yes, I think the default Spark builds are on Scala 2.10. You need to follow instructions at http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211 to build 2.11 packages. -Xiangrui
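For reference, the page linked above boils down to roughly these two steps for a source checkout (commands as of the 1.3-era docs; check the page for the exact script name and flags for your version):

    dev/change-version-to-2.11.sh
    mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package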
On Mon, Apr 13, 2015 at 4:00 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
> Hi Xiangrui,
>
> Here is the class:
>
> object ALSNew {
>
>   def main(args: Array[String]) {
>     val conf = new SparkConf()
>       .setAppName("TrainingDataPurchase")
>       .set("spark.executor.memory", "4g")
>
>     conf.set("spark.shuffle.memoryFraction", "0.65") // default is 0.2
>     conf.set("spark.storage.memoryFraction", "0.3")  // default is 0.6
>
>     val sc = new SparkContext(conf)
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     import sqlContext.implicits._
>
>     val pfile = args(0)
>     val purchase = sc.textFile(pfile)
>
>     val ratings = purchase.map(line =>
>       line.split(',') match { case Array(user, item, rate) =>
>         (user.toInt, item.toInt, rate.toFloat)
>       }).toDF()
>
>     val rank = args(1).toInt
>     val numIterations = args(2).toInt
>     val regParam: Double = 0.01
>     val implicitPrefs: Boolean = true
>     val numUserBlocks: Int = 100
>     val numItemBlocks: Int = 100
>     val nonnegative: Boolean = true
>
>     // val paramMap = ParamMap(regParam = 0.01)
>     // paramMap.put(numUserBlocks = 100, numItemBlocks = 100)
>     val als = new ALS()
>       .setRank(rank)
>       .setRegParam(regParam)
>       .setImplicitPrefs(implicitPrefs)
>       .setNumUserBlocks(numUserBlocks)
>       .setNumItemBlocks(numItemBlocks)
>
>     val alpha = als.getAlpha
>
>     val model = als.fit(ratings)
>
>     val predictions = model.transform(ratings)
>       .select("rating", "prediction")
>       .map { case Row(rating: Float, prediction: Float) =>
>         (rating.toDouble, prediction.toDouble)
>       }
>
>     val rmse =
>       if (implicitPrefs) {
>         // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
>         // We limit the ratings and the predictions to the interval [0, 1] and
>         // compute the weighted RMSE with the confidence scores as weights.
>         val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
>           val confidence = 1.0 + alpha * math.abs(rating)
>           val rating01 = math.max(math.min(rating, 1.0), 0.0)
>           val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
>           val err = prediction01 - rating01
>           (confidence, confidence * err * err)
>         }.reduce { case ((c0, e0), (c1, e1)) =>
>           (c0 + c1, e0 + e1)
>         }
>         math.sqrt(weightedSumSq / totalWeight)
>       } else {
>         val mse = predictions.map { case (rating, prediction) =>
>           val err = rating - prediction
>           err * err
>         }.mean()
>         math.sqrt(mse)
>       }
>
>     println("Root Mean Squared Error = " + rmse)
>   }
> }
>
> I am using the following in my maven build (pom.xml):
>
> <dependencies>
>   <dependency>
>     <groupId>org.scala-lang</groupId>
>     <artifactId>scala-library</artifactId>
>     <version>2.11.2</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>1.3.0</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-mllib_2.11</artifactId>
>     <version>1.3.0</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.11</artifactId>
>     <version>1.3.0</version>
>   </dependency>
> </dependencies>
>
> I am using Scala version 2.11.2.
>
> Could it be that spark-1.3.0-bin-hadoop2.4.tgz requires a different version of Scala?
>
> Thanks,
> Jay
>
> On Apr 9, 2015, at 4:38 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> Could you share ALSNew.scala? Which Scala version did you use?
>
> -Xiangrui
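Worth noting while the pom is in view: the prebuilt spark-1.3.0-bin-hadoop2.4.tgz binaries are compiled against Scala 2.10, so an application built against the _2.11 artifacts will hit exactly the scala.reflect NoSuchMethodError shown further down in this thread. A sketch of the alternative fix, staying on the prebuilt binaries and aligning the pom to 2.10 (the 2.10.4 version is an assumption; `provided` scope is optional but keeps Spark out of the application jar):

    <!-- sketch: versions assumed; align all _2.xx suffixes with the Spark binaries -->
    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.4</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.0</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.3.0</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.0</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>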
> On Wed, Apr 8, 2015 at 4:09 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Hi Xiangrui,
>
> I tried running this on my local machine (laptop) and got the same error. Here is what I did:
>
> 1. Downloaded the Spark 1.3.0 release (prebuilt for Hadoop 2.4 and later), spark-1.3.0-bin-hadoop2.4.tgz.
> 2. Ran the following command:
>
> spark-submit --class ALSNew --master local[8] ALSNew.jar /input_path
>
> The stack trace is exactly the same.
>
> Thanks,
> Jay
>
> On Apr 8, 2015, at 10:47 AM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Some additional context:
>
> Since I am using features of Spark 1.3.0, I downloaded Spark 1.3.0 and used spark-submit from there. The cluster is still on Spark 1.2.0.
>
> So it looks to me as if, at runtime, the executors could not find some libraries of Spark 1.3.0, even though I ran spark-submit from my downloaded Spark 1.3.0.
>
> On Apr 6, 2015, at 1:37 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Here is the command that I have used:
>
> spark-submit --class packagename.ALSNew --num-executors 100 --master yarn ALSNew.jar -jar spark-sql_2.11-1.3.0.jar hdfs://input_path
>
> Btw, I could run the old ALS in the mllib package.
>
> On Apr 6, 2015, at 12:32 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> So ALSNew.scala is your own application. Did you add it with spark-submit or spark-shell? The correct command should look like:
>
> spark-submit --class your.package.name.ALSNew ALSNew.jar [options]
>
> Please check the documentation:
> http://spark.apache.org/docs/latest/submitting-applications.html
>
> -Xiangrui
>
> On Mon, Apr 6, 2015 at 12:27 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Hi,
>
> Here is the stack trace:
>
> Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
>   at ALSNew$.main(ALSNew.scala:35)
>   at ALSNew.main(ALSNew.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Thanks,
> Jay
>
> On Apr 6, 2015, at 12:24 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> Please attach the full stack trace. -Xiangrui
>
> On Mon, Apr 6, 2015 at 12:06 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Hi all,
>
> I got a runtime error while running the ALS:
>
> Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
>
> The error that I am getting is at the following code:
>
> val ratings = purchase.map(line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }).toDF()
>
> Any help is appreciated! I have tried passing the spark-sql jar using -jar spark-sql_2.11-1.3.0.jar.
>
> Thanks,
> Jay
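One aside on the spark-submit invocation quoted above: there is no -jar option, and anything placed after the application jar is passed to ALSNew.main() as an argument rather than interpreted by Spark. Extra jars go in a comma-separated --jars option before the application jar, so the corrected shape would be (file names as above, assumed):

    spark-submit --class packagename.ALSNew --master yarn --num-executors 100 --jars spark-sql_2.10-1.3.0.jar ALSNew.jar hdfs://input_path

Note the _2.10 suffix: any jar shipped this way has to match the Scala version of the Spark binaries, and with a prebuilt 1.3.0 distribution spark-sql is likely already part of the assembly, so it may not be needed at all.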
> On Mar 17, 2015, at 12:50 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> Please remember to copy the user list next time. I might not be able to respond quickly. There are many others who can help or who can benefit from the discussion. Thanks! -Xiangrui
>
> On Tue, Mar 17, 2015 at 12:04 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Great, Xiangrui. It works now.
>
> Sorry that I needed to bug you :)
>
> Jay
>
> On Mar 17, 2015, at 11:48 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
> Please check this section in the user guide:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection
>
> You need `import sqlContext.implicits._` to use `toDF()`.
>
> -Xiangrui
>
> On Mon, Mar 16, 2015 at 2:34 PM, Jay Katukuri <jkatuk...@apple.com> wrote:
>
> Hi Xiangrui,
>
> Thanks a lot for the quick reply. I am still facing an issue.
>
> I have tried the code snippet that you suggested:
>
> val ratings = purchase.map { line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }.toDF("user", "item", "rate”)}
>
> For this, I got the error below:
>
> error: ';' expected but '.' found.
> [INFO] }.toDF("user", "item", "rate”)}
> [INFO]  ^
>
> When I tried the code below:
>
> val ratings = purchase.map(line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }).toDF("user", "item", "rate")
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]
> [INFO] possible cause: maybe a semicolon is missing before `value toDF'?
> [INFO] }).toDF("user", "item", "rate")
>
> I have looked at the document that you shared and tried the following code:
>
> case class Record(user: Int, item: Int, rate: Double)
> val ratings = purchase.map(_.split(',')).map(r => Record(r(0).toInt, r(1).toInt, r(2).toDouble)).toDF("user", "item", "rate")
>
> For this, I got the error below:
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[Record]
>
> Appreciate your help!
>
> Thanks,
> Jay
>
> On Mar 16, 2015, at 11:35 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
> Try this:
>
> val ratings = purchase.map { line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }.toDF("user", "item", "rate")
>
> Doc for DataFrames:
> http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> -Xiangrui
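Two separate problems are tangled in the snippets above: the suggested snippet is missing a closing brace for the map { ... } block (so .toDF ends up inside it), and the pasted-back version also has a curly closing quote in "rate”), which leaves the string literal unterminated and trips the parser. The later "value toDF is not a member" errors come from the missing import sqlContext.implicits._ noted in the Mar 17 message. Putting both fixes together, a brace-balanced sketch that should compile:

    // assumes an existing SparkContext `sc` and an RDD[String] `purchase`
    // holding "user,item,rate" lines
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._ // brings toDF() into scope for RDDs of tuples

    val ratings = purchase.map { line =>
      line.split(',') match {
        case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
      }
    }.toDF("user", "item", "rate")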
> On Mon, Mar 16, 2015 at 9:08 AM, jaykatukuri <jkatuk...@apple.com> wrote:
>
> Hi all,
>
> I am trying to use the new ALS implementation under org.apache.spark.ml.recommendation.ALS.
>
> The new method to invoke for training seems to be:
>
> override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel
>
> How do I create a DataFrame object from a ratings data set that is on HDFS? For comparison, the method in the old ALS implementation under org.apache.spark.mllib.recommendation.ALS was:
>
> def train(
>     ratings: RDD[Rating],
>     rank: Int,
>     iterations: Int,
>     lambda: Double,
>     blocks: Int,
>     seed: Long
>   ): MatrixFactorizationModel
>
> My code to run the old ALS train method is as below:
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match {
>   case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = ALS.train(ratings, rank, numIterations, 0.01)
>
> Now, for the new ALS fit method, I am trying to run the code below, but getting a compilation error:
>
> val als = new ALS()
>   .setRank(rank)
>   .setRegParam(regParam)
>   .setImplicitPrefs(implicitPrefs)
>   .setNumUserBlocks(numUserBlocks)
>   .setNumItemBlocks(numItemBlocks)
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match {
>   case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = als.fit(ratings.toDF())
>
> I get an error that the method toDF() is not a member of org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
>
> Appreciate the help!
>
> Thanks,
> Jay
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
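Tying the thread together, a minimal end-to-end sketch of the ml-package ALS reading ratings from HDFS. This is a sketch only: the object name ALSExample and the fixed rank/regParam values are made up for illustration, and it assumes Spark 1.3's default ALS column names ("user", "item", "rating"):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.ml.recommendation.ALS

    object ALSExample { // hypothetical driver object
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ALSExample"))
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        import sqlContext.implicits._ // required for toDF()

        // args(0) is an hdfs:// path to "user,item,rate" lines
        val ratings = sc.textFile(args(0)).map { line =>
          val Array(user, item, rate) = line.split(',')
          (user.toInt, item.toInt, rate.toFloat)
        }.toDF("user", "item", "rating") // default column names ALS looks for

        val als = new ALS()
          .setRank(10)      // illustrative values
          .setRegParam(0.01)
        val model = als.fit(ratings) // fit takes a DataFrame, not an RDD

        model.transform(ratings)
          .select("user", "item", "prediction")
          .show()
      }
    }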