[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-04-07 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r405098967
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,147 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' Factorization Machines Regression Model
+#'
+#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param factorSize dimensionality of the factors.
+#' @param fitLinear whether to fit linear term.  # TODO Can we express this with formula?
+#' @param regParam the regularization parameter.
+#' @param miniBatchFraction the mini-batch fraction parameter.
+#' @param initStd the standard deviation of initial coefficients.
+#' @param maxIter maximum iteration number.
+#' @param stepSize stepSize parameter.
+#' @param tol convergence tolerance of iterations.
+#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW".
+#' @param seed seed parameter for weights initialization.
+#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to
+#'   decide the base level of a string feature as the last category
+#'   after ordering is dropped when encoding strings. Supported options
+#'   are "frequencyDesc", "frequencyAsc", "alphabetDesc", and
+#'   "alphabetAsc". The default value is "frequencyDesc". When the
+#'   ordering is set to "alphabetDesc", this drops the same category
+#'   as R when encoding strings.
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.fmRegressor} returns a fitted Factorization Machines Regression Model.
+#'
+#' @rdname spark.fmRegressor
+#' @aliases spark.fmRegressor,SparkDataFrame,formula-method
+#' @name spark.fmRegressor
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
+#'
+#' # fit Factorization Machines Regression Model
+#' model <- spark.fmRegressor(
+#'df, label ~ features,
 
 Review comment:
   nit: the indentation here doesn't look right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-04-07 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r405099205
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.r.RWrapperUtils._
+import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FMRegressorWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String]) extends MLWritable {
+  import FMRegressorWrapper._
 
 Review comment:
   nit: remove this?





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-04-07 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r405099515
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.r.RWrapperUtils._
+import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FMRegressorWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String]) extends MLWritable {
+  import FMRegressorWrapper._
+
+  private val fmRegressionModel: FMRegressionModel =
+pipeline.stages(1).asInstanceOf[FMRegressionModel]
+
+  lazy val rFeatures: Array[String] = if (fmRegressionModel.getFitIntercept) {
+Array("(Intercept)") ++ features
+  } else {
+features
+  }
+
+  lazy val rCoefficients: Array[Double] = if (fmRegressionModel.getFitIntercept) {
+Array(fmRegressionModel.intercept) ++ fmRegressionModel.linear.toArray
+  } else {
+fmRegressionModel.linear.toArray
+  }
+
+  lazy val rFactors = fmRegressionModel.factors.toArray
+
+  lazy val numFeatures: Int = fmRegressionModel.numFeatures
+
+  lazy val factorSize: Int = fmRegressionModel.getFactorSize
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+pipeline.transform(dataset)
+  .drop(fmRegressionModel.getFeaturesCol)
+  }
+
+  override def write: MLWriter = new FMRegressorWrapper.FMRegressorWrapperWriter(this)
+}
+
+private[r] object FMRegressorWrapper
+  extends MLReadable[FMRegressorWrapper] {
+
+  def fit(  // scalastyle:ignore
+  data: DataFrame,
+  formula: String,
+  factorSize: Int,
+  fitLinear: Boolean,
+  regParam: Double,
+  miniBatchFraction: Double,
+  initStd: Double,
+  maxIter: Int,
+  stepSize: Double,
+  tol: Double,
+  solver: String,
+  seed: String,
+  stringIndexerOrderType: String): FMRegressorWrapper = {
+
+val rFormula = new RFormula()
+  .setFormula(formula)
+  .setStringIndexerOrderType(stringIndexerOrderType)
+checkDataColumns(rFormula, data)
+val rFormulaModel = rFormula.fit(data)
+
+val fitIntercept = rFormula.hasIntercept
+
+// get feature names from output schema
+val schema = rFormulaModel.transform(data).schema
+val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol))
+  .attributes.get
+val features = featureAttrs.map(_.name.get)
+
+// assemble and fit the pipeline
+val fmr = new FMRegressor()
+  .setFactorSize(factorSize)
+  .setFitIntercept(fitIntercept)
+  .setFitLinear(fitLinear)
+  .setRegParam(regParam)
+  .setMiniBatchFraction(miniBatchFraction)
+  .setInitStd(initStd)
+  .setMaxIter(maxIter)
+  .setStepSize(stepSize)
+  .setTol(tol)
+  .setSolver(solver)
+  .setFeaturesCol(rFormula.getFeaturesCol)
+
+if (seed != null && seed.length > 0) {
+  fmr.setSeed(seed.toLong)
+}
+
+val pipeline = new Pipeline()
+  .setStages(Array(rFormulaModel, fmr))
+  .fit(data)
+
+new FMRegressorWrapper(pipeline, features)
+  }
+
+  override def read: MLReader[FMRegressorWrapper] = new FMRegressorWrapperReader
+
+  override def load(path: String): FMRegressorWrapper = super.load(path)
 
 Review comment:
   nit: remove this?




[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379883014
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala
 ##
 @@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.r.RWrapperUtils._
+import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FMRegressorWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String]) extends MLWritable {
+  import FMRegressorWrapper._
+
+  private val fmRegressionModel: FMRegressionModel =
+pipeline.stages(1).asInstanceOf[FMRegressionModel]
+
+  lazy val rFeatures: Array[String] = if (fmRegressionModel.getFitIntercept) {
+Array("(Intercept)") ++ features
+  } else {
+features
+  }
+
+  lazy val rCoefficients: Array[Double] = if (fmRegressionModel.getFitIntercept) {
+Array(fmRegressionModel.intercept) ++ fmRegressionModel.linear.toArray
+  } else {
+fmRegressionModel.linear.toArray
+  }
+
+  lazy val rFactors = fmRegressionModel.factors.toArray
+
+  lazy val numFeatures: Int = fmRegressionModel.numFeatures
+
+  lazy val factorSize: Int = fmRegressionModel.getFactorSize
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+pipeline.transform(dataset)
+  .drop(fmRegressionModel.getFeaturesCol)
+  }
+
+  override def write: MLWriter = new FMRegressorWrapper.FMRegressorWrapperWriter(this)
+}
+
+private[r] object FMRegressorWrapper
+  extends MLReadable[FMRegressorWrapper] {
+
+  def fit(  // scalastyle:ignore
+  data: DataFrame,
+  formula: String,
+  factorSize: Int,
+  fitLinear: Boolean,
+  regParam: Double,
+  miniBatchFraction: Double,
+  initStd: Double,
+  maxIter: Int,
+  stepSize: Double,
+  tol: Double,
+  solver: String,
+  seed: String,
+  stringIndexerOrderType: String): FMRegressorWrapper = {
+
+val rFormula = new RFormula()
+  .setFormula(formula)
+  .setStringIndexerOrderType(stringIndexerOrderType)
+checkDataColumns(rFormula, data)
+val rFormulaModel = rFormula.fit(data)
+
+val fitIntercept = rFormula.hasIntercept
+
+// get feature names from output schema
+val schema = rFormulaModel.transform(data).schema
+val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol))
+  .attributes.get
+val features = featureAttrs.map(_.name.get)
+
+// assemble and fit the pipeline
+val fmr = new FMRegressor()
+  .setFactorSize(factorSize)
+  .setFitLinear(fitLinear)
+  .setRegParam(regParam)
+  .setMiniBatchFraction(miniBatchFraction)
+  .setInitStd(initStd)
+  .setMaxIter(maxIter)
+  .setTol(tol)
+  .setSolver(solver)
+  .setFitIntercept(fitIntercept)
+  .setFeaturesCol(rFormula.getFeaturesCol)
+
+if (seed != null) {
 
 Review comment:
   also check ```seed.length > 0```?
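
   A minimal sketch of the combined check, in plain Scala with no Spark
   dependency. `SeedParsing` and `parseSeed` are hypothetical names used only
   for illustration and are not part of the PR; the assumption is that the R
   side passes the seed as a string that may be null or empty.

```scala
object SeedParsing {
  // Hypothetical helper, not part of the PR: returns Some(seed) only when the
  // string is non-null and non-empty. Like a bare seed.toLong, it still throws
  // NumberFormatException when the string is not a valid Long.
  def parseSeed(seed: String): Option[Long] =
    Option(seed).filter(_.nonEmpty).map(_.toLong)

  def main(args: Array[String]): Unit = {
    println(parseSeed(null)) // None
    println(parseSeed(""))   // None
    println(parseSeed("42")) // Some(42)
  }
}
```

   With a helper like this, the call site could become something like
   `parseSeed(seed).foreach(fmr.setSeed)`, so the seed is set only when a
   usable value was supplied.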





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882860
 
 

 ##
 File path: examples/src/main/r/ml/fmRegressor.R
 ##
 @@ -0,0 +1,40 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-ML-fmRegressor-example")
+
+# $example on
+# Load training data
+df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
+training_test <- randomSplit(df, c(0.7, 0.3))
+training <- training_test[[1]]
+test <- training_test[[2]]
+
+
+# Fit a FM regression model
+model <- spark.fmRegressor(training, label ~ features)
+
+# Prediction
+predictions <- predict(model, test)
 
 Review comment:
   Same as the classifier example: I guess add ```summary(model)``` and
   ```head(predictions)```, and also add ```sparkR.session.stop()``` at the end?
   





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882476
 
 

 ##
 File path: R/pkg/tests/fulltests/test_mllib_regression.R
 ##
 @@ -551,4 +551,33 @@ test_that("spark.survreg", {
   }
 })
 
+
+test_that("spark.fmRegressor", {
+  df <- suppressWarnings(createDataFrame(iris))
+
+  model <- spark.fmRegressor(
+df,  Sepal_Width ~ .,
+regParam = 0.01, maxIter = 10, fitLinear = TRUE
+  )
+
+  prediction1 <- predict(model, df)
+  expect_is(prediction1, "SparkDataFrame")
 
 Review comment:
   I guess we may want to check the prediction results too?





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882615
 
 

 ##
 File path: examples/src/main/r/ml/fmRegressor.R
 ##
 @@ -0,0 +1,40 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R
 
 Review comment:
   change this ```decisionTree.R``` to ```fmRegressor.R```?





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882600
 
 

 ##
 File path: examples/src/main/r/ml/fmRegressor.R
 ##
 @@ -0,0 +1,40 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-ML-fmRegressor-example")
+
+# $example on
+# Load training data
+df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
+training_test <- randomSplit(df, c(0.7, 0.3))
+training <- training_test[[1]]
+test <- training_test[[2]]
+
+
 
 Review comment:
   nit: delete extra line





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882920
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala
 ##
 @@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.feature.RFormula
+import org.apache.spark.ml.r.RWrapperUtils._
+import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FMRegressorWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String]) extends MLWritable {
+  import FMRegressorWrapper._
+
+  private val fmRegressionModel: FMRegressionModel =
+pipeline.stages(1).asInstanceOf[FMRegressionModel]
+
+  lazy val rFeatures: Array[String] = if (fmRegressionModel.getFitIntercept) {
+Array("(Intercept)") ++ features
+  } else {
+features
+  }
+
+  lazy val rCoefficients: Array[Double] = if (fmRegressionModel.getFitIntercept) {
+Array(fmRegressionModel.intercept) ++ fmRegressionModel.linear.toArray
+  } else {
+fmRegressionModel.linear.toArray
+  }
+
+  lazy val rFactors = fmRegressionModel.factors.toArray
+
+  lazy val numFeatures: Int = fmRegressionModel.numFeatures
+
+  lazy val factorSize: Int = fmRegressionModel.getFactorSize
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+pipeline.transform(dataset)
+  .drop(fmRegressionModel.getFeaturesCol)
+  }
+
+  override def write: MLWriter = new FMRegressorWrapper.FMRegressorWrapperWriter(this)
+}
+
+private[r] object FMRegressorWrapper
+  extends MLReadable[FMRegressorWrapper] {
+
+  def fit(  // scalastyle:ignore
+  data: DataFrame,
+  formula: String,
+  factorSize: Int,
+  fitLinear: Boolean,
+  regParam: Double,
+  miniBatchFraction: Double,
+  initStd: Double,
+  maxIter: Int,
+  stepSize: Double,
+  tol: Double,
+  solver: String,
+  seed: String,
+  stringIndexerOrderType: String): FMRegressorWrapper = {
+
+val rFormula = new RFormula()
+  .setFormula(formula)
+  .setStringIndexerOrderType(stringIndexerOrderType)
+checkDataColumns(rFormula, data)
+val rFormulaModel = rFormula.fit(data)
+
+val fitIntercept = rFormula.hasIntercept
+
+// get feature names from output schema
+val schema = rFormulaModel.transform(data).schema
+val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol))
+  .attributes.get
+val features = featureAttrs.map(_.name.get)
+
+// assemble and fit the pipeline
+val fmr = new FMRegressor()
+  .setFactorSize(factorSize)
+  .setFitLinear(fitLinear)
+  .setRegParam(regParam)
+  .setMiniBatchFraction(miniBatchFraction)
+  .setInitStd(initStd)
+  .setMaxIter(maxIter)
+  .setTol(tol)
+  .setSolver(solver)
+  .setFitIntercept(fitIntercept)
+  .setFeaturesCol(rFormula.getFeaturesCol)
 
 Review comment:
   add ```setStepSize```?





[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882343
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+
+#' Factorization Machines Regression Model Model
+#'
+#' \code{spark.fmRegressor} fits a factorization regression model against a 
SparkDataFrame.
+#' Users can call \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load 
fitted models.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model 
fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently 
only a few formula
+#'operators are supported, including '~', '.', ':', '+', and 
'-'.
+#' @param factorSize dimensionality of the factors.
+#' @param fitLinear whether to fit linear term.  # TODO Can we express this 
with formula?
+#' @param regParam the regularization parameter.
+#' @param miniBatchFraction the mini-batch fraction parameter.
+#' @param initStd the standard deviation of initial coefficients.
+#' @param maxIter maximum iteration number.
+#' @param stepSize stepSize parameter.
+#' @param tol convergence tolerance of iterations.
+#' @param solver solver parameter, supported options: "gd" (minibatch gradient 
descent) or "adamW".
+#' @param seed seed parameter for weights initialization.
+#' @param stringIndexerOrderType how to order categories of a string feature 
column. This is used to
+#'   decide the base level of a string feature as 
the last category
+#'   after ordering is dropped when encoding 
strings. Supported options
+#'   are "frequencyDesc", "frequencyAsc", 
"alphabetDesc", and
+#'   "alphabetAsc". The default value is 
"frequencyDesc". When the
+#'   ordering is set to "alphabetDesc", this drops 
the same category
+#'   as R when encoding strings.
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.fmRegressor} returns a fitted Factorization Machines 
Regression Model.
+#'
+#' @rdname spark.fmRegressor
+#' @aliases spark.fmRegressor,SparkDataFrame,formula-method
+#' @name spark.fmRegressor
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = 
"libsvm")
+#'
+#' # fit Factorization Machines Regression Model
+#' model <- spark.fmRegressor(
+#'df, label ~ features,
+#'regParam = 0.01, maxIter = 10, fitLinear = TRUE
+#'  )
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.fmRegressor since 3.1.0
+setMethod("spark.fmRegressor", signature(data = "SparkDataFrame", formula = "formula"),
+  function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0,
+   miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0,
+   tol = 1e-6, solver = c("adamW", "gd"), seed = NULL,
+   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+  "alphabetDesc", "alphabetAsc")) {
+
+formula <- paste(deparse(formula), collapse = "")
+
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+
+solver <- match.arg(solver)
+stringIndexerOrderType <- match.arg(stringIndexerOrderType)
+
+jobj <- callJStatic("org.apache.spark.ml.r.FMRegressorWrapper",
+"fit",
+data@sdf,
+formula,
+as.integer(factorSize),
+as.logical(fitLinear),
+as.numeric(regParam),
+as.numeric(miniBatchFraction),
+as.numeric(initStd),
+as.integer(maxIter),
+as.numeric(stepSize),
+as.numeric(tol),
+solver,
+seed,
+stringIndexerOrderType)
+new("FMRegressionModel", jobj = jobj)
+  })
+
+
 
 Review comment:
   nit: delete extra line?


[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882358
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+
+#' Factorization Machines Regression Model Model
+#'
+#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame.
+#' Users can call \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param factorSize dimensionality of the factors.
+#' @param fitLinear whether to fit linear term.  # TODO Can we express this with formula?
+#' @param regParam the regularization parameter.
+#' @param miniBatchFraction the mini-batch fraction parameter.
+#' @param initStd the standard deviation of initial coefficients.
+#' @param maxIter maximum iteration number.
+#' @param stepSize stepSize parameter.
+#' @param tol convergence tolerance of iterations.
+#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW".
+#' @param seed seed parameter for weights initialization.
+#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to
+#'                               decide the base level of a string feature as the last category
+#'                               after ordering is dropped when encoding strings. Supported options
+#'                               are "frequencyDesc", "frequencyAsc", "alphabetDesc", and
+#'                               "alphabetAsc". The default value is "frequencyDesc". When the
+#'                               ordering is set to "alphabetDesc", this drops the same category
+#'                               as R when encoding strings.
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.fmRegressor} returns a fitted Factorization Machines Regression Model.
+#'
+#' @rdname spark.fmRegressor
+#' @aliases spark.fmRegressor,SparkDataFrame,formula-method
+#' @name spark.fmRegressor
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
+#'
+#' # fit Factorization Machines Regression Model
+#' model <- spark.fmRegressor(
+#'df, label ~ features,
+#'regParam = 0.01, maxIter = 10, fitLinear = TRUE
+#'  )
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.fmRegressor since 3.1.0
+setMethod("spark.fmRegressor", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0,
+                   miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0,
+                   tol = 1e-6, solver = c("adamW", "gd"), seed = NULL,
+                   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+                                              "alphabetDesc", "alphabetAsc")) {
+
+            formula <- paste(deparse(formula), collapse = "")
+
+            if (!is.null(seed)) {
+              seed <- as.character(as.integer(seed))
+            }
+
+            solver <- match.arg(solver)
+            stringIndexerOrderType <- match.arg(stringIndexerOrderType)
+
+            jobj <- callJStatic("org.apache.spark.ml.r.FMRegressorWrapper",
+                                "fit",
+                                data@sdf,
+                                formula,
+                                as.integer(factorSize),
+                                as.logical(fitLinear),
+                                as.numeric(regParam),
+                                as.numeric(miniBatchFraction),
+                                as.numeric(initStd),
+                                as.integer(maxIter),
+                                as.numeric(stepSize),
+                                as.numeric(tol),
+                                solver,
+                                seed,
+                                stringIndexerOrderType)
+            new("FMRegressionModel", jobj = jobj)
+          })
+
+
+#  Returns the summary of a FM Regression model produced by \code{spark.fmRegressor}
+
+#' @param 

[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882027
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+
+#' Factorization Machines Regression Model Model
 
 Review comment:
   nit: duplicated word, `Model Model`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882145
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+
+#' Factorization Machines Regression Model Model
+#'
+#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame.
+#' Users can call \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
 
 Review comment:
   I guess we should also mention `summary` here?




[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379882502
 
 

 ##
 File path: R/pkg/tests/fulltests/test_mllib_regression.R
 ##
 @@ -551,4 +551,33 @@ test_that("spark.survreg", {
   }
 })
 
+
+test_that("spark.fmRegressor", {
+  df <- suppressWarnings(createDataFrame(iris))
+
+  model <- spark.fmRegressor(
+    df,  Sepal_Width ~ .,
+    regParam = 0.01, maxIter = 10, fitLinear = TRUE
+  )
+
+  prediction1 <- predict(model, df)
+  expect_is(prediction1, "SparkDataFrame")
+
+  # Test model save/load
+  if (windows_with_hadoop()) {
+    modelPath <- tempfile(pattern = "spark-fmregressor", fileext = ".tmp")
+    write.ml(model, modelPath)
+    model2 <- read.ml(modelPath)
+
+    expect_is(model2, "FMRegressionModel")
+
+    prediction2 <- predict(model2, df)
+    expect_equal(
+      collect(prediction1),
+      collect(prediction2)
+    )
+  }
+})
+
+
 
 Review comment:
   nit: delete extra line




[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR

2020-02-15 Thread GitBox
huaxingao commented on a change in pull request #27571: 
[SPARK-30819][SPARKR][ML]  Add FMRegressor wrapper to SparkR
URL: https://github.com/apache/spark/pull/27571#discussion_r379881943
 
 

 ##
 File path: R/pkg/R/mllib_regression.R
 ##
 @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+
 
 Review comment:
   nit: delete extra line?

