[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13381


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66961848
  
--- Diff: examples/src/main/python/mllib/isotonic_regression_example.py ---
@@ -23,18 +23,22 @@
 from pyspark import SparkContext
 # $example on$
 import math
-from pyspark.mllib.regression import IsotonicRegression, IsotonicRegressionModel
+from pyspark.mllib.regression import LabeledPoint, IsotonicRegression, IsotonicRegressionModel
 # $example off$
 
 if __name__ == "__main__":
 
     sc = SparkContext(appName="PythonIsotonicRegressionExample")
 
     # $example on$
-    data = sc.textFile("data/mllib/sample_isotonic_regression_data.txt")
+    # Load and parse the data
+    def parsePoint(line):
+        values = [float(x) for x in line.replace(',', ' ').replace(':', ' ').split(' ')]
+        return (values[0], values[2], 1.0)
+    data = sc.textFile("data/mllib/sample_isotonic_regression_libsvm_data.txt")
--- End diff --

@yanboliang Done.





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66828101
  
--- Diff: examples/src/main/python/mllib/isotonic_regression_example.py ---
@@ -23,18 +23,22 @@
 from pyspark import SparkContext
 # $example on$
 import math
-from pyspark.mllib.regression import IsotonicRegression, IsotonicRegressionModel
+from pyspark.mllib.regression import LabeledPoint, IsotonicRegression, IsotonicRegressionModel
 # $example off$
 
 if __name__ == "__main__":
 
     sc = SparkContext(appName="PythonIsotonicRegressionExample")
 
     # $example on$
-    data = sc.textFile("data/mllib/sample_isotonic_regression_data.txt")
+    # Load and parse the data
+    def parsePoint(line):
+        values = [float(x) for x in line.replace(',', ' ').replace(':', ' ').split(' ')]
+        return (values[0], values[2], 1.0)
+    data = sc.textFile("data/mllib/sample_isotonic_regression_libsvm_data.txt")
--- End diff --

Since the dataset is in LIBSVM format, we should use 
```MLUtils.loadLibSVMFile``` to load it.
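
For readers unfamiliar with the format: each LIBSVM line is a label followed by `index:value` pairs, which is why the hand-written `parsePoint` in the diff picks `values[0]` and `values[2]`. The following is a minimal plain-Python sketch of that parsing, not the implementation of `MLUtils.loadLibSVMFile`. The helper names `parse_libsvm_line` and `to_isotonic_tuple` are hypothetical, and the default weight of 1.0 simply mirrors the constant used in the diff.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        # Each feature token has the form "index:value".
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_isotonic_tuple(line, weight=1.0):
    """Map a single-feature LIBSVM line to a (label, feature, weight) tuple."""
    label, features = parse_libsvm_line(line)
    if len(features) != 1:
        raise ValueError("expected exactly one feature per line")
    (feature,) = features.values()
    return (label, feature, weight)
```

A dedicated loader avoids this kind of ad hoc string handling in the example, which is the point of the review comment.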





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66827012
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaIsotonicRegressionExample.java ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.ml;
+
+// $example on$
+
+import org.apache.spark.ml.regression.IsotonicRegression;
+import org.apache.spark.ml.regression.IsotonicRegressionModel;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+// $example off$
+import org.apache.spark.sql.SparkSession;
+
+/**
+ * An example demonstrating IsotonicRegression.
+ * Run with
+ * 
+ * bin/run-example ml.JavaIsotonicRegressionExample
+ * 
+ */
+public class JavaIsotonicRegressionExample {
+
+  public static void main(String[] args) {
+    // Create a SparkSession.
+    SparkSession spark = SparkSession
+      .builder()
+      .appName("JavaIsotonicRegression")
--- End diff --

nit: ```JavaIsotonicRegression``` -> ```JavaIsotonicRegressionExample```





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66715565
  
--- Diff: docs/ml-classification-regression.md ---
@@ -685,6 +685,76 @@ The implementation matches the result from R's survival function
 
 
 
+## Isotonic regression
+[Isotonic regression](http://en.wikipedia.org/wiki/Isotonic_regression)
+belongs to the family of regression algorithms. Formally isotonic regression is a problem where
+given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` representing observed responses
+and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted
+finding a function that minimises
+
+`\begin{equation}
+  f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2
+\end{equation}`
+
+with respect to complete order subject to
+`$x_1\le x_2\le ...\le x_n$` where `$w_i$` are positive weights.
+The resulting function is called isotonic regression and it is unique.
+It can be viewed as least squares problem under order restriction.
+Essentially isotonic regression is a
+[monotonic function](http://en.wikipedia.org/wiki/Monotonic_function)
+best fitting the original data points.
+
+In `spark.ml`, we implement a
--- End diff --

@jkbradley Done.





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66714994
  
--- Diff: docs/ml-classification-regression.md ---
@@ -685,6 +685,76 @@ The implementation matches the result from R's survival function
 
 
 
+## Isotonic regression
+[Isotonic regression](http://en.wikipedia.org/wiki/Isotonic_regression)
+belongs to the family of regression algorithms. Formally isotonic regression is a problem where
+given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` representing observed responses
+and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted
+finding a function that minimises
+
+`\begin{equation}
+  f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2
+\end{equation}`
+
+with respect to complete order subject to
+`$x_1\le x_2\le ...\le x_n$` where `$w_i$` are positive weights.
+The resulting function is called isotonic regression and it is unique.
+It can be viewed as least squares problem under order restriction.
+Essentially isotonic regression is a
+[monotonic function](http://en.wikipedia.org/wiki/Monotonic_function)
+best fitting the original data points.
+
+In `spark.ml`, we implement a
--- End diff --

I'd avoid using ```spark.ml```.  It was useful when there were 2 active 
APIs, but the naming confuses people sometimes.  I'd just start with "We 
implement..."





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66700697
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.regression.IsotonicRegression
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+/**
+ * An example demonstrating Isotonic Regression.
+ * Run with
+ * {{{
+ * bin/run-example ml.IsotonicRegressionExample
+ * }}}
+ */
+object IsotonicRegressionExample {
+
+  def main(args: Array[String]): Unit = {
+
+    // Creates a SparkSession.
+    val spark = SparkSession
+      .builder
+      .appName(s"${this.getClass.getSimpleName}")
+      .getOrCreate()
+
+    // $example on$
+    // Loads data.
+    val dataset = spark.read.format("libsvm")
+      .load("data/mllib/sample_isotonic_regression_libsvm_data.txt")
+
+    // Trains an isotonic regression model.
+    val ir = new IsotonicRegression()
+    val model = ir.fit(dataset)
+
+    println(s"Boundaries in increasing order: ${model.boundaries}")
+    println(s"Predictions associated with the boundaries: ${model.predictions}")
+
+    // Makes predictions.
+    model.transform(dataset).show
--- End diff --

@jkbradley Done.





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698378
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaIsotonicRegressionExample.java ---
@@ -35,14 +37,15 @@ public static void main(String[] args) {
 SparkConf sparkConf = new SparkConf().setAppName("JavaIsotonicRegressionExample");
 JavaSparkContext jsc = new JavaSparkContext(sparkConf);
 // $example on$
-JavaRDD<String> data = jsc.textFile("data/mllib/sample_isotonic_regression_data.txt");
+JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(
+jsc.sc(), "data/mllib/sample_isotonic_regression_libsvm_data.txt").toJavaRDD();
--- End diff --

Fix indentation: indent by 2 spaces here and elsewhere





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698376
  
--- Diff: docs/ml-classification-regression.md ---
@@ -685,6 +685,76 @@ The implementation matches the result from R's survival function
 
 
 
+## Isotonic regression
+[Isotonic regression](http://en.wikipedia.org/wiki/Isotonic_regression)
+belongs to the family of regression algorithms. Formally isotonic regression is a problem where
+given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` representing observed responses
+and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted
+finding a function that minimises
+
+`\begin{equation}
+  f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2
+\end{equation}`
+
+with respect to complete order subject to
+`$x_1\le x_2\le ...\le x_n$` where `$w_i$` are positive weights.
+The resulting function is called isotonic regression and it is unique.
+It can be viewed as least squares problem under order restriction.
+Essentially isotonic regression is a
+[monotonic function](http://en.wikipedia.org/wiki/Monotonic_function)
+best fitting the original data points.
+
+MLlib supports a
+[pool adjacent violators algorithm](http://doi.org/10.1198/TECH.2010.10111)
+which uses an approach to
+[parallelizing isotonic regression](http://doi.org/10.1007/978-3-642-99789-1_10).
+The training input is a RDD of tuples of three double values that represent
--- End diff --

not an RDD
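
As background for the doc text quoted above, here is a minimal sequential sketch of the pool adjacent violators idea in plain Python: it minimises the weighted squared error from the quoted equation subject to a non-decreasing fit. This is an illustration only, not Spark's parallel implementation. The function name `pava` is hypothetical, and the sketch assumes the labels are already sorted by their feature value.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit via pool adjacent violators.

    y: labels, assumed already ordered by feature value.
    w: positive weights (defaults to 1.0 for every point).
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each pooled block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, `pava([1.0, 3.0, 2.0, 4.0])` pools the violating pair `3.0, 2.0` into their mean, so the fitted values never decrease.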





[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13381#discussion_r66698380
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.regression.IsotonicRegression
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+/**
+ * An example demonstrating Isotonic Regression.
+ * Run with
+ * {{{
+ * bin/run-example ml.IsotonicRegressionExample
+ * }}}
+ */
+object IsotonicRegressionExample {
+
+  def main(args: Array[String]): Unit = {
+
+    // Creates a SparkSession.
+    val spark = SparkSession
+      .builder
+      .appName(s"${this.getClass.getSimpleName}")
+      .getOrCreate()
+
+    // $example on$
+    // Loads data.
+    val dataset = spark.read.format("libsvm")
+      .load("data/mllib/sample_isotonic_regression_libsvm_data.txt")
+
+    // Trains an isotonic regression model.
+    val ir = new IsotonicRegression()
+    val model = ir.fit(dataset)
+
+    println(s"Boundaries in increasing order: ${model.boundaries}")
+    println(s"Predictions associated with the boundaries: ${model.predictions}")
+
+    // Makes predictions.
+    model.transform(dataset).show
--- End diff --

"show" --> "show()"

