spark git commit: [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

2017-12-12 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 9e2d96d1d -> 00cdb38dc


[SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

## What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-22289

Add JSON encoding/decoding for Param[Matrix].

The issue was reported by Nic Eggert, who hit it when saving an LR model with 
LowerBoundsOnCoefficients set.
As I see it, there are two ways to resolve this:
1. Support save/load on LogisticRegressionParams, and also adjust the save/load 
in LogisticRegression and LogisticRegressionModel.
2. Directly support Matrix in Param.jsonEncode, similar to what we have done 
for Vector.

After some discussion in JIRA, we prefer to support Matrix as a valid Param 
type, since it is simpler and also convenient for other classes.

Note that in the implementation, I added a "class" field to the JSON object so 
that the matching JSON converter can be selected when loading. This makes 
loading precise and leaves room for future extension.
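
The "class" field idea can be sketched in plain Scala. This is only an illustration under stated assumptions, not the code in this patch: `VectorLike`, `MatrixLike`, `ClassTaggedCodec`, and the tag strings are hypothetical names, and a `Map[String, Any]` stands in for a parsed JSON object so that the example needs no JSON library.

```scala
// Hypothetical stand-ins; not the Spark classes or the converter code in the patch.
sealed trait Encoded
final case class VectorLike(values: Seq[Double]) extends Encoded
final case class MatrixLike(numRows: Int, numCols: Int, values: Seq[Double]) extends Encoded

object ClassTaggedCodec {
  // Assumed tag values, chosen for this example only.
  val VectorTag = "vector"
  val MatrixTag = "matrix"

  // Encode to a field map that stands in for a JSON object.
  def encode(value: Encoded): Map[String, Any] = value match {
    case VectorLike(vs) =>
      Map("class" -> VectorTag, "values" -> vs)
    case MatrixLike(r, c, vs) =>
      Map("class" -> MatrixTag, "numRows" -> r, "numCols" -> c, "values" -> vs)
  }

  // Decode by reading the "class" field first, then picking the converter.
  def decode(fields: Map[String, Any]): Encoded = fields.get("class") match {
    case Some(VectorTag) =>
      VectorLike(fields("values").asInstanceOf[Seq[Double]])
    case Some(MatrixTag) =>
      MatrixLike(
        fields("numRows").asInstanceOf[Int],
        fields("numCols").asInstanceOf[Int],
        fields("values").asInstanceOf[Seq[Double]])
    case other =>
      throw new IllegalArgumentException(s"Unknown or missing class field: $other")
  }
}
```

Because the tag is read before any other field, a vector and a matrix can share field names such as "values" without ambiguity, and a new type only needs a new tag and one more case.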

## How was this patch tested?

New unit tests cover the LR case and JsonMatrixConverter.

Author: Yuhao Yang 

Closes #19525 from hhbyyh/lrsave.

(cherry picked from commit 10c27a6559803797e89c28ced11c1087127b82eb)
Signed-off-by: Yanbo Liang 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00cdb38d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/00cdb38d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/00cdb38d

Branch: refs/heads/branch-2.2
Commit: 00cdb38dcd0f617de7f0559214a8b1a35e9b179c
Parents: 9e2d96d
Author: Yuhao Yang 
Authored: Tue Dec 12 11:27:01 2017 -0800
Committer: Yanbo Liang 
Committed: Tue Dec 12 11:27:40 2017 -0800

--
 .../org/apache/spark/ml/linalg/Matrices.scala   |  7 ++
 .../spark/ml/linalg/JsonMatrixConverter.scala   | 79 
 .../org/apache/spark/ml/param/params.scala  | 36 +++--
 .../LogisticRegressionSuite.scala   | 11 +++
 .../ml/linalg/JsonMatrixConverterSuite.scala| 45 +++
 5 files changed, 170 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/00cdb38d/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
--
diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
index 07f3bc2..ed3e493 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
@@ -476,6 +476,9 @@ class DenseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object DenseMatrix {
 
+  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
+    Some((dm.numRows, dm.numCols, dm.values, dm.isTransposed))
+
   /**
    * Generate a `DenseMatrix` consisting of zeros.
    * @param numRows number of rows of the matrix
@@ -827,6 +830,10 @@ class SparseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object SparseMatrix {
 
+  private[ml] def unapply(
+      sm: SparseMatrix): Option[(Int, Int, Array[Int], Array[Int], Array[Double], Boolean)] =
+    Some((sm.numRows, sm.numCols, sm.colPtrs, sm.rowIndices, sm.values, sm.isTransposed))
+
   /**
    * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    * (i, j, value) tuples. Entries that have duplicate values of i and j are
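
The two `unapply` methods in this file are extractors: they let a matrix be destructured in a pattern match. The hunk headers show both matrix types declared with `class` rather than `case class`, so the compiler does not generate an extractor for them. A minimal sketch of the technique, using a hypothetical `Dense` class in place of `DenseMatrix` (the names `Dense` and `ExtractorDemo` are assumptions for illustration):

```scala
// Hypothetical stand-in for DenseMatrix: a plain class, so no extractor is generated.
class Dense(val numRows: Int, val numCols: Int, val values: Array[Double],
    val isTransposed: Boolean)

object Dense {
  // Hand-written extractor with the same shape as the unapply added in the diff.
  def unapply(dm: Dense): Option[(Int, Int, Array[Double], Boolean)] =
    Some((dm.numRows, dm.numCols, dm.values, dm.isTransposed))
}

object ExtractorDemo {
  // Destructure all four fields in one pattern instead of reading each accessor.
  def describe(dm: Dense): String = dm match {
    case Dense(rows, cols, values, transposed) =>
      s"${rows}x${cols}, ${values.length} values, transposed=$transposed"
  }
}
```

A converter can use the same kind of pattern to pull out every field it needs to serialize in a single `case` clause.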

http://git-wip-us.apache.org/repos/asf/spark/blob/00cdb38d/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala b/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
new file mode 100644
index 000..0bee643
--- /dev/null
+++ b/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific 

spark git commit: [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

2017-12-12 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master e6dc5f280 -> 10c27a655


[SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

## What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-22289

Add JSON encoding/decoding for Param[Matrix].

The issue was reported by Nic Eggert, who hit it when saving an LR model with 
LowerBoundsOnCoefficients set.
As I see it, there are two ways to resolve this:
1. Support save/load on LogisticRegressionParams, and also adjust the save/load 
in LogisticRegression and LogisticRegressionModel.
2. Directly support Matrix in Param.jsonEncode, similar to what we have done 
for Vector.

After some discussion in JIRA, we prefer to support Matrix as a valid Param 
type, since it is simpler and also convenient for other classes.

Note that in the implementation, I added a "class" field to the JSON object so 
that the matching JSON converter can be selected when loading. This makes 
loading precise and leaves room for future extension.

## How was this patch tested?

New unit tests cover the LR case and JsonMatrixConverter.

Author: Yuhao Yang 

Closes #19525 from hhbyyh/lrsave.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/10c27a65
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/10c27a65
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/10c27a65

Branch: refs/heads/master
Commit: 10c27a6559803797e89c28ced11c1087127b82eb
Parents: e6dc5f2
Author: Yuhao Yang 
Authored: Tue Dec 12 11:27:01 2017 -0800
Committer: Yanbo Liang 
Committed: Tue Dec 12 11:27:01 2017 -0800

--
 .../org/apache/spark/ml/linalg/Matrices.scala   |  7 ++
 .../spark/ml/linalg/JsonMatrixConverter.scala   | 79 
 .../org/apache/spark/ml/param/params.scala  | 36 +++--
 .../LogisticRegressionSuite.scala   | 11 +++
 .../ml/linalg/JsonMatrixConverterSuite.scala| 45 +++
 5 files changed, 170 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/10c27a65/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
--
diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
index 66c5362..14428c6 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
@@ -476,6 +476,9 @@ class DenseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object DenseMatrix {
 
+  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
+    Some((dm.numRows, dm.numCols, dm.values, dm.isTransposed))
+
   /**
    * Generate a `DenseMatrix` consisting of zeros.
    * @param numRows number of rows of the matrix
@@ -827,6 +830,10 @@ class SparseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object SparseMatrix {
 
+  private[ml] def unapply(
+      sm: SparseMatrix): Option[(Int, Int, Array[Int], Array[Int], Array[Double], Boolean)] =
+    Some((sm.numRows, sm.numCols, sm.colPtrs, sm.rowIndices, sm.values, sm.isTransposed))
+
   /**
    * Generate a `SparseMatrix` from Coordinate List (COO) format. Input must be an array of
    * (i, j, value) tuples. Entries that have duplicate values of i and j are

http://git-wip-us.apache.org/repos/asf/spark/blob/10c27a65/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala b/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
new file mode 100644
index 000..0bee643
--- /dev/null
+++ b/mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.linalg
+
+import