[spark] branch master updated (58f87b3 -> a0bd273)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 58f87b3  [SPARK-32639][SQL] Support GroupType parquet mapkey field
     add a0bd273  [SPARK-32092][ML][PYSPARK][FOLLOWUP] Fixed CrossValidatorModel.copy() to copy models instead of list

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 8
 python/pyspark/ml/tuning.py            | 5 -
 2 files changed, 8 insertions(+), 5 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1450b5e -> 1fd54f4)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1450b5e  [MINOR][DOCS] fix typo for docs, log message and comments
     add 1fd54f4  [SPARK-32662][ML] CountVectorizerModel: Remove requirement for minimum Vocab size

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/ml/feature/CountVectorizer.scala  |  5 +-
 .../spark/ml/feature/CountVectorizerSuite.scala    | 74 +-
 2 files changed, 63 insertions(+), 16 deletions(-)
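As context for SPARK-32662: the change lets `CountVectorizerModel` be built even when the fitted vocabulary turns out empty, instead of requiring a minimum vocabulary size up front. A minimal, self-contained sketch of that behavior (this is an illustrative toy, not the Spark implementation; `build_vocab` and its `min_df` parameter are hypothetical names standing in for the real document-frequency filtering):

```python
from collections import Counter

def build_vocab(docs, min_df=1):
    """Toy CountVectorizer-style vocabulary building: keep terms that
    appear in at least min_df documents. An empty result is returned
    as-is rather than rejected with an error."""
    df = Counter()
    for doc in docs:
        # Each document contributes at most 1 to a term's document frequency.
        df.update(set(doc))
    return sorted(t for t, n in df.items() if n >= min_df)

docs = [["a", "b"], ["a"]]
print(build_vocab(docs, min_df=2))  # -> ['a']
print(build_vocab(docs, min_df=3))  # -> [] (empty vocabulary, no error)
```

The design point mirrored here is that an over-strict `min_df` (or `minDF`/`minTF` in Spark) now degrades to an empty model rather than failing the whole fit.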
[spark] branch branch-3.0 updated: [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

8aa644e is described below

commit 8aa644e9a991cd7f965aec082adcc3a3d19d452f
Author: Louiszr
AuthorDate: Sun Aug 23 21:10:52 2020 -0700

    [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

    ### What changes were proposed in this pull request?

    - Removed `foldCol` related code introduced in #29445 which is causing issues in the base branch.
    - Fixed `CrossValidatorModel.copy()` so that it correctly calls `.copy()` on the models instead of lists of models.

    ### Why are the changes needed?

    - `foldCol` is from 3.1, hence causing tests to fail.
    - `CrossValidatorModel.copy()` is supposed to shallow copy models, not lists of models.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    - Existing tests created in #29445 ran and passed.
    - Updated `test_copy` to make sure `copy()` is called on models instead of lists of models.

    Closes #29524 from Louiszr/remove-foldcol-3.0.

    Authored-by: Louiszr
    Signed-off-by: Huaxin Gao
---
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py
index b250740..b1acaf6 100644
--- a/python/pyspark/ml/tests/test_tuning.py
+++ b/python/pyspark/ml/tests/test_tuning.py
@@ -101,7 +101,6 @@ class CrossValidatorTests(SparkSessionTestCase):
             lambda x: x.getEstimator().uid,
             # SPARK-32092: CrossValidator.copy() needs to copy all existing params
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getCollectSubModels(),
             lambda x: x.getParallelism(),
             lambda x: x.getSeed()
@@ -116,7 +115,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         # SPARK-32092: CrossValidatorModel.copy() needs to copy all existing params
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed()
         ]:
             self.assertEqual(param(cvModel), param(cvModelCopied))
@@ -127,9 +125,9 @@ class CrossValidatorTests(SparkSessionTestCase):
             'foo',
             "Changing the original avgMetrics should not affect the copied model"
         )
-        cvModel.subModels[0] = 'foo'
+        cvModel.subModels[0][0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            cvModelCopied.subModels[0],
+            cvModelCopied.subModels[0][0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
@@ -224,7 +222,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         loadedCvModel = CrossValidatorModel.load(cvModelPath)
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed(),
             lambda x: len(x.subModels)
         ]:
@@ -780,9 +777,9 @@ class TrainValidationSplitTests(SparkSessionTestCase):
             'foo',
             "Changing the original validationMetrics should not affect the copied model"
         )
-        tvsModel.subModels[0] = 'foo'
+        tvsModel.subModels[0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            tvsModelCopied.subModels[0],
+            tvsModelCopied.subModels[0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 91f34ef..6283c8b 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -480,7 +480,10 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
             extra = dict()
         bestModel = self.bestModel.copy(extra)
         avgMetrics = list(self.avgMetrics)
-        subModels = [model.copy() for model in self.subModels]
+        subModels = [
+            [sub_model.copy() for sub_model in fold_sub_models]
+            for fold_sub_models in self.subModels
+        ]
         return self._copyValues(CrossValidatorModel(bestModel, avgMetrics, subModels),
                                 extra=extra)

     @since("2.3.0")
@@ -511,7 +514,6 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
                 "estimator": estimator,
                 "estimatorParamMaps": epms,
                 "numFolds": java_stage.ge
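The fix above hinges on `subModels` being a list of lists (one inner list of fitted models per fold), so `.copy()` must be called on each model, not on the inner lists. A minimal, self-contained sketch of the invariant the new code enforces (`ToyModel` and `copy_sub_models` are hypothetical stand-ins, not PySpark classes):

```python
class ToyModel:
    """Hypothetical stand-in for a fitted sub-model with a copy() method."""
    def __init__(self, metric):
        self.metric = metric

    def copy(self):
        # Shallow-copies the model itself, as Model.copy() does.
        return ToyModel(self.metric)

def copy_sub_models(sub_models):
    # Mirrors the fixed comprehension: iterate over each fold's list
    # and call .copy() on every model inside it.
    return [[m.copy() for m in fold] for fold in sub_models]

original = [[ToyModel(0.8), ToyModel(0.9)], [ToyModel(0.7), ToyModel(0.6)]]
copied = copy_sub_models(original)

# Mutating a model in the original must not leak into the copy.
original[0][0].metric = 'foo'
print(copied[0][0].metric)  # -> 0.8
```

Had the copy iterated only one level deep (`[fold.copy() for fold in sub_models]` on plain lists), both structures would share the same model objects, which is exactly the bug the updated `test_copy` now guards against.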
[spark] branch branch-3.0 updated (da60de5 -> 8aa644e)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
     add 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)
[spark] branch branch-3.0 updated (4a67f1e -> 007acba)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests
     add 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/clustering/BisectingKMeans.scala      | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala    | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

007acba is described below

commit 007acba6e3b0e45e334bed5942692dd88c61b3ea
Author: Huaxin Gao
AuthorDate: Mon Aug 24 08:47:01 2020 -0700

    [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

    ### What changes were proposed in this pull request?

    backporting https://github.com/apache/spark/pull/29501

    ### Why are the changes needed?

    avoid double caching

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    Existing tests

    Closes #29528 from huaxingao/kmeans_3.0.

    Authored-by: Huaxin Gao
    Signed-off-by: Huaxin Gao
---
 .../spark/ml/clustering/BisectingKMeans.scala      | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala    | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index b649b1d..b3f2d22 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -28,9 +28,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans, BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -275,21 +274,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr =>
     transformSchema(dataset.schema, logging = true)

-    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-      col($(weightCol)).cast(DoubleType)
-    } else {
-      lit(1.0)
-    }
-
-    val instances: RDD[(OldVector, Double)] = dataset
-      .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map {
-        case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight)
-    }
-    if (handlePersistence) {
-      instances.persist(StorageLevel.MEMORY_AND_DISK)
-    }
-
     instr.logPipelineStage(this)
     instr.logDataset(dataset)
     instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -301,11 +285,18 @@ class BisectingKMeans @Since("2.0.0") (
       .setMinDivisibleClusterSize($(minDivisibleClusterSize))
       .setSeed($(seed))
       .setDistanceMeasure($(distanceMeasure))
-    val parentModel = bkm.runWithWeight(instances, Some(instr))
-    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
-    if (handlePersistence) {
-      instances.unpersist()
+
+    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+      col($(weightCol)).cast(DoubleType)
+    } else {
+      lit(1.0)
     }
+    val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w)
+      .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) }
+
+    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+    val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr))
+    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))

     val summary = new BisectingKMeansSummary(
       model.transform(dataset),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5370318..e182f3d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -31,7 +31,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{DistanceMeasure, KMeans => MLlibKMeans, KMeansModel => MLlibKMeansModel}
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
 import org.apache.spark
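The pattern in the diff above is that persistence is now decided once, inside the algorithm (`runWithWeight` receives `handlePersistence`), rather than both in the caller and the callee, which could cache the same data twice. A hedged, self-contained sketch of that discipline in plain Python (`FakeRDD` and `run_with_weight` are illustrative names, not Spark APIs):

```python
class FakeRDD:
    """Toy stand-in for a dataset that can be cached at some storage level."""
    def __init__(self, cached=False):
        self.cached = cached
        self.persist_calls = 0

    def persist(self):
        self.persist_calls += 1
        self.cached = True

    def unpersist(self):
        self.cached = False

def run_with_weight(instances, handle_persistence):
    # The algorithm, not the caller, decides whether to persist, so the
    # same data is never cached twice by two layers of the stack.
    if handle_persistence:
        instances.persist()
    try:
        return "model"  # placeholder for the actual training result
    finally:
        if handle_persistence:
            instances.unpersist()

# Data not cached by the caller: the algorithm persists it exactly once.
uncached = FakeRDD(cached=False)
run_with_weight(uncached, handle_persistence=not uncached.cached)
print(uncached.persist_calls)  # -> 1

# Data already cached by the caller: the algorithm leaves it alone.
already = FakeRDD(cached=True)
run_with_weight(already, handle_persistence=not already.cached)
print(already.persist_calls)  # -> 0
```

In Spark terms, `handle_persistence` corresponds to `dataset.storageLevel == StorageLevel.NONE`: only un-cached input is persisted (and later unpersisted) by the algorithm itself.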
[spark] branch master updated (cf22d94 -> b05f309)
huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cf22d94  [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
     add b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala     | 100 -
 .../apache/spark/ml/regression/FMRegressor.scala   |  10 +--
 .../spark/mllib/optimization/GradientDescent.scala |  45 ++
 .../apache/spark/mllib/optimization/LBFGS.scala    |  11 ++-
 .../ml/classification/FMClassifierSuite.scala      |  26 ++
 python/pyspark/ml/classification.py                |  48 +-
 python/pyspark/ml/tests/test_training_summary.py   |  49 +-
 7 files changed, 257 insertions(+), 32 deletions(-)
[spark] branch master updated (c6109ba -> bc78859)
huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c6109ba  [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command
     add bc78859  [SPARK-32310][ML][PYSPARK] ML params default value parity in feature and tuning

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/feature/Imputer.scala      |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala    |   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala     |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala     |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../org/apache/spark/ml/feature/Selector.scala     |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala    |   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala    |   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala     |   9 +-
 .../org/apache/spark/ml/tree/treeParams.scala      |  16 +--
 .../apache/spark/ml/tuning/CrossValidator.scala    |   4 +-
 .../spark/ml/util/DefaultReadWriteTest.scala       |   3 +
 python/pyspark/ml/classification.py                |  56 +++---
 python/pyspark/ml/clustering.py                    |  30 --
 python/pyspark/ml/feature.py                       | 120 +
 python/pyspark/ml/fpm.py                           |   9 +-
 python/pyspark/ml/recommendation.py                |  20 ++--
 python/pyspark/ml/regression.py                    |  60 +++
 python/pyspark/ml/tests/test_param.py              |   8 +-
 python/pyspark/ml/tuning.py                        |  17 ++-
 22 files changed, 274 insertions(+), 135 deletions(-)
[spark] branch master updated (89d9b7c -> 81b0785)
huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 89d9b7c  [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
     add 81b0785  [SPARK-32455][ML] LogisticRegressionModel prediction optimization

No new revisions were added by this update.

Summary of changes:
 .../ml/classification/LogisticRegression.scala | 89 --
 1 file changed, 49 insertions(+), 40 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests
huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 30c3a50  [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

30c3a50 is described below

commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a
Author: Huaxin Gao
AuthorDate: Thu Aug 6 13:54:15 2020 -0700

    [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

    ### What changes were proposed in this pull request?
    The test creates 10 batches of data to train the model and expects the error on test data to improve as the model is trained. If the difference between the 2nd error and the 10th error is smaller than 2, the assertion fails:
    ```
    FAIL: test_train_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
    Test that error on test data improves as model is trained.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 466, in test_train_prediction
        eventually(condition, timeout=180.0)
      File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", line 81, in eventually
        lastValue = condition()
      File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 461, in condition
        self.assertGreater(errors[1] - errors[-1], 2)
    AssertionError: 1.672640157855923 not greater than 2
    ```
    I saw this quite a few times on Jenkins but was not able to reproduce it locally. These are the ten errors I got:
    ```
    4.517395047937127
    4.894265404350079
    3.0392090466559876
    1.8786361640757654
    0.8973106042078115
    0.3715780507684368
    0.20815690742907672
    0.17333033743125845
    0.15686783249863873
    0.12584413600569616
    ```
    I am thinking of having 15 batches of data instead of 10, so the model can be trained for a longer time. Hopefully the difference between the 2nd error and the 15th error will always be larger than 2 on Jenkins. These are the 15 errors I got locally:
    ```
    4.517395047937127
    4.894265404350079
    3.0392090466559876
    1.8786361640757658
    0.8973106042078115
    0.3715780507684368
    0.20815690742907672
    0.17333033743125845
    0.15686783249863873
    0.12584413600569616
    0.11883853835108477
    0.09400261862100823
    0.08887491447353497
    0.05984929624986607
    0.07583948141520978
    ```

    ### Why are the changes needed?
    Fix flaky test

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Manually tested

    Closes #29380 from huaxingao/flaky_test.

    Authored-by: Huaxin Gao
    Signed-off-by: Huaxin Gao
    (cherry picked from commit 75c2c53e931187912a92e0b52dae0f772fa970e3)
    Signed-off-by: Huaxin Gao
---
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py b/python/pyspark/mllib/tests/test_streaming_algorithms.py
index 2f35e07..5818a7c 100644
--- a/python/pyspark/mllib/tests/test_streaming_algorithms.py
+++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py
@@ -434,9 +434,9 @@ class StreamingLinearRegressionWithTests(MLLibStreamingTestCase):
         slr = StreamingLinearRegressionWithSGD(stepSize=0.2, numIterations=25)
         slr.setInitialWeights([0.0])
-        # Create ten batches with 100 sample points in each.
+        # Create fifteen batches with 100 sample points in each.
         batches = []
-        for i in range(10):
+        for i in range(15):
             batch = LinearDataGenerator.generateLinearInput(
                 0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1)
             batches.append(self.sc.parallelize(batch))
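Not part of the commit — a minimal standalone sketch of the convergence check the test performs, using the error sequences quoted in the PR description (these are the author's local values; the Jenkins runs that failed produced different numbers):

```python
# Test errors after each of the 10 original training batches (local run).
errors_10 = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757654, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616,
]

# Test errors after each of the 15 batches used by the fix (local run).
errors_15 = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757658, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616, 0.11883853835108477, 0.09400261862100823,
    0.08887491447353497, 0.05984929624986607, 0.07583948141520978,
]

def improvement(errors):
    # The quantity the test asserts to be > 2: the error after the 2nd
    # batch minus the final error, i.e. how much training reduced the error.
    return errors[1] - errors[-1]

print(improvement(errors_10))  # ~4.77 locally; the Jenkins failure saw 1.67
print(improvement(errors_15))  # ~4.82 locally
```

With the noisier Jenkins runs, the 10-batch margin occasionally dipped below 2; training for 5 more batches pushes the final error lower, widening the margin the assertion checks.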
[spark] branch master updated (6664e28 -> 75c2c53)
huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6664e28  [SPARK-32546][SQL][FOLLOWUP] Add `.toSeq` to `tableNames` in `HiveClientImpl.listTablesByType`
     add 75c2c53  [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 30c3a50 [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests 30c3a50 is described below commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a Author: Huaxin Gao AuthorDate: Thu Aug 6 13:54:15 2020 -0700 [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests ### What changes were proposed in this pull request? The test creates 10 batches of data to train the model and expects to see error on test data improves as model is trained. If the difference between the 2nd error and the 10th error is smaller than 2, the assertion fails: ``` FAIL: test_train_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests) Test that error on test data improves as model is trained. -- Traceback (most recent call last): File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 466, in test_train_prediction eventually(condition, timeout=180.0) File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", line 81, in eventually lastValue = condition() File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 461, in condition self.assertGreater(errors[1] - errors[-1], 2) AssertionError: 1.672640157855923 not greater than 2 ``` I saw this quite a few time on Jenkins but was not able to reproduce this on my local. These are the ten errors I got: ``` 4.517395047937127 4.894265404350079 3.0392090466559876 1.8786361640757654 0.8973106042078115 0.3715780507684368 0.20815690742907672 0.17333033743125845 0.15686783249863873 0.12584413600569616 ``` I am thinking of having 15 batches of data instead of 10, so the model can be trained for a longer time. 
Hopefully the 15th error - 2nd error will always be larger than 2 on Jenkins. These are the 15 errors I got on my local: ``` 4.517395047937127 4.894265404350079 3.0392090466559876 1.8786361640757658 0.8973106042078115 0.3715780507684368 0.20815690742907672 0.17333033743125845 0.15686783249863873 0.12584413600569616 0.11883853835108477 0.09400261862100823 0.08887491447353497 0.05984929624986607 0.07583948141520978 ``` ### Why are the changes needed? Fix flaky test ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested Closes #29380 from huaxingao/flaky_test. Authored-by: Huaxin Gao Signed-off-by: Huaxin Gao (cherry picked from commit 75c2c53e931187912a92e0b52dae0f772fa970e3) Signed-off-by: Huaxin Gao --- python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py b/python/pyspark/mllib/tests/test_streaming_algorithms.py index 2f35e07..5818a7c 100644 --- a/python/pyspark/mllib/tests/test_streaming_algorithms.py +++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py @@ -434,9 +434,9 @@ class StreamingLinearRegressionWithTests(MLLibStreamingTestCase): slr = StreamingLinearRegressionWithSGD(stepSize=0.2, numIterations=25) slr.setInitialWeights([0.0]) -# Create ten batches with 100 sample points in each. +# Create fifteen batches with 100 sample points in each. batches = [] -for i in range(10): +for i in range(15): batch = LinearDataGenerator.generateLinearInput( 0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1) batches.append(self.sc.parallelize(batch)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 30c3a50 [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests 30c3a50 is described below commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a Author: Huaxin Gao AuthorDate: Thu Aug 6 13:54:15 2020 -0700 [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests ### What changes were proposed in this pull request? The test creates 10 batches of data to train the model and expects to see error on test data improves as model is trained. If the difference between the 2nd error and the 10th error is smaller than 2, the assertion fails: ``` FAIL: test_train_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests) Test that error on test data improves as model is trained. -- Traceback (most recent call last): File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 466, in test_train_prediction eventually(condition, timeout=180.0) File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", line 81, in eventually lastValue = condition() File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 461, in condition self.assertGreater(errors[1] - errors[-1], 2) AssertionError: 1.672640157855923 not greater than 2 ``` I saw this quite a few time on Jenkins but was not able to reproduce this on my local. These are the ten errors I got: ``` 4.517395047937127 4.894265404350079 3.0392090466559876 1.8786361640757654 0.8973106042078115 0.3715780507684368 0.20815690742907672 0.17333033743125845 0.15686783249863873 0.12584413600569616 ``` I am thinking of having 15 batches of data instead of 10, so the model can be trained for a longer time. 
Hopefully the 15th error - 2nd error will always be larger than 2 on Jenkins. These are the 15 errors I got on my local: ``` 4.517395047937127 4.894265404350079 3.0392090466559876 1.8786361640757658 0.8973106042078115 0.3715780507684368 0.20815690742907672 0.17333033743125845 0.15686783249863873 0.12584413600569616 0.11883853835108477 0.09400261862100823 0.08887491447353497 0.05984929624986607 0.07583948141520978 ``` ### Why are the changes needed? Fix flaky test ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested Closes #29380 from huaxingao/flaky_test. Authored-by: Huaxin Gao Signed-off-by: Huaxin Gao (cherry picked from commit 75c2c53e931187912a92e0b52dae0f772fa970e3) Signed-off-by: Huaxin Gao --- python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py b/python/pyspark/mllib/tests/test_streaming_algorithms.py index 2f35e07..5818a7c 100644 --- a/python/pyspark/mllib/tests/test_streaming_algorithms.py +++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py @@ -434,9 +434,9 @@ class StreamingLinearRegressionWithTests(MLLibStreamingTestCase): slr = StreamingLinearRegressionWithSGD(stepSize=0.2, numIterations=25) slr.setInitialWeights([0.0]) -# Create ten batches with 100 sample points in each. +# Create fifteen batches with 100 sample points in each. batches = [] -for i in range(10): +for i in range(15): batch = LinearDataGenerator.generateLinearInput( 0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1) batches.append(self.sc.parallelize(batch)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6664e28 -> 75c2c53)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 6664e28 [SPARK-32546][SQL][FOLLOWUP] Add `.toSeq` to `tableNames` in `HiveClientImpl.listTablesByType`
add 75c2c53 [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

No new revisions were added by this update.

Summary of changes:
python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (75d3428 -> 8d5c094)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 75d3428 [SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate
add 8d5c094 [SPARK-32164][ML] GeneralizedLinearRegressionSummary optimization

No new revisions were added by this update.

Summary of changes:
.../regression/GeneralizedLinearRegression.scala | 50 --
.../spark/ml/regression/LinearRegression.scala | 2 +-
.../spark/mllib/evaluation/RegressionMetrics.scala | 2 +
3 files changed, 40 insertions(+), 14 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (75d3428 -> 8d5c094)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 75d3428 [SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate add 8d5c094 [SPARK-32164][ML] GeneralizedLinearRegressionSummary optimization No new revisions were added by this update. Summary of changes: .../regression/GeneralizedLinearRegression.scala | 50 -- .../spark/ml/regression/LinearRegression.scala | 2 +- .../spark/mllib/evaluation/RegressionMetrics.scala | 2 + 3 files changed, 40 insertions(+), 14 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (75d3428 -> 8d5c094)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 75d3428 [SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate add 8d5c094 [SPARK-32164][ML] GeneralizedLinearRegressionSummary optimization No new revisions were added by this update. Summary of changes: .../regression/GeneralizedLinearRegression.scala | 50 -- .../spark/ml/regression/LinearRegression.scala | 2 +- .../spark/mllib/evaluation/RegressionMetrics.scala | 2 + 3 files changed, 40 insertions(+), 14 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cf22d94 -> b05f309)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from cf22d94 [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
add b05f309 [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel

No new revisions were added by this update.

Summary of changes:
.../spark/ml/classification/FMClassifier.scala | 100 -
.../apache/spark/ml/regression/FMRegressor.scala | 10 +--
.../spark/mllib/optimization/GradientDescent.scala | 45 ++
.../apache/spark/mllib/optimization/LBFGS.scala | 11 ++-
.../ml/classification/FMClassifierSuite.scala | 26 ++
python/pyspark/ml/classification.py | 48 +-
python/pyspark/ml/tests/test_training_summary.py | 49 +-
7 files changed, 257 insertions(+), 32 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Add Huaxin Gao to committers.md
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
new 18d7e21 Add Huaxin Gao to committers.md

18d7e21 is described below

commit 18d7e2103f9713adc09d69b65ebd4a48107c88f0
Author: Huaxin Gao
AuthorDate: Thu Jul 2 19:38:42 2020 -0700

Add Huaxin Gao to committers.md

Author: Huaxin Gao

Closes #278 from huaxingao/asf-site.

---
committers.md | 1 +
site/committers.html | 4
2 files changed, 5 insertions(+)

diff --git a/committers.md b/committers.md
index 42b89d4..77e768d 100644
--- a/committers.md
+++ b/committers.md
@@ -26,6 +26,7 @@ navigation:
 |Erik Erlandson|Red Hat|
 |Robert Evans|NVIDIA|
 |Wenchen Fan|Databricks|
+|Huaxin Gao|IBM|
 |Joseph Gonzalez|UC Berkeley|
 |Thomas Graves|NVIDIA|
 |Stephen Haberman|LinkedIn|

diff --git a/site/committers.html b/site/committers.html
index 5299961..66de9a1 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -275,6 +275,10 @@
 Databricks
+ Huaxin Gao
+ IBM
+
+
 Joseph Gonzalez
 UC Berkeley

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new fc2660c [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

fc2660c is described below

commit fc2660c302b0c83a9a8a5bec3cc7ae28f8fecdd6
Author: Huaxin Gao
AuthorDate: Sat Jul 4 19:01:07 2020 -0700

[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

### What changes were proposed in this pull request?

docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md
docs/sql-ref-syntax-aux-refresh-table.md -> docs/sql-ref-syntax-aux-cache-refresh-table.md

### Why are the changes needed?

usedb belongs to DDL, so its file location should be consistent with the locations of the other DDL command files. The same reasoning applies to refresh table.

### Does this PR introduce _any_ user-facing change?

Before the change, clicking USE DATABASE shows the SELECT commands in the sidebar menu:
https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png

After the change, clicking USE DATABASE shows the DDL commands:
https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png

Before the change, clicking REFRESH TABLE shows the auxiliary statements:
https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png

After the change, clicking REFRESH TABLE shows the cache statements:
https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png

### How was this patch tested?

Manually built and checked.

Closes #28995 from huaxingao/docs_fix.

Authored-by: Huaxin Gao
Signed-off-by: Huaxin Gao

(cherry picked from commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5)
Signed-off-by: Huaxin Gao

---
docs/_data/menu-sql.yaml | 4 ++--
docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +-
...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
docs/sql-ref-syntax-aux-cache-refresh.md | 2 +-
docs/sql-ref-syntax-aux-cache-uncache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache.md | 2 +-
...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
docs/sql-ref-syntax-ddl.md | 2 +-
docs/sql-ref-syntax.md | 4 ++--
10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 219e680..eea657e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -139,7 +139,7 @@
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
 - text: USE DATABASE
-  url: sql-ref-syntax-qry-select-usedb.html
+  url: sql-ref-syntax-ddl-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -207,7 +207,7 @@
 - text: CLEAR CACHE
   url: sql-ref-syntax-aux-cache-clear-cache.html
 - text: REFRESH TABLE
-  url: sql-ref-syntax-aux-refresh-table.html
+  url: sql-ref-syntax-aux-cache-refresh-table.html
 - text: REFRESH
   url: sql-ref-syntax-aux-cache-refresh.html
 - text: DESCRIBE

diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md b/docs/sql-ref-syntax-aux-cache-cache-table.md
index 193e209..fdef3d6 100644
--- a/docs/sql-ref-syntax-aux-cache-cache-table.md
+++ b/docs/sql-ref-syntax-aux-cache-cache-table.md
@@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testDat
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)

diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md b/docs/sql-ref-syntax-aux-cache-clear-cache.md
index ee33e6a..a27cd83 100644
--- a/docs/sql-ref-syntax-aux-cache-clear-cache.md
+++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md
@@ -39,5 +39,5 @@ CLEAR CACHE;
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-synta
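The file moves and link rewrites described above can be sketched programmatically. This is an illustration only, not the PR's actual workflow (which used `git mv` plus hand-edited links); the `apply_renames` helper is a hypothetical name introduced here:

```python
# Sketch: apply the two doc renames from the "Summary of changes" and
# rewrite any links that still point at the old basenames.
from pathlib import Path

# Old basename -> new basename (without the .md/.html extension), taken
# from the renames listed in the commit above.
RENAMES = {
    "sql-ref-syntax-qry-select-usedb": "sql-ref-syntax-ddl-usedb",
    "sql-ref-syntax-aux-refresh-table": "sql-ref-syntax-aux-cache-refresh-table",
}

def apply_renames(docs: Path) -> None:
    # 1. Rename the markdown files themselves (git mv in the real workflow).
    for old, new in RENAMES.items():
        src = docs / f"{old}.md"
        if src.exists():
            src.rename(docs / f"{new}.md")
    # 2. Rewrite cross-references in menu-sql.yaml and the other doc pages.
    #    Neither new basename contains an old one, so plain replace is safe.
    for path in docs.rglob("*"):
        if path.is_file() and path.suffix in {".md", ".yaml", ".html"}:
            text = path.read_text()
            for old, new in RENAMES.items():
                text = text.replace(old, new)
            path.write_text(text)
```

A script like this keeps the renamed files and the sidebar menu (`docs/_data/menu-sql.yaml`) in sync in one pass, which is exactly the consistency the diff hunks above establish by hand.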
[spark] branch master updated (42f01e3 -> 492d5d1)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 42f01e3 [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark
add 492d5d1 [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

No new revisions were added by this update.

Summary of changes:
docs/_data/menu-sql.yaml | 4 ++--
docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +-
...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
docs/sql-ref-syntax-aux-cache-refresh.md | 2 +-
docs/sql-ref-syntax-aux-cache-uncache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache.md | 2 +-
...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
docs/sql-ref-syntax-ddl.md | 2 +-
docs/sql-ref-syntax.md | 4 ++--
10 files changed, 10 insertions(+), 10 deletions(-)
rename docs/{sql-ref-syntax-aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} (100%)
rename docs/{sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} (100%)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 492d5d1 [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

492d5d1 is described below

commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5
Author: Huaxin Gao
AuthorDate: Sat Jul 4 19:01:07 2020 -0700

[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

### What changes were proposed in this pull request?

docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md
docs/sql-ref-syntax-aux-refresh-table.md -> docs/sql-ref-syntax-aux-cache-refresh-table.md

### Why are the changes needed?

usedb belongs to DDL, so its file location should be consistent with the locations of the other DDL command files. The same reasoning applies to refresh table.

### Does this PR introduce _any_ user-facing change?

Before the change, clicking USE DATABASE shows the SELECT commands in the sidebar menu:
https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png

After the change, clicking USE DATABASE shows the DDL commands:
https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png

Before the change, clicking REFRESH TABLE shows the auxiliary statements:
https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png

After the change, clicking REFRESH TABLE shows the cache statements:
https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png

### How was this patch tested?

Manually built and checked.

Closes #28995 from huaxingao/docs_fix.

Authored-by: Huaxin Gao
Signed-off-by: Huaxin Gao

---
docs/_data/menu-sql.yaml | 4 ++--
docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +-
...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
docs/sql-ref-syntax-aux-cache-refresh.md | 2 +-
docs/sql-ref-syntax-aux-cache-uncache-table.md | 2 +-
docs/sql-ref-syntax-aux-cache.md | 2 +-
...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
docs/sql-ref-syntax-ddl.md | 2 +-
docs/sql-ref-syntax.md | 4 ++--
10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 219e680..eea657e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -139,7 +139,7 @@
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
 - text: USE DATABASE
-  url: sql-ref-syntax-qry-select-usedb.html
+  url: sql-ref-syntax-ddl-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -207,7 +207,7 @@
 - text: CLEAR CACHE
   url: sql-ref-syntax-aux-cache-clear-cache.html
 - text: REFRESH TABLE
-  url: sql-ref-syntax-aux-refresh-table.html
+  url: sql-ref-syntax-aux-cache-refresh-table.html
 - text: REFRESH
   url: sql-ref-syntax-aux-cache-refresh.html
 - text: DESCRIBE

diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md b/docs/sql-ref-syntax-aux-cache-cache-table.md
index 193e209..fdef3d6 100644
--- a/docs/sql-ref-syntax-aux-cache-cache-table.md
+++ b/docs/sql-ref-syntax-aux-cache-cache-table.md
@@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testDat
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)

diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md b/docs/sql-ref-syntax-aux-cache-clear-cache.md
index ee33e6a..a27cd83 100644
--- a/docs/sql-ref-syntax-aux-cache-clear-cache.md
+++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md
@@ -39,5 +39,5 @@ CLEAR CACHE;
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-synta
[spark] branch master updated (42f01e3 -> 492d5d1)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 42f01e3 [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark add 492d5d1 [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 4 ++-- docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +- docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +- ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0 docs/sql-ref-syntax-aux-cache-refresh.md | 2 +- docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +- docs/sql-ref-syntax-aux-cache.md | 2 +- ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0 docs/sql-ref-syntax-ddl.md| 2 +- docs/sql-ref-syntax.md| 4 ++-- 10 files changed, 10 insertions(+), 10 deletions(-) rename docs/{sql-ref-syntax-aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} (100%) rename docs/{sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} (100%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new fc2660c [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table fc2660c is described below commit fc2660c302b0c83a9a8a5bec3cc7ae28f8fecdd6 Author: Huaxin Gao AuthorDate: Sat Jul 4 19:01:07 2020 -0700 [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table ### What changes were proposed in this pull request? docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md docs/sql-ref-syntax-aux-refresh-table.md -> docs/sql-ref-syntax-aux-cache-refresh-table.md ### Why are the changes needed? usedb belongs to DDL. Its location should be consistent with other DDL commands file locations similar reason for refresh table ### Does this PR introduce _any_ user-facing change? before change, when clicking USE DATABASE, the side bar menu shows select commands https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png;> after change, when clicking USE DATABASE, the side bar menu shows DDL commands https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png;> before change, when clicking refresh table, the side bar menu shows Auxiliary statements https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png;> after change, when clicking refresh table, the side bar menu shows Cache statements https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png;> ### How was this patch tested? Manually build and check Closes #28995 from huaxingao/docs_fix. 
Authored-by: Huaxin Gao Signed-off-by: Huaxin Gao (cherry picked from commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5) Signed-off-by: Huaxin Gao --- docs/_data/menu-sql.yaml | 4 ++-- docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +- docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +- ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0 docs/sql-ref-syntax-aux-cache-refresh.md | 2 +- docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +- docs/sql-ref-syntax-aux-cache.md | 2 +- ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0 docs/sql-ref-syntax-ddl.md| 2 +- docs/sql-ref-syntax.md| 4 ++-- 10 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 219e680..eea657e 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -139,7 +139,7 @@ - text: REPAIR TABLE url: sql-ref-syntax-ddl-repair-table.html - text: USE DATABASE - url: sql-ref-syntax-qry-select-usedb.html + url: sql-ref-syntax-ddl-usedb.html - text: Data Manipulation Statements url: sql-ref-syntax-dml.html subitems: @@ -207,7 +207,7 @@ - text: CLEAR CACHE url: sql-ref-syntax-aux-cache-clear-cache.html - text: REFRESH TABLE - url: sql-ref-syntax-aux-refresh-table.html + url: sql-ref-syntax-aux-cache-refresh-table.html - text: REFRESH url: sql-ref-syntax-aux-cache-refresh.html - text: DESCRIBE diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md b/docs/sql-ref-syntax-aux-cache-cache-table.md index 193e209..fdef3d6 100644 --- a/docs/sql-ref-syntax-aux-cache-cache-table.md +++ b/docs/sql-ref-syntax-aux-cache-cache-table.md @@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testDat * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html) * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html) -* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html) +* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html) * 
[REFRESH](sql-ref-syntax-aux-cache-refresh.html) diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md b/docs/sql-ref-syntax-aux-cache-clear-cache.md index ee33e6a..a27cd83 100644 --- a/docs/sql-ref-syntax-aux-cache-clear-cache.md +++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md @@ -39,5 +39,5 @@ CLEAR CACHE; * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html) * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html) -* [REFRESH TABLE](sql-ref-synta
[spark] branch master updated (42f01e3 -> 492d5d1)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 42f01e3 [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark add 492d5d1 [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 4 ++-- docs/sql-ref-syntax-aux-cache-cache-table.md | 2 +- docs/sql-ref-syntax-aux-cache-clear-cache.md | 2 +- ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0 docs/sql-ref-syntax-aux-cache-refresh.md | 2 +- docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +- docs/sql-ref-syntax-aux-cache.md | 2 +- ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0 docs/sql-ref-syntax-ddl.md| 2 +- docs/sql-ref-syntax.md| 4 ++-- 10 files changed, 10 insertions(+), 10 deletions(-) rename docs/{sql-ref-syntax-aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} (100%) rename docs/{sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} (100%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (f50432f -> 8a52bda)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from f50432f [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip package testing in Jenkins add 8a52bda [SPARK-32310][ML][PYSPARK][3.0] ML params default value parity No new revisions were added by this update. Summary of changes: .../spark/ml/classification/FMClassifier.scala | 10 -- .../apache/spark/ml/classification/LinearSVC.scala | 11 +-- .../ml/classification/LogisticRegression.scala | 13 +-- .../spark/ml/classification/NaiveBayes.scala | 4 +- .../spark/ml/clustering/BisectingKMeans.scala | 7 +- .../spark/ml/clustering/GaussianMixture.scala | 7 +- .../org/apache/spark/ml/clustering/KMeans.scala| 11 +-- .../scala/org/apache/spark/ml/clustering/LDA.scala | 11 +-- .../ml/clustering/PowerIterationClustering.scala | 7 +- .../evaluation/BinaryClassificationEvaluator.scala | 4 +- .../MulticlassClassificationEvaluator.scala| 8 +- .../MultilabelClassificationEvaluator.scala| 6 +- .../spark/ml/evaluation/RankingEvaluator.scala | 6 +- .../spark/ml/evaluation/RegressionEvaluator.scala | 4 +- .../apache/spark/ml/feature/ChiSqSelector.scala| 9 +- .../org/apache/spark/ml/feature/Imputer.scala | 4 +- .../org/apache/spark/ml/feature/MinMaxScaler.scala | 4 +- .../apache/spark/ml/feature/OneHotEncoder.scala| 5 +- .../spark/ml/feature/QuantileDiscretizer.scala | 4 +- .../org/apache/spark/ml/feature/RFormula.scala | 6 +- .../org/apache/spark/ml/feature/RobustScaler.scala | 8 +- .../apache/spark/ml/feature/StringIndexer.scala| 6 +- .../apache/spark/ml/feature/VectorIndexer.scala| 6 +- .../org/apache/spark/ml/feature/VectorSlicer.scala | 6 +- .../org/apache/spark/ml/feature/Word2Vec.scala | 9 +- .../scala/org/apache/spark/ml/fpm/FPGrowth.scala | 5 +- .../ml/regression/AFTSurvivalRegression.scala | 10 +- .../spark/ml/regression/LinearRegression.scala | 14 +-- .../org/apache/spark/ml/tree/treeParams.scala | 
16 +-- .../spark/ml/util/DefaultReadWriteTest.scala | 3 + python/pyspark/ml/classification.py| 86 +++- python/pyspark/ml/clustering.py| 43 ++-- python/pyspark/ml/feature.py | 110 ++--- python/pyspark/ml/fpm.py | 12 ++- python/pyspark/ml/recommendation.py| 20 ++-- python/pyspark/ml/regression.py| 88 - python/pyspark/ml/tests/test_param.py | 7 +- python/pyspark/ml/tuning.py| 16 ++- 38 files changed, 368 insertions(+), 238 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c1f160e -> d5c672a)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c1f160e [SPARK-30648][SQL] Support filters pushdown in JSON datasource add d5c672a [SPARK-32315][ML] Provide an explanation error message when calling require No new revisions were added by this update. Summary of changes: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
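The change above (SPARK-32315) attaches an explanatory message to a bare `require` call in `MLUtils.scala`. The general pattern — precondition checks that report *which* values conflicted, not just that something failed — can be sketched in Python; the function and message below are hypothetical illustrations, not the actual Scala fix:

```python
def validate_vector_sizes(v1_size: int, v2_size: int) -> None:
    """Fail fast with a message that explains why the check failed."""
    if v1_size != v2_size:
        # Analogous to Scala's two-argument require(cond, message): the
        # message names the conflicting values instead of failing silently.
        raise ValueError(
            f"Vector sizes do not match: v1.size = {v1_size}, v2.size = {v2_size}."
        )

validate_vector_sizes(3, 3)  # passes silently
try:
    validate_vector_sizes(3, 4)
except ValueError as e:
    print(e)  # the error now explains exactly what mismatched
```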
[spark] branch master updated (d5c672a -> 383f5e9)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d5c672a [SPARK-32315][ML] Provide an explanation error message when calling require add 383f5e9 [SPARK-32310][ML][PYSPARK] ML params default value parity in classification, regression, clustering and fpm No new revisions were added by this update. Summary of changes: .../spark/ml/classification/FMClassifier.scala | 10 .../apache/spark/ml/classification/LinearSVC.scala | 12 ++--- .../ml/classification/LogisticRegression.scala | 14 ++ .../spark/ml/classification/NaiveBayes.scala | 4 +- .../spark/ml/clustering/BisectingKMeans.scala | 7 +-- .../spark/ml/clustering/GaussianMixture.scala | 8 +-- .../org/apache/spark/ml/clustering/KMeans.scala| 11 ++-- .../scala/org/apache/spark/ml/clustering/LDA.scala | 11 ++-- .../ml/clustering/PowerIterationClustering.scala | 7 +-- .../evaluation/BinaryClassificationEvaluator.scala | 4 +- .../MulticlassClassificationEvaluator.scala| 8 +-- .../MultilabelClassificationEvaluator.scala| 6 +-- .../spark/ml/evaluation/RankingEvaluator.scala | 6 +-- .../spark/ml/evaluation/RegressionEvaluator.scala | 4 +- .../scala/org/apache/spark/ml/fpm/FPGrowth.scala | 5 +- .../ml/regression/AFTSurvivalRegression.scala | 11 ++-- .../spark/ml/regression/LinearRegression.scala | 15 ++ python/pyspark/ml/classification.py| 58 ++ python/pyspark/ml/clustering.py| 33 python/pyspark/ml/fpm.py | 7 ++- python/pyspark/ml/regression.py| 57 + 21 files changed, 141 insertions(+), 157 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
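The "default value parity" change above relies on ML params keeping user-set values and library defaults in separate maps, with lookups falling back from one to the other. A toy pure-Python sketch of that split follows — it only mirrors the idea behind PySpark's `Params.getOrDefault`, and is not PySpark's real implementation (all class and method names here are illustrative):

```python
class Param:
    """A named, documented parameter (illustrative stand-in)."""
    def __init__(self, name: str, doc: str):
        self.name, self.doc = name, doc


class HasDefaults:
    """Explicitly set values win; otherwise the declared default applies."""
    def __init__(self):
        self._param_map = {}    # values set explicitly by the user
        self._default_map = {}  # library-declared defaults

    def _set_default(self, param: Param, value):
        self._default_map[param.name] = value

    def set(self, param: Param, value):
        self._param_map[param.name] = value

    def get_or_default(self, param: Param):
        if param.name in self._param_map:
            return self._param_map[param.name]
        return self._default_map[param.name]


max_iter = Param("maxIter", "maximum number of iterations")
model = HasDefaults()
model._set_default(max_iter, 100)        # parity: match the Scala-side default
print(model.get_or_default(max_iter))    # -> 100
model.set(max_iter, 10)
print(model.get_or_default(max_iter))    # -> 10
```

Parity bugs of the kind this commit fixes arise when the Python-side `_set_default` value drifts from the Scala-side one, so estimators behave differently across APIs until a value is set explicitly.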
[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

    [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

    ### What changes were proposed in this pull request?
    Rewrite a clearer and complete BLAS native acceleration enabling guide.

    ### Why are the changes needed?
    The document of enabling BLAS native acceleration in ML guide
    (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is
    incomplete and unclear to the user.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    N/A

    Closes #29139 from xwu99/blas-doc.

    Lead-authored-by: Xiaochang Wu
    Co-authored-by: Wu, Xiaochang
    Signed-off-by: Huaxin Gao
    (cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
    Signed-off-by: Huaxin Gao
---
 docs/ml-guide.md        |  22 +++
 docs/ml-linalg-guide.md | 103
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin

 # Dependencies

-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing.
-If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.

-Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS
+```

 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.

diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation
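The guide text in the diff above recommends pinning native BLAS libraries to a single thread per operation so they do not contend with Spark's own task parallelism (one core per task by default). As a minimal, illustrative sketch — assuming only the standard `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` environment variables that OpenBLAS and Intel MKL honour, with the helper name being mine — a launcher script could set this before anything loads the native libraries:

```python
import os

def pin_blas_threads(n_threads: int = 1) -> dict:
    """Pin native BLAS thread pools to `n_threads` per operation.

    Matches the ml-guide advice: use the same number of threads as cores
    per Spark task (spark.task.cpus, 1 by default). Must run before the
    native libraries are loaded to take effect.
    """
    settings = {
        "OPENBLAS_NUM_THREADS": str(n_threads),  # read by OpenBLAS
        "MKL_NUM_THREADS": str(n_threads),       # read by Intel MKL
    }
    os.environ.update(settings)
    return settings

pin_blas_threads(1)
```

For executors (rather than just the driver), these variables would need to be exported in the executor environment, e.g. via `spark-env.sh` or `spark.executorEnv.*` configuration.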
[spark] branch master updated (c28da67 -> 44c868b)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md        |  22 +++
 docs/ml-linalg-guide.md | 103
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c114066 -> f7542d3)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c114066  [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
 add f7542d3  [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/classification/DecisionTreeClassifier.scala | 2 +-
 .../scala/org/apache/spark/ml/classification/FMClassifier.scala     | 6 ++
 .../scala/org/apache/spark/ml/classification/GBTClassifier.scala    | 2 +-
 .../org/apache/spark/ml/classification/LogisticRegression.scala     | 4 ++--
 .../spark/ml/classification/MultilayerPerceptronClassifier.scala    | 2 +-
 5 files changed, 7 insertions(+), 9 deletions(-)
[spark] branch master updated (3165ca7 -> 122c899)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3165ca7  [SPARK-33376][SQL] Remove the option of "sharesHadoopClasses" in Hive IsolatedClientLoader
 add 122c899  [SPARK-33251][FOLLOWUP][PYTHON][DOCS][MINOR] Adjusts returns PrefixSpan.findFrequentSequentialPatterns

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/fpm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35678][ML][FOLLOWUP] softmax support offset and step
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fdf86fd  [SPARK-35678][ML][FOLLOWUP] softmax support offset and step

fdf86fd is described below

commit fdf86fd6e795474afb78d1917369fec288d06b24
Author: Ruifeng Zheng
AuthorDate: Thu Jun 17 22:46:36 2021 -0700

    [SPARK-35678][ML][FOLLOWUP] softmax support offset and step

    ### What changes were proposed in this pull request?
    Use the newly implemented softmax function in NaiveBayes.

    ### Why are the changes needed?
    To simplify the implementation.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Existing test suite.

    Closes #32927 from zhengruifeng/softmax__followup.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Huaxin Gao
---
 .../scala/org/apache/spark/ml/impl/Utils.scala     | 44 ++--
 .../org/apache/spark/ml/ann/LossFunction.scala     | 26 ++
 .../spark/ml/classification/NaiveBayes.scala       | 15 +-
 .../MultinomialLogisticBlockAggregator.scala       | 58 +++---
 4 files changed, 62 insertions(+), 81 deletions(-)

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala b/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
index 8ff5b6a..abe1d4b 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
@@ -99,30 +99,42 @@ private[spark] object Utils {
   /**
    * Perform in-place softmax conversion.
    */
-  def softmax(values: Array[Double]): Unit = {
+  def softmax(array: Array[Double]): Unit =
+    softmax(array, array.length, 0, 1, array)
+
+  /**
+   * Perform softmax conversion.
+   */
+  def softmax(
+      input: Array[Double],
+      n: Int,
+      offset: Int,
+      step: Int,
+      output: Array[Double]): Unit = {
     var maxValue = Double.MinValue
-    var i = 0
-    while (i < values.length) {
-      val value = values(i)
-      if (value.isPosInfinity) {
-        java.util.Arrays.fill(values, 0)
-        values(i) = 1.0
+    var i = offset
+    val end = offset + step * n
+    while (i < end) {
+      val v = input(i)
+      if (v.isPosInfinity) {
+        BLAS.javaBLAS.dscal(n, 0.0, output, offset, step)
+        output(i) = 1.0
         return
-      } else if (value > maxValue) {
-        maxValue = value
+      } else if (v > maxValue) {
+        maxValue = v
       }
-      i += 1
+      i += step
     }
     var sum = 0.0
-    i = 0
-    while (i < values.length) {
-      val exp = math.exp(values(i) - maxValue)
-      values(i) = exp
+    i = offset
+    while (i < end) {
+      val exp = math.exp(input(i) - maxValue)
+      output(i) = exp
       sum += exp
-      i += 1
+      i += step
     }
-    BLAS.javaBLAS.dscal(values.length, 1.0 / sum, values, 1)
+    BLAS.javaBLAS.dscal(n, 1.0 / sum, output, offset, step)
   }
 }
diff --git a/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala b/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
index 3aea568..37e7b53 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
@@ -22,6 +22,8 @@ import java.util.Random
 import breeze.linalg.{sum => Bsum, DenseMatrix => BDM, DenseVector => BDV}
 import breeze.numerics.{log => brzlog}
 
+import org.apache.spark.ml.impl.Utils
+
 /**
  * Trait for loss function
  */
@@ -79,30 +81,10 @@ private[ann] class SoftmaxLayerModelWithCrossEntropyLoss extends LayerModel with
   val weights = new BDV[Double](0)
 
   override def eval(data: BDM[Double], output: BDM[Double]): Unit = {
+    require(!data.isTranspose && !output.isTranspose)
     var j = 0
-    // find max value to make sure later that exponent is computable
     while (j < data.cols) {
-      var i = 0
-      var max = Double.MinValue
-      while (i < data.rows) {
-        if (data(i, j) > max) {
-          max = data(i, j)
-        }
-        i += 1
-      }
-      var sum = 0.0
-      i = 0
-      while (i < data.rows) {
-        val res = math.exp(data(i, j) - max)
-        output(i, j) = res
-        sum += res
-        i += 1
-      }
-      i = 0
-      while (i < data.rows) {
-        output(i, j) /= sum
-        i += 1
-      }
+      Utils.softmax(data.data, data.rows, j * data.rows, 1, output.data)
       j += 1
     }
   }
diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
index 6b1537b..fd19ec3 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
@@ -2
[spark] branch master updated (be90897 -> a667388)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from be90897  [SPARK-35588][PYTHON][DOCS] Merge Binder integration and quickstart notebook for pandas API on Spark
  add a667388  [SPARK-35678][ML][FOLLOWUP] softmax support offset and step

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/impl/Utils.scala     | 44 ++--
 .../org/apache/spark/ml/ann/LossFunction.scala     | 26 ++
 .../spark/ml/classification/NaiveBayes.scala       | 15 +-
 .../MultinomialLogisticBlockAggregator.scala       | 58 +++---
 python/pyspark/ml/tests/test_algorithms.py         |  2 +-
 5 files changed, 63 insertions(+), 82 deletions(-)
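The reworked `Utils.softmax` in the commit above takes an `offset` and a `step` so the same routine can normalize either a whole array or one column of a flat column-major matrix buffer (as the ANN loss now does via `Utils.softmax(data.data, data.rows, j * data.rows, 1, output.data)`). A minimal Python sketch of the same strided idea, with the usual subtract-the-max trick and the +infinity short-circuit; the function name and loop structure here are illustrative, not Spark's actual code (which scales via BLAS `dscal`):

```python
import math

def softmax(inp, n, offset, step, out):
    """Strided softmax: normalize n entries of inp starting at offset,
    spaced step apart, writing the result into out (out may alias inp)."""
    idxs = range(offset, offset + step * n, step)
    for i in idxs:
        if math.isinf(inp[i]) and inp[i] > 0:
            # A +inf entry dominates: it gets all the probability mass.
            for j in idxs:
                out[j] = 0.0
            out[i] = 1.0
            return
    # Subtract the max so the exponentials cannot overflow.
    m = max(inp[i] for i in idxs)
    total = 0.0
    for i in idxs:
        out[i] = math.exp(inp[i] - m)
        total += out[i]
    for i in idxs:
        out[i] /= total

# In-place over a whole array, like the one-argument overload:
a = [1.0, 2.0, 3.0]
softmax(a, len(a), 0, 1, a)

# One column of a column-major 2x2 buffer (2 rows), like the ANN usage:
buf = [0.0, 0.0, 1.0, 1.0]
out = [0.0] * 4
softmax(buf, 2, 2, 1, out)  # column j=1 occupies offsets 2..3
```

The `step` parameter would likewise let a caller normalize one *row* of a column-major buffer (`offset = i`, `step = rows`) without copying it out first.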
[spark-website] branch asf-site updated: Organization update
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new f9cf29d  Organization update

f9cf29d is described below

commit f9cf29d603ed5ce5bd6388c5824d02f95082c8b0
Author: Jungtaek Lim
AuthorDate: Mon May 3 23:30:43 2021 -0700

    Organization update

    Author: Jungtaek Lim

    Closes #337 from HeartSaVioR/jungtaek-dbx.
---
 committers.md        | 2 +-
 site/committers.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/committers.md b/committers.md
index 458b5ac..ad12fa9 100644
--- a/committers.md
+++ b/committers.md
@@ -53,7 +53,7 @@ navigation:
 |Davies Liu|Juicedata|
 |Cheng Lian|Databricks|
 |Yanbo Liang|Facebook|
-|Jungtaek Lim|Cloudera|
+|Jungtaek Lim|Databricks|
 |Sean McNamara|Oracle|
 |Xiangrui Meng|Databricks|
 |Mridul Muralidharan|LinkedIn|
diff --git a/site/committers.html b/site/committers.html
index 16c048b..93bab98 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -384,7 +384,7 @@
   Jungtaek Lim
-  Cloudera
+  Databricks
   Sean McNamara
[spark] branch master updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0076eba  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

0076eba is described below

commit 0076eba8d066936c32790ebc4058c52e2d21a207
Author: Hyukjin Kwon
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

    [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

    ### What changes were proposed in this pull request?

    This PR fixes the 'options' description on `UnresolvedRelation`. This comment was added in
    https://github.com/apache/spark/pull/29535 but is not valid anymore because V1 also uses this
    `options` (and merges the options with the table properties) per
    https://github.com/apache/spark/pull/29712.

    This PR can go through from `master` to `branch-3.1`.

    ### Why are the changes needed?

    To make `UnresolvedRelation.options`'s description clearer.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    Scala linter by `dev/linter-scala`.

    Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Huaxin Gao
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 8417203..0785336 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -40,7 +40,7 @@ class UnresolvedException(function: String)
 * Holds the name of a relation that has yet to be looked up in a catalog.
 *
 * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
 */
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
[spark] branch branch-3.1 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new b5cb3b6  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

b5cb3b6 is described below

commit b5cb3b682a2cecae6d826f7610a2606c48fc9643
Author: Hyukjin Kwon
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

    [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

    ### What changes were proposed in this pull request?

    This PR fixes the 'options' description on `UnresolvedRelation`. This comment was added in
    https://github.com/apache/spark/pull/29535 but is not valid anymore because V1 also uses this
    `options` (and merges the options with the table properties) per
    https://github.com/apache/spark/pull/29712.

    This PR can go through from `master` to `branch-3.1`.

    ### Why are the changes needed?

    To make `UnresolvedRelation.options`'s description clearer.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    Scala linter by `dev/linter-scala`.

    Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Huaxin Gao
    (cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
    Signed-off-by: Huaxin Gao
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 55eca63..ec420c4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str
 * Holds the name of a relation that has yet to be looked up in a catalog.
 *
 * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
 */
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
[spark] branch branch-3.2 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new af569d1  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

af569d1 is described below

commit af569d1b0ac6b25dbd500804a395964ef7f9e60f
Author: Hyukjin Kwon
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

    [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

    ### What changes were proposed in this pull request?

    This PR fixes the 'options' description on `UnresolvedRelation`. This comment was added in
    https://github.com/apache/spark/pull/29535 but is not valid anymore because V1 also uses this
    `options` (and merges the options with the table properties) per
    https://github.com/apache/spark/pull/29712.

    This PR can go through from `master` to `branch-3.1`.

    ### Why are the changes needed?

    To make `UnresolvedRelation.options`'s description clearer.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    Scala linter by `dev/linter-scala`.

    Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Huaxin Gao
    (cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
    Signed-off-by: Huaxin Gao
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 9f05367..9db038d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException(function: String)
 * Holds the name of a relation that has yet to be looked up in a catalog.
 *
 * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
 */
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
[spark] branch master updated (1f3eb73 -> c411d26)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 1f3eb73  [SPARK-37510][PYTHON] Support basic operations of timedelta Series/Index
  add c411d26  [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/ResolveCatalogs.scala    | 12 ---
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 11 ++-
 .../sql/catalyst/plans/logical/statements.scala    | 23 +-
 .../sql/catalyst/plans/logical/v2Commands.scala    | 19 ++
 .../sql/connector/catalog/CatalogV2Util.scala      |  6 +-
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 20 +--
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 19 +++---
 .../datasources/v2/DataSourceV2Strategy.scala      | 19 --
 .../datasources/v2/ReplaceTableExec.scala          | 11 ---
 9 files changed, 56 insertions(+), 84 deletions(-)
[spark] branch master updated: [SPARK-37545][SQL] V2 CreateTableAsSelect command should qualify location
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new feba5ac  [SPARK-37545][SQL] V2 CreateTableAsSelect command should qualify location

feba5ac is described below

commit feba5ac32f2598f6ca8a274850934106be0db64d
Author: Terry Kim
AuthorDate: Sat Dec 4 20:47:45 2021 -0800

    [SPARK-37545][SQL] V2 CreateTableAsSelect command should qualify location

    ### What changes were proposed in this pull request?

    Currently, the v2 CTAS command doesn't qualify the location:
    ```
    spark.sql("CREATE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT id FROM source")
    spark.sql("DESCRIBE EXTENDED testcat.t").filter("col_name = 'Location'").show
    +--------+---------+-------+
    |col_name|data_type|comment|
    +--------+---------+-------+
    |Location|/tmp/foo |       |
    +--------+---------+-------+
    ```
    whereas the v1 command qualifies the location as `file:/tmp/foo`, which is the correct behavior
    since the default filesystem can change for different sessions.

    ### Why are the changes needed?

    This PR proposes to store the qualified location in order to prevent the issue where the default
    filesystem changes for different sessions.

    ### Does this PR introduce _any_ user-facing change?

    Yes, now the v2 CTAS command will store the qualified location:
    ```
    +--------+-------------+-------+
    |col_name|data_type    |comment|
    +--------+-------------+-------+
    |Location|file:/tmp/foo|       |
    +--------+-------------+-------+
    ```

    ### How was this patch tested?

    Added new test.

    Closes #34806 from imback82/v2_ctas_qualified_loc.

    Authored-by: Terry Kim
    Signed-off-by: Huaxin Gao
---
 .../execution/datasources/v2/DataSourceV2Strategy.scala   |  6 --
 .../DataSourceV2DataFrameSessionCatalogSuite.scala        |  4 ++--
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 15 +++
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
index f73b1a6..dbe4168 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
@@ -172,13 +172,15 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     case CreateTableAsSelect(ResolvedDBObjectName(catalog, ident), parts, query, tableSpec,
         options, ifNotExists) =>
       val writeOptions = new CaseInsensitiveStringMap(options.asJava)
+      val tableSpecWithQualifiedLocation = tableSpec.copy(
+        location = tableSpec.location.map(makeQualifiedDBObjectPath(_)))
       catalog match {
         case staging: StagingTableCatalog =>
           AtomicCreateTableAsSelectExec(staging, ident.asIdentifier, parts, query, planLater(query),
-            tableSpec, writeOptions, ifNotExists) :: Nil
+            tableSpecWithQualifiedLocation, writeOptions, ifNotExists) :: Nil
         case _ =>
           CreateTableAsSelectExec(catalog.asTableCatalog, ident.asIdentifier, parts, query,
-            planLater(query), tableSpec, writeOptions, ifNotExists) :: Nil
+            planLater(query), tableSpecWithQualifiedLocation, writeOptions, ifNotExists) :: Nil
       }

     case RefreshTable(r: ResolvedTable) =>
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
index 91ac7db..3edc4b9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
@@ -83,10 +83,10 @@ class DataSourceV2DataFrameSessionCatalogSuite
   test("saveAsTable passes path and provider information properly") {
     val t1 = "prop_table"
     withTable(t1) {
-      spark.range(20).write.format(v2Format).option("path", "abc").saveAsTable(t1)
+      spark.range(20).write.format(v2Format).option("path", "/abc").saveAsTable(t1)
       val cat = spark.sessionState.catalogManager.currentCatalog.asInstanceOf[TableCatalog]
       val tableInfo = cat.loadTable(Identifier.of(Array("default"), t1))
-      assert(tableInfo.properties().get("location") === "abc")
+      assert(tableInfo.properties().get("location"
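The point of the fix above is that a bare path like `/tmp/foo` means different things under different default filesystems, so the qualified form (`file:/tmp/foo`) must be pinned at table-creation time. A toy sketch of that qualification step; this is an illustration only, not Spark's `makeQualifiedDBObjectPath` (which resolves through Hadoop's `FileSystem`/`Path` machinery), and the scheme-detection heuristic here is simplified:

```python
from urllib.parse import urlparse

def qualify_location(path, default_scheme="file"):
    """Return path unchanged if it already carries a filesystem scheme,
    otherwise qualify it against the session's default filesystem."""
    if urlparse(path).scheme:
        return path  # e.g. "hdfs://nn/data" is already qualified
    return f"{default_scheme}:{path}"

print(qualify_location("/tmp/foo"))        # -> file:/tmp/foo
print(qualify_location("hdfs://nn/data"))  # -> hdfs://nn/data
```

Because the qualified string is what gets stored in the table's properties, a later session whose default filesystem is, say, HDFS still reads the table from the original `file:` location instead of silently re-resolving `/tmp/foo`.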