[spark] branch master updated (58f87b3 -> a0bd273)

2020-08-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 58f87b3  [SPARK-32639][SQL] Support GroupType parquet mapkey field
 add a0bd273  [SPARK-32092][ML][PYSPARK][FOLLOWUP] Fixed CrossValidatorModel.copy() to copy models instead of list

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 8 
 python/pyspark/ml/tuning.py            | 5 -
 2 files changed, 8 insertions(+), 5 deletions(-)
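The fix can be illustrated with a small, self-contained Python sketch; the `Model` class below is a hypothetical stand-in for a fitted PySpark model, not Spark code:

```python
class Model:
    # Hypothetical stand-in for a fitted PySpark model.
    def __init__(self, value):
        self.value = value

    def copy(self):
        return Model(self.value)

# subModels is a list of lists: one list of fitted models per fold.
sub_models = [[Model(1), Model(2)], [Model(3), Model(4)]]

# Buggy version: `fold` is a *list*, so list.copy() makes a shallow list
# copy and the Model objects inside are still shared with the original.
buggy = [fold.copy() for fold in sub_models]

# Fixed version (as in the patch): call .copy() on each model.
fixed = [[m.copy() for m in fold] for fold in sub_models]

sub_models[0][0].value = 99
# buggy[0][0].value is now 99 (the mutation leaks through the shallow copy),
# while fixed[0][0].value is still 1.
```

This is exactly why `[model.copy() for model in self.subModels]` was wrong for a nested list: each `model` was actually a per-fold list, and `list.copy()` duplicates the list but not the models it holds.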


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1450b5e -> 1fd54f4)

2020-08-21 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1450b5e  [MINOR][DOCS] fix typo for docs,log message and comments
 add 1fd54f4  [SPARK-32662][ML] CountVectorizerModel: Remove requirement for minimum Vocab size

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/ml/feature/CountVectorizer.scala  |  5 +-
 .../spark/ml/feature/CountVectorizerSuite.scala | 74 +-
 2 files changed, 63 insertions(+), 16 deletions(-)
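The behavioral change can be sketched in plain Python; `fit_vocabulary` below is an illustrative toy, not Spark's `CountVectorizer` implementation:

```python
from collections import Counter

def fit_vocabulary(docs, min_df=1.0):
    # Toy document-frequency vocabulary fit (illustrative only).
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # min_df >= 1.0 is an absolute document count, otherwise a fraction.
    min_count = min_df if min_df >= 1.0 else min_df * len(docs)
    vocab = [term for term, count in df.items() if count >= min_count]
    # Before SPARK-32662, the Scala code required a non-empty vocabulary
    # (paraphrased: require(vocab.nonEmpty, ...)). With the requirement
    # removed, an empty vocabulary is returned instead of failing the job.
    return sorted(vocab)

fit_vocabulary([["a", "b"], ["a"]], min_df=3)   # [] -- no term reaches 3 docs
fit_vocabulary([["a", "b"], ["a"]])             # ['a', 'b']
```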


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

2020-08-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
8aa644e is described below

commit 8aa644e9a991cd7f965aec082adcc3a3d19d452f
Author: Louiszr 
AuthorDate: Sun Aug 23 21:10:52 2020 -0700

[SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

### What changes were proposed in this pull request?

- Removed `foldCol` related code introduced in #29445 which is causing issues in the base branch.
- Fixed `CrossValidatorModel.copy()` so that it correctly calls `.copy()` on the models instead of lists of models.

### Why are the changes needed?

- `foldCol` is from 3.1 hence causing tests to fail.
- `CrossValidatorModel.copy()` is supposed to shallow copy models, not lists of models.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing tests created in #29445 ran and passed.
- Updated `test_copy` to make sure `copy()` is called on models instead of lists of models.

Closes #29524 from Louiszr/remove-foldcol-3.0.

Authored-by: Louiszr 
Signed-off-by: Huaxin Gao 
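The updated test relies on a monkey-patching trick to verify that the model objects themselves (not just the lists holding them) were copied. A minimal Python sketch, with a hypothetical `Model` class:

```python
import copy

class Model:
    # Hypothetical model whose copy() shallow-copies the instance.
    def copy(self):
        return copy.copy(self)

original = Model()
clone = original.copy()

# Attach a new attribute to the *original* after copying. If copy() had
# only copied the surrounding list (the old bug), `clone` would be the
# very same object and would see the attribute too.
original.getInducedError = lambda: 'foo'

assert not hasattr(clone, 'getInducedError')
assert original.getInducedError() == 'foo'
```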
---
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py
index b250740..b1acaf6 100644
--- a/python/pyspark/ml/tests/test_tuning.py
+++ b/python/pyspark/ml/tests/test_tuning.py
@@ -101,7 +101,6 @@ class CrossValidatorTests(SparkSessionTestCase):
 lambda x: x.getEstimator().uid,
 # SPARK-32092: CrossValidator.copy() needs to copy all existing params
 lambda x: x.getNumFolds(),
-lambda x: x.getFoldCol(),
 lambda x: x.getCollectSubModels(),
 lambda x: x.getParallelism(),
 lambda x: x.getSeed()
@@ -116,7 +115,6 @@ class CrossValidatorTests(SparkSessionTestCase):
 # SPARK-32092: CrossValidatorModel.copy() needs to copy all existing params
 for param in [
 lambda x: x.getNumFolds(),
-lambda x: x.getFoldCol(),
 lambda x: x.getSeed()
 ]:
 self.assertEqual(param(cvModel), param(cvModelCopied))
@@ -127,9 +125,9 @@ class CrossValidatorTests(SparkSessionTestCase):
 'foo',
 "Changing the original avgMetrics should not affect the copied model"
 )
-cvModel.subModels[0] = 'foo'
+cvModel.subModels[0][0].getInducedError = lambda: 'foo'
 self.assertNotEqual(
-cvModelCopied.subModels[0],
+cvModelCopied.subModels[0][0].getInducedError(),
 'foo',
 "Changing the original subModels should not affect the copied model"
 )
@@ -224,7 +222,6 @@ class CrossValidatorTests(SparkSessionTestCase):
 loadedCvModel = CrossValidatorModel.load(cvModelPath)
 for param in [
 lambda x: x.getNumFolds(),
-lambda x: x.getFoldCol(),
 lambda x: x.getSeed(),
 lambda x: len(x.subModels)
 ]:
@@ -780,9 +777,9 @@ class TrainValidationSplitTests(SparkSessionTestCase):
 'foo',
 "Changing the original validationMetrics should not affect the copied model"
 )
-tvsModel.subModels[0] = 'foo'
+tvsModel.subModels[0].getInducedError = lambda: 'foo'
 self.assertNotEqual(
-tvsModelCopied.subModels[0],
+tvsModelCopied.subModels[0].getInducedError(),
 'foo',
 "Changing the original subModels should not affect the copied model"
 )
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 91f34ef..6283c8b 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -480,7 +480,10 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
 extra = dict()
 bestModel = self.bestModel.copy(extra)
 avgMetrics = list(self.avgMetrics)
-subModels = [model.copy() for model in self.subModels]
+subModels = [
+[sub_model.copy() for sub_model in fold_sub_models]
+for fold_sub_models in self.subModels
+]
 return self._copyValues(CrossValidatorModel(bestModel, avgMetrics, subModels), extra=extra)
 
 @since("2.3.0")
@@ -511,7 +514,6 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
 "estimator": estimator,
 "estimatorParamMaps": epms,
 "numFolds": java_stage.ge

[spark] branch branch-3.0 updated (da60de5 -> 8aa644e)

2020-08-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
 add 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (4a67f1e -> 007acba)

2020-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests
 add 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/clustering/BisectingKMeans.scala  | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
007acba is described below

commit 007acba6e3b0e45e334bed5942692dd88c61b3ea
Author: Huaxin Gao 
AuthorDate: Mon Aug 24 08:47:01 2020 -0700

[SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

### What changes were proposed in this pull request?
backporting https://github.com/apache/spark/pull/29501

### Why are the changes needed?
avoid double caching

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Existing tests

Closes #29528 from huaxingao/kmeans_3.0.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
---
 .../spark/ml/clustering/BisectingKMeans.scala  | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)
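The caching fix follows a persist-once pattern: the caller only decides whether caching is needed (the input is not already cached), and the algorithm persists exactly once. A toy Python sketch, with a list standing in for the RDD (names and helpers are illustrative, not Spark's API):

```python
persist_log = []

def persist(rdd):
    # Stand-in for rdd.persist(StorageLevel.MEMORY_AND_DISK).
    persist_log.append("persist")

def unpersist(rdd):
    persist_log.append("unpersist")

def run_with_weight(instances, handle_persistence):
    # After the fix, only this function caches: exactly once, and only
    # when the caller's input was not already cached.
    if handle_persistence:
        persist(instances)
    result = sum(instances)          # stand-in for the training loop
    if handle_persistence:
        unpersist(instances)
    return result

def fit(instances, already_cached):
    # fit() no longer persists (the old double-caching bug); it only
    # computes handlePersistence, i.e. dataset.storageLevel == NONE
    # in the real Scala code, and passes it down.
    return run_with_weight(instances, handle_persistence=not already_cached)

fit([1, 2, 3], already_cached=False)   # caches once: persist, then unpersist
fit([1, 2, 3], already_cached=True)    # input already cached: no extra persist
```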

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index b649b1d..b3f2d22 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -28,9 +28,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans,
   BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -275,21 +274,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr =>
 transformSchema(dataset.schema, logging = true)
 
-val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-  col($(weightCol)).cast(DoubleType)
-} else {
-  lit(1.0)
-}
-
-val instances: RDD[(OldVector, Double)] = dataset
-  .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map {
-  case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight)
-}
-if (handlePersistence) {
-  instances.persist(StorageLevel.MEMORY_AND_DISK)
-}
-
 instr.logPipelineStage(this)
 instr.logDataset(dataset)
 instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -301,11 +285,18 @@ class BisectingKMeans @Since("2.0.0") (
   .setMinDivisibleClusterSize($(minDivisibleClusterSize))
   .setSeed($(seed))
   .setDistanceMeasure($(distanceMeasure))
-val parentModel = bkm.runWithWeight(instances, Some(instr))
-val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
-if (handlePersistence) {
-  instances.unpersist()
+
+val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+  col($(weightCol)).cast(DoubleType)
+} else {
+  lit(1.0)
 }
+val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w)
+  .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) }
+
+val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr))
+val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
 
 val summary = new BisectingKMeansSummary(
   model.transform(dataset),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5370318..e182f3d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -31,7 +31,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{DistanceMeasure, KMeans => MLlibKMeans, KMeansModel => MLlibKMeansModel}
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
 import org.apache.spark

[spark] branch branch-3.0 updated (4a67f1e -> 007acba)

2020-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests
 add 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/clustering/BisectingKMeans.scala  | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cf22d94 -> b05f309)

2020-07-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf22d94  [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
 add b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala | 100 -
 .../apache/spark/ml/regression/FMRegressor.scala   |  10 +--
 .../spark/mllib/optimization/GradientDescent.scala |  45 ++
 .../apache/spark/mllib/optimization/LBFGS.scala|  11 ++-
 .../ml/classification/FMClassifierSuite.scala  |  26 ++
 python/pyspark/ml/classification.py|  48 +-
 python/pyspark/ml/tests/test_training_summary.py   |  49 +-
 7 files changed, 257 insertions(+), 32 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c6109ba -> bc78859)

2020-08-03 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c6109ba  [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command
 add bc78859  [SPARK-32310][ML][PYSPARK] ML params default value parity in feature and tuning

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/feature/Imputer.scala  |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala|   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../org/apache/spark/ml/feature/Selector.scala |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala|   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala|   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala |   9 +-
 .../org/apache/spark/ml/tree/treeParams.scala  |  16 +--
 .../apache/spark/ml/tuning/CrossValidator.scala|   4 +-
 .../spark/ml/util/DefaultReadWriteTest.scala   |   3 +
 python/pyspark/ml/classification.py|  56 +++---
 python/pyspark/ml/clustering.py|  30 --
 python/pyspark/ml/feature.py   | 120 +
 python/pyspark/ml/fpm.py   |   9 +-
 python/pyspark/ml/recommendation.py|  20 ++--
 python/pyspark/ml/regression.py|  60 +++
 python/pyspark/ml/tests/test_param.py  |   8 +-
 python/pyspark/ml/tuning.py|  17 ++-
 22 files changed, 274 insertions(+), 135 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (89d9b7c -> 81b0785)

2020-07-29 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 89d9b7c  [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
 add 81b0785  [SPARK-32455][ML] LogisticRegressionModel prediction optimization

No new revisions were added by this update.

Summary of changes:
 .../ml/classification/LogisticRegression.scala | 89 --
 1 file changed, 49 insertions(+), 40 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

2020-08-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 30c3a50  [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests
30c3a50 is described below

commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a
Author: Huaxin Gao 
AuthorDate: Thu Aug 6 13:54:15 2020 -0700

[SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

### What changes were proposed in this pull request?
The test creates 10 batches of data to train the model and expects the error on the test data to improve as the model is trained. If the difference between the 2nd error and the 10th error is smaller than 2, the assertion fails:
```
FAIL: test_train_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
Test that error on test data improves as model is trained.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 466, in test_train_prediction
    eventually(condition, timeout=180.0)
  File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", line 81, in eventually
    lastValue = condition()
  File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 461, in condition
    self.assertGreater(errors[1] - errors[-1], 2)
AssertionError: 1.672640157855923 not greater than 2
```
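The `eventually(condition, timeout=180.0)` call in the traceback polls the condition until it returns true or the timeout expires. A minimal sketch of that polling pattern follows; it is an assumption-laden stand-in, not the actual `pyspark.testing.utils.eventually` implementation (the real helper has more options).

```python
import time

def eventually(condition, timeout=30.0, interval=0.01):
    # Poll `condition` until it returns True or the timeout elapses.
    # Minimal sketch only; not the actual pyspark helper.
    deadline = time.time() + timeout
    last_value = None
    while time.time() < deadline:
        last_value = condition()
        if last_value is True:
            return
        time.sleep(interval)
    raise AssertionError(
        "condition not met within %s seconds; last value: %r" % (timeout, last_value))

# A condition that fails at first and then passes, as a streaming test's
# condition would once enough batches have been processed:
state = {"calls": 0}

def condition():
    state["calls"] += 1
    return state["calls"] >= 3

eventually(condition, timeout=5.0)
assert state["calls"] >= 3
```

Polling like this tolerates slow CI machines: the assertion only has to hold at some point within the timeout, not on the first check.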
I saw this quite a few times on Jenkins but was not able to reproduce it locally. These are the ten errors I got:
```
4.517395047937127
4.894265404350079
3.0392090466559876
1.8786361640757654
0.8973106042078115
0.3715780507684368
0.20815690742907672
0.17333033743125845
0.15686783249863873
0.12584413600569616
```
I am thinking of having 15 batches of data instead of 10, so the model can be trained for a longer time. Hopefully the difference between the 2nd error and the 15th error will always be larger than 2 on Jenkins. These are the 15 errors I got locally:
```
4.517395047937127
4.894265404350079
3.0392090466559876
1.8786361640757658
0.8973106042078115
0.3715780507684368
0.20815690742907672
0.17333033743125845
0.15686783249863873
0.12584413600569616
0.11883853835108477
0.09400261862100823
0.08887491447353497
0.05984929624986607
0.07583948141520978
```
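The condition under test is `errors[1] - errors[-1] > 2`. Checking it against the two locally observed error sequences quoted above shows why extra batches help: more training shrinks the final error and widens the margin. (The lists below simply transcribe the numbers quoted above.)

```python
# The test asserts errors[1] - errors[-1] > 2. Both locally observed runs
# quoted above clear that margin comfortably; the flaky Jenkins run did not
# (its margin was 1.672640157855923).
errors_10 = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757654, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616,
]
# The 15-batch run repeats the first ten values (with a last-digit difference
# in the 4th) and appends five more:
errors_15 = errors_10[:3] + [1.8786361640757658] + errors_10[4:] + [
    0.11883853835108477, 0.09400261862100823, 0.08887491447353497,
    0.05984929624986607, 0.07583948141520978,
]

margin_10 = errors_10[1] - errors_10[-1]
margin_15 = errors_15[1] - errors_15[-1]
assert margin_10 > 2 and margin_15 > 2
# Five extra batches reduce the final error, widening the asserted margin.
assert margin_15 > margin_10
```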

### Why are the changes needed?
Fix flaky test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested

    Closes #29380 from huaxingao/flaky_test.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 75c2c53e931187912a92e0b52dae0f772fa970e3)
Signed-off-by: Huaxin Gao 
---
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py b/python/pyspark/mllib/tests/test_streaming_algorithms.py
index 2f35e07..5818a7c 100644
--- a/python/pyspark/mllib/tests/test_streaming_algorithms.py
+++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py
@@ -434,9 +434,9 @@ class StreamingLinearRegressionWithTests(MLLibStreamingTestCase):
         slr = StreamingLinearRegressionWithSGD(stepSize=0.2, numIterations=25)
         slr.setInitialWeights([0.0])
 
-        # Create ten batches with 100 sample points in each.
+        # Create fifteen batches with 100 sample points in each.
         batches = []
-        for i in range(10):
+        for i in range(15):
             batch = LinearDataGenerator.generateLinearInput(
                 0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1)
             batches.append(self.sc.parallelize(batch))


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6664e28 -> 75c2c53)

2020-08-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6664e28  [SPARK-32546][SQL][FOLLOWUP] Add `.toSeq` to `tableNames` in `HiveClientImpl.listTablesByType`
 add 75c2c53  [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

2020-08-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 30c3a50  [SPARK-32506][TESTS] Flaky test: 
StreamingLinearRegressionWithTests
30c3a50 is described below

commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a
Author: Huaxin Gao 
AuthorDate: Thu Aug 6 13:54:15 2020 -0700

[SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

### What changes were proposed in this pull request?
The test creates 10 batches of data  to train the model and expects to see 
error on test data improves as model is trained. If the difference between the 
2nd error and the 10th error is smaller than 2, the assertion fails:
```
FAIL: test_train_prediction 
(pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
Test that error on test data improves as model is trained.
--
Traceback (most recent call last):
  File 
"/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 466, in test_train_prediction
eventually(condition, timeout=180.0)
  File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", 
line 81, in eventually
[spark] branch branch-3.0 updated: [SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

2020-08-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 30c3a50  [SPARK-32506][TESTS] Flaky test: 
StreamingLinearRegressionWithTests
30c3a50 is described below

commit 30c3a502667bfa1feaf2230b4fc4cc2d36d9b85a
Author: Huaxin Gao 
AuthorDate: Thu Aug 6 13:54:15 2020 -0700

[SPARK-32506][TESTS] Flaky test: StreamingLinearRegressionWithTests

### What changes were proposed in this pull request?
The test creates 10 batches of data to train the model and expects the
error on the test data to improve as the model is trained. If the difference between the
2nd error and the 10th error is smaller than 2, the assertion fails:
```
FAIL: test_train_prediction 
(pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
Test that error on test data improves as model is trained.
--
Traceback (most recent call last):
  File 
"/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 466, in test_train_prediction
eventually(condition, timeout=180.0)
  File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", 
line 81, in eventually
lastValue = condition()
  File 
"/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py",
 line 461, in condition
self.assertGreater(errors[1] - errors[-1], 2)
AssertionError: 1.672640157855923 not greater than 2
```
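The `eventually` helper in the traceback above retries the test condition until it returns true or a timeout expires. A generic sketch of that retry pattern (simplified, with assumed parameter names; not the actual `pyspark.testing.utils` code):

```python
import time

def eventually(condition, timeout=180.0, interval=0.01):
    """Re-evaluate `condition` until it returns True or the timeout expires.

    Simplified sketch of the retry pattern used by the test harness above;
    the real pyspark.testing.utils helper differs in details.
    """
    deadline = time.time() + timeout
    last_value = None
    while time.time() < deadline:
        last_value = condition()
        if last_value is True:
            return True
        time.sleep(interval)
    raise AssertionError("Test failed due to timeout, last value: %r" % last_value)

# Usage: a condition that only becomes true after a few polls.
state = {"calls": 0}

def condition():
    state["calls"] += 1
    return state["calls"] >= 3

print(eventually(condition, timeout=5.0))  # True
```

This is why a flaky condition can fail even after many retries: every poll within the timeout window re-evaluates the same assertion against the trained model.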
I saw this quite a few times on Jenkins but was not able to reproduce it 
on my local machine. These are the ten errors I got:
```
4.517395047937127
4.894265404350079
3.0392090466559876
1.8786361640757654
0.8973106042078115
0.3715780507684368
0.20815690742907672
0.17333033743125845
0.15686783249863873
0.12584413600569616
```
I am thinking of having 15 batches of data instead of 10, so the model can 
be trained for a longer time. Hopefully the difference between the 2nd error and the 
15th error will always be larger than 2 on Jenkins. These are the 15 errors I got on my local machine:
```
4.517395047937127
4.894265404350079
3.0392090466559876
1.8786361640757658
0.8973106042078115
0.3715780507684368
0.20815690742907672
0.17333033743125845
0.15686783249863873
0.12584413600569616
0.11883853835108477
0.09400261862100823
0.08887491447353497
0.05984929624986607
0.07583948141520978
```
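For reference, the test's assertion can be re-checked against the 15 local errors listed above in plain Python (values copied from this email; no Spark required):

```python
# The 15 test errors reported above, in training order.
errors = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757658, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616, 0.11883853835108477, 0.09400261862100823,
    0.08887491447353497, 0.05984929624986607, 0.07583948141520978,
]

# Same check as self.assertGreater(errors[1] - errors[-1], 2):
# the 2nd error minus the last error must exceed the threshold.
margin = errors[1] - errors[-1]
print(margin > 2)  # True: comfortably above the threshold of 2
```

With 15 batches the margin is roughly 4.8, well clear of the threshold that the flaky 10-batch run narrowly missed (1.67).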

### Why are the changes needed?
Fix flaky test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested

    Closes #29380 from huaxingao/flaky_test.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 75c2c53e931187912a92e0b52dae0f772fa970e3)
Signed-off-by: Huaxin Gao 
---
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/mllib/tests/test_streaming_algorithms.py 
b/python/pyspark/mllib/tests/test_streaming_algorithms.py
index 2f35e07..5818a7c 100644
--- a/python/pyspark/mllib/tests/test_streaming_algorithms.py
+++ b/python/pyspark/mllib/tests/test_streaming_algorithms.py
@@ -434,9 +434,9 @@ class 
StreamingLinearRegressionWithTests(MLLibStreamingTestCase):
 slr = StreamingLinearRegressionWithSGD(stepSize=0.2, numIterations=25)
 slr.setInitialWeights([0.0])
 
-# Create ten batches with 100 sample points in each.
+# Create fifteen batches with 100 sample points in each.
 batches = []
-for i in range(10):
+for i in range(15):
 batch = LinearDataGenerator.generateLinearInput(
 0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1)
 batches.append(self.sc.parallelize(batch))
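The diff above relies on Spark's `LinearDataGenerator`. As a rough, self-contained illustration of what each batch contains, here is a pure-Python analogue; the function name and sampling details are simplified assumptions, not MLlib's actual implementation:

```python
import random

def generate_linear_input(intercept, weights, x_mean, x_variance,
                          n_points, seed, eps):
    # Rough analogue of LinearDataGenerator.generateLinearInput (assumed
    # semantics): features drawn around x_mean with the given variance,
    # labels are a noisy linear combination of the features.
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        x = [m + rng.gauss(0, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        y = intercept + sum(w * xi for w, xi in zip(weights, x)) + rng.gauss(0, eps)
        data.append((y, x))
    return data

# Mirror the test's loop: fifteen batches of 100 points, distinct seeds.
batches = [generate_linear_input(0.0, [10.0], [0.0], [1.0 / 3.0], 100, 42 + i, 0.1)
           for i in range(15)]
print(len(batches), len(batches[0]))  # 15 100
```

Each extra batch gives the streaming regression one more training step, which is why growing the count from 10 to 15 widens the error margin the assertion depends on.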


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6664e28 -> 75c2c53)

2020-08-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6664e28  [SPARK-32546][SQL][FOLLOWUP] Add `.toSeq` to `tableNames` in 
`HiveClientImpl.listTablesByType`
 add 75c2c53  [SPARK-32506][TESTS] Flaky test: 
StreamingLinearRegressionWithTests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (75d3428 -> 8d5c094)

2020-07-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 75d3428  [SPARK-32209][SQL] Re-use GetTimestamp in ParseToDate
 add 8d5c094  [SPARK-32164][ML] GeneralizedLinearRegressionSummary 
optimization

No new revisions were added by this update.

Summary of changes:
 .../regression/GeneralizedLinearRegression.scala   | 50 --
 .../spark/ml/regression/LinearRegression.scala |  2 +-
 .../spark/mllib/evaluation/RegressionMetrics.scala |  2 +
 3 files changed, 40 insertions(+), 14 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cf22d94 -> b05f309)

2020-07-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf22d94  [SPARK-32036] Replace references to blacklist/whitelist 
language with more appropriate terminology, excluding the blacklisting feature
 add b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to 
FMClassificationModel

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala | 100 -
 .../apache/spark/ml/regression/FMRegressor.scala   |  10 +--
 .../spark/mllib/optimization/GradientDescent.scala |  45 ++
 .../apache/spark/mllib/optimization/LBFGS.scala|  11 ++-
 .../ml/classification/FMClassifierSuite.scala  |  26 ++
 python/pyspark/ml/classification.py|  48 +-
 python/pyspark/ml/tests/test_training_summary.py   |  49 +-
 7 files changed, 257 insertions(+), 32 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Add Huaxin Gao to committers.md

2020-07-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 18d7e21  Add Huaxin Gao to committers.md
18d7e21 is described below

commit 18d7e2103f9713adc09d69b65ebd4a48107c88f0
Author: Huaxin Gao 
AuthorDate: Thu Jul 2 19:38:42 2020 -0700

Add Huaxin Gao to committers.md

Author: Huaxin Gao 

Closes #278 from huaxingao/asf-site.
---
 committers.md| 1 +
 site/committers.html | 4 
 2 files changed, 5 insertions(+)

diff --git a/committers.md b/committers.md
index 42b89d4..77e768d 100644
--- a/committers.md
+++ b/committers.md
@@ -26,6 +26,7 @@ navigation:
 |Erik Erlandson|Red Hat|
 |Robert Evans|NVIDIA|
 |Wenchen Fan|Databricks|
+|Huaxin Gao|IBM|
 |Joseph Gonzalez|UC Berkeley|
 |Thomas Graves|NVIDIA|
 |Stephen Haberman|LinkedIn|
diff --git a/site/committers.html b/site/committers.html
index 5299961..66de9a1 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -275,6 +275,10 @@
   Databricks
 
 
+  Huaxin Gao
+  IBM
+
+
   Joseph Gonzalez
   UC Berkeley
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

2020-07-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new fc2660c  [SPARK-32171][SQL][DOCS] Change file locations for use db and 
refresh table
fc2660c is described below

commit fc2660c302b0c83a9a8a5bec3cc7ae28f8fecdd6
Author: Huaxin Gao 
AuthorDate: Sat Jul 4 19:01:07 2020 -0700

[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

### What changes were proposed in this pull request?

docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md
docs/sql-ref-syntax-aux-refresh-table.md -> 
docs/sql-ref-syntax-aux-cache-refresh-table.md

### Why are the changes needed?
usedb belongs to DDL, so its file location should be consistent with the 
locations of the other DDL command files.
A similar reason applies to refresh table.

### Does this PR introduce _any_ user-facing change?
before change, when clicking USE DATABASE, the side bar menu shows select 
commands
https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png

after change, when clicking USE DATABASE, the side bar menu shows DDL 
commands
https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png

before change, when clicking refresh table, the side bar menu shows 
Auxiliary statements
https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png

after change, when clicking refresh table, the side bar menu shows Cache 
statements
https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png

### How was this patch tested?
Manually build and check

Closes #28995 from huaxingao/docs_fix.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5)
Signed-off-by: Huaxin Gao 
---
 docs/_data/menu-sql.yaml  | 4 ++--
 docs/sql-ref-syntax-aux-cache-cache-table.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-clear-cache.md  | 2 +-
 ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
 docs/sql-ref-syntax-aux-cache-refresh.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +-
 docs/sql-ref-syntax-aux-cache.md  | 2 +-
 ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
 docs/sql-ref-syntax-ddl.md| 2 +-
 docs/sql-ref-syntax.md| 4 ++--
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 219e680..eea657e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -139,7 +139,7 @@
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
 - text: USE DATABASE
-  url: sql-ref-syntax-qry-select-usedb.html
+  url: sql-ref-syntax-ddl-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -207,7 +207,7 @@
 - text: CLEAR CACHE
   url: sql-ref-syntax-aux-cache-clear-cache.html
 - text: REFRESH TABLE
-  url: sql-ref-syntax-aux-refresh-table.html
+  url: sql-ref-syntax-aux-cache-refresh-table.html
 - text: REFRESH
   url: sql-ref-syntax-aux-cache-refresh.html
 - text: DESCRIBE
diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md 
b/docs/sql-ref-syntax-aux-cache-cache-table.md
index 193e209..fdef3d6 100644
--- a/docs/sql-ref-syntax-aux-cache-cache-table.md
+++ b/docs/sql-ref-syntax-aux-cache-cache-table.md
@@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') 
SELECT * FROM testDat
 
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)
diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md 
b/docs/sql-ref-syntax-aux-cache-clear-cache.md
index ee33e6a..a27cd83 100644
--- a/docs/sql-ref-syntax-aux-cache-clear-cache.md
+++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md
@@ -39,5 +39,5 @@ CLEAR CACHE;
 
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-synta

[spark] branch master updated (42f01e3 -> 492d5d1)

2020-07-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 42f01e3  [SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in 
JsonBenchmark
 add 492d5d1  [SPARK-32171][SQL][DOCS] Change file locations for use db and 
refresh table

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml  | 4 ++--
 docs/sql-ref-syntax-aux-cache-cache-table.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-clear-cache.md  | 2 +-
 ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
 docs/sql-ref-syntax-aux-cache-refresh.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +-
 docs/sql-ref-syntax-aux-cache.md  | 2 +-
 ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
 docs/sql-ref-syntax-ddl.md| 2 +-
 docs/sql-ref-syntax.md| 4 ++--
 10 files changed, 10 insertions(+), 10 deletions(-)
 rename docs/{sql-ref-syntax-aux-refresh-table.md => 
sql-ref-syntax-aux-cache-refresh-table.md} (100%)
 rename docs/{sql-ref-syntax-qry-select-usedb.md => 
sql-ref-syntax-ddl-usedb.md} (100%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

2020-07-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 492d5d1  [SPARK-32171][SQL][DOCS] Change file locations for use db and 
refresh table
492d5d1 is described below

commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5
Author: Huaxin Gao 
AuthorDate: Sat Jul 4 19:01:07 2020 -0700

[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

### What changes were proposed in this pull request?

docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md
docs/sql-ref-syntax-aux-refresh-table.md -> 
docs/sql-ref-syntax-aux-cache-refresh-table.md

### Why are the changes needed?
usedb belongs to DDL, so its file location should be consistent with the 
locations of the other DDL command files.
A similar reason applies to refresh table.

### Does this PR introduce _any_ user-facing change?
before change, when clicking USE DATABASE, the side bar menu shows select 
commands
https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png

after change, when clicking USE DATABASE, the side bar menu shows DDL 
commands
https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png

before change, when clicking refresh table, the side bar menu shows 
Auxiliary statements
https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png

after change, when clicking refresh table, the side bar menu shows Cache 
statements
https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png

### How was this patch tested?
Manually build and check

Closes #28995 from huaxingao/docs_fix.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
---
 docs/_data/menu-sql.yaml  | 4 ++--
 docs/sql-ref-syntax-aux-cache-cache-table.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-clear-cache.md  | 2 +-
 ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
 docs/sql-ref-syntax-aux-cache-refresh.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +-
 docs/sql-ref-syntax-aux-cache.md  | 2 +-
 ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
 docs/sql-ref-syntax-ddl.md| 2 +-
 docs/sql-ref-syntax.md| 4 ++--
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 219e680..eea657e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -139,7 +139,7 @@
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
 - text: USE DATABASE
-  url: sql-ref-syntax-qry-select-usedb.html
+  url: sql-ref-syntax-ddl-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -207,7 +207,7 @@
 - text: CLEAR CACHE
   url: sql-ref-syntax-aux-cache-clear-cache.html
 - text: REFRESH TABLE
-  url: sql-ref-syntax-aux-refresh-table.html
+  url: sql-ref-syntax-aux-cache-refresh-table.html
 - text: REFRESH
   url: sql-ref-syntax-aux-cache-refresh.html
 - text: DESCRIBE
diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md 
b/docs/sql-ref-syntax-aux-cache-cache-table.md
index 193e209..fdef3d6 100644
--- a/docs/sql-ref-syntax-aux-cache-cache-table.md
+++ b/docs/sql-ref-syntax-aux-cache-cache-table.md
@@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') 
SELECT * FROM testDat
 
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)
diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md 
b/docs/sql-ref-syntax-aux-cache-clear-cache.md
index ee33e6a..a27cd83 100644
--- a/docs/sql-ref-syntax-aux-cache-clear-cache.md
+++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md
@@ -39,5 +39,5 @@ CLEAR CACHE;
 
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-synta

[spark] branch branch-3.0 updated: [SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

2020-07-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new fc2660c  [SPARK-32171][SQL][DOCS] Change file locations for use db and 
refresh table
fc2660c is described below

commit fc2660c302b0c83a9a8a5bec3cc7ae28f8fecdd6
Author: Huaxin Gao 
AuthorDate: Sat Jul 4 19:01:07 2020 -0700

[SPARK-32171][SQL][DOCS] Change file locations for use db and refresh table

### What changes were proposed in this pull request?

docs/sql-ref-syntax-qry-select-usedb.md -> docs/sql-ref-syntax-ddl-usedb.md
docs/sql-ref-syntax-aux-refresh-table.md -> 
docs/sql-ref-syntax-aux-cache-refresh-table.md

### Why are the changes needed?
usedb belongs to DDL. Its location should be consistent with other DDL 
commands file locations
similar reason for refresh table

### Does this PR introduce _any_ user-facing change?
before change, when clicking USE DATABASE, the side bar menu shows select 
commands
https://user-images.githubusercontent.com/13592258/86516696-b45f8a80-bdd7-11ea-8dba-3a5cca22aad3.png

after change, when clicking USE DATABASE, the side bar menu shows DDL 
commands
https://user-images.githubusercontent.com/13592258/86516703-bf1a1f80-bdd7-11ea-8a90-ae7eaaafd44c.png

before change, when clicking refresh table, the side bar menu shows 
Auxiliary statements
https://user-images.githubusercontent.com/13592258/86516877-3d2af600-bdd9-11ea-9568-0a6f156f57da.png

after change, when clicking refresh table, the side bar menu shows Cache 
statements
https://user-images.githubusercontent.com/13592258/86516937-b4f92080-bdd9-11ea-8ad1-5f5a7f58d76b.png

### How was this patch tested?
Manually build and check

Closes #28995 from huaxingao/docs_fix.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 492d5d174a435c624bd87af9ee3621f4f1c8d1c5)
Signed-off-by: Huaxin Gao 
---
 docs/_data/menu-sql.yaml  | 4 ++--
 docs/sql-ref-syntax-aux-cache-cache-table.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-clear-cache.md  | 2 +-
 ...aux-refresh-table.md => sql-ref-syntax-aux-cache-refresh-table.md} | 0
 docs/sql-ref-syntax-aux-cache-refresh.md  | 2 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md| 2 +-
 docs/sql-ref-syntax-aux-cache.md  | 2 +-
 ...sql-ref-syntax-qry-select-usedb.md => sql-ref-syntax-ddl-usedb.md} | 0
 docs/sql-ref-syntax-ddl.md| 2 +-
 docs/sql-ref-syntax.md| 4 ++--
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 219e680..eea657e 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -139,7 +139,7 @@
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
 - text: USE DATABASE
-  url: sql-ref-syntax-qry-select-usedb.html
+  url: sql-ref-syntax-ddl-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -207,7 +207,7 @@
 - text: CLEAR CACHE
   url: sql-ref-syntax-aux-cache-clear-cache.html
 - text: REFRESH TABLE
-  url: sql-ref-syntax-aux-refresh-table.html
+  url: sql-ref-syntax-aux-cache-refresh-table.html
 - text: REFRESH
   url: sql-ref-syntax-aux-cache-refresh.html
 - text: DESCRIBE
diff --git a/docs/sql-ref-syntax-aux-cache-cache-table.md 
b/docs/sql-ref-syntax-aux-cache-cache-table.md
index 193e209..fdef3d6 100644
--- a/docs/sql-ref-syntax-aux-cache-cache-table.md
+++ b/docs/sql-ref-syntax-aux-cache-cache-table.md
@@ -78,5 +78,5 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') 
SELECT * FROM testData
 
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)
diff --git a/docs/sql-ref-syntax-aux-cache-clear-cache.md 
b/docs/sql-ref-syntax-aux-cache-clear-cache.md
index ee33e6a..a27cd83 100644
--- a/docs/sql-ref-syntax-aux-cache-clear-cache.md
+++ b/docs/sql-ref-syntax-aux-cache-clear-cache.md
@@ -39,5 +39,5 @@ CLEAR CACHE;
 
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
-* [REFRESH TABLE](sql-ref-syntax-aux-refresh-table.html)
+* [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)


[spark] branch branch-3.0 updated (f50432f -> 8a52bda)

2020-07-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f50432f  [SPARK-32363][PYTHON][BUILD][3.0] Fix flakiness in pip 
package testing in Jenkins
 add 8a52bda  [SPARK-32310][ML][PYSPARK][3.0] ML params default value parity

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala |  10 --
 .../apache/spark/ml/classification/LinearSVC.scala |  11 +--
 .../ml/classification/LogisticRegression.scala |  13 +--
 .../spark/ml/classification/NaiveBayes.scala   |   4 +-
 .../spark/ml/clustering/BisectingKMeans.scala  |   7 +-
 .../spark/ml/clustering/GaussianMixture.scala  |   7 +-
 .../org/apache/spark/ml/clustering/KMeans.scala|  11 +--
 .../scala/org/apache/spark/ml/clustering/LDA.scala |  11 +--
 .../ml/clustering/PowerIterationClustering.scala   |   7 +-
 .../evaluation/BinaryClassificationEvaluator.scala |   4 +-
 .../MulticlassClassificationEvaluator.scala|   8 +-
 .../MultilabelClassificationEvaluator.scala|   6 +-
 .../spark/ml/evaluation/RankingEvaluator.scala |   6 +-
 .../spark/ml/evaluation/RegressionEvaluator.scala  |   4 +-
 .../apache/spark/ml/feature/ChiSqSelector.scala|   9 +-
 .../org/apache/spark/ml/feature/Imputer.scala  |   4 +-
 .../org/apache/spark/ml/feature/MinMaxScaler.scala |   4 +-
 .../apache/spark/ml/feature/OneHotEncoder.scala|   5 +-
 .../spark/ml/feature/QuantileDiscretizer.scala |   4 +-
 .../org/apache/spark/ml/feature/RFormula.scala |   6 +-
 .../org/apache/spark/ml/feature/RobustScaler.scala |   8 +-
 .../apache/spark/ml/feature/StringIndexer.scala|   6 +-
 .../apache/spark/ml/feature/VectorIndexer.scala|   6 +-
 .../org/apache/spark/ml/feature/VectorSlicer.scala |   6 +-
 .../org/apache/spark/ml/feature/Word2Vec.scala |   9 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |   5 +-
 .../ml/regression/AFTSurvivalRegression.scala  |  10 +-
 .../spark/ml/regression/LinearRegression.scala |  14 +--
 .../org/apache/spark/ml/tree/treeParams.scala  |  16 +--
 .../spark/ml/util/DefaultReadWriteTest.scala   |   3 +
 python/pyspark/ml/classification.py|  86 +++-
 python/pyspark/ml/clustering.py|  43 ++--
 python/pyspark/ml/feature.py   | 110 ++---
 python/pyspark/ml/fpm.py   |  12 ++-
 python/pyspark/ml/recommendation.py|  20 ++--
 python/pyspark/ml/regression.py|  88 -
 python/pyspark/ml/tests/test_param.py  |   7 +-
 python/pyspark/ml/tuning.py|  16 ++-
 38 files changed, 368 insertions(+), 238 deletions(-)
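The SPARK-32310 parity work above aligns the Python-side Param defaults with the Scala estimators. A minimal sketch in plain Python (illustrative only, not the actual pyspark `Params` internals; the class and parameter names are hypothetical) of the underlying idea: an estimator declares defaults separately from explicitly set values, so the declared defaults can be kept in lockstep with the Scala side:

```python
# Sketch of a Params-like container: defaults declared by the estimator
# live in one map, user-set values in another, and lookup prefers the
# explicit value. This mirrors the pattern whose defaults SPARK-32310
# synchronizes between Python and Scala; names here are illustrative.
class Params:
    def __init__(self):
        self._default = {}   # defaults declared by the estimator
        self._paramMap = {}  # values the user set explicitly

    def _setDefault(self, **kwargs):
        self._default.update(kwargs)
        return self

    def set(self, name, value):
        self._paramMap[name] = value
        return self

    def getOrDefault(self, name):
        # an explicitly set value wins; otherwise fall back to the default
        if name in self._paramMap:
            return self._paramMap[name]
        return self._default[name]


class LogisticRegressionLike(Params):
    def __init__(self):
        super().__init__()
        # hypothetical defaults, chosen to match the Scala-side estimator
        self._setDefault(maxIter=100, regParam=0.0)


lr = LogisticRegressionLike()
print(lr.getOrDefault("maxIter"))                     # 100 (default)
print(lr.set("maxIter", 10).getOrDefault("maxIter"))  # 10 (explicit)
```

The parity fix matters because a Python default that silently differs from the Scala one changes model behavior depending on which API trained it.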





[spark] branch master updated (c1f160e -> d5c672a)

2020-07-16 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c1f160e  [SPARK-30648][SQL] Support filters pushdown in JSON datasource
 add d5c672a  [SPARK-32315][ML] Provide an explanation error message when 
calling require

No new revisions were added by this update.

Summary of changes:
 mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)
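The SPARK-32315 change above attaches an explanatory message to `require` calls in `MLUtils`, so a failed precondition says *why* it failed rather than raising a bare requirement error. A hedged Python sketch of the same pattern (the function names and message wording here are illustrative, not the actual MLUtils code):

```python
def require(condition: bool, message: str) -> None:
    """Scala-style require: fail fast with an explanatory message."""
    if not condition:
        raise ValueError(f"requirement failed: {message}")


def fast_squared_distance(v1, v2):
    # In the spirit of the fix: report the mismatched sizes in the
    # error message instead of failing without an explanation.
    require(len(v1) == len(v2),
            f"vector sizes do not match: {len(v1)} != {len(v2)}")
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


print(fast_squared_distance([1.0, 2.0], [3.0, 4.0]))  # 8.0
```

With the message attached, a user mixing vectors of different dimensions sees the offending sizes directly in the stack trace.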


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d5c672a -> 383f5e9)

2020-07-16 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d5c672a  [SPARK-32315][ML] Provide an explanation error message when 
calling require
 add 383f5e9  [SPARK-32310][ML][PYSPARK] ML params default value parity in 
classification, regression, clustering and fpm

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala | 10 
 .../apache/spark/ml/classification/LinearSVC.scala | 12 ++---
 .../ml/classification/LogisticRegression.scala | 14 ++
 .../spark/ml/classification/NaiveBayes.scala   |  4 +-
 .../spark/ml/clustering/BisectingKMeans.scala  |  7 +--
 .../spark/ml/clustering/GaussianMixture.scala  |  8 +--
 .../org/apache/spark/ml/clustering/KMeans.scala| 11 ++--
 .../scala/org/apache/spark/ml/clustering/LDA.scala | 11 ++--
 .../ml/clustering/PowerIterationClustering.scala   |  7 +--
 .../evaluation/BinaryClassificationEvaluator.scala |  4 +-
 .../MulticlassClassificationEvaluator.scala|  8 +--
 .../MultilabelClassificationEvaluator.scala|  6 +--
 .../spark/ml/evaluation/RankingEvaluator.scala |  6 +--
 .../spark/ml/evaluation/RegressionEvaluator.scala  |  4 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |  5 +-
 .../ml/regression/AFTSurvivalRegression.scala  | 11 ++--
 .../spark/ml/regression/LinearRegression.scala | 15 ++
 python/pyspark/ml/classification.py| 58 ++
 python/pyspark/ml/clustering.py| 33 
 python/pyspark/ml/fpm.py   |  7 ++-
 python/pyspark/ml/regression.py| 57 +
 21 files changed, 141 insertions(+), 157 deletions(-)





[spark] branch master updated (d5c672a -> 383f5e9)

2020-07-16 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d5c672a  [SPARK-32315][ML] Provide an explanation error message when 
calling require
 add 383f5e9  [SPARK-32310][ML][PYSPARK] ML params default value parity in 
classification, regression, clustering and fpm

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala | 10 
 .../apache/spark/ml/classification/LinearSVC.scala | 12 ++---
 .../ml/classification/LogisticRegression.scala | 14 ++
 .../spark/ml/classification/NaiveBayes.scala   |  4 +-
 .../spark/ml/clustering/BisectingKMeans.scala  |  7 +--
 .../spark/ml/clustering/GaussianMixture.scala  |  8 +--
 .../org/apache/spark/ml/clustering/KMeans.scala| 11 ++--
 .../scala/org/apache/spark/ml/clustering/LDA.scala | 11 ++--
 .../ml/clustering/PowerIterationClustering.scala   |  7 +--
 .../evaluation/BinaryClassificationEvaluator.scala |  4 +-
 .../MulticlassClassificationEvaluator.scala|  8 +--
 .../MultilabelClassificationEvaluator.scala|  6 +--
 .../spark/ml/evaluation/RankingEvaluator.scala |  6 +--
 .../spark/ml/evaluation/RegressionEvaluator.scala  |  4 +-
 .../scala/org/apache/spark/ml/fpm/FPGrowth.scala   |  5 +-
 .../ml/regression/AFTSurvivalRegression.scala  | 11 ++--
 .../spark/ml/regression/LinearRegression.scala | 15 ++
 python/pyspark/ml/classification.py| 58 ++
 python/pyspark/ml/clustering.py| 33 
 python/pyspark/ml/fpm.py   |  7 ++-
 python/pyspark/ml/regression.py| 57 +
 21 files changed, 141 insertions(+), 157 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
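The SPARK-32310 change above aligns the default Param values declared on the PySpark side with the Scala side. As a rough, hypothetical sketch of the mechanism involved (simplified stand-ins, not the real `pyspark.ml.param` classes; the class and parameter names below are illustrative only):

```python
class Params:
    """Toy stand-in for an ML params holder (hypothetical, simplified)."""

    def __init__(self):
        self._defaultParamMap = {}  # defaults declared by the estimator
        self._paramMap = {}         # values explicitly set by the user

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)

    def set(self, name, value):
        self._paramMap[name] = value

    def getOrDefault(self, name):
        # An explicitly set value always wins over the declared default.
        if name in self._paramMap:
            return self._paramMap[name]
        return self._defaultParamMap[name]


class LinearSVC(Params):
    def __init__(self):
        super().__init__()
        # Defaults mirroring the documented Scala-side values.
        self._setDefault(maxIter=100, regParam=0.0)
```

Under this sketch, `LinearSVC().getOrDefault("maxIter")` returns the declared default until the user sets a value; "default value parity" means both language sides declare the same numbers.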



[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu 
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu 
Co-authored-by: Wu, Xiaochang 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
Signed-off-by: Huaxin Gao 
---
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing.
-If native libraries[^1] are not available at runtime, you will see a warning message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.
 
-Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) or [Intel oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS
+```
 
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
 
diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation

[spark] branch master updated (c28da67 -> 44c868b)

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu 
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide 
(https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete 
and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu 
Co-authored-by: Wu, Xiaochang 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
Signed-off-by: Huaxin Gao 
---
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the 
[DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing.
-If native libraries[^1] are not available at runtime, you will see a warning 
message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and 
[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing[^1]. Those packages may call native acceleration libraries such as 
[Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 or [OpenBLAS](http://www.openblas.net) if they are available as system 
libraries or in runtime library paths. 
 
-Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as 
a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel 
MKL](https://software.intel.com/en-us/mkl), 
[OpenBLAS](http://www.openblas.net), can use multiple threads in a single 
operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations 
may actually improve performance (see 
[SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is 
usually optimal to match this to the number of cores each Spark task is 
configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure 
the number of threads these BLAS implementations use: [Intel 
MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications)
 or [Intel 
oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading)
 and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). 
Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be 
distributed with Spark. See [MLlib Linear Algebra Acceleration 
Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra 
processing. If accelerated native libraries are not enabled, you will see a 
warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
 
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
 
diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation

[spark] branch master updated (c28da67 -> 44c868b)

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu 
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide 
(https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete 
and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu 
Co-authored-by: Wu, Xiaochang 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
Signed-off-by: Huaxin Gao 
---
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the 
[DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing.
-If native libraries[^1] are not available at runtime, you will see a warning 
message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and 
[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing[^1]. Those packages may call native acceleration libraries such as 
[Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 or [OpenBLAS](http://www.openblas.net) if they are available as system 
libraries or in runtime library paths. 
 
-Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as 
a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel 
MKL](https://software.intel.com/en-us/mkl), 
[OpenBLAS](http://www.openblas.net), can use multiple threads in a single 
operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations 
may actually improve performance (see 
[SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is 
usually optimal to match this to the number of cores each Spark task is 
configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure 
the number of threads these BLAS implementations use: [Intel 
MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications)
 or [Intel 
oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading)
 and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). 
Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be 
distributed with Spark. See [MLlib Linear Algebra Acceleration 
Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra 
processing. If accelerated native libraries are not enabled, you will see a 
warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
 
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
 
diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation

[spark] branch master updated (c28da67 -> 44c868b)

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu 
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide 
(https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete 
and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu 
Co-authored-by: Wu, Xiaochang 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
Signed-off-by: Huaxin Gao 
---
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the 
[DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing.
-If native libraries[^1] are not available at runtime, you will see a warning 
message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and 
[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing[^1]. Those packages may call native acceleration libraries such as 
[Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 or [OpenBLAS](http://www.openblas.net) if they are available as system 
libraries or in runtime library paths. 
 
-Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as 
a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel 
MKL](https://software.intel.com/en-us/mkl), 
[OpenBLAS](http://www.openblas.net), can use multiple threads in a single 
operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations 
may actually improve performance (see 
[SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is 
usually optimal to match this to the number of cores each Spark task is 
configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure 
the number of threads these BLAS implementations use: [Intel 
MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications)
 or [Intel 
oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading)
 and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). 
Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be 
distributed with Spark. See [MLlib Linear Algebra Acceleration 
Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra 
processing. If accelerated native libraries are not enabled, you will see a 
warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
 
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
 
diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation

[spark] branch master updated (c28da67 -> 44c868b)

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cfb718  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs
8cfb718 is described below

commit 8cfb7183865c5358a547ec892f10d4f1350300ff
Author: Xiaochang Wu 
AuthorDate: Tue Jul 28 08:36:11 2020 -0700

[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs

### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide 
(https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete 
and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu 
Co-authored-by: Wu, Xiaochang 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 44c868b73a7cb293ec81927c28991677bf33ea90)
Signed-off-by: Huaxin Gao 
---
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddce98b..1b4a3e4 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the 
[DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing.
-If native libraries[^1] are not available at runtime, you will see a warning 
message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and 
[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing[^1]. Those packages may call native acceleration libraries such as 
[Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 or [OpenBLAS](http://www.openblas.net) if they are available as system 
libraries or in runtime library paths. 
 
-Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as 
a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel 
MKL](https://software.intel.com/en-us/mkl), 
[OpenBLAS](http://www.openblas.net), can use multiple threads in a single 
operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations 
may actually improve performance (see 
[SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is 
usually optimal to match this to the number of cores each Spark task is 
configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure 
the number of threads these BLAS implementations use: [Intel 
MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications)
 or [Intel 
oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading)
 and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). 
Note that if nativeBLAS is n [...]
+Due to differing OSS licenses, `netlib-java`'s native proxies can't be 
distributed with Spark. See [MLlib Linear Algebra Acceleration 
Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra 
processing. If accelerated native libraries are not enabled, you will see a 
warning message like below and a pure JVM implementation will be used instead:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
 
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
 
diff --git a/docs/ml-linalg-guide.md b/docs/ml-linalg-guide.md
new file mode 100644
index 000..7390913
--- /dev/null
+++ b/docs/ml-linalg-guide.md
@@ -0,0 +1,103 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation
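The ml-guide text above recommends pinning native BLAS libraries (Intel MKL, OpenBLAS) to a single thread per operation so they don't conflict with Spark's one-core-per-task execution model. A minimal sketch of that configuration, using the standard environment variables for each library — note these must be set before the native library loads (e.g. in `spark-env.sh` or via `spark.executorEnv.*`), so setting them in-process here only illustrates the names and values:

```python
import os

# Pin native BLAS to one thread per operation, matching Spark's default
# of one core per task. MKL_NUM_THREADS and OPENBLAS_NUM_THREADS are the
# standard knobs for Intel MKL and OpenBLAS respectively.
blas_threading = {
    "MKL_NUM_THREADS": "1",       # Intel MKL
    "OPENBLAS_NUM_THREADS": "1",  # OpenBLAS
}
os.environ.update(blas_threading)
```

In a real deployment these would typically be passed to executors with `spark.executorEnv.MKL_NUM_THREADS=1` (and likewise for OpenBLAS) rather than set inside a running process.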

[spark] branch master updated (c28da67 -> 44c868b)

2020-07-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28da67  [SPARK-32382][SQL] Override table renaming in JDBC dialects
 add 44c868b  [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration 
docs

No new revisions were added by this update.

Summary of changes:
 docs/ml-guide.md|  22 +++
 docs/ml-linalg-guide.md | 103 
 2 files changed, 109 insertions(+), 16 deletions(-)
 create mode 100644 docs/ml-linalg-guide.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c114066 -> f7542d3)

2020-07-27 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c114066  [SPARK-32443][CORE] Use POSIX-compatible `command -v` in 
testCommandAvailable
 add f7542d3  [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/classification/DecisionTreeClassifier.scala | 2 +-
 .../scala/org/apache/spark/ml/classification/FMClassifier.scala | 6 ++
 .../scala/org/apache/spark/ml/classification/GBTClassifier.scala| 2 +-
 .../org/apache/spark/ml/classification/LogisticRegression.scala | 4 ++--
 .../spark/ml/classification/MultilayerPerceptronClassifier.scala| 2 +-
 5 files changed, 7 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3165ca7 -> 122c899)

2020-11-10 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3165ca7  [SPARK-33376][SQL] Remove the option of "sharesHadoopClasses" 
in Hive IsolatedClientLoader
 add 122c899  [SPARK-33251][FOLLOWUP][PYTHON][DOCS][MINOR] Adjusts returns 
PrefixSpan.findFrequentSequentialPatterns

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/fpm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org




[spark] branch master updated: [SPARK-35678][ML][FOLLOWUP] softmax support offset and step

2021-06-17 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fdf86fd  [SPARK-35678][ML][FOLLOWUP] softmax support offset and step
fdf86fd is described below

commit fdf86fd6e795474afb78d1917369fec288d06b24
Author: Ruifeng Zheng 
AuthorDate: Thu Jun 17 22:46:36 2021 -0700

[SPARK-35678][ML][FOLLOWUP] softmax support offset and step

### What changes were proposed in this pull request?
use the newly implemented softmax function in NaiveBayes

### Why are the changes needed?
to simplify the implementation

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing testsuite

Closes #32927 from zhengruifeng/softmax__followup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Huaxin Gao 
---
 .../scala/org/apache/spark/ml/impl/Utils.scala | 44 ++--
 .../org/apache/spark/ml/ann/LossFunction.scala | 26 ++
 .../spark/ml/classification/NaiveBayes.scala   | 15 +-
 .../MultinomialLogisticBlockAggregator.scala   | 58 +++---
 4 files changed, 62 insertions(+), 81 deletions(-)

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
index 8ff5b6a..abe1d4b 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/impl/Utils.scala
@@ -99,30 +99,42 @@ private[spark] object Utils {
   /**
* Perform in-place softmax conversion.
*/
-  def softmax(values: Array[Double]): Unit = {
+  def softmax(array: Array[Double]): Unit =
+softmax(array, array.length, 0, 1, array)
+
+  /**
+   * Perform softmax conversion.
+   */
+  def softmax(
+  input: Array[Double],
+  n: Int,
+  offset: Int,
+  step: Int,
+  output: Array[Double]): Unit = {
 var maxValue = Double.MinValue
-var i = 0
-while (i < values.length) {
-  val value = values(i)
-  if (value.isPosInfinity) {
-java.util.Arrays.fill(values, 0)
-values(i) = 1.0
+var i = offset
+val end = offset + step * n
+while (i < end) {
+  val v = input(i)
+  if (v.isPosInfinity) {
+BLAS.javaBLAS.dscal(n, 0.0, output, offset, step)
+output(i) = 1.0
 return
-  } else if (value > maxValue) {
-maxValue = value
+  } else if (v > maxValue) {
+maxValue = v
   }
-  i += 1
+  i += step
 }
 
 var sum = 0.0
-i = 0
-while (i < values.length) {
-  val exp = math.exp(values(i) - maxValue)
-  values(i) = exp
+i = offset
+while (i < end) {
+  val exp = math.exp(input(i) - maxValue)
+  output(i) = exp
   sum += exp
-  i += 1
+  i += step
 }
 
-BLAS.javaBLAS.dscal(values.length, 1.0 / sum, values, 1)
+BLAS.javaBLAS.dscal(n, 1.0 / sum, output, offset, step)
   }
 }
diff --git a/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala 
b/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
index 3aea568..37e7b53 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/ann/LossFunction.scala
@@ -22,6 +22,8 @@ import java.util.Random
 import breeze.linalg.{sum => Bsum, DenseMatrix => BDM, DenseVector => BDV}
 import breeze.numerics.{log => brzlog}
 
+import org.apache.spark.ml.impl.Utils
+
 /**
  * Trait for loss function
  */
@@ -79,30 +81,10 @@ private[ann] class SoftmaxLayerModelWithCrossEntropyLoss 
extends LayerModel with
   val weights = new BDV[Double](0)
 
   override def eval(data: BDM[Double], output: BDM[Double]): Unit = {
+require(!data.isTranspose && !output.isTranspose)
 var j = 0
-// find max value to make sure later that exponent is computable
 while (j < data.cols) {
-  var i = 0
-  var max = Double.MinValue
-  while (i < data.rows) {
-if (data(i, j) > max) {
-  max = data(i, j)
-}
-i += 1
-  }
-  var sum = 0.0
-  i = 0
-  while (i < data.rows) {
-val res = math.exp(data(i, j) - max)
-output(i, j) = res
-sum += res
-i += 1
-  }
-  i = 0
-  while (i < data.rows) {
-output(i, j) /= sum
-i += 1
-  }
+  Utils.softmax(data.data, data.rows, j * data.rows, 1, output.data)
   j += 1
 }
   }
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala 
b/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
index 6b1537b..fd19ec3 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala
@@ -2

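The `Utils.scala` hunk in the commit above generalizes `softmax` to operate in place on a strided slice of a flat array — `n` elements starting at `offset`, spaced `step` apart — so the ANN loss function can apply it column-by-column to a column-major matrix, and the positive-infinity guard turns the slice into a one-hot vector. A dependency-free Python port of that logic (the `BLAS.javaBLAS.dscal` calls replaced by plain loops) sketches the behavior:

```python
import math

def softmax(values, n, offset, step, out):
    """Softmax over the strided slice values[offset : offset + step*n : step],
    written into the same positions of `out`."""
    end = offset + step * n
    max_v = -float("inf")
    # Pass 1: find the max (for numerical stability); +inf becomes one-hot.
    for i in range(offset, end, step):
        v = values[i]
        if v == float("inf"):
            for j in range(offset, end, step):
                out[j] = 0.0
            out[i] = 1.0
            return
        if v > max_v:
            max_v = v
    # Pass 2: exponentiate shifted values and accumulate their sum.
    total = 0.0
    for i in range(offset, end, step):
        e = math.exp(values[i] - max_v)
        out[i] = e
        total += e
    # Pass 3: normalize so the slice sums to 1.
    for i in range(offset, end, step):
        out[i] /= total
```

With `offset=0, step=1, n=len(values)` this reduces to the ordinary in-place softmax; with `offset=j*rows, step=1, n=rows` it covers one column of a column-major matrix, which is how the `LossFunction.scala` hunk uses it.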
[spark] branch master updated (be90897 -> a667388)

2021-06-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from be90897  [SPARK-35588][PYTHON][DOCS] Merge Binder integration and 
quickstart notebook for pandas API on Spark
 add a667388  [SPARK-35678][ML][FOLLOWUP] softmax support offset and step

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/impl/Utils.scala | 44 ++--
 .../org/apache/spark/ml/ann/LossFunction.scala | 26 ++
 .../spark/ml/classification/NaiveBayes.scala   | 15 +-
 .../MultinomialLogisticBlockAggregator.scala   | 58 +++---
 python/pyspark/ml/tests/test_algorithms.py |  2 +-
 5 files changed, 63 insertions(+), 82 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Organization update

2021-05-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new f9cf29d  Organization update
f9cf29d is described below

commit f9cf29d603ed5ce5bd6388c5824d02f95082c8b0
Author: Jungtaek Lim 
AuthorDate: Mon May 3 23:30:43 2021 -0700

Organization update

Author: Jungtaek Lim 

Closes #337 from HeartSaVioR/jungtaek-dbx.
---
 committers.md| 2 +-
 site/committers.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/committers.md b/committers.md
index 458b5ac..ad12fa9 100644
--- a/committers.md
+++ b/committers.md
@@ -53,7 +53,7 @@ navigation:
 |Davies Liu|Juicedata|
 |Cheng Lian|Databricks|
 |Yanbo Liang|Facebook|
-|Jungtaek Lim|Cloudera|
+|Jungtaek Lim|Databricks|
 |Sean McNamara|Oracle|
 |Xiangrui Meng|Databricks|
 |Mridul Muralidharan|LinkedIn|
diff --git a/site/committers.html b/site/committers.html
index 16c048b..93bab98 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -384,7 +384,7 @@
 
 
   Jungtaek Lim
-  Cloudera
+  Databricks
 
 
   Sean McNamara

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0076eba  [MINOR][SQL][DOCS] Correct the 'options' description on 
UnresolvedRelation
0076eba is described below

commit 0076eba8d066936c32790ebc4058c52e2d21a207
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. This 
comment was added in https://github.com/apache/spark/pull/29535 but not valid 
anymore because V1 also uses this `options` (and merge the options with the 
table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 8417203..0785336 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -40,7 +40,7 @@ class UnresolvedException(function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table 
scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new b5cb3b6  [MINOR][SQL][DOCS] Correct the 'options' description on 
UnresolvedRelation
b5cb3b6 is described below

commit b5cb3b682a2cecae6d826f7610a2606c48fc9643
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. This 
comment was added in https://github.com/apache/spark/pull/29535 but not valid 
anymore because V1 also uses this `options` (and merge the options with the 
table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 55eca63..ec420c4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: 
TreeType, function: Str
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table 
scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new af569d1  [MINOR][SQL][DOCS] Correct the 'options' description on 
UnresolvedRelation
af569d1 is described below

commit af569d1b0ac6b25dbd500804a395964ef7f9e60f
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. This 
comment was added in https://github.com/apache/spark/pull/29535 but not valid 
anymore because V1 also uses this `options` (and merge the options with the 
table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 9f05367..9db038d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException(function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table 
scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1f3eb73 -> c411d26)

2021-12-03 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1f3eb73  [SPARK-37510][PYTHON] Support basic operations of timedelta 
Series/Index
 add c411d26  [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/ResolveCatalogs.scala| 12 ---
 .../spark/sql/catalyst/parser/AstBuilder.scala | 11 ++-
 .../sql/catalyst/plans/logical/statements.scala| 23 +-
 .../sql/catalyst/plans/logical/v2Commands.scala| 19 ++
 .../sql/connector/catalog/CatalogV2Util.scala  |  6 +-
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 20 +--
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 19 +++---
 .../datasources/v2/DataSourceV2Strategy.scala  | 19 --
 .../datasources/v2/ReplaceTableExec.scala  | 11 ---
 9 files changed, 56 insertions(+), 84 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-37545][SQL] V2 CreateTableAsSelect command should qualify location

2021-12-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new feba5ac  [SPARK-37545][SQL] V2 CreateTableAsSelect command should 
qualify location
feba5ac is described below

commit feba5ac32f2598f6ca8a274850934106be0db64d
Author: Terry Kim 
AuthorDate: Sat Dec 4 20:47:45 2021 -0800

[SPARK-37545][SQL] V2 CreateTableAsSelect command should qualify location

### What changes were proposed in this pull request?

Currently, v2 CTAS command doesn't qualify the location:
```
spark.sql("CREATE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT 
id FROM source")
spark.sql("DESCRIBE EXTENDED testcat.t").filter("col_name = 
'Location'").show

++-+---+
|col_name|data_type|comment|
++-+---+
|Location|/tmp/foo |   |
++-+---+
```
, whereas v1 command qualifies the location as `file:/tmp/foo` which is the 
correct behavior since the default filesystem can change for different sessions.

### Why are the changes needed?

This PR proposes to store the qualified location in order to prevent the 
issue where default filesystem changes for different sessions.

### Does this PR introduce _any_ user-facing change?

Yes, now, v2 CTAS command will store qualified location:
```
++-+---+
|col_name|data_type|comment|
++-+---+
|Location|file:/tmp/foo|   |
++-+---+
```

### How was this patch tested?

Added new test

Closes #34806 from imback82/v2_ctas_qualified_loc.

Authored-by: Terry Kim 
Signed-off-by: Huaxin Gao 
---
 .../execution/datasources/v2/DataSourceV2Strategy.scala   |  6 --
 .../DataSourceV2DataFrameSessionCatalogSuite.scala|  4 ++--
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 15 +++
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
index f73b1a6..dbe4168 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
@@ -172,13 +172,15 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
 case CreateTableAsSelect(ResolvedDBObjectName(catalog, ident), parts, 
query, tableSpec,
 options, ifNotExists) =>
   val writeOptions = new CaseInsensitiveStringMap(options.asJava)
+  val tableSpecWithQualifiedLocation = tableSpec.copy(
+location = tableSpec.location.map(makeQualifiedDBObjectPath(_)))
   catalog match {
 case staging: StagingTableCatalog =>
   AtomicCreateTableAsSelectExec(staging, ident.asIdentifier, parts, 
query, planLater(query),
-tableSpec, writeOptions, ifNotExists) :: Nil
+tableSpecWithQualifiedLocation, writeOptions, ifNotExists) :: Nil
 case _ =>
   CreateTableAsSelectExec(catalog.asTableCatalog, ident.asIdentifier, 
parts, query,
-planLater(query), tableSpec, writeOptions, ifNotExists) :: Nil
+planLater(query), tableSpecWithQualifiedLocation, writeOptions, 
ifNotExists) :: Nil
   }
 
 case RefreshTable(r: ResolvedTable) =>
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
index 91ac7db..3edc4b9 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
@@ -83,10 +83,10 @@ class DataSourceV2DataFrameSessionCatalogSuite
   test("saveAsTable passes path and provider information properly") {
 val t1 = "prop_table"
 withTable(t1) {
-  spark.range(20).write.format(v2Format).option("path", 
"abc").saveAsTable(t1)
+  spark.range(20).write.format(v2Format).option("path", 
"/abc").saveAsTable(t1)
   val cat = 
spark.sessionState.catalogManager.currentCatalog.asInstanceOf[TableCatalog]
   val tableInfo = cat.loadTable(Identifier.of(Array("default"), t1))
-  assert(tableInfo.properties().get("location") === "abc")
+  assert(tableInfo.properties().get("location") === "file:/abc")
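The commit above makes v2 CTAS store a qualified location, so a bare path keeps an explicit filesystem scheme (`/tmp/foo` becomes `file:/tmp/foo`) and survives sessions whose default filesystem differs. The qualification idea — prefix a default scheme only when the path has none — can be sketched with the standard library (`qualify` and `default_scheme` are illustrative names, not Spark APIs, and only POSIX-style paths are considered):

```python
from urllib.parse import urlparse

def qualify(path: str, default_scheme: str = "file") -> str:
    """Prefix `path` with a filesystem scheme when it has none;
    leave already-qualified paths (file:, hdfs:, s3a:, ...) untouched."""
    return path if urlparse(path).scheme else f"{default_scheme}:{path}"
```

So `qualify("/tmp/foo")` yields `file:/tmp/foo` — matching the DESCRIBE output shown in the commit message — while `qualify("hdfs://nn/data")` is returned unchanged.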
