[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-11-12 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r150450319 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -146,4 +146,172 @@ class QuantileDiscretizerSuite

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-11-12 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r150450305 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -146,4 +146,172 @@ class QuantileDiscretizerSuite

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-11-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r154133181 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -129,34 +152,119 @@ final class QuantileDiscretizer @Since

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-12 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r156442207 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -107,11 +107,11 @@ private[feature] trait

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r157282154 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -146,4 +147,258 @@ class QuantileDiscretizerSuite

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-15 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 I have also verified the save/load back compatibility. Thanks a lot for your comments! @MLnick --- - To unsubscribe, e

[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-05 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19658 Thanks a lot!! @gatorsmile @srowen

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-11-09 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/19715 [SPARK-22397][ML]add multiple columns support to QuantileDiscretizer ## What changes were proposed in this pull request? add multi columns support to QuantileDiscretizer ## How

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-11-09 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 @MLnick @viirya Could you please review? Thanks!

[GitHub] spark pull request #19658: [SPARK-22443][SQL]add implementation of quoteIden...

2017-11-04 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/19658 [SPARK-22443][SQL]add implementation of quoteIdentifier, getTableExistsQuery and getSchemaQuery in AggregatedDialect … ## What changes were proposed in this pull request

[GitHub] spark pull request #19658: [SPARK-22443][SQL]add implementation of quoteIden...

2017-11-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19658#discussion_r148944273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala --- @@ -42,6 +42,18 @@ private class AggregatedDialect(dialects

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 @MLnick Thank you very much for your comments! I will change these.

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-11 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r156160532 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -129,34 +156,102 @@ final class QuantileDiscretizer @Since

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-20 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 retest this please

[GitHub] spark pull request #21244: [SPARK-24185]add flatten function to SparkR

2018-05-04 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21244 [SPARK-24185]add flatten function to SparkR ## What changes were proposed in this pull request? add array flatten function to SparkR ## How was this patch tested? Unit

[GitHub] spark pull request #21255: [SPARK-24186][SparkR][SQL]change reverse and conca...

2018-05-07 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21255 [SPARK-24186][SparkR][SQL]change reverse and concat to collection functions in R ## What changes were proposed in this pull request? reverse and concat are already in functions.R

[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-14 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21307 Thanks a lot!! @HyukjinKwon @viirya @felixcheung

[GitHub] spark pull request #21255: [SPARK-24186][SparkR][SQL]change reverse and conca...

2018-05-07 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r186604721 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1502,12 +1502,21 @@ test_that("column functions", { result <- collect(select(d

[GitHub] spark pull request #21255: [SPARK-24186][SparkR][SQL]change reverse and conca...

2018-05-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r186630904 --- Diff: R/pkg/R/functions.R --- @@ -1253,19 +1256,6 @@ setMethod("quarter", column(jc) }) -#

[GitHub] spark pull request #21255: [SPARK-24186][SparkR][SQL]change reverse and conca...

2018-05-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r186635312 --- Diff: R/pkg/R/functions.R --- @@ -1253,19 +1256,6 @@ setMethod("quarter", column(jc) }) -#

[GitHub] spark pull request #21255: [SPARK-24186][SparkR][SQL]change reverse and conca...

2018-05-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r186631550 --- Diff: R/pkg/R/functions.R --- @@ -2043,34 +2033,6 @@ setMethod("countDistinct", column(jc) }) -#

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184808830 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-27 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184809072 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-10 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r187262627 --- Diff: R/pkg/R/functions.R --- @@ -219,7 +219,8 @@ NULL #' head(select(tmp3, map_values(tmp3$v3))) #' head(select(tmp3, element_at(tmp3$v3

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-05-10 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21119 @jkbradley Thanks for letting me know. I will change the python API accordingly after the new scala version

[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-12 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21307 @HyukjinKwon I think I resolved the problem. Thanks!

[GitHub] spark pull request #21307: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-13 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21307#discussion_r187802112 --- Diff: R/pkg/R/functions.R --- @@ -2055,20 +2058,10 @@ setMethod("countDistinct", #' @details #' \code{concat}: Concatenate

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-05-13 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21313 [SPARK-24187][R][SQL]Add array_join function to SparkR ## What changes were proposed in this pull request? This PR adds array_join function to SparkR ## How was this patch

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-05-13 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r187814160 --- Diff: R/pkg/R/functions.R --- @@ -3006,6 +3008,28 @@ setMethod("array_contains", column(jc) }) +#

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-05-13 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21313 There are still quite a lot of the SQL functions to be added in R. We can bundle several of the functions together in one PR, but I guess it's too much work to add all of them in one PR

[GitHub] spark issue #21255: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-10 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21255 @felixcheung @HyukjinKwon @viirya I am thinking of closing this one and opening a new PR if there is no objection. It's messy to resolve the conflicts because I have quite a few patches. Sorry for my

[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-11 Thread huaxingao
Github user huaxingao closed the pull request at: https://github.com/apache/spark/pull/21255

[GitHub] spark pull request #21307: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-11 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21307 [SPARK-24186][R][SQL]change reverse and concat to collection functions in R ## What changes were proposed in this pull request? reverse and concat are already in functions.R

[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...

2018-05-05 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21244#discussion_r186275756 --- Diff: R/pkg/R/generics.R --- @@ -918,6 +918,10 @@ setGeneric("explode_outer", function(x) { standardGeneric("explode_outer") }

[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR

2018-05-05 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21244 @viirya @mn-mikke @felixcheung @HyukjinKwon Thanks all for your help! @HyukjinKwon I will fix the two small things in my next PR

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-16 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21069 retest this please

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-05-22 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r189978900 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,98 @@ case class

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-22 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21069 retest this please

[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r190659883 --- Diff: python/pyspark/ml/regression.py --- @@ -619,6 +627,20 @@ def getSubsamplingRate(self): """ return se

[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r190638735 --- Diff: python/pyspark/ml/regression.py --- @@ -619,6 +627,20 @@ def getSubsamplingRate(self): """ return se

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-05-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r190652607 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,123 @@ case class

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r189149082 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,96 @@ case class Flatten

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r189148953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,96 @@ case class Flatten

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r189358425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,141 @@ case class

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r189357494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,141 @@ case class

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-05-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r188494901 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -280,4 +280,35 @@ class

[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-23 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21413 [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBTClassifier ## What changes were proposed in this pull request? Add featureSubsetStrategy in GBTClassifier and GBTRegressor. Also

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r189405672 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -552,4 +552,26 @@ class

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-10 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194247139 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,213 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-10 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194247137 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,213 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-09 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194237396 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21119 @mengxr Sorry for the delay. I will submit an update later today. Do you want me to close this PR and do a new one? or just update this PR

[GitHub] spark issue #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21513 @mengxr @WeichenXu123 Could you please review? Thanks a lot in advance!

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21513 [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC ## What changes were proposed in this pull request? add spark.ml Python API for PIC ## How was this patch tested

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21119 @mengxr @WeichenXu123 I will close this one and submit a new PR soon. Thanks!

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread huaxingao
Github user huaxingao closed the pull request at: https://github.com/apache/spark/pull/21119

[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

2018-06-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r194198390 --- Diff: python/pyspark/ml/classification.py --- @@ -1251,26 +1256,33 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194190113 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1159,216 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-06-12 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r194893615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1883,134 @@ case class

[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

2018-06-13 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21557 [SPARK-24439][ML][PYTHON]Add distanceMeasure to BisectingKMeans in PySpark ## What changes were proposed in this pull request? add distanceMeasure to BisectingKMeans in Python

[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-06-13 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20442 @jkbradley test added. Could you please review?

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-06-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r195920795 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2355,3 +2356,319 @@ case class

[GitHub] spark issue #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-11 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21513 Thanks a lot for your help! @mengxr @WeichenXu123

[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-30 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21413 Thanks a lot @BryanCutler for your help!

[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

2018-05-30 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21465 [SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API ## What changes were proposed in this pull request? Add validationIndicatorCol and validationTol

[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191611779 --- Diff: python/pyspark/ml/regression.py --- @@ -619,6 +627,22 @@ def getSubsamplingRate(self): """ return se

[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191602398 --- Diff: python/pyspark/ml/regression.py --- @@ -619,6 +627,22 @@ def getSubsamplingRate(self): """ return se

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578750 --- Diff: R/pkg/R/functions.R --- @@ -3006,6 +3008,27 @@ setMethod("array_contains", column(jc) }) +#

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578774 --- Diff: R/pkg/R/functions.R --- @@ -3006,6 +3008,27 @@ setMethod("array_contains", column(jc) }) +#

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578796 --- Diff: R/pkg/R/functions.R --- @@ -3006,6 +3008,27 @@ setMethod("array_contains", column(jc) }) +#

[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-31 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21069 Thank you for your help!

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-05-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r192256591 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -552,4 +552,60 @@ class

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21313 @felixcheung @HyukjinKwon Any more comments?

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-05 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21313 Thank you very much for your help! @HyukjinKwon @felixcheung

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-06 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21119 @mengxr @WeichenXu123 I will update this. Thanks.

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192814246 --- Diff: R/pkg/R/functions.R --- @@ -3006,6 +3008,27 @@ setMethod("array_contains", column(jc) }) +#

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192574515 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1518,6 +1518,16 @@ test_that("column functions", { result <- coll

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r199338975 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", group

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r199225907 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,18 @@ setMethod("rollup", groupedData(sgd) }) +isT

[GitHub] spark pull request #21678: [SPARK-23461][R]vignettes should include model pr...

2018-06-29 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21678 [SPARK-23461][R]vignettes should include model predictions for some ML models ## What changes were proposed in this pull request? Add model predictions for Linear Support Vector Machine

[GitHub] spark pull request #21645: [SPARK-24537][R]Add array_remove / array_zip / ma...

2018-06-26 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21645 [SPARK-24537][R]Add array_remove / array_zip / map_from_arrays / array_distinct ## What changes were proposed in this pull request? Add array_remove / array_zip / map_from_arrays

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in S...

2018-06-27 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21649 [SPARK-23648][R][SQL]Adds more types for hint in SparkR ## What changes were proposed in this pull request? Addition of numeric and list hints for SparkR. ## How was this patch

[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...

2018-06-27 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21557#discussion_r198684081 --- Diff: python/pyspark/ml/clustering.py --- @@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="predict

[GitHub] spark issue #21050: [SPARK-23912][SQL]add array_distinct

2018-06-20 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21050 @ueshin @kiszk Thanks for your comments. I fixed the problems. I am not sure if I should use ```$i++ ``` or ```$i ++``` in the for loop. It seems other people use ```$i ++```, so I also used

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-06-30 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21678 Here is the output for Linear SVM Classifier in sparkr-vignettes.html. ``` prediction <- predict(model, training) head(select(prediction, "Class", "Sex", &

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r199327384 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,18 @@ setMethod("rollup", groupedData(sgd) }) +isT

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-05-01 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r185332198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1059,3 +1059,78 @@ case class Flatten

[GitHub] spark issue #21050: [SPARK-23912][SQL]add array_distinct

2018-05-01 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21050 retest this please

[GitHub] spark issue #21159: [SPARK-24057][PYTHON]put the real data type in the Asser...

2018-04-26 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21159 Thanks @BryanCutler @HyukjinKwon

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-27 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/17819 @AFractalThought @viirya I have made changes for QuantileDiscretizer based on this PR. Once this PR is merged, I will open a jira to submit the PR for QuantileDiscretizer

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-03 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21710 [SPARK-24207][R]add R API for PrefixSpan ## What changes were proposed in this pull request? add R API for PrefixSpan ## How was this patch tested? add test

[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-31 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 Thank you all for your help!!

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20390#discussion_r163759921 --- Diff: python/pyspark/sql/tests.py --- @@ -2855,6 +2855,10 @@ def test_create_dataframe_from_old_pandas(self

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20390#discussion_r163731132 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1881,6 +1881,15 @@ def toDF(self, *cols): jdf = self._jdf.toDF(self._jseq(cols

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-24 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20390 [SPARK-23081][PYTHON]Add colRegex API to PySpark ## What changes were proposed in this pull request? Add colRegex API to PySpark ## How was this patch tested? add

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-25 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20390#discussion_r163925810 --- Diff: python/pyspark/sql/dataframe.py --- @@ -819,6 +819,29 @@ def columns(self): """ retur

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-25 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20390#discussion_r163940181 --- Diff: python/pyspark/sql/dataframe.py --- @@ -819,6 +819,29 @@ def columns(self): """ retur

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-25 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20400 [SPARK-23084][PYTHON]Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark ## What changes were proposed in this pull request? Added unboundedPreceding

[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

2018-01-25 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20390 Thank you all for your help! @HyukjinKwon @gatorsmile @felixcheung

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164261413 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param
