[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164261850 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark pull request #20442: [SPARK-23265][SQL]Update multi-column error handl...

2018-01-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20442#discussion_r164955798 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -167,25 +167,31 @@ final class QuantileDiscretizer @Since

[GitHub] spark pull request #20442: [SPARK-23265][SQL]Update multi-column error handl...

2018-01-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20442#discussion_r164956149 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -167,25 +167,31 @@ final class QuantileDiscretizer @Since

[GitHub] spark pull request #20442: [SPARK-23265][SQL]Update multi-column error handl...

2018-01-30 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20442 [SPARK-23265][SQL]Update multi-column error handling logic in QuantileDiscretizer ## What changes were proposed in this pull request? SPARK-22799 added more comprehensive error

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164965040 --- Diff: python/pyspark/sql/functions.py --- @@ -809,6 +809,45 @@ def ntile(n): return Column(sc._jvm.functions.ntile(int(n

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164965129 --- Diff: python/pyspark/sql/functions.py --- @@ -809,6 +809,45 @@ def ntile(n): return Column(sc._jvm.functions.ntile(int(n

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-30 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164966938 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +126,20 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark pull request #20442: [SPARK-23265][SQL]Update multi-column error handl...

2018-01-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20442#discussion_r165131413 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -167,25 +167,36 @@ final class QuantileDiscretizer @Since

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165229511 --- Diff: python/pyspark/sql/window.py --- @@ -212,16 +218,20 @@ def rangeBetween(self, start, end): values directly

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165229664 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +126,20 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165270774 --- Diff: python/pyspark/sql/window.py --- @@ -129,11 +131,34 @@ def rangeBetween(start, end): :param end: boundary end, inclusive

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165258763 --- Diff: python/pyspark/sql/functions.py --- @@ -809,6 +809,48 @@ def ntile(n): return Column(sc._jvm.functions.ntile(int(n

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-02-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165893442 --- Diff: python/pyspark/sql/window.py --- @@ -208,20 +236,27 @@ def rangeBetween(self, start, end): and "5" means the five

[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-06 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20477 @cloud-fan I have a question about the Optimized Logical Plan. In the "What changes were proposed" section, it is said that after this PR, the Optimized Logical Plan will be as

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-02-12 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20400 @HyukjinKwon Thanks a lot for your help! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-02-17 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20442 Thanks for the comments. I am in China now for Chinese New Year. Will address the comments when I get back to work on 2/21

[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-02-21 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20442 Sorry for not working on this earlier. Just came back from China yesterday morning. Not sure if 2.3 RC4 has already been cut. If this still needs to be merged in 2.3, please let me know and I

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-21 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r158347717 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -386,19 +382,16 @@ class QuantileDiscretizerSuite

[GitHub] spark issue #21050: [SPARK-23912][SQL]add array_distinct

2018-06-21 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21050 Thank you very much for your help!

[GitHub] spark pull request #21925: [SPARK-24973][PYTHON]Add numIter to Python Cluste...

2018-07-30 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21925 [SPARK-24973][PYTHON]Add numIter to Python ClusteringSummary ## What changes were proposed in this pull request? Add numIter to Python version of ClusteringSummary ## How

[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r205538890 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequ

[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r205609369 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequ

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-08-13 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21439 Sure. I will work on it. Thanks for letting me know. @viirya

[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-08-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r210326849 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequ

[GitHub] spark pull request #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-08-17 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22136 [SPARK-25124][ML]VectorSizeHint setSize and getSize don't return values ## What changes were proposed in this pull request? In feature.py, VectorSizeHint setSize and getSize don't return

[GitHub] spark issue #22228: [SPARK-25124][ML]VectorSizeHint setSize and getSize don'...

2018-08-24 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22228 @jkbradley backport to 2.3.

[GitHub] spark pull request #22228: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-08-24 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22228 [SPARK-25124][ML]VectorSizeHint setSize and getSize don't return values backport to 2.3 ## What changes were proposed in this pull request? In feature.py, VectorSizeHint setSize and getSize

[GitHub] spark pull request #22136: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-08-22 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22136#discussion_r212088986 --- Diff: python/pyspark/ml/tests.py --- @@ -844,6 +844,28 @@ def test_string_indexer_from_labels(self): .select

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203526021 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/PrefixSpanWrapper.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21820: [SPARK-24868][PYTHON]add sequence function in Pyt...

2018-07-19 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21820#discussion_r203934505 --- Diff: python/pyspark/sql/functions.py --- @@ -2551,6 +2551,27 @@ def map_concat(*cols): return Column(jc) +@since(2.4

[GitHub] spark issue #21820: [SPARK-24868][PYTHON]add sequence function in Python

2018-07-20 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21820 @HyukjinKwon Thanks!

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203229835 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/PrefixSpanWrapper.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203229733 --- Diff: R/pkg/R/generics.R --- @@ -1415,6 +1415,13 @@ setGeneric("spark.freqItemsets", function(object) { standardGeneric("spark.freqI

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203229794 --- Diff: R/pkg/tests/fulltests/test_mllib_fpm.R --- @@ -82,4 +82,26 @@ test_that("spark.fpGrowth", { }) +test_that("s

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203481597 --- Diff: R/pkg/R/generics.R --- @@ -1415,6 +1415,13 @@ setGeneric("spark.freqItemsets", function(object) { standardGeneric("spark.freqI

[GitHub] spark issue #21835: [SPARK-24779]Add sequence / map_concat / map_from_entrie...

2018-07-23 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21835 @HyukjinKwon @felixcheung Could you please review? Thank you very much in advance!

[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204861059 --- Diff: R/pkg/tests/fulltests/test_context.R --- @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not m

[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-21 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21835 [SPARK-24779]Add sequence / map_concat / map_from_entries / an option in months_between UDF to disable rounding-off ## What changes were proposed in this pull request? Add

[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...

2018-08-30 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22291 [SPARK-25007][R]Add array_intersect/array_except/array_union/shuffle to SparkR ## What changes were proposed in this pull request? Add the R version of array_intersect/array_except

[GitHub] spark issue #22291: [SPARK-25007][R]Add array_intersect/array_except/array_u...

2018-08-30 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22291 @felixcheung @HyukjinKwon Sorry I couldn't figure out how to make the ```sequence``` work in the other PR. I will work on this one first

[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...

2018-08-31 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22291#discussion_r214472480 --- Diff: R/pkg/R/generics.R --- @@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { standardGeneric("array_sort") }

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-09-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r215022091 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-09-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r215022059 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__

[GitHub] spark issue #20442: [SPARK-23265][ML]Update multi-column error handling logi...

2018-09-04 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20442 Any more comments? @MLnick @jkbradley

[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR

2018-09-05 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21649 @felixcheung Are there any other things I need to change? If not, could this PR be merged in 2.4? Thanks

[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-09-05 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21710 @felixcheung Are there any other things I need to change? If not, could this PR be merged in 2.4? Thanks

[GitHub] spark pull request #22228: [SPARK-25124][ML]VectorSizeHint setSize and getSi...

2018-09-04 Thread huaxingao
Github user huaxingao closed the pull request at: https://github.com/apache/spark/pull/22228

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-09-10 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216413819 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", group

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-08-30 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22295 [SPARK-25255][PYTHON]Add getActiveSession to SparkSession in PySpark ## What changes were proposed in this pull request? add getActiveSession in session.py ## How

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-09-07 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r216115581 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__

[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...

2018-07-10 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21645 @HyukjinKwon @felixcheung Could you please review the changes? Thank you very much in advance!

[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-07-11 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21678 @felixcheung Thanks a lot for your help!

[GitHub] spark pull request #21645: [SPARK-24537][R]Add array_remove / array_zip / ma...

2018-07-11 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21645#discussion_r201827579 --- Diff: R/pkg/R/functions.R --- @@ -3071,6 +3085,19 @@ setMethod("array_position", column(jc) }) +#

[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...

2018-07-12 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21645 Thanks! @HyukjinKwon @felixcheung

[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-11 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21710 @felixcheung Can I open a new jira for code example and documentation?

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20777#discussion_r174935305 --- Diff: python/pyspark/ml/feature.py --- @@ -465,26 +473,26 @@ class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-08 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20777 [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python CountVectorizer ## What changes were proposed in this pull request? The maxDF parameter is for filtering out frequently occurring

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-08 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20777#discussion_r173369004 --- Diff: python/pyspark/ml/feature.py --- @@ -465,26 +522,26 @@ class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable

[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

2018-03-14 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20777#discussion_r174636559 --- Diff: python/pyspark/ml/tests.py --- @@ -679,6 +679,29 @@ def test_count_vectorizer_with_binary(self): feature, expected = r

[GitHub] spark issue #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls...

2018-04-04 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20962 retest this please

[GitHub] spark issue #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel shou...

2018-04-06 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20968 @BryanCutler Thank you very much for your help!

[GitHub] spark pull request #21003: [SPARK-23871][ML][PYTHON]add python api for Vecto...

2018-04-07 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21003 [SPARK-23871][ML][PYTHON]add python api for VectorAssembler handleInvalid ## What changes were proposed in this pull request? add python api for VectorAssembler handleInvalid

[GitHub] spark issue #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls...

2018-04-08 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20962 @HyukjinKwon Thank you very much for your help!!

[GitHub] spark issue #21003: [SPARK-23871][ML][PYTHON]add python api for VectorAssemb...

2018-04-10 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21003 @jkbradley Thank you very much for your help!

[GitHub] spark issue #21003: [SPARK-23871][ML][PYTHON]add python api for VectorAssemb...

2018-04-10 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21003 @jkbradley Thanks for your comment. I will add "and NaN" in the doc.

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-04-13 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21069 [SPARK-23920][SQL]add array_remove to remove all elements that equal element from array ## What changes were proposed in this pull request? add array_remove to remove all elements
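
As a rough illustration of the behavior the PR title describes, the following plain-Python sketch removes every element equal to a given value. It is not Spark code: the name `array_remove_sketch` is hypothetical, and the handling of `None` (kept unless it is the value being removed) is an assumption made for this sketch only.

```python
def array_remove_sketch(values, element):
    """Return a new list without any item equal to `element`.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    # Keep every item that does not compare equal to the target value.
    return [v for v in values if v != element]


print(array_remove_sketch([1, 2, 3, None, 3], 3))
```

The input list is left unchanged; a new list is returned, which mirrors how a column function yields a new value rather than mutating its input.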

[GitHub] spark pull request #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerMod...

2018-04-06 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20968#discussion_r179791957 --- Diff: python/pyspark/ml/feature.py --- @@ -2342,8 +2342,38 @@ def mean(self): return self._call_java("mean")

[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r179292870 --- Diff: python/pyspark/sql/functions.py --- @@ -87,7 +87,15 @@ def _(): 'col': 'Returns a :class:`Column` based on the given column name

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-04-11 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21050 [SPARK-23912][SQL]add array_distinct ## What changes were proposed in this pull request? Add array_distinct to remove duplicate values from the array. ## How was this patch
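
The de-duplication described in this PR can be pictured with a small plain-Python sketch. It is not Spark code: the name `array_distinct_sketch` is hypothetical, and keeping the first occurrence of each value in its original position is an assumption of this sketch rather than something stated in the snippet above.

```python
def array_distinct_sketch(values):
    """Return a new list with duplicates dropped, keeping first occurrences.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    result = []
    for v in values:
        # A list membership test (not a set) so unhashable items also work.
        if v not in result:
            result.append(v)
    return result


print(array_distinct_sketch([1, 2, 3, None, 3]))
```

The membership check on a list is quadratic in the worst case, which is acceptable for a sketch but not how a production implementation would be written.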

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-20 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21119 [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC ## What changes were proposed in this pull request? add spark.ml Python API for PIC ## How was this patch tested

[GitHub] spark pull request #21090: [SPARK-24026][ML] Add Power Iteration Clustering ...

2018-04-19 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21090#discussion_r182931610 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,256 @@ +/* + * Licensed

[GitHub] spark pull request #21050: [SPARK-23912][SQL]add array_distinct

2018-04-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21050#discussion_r183925443 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -105,4 +105,18 @@ class

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

2018-04-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r182605111 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,44 @@ case class

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-04-25 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21119 @jkbradley Could you please review when you have time? Thank you very much in advance!

[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-02 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20962 [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, asc_nulls_last to PySpark ## What changes were proposed in this pull request? Column.scala and Functions.scala have asc_nulls_first

[GitHub] spark pull request #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerMod...

2018-04-03 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/20968 [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel should have constructor from labels ## What changes were proposed in this pull request? The Scala StringIndexerModel has an alternate

[GitHub] spark pull request #20962: [SPARK-23847][PYTHON][SQL]Add asc_nulls_first, as...

2018-04-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20962#discussion_r178957855 --- Diff: python/pyspark/sql/column.py --- @@ -454,6 +454,32 @@ def isin(self, *cols): >>> df.select(df.name).orderBy(df.name.asc()

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-20 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20777 @BryanCutler Do you mind if I close this PR and open a new one? I got problems when I tried to resolve the conflicts

[GitHub] spark issue #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python ...

2018-03-23 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20777 Thank you very much for your help! @BryanCutler

[GitHub] spark pull request #21159: [SPARK-24057][PYTHON]put the real data type in th...

2018-04-25 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21159 [SPARK-24057][PYTHON]put the real data type in the AssertionError message ## What changes were proposed in this pull request? Print out the data type in the AssertionError

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-28 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r198913351 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -2370,6 +2370,15 @@ test_that("join(), crossJoin() and merge() on a Data

[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-06-28 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r198913230 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,18 @@ setMethod("rollup", groupedData(sgd) }) +isT

[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...

2018-06-28 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21557 Thank you very much for your help! @BryanCutler

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-16 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r225667299 --- Diff: python/pyspark/sql/tests.py --- @@ -3654,6 +3654,109 @@ def test_jvm_default_session_already_set(self): spark.stop

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-16 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r225666954 --- Diff: python/pyspark/sql/session.py --- @@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None): or SparkSession

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-16 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r225667174 --- Diff: python/pyspark/sql/functions.py --- @@ -2633,6 +2633,23 @@ def sequence(start, stop, step=None): _to_java_column(start

[GitHub] spark pull request #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for class...

2018-10-21 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22790 [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_0 ## What changes were proposed in this pull request? The following code in BisectingKMeansModel.load calls the wrong version of load

[GitHub] spark pull request #22793: [SPARK-25793][ML]Call SaveLoadV2_0.load for class...

2018-10-21 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22793 [SPARK-25793][ML]Call SaveLoadV2_0.load for classNameV2_0 ## What changes were proposed in this pull request? The wrong version of load is called in BisectingKMeansModel.load

[GitHub] spark pull request #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for class...

2018-10-22 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22790#discussion_r227229331 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -126,7 +126,7 @@ object BisectingKMeansModel extends

[GitHub] spark issue #22793: [SPARK-25793][ML]Call SaveLoadV2_0.load for classNameV2_...

2018-10-22 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22793 @WeichenXu123 I created two PRs for this JIRA. I had trouble creating the first one, so I created another one. I will close this PR. Please use the other one. Thanks

[GitHub] spark pull request #22793: [SPARK-25793][ML]Call SaveLoadV2_0.load for class...

2018-10-22 Thread huaxingao
Github user huaxingao closed the pull request at: https://github.com/apache/spark/pull/22793 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22788: [SPARK-25769][SQL]escape nested columns by backti...

2018-10-22 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r227152273 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2702,7 +2702,7 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #22788: [SPARK-25769][SQL]change nested columns from `a.b...

2018-10-21 Thread huaxingao
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22788 [SPARK-25769][SQL]change nested columns from `a.b` to `a`.`b` ## What changes were proposed in this pull request? Currently, ```$"a.b".expr.asInstanceOf[UnresolvedAttr
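
The change named in the PR title, rendering a nested column as `a`.`b` instead of `a.b`, can be illustrated by quoting each name part separately. The Python sketch below is an assumption-laden illustration, not the Catalyst implementation: the helpers `quote_identifier` and `quote_name_parts` are hypothetical, and doubling embedded backticks is assumed as the escaping rule.

```python
def quote_identifier(part):
    """Wrap one name part in backticks, doubling any backtick inside it."""
    return "`" + part.replace("`", "``") + "`"


def quote_name_parts(parts):
    """Quote every part on its own, then join the quoted parts with dots."""
    return ".".join(quote_identifier(p) for p in parts)


# A nested field reference with parts ["a", "b"] is rendered per part,
# while a single column whose name contains a dot stays one quoted unit.
nested = quote_name_parts(["a", "b"])
dotted = quote_name_parts(["a.b"])
```

Quoting per part keeps the two cases distinguishable: a struct field `b` inside column `a` and a top-level column literally named `a.b` no longer produce the same string.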

[GitHub] spark pull request #22788: [SPARK-25769][SQL]change nested columns from `a.b...

2018-10-21 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r226872842 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -98,8 +98,18 @@ case class

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r226178127 --- Diff: python/pyspark/sql/tests.py --- @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self): spark.stop

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r226178191 --- Diff: python/pyspark/sql/tests.py --- @@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self): spark.stop

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r226178054 --- Diff: python/pyspark/sql/functions.py --- @@ -2713,6 +2713,25 @@ def from_csv(col, schema, options={}): return Column(jc

[GitHub] spark issue #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_...

2018-10-23 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22790 retest this please

[GitHub] spark issue #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_...

2018-10-23 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22790 I added a regression test in ```org.apache.spark.mllib.clustering.BisectingKMeansSuite```. I could also add the following test in the ml package. ``` test("SPARK-25793") {

[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

2018-10-26 Thread huaxingao
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22295 Thank you very much for your help! @holdenk @HyukjinKwon
