[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185756220 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -378,6 +378,7 @@ class KMeans @Since("

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185756193 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -423,6 +423,8 @@ class GaussianMixture @Since("

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-10-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r145913386 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,464 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-10-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r145911522 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,464 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19433#discussion_r147317401 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/LocalDecisionTree.scala --- @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19433 After discussion and modifications, I approve this PR overall. Ping @jkbradley Can you take a look now

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 Yes, it changed the probability of samples indeed compared with current code. But according to the comments coming from @jkbradley in #18924 , "in order to make **corpusSize**, batc

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146167919 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -282,8 +348,27 @@ class LinearSVC @Since("

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r146163724 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r146164215 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r146163650 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r146163447 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,258 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146167706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-10-27 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19588 [SPARK-12375][ML] VectorIndexerModel support handle unseen categories via handleInvalid ## What changes were proposed in this pull request? Support skip/error/keep strategy, similar

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-10-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19588 cc @hhbyyh @MrBago Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-10-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19122 @jkbradley Sure I will! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-10-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r147542224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark issue #20113: [SPARK-22905][ML][FollowUp] Fix GaussianMixtureModel sav...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20113 @zhengruifeng Good work! Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #19979: [SPARK-22881][ML][TEST] ML regression package tes...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19979#discussion_r159061186 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/IsotonicRegressionSuite.scala --- @@ -44,13 +41,11 @@ class IsotonicRegressionSuite

[GitHub] spark pull request #19979: [SPARK-22881][ML][TEST] ML regression package tes...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19979#discussion_r159061148 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -89,33 +88,31 @@ class

[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159048079 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class

[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159116537 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class

[GitHub] spark issue #20111: [SPARK-22883][ML][TEST] Streaming tests for spark.ml.fea...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20111 LGTM except a tiny issue. :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20121: [SPARK-22927][ML][TESTS] ML test for structured s...

2017-12-29 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20121 [SPARK-22927][ML][TESTS] ML test for structured streaming: ml.classification ## What changes were proposed in this pull request? adding Structured Streaming tests for all Models

[GitHub] spark pull request #20077: [SPARK-22899][ML][STREAM] Fix OneVsRestModel tran...

2017-12-25 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20077 [SPARK-22899][ML][STREAM] Fix OneVsRestModel transform on streaming data failed. ## What changes were proposed in this pull request? Fix OneVsRestModel transform on streaming data

[GitHub] spark pull request #20209: [SPARK-23008][ML] OnehotEncoderEstimator python A...

2018-01-09 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20209 [SPARK-23008][ML] OnehotEncoderEstimator python API ## What changes were proposed in this pull request? OnehotEncoderEstimator python API. ## How was this patch tested

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160264829 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,30 @@ def ocvTypes(self): """ if self._o

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160265175 --- Diff: python/pyspark/ml/image.py --- @@ -55,7 +72,7 @@ def imageSchema(self): """ if self._imag

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160264533 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,30 @@ def ocvTypes(self): """ if self._o

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17123 But, pls resolve conflicts first. :) Bucketizer add multiple column support so the code is different now

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -171,23 +176,23 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile

2018-01-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20324 LGTM. Thanks! 👍 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19993 +1 merge this to 2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegre...

2018-01-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20411 [SPARK-17139][ML][FOLLOW-UP] update LogisticRegressionSummaryExample code ## What changes were proposed in this pull request? New method `trainingSummary.asBinary` added so

[GitHub] spark pull request #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegre...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20411 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20411: [SPARK-17139][ML][FOLLOW-UP] update LogisticRegressionSu...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20411 @sethah ok thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164237753 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20146 @viirya Discuss with @jkbradley offline, we're now busy fixing some issues (e.g. #20238) in ML structured streaming support, it looks bad after the code freeze, and we may not be able

[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r161039325 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -33,12 +33,38 @@ class StringIndexerSuite test

[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r161040131 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +249,16 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r161040537 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -331,4 +357,51 @@ class StringIndexerSuite val

[GitHub] spark pull request #20229: [SPARK-23045][ML][SparkR] Update RFormula to use ...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20229#discussion_r161120354 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -230,16 +231,17 @@ class RFormula @Since("1.5.0") (@Si

[GitHub] spark pull request #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder p...

2018-01-11 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20241 [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164531329 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example: [`LogisticRegressionTrainingSummary`](api

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-01-30 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20446 [SPARK-23254][ML] Add user guide entry for DataFrame multivariate summary ## What changes were proposed in this pull request? Add user guide and scala/java examples

[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry for DataFrame mul...

2018-01-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20446 @MLnick @MrBago Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20164 Sorry, I haven't understood where is the issue in current master code. The models here should be `ClassificationModel` and will always have `rawPrediction` param and have default value

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20164 Oh, do you mean if input df including a column named "rawPrediction", then it will be overwritten when it transformed by OVSModel ? Looks like

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20421 @MLnick Forget one fix: https://github.com/apache/spark/pull/18797 I doubt whether this fix should go into "behavior change". It influences iteration number for algos

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20457 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20457 [SPARK-23110][MINOR] Make linearRegressionModel constructor private ## What changes were proposed in this pull request? make linearRegressionModel constructor private[ml

[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20421 ah, yes, it backport to 2.2 😳 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165565121 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaSummarizerExample.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165573866 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

2018-02-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20446#discussion_r165578020 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/SummarizerExample.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20459#discussion_r165229102 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -93,7 +93,7 @@ private[feature] trait

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
GitHub user WeichenXu123 reopened a pull request: https://github.com/apache/spark/pull/20457 [SPARK-23110][MINOR] Make linearRegressionModel constructor private ## What changes were proposed in this pull request? make linearRegressionModel constructor private[ml

[GitHub] spark pull request #20457: [SPARK-23110][MINOR] Make linearRegressionModel c...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20457 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20457: [SPARK-23110][MINOR] Make linearRegressionModel construc...

2018-01-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20457 It's covered in this PR #20459 So go there discuss. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20164 @srowen yeah, all models exist this issue. Although, a little difference, for other models, it is very straightforward for user to call `setRawPrediction` to avoid overwrite the same name

[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r167801392 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark issue #20592: [SPARK-23154][ML][DOC] Document backwards compatibility ...

2018-02-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20592 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r168067865 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20594 I thought again, instead of "removing default value and restore it again later (which may cause some side effects)", maybe the better way is, add a

[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r168069150 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST] Make ML testsuite support...

2017-12-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r158692700 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #19979: [SPARK-22881][ML][TEST] ML regression package testsuite ...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19979 @jkbradley There're two cases which can use `globalCheckFunction` - test statistics (such as min/max ) on global transformer output - get global result array and compare

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r158929523 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r158931079 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark issue #19979: [SPARK-22881][ML][TEST] ML regression package testsuite ...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19979 @MrBago Merge your code suggestion. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r158930992 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158932419 --- Diff: python/pyspark/ml/base.py --- @@ -18,13 +18,40 @@ from abc import ABCMeta, abstractmethod import copy +import threading

[GitHub] spark issue #19979: [SPARK-22881][ML][TEST] ML regression package testsuite ...

2017-12-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19979 @jkbradley > When there has been a shuffle, it is likely the Rows will not follow a fixed order. Agreed. But we can make sure it generate fix order from the last shuffle posit

[GitHub] spark issue #20113: [SPARK-22905][ML][FollowUp] Fix GaussianMixtureModel sav...

2017-12-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20113 LGTM. Have you checked all the model.save ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2018-01-02 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19621 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I am too busy recently to fix those failed R tests. Anyone who has spare time can take over this PR and I will help review. Thanks

[GitHub] spark issue #20088: [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel sa...

2017-12-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20088 Currently I cannot construct a failed test for this issue, but the future PR (changing `RoundRobinPartitioning`) by @jiangxb1987 will trigger this bug

[GitHub] spark pull request #20088: [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorM...

2017-12-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20088 [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel save implementation ## What changes were proposed in this pull request? Currently, in `ChiSqSelectorModel`, save

[GitHub] spark issue #19994: [SPARK-22810][ML][PySpark] Expose Python API for LinearR...

2017-12-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19994 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138998 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -567,6 +567,7 @@ object DataSource extends

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215200249 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeas...

2018-09-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22360 Do we need to set `distanceMeasure` again for the parent model ? When parent model created, it will use the same `distanceMeasure` with the one used in training

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-06 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22349 [SPARK-25345][ML] Deprecate public APIs from ImageSchema ## What changes were proposed in this pull request? Deprecate public APIs from ImageSchema. ## How was this patch

[GitHub] spark pull request #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDista...

2018-09-07 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22360 [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeasure` method ## What changes were proposed in this pull request? Remove `BisectingKMeansModel.setDistanceMeasure` method

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22349#discussion_r216117396 --- Diff: python/pyspark/ml/image.py --- @@ -207,6 +207,9 @@ def readImages(self, path, recursive=False, numPartitions=-1, .. note

[GitHub] spark issue #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeas...

2018-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22360 @srowen The delegated `mllib.BisectingKMeansModel` is: ``` class BisectingKMeansModel private[clustering] ( private[clustering] val root: ClusteringTreeNode, @Since

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215135665 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138174 --- Diff: data/mllib/images/images/license.txt --- @@ -0,0 +1,13 @@ +The images in the folder "kittens" are under the creative commons CC

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138728 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138862 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-07-09 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19666 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry and example for D...

2018-07-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20446 @srowen The reason I do not use `.show` I have already reply here https://github.com/apache/spark/pull/20446#discussion_r165565121 thanks

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161859425 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161857103 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneHotEncoderEstimatorExample.java --- @@ -35,41 +34,37 @@ import

[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

2018-01-16 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20257#discussion_r161854406 --- Diff: docs/ml-features.md --- @@ -775,35 +775,43 @@ for more details on the API. -## OneHotEncoder +## OneHotEncoder

[GitHub] spark pull request #20261: [SPARK-22885][ML][TEST] ML test for StructuredStr...

2018-01-13 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20261 [SPARK-22885][ML][TEST] ML test for StructuredStreaming: spark.ml.tuning ## What changes were proposed in this pull request? ML test for StructuredStreaming: spark.ml.tuning

[GitHub] spark issue #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator document an...

2018-01-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20257 Nice, LGTM. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

<    5   6   7   8   9   10   11   12   >