Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16630
@actuaryzhang sorry I'm at Spark Summit East, will take a look soon. For
the feature name or "lazy val featureName: Array[String]", I recall there is a
sparse (eg output by
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100927396
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1152,4 +1170,32 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100931445
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -915,6 +917,22 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100931568
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1152,4 +1170,32 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100932182
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -1104,6 +1103,83 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100932561
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1152,4 +1170,32 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100933207
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1152,4 +1170,32 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r100933747
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -915,6 +917,22 @@ class
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16630
the code looks very good, I added a few minor comments, will take another
look tomorrow, thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100934554
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -798,77 +798,160 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100934645
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -168,6 +179,7 @@ private[regression] trait
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101338639
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala ---
@@ -124,8 +129,8 @@ private[ml] object TreeTests extends
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101338837
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -106,14 +122,18 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101339648
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
---
@@ -126,20 +127,22 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101339732
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
---
@@ -126,20 +127,20 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101339952
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -99,16 +105,31 @@ class DecisionTreeRegressor
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101341596
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -60,12 +68,14 @@ private[spark] object BaggedPoint
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101342186
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -115,7 +122,10 @@ private[spark] object
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101342791
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16722
the code looks good to me, maybe a contributor can comment? This is a
great feature, nice work!
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101346522
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -60,12 +68,14 @@ private[spark] object BaggedPoint
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101351063
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -82,16 +92,16 @@ private[spark] object BaggedPoint
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101354732
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r101356243
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -1104,6 +1103,83 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r101356475
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1152,4 +1173,33 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r101357001
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -915,6 +919,23 @@ class
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16630
Thanks for the updates, the changes look good to me. One question, out of
scope of the specific changes in this review: are there any other summary
statistics that we could add in the future
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r101362237
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -1104,6 +1103,83 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16630#discussion_r101362084
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -1104,6 +1103,83 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101531848
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101532020
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -60,12 +68,14 @@ private[spark] object BaggedPoint
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16630
@actuaryzhang thanks, LGTM!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16630
@actuaryzhang sorry, can you comment on this question I had above:
One question, out of scope of the specific changes in this review: are
there any other summary statistics that we could
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16557
ok, I will close this and create three new PRs, one for each of the
evaluators
Github user imatiach-msft closed the pull request at:
https://github.com/apache/spark/pull/16557
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/17084
[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added
weight column for binary classification evaluator
## What changes were proposed in this pull request?
The
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/17085
[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added
weight column for regression evaluator
## What changes were proposed in this pull request?
The evaluators
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/17086
[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added
weight column for multiclass classification evaluator
## What changes were proposed in this pull request?
The
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16557
I've created 3 PRs, located here:
https://github.com/apache/spark/pull/17084
https://github.com/apache/spark/pull/17085
https://github.com/apache/spark/pull/17086
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
Jenkins, retest this please
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16377#discussion_r93639635
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -113,6 +113,10 @@ private[spark] object
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
How can I run the build/tests?
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Yep, there is still a TODO to verify the fix. I'm waiting for the dataset
from Alok to reproduce the issue:
https://issues.apache.org/jira/browse/SPARK-16473
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Hi Alok!
Sorry I was away for holiday break. I will try to reproduce the failure.
Thank you, Ilya
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
Is there anything I need to do to allow this fix to be pushed to the base
branch? Are there any pending questions/comments that still need to be resolved?
Thank you, Ilya
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
@sethah would you be able to take a look at the proposed changes? Thank
you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
I have very good news :). I was not only able to repro the issue with your
dataset, but I was also able to verify that with the suggested fix the
algorithm does not fail (adding the val
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Jenkins, retest this please
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
I've updated with a new commit. I was able to reproduce the issue by
generating a synthetic sparse dataset similar to the one Alok sent me, in
accordance with the test-style of spark
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
Jenkins, retest this please
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley @srowen any comments on the changes? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
ping @sethah
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/16436
[SPARK-18698][ML][MLLIB] Adding public constructor that takes uid for
IndexToString
## What changes were proposed in this pull request?
Based on SPARK-18698, this adds a public
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16436
Jenkins, retest this please
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16436
@mengxr @jkbradley @MLnick @HyukjinKwon @holdenk would you be able to take
a look at the changes - it looks like you have previously modified
StringIndexer.scala file.
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16436
done!
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16436#discussion_r94174966
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -219,6 +219,16 @@ class StringIndexerSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16436#discussion_r94175547
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -219,6 +219,16 @@ class StringIndexerSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16436#discussion_r94175562
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -219,6 +219,16 @@ class StringIndexerSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16436#discussion_r94175568
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -219,6 +219,16 @@ class StringIndexerSuite
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16436
Jenkins, retest this please
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
the only problem I see is that with this code we generate k-1 clusters
instead of k, but it states in the algorithm documentation that it is not
guaranteed to generate k clusters, it could be
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/16441
[SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per
training instance and fixed interfaces
## What changes were proposed in this pull request?
For all of the
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
Jenkins, retest this please
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
Thanks, I've updated the PR based on your comment. The only disadvantage
to the current code is that I do the probability computation within the
classifier, but it seems like it shou
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16377#discussion_r94850577
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -713,6 +713,15 @@ private[spark] object RandomForest extends
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16377#discussion_r94850633
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -713,6 +713,15 @@ private[spark] object RandomForest extends
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16377#discussion_r94852377
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -713,6 +713,15 @@ private[spark] object RandomForest extends
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
@jkbradley I've updated based on your comments, please take another look,
thanks!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
@sethah I've updated the code based on your comments, please take a look,
thank you!
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94873371
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -215,10 +223,23 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94873411
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -215,10 +223,23 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94874968
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94875629
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94877056
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94898798
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -66,10 +66,39 @@ class GBTClassifierSuite extends
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94898909
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -66,10 +66,39 @@ class GBTClassifierSuite extends
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94898917
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -66,10 +66,39 @@ class GBTClassifierSuite extends
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
@sethah @jkbradley thank you for the review - could you please take another
look since I've updated the code review based on your comments?
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/9920
@jliwork @srowen are you currently working on this in-progress JIRA 11569?
If not, I would be interested in continuing the initial pull request that was
closed. Please let me know, thank you
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
It looks like I am failing the binary compatibility tests despite this
constructor being private:
class GBTClassificationModel private[ml](
@Since("1.6.0") overri
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94957214
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94957257
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16441#discussion_r94957348
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -248,12 +269,38 @@ class GBTClassificationModel private
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
Indeed re-adding the constructor seems to make the binary compatibility
tests pass (see spark QA build above). I think in favor of making the binary
compat tests pass, we can keep the extra
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
I've removed the WIP from title to reflect the status of the pull request.
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16471
I think you might need to add [ML] to the pull request name, e.g.:
[SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardScaler,PCA transform
avoid extra vector conversion
I like
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16355#discussion_r95030008
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
---
@@ -51,6 +54,23 @@ class BisectingKMeansSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16355#discussion_r95029976
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
---
@@ -29,9 +29,12 @@ class BisectingKMeansSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16355#discussion_r95030147
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
---
@@ -51,6 +54,23 @@ class BisectingKMeansSuite
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/16355#discussion_r95030212
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -160,6 +162,17 @@ object KMeansSuite
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley Thank you for taking a look! I've updated the code based on your
comments.
GitHub user imatiach-msft opened a pull request:
https://github.com/apache/spark/pull/16494
[SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException
## What changes were proposed in this pull request?
LDA fails with a ClassCastException when run on a dataset
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16494
@jkbradley @vanzin @skyluc @luluorta @uncleGen @kanzhang Could you please
take a look at this pull request to fix the method fromEdges in EdgeRDD class
used by LDA? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16516
This is a nice fix. It looks like some other learners have this issue as
well, e.g. LogisticRegression.scala under
$(root)/mllib/src/main/scala/org/apache/spark/ml/classification
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16516
Maybe a more generic fix would be to fix the method ParamValidators.inArray
to be case insensitive. I see this method used in a lot of places. Doing a
simple search brings up not just
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@jkbradley @yu-iskw @srowen can you please take another look at the
bisecting k-means algorithm fix? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16377
ping @sethah can you please take another look at the decision tree/random
forest fixes? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16441
ping @sethah @jkbradley could you please take another look since I've
updated the code review based on your comments? Thank you!
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16355
@filousen could you please share the code that you used to load and run the
dataset and the full error message with stack trace you are seeing? I'm a bit
confused since the dataset is