[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77448/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #77448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77448/testReport)** for PR 15831 at commit [`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #77448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77448/testReport)** for PR 15831 at commit [`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @HyukjinKwon I was busy; I will restart this week.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15831 Hi @techaddict, how is this PR going?
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15831 @techaddict @sethah I have some time to work on the porting, but I can't find the umbrella JIRA.
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah I will revive this PR, thanks.
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15831 I think we decided to go a different direction than what is proposed here? Actually, I still think there's merit in fixing the problem without having to do full feature ports. Either way, I'm not sure anyone is still taking on this task, so @zhengruifeng or @techaddict it would be great if you wanted to either revive this PR/help review, or start working on the larger umbrella JIRA and sub tasks...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15831 The same TODO also appears in `HashingTF`. What about including it in this PR?
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @MLnick I will create an umbrella JIRA and start adding JIRAs for the things I'm aware of, and you can start prioritising. Sounds like a plan?
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15831 I'm also generally supportive of (1) - porting the code to `ml` and having the `mllib` code wrap the `ml` version - this is the approach taken for other models that have already been done. Of course, only once *all* `mllib` code has been ported over fully can we ultimately deprecate `mllib`. I guess we can start doing this for some transformers like these - but ideally we should focus on porting stuff that's still missing in `ml` first. I'd prefer that we create a top-level JIRA to track all the components that need to be done, and link everything appropriately. We also need to decide on priority - we may realistically be working on it over a 1-1.5 year time frame.
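Editorial illustration (not part of the thread): the "port to `ml`, let `mllib` wrap it" idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Spark's actual code: `MlScalerModel` and `MllibScalerModel` are hypothetical stand-ins for a ported model and its legacy counterpart, and plain `Array[Double]` stands in for the two packages' `Vector` types.

```scala
// Hypothetical stand-in for a model that lives in the `ml` package:
// it owns the actual transformation logic.
final class MlScalerModel(val factor: Double) {
  def transform(values: Array[Double]): Array[Double] = values.map(_ * factor)
}

// Hypothetical stand-in for the legacy `mllib` model: it keeps its public
// API but delegates to the `ml` implementation instead of duplicating it.
final class MllibScalerModel(val factor: Double) {
  private val delegate = new MlScalerModel(factor)
  def transform(values: Array[Double]): Array[Double] = delegate.transform(values)
}
```

With this layout the logic exists only once, so the legacy wrapper could later be dropped without touching the implementation.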
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah @yanboliang I've started migrating `IDF`. Can you review the WIP and tell me whether I'm going in the right direction? https://github.com/techaddict/spark/pull/2/files There is some code duplication where we can make the mllib code actually depend on the ml one.
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15831 @techaddict @sethah I prefer option 1, since we would like to remove the spark.mllib package in a future release (maybe 3.0) and we wouldn't like to make any changes to it except bug fixes. Could you make this improvement separately for the relevant algorithms? Thanks.
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 @sethah I agree, the 2nd approach is much more reasonable.
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15831

I see this patch was created as a result of the PR that separated the ml/mllib linalg packages, to avoid some inefficiencies in conversion. However, it is also a partial step toward feature parity. Typically, we would port full algorithms all at once, instead of just porting the transformer functionality as is done here, but I understand that this is not just about parity. I would suggest one of the following:

1. Port over full feature functionality. This increases the scope, and therefore the algos should probably be separated out individually into PRs.
2. Keep the scope the same, but avoid copying code.

For an example of option 2, for `ChiSqSelector`, we can implement new static methods in the `mllib.ChiSqSelectorModel`:

```scala
private[spark] def compressDense(
    selectedFeatures: Array[Int],
    values: Array[Double]): Array[Double] = {
  selectedFeatures.map(i => values(i))
}

private[spark] def compressSparse(
    compressedSize: Int,
    selectedFeatures: Array[Int],
    indices: Array[Int],
    values: Array[Double]): (Array[Int], Array[Double]) = {
  ...
}
```

then in the actual model classes we can just do something like:

```scala
private def compress(features: Vector): Vector = {
  features match {
    case SparseVector(_, indices, values) =>
      val newSize = selectedFeatures.length
      val (newIndices, newValues) =
        ChiSqSelectorModel.compressSparse(newSize, selectedFeatures, indices, values)
      Vectors.sparse(newSize, newIndices, newValues)
    case DenseVector(values) =>
      Vectors.dense(ChiSqSelectorModel.compressDense(selectedFeatures, values))
  }
}
```

This approach would allow us to avoid copying a lot of code until we do full feature ports. What are others' opinions? I lean towards the second option since it keeps the scope reasonable. cc @dbtsai @yanboliang
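Editorial illustration (not part of the thread): the body of `compressSparse` is elided in the comment above, so the following is only one plausible way such a helper could work, not the code from the PR or from Spark. It is a minimal, Spark-free sketch that assumes both `selectedFeatures` and the sparse vector's `indices` are sorted in ascending order, and it omits the `compressedSize` parameter because the sketch does not need it. The object name `CompressSketch` is hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object CompressSketch {

  // Dense case, as in the comment: pick the selected positions.
  def compressDense(selectedFeatures: Array[Int], values: Array[Double]): Array[Double] =
    selectedFeatures.map(i => values(i))

  // Sparse case (assumption: both index arrays are sorted ascending).
  // Walks the two arrays in lockstep and keeps only the entries whose
  // original index is among the selected features.
  def compressSparse(
      selectedFeatures: Array[Int],
      indices: Array[Int],
      values: Array[Double]): (Array[Int], Array[Double]) = {
    val newIndices = ArrayBuffer.empty[Int]
    val newValues = ArrayBuffer.empty[Double]
    var i = 0 // position in the sparse vector's `indices`
    var j = 0 // position in `selectedFeatures`
    while (i < indices.length && j < selectedFeatures.length) {
      val idx = indices(i)
      val sel = selectedFeatures(j)
      if (idx == sel) {
        newIndices += j // j is the feature's index in the compressed vector
        newValues += values(i)
        i += 1
        j += 1
      } else if (idx < sel) {
        i += 1 // this non-zero entry was not selected
      } else {
        j += 1 // this selected feature is zero in the input
      }
    }
    (newIndices.toArray, newValues.toArray)
  }
}
```

Because both helpers take and return plain arrays, either package's model class could call them and wrap the result in its own `Vector` type, which is the point of option 2.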
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68411/
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #68411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68411/consoleFull)** for PR 15831 at commit [`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68410/
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15831 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #68410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68410/consoleFull)** for PR 15831 at commit [`a9483ef`](https://github.com/apache/spark/commit/a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #68411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68411/consoleFull)** for PR 15831 at commit [`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15831 **[Test build #68410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68410/consoleFull)** for PR 15831 at commit [`a9483ef`](https://github.com/apache/spark/commit/a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b).
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/15831 cc: @dbtsai @mengxr