[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15831 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15831#discussion_r88530411

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -243,6 +244,42 @@ final class ChiSqSelectorModel private[ml] (
     StructType(outputFields)
   }

+  private def compress(features: Vector): Vector = {
+    features match {
+      case SparseVector(_, indices, values) =>
+        val newSize = selectedFeatures.length
+        val newValues = new ArrayBuilder.ofDouble
+        val newIndices = new ArrayBuilder.ofInt
+        var i = 0
+        var j = 0
+        var indicesIdx = 0
+        var filterIndicesIdx = 0
+        while (i < indices.length && j < newSize) {
+          indicesIdx = indices(i)
+          filterIndicesIdx = selectedFeatures(j)
+          if (indicesIdx == filterIndicesIdx) {
+            newIndices += j
+            newValues += values(i)
+            j += 1
+            i += 1
+          } else {
+            if (indicesIdx > filterIndicesIdx) {
+              j += 1
+            } else {
+              i += 1
+            }
+          }
+        }
+        Vectors.sparse(newSize, newIndices.result(), newValues.result())
+      case DenseVector(values) =>
+        val values = features.toArray
+        Vectors.dense(selectedFeatures.map(i => values(i)))
+      case other =>
--- End diff --

btw there is no reason to have this case since `Vector` is a sealed trait
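The sparse branch of the diff above is a two-pointer merge between the vector's active indices and the sorted list of selected feature indices. A minimal standalone sketch of that merge, using plain arrays in place of Spark's `SparseVector` (an assumption for illustration; the name `selectedFeatures` follows the diff and must be sorted ascending):

```scala
import scala.collection.mutable.ArrayBuilder

// Sketch of the sparse-case compression from the diff: walk the active
// indices and the selected indices in lockstep, keeping a value only when
// its index is both active and selected, re-numbered to its new position j.
def compressSparse(
    indices: Array[Int],
    values: Array[Double],
    selectedFeatures: Array[Int]): (Array[Int], Array[Double]) = {
  val newIndices = new ArrayBuilder.ofInt
  val newValues = new ArrayBuilder.ofDouble
  var i = 0 // walks the vector's active indices
  var j = 0 // walks the selected feature indices
  while (i < indices.length && j < selectedFeatures.length) {
    if (indices(i) == selectedFeatures(j)) {
      // Feature is active and selected: keep it at its new position j.
      newIndices += j
      newValues += values(i)
      i += 1
      j += 1
    } else if (indices(i) > selectedFeatures(j)) {
      j += 1 // selected feature is an implicit zero in the sparse vector
    } else {
      i += 1 // active feature was not selected; drop it
    }
  }
  (newIndices.result(), newValues.result())
}
```

For example, selecting features 2 and 5 from a sparse vector with active entries at 0, 2, and 5 keeps the last two values under new indices 0 and 1.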
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/15831

[SPARK-18385][ML] Make the transformers natively in ml framework to avoid extra conversion

## What changes were proposed in this pull request?
Transformers added natively in the ml framework to avoid extra conversion for:
- ChiSqSelector
- IDF
- StandardScaler
- PCA

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/techaddict/spark ml-transformer

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15831.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15831

commit da3626168ce264719517a8d34afdc500991fb700
Author: Sandeep Singh
Date: 2016-11-09T14:53:14Z
ChiSqSelector: make the transformer natively in ml framework to avoid extra conversion

commit 733394fb3d7f4ea6891a4f6b0e41a03c9a1abc38
Author: Sandeep Singh
Date: 2016-11-09T15:40:24Z
add transformer for IDF

commit da437316879a6e2cb9df9549e28ea9b1b95b63d5
Author: Sandeep Singh
Date: 2016-11-09T15:55:22Z
add StandardScaler transform

commit a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b
Author: Sandeep Singh
Date: 2016-11-09T16:03:01Z
add PCA transform
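For the dense case, the selection in this PR reduces to picking values at the selected positions. A minimal sketch with plain arrays standing in for Spark's `DenseVector` (an assumption for illustration):

```scala
// Dense-case feature selection as in the ChiSqSelector diff: keep only the
// entries at the selected (sorted) positions. Plain arrays stand in for
// org.apache.spark.ml.linalg.DenseVector here.
def compressDense(values: Array[Double], selectedFeatures: Array[Int]): Array[Double] =
  selectedFeatures.map(i => values(i))
```

Since every selected index is materialized, the result is naturally dense, which is why the diff builds the dense branch with `Vectors.dense`.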