[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15831


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...

2016-11-17 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/15831#discussion_r88530411
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -243,6 +244,42 @@ final class ChiSqSelectorModel private[ml] (
     StructType(outputFields)
   }
 
+  private def compress(features: Vector): Vector = {
+    features match {
+      case SparseVector(_, indices, values) =>
+        val newSize = selectedFeatures.length
+        val newValues = new ArrayBuilder.ofDouble
+        val newIndices = new ArrayBuilder.ofInt
+        var i = 0
+        var j = 0
+        var indicesIdx = 0
+        var filterIndicesIdx = 0
+        while (i < indices.length && j < newSize) {
+          indicesIdx = indices(i)
+          filterIndicesIdx = selectedFeatures(j)
+          if (indicesIdx == filterIndicesIdx) {
+            newIndices += j
+            newValues += values(i)
+            j += 1
+            i += 1
+          } else {
+            if (indicesIdx > filterIndicesIdx) {
+              j += 1
+            } else {
+              i += 1
+            }
+          }
+        }
+        Vectors.sparse(newSize, newIndices.result(), newValues.result())
+      case DenseVector(values) =>
+        val values = features.toArray
+        Vectors.dense(selectedFeatures.map(i => values(i)))
+      case other =>
--- End diff --

btw, there is no reason to have this `case other` branch: `Vector` is a sealed trait, so the `SparseVector` and `DenseVector` cases already cover every input.
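
To illustrate the point, here is a minimal, self-contained sketch of the same two-pointer filtering with the fallback case dropped. It is not the code from the pull request: `Vec`, `Sparse`, `Dense`, and `CompressSketch` are hypothetical stand-ins for Spark's vector types so that the example runs without Spark, and `selected` is assumed to be sorted ascending and within range.

```scala
// Hypothetical stand-ins for Spark's vector types (assumption: not the real API).
sealed trait Vec { def size: Int }
final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec
final case class Dense(values: Array[Double]) extends Vec { def size: Int = values.length }

object CompressSketch {
  /** Keeps only the features listed in `selected` (assumed sorted ascending). */
  def compress(features: Vec, selected: Array[Int]): Vec = features match {
    case Sparse(_, indices, values) =>
      val newIndices = Array.newBuilder[Int]
      val newValues = Array.newBuilder[Double]
      var i = 0 // position in the vector's stored indices
      var j = 0 // position in the selected-feature list
      while (i < indices.length && j < selected.length) {
        if (indices(i) == selected(j)) {
          // The stored entry is a selected feature: its new index is j.
          newIndices += j
          newValues += values(i)
          i += 1
          j += 1
        } else if (indices(i) > selected(j)) {
          j += 1 // selected feature has no stored entry (implicit zero)
        } else {
          i += 1 // stored entry is not selected
        }
      }
      Sparse(selected.length, newIndices.result(), newValues.result())
    case Dense(values) =>
      Dense(selected.map(k => values(k)))
    // No `case other`: Vec is sealed, so the compiler can check that
    // these two cases are exhaustive.
  }
}
```

For example, compressing `Sparse(6, Array(1, 3, 4), Array(10.0, 30.0, 40.0))` with `selected = Array(0, 3, 4)` keeps the entries at original indices 3 and 4 and renumbers them to 1 and 2 in a vector of size 3.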





[GitHub] spark pull request #15831: [SPARK-18385][ML] Make the transformer's natively...

2016-11-09 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/15831

[SPARK-18385][ML] Make the transformer's natively in ml framework to avoid extra conversion

## What changes were proposed in this pull request?
Transformers are implemented natively in the ml framework, avoiding the extra conversion, for:
ChiSqSelector
IDF
StandardScaler
PCA

## How was this patch tested?
Existing Tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark ml-transformer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15831.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15831


commit da3626168ce264719517a8d34afdc500991fb700
Author: Sandeep Singh 
Date:   2016-11-09T14:53:14Z

ChiSqSelector: make the transformer natively in ml framework to avoid extra conversion

commit 733394fb3d7f4ea6891a4f6b0e41a03c9a1abc38
Author: Sandeep Singh 
Date:   2016-11-09T15:40:24Z

add transformer for IDF

commit da437316879a6e2cb9df9549e28ea9b1b95b63d5
Author: Sandeep Singh 
Date:   2016-11-09T15:55:22Z

add StandardScaler transform

commit a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b
Author: Sandeep Singh 
Date:   2016-11-09T16:03:01Z

add PCA transform



