[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77448/
Test PASSed.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #77448 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77448/testReport)**
 for PR 15831 at commit 
[`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #77448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77448/testReport)**
 for PR 15831 at commit 
[`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-17 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@HyukjinKwon I was busy; I will restart this week.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15831
  
Hi @techaddict, how is this PR going?





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15831
  
@techaddict @sethah I have some time to work on the porting, but I can't 
find the umbrella JIRA.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah I will revive this PR, thanks 👍





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15831
  
I think we decided to go a different direction than what is proposed here? 
Actually, I still think there's merit in fixing the problem without having to 
do full feature ports. Either way, I'm not sure anyone is still taking on this 
task, so @zhengruifeng or @techaddict it would be great if you wanted to either 
revive this PR/help review, or start working on the larger umbrella JIRA and 
sub tasks...





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15831
  
The same TODO also appears in `HashingTF`. What about including it in this PR?





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@MLnick I will create an umbrella JIRA and start adding JIRAs for the things 
I'm aware of, and you can start prioritising 👍 Sounds like a plan?





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-12-01 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15831
  
I'm also generally supportive of (1) - porting the code to `ml` and having 
the `mllib` code wrap the `ml` version - this is the approach for other models 
that have been done. Of course only once *all* `mllib` code has been ported 
over fully can we ultimately deprecate `mllib`.
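
The wrapping approach described above can be illustrated with a minimal sketch. Everything here is hypothetical: `WrapSketch`, the nested `ml` and `mllib` objects, and the toy `Scaler` class are stand-ins for illustration, not Spark classes. In real Spark code the wrapper would also need to convert between the `mllib` and `ml` vector types, which this sketch sidesteps by using plain arrays.

```scala
object WrapSketch {

  // Stand-in for the `ml` package: the single native implementation lives here.
  object ml {
    class Scaler(val factor: Double) {
      def transform(values: Array[Double]): Array[Double] =
        values.map(_ * factor)
    }
  }

  // Stand-in for the `mllib` package: the legacy class keeps its public
  // signature but delegates to the `ml` implementation instead of
  // carrying its own copy of the logic.
  object mllib {
    class Scaler(factor: Double) {
      private val impl = new ml.Scaler(factor)

      def transform(values: Array[Double]): Array[Double] =
        impl.transform(values)
    }
  }
}
```

With this layout a fix made in the `ml` class reaches `mllib` callers automatically, and the legacy class can later be removed without touching the algorithm itself.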

I guess we can start doing this for some transformers like these - but 
ideally we should focus on porting stuff that's still missing in `ml` first. 

I'd prefer that we create a top-level JIRA to track all the components that 
need to be done, and link everything appropriately. We also need to decide on 
priority - we may realistically be working on it over a 1-1.5 year time frame.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-12-01 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah @yanboliang I've started migrating `IDF`. Can you review the 
WIP and tell me if I'm going in the right direction? 
https://github.com/techaddict/spark/pull/2/files
There is some code duplication where we can make the mllib code actually 
depend on the ml one.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-21 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15831
  
@techaddict @sethah I prefer option 1, since we would like to remove the 
spark.mllib package in a future release (maybe 3.0) and we wouldn't like to 
make any changes to it except bug fixes. Could you make this improvement 
separately for the relevant algorithms? Thanks.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-17 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
@sethah I agree, the 2nd approach is much more reasonable.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-17 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15831
  
I see this patch was created as a result of the PR that separated the 
ml/mllib linalg packages, to avoid some inefficiencies in conversion. However, 
it is also a partial step toward feature parity. Typically, we would port full 
algorithms all at once, instead of just porting the transformer functionality 
as is done here, but I understand that this is not just about parity. I would 
suggest one of the following:

1. Port over full feature functionality. This increases the scope, and 
therefore the algos should probably be separated out individually into PRs.
2. Keep the scope the same, but avoid copying code.

For an example of option 2, for `ChiSqSelector`, we can implement new 
static methods in the `mllib.ChiSqSelectorModel`:

scala
private[spark] def compressDense(
    selectedFeatures: Array[Int],
    values: Array[Double]): Array[Double] = {
  selectedFeatures.map(i => values(i))
}

private[spark] def compressSparse(
    compressedSize: Int,
    selectedFeatures: Array[Int],
    indices: Array[Int],
    values: Array[Double]): (Array[Int], Array[Double]) = {
  ...
}


then in the actual model classes we can just do something like:

scala
private def compress(features: Vector): Vector = {
  features match {
    case SparseVector(_, indices, values) =>
      val newSize = selectedFeatures.length
      val (newIndices, newValues) =
        ChiSqSelectorModel.compressSparse(newSize, selectedFeatures, indices, values)
      Vectors.sparse(newSize, newIndices, newValues)
    case DenseVector(values) =>
      Vectors.dense(ChiSqSelectorModel.compressDense(selectedFeatures, values))
  }
}

This approach would allow us to avoid copying a lot of code until we do 
full feature ports. What are others' opinions? I lean towards the second option 
since it keeps the scope reasonable.
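
For readers who want to experiment with option 2 outside Spark, here is a minimal, self-contained sketch of the two shared helpers operating on plain arrays. The object name `CompressSketch` is hypothetical, and the body of `compressSparse` is an assumption (the comment above elides it): a two-pointer merge that assumes both `selectedFeatures` and `indices` are sorted in ascending order. `compressedSize` is used only as a capacity hint here.

```scala
import scala.collection.mutable.ArrayBuilder

object CompressSketch {

  // Dense case: pick out the selected positions in order.
  def compressDense(
      selectedFeatures: Array[Int],
      values: Array[Double]): Array[Double] = {
    selectedFeatures.map(i => values(i))
  }

  // Sparse case: walk both sorted index arrays in lockstep and keep only
  // the entries whose original index is among the selected features.
  // The new index of a kept entry is its position j in selectedFeatures.
  def compressSparse(
      compressedSize: Int,
      selectedFeatures: Array[Int],
      indices: Array[Int],
      values: Array[Double]): (Array[Int], Array[Double]) = {
    val newIndices = new ArrayBuilder.ofInt
    val newValues = new ArrayBuilder.ofDouble
    val hint = math.min(compressedSize, indices.length)
    newIndices.sizeHint(hint)
    newValues.sizeHint(hint)
    var i = 0 // position in the sparse vector's indices
    var j = 0 // position in selectedFeatures
    while (i < indices.length && j < selectedFeatures.length) {
      if (indices(i) == selectedFeatures(j)) {
        newIndices += j
        newValues += values(i)
        i += 1
        j += 1
      } else if (indices(i) > selectedFeatures(j)) {
        j += 1
      } else {
        i += 1
      }
    }
    (newIndices.result(), newValues.result())
  }
}
```

Because the helpers take only primitive arrays, both an `ml` and an `mllib` model class could call them without converting between the two vector types, which is the point of the suggestion.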

cc @dbtsai @yanboliang 





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68411/
Test PASSed.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #68411 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68411/consoleFull)**
 for PR 15831 at commit 
[`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68410/
Test PASSed.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15831
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #68410 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68410/consoleFull)**
 for PR 15831 at commit 
[`a9483ef`](https://github.com/apache/spark/commit/a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #68411 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68411/consoleFull)**
 for PR 15831 at commit 
[`89e6858`](https://github.com/apache/spark/commit/89e6858545d5f9b064b590b3e3e5f34bcb3bfa82).





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15831
  
**[Test build #68410 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68410/consoleFull)**
 for PR 15831 at commit 
[`a9483ef`](https://github.com/apache/spark/commit/a9483ef41423f2dfdc3bfb747a3bcf99ea1db50b).





[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2016-11-09 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/15831
  
cc: @dbtsai @mengxr 

