[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80606/ Test PASSed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80606/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80606/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80602/ Test FAILed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80602/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80602/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80600/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80600/ Test FAILed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80600/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Thanks @sethah @srowen . The comment is added. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-13 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 OK maybe include some of this text in the scaladoc for it, to make it clear it is always intended to be called with the value of `numNonZeroes`. --- If your project is set up for it, you can reply

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80529/ Test PASSed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80529/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Ok, it's fairly safe since it's limited to `private[linalg]`. The confusion for me is that this method introduces all sorts of edge cases which have behavior that is not at all obvious or clear. If

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 This isn't a public method though. The dv.toSparseWithSize(2) error will never come up unless Spark causes it, and there's no contract for its behavior in that case. It's probably over-specifying

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 I think there _is_ new functionality, a new method that needs its functionality defined. One specific example, we need a test like: scala test("toSparseWithSize") { val dv

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80529/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 I did not only test this PR. Only work for PR 18904 and find this performance difference. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 @mpjlu sorry which benchmark are you referring to? PR 18904 doesn't seem to benchmark just this in isolation. I just want to be sure the gain is significant --- If your project is set up for it,

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 No, duplicate code like that is bad. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Hi @srowen; how about using our first version? though duplicate some code, but change is small. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 For PR-18904, before this change, one iteration is about 58s, after this change, one iteration is about:40s --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 In theory there's no new functionality here so nothing new to test, but more tests never hurt. This seems OK. Is there any other call site where nnz is already known? It is a

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80523/ Test FAILed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80517/ Test FAILed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80517/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Hi @sethah , the unit test is added. Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80517/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Ok, yes, I see it now. Though, the point remains but to a lesser degree. We still have a method, albeit private, that indexes the array at potentially unsafe locations. It's probably ok, but at the

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 The new method is private. Certainly the user is not intended to call it and supply nnz. This change shouldn't alter any semantics or functionality. It's just trying to avoid calculating nnz twice:

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Btw, I think the compile error is because `v.toSparse(2)` could resolve to either `v.toSparse(nnz = 2)` OR `v.toSparse.apply(2)`. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 This approach doesn't feel right to me. The goal of the change is to avoid making a pass over the values to find out if there are any explicit zeros that need to be eliminated, which is fine.

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 First suggestion is that there must be unit tests :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80487/ Test PASSed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80487/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80487/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80485/ Test FAILed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80485/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80485/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 The user can't call toSparse(nnz). It will be private. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Thanks @srowen. I will revise the code per your suggestion. when I wrote the code, I just concerned user call toSparse(size) and give a very small size. --- If your project is set up for

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 Check what? you're just saving the extra call to numNonZeroes. Change the declaration like so: ``` def toSparse: SparseVector = toSparse(numNonzeros) private[linalg] def

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Yes, I just concern if add toSparse(size) we should check the size in the code, there will be no performance gain. If we don't need to check the "size" (comparing size with numNonZero) in the code,

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 This isn't what was proposed in the JIRA? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)** for PR 18899 at commit

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80473/ Test PASSed. ---

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)** for PR 18899 at commit