[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-02 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick @srowen @imatiach-msft Thank you for your comments and for helping making this improvement robust. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-02 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 Merged to master. Thanks @datumbox, and everyone for reviews. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73718/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73718/ Test PASSed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73718/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73706/ Test PASSed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73706/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73706/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick Cool did not know that. Just updated commit messages. I think I'm good with this PR from my side. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox once a PR is ok'ed to test from a committer, it should automatically retest on each commit. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73697/ Test PASSed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73697/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73697/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen @MLnick I had a typo on the committed version... Could you please retest it? I don't think I have the rights to do it myself. --- If your project is set up for it, you can reply to this

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73693/ Test FAILed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73693/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73693/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick I accepted all proposed changes. Please retest. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft thanks for reviewing and requesting the test case, I was going to ask for one :) --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox left a few minor comments to clean up. Overall looks good. I think it is fine for 2.2. For the regParam even though it is a minor thing it would be best to just create a new JIRA

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 The discussion around numeric types was at https://github.com/apache/spark/pull/12762#r61539327. The checked cast was introduced because it was previously implicitly cast to `Int` anyway, and would

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73620/ Test PASSed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73620/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 I updated the exception message and I added Unit-tests to cover the checkedCast method as @imatiach-msft suggested. I have not corrected the value of regParam but I had a look and it is safe to do

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox ah ok, thanks for the link. LGTM! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft This is already used in other parts of MLlib. Have a look here:

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox thanks for adding the tests, I've approved the review, assuming the other reviewers are ok with minimal double support. I haven't seen withClue used in the same way you've used it,

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73620/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox I added an additional comment -- since you believe we should keep the current code, we should add a test case and slightly modify the error message to let the user know what other

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft Thanks mate. Show it and replied. I really don't mind it either way to be honest. @MLnick @srowen I am not sure if this PR will make it to Spark 2.2 due to the upcoming

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox I've added some additional comments with regards to the fractional part, please take a look, thanks! --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73585/ Test PASSed. ---

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73585/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73585/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 Ok, let me take a look at this. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick Yeap! I have run into problems with the original implementation when dealing with several billions records. This is when removing casting starts paying off. :) --- If your project is set

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox you mention there is GC & performance overhead which makes some sense. Have you run into problems with very large scale (like millions users & items & ratings)? I did regression tests

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft This comparison is intentional and checks 2 things: That the number is within integer range and that the Id does not have any non-zero digits after the decimal point. If the number

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17059 @datumbox I like the changes, I just had a minor concern about the code where we call v.intValue and then compare this to v.doubleValue -- due to precision issues, I'm not sure if this is

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen @mlnick I updated the PR based on what was discussed above and I tested it again on Spark 2.1. I also updated the coding styles and the exception message. The changes requested

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Yeah, Scala Long matches. Here is the "stand-alone" script that I used to confirm that everything works ok (tested on Spark 2.1): ```scala import org.apache.spark.sql.types._ import

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17059 This one would be good for @mlnick to look at. The Scala Long will match java.lang.Number right? if so it works even if there's another conversion there, but maybe that one is trivial, and

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen The following snippet handles explicitly Longs. It can be rewritten to remove duplicate code by introducing bools for overflow detection but I don't think it is worth it. In theory you can

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Ignore my comment about duplicate code. It can be written to avoid it. I will investigate handling the SQL decimal types as you recommended and I will update the code tonight. --- If your

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen: Thanks for the comments. We are getting there. :) I will handle the Long case as you suggest. If you think people use SQL decimal types, I can include them at the end of

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17059 That's compelling regarding performance. It's not big but not trivial. My remaining concern is whether you're handling all the cases the original did. `Number` covers a lot but does it include

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-26 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 I decided to provide a few more bencharks in order to alleviate some of the concerns raised by @srowen. To reproduce the results add the following snippet in the ALSSuite class:

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen Yes, the behaviour of the method remains the same. This patch helped me get a measurable improvement on GC overhead in Spark 2.0, so I though that it would be beneficial for others. Anyway

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 jenkins retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17059 You are effectively handling the cases that casting handles. Doesn't a Scala long get boxed another time now? I bet it works, just wondering why that's not handled like Int. I'm neutral on this and

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Could you explain what you mean by "duplicating"? It is safe with Scala Long; I did lots of tests to ensure that it works well. If you require any changes, I'm happy to update the PR. --- If

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17059 I get it, though the drawback is mostly that you've duplicated in a way the machinery that would construe any number as something comparable to the min/max int value. (Does this not miss the case of

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Concerning the failed scala style test, this is caused by the long in-line comment that I added to explain the if statement. If you decide to approve the PR, I can remove it. Cheers! :) --- If

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen I believe that this needs to be fixed for 2 reasons: 1. Casting the ids to double just to convert it back to integer is not an elegant solution and it is rather confusing. 2. The

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #3582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3582/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #3582 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3582/testReport)** for PR 17059 at commit

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17059 I get it, but is that cast actually slowing things down measurably? because this change also adds some overhead in the checks it does. I think it's better to let the cast to double deal with the