[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-02 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick @srowen @imatiach-msft Thank you for your comments and for helping making this improvement robust. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17121: [SPARK-19787][ML] Changing the default parameter of regP...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17121 @MLnick Sounds good to me thanks. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick Cool did not know that. Just updated commit messages. I think I'm good with this PR from my side. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen @MLnick I had a typo on the committed version... Could you please retest it? I don't think I have the rights to do it myself. --- If your project is set up for it, you can reply

[GitHub] spark pull request #17121: [SPARK-19787][ML] Changing the default parameter ...

2017-03-01 Thread datumbox
GitHub user datumbox opened a pull request: https://github.com/apache/spark/pull/17121 [SPARK-19787][ML] Changing the default parameter of regParam. ## What changes were proposed in this pull request? In the ALS method the default values of regParam do not match within

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-03-01 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick I accepted all proposed changes. Please retest. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 I updated the exception message and I added Unit-tests to cover the checkedCast method as @imatiach-msft suggested. I have not corrected the value of regParam but I had a look and it is safe to do

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft This is already used in other parts of MLlib. Have a look here: https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification

[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-28 Thread datumbox
Github user datumbox commented on a diff in the pull request: https://github.com/apache/spark/pull/17059#discussion_r103557982 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft Thanks mate. Show it and replied. I really don't mind it either way to be honest. @MLnick @srowen I am not sure if this PR will make it to Spark 2.2 due to the upcoming

[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-28 Thread datumbox
Github user datumbox commented on a diff in the pull request: https://github.com/apache/spark/pull/17059#discussion_r103550608 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends

[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-28 Thread datumbox
Github user datumbox commented on a diff in the pull request: https://github.com/apache/spark/pull/17059#discussion_r103427764 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-28 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @MLnick Yeap! I have run into problems with the original implementation when dealing with several billions records. This is when removing casting starts paying off. :) --- If your project is set

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @imatiach-msft This comparison is intentional and checks 2 things: That the number is within integer range and that the Id does not have any non-zero digits after the decimal point. If the number

[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-27 Thread datumbox
Github user datumbox commented on a diff in the pull request: https://github.com/apache/spark/pull/17059#discussion_r103392500 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen @mlnick I updated the PR based on what was discussed above and I tested it again on Spark 2.1. I also updated the coding styles and the exception message. The changes requested

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Yeah, Scala Long matches. Here is the "stand-alone" script that I used to confirm that everything works ok (tested on Spark 2.1): ```scala import org.apache.spark.sql.types._

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen The following snippet handles explicitly Longs. It can be rewritten to remove duplicate code by introducing bools for overflow detection but I don't think it is worth it. In theory you can

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Ignore my comment about duplicate code. It can be written to avoid it. I will investigate handling the SQL decimal types as you recommended and I will update the code tonight. --- If your

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen: Thanks for the comments. We are getting there. :) I will handle the Long case as you suggest. If you think people use SQL decimal types, I can include them at the end

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-26 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 I decided to provide a few more bencharks in order to alleviate some of the concerns raised by @srowen. To reproduce the results add the following snippet in the ALSSuite class

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen Yes, the behaviour of the method remains the same. This patch helped me get a measurable improvement on GC overhead in Spark 2.0, so I though that it would be beneficial for others. Anyway

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 jenkins retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Could you explain what you mean by "duplicating"? It is safe with Scala Long; I did lots of tests to ensure that it works well. If you require any changes, I'm happy to upd

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 Concerning the failed scala style test, this is caused by the long in-line comment that I added to explain the if statement. If you decide to approve the PR, I can remove it. Cheers

[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-24 Thread datumbox
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen I believe that this needs to be fixed for 2 reasons: 1. Casting the ids to double just to convert it back to integer is not an elegant solution and it is rather confusing. 2

[GitHub] spark pull request #17059: Removed unnecessary castings and refactored check...

2017-02-24 Thread datumbox
Github user datumbox commented on a diff in the pull request: https://github.com/apache/spark/pull/17059#discussion_r103055703 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends

[GitHub] spark pull request #17059: Removed unnecessary castings and refactored check...

2017-02-24 Thread datumbox
GitHub user datumbox opened a pull request: https://github.com/apache/spark/pull/17059 Removed unnecessary castings and refactored checked casts in ALS. ## What changes were proposed in this pull request? The original ALS was performing unnecessary casting to the user