Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@MLnick @srowen @imatiach-msft Thank you for your comments and for helping make this improvement robust.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17121
@MLnick Sounds good to me thanks. :)
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@MLnick Cool, did not know that. Just updated the commit messages. I think I'm good with this PR from my side.
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen @MLnick I had a typo on the committed version... Could you please
retest it? I don't think I have the rights to do it myself.
GitHub user datumbox opened a pull request:
https://github.com/apache/spark/pull/17121
[SPARK-19787][ML] Changing the default parameter of regParam.
## What changes were proposed in this pull request?
In the ALS method the default values of regParam do not match within
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@MLnick I accepted all proposed changes. Please retest. :)
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
I updated the exception message and added unit tests to cover the checkedCast method, as @imatiach-msft suggested. I have not corrected the value of regParam but I had a look and it is safe to do
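As a rough illustration only (not the actual ALSSuite code), tests for a checkedCast-style helper could look like the sketch below; the helper name, message, and test values are all hypothetical:

```scala
// Hypothetical sketch: a checkedCast-style helper plus the kind of
// assertions such unit tests cover. Not the actual Spark ALS code.
def checkedCast(n: Double): Int = {
  // n == n.toInt holds only for integral values within the Int range
  require(n == n.toInt, s"ALS only supports integral ids within the Int range, got $n")
  n.toInt
}

// Accepted: integral values in range.
assert(checkedCast(30.0) == 30)
// Rejected: fractional ids and values beyond Int.MaxValue.
assert(scala.util.Try(checkedCast(30.5)).isFailure)
assert(scala.util.Try(checkedCast(Int.MaxValue.toDouble + 1)).isFailure)
```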
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@imatiach-msft This is already used in other parts of MLlib. Have a look
here:
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification
Github user datumbox commented on a diff in the pull request:
https://github.com/apache/spark/pull/17059#discussion_r103557982
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@imatiach-msft Thanks mate. Saw it and replied. I really don't mind it either way, to be honest.
@MLnick @srowen I am not sure if this PR will make it to Spark 2.2 due to the upcoming
Github user datumbox commented on a diff in the pull request:
https://github.com/apache/spark/pull/17059#discussion_r103550608
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends
Github user datumbox commented on a diff in the pull request:
https://github.com/apache/spark/pull/17059#discussion_r103427764
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@MLnick Yeap! I have run into problems with the original implementation when dealing with several billion records. This is where removing the casting starts paying off. :)
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@imatiach-msft This comparison is intentional and checks two things: that the number is within integer range and that the id does not have any non-zero digits after the decimal point. If the number
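A minimal standalone sketch of that comparison, assuming it is essentially `n == n.toInt` on a Double id (the function name here is illustrative, not from the PR):

```scala
// For a Double id n, n == n.toInt is true only when n is inside the Int
// range AND has no fractional part: toInt truncates the fraction and
// saturates out-of-range values, so either violation breaks the equality.
def fitsInInt(n: Double): Boolean = n == n.toInt

// fitsInInt(42.0)  // true: integral and in range
// fitsInInt(42.5)  // false: non-zero digits after the decimal point
// fitsInInt(3e9)   // false: outside Int range (toInt saturates)
```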
Github user datumbox commented on a diff in the pull request:
https://github.com/apache/spark/pull/17059#discussion_r103392500
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen @MLnick I updated the PR based on what was discussed above and tested it again on Spark 2.1. I also updated the coding style and the exception message.
The changes requested
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
Yeah, Scala Long matches. Here is the "stand-alone" script that I used to
confirm that everything works ok (tested on Spark 2.1):
```scala
import org.apache.spark.sql.types._
```
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen The following snippet explicitly handles Longs. It could be rewritten to remove the duplicate code by introducing booleans for overflow detection, but I don't think it is worth it. In theory you can
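A hedged sketch of what explicit Long handling could look like, with a plain range check standing in for the overflow-detection booleans mentioned above (all names illustrative):

```scala
// Illustrative only: a Long id is representable exactly as an Int iff it
// lies in [Int.MinValue, Int.MaxValue], so no detour through Double is needed.
def checkedCastLong(n: Long): Int = {
  require(n >= Int.MinValue && n <= Int.MaxValue,
    s"ALS only supports ids within the Int range, got $n")
  n.toInt
}
```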
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
Ignore my comment about duplicate code. It can be written to avoid it. I
will investigate handling the SQL decimal types as you recommended and I will
update the code tonight.
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen: Thanks for the comments. We are getting there. :)
I will handle the Long case as you suggest.
If you think people use SQL decimal types, I can include them at the end
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
I decided to provide a few more benchmarks in order to alleviate some of the concerns raised by @srowen.
To reproduce the results, add the following snippet to the ALSSuite class
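The benchmark snippet itself is truncated in this archive. Purely as a hypothetical stand-in (not the actual ALSSuite code), a timing helper of the sort often used for such micro-benchmarks:

```scala
// Hypothetical timing helper, not the benchmark from the PR: runs a body
// once and reports wall-clock time in milliseconds.
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

// e.g. time("sum")((1 to 1000000).map(_.toLong).sum)
```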
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen Yes, the behaviour of the method remains the same. This patch helped me get a measurable improvement in GC overhead on Spark 2.0, so I thought it would be beneficial for others. Anyway
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
jenkins retest this please
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
Could you explain what you mean by "duplicating"? It is safe with Scala
Long; I did lots of tests to ensure that it works well. If you require any
changes, I'm happy to upd
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
Concerning the failed Scala style test, it is caused by the long in-line comment that I added to explain the if statement. If you decide to approve the PR, I can remove it. Cheers
Github user datumbox commented on the issue:
https://github.com/apache/spark/pull/17059
@srowen I believe that this needs to be fixed for 2 reasons:
1. Casting the ids to double just to convert them back to integers is not an elegant solution and it is rather confusing.
2
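A direct, per-type cast along the lines this PR argues for might be sketched as follows (hypothetical names, not the actual ALS checkedCast implementation):

```scala
// Illustrative sketch: dispatch on the runtime type of the id instead of
// routing everything through Double first.
def toIntId(value: Any): Int = value match {
  case i: Int => i
  case l: Long =>
    require(l >= Int.MinValue && l <= Int.MaxValue, s"id $l is out of Int range")
    l.toInt
  case d: Double =>
    require(d == d.toInt, s"id $d is not an integral value within Int range")
    d.toInt
  case other =>
    throw new IllegalArgumentException(s"unsupported id type: ${other.getClass}")
}
```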
Github user datumbox commented on a diff in the pull request:
https://github.com/apache/spark/pull/17059#discussion_r103055703
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends
GitHub user datumbox opened a pull request:
https://github.com/apache/spark/pull/17059
Removed unnecessary castings and refactored checked casts in ALS.
## What changes were proposed in this pull request?
The original ALS was performing unnecessary casting to the user