GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/9361

    [SPARK-10158] [PySpark] [MLlib] ALS better error message when using Long IDs

    Added a catch for the exception raised when casting Long to Int while
    PySpark ALS Ratings are serialized. It is easy to accidentally use Long
    IDs for user/product; previously this failed with a somewhat cryptic
    "ClassCastException: java.lang.Long cannot be cast to java.lang.Integer."
    Now a more descriptive error is shown, e.g. "PickleException: Ratings id
    1205640308657491975 exceeds max integer value of 2147483647."
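
    For illustration only, the range check described above could be
    approximated on the Python side before ratings are handed to ALS. This
    is a minimal sketch, not the code in this pull request: the helper
    names check_rating_id and validate_ratings are hypothetical, ValueError
    stands in for the PickleException raised on the JVM side, and ratings
    are assumed to be plain (user, product, rating) tuples.

```python
# Bounds of a 32-bit signed integer, matching java.lang.Integer.
INT_MAX = 2**31 - 1  # 2147483647
INT_MIN = -(2**31)   # -2147483648


def check_rating_id(value):
    """Hypothetical helper: return value as int if it fits in 32 bits."""
    if value > INT_MAX:
        raise ValueError(
            "Ratings id %d exceeds max integer value of %d" % (value, INT_MAX)
        )
    if value < INT_MIN:
        raise ValueError(
            "Ratings id %d is below min integer value of %d" % (value, INT_MIN)
        )
    return int(value)


def validate_ratings(ratings):
    """Hypothetical helper: check every (user, product, rating) tuple."""
    return [
        (check_rating_id(user), check_rating_id(product), float(rating))
        for user, product, rating in ratings
    ]
```

    Calling validate_ratings on the input before building the ratings RDD
    would surface an out-of-range id on the driver, with a message similar
    to the one quoted above, instead of failing later during serialization.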

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark als-pyspark-long-id-error-SPARK-10158

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9361
    
----
commit b6b756adc04d996b66ac2555b311ed77bf6e7ec8
Author: Bryan Cutler <[email protected]>
Date:   2015-10-29T17:25:54Z

    [SPARK-10158] Added check for PySpark Ratings with ids of Long values

commit fbea910bc6305639fe12595b3deddd4ecb5fa7dc
Author: Bryan Cutler <[email protected]>
Date:   2015-10-29T17:46:37Z

    [SPARK-10158] Added PySpark ALS test case for using Ratings ids with Long values

commit bc50d1020c53290cfa59da72c7448f976c56c1cd
Author: Bryan Cutler <[email protected]>
Date:   2015-10-29T18:40:56Z

    [SPARK-10158] Improved test case to just use Pickler, no need to invoke train

commit 45da6c852c81495458dc54fd412e279885eb06f6
Author: Bryan Cutler <[email protected]>
Date:   2015-10-29T20:35:05Z

    Changed wording of exception message to include 'integer'

commit 51f2479f477f3abbd5f808c55d62ae9d4ebbb15c
Author: Bryan Cutler <[email protected]>
Date:   2015-10-29T20:35:30Z

    Added positive test case for ALS Ratings serialize

----


