[GitHub] spark pull request #19226: [SPARK-21985][PySpark] PairDeserializer is broken...

aray Wed, 13 Sep 2017 18:33:41 -0700

GitHub user aray opened a pull request:

    https://github.com/apache/spark/pull/19226


    [SPARK-21985][PySpark] PairDeserializer is broken for double-zipped RDDs

    ## What changes were proposed in this pull request?
    
    This removes the mostly unnecessary check that each individual batch from 
the key and value serializers is the same size. We already enforce that the 
batch sizes are the same in rdd.zip (see: 
https://github.com/apache/spark/blob/c06f3f5ac500b02d38ca7ec5fcb33085e07f2f75/python/pyspark/rdd.py#L2118
 ), which is the only place it is used in a nontrivial manner. This adds a 
comment to the PairDeserializer documentation about this requirement.
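
    The idea can be illustrated with a minimal pure-Python sketch. It is not 
the PySpark implementation: `pair_batches` and `flatten` are hypothetical 
helpers, and the sketch assumes both streams were already batched with the 
same batch size. It pairs items batch by batch without comparing batch 
lengths, so the output of one pairing step (lazy zip objects, which do not 
support len() in Python 3) can be fed into a second pairing step.

```python
from itertools import chain


def pair_batches(key_batches, val_batches):
    """Pair items from two streams of batches, batch by batch.

    Hypothetical stand-in for a pair deserializer. It assumes the caller
    already guarantees equal batch sizes, so no per-batch length check
    is made here.
    """
    for key_batch, val_batch in zip(key_batches, val_batches):
        # Each yielded batch is a lazy zip object with no len().
        yield zip(key_batch, val_batch)


def flatten(batches):
    """Concatenate all batches into a single list of items."""
    return list(chain.from_iterable(batches))


keys = [[1, 2], [3]]
vals = [["a", "b"], ["c"]]

# First pairing: every batch it yields is a zip object.
first = pair_batches(keys, vals)

# Second pairing consumes those zip-object batches as its key batches.
second = pair_batches(first, [["x", "y"], ["z"]])

result = flatten(second)
print(result)
```

    Under these assumptions the second pairing works because nothing calls 
len() on a batch; the equal-size requirement stays with whoever produces 
the two streams.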
    
    ## How was this patch tested?
    
    Additional unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aray/spark SPARK-21985

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19226
    
----
commit 4a9eb935b8438a159c9f12239135eedd59b25fd3
Author: Andrew Ray <[email protected]>
Date:   2017-09-14T01:26:15Z

    remove check and add tests

----


