GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/3740

    Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD

    In Hadoop 1.+, `NullWritable` implements the raw `Comparable` rather than `Comparable[NullWritable]`, so the compiler cannot find an implicit `Ordering` for it. As a result, the compiler generates different anonymous classes for `saveAsTextFile` when building against Hadoop 1.+ and Hadoop 2.+. This patch therefore provides an `Ordering` for `NullWritable` so that the compiler generates the same bytecode for both.
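
    The compiler behaviour described above can be sketched in plain Scala, without Hadoop on the classpath. This is a minimal illustration, not the code in this patch. It assumes that, as we read Spark's `rddToPairRDDFunctions`, the implicit conversion used by `saveAsTextFile` takes an `Ordering[K]` parameter that defaults to `null` when no implicit instance is found. `LegacyKey` (standing in for Hadoop 1.+ `NullWritable`), `ModernKey` (standing in for Hadoop 2.+ `NullWritable`), `orderingSource` and `withExplicitOrdering` are all hypothetical names.

    ```scala
    // Hypothetical stand-in for Hadoop 1.+ NullWritable: there is no view to
    // Comparable[LegacyKey], so no implicit Ordering[LegacyKey] can be derived.
    final class LegacyKey

    // Hypothetical stand-in for Hadoop 2.+ NullWritable, which is Comparable of
    // itself, so the standard library derives an implicit Ordering for it.
    final class ModernKey extends java.lang.Comparable[ModernKey] {
      override def compareTo(other: ModernKey): Int = 0
    }

    // Hypothetical analogue of an implicit conversion whose Ordering parameter
    // falls back to null when implicit search finds nothing.
    def orderingSource[K](keys: Seq[K])(implicit ord: Ordering[K] = null): String =
      if (ord == null) "default (null)" else "implicit"

    // The idea behind the fix: put an explicit Ordering in lexical scope, so
    // implicit search resolves this instance instead of falling back to null.
    def withExplicitOrdering(): String = {
      implicit val legacyOrdering: Ordering[LegacyKey] = new Ordering[LegacyKey] {
        // Every NullWritable equals every other, so compare can always be 0.
        override def compare(x: LegacyKey, y: LegacyKey): Int = 0
      }
      orderingSource(Seq(new LegacyKey))
    }

    println(orderingSource(Seq(new LegacyKey))) // prints: default (null)
    println(orderingSource(Seq(new ModernKey))) // prints: implicit
    println(withExplicitOrdering())             // prints: implicit
    ```

    In Scala 2, identifiers in lexical scope are searched before the implicit scope of `Ordering`, so a locally defined instance is picked up in both builds, which is what keeps the generated call sites identical.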
    
    I used the following commands to confirm that the generated bytecode is the same:
    ```
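    # Build spark-core against Hadoop 1.2.1 and disassemble the RDD class
    mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
    javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
    
    # Rebuild against Hadoop 2.2.0 (YARN profile) and disassemble again
    mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
    javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
    
    # An empty diff means the disassembled bytecode is identical
    diff ~/hadoop1.txt ~/hadoop2.txt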
    ```
    
    However, the compiler will inevitably generate different bytecode for classes that call methods of `org.apache.hadoop.mapreduce.TaskAttemptContext`. `TaskAttemptContext` is a class in Hadoop 1.+, so calls to its methods compile to `invokevirtual`; in Hadoop 2.+ it is an interface, so the same calls compile to `invokeinterface`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-2075

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3740
    
----
commit fa40db011e9dd9e67482d5a659103196d5a6c8a6
Author: zsxwing <[email protected]>
Date:   2014-12-19T02:27:42Z

    Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD

----

