GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/3740
Add an Ordering for NullWritable to make the compiler generate same byte
codes for RDD
`NullWritable` is a `Comparable` rather than `Comparable[NullWritable]` in
Hadoop 1.+, so the compiler cannot find an implicit Ordering for it. As a
result, it generates different anonymous classes for `saveAsTextFile` when
built against Hadoop 1.+ and Hadoop 2.+. This change therefore provides an
Ordering for `NullWritable` so that the compiler generates the same bytecode
in both cases.
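As a rough illustration of what such an Ordering can look like, here is a minimal, self-contained sketch. `NullWritableStandIn` and `NullWritableOrderingSketch` are hypothetical names used only so the example compiles without Hadoop on the classpath; they are not the code in this patch, and the stand-in does not reproduce the raw `Comparable` typing of the real Hadoop 1.+ class.

```scala
// Hypothetical stand-in for org.apache.hadoop.io.NullWritable: a singleton
// with no data. It is NOT the real Hadoop class.
final class NullWritableStandIn private ()

object NullWritableStandIn {
  private val instance = new NullWritableStandIn
  def get(): NullWritableStandIn = instance
}

object NullWritableOrderingSketch {
  // All instances are considered equal, so compare always returns 0.
  // Declaring the Ordering explicitly means the compiler resolves the same
  // implicit regardless of how the underlying type declares Comparable.
  implicit val standInOrdering: Ordering[NullWritableStandIn] =
    new Ordering[NullWritableStandIn] {
      override def compare(x: NullWritableStandIn, y: NullWritableStandIn): Int = 0
    }

  def main(args: Array[String]): Unit = {
    val ord = implicitly[Ordering[NullWritableStandIn]]
    println(ord.compare(NullWritableStandIn.get(), NullWritableStandIn.get())) // prints 0
  }
}
```

With the real `NullWritable`, the same shape would presumably apply, with the implicit placed where `saveAsTextFile` can see it.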
I used the following commands to confirm that the generated bytecode is the same.
```
mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
diff ~/hadoop1.txt ~/hadoop2.txt
```
However, different bytecode is unavoidable for classes that call methods of
`org.apache.hadoop.mapreduce.TaskAttemptContext`. `TaskAttemptContext` is a
class in Hadoop 1.+, so calls to its methods compile to `invokevirtual`,
whereas it is an interface in Hadoop 2.+, so the same calls compile to
`invokeinterface`.
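The class-versus-interface difference can be reproduced in isolation with a small sketch. `ClassStyleContext`, `InterfaceStyleContext`, `taskLabel` and `CallSiteSketch` are hypothetical names invented for this example; they only model the two shapes of `TaskAttemptContext` and are not Hadoop APIs.

```scala
// Hypothetical stand-ins; neither is the real Hadoop TaskAttemptContext.

// Models the Hadoop 1.+ shape: the context type is a class.
abstract class ClassStyleContext {
  def taskLabel(): String
}

// Models the Hadoop 2.+ shape: the context type is an interface
// (a Scala trait with only abstract members compiles to a JVM interface).
trait InterfaceStyleContext {
  def taskLabel(): String
}

object CallSiteSketch {
  // The receiver's static type is a class, so this call site is emitted
  // as invokevirtual.
  def viaClass(ctx: ClassStyleContext): String = ctx.taskLabel()

  // The receiver's static type is an interface, so this call site is
  // emitted as invokeinterface.
  def viaInterface(ctx: InterfaceStyleContext): String = ctx.taskLabel()

  def main(args: Array[String]): Unit = {
    val a = new ClassStyleContext { def taskLabel(): String = "attempt" }
    val b = new InterfaceStyleContext { def taskLabel(): String = "attempt" }
    // Same source-level call and same result, different call instruction.
    println(viaClass(a) == viaInterface(b)) // prints true
  }
}
```

Disassembling the compiled `CallSiteSketch$` class with `javap -c` should show the two different call instructions even though the method bodies are identical in source.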
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark SPARK-2075
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3740.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3740
----
commit fa40db011e9dd9e67482d5a659103196d5a6c8a6
Author: zsxwing <[email protected]>
Date: 2014-12-19T02:27:42Z
Add an Ordering for NullWritable to make the compiler generate same byte
codes for RDD
----