Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/3740#discussion_r22092697
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1174,6 +1174,14 @@ abstract class RDD[T: ClassTag](
     * Save this RDD as a text file, using string representations of elements.
     */
      def saveAsTextFile(path: String) {
    +   // https://issues.apache.org/jira/browse/SPARK-2075
    +   // NullWritable is a Comparable rather than Comparable[NullWritable] in Hadoop 1.+,
    +   // so the compiler cannot find an implicit Ordering for it. It will generate different
    +   // anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we
    +   // provide an Ordering for NullWritable so that the compiler will generate same codes.
    +   implicit val nullWritableOrdering = new Ordering[NullWritable] {
    +     override def compare(x: NullWritable, y: NullWritable): Int = 0
    +   }
        this.map(x => (NullWritable.get(), new Text(x.toString)))
          .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
--- End diff --
Is the problem here that, when compiling against Hadoop 2, the compiler chooses
to supply the Ordering to the implicit rddToPairRDDFunctions, while against Hadoop
1 it instead falls back to the default argument (`null`) when invoking the implicit?
I wonder if a more explicit solution, like introducing a conversion to
PairRDDFunctions that takes an Ordering, is warranted for these cases, e.g.:
```scala
this.map(x => (NullWritable.get(), new Text(x.toString)))
.toPairRDD(nullWritableOrdering)
.saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
```
This would make it less magical why merely defining an implicit Ordering
changes the generated bytecode.
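The compiler-side mechanism under discussion can be sketched without Spark at all. A minimal, self-contained illustration (the names `toPairFunctions` and `Opaque` are hypothetical, standing in for `rddToPairRDDFunctions` and `NullWritable`): a method whose implicit Ordering parameter defaults to `null` gets compiled differently depending on whether an Ordering is in implicit scope at the call site.

```scala
object ImplicitOrderingDemo {
  // A type with no Ordering in implicit scope, playing the role of
  // NullWritable under Hadoop 1.+.
  class Opaque

  // Mirrors the shape of rddToPairRDDFunctions: an implicit Ordering
  // parameter with a default of null.
  def toPairFunctions[K, V](data: Seq[(K, V)])(implicit ord: Ordering[K] = null): Boolean =
    ord != null // true iff the compiler found and passed an implicit Ordering

  // No Ordering[Opaque] in scope: the compiler emits a call that uses
  // the default argument (null).
  def withoutOrdering: Boolean =
    toPairFunctions(Seq(new Opaque -> "a"))

  // With a local implicit Ordering[Opaque] in scope, the compiler emits a
  // call that passes it explicitly -- a different call site in bytecode.
  def withOrdering: Boolean = {
    implicit val opaqueOrdering: Ordering[Opaque] = new Ordering[Opaque] {
      override def compare(x: Opaque, y: Opaque): Int = 0
    }
    toPairFunctions(Seq(new Opaque -> "a"))
  }

  def main(args: Array[String]): Unit = {
    println(withoutOrdering) // false: default null was used
    println(withOrdering)    // true: the implicit was resolved and passed
  }
}
```

An explicit `toPairRDD(ordering)`-style conversion would make this choice visible at the call site instead of hinging on what happens to be in implicit scope.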
---