GitHub user cmccabe commented on the pull request:
https://github.com/apache/spark/pull/898#issuecomment-44368192
hflush __was__ in hadoop 0.21. You can download
http://archive.apache.org/dist/hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz
and check for yourself in
common/src/java/org/apache/hadoop/fs/FSDataOutputStream.java.
I also verified that hadoop 1.0.4 does __not__ have hflush (although,
amusingly enough, it does have references to hflush in the code and
documentation... from patches that were cherry-picked from other branches,
presumably.) Instead, it has an implementation of hflush (I think?) inside the
sync function.
Looking at the "Hadoop genealogy" reveals how this could have happened:
http://2.bp.blogspot.com/-GO6HF0OAFHw/UOfNEH-4sEI/AAAAAAAAAD0/dEWFFYTRgYw/s1600/output-file.png
It looks like the hadoop 0.20 line diverged from the hadoop 0.21 line. The
1.0.4 release somehow came out of the 0.20 line, while the 0.21 line mutated
into hadoop 2.x at some point. This was all before my time... even CDH3, the
oldest version of Hadoop I ever worked on, had hflush.
Sounds like we're back to reflection tricks, then.
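
For illustration, here is a minimal sketch of what such a reflection trick could look like, assuming the goal is to call hflush when the stream class has it and fall back to sync otherwise. The names FlushCompat, NewStyleStream and OldStyleStream are hypothetical stand-ins so the example runs without Hadoop on the classpath; in real code the object passed in would presumably be an FSDataOutputStream.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of Spark or Hadoop.
class FlushCompat {

    // Calls hflush() if the stream's class has it, otherwise sync().
    // Returns the name of the method that was actually invoked.
    static String flushToDisk(Object stream) {
        try {
            Method m;
            try {
                // Present on the 0.21 / 2.x line, per the discussion above.
                m = stream.getClass().getMethod("hflush");
            } catch (NoSuchMethodException e) {
                // Fall back to sync() for the 0.20 / 1.x line.
                m = stream.getClass().getMethod("sync");
            }
            m.invoke(stream);
            return m.getName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "neither hflush() nor sync() could be called", e);
        }
    }

    // Stand-in for a stream class that exposes hflush().
    public static class NewStyleStream {
        public boolean flushed = false;
        public void hflush() { flushed = true; }
    }

    // Stand-in for a stream class that only exposes sync().
    public static class OldStyleStream {
        public boolean synced = false;
        public void sync() { synced = true; }
    }

    public static void main(String[] args) {
        System.out.println(flushToDisk(new NewStyleStream()));
        System.out.println(flushToDisk(new OldStyleStream()));
    }
}
```

Looking the method up once and caching the Method object would avoid repeating the reflective lookup on every flush; the sketch skips that for brevity.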