Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/898#issuecomment-44368192
  
    hflush __was__ in hadoop 0.21.  You can download 
http://archive.apache.org/dist/hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz 
and check for yourself in 
common/src/java/org/apache/hadoop/fs/FSDataOutputStream.java.
    
    I also verified that hadoop 1.0.4 does __not__ have hflush (although, 
amusingly enough, it does have references to hflush in the code and 
documentation... from patches that were cherry-picked from other branches, 
presumably.)  Instead, 1.0.4 appears (I think?) to implement the hflush 
behavior inside the sync function.
    
    Looking at the "Hadoop genealogy" reveals how this could have happened: 
http://2.bp.blogspot.com/-GO6HF0OAFHw/UOfNEH-4sEI/AAAAAAAAAD0/dEWFFYTRgYw/s1600/output-file.png
    
    It looks like what happened was that the hadoop 0.20 line kind of diverged 
from the hadoop 0.21 line.  The 1.0.4 release somehow came out of the 0.20 
line, while the 0.21 line mutated into hadoop 2.x at some point.  This was all 
before my time... even CDH3, the oldest version of Hadoop I ever worked on, 
had hflush.
    
    Sounds like we're back to reflection tricks, then.
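    
    For what it's worth, a reflection fallback along those lines might look 
roughly like the sketch below. It is only an illustration, not what the PR 
does: `FlushCompat`, `flush`, and the two stub classes are hypothetical names, 
and the stubs stand in for FSDataOutputStream so the example runs without 
Hadoop on the classpath. It assumes, per the findings above, that newer 
streams expose a public no-arg hflush() and that 1.0.4 only offers sync().

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: prefer hflush() when the stream has it, else sync().
final class FlushCompat {
    private FlushCompat() {}

    /**
     * Invokes the first available public no-arg method among hflush() and
     * sync() on the given stream and returns the name of the one called.
     */
    static String flush(Object stream) throws Exception {
        for (String name : new String[] {"hflush", "sync"}) {
            Method method;
            try {
                method = stream.getClass().getMethod(name);
            } catch (NoSuchMethodException e) {
                continue; // this Hadoop line lacks the method, try the next
            }
            try {
                method.invoke(stream);
            } catch (InvocationTargetException e) {
                // Surface the real failure (e.g. an IOException) when possible.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
            return name;
        }
        throw new UnsupportedOperationException(
            "neither hflush() nor sync() found on " + stream.getClass().getName());
    }

    // Stub standing in for a stream from a line that has hflush (assumption:
    // sync() is still present there as well, but should not be chosen).
    public static class NewStyleStream {
        public int hflushCalls = 0;
        public int syncCalls = 0;
        public void hflush() { hflushCalls++; }
        public void sync() { syncCalls++; }
    }

    // Stub standing in for a 1.0.4-style stream that only has sync().
    public static class OldStyleStream {
        public int syncCalls = 0;
        public void sync() { syncCalls++; }
    }
}
```

    In real code the argument would be the FSDataOutputStream itself, and 
looking the Method up once and caching it would avoid paying for the 
reflective lookup on every call.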


