To Devin, thank you very much for your explanation.
I did find that I can read the data out of the file even if I have not closed
the file I'm writing to (the read operation is called on another file handle
opened on the same file, but still in the same process), which made me even
more confused at the time, because I thought: since I can read the data from
the file, why can't I get the length of the file correctly? But from the
explanation you have given, I think I understand it now.

So it seems that in order to do what I want (write some data to the file and
then get the length of the file through the webhdfs interface), I have to open
and close the file every time I do a write operation (see the first sketch
appended after the quoted thread below).

Thank you very much again.

xiaobinshe


2013/12/19 Devin Suiter RDX <[email protected]>

> Hello,
>
> In my experience with Flume, watching the HDFS Sink verbose output, I know
> that even after a file has flushed, but is still open, it reads as a 0-byte
> file, even if there is actually data contained in the file.
>
> An HDFS "file" is a meta-location that can accept streaming input for as
> long as it is open, so the length cannot be mathematically defined until a
> start and an end are in place.
>
> The flush operation moves data from a buffer to a storage medium, but I
> don't think that necessarily means it tells the HDFS RecordWriter to place
> the "end of stream/EOF" marker down, since the "file" meta-location in HDFS
> is a pile of actual files around the cluster on physical disk that HDFS
> presents to you as one file. The HDFS "file" and the physical file splits
> on disk are distinct, and I would suspect that your HDFS flush calls are
> forcing Hadoop to move the physical file splits from their datanode buffers
> to disk, but are not telling HDFS that you expect no further input - that
> is what the HDFS close will do.
>
> One thing you could try - instead of asking for the length property, which
> is probably unavailable until the close call, try asking for/viewing the
> contents of the file.
>
> Your scenario step 3 says "according to the header hdfs.h, after this call
> returns, *new readers should be able to see the data*", which isn't the
> same as "new readers can obtain an updated property value from the file
> metadata" - one is looking at the data inside the container, and the other
> is asking the container to describe itself.
>
> I hope that helps with your problem!
>
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Thu, Dec 19, 2013 at 7:50 AM, Xiaobin She <[email protected]> wrote:
>
>> Sorry to reply to my own thread.
>>
>> Does anyone know the answer to this question?
>> If so, can you please tell me whether my understanding is right or wrong?
>>
>> Thanks.
>>
>>
>> 2013/12/17 Xiaobin She <[email protected]>
>>
>>> Hi,
>>>
>>> I'm using libhdfs to deal with HDFS in a C++ program, and I have
>>> encountered a problem.
>>>
>>> Here is the scenario:
>>> 1. First I call hdfsOpenFile with the O_WRONLY flag to open a file.
>>> 2. I call hdfsWrite to write some data.
>>> 3. I call hdfsHFlush to flush the data; according to the header hdfs.h,
>>> after this call returns, new readers should be able to see the data.
>>> 4. I use an HTTP GET request to get the file list of that directory
>>> through the webhdfs interface (I have to use the webhdfs interface here
>>> because I need to deal with symlink files).
>>> 5. From the JSON response returned by webhdfs, I find that the length of
>>> the file is still 0.
>>>
>>> I have tried replacing hdfsHFlush with hdfsFlush or hdfsSync, and calling
>>> all three together, but it still doesn't work.
>>>
>>> But if I call hdfsCloseFile after hdfsHFlush, then I can get the correct
>>> file length through the webhdfs interface.
>>>
>>> Is this right? I mean, if you want another process to see the change of
>>> data, do you need to call hdfsCloseFile?
>>>
>>> Or is there something I did wrong?
>>>
>>> Thank you very much for your help.
>>
>
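For reference, here is a minimal libhdfs sketch (not from the thread) of the scenario and of the close-after-write workaround described above. The NameNode ("default"), the path and the data are placeholders, error handling is abbreviated, and the length check is done with hdfsGetPathInfo for brevity; the observation in the thread was made through the webhdfs interface instead, e.g. with a GETFILESTATUS or LISTSTATUS request.

/*
 * Write + hdfsHFlush leaves the reported file length stale;
 * only hdfsCloseFile makes the final length visible.
 */
#include "hdfs.h"
#include <fcntl.h>
#include <cstdio>
#include <cstring>

int main() {
    hdfsFS fs = hdfsConnect("default", 0);      /* default file system from the config */
    if (!fs) { fprintf(stderr, "connect failed\n"); return 1; }

    const char* path = "/tmp/length-test.txt";  /* placeholder path */
    const char* data = "some data\n";

    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    hdfsWrite(fs, out, data, strlen(data));
    hdfsHFlush(fs, out);                        /* new readers can see the data now... */

    hdfsFileInfo* info = hdfsGetPathInfo(fs, path);
    if (info) {
        /* ...but the reported length is still the pre-flush value (0 here),
         * matching what webhdfs returns while the writer is open */
        printf("length after hflush: %lld\n", (long long)info->mSize);
        hdfsFreeFileInfo(info, 1);
    }

    hdfsCloseFile(fs, out);                     /* the close is what finalises the length */

    info = hdfsGetPathInfo(fs, path);
    if (info) {
        printf("length after close:  %lld\n", (long long)info->mSize);
        hdfsFreeFileInfo(info, 1);
    }

    hdfsDisconnect(fs);
    return 0;
}

Reopening and closing the file around every batch of writes makes the length visible at the cost of extra NameNode round trips, so with many small writes it may be cheaper to track the written byte count in the writing process itself.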
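A second sketch, of the behaviour reported in the first paragraph and of Devin's suggestion to look at the contents rather than the metadata: after hdfsHFlush, a new reader handle on the still-open file can already read the flushed bytes, even though the reported length has not caught up yet. Again the path and data are placeholders and error handling is omitted.

#include "hdfs.h"
#include <fcntl.h>
#include <cstdio>
#include <cstring>

int main() {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) return 1;

    const char* path = "/tmp/length-test.txt";  /* placeholder path */
    const char* data = "hello hdfs\n";

    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    hdfsWrite(fs, out, data, strlen(data));
    hdfsHFlush(fs, out);                        /* writer stays open */

    /* A new reader handle on the same file already sees the flushed data. */
    char buf[64] = {0};
    hdfsFile in = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
    tSize n = hdfsRead(fs, in, buf, sizeof(buf) - 1);
    printf("read %d bytes while the writer is still open: %s", (int)n, buf);
    hdfsCloseFile(fs, in);

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}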
