Yes. Checksums are verified on every kind of read, including partial reads, unless verification is explicitly disabled. By default, a checksum is generated and stored for every 512 bytes of data (io.bytes.per.checksum, named dfs.bytes-per-checksum in current releases), so a partial read verifies only the checksum chunks that overlap the requested range, not the whole file.
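To make the arithmetic concrete, here is a rough sketch of which 512-byte checksum chunks a partial read touches. It is a simplified model and not Hadoop's actual code: the function name `checksum_chunks` is made up for illustration, and it ignores that the last chunk is capped at the end of the file.

```python
def checksum_chunks(offset: int, length: int, bytes_per_checksum: int = 512):
    """Return (first_chunk, last_chunk, start_byte, end_byte_exclusive).

    Hypothetical helper: models which checksum chunks overlap the byte
    range [offset, offset + length). 512 is the default chunk size
    mentioned above; the real end is capped at the file length.
    """
    if offset < 0 or length <= 0 or bytes_per_checksum <= 0:
        raise ValueError("offset must be >= 0; length and chunk size must be > 0")
    first = offset // bytes_per_checksum
    last = (offset + length - 1) // bytes_per_checksum
    # Verification covers whole chunks, so the covered range is
    # rounded outwards to chunk boundaries.
    return first, last, first * bytes_per_checksum, (last + 1) * bytes_per_checksum


if __name__ == "__main__":
    first, last, start, end = checksum_chunks(offset=1000, length=100)
    print(f"chunks {first}..{last}, bytes {start}..{end - 1}")
```

So a 100-byte read at offset 1000 would involve two checksum chunks (bytes 512 to 1535), regardless of how large the archive file is.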
On Mon, 18 Sep 2017 at 19:23 Ralph Soika <ralph.so...@imixs.com> wrote:

> Hi,
>
> I have a question about the behavior of partial reads in a large data
> file.
> I want to implement an archive solution where I append smaller XML files
> to a big archive file via WebHDFS.
> For each newly added file, my client stores the offset and size of the XML
> file appended to the archive file.
> When I later need to read an XML file from the big archive file, I use the
> 'offset' and 'length' parameters to read only a part of the file:
>
> http://<HOST>:/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>]
>
> My question now is: in this case, does Hadoop verify the checksum to
> guarantee the data integrity of the partial read?
>
> I guess only the checksum of the affected block will be verified, but not
> that of the complete archive file?
> Or is a partial read a performance issue?
>
> Thanks for help in advance
>
> ===
> Ralph
>
> --
> *Imixs*...extends the way people work together
> We are an open source company, read more at: www.imixs.org
> ------------------------------
> Imixs Software Solutions GmbH
> Agnes-Pockels-Bogen 1, 80992 München
> *Web:* www.imixs.com
> *Office:* +49 (0)89-452136 16 *Mobil:* +49-177-4128245
> Registergericht: Amtsgericht Muenchen, HRB 136045
> Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika
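
For reference, the OPEN request described in the quoted message could be built along these lines. This is only a sketch: the helper name `webhdfs_open_url`, the host name, the port (50070 is assumed here as the NameNode HTTP port) and the path are placeholders, not values from this thread.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(host: str, port: int, path: str, offset: int, length: int) -> str:
    """Build a WebHDFS OPEN URL for a partial read (hypothetical helper)."""
    # op, offset and length are the query parameters from the quoted URL template.
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    # quote() keeps '/' by default, so the HDFS path structure is preserved.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


if __name__ == "__main__":
    # Placeholder values; the client would use its stored offset and size here.
    print(webhdfs_open_url("namenode.example.com", 50070, "/archive/data.bin", 1000, 100))
```

The client would pass the offset and size it stored for each appended XML file, so each read fetches only that slice of the archive.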