Re: inline checksums

2007-01-25 Thread Doug Cutting
Hairong Kuang wrote: If end-to-end is a concern, we could let the client generate the checksums and send them to the data node following the block data. I created an issue in Jira related to this: https://issues.apache.org/jira/browse/HADOOP-928 The idea there is to first make it possible
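
The client-side idea quoted above (the client computes the checksums and sends them after the block data) could look roughly like the sketch below. This is only an illustration under stated assumptions, not HDFS's wire protocol: the 8-byte length header, the 64 KiB chunk size, the use of CRC32, and the names client_encode and datanode_decode are all hypothetical.

```python
import struct
import zlib

CHUNK = 64 * 1024  # assumed chunk size; not specified in the message above


def client_encode(block: bytes) -> bytes:
    """Hypothetical framing: 8-byte length, block data, then one CRC32 per chunk."""
    crcs = [zlib.crc32(block[i:i + CHUNK]) for i in range(0, len(block), CHUNK)]
    header = struct.pack(">Q", len(block))
    trailer = struct.pack(f">{len(crcs)}I", *crcs)
    return header + block + trailer


def datanode_decode(message: bytes) -> bytes:
    """Split the message, recompute each chunk CRC, and compare with the trailer."""
    (size,) = struct.unpack_from(">Q", message, 0)
    block = message[8:8 + size]
    trailer = message[8 + size:]
    count = (size + CHUNK - 1) // CHUNK
    if len(block) != size or len(trailer) != 4 * count:
        raise ValueError("malformed message")
    expected = struct.unpack(f">{count}I", trailer)
    for k, crc in enumerate(expected):
        if zlib.crc32(block[k * CHUNK:(k + 1) * CHUNK]) != crc:
            raise IOError(f"checksum mismatch in chunk {k}")
    return block
```

Because the checksums are computed before the data leaves the client, corruption introduced in transit or on the data node is detectable when the receiver recomputes them.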

Re: inline checksums

2007-01-24 Thread Tom White
A checksummed filesystem that embeds checksums into data makes the data unapproachable by tools that don't anticipate checksums. In HDFS, data is accessible only via the HDFS client, so this is not an issue, and the checksums can be stripped out before they reach clients. But for Local and S3 where d

Re: inline checksums

2007-01-24 Thread Sameer Paranjpye
A checksum file per block would have many CRCs, one per 64k chunk or so in the block. So it would still permit random access. The datanode would only checksum the data accessed plus on average an extra 32k. Also, if datanodes were to send the checksum after the data on a read, the client could
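
The point about per-chunk CRCs still permitting random access can be sketched as follows. This is a minimal illustration, assuming CRC32 and a 64 KiB chunk; the function names chunk_checksums and verified_read are hypothetical and the block is held in memory for simplicity.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # assumed, following "one per 64k chunk or so"


def chunk_checksums(block: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Compute one CRC32 per fixed-size chunk of the block."""
    return [zlib.crc32(block[i:i + chunk_size])
            for i in range(0, len(block), chunk_size)]


def verified_read(block: bytes, checksums: List[int], offset: int, length: int,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return block[offset:offset+length], verifying only the chunks it touches."""
    end = min(offset + length, len(block))
    if length <= 0 or offset >= end:
        return b""
    first = offset // chunk_size
    last = (end - 1) // chunk_size
    for idx in range(first, last + 1):
        start = idx * chunk_size
        if zlib.crc32(block[start:start + chunk_size]) != checksums[idx]:
            raise IOError(f"checksum mismatch in chunk {idx}")
    return block[offset:end]
```

A read is widened only to the chunk boundaries around the requested range, so the extra data checksummed is bounded by the chunk size rather than the block size.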

Re: inline checksums

2007-01-24 Thread Raghu Angadi
Doug Cutting wrote: Hairong Kuang wrote: Another option is to create a checksum file per block at the data node where the block is placed. Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee end

RE: inline checksums

2007-01-24 Thread Hairong Kuang
Hairong Kuang wrote: Another option is to create a checksum file per block at the data node where the block is placed. Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already

Re: inline checksums

2007-01-24 Thread Jim White
Doug Cutting wrote: Hairong Kuang wrote: Another option is to create a checksum file per block at the data node where the block is placed. Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee e

Re: inline checksums

2007-01-23 Thread Doug Cutting
Hairong Kuang wrote: Another option is to create a checksum file per block at the data node where the block is placed. Yes, but then we'd need a separate checksum implementation for intermediate data, and for other distributed filesystems that don't already guarantee end-to-end data integrity

RE: inline checksums

2007-01-23 Thread Hairong Kuang
Another option is to create a checksum file per block at the data node where the block is placed. This approach clearly separates data and checksums and does not require many changes to open(), seek() and length(). For create, when a block is written to a data node, the data node creates a ch