[PATCH 4/4] chunkd checksums each block, as it is read from disk

2010-07-18 Jeff Garzik

Note that we are checksumming hot cache data, so SHA1 isn't as
punishing as one might think.


 chunkd/be-fs.c |   51 ++-
 1 file changed, 50 insertions(+), 1 deletion(-)

commit 2211e3b58620093866be4130397cb3b476620725
Author: Jeff Garzik <j...@garzik.org>
Date:   Sun Jul 18 03:03:35 2010 -0400

[chunkd] checksum data prior to returning via GET

When reading a file off disk, checksum the data after it is read and
before it is sent across the network to the client.  Fail the read if
the checksum does not match.

This guarantees we will never send corrupted data to a client.

Signed-off-by: Jeff Garzik <jgar...@redhat.com>
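The block-alignment arithmetic this patch relies on can be sketched in isolation. The following is a minimal illustration, not chunkd code: the `ex_` names are hypothetical, and the 64 KiB block size (order 16) is an assumption made for the example, since the real `CHUNK_BLK_SZ` and `CHUNK_BLK_ORDER` are defined outside this diff. It shows how the tail offset and tail length fall out of masking the value length, and which reads qualify for a checksum comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; chunkd defines the real
 * CHUNK_BLK_ORDER / CHUNK_BLK_SZ elsewhere. */
#define EX_BLK_ORDER	16
#define EX_BLK_SZ	(1ULL << EX_BLK_ORDER)	/* 64 KiB */

/* The masking tricks below only work for power-of-two block sizes. */
static_assert((EX_BLK_SZ & (EX_BLK_SZ - 1)) == 0,
	      "block size must be a power of two");

/* Hypothetical stand-in for the fields the patch adds to struct fs_obj. */
struct ex_obj {
	uint64_t in_pos;	/* current read offset within the value */
	uint64_t tail_pos;	/* offset at which the final partial block starts */
	uint64_t tail_len;	/* length of the final partial block (may be 0) */
};

static void ex_obj_init(struct ex_obj *obj, uint64_t value_len)
{
	obj->in_pos = 0;
	/* round down to a block boundary */
	obj->tail_pos = value_len & ~(EX_BLK_SZ - 1);
	/* remainder after the last full block */
	obj->tail_len = value_len & (EX_BLK_SZ - 1);
}

/* Same decision as can_csum_blk() in the diff: a read is verifiable only
 * if it starts on a block boundary and covers either a full block or
 * exactly the tail block. */
static bool ex_can_csum_blk(const struct ex_obj *obj, uint64_t len)
{
	if (obj->in_pos & (EX_BLK_SZ - 1))
		return false;
	if (obj->in_pos == obj->tail_pos && len == obj->tail_len)
		return true;
	return len == EX_BLK_SZ;
}

/* Index of the current block in the checksum table. */
static uint64_t ex_blk_index(const struct ex_obj *obj)
{
	return obj->in_pos >> EX_BLK_ORDER;
}
```

Under these assumed sizes, a 150000-byte value has two full blocks followed by a tail that starts at offset 131072 and is 18928 bytes long. A read at offset 131072 that returns 18928 bytes is checked against table entry 2, while a 4096-byte read at offset 0 is not checked.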

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index 2120991..dce2561 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -49,6 +49,10 @@ struct fs_obj {
 
 	int			in_fd;
 	char			*in_fn;
+	off_t			in_pos;
+
+	off_t			tail_pos;
+	size_t			tail_len;
 
 	size_t			checked_bytes;
 	SHA_CTX			checksum;
@@ -364,6 +368,8 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 	if (!obj->csum_tbl)
 		goto err_out;
 	obj->csum_tbl_sz = csum_bytes;
+	obj->tail_pos = data_len & ~(CHUNK_BLK_SZ - 1);
+	obj->tail_len = data_len & (CHUNK_BLK_SZ - 1);
 
 	/* build local fs pathname */
 	fn = fs_obj_pathname(table_id, key, key_len);
@@ -488,6 +494,8 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
 	value_len = GUINT64_FROM_LE(hdr.value_len);
 	obj->n_blk = GUINT32_FROM_LE(hdr.n_blk);
 	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+	obj->tail_pos = value_len & ~(CHUNK_BLK_SZ - 1);
+	obj->tail_len = value_len & (CHUNK_BLK_SZ - 1);
 
 	/* verify file size large enough to contain value */
 	tmp64 = value_len + sizeof(hdr) + key_len + csum_bytes;
@@ -571,15 +579,56 @@ void fs_obj_free(struct backend_obj *bo)
 	free(obj);
 }
 
+static bool can_csum_blk(struct fs_obj *obj, size_t len)
+{
+	if (obj->in_pos & (CHUNK_BLK_SZ - 1))
+		return false;
+
+	if (obj->in_pos == obj->tail_pos && len == obj->tail_len)
+		return true;
+	if (len == CHUNK_BLK_SZ)
+		return true;
+
+	return false;
+}
+
 ssize_t fs_obj_read(struct backend_obj *bo, void *ptr, size_t len)
 {
 	struct fs_obj *obj = bo->private;
 	ssize_t rc;
 
 	rc = read(obj->in_fd, ptr, len);
-	if (rc < 0)
+	if (rc < 0) {
 		applog(LOG_ERR, "obj read(%s) failed: %s",
 		       obj->in_fn, strerror(errno));
+		return -errno;
+	}
+
+	if (can_csum_blk(obj, rc)) {
+		unsigned char md[CHD_CSUM_SZ];
+		unsigned int blk_pos;
+		int cmprc;
+
+		SHA1(ptr, rc, md);
+
+		blk_pos = (unsigned int) (obj->in_pos >> CHUNK_BLK_ORDER);
+		cmprc = memcmp(md, obj->csum_tbl + (blk_pos * CHD_CSUM_SZ),
+			       CHD_CSUM_SZ);
+
+		if (cmprc) {
+			applog(LOG_WARNING, "obj(%s) csum failed @ 0x%llx",
+			       obj->in_fn,
+			       (unsigned long long) obj->in_pos);
+			return -EIO;
+		}
+	} else {
+		applog(LOG_INFO, "obj(%s) unaligned read, 0x%x @ 0x%llx",
+		       obj->in_fn, len,
+		       (unsigned long long) obj->in_pos);
+
+	}
+
+	obj->in_pos += rc;
 
 	return rc;
 }
--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2] chunkd: remove sendfile(2) zero-copy support

2010-07-18 Jeff Garzik

On 07/17/2010 11:45 PM, Steven Dake wrote:
> On 07/16/2010 10:46 PM, Jeff Garzik wrote:
>> chunkd: remove sendfile(2) zero-copy support
>>
>> chunkd will be soon checksumming data in main memory. That removes
>> the utility of a zero-copy interface which bypasses the on-heap
>> data requirement.
>>
>> Signed-off-by: Jeff Garzik <jgar...@redhat.com>
>
> May be able to use vmsplice with sendfile (if linux is only target
> platform). Haven't tried it myself, but the operations look interesting
> at achieving zero copy with sockets from memory addresses.


Even though the man pages say only for pipes, this syscall definitely 
works with TCP.  The big question:  is it actually faster than 
read()+write() ?


Years ago, I experimented with using some fancy new Linux-specific 
syscalls in a from-scratch implementation of cp(1).  It turned out that 
read()+write() was faster than other methods.


That was file-file copying.  It's probably worth investigating 
vmsplice() for our file-checksum-TCP case, definitely.


Jeff
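
For anyone who wants to experiment with the vmsplice() idea discussed above, here is a rough, Linux-only sketch. It is not chunkd code and makes no claim about performance: the helper name `ex_send_buf_spliced` is hypothetical, and it assumes the data has already been read into memory and verified. It maps the user buffer into a pipe with vmsplice(2), then moves the pipe contents to the output descriptor with splice(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for vmsplice() and splice() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send an in-memory buffer to out_fd (for example a
 * connected socket) by way of a pipe.  pipe_fds[] must come from pipe(2).
 * The buffer must not be modified until the data has left the pipe,
 * because the pipe refers to the caller's pages rather than a copy.
 * Returns 0 on success or a negative errno value on failure.
 */
static int ex_send_buf_spliced(int pipe_fds[2], int out_fd,
			       const void *buf, size_t len)
{
	struct iovec iov = { .iov_base = (void *) buf, .iov_len = len };

	assert(buf != NULL || len == 0);

	while (iov.iov_len > 0) {
		/* attach (part of) the user buffer to the pipe's write end */
		ssize_t in_pipe = vmsplice(pipe_fds[1], &iov, 1, 0);

		if (in_pipe < 0)
			return -errno;
		if (in_pipe == 0)
			return -EIO;

		iov.iov_base = (char *) iov.iov_base + in_pipe;
		iov.iov_len -= (size_t) in_pipe;

		/* drain what was just queued into the output descriptor */
		while (in_pipe > 0) {
			ssize_t sent = splice(pipe_fds[0], NULL, out_fd, NULL,
					      (size_t) in_pipe, 0);

			if (sent < 0)
				return -errno;
			if (sent == 0)
				return -EIO;
			in_pipe -= sent;
		}
	}

	return 0;
}
```

A caller would create the pipe once per connection, run the checksum over the buffer, and only then hand the buffer to this helper. Whether that beats a plain read()+write() pair for the file-checksum-TCP path would have to be measured.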


