Can you provide more links to comments in jira mentioning "loss of zero copy reads"?
Basically, this refers to changes made in the 0.20 release of HBase related to the block-based HFile format, the KeyValue data pointer, and other pieces like the Result client return type and the block cache.

Previously (in 0.19 and before), when executing read queries, we would copy the values we were reading into separate byte arrays to return to the client. There wasn't much of a way around this until the introduction of blocks and KeyValue.

Now, once we read in a block from an HFile (which contains a bunch of KeyValues appended to each other), we don't physically copy the bytes anymore. Rather, we use KeyValue to point to the different KVs contained in the block. Underneath, KeyValue is nothing more than a byte[], offset, and length (essentially, a pointer into a larger byte[]). We pass these KeyValues (which really point into larger blocks) all the way back to the client via the Result data type.

Does that make sense? As far as I know, nothing has changed this in 0.20 or trunk.

JG

> -----Original Message-----
> From: Andrew Nguyen [mailto:[email protected]]
> Sent: Tuesday, July 27, 2010 10:10 AM
> To: [email protected]
> Subject: Zero-copy reads
>
> Hello all,
>
> I recently saw some references to "zero copy reads" in Lars' blog post
> as well as some powerpoints, jira comments, etc.
>
> Is there any additional information available on this topic?  I saw
> some comments in jira that mentioned the loss of zero copy reads, while
> others mention that it's a feature.
>
> Thanks!
>
> --Andrew
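
P.S. In case it helps to see the "byte[], offset, and length" idea in code, here is a minimal sketch. It is not the actual HBase KeyValue class: BlockSlice, copyOut, and ZeroCopyDemo are hypothetical names made up for illustration, and the "block" is just a small ASCII array rather than a real HFile block. It only shows that several slices can point into one shared array without copying it, in contrast to copying each value out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the idea behind KeyValue; NOT the HBase class.
final class BlockSlice {
    private final byte[] backing; // the shared block; never copied by this class
    private final int offset;     // where this slice starts inside the block
    private final int length;     // how many bytes belong to this slice

    BlockSlice(byte[] backing, int offset, int length) {
        if (offset < 0 || length < 0 || offset > backing.length - length) {
            throw new IndexOutOfBoundsException("slice lies outside the block");
        }
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    byte[] backingArray() { return backing; }
    int offset() { return offset; }
    int length() { return length; }

    // Decodes directly from the shared block; only the returned String is new.
    String asString() {
        return new String(backing, offset, length, StandardCharsets.US_ASCII);
    }

    // Explicit copy into a separate array, roughly the copying approach
    // described above for 0.19 and earlier.
    byte[] copyOut() {
        return Arrays.copyOfRange(backing, offset, offset + length);
    }
}

class ZeroCopyDemo {
    public static void main(String[] args) {
        // Pretend this is one block holding several values appended together.
        byte[] block = "aaaBBBBcc".getBytes(StandardCharsets.US_ASCII);

        BlockSlice first = new BlockSlice(block, 0, 3);
        BlockSlice second = new BlockSlice(block, 3, 4);

        System.out.println(first.asString());   // aaa
        System.out.println(second.asString());  // BBBB

        // Both slices share the very same array object: nothing was copied.
        System.out.println(first.backingArray() == second.backingArray()); // true

        // A copy taken now is independent of later changes to the block...
        byte[] copy = second.copyOut();
        block[3] = 'X';
        System.out.println(new String(copy, StandardCharsets.US_ASCII)); // BBBB
        // ...while the slice still sees the block itself.
        System.out.println(second.asString());  // XBBB
    }
}
```

The last two lines of output show the trade-off: a slice costs no copy, but it is only valid for as long as the underlying block is kept around and left unmodified.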
