Can you provide more links to comments in jira mentioning "loss of zero copy 
reads"?

This refers to changes made in the 0.20 release of HBase: the block-based 
HFile format, the KeyValue data pointer, and related pieces such as the 
Result client return type and the block cache.

Previously (in 0.19 and before), when executing read queries, we would make 
copies of the values we were reading into separate byte arrays to return back 
to the client.  There wasn't much of a way around this until the introduction 
of blocks and KeyValue.

Now, once we read in a block from an HFile (which contains a bunch of KeyValues 
appended to each other), we don't physically copy the bytes anymore.  Rather, 
we use KeyValue to point to the different KVs contained in the block.  
Underneath, KeyValue is nothing more than a byte[], offset, and length 
(essentially, a pointer into a larger byte[]).
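
To make the pointer idea concrete, here is a rough sketch in Java. It is not 
the actual HBase code: the Slice and BlockReader names are made up for 
illustration, and the block layout (a 1-byte length followed by the payload) 
is an assumed toy format, not the real HFile or KeyValue encoding. It only 
shows how several entries can share one backing byte[] without being copied.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a KeyValue-style pointer: backing array, offset, length. */
final class Slice {
    final byte[] buf;   // the shared block; never copied
    final int offset;   // where this entry starts inside buf
    final int length;   // how many bytes belong to this entry

    Slice(byte[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > buf.length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    /** Decodes the entry on demand; this is the only place bytes get copied. */
    String asString() {
        return new String(buf, offset, length, StandardCharsets.UTF_8);
    }
}

final class BlockReader {
    /**
     * Walks a block and returns one Slice per entry.
     * Assumed toy layout: [1-byte length][payload bytes], repeated.
     */
    static List<Slice> slices(byte[] block) {
        List<Slice> out = new ArrayList<>();
        int pos = 0;
        while (pos < block.length) {
            int len = block[pos] & 0xFF;              // unsigned length prefix
            out.add(new Slice(block, pos + 1, len));  // points into block, no copy
            pos += 1 + len;
        }
        return out;
    }
}
```

Every Slice returned by BlockReader.slices(block) holds a reference to the 
same block array, so the cost per entry is three fields rather than a fresh 
byte[] plus a copy.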

We pass these KeyValues (which really point into larger blocks) all the way 
back to the client via the Result data type.

Does that make sense?

As far as I know, nothing has changed this in 0.20 or trunk.

JG

> -----Original Message-----
> From: Andrew Nguyen [mailto:[email protected]]
> Sent: Tuesday, July 27, 2010 10:10 AM
> To: [email protected]
> Subject: Zero-copy reads
> 
> Hello all,
> 
> I recently saw some references to "zero copy reads" in Lars' blog post
> as well as some powerpoints, jira comments, etc.
> 
> Is there any additional information available on this topic?  I saw
> some comments in jira that mentioned the loss of zero copy reads, while
> others mention that it's a feature.
> 
> Thanks!
> 
> --Andrew
