Re: prefix compression implementation

2011-09-20 Thread Matt Corgan
bringing all questions into a single email: stack I'd say call it Cell rather than HCell. i did think the H was a very simple way to add uniqueness, like isn't HFile a big win over File? there are already two other classes called Cell in hbase (guava and REST gateway). another option could be

Re: HBase as a large, auto-partitioned, sorted, *in-memory* database (was: Re: prefix compression implementation)

2011-09-20 Thread Matt Corgan
inline below: On Mon, Sep 19, 2011 at 10:08 PM, Stack st...@duboce.net wrote: Excellent summary Matt. Some notes in the below. On Sun, Sep 18, 2011 at 6:43 PM, Matt Corgan mcor...@hotpads.com wrote: ... All of this is relatively easy for the data and index blocks because they're

Re: HBase as a large, auto-partitioned, sorted, *in-memory* database (was: Re: prefix compression implementation)

2011-09-20 Thread Jacek Migdal
My notes. By the way, Matt, I reviewed your change; it is mostly OK, but at least one trivial change is needed. On 9/20/11 11:23 AM, Matt Corgan mcor...@hotpads.com wrote: inline below: On Mon, Sep 19, 2011 at 10:08 PM, Stack st...@duboce.net wrote: Excellent summary Matt. Some notes in the

Re: prefix compression implementation

2011-09-20 Thread Matt Corgan
jacek It is a huge change. It would be great if we could prototype a few things. Especially I would like to avoid any optimizations before we know a good way to measure them. matt agree. i'm not in a rush to get any of this integrated, just trying to feel out the right long-term strategy. do

Re: prefix compression implementation

2011-09-19 Thread Ryan Rawson
I was just pushing back at the idea of 'turn everything into interfaces! problem solved!', and thinking about what was really necessary to get to where you want to go... On Mon, Sep 19, 2011 at 3:26 PM, Matt Corgan mcor...@hotpads.com wrote: Ryan - i answered your question on another thread

Re: prefix compression implementation

2011-09-19 Thread Stack
One other thought is that exposing ByteRange, ByteBuffer, and v1 array stuff in Interface seems like you are exposing 'implementation' details that perhaps shouldn't show through. I'm guessing it's unavoidable though if the Interface is to be used in a few different contexts: i.e. v1 has to work

Re: prefix compression implementation

2011-09-19 Thread Ryan Rawson
So if the HCell or whatever ends up returning ByteBuffers, then that plays straight into scatter/gather NIO calls, and if some of them are DBB, then so much the merrier. For example, the thrift stuff takes ByteBuffers when it's calling for a byte sequence. -ryan On Mon, Sep 19, 2011 at 10:39

Re: HBase as a large, auto-partitioned, sorted, *in-memory* database (was: Re: prefix compression implementation)

2011-09-18 Thread Matt Corgan
Ryan - thanks for the feedback. The situation I'm thinking of where it's useful to parse DirectBB without copying to heap is when you are serving small random values out of the block cache. At HotPads, we'd like to store hundreds of GB of real estate listing data in memory

Re: prefix compression implementation

2011-09-16 Thread Matt Corgan
Jacek, Thanks for helping out with this. I implemented most of the DeltaEncoder and DeltaEncoderSeeker. I haven't taken the time to generate a good set of test data for any of this, but it does pass on some very small input data that aims to cover the edge cases i can think of. Perhaps you

Re: prefix compression implementation

2011-09-16 Thread Ryan Rawson
Hey this stuff looks really interesting! On the ByteBuffer, the 'array' byte[] access to the underlying data is totally incompatible with the 'off heap' features that are implemented by DirectByteBuffer. While people talk about DBB in terms of nio performance, if you have to roundtrip the data
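
To illustrate the point about 'array' access versus direct buffers, here is a minimal sketch, not code from the thread or from HBase. It compares two buffers byte by byte with the absolute get(int) accessor, which does not need a backing array, so the same method runs on heap and direct buffers. The class name DirectBufferDemo and the helpers toBuffer and sharedPrefix are hypothetical names made up for this example; calling array() on a buffer whose hasArray() returns false would throw UnsupportedOperationException instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectBufferDemo {

    // Counts the leading bytes shared by two buffers using absolute get(int).
    // No backing array is needed and the buffer positions are left untouched,
    // so this works the same way for heap and direct buffers.
    static int commonPrefixLength(ByteBuffer a, ByteBuffer b) {
        int max = Math.min(a.remaining(), b.remaining());
        int i = 0;
        while (i < max && a.get(a.position() + i) == b.get(b.position() + i)) {
            i++;
        }
        return i;
    }

    // Hypothetical helper for illustration: copies text into a heap or
    // direct buffer and flips it so it is ready for reading.
    static ByteBuffer toBuffer(String text, boolean direct) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(bytes.length)
                : ByteBuffer.allocate(bytes.length);
        buf.put(bytes);
        buf.flip();
        return buf;
    }

    // Hypothetical convenience wrapper: shared prefix length of two strings
    // stored in buffers of the requested kind.
    static int sharedPrefix(String x, String y, boolean direct) {
        return commonPrefixLength(toBuffer(x, direct), toBuffer(y, direct));
    }

    public static void main(String[] args) {
        ByteBuffer direct = toBuffer("row-0001", true);
        // Reports whether this JVM exposes a backing array for the buffer.
        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println(sharedPrefix("row-0001", "row-0002", true));
    }
}
```

Code written against array() needs a hasArray() check and a fallback path, while code written against get(int) or the bulk get methods does not, at the cost of going through the buffer's accessor on every read.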

Re: prefix compression implementation

2011-09-16 Thread Ryan Rawson
On Fri, Sep 16, 2011 at 6:47 PM, Matt Corgan mcor...@hotpads.com wrote: I'm a little confused over the direction of the DBBs in general, hence the lack of clarity in my code. I see value in doing fine-grained parsing of the DBB if you're going to have a large block of data and only want to

Re: prefix compression implementation

2011-09-16 Thread Matt Corgan
Ryan - thanks for the feedback. The situation I'm thinking of where it's useful to parse DirectBB without copying to heap is when you are serving small random values out of the block cache. At HotPads, we'd like to store hundreds of GB of real estate listing data in memory so it can be quickly

Re: prefix compression implementation

2011-09-16 Thread Ryan Rawson
On Fri, Sep 16, 2011 at 7:29 PM, Matt Corgan mcor...@hotpads.com wrote: Ryan - thanks for the feedback.  The situation I'm thinking of where it's useful to parse DirectBB without copying to heap is when you are serving small random values out of the block cache.  At HotPads, we'd like to store

Re: prefix compression implementation

2011-09-14 Thread Jacek Migdal
Matt, Thanks a lot for the code. Great job! As I mentioned in JIRA, I work full time on the delta encoding [1]. Right now the code and integration is almost done. Most of the parts are under review. Since it is a big change, we plan to test it very carefully. After that, it will be ported to

prefix compression implementation

2011-09-13 Thread Matt Corgan
Hi devs, I put a developer preview of a prefix compression algorithm on github. It still needs some details worked out, a full set of iterators, about 200 optimizations, and a bunch of other stuff... but, it successfully passes some preliminary tests so I thought I'd get it in front of more

Re: prefix compression implementation

2011-09-13 Thread Ted Yu
Matt: Thanks for the update. Cacheable interface is defined in: src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java You can find the implementation at: src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java I will browse your code later. On Tue, Sep 13, 2011 at 12:44 AM, Matt