On Wed, Aug 18, 2010 at 4:47 PM, Stuart Smith <[email protected]> wrote:
>
> Hello,
>
>  I was wondering if there are any plans for a stream interface to Cell data. 
> I saw this:
>
>> > or they are using large client write buffers so big payloads are being
>> > passed to the server in each RPC request.  Our RPC is not streaming.
>
> So I'm guessing there's not one now (and I couldn't find one in 0.20.6 
> either). HDFS does seem to provide a stream interface (I'm about to try it 
> out).
>
> So is there a fundamental limitation on hbase that prevents a streaming
> interface to Cells, is it possible but distasteful for some reason, or is it 
> just a TODO item?
>



Our RPC doesn't do streaming.

A streaming/chunking protocol would be nice -- there is even an old
issue to do it -- but I think the general consensus is that it's low
priority (do you think differently?).

Also, if your cells are large, you might consider keeping the content
in hdfs and its location up in hbase.  If the cell is 100MB, the
lookup in hbase pales beside the time to stream from hdfs.
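
As a minimal sketch of that split (not HBase or HDFS API): plain dicts stand in
for the table and the filesystem, and the names `store`, `load`, `INLINE_LIMIT`
and the `/blobs/` path scheme are all hypothetical, chosen only for illustration.

```python
# Hypothetical stand-ins: plain dicts play the role of an HBase table and HDFS.
INLINE_LIMIT = 10 * 1024 * 1024  # assumed cutoff (10 MB); tune for your data


def store(table, hdfs, row_key, data, limit=INLINE_LIMIT):
    """Keep small values inline; park large ones in 'hdfs' and record the path."""
    if len(data) <= limit:
        table[row_key] = {"content": data}
    else:
        path = "/blobs/" + row_key  # hypothetical path scheme
        hdfs[path] = data
        table[row_key] = {"location": path}


def load(table, hdfs, row_key):
    """Return the bytes for row_key, following the pointer when there is one."""
    row = table[row_key]
    if "content" in row:
        return row["content"]
    return hdfs[row["location"]]
```

The caller only ever sees `store` and `load`; whether a value went inline or out
to the filesystem is decided by its size alone.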


> I'm thinking this could help alleviate the Big Cell OOME situation. This 
> would be especially handy if you just have a few outlier cells that are 
> really big, and lots of smaller ones.
>

Big cell OOME is rare, unless I'm mistaken.  Or, to put it another way,
it's rare in my experience that hbase is used for hosting big cells.  We
should add better cell size checks out on the client and, like the
speed-limiter on your Hertz Ferrari, they'll keep you safe at least
until you go out of your way to dismantle the check.
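
A client-side check of this kind could look roughly like the sketch below. It is
an illustration only, not existing client code: `check_cell_size`,
`CellTooLargeError` and the 10 MB default are assumptions, and passing a limit
of zero or less is the "dismantle the check" switch.

```python
class CellTooLargeError(ValueError):
    """Raised when a value exceeds the client-side limit."""


def check_cell_size(value, max_bytes=10 * 1024 * 1024):
    """Reject oversized values before they are sent.

    max_bytes is an assumed default; a value <= 0 disables the check.
    """
    if max_bytes > 0 and len(value) > max_bytes:
        raise CellTooLargeError(
            "cell is %d bytes, limit is %d" % (len(value), max_bytes)
        )
    return value
```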

> Right now I'm just going with the solution of putting a layer on top of my 
> system that writes filemetadata and most (smaller) files to hbase, and the 
> occasional big file to HDFS. This should work, and is probably best in the 
> long run, but a streaming interface would be handy!
>

Oh, yeah, it is a bit of a pain having to handle two sources for
data.  Does your dataset fluctuate wildly in its size?   Is there a
way you can separate the big from the small?  If so, perhaps you could
model it so the big was in one column family and the small in another.
The big column family would hold the hdfs location, while the small-data
column family would actually carry the data.
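
To make the two-family layout concrete, here is a small in-memory sketch. It
does not use the HBase client: a dict of `"family:qualifier"` keys stands in for
a row, and the family names `small` and `big`, the qualifiers, and the helper
functions are all hypothetical.

```python
def put_small(table, key, data):
    """Small values travel inline in the 'small' family."""
    table[key] = {"small:data": data}


def put_big(table, key, hdfs_path):
    """Big values live in hdfs; the 'big' family holds only the path."""
    table[key] = {"big:location": hdfs_path.encode("utf-8")}


def scan_family(table, family):
    """Yield (row_key, cells) restricted to one family, in row-key order."""
    prefix = family + ":"
    for key in sorted(table):
        cells = {q: v for q, v in table[key].items() if q.startswith(prefix)}
        if cells:
            yield key, cells
```

With the data modelled this way, a reader can scan one family at a time and
knows from the family alone whether it is holding the bytes or a pointer to them.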

St.Ack
