> From: Stack <[email protected]>
>> 3. The size of them varies like this
>> 70% from them have their length < 1MB
>> 29% from them have their length between 1MB and 10 MB
>> 1% from them have their length > 10MB (they can have also > 100MB)
>
> What David says above though Jack in his yfrog presentation today
> talks of storing all images in hbase up to 5MB in size.
>
> Karthick in his presentation at hadoop summit talked about how once
> cells cross a certain size -- he didn't say what the threshold was I
> believe -- then only the metadata is stored in hbase and the content
> goes to their "big stuff" system.
>
> Try it I'd say. If only a few instances of 100MB, HBase might be fine.

I've seen problematic behavior in the past when values larger than 100 MB are stored and concurrent scans then run over table(s) containing many such objects.

The default KeyValue size limit is 10 MB, which is usually sufficient. For webtable-like applications I may raise it to 50 MB; larger objects are not interesting anyway (to me).

One reasonable way to handle native storage of large objects in HBase would be to introduce a layer of indirection. Break the large object up into chunks. Store the chunks in a manner that gets good distribution in the keyspace, maybe keyed by the SHA-1 hash of the content. Then store an index to the chunks under the key of your choice. Get that key to retrieve the index, then use a MultiAction to retrieve the referenced chunks in parallel. With large objects you are going to need a number of round trips over the network to pull all of the data anyway, so adding a couple more up front may not push the result outside the performance bound of your application. However, you will put your client under heap pressure that way, as objects in HBase are fully transferred at once to the client in the RPC response.

Another option is to store large objects directly in HDFS and keep only the path to each object in HBase. A benefit of this approach is that you can stream the data out of HDFS with as little or as much buffering in your application as you would like.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back.
   - Piet Hein (via Tom White)
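
P.S. To make the chunk-and-index idea above a bit more concrete, here is a rough sketch in Python. It is not HBase client code: a plain dict stands in for the table, and the names `put_large_object`, `get_large_object`, the `chunk:` / `index:` key prefixes, and the 4 MB chunk size are all made up for illustration. The only point is the shape of the scheme: content-hashed chunk keys plus one small index row under the application's own key.

```python
import hashlib

# Assumed chunk size; chosen only to stay well under the 10 MB default limit.
CHUNK_SIZE = 4 * 1024 * 1024


def put_large_object(store, key, data, chunk_size=CHUNK_SIZE):
    """Split data into chunks and store them plus an index row.

    `store` is any dict-like mapping; here it is a stand-in for a table.
    """
    chunk_keys = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Key each chunk by the SHA-1 of its content so that chunk rows
        # spread evenly over the keyspace.
        chunk_key = "chunk:" + hashlib.sha1(chunk).hexdigest()
        store[chunk_key] = chunk
        chunk_keys.append(chunk_key)
    # The index row is an ordered, newline-separated list of chunk keys.
    store["index:" + key] = "\n".join(chunk_keys).encode("ascii")


def get_large_object(store, key):
    """Fetch the index row, then fetch and reassemble the chunks in order."""
    index = store["index:" + key].decode("ascii")
    chunk_keys = index.split("\n") if index else []
    # A real client would fetch these chunk rows in one batched request
    # rather than one lookup at a time.
    return b"".join(store[k] for k in chunk_keys)
```

One side effect of keying by content hash is that identical chunks collapse into a single stored row. That saves space, but it also means a chunk cannot simply be deleted along with one object unless you know nothing else references it.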
