Just a couple of things...
MapR doesn't have the NameNode (NN) limitations.
So if your design requires lots of small files, look at MapR...

You could store your large blobs in a SequenceFile, or a series of
SequenceFiles, and use HBase to store the index.
Sort of a hybrid approach.
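
For example, a rough sketch of the write path (untested; the table name
"blob_index", column family "idx", and the paths are just placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class BlobStoreSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path seqPath = new Path("/blobs/blobs-0001.seq"); // placeholder

            String blobId = "blob-123";                    // placeholder key
            byte[] blobBytes = new byte[16 * 1024 * 1024]; // the large blob

            // append the blob to the sequence file, remembering its offset
            SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, seqPath, Text.class, BytesWritable.class);
            long offset = writer.getLength();
            writer.append(new Text(blobId), new BytesWritable(blobBytes));
            writer.close();

            // index row in HBase: blobId -> (sequence file path, byte offset)
            HTable index = new HTable(HBaseConfiguration.create(), "blob_index");
            Put put = new Put(Bytes.toBytes(blobId));
            put.add(Bytes.toBytes("idx"), Bytes.toBytes("file"),
                Bytes.toBytes(seqPath.toString()));
            put.add(Bytes.toBytes("idx"), Bytes.toBytes("offset"),
                Bytes.toBytes(offset));
            index.put(put);
            index.close();
        }
    }

On the read side you look up the index row, open a SequenceFile.Reader on
that file, seek() to the stored offset and call next() to get the blob back.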


On Mar 4, 2012, at 11:13 PM, "Rohit Kelkar" <[email protected]> wrote:

> Jacques, I agree that storing the files directly (let's say files greater
> than 15MB) would make the namenode run out of space. But what if I make my
> blocksize = 15MB?
> I am facing the same issue that Konrad mentions, and I have used exactly
> the approach number 2 that he described.
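> 
> i.e. setting the block size in the client config when writing, something
> like (dfs.block.size is the Hadoop 1.x property name; just a sketch):
> 
>     Configuration conf = new Configuration();
>     conf.setLong("dfs.block.size", 15L * 1024 * 1024); // 15MB blocks
>     FileSystem fs = FileSystem.get(conf);
>     // files created through fs now get 15MB blocks by default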
> 
> - Rohit Kelkar
> 
> On Mon, Mar 5, 2012 at 6:59 AM, Jacques <[email protected]> wrote:
>>>> 2) files bigger than 15MB are stored in HDFS and HBase keeps only some
>>>> information about where each file is placed
>> 
>> You're likely to run into the name node's file count limit if you go with
>> this solution as-is.  If you follow this path, you would probably need to
>> either regularly combine files into a Hadoop Archive (HAR) file or
>> something similar, or go with another backing filesystem (e.g. MapR) that
>> can support hundreds of millions of files natively...
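>> 
>> For example, with the standard "hadoop archive" tool (all paths below are
>> placeholders):
>> 
>>   hadoop archive -archiveName blobs-2012-03.har -p /data/blobs 2012-03 /data/archived
>> 
>> and then read the files back through the har:// filesystem, e.g.
>> har:///data/archived/blobs-2012-03.har/2012-03/<file>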
>> 
>> On Sun, Mar 4, 2012 at 4:12 PM, Konrad Tendera <[email protected]> wrote:
>> 
>>> So, what should I use instead of HBase? I'm wondering about the following
>>> solution:
>>> 1) let's say our limit is 15MB; files up to this limit are worth keeping
>>> in HBase
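>>> 
>>> e.g. the write for option 1 could be as simple as (untested sketch; the
>>> table name "files" and family "f" are placeholders, and fileName /
>>> fileBytes hold the file's name and contents):
>>> 
>>>     HTable table = new HTable(HBaseConfiguration.create(), "files");
>>>     Put put = new Put(Bytes.toBytes(fileName));   // row key = file name
>>>     put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), fileBytes);
>>>     table.put(put);
>>>     table.close();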
>>> 
>>> 
>>> Is this an appropriate way to solve the problem? Or should I use
>>> http://www.lilyproject.org/ instead?
>>> 
>>> 
