Just a couple of things... MapR doesn't have the NameNode limitations, so if your design requires lots of small files, look at MapR...
You could store your large blobs in a sequence file (or a series of sequence files) and use HBase to store the index. Sort of a hybrid approach. (A rough sketch of this idea follows after the quoted thread.)

On Mar 4, 2012, at 11:13 PM, "Rohit Kelkar" <[email protected]> wrote:

> Jacques, I agree that storing files (let's say greater than 15MB) would
> make the NameNode run out of space. But what if I make my
> blocksize=15MB?
> I am having the same issue that Konrad mentioned, and I have used
> exactly approach number 2 that he described.
>
> - Rohit Kelkar
>
> On Mon, Mar 5, 2012 at 6:59 AM, Jacques <[email protected]> wrote:
>>
>>>> 2) files bigger than 15MB are stored in HDFS and HBase keeps only some
>>>> information about where the file is placed
>>
>> You're likely to hit the NameNode's file-count limit if you go with this
>> solution as-is. If you follow this path, you would probably need to
>> either regularly combine files into a Hadoop Archive (HAR) file or
>> something similar, or go with another backing filesystem (e.g. MapR)
>> that can support hundreds of millions of files natively...
>>
>> On Sun, Mar 4, 2012 at 4:12 PM, Konrad Tendera <[email protected]> wrote:
>>
>>> So, what should I use instead of HBase? I'm wondering about the
>>> following solution:
>>> 1) let's say our limit is 15MB - files up to this limit are worth
>>> keeping in HBase
>>> 2) files bigger than 15MB are stored in HDFS and HBase keeps only some
>>> information about where the file is placed
>>>
>>> Is this an appropriate way to solve the problem? Or maybe I should use
>>> http://www.lilyproject.org/ ?
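A minimal sketch of the hybrid approach suggested at the top of this message, written in Java against the Hadoop/HBase APIs of that era: blobs over the 15MB cutoff are appended to an HDFS SequenceFile, and HBase stores only a small pointer (sequence file path plus byte offset) per document; smaller files go straight into HBase. The table name `blob_index`, column family `f`, and file path are hypothetical, and the offset-based lookup assumes an uncompressed or record-compressed sequence file, where the position returned by `Writer.getLength()` before an append is a valid seek target for `Reader.seek()`.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class HybridBlobStore {
        private static final long SMALL_FILE_LIMIT = 15L * 1024 * 1024; // 15MB cutoff

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            FileSystem fs = FileSystem.get(conf);
            Path seqPath = new Path("/blobs/blobs-00001.seq"); // hypothetical location

            HTable index = new HTable(conf, "blob_index");     // hypothetical table
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, seqPath, Text.class, BytesWritable.class);
            try {
                byte[] blob = new byte[32 * 1024 * 1024];      // stand-in for a real 32MB file
                String docId = "doc-42";

                if (blob.length <= SMALL_FILE_LIMIT) {
                    // Small file: store the bytes directly in an HBase cell.
                    Put p = new Put(Bytes.toBytes(docId));
                    p.add(Bytes.toBytes("f"), Bytes.toBytes("data"), blob);
                    index.put(p);
                } else {
                    // Large file: append to the sequence file and record only
                    // a pointer (path + record start offset) in HBase.
                    long offset = writer.getLength(); // position where this record begins
                    writer.append(new Text(docId), new BytesWritable(blob));

                    Put p = new Put(Bytes.toBytes(docId));
                    p.add(Bytes.toBytes("f"), Bytes.toBytes("seqfile"),
                          Bytes.toBytes(seqPath.toString()));
                    p.add(Bytes.toBytes("f"), Bytes.toBytes("offset"),
                          Bytes.toBytes(offset));
                    index.put(p);
                }
            } finally {
                writer.close();
                index.close();
            }
        }
    }

On read, you would fetch the pointer from HBase, open a SequenceFile.Reader, call seek(offset), and then next(key, value) to pull the blob back out. Rolling to a fresh sequence file every few GB keeps each HDFS file large and the NameNode's file count low, which addresses the same concern as the HAR suggestion in the quoted reply.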
