On Sun, Nov 21, 2010 at 5:53 AM, Oleg Ruchovets <[email protected]> wrote:
> Hi all,
> After testing HBase for a few months with very light configurations (5
> machines, 2 TB disk, 8 GB RAM), we are now planning for production.
> Our Load -
> 1) 50GB log files to process per day by Map/Reduce jobs.
> 2) Insert 4-5GB to 3 tables in hbase.
>

Are these insertions the output of the MR jobs? If so, I would strongly
recommend the bulk load functionality. It is somewhere between 10x and 100x
more efficient than direct API usage.

> 3) Run 10-20 scans per day (scanning about 20 regions in a table).
> All this should run in parallel.
> Our current configuration can't cope with this load and we are having many
> stability issues.
>
> This is what we have in mind:
> 1. Master machine - 32 GB, 4 TB, Two quad core CPUs.
> 2. Name node - 16 GB, 2TB, Two quad core CPUs.
> we plan to have up to 20 name servers (starting with 5).
>
> We already read
> http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
>
> We would appreciate your feedback on our proposed configuration.
>
> Regards Oleg & Lior
>

--
Todd Lipcon
Software Engineer, Cloudera
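
For a rough sense of scale, the daily volumes quoted in the thread can be turned into average sustained rates. The sketch below is only a back-of-envelope calculation: it assumes the load is spread evenly over 24 hours and that 1 GB = 1024 MB, neither of which is stated in the thread. Real MapReduce jobs and scans are bursty, so peak rates will be much higher than these averages. The function name `average_mb_per_second` is made up for illustration.

```python
# Back-of-envelope average rates for the daily volumes quoted in the thread.
# Assumptions (not from the thread): load is spread evenly over the day,
# and 1 GB = 1024 MB.

SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_GB = 1024


def average_mb_per_second(gb_per_day: float) -> float:
    """Convert a daily volume in GB to an average rate in MB/s."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


if __name__ == "__main__":
    # 50 GB of logs per day processed by Map/Reduce jobs
    print(f"log processing: {average_mb_per_second(50):.3f} MB/s on average")
    # 4-5 GB per day inserted into 3 HBase tables
    print(f"inserts (low):  {average_mb_per_second(4):.3f} MB/s on average")
    print(f"inserts (high): {average_mb_per_second(5):.3f} MB/s on average")
```

These figures say nothing about peak load or about how many jobs run in parallel; they only put the quoted daily totals into per-second terms.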
