Is there a limiting factor or setting that controls the bandwidth on HBase nodes? I know there is a value in zoo.cfg that can be raised to increase the number of incoming connections.
Though I am using a 15 Gigabit Ethernet card, I can see only 50-100 MB/s of transfer per node (from clients) via Ganglia.

Viv

On Thu, Mar 24, 2011 at 8:42 PM, Ted Dunning wrote:

> Something is just wrong. You should be able to do 17,000 records per second from a
> few nodes with multiple threads against a fairly small cluster. You should
> be able to come close to that from a single node into a dozen region
> servers.
>
> On Thu, Mar 24, 2011 at 5:32 PM, Vivek Krishna wrote:
>
>> I have a total of 10 client nodes with 3-10 threads running on each node.
>> Record size ~1K
>>
>> Viv
>>
>> On Thu, Mar 24, 2011 at 8:28 PM, Ted Dunning wrote:
>>
>>> Are you putting this data from a single host? Is your sender
>>> multi-threaded?
>>>
>>> I note that (20 GB / 20 minutes < 20 MB/s) so you aren't particularly
>>> stressing the network. You would likely be stressing a single-threaded
>>> client pretty severely.
>>>
>>> What is your record size? It may be that you are bound by the number
>>> of records being inserted rather than the total data size.
>>>
>>> On Thu, Mar 24, 2011 at 5:22 PM, Vivek Krishna wrote:
>>>
>>>> Data size: 20 GB. It took about an hour with default HBase settings, and
>>>> after varying several parameters we were able to get this done in ~20
>>>> minutes. This is still slow and we are trying to improve it.
>>>>
>>>> We wrote a Java client which essentially does `put`s to HBase tables in
>>>> batches. The parameters we tuned include:
>>>> 1. Disabling compaction
>>>> 2. Varying the put batch size (tried 1000, 5000, 10000, 20000, 40000)
>>>> 3. Setting AutoFlush on/off
>>>> 4. Varying the client write buffer (2 MB, 128 MB, 256 MB)
>>>> 5. Changing regionserver.handler.count to 100
>>>> 6. Varying regionserver size from 128 to 256/512/1024
>>>> 7. Increasing the number of regions
>>>> 8. Creating regions with pre-specified keys (so that clients hit the
>>>> regions directly)
>>>> 9. Varying the number of clients (from 30 to 100 clients)
>>>>
>>>> The above was tested on a 38-node cluster with 2 regions per node.
>>>>
>>>> We did not try disabling the WAL, for fear of data loss.
>>>>
>>>> Are there any other parameters that we missed?
>>>>
>>>> Viv
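Ted's back-of-the-envelope estimate can be reproduced with a few lines of Java. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, a ~1K record = 1000 bytes) and the figures reported in the thread: 20 GB loaded in about 20 minutes, plus the assumption that all 38 nodes run a region server. The class name `LoadMath` is hypothetical.

```java
// Hypothetical helper reproducing the arithmetic discussed in the thread.
// ASSUMPTIONS: decimal units (1 GB = 1e9 bytes), ~1000-byte records,
// and that all 38 cluster nodes act as region servers.
public class LoadMath {
    static final double BYTES_PER_GB = 1e9;
    static final double BYTES_PER_MB = 1e6;

    /** Aggregate throughput in MB/s for a data size (GB) loaded over a number of minutes. */
    static double megabytesPerSecond(double gigabytes, double minutes) {
        return gigabytes * BYTES_PER_GB / BYTES_PER_MB / (minutes * 60.0);
    }

    /** Records per second, given a fixed average record size in bytes. */
    static double recordsPerSecond(double gigabytes, double minutes, double recordBytes) {
        return gigabytes * BYTES_PER_GB / recordBytes / (minutes * 60.0);
    }

    public static void main(String[] args) {
        double mbps = megabytesPerSecond(20, 20);   // 20 GB in 20 minutes
        double rps = recordsPerSecond(20, 20, 1000); // ~1 KB records
        // Prints: aggregate: 16.7 MB/s, 16667 records/s
        System.out.printf("aggregate: %.1f MB/s, %.0f records/s%n", mbps, rps);
        // Prints: per region server (38): 0.44 MB/s
        System.out.printf("per region server (38): %.2f MB/s%n", mbps / 38);
    }
}
```

Under these assumptions the load works out to about 16,700 records per second in aggregate and under 0.5 MB/s per region server, which is consistent with Ted's point that the bottleneck is more likely the record rate than network bandwidth.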
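Item 8 (pre-splitting the table so that clients hit the regions directly) only helps if the split points match the row-key distribution. The sketch below is hypothetical: it assumes row keys begin with a fixed-width, lowercase hex prefix whose values are roughly uniform (for example, derived from a hash of the record id), which may not match the keys used in this thread. It computes evenly spaced split keys as a `byte[][]`; in the HBase Java client such an array can be passed to `HBaseAdmin.createTable(HTableDescriptor, byte[][])`, but the code itself does not call HBase, so check the client version in use.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for pre-splitting a table. ASSUMPTION: row keys start
// with a fixed-width, lowercase hex prefix whose values are roughly uniform
// (e.g. derived from a hash), so equal-width prefix ranges get similar load.
public class SplitKeys {

    /**
     * Returns numRegions - 1 split keys dividing the prefix space [0, 16^width)
     * into numRegions ranges of (nearly) equal size.
     */
    static byte[][] evenHexSplits(int numRegions, int width) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be between 1 and 8");
        }
        long space = 1L << (4 * width); // number of distinct prefixes, 16^width
        if (numRegions < 2 || numRegions > space) {
            throw new IllegalArgumentException("numRegions must be between 2 and 16^width");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Start of slice i out of numRegions; cannot overflow a long for width <= 8.
            long boundary = space * i / numRegions;
            // Zero-padded lowercase hex so byte-wise order equals numeric order.
            String prefix = String.format("%0" + width + "x", boundary);
            splits[i - 1] = prefix.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions over two-character prefixes: prints 40, 80, c0 (one per line).
        for (byte[] key : evenHexSplits(4, 2)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

Lowercase hex is used because HBase compares row keys byte by byte, and the ASCII codes of `0`-`9` sort before `a`-`f`, so lexicographic order matches numeric order. For the layout described above (38 nodes, 2 regions per node), `evenHexSplits(76, 4)` would return 75 split keys.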
