Thank you for the suggestions.

So I changed the setup and now have:
1 Master running Namenode, SecondaryNamenode, ZK and the HMaster
7 Slaves running Datanode and Regionserver
2 Clients to insert data


What I forgot to mention in my first post is that the clients sometimes even
get a SocketTimeoutException when inserting the data (of course, during that
time 0 inserts are done).
Looking at the logs (I also turned on the gc logs), I see the following:

Multiple consecutive entries like:
2012-06-21 11:42:13,962 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 6 on 60020' on region
usertable,user600,1340200683555.a45b03dd65a62afa676488921e47dbaa.: memstore
size 1.0g is >= than blocking 1.0g size

Shortly after those entries, many entries like:
2012-06-21 12:43:53,028 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":35046,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2642a14d),
rpc version=1, client version=29, methodsFingerPrint=-1508511443","client":"
10.110.129.12:54624
","starttimems":1340275397981,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}

Looking at the gc-logs, many entries like:
2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs]
4184711K->4079843K(12569856K), 0.0183510 secs] [Times: user=0.24 sys=0.00,
real=0.01 secs]

But those are always around 0.01 secs - 0.04 secs.

And also from the gc-log:
2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] [Times: user=46.93
sys=2.24, real=10.45 secs]

Is the 10.45 secs too long?
Or what exactly should I watch out for in the gc logs?
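
For completeness, this is roughly what I have in hbase-env.sh on the
regionservers to get those gc logs (the log path and the
CMSInitiatingOccupancyFraction value are just from my current experiments,
I'm not sure they are right for my heap):

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms12g -Xmx12g \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"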


I also configured ganglia to have a look at some more metrics. Looking at
io_wait (which should matter concerning my question about the disks), I
observe values between 10% and 25% on the regionservers.
Should that be lower?
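
To double-check the ganglia numbers, I'm also planning to watch the disks
directly on one of the regionservers while the inserts stall, with something
like:

iostat -x 5

and keep an eye on await and %util there, since I assume that is the most
direct way to see whether the single RAID1 pair is actually saturated during
the flushes.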

Btw. I'm using HBase 0.94 and Hadoop 1.0.3.


Thank you again.


Martin



On Wed, Jun 20, 2012 at 7:04 PM, Dave Wang <[email protected]> wrote:

> I'd also remove the DN and RS from the node running ZK, NN, etc. as you
> don't want heavyweight processes on that node.
>
> - Dave
>
> On Wed, Jun 20, 2012 at 9:31 AM, Elliott Clark <[email protected]
> >wrote:
>
> > Basically without metrics on what's going on it's tough to know for sure.
> >
> > I would turn on GC logging and make sure that is not playing a part, get
> > metrics on IO while this is going on, and look through the logs to see
> what
> > is happening when you notice the pause.
> >
> > On Wed, Jun 20, 2012 at 6:39 AM, Martin Alig <[email protected]>
> > wrote:
> >
> > > Hi
> > >
> > > I'm doing some evaluations with HBase. The workload I'm facing is
> mainly
> > > insert-only.
> > > Currently I'm inserting 1KB rows, where 100Bytes go into one column.
> > >
> > > I have the following cluster machines at disposal:
> > >
> > > Intel Xeon L5520 2.26 Ghz (Nehalem, with HT enabled)
> > > 24 GiB Memory
> > > 1 GigE
> > > 2x 15k RPM Sas 73 GB (RAID1)
> > >
> > > I have 10 Nodes.
> > > The first node runs:
> > >
> > > Namenode, SecondaryNamenode, Datanode, HMaster, Zookeeper, and a
> > > RegionServer
> > >
> > > The other nodes run:
> > >
> > > Datanode and RegionServer
> > >
> > >
> > > Now running my test client and inserting rows, the throughput goes up
> to
> > > 150'000 inserts/sec. But then after some time the throughput drops down
> > to
> > > 0 inserts/sec for quite some time, before it goes up again.
> > > My assumption is, that it happens when the RegionServers start to write
> > the
> > > data from memory to the disks. I know, that the recommended hardware
> for
> > > HBase should contain multiple disks using JBOD or RAID 0.
> > > But at that point I am limited right now.
> > >
> > > I am just asking if in my hardware setup, the blocking periods are
> really
> > > caused by the non-optimal disk configuration.
> > >
> > >
> > > Thank you in advance for any suggestions.
> > >
> > >
> > > Martin
> > >
> >
>
