What makes you say this? HBase has a lot of very short-lived garbage (like KeyValue objects that do not outlive an RPC request) and a lot of long-lived data in the memstore and the block cache. We want to avoid accumulating the short-lived garbage and at the same time leave most of the heap for the memstores and block cache. A small eden size of 512 MB or even less makes sense to me.

-- Lars
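For reference, a region-server GC setup along those lines is typically applied through HBASE_REGIONSERVER_OPTS in hbase-env.sh. The sketch below is only illustrative (it assumes the 8 GB heap discussed in this thread and the CMS collector commonly used with HBase 0.94); it is not taken from this cluster's configuration:

# hbase-env.sh (illustrative values only; tune to your own heap and workload)
# Keep the new generation small so short-lived per-RPC garbage dies young,
# and use CMS for the large old generation that holds the memstores and block cache.
export HBASE_REGIONSERVER_OPTS="-Xms8g -Xmx8g -Xmn512m \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"

The -Xmn512m here plays the same role as the -XX:NewSize=512m / -XX:MaxNewSize=512m pair quoted below.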
----- Original Message -----
From: Azuryy Yu <[email protected]>
To: [email protected]
Cc:
Sent: Tuesday, April 22, 2014 12:02 AM
Subject: Re: is my hbase cluster overloaded?

Do you still have the same issue?

Also, regarding -Xmx8000m -server -XX:NewSize=512m -XX:MaxNewSize=512m: the Eden size is too small.

On Tue, Apr 22, 2014 at 2:55 PM, Li Li <[email protected]> wrote:
> <property>
>   <name>dfs.datanode.handler.count</name>
>   <value>100</value>
>   <description>The number of server threads for the datanode.</description>
> </property>
>
> 1. namenode/master 192.168.10.48
> http://pastebin.com/7M0zzAAc
>
> $ free -m (these are the values now, after restarting hadoop and hbase, not the values when it crashed)
>                      total       used       free     shared    buffers     cached
> Mem:                 15951       3819      12131          0        509       1990
> -/+ buffers/cache:                1319      14631
> Swap:                 8191          0       8191
>
> 2. datanode/region 192.168.10.45
> http://pastebin.com/FiAw1yju
>
> $ free -m
>                      total       used       free     shared    buffers     cached
> Mem:                 15951       3627      12324          0       1516        641
> -/+ buffers/cache:                1469      14482
> Swap:                 8191          8       8183
>
> On Tue, Apr 22, 2014 at 2:29 PM, Azuryy Yu <[email protected]> wrote:
> > One big possible issue is a high concurrent request load on HDFS or
> > HBase: all datanode handlers become busy, more requests queue up, and
> > they eventually time out. You can try to increase
> > dfs.datanode.handler.count and dfs.namenode.handler.count in
> > hdfs-site.xml, then restart HDFS.
> >
> > Also, what are the JVM options of the datanodes, the namenode, and the
> > region servers? If they are all at the defaults, that can also cause
> > this issue.
> >
> > On Tue, Apr 22, 2014 at 2:20 PM, Li Li <[email protected]> wrote:
> >
> >> my cluster setup: all 6 machines are virtual machines. each machine:
> >> 4 CPU Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 16GB memory
> >> 192.168.10.48 namenode/jobtracker
> >> 192.168.10.47 secondary namenode
> >> 192.168.10.45 datanode/tasktracker
> >> 192.168.10.46 datanode/tasktracker
> >> 192.168.10.49 datanode/tasktracker
> >> 192.168.10.50 datanode/tasktracker
> >>
> >> hdfs logs around 20:33
> >> 192.168.10.48 namenode log http://pastebin.com/rwgmPEXR
> >> 192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (I found this datanode crashed first)
> >> 192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
> >> 192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
> >> 192.168.10.50 datanode log http://pastebin.com/V7C6tjpB
> >>
> >> hbase logs around 20:33
> >> 192.168.10.48 master log http://pastebin.com/2ZfeYA1p
> >> 192.168.10.45 region log http://pastebin.com/idCF2a7Y
> >> 192.168.10.46 region log http://pastebin.com/WEh4dA0f
> >> 192.168.10.49 region log http://pastebin.com/cGtpbTLz
> >> 192.168.10.50 region log http://pastebin.com/bD6h5T6p (very strange: no log at 20:33, but there are logs at 20:32 and 20:34)
> >>
> >> On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <[email protected]> wrote:
> >> > Can you post more of the data node log, around 20:33?
> >> >
> >> > Cheers
> >> >
> >> > On Mon, Apr 21, 2014 at 8:57 PM, Li Li <[email protected]> wrote:
> >> >
> >> >> hadoop 1.0
> >> >> hbase 0.94.11
> >> >>
> >> >> datanode log from 192.168.10.45. Why did it shut itself down?
> >> >>
> >> >> 2014-04-21 20:33:59,309 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7969006819959471805_202154 received exception java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> >> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >> >>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> >> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >> >>         at java.io.DataInputStream.read(DataInputStream.java:149)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >> >>         at java.lang.Thread.run(Thread.java:722)
> >> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout left.
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
> >> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> >> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >> >>         at java.lang.Thread.run(Thread.java:722)
> >> >> 2014-04-21 20:34:00,291 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
> >> >> 2014-04-21 20:34:00,404 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
> >> >> 2014-04-21 20:34:00,405 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
> >> >> 2014-04-21 20:34:00,413 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> >> >> 2014-04-21 20:34:00,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> >> >> /************************************************************
> >> >> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
> >> >> ************************************************************/
> >> >>
> >> >> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <[email protected]> wrote:
> >> >> > bq. one datanode failed
> >> >> >
> >> >> > Was the crash due to an out-of-memory error?
> >> >> > Can you post the tail of the data node log on pastebin?
> >> >> >
> >> >> > Giving us the versions of hadoop and hbase would be helpful.
> >> >> >
> >> >> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <[email protected]> wrote:
> >> >> >
> >> >> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode, and 4 datanodes.
> >> >> >> The hbase master is on the same machine as the namenode, and the 4 hbase slaves are on the datanode machines.
> >> >> >> I see about 10,000 requests per second on average, and the cluster crashed. The reason I found is that one datanode failed.
> >> >> >>
> >> >> >> Each datanode has about 4 cpu cores and 10GB memory.
> >> >> >> Is my cluster overloaded?
> >> >> >>
> >> >>
> >> >
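For reference, the dfs.namenode.handler.count setting Azuryy suggests raising above is configured the same way as the dfs.datanode.handler.count property already shown in the quoted hdfs-site.xml; the value below is only an example (the Hadoop 1.x default is 10) and should be sized to the actual request load:

<!-- hdfs-site.xml: illustrative value only, not a recommendation for this cluster -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
  <description>Number of RPC handler threads for the namenode.</description>
</property>

As noted in the thread, HDFS needs to be restarted for the change to take effect.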
