I don't see any bounding in the thrift code. Asking Bryan...
St.Ack
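Stock Thrift's TThreadPoolServer does accept a worker-thread cap, so the question is whether the HBase thrift wrapper passes one through. A minimal sketch of a bounded server, assuming a Thrift Java release whose TThreadPoolServer.Args builder exposes minWorkerThreads/maxWorkerThreads; the processor wiring is illustrative, not HBase's actual startup code:

    import org.apache.thrift.TProcessor;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.server.TThreadPoolServer;
    import org.apache.thrift.transport.TServerSocket;

    public class BoundedThriftServer {
      // Caller supplies the generated service processor (e.g. Hbase.Processor);
      // building the handler is elided because this is only a sketch.
      public static void serve(TProcessor processor) throws Exception {
        TServerSocket socket = new TServerSocket(9090);
        TThreadPoolServer.Args args = new TThreadPoolServer.Args(socket)
            .processor(processor)
            .protocolFactory(new TBinaryProtocol.Factory())
            .minWorkerThreads(16)     // small core pool kept warm
            .maxWorkerThreads(1000);  // hard cap on concurrent workers
        // With the cap in place, excess connections wait for a free worker
        // instead of each spawning a fresh native thread.
        new TThreadPoolServer(args).serve();
      }
    }

With a cap like this, a regionserver restart shows up as queued/slow thrift calls rather than as 20000 threads' worth of stacks eating the box's native memory.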
On Sat, Mar 12, 2011 at 10:04 AM, Jack Levin <[email protected]> wrote:
> So our problem is this: when we restart a region server, or it goes
> down, hbase slows down, and while we send super-high-frequency thrift
> calls from our PHP front-end app, we actually spawn 20000+ threads on
> thrift. What this does is destroy all memory on the boxes, and it
> causes DNs just to shut down, and everything else to crash.
>
> Is there a way to put a thread limiter on thrift? Maybe 1000 threads MAX?
>
> -Jack
>
> On Sat, Mar 12, 2011 at 3:31 AM, Suraj Varma <[email protected]> wrote:
>
>> >> to:java.lang.OutOfMemoryError: unable to create new native thread
>>
>> This indicates that you are oversubscribed on your RAM to the extent
>> that the JVM doesn't have any space to create native threads (which
>> are allocated outside of the JVM heap).
>>
>> You may actually have to _reduce_ your heap sizes to allow more space
>> for native threads (do an inventory of all the JVM heaps and don't let
>> the total go over about 75% of available RAM).
>> Another option is to use the -Xss stack size JVM arg to reduce the
>> per-thread stack size - set it to 512k or 256k (you may have to
>> experiment/perf test a bit to see what the optimum size is).
>> Or ... get more RAM ...
>>
>> --Suraj
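To put numbers on Suraj's point: on 64-bit Linux the default per-thread stack is roughly 1 MB, so 20000+ thrift threads can claim on the order of 20 GB of native memory outside any heap, which is exactly "unable to create new native thread" territory. A sketch of where such flags would go; the values are placeholders to be perf-tested, per the advice above:

    # hbase-env.sh -- illustrative values only; benchmark before adopting.
    # A smaller per-thread stack leaves more headroom for native threads:
    export HBASE_OPTS="$HBASE_OPTS -Xss256k"

    # hadoop-env.sh -- cap each daemon's heap so that the heaps of all
    # JVMs on the box stay under ~75% of physical RAM:
    export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx1g"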
>> On Fri, Mar 11, 2011 at 8:11 PM, Jack Levin <[email protected]> wrote:
>> > I am noticing the following errors also:
>> >
>> > 2011-03-11 17:52:00,376 ERROR
>> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> > 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
>> > infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
>> > to:java.lang.OutOfMemoryError: unable to create new native thread
>> >         at java.lang.Thread.start0(Native Method)
>> >         at java.lang.Thread.start(Thread.java:597)
>> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
>> >         at java.lang.Thread.run(Thread.java:619)
>> >
>> > and this:
>> >
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > net_ratelimit: 10 callbacks suppressed
>> > nf_conntrack: table full, dropping packet.
>> > possible SYN flooding on port 9090. Sending cookies.
>> >
>> > This seems like a network stack issue?
>> >
>> > So, does the datanode need a higher heap than 1GB? Or did we possibly
>> > run out of RAM for other reasons?
>> >
>> > -Jack
>> >
>> > On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> Looks like a datanode went down. InterruptedException is how Java
>> >> interrupts IO in threads; it's similar to the EINTR errno. That
>> >> means the actual source of the abort is higher up...
>> >>
>> >> So back to how InterruptedException works... at some point a thread in
>> >> the JVM decides that the VM should abort. So it calls
>> >> thread.interrupt() on all the threads it knows/cares about to
>> >> interrupt their IO. That is what you are seeing in the logs. The root
>> >> cause lies above, I think.
>> >>
>> >> Look for the first "Exception" string or any FATAL or ERROR strings in
>> >> the datanode logfiles.
>> >>
>> >> -ryan
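To make Ryan's pointer concrete, and to follow up Jack's nf_conntrack lines, one possible triage sequence on an affected box (the log path is illustrative, and the conntrack sysctl names vary by kernel/distro):

    # 1. Find the first fatal entry in the datanode log, per Ryan's advice:
    grep -nE 'FATAL|ERROR|Exception' /var/log/hadoop/*datanode*.log | head

    # 2. "nf_conntrack: table full, dropping packet" means the kernel's
    #    connection-tracking table overflowed (20000+ thrift connections
    #    would do it). Compare current usage against the ceiling:
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max

    # 3. Raise the ceiling (the value here is a guess -- size it to RAM),
    #    or unload the conntrack modules if the box does no stateful
    #    firewalling:
    sysctl -w net.netfilter.nf_conntrack_max=262144

So the nf_conntrack and SYN-cookie messages point to a connection flood hitting the network stack rather than to the datanode heap as such.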
>> >> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <[email protected]> wrote:
>> >> > http://pastebin.com/ZmsyvcVc  Here is the regionserver log; they
>> >> > all have similar stuff.
>> >> >
>> >> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <[email protected]> wrote:
>> >> >
>> >> >> What's in the regionserver logs? Please put up regionserver and
>> >> >> datanode excerpts.
>> >> >> Thanks Jack,
>> >> >> St.Ack
>> >> >>
>> >> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <[email protected]> wrote:
>> >> >> > All was well, until this happened:
>> >> >> >
>> >> >> > http://pastebin.com/iM1niwrS
>> >> >> >
>> >> >> > and all regionservers went down. Is this the xciever issue?
>> >> >> >
>> >> >> > <property>
>> >> >> >   <name>dfs.datanode.max.xcievers</name>
>> >> >> >   <value>12047</value>
>> >> >> > </property>
>> >> >> >
>> >> >> > This is what I have; should I set it higher?
>> >> >> >
>> >> >> > -Jack
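On the xcievers question at the root of the thread: each DataXceiver is a native thread, so dfs.datanode.max.xcievers draws on the same native-thread budget Suraj describes, and the OS per-user limits have to keep pace with it. An illustrative /etc/security/limits.conf excerpt; the "hadoop" user name and the values are placeholders:

    # /etc/security/limits.conf -- illustrative; match the values to your
    # cluster sizing. nofile bounds open descriptors (sockets, block files);
    # nproc bounds processes *and* threads, and too low an nproc also
    # produces "unable to create new native thread" regardless of free RAM.
    hadoop  -  nofile  32768
    hadoop  -  nproc   32768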