I don't see any bounding in the thrift code. Asking Bryan...
St.Ack
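Stock Thrift's TThreadPoolServer does accept a worker-thread cap, so the question is whether the HBase thrift wrapper passes one through. A minimal sketch of a bounded server, assuming a Thrift Java release whose TThreadPoolServer.Args builder exposes minWorkerThreads/maxWorkerThreads; the processor wiring is illustrative, not HBase's actual startup code:

    import org.apache.thrift.TProcessor;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.server.TThreadPoolServer;
    import org.apache.thrift.transport.TServerSocket;

    public class BoundedThriftServer {
      // Caller supplies the generated service processor (e.g. Hbase.Processor);
      // building the handler is elided because this is only a sketch.
      public static void serve(TProcessor processor) throws Exception {
        TServerSocket socket = new TServerSocket(9090);
        TThreadPoolServer.Args args = new TThreadPoolServer.Args(socket)
            .processor(processor)
            .protocolFactory(new TBinaryProtocol.Factory())
            .minWorkerThreads(16)     // small core pool kept warm
            .maxWorkerThreads(1000);  // hard cap on concurrent workers
        // With the cap in place, excess connections wait for a free worker
        // instead of each spawning a fresh native thread.
        new TThreadPoolServer(args).serve();
      }
    }

With a cap like this, a regionserver restart shows up as queued/slow thrift calls rather than as 20000 threads' worth of stacks eating the box's native memory.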
On Sat, Mar 12, 2011 at 10:04 AM, Jack Levin <[email protected]> wrote:
> So our problem is this: when we restart a region server, or it goes
> down, hbase slows down, and while we send super-high-frequency thrift
> calls from our PHP front-end app, we actually spawn 20000+ threads on
> thrift. What this does is destroy all memory on the boxes, and it
> causes DNs just to shut down, and everything else to crash.
>
> Is there a way to put a thread limiter on thrift? Maybe 1000 threads MAX?
>
> -Jack
>
> On Sat, Mar 12, 2011 at 3:31 AM, Suraj Varma <[email protected]> wrote:
>
>> >> to:java.lang.OutOfMemoryError: unable to create new native thread
>>
>> This indicates that you are oversubscribed on your RAM to the extent
>> that the JVM doesn't have any space to create native threads (which
>> are allocated outside of the JVM heap).
>>
>> You may actually have to _reduce_ your heap sizes to allow more space
>> for native threads (do an inventory of all the JVM heaps and don't let
>> the total go over about 75% of available RAM).
>> Another option is to use the -Xss stack size JVM arg to reduce the
>> per-thread stack size - set it to 512k or 256k (you may have to
>> experiment/perf test a bit to see what the optimum size is).
>> Or ... get more RAM ...
>>
>> --Suraj
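To put numbers on Suraj's point: on 64-bit Linux the default per-thread stack is roughly 1 MB, so 20000+ thrift threads can claim on the order of 20 GB of native memory outside any heap, which is exactly "unable to create new native thread" territory. A sketch of where such flags would go; the values are placeholders to be perf-tested, per the advice above:

    # hbase-env.sh -- illustrative values only; benchmark before adopting.
    # A smaller per-thread stack leaves more headroom for native threads:
    export HBASE_OPTS="$HBASE_OPTS -Xss256k"

    # hadoop-env.sh -- cap each daemon's heap so that the heaps of all
    # JVMs on the box stay under ~75% of physical RAM:
    export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx1g"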
>> On Fri, Mar 11, 2011 at 8:11 PM, Jack Levin <[email protected]> wrote:
>> > I am noticing the following errors also:
>> >
>> > 2011-03-11 17:52:00,376 ERROR
>> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> > 10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
>> > infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
>> > to:java.lang.OutOfMemoryError: unable to create new native thread
>> >         at java.lang.Thread.start0(Native Method)
>> >         at java.lang.Thread.start(Thread.java:597)
>> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
>> >         at java.lang.Thread.run(Thread.java:619)
>> >
>> > and this:
>> >
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > nf_conntrack: table full, dropping packet.
>> > net_ratelimit: 10 callbacks suppressed
>> > nf_conntrack: table full, dropping packet.
>> > possible SYN flooding on port 9090. Sending cookies.
>> >
>> > This seems like a network stack issue?
>> >
>> > So, does the datanode need a higher heap than 1GB? Or did we possibly
>> > run out of RAM for other reasons?
>> >
>> > -Jack
>> >
>> > On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> Looks like a datanode went down. InterruptedException is how Java
>> >> interrupts IO in threads; it's similar to the EINTR errno. That
>> >> means the actual source of the abort is higher up...
>> >>
>> >> So back to how InterruptedException works... at some point a thread in
>> >> the JVM decides that the VM should abort. So it calls
>> >> thread.interrupt() on all the threads it knows/cares about to
>> >> interrupt their IO. That is what you are seeing in the logs. The root
>> >> cause lies above, I think.
>> >>
>> >> Look for the first "Exception" string or any FATAL or ERROR strings in
>> >> the datanode logfiles.
>> >>
>> >> -ryan
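To make Ryan's pointer concrete, and to follow up Jack's nf_conntrack lines, one possible triage sequence on an affected box (the log path is illustrative, and the conntrack sysctl names vary by kernel/distro):

    # 1. Find the first fatal entry in the datanode log, per Ryan's advice:
    grep -nE 'FATAL|ERROR|Exception' /var/log/hadoop/*datanode*.log | head

    # 2. "nf_conntrack: table full, dropping packet" means the kernel's
    #    connection-tracking table overflowed (20000+ thrift connections
    #    would do it). Compare current usage against the ceiling:
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max

    # 3. Raise the ceiling (the value here is a guess -- size it to RAM),
    #    or unload the conntrack modules if the box does no stateful
    #    firewalling:
    sysctl -w net.netfilter.nf_conntrack_max=262144

So the nf_conntrack and SYN-cookie messages point to a connection flood hitting the network stack rather than to the datanode heap as such.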
>> >> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <[email protected]> wrote:
>> >> > http://pastebin.com/ZmsyvcVc  Here is the regionserver log; they
>> >> > all have similar stuff.
>> >> >
>> >> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <[email protected]> wrote:
>> >> >
>> >> >> What's in the regionserver logs? Please put up regionserver and
>> >> >> datanode excerpts.
>> >> >> Thanks Jack,
>> >> >> St.Ack
>> >> >>
>> >> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <[email protected]> wrote:
>> >> >> > All was well, until this happened:
>> >> >> >
>> >> >> > http://pastebin.com/iM1niwrS
>> >> >> >
>> >> >> > and all regionservers went down. Is this the xciever issue?
>> >> >> >
>> >> >> > <property>
>> >> >> >   <name>dfs.datanode.max.xcievers</name>
>> >> >> >   <value>12047</value>
>> >> >> > </property>
>> >> >> >
>> >> >> > This is what I have; should I set it higher?
>> >> >> >
>> >> >> > -Jack
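On the xcievers question at the root of the thread: each DataXceiver is a native thread, so dfs.datanode.max.xcievers draws on the same native-thread budget Suraj describes, and the OS per-user limits have to keep pace with it. An illustrative /etc/security/limits.conf excerpt; the "hadoop" user name and the values are placeholders:

    # /etc/security/limits.conf -- illustrative; match the values to your
    # cluster sizing. nofile bounds open descriptors (sockets, block files);
    # nproc bounds processes *and* threads, and too low an nproc also
    # produces "unable to create new native thread" regardless of free RAM.
    hadoop  -  nofile  32768
    hadoop  -  nproc   32768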