Will do. I'll also see if I can reproduce it on an unclassified system so I can more easily share whatever turns up.

On Dec 16, 2013 10:24 AM, "Eric Newton" <[email protected]> wrote:
> I didn't leave it running.
>
> Since you have been able to reproduce the problem, could you dump the memory
> after a few days, and see if there's any obvious reasons why it would run
> out of memory? It just doesn't make any sense.
>
> -Eric
>
>
> On Fri, Dec 13, 2013 at 4:17 PM, Terry P. <[email protected]> wrote:
>
>> Hi Eric,
>> Did your standby GC eventually fail on you with an OOME?
>>
>> I was able to reproduce the standby GC failure after about 6 days running
>> in standby mode again on my Ops cluster after a cluster restart following
>> the Thanksgiving holiday week.
>>
>> Thanks,
>> Terry
>>
>>
>>
>> On Wed, Dec 4, 2013 at 4:33 PM, Terry P. <[email protected]> wrote:
>>
>>> Thanks for testing Eric. I have nproc set hard to 32000 on all my nodes,
>>> and just double checked it's correct on the Secondary Namenode where this
>>> is happening on. nofile is set hard to 64000 so it's not a files/sockets
>>> issue either.
>>>
>>> Please let me know if it eventually fails for you.
>>>
>>>
>>>
>>> On Wed, Dec 4, 2013 at 3:41 PM, Eric Newton <[email protected]> wrote:
>>>
>>>> I fired up a standby GC after I reduced the wait time between zookeeper
>>>> lock checks to 10ms, and changed the memory from 256M to 56M. It's been
>>>> running for the last hour. I didn't expect it to fail, but I wanted to
>>>> make sure it wasn't reproducible.
>>>>
>>>> It's possible that you are running up against an nproc limit. We've
>>>> seen out of memory issues when the JVM can't create a new thread.
>>>>
>>>> (I ran the test with 1.4.5-SNAPSHOT)
>>>>
>>>> -Eric
>>>>
>>>>
>>>> On Wed, Dec 4, 2013 at 2:31 PM, Eric Newton <[email protected]> wrote:
>>>>
>>>>> I don't know if anyone is running a standby GC. Can you go ahead and
>>>>> open a ticket?
>>>>>
>>>>> -Eric
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Dec 4, 2013 at 2:29 PM, Terry P. <[email protected]> wrote:
>>>>>
>>>>>> Greetings folks,
>>>>>> With Accumulo 1.4.2 I'm running Standby Master and GC processes on my
>>>>>> Secondary Namenode. I've found that my Standby GC gets terminated due to
>>>>>> OutOfMemoryError errors after about 6 days of running, even though it is
>>>>>> running in standby mode only. The Standby Master is still running fine
>>>>>> after 3 weeks of standby mode.
>>>>>>
>>>>>> My accumulo-env.sh script is using the 3GB environment default GC
>>>>>> memory options of -Xmx256m -Xms256m, same as on the Master where the
>>>>>> primary GC runs and has never gotten an OOME. At this point I don't see any
>>>>>> reason to try increasing that as my bet is it will only delay the OOME.
>>>>>>
>>>>>> Anyone else running standby GC and running into this (or working
>>>>>> fine)?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
