I agree with John Vines.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Wed, Feb 27, 2013 at 12:32 PM, John Vines <[email protected]> wrote:

> I don't like the idea of blending manual logging with log4j in a single file. It's in the .err file already; I don't think anything else is necessary.
>
> On Wed, Feb 27, 2013 at 3:27 PM, Adam Fuchs <[email protected]> wrote:
>>
>> So, question for the community: inside bin/accumulo we have:
>>
>>   -XX:OnOutOfMemoryError="kill -9 %p"
>>
>> Should this also append a log message? Something like:
>>
>>   -XX:OnOutOfMemoryError="kill -9 %p; echo "ran out of memory >> logfilename"
>>
>> Is this necessary, or should the OutOfMemoryException still find its way to the regular log?
>>
>> Adam
>>
>> On Wed, Feb 27, 2013 at 3:17 PM, Mike Hugo <[email protected]> wrote:
>>>
>>> I'm chalking this up to a misconfigured server. It looks like during the install on this server the accumulo-env.sh file was copied from the examples, but rather than editing it to set JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME, the entire file contents were replaced with just those env variables.
>>>
>>> I'm assuming this caused us to pick up the default (?) _OPTS settings rather than the correct ones, sized for our server's memory capacity, that we should have been getting from the examples. So we had a bunch of Accumulo-related java processes all running with memory settings that were way out of whack from what they should have been.
>>>
>>> To solve it I copied in the files from the conf/examples directory again, made sure everything was set up correctly, and restarted everything.
>>>
>>> We never did see anything in our log files or .out / .err logs indicating the source of the problem, but the above is my best guess as to what was going on.
>>>
>>> Thanks again for all the tips and pointers!
>>>
>>> Mike
>>>
>>> On Wed, Feb 27, 2013 at 11:24 AM, Adam Fuchs <[email protected]> wrote:
>>>>
>>>> There are a few primary reasons why your tablet server would die:
>>>>
>>>> 1. Lost lock in ZooKeeper. If the tablet server and ZooKeeper can't communicate with each other then the lock will time out and the tablet server will kill itself. This should show up as several messages in the tserver log. If this happens when a tablet server is really busy (lots of threads doing stuff) then the log message about the lost lock can be pretty far back in the queue. Java garbage collection can cause long pauses that inhibit the tserver/ZooKeeper messages. ZooKeeper can also get overwhelmed and behave poorly if the server it's running on swaps it out.
>>>>
>>>> 2. Problems talking with the master. If a tablet server is too slow in communicating with the master then the master will try to kill it. This should show up in the master log, and will also be noted in the tserver log.
>>>>
>>>> 3. Out of memory. If the tserver JVM runs out of memory it will terminate. As John mentioned, this will be in the .err or .out files in the log directory.
>>>>
>>>> Adam
>>>>
>>>> On Wed, Feb 27, 2013 at 12:10 PM, Mike Hugo <[email protected]> wrote:
>>>>>
>>>>> After running an ingest process via map reduce for about an hour or so, one of our tservers fails. It happens pretty consistently, and we're able to replicate it without too much difficulty.
>>>>>
>>>>> I'm looking in the $ACCUMULO_HOME/logs directory for clues as to why the tserver fails, but I'm not seeing much that points to a cause of the tserver going offline. One minute it's there, the next it's offline. There are some warnings about swappiness, as well as a large row that cannot be split, but other than that, not much else to go on.
>>>>>
>>>>> Is there anything that could help me figure out *why* the tserver died? I'm guessing it's something in our client code or a config that's not correct on the server, but it'd be really nice to have a hint before we start randomly changing things to see what will fix it.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mike
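
A rough sketch of what Adam's OnOutOfMemoryError suggestion could look like with the shell quoting sorted out. This is not the actual line from bin/accumulo; LOGFILE is a placeholder for whatever per-process .out/.err file the start scripts use, and the final java invocation only demonstrates that the flag parses.

    # Minimal sketch, assuming a POSIX shell. LOGFILE is a placeholder path.
    LOGFILE="${ACCUMULO_LOG_DIR:-.}/tserver_oom.log"

    # Single-quote the message inside the command so the outer double quotes
    # survive; the nested double quotes in the original suggestion would split
    # the option apart at "ran".
    OOM_OPT="-XX:OnOutOfMemoryError=kill -9 %p; echo 'ran out of memory' >> $LOGFILE"

    # Quick check that the JVM accepts the option (the command itself only runs
    # on an actual OutOfMemoryError):
    java "$OOM_OPT" -version

Whether the echo and redirection actually work depends on the JVM handing the command string to a shell; on Linux HotSpot that appears to be the case, but it is worth verifying before relying on it.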
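
To make Mike's misconfiguration concrete: if accumulo-env.sh only exports the three *_HOME variables, the per-process memory settings from the examples never get set and every daemon starts with whatever the JVM defaults to. A sketch of the difference follows; the variable names are as I recall them from the conf/examples templates of that era, the paths are made up, and the heap sizes are purely illustrative, so confirm all of it against your own copy.

    # What the truncated file apparently contained -- just the homes
    # (paths invented for illustration):
    export JAVA_HOME=/usr/lib/jvm/java
    export HADOOP_HOME=/usr/lib/hadoop
    export ZOOKEEPER_HOME=/usr/lib/zookeeper

    # What the example templates additionally set -- explicit heap sizes per
    # process, chosen to match the server's memory (sizes illustrative only):
    test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="-Xmx1g -Xms1g"
    test -z "$ACCUMULO_MASTER_OPTS"  && export ACCUMULO_MASTER_OPTS="-Xmx1g -Xms1g"
    test -z "$ACCUMULO_GC_OPTS"      && export ACCUMULO_GC_OPTS="-Xmx256m -Xms256m"
    test -z "$ACCUMULO_OTHER_OPTS"   && export ACCUMULO_OTHER_OPTS="-Xmx1g -Xms256m"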
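
For the next time a tserver disappears, a quick triage pass over the logs along the lines of Adam's three causes might look like the following. The directory and file name patterns assume a default layout, and the grep patterns are educated guesses rather than exact message strings, so adjust them to what your version actually emits.

    # Run on the node whose tserver died; $ACCUMULO_HOME/logs is the default
    # log directory.
    cd "$ACCUMULO_HOME/logs"

    # 1. Lost ZooKeeper lock -- session/lock messages in the tserver logs:
    grep -iE 'lost.*lock|session expired|zookeeper' tserver*.log | tail -n 20

    # 2. Killed by, or trouble talking to, the master -- check here and also
    #    the master log on the master node:
    grep -iE 'fatal|halt|master' tserver*.log | tail -n 20

    # 3. Out of memory -- the JVM writes this to the .err/.out files rather
    #    than through log4j:
    grep -l 'OutOfMemoryError' *.err *.out 2>/dev/null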
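
On the swappiness warning from Mike's original message: a swapped-out tserver or ZooKeeper is one of the ways the lost-lock problem above starts, so the warning is worth acting on. Checking and lowering vm.swappiness is a one-liner; 0 is the value the older Accumulo docs recommended, but confirm the current guidance for your version.

    # Current value:
    cat /proc/sys/vm/swappiness

    # Lower it for the running kernel:
    sudo sysctl -w vm.swappiness=0

    # Persist the setting across reboots:
    echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf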
