It's also possible that you're oversubscribing memory on the overall system between the tservers and the MR slots. Check your syslogs and see if there's anything about killing java processes.
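
If it helps, here is a rough sketch of that syslog check in Python. It assumes the kernel OOM killer logs lines of the usual "Out of memory: Kill process <pid> (<name>)" or "Killed process <pid> (<name>)" shape; the exact wording varies by kernel version, and the syslog path differs by distro (/var/log/messages and /var/log/syslog are common). The function names find_oom_kills and scan_file are just illustrative.

```python
import re

# Assumed shape of kernel OOM-killer messages; wording varies by kernel version:
#   "Out of memory: Kill process 1234 (java) score 812 or sacrifice child"
#   "Killed process 1234 (java) total-vm:..."
OOM_PATTERN = re.compile(
    r"(?:Out of memory: Kill(?:ed)? process|Killed process) (\d+) \((\S+?)\)"
)


def find_oom_kills(lines, process_name="java"):
    """Return (pid, line) pairs for OOM-killer messages naming process_name."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


def scan_file(path, process_name="java"):
    """Scan one syslog file; the path is whatever your distro uses."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle, process_name)
```

Calling scan_file("/var/log/messages") on each node and getting any hits back would suggest the kernel, not Accumulo itself, is taking the tservers down.
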

On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <[email protected]> wrote:

> I will play around with the memory settings some more, it sounds like that
> is definitely it. Thanks everyone!
>
>
> On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser <[email protected]> wrote:
>
>> The lack of exception in the debug log makes it seem even more likely
>> that you just got an OOME.
>>
>> It's a crap-shoot as to whether or not you'll actually get the Exception
>> printed in the log, but you should always get it in the .out/.err files as
>> previously mentioned.
>>
>>
>> On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>
>>> Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>>
>>> I will double check which example. Thanks.
>>>
>>>
>>> On Wed, Jun 25, 2014 at 2:38 PM, John Vines <[email protected]> wrote:
>>>
>>>     And you're certain you're using the standalone example and not the
>>>     native-standalone? Those expect the native libraries to be extant
>>>     and if not will eventually cause an OOM.
>>>
>>>
>>>     On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust <[email protected]> wrote:
>>>
>>>         Accumulo version 1.5.1.2.1.2.1-471
>>>         Hadoop version 2.4.0.2.1.2.1-471
>>>
>>>         tserver debug log http://pastebin.com/BHdTkxeK
>>>
>>>         I see what you mean about the memory. I am using the memory settings
>>>         from the example files
>>>         https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone.
>>>         I also ran into this problem using the 1GB example memory
>>>         settings. Each node has 4GB RAM.
>>>
>>>         Thanks
>>>
>>>
>>>         On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <[email protected]> wrote:
>>>
>>>             What version of Accumulo?
>>>
>>>             What version of Hadoop?
>>>
>>>             What does your server memory and per-role allocation look
>>>             like?
>>>
>>>             Can you paste the tserver debug log?
>>>
>>>
>>>             On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust <[email protected]> wrote:
>>>
>>>                 I am trying to create an inverted text index for a table
>>>                 using accumulo input/output format in a java
>>>                 mapreduce program. When the job reaches the reduce
>>>                 phase and creates the table / tries to write to it the
>>>                 tablet servers begin to die.
>>>
>>>                 Now when I do a start-all.sh the tablet servers start
>>>                 for about a minute and then die again. Any idea as to
>>>                 why the mapreduce job is killing the tablet servers
>>>                 and/or how to bring the tablet servers back up without
>>>                 failing?
>>>
>>>                 This is on a 12 node cluster with low quality hardware.
>>>                 The java code I am running is here
>>>                 http://pastebin.com/ti7Qz19m
>>>
>>>                 The log files on each tablet server only display the
>>>                 startup information, no errors. The log files on the
>>>                 master server show these errors
>>>                 http://pastebin.com/LymiTfB7
>>>
>>>
>>>                 --
>>>                 Jacob Rust
>>>                 Software Intern
>>>
>>>
>>>             --
>>>             Sean
>>>
>>>
>>>         --
>>>         Jacob Rust
>>>         Software Intern
>>>
>>>
>>> --
>>> Jacob Rust
>>> Software Intern
>>>
>>
>
>
> --
> Jacob Rust
> Software Intern
