You can also calculate how much memory you need (or your cluster management software can do it for you).
Things to factor:

- OS needs (>= 2GB)
- DataNode
- TaskTracker (or NodeManager, depending on MRv1 vs. YARN)
- task memory (child slots * per-child max under MRv1)
- TServer Java heap
- TServer native map

Plus any other processes you regularly run on those nodes.
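As a rough sanity check, you can just total those pieces up. A minimal sketch for a hypothetical 4 GB node follows; every figure in it is an assumed example value, not a recommendation:

// Back-of-the-envelope memory budget for one hypothetical 4 GB worker node.
// All figures below are assumed example values, not recommendations.
public class MemoryBudget {
    public static void main(String[] args) {
        int osMb          = 2048; // OS floor (>= 2 GB)
        int dataNodeMb    = 1024; // HDFS DataNode heap
        int taskTrackerMb = 1024; // TaskTracker (or NodeManager under YARN)
        int childSlots    = 4;    // MRv1 map + reduce slots on the node
        int childMaxMb    = 512;  // per-child max heap under MRv1
        int tserverHeapMb = 1024; // TServer Java heap (-Xmx)
        int nativeMapMb   = 512;  // TServer native map

        int budgetedMb = osMb + dataNodeMb + taskTrackerMb
            + childSlots * childMaxMb + tserverHeapMb + nativeMapMb;
        int physicalMb = 4096;

        System.out.printf("budgeted %d MB against %d MB physical%n",
            budgetedMb, physicalMb);
        if (budgetedMb > physicalMb) {
            // 7680 MB > 4096 MB here: the kernel OOM killer will start
            // picking off Java processes, tservers included.
            System.out.println("oversubscribed");
        }
    }
}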
On Wed, Jun 25, 2014 at 2:07 PM, John Vines <[email protected]> wrote:

> It's also possible that you're oversubscribing your memory on the overall
> system between the tservers and the MR slots. Check your syslogs and see
> if there's anything about killing Java processes.
>
> On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <[email protected]> wrote:
>
>> I will play around with the memory settings some more; it sounds like
>> that is definitely it. Thanks everyone!
>>
>> On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser <[email protected]> wrote:
>>
>>> The lack of an exception in the debug log makes it seem even more
>>> likely that you just got an OOME.
>>>
>>> It's a crap-shoot as to whether or not you'll actually get the
>>> exception printed in the log, but you should always get it in the
>>> .out/.err files, as previously mentioned.
>>>
>>> On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>>
>>>> Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>>>
>>>> I will double-check which example. Thanks.
>>>>
>>>> On Wed, Jun 25, 2014 at 2:38 PM, John Vines <[email protected]> wrote:
>>>>
>>>>> And you're certain you're using the standalone example and not the
>>>>> native-standalone one? Those expect the native libraries to be
>>>>> present, and if they aren't it will eventually cause an OOM.
>>>>>
>>>>> On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust <[email protected]> wrote:
>>>>>
>>>>>> Accumulo version 1.5.1.2.1.2.1-471
>>>>>> Hadoop version 2.4.0.2.1.2.1-471
>>>>>>
>>>>>> tserver debug log: http://pastebin.com/BHdTkxeK
>>>>>>
>>>>>> I see what you mean about the memory. I am using the memory
>>>>>> settings from the example files at
>>>>>> https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone.
>>>>>> I also ran into this problem using the 1GB example memory settings.
>>>>>> Each node has 4GB RAM.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <[email protected]> wrote:
>>>>>>
>>>>>>> What version of Accumulo?
>>>>>>>
>>>>>>> What version of Hadoop?
>>>>>>>
>>>>>>> What do your server memory and per-role allocations look like?
>>>>>>>
>>>>>>> Can you paste the tserver debug log?
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am trying to create an inverted text index for a table using the
>>>>>>>> Accumulo input/output formats in a Java MapReduce program. When
>>>>>>>> the job reaches the reduce phase and creates the table / tries to
>>>>>>>> write to it, the tablet servers begin to die.
>>>>>>>>
>>>>>>>> Now when I run start-all.sh, the tablet servers start for about a
>>>>>>>> minute and then die again. Any idea why the MapReduce job is
>>>>>>>> killing the tablet servers, and/or how to bring the tablet servers
>>>>>>>> back up without them failing?
>>>>>>>>
>>>>>>>> This is on a 12-node cluster with low-quality hardware. The Java
>>>>>>>> code I am running is here: http://pastebin.com/ti7Qz19m
>>>>>>>>
>>>>>>>> The log files on each tablet server only display the startup
>>>>>>>> information, no errors. The log files on the master server show
>>>>>>>> these errors: http://pastebin.com/LymiTfB7
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jacob Rust
>>>>>>>> Software Intern

--
Sean
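For reference, the job wiring described in the original question looks roughly like the sketch below, assuming the Accumulo 1.5 mapreduce API. The class name, instance name, ZooKeeper hosts, credentials, and table name are placeholders, and the poster's actual code is in the pastebin link above, not here:

// Hypothetical wiring for a job that writes an inverted index back to
// Accumulo via AccumuloOutputFormat (Accumulo 1.5 mapreduce API).
// Instance name, ZooKeeper hosts, credentials, and table name are
// placeholders; the mapper/reducer classes are omitted.
import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class IndexJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "inverted-index");
        job.setJarByClass(IndexJobSetup.class);

        // Reducers emit (table name, Mutation); a null key falls back to
        // the default table set below.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Mutation.class);
        job.setOutputFormatClass(AccumuloOutputFormat.class);

        AccumuloOutputFormat.setConnectorInfo(job, "user",
            new PasswordToken("pass"));
        AccumuloOutputFormat.setZooKeeperInstance(job, "instanceName",
            "zkhost1:2181,zkhost2:2181");
        // Create the index table on first write, as the question describes.
        AccumuloOutputFormat.setCreateTables(job, true);
        AccumuloOutputFormat.setDefaultTableName(job, "invertedIndex");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}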
