If you only have 4G available, >=2G is probably a little excessive for the OS :)
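If you want to claw some of that back, the usual knobs are the -Xmx/-Xms
values in ACCUMULO_TSERVER_OPTS (accumulo-env.sh) and the map/cache sizes
in accumulo-site.xml. A minimal sketch with made-up numbers -- size these
to your own budget, they're not a recommendation:

  export ACCUMULO_TSERVER_OPTS="-Xmx256m -Xms256m"

  <property>
    <name>tserver.memory.maps.max</name>
    <value>128M</value>
  </property>
  <property>
    <name>tserver.cache.data.size</name>
    <value>16M</value>
  </property>
  <property>
    <name>tserver.cache.index.size</name>
    <value>32M</value>
  </property>

Keep in mind the native map is allocated outside the Java heap, so
tserver.memory.maps.max is in addition to -Xmx, not inside it.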

On 6/25/14, 3:30 PM, Sean Busbey wrote:
you can also calculate how much memory you need to have (or your cluster
management software can do it for you).

Things to factor:

OS needs (>= 2GB)
DataNode
TaskTracker (or NodeManager depending on MRv1 vs YARN)
task memory (child slots * per-child max under MRv1)
TServer Java Heap
TServer native map

Plus any other processes you regularly run on those nodes.
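As a rough worked example, a hypothetical 4GB node might tally up like
this (illustrative numbers only):

  OS and misc daemons            ~1.0 GB
  DataNode                       ~0.5 GB
  TaskTracker/NodeManager        ~0.5 GB
  2 task slots x 512MB each      ~1.0 GB
  TServer Java heap              ~0.5 GB
  TServer native map             ~0.5 GB
                                 -------
  total                          ~4.0 GB

Anything past that and the kernel OOM killer starts shooting JVMs, which
tends to look exactly like "tservers randomly die with nothing in their
logs".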


On Wed, Jun 25, 2014 at 2:07 PM, John Vines <[email protected]> wrote:

    It's also possible that you're oversubscribing your memory on the
    overall system between the tservers and the MR slots. Check your
    syslogs and see if there's anything about killing java processes.
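    On most Linux boxes something like the following will show whether
    the kernel OOM killer fired (the exact log location varies by
    distro):

        dmesg | grep -i 'killed process'
        grep -i 'out of memory' /var/log/messages /var/log/syslog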


    On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <[email protected]> wrote:

        I will play around with the memory settings some more, it sounds
        like that is definitely it. Thanks everyone!


        On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser
        <[email protected]> wrote:

            The lack of an exception in the debug log makes it seem even
            more likely that you just got an OOME (OutOfMemoryError).

            It's a crap-shoot as to whether or not you'll actually get
            the Exception printed in the log, but you should always get
            it in the .out/.err files as previously mentioned.
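            If you want hard evidence next time, you can also add
            HotSpot's OOME flags to ACCUMULO_TSERVER_OPTS, e.g. (sketch;
            pick your own dump path):

                -XX:+HeapDumpOnOutOfMemoryError
                -XX:HeapDumpPath=/tmp/tserver.hprof

            That writes a heap dump on OOME that you can open in jhat
            or MAT to see what actually filled the heap.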


            On 6/25/14, 2:44 PM, Jacob Rust wrote:

                Ah, here is the right log: http://pastebin.com/DLEzLGqN

                I will double check which example. Thanks.


                On Wed, Jun 25, 2014 at 2:38 PM, John Vines
                <[email protected]> wrote:

                     And you're certain you're using the standalone
                     example and not the native-standalone? Those expect
                     the native libraries to be extant and, if not, will
                     eventually cause an OOM.
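                     If the native libs turn out to be the problem, you
                     can either build them or (if I remember the property
                     name right) fall back to the Java map in
                     accumulo-site.xml:

                         <property>
                           <name>tserver.memory.maps.native.enabled</name>
                           <value>false</value>
                         </property>

                     Just remember the map then lives inside the Java
                     heap, so -Xmx has to be big enough to also cover
                     tserver.memory.maps.max.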


                     On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust
                     <[email protected]> wrote:

                         Accumulo version 1.5.1.2.1.2.1-471
                         Hadoop version 2.4.0.2.1.2.1-471

                         tserver debug log http://pastebin.com/BHdTkxeK

                         I see what you mean about the memory. I am using
                         the memory settings from the example files:
                         https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone
                         I also ran into this problem using the 1GB
                         example memory settings. Each node has 4GB RAM.

                         Thanks


                         On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey
                         <[email protected]> wrote:

                             What version of Accumulo?

                             What version of Hadoop?

                             What does your server memory and per-role
                allocation look like?

                             Can you paste the tserver debug log?



                             On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust
                             <[email protected]> wrote:

                                 I am trying to create an inverted text
                                 index for a table using the Accumulo
                                 input/output formats in a Java MapReduce
                                 program. When the job reaches the reduce
                                 phase and creates the table / tries to
                                 write to it, the tablet servers begin
                                 to die.

                                 Now when I do a start-all.sh, the tablet
                                 servers start for about a minute and then
                                 die again. Any idea as to why the
                                 MapReduce job is killing the tablet
                                 servers and/or how to bring the tablet
                                 servers back up without them failing?

                                 This is on a 12 node cluster with low
                quality hardware.
                                 The java code I am running is here
                http://pastebin.com/ti7Qz19m
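                                 In case that pastebin link rots, the
                                 write side looks roughly like this -- a
                                 from-memory sketch against the 1.5 API,
                                 with the user, instance, and table names
                                 made up:

                                     import java.io.IOException;
                                     import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
                                     import org.apache.accumulo.core.client.security.tokens.PasswordToken;
                                     import org.apache.accumulo.core.data.Mutation;
                                     import org.apache.accumulo.core.data.Value;
                                     import org.apache.hadoop.io.Text;
                                     import org.apache.hadoop.mapreduce.Reducer;

                                     // Reducer emits Mutations; AccumuloOutputFormat
                                     // treats a null key as "use the default table".
                                     public static class IndexReducer
                                         extends Reducer<Text, Text, Text, Mutation> {
                                       @Override
                                       protected void reduce(Text term, Iterable<Text> docs,
                                           Context ctx) throws IOException, InterruptedException {
                                         Mutation m = new Mutation(term); // row = the term
                                         for (Text doc : docs) {
                                           m.put(new Text("docs"), doc, new Value(new byte[0]));
                                         }
                                         ctx.write(null, m); // goes to the default table
                                       }
                                     }

                                     // Job setup (sketch):
                                     AccumuloOutputFormat.setConnectorInfo(job, "user",
                                         new PasswordToken("pass"));
                                     AccumuloOutputFormat.setZooKeeperInstance(job,
                                         "instance", "zkhost:2181");
                                     AccumuloOutputFormat.setDefaultTableName(job,
                                         "invertedIndex");
                                     AccumuloOutputFormat.setCreateTables(job, true);
                                     job.setOutputFormatClass(AccumuloOutputFormat.class);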

                                 The log files on each tablet server
                only display the
                                 startup information, no errors. The log
                files on the
                                 master server show these errors
                http://pastebin.com/LymiTfB7




                                 --
                                 Jacob Rust
                                 Software Intern




                             --
                             Sean




                         --
                         Jacob Rust
                         Software Intern





                --
                Jacob Rust
                Software Intern




        --
        Jacob Rust
        Software Intern





--
Sean
