What is the available memory?
On Wed, Jun 25, 2014 at 3:22 PM, Donald Miner <[email protected]> wrote:

> This is what Jacob is running on:
> https://twitter.com/donaldpminer/status/398514283547328512
>
> 12x 13" 2011 MacBook Pros.
>
> The poor guy is my summer intern and what we keep telling him is that this
> is "building character". Kids these days with their 256GB of RAM!
>
> The plan here is to get something working, not necessarily working well.
> Just to test things in a more realistic manner than on a local group of VMs
> (although not totally realistic since the hardware is crap). Plus I think
> it is cute and it keeps my office warm. We've seen local groups of VMs on a
> workstation outperform this.
>
> -d
>
>
> On Wed, Jun 25, 2014 at 3:42 PM, Sean Busbey <[email protected]> wrote:
>
>> If you only have 4G available, I'm not sure what kind of Hadoop cluster
>> you expect to be able to run, let alone Accumulo. ;)
>>
>> -Sean
>>
>>
>> On Wed, Jun 25, 2014 at 2:34 PM, Josh Elser <[email protected]> wrote:
>>
>>> If you only have 4G available, >=2G is probably a little excessive for
>>> the OS :)
>>>
>>>
>>> On 6/25/14, 3:30 PM, Sean Busbey wrote:
>>>
>>>> You can also calculate how much memory you need to have (or your cluster
>>>> management software can do it for you).
>>>>
>>>> Things to factor:
>>>>
>>>> OS needs (>= 2GB)
>>>> DataNode
>>>> TaskTracker (or NodeManager, depending on MRv1 vs. YARN)
>>>> task memory (child slots * per-child max under MRv1)
>>>> TServer Java heap
>>>> TServer native map
>>>>
>>>> Plus any other processes you regularly run on those nodes.
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:07 PM, John Vines <[email protected]> wrote:
>>>>
>>>> It's also possible that you're oversubscribing your memory on the
>>>> overall system between the tservers and the MR slots. Check your
>>>> syslogs and see if there's anything about killing Java processes.
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 3:05 PM, Jacob Rust <[email protected]> wrote:
>>>>
>>>> I will play around with the memory settings some more, it sounds
>>>> like that is definitely it. Thanks everyone!
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:55 PM, Josh Elser <[email protected]> wrote:
>>>>
>>>> The lack of an exception in the debug log makes it seem even
>>>> more likely that you just got an OOME.
>>>>
>>>> It's a crap-shoot as to whether or not you'll actually get
>>>> the exception printed in the log, but you should always get
>>>> it in the .out/.err files as previously mentioned.
>>>>
>>>>
>>>> On 6/25/14, 2:44 PM, Jacob Rust wrote:
>>>>
>>>> Ah, here is the right log: http://pastebin.com/DLEzLGqN
>>>>
>>>> I will double check which example. Thanks.
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:38 PM, John Vines <[email protected]> wrote:
>>>>
>>>> And you're certain you're using the standalone example and not the
>>>> native-standalone? Those expect the native libraries to be extant,
>>>> and if not will eventually cause an OOM.
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:33 PM, Jacob Rust <[email protected]> wrote:
>>>>
>>>> Accumulo version 1.5.1.2.1.2.1-471
>>>> Hadoop version 2.4.0.2.1.2.1-471
>>>>
>>>> tserver debug log: http://pastebin.com/BHdTkxeK
>>>>
>>>> I see what you mean about the memory. I am using the memory settings
>>>> from the example files:
>>>> https://github.com/apache/accumulo/tree/master/conf/examples/512MB/standalone
>>>>
>>>> I also ran into this problem using the 1GB example memory
>>>> settings. Each node has 4GB RAM.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 2:10 PM, Sean Busbey <[email protected]> wrote:
>>>>
>>>> What version of Accumulo?
>>>>
>>>> What version of Hadoop?
>>>>
>>>> What does your server memory and per-role allocation look like?
>>>>
>>>> Can you paste the tserver debug log?
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 1:01 PM, Jacob Rust <[email protected]> wrote:
>>>>
>>>> I am trying to create an inverted text index for a table
>>>> using Accumulo input/output format in a Java MapReduce
>>>> program. When the job reaches the reduce phase and
>>>> creates the table / tries to write to it, the tablet
>>>> servers begin to die.
>>>>
>>>> Now when I do a start-all.sh the tablet servers start
>>>> for about a minute and then die again. Any idea as to
>>>> why the MapReduce job is killing the tablet servers
>>>> and/or how to bring the tablet servers back up without
>>>> failing?
>>>>
>>>> This is on a 12-node cluster with low-quality hardware.
>>>> The Java code I am running is here: http://pastebin.com/ti7Qz19m
>>>>
>>>> The log files on each tablet server only display the
>>>> startup information, no errors. The log files on the
>>>> master server show these errors: http://pastebin.com/LymiTfB7
>>>>
>>>>
>>>> --
>>>> Jacob Rust
>>>> Software Intern
>>>>
>>>>
>>>> --
>>>> Sean
>>>
>>
>> --
>> Sean
>
>
> --
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com

--
Sean
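[Editorial note: Sean's "things to factor" checklist above can be turned into a quick back-of-envelope check. This is a minimal sketch, not code from the thread; every number below is an illustrative assumption (e.g. heap sizes loosely in the spirit of the 512MB example configs), not a measurement of Jacob's cluster.]

```java
// Hypothetical per-node memory budget for a 4 GB worker running
// HDFS, MRv1, and an Accumulo tserver, following Sean's checklist.
// All figures are illustrative assumptions.
public class MemoryBudget {
    public static void main(String[] args) {
        int totalMb       = 4096; // physical RAM per node
        int osMb          = 2048; // OS needs (>= 2 GB per the thread)
        int dataNodeMb    = 512;  // HDFS DataNode heap
        int taskTrackerMb = 256;  // TaskTracker (or NodeManager) heap
        int taskSlots     = 2;    // MRv1 child slots
        int perTaskMb     = 512;  // per-child max heap under MRv1
        int tserverHeapMb = 384;  // TServer Java heap
        int nativeMapMb   = 80;   // TServer native map

        int needed = osMb + dataNodeMb + taskTrackerMb
                + taskSlots * perTaskMb + tserverHeapMb + nativeMapMb;

        System.out.println("Needed: " + needed + " MB of " + totalMb + " MB");
        if (needed > totalMb) {
            // With these numbers the node is oversubscribed by 208 MB,
            // which is exactly the situation where the kernel OOM killer
            // starts picking off Java processes (check the syslogs).
            System.out.println("Oversubscribed by " + (needed - totalMb) + " MB");
        }
    }
}
```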
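[Editorial note: on John's standalone vs. native-standalone point, the relevant knob is the `tserver.memory.maps.native.enabled` property. The fragment below is a sketch of explicitly disabling the native map when the native libraries are not installed; verify the property against your Accumulo version's documentation before relying on it.]

```xml
<!-- accumulo-site.xml: if the Accumulo native libraries are not built
     and installed, disable the native in-memory map so the tserver
     uses the pure-Java map instead of expecting native memory that
     was never accounted for. -->
<property>
  <name>tserver.memory.maps.native.enabled</name>
  <value>false</value>
</property>
```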
