stack-3 wrote:
> On Fri, May 13, 2011 at 7:44 AM, Stan Barton <> wrote:
>> stack-3 wrote:
>>> On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <> wrote:
>>> Are you swapping, Stan?  You are close to the edge with your RAM
>>> allocations.  What do you have swappiness set to?  Is it the default?
>>> For writing you usually don't need that much memory, but you do have a lot
>>> of regions, so you could be flushing a bunch of small files.
>> Due to various problems with swap, swap was turned off and memory
>> overcommitment was turned on.
> Sorry.  How do you enable overcommitment of memory, or do you mean to
> say that your processes add up to more than the RAM you have?

The memory overcommitment is needed in order to let Java still
"allocate" memory for executing external bash commands like "du" when
RAM is nearly full. I have swap turned off and have enabled
overcommitment using sysctl by setting vm.overcommit_memory=1 (i.e. the
mode in which any memory allocation attempt succeeds regardless of the remaining
free RAM). I was encountering RS crashes caused by "Cannot run program
"bash": error=12, Cannot allocate memory". However, my processes should
never add up to more than the available RAM minus the minimum needed by
the OS.
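For reference, a minimal Python sketch that reads the kernel's overcommit policy from /proc/sys/vm/overcommit_memory (Linux only) and maps the value to its documented meaning. Mode 1 is the one in which allocations always succeed; mode 0 is the kernel's heuristic default. The helper names are hypothetical, not part of any Hadoop/HBase tooling.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as described in the Linux kernel docs.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default): obvious overcommits are refused",
    1: "always overcommit: allocations succeed regardless of free RAM",
    2: "strict accounting: commit limit is roughly swap + overcommit_ratio% of RAM",
}

def describe_overcommit_mode(raw: str) -> str:
    """Map the contents of /proc/sys/vm/overcommit_memory to a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")

def read_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read and describe the running kernel's policy (Linux only)."""
    return describe_overcommit_mode(Path(path).read_text())
```

On a box, `print(read_overcommit_mode())` shows the active policy; changing it is done as root with `sysctl -w vm.overcommit_memory=1` (or persistently in /etc/sysctl.conf). error=12 is ENOMEM; one common explanation is that launching `bash` from Java forks the large JVM process, and under the heuristic mode the kernel may refuse that fork.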

stack-3 wrote:
>> stack-3 wrote:
>>> These are old IA stock machines?  Do they have ECC RAM?  (IIRC, they
>>> used to not have ECC RAM).
>> Strangely, on these machines with Debian installed, only this (star * )
>> approach works.
> OK.  New to me, but hey, what do I know!
>> Originally, I was running the DB on the same cluster where the
>> processing took place, mostly MapReduce jobs reading the data and doing
>> some analysis. But when I started using nutchwax on the same cluster, I
>> started running out of memory (on the MapReduce side), and since the
>> machines are so sensitive (no swap and overcommitment) that became a
>> nightmare. So right now nutchwax runs on a separate cluster. I have
>> tweaked nutchwax to work with recent Hadoop APIs and also to take the
>> HBase-stored content as input (instead of ARC files).
> Good stuff
>> The machines are refurbished old red boxes (I don't know their original
>> configuration). As far as I know the RAM is not ECC, because the
>> motherboard chipset does not support that technology.
> OK.  You seeing any issues arising because of checksum issues?  (BTW,
> IIRC, these non-ECC red boxes are the reason HDFS is a checksummed
> filesystem)
How would these manifest? I guess it is not related, but on the same note,
I am seeing quite a high disk failure rate on machines running HBase/HDFS.
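On how checksum problems would manifest: HDFS stores a CRC for every fixed-size chunk of a block and verifies it on read, so corruption typically surfaces as checksum errors in the client or datanode logs and replicas reported as corrupt. Below is a minimal sketch of the chunked-CRC idea, assuming 512-byte chunks (the usual bytes-per-checksum default) and zlib's CRC32; it illustrates the principle and is not HDFS's implementation. Note that it only catches data that changes after the checksum is computed; a bit flipped in RAM before checksumming would go unnoticed.

```python
import zlib

CHUNK = 512  # assumed bytes-per-checksum

def chunk_checksums(data, chunk=CHUNK):
    """CRC32 of each fixed-size chunk, as computed when the data is written."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def corrupt_chunks(data, stored, chunk=CHUNK):
    """Indexes of chunks whose CRC32 no longer matches the stored value."""
    return [i for i, crc in enumerate(chunk_checksums(data, chunk)) if crc != stored[i]]

original = bytes(range(256)) * 8  # 2048 bytes -> 4 chunks
stored = chunk_checksums(original)

# Simulate a single flipped bit (e.g. a bad memory cell) in byte 700.
damaged = bytearray(original)
damaged[700] ^= 0x01
print(corrupt_chunks(bytes(damaged), stored))  # [1]
```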

stack-3 wrote:
>> stack-3 wrote:
>>>> hadoop/hdfs-site.xml
>>> Did you change the dfs block size?   Looks like it's 256M rather than the
>>> usual 64M.  Any reason for that?  Would suggest going w/ defaults at
>>> first.
>>> Remove dfs.datanode.socket.write.timeout == 0.  That's an old config
>>> recommendation that should no longer be necessary and is likely
>>> corrosive.
>> I changed the block size to reduce the overall number of blocks. I was
>> following some advice I found on the forums about managing such a large
>> amount of data in HDFS.
> Yeah, I suppose, bigger blocksizes would make it so you need less RAM
> in your namenode.  You have lots of files on here?  On the other side,
> bigger blocks are harder for hbase to sling.

In general, HDFS contains only HBase files, so at this point memory
consumption on the NN is not an issue. I have therefore lowered the block
size back to the default and will observe.
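A quick back-of-the-envelope sketch of the block-count trade-off (the file sizes are hypothetical): each file occupies ceil(size / block size) blocks, so a larger block size cuts the count for big files, but a file smaller than one block costs one block either way, which matters when a region server flushes many small files.

```python
MB = 1024 * 1024

def block_count(file_sizes, block_size):
    """HDFS blocks needed: each file takes ceil(size / block_size) blocks."""
    return sum((size + block_size - 1) // block_size for size in file_sizes)

big_files = [1024 * MB] * 1000   # hypothetical: 1,000 store files of 1 GB
small_files = [10 * MB] * 1000   # hypothetical: 1,000 freshly flushed 10 MB files

for bs in (64 * MB, 256 * MB):
    print(bs // MB, block_count(big_files, bs), block_count(small_files, bs))
# 64 16000 1000
# 256 4000 1000
```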

stack-3 wrote:
>> As for dfs.datanode.socket.write.timeout, that was set because I was
>> quite often observing timeouts on the DFS sockets, and by digging around I
>> found out that, for some reason, the internal Java times of the connecting
>> machines were not aligned (even though the hardware clocks were). I think
>> there was a JIRA for that.
> Not sure what this one is about.  The
> dfs.datanode.socket.write.timeout=0 is old lore by this stage I think
> you'll find.
I should have noted the cause of the problem better. I will remove that
setting and observe whether I get the socket exceptions again.

stack-3 wrote:
>> Again, raising the block size was motivated by the aim of lowering the
>> overall number of blocks. If it puts stress on the RAM, it makes sense to
>> leave it at the defaults. I guess that also helps parallelization.
> Yeah, would suggest you run w/ default sizes.
>> stack-3 wrote:
>>>> hbase/
>>> Remove this:
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> Means it will dump the heap if the JVM runs out of memory.  This is probably
>>> of no interest to you and could actually cause you pain if you have a small
>>> root file system and the heap dump fills it.
>>> The -XX:CMSInitiatingOccupancyFraction=90 is probably near useless
>>> (default is 92% or 88% -- I don't remember which).  Set it down to 80%
>>> or 75% if you want it to actually make a difference.
>>> Are you having issues w/ GC'ing?  I see you have mslab enabled.
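The point of lowering -XX:CMSInitiatingOccupancyFraction is headroom. The fraction is a percentage of the old (tenured) generation, and what is left above it must absorb objects promoted while the concurrent cycle runs; if it runs out first, CMS falls back to a stop-the-world full collection ("concurrent mode failure"). In HotSpot JVMs of this era, the fraction is honored on every cycle only together with -XX:+UseCMSInitiatingOccupancyOnly. The arithmetic, with a hypothetical old-gen size:

```python
def cms_start_point(old_gen_bytes, fraction_pct):
    """Return (occupancy at which a CMS cycle starts, headroom left for promotions)."""
    trigger = old_gen_bytes * fraction_pct // 100
    return trigger, old_gen_bytes - trigger

MB = 1024 * 1024
old_gen = 6000 * MB  # hypothetical tenured-generation size (~6 GB)
for pct in (90, 80, 75):
    trigger, headroom = cms_start_point(old_gen, pct)
    print(f"{pct}%: CMS starts at {trigger // MB} MB, {headroom // MB} MB headroom")
# 90%: CMS starts at 5400 MB, 600 MB headroom
# 80%: CMS starts at 4800 MB, 1200 MB headroom
# 75%: CMS starts at 4500 MB, 1500 MB headroom
```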
>> On version 0.20.6 I saw long pauses during the import phase and also
>> when querying. I was measuring how many queries were processed per
>> second and could see pauses in the throughput. The only culprit I could
>> find was GC, but I still could not figure out why it pauses the whole DB.
>> Therefore I gave mslab a shot with 0.90, but I still see those pauses in
>> the throughput.
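Since the pauses were spotted through queries per second, a minimal sketch (hypothetical helper names, not from any HBase tool) of turning per-query completion timestamps, assumed to be in seconds, into a per-second series and reporting the stretches where throughput drops below a threshold. Those windows can then be lined up against the JVM's GC log and the regionserver logs to see whether they coincide.

```python
from collections import Counter

def per_second_throughput(completion_times):
    """Bucket completion timestamps (seconds) into per-second counts; assumes a non-empty input."""
    counts = Counter(int(t) for t in completion_times)
    start, end = min(counts), max(counts)
    return [(sec, counts.get(sec, 0)) for sec in range(start, end + 1)]

def find_pauses(series, threshold):
    """Return (first, last) second ranges where throughput fell below threshold."""
    pauses, run_start = [], None
    for sec, n in series:
        if n < threshold and run_start is None:
            run_start = sec
        elif n >= threshold and run_start is not None:
            pauses.append((run_start, sec - 1))
            run_start = None
    if run_start is not None:
        pauses.append((run_start, series[-1][0]))
    return pauses

# Hypothetical log: 100 queries/s, nothing completes during seconds 5-7.
times = [s + i / 100 for s in range(12) if s not in (5, 6, 7) for i in range(100)]
series = per_second_throughput(times)
print(find_pauses(series, threshold=10))  # [(5, 7)]
```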
> Importing, yeah, you are probably running into the 'gate' that a
> regionserver puts up when it has filled its memstore while waiting on
> flush to complete.  Check regionserver logs at about this time.  You
> should see 'blocking' messages followed soon after by unblocking after
> the flush runs.
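To make the 'gate' concrete, here is a toy Python model (not HBase code). It assumes the 0.90-era settings hbase.hregion.memstore.flush.size (assumed default 64 MB) and hbase.hregion.memstore.block.multiplier (assumed default 2): a flush starts at the flush size, and updates block once the region's memstore, still counting the data being flushed, reaches flush size times the multiplier, until that flush completes. Sizes are tiny integers here, and ToyMemstore/flusher are illustrative names.

```python
import threading
import time

class ToyMemstore:
    """Toy model of a region's memstore write gate."""

    def __init__(self, flush_size, block_multiplier=2):
        self.flush_size = flush_size
        self.block_limit = flush_size * block_multiplier  # assumed blocking threshold
        self.size = 0            # includes data still being flushed
        self.blocked_writes = 0  # how many puts had to wait at the gate
        self.cond = threading.Condition()

    def put(self, nbytes):
        with self.cond:
            if self.size >= self.block_limit:
                self.blocked_writes += 1  # where a 'blocking' message would be logged
                self.cond.wait_for(lambda: self.size < self.block_limit)
            self.size += nbytes
            if self.size >= self.flush_size:
                self.cond.notify_all()  # wake the flusher

def flusher(ms, stop, flush_seconds):
    """Flush whenever the memstore reaches flush_size; each flush is slow."""
    while True:
        with ms.cond:
            ms.cond.wait_for(lambda: ms.size >= ms.flush_size or stop.is_set())
            if stop.is_set():
                return
            snapshot = ms.size  # data handed to this flush
        time.sleep(flush_seconds)  # simulate writing the snapshot out to HDFS
        with ms.cond:
            ms.size -= snapshot    # flush done: space is released...
            ms.cond.notify_all()   # ...and blocked writers are let through

ms = ToyMemstore(flush_size=10)
stop = threading.Event()
worker = threading.Thread(target=flusher, args=(ms, stop, 0.1))
worker.start()
for _ in range(60):
    ms.put(1)  # a fast writer: stalls whenever a flush cannot keep up
with ms.cond:
    stop.set()
    ms.cond.notify_all()
worker.join()
print("puts that hit the gate:", ms.blocked_writes)
```

With writes arriving faster than flushes finish, the writer repeatedly waits at the gate, which shows up as the stop-and-go throughput described above during imports.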

I can understand that for the import, but when I am evaluating query
performance, almost no writes (besides small statistics data) are going on,
and HBase pauses as a whole, not only one RS (which I would believe is
the case when writes were flushed in the statistics table having one

stack-3 wrote:
> St.Ack

Sent from the HBase User mailing list archive at