On Tue, Dec 14, 2010 at 6:47 AM, baggio liu <[email protected]> wrote:
>> This can be true. Yes. What are you suggesting here? What should we
>> tune?
>>
> In fact, we found the low invalidation speed is because of the datanode
> invalidation limit per heartbeat. Many invalid blocks stay in the
> namenode and cannot be dispatched to datanodes. We simply increased the
> number of blocks a datanode fetches per heartbeat.
>

Interesting. So you changed this hardcoding?

  public static final int BLOCK_INVALIDATE_CHUNK = 100;
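If that is the change, it might be worth carrying it as a configuration
knob rather than a patched constant, so it can be tuned per cluster. A
minimal sketch of what I'd picture (the property name below is made up
for illustration, not an existing key):

  import org.apache.hadoop.conf.Configuration;

  public class InvalidateLimit {
    // Default mirrors the old hardcoded BLOCK_INVALIDATE_CHUNK.
    static final int DEFAULT_BLOCK_INVALIDATE_CHUNK = 100;

    // How many blocks a datanode is asked to delete per heartbeat.
    static int blockInvalidateLimit(Configuration conf) {
      return conf.getInt("dfs.block.invalidate.limit.per.heartbeat",
          DEFAULT_BLOCK_INVALIDATE_CHUNK);
    }
  }

The namenode would then read the limit from its conf at startup instead
of relying on the hardcoded value.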
>> hdfs-630 has been applied to the branch-0.20-append branch (It's also
>> in CDH IIRC).
>>
> Yes, HDFS-630 is necessary, but it's not enough. When a disk failure is
> found, it excludes the whole datanode. We can simply kick the failed
> disk out and send a block report to the namenode.
>

Is this a code change you made, Baggio?

>> Usually if RegionServer has issues getting to HDFS, it'll shut itself
>> down. This is 'normal', perhaps overly-defensive, behavior. The story
>> should be better in 0.90, but I would be interested in any list you
>> might have where you think we should be able to catch and continue.
>>
> Yes, it's absolutely overly-defensive behavior, and if the region
> server fails an HDFS operation, failing fast may be a good recovery
> mechanism. But some IOExceptions are not fatal; in our branch, we added
> a retry mechanism to common fs operations, such as exists().
>

Excellent. Any chance of your contributing back your internal branch
fixes? They'd be welcome.
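Even ahead of a patch, it would help to compare notes. For exists(), the
kind of wrapper I'd picture is roughly the below. This is a sketch only,
not your code; the attempt count and backoff are made up for
illustration:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RetryingFs {
    // Retry a (presumed idempotent) exists() call a few times before
    // giving up, instead of letting one transient IOException take the
    // regionserver down with it.
    public static boolean existsWithRetries(FileSystem fs, Path p)
        throws IOException {
      final int maxAttempts = 3;          // illustrative only
      IOException last = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return fs.exists(p);
        } catch (IOException ioe) {
          last = ioe;                     // remember the failure
          if (attempt < maxAttempts) {
            try {
              Thread.sleep(1000L * attempt);  // simple linear backoff
            } catch (InterruptedException ie) {
              Thread.currentThread().interrupt();
              throw new IOException("interrupted retrying exists()");
            }
          }
        }
      }
      throw last;                         // retries exhausted
    }
  }

Whether the non-idempotent operations (create, delete, rename) are safe
to retry is a separate question, of course.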
> My point is that whenever the system starts or scans, the region
> server (as a DFSClient) creates too many connections to datanodes. The
> number of connections grows with the number of store files, and when
> the store file count reaches a large value, the number of connections
> gets out of control.

Yes.

> In most scenarios scans have locality; in our cluster, more than 95%
> of connections are not active (the connection is established, but no
> data is being read). In our branch, we added a timeout to close idle
> connections. In the long term, we could reuse connections between the
> DFSClient and the datanode (maybe this kind of reuse could be
> fulfilled by an RPC framework).
>

The above sounds great. So, the connection is reestablished
automatically by DFSClient when a read comes in (I suppose HADOOP-3831
does this for you)? Is the timeout in DFSClient or in HBase?

>> Yes. Any suggestions from your experience?
>>
> -XX:GCTimeRatio=10 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
> -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:SoftRefLRUPolicyMSPerMB=0
> -XX:MaxTenuringThreshold=7
>
> We made some attempts at GC tuning. To keep application stops short,
> we use the parallel collector in the young generation and CMS in the
> old generation. The threshold CMSInitiatingOccupancyFraction is the
> same as our Hadoop cluster config; we have no idea why it's 70 and not
> 71... May I ask what GC strategy you use in your cluster?
>

I just took a look at one of our production servers. Here is our
config:

export SERVER_GC_OPTS="-XX:+DoEscapeAnalysis -XX:+AggressiveOpts
-XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
-XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps"

This is what we are running:

java version "1.6.0_14-ea"
Java(TM) SE Runtime Environment (build 1.6.0_14-ea-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b13, mixed mode)

(I say what we are running because I believe DoEscapeAnalysis is
disabled in later versions of the JVM... I think it's the same for
AggressiveOpts.)

I think NewSize should probably be changed -- the argument for such a
small NewSize was that without it, the young-generation pause times
grew to become substantial. Regarding the CMSInitiatingOccupancyFraction
of 88%, I wonder how much of an effect it is having? That said, the
above seems to be working for us.

Regarding your settings, you set:

  -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0

I haven't looked at the source, but going by this message,
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2008-October/000226.html,
the above just seems to be setting defaults. Is that your
understanding? Do you monitor your GC activity?

> 1. Currently, the datanode will send more data than the DFSClient
> requests (mostly a whole block). That helps throughput, but it may
> hurt latency. I imagine we could add an additional RPC read/write
> interface between the DFSClient and the datanode to reduce overhead in
> HDFS reads/writes.

When you say block above, you mean hfile block? That's what HBase is
requesting, though? Pardon me if I'm not understanding what you are
suggesting.

> 2. On the datanode side, the meta file and block file are opened and
> closed repeatedly for every block operation. To reduce latency, we
> could reuse these file handles. We could even redesign the storage
> mechanism in the datanode.
>

Yes. Hopefully something can be done about this pretty soon.
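Off the top of my head, a first cut at reusing handles could be a small
bounded LRU cache that closes the least-recently-used descriptor on
eviction. This is just a sketch (no synchronization, no invalidation
when a block is deleted or moved), not a proposal for how the datanode
should actually do it:

  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Toy LRU cache of open block/meta file handles, keyed by path.
  public class BlockFileHandleCache
      extends LinkedHashMap<String, RandomAccessFile> {
    private final int capacity;

    public BlockFileHandleCache(int capacity) {
      super(16, 0.75f, true /* access order */);
      this.capacity = capacity;
    }

    // Return a cached read-only handle, opening one if needed.
    public RandomAccessFile getHandle(File f) throws IOException {
      RandomAccessFile raf = get(f.getPath());
      if (raf == null) {
        raf = new RandomAccessFile(f, "r");
        put(f.getPath(), raf);
      }
      return raf;
    }

    @Override
    protected boolean removeEldestEntry(
        Map.Entry<String, RandomAccessFile> eldest) {
      if (size() > capacity) {
        try {
          eldest.getValue().close();  // close the evicted handle
        } catch (IOException ignored) {
          // best effort on close
        }
        return true;
      }
      return false;
    }
  }

The hard parts would be invalidation and concurrent readers, but it
gives an idea of the shape of the change.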
Thanks for the above,
St.Ack