HBase bug HBASE-13566

2015-05-06 Thread kongzhiguiji
hi, I hit a MetaLogRoller bug that can make the regionserver abort. For now I
use a conf named hbase.regionserver.logroll.errors.tolerated to ignore the
problem, but I don't know how to solve it directly. Please help me.

HBASE-13566: While HBase runs the MetaLogRoller thread to write the meta HLog,
I got the error below:

ERROR [RS_OPEN_META-xxx:60020-0-MetaLogRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-04-25 19:52:05,463 ERROR [RS_OPEN_META-host152:60020-0-MetaLogRoller] wal.FSHLog: Failed close of HLog writer
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-04-25 19:52:05,463 FATAL [RS_OPEN_META-host152:60020-0-MetaLogRoller] regionserver.HRegionServer: ABORTING region server host152,60020,1429927886571: Failed log close in log roller
org.apache.hadoop.hbase.regionserver.wal.FailedLogCloseException: #1429959124806
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:777)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:565)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:97)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
Note: I have checked the namenode log; there is no error there, it just closes
the write socket, and the Hadoop and HBase files are all fine.
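
For reference, the workaround named above is set in hbase-site.xml. A minimal
sketch (the value shown is illustrative; check the default for your HBase
version). Note it only suppresses the abort; it does not fix the underlying
HDFS pipeline recovery failure:

  <property>
    <!-- consecutive WAL close errors tolerated before the regionserver aborts -->
    <name>hbase.regionserver.logroll.errors.tolerated</name>
    <value>2</value>
  </property>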



Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.

2015-05-06 Thread Nick Dimiduk
On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell andrew.purt...@gmail.com
wrote:

 I prefer to patch the POMs.


Is this a formal -1?

I've opened HBASE-13637 for tracking this issue. Let's get it fixed and
I'll spin a new RC tonight.

 On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
  So what's the conclusion here? Are we dropping 2.2 support or updating
 the
  poms and sinking the RC?
 
  On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com
 wrote:
 
  On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org
  wrote:
 
  We could patch our POMs to reference the hadoop-minikdc artifact
  independently of the rest of the Hadoop packages. It's standalone and
  rarely changes.
  +1. I've been using HBase to test Hadoop changes for isolating
 dependencies
  from downstream folks (HADOOP-11804), and I've just been leaving the
  hadoop-minikdc artifact as-is due to these very reasons.
 
  --
  Sean
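
For context, the POM patch under discussion would pin hadoop-minikdc
separately from the umbrella ${hadoop.version} property. A hypothetical
sketch of such a dependency entry (the coordinates are the real artifact;
the version shown is purely illustrative, not what the RC would ship):

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minikdc</artifactId>
    <!-- pinned independently of ${hadoop.version}; illustrative version -->
    <version>2.7.0</version>
    <scope>test</scope>
  </dependency>

Since the artifact is standalone and rarely changes, pinning it this way keeps
MiniKdc-based tests working even when the build targets an older Hadoop release.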
 



Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.

2015-05-06 Thread Nick Dimiduk
I'm also traveling today.

I've already extended the vote for this RC to Sunday, and since no one has
said this is a -1-worthy regression, this candidate continues to stand.

On Wed, May 6, 2015 at 12:16 PM, Andrew Purtell andrew.purt...@gmail.com
wrote:

 Formally, -0

 Given tomorrow is hbasecon perhaps it would be better to spin a RC on
 Friday?

 I can take HBASE-13637 but am sitting on a plane at the moment. Won't be
 able to get to it until tonight.

  On May 6, 2015, at 10:43 AM, Nick Dimiduk ndimi...@apache.org wrote:
 
  On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell 
 andrew.purt...@gmail.com
  wrote:
 
  I prefer to patch the POMs.
 
  Is this a formal -1?
 
  I've opened HBASE-13637 for tracking this issue. Let's get it fixed and
  I'll spin a new RC tonight.
 
  On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
  So what's the conclusion here? Are we dropping 2.2 support or updating
  the
  poms and sinking the RC?
 
  On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com
  wrote:
 
  On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org
  wrote:
 
  We could patch our POMs to reference the hadoop-minikdc artifact
  independently of the rest of the Hadoop packages. It's standalone and
  rarely changes.
  +1. I've been using HBase to test Hadoop changes for isolating
  dependencies
  from downstream folks (HADOOP-11804), and I've just been leaving the
  hadoop-minikdc artifact as-is due to these very reasons.
 
  --
  Sean
 



Re: RowKey hashing in HBase 1.0

2015-05-06 Thread jeremy p
Thank you for the explanation, but I'm a little confused.  The key will be
monotonically increasing, but the hash of that key will not be.

So, even though your original keys may look like : 1_foobar, 2_foobar,
3_foobar
After the hashing, they'd look more like : 349000_1_foobar,
99_2_foobar, 01_3_foobar

With five regions, the original key ranges for your regions would look
something like : 00-19, 20-39, 40-59, 60-79, 80-99

So let's say you add another row.  It causes a split.  Now your regions
look like :  00-19, 20-39, 40-59, 60-79,
80-89, 90-99

Since the value that you are prepending to your keys is essentially random,
I don't see why your regions would only fill halfway.  A new, hashed key
would be just as likely to fall within 80-89 as it would be to fall
within 90-99.

Are we working from different assumptions?

On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com
wrote:

 Yes, what you described, mod(hash(rowkey), n) where n is the number of
 regions, will remove the hotspotting issue.

 However, if your key is sequential you will only have regions half full
 post region split.

 Look at it this way…

 If I have a key that is a sequential count 1,2,3,4,5 … I am always adding
 a new row to the last region, and it's always being added to the right
 (reading left to right). Always at the end of the line…

 So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999
 and region 2 has 5,000 to 10,000.

 Now my next row is 10001, the following is 10002 … so they will be added
 at the tail end of region 2 until it splits.  (And so on, and so on…)

 If you take a modulus of the hash, you create n buckets. Again for each
 bucket… I will still be adding a new larger number so it will be added to
 the right hand side or tail of the list.

 Once a region is split… that’s it.

 Bucketing will solve the hot spotting issue by creating n lists of rows,
 but you’re still always adding to the end of the list.

 Does that make sense?


  On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
  Thank you for your response!
 
  So I guess 'salt' is a bit of a misnomer.  What I used to do is this :
 
  1) Say that my key value is something like '1234foobar'
  2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
  3) I mod the hash by my number of regions.  Let's say I have 2000
 regions.
  54824923 % 2000 = 923
  4) I prepend that value to my original key value, so my new key is
  '923_1234foobar'
 
  Is this the same thing you were talking about?
 
  A couple questions :
 
  * Why would my regions only be 1/2 full?
  * Why would I only use this for sequential keys?  I would think this
 would
  give better performance in any situation where I don't need range scans.
  For example, let's say my key value is a person's last name.  That will
  naturally cluster around certain letters, giving me an uneven
 distribution.
 
  --Jeremy
 
 
 
  On Sun, May 3, 2015 at 11:46 AM, Michael Segel 
 michael_se...@hotmail.com
  wrote:
 
  Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
  random) to the base table row key.
  You’re better off using a truncated hash (md5 is fastest) so that at
 least
  you can use a single get().
 
  Common?
 
  Only if your row key is mostly sequential.
 
  Note that even with bucketing, you will still end up with regions only
 1/2
  full with the only exception being the last region.
 
  On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com
  wrote:
 
  Hello all,
 
  I've been out of the HBase world for a while, and I'm just now jumping
  back
  in.
 
  As of HBase .94, it was still common to take a hash of your RowKey and
  use
  that to salt the beginning of your RowKey to obtain an even
  distribution
  among your region servers.  Is this still a common practice, or is
 there
  a
  better way to do this in HBase 1.0?
 
  --Jeremy
 
  The opinions expressed here are mine, while they may reflect a cognitive
  thought, that is purely accidental.
  Use at your own risk.
  Michael Segel
  michael_segel (AT) hotmail.com
 
 
 
 
 
 

 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
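
As a concrete illustration of the scheme discussed in this thread — a
deterministic hash prefix rather than a random salt — here is a minimal Java
sketch. The bucket count, MD5, zero-padding, and underscore separator are all
illustrative choices; the point is that because the prefix is recomputed from
the original key, a point read still needs only a single get():

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class BucketedKeys {
  static final int NUM_BUCKETS = 2000; // illustrative; match your region count

  // Deterministic prefix: hash the original key, mod the bucket count, prepend.
  static String saltedKey(String rowKey) throws Exception {
    byte[] d = MessageDigest.getInstance("MD5")
        .digest(rowKey.getBytes(StandardCharsets.UTF_8));
    // First 4 digest bytes as an int, then a non-negative modulus.
    int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
          | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
    int bucket = Math.floorMod(h, NUM_BUCKETS);
    // Zero-padded so keys group by bucket in byte order, e.g. "0923_1234foobar".
    return String.format("%04d_%s", bucket, rowKey);
  }

  public static void main(String[] args) throws Exception {
    // Readers recompute the prefix from the original key, so a point read
    // is one get() rather than NUM_BUCKETS gets, unlike a truly random salt.
    System.out.println(saltedKey("1234foobar"));
  }
}

With the HBase client API the result would simply feed a single lookup, e.g.
new Get(Bytes.toBytes(saltedKey("1234foobar"))).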








Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.

2015-05-06 Thread Andrew Purtell
Formally, -0

Given tomorrow is hbasecon perhaps it would be better to spin a RC on Friday? 

I can take HBASE-13637 but am sitting on a plane at the moment. Won't be able 
to get to it until tonight. 

 On May 6, 2015, at 10:43 AM, Nick Dimiduk ndimi...@apache.org wrote:
 
 On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell andrew.purt...@gmail.com
 wrote:
 
 I prefer to patch the POMs.
 
 Is this a formal -1?
 
 I've opened HBASE-13637 for tracking this issue. Let's get it fixed and
 I'll spin a new RC tonight.
 
 On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 So what's the conclusion here? Are we dropping 2.2 support or updating
 the
 poms and sinking the RC?
 
 On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com
 wrote:
 
 On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org
 wrote:
 
 We could patch our POMs to reference the hadoop-minikdc artifact
 independently of the rest of the Hadoop packages. It's standalone and
 rarely changes.
 +1. I've been using HBase to test Hadoop changes for isolating
 dependencies
 from downstream folks (HADOOP-11804), and I've just been leaving the
 hadoop-minikdc artifact as-is due to these very reasons.
 
 --
 Sean
 


Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.

2015-05-06 Thread Andrew Purtell
I prefer to patch the POMs. 



 On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 So what's the conclusion here? Are we dropping 2.2 support or updating the
 poms and sinking the RC?
 
 On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com wrote:
 
 On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org
 wrote:
 
 We could patch our POMs to reference the hadoop-minikdc artifact
 independently of the rest of the Hadoop packages. It's standalone and
 rarely changes.
 +1. I've been using HBase to test Hadoop changes for isolating dependencies
 from downstream folks (HADOOP-11804), and I've just been leaving the
 hadoop-minikdc artifact as-is due to these very reasons.
 
 --
 Sean
 


Re: RowKey hashing in HBase 1.0

2015-05-06 Thread Michael Segel
Jeremy, 

I think you have to be careful in how you say things. 
While over time you're going to get an even distribution, the hash isn't
random. It's consistent, so hash(x) = y will always be the same.
You're taking the modulus to create buckets 1 to n.

In each bucket, your new key is n_rowkey, where rowkey is the original row key.

Remember that the rowkey is growing sequentially: rowkey(n) < rowkey(n+1) < … < rowkey(n+k)

So if you hash and take its modulus and prepend it, you will still have
X_rowkey(n), X_rowkey(n+k), …


All you have is N sequential lists. And again, with a sequential list you're
adding to the right, so when you split, the top section is never going to get
new rows.

I think you need to create a list and try this with 3 or 4 buckets, and you'll
start to see what happens.

The last region fills, but after it splits, the top half is static. The new 
rows are added to the bottom half only. 

This is a problem with sequential keys that you have to learn to live with.

It's not a killer issue, but it is something you need to be aware of…
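
To see the behavior described above, here is a small, hypothetical Java sketch
(the bucket count, the key format, and the use of hashCode() as a stand-in for
md5 are all illustrative): it groups monotonically increasing keys by prefix
and shows that each bucket is still a tail-append list.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketTailDemo {
  public static void main(String[] args) {
    int buckets = 4; // small illustrative bucket count
    Map<Integer, List<String>> byBucket = new TreeMap<>();
    for (int id = 1; id <= 20; id++) { // monotonically increasing source keys
      String rowKey = String.format("%05d_foobar", id);
      int bucket = Math.floorMod(rowKey.hashCode(), buckets); // stand-in for md5 mod n
      byBucket.computeIfAbsent(bucket, b -> new ArrayList<>()).add(rowKey);
    }
    // Within each bucket the keys arrive already in sorted order: new rows
    // always land at the tail, so after a split the lower daughter stays static.
    byBucket.forEach((b, keys) -> System.out.println(b + " -> " + keys));
  }
}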

 On May 6, 2015, at 4:00 PM, jeremy p athomewithagroove...@gmail.com wrote:
 
 Thank you for the explanation, but I'm a little confused.  The key will be
 monotonically increasing, but the hash of that key will not be.
 
 So, even though your original keys may look like : 1_foobar, 2_foobar,
 3_foobar
 After the hashing, they'd look more like : 349000_1_foobar,
 99_2_foobar, 01_3_foobar
 
 With five regions, the original key ranges for your regions would look
 something like : 00-19, 20-39, 40-59, 60-79, 80-99
 
 So let's say you add another row.  It causes a split.  Now your regions
 look like :  00-19, 20-39, 40-59, 60-79,
 80-89, 90-99
 
 Since the value that you are prepending to your keys is essentially random,
 I don't see why your regions would only fill halfway.  A new, hashed key
 would be just as likely to fall within 80-89 as it would be to fall
 within 90-99.
 
 Are we working from different assumptions?
 
 On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Yes, what you described, mod(hash(rowkey), n) where n is the number of
 regions, will remove the hotspotting issue.
 
 However, if your key is sequential you will only have regions half full
 post region split.
 
 Look at it this way…
 
 If I have a key that is a sequential count 1,2,3,4,5 … I am always adding
 a new row to the last region, and it's always being added to the right
 (reading left to right). Always at the end of the line…
 
 So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999
 and region 2 has 5,000 to 10,000.
 
 Now my next row is 10001, the following is 10002 … so they will be added
 at the tail end of region 2 until it splits.  (And so on, and so on…)
 
 If you take a modulus of the hash, you create n buckets. Again for each
 bucket… I will still be adding a new larger number so it will be added to
 the right hand side or tail of the list.
 
 Once a region is split… that’s it.
 
 Bucketing will solve the hot spotting issue by creating n lists of rows,
 but you’re still always adding to the end of the list.
 
 Does that make sense?
 
 
 On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
 Thank you for your response!
 
 So I guess 'salt' is a bit of a misnomer.  What I used to do is this :
 
 1) Say that my key value is something like '1234foobar'
 2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
 3) I mod the hash by my number of regions.  Let's say I have 2000
 regions.
 54824923 % 2000 = 923
 4) I prepend that value to my original key value, so my new key is
 '923_1234foobar'
 
 Is this the same thing you were talking about?
 
 A couple questions :
 
 * Why would my regions only be 1/2 full?
 * Why would I only use this for sequential keys?  I would think this
 would
 give better performance in any situation where I don't need range scans.
 For example, let's say my key value is a person's last name.  That will
 naturally cluster around certain letters, giving me an uneven
 distribution.
 
 --Jeremy
 
 
 
 On Sun, May 3, 2015 at 11:46 AM, Michael Segel 
 michael_se...@hotmail.com
 wrote:
 
 Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
 random) to the base table row key.
 You’re better off using a truncated hash (md5 is fastest) so that at
 least
 you can use a single get().
 
 Common?
 
 Only if your row key is mostly sequential.
 
 Note that even with bucketing, you will still end up with regions only
 1/2
 full with the only exception being the last region.
 
 On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
 Hello all,
 
 I've been out of the HBase world for a while, and I'm just now jumping
 back
 in.
 
 As of HBase .94, it was still common to take a hash of your RowKey and
 use
 that to salt