HBase bug HBASE-13566
Hi, I hit a MetaLogRoller bug that can make the regionserver abort. For now I use the hbase.regionserver.logroll.errors.tolerated setting to work around the problem, but I don't know how to solve it directly. Please help.

HBASE-13566: while HBase runs the MetaLogRoller thread to write the meta HLog, I get the error below:

ERROR [RS_OPEN_META-xxx:60020-0-MetaLogRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-04-25 19:52:05,463 ERROR [RS_OPEN_META-host152:60020-0-MetaLogRoller] wal.FSHLog: Failed close of HLog writer
java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-04-25 19:52:05,463 FATAL [RS_OPEN_META-host152:60020-0-MetaLogRoller] regionserver.HRegionServer: ABORTING region server host152,60020,1429927886571: Failed log close in log roller
org.apache.hadoop.hbase.regionserver.wal.FailedLogCloseException: #1429959124806
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:777)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:565)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:97)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:918)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)

Note: I have checked the namenode log; there is no error there, it just closes the write socket, and the Hadoop and HBase files are all fine.
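For anyone reproducing the workaround, a minimal hbase-site.xml sketch follows. The property name is taken from the report above; the value shown is an assumed example (it sets how many WAL roll/close failures the log roller tolerates before the regionserver aborts), so check the default for your HBase version:

<property>
  <!-- Number of WAL roll/close failures to tolerate before the
       regionserver aborts; 2 is an assumed example value. -->
  <name>hbase.regionserver.logroll.errors.tolerated</name>
  <value>2</value>
</property>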
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell andrew.purt...@gmail.com wrote:
> I prefer to patch the POMs.

Is this a formal -1? I've opened HBASE-13637 for tracking this issue. Let's get it fixed and I'll spin a new RC tonight.

> On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
>> So what's the conclusion here? Are we dropping 2.2 support or updating the poms and sinking the RC?
>>
>> On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com wrote:
>>> On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org wrote:
>>>> We could patch our POMs to reference the hadoop-minikdc artifact independently of the rest of the Hadoop packages. It's standalone and rarely changes.
>>>
>>> +1. I've been using HBase to test Hadoop changes for isolating dependencies from downstream folks (HADOOP-11804), and I've just been leaving the hadoop-minikdc artifact as-is due to these very reasons.
>>>
>>> -- Sean
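For reference, the change Andrew describes would look roughly like the sketch below: hadoop-minikdc gets its own version pin instead of following the main Hadoop version property. This is an illustration, not the actual HBASE-13637 patch; in particular the minikdc.version property name and the test scope are assumptions:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minikdc</artifactId>
  <!-- Pinned independently of the rest of the Hadoop artifacts;
       'minikdc.version' is a hypothetical property name. -->
  <version>${minikdc.version}</version>
  <scope>test</scope>
</dependency>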
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
I'm also traveling today. I've already extended the vote for this RC to Sunday, and since no one has said this is a -1-worthy regression, this candidate continues to stand.

On Wed, May 6, 2015 at 12:16 PM, Andrew Purtell andrew.purt...@gmail.com wrote:
> Formally, -0
>
> Given tomorrow is hbasecon perhaps it would be better to spin a RC on Friday? I can take HBASE-13637 but am sitting on a plane at the moment. Won't be able to get to it until tonight.
>
> On May 6, 2015, at 10:43 AM, Nick Dimiduk ndimi...@apache.org wrote:
>> On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell andrew.purt...@gmail.com wrote:
>>> I prefer to patch the POMs.
>>
>> Is this a formal -1? I've opened HBASE-13637 for tracking this issue. Let's get it fixed and I'll spin a new RC tonight.
>>
>>> On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
>>>> So what's the conclusion here? Are we dropping 2.2 support or updating the poms and sinking the RC?
>>>>
>>>> On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com wrote:
>>>>> On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org wrote:
>>>>>> We could patch our POMs to reference the hadoop-minikdc artifact independently of the rest of the Hadoop packages. It's standalone and rarely changes.
>>>>>
>>>>> +1. I've been using HBase to test Hadoop changes for isolating dependencies from downstream folks (HADOOP-11804), and I've just been leaving the hadoop-minikdc artifact as-is due to these very reasons.
>>>>>
>>>>> -- Sean
Re: RowKey hashing in HBase 1.0
Thank you for the explanation, but I'm a little confused. The key will be monotonically increasing, but the hash of that key will not be. So, even though your original keys may look like 1_foobar, 2_foobar, 3_foobar, after the hashing they'd look more like 349000_1_foobar, 99_2_foobar, 01_3_foobar.

With five regions, the original key ranges for your regions would look something like: 00-19, 20-39, 40-59, 60-79, 80-99. So let's say you add another row and it causes a split. Now your regions look like: 00-19, 20-39, 40-59, 60-79, 80-89, 90-99. Since the value that you are prepending to your keys is essentially random, I don't see why your regions would only fill halfway. A new, hashed key would be just as likely to fall within 80-89 as it would be to fall within 90-99. Are we working from different assumptions?

On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com wrote:
> Yes, what you described, mod(hash(rowkey), n) where n is the number of regions, will remove the hotspotting issue. However, if your key is sequential you will only have regions half full post region split.
>
> Look at it this way… If I have a key that is a sequential count 1, 2, 3, 4, 5 … I am always adding a new row to the last region, and it's always being added to the right (reading left to right). Always at the end of the line… So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999 and region 2 has 5,000 to 10,000. Now my next row is 10,001, the following is 10,002 … so they will be added at the tail end of region 2 until it splits. (And so on, and so on…)
>
> If you take a modulus of the hash, you create n buckets. Again, for each bucket… I will still be adding a new, larger number, so it will be added to the right-hand side, or tail, of the list. Once a region is split… that's it. Bucketing will solve the hot spotting issue by creating n lists of rows, but you're still always adding to the end of the list. Does that make sense?
>
> On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com wrote:
>> Thank you for your response! So I guess 'salt' is a bit of a misnomer. What I used to do is this:
>> 1) Say that my key value is something like '1234foobar'
>> 2) I obtain the hash of '1234foobar'. Let's say that's '54824923'
>> 3) I mod the hash by my number of regions. Let's say I have 2000 regions. 54824923 % 2000 = 923
>> 4) I prepend that value to my original key value, so my new key is '923_1234foobar'
>> Is this the same thing you were talking about? A couple questions:
>> * Why would my regions only be 1/2 full?
>> * Why would I only use this for sequential keys? I would think this would give better performance in any situation where I don't need range scans. For example, let's say my key value is a person's last name. That will naturally cluster around certain letters, giving me an uneven distribution.
>> --Jeremy
>>
>> On Sun, May 3, 2015 at 11:46 AM, Michael Segel michael_se...@hotmail.com wrote:
>>> Yes, don't use a salt. Salt implies that your seed is orthogonal (read: random) to the base table row key. You're better off using a truncated hash (md5 is fastest) so that at least you can use a single get(). Common? Only if your row key is mostly sequential. Note that even with bucketing, you will still end up with regions only 1/2 full, with the only exception being the last region.
>>>
>>> On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com wrote:
>>>> Hello all, I've been out of the HBase world for a while, and I'm just now jumping back in. As of HBase 0.94, it was still common to take a hash of your RowKey and use that to salt the beginning of your RowKey to obtain an even distribution among your region servers. Is this still a common practice, or is there a better way to do this in HBase 1.0?
>>>> --Jeremy
>>>
>>> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
>
> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
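A minimal Java sketch of the scheme described in steps 1-4 of the quoted message, under the assumption that any stable hash will do (the thread suggests a truncated MD5; String.hashCode() is used here purely for brevity, and the class and method names are made up for illustration):

public class PrefixedRowKeys {

    // Hypothetical helper, not an HBase API: derive the bucket prefix
    // from the key itself so a reader can recompute it for a single get().
    static String prefixed(String originalKey, int numBuckets) {
        // Steps 2-3 above: hash the key and mod by the bucket count.
        // Math.floorMod keeps the result non-negative even when
        // hashCode() is negative.
        int bucket = Math.floorMod(originalKey.hashCode(), numBuckets);
        // Step 4: prepend, e.g. '1234foobar' becomes '923_1234foobar'.
        return bucket + "_" + originalKey;
    }

    public static void main(String[] args) {
        // Sequential keys scatter across buckets, which is the even
        // write distribution the thread is after.
        for (int i = 1; i <= 5; i++) {
            System.out.println(prefixed(i + "_foobar", 2000));
        }
    }
}

Because the prefix is a deterministic function of the key rather than a random salt, a point read can recompute it and issue a single get(); range scans over the original key order are what you give up.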
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
Formally, -0

Given tomorrow is hbasecon perhaps it would be better to spin a RC on Friday? I can take HBASE-13637 but am sitting on a plane at the moment. Won't be able to get to it until tonight.

On May 6, 2015, at 10:43 AM, Nick Dimiduk ndimi...@apache.org wrote:
> On Wed, May 6, 2015 at 10:13 AM, Andrew Purtell andrew.purt...@gmail.com wrote:
>> I prefer to patch the POMs.
>
> Is this a formal -1? I've opened HBASE-13637 for tracking this issue. Let's get it fixed and I'll spin a new RC tonight.
>
>> On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
>>> So what's the conclusion here? Are we dropping 2.2 support or updating the poms and sinking the RC?
>>>
>>> On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com wrote:
>>>> On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org wrote:
>>>>> We could patch our POMs to reference the hadoop-minikdc artifact independently of the rest of the Hadoop packages. It's standalone and rarely changes.
>>>>
>>>> +1. I've been using HBase to test Hadoop changes for isolating dependencies from downstream folks (HADOOP-11804), and I've just been leaving the hadoop-minikdc artifact as-is due to these very reasons.
>>>>
>>>> -- Sean
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
I prefer to patch the POMs.

On May 5, 2015, at 4:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
> So what's the conclusion here? Are we dropping 2.2 support or updating the poms and sinking the RC?
>
> On Fri, May 1, 2015 at 7:47 AM, Sean Busbey bus...@cloudera.com wrote:
>> On Thu, Apr 30, 2015 at 6:48 PM, Andrew Purtell apurt...@apache.org wrote:
>>> We could patch our POMs to reference the hadoop-minikdc artifact independently of the rest of the Hadoop packages. It's standalone and rarely changes.
>>
>> +1. I've been using HBase to test Hadoop changes for isolating dependencies from downstream folks (HADOOP-11804), and I've just been leaving the hadoop-minikdc artifact as-is due to these very reasons.
>>
>> -- Sean
Re: RowKey hashing in HBase 1.0
Jeremy, I think you have to be careful in how you say things. While over time you're going to get an even distribution, the hash isn't random. It's consistent, so hash(x) = y will always be the same. You're taking the modulus to create buckets 1 to n. In each bucket, your new key is n_rowkey, where rowkey is the original row key.

Remember that the rowkey is growing sequentially: rowkey(n), rowkey(n+1), … rowkey(n+k). So if you hash it, take its modulus, and prepend it, you will still have X_rowkey(n), X_rowkey(n+k), … All you have is N sequential lists. And again, with a sequential list you're adding to the right, so when you split, the top section is never going to get new rows.

I think you need to create a list and try this with 3 or 4 buckets and you'll start to see what happens. The last region fills, but after it splits, the top half is static. The new rows are added to the bottom half only. This is a problem with sequential keys that you have to learn to live with. It's not a killer issue, but something you need to be aware of…

On May 6, 2015, at 4:00 PM, jeremy p athomewithagroove...@gmail.com wrote:
> Thank you for the explanation, but I'm a little confused. The key will be monotonically increasing, but the hash of that key will not be. So, even though your original keys may look like 1_foobar, 2_foobar, 3_foobar, after the hashing they'd look more like 349000_1_foobar, 99_2_foobar, 01_3_foobar.
>
> With five regions, the original key ranges for your regions would look something like: 00-19, 20-39, 40-59, 60-79, 80-99. So let's say you add another row and it causes a split. Now your regions look like: 00-19, 20-39, 40-59, 60-79, 80-89, 90-99. Since the value that you are prepending to your keys is essentially random, I don't see why your regions would only fill halfway. A new, hashed key would be just as likely to fall within 80-89 as it would be to fall within 90-99. Are we working from different assumptions?
>
> On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com wrote:
>> Yes, what you described, mod(hash(rowkey), n) where n is the number of regions, will remove the hotspotting issue. However, if your key is sequential you will only have regions half full post region split.
>>
>> Look at it this way… If I have a key that is a sequential count 1, 2, 3, 4, 5 … I am always adding a new row to the last region, and it's always being added to the right (reading left to right). Always at the end of the line… So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999 and region 2 has 5,000 to 10,000. Now my next row is 10,001, the following is 10,002 … so they will be added at the tail end of region 2 until it splits. (And so on, and so on…)
>>
>> If you take a modulus of the hash, you create n buckets. Again, for each bucket… I will still be adding a new, larger number, so it will be added to the right-hand side, or tail, of the list. Once a region is split… that's it. Bucketing will solve the hot spotting issue by creating n lists of rows, but you're still always adding to the end of the list. Does that make sense?
>>
>> On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com wrote:
>>> Thank you for your response! So I guess 'salt' is a bit of a misnomer. What I used to do is this:
>>> 1) Say that my key value is something like '1234foobar'
>>> 2) I obtain the hash of '1234foobar'. Let's say that's '54824923'
>>> 3) I mod the hash by my number of regions. Let's say I have 2000 regions. 54824923 % 2000 = 923
>>> 4) I prepend that value to my original key value, so my new key is '923_1234foobar'
>>> Is this the same thing you were talking about? A couple questions:
>>> * Why would my regions only be 1/2 full?
>>> * Why would I only use this for sequential keys? I would think this would give better performance in any situation where I don't need range scans. For example, let's say my key value is a person's last name. That will naturally cluster around certain letters, giving me an uneven distribution.
>>> --Jeremy
>>>
>>> On Sun, May 3, 2015 at 11:46 AM, Michael Segel michael_se...@hotmail.com wrote:
>>>> Yes, don't use a salt. Salt implies that your seed is orthogonal (read: random) to the base table row key. You're better off using a truncated hash (md5 is fastest) so that at least you can use a single get(). Common? Only if your row key is mostly sequential. Note that even with bucketing, you will still end up with regions only 1/2 full, with the only exception being the last region.
>>>>
>>>> On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com wrote:
>>>>> Hello all, I've been out of the HBase world for a while, and I'm just now jumping back in. As of HBase 0.94, it was still common to take a hash of your RowKey and use that to salt
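Segel's suggestion above to "create a list and try this with 3 or 4 buckets" can be written as a toy simulation (plain Java, not HBase code; keys are zero-padded so string order matches numeric order, and all names here are made up for illustration):

import java.util.ArrayList;
import java.util.List;

public class BucketTailInserts {
    public static void main(String[] args) {
        int buckets = 4;
        // One sorted list per bucket stands in for one region's key order.
        List<List<String>> regions = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            regions.add(new ArrayList<>());
        }
        // Route sequential keys 1..20 by hash % buckets, as in the thread.
        for (int i = 1; i <= 20; i++) {
            String key = String.format("%05d", i);
            int bucket = Math.floorMod(key.hashCode(), buckets);
            // Within its bucket every new key is larger than all earlier
            // ones, so each insert lands at the tail (Segel's point).
            regions.get(bucket).add(bucket + "_" + key);
        }
        for (int b = 0; b < buckets; b++) {
            System.out.println("bucket " + b + ": " + regions.get(b));
        }
    }
}

Each bucket's list only ever grows at its tail; whether that leaves split regions permanently half full (Segel's claim) or whether new writes nonetheless stay spread across all buckets (Jeremy's claim) is exactly the disagreement in this thread.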