Ceph as an alternative to HDFS
Article published in this month's USENIX ;login: magazine: http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf -ak
How to delete rows in a FIFO manner
Hi,

Continuing with testing HBase suitability in a high-ingest-rate environment, I've come up with a new stumbling block, likely due to my inexperience with HBase.

We want to keep and purge records on a time basis: i.e., when a record is older than, say, 24 hours, we want to purge it from the database. The problem I am encountering is that the only way I've found to delete records keyed by an arbitrary but strongly time-ordered row id is to scan for rows from lower bound to upper bound, then build an array of Delete objects: for each Result in the ResultScanner, add new Delete(result.getRow()) to the array. This method is far too slow to keep up with our ingest rate; the iteration over the Results in the ResultScanner is the bottleneck, even though the Scan is limited to a single small column in the column family.

The obvious but naive solution is to use a sequential row id where the lower and upper bounds can be known. This would allow building the array of Delete objects without a scan step. The problem with this approach is guaranteeing a sequential, non-colliding row id across more than one Put'ing process, and doing it efficiently. As it happens, I can do this, but given the details of my operational requirements, it's not a simple thing to do. So I was hoping that I had just missed something.

The ideal would be a Delete object that would take row id bounds in the same way that Scan does, allowing the work to be done entirely on the server side. Does this exist somewhere? Or is there some other way to skin this cat?

Thanks

Thomas Downing
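For concreteness, a minimal sketch of the scan-then-delete pattern described above, against the 0.20-era client API (the table name "events" and the column "meta:ts" are hypothetical placeholders):

    import java.util.ArrayList;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FifoPurge {
      // Delete every row whose key falls in [lowerBound, upperBound).
      public static void purge(byte[] lowerBound, byte[] upperBound) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "events");
        Scan scan = new Scan(lowerBound, upperBound);
        // Restrict the scan to one small column to cut down transferred data;
        // as noted above, iterating the Results is still the bottleneck.
        scan.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("ts"));
        ResultScanner scanner = table.getScanner(scan);
        ArrayList<Delete> deletes = new ArrayList<Delete>();
        for (Result r : scanner) {
          deletes.add(new Delete(r.getRow()));
        }
        scanner.close();
        table.delete(deletes); // batch delete in one client call
      }
    }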
Re: How to delete rows in a FIFO manner
If the inserts are coming from more than one client, and you are trying to delete from only one client, then it likely won't work. You could try using a pool of deleters (multiple threads that delete rows) that you feed from the scanner. Or you could run a MapReduce job that parallelizes that for you: it takes your table as its input and outputs Delete objects.

J-D
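A hedged sketch of the deleter-pool suggestion: one thread scans and feeds row keys into a queue, and worker threads drain it. HTable is not thread-safe in this era, so each worker opens its own instance; the table name "events" and the queue size are placeholder assumptions:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class DeleterPool {
      private static final byte[] POISON = new byte[0]; // end-of-work marker

      public static void run(Scan boundedScan, int nThreads) throws Exception {
        final BlockingQueue<byte[]> rows = new ArrayBlockingQueue<byte[]>(10000);
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
          workers[i] = new Thread(new Runnable() {
            public void run() {
              try {
                // HTable is not thread-safe: one instance per worker.
                HTable t = new HTable(new HBaseConfiguration(), "events");
                byte[] row;
                while ((row = rows.take()) != POISON) {
                  t.delete(new Delete(row));
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
          workers[i].start();
        }
        HTable table = new HTable(new HBaseConfiguration(), "events");
        ResultScanner scanner = table.getScanner(boundedScan);
        for (Result r : scanner) {
          rows.put(r.getRow());
        }
        scanner.close();
        for (int i = 0; i < nThreads; i++) rows.put(POISON); // one per worker
        for (Thread w : workers) w.join();
      }
    }

The MapReduce variant moves the same work onto the cluster: TableInputFormat supplies the bounded scan, and each map task emits Delete objects for its share of the rows.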
Re: How to delete rows in a FIFO manner
I wrestled with that idea of time-bounded tables. Would it make it harder to write code or run MapReduce across multiple tables? Also, how do you decide when to do the cutover (start of a new day, week, month...), and if you do, how do you process data that crosses those time boundaries efficiently? I guess that is not your requirement. If it is a fixed-time cutover, isn't it enough to set the TTL timestamp?

Interesting thread.. thanks

-----Original Message-----
From: Thomas Downing tdown...@proteus-technologies.com
To: user@hbase.apache.org
Sent: Fri, Aug 6, 2010 11:39 am
Subject: Re: How to delete rows in a FIFO manner

Thanks for the suggestions. The problem isn't generating the Delete objects, or the delete operation itself - both are fast enough. The problem is generating the list of row keys from which the Delete objects are created.

For now, the obvious work-around is to create and drop tables on the fly, using HBaseAdmin, with the tables being time-bounded: when the high end of a table passes the expiry time, just drop the table; when a table is written with its first record greater than the low bound, create a new table for the next time interval (see the sketch below). As I am having other problems related to high ingest rates, the fact may be that I am just using the wrong tool for the job.

Thanks

td
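A sketch of the rolling-table workaround Thomas describes, against the 0.20 HBaseAdmin API (the interval length, naming scheme, and the "events"/"data" names are assumptions for illustration):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RollingTables {
      static final long INTERVAL_MS = 60 * 60 * 1000L;       // one table per hour (assumed)
      static final long RETENTION_MS = 24 * 60 * 60 * 1000L; // keep 24 hours

      // Name tables by time bucket so bounds are known without scanning.
      static String tableFor(long ts) {
        return "events_" + (ts / INTERVAL_MS);
      }

      public static void roll(long now) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        // Pre-create the table for the next interval.
        String next = tableFor(now + INTERVAL_MS);
        if (!admin.tableExists(next)) {
          HTableDescriptor d = new HTableDescriptor(next);
          d.addFamily(new HColumnDescriptor("data"));
          admin.createTable(d);
        }
        // Drop the whole table once its high end is past the expiry time.
        String expired = tableFor(now - RETENTION_MS);
        if (admin.tableExists(expired)) {
          admin.disableTable(expired); // a table must be disabled before deletion
          admin.deleteTable(expired);
        }
      }
    }

Dropping a table is a metadata operation, so expiry costs nothing per row; the trade-off is that readers and MapReduce jobs must know which table(s) a given time range maps to.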
Re: How to delete rows in a FIFO manner
Our problem does not require significant map/reduce ops, and queries tend to be for sequential rows, with the timeframe being the primary consideration. So time-bounded tables are not a big hurdle, as they might be were other columns primary keys or considerations for query or map/reduce ops.

TTL timestamp - that may be just the magic I was looking for... thanks, I'll look at that.

td
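The TTL suggestion maps to a per-column-family setting. A minimal sketch (table and family names assumed):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class TtlTable {
      public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor("events");
        HColumnDescriptor fam = new HColumnDescriptor("data");
        fam.setTimeToLive(24 * 60 * 60); // TTL is in seconds: expire cells after 24 hours
        desc.addFamily(fam);
        new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
      }
    }

With a TTL set, the regionservers drop expired cells during compactions, so no client-side delete traffic is needed at all; the space is reclaimed lazily rather than exactly at the 24-hour mark.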
Using HBase's export/import function...
Ok, silly question... Inside /usr/lib/hbase/*.jar (the base jar for HBase) there's an export/import tool. If you supply the number of versions, a start time, and an end time, you can timebox your scan so your map/reduce job will let you do daily, weekly, etc. incremental backups. So here are my questions:

1) Is anyone using this?

2) There isn't any documentation; I'm assuming that the start time and end time are timestamps (long values representing the number of milliseconds since the epoch, which is what is stored in HBase).

3) Is there an easy way to convert a date into a timestamp? (Not in ksh, and I'm struggling to find a way to reverse the datetime object in Python.)

Thx

-Mike
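For orientation, the 0.20-era Export job is driven from the command line, with the version count and the two times as optional trailing arguments (times in milliseconds since the epoch); roughly:

    bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

    # e.g. one version of each cell written on 2010-08-06 UTC (times here chosen for illustration)
    bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable-20100806 1 1281052800000 1281139200000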
Re: Using HBase's export/import function...
On Fri, Aug 6, 2010 at 11:13 AM, Michael Segel michael_se...@hotmail.com wrote:

> 2) There isn't any documentation; I'm assuming that the start time and end time are timestamps (long values representing the number of milliseconds since the epoch, which is what is stored in HBase).

Yes. What kinda doc do you need? The javadoc on the class is minimal: http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/Export.html

> 3) Is there an easy way to convert a date into a timestamp? (Not in ksh, and I'm struggling to find a way to reverse the datetime object in Python.)

At the end of this page it shows how to do date conversions inside the hbase shell: http://wiki.apache.org/hadoop/Hbase/Shell

St.Ack
RE: Using HBase's export/import function...
StAck... LOL... The idea is to automate the use of the export function to be run within a cron job. (And yes, there are some use cases where we want to actually back data up.. ;-)

I originally wanted to do this in ksh (yeah, I'm that old. :-) but ended up looking at Python because I couldn't figure out how to create the timestamp in ksh.

As to documentation... just something which tells us what is meant by start time and end time. (Like that it's in ms from the epoch, instead of making us assume that.) [And you know what they say about assumptions.]

As to converting the date/time to a timestamp in Python: you build up a date object, then do the following:

    import datetime, time

    # hour, min, sec are optional
    mytime = datetime.datetime(year, month, day, hour, min, sec)
    # time.mktime() returns SECONDS since the epoch; multiply by 1000
    # for the millisecond timestamps HBase stores
    mytimestamp = int(time.mktime(mytime.timetuple()) * 1000)

I'm in the process of testing this... I think it will work.

Thx

-Mike
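The same conversion from the JVM side, for anyone scripting this in Java rather than Python (a sketch; the date format string is an arbitrary choice):

    import java.text.SimpleDateFormat;

    public class ToMillis {
      public static void main(String[] args) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // getTime() is already milliseconds since the epoch, i.e. an HBase timestamp
        long startTime = fmt.parse("2010-08-01 00:00:00").getTime();
        System.out.println(startTime);
      }
    }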
Re: HBase storage sizing
With respect to the comment below, I'm trying to determine what the minimum IO requirements are for us... For any given value being stored into HBase, is it accurate to calculate the size of the row key, family, qualifier, timestamp, and value and use their sum as the amount of data that needs to be written for every insert?

Thanks,

Andre

On Jul 8, 2010, at 5:44 PM, Jean-Daniel Cryans wrote:

> keep in mind that every value is also stored with its full key (row key + family + qualifier + timestamp).
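J-D's point turns into simple arithmetic. A sketch following the 0.20 KeyValue layout (the fixed-size length/type fields are my reading of KeyValue.java and worth double-checking):

    public class CellSize {
      // Approximate on-disk bytes for one cell, before compression.
      public static long approxBytes(int rowLen, int famLen, int qualLen, int valLen) {
        return 4 + 4        // key length + value length (two ints)
             + 2 + rowLen   // row length (short) + row key
             + 1 + famLen   // family length (byte) + family
             + qualLen      // qualifier
             + 8 + 1        // timestamp (long) + key type (byte)
             + valLen;      // the value itself
      }

      public static void main(String[] args) {
        // e.g. 32-byte row key, 4-byte family, 8-byte qualifier, 100-byte value
        System.out.println(approxBytes(32, 4, 8, 100)); // 164 bytes per cell
      }
    }

Note that this per-cell figure is then written more than once: each cell goes to the write-ahead log and to a store file (and is rewritten by compactions), and everything is multiplied by the HDFS replication factor (3 by default).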
Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:
Hello,

I'm running HBase 0.20.5, and seeing Puts fail repeatedly when trying to insert a specific item into the database. Client side I see:

    org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836 for region filestore, ...

I then looked up which node was hosting the given region (filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b) on the GUI, and found the following debug message in the regionserver log:

    2010-08-06 14:23:47,414 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at index=0 because:Requested row out of range for HRegion filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836, startKey='bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b', getEndKey()='be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633', row='be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d'

Which appears to be coming from regionserver/HRegionServer.java:1786:

    LOG.debug("Batch puts interrupted at index=" + i + " because:" + ...

which is in turn triggered from org/apache/hadoop/hbase/regionserver/HRegion.java:1658:

    throw new WrongRegionException("Requested row out of range for " + ...

This happens repeatedly on a specific item over at least a day or so, even when not much is happening with the cluster. As far as I can tell, it looks like the logic to select the correct region for a given row is wrong. The row is indeed not in the correct range (at least from what I can tell of the exception thrown), and the check in HRegion.java:1658:

    /** Make sure this is a valid row for the HRegion */
    private void checkRow(final byte [] row) throws IOException {
      if (!rowIsInRange(regionInfo, row)) {

is correctly rejecting the Put. So it appears the error would be somewhere in HRegion.java:1550:

    private void put(final Map<byte [], List<KeyValue>> familyMap, boolean writeToWAL) throws IOException {

which appears to be the actual guts of the insert operation. However, I don't know enough about the design of HRegions to really decipher this method. I'll dig into it more, but I thought it might be more efficient just to ask you guys first.

Any ideas? I can update to 0.20.6, but I don't see any fixed JIRAs on 0.20.6 that seem related.. I could be wrong. I'm not sure what I should do next. Any more information you guys need?

Note that I am inserting files into the database, and using each file's sha256sum as the key. And the file that is failing does indeed have a sha that corresponds to the key in the message above (and is out of range).

Take care,
-stu
Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:
Hi,

When you run into this problem, it's usually a sign of a META problem; specifically, you have a 'hole' in the META table. The META table contains a series of keys like so:

    table,start_row1,timestamp => [data]
    table,start_row2,timestamp => [data]
    etc.

When we search for the region for a given row, we build a key like 'table,my_row,9*19' (nineteen 9s, i.e. a maximal timestamp) and do a search called 'closestRowBefore'. This finds the region that contains the row. Now notice that we only put the start row in the key: each region has a start_row and an end_row, all the regions are mutually exclusive, and together they form complete coverage of the key space. If the row for some region were missing from META, we'd consistently find the wrong region, and the regionserver would reject the request (correctly so). That is what is probably happening here.

Check the table dump in the master web UI and see if you can find a 'hole'... where an end key doesn't match up with the next start key. If that is the case, there is a script, add_table.rb, which is used to fix these things. A sketch of checking for such a hole programmatically follows below.

-ryan
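Ryan's check can also be done with a small client-side scan of .META.; a hedged sketch against the 0.20 API (the table name 'filestore' comes from this thread; error handling omitted):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Writables;

    public class FindMetaHoles {
      public static void main(String[] args) throws Exception {
        HTable meta = new HTable(new HBaseConfiguration(), ".META.");
        ResultScanner s = meta.getScanner(new Scan());
        byte[] prevEnd = null;
        for (Result r : s) {
          HRegionInfo info = Writables.getHRegionInfo(
              r.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER));
          if (!info.getTableDesc().getNameAsString().equals("filestore")) continue;
          // Regions should tile the key space: each region's start key must
          // equal the previous region's end key.
          if (prevEnd != null && Bytes.compareTo(prevEnd, info.getStartKey()) != 0) {
            System.out.println("hole before region starting at "
                + Bytes.toStringBinary(info.getStartKey()));
          }
          prevEnd = info.getEndKey();
        }
        s.close();
      }
    }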
Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:
Hello Ryan,

Yup. There's a hole, exactly where it should be. I used add_table.rb once before, and am no expert on it. All I have is a note written down:

    To recover lost tables:
    ./hbase org.jruby.Main add_table.rb /hbase/filestore

Anything else I need to know? Do I just run the script like so? Does anything need to be shut down before I do?

Thanks!

Take care,
-stu
Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:
Just to follow up - I ran add_table.rb as I had done when I lost a table before - and it fixed the error.

Thanks!

Take care,
-stu