[jira] [Created] (HBASE-7775) Regionservers continue to read/parse XML config files after startup.
Aravind Gottipati created HBASE-7775:

Summary: Regionservers continue to read/parse XML config files after startup.
Key: HBASE-7775
URL: https://issues.apache.org/jira/browse/HBASE-7775
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.6
Environment: linux x86_64
Reporter: Aravind Gottipati

It appears that the region servers continue to parse the xml config files as a part of their normal operation (and not just on startup). I realize this might be coming from hadoop config parsing etc, but it is still a major problem should you happen to push out a bad xml config. Here is the stack trace from the problem in our environment.

13/02/05 13:46:12 INFO regionserver.HRegion: Starting compaction on region tsdb,\x00\x00\x0FP\xFEU\x10\x00\x00\x01\x00 \xEF\x00\x00\x0B\x00\x00\x0F\x00\x00\x0C\x00\x00\x19,1359827642230.aadcc5a9ef4d4f16fb8937c9e93763a1.
13/02/05 13:46:12 INFO regionserver.Store: Started compaction of 3 file(s) in cf=t into hdfs://nn-blah:8020/hbase/tsdb/aadcc5a9ef4d4f16fb8937c9e93763a1/.tmp, seqid=1704265883, totalSize=141.6m
13/02/05 13:47:57 INFO regionserver.StoreFile: Bloom added to HFile (hdfs://nn-blah:8020/hbase/tsdb/aadcc5a9ef4d4f16fb8937c9e93763a1/.tmp/7838002095040213865): 793.2k, 656192/677964 (97%)
13/02/05 13:47:57 INFO regionserver.StoreFile$Reader: Loaded row bloom filter metadata for hdfs://nn-blah:8020/hbase/tsdb/aadcc5a9ef4d4f16fb8937c9e93763a1/t/8826149174883990976
13/02/05 13:47:57 INFO regionserver.Store: Completed compaction of 3 file(s), new file=hdfs://nn-blah:8020/hbase/tsdb/aadcc5a9ef4d4f16fb8937c9e93763a1/t/8826149174883990976, size=141.5m; total size for store is 2.4g
13/02/05 13:47:57 INFO regionserver.HRegion: completed compaction on region tsdb,\x00\x00\x0FP\xFEU\x10\x00\x00\x01\x00 \xEF\x00\x00\x0B\x00\x00\x0F\x00\x00\x0C\x00\x00\x19,1359827642230.aadcc5a9ef4d4f16fb8937c9e93763a1. after 1mins, 44sec
[Fatal Error] mapred-site.xml:173:13: The string -- is not permitted within comments.
13/02/05 13:55:33 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: The string -- is not permitted within comments.
13/02/05 13:55:33 INFO regionserver.StoreFile: Bloom added to HFile (hdfs://nn-blah:8020/hbase/tsdb/74a4a785bc317da7282c331f577918a0/.tmp/4658027967280663602): 5.3k, 1/4519 (0%)
[Fatal Error] mapred-site.xml:173:13: The string -- is not permitted within comments.
13/02/05 13:55:33 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: The string -- is not permitted within comments.
13/02/05 13:55:33 FATAL regionserver.HRegionServer: ABORTING region server serverName=hbrs-blah,60020,1360021443595, load=(requests=5434, regions=73, usedHeap=4085, maxHeap=15979): Replay of HLog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,\x00\x060P\xF4\x1Dp\x00\x00\x01\x00\x07\xDB\x00\x00\x02\x00\x02\x02\x00\x00\x87\x00\xB0Y,1359421493144.74a4a785bc317da7282c331f577918a0.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1054)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:954)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:902)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
Caused by: java.lang.RuntimeException: org.xml.sax.SAXParseException: The string -- is not permitted within comments.
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1393)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1251)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1192)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:493)
    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:621)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:877)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:495)
    at
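One way to guard against the failure reported above is to check that a config file is well-formed XML before pushing it out. The following is a minimal sketch, not part of HBase or Hadoop: the class name ConfigXmlCheck and its methods are hypothetical, and it relies only on the JDK's built-in XML parser. It checks well-formedness only and does not validate property names or values.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical pre-deployment check; the name is illustrative only.
class ConfigXmlCheck {

    /** Returns null if the stream is well-formed XML, else a description of the first error. */
    static String firstError(InputStream in) {
        try {
            // The JDK's default parser also logs fatal errors to stderr.
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // I/O problems or parser configuration problems.
            return e.toString();
        }
    }

    /** Convenience overload for checking XML held in a string. */
    static String firstError(String xml) {
        return firstError(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Checks every file named on the command line; exits non-zero if any is malformed. */
    public static void main(String[] args) throws Exception {
        int bad = 0;
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                String error = firstError(in);
                if (error != null) {
                    bad++;
                    System.err.println(path + ": " + error);
                }
            }
        }
        System.exit(bad == 0 ? 0 : 1);
    }
}
```

Run against a file such as mapred-site.xml before a config push, a check like this would be expected to reject a comment containing a double hyphen, which is the condition the parser complains about in the log above.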
[jira] [Commented] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448647#comment-13448647 ]

Aravind Gottipati commented on HBASE-3866:

I will defer to you folks on whether to include this script with the distribution. Stack's suggestion of closing the JIRA is a fine one; as he said, this would leave the script here for others to use. I would, however, like to note a few things.

1. The script attached here is outdated. A newer version of the script that worked with 0.92 is here (https://github.com/aravind/hbase-utils/blob/master/region_mover.rb). I haven't been keeping up with the latest releases, so there is a very good chance it might not work with versions after 0.92.
2. The script is pretty inefficient in how it moves and balances regions. It maintains an internal hashmap (two of them, even) from each server to its number of regions, to keep the region count balanced.
3. It is as portable as the original region mover script, since it re-uses most of the same mechanisms.

Script to add regions gradually to a new regionserver.

Key: HBASE-3866
URL: https://issues.apache.org/jira/browse/HBASE-3866
Project: HBase
Issue Type: Improvement
Components: scripts
Affects Versions: 0.90.2
Reporter: Aravind Gottipati
Priority: Minor
Attachments: 3866-max-regions-per-iteration.patch, slow_balancer.rb, slow_balancer.rb

When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be un-available right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5929) HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
Aravind Gottipati created HBASE-5929:

Summary: HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
Key: HBASE-5929
URL: https://issues.apache.org/jira/browse/HBASE-5929
Project: HBase
Issue Type: Bug
Components: client, shell
Affects Versions: 0.92.1
Environment: Linux Ubuntu Lucid 64bit
Reporter: Aravind Gottipati
Priority: Minor

I have been noticing that calls to HBaseAdmin.majorCompact throw exceptions randomly for some regions. I could not find a pattern to these exceptions. The code I have simply does this: admin.majorCompact(region.getRegionNameAsString()), where admin is an instance of HBaseAdmin and region is an instance of HRegionInfo. The exception I get is

org.apache.hadoop.hbase.TableNotFoundException: -ROOT-,,0
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]

In this case it's the root region, but I get similar exceptions for other tables, like this.

2012-05-03 19:03:42,994 WARN [main] HBaseCompact: Could not compact:
org.apache.hadoop.hbase.TableNotFoundException: ad_daily,49842:2009-07-10,1269763588508.1997607018
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1196) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]
    at com.stumbleupon.hbaseadmin.HBaseCompact.main(Unknown Source) [hbase_compact.jar:na]

I see this on hbase shell as well. However, I don't see these exceptions if I use admin.majorCompact(region.getRegionName()), so it looks like something gets lost when I use getRegionNameAsString(). Let me know if I can provide more information.
[jira] [Commented] (HBASE-5929) HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
[ https://issues.apache.org/jira/browse/HBASE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267792#comment-13267792 ]

Aravind Gottipati commented on HBASE-5929:

Here is the output from hbase shell for a similar table:

hbase(main):004:0> major_compact 'ad_campaign_daily_stumbles,81738:2009-02-08,1269765634190.1290583321'

ERROR: Unknown table ad_campaign_daily_stumbles,81738:2009-02-08,1269765634190.1290583321!

Here is some help for this command:
Run major compaction on passed table or pass a region row to major compact an individual region

hbase(main):005:0>

I get these region names by querying the HRegionInterface of the server, and then proceed to compact them. This is all on the dev cluster (if you want to replicate/test).
[jira] [Created] (HBASE-4298) Support to drain RS nodes through ZK
Support to drain RS nodes through ZK

Key: HBASE-4298
URL: https://issues.apache.org/jira/browse/HBASE-4298
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.90.4
Environment: all
Reporter: Aravind Gottipati
Priority: Minor
Fix For: 0.90.4

HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks. HDFS goes one step further and even drains these nodes for you. This enhancement is a step in that direction. The idea is that we mark nodes in zookeeper as draining nodes. This means that they don't get any more new regions. These draining nodes look exactly the same as the corresponding nodes in /rs, except they live under /draining. Eventually, support for draining them can be added.

I am submitting two patches for review - one for the 0.90 branch and one for trunk (in git). Here are the two patches:

0.90 - https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
trunk - https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5

I have tested both these patches and they work as advertised.
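The effect of a /draining list on region assignment can be sketched as a set difference: servers registered under /rs, minus those also listed under /draining, remain eligible for new regions. This is an illustration only and not the code in the patches. The class DrainingFilter and the server names are hypothetical, and the two lists stand in for the child names a ZooKeeper client would return for each znode.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not taken from the HBASE-4298 patches.
class DrainingFilter {

    /**
     * Returns the servers that may still receive new regions:
     * everything under /rs that is not also listed under /draining.
     */
    static Set<String> assignable(List<String> rsChildren, List<String> drainingChildren) {
        Set<String> result = new LinkedHashSet<>(rsChildren); // keeps /rs order
        result.removeAll(drainingChildren);
        return result;
    }

    public static void main(String[] args) {
        // Made-up server names in host,port,startcode form.
        List<String> rs = Arrays.asList("rs1,60020,1", "rs2,60020,2", "rs3,60020,3");
        List<String> draining = Arrays.asList("rs2,60020,2");
        System.out.println(assignable(rs, draining));
    }
}
```

Because a draining entry has the same name as the corresponding entry in /rs, a plain name comparison is enough in this sketch; a server listed only under /draining is simply ignored.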
[jira] [Created] (HBASE-3866) Script to add regions gradually to a new regionserver.
Script to add regions gradually to a new regionserver.

Key: HBASE-3866
URL: https://issues.apache.org/jira/browse/HBASE-3866
Project: HBase
Issue Type: Improvement
Components: scripts
Affects Versions: 0.90.2
Reporter: Aravind Gottipati
Priority: Minor

When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be un-available right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this.
[jira] [Updated] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravind Gottipati updated HBASE-3866:

Attachment: slow_balancer.rb

This script uses a lot of the code from region_mover.rb. The script should be invoked like this.

HBASE_NOEXEC=true $HBASE_HOME/bin/hbase org.jruby.Main $HBASE_HOME/bin/slow_balancer.rb --debug -l 2

The -l option is the target difference between the server with the maximum regions and the server with the minimum regions. Once the delta reaches this point, the script exits. If -l is not passed, it defaults to the number of region servers in your environment.
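The stopping rule for -l described above can be sketched as follows. This is a hypothetical simulation in Java, not the Ruby script itself: it only tracks region counts per server, moves one region at a time from the most loaded server to the least loaded, and stops once the difference is at most the limit. The class and method names are made up, and the pause a real script would presumably take between moves is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of gradual balancing; not slow_balancer.rb.
class SlowBalanceSketch {

    /**
     * Moves one region at a time until (max - min) <= limit.
     * Mutates regionCounts in place and returns the number of moves made.
     */
    static int balance(Map<String, Integer> regionCounts, int limit) {
        // A limit below 1 could never be met when the total is not evenly divisible.
        int target = Math.max(limit, 1);
        int moves = 0;
        while (true) {
            String maxServer = null;
            String minServer = null;
            for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
                if (maxServer == null || e.getValue() > regionCounts.get(maxServer)) {
                    maxServer = e.getKey();
                }
                if (minServer == null || e.getValue() < regionCounts.get(minServer)) {
                    minServer = e.getKey();
                }
            }
            if (maxServer == null
                    || regionCounts.get(maxServer) - regionCounts.get(minServer) <= target) {
                return moves; // balanced enough, or no servers at all
            }
            // "Move" a single region; a real script would issue the move and wait here.
            regionCounts.put(maxServer, regionCounts.get(maxServer) - 1);
            regionCounts.put(minServer, regionCounts.get(minServer) + 1);
            moves++;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("old-rs-1", 40);
        counts.put("old-rs-2", 38);
        counts.put("new-rs", 0); // freshly added, empty server
        int moves = balance(counts, 2);
        System.out.println(moves + " moves, final counts: " + counts);
    }
}
```

Under the default described above, the limit passed in would be the number of region servers, so a larger cluster tolerates a larger spread before the loop exits.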
[jira] [Updated] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravind Gottipati updated HBASE-3866:

Attachment: slow_balancer.rb

Attachments: slow_balancer.rb, slow_balancer.rb