[jira] Created: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
NullPointerException is thrown when inserting mass data via REST interface
--------------------------------------------------------------------------

                 Key: HBASE-2905
                 URL: https://issues.apache.org/jira/browse/HBASE-2905
             Project: HBase
          Issue Type: Bug
          Components: rest
    Affects Versions: 0.89.20100621
         Environment: CentOS 5.2 x86_64, HBase 0.89
            Reporter: Sandy Yin
            Assignee: Sandy Yin
             Fix For: 0.90.0

A NullPointerException is thrown when inserting mass data via the REST interface.

{code}
java.lang.NullPointerException
	at org.mortbay.io.ByteArrayBuffer.wrap(ByteArrayBuffer.java:361)
	at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:588)
	at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:233)
	at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:108)
	at org.apache.hadoop.hbase.rest.provider.producer.ProtobufMessageBodyProducer.writeTo(ProtobufMessageBodyProducer.java:78)
	at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:254)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:744)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:667)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:658)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:318)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:425)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:604)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:389)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
{code}

The issue is caused by using a WeakHashMap as the buffer cache: by the time the object is fetched back out of the map, the JVM garbage collector may already have removed it.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
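The root cause described above can be illustrated with a small sketch. This is hypothetical, simplified code (the class and method names are illustrative, not the actual ProtobufMessageBodyProducer internals): a WeakHashMap entry survives only while its key is strongly reachable, so a buffer cached in one can vanish between two accesses.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakBufferCache {
    private final Map<Object, byte[]> buffers = new WeakHashMap<>();

    public void cache(Object key, byte[] buf) {
        buffers.put(key, buf);
    }

    // May return null even though cache() was called earlier: once the caller
    // drops its strong reference to the key, GC is free to clear the entry.
    public byte[] lookup(Object key) {
        return buffers.get(key);
    }

    public static void main(String[] args) {
        WeakBufferCache cache = new WeakBufferCache();
        Object key = new Object();
        cache.cache(key, new byte[64]);
        // While a strong reference to the key exists, the entry is present.
        System.out.println(cache.lookup(key) != null);
        key = null;      // drop the only strong reference
        System.gc();     // after collection the entry may be gone, and any
                         // caller that assumed non-null dereferences null
    }
}
```

Under heavy insert load the collector runs often, which is why the NPE surfaces specifically with "mass data".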
[jira] Updated: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Yin updated HBASE-2905:
-----------------------------
    Attachment: TextMessageBodyProducer.java.patch
[jira] Updated: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Yin updated HBASE-2905:
-----------------------------
    Attachment: HBase-2905-89.patch
[jira] Updated: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Yin updated HBASE-2905:
-----------------------------
    Attachment: (was: TextMessageBodyProducer.java.patch)
[jira] Updated: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Yin updated HBASE-2905:
-----------------------------
          Status: Patch Available  (was: Open)
    Release Note: Fix the NullPointerException thrown when inserting mass data via the REST interface: use a thread-local variable instead of a WeakHashMap as the buffer.
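The fix direction stated in the release note (a thread-local buffer instead of a WeakHashMap) can be sketched as follows. This is a simplified illustration, not the actual patch; the class name is made up, and `ThreadLocal.withInitial` is a modern Java 8 convenience the 0.89-era code would have spelled out as an anonymous `initialValue()` override.

```java
import java.io.ByteArrayOutputStream;

public class ThreadLocalBuffer {
    // One buffer per worker thread. Unlike a WeakHashMap entry, a
    // ThreadLocal value stays strongly reachable for the thread's lifetime,
    // so the garbage collector cannot reclaim it mid-request.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    public static ByteArrayOutputStream get() {
        ByteArrayOutputStream buf = BUFFER.get();
        buf.reset();  // reuse the backing array across requests on this thread
        return buf;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream first = get();
        first.write(1);
        // The same thread gets the same (reset) buffer back on the next call.
        System.out.println(first == get());
    }
}
```

Since servlet containers like Jetty handle each request on a single pooled thread, per-thread buffers give the same reuse the WeakHashMap aimed for, without the GC race.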
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896498#action_12896498 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
-----------------------------------------------------------

(Updated 2010-08-09 03:52:11.875655)

Review request for hbase.

Changes
-------

Quite a lot of changes have been made according to Todd's review; here are some major ones:

1. Refactor SnapshotMonitor into one part that is master-global and another part that is created once per snapshot (SnapshotTracker).
2. Catch exceptions in HMaster.snapshot and clean up the snapshot if exceptions occur.
3. Always quit snapshot mode for regions, no matter whether the snapshot is created successfully on the RS.
4. Add a mechanism to check and synchronize the reference count in META with the number of reference files in BaseScanner.
5. Add snapshot operations: DeleteSnapshot, RestoreSnapshot and corresponding tests (in TestAdmin).

Summary
-------

This patch includes the first three sub-tasks of HBASE-50:

1. Start and monitor the creation of a snapshot via ZooKeeper
2. Create a snapshot of an HBase table
3. Modify some existing functions of HBase to support snapshots

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50.
    http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
  src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
  src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
  src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
  src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
  src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
  src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
  src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
  src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
  src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
  src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
  src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
  src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
  src/main/resources/hbase-default.xml b73f0ff
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896515#action_12896515 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 22
bq. http://review.cloudera.org/r/467/diff/2/?file=4140#file4140line22
bq.
bq. worth noting that this class is not thread-safe? I don't know if these classes need to be thread safe, but you're using an unsynchronized hashset. Also, since refreshHLogsAndSearch clears hlogs before re-adding stuff, it needs to be synchronized more than just using a synchronized collection.

This class is only instantiated once by LogsCleaner, so it can be seen as a singleton per master.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 116
bq. http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line116
bq.
bq. does ZKW automatically re-watch the nodes for you, here?
bq.
bq. Also, how does this interact with region server failure? We just assume that the snapshot will timeout and abort, instead of proactively detecting?

Yes, the ZKW automatically re-watches the nodes.

For snapshot abort: if any region server fails to perform the snapshot, it removes the corresponding ready and finished nodes under the snapshot directory. This notifies the master of the snapshot failure, and the master then aborts the snapshot on all region servers via ZK.

Snapshot timeout is not detected here. In the method waitToFinish, the snapshot status is checked at a regular interval (3 seconds here). If this method times out, an exception is thrown and the master aborts the snapshot across the cluster.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java, line 132
bq. http://review.cloudera.org/r/467/diff/2/?file=4143#file4143line132
bq.
bq. is there a process that scans for cases where the reference count has gotten out of sync?
bq. I'm worried about a case where a snapshot is half-done, and then it fails, so the snapshot is considered aborted, but we never clean up the references because META has been incremented.

This is added in the META scanner. Since scanning reference files is expensive, only a few regions are checked and synchronized in one scan. A last-checked time is added so that all reference regions are guaranteed to be checked eventually.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java, line 1403
bq. http://review.cloudera.org/r/467/diff/2/?file=4153#file4153line1403
bq.
bq. these checks are inherently racy

Then remove it?

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 585
bq. http://review.cloudera.org/r/467/diff/2/?file=4148#file4148line585
bq.
bq. this seems prone to collision if it's multithreaded, since the exists check and the use of the filename aren't atomic

Then how to guarantee atomicity? This unique file name should be unique with respect to existing files and files which have already been compacted and deleted; otherwise there might be a name collision in the archive directory for HFiles.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java, line 132
bq. http://review.cloudera.org/r/467/diff/2/?file=4130#file4130line132
bq.
bq. since we're using the snapshot name as a directory name in HDFS, it has to be a UTF8 string, so why not just keep it as a String above too?

I implemented this class following HTableDescriptor. Even for the table name, it is usually used as a byte array instead of a String.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 61
bq. http://review.cloudera.org/r/467/diff/2/?file=4134#file4134line61
bq.
bq. to keep compatibility with current storefiles, entire should be value 2, and bottom should be 0
bq.
bq. while we're at it, maybe rename these to be all caps - Range.TOP, Range.BOTTOM, etc

These have been renamed in the latest revision.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 918-919
bq. http://review.cloudera.org/r/467/diff/2/?file=4138#file4138line918
bq.
bq. should this be an exception, rather than a result code? ie is it normal to fail?

Currently all results except ALL_FINISH throw an exception.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 925
bq.
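Todd's point about refreshHLogsAndSearch clearing the set before re-adding can be made concrete with a hedged sketch (hypothetical class and method names, not the actual SnapshotLogCleaner code): the clear-then-repopulate sequence must be atomic with respect to readers, which wrapping the set with Collections.synchronizedSet alone would not provide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HLogSet {
    private final Set<String> hlogs = new HashSet<>();

    // clear() + addAll() must appear to readers as one step; with only a
    // synchronized collection, a reader could observe the transient empty
    // set between the two calls and wrongly conclude a log is unreferenced.
    public synchronized void refresh(String... current) {
        hlogs.clear();
        hlogs.addAll(Arrays.asList(current));
    }

    public synchronized boolean contains(String log) {
        return hlogs.contains(log);
    }

    public static void main(String[] args) {
        HLogSet logs = new HLogSet();
        logs.refresh("hlog.1", "hlog.2");
        System.out.println(logs.contains("hlog.1"));
    }
}
```

Method-level synchronization is the simplest fix here; per the reply above, it only matters if the cleaner is ever called from more than one thread.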
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896516#action_12896516 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-03 09:58:06, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 246
bq. http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line246
bq.
bq. I think this should be (retries == 4) for 3 retries

This is actually not a 'retry' for the snapshot; it checks whether the snapshot is finished three times (retries = 0, 1, 2).

- Chongxin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review631
-----------------------------------------------------------

Snapshot of table
-----------------

                 Key: HBASE-50
                 URL: https://issues.apache.org/jira/browse/HBASE-50
             Project: HBase
          Issue Type: New Feature
            Reporter: Billy Pearson
            Assignee: Li Chongxin
            Priority: Minor
         Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online: I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
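The counting convention in the exchange above (three checks for retries = 0, 1, 2) can be pinned down with a tiny sketch; the method is hypothetical, not the actual SnapshotMonitor.waitToFinish code:

```java
public class PollCount {
    // A loop guarded by `retries < maxChecks` polls exactly maxChecks times,
    // so a zero-based counter needs no `retries == maxChecks + 1` bound.
    public static int checksPerformed(int maxChecks) {
        int performed = 0;
        for (int retries = 0; retries < maxChecks; retries++) {
            performed++;  // one "is the snapshot finished?" poll per iteration
        }
        return performed;
    }

    public static void main(String[] args) {
        // retries takes the values 0, 1, 2 -> three checks
        System.out.println(checksPerformed(3));
    }
}
```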
[jira] Commented: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896553#action_12896553 ]

stack commented on HBASE-2905:
------------------------------

+1 What you think Andrew?
[jira] Commented: (HBASE-2905) NullPointerException is thrown when inserting mass data via REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896568#action_12896568 ] Andrew Purtell commented on HBASE-2905: --- Looks good, +1, I'll commit.

NullPointerException is thrown when inserting mass data via the REST interface - Key: HBASE-2905 URL: https://issues.apache.org/jira/browse/HBASE-2905 Project: HBase Issue Type: Bug Components: rest Affects Versions: 0.89.20100621 Environment: CentOS 5.2 x86_64, HBase 0.89 Reporter: Sandy Yin Assignee: Sandy Yin Fix For: 0.90.0 Attachments: HBase-2905-89.patch

A NullPointerException is thrown when inserting mass data via the REST interface:

{code}
java.lang.NullPointerException
	at org.mortbay.io.ByteArrayBuffer.wrap(ByteArrayBuffer.java:361)
	at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:588)
	at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:233)
	at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:108)
	at org.apache.hadoop.hbase.rest.provider.producer.ProtobufMessageBodyProducer.writeTo(ProtobufMessageBodyProducer.java:78)
	at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:254)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:744)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:667)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:658)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:318)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:425)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:604)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:389)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
{code}

The issue is caused by using a WeakHashMap as a buffer: by the time an object is fetched from the buffer, the JVM garbage collector may already have removed it.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
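The failure mode described in the issue can be reproduced in isolation. A minimal sketch (class and method names are illustrative, not from the patch) of why a WeakHashMap is unsafe as a buffer: an entry survives only while its key is strongly reachable, so a get() after a collection can return null unexpectedly.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    // While a strong reference to the key exists, the entry stays put
    // even across an explicit collection.
    static boolean presentWhileStronglyReferenced() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new byte[64]);
        System.gc(); // 'key' is still strongly referenced, entry survives
        return cache.get(key) != null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("present while referenced: " + presentWhileStronglyReferenced());

        // With no strong reference to the key, the entry may vanish after
        // any collection; code that assumed get() != null would then
        // dereference null, as in the NPE in the trace above.
        Map<Object, byte[]> cache = new WeakHashMap<>();
        cache.put(new Object(), new byte[64]);
        System.gc();
        Thread.sleep(100); // give the collector a chance to clear the reference
        System.out.println("entries after gc: " + cache.size());
    }
}
```

The fix direction implied by the issue is to use a map with ordinary strong references (or an explicit eviction policy) for anything the code must find again.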
[jira] Reopened: (HBASE-2844) Capping the number of regions
[ https://issues.apache.org/jira/browse/HBASE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-2844: --- Configuration parameters such as this should be applicable per table via a table attribute, as we do with the split file size threshold.

Capping the number of regions - Key: HBASE-2844 URL: https://issues.apache.org/jira/browse/HBASE-2844 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Pranav Khaitan Assignee: Pranav Khaitan Priority: Minor Fix For: 0.90.0 Attachments: RegionCappingV2.patch

It may sometimes be advantageous to prevent the number of regions from growing very large. This can happen when values are large in size even though the number of KeyValues is not. If the number of regions becomes too large, it is difficult to accommodate the memstore for each region in memory; in such cases, we either have to flush memstores to disk or decrease the size of each memstore.
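Andrew's suggestion, resolving the cap per table with a fallback to the cluster-wide default as is done for the split file size threshold, might look like this in outline. The MAX_REGIONS attribute key, method names, and default value below are hypothetical, not taken from the patch:

```java
import java.util.HashMap;
import java.util.Map;

public class PerTableConfigDemo {
    // Sketch of a per-table override: consult the table attribute first,
    // then fall back to the cluster-wide configuration value.
    static long maxRegions(Map<String, String> tableAttributes,
                           long clusterDefault) {
        String v = tableAttributes.get("MAX_REGIONS"); // hypothetical key
        return v == null ? clusterDefault : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        System.out.println(maxRegions(attrs, 256L)); // no attribute: cluster default
        attrs.put("MAX_REGIONS", "64");
        System.out.println(maxRegions(attrs, 256L)); // per-table override wins
    }
}
```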
[jira] Updated: (HBASE-2905) NullPointerException is thrown when inserting mass data via the REST interface
[ https://issues.apache.org/jira/browse/HBASE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-2905: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed

Committed to trunk.

NullPointerException is thrown when inserting mass data via the REST interface - Key: HBASE-2905 URL: https://issues.apache.org/jira/browse/HBASE-2905 Project: HBase Issue Type: Bug Components: rest Affects Versions: 0.89.20100621 Environment: CentOS 5.2 x86_64, HBase 0.89 Reporter: Sandy Yin Assignee: Sandy Yin Fix For: 0.90.0 Attachments: HBase-2905-89.patch
[jira] Commented: (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
[ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896680#action_12896680 ] stack commented on HBASE-2312: -- When I apply this patch to trunk, unit tests hang. Are we missing something from our Hadoop, Nicolas? We have 0.20.3-append-r964955-1240 committed to trunk (that's the svn revision with patch 1240 applied). I see, for example, that TestZooKeeper is stuck doing this:

{code}
2010-08-09 12:45:23,964 WARN [IPC Server handler 6 on 62023] namenode.FSNamesystem(1166): DIR* NameSystem.startFile: failed to create file /user/Stack/.logs/h135.sfo.stumble.net,62051,1281382952055-splitting/10.10.1.135%3A62051.1281382952241 for DFSClient_-1591590456 on client 127.0.0.1, because this file is already being created by DFSClient_hb_m_10.10.1.135:62037 on 127.0.0.1
2010-08-09 12:45:23,965 WARN [master] util.FSUtils(631): Waited 151520ms for lease recovery on hdfs://localhost:62023/user/Stack/.logs/h135.sfo.stumble.net,62051,1281382952055-splitting/10.10.1.135%3A62051.1281382952241:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/Stack/.logs/h135.sfo.stumble.net,62051,1281382952055-splitting/10.10.1.135%3A62051.1281382952241 for DFSClient_-1591590456 on client 127.0.0.1, because this file is already being created by DFSClient_hb_m_10.10.1.135:62037 on 127.0.0.1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:396)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
{code}

This test passes if I do not have HBASE-2312-2.patch in place.

Possible data loss when RS goes into GC pause while rolling HLog - Key: HBASE-2312 URL: https://issues.apache.org/jira/browse/HBASE-2312 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.20.3 Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Fix For: 0.90.0

There is a corner case in which bad things could happen (i.e. data loss):
1) RS #1 is going to roll its HLog -- it has not yet created the new one, and the old one will get no more writes.
2) RS #1 enters a GC pause of death.
3) The master lists the HLog files of RS #1 that it has to split, as RS #1 is dead, and starts splitting.
4) RS #1 wakes up, creates the new HLog (the previous one was rolled), and appends an edit -- which is lost.

The following seems like a possible solution:
1) The master detects that RS #1 is dead.
2) The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead).
3) Add mkdir support (as opposed to mkdirs) to HDFS, so that a file create fails if the directory doesn't exist. Dhruba tells me this is very doable.
4) RS #1 comes back up, is not able to create the new HLog, and restarts itself.
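The proposed fencing can be illustrated with a local-filesystem analogue: java.nio.file's createFile, like the proposed mkdir-based HDFS create, fails when the parent directory is missing, so renaming the dead regionserver's log directory fences out the paused server. A sketch under that assumption (class, directory, and file names are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class LogDirFencingDemo {
    // Returns true if the woken regionserver's log creation was fenced off.
    static boolean run() {
        try {
            Path root = Files.createTempDirectory("hbase-logs");
            Path rsDir = Files.createDirectories(root.resolve("rs1"));
            Files.createFile(rsDir.resolve("hlog.1")); // the log being rolled

            // The master declares rs1 dead and renames its log directory.
            Files.move(rsDir, root.resolve("rs1-dead"));

            // rs1 wakes from its GC pause and tries to create the new log.
            try {
                Files.createFile(rsDir.resolve("hlog.2")); // parent is gone
                return false; // create succeeded: the edit could be lost
            } catch (NoSuchFileException e) {
                return true;  // create failed: rs1 must restart, no lost edit
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fenced: " + run());
    }
}
```

The key property is that create must not silently recreate missing parent directories; that is exactly what the mkdir-versus-mkdirs distinction in step 3 buys.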
[jira] Updated: (HBASE-2902) Improve our default shipping GC config. and doc -- along the way do a bit of GC myth-busting
[ https://issues.apache.org/jira/browse/HBASE-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-2902: --- Attachment: Fragger.java

From the Azul guys: Fragger, which is in the public domain. A tool to induce fragmentation in a heap to trigger a full heap compaction.

Improve our default shipping GC config. and doc -- along the way do a bit of GC myth-busting - Key: HBASE-2902 URL: https://issues.apache.org/jira/browse/HBASE-2902 Project: HBase Issue Type: Improvement Reporter: stack Attachments: Fragger.java

This issue is about improving the near-term story, working with our current lot: the slowly evolving (?) 1.6x JVMs and CMS. (Longer-term, another issue in HBase tracks the G1 story, and Todd is making a bit of traction over on the GC hotspot list.) At the moment we ship with CMS and i-CMS enabled by default. At a minimum, i-CMS does not apply on most hardware HBase is deployed on -- i-CMS is for hardware with 2 or fewer processors -- and it seems as though we do not use multiple threads for YG collections, i.e. -XX:+UseParNewGC, "use parallel threads in the new generation". (Here's what I see... it seems to be off in jdk6 according to http://www.md.pp.ru/~eu/jdk6options.html#UseParNewGC, but then this says it's on by default when you use CMS -- http://blogs.sun.com/jonthecollector/category/Java -- and this says to enable it: http://www.austinjug.org/presentations/JDK6PerfUpdate_Dec2009.pdf. I see [Rescan (parallel) when it's enabled... so it seems like it's off. Need to review the src code.) We should make the above changes or at least doc them. We should consider enabling GC logging by default. It's low cost, apparently (citation below). We'd just need to do something about the log management. Not sure you can roll GC logs -- investigate -- and anyway we should roll on startup at least so we don't lose GC logs across restarts.
We should play with initiating ratios; maybe starting CMS earlier will push out the fragmented heap that brings on the killer stop-the-world collection. I read somewhere recently that invoking System.gc will run a CMS GC if CMS is enabled. We should investigate. If it ran the serial collector, we could at least doc that users could run a defragmenting stop-the-world serial collection at 'off' times, or at least make it so the stop-the-world happens when expected instead of at some random time. While here, let's do a bit of myth-busting. Here's a few postulates: + Keep the young generation small, or at least cap its size, else it grows to occupy a large part of the heap. The above is a Ryanism. Doing the above -- along w/ massive heap size -- has put off the fragmentation that others run into, at SU at least. Interestingly, this document -- http://www.google.com/url?sa=tsource=webcd=1ved=0CBcQFjAAurl=http%3A%2F%2Fmediacast.sun.com%2Fusers%2FLudovic%2Fmedia%2FGCTuningPresentationFISL10.pdfei=ZPtaTOiLL5bcsAa7gsl1usg=AFQjCNHP691SIIE-6NSKccM4mZtm1U6Ahwsig2=2cjvcaeyn1aISL2THEENjQ -- would seem to recommend near the opposite, in that it suggests that when using CMS, you do all you can to keep stuff in the YG. Avoid having stuff age up to the tenured heap if you can. This would seem to imply using a larger YG. Chatting w/ Ryan, the reason to keep the YG small is so we don't have long pauses doing YG collections. According to the above citation, it's not big YGs that cause long YG pauses but the copying of data (not sure if it's copying of data inside the YG or copying up to tenured -- chatting w/ Ryan we thought there'd be no difference -- but we should investigate). I took a look at a running upload, with a small heap admittedly. What I was seeing was that using our defaults, rare was anything in the YG of age 1 GC; i.e. near everything in the YG was being promoted.
This may have been a symptom of my small (default) heap, but we should look into this and try to ensure objects are promoted because they are old, not because there is not enough space in the YG. + We should write a slab allocator or allocate memory outside of the JVM heap. Thinking on this: a slab allocator, while a lot of work, I can see helping us w/ the block cache, but what if the memstore is the fragmented-heap maker? In that case, a slab allocator is only part of the fix. It should be easy to see which is the fragmented-heap maker, since we can turn off the cache easily enough (though it seems like it's accessed anyway even if disabled -- need to make sure it's not doing allocations to the cache in this case). Other things while on this topic: we need to come up w/ a loading that brings on the CMS fault that comes of a fragmented heap (CMS is non-compacting but
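For reference, the direction discussed above could land in conf/hbase-env.sh roughly as follows. This is a sketch, not a recommended default: the initiating-occupancy value is a guess to be tuned, and rolling the GC log across restarts still needs solving as noted.

```shell
# Sketch of a CMS configuration per the discussion above (values illustrative):
# - no -XX:+CMSIncrementalMode: i-CMS targets machines with 1-2 processors
# - -XX:+UseParNewGC: parallel young-generation collection alongside CMS
# - start CMS earlier, to get ahead of heap fragmentation
# - GC logging on, since its cost is reportedly low
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
```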
[jira] Commented: (HBASE-2870) Add Backup CLI Option to HMaster
[ https://issues.apache.org/jira/browse/HBASE-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896743#action_12896743 ] HBase Review Board commented on HBASE-2870: --- Message from: Nicolas nspiegelb...@facebook.com --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/592/ --- Review request for hbase.

Summary
---
Adds a '--backup' option to HMaster. All backup masters will wait until a primary master has written the /hbase/master znode before poll-waiting. This allows us to have a deterministic primary master while starting up multiple masters. Note that you can specify a list of backup masters to automatically start/stop via the 'conf/backup-masters' file. This addresses bug HBASE-2870. http://issues.apache.org/jira/browse/HBASE-2870

Diffs
---
trunk/bin/hbase-config.sh 983803
trunk/bin/hbase-daemons.sh 983803
trunk/bin/local-master-backup.sh 983803
trunk/bin/master-backup.sh PRE-CREATION
trunk/bin/start-hbase.sh 983803
trunk/bin/stop-hbase.sh 983803
trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java 983803
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 983803
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 983803

Diff: http://review.cloudera.org/r/592/diff

Testing
---

Thanks, Nicolas

Add Backup CLI Option to HMaster - Key: HBASE-2870 URL: https://issues.apache.org/jira/browse/HBASE-2870 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Karthik Ranganathan Priority: Minor Fix For: 0.90.0

The HMaster main() should allow a toggle like --backup, which forces it to be a secondary master on startup versus a primary candidate. That way, we can start up multiple masters at once and deterministically know which one will be the original primary.
[jira] Commented: (HBASE-2902) Improve our default shipping GC config. and doc -- along the way do a bit of GC myth-busting
[ https://issues.apache.org/jira/browse/HBASE-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896758#action_12896758 ] stack commented on HBASE-2902: -- Sure, but see the cited slide deck above. It's not the size of the YG but the amount of copying done (not sure which kind of copying -- to be elicited). Might be worth playing w/ a bigger YG, making it so stuff is aged, having made it through 4 or 5 or even 10 YG GCs, before it gets tenured. St.Ack

Improve our default shipping GC config. and doc -- along the way do a bit of GC myth-busting - Key: HBASE-2902 URL: https://issues.apache.org/jira/browse/HBASE-2902 Project: HBase Issue Type: Improvement Reporter: stack Attachments: Fragger.java
[jira] Commented: (HBASE-2870) Add Backup CLI Option to HMaster
[ https://issues.apache.org/jira/browse/HBASE-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896763#action_12896763 ] HBase Review Board commented on HBASE-2870: --- Message from: st...@duboce.net --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/592/#review793 --- I applied the patch and built. On shutdown of a standalone instance I got this:

pynchon-305:trunk stack$ ./bin/stop-hbase.sh
stopping hbase../Users/stack/checkouts/trunk/bin/hbase-daemons.sh: line 49: /Users/stack/checkouts/trunk/bin/master-backup.sh: Permission denied
/Users/stack/checkouts/trunk/bin/hbase-daemons.sh: line 49: exec: /Users/stack/checkouts/trunk/bin/master-backup.sh: cannot execute: Unknown error: 0

My script looks to have x perms:

pynchon-305:trunk stack$ ls -la bin/hbase-daemons.sh
-rwxr-xr-x 1 stack staff 1628 Aug 9 20:21 bin/hbase-daemons.sh

Are the passed 'args' bad for standalone?

trunk/bin/master-backup.sh
http://review.cloudera.org/r/592/#comment2668
I don't understand this construct, Nicolas. (My shell scripting requires me to have a book or a web page open while I write -- currently it's not present.)

- stack

Add Backup CLI Option to HMaster - Key: HBASE-2870 URL: https://issues.apache.org/jira/browse/HBASE-2870 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Karthik Ranganathan Priority: Minor Fix For: 0.90.0