[jira] Created: (HBASE-2908) Wrong order of null-check
Wrong order of null-check
-------------------------
Key: HBASE-2908
URL: https://issues.apache.org/jira/browse/HBASE-2908
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.89.20100621
Reporter: Libor Dener
Priority: Trivial

In method org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(JobContext), this.table is used before the null check that throws.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
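The fix is the usual reordering: validate before dereferencing. A minimal illustrative sketch of the bug pattern follows; the class and field names are simplified stand-ins, not the actual TableInputFormatBase source.

```java
// Illustrative sketch of the HBASE-2908 bug pattern; simplified stand-in
// names, not the actual TableInputFormatBase source.
public class TableInputFormatSketch {
    private Object table; // stands in for the HTable field, left unset

    // Buggy order: the field is dereferenced before the null check, so a
    // missing table surfaces as an NPE instead of the intended exception.
    public String getSplitsBuggy() {
        String name = table.toString(); // NPE here when table == null
        if (table == null) {
            throw new IllegalStateException("No table was provided"); // unreachable
        }
        return name;
    }

    // Fixed order: validate first, then use.
    public String getSplitsFixed() {
        if (table == null) {
            throw new IllegalStateException("No table was provided");
        }
        return table.toString();
    }
}
```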
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897246#action_12897246 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 22:40:31, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 962
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line962
bq.
bq. Moving crashed snapshots has two benefits:
bq. 1. a future call to listSnapshots() wouldn't encounter IOException.
bq. 2. it's easy for users to get statistics on failed snapshots and analyze them.
bq.
bq. Or, you could log enough information when cleaning up the failed snapshot.

What about a snapshot that fails while it is being created? Currently it is cleaned up if an exception occurs in HMaster.snapshot. Should we also move it to this directory? Then, for reference-information sync, should we also take the reference files of these failed snapshots into account?

- Chongxin

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review830
---

Snapshot of table
-----------------
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect from failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897250#action_12897250 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
bq. http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673
bq.
bq. This is fine for an hbase that is a fresh install, but what about the case where the data has been migrated from an older hbase version? It won't have this column family in .META. We should make a little migration script that adds it, or on start of the new version, check for it and, if not present, create it.

That's right. But the AddColumn operation requires the table to be disabled to proceed, and the ROOT table cannot be disabled once the system is started. So how could we execute the migration script, or check and create it on start of the new version?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899
bq. http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line899
bq.
bq. Can the snapshot name be empty and then we'll make one up?

A default snapshot name? Or an auto-generated snapshot name, such as the creation time?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951
bq. http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line951
bq.
bq. For restore of the snapshot, do you use loadtable.rb or Todd's new bulk-loading scripts?

Currently, no. A snapshot is composed of a list of log files and a bunch of reference files for the HFiles of the table. These reference files have the same hierarchy as the original table, and the name is in the format 1239384747630.tablename, where the front part is the file name of the referred HFile and the latter part is the table name for the snapshot. Thus, to restore a snapshot, just copy the reference files (which are just a few bytes) to the table dir, update META, and split the logs. When this table is enabled, the system knows how to replay the commit edits and read such a reference file. The methods getReferredToFile and open in StoreFile are updated to deal with this kind of reference file for snapshots. At present, a snapshot can only be restored to a table whose name is the same as the one for which the snapshot was created. Thus the old table with the same name must be deleted before restoring a snapshot. That's what I do in the unit test TestAdmin. Restoring a snapshot to a different table name has low priority; it has not been implemented yet.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50
bq. http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line50
bq.
bq. What's this? A different kind of reference?

Yes. This is the reference file in a snapshot. It references an HFile of the original table.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 115
bq. http://review.cloudera.org/r/467/diff/3/?file=6018#file6018line115
bq.
bq. This looks like a class that you could write a unit test for?

Sure, I'll add another case in TestLogsCleaner.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line 130
bq. http://review.cloudera.org/r/467/diff/3/?file=6017#file6017line130
bq.
bq. If the table were big, this could be prohibitively expensive? A single-threaded copy of all of a table's data? We could complement this with MR-based restore, something that did the copy using MR?

This method is only used in RestoreSnapshot, where the reference files of a snapshot are copied to the table dir. These reference files contain just a few bytes, not the table's data. Snapshots share the table data with the original table and other snapshots. Do we still need an MR job?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212
bq. http://review.cloudera.org/r/467/diff/3/?file=6013#file6013line212
bq.
bq. Why a random negative number? Why not just leave it blank?

If a blank value were used as the key, there would end up being only one entry during the first few scans of the regions. Using a random negative number indicates that a region has not been scanned before; if it had been scanned, there would be a last-checking time for it instead.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 251
bq. http://review.cloudera.org/r/467/diff/3/?file=6012#file6012line251
bq.
bq. Is this comment right?

I just renamed the Ranges to caps, comment was not
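The reference-file naming scheme described above (the referred HFile's name, then the table name, joined by a dot, e.g. "1239384747630.tablename") can be split apart as in this hypothetical sketch; the class and method names are mine, not from the patch.

```java
// Hypothetical helper illustrating the snapshot reference-file naming scheme
// described in the review ("1239384747630.tablename"): the part before the
// first dot is the referred HFile's name, the part after is the table name.
public class SnapshotRefName {
    public static String referredHFile(String refFileName) {
        int dot = refFileName.indexOf('.');
        return refFileName.substring(0, dot);
    }

    public static String tableName(String refFileName) {
        int dot = refFileName.indexOf('.');
        return refFileName.substring(dot + 1);
    }
}
```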
[jira] Updated: (HBASE-2908) Wrong order of null-check
[ https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Libor Dener updated HBASE-2908:
-------------------------------
Attachment: hbase-2908-fix.patch
[jira] Updated: (HBASE-2908) Wrong order of null-check
[ https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Libor Dener updated HBASE-2908:
-------------------------------
Status: Patch Available (was: Open)

hbase-2908-fix.patch should fix the mentioned issue.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897257#action_12897257 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 22:20:23, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156
bq. http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line156
bq.
bq. I think the current code is backward compatible. A boolean value of true is interpreted as TOP, a value of false as BOTTOM.
bq. Since ENTIRE is introduced, this code is not backward compatible.
bq.
bq. See: http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#writeBoolean%28boolean%29

Why is it not backward compatible when ENTIRE is introduced? The value written for ENTIRE is 2, different from the old written boolean values.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review829
---
[jira] Updated: (HBASE-2908) Wrong order of null-check
[ https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2908:
-------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.90.0
Resolution: Fixed

Thanks for the patch Libor. Applied to TRUNK.
[jira] Commented: (HBASE-1697) Discretionary access control
[ https://issues.apache.org/jira/browse/HBASE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897295#action_12897295 ]

stack commented on HBASE-1697:
------------------------------

Andrew: You need something on this issue?

Discretionary access control
----------------------------
Key: HBASE-1697
URL: https://issues.apache.org/jira/browse/HBASE-1697
Project: HBase
Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 0.92.0

Consider implementing discretionary access control for HBase. Access control has three aspects: authentication, authorization, and audit.

- Authentication: Access is controlled by insisting on an authentication procedure to establish the identity of the user. The authentication procedure should minimally require a non-plaintext authentication factor (e.g. encrypted password with salt) and should ideally, or at least optionally, provide cryptographically strong confidence via public key certification.
- Authorization: Access is controlled by specifying rights to resources via an access control list (ACL). An ACL is a list of permissions attached to an object. The list specifies who or what is allowed to access the object and what operations may be performed on it, e.g. create, update, read, or delete.
- Audit: Important actions taken by subjects should be logged for accountability: a chronological record which enables the full reconstruction and examination of a sequence of events, e.g. schema changes or data mutations. Logging activity should be protected from all subjects except a restricted set with administrative privilege, perhaps only a single super-user.

Discretionary access control means the access policy for an object is determined by the owner of the object. Every object in the system must have a valid owner. Owners can assign access rights and permissions to other users. The initial owner of an object is the subject who created it. If subjects are deleted from a system, ownership of objects owned by them should revert to some super-user or other valid default.

HBase can enforce access policy at table, column family, or cell granularity. Cell granularity does not make much sense. An implementation which controls access at both the table and column family levels is recommended, though a first cut could consider control at the table level only. The initial set of permissions could be: create (table schema or column family), update (table schema or column family), read (column family), delete (table or column family), execute (filters), and transfer ownership.

The subject identities and access tokens could be stored in a new administrative table. ACLs on tables and column families can be stored in META. Access other than read access to catalog and administrative tables should be restricted to a set of administrative users, or perhaps a single super-user. A data mutation on a user table by a subject without administrative or super-user privilege which results in a table split is an implicit temporary privilege elevation, where the regionserver or master updates the catalog tables as necessary to support the split. Audit logging should be configurable on a per-table basis to avoid this overhead where it is not wanted.

Consider supporting external authentication and subject identification mechanisms with Java library support: RADIUS/TACACS, Kerberos, LDAP. Consider logging audit trails to an HBase table (bigtable-type schemas are natural for this) and optionally external logging options with Java library support (syslog, etc.); or maybe commons-logging is sufficient, and we punt to the administrator to set up appropriate commons-logging/log4j configurations for their needs.

If HBASE-1002 is considered, and the option to support filtering via upload of (perhaps complex) bytecode produced by some little-language compiler is implemented, the execute privilege could be extended in a manner similar to how stored procedures in SQL land execute with either the privilege of the current user or of the (table/procedure) creator.
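As a rough illustration of the table/column-family-level authorization the issue proposes, here is a minimal sketch; every name in it (AclSketch, the key format, the fallback rule) is my own assumption for illustration, not from any HBase patch.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of table/column-family ACL checks in the spirit of the
// issue; none of these names come from actual HBase code.
public class AclSketch {
    enum Permission { CREATE, UPDATE, READ, DELETE, EXECUTE, TRANSFER_OWNERSHIP }

    // key: "user:table" or "user:table:family" -> granted permissions
    private final Map<String, EnumSet<Permission>> grants = new HashMap<>();

    public void grant(String user, String resource, Permission p) {
        grants.computeIfAbsent(user + ":" + resource,
                k -> EnumSet.noneOf(Permission.class)).add(p);
    }

    // A family-level check falls back to the table-level grant, mirroring the
    // recommendation to control access at both levels.
    public boolean allowed(String user, String table, String family, Permission p) {
        EnumSet<Permission> fam = grants.get(user + ":" + table + ":" + family);
        if (fam != null && fam.contains(p)) return true;
        EnumSet<Permission> tab = grants.get(user + ":" + table);
        return tab != null && tab.contains(p);
    }
}
```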
[jira] Resolved: (HBASE-2868) Do some small cleanups in org.apache.hadoop.hbase.regionserver.wal
[ https://issues.apache.org/jira/browse/HBASE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2868.
--------------------------
Resolution: Fixed

Committed. Thanks for the patch mighty Alex.

Do some small cleanups in org.apache.hadoop.hbase.regionserver.wal
------------------------------------------------------------------
Key: HBASE-2868
URL: https://issues.apache.org/jira/browse/HBASE-2868
Project: HBase
Issue Type: Improvement
Reporter: Alex Newman
Assignee: Alex Newman
Fix For: 0.90.0
Attachments: 0001-HBASE-2868.patch, 1 (1)

Since I am touching this area, it's probably better to leave it in a cleaner state: non-deprecated, etc.
[jira] Created: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs
SoftValueSortedMap is broken, can generate NPEs
-----------------------------------------------
Key: HBASE-2909
URL: https://issues.apache.org/jira/browse/HBASE-2909
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.89.20100621, 0.20.6
Reporter: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.20.7, 0.90.0

The way SoftValueSortedMap uses SoftValues, it looks like its keys can get garbage collected along with the values themselves. We hit this issue in production, but I was also able to reproduce it randomly using YCSB with 300 threads. Here's an example on 0.20 with JDK 1.6u14:

{noformat}
java.lang.NullPointerException
	at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1036)
	at org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:104)
	at org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:96)
	at java.util.TreeMap.cmp(TreeMap.java:1911)
	at java.util.TreeMap.get(TreeMap.java:1835)
	at org.apache.hadoop.hbase.util.SoftValueSortedMap.get(SoftValueSortedMap.java:91)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getCachedLocation(HConnectionManager.java:788)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:651)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
	at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.getTable(ThriftServer.java:262)
	at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRowTs(ThriftServer.java:585)
	at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRow(ThriftServer.java:578)
	at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.process(Hbase.java:2345)
	at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor.process(Hbase.java:1988)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:259)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
{noformat}

In this specific case, the null cannot be the passed key, because it comes from HTable, which uses HConstants.EMPTY_START_ROW. It cannot be a null key that was inserted earlier, because we would have gotten the NPE at insert time. This can only mean that some key *became* null.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897474#action_12897474 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review846
---

src/main/java/org/apache/hadoop/hbase/io/Reference.java
http://review.cloudera.org/r/467/#comment2846
I meant that a value of 2 cannot be correctly interpreted as a boolean.

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2847
I think we need to limit the space consumed by failed snapshots. This issue can be addressed by a future JIRA.

- Ted
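Ted's point about DataOutput.writeBoolean is easy to demonstrate: writeBoolean emits a single byte (0 or 1), and DataInput.readBoolean treats any nonzero byte as true, so a third range value serialized as the raw byte 2 would be silently read back as true (i.e. misread as TOP) by an old boolean-based reader rather than rejected. A standalone illustration of that mechanism, not HBase code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Demonstrates the compatibility hazard discussed in the review: old code
// serialized the reference range as a boolean (TOP = true, BOTTOM = false).
// A new third value written as the raw byte 2 is read back as "true" by an
// old readBoolean() reader, i.e. misinterpreted rather than rejected.
public class RangeCompatDemo {
    static byte[] writeRawByte(int b) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeByte(b); // e.g. 2 for a hypothetical ENTIRE value
        return bos.toByteArray();
    }

    static boolean oldReaderSees(byte[] bytes) throws IOException {
        // Old deserializer: readBoolean() returns true for any nonzero byte.
        return new DataInputStream(new ByteArrayInputStream(bytes)).readBoolean();
    }
}
```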
[jira] Updated: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs
[ https://issues.apache.org/jira/browse/HBASE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2909:
--------------------------------------
Attachment: hbase-2909.patch

Here's a more normal way of doing a soft-references structure, with the SoftValue now inside SoftValueSortedMap. I also got rid of the implementation of Map.Entry (which was suspicious) and disabled entrySet because 1) it wasn't used and 2) it used the Map.Entry implementation, which wasn't really one. Ran a few tests and it works; needs more at-scale testing.
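The pattern described in the patch comment — values wrapped in soft references that remember their key, drained through a ReferenceQueue so reclaimed entries are purged rather than surfacing as nulls — might look like the following minimal sketch. This is my own simplification under that description, not the actual hbase-2909.patch.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.TreeMap;

// Minimal sketch of a soft-value sorted map in the style described in the
// patch comment: the SoftValue wrapper keeps its key so collected entries
// can be removed from the backing TreeMap instead of surfacing as nulls.
public class SoftValueMapSketch<K, V> {
    private static class SoftValue<K, V> extends SoftReference<V> {
        final K key;
        SoftValue(K key, V value, ReferenceQueue<V> q) {
            super(value, q);
            this.key = key;
        }
    }

    private final TreeMap<K, SoftValue<K, V>> map = new TreeMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    // Drop entries whose values the GC has reclaimed.
    @SuppressWarnings("unchecked")
    private void purge() {
        SoftValue<K, V> sv;
        while ((sv = (SoftValue<K, V>) queue.poll()) != null) {
            map.remove(sv.key);
        }
    }

    public synchronized void put(K key, V value) {
        purge();
        map.put(key, new SoftValue<>(key, value, queue));
    }

    public synchronized V get(K key) {
        purge();
        SoftValue<K, V> sv = map.get(key);
        return sv == null ? null : sv.get();
    }
}
```

The key difference from the buggy version is that the strong keys live only in the TreeMap and are removed eagerly via the queue, so a lookup never compares against a key that has "become" null.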
[jira] Commented: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs
[ https://issues.apache.org/jira/browse/HBASE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897489#action_12897489 ]

ryan rawson commented on HBASE-2909:
------------------------------------

+1 (and not just because it is modelled after my SimpleBlockCache)
[jira] Commented: (HBASE-2697) Implement new open/close logic in handlers and stop using heartbeats for open/close messages
[ https://issues.apache.org/jira/browse/HBASE-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897516#action_12897516 ]

HBase Review Board commented on HBASE-2697:
-------------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/484/#review850
---

branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
http://review.cloudera.org/r/484/#comment2851
Please change the description to reflect what the code does - throwing NotAllMetaRegionsOnlineException.

- Ted

Implement new open/close logic in handlers and stop using heartbeats for open/close messages
--------------------------------------------------------------------------------------------
Key: HBASE-2697
URL: https://issues.apache.org/jira/browse/HBASE-2697
Project: HBase
Issue Type: Sub-task
Components: ipc, master, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
Fix For: 0.90.0
Attachments: HBASE-2697-part1-v10.patch

This issue is doing the meat of what HBASE-2485 is about and continues what was started in HBASE-2694, after some code cleanup to make life easier. It deals with no longer piggybacking messages from Master to RegionServers on heartbeat responses, and instead sending direct unsolicited messages. It also moves the open/close logic fully into handlers and removes the existing open/close code on both the RS and M sides. There may also be some changes to the master's in-memory state of regions in transition. The new load balancer will probably be introduced with this issue, but not fully integrated yet.
[jira] Commented: (HBASE-1660) need a rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897560#action_12897560 ]

HBase Review Board commented on HBASE-1660:
-------------------------------------------

Message from: Nicolas nspiegelb...@facebook.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/632/
---

Review request for hbase.

Summary
-------
1. added 'restart' option to hbase-daemon.sh for unit-level restarting
2. added rolling-restart.sh script to perform system-level rolling restarts

This addresses bug HBASE-1660.
http://issues.apache.org/jira/browse/HBASE-1660

Diffs
-----
trunk/bin/hbase-daemon.sh 984635
trunk/bin/rolling-restart.sh PRE-CREATION

Diff: http://review.cloudera.org/r/632/diff

Testing
-------
./bin/start-hbase.sh
./bin/rolling-restart.sh
./bin/stop-hbase.sh

Thanks, Nicolas

need a rolling restart script
-----------------------------
Key: HBASE-1660
URL: https://issues.apache.org/jira/browse/HBASE-1660
Project: HBase
Issue Type: New Feature
Affects Versions: 0.20.0
Reporter: ryan rawson
Priority: Minor
Fix For: 0.92.0

Need a script that will do a rolling restart. It should be configurable in two ways:
- how long to keep the daemon down per host
- how long to wait between hosts

For regionservers, in my own hacky command line I used 10/60.
[jira] Commented: (HBASE-1660) need a rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897561#action_12897561 ]

HBase Review Board commented on HBASE-1660:
-------------------------------------------

Message from: Nicolas nspiegelb...@facebook.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/632/
---

(Updated 2010-08-11 20:26:20.097452)

Review request for hbase.

Summary (updated)
-----------------
1. added 'restart' option to hbase-daemon.sh for unit-level restarting
2. added rolling-restart.sh script to perform system-level rolling restarts (note that I intentionally did not restart the ZooKeeper nodes, since those binaries will need an update far less often)

This addresses bug HBASE-1660.
http://issues.apache.org/jira/browse/HBASE-1660

Diffs
-----
trunk/bin/hbase-daemon.sh 984635
trunk/bin/rolling-restart.sh PRE-CREATION

Diff: http://review.cloudera.org/r/632/diff

Testing
-------
./bin/start-hbase.sh
./bin/rolling-restart.sh
./bin/stop-hbase.sh

Thanks, Nicolas