[jira] Created: (HBASE-2908) Wrong order of null-check

2010-08-11 Thread Libor Dener (JIRA)
Wrong order of null-check
-

 Key: HBASE-2908
 URL: https://issues.apache.org/jira/browse/HBASE-2908
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.89.20100621
Reporter: Libor Dener
Priority: Trivial


In the method
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(JobContext),
this.table is dereferenced before the null check that should guard it.
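A stripped-down sketch of the bug pattern (illustrative names only; this is
not the actual TableInputFormatBase source):

```java
// Illustrative sketch of the reported bug: the field is dereferenced
// before the null check that was meant to guard it.
class SplitsSketch {
    Object table; // stands in for the HTable field in TableInputFormatBase

    // Buggy order: this.table is used first, so a null table yields an
    // uninformative NullPointerException and the guard is never reached.
    String getSplitsBuggy() {
        String name = table.toString(); // NPE here if table is null
        if (table == null) {
            throw new IllegalStateException("table is null"); // dead code
        }
        return name;
    }

    // Fixed order: validate first, then use.
    String getSplitsFixed() {
        if (table == null) {
            throw new IllegalStateException("table is null");
        }
        return table.toString();
    }
}
```

With the fixed ordering, callers get the intended descriptive exception
instead of a bare NPE.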

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-50) Snapshot of table

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897246#action_12897246
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn


bq.  On 2010-08-10 22:40:31, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 962
bq.   http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line962
bq.  
bq.   Moving crashed snapshots has two benefits:
bq.   1. future call to listSnapshots() wouldn't encounter IOException.
bq.   2. it's easy for user to get statistics on failed snapshots and 
analyze them
bq.   
bq.   Or, if you log enough information when cleaning up the failed 
snapshot.
bq.  

What about a snapshot that fails while it is being created? Currently it is 
cleaned up if an exception occurs in HMaster.snapshot. Should we also move it 
to this directory? Then, for reference information sync, should we also take 
the reference files of these failed snapshots into account?


- Chongxin


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review830
---





 Snapshot of table
 -

 Key: HBASE-50
 URL: https://issues.apache.org/jira/browse/HBASE-50
 Project: HBase
  Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
 Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
 Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class 
 Diagram.png


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like this option to do is merge all the data into one or more 
 files stored in the same folder on the DFS. This way we could save the data 
 in case of a software bug in Hadoop or user code. 
 The other advantage would be the ability to export a table to multiple 
 locations. Say I had a read-only table that must be online. I could take a 
 snapshot of it when needed, export it to a separate data center, and have it 
 loaded there; then I would have it online at multiple data centers for load 
 balancing and failover.
 I understand that Hadoop removes the need for backups to protect from failed 
 servers, but this does not protect us from software bugs that might delete 
 or alter data in ways we did not plan. We should have a way to roll back a 
 dataset.




[jira] Commented: (HBASE-50) Snapshot of table

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897250#action_12897250
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
bq.   http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673
bq.  
bq.   This is fine for an hbase that is a fresh install but what about 
case where the data has been migrated from an older hbase version; it won't 
have this column family in .META.  We should make a little migration script 
that adds it or on start of new version, check for it and if not present, 
create it.

That's right. But the AddColumn operation requires the table to be disabled 
in order to proceed, and the ROOT table cannot be disabled once the system is 
started. How, then, could we execute the migration script, or check for and 
create the column family on startup of the new version?


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899
bq.   http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line899
bq.  
bq.   Can the snapshot name be empty and then we'll make one up?

A default snapshot name? Or an auto-generated snapshot name, such as the 
creation time?


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951
bq.   http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line951
bq.  
bq.   For restore of the snapshot, do you use loadtable.rb or Todd's new 
bulkloading scripts?

Currently, no...
A snapshot is composed of a list of log files and a bunch of reference files 
for the HFiles of the table. These reference files have the same hierarchy as 
the original table, and their names are in the format 1239384747630.tablename, 
where the front part is the file name of the referred HFile and the latter 
part is the table name of the snapshot. Thus, to restore a snapshot, we just 
copy the reference files (which are only a few bytes each) to the table dir, 
update .META., and split the logs. When the table is enabled, the system knows 
how to replay the commit edits and read such a reference file. The methods 
getReferredToFile and open in StoreFile were updated to handle this kind of 
reference file for snapshots.

At present, a snapshot can only be restored to a table whose name is the same 
as the one for which the snapshot was created. That is, an old table with the 
same name must be deleted before restoring a snapshot. That's what I do in the 
unit test TestAdmin. Restoring a snapshot to a different table name has a low 
priority and has not been implemented yet.
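The naming scheme described above can be sketched with a tiny hypothetical
helper (the real parsing lives in StoreFile and may differ in detail):

```java
// Hypothetical helper mirroring the snapshot reference-file naming scheme
// described above: "<referredHFileName>.<snapshotTableName>",
// e.g. "1239384747630.tablename".
class SnapshotRefName {
    // The part before the dot names the HFile the reference points to.
    static String referredHFile(String refName) {
        return refName.substring(0, refName.indexOf('.'));
    }

    // The part after the dot is the table name of the snapshot.
    static String snapshotTable(String refName) {
        return refName.substring(refName.indexOf('.') + 1);
    }
}
```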


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50
bq.   http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line50
bq.  
bq.   Whats this?  A different kind of reference?

Yes. This is the reference file used in a snapshot; it references an HFile of 
the original table.


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, 
line 115
bq.   http://review.cloudera.org/r/467/diff/3/?file=6018#file6018line115
bq.  
bq.   This looks like a class that you could write a unit test for?

Sure, I'll add another case in TestLogsCleaner.


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line 
130
bq.   http://review.cloudera.org/r/467/diff/3/?file=6017#file6017line130
bq.  
bq.   If the table were big, this could be prohibitively expensive?  A 
single-threaded copy of all of a table's data?  We could complement this with 
an MR-based restore, something that did the copy using MR?

This method is only used in RestoreSnapshot, where the reference files of a 
snapshot are copied to the table dir. These reference files contain just a 
few bytes each rather than the table's data; snapshots share the table data 
with the original table and with other snapshots. Do we still need an MR job?


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212
bq.   http://review.cloudera.org/r/467/diff/3/?file=6013#file6013line212
bq.  
bq.   Why Random negative number?  Why not just leave it blank?

If a blank value were used as the key, there would be only one item in the end 
during the first few scans of the regions. Using a random negative number 
indicates that these regions have not been scanned before; if a region has 
been scanned, there is a real last-checked time for it instead.
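The sentinel idea can be sketched like this (purely illustrative; not the
actual BaseScanner code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch of the sentinel scheme described above: regions never
// scanned get a distinct random negative "last scan time", so they do not
// all collapse onto one shared blank value, and any non-negative entry
// means a real last-check timestamp.
class ScanTimes {
    private final Map<String, Long> lastScan = new HashMap<>();
    private final Random rnd = new Random();

    void markUnscanned(String region) {
        // Random negative sentinel: clearly not a real timestamp.
        lastScan.put(region, -(1L + rnd.nextInt(Integer.MAX_VALUE)));
    }

    void markScanned(String region, long now) {
        lastScan.put(region, now);
    }

    boolean wasScanned(String region) {
        Long t = lastScan.get(region);
        return t != null && t >= 0;
    }
}
```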


bq.  On 2010-08-10 21:34:40, stack wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, 
line 251
bq.   http://review.cloudera.org/r/467/diff/3/?file=6012#file6012line251
bq.  
bq.   Is this comment right?

I just renamed the Ranges to caps, comment was not 

[jira] Updated: (HBASE-2908) Wrong order of null-check

2010-08-11 Thread Libor Dener (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Libor Dener updated HBASE-2908:
---

Attachment: hbase-2908-fix.patch





[jira] Updated: (HBASE-2908) Wrong order of null-check

2010-08-11 Thread Libor Dener (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Libor Dener updated HBASE-2908:
---

Status: Patch Available  (was: Open)

hbase-2908-fix.patch should fix the mentioned issue.





[jira] Commented: (HBASE-50) Snapshot of table

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897257#action_12897257
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn


bq.  On 2010-08-10 22:20:23, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156
bq.   http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line156
bq.  
bq.   I think the current code is backward compatible. Boolean value of 
true is interpreted as TOP, value of false is BOTTOM.
bq.   Since ENTIRE is introduced, this code is not backward compatible.
bq.   
bq.   See:
bq.   
http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#writeBoolean%28boolean%29

Why is it not backward compatible when ENTIRE is introduced? The value for 
ENTIRE is 2, which is different from the old written boolean values.
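For context, a rough sketch of the compatibility question (assumed wire
format; the actual Reference serialization may differ). DataInput.readBoolean()
treats any non-zero byte as true, so an old boolean-based reader would
silently read the byte 2 (ENTIRE) as TOP rather than failing:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: TOP/BOTTOM were historically a boolean (written as bytes 1/0);
// ENTIRE adds a third value, 2, written here as a raw byte.
class RangeWire {
    static final int BOTTOM = 0, TOP = 1, ENTIRE = 2;

    static byte[] write(int range) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeByte(range);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // New-style reader: sees all three values distinctly.
    static int readByteValue(byte[] b) {
        try {
            return new DataInputStream(new ByteArrayInputStream(b)).readByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Old-style reader: readBoolean() maps any non-zero byte to true,
    // so ENTIRE (2) is indistinguishable from TOP (1).
    static boolean readAsOldBoolean(byte[] b) {
        try {
            return new DataInputStream(new ByteArrayInputStream(b)).readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

So a new writer stays readable by an old reader only in the sense that it
does not throw; the old reader misinterprets ENTIRE as TOP.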


- Chongxin


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review829
---









[jira] Updated: (HBASE-2908) Wrong order of null-check

2010-08-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2908:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.90.0
   Resolution: Fixed

Thanks for the patch Libor.  Applied to TRUNK.





[jira] Commented: (HBASE-1697) Discretionary access control

2010-08-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897295#action_12897295
 ] 

stack commented on HBASE-1697:
--

Andrew: You need something on this issue?

 Discretionary access control
 

 Key: HBASE-1697
 URL: https://issues.apache.org/jira/browse/HBASE-1697
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.92.0


 Consider implementing discretionary access control for HBase.
 Access control has three aspects: authentication, authorization and audit.
 - Authentication: Access is controlled by insisting on an authentication 
 procedure to establish the identity of the user. The authentication procedure 
 should minimally require a non-plaintext authentication factor (e.g. 
 encrypted password with salt) and should ideally or at least optionally 
 provide cryptographically strong confidence via public key certification.
 - Authorization: Access is controlled by specifying rights to resources via 
 an access control list (ACL). An ACL is a list of permissions attached to an 
 object. The list specifies who or what is allowed to access the object and 
 what operations are allowed to be performed on the object, e.g. create, 
 update, read, or delete.
 - Audit: Important actions taken by subjects should be logged for 
 accountability, a chronological record which  enables the full reconstruction 
 and examination of a sequence of events, e.g. schema changes or data 
 mutations. Logging activity should be protected from all subjects except for 
 a restricted set with administrative privilege, perhaps to only a single 
 super-user. 
 Discretionary access control means the access policy for an object is 
 determined by the owner of the object. Every object in the system must have a 
 valid owner. Owners can assign access rights and permissions to other users. 
 The initial owner of an object is the subject who created it. If subjects are 
 deleted from a system, ownership of objects owned by them should revert to 
 some super-user or otherwise valid default. 
 HBase can enforce access policy at table, column family, or cell granularity. 
 Cell granularity does not make much sense. An implementation which controls 
 access at both the table and column family levels is recommended, though a 
 first cut could consider control at the table level only. The initial set of 
 permissions can be: Create (table schema or column family), update (table 
 schema or column family), read (column family), delete (table or column 
 family), execute (filters), and transfer ownership. The subject identities 
 and access tokens could be stored in a new administrative table. ACLs on 
 tables and column families can be stored in META. 
 Access other than read access to catalog and administrative tables should be 
 restricted to a set of administrative users or perhaps a single super-user. A 
 data mutation on a user table by a subject without administrative or 
 superuser privilege which results in a table split is an implicit temporary 
 privilege elevation where the regionserver or master updates the catalog 
 tables as necessary to support the split. 
 Audit logging should be configurable on a per-table basis to avoid this 
 overhead where it is not wanted.
 Consider supporting external authentication and subject identification 
 mechanisms with Java library support: RADIUS/TACACS, Kerberos, LDAP.
 Consider logging audit trails to an HBase table (bigtable type schemas are 
 natural for this) and optionally external logging options with Java library 
 support -- syslog, etc., or maybe commons-logging is sufficient and punt to 
 administrator to set up appropriate commons-logging/log4j configurations for 
 their needs. 
 If HBASE-1002 is considered, and the option to support filtering via upload 
 of (perhaps complex) bytecode produced by some little language compiler is 
 implemented, the execute privilege could be extended in a manner similar to 
 how stored procedures in SQL land execute either with the privilege of the 
 current user or the (table/procedure) creator.
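The proposed permission set might look roughly like this in code (purely
hypothetical names; no such API exists in HBase at the time of this
discussion):

```java
import java.util.EnumSet;

// Purely illustrative sketch of the permission model proposed above.
enum Permission { CREATE, UPDATE, READ, DELETE, EXECUTE, TRANSFER_OWNERSHIP }

// One ACL entry: a subject identity and the rights granted to it on a
// table or column family.
class AclEntry {
    final String subject;                 // authenticated user identity
    final EnumSet<Permission> granted;    // rights on the protected object

    AclEntry(String subject, EnumSet<Permission> granted) {
        this.subject = subject;
        this.granted = granted;
    }

    boolean allows(Permission p) {
        return granted.contains(p);
    }
}
```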




[jira] Resolved: (HBASE-2868) Do some small cleanups in org.apache.hadoop.hbase.regionserver.wal

2010-08-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2868.
--

Resolution: Fixed

Committed.  Thanks for the patch mighty Alex.

 Do some small cleanups in org.apache.hadoop.hbase.regionserver.wal
 --

 Key: HBASE-2868
 URL: https://issues.apache.org/jira/browse/HBASE-2868
 Project: HBase
  Issue Type: Improvement
Reporter: Alex Newman
Assignee: Alex Newman
 Fix For: 0.90.0

 Attachments: 0001-HBASE-2868.patch, 1 (1)


 Since I am touching this area, it's probably better to leave it in a cleaner 
 state: non-deprecated, etc.




[jira] Created: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs

2010-08-11 Thread Jean-Daniel Cryans (JIRA)
SoftValueSortedMap is broken, can generate NPEs
---

 Key: HBASE-2909
 URL: https://issues.apache.org/jira/browse/HBASE-2909
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.89.20100621, 0.20.6
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.20.7, 0.90.0


The way SoftValueSortedMap uses SoftValues, it looks like its keys can get 
garbage collected along with the values themselves. We hit this issue in 
production, but I was also able to reproduce it randomly using YCSB with 300 
threads. Here's an example on 0.20 with JDK 1.6u14:

{noformat}

java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1036)
at 
org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:104)
at 
org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:96)
at java.util.TreeMap.cmp(TreeMap.java:1911)
at java.util.TreeMap.get(TreeMap.java:1835)
at 
org.apache.hadoop.hbase.util.SoftValueSortedMap.get(SoftValueSortedMap.java:91)
at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getCachedLocation(HConnectionManager.java:788)
at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:651)
at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:128)
at 
org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.getTable(ThriftServer.java:262)
at 
org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRowTs(ThriftServer.java:585)
at 
org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRow(ThriftServer.java:578)
at 
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.process(Hbase.java:2345)
at 
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor.process(Hbase.java:1988)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:259)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
{noformat}

In this specific case, the null cannot be the passed key because it's coming 
from HTable which uses HConstants.EMPTY_START_ROW. It cannot be a null key that 
was inserted previously because we would have got the NPE at insert time. This 
can only mean that some key *became* null.




[jira] Commented: (HBASE-50) Snapshot of table

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897474#action_12897474
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review846
---



src/main/java/org/apache/hadoop/hbase/io/Reference.java
http://review.cloudera.org/r/467/#comment2846

I meant that a value of 2 cannot be correctly interpreted as a boolean.




src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2847

I think we need to limit the space consumed by failed snapshots.
This issue can be addressed by a future JIRA.


- Ted









[jira] Updated: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs

2010-08-11 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-2909:
--

Attachment: hbase-2909.patch

Here's a more conventional way of building a soft-reference structure, with 
SoftValue now nested inside SoftValueSortedMap. I also got rid of the 
implementation of Map.Entry (which was suspicious) and disabled entrySet 
because 1) it wasn't used and 2) it relied on that Map.Entry implementation, 
which wasn't really one. I ran a few tests and it works, but it needs more 
at-scale testing.
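The general pattern being described can be sketched roughly as follows
(illustrative only, not the actual patch): the SoftReference wraps only the
value and remembers its key strongly, so a collected value can be purged via
a ReferenceQueue without the key itself ever being cleared.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.TreeMap;

// Illustrative soft-value map sketch. Keys are held strongly by both the
// TreeMap and the SoftValue, so they can never "become null"; only values
// are softly reachable.
class SoftValueMapSketch<K, V> {
    private final TreeMap<K, SoftValue<K, V>> map = new TreeMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    private static class SoftValue<K, V> extends SoftReference<V> {
        final K key; // strong reference back to the key
        SoftValue(K key, V value, ReferenceQueue<V> q) {
            super(value, q);
            this.key = key;
        }
    }

    // Drop entries whose values the GC has already collected.
    @SuppressWarnings("unchecked")
    private void purge() {
        SoftValue<K, V> sv;
        while ((sv = (SoftValue<K, V>) queue.poll()) != null) {
            map.remove(sv.key);
        }
    }

    public void put(K key, V value) {
        purge();
        map.put(key, new SoftValue<>(key, value, queue));
    }

    public V get(K key) {
        purge();
        SoftValue<K, V> sv = map.get(key);
        return sv == null ? null : sv.get();
    }
}
```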





[jira] Commented: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs

2010-08-11 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897489#action_12897489
 ] 

ryan rawson commented on HBASE-2909:


+1 (and not just because it is modelled after my SimpleBlockCache)







[jira] Commented: (HBASE-2697) Implement new open/close logic in handlers and stop using heartbeats for open/close messages

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897516#action_12897516
 ] 

HBase Review Board commented on HBASE-2697:
---

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/484/#review850
---



branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
http://review.cloudera.org/r/484/#comment2851

Please change the description to reflect what the code does: throwing 
NotAllMetaRegionsOnlineException.


- Ted





 Implement new open/close logic in handlers and stop using heartbeats for 
 open/close messages
 

 Key: HBASE-2697
 URL: https://issues.apache.org/jira/browse/HBASE-2697
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.0

 Attachments: HBASE-2697-part1-v10.patch


 This issue is doing the meat of what HBASE-2485 is about and continues what 
 was started in HBASE-2694 after some code cleanup to make life easier.
 This deals with no longer piggybacking messages from Master to RegionServers 
 on heartbeat responses and instead sending direct unsolicited messages.  This 
 also deals with moving the open/close logic fully into handlers and removing 
 the existing open/close code on both the RS and M sides.  There may also be 
 some changes to the master in-memory state of regions in transition.  The new 
 load balancer will probably be introduced with this issue but not fully 
 integrated yet.




[jira] Commented: (HBASE-1660) need a rolling restart script

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897560#action_12897560
 ] 

HBase Review Board commented on HBASE-1660:
---

Message from: Nicolas nspiegelb...@facebook.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/632/
---

Review request for hbase.


Summary
---

1. added 'restart' option to hbase-daemon.sh for unit-level restarting
2. added rolling-restart.sh script to perform system-level rolling restarts


This addresses bug HBASE-1660.
http://issues.apache.org/jira/browse/HBASE-1660


Diffs
-

  trunk/bin/hbase-daemon.sh 984635 
  trunk/bin/rolling-restart.sh PRE-CREATION 

Diff: http://review.cloudera.org/r/632/diff


Testing
---

./bin/start-hbase.sh
./bin/rolling-restart.sh
./bin/stop-hbase.sh


Thanks,

Nicolas




 need a rolling restart script
 -

 Key: HBASE-1660
 URL: https://issues.apache.org/jira/browse/HBASE-1660
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.20.0
Reporter: ryan rawson
Priority: Minor
 Fix For: 0.92.0


 Need a script that will do a rolling restart.
 It should be configurable in two ways:
 - how long to keep the daemon down per host
 - how long to wait between hosts
 For regionservers, in my own hacky command line I used 10/60.




[jira] Commented: (HBASE-1660) need a rolling restart script

2010-08-11 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897561#action_12897561
 ] 

HBase Review Board commented on HBASE-1660:
---

Message from: Nicolas nspiegelb...@facebook.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/632/
---

(Updated 2010-08-11 20:26:20.097452)


Review request for hbase.


Summary (updated)
---

1. added 'restart' option to hbase-daemon.sh for unit-level restarting
2. added rolling-restart.sh script to perform system-level rolling restarts

(Note that I intentionally did not restart the ZooKeeper nodes, since those 
binaries will need an update far less often)


This addresses bug HBASE-1660.
http://issues.apache.org/jira/browse/HBASE-1660


Diffs
-

  trunk/bin/hbase-daemon.sh 984635 
  trunk/bin/rolling-restart.sh PRE-CREATION 

Diff: http://review.cloudera.org/r/632/diff


Testing
---

./bin/start-hbase.sh
./bin/rolling-restart.sh
./bin/stop-hbase.sh


Thanks,

Nicolas




