date:20130127


 [ 
https://issues.apache.org/jira/browse/HBASE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-7210:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

Moving to 0.94.6, I don't think we're ready here.

 Backport HBASE-6059 to 0.94
 ---

 Key: HBASE-7210
 URL: https://issues.apache.org/jira/browse/HBASE-7210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.2
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.6

 Attachments: 6059-94.patch, 7120.txt


 HBASE-6059 seems to be an important issue.  Chunhui has already given a patch 
 for 94. Need to rebase if it does not apply cleanly.
 Raising a new one as the old issue is already closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication


 [ 
https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-7280.
--

   Resolution: Won't Fix
Fix Version/s: (was: 0.94.5)

Closing as Won't fix. This is working as designed.
We can of course discuss another approach that can also ship meta edits.

 TableNotFoundException thrown in peer cluster will incur endless retry for 
 shipEdits, which in turn block following normal replication
 --

 Key: HBASE-7280
 URL: https://issues.apache.org/jira/browse/HBASE-7280
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Feng Honghua
   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 In cluster replication, suppose the master cluster has 2 tables whose
 column families are declared with replication scope = 1, and a peer cluster
 is added that has only 1 of those tables (same name as on the master). The
 ReplicationSource thread in the master cluster for this peer ships edits
 (logs) for both tables. The peer fails to apply the edits for the missing
 table with TableNotFoundException, and that exception is returned to the
 original shipper (the ReplicationSource in the master cluster). The shipper
 then retries the failed edits endlessly, without proceeding to read the
 remaining (newer) log files or to ship the following edits, which may be
 normal, expected edits for the table the peer does have. The symptom is
 that the TableNotFoundException causes an endless retry that blocks
 replication of the normal table.
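
The blocking behaviour described above can be illustrated with a small model. This is only a sketch: ShipperSketch, ship and maxAttempts are hypothetical names, a table missing on the peer stands in for TableNotFoundException, and the retry loop is bounded only so that the example terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the shipping loop; not the actual ReplicationSource.
class ShipperSketch {
    /**
     * Ships edits in log order and returns the table names of the edits that
     * were delivered. maxAttempts bounds the retry loop only so the sketch
     * terminates; the report describes the real retry as endless.
     */
    static List<String> ship(List<String> editTables, Set<String> peerTables,
                             int maxAttempts) {
        List<String> shipped = new ArrayList<>();
        for (String table : editTables) {
            int attempts = 0;
            // A table missing on the peer stands in for TableNotFoundException.
            while (!peerTables.contains(table)) {
                attempts++;
                if (attempts >= maxAttempts) {
                    // Still stuck on this edit, so later edits never ship.
                    return shipped;
                }
            }
            shipped.add(table);
        }
        return shipped;
    }
}
```

With edits for t1, t2, t1 and a peer that only has t1, the model delivers the first t1 edit and then never gets past the t2 edit.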


[jira] [Resolved] (HBASE-7101) HBase stuck in Region SPLIT


 [ 
https://issues.apache.org/jira/browse/HBASE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-7101.
--

   Resolution: Fixed
Fix Version/s: (was: 0.94.5)
   (was: 0.96.0)

I think this is a dup.

 HBase stuck in Region SPLIT 
 

 Key: HBASE-7101
 URL: https://issues.apache.org/jira/browse/HBASE-7101
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Bing Jiang

 I found this issue from a znode that had existed for a long time under the
 unassigned parent, while HMaster kept logging a growing number of warnings.
 The looping log is below.
 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 
 1a1c950ad45812d7b4b9b90ebf268468 not found on server 
 sev0040,60020,1350378314041; failed processing
 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for 
 region 1a1c950ad45812d7b4b9b90ebf268468 from server 
 sev0040,60020,1350378314041 but it doesn't exist anymore, probably already 
 processed its split
 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 
 1a1c950ad45812d7b4b9b90ebf268468 not found on server 
 gs-dpo-sev0040,60020,1350378314041; failed processing
 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for 
 region 1a1c950ad45812d7b4b9b90ebf268468 from server 
 sev0040,60020,1350378314041 but it doesn't exist anymore, probably already 
 processed its split
 We use HBase 0.92.1, and I traced this back to the source code. The HMaster
 AssignmentManager has already deleted the SPLIT region from its in-memory
 structure, but the HRegionServer SplitTransaction still finds the
 unassigned/parent znode in a transient state. Precisely, SplitTransaction
 executes tickleNodeSplit to write a new version a little later than
 AssignmentManager deletes the unassigned/parent znode. Updating the znode
 version triggers the handleRegion operation again; however,
 AssignmentManager finds that the RegionState in memory has been deleted, and
 the transaction goes into a retry loop.
 In SplitTransaction, transitionZKNode retries tickleNodeSplit after
 sleeping 100ms. In my opinion, if the sleep were much longer than 100ms, all
 the operations from AssignmentManager would have finished completely.


[jira] [Reopened] (HBASE-7101) HBase stuck in Region SPLIT


 [ 
https://issues.apache.org/jira/browse/HBASE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-7101:
--


 HBase stuck in Region SPLIT 
 

 Key: HBASE-7101
 URL: https://issues.apache.org/jira/browse/HBASE-7101
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Bing Jiang



[jira] [Resolved] (HBASE-7101) HBase stuck in Region SPLIT


 [ 
https://issues.apache.org/jira/browse/HBASE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-7101.
--

Resolution: Duplicate

 HBase stuck in Region SPLIT 
 

 Key: HBASE-7101
 URL: https://issues.apache.org/jira/browse/HBASE-7101
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Bing Jiang



[jira] [Updated] (HBASE-6335) Switching log-splitting policy after last failure master start may cause data loss


 [ 
https://issues.apache.org/jira/browse/HBASE-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6335:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

Is this an issue in 0.94 only (or in 0.96 as well)?

 Switching log-splitting policy after last failure master start may cause data 
 loss
 --

 Key: HBASE-6335
 URL: https://issues.apache.org/jira/browse/HBASE-6335
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.94.6


 How does it happen?
 Suppose server A is down and it has three log files, with all the data
 coming from one region.
 File 1: kv01 kv02 kv03
 File 2: kv04 kv05 kv06
 File 3: kv07 kv08 kv09
 Here, kv01 means the entry's log seqID is 01.
 Case: switch to master-local-log-splitting from distributed-log-splitting
 1. Master finds that server A is down, and starts to split its log files
 using distributed-log-splitting.
 2. It successfully splits log file 2, moves it to oldLogs, and generates one
 edit file named 06 in the region's recovered.edits dir.
 3. Master restarts with the log-splitting policy changed to
 master-local-log-splitting, and starts to split file 1 and file 3.
 4. It successfully splits log file 1 and file 3, and generates one edit file
 named 09 in the region's recovered.edits dir.
 5. The region replays edits from edit files 06 and 09. The region's seqID is
 06 after it replays edit file 06, so when replaying edit file 09 it skips
 kv01, kv02 and kv03, and that data is lost.
 As in the above case, switching to distributed-log-splitting from
 master-local-log-splitting could also cause data loss.
 Should we fix this bug or avoid the case? I'm not sure...
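
The replay in step 5 can be reproduced with a small model. This is a sketch under stated assumptions, not HBase code: ReplaySketch and replay are hypothetical names, each edit file is an array of seqIDs, and an edit is assumed to be skipped when its seqID is not greater than the region's seqID at the time the file is opened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of recovered-edits replay; not HBase code.
class ReplaySketch {
    /** Replays the edit files in the given order and returns the applied seqIDs. */
    static List<Integer> replay(int[][] editFiles) {
        List<Integer> applied = new ArrayList<>();
        int regionSeqId = 0;
        for (int[] file : editFiles) {
            int maxInFile = regionSeqId;
            for (int seqId : file) {
                // Assumption: edits at or below the region's seqID are treated
                // as already applied and are skipped.
                if (seqId <= regionSeqId) {
                    continue;
                }
                applied.add(seqId);
                maxInFile = Math.max(maxInFile, seqId);
            }
            // The region's seqID advances once the file has been replayed.
            regionSeqId = maxInFile;
        }
        return applied;
    }
}
```

Replaying file 06 (kv04 to kv06) and then file 09 (kv01 to kv03 plus kv07 to kv09) applies only seqIDs 4 through 9 in this model, which matches the loss of kv01, kv02 and kv03 described in the report.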


[jira] [Updated] (HBASE-6469) Failure on enable/disable table will cause table state in zk to be left as enabling/disabling until master is restart


 [ 
https://issues.apache.org/jira/browse/HBASE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6469:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

Any update? Moving out to 0.94.6

 Failure on enable/disable table will cause table state in zk to be left as 
 enabling/disabling until master is restart
 -

 Key: HBASE-6469
 URL: https://issues.apache.org/jira/browse/HBASE-6469
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.2, 0.96.0
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
 Fix For: 0.96.0, 0.94.6

 Attachments: 6469-expose-force-r3.patch


 In Enable/DisableTableHandler code, if something goes wrong in handling, the 
 table state in zk is left as ENABLING / DISABLING. After that we cannot force 
 any more action from the API or CLI, and the only recovery path is restarting 
 the master. 
 {code}
 if (done) {
   // Flip the table to enabled.
   this.assignmentManager.getZKTable().setEnabledTable(
       this.tableNameStr);
   LOG.info("Table '" + this.tableNameStr
       + "' was successfully enabled. Status: done=" + done);
 } else {
   LOG.warn("Table '" + this.tableNameStr
       + "' wasn't successfully enabled. Status: done=" + done);
 }
 {code}
 Here, if done is false, the table state is not changed. There is also no way 
 to set skipTableStateCheck from cli / api. 
 We have run into this issue a couple of times before. 
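
A minimal model of the stuck state may help when reasoning about a fix. TableStateSketch, State and enable are hypothetical names, and the state check at the top of enable is an assumption standing in for the table state check mentioned in the report.

```java
// Hypothetical model of the table state handling; names are illustrative.
class TableStateSketch {
    enum State { DISABLED, ENABLING, ENABLED }

    State state = State.DISABLED;

    /** The done flag models whether the enable handler finished successfully. */
    void enable(boolean done) {
        // Assumption: a new enable request is only accepted for a DISABLED table.
        if (state != State.DISABLED) {
            throw new IllegalStateException("table is " + state);
        }
        state = State.ENABLING;
        if (done) {
            state = State.ENABLED;
        }
        // When done is false the state is left as ENABLING, as in the handler snippet.
    }
}
```

In this model, a failed enable leaves the state at ENABLING, and every later enable request is rejected until the state is reset from outside.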


[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row


[ 
https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13563763#comment-13563763
 ] 

Lars Hofhansl commented on HBASE-5664:
--

Thanks Anoop. Will do as soon as I get to it (the next two days or so)

 CP hooks in Scan flow for fast forward when filter filters out a row
 

 Key: HBASE-5664
 URL: https://issues.apache.org/jira/browse/HBASE-5664
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Filters
Affects Versions: 0.92.1
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, 
 HBASE-5664_94_V3.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch


 In HRegion.nextInternal(int limit, String metric)
 we have a while(true) loop that fetches the next result satisfying the
 filter condition. When the Filter filters out the currently fetched row, we
 call nextRow(byte [] currentRow) before going on to the next row.
 {code}
 if (results.isEmpty() || filterRow()) {
 // this seems like a redundant step - we already consumed the row
 // there're no left overs.
 // the reasons for calling this method are:
 // 1. reset the filters.
 // 2. provide a hook to fast forward the row (used by subclasses)
 nextRow(currentRow);
 {code}
 Reason 2 is to provide a hook to fast forward the row (used by subclasses).
 We can provide the same fast-forward support for coprocessors (CP) as well.
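
To illustrate the idea, here is a sketch of a scan loop that consults a hook whenever the filter drops a row. FastForwardSketch, RowSkipHook and nextRowAfterFilter are hypothetical names chosen for this example, rows are modelled as integers, and none of this is the coprocessor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of a scan loop with a fast-forward hook.
class FastForwardSketch {
    /** Hypothetical hook: returns the row at which the scan should resume. */
    interface RowSkipHook {
        int nextRowAfterFilter(int filteredRow);
    }

    /** Scans rows 0..rowCount-1 and returns the rows that pass the filter. */
    static List<Integer> scan(int rowCount, IntPredicate filterRow, RowSkipHook hook) {
        List<Integer> results = new ArrayList<>();
        int row = 0;
        while (row < rowCount) {
            if (filterRow.test(row)) {
                // The filter dropped this row; let the hook choose where to resume.
                int next = hook.nextRowAfterFilter(row);
                row = Math.max(next, row + 1); // always make progress
                continue;
            }
            results.add(row);
            row++;
        }
        return results;
    }
}
```

A hook that knows rows 3 to 7 will all be filtered can return 8 on the first filtered row, so the loop visits row 3 once and then resumes at row 8 instead of testing every row in between.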


[jira] [Commented] (HBASE-7545) [replication] Break out TestReplication into manageable classes


[ 
https://issues.apache.org/jira/browse/HBASE-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13563764#comment-13563764
 ] 

Lars Hofhansl commented on HBASE-7545:
--

Still wanna make a 0.94 patch, [~jdcryans]?

 [replication] Break out TestReplication into manageable classes
 ---

 Key: HBASE-7545
 URL: https://issues.apache.org/jira/browse/HBASE-7545
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.96.0

 Attachments: HBASE-7545.patch


 This has been discussed before but after trying to debug the last failure on 
 Jenkins where I saw the time go back and forth (if you don't care about your 
 own sanity do checkout 
 https://builds.apache.org/job/HBase-TRUNK/3726/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testDeleteTypes/)
  I think it is time to break out TestReplication.
 The difficulty is that the setup for the 2 clusters is a lot of code I don't 
 want to duplicate. I'm thinking that we can keep {{setUpBeforeClass}} there 
 and have the other classes extend TestReplication (which should also change 
 name). I'm thinking of the following new classes:
  - TestReplicationSmallTests, contains the easy methods that don't mess 
 around too much.
  - TestReplicationQueueFailover, contains one test of the same name
  - TestReplicationDisableInactivePeer, contains one test of the same name
  - Rename TestReplicationWithCompression to
 TestReplicationQueueFailoverWithCompression and make it extend
 TestReplicationQueueFailover.


[jira] [Updated] (HBASE-7111) hbase zkcli will not start if the zookeeper server choosen to connectted to is not available


 [ 
https://issues.apache.org/jira/browse/HBASE-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-7111:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

 hbase zkcli will not start if the zookeeper server choosen to connectted to  
 is not available
 -

 Key: HBASE-7111
 URL: https://issues.apache.org/jira/browse/HBASE-7111
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Affects Versions: 0.94.2
Reporter: Zhou wenjian
Assignee: Zhou wenjian
 Fix For: 0.94.6

 Attachments: HBASE-7111-trunk.patch, HBASE-7111-trunk-v2.patch


 There are 3 ZooKeeper servers in my cluster:
 s1
 s2
 s3
 After killing s3, I found that hbase zkcli would not start again.
 It keeps trying to connect to s3.
 /11/07 11:01:01 INFO zookeeper.ClientCnxn: Opening socket connection to 
 server s3
 12/11/07 11:01:01 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
 unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
 from the code 
 {code}
   public String parse(final Configuration c) {
     // Note that we do not simply grab the property
     // HConstants.ZOOKEEPER_QUORUM from the HBaseConfiguration because the
     // user may be using a zoo.cfg file.
     Properties zkProps = ZKConfig.makeZKProps(c);
     String host = null;
     String clientPort = null;
     for (Entry<Object, Object> entry : zkProps.entrySet()) {
       String key = entry.getKey().toString().trim();
       String value = entry.getValue().toString().trim();
       if (key.startsWith("server.") && host == null) {
         String[] parts = value.split(":");
         host = parts[0];
       } else if (key.endsWith("clientPort")) {
         clientPort = value;
       }
       if (host != null && clientPort != null) break;
     }
     return host != null && clientPort != null ? host + ":" + clientPort : null;
   }
 {code}
 The code always picks one fixed ZooKeeper server (here the unavailable s3),
 which makes the script fail.
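
One possible direction, sketched here as an assumption rather than as the attached patch, is to return every configured server instead of only one. ZooKeeper client connect strings accept a comma-separated list of host:port pairs, so the client can fall back to a live server. QuorumStringSketch is a hypothetical class that takes the properties directly so that it is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical variant of parse() that keeps every server entry.
class QuorumStringSketch {
    /** Builds "host1:port,host2:port,..." from server.N and clientPort entries. */
    static String parse(Properties zkProps) {
        String clientPort = null;
        // Sorted by property name so the output order is deterministic.
        TreeMap<String, String> hosts = new TreeMap<>();
        for (String rawKey : zkProps.stringPropertyNames()) {
            String key = rawKey.trim();
            String value = zkProps.getProperty(rawKey).trim();
            if (key.startsWith("server.")) {
                hosts.put(key, value.split(":")[0]);
            } else if (key.endsWith("clientPort")) {
                clientPort = value;
            }
        }
        if (hosts.isEmpty() || clientPort == null) {
            return null;
        }
        List<String> parts = new ArrayList<>();
        for (String host : hosts.values()) {
            parts.add(host + ":" + clientPort);
        }
        return String.join(",", parts);
    }
}
```

For the cluster above with client port 2181, this yields s1:2181,s2:2181,s3:2181, so an unavailable s3 no longer has to be the only candidate.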


[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master


 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5323:
-

Fix Version/s: (was: 0.94.5)

Removing from 0.94 since there is no movement.

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.8

 Attachments: HBASE-5323.patch, HBASE-5323.patch


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We bail out if the above condition is not satisfied. But if
 SSH.splitLog() hits this problem, the error lands in the run method of
 EventHandler. This kills the SSH thread, so no further assignment
 happens. If ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such errors.
 Please do suggest.
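
One detail worth keeping in mind: a failed assert raises AssertionError, which is an Error and not an Exception, so a catch (Exception e) block does not see it. The sketch below uses hypothetical names (AssertionHandlingSketch, splitLog, process) and throws the error explicitly so that it behaves the same with or without -ea.

```java
// Hypothetical model of how an AssertionError escapes ordinary handling.
class AssertionHandlingSketch {
    /** Stands in for the length check done while reading the log. */
    static void splitLog(long realLength, long expectedLength) {
        // Thrown explicitly so the sketch does not depend on the -ea flag.
        if (realLength < expectedLength) {
            throw new AssertionError("realLength < expected length");
        }
    }

    /** Returns a description of which handler saw the failure. */
    static String process(long realLength, long expectedLength) {
        try {
            try {
                splitLog(realLength, expectedLength);
                return "ok";
            } catch (Exception e) {
                // An AssertionError is not an Exception, so it is not caught here.
                return "handled exception";
            }
        } catch (Throwable t) {
            // This is the point where a master could decide to abort.
            return "abort: " + t.getClass().getSimpleName();
        }
    }
}
```

A handler that should abort the master on this condition therefore has to catch Throwable (or AssertionError) at the outer level.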


[jira] [Created] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout

cuijianwei created HBASE-7684:
-

 Summary: NullPointerException in SecureClient when Call is cleaned 
up due to RPC  timeout
 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei


We hit a NullPointerException when using SecureClient to access HBase. From the
source code, we find that receiveResponse() does not check call != null before
invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). However, as
explained in receiveResponse() of HBaseClient, the call may have been cleaned
up due to an RPC timeout; therefore, it would be better to check call != null
before invoking call.setValue(...) in SecureClient.
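
The proposed guard can be sketched as follows. NullCallSketch, Call and the calls map are hypothetical stand-ins for the client internals; only the null check before setValue reflects the report.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of response handling; not the SecureClient source.
class NullCallSketch {
    static class Call {
        Object value;
        boolean done;

        void setValue(Object v) {
            value = v;
            done = true;
        }
    }

    /** Pending calls by id; a timeout cleanup may remove entries concurrently. */
    final Map<Integer, Call> calls = new ConcurrentHashMap<>();

    /** Returns true if the response was delivered to a pending call. */
    boolean receiveResponse(int id, Object value) {
        Call call = calls.remove(id);
        if (call == null) {
            // The call was already cleaned up, for example after an RPC timeout.
            return false;
        }
        call.setValue(value);
        return true;
    }
}
```

A late response for a call that has already been removed is then dropped instead of raising a NullPointerException.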


[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Attachment: check_call_null_before_set_in_secure_client.diff

 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: check_call_null_before_set_in_secure_client.diff




[jira] [Created] (HBASE-7685) Closed socket connection can't be remove from SecureClient

cuijianwei created HBASE-7685:
-

 Summary: Closed socket connection can't be remove from SecureClient
 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei


We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in its close() method. The
following is the current logic in the close() method of SecureClient:
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied, because the default PoolType is RoundRobin. As a result, a
connection in the closing state may never be removed from connections, and new
calls that use such a connection always throw an IOException saying the
connection is closing.
We can use the logic from the close() method of HBaseClient to resolve the problem:
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }
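
Why the identity check can miss is easier to see with a tiny round-robin pool. RoundRobinPoolSketch below is a hypothetical stand-in written for this example, not the PoolMap source: get() hands out pooled values in rotation, so it need not return the connection that is closing, while remove(key, value) targets one specific value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin pool; a stand-in for illustration only.
class RoundRobinPoolSketch<K, V> {
    private final Map<K, List<V>> pools = new HashMap<>();
    private final Map<K, Integer> next = new HashMap<>();

    void put(K key, V value) {
        pools.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns the pooled values for the key in rotation, or null if none. */
    V get(K key) {
        List<V> pool = pools.get(key);
        if (pool == null || pool.isEmpty()) {
            return null;
        }
        int i = next.getOrDefault(key, 0) % pool.size();
        next.put(key, i + 1);
        return pool.get(i);
    }

    /** Removes one specific value from the key's pool (matched with equals). */
    boolean remove(K key, V value) {
        List<V> pool = pools.get(key);
        return pool != null && pool.remove(value);
    }

    int size(K key) {
        List<V> pool = pools.get(key);
        return pool == null ? 0 : pool.size();
    }
}
```

With two connections a and b pooled under one remoteId, the first get() returns a, so a closing b that checks get(remoteId) == this skips its own removal, whereas remove(remoteId, this) removes b regardless of the rotation.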


[jira] [Updated] (HBASE-7685) Closed socket connection can't be remove from SecureClient


 [ 
https://issues.apache.org/jira/browse/HBASE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7685:
--

Attachment: closing_socket_connection_not_removed_in_secure_client.diff

 Closed socket connection can't be remove from SecureClient
 --

 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: 
 closing_socket_connection_not_removed_in_secure_client.diff




[jira] [Updated] (HBASE-7685) Closing socket connection can't be remove from SecureClient


 [ 
https://issues.apache.org/jira/browse/HBASE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7685:
--

Summary: Closing socket connection can't be remove from SecureClient  (was: 
Closed socket connection can't be remove from SecureClient)

 Closing socket connection can't be remove from SecureClient
 ---

 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: 
 closing_socket_connection_not_removed_in_secure_client.diff




[jira] [Updated] (HBASE-7685) Closing socket connection can't be remove from SecureClient


 [ 
https://issues.apache.org/jira/browse/HBASE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7685:
--

Description: 
We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in its close() method. The
following is the current logic in the close() method of SecureClient:
/**close logic in SecureClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
/**close logic in SecureClient*/
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied, because the default PoolType is RoundRobin. As a result, a
connection in the closing state may never be removed from connections, and new
calls that use such a connection always throw an IOException saying the
connection is closing.
We can use the logic from the close() method of HBaseClient to resolve the problem:
/**close logic in HBaseClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }
/**close logic in HBaseClient*/



 Closing socket connection can't be remove from SecureClient
 ---

 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: 
 closing_socket_connection_not_removed_in_secure_client.diff


 We found many IOExceptions saying "Call # not added as the connection # is
 closing" when using SecureClient to access HBase. From the source code of
 SecureClient, we found there may be a bug in the close() method of
 SecureClient. The following is the current logic in the close() method of
 SecureClient:
 /**close logic in SecureClient*/
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     if (connections.get(remoteId) == this) {
       connections.remove(remoteId);
     }
   }
 /**close logic in SecureClient*/
 However, connections are managed by PoolMap; therefore, if more than one
 connection is created for the same remoteId, the 'if' condition may not be
 satisfied because the default PoolType is RoundRobin. As a result, a
 connection in the closing state may never be removed from connections, and new
 calls that use such a connection will always throw an IOException saying the
 connection is closing.
 We can use the logic from the close() method of HBaseClient to resolve the
 problem:
 /**close logic in HBaseClient*/
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     connections.remove(remoteId, this);
   }
 /**close logic in HBaseClient*/


[jira] [Updated] (HBASE-7685) Closing socket connection can't be remove from SecureClient


 [ 
https://issues.apache.org/jira/browse/HBASE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7685:
--

Description: 
We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in the close() method of SecureClient.
The following is the current logic in the close() method of SecureClient:
/**close logic in SecureClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied because the default PoolType is RoundRobin. As a result, a connection
in the closing state may never be removed from connections, and new calls that
use such a connection will always throw an IOException saying the connection is
closing.
We can use the logic from the close() method of HBaseClient to resolve the
problem:
/**close logic in HBaseClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }

  was:
We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in the close() method of SecureClient.
The following is the current logic in the close() method of SecureClient:
/**close logic in SecureClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
/**close logic in SecureClient*/
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied because the default PoolType is RoundRobin. As a result, a connection
in the closing state may never be removed from connections, and new calls that
use such a connection will always throw an IOException saying the connection is
closing.
We can use the logic from the close() method of HBaseClient to resolve the
problem:
/**close logic in HBaseClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }
/**close logic in HBaseClient*/


 Closing socket connection can't be remove from SecureClient
 ---

 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: 
 closing_socket_connection_not_removed_in_secure_client.diff


 We found many IOExceptions saying "Call # not added as the connection # is
 closing" when using SecureClient to access HBase. From the source code of
 SecureClient, we found there may be a bug in the close() method of
 SecureClient. The following is the current logic in the close() method of
 SecureClient:
 /**close logic in SecureClient*/
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     if (connections.get(remoteId) == this) {
       connections.remove(remoteId);
     }
   }
 However, connections are managed by PoolMap; therefore, if more than one
 connection is created for the same remoteId, the 'if' condition may not be
 satisfied because the default PoolType is RoundRobin. As a result, a
 connection in the closing state may never be removed from connections, and new
 calls that use such a connection will always throw an IOException saying the
 connection is closing.
 We can use the logic from the close() method of HBaseClient to resolve the
 problem:
 /**close logic in HBaseClient*/
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     connections.remove(remoteId, this);
   }


[jira] [Commented] (HBASE-7681) Address some recent random test failures


[ 
https://issues.apache.org/jira/browse/HBASE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563795#comment-13563795
 ] 

Hudson commented on HBASE-7681:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #378 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/378/])
HBASE-7681 Addendum, close tables in TestRegionServerMetrics (Revision 
1439026)
HBASE-7681 Address some recent random test failures (Revision 1439003)

 Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java

larsh : 
Files : 
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestNodeHealthCheckChore.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadSequential.java


 Address some recent random test failures
 

 Key: HBASE-7681
 URL: https://issues.apache.org/jira/browse/HBASE-7681
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0, 0.94.5

 Attachments: 7681-0.94-addendum.txt, 7681-0.94-combined.txt, 
 7681-0.94-combined_v2.txt, 7681-0.94.txt, 7681-0.96-addendum.txt, 
 7681-0.96-combined.txt, 7681-0.96-combined.txt, 7681-94-v1.txt, 
 7681-94-v2.txt, 7681-94-v3.txt, 7681-trunk-v1.txt


 I've seen many unspecific test failures recently that cannot be reproduced
 locally even when running these tests in a loop for a very long time.
 Many of these tests make assumptions, one way or another, about wall clock
 time. While I cannot fix that, one option is to increase some of these
 timeouts a bit.
 This issue is to remind me to do that.


[jira] [Commented] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


[ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563820#comment-13563820
 ] 

Ted Yu commented on HBASE-7684:
---

Patch looks good.
If you can show us the NullPointerException stack trace, that would be nice.
{code}
-  call.setValue(value);
+ if (call != null) {
{code}
You can see that the start of the if statement doesn't align with the
call.setValue() statement.

Please use dev-support/hbase_eclipse_formatter.xml from trunk so that the code
style aligns with the guidelines.

 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: check_call_null_before_set_in_secure_client.diff


 We found a NullPointerException when using SecureClient to access HBase. From
 the source code, we found that receiveResponse() does not check call != null
 before invoking call.setValue(...) (lines 378 and 380 in SecureClient.java).
 However, as explained in receiveResponse() of HBaseClient, the call may have
 been cleaned up due to an RPC timeout; therefore, it would be better to check
 call != null before invoking call.setValue(...) in SecureClient.
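
 The guard described above could look roughly like the following. This is a
 minimal sketch, not the SecureClient code: the class ResponseNullCheckSketch,
 its Call type, and the register/cleanupTimedOut helpers are hypothetical
 stand-ins, and the real receiveResponse() reads the call id and value from
 the socket rather than taking them as parameters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResponseNullCheckSketch {

    /** Hypothetical stand-in for an in-flight RPC call. */
    static class Call {
        final int id;
        Object value;
        boolean done;

        Call(int id) {
            this.id = id;
        }

        synchronized void setValue(Object v) {
            value = v;
            done = true;
            notifyAll();
        }
    }

    // In-flight calls keyed by call id (assumed layout, for illustration only).
    private final Map<Integer, Call> calls = new ConcurrentHashMap<>();

    Call register(int id) {
        Call call = new Call(id);
        calls.put(id, call);
        return call;
    }

    /** Simulates the cleanup that happens when an RPC times out. */
    void cleanupTimedOut(int id) {
        calls.remove(id);
    }

    /** Returns true if the response was delivered, false if the call was already gone. */
    boolean receiveResponse(int id, Object value) {
        Call call = calls.remove(id);
        // The call may have been cleaned up by a timeout, so check before use.
        if (call != null) {
            call.setValue(value);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ResponseNullCheckSketch client = new ResponseNullCheckSketch();
        client.register(1);
        client.register(2);
        client.cleanupTimedOut(2);

        System.out.println(client.receiveResponse(1, "ok"));   // delivered
        System.out.println(client.receiveResponse(2, "late")); // dropped, no NPE
    }
}
```

 Without the null check, the second receiveResponse() call in main would
 dereference a null Call, which is the failure mode reported in this issue.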


[jira] [Created] (HBASE-7686) TestSplitTransactionOnCluster fails occasionally in trunk builds

Ted Yu created HBASE-7686:
-

 Summary: TestSplitTransactionOnCluster fails occasionally in trunk 
builds
 Key: HBASE-7686
 URL: https://issues.apache.org/jira/browse/HBASE-7686
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Priority: Critical
 Fix For: 0.96.0


From trunk build #3808:
{code} 
testShouldFailSplitIfZNodeDoesNotExistDueToPrevRollBack(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 test timed out after 2 milliseconds
  
testMasterRestartWhenSplittingIsPartial(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 test timed out after 30 milliseconds
  
testExistingZnodeBlocksSplitAndWeRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 test timed out after 30 milliseconds
{code}
From HBase-TRUNK-on-Hadoop-2.0.0 #378 :
{code}
testShutdownSimpleFixup(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 Region not moved off .META. server
  
testShouldFailSplitIfZNodeDoesNotExistDueToPrevRollBack(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 test timed out after 2 milliseconds
{code}


[jira] [Updated] (HBASE-7392) Disable failing example unit tests TestZooKeeperScanPolicyObserver and TestBulkDeleteProtocol


 [ 
https://issues.apache.org/jira/browse/HBASE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7392:
--

Priority: Critical  (was: Major)

This issue prevents trunk artifacts from being generated.

 Disable failing example unit tests TestZooKeeperScanPolicyObserver and 
 TestBulkDeleteProtocol
 -

 Key: HBASE-7392
 URL: https://issues.apache.org/jira/browse/HBASE-7392
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.96.0

 Attachments: 7392.txt


 Jenkins run https://builds.apache.org/job/HBase-TRUNK/3638/ turned up two
 broken example tests.  They pass on a jdk6 machine locally but not on my jdk7
 laptop.  Something's up.  My guess is that these failures have been there a
 while but only surfaced because we got further than we normally do on a
 jenkins run.
 Tests have no output on jenkins.
 If I run w/
 $ MAVEN_OPTS= -Xmx3g mvn test -PlocalTests -Dtest=TestBulkDeleteProtocol 
 -Dtest.output.tofile=false 
 I get:
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.hbase.coprocessor.example.TestBulkDeleteProtocol
 2012-12-19 09:34:36,340 INFO  [main] hbase.HBaseTestingUtility(713): Starting 
 up minicluster with 1 master(s) and 2 regionserver(s) and 2 datanode(s)
 2012-12-19 09:34:46,362 INFO  [main] hbase.HBaseTestingUtility(343): Created 
 new mini-cluster data directory: 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/dfscluster_1c4634ed-2333-48ee-807d-c56f8c4ff20f
 2012-12-19 09:34:46,362 INFO  [main] hbase.HBaseTestingUtility(553): Setting 
 test.cache.data to 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/cache_data
  in system properties and HBase conf
 2012-12-19 09:34:46,363 INFO  [main] hbase.HBaseTestingUtility(553): Setting 
 hadoop.tmp.dir to 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/hadoop_tmp
  in system properties and HBase conf
 2012-12-19 09:34:46,363 INFO  [main] hbase.HBaseTestingUtility(553): Setting 
 hadoop.log.dir to 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/hadoop_logs
  in system properties and HBase conf
 2012-12-19 09:34:46,364 INFO  [main] hbase.HBaseTestingUtility(553): Setting 
 mapred.local.dir to 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/mapred_local
  in system properties and HBase conf
 2012-12-19 09:34:46,364 INFO  [main] hbase.HBaseTestingUtility(553): Setting 
 mapred.temp.dir to 
 /Users/stack/checkouts/trunk/hbase-examples/target/test-data/cbfdeaed-c701-4a96-9277-0d9b9615a06e/mapred_temp
  in system properties and HBase conf
 2012-12-19 09:34:46,365 INFO  [main] hbase.HBaseTestingUtility(536): read 
 short circuit is ON for user stack
 2012-12-19 09:34:46.438 java[16837:1703] Unable to load realm info from 
 SCDynamicStore
 2012-12-19 09:34:56,540 DEBUG [main] fs.HFileSystem(199): Starting 
 addLocationsOrderInterceptor with class class 
 org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
 2012-12-19 09:34:56,541 WARN  [main] fs.HFileSystem(215): The file system is 
 not a DistributedFileSystem.Not adding block location reordering
 2012-12-19 09:34:56,669 WARN  [main] namenode.FSNamesystem(564): The 
 dfs.support.append option is in your configuration, however append is not 
 supported. This configuration option is no longer required to enable sync.
 2012-12-19 09:35:06,962 WARN  [main] namenode.FSNamesystem(564): The 
 dfs.support.append option is in your configuration, however append is not 
 supported. This configuration option is no longer required to enable sync.
 2012-12-19 09:35:07,076 INFO  [main] log.Slf4jLog(67): Logging to 
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
 org.mortbay.log.Slf4jLog
 2012-12-19 09:35:07,151 INFO  [main] log.Slf4jLog(67): jetty-6.1.26
 2012-12-19 09:35:07,184 INFO  [main] log.Slf4jLog(67): Extract 
 jar:file:/Users/stack/.m2/repository/org/apache/hadoop/hadoop-core/1.1.1/hadoop-core-1.1.1.jar!/webapps/hdfs
  to 
 /var/folders/bp/2z1cykc92rs6j24251cg__phgp/T/Jetty_localhost_57924_hdfsg9mqyr/webapp
 2012-12-19 09:35:07,402 INFO  [main] log.Slf4jLog(67): Started 
 SelectChannelConnector@localhost:57924
 Starting DataNode 0 with dfs.data.dir:

[jira] [Updated] (HBASE-7685) Closing socket connection can't be removed from SecureClient


 [ 
https://issues.apache.org/jira/browse/HBASE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7685:
--

Description: 
We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in the close() method of SecureClient.
The following is the current logic in the close() method of SecureClient:
{code}
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
{code}
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied because the default PoolType is RoundRobin. As a result, a connection
in the closing state may never be removed from connections, and new calls that
use such a connection will always throw an IOException saying the connection is
closing.
We can use the logic from the close() method of HBaseClient to resolve the
problem:
{code}
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }
{code}

  was:
We found many IOExceptions saying "Call # not added as the connection # is
closing" when using SecureClient to access HBase. From the source code of
SecureClient, we found there may be a bug in the close() method of SecureClient.
The following is the current logic in the close() method of SecureClient:
/**close logic in SecureClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    if (connections.get(remoteId) == this) {
      connections.remove(remoteId);
    }
  }
However, connections are managed by PoolMap; therefore, if more than one
connection is created for the same remoteId, the 'if' condition may not be
satisfied because the default PoolType is RoundRobin. As a result, a connection
in the closing state may never be removed from connections, and new calls that
use such a connection will always throw an IOException saying the connection is
closing.
We can use the logic from the close() method of HBaseClient to resolve the
problem:
/**close logic in HBaseClient*/
  // release the resources
  // first thing to do;take the connection out of the connection list
  synchronized (connections) {
    connections.remove(remoteId, this);
  }

Summary: Closing socket connection can't be removed from SecureClient  
(was: Closing socket connection can't be remove from SecureClient)

I think the root cause is in PoolMap.
Here is some analysis from Hiroshi-san:

https://issues.apache.org/jira/browse/HBASE-6651?focusedCommentId=13448770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13448770
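
To make the failure mode in the description concrete, here is a minimal
sketch of why a get-then-compare check can miss a pooled connection. It is
not the real PoolMap: RoundRobinPool, closeWithGetCheck, and the "remote" key
are hypothetical names, and the rotation policy is an assumed simplification
of round-robin behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinPoolSketch {

    /** Hypothetical, simplified round-robin pool; NOT the real PoolMap API. */
    static class RoundRobinPool<K, V> {
        private final Map<K, List<V>> pools = new HashMap<>();
        private final Map<K, Integer> cursor = new HashMap<>();

        void put(K key, V value) {
            pools.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        /** Returns the next pooled value in rotation, or null if none. */
        V get(K key) {
            List<V> pool = pools.get(key);
            if (pool == null || pool.isEmpty()) {
                return null;
            }
            int turn = cursor.merge(key, 1, Integer::sum) - 1;
            return pool.get(turn % pool.size());
        }

        /** Drops the whole pool for the key. */
        void remove(K key) {
            pools.remove(key);
            cursor.remove(key);
        }

        /** Drops only the given value from the key's pool. */
        boolean remove(K key, V value) {
            List<V> pool = pools.get(key);
            return pool != null && pool.remove(value);
        }

        int size(K key) {
            List<V> pool = pools.get(key);
            return pool == null ? 0 : pool.size();
        }
    }

    /** Mirrors the get-then-compare check quoted from SecureClient.close(). */
    static <K, V> boolean closeWithGetCheck(RoundRobinPool<K, V> pool, K key, V self) {
        if (pool.get(key) == self) {
            pool.remove(key);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RoundRobinPool<String, Object> connections = new RoundRobinPool<>();
        Object first = new Object();
        Object closing = new Object();
        connections.put("remote", first);
        connections.put("remote", closing);

        // get() hands back `first` on this turn, so the identity check fails
        // and the closing connection stays in the pool.
        System.out.println(closeWithGetCheck(connections, "remote", closing)); // false
        System.out.println(connections.size("remote"));                        // 2

        // Removing by (key, value) does not depend on the rotation.
        System.out.println(connections.remove("remote", closing));             // true
        System.out.println(connections.size("remote"));                        // 1
    }
}
```

In this sketch the two-argument remove succeeds regardless of which pooled
entry the rotation would return next, which is the property the proposed
connections.remove(remoteId, this) relies on.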

 Closing socket connection can't be removed from SecureClient
 

 Key: HBASE-7685
 URL: https://issues.apache.org/jira/browse/HBASE-7685
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: 
 closing_socket_connection_not_removed_in_secure_client.diff


 We found many IOExceptions saying "Call # not added as the connection # is
 closing" when using SecureClient to access HBase. From the source code of
 SecureClient, we found there may be a bug in the close() method of
 SecureClient. The following is the current logic in the close() method of
 SecureClient:
 {code}
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     if (connections.get(remoteId) == this) {
       connections.remove(remoteId);
     }
   }
 {code}
 However, connections are managed by PoolMap; therefore, if more than one
 connection is created for the same remoteId, the 'if' condition may not be
 satisfied because the default PoolType is RoundRobin. As a result, a
 connection in the closing state may never be removed from connections, and new
 calls that use such a connection will always throw an IOException saying the
 connection is closing.
 We can use the logic from the close() method of HBaseClient to resolve the
 problem:
 {code}
   // release the resources
   // first thing to do;take the connection out of the connection list
   synchronized (connections) {
     connections.remove(remoteId, this);
   }
 {code}


[jira] [Created] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally

Lars Hofhansl created HBASE-7687:


 Summary: TestCatalogTracker.testServerNotRunningIOException fails 
occasionally
 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl


I would like to remove this method. It micro-tests a single method and poorly 
at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-94.txt

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-trunk.txt

I looped TestCatalogTracker with the 0.94 patch 40 times locally and it passed every time.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Assigned] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-7687:
-

Assignee: Ted Yu

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Status: Patch Available  (was: Open)

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563841#comment-13563841
 ] 

Lars Hofhansl commented on HBASE-7687:
--

This method and code are fraught with race conditions. I reproduced it locally.
What happened is this:
# waitForMetaServerConnection started to wait
# the test thread sets a new meta location via setMetaLocation
# CatalogTracker.getMetaServerConnection does not find a valid connection and 
resets the meta location flag
# since the test thread only sets it once, waitForMetaServerConnection will 
wait forever (until timeout).

Let's just remove this stuff.
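
The sequence above can be replayed deterministically in a few lines. This is
a sketch under stated assumptions, not CatalogTracker itself: the class
MetaLocationRaceSketch and its three methods are hypothetical stand-ins for
the set, reset, and wait steps, and the interleaving is forced by calling
them in order on one thread.

```java
import java.util.concurrent.TimeUnit;

public class MetaLocationRaceSketch {
    private final Object lock = new Object();
    private boolean metaAvailable = false;

    /** Stand-in for the test thread publishing a meta location once. */
    void setMetaLocation() {
        synchronized (lock) {
            metaAvailable = true;
            lock.notifyAll();
        }
    }

    /** Stand-in for the tracker resetting the flag after a failed connection check. */
    void resetMetaLocation() {
        synchronized (lock) {
            metaAvailable = false;
        }
    }

    /** Waits until the flag is set or the timeout expires; returns whether it was set. */
    boolean waitForMeta(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            while (!metaAvailable) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out: nobody set the flag again
                }
                lock.wait(remainingMs);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MetaLocationRaceSketch tracker = new MetaLocationRaceSketch();
        tracker.setMetaLocation();   // test thread sets the location once
        tracker.resetMetaLocation(); // tracker resets it after a failed check
        // The waiter now only ever sees the reset state and runs into the timeout.
        System.out.println(tracker.waitForMeta(200));
    }
}
```

Because the flag is set only once and then cleared, the waiter has nothing
left to observe, so in this sketch the wait can only end by timing out.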

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563842#comment-13563842
 ] 

Lars Hofhansl commented on HBASE-7687:
--

I looked it as well... After 30 round (or so) it failed.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Comment Edited] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563842#comment-13563842
 ] 

Lars Hofhansl edited comment on HBASE-7687 at 1/27/13 5:17 PM:
---

I looped it as well... After 30 rounds (or so) it failed.

  was (Author: lhofhansl):
I looked it as well... After 30 round (or so) it failed.
  
 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563843#comment-13563843
 ] 

Lars Hofhansl commented on HBASE-7687:
--

Why would the 0.94 patch make it better? waitForMetaServerConnection just 
called waitForMeta.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563852#comment-13563852
 ] 

Ted Yu commented on HBASE-7687:
---

bq. waitForMetaServerConnection just called waitForMeta.
getCachedConnection() comes into play:
{code}
   * @return The HRegionInterface we got when we connected to <code>sn</code>
   * May have come from cache, may not be good, may have been setup by this
   * invocation, or may be null.
...
try {
  protocol = connection.getHRegionConnection(sn.getHostname(), 
sn.getPort());
{code}
If the test fails with my patch, that means the patch didn't fix the problem :-(

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness

2013-01-27 Thread Himanshu Vashishtha (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563858#comment-13563858
]

Himanshu Vashishtha commented on HBASE-7607:

I don't think we fixed that in 7681. The 4382 patch doesn't correctly solve the
issue raised in 0.94, where the description says that the client went into
retrying mode. In trunk, all the sleep/interrupt logic is gone, which is a step
in the right direction, but I wonder why the client didn't get an IOException.

Fix TestRegionServerCoprocessorExceptionWithAbort flakiness

Key: HBASE-7607
URL: https://issues.apache.org/jira/browse/HBASE-7607
Project: HBase
Issue Type: Bug
Components: Client, test
Affects Versions: 0.94.4
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha

TestRegionServerCoprocessorExceptionWithAbort sometimes fails on both trunk
and 0.94.x. The code differs between the two branches.
In trunk, table.put() succeeds even though BuggyRegionCoprocessor failed the
transaction:
The put should have failed, as the coprocessor is buggy
In 0.94.x, the client keeps retrying to look up the root region while the cluster is
down and the /hbase znode is no longer present:
Check the value configured in 'zookeeper.znode.parent'. There could be a
mismatch with the one configured in the master.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: (was: 7687-trunk.txt)

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally

2013-01-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563860#comment-13563860
 ] 

Hadoop QA commented on HBASE-7687:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566672/7687-trunk.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4205//console

This message is automatically generated.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-94-v2.txt

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-trunk.txt

See if patch v2 is better.

I removed the second Threads.sleep(1) call.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563862#comment-13563862
 ] 

Lars Hofhansl commented on HBASE-7687:
--

v1 failed after 39 iterations for me. Trying v2 now.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563864#comment-13563864
 ] 

Lars Hofhansl commented on HBASE-7687:
--

v2 always fails for me.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness


[ 
https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563865#comment-13563865
 ] 

Lars Hofhansl commented on HBASE-7607:
--

Feel free to reopen :)

 Fix TestRegionServerCoprocessorExceptionWithAbort flakiness 
 

 Key: HBASE-7607
 URL: https://issues.apache.org/jira/browse/HBASE-7607
 Project: HBase
  Issue Type: Bug
  Components: Client, test
Affects Versions: 0.94.4
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha

 TestRegionServerCoprocessorExceptionWithAbort sometimes fails on both trunk 
 and 0.94.x. The code differs between the two branches. 
 In trunk, table.put() succeeds even though BuggyRegionCoprocessor failed the 
 transaction:
 The put should have failed, as the coprocessor is buggy
 In 0.94.x, the client keeps retrying to look up the root region while the cluster is 
 down and the /hbase znode is no longer present:
 Check the value configured in 'zookeeper.znode.parent'. There could be a 
 mismatch with the one configured in the master.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563867#comment-13563867
 ] 

Lars Hofhansl commented on HBASE-7687:
--

My vote is still to simply remove this test. What it tests is not useful (IMHO).

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Status: Open  (was: Patch Available)

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-94-v3.txt

I stopped the loop for patch v2 after 63 successful iterations.

Patch v3 removes the while loop which waits for Thread to come alive because 
CountDownLatch serves the same purpose.

Hopefully the Thread would see the second ct.setMetaLocation(SN) call.
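
For readers following along, the idea of replacing a "wait for the thread to come alive" loop with a CountDownLatch can be sketched roughly as below. This is only an illustration under assumptions: the class and method names (LatchStartupSketch, startAndAwait) are hypothetical and do not come from TestCatalogTracker or from any of the attached patches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; none of these names come from TestCatalogTracker.
final class LatchStartupSketch {

  /** Starts a worker and blocks until the worker reports that its body is running. */
  static boolean startAndAwait(long timeoutMillis) {
    final CountDownLatch started = new CountDownLatch(1);
    final AtomicBoolean ran = new AtomicBoolean(false);

    Thread worker = new Thread(() -> {
      ran.set(true);
      started.countDown(); // signal: the worker body is executing
    }, "sketch-worker");
    worker.start();

    try {
      // Takes the place of a polling loop such as:
      //   while (!worker.isAlive()) Thread.sleep(1);
      boolean signalled = started.await(timeoutMillis, TimeUnit.MILLISECONDS);
      worker.join();
      return signalled && ran.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("worker started: " + startAndAwait(5000));
  }
}
```

The latch only tells the caller that the worker has reached the countDown() call; anything the worker does after that point still races with the caller.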

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563873#comment-13563873
 ] 

Lars Hofhansl commented on HBASE-7687:
--

V2 fails every single time for me, on the very first iteration.


 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563874#comment-13563874
 ] 

Lars Hofhansl commented on HBASE-7687:
--

I'm trying my bandaid patch from parent now.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563880#comment-13563880
 ] 

Ted Yu commented on HBASE-7687:
---

@Lars:
If you can try patch v3, that would be nice.

The test, started @ 10:33 this morning, is still looping.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally

2013-01-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563883#comment-13563883
 ] 

Hadoop QA commented on HBASE-7687:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566677/7687-trunk.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4206//console

This message is automatically generated.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563884#comment-13563884
 ] 

Lars Hofhansl commented on HBASE-7687:
--

I looped my bandaid patch from the parent issue for 100 iterations. Will try 
v3 next.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563885#comment-13563885
 ] 

Lars Hofhansl commented on HBASE-7687:
--

Your v3 looks good. Let's change the Thread.sleep(1) to Thread.sleep(100) just 
to be sure.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Updated] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7687:
--

Attachment: 7687-trunk-v3.txt

Let's get Hadoop QA result.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563895#comment-13563895
 ] 

Ted Yu commented on HBASE-7687:
---

Patch v3 for 0.94 made TestCatalogTracker pass 100 times.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563896#comment-13563896
 ] 

Lars Hofhansl commented on HBASE-7687:
--

Looped your v3 100 times as well. Seems fine.
I do not think you need to change the call to waitForMetaServerConnection, and 
it would still be nice to increase the sleep time to 100ms (just to be safe; 
otherwise waitForRoot could enter the wait loop after the main thread sets the 
meta location).
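
The ordering concern above can be sketched with a small stand-alone example. Everything here is hypothetical (LocationHolderSketch, waitForLocation, setLocation, runOnce and the host string are made up for illustration, and this is not how CatalogTracker is implemented): a waiter thread announces that it has started, the main thread sleeps briefly to give the waiter time to reach its wait loop, and only then publishes the location.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a tracker that publishes a location; not HBase code.
final class LocationHolderSketch {
  private final Object lock = new Object();
  private String location; // guarded by lock

  /** Publishes the location and wakes up any waiting thread. */
  void setLocation(String value) {
    synchronized (lock) {
      location = value;
      lock.notifyAll();
    }
  }

  /** Waits up to timeoutMillis for a location; returns null on timeout. */
  String waitForLocation(long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    synchronized (lock) {
      // The condition is re-checked in a loop, so a location that was set
      // before this call is still observed.
      while (location == null) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return null;
        }
        lock.wait(remaining);
      }
      return location;
    }
  }

  /** Runs the waiter/setter handshake once and returns what the waiter observed. */
  static String runOnce(long settleMillis) {
    final LocationHolderSketch holder = new LocationHolderSketch();
    final CountDownLatch started = new CountDownLatch(1);
    final String[] seen = new String[1];

    Thread waiter = new Thread(() -> {
      started.countDown(); // proves the thread is running, not that it is waiting yet
      try {
        seen[0] = holder.waitForLocation(5000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    try {
      waiter.start();
      started.await();
      Thread.sleep(settleMillis); // margin for the waiter to reach wait()
      holder.setLocation("example-host:1234");
      waiter.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen[0];
  }
}
```

In this sketch the waiter re-checks its condition under the lock, so the result is the same whichever thread gets there first; the sleep only matters when a test specifically needs the waiter to already be blocked at the moment the location is set, and even then it widens the margin rather than guaranteeing the order.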

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563900#comment-13563900
 ] 

Lars Hofhansl commented on HBASE-7687:
--

Hmm... When I put waitForMetaServerConnection, I do get it to fail pretty 
quickly.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563901#comment-13563901
 ] 

Lars Hofhansl commented on HBASE-7687:
--

+1 on your v3.

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563933#comment-13563933
 ] 

Ted Yu commented on HBASE-7687:
---

Integrated to 0.94 and trunk.

Thanks for the review, Lars

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563938#comment-13563938
 ] 

Hudson commented on HBASE-7687:
---

Integrated in HBase-0.94 #791 (See 
[https://builds.apache.org/job/HBase-0.94/791/])
HBASE-7687 TestCatalogTracker.testServerNotRunningIOException fails 
occasionally (Ted Yu) (Revision 1439179)

 Result = ABORTED
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563941#comment-13563941
 ] 

Hudson commented on HBASE-7687:
---

Integrated in HBase-TRUNK #3811 (See 
[https://builds.apache.org/job/HBase-TRUNK/3811/])
HBASE-7687 TestCatalogTracker.testServerNotRunningIOException fails 
occasionally (Ted Yu) (Revision 1439180)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563942#comment-13563942
 ] 

Hudson commented on HBASE-7687:
---

Integrated in HBase-0.94-security #101 (See 
[https://builds.apache.org/job/HBase-0.94-security/101/])
HBASE-7687 TestCatalogTracker.testServerNotRunningIOException fails 
occasionally (Ted Yu) (Revision 1439179)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Created] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions

Mike Lewis created HBASE-7688:
-

 Summary: Master coprocessor RPCs don't propagate exceptions
 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis


I was working on some coprocessor tests, and was wondering why exceptions 
weren't bubbling up while calling checkFailed on my RpcController.

Turns out, that HMaster's execMasterService seems to be sending the wrong 
RpcController (that happens to be null in my case) to the methods.

I have a fix that appears to work, but I am not familiar enough with the code 
to know that it's the right one.

Version I am working on has the git sha 9a322b0
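
The failure mode described above can be sketched without any HBase classes. This is only a guess at the shape of the bug, not the HMaster code or the attached patch: SketchController, FailingService and DispatchSketch are hypothetical names invented for illustration. The point is that an error recorded on one controller is invisible to a caller that checks a different one.

```java
import java.io.IOException;

// Hypothetical stand-in for an RPC controller that records a failure.
final class SketchController {
    private IOException failure;

    void setFailedOn(IOException e) {
        failure = e;
    }

    // Rethrows the recorded failure, if there is one.
    void checkFailed() throws IOException {
        if (failure != null) {
            throw failure;
        }
    }
}

// Hypothetical coprocessor-style service method that always fails.
final class FailingService {
    static void call(SketchController controller) {
        IOException error = new IOException("service failed");
        if (controller != null) {
            controller.setFailedOn(error);
        }
        // With a null controller the error has nowhere to go and is dropped.
    }
}

final class DispatchSketch {
    // Assumed buggy shape: a fresh controller is created and later checked,
    // but the service is handed a different (possibly null) controller.
    static boolean buggyDispatchPropagates(SketchController incoming) {
        SketchController execController = new SketchController();
        FailingService.call(incoming);
        return failed(execController);
    }

    // Assumed fixed shape: the controller that is checked is the one
    // that was passed to the service.
    static boolean fixedDispatchPropagates() {
        SketchController execController = new SketchController();
        FailingService.call(execController);
        return failed(execController);
    }

    private static boolean failed(SketchController c) {
        try {
            c.checkFailed();
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

With a null incoming controller, buggyDispatchPropagates reports that nothing failed, while fixedDispatchPropagates sees the exception.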



[jira] [Updated] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


 [ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HBASE-7688:
--

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Updated] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


 [ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HBASE-7688:
--

Status: Open  (was: Patch Available)

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Updated] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


 [ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HBASE-7688:
--

Attachment: fix-hmaster-rpc-exceptions.patch

Potential fix

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Commented] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


[ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563958#comment-13563958
 ] 

Hudson commented on HBASE-7687:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #379 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/379/])
HBASE-7687 TestCatalogTracker.testServerNotRunningIOException fails 
occasionally (Ted Yu) (Revision 1439180)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


[ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563960#comment-13563960
 ] 

Ted Yu commented on HBASE-7688:
---

Interesting. 

Are you able to add a unit test that shows the exception being propagated ?

Thanks

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Commented] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


[ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563970#comment-13563970
 ] 

Ted Yu commented on HBASE-7688:
---

{code}
  public ClientProtos.CoprocessorServiceResponse execMasterService(final RpcController controller,
      final ClientProtos.CoprocessorServiceRequest request) throws ServiceException {
    try {
      ServerRpcController execController = new ServerRpcController();
{code}
If controller is an instance of ServerRpcController, do we need to create a new 
instance ?

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Updated] (HBASE-7688) Master coprocessor RPCs don't propagate exceptions


 [ 
https://issues.apache.org/jira/browse/HBASE-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HBASE-7688:
--

Attachment: fix-hmaster-rpc-exceptions-2.patch

Same patch with test case for Master that fails before the fix, and works 
after.  Also added a test that shows that region RPC error stuff works 
correctly currently.

 Master coprocessor RPCs don't propagate exceptions
 --

 Key: HBASE-7688
 URL: https://issues.apache.org/jira/browse/HBASE-7688
 Project: HBase
  Issue Type: Bug
  Components: Protobufs
Affects Versions: 0.96.0
Reporter: Mike Lewis
 Fix For: 0.96.0

 Attachments: fix-hmaster-rpc-exceptions-2.patch, 
 fix-hmaster-rpc-exceptions.patch


 I was working on some coprocessor tests, and was wondering why exceptions 
 weren't bubbling up while calling checkFailed on my RpcController.
 Turns out, that HMaster's execMasterService seems to be sending the wrong 
 RpcController (that happens to be null in my case) to the methods.
 I have a fix that appears to work, but I am not familiar enough with the code 
 to know that it's the right one.
 Version I am working on has the git sha 9a322b0


[jira] [Created] (HBASE-7689) CloneTableHandler notify completion too early

Matteo Bertozzi created HBASE-7689:
--

 Summary: CloneTableHandler notify completion too early
 Key: HBASE-7689
 URL: https://issues.apache.org/jira/browse/HBASE-7689
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: hbase-6055, 0.96.0


With HBASE-7365, the clone table handler notifies that the clone is finished 
before the base create table handler has moved the table folder in place and 
added regions to .META.

If the client calls isRestoreDone() before the META+disk update, it gets an 
exception like this:
{code}
org.apache.hadoop.hbase.TableNotFoundException: clonedtb-1359268409309
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:991)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:856)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:813)
  at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
  at org.apache.hadoop.hbase.client.HTable.init(HTable.java:174)
  at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.verifyRowCount(TestRestoreSnapshotFromClient.java:387)
  at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:259)
  at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:251)
{code}


[jira] [Updated] (HBASE-7689) CloneTableHandler notify completion too early


 [ 
https://issues.apache.org/jira/browse/HBASE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7689:
---

Attachment: HBASE-7689-v0.patch

Added two methods, completed() and failed(), to the CreateTableHandler.

I think that in the long term, we should add these two methods to the base 
EventHandler and find a way to notify the client based on these.
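
The idea of completion hooks on a base handler can be sketched roughly as below. This is a minimal illustration under assumptions, not the contents of HBASE-7689-v0.patch: HandlerWithHooksSketch and CloneNotifierSketch are hypothetical classes, and the step names are placeholders for the directory move and the .META. update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base handler: does the work, then reports the outcome via a hook.
abstract class HandlerWithHooksSketch {
    final List<String> steps = new ArrayList<>();

    final void process() {
        try {
            handleCreate();
            completed();      // only reached once all create steps have finished
        } catch (RuntimeException e) {
            failed(e);
        }
    }

    protected abstract void handleCreate();

    protected void completed() {}

    protected void failed(Throwable cause) {}
}

// Hypothetical clone handler that notifies waiters from the hooks.
final class CloneNotifierSketch extends HandlerWithHooksSketch {
    private final boolean failDuringCreate;
    volatile boolean done;
    volatile Throwable error;

    CloneNotifierSketch(boolean failDuringCreate) {
        this.failDuringCreate = failDuringCreate;
    }

    @Override
    protected void handleCreate() {
        steps.add("move table directory into place");
        if (failDuringCreate) {
            throw new IllegalStateException("meta update failed");
        }
        steps.add("add regions to .META.");
    }

    @Override
    protected void completed() {
        steps.add("notify: clone finished");
        done = true;
    }

    @Override
    protected void failed(Throwable cause) {
        error = cause;
        done = true;
    }
}
```

Because the notification lives in completed(), a client polling the done flag cannot observe success before the table directory and catalog entries are in place.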

 CloneTableHandler notify completion too early
 -

 Key: HBASE-7689
 URL: https://issues.apache.org/jira/browse/HBASE-7689
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: hbase-6055, 0.96.0

 Attachments: HBASE-7689-v0.patch


 With HBASE-7365, the clone table handler notifies that the clone is finished 
 before the base create table handler has moved the table folder in place and 
 added regions to .META.
 If the client calls isRestoreDone() before the META+disk update, it gets an 
 exception like this:
 {code}
 org.apache.hadoop.hbase.TableNotFoundException: clonedtb-1359268409309
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:991)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:856)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:813)
   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
   at org.apache.hadoop.hbase.client.HTable.init(HTable.java:174)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.verifyRowCount(TestRestoreSnapshotFromClient.java:387)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:259)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:251)
 {code}


[jira] [Commented] (HBASE-7689) CloneTableHandler notify completion too early


[ 
https://issues.apache.org/jira/browse/HBASE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563989#comment-13563989
 ] 

Ted Yu commented on HBASE-7689:
---

{code}
+  completed();
   if (cpHost != null) {
     cpHost.postCreateTableHandler(this.hTableDescriptor, this.newRegions);
   }
{code}
What if the call to postCreateTableHandler() results in an exception?
I think the call to completed() should be moved after the call to 
postCreateTableHandler().

Otherwise patch looks good.

 CloneTableHandler notify completion too early
 -

 Key: HBASE-7689
 URL: https://issues.apache.org/jira/browse/HBASE-7689
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: hbase-6055, 0.96.0

 Attachments: HBASE-7689-v0.patch


 With HBASE-7365, the clone table handler notifies that the clone is finished 
 before the base create table handler has moved the table folder in place and 
 added regions to .META.
 If the client calls isRestoreDone() before the META+disk update, it gets an 
 exception like this:
 {code}
 org.apache.hadoop.hbase.TableNotFoundException: clonedtb-1359268409309
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:991)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:856)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:813)
   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
   at org.apache.hadoop.hbase.client.HTable.init(HTable.java:174)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.verifyRowCount(TestRestoreSnapshotFromClient.java:387)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:259)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:251)
 {code}


[jira] [Commented] (HBASE-7689) CloneTableHandler notify completion too early


[ 
https://issues.apache.org/jira/browse/HBASE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563990#comment-13563990
 ] 

Matteo Bertozzi commented on HBASE-7689:


There's nothing that removes the table on failure, so if the coprocessor fails 
the table is still there. That's why completed() is called before the coprocessor.
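
The ordering argument can be illustrated with a small sketch. It assumes, as stated above, that nothing rolls the table back when the post-create hook fails; CreateThenHookSketch and its fields are hypothetical and do not mirror the real CreateTableHandler.

```java
// Hypothetical handler showing why the completion notification precedes the hook.
final class CreateThenHookSketch {
    boolean tableExists;
    boolean completedNotified;
    boolean hookFailed;

    void process(Runnable postCreateHook) {
        tableExists = true;        // stands in for the directory move and .META. update
        completedNotified = true;  // notification placed before the hook
        try {
            postCreateHook.run();
        } catch (RuntimeException e) {
            hookFailed = true;     // no rollback: the table stays in place
        }
    }
}
```

Under that assumption, a hook failure leaves tableExists and completedNotified both true, so the notification still matches the state a client would find. If a rollback on hook failure were added later, the notification would have to move after the hook.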

 CloneTableHandler notify completion too early
 -

 Key: HBASE-7689
 URL: https://issues.apache.org/jira/browse/HBASE-7689
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: hbase-6055, 0.96.0

 Attachments: HBASE-7689-v0.patch


 With HBASE-7365, the clone table handler notifies that the clone is finished 
 before the base create table handler has moved the table folder in place and 
 added regions to .META.
 If the client calls isRestoreDone() before the META+disk update, it gets an 
 exception like this:
 {code}
 org.apache.hadoop.hbase.TableNotFoundException: clonedtb-1359268409309
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:991)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:856)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:813)
   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
   at org.apache.hadoop.hbase.client.HTable.init(HTable.java:174)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.verifyRowCount(TestRestoreSnapshotFromClient.java:387)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:259)
   at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testCloneSnapshot(TestRestoreSnapshotFromClient.java:251)
 {code}


[jira] [Updated] (HBASE-7624) Backport HBASE-5359 and HBASE-7596 to 0.94


 [ 
https://issues.apache.org/jira/browse/HBASE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-7624:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

HBASE-5359 needs a lot of hand merging.
Moving out unless somebody volunteers.

 Backport HBASE-5359 and HBASE-7596 to 0.94
 --

 Key: HBASE-7624
 URL: https://issues.apache.org/jira/browse/HBASE-7624
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 0.94.6


 Both HBASE-5359 and HBASE-7596 are useful and should be added to 0.94.


[jira] [Resolved] (HBASE-7687) TestCatalogTracker.testServerNotRunningIOException fails occasionally


 [ 
https://issues.apache.org/jira/browse/HBASE-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-7687.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

 TestCatalogTracker.testServerNotRunningIOException fails occasionally
 -

 Key: HBASE-7687
 URL: https://issues.apache.org/jira/browse/HBASE-7687
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Ted Yu
 Fix For: 0.96.0, 0.94.5

 Attachments: 7687-94.txt, 7687-94-v2.txt, 7687-94-v3.txt, 
 7687-trunk.txt, 7687-trunk-v3.txt


 I would like to remove this method. It micro-tests a single method and poorly 
 at that.


[jira] [Commented] (HBASE-6347) -ROOT- and .META. are stale in table.jsp if they moved


[ 
https://issues.apache.org/jira/browse/HBASE-6347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564014#comment-13564014
 ] 

Lars Hofhansl commented on HBASE-6347:
--

Looks like ROOT should be handled correctly.
META needs to be fixed, though

 -ROOT- and .META. are stale in table.jsp if they moved
 --

 Key: HBASE-6347
 URL: https://issues.apache.org/jira/browse/HBASE-6347
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
  Labels: noob
 Fix For: 0.90.8, 0.92.3, 0.94.5


 table.jsp uses a lookup method on {{CatalogTracker}} that does not force a 
 refresh of the cache, thus it can get a stale location if -ROOT- or .META. 
 moved and the master hasn't tried to access them yet.
 Should just be a matter of using waitForRoot/Meta.
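
The difference between a cached lookup and one that forces a refresh can be sketched as follows. LocationTrackerSketch and its method names are hypothetical; they are not CatalogTracker's API, and the supplier merely stands in for wherever the current location is really read from.

```java
import java.util.function.Supplier;

// Hypothetical tracker that caches a region location.
final class LocationTrackerSketch {
    private final Supplier<String> source;
    private String cached;

    LocationTrackerSketch(Supplier<String> source) {
        this.source = source;
        this.cached = source.get();
    }

    // Returns whatever was cached last; may be stale after a move.
    String getCachedLocation() {
        return cached;
    }

    // Re-reads the location from the source before returning it.
    String refreshAndGetLocation() {
        cached = source.get();
        return cached;
    }
}
```

A page that only ever calls getCachedLocation() keeps showing the old server after a region moves, until some other caller triggers a refresh.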


[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one


[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564016#comment-13564016
 ] 

Lars Hofhansl commented on HBASE-2611:
--

[~jdcryans]?

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-trunk-v3.patch, 2611-v3.patch, 
 HBASE-2611-trunk-v2.patch, HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.


[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Description: 
We see a NullPointerException when using SecureClient to access HBase. From the 
source code, we find that receiveResponse() does not check call != null before 
invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). However, 
as explained in receiveResponse() of HBaseClient, the call may have been 
cleaned up due to RPC timeout; therefore, it would be better to check 
call != null before invoking call.setValue(...) in SecureClient.
The following is the exception trace:
2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Failed all from region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., hostname=10.20.2.72, port=11600
java.util.concurrent.ExecutionException: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
  at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
  at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
  at com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
  at com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
  at com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: Unexpected exception receiving call responses
  at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
  at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
  at $Proxy7.multi(Unknown Source)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
  at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  ... 3 more
Caused by: java.io.IOException: Unexpected exception receiving call responses
  at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:610)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:378)
  at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
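
The null check suggested in the description can be sketched as follows. This is a simplified model under assumptions, not the SecureClient source: PendingCallSketch and ResponseReceiverSketch are hypothetical classes, and the map stands in for the connection's table of outstanding calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pending call that will eventually receive a value.
final class PendingCallSketch {
    volatile Object value;

    void setValue(Object v) {
        value = v;
    }
}

// Hypothetical receiver holding the outstanding calls of one connection.
final class ResponseReceiverSketch {
    private final Map<Integer, PendingCallSketch> calls = new ConcurrentHashMap<>();

    PendingCallSketch register(int id) {
        PendingCallSketch call = new PendingCallSketch();
        calls.put(id, call);
        return call;
    }

    // Simulates the timeout cleanup removing a call before its response arrives.
    void cleanupTimedOut(int id) {
        calls.remove(id);
    }

    // Returns true if the response was delivered, false if the call was already gone.
    boolean receiveResponse(int id, Object value) {
        PendingCallSketch call = calls.remove(id);
        if (call == null) {
            return false; // cleaned up after a timeout: drop the late response
        }
        call.setValue(value);
        return true;
    }
}
```

Without the null check, the late response for a timed-out call would dereference null, which is the kind of failure the trace above ends in.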

  was:We find NullPointException when using SecureClient to access hbase. From 
the source code, we find that receiveResponse() will not check call != null 
before invoke call.setValue(...)(line 378 and 380 in SecureClient.java). 
However, as explained in receiveResponse() of HBaseClient, the call may have 
been cleaned up due to RPC timeout; therefore, it should be better to check 
call != null before invoke call.setValue(...) in SecureClient


 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security

[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Description: We see a NullPointerException when using SecureClient to access 
HBase. From the source code, we find that receiveResponse() does not check 
call != null before invoking call.setValue(...) (lines 378 and 380 in 
SecureClient.java). However, as explained in receiveResponse() of HBaseClient, 
the call may have been cleaned up due to RPC timeout; therefore, it would be 
better to check call != null before invoking call.setValue(...) in SecureClient.  
(was: We find NullPointException when using SecureClient to access hbase. From 
the source code, we find that receiveResponse() will not check call != null 
before invoke call.setValue(...)(line 378 and 380 in SecureClient.java). 
However, as explained in receiveResponse() of HBaseClient, the call may have 
been cleaned up due to RPC timeout; therefore, it should be better to check 
call != null before invoke call.setValue(...) in SecureClient.
The following is the Exception trace:
2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
java.util.concurrent.ExecutionException: java.io.IOException: Call to 
10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
Unexpected exception receiving call responses
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
  at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
  at 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
  at 
com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
  at 
com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
  at 
com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed on 
local exception: java.io.IOException: Unexpected exception receiving call 
responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
  at 
org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
  at $Proxy7.multi(Unknown Source)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
  at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  ... 3 more
Caused by: java.io.IOException: Unexpected exception receiving call responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:610)
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:378)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)

 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security

[jira] [Commented] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data

2013-01-27 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564032#comment-13564032
 ] 

Anoop Sam John commented on HBASE-7337:
---

[~zhou wen jian]
When you scan a table that keeps multiple versions of a cell, the Scan object 
specifies which versions the scan should return. By default HBase returns only 
the latest version, but setters on the Scan object let you ask for more 
versions.
SCVF specifies a column value check. If the condition is not satisfied for a 
row, that row is filtered out of the returned results entirely. 
With latestVersionOnly = true, the condition is checked only against the latest 
version. [This does *not* mean that only the latest version is returned in the 
result.] If it is set to false, all versions are checked against the condition, 
and the row is included if any version's value satisfies it. But remember that 
SCVF cannot control whether only the latest version of a cell is returned. The 
flag only affects how the condition is evaluated, and the filter sees all 
versions of the cells. Which versions are returned is decided further down the 
line, in another part of the code that runs after 
Filter#filterKeyValue(KeyValue).


SingleColumnValueFilter seems to get unavailble data - Your heading says the 
filter gets unavailable data. Can you tell us more? Or is your problem that you 
are getting the older versions?

Based on your reply we can check whether there is a real bug. If there is no 
issue we can close this.

 SingleColumnValueFilter seems to get unavailble data
 

 Key: HBASE-7337
 URL: https://issues.apache.org/jira/browse/HBASE-7337
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.3, 0.96.0
 Environment: 0.94
Reporter: Zhou wenjian
Assignee: Zhou wenjian
 Fix For: 0.96.0, 0.94.6


 Put multiple versions of a row:
 r1 cf:q  version:1 value:1
 r1 cf:q  version:2 value:3
 r1 cf:q  version:3 value:2
 The filter in the scan is set as below:
 SingleColumnValueFilter valueF = new SingleColumnValueFilter(
 family,qualifier,CompareOp.EQUAL,new BinaryComparator(Bytes
 .toBytes(2)));
 I then found that all three versions are emitted. When I set 
 latestVersionOnly to false, the result does not change.
 {code}
   public ReturnCode filterKeyValue(KeyValue keyValue) {
     // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" +
     //     Bytes.toString(keyValue.getValue()));
     if (this.matchedColumn) {
       // We already found and matched the single column, all keys now pass
       return ReturnCode.INCLUDE;
     } else if (this.latestVersionOnly && this.foundColumn) {
       // We found but did not match the single column, skip to next row
       return ReturnCode.NEXT_ROW;
     }
     if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
       return ReturnCode.INCLUDE;
     }
     foundColumn = true;
     if (filterColumnValue(keyValue.getBuffer(),
         keyValue.getValueOffset(), keyValue.getValueLength())) {
       return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
     }
     this.matchedColumn = true;
     return ReturnCode.INCLUDE;
   }
 {code}
 From the code above, it seems that version 3 will be emitted first and set 
 matchedColumn to true, which causes the following versions 2 and 1 to be 
 emitted too.


[jira] [Commented] (HBASE-7638) [0.94] region cache entry should only be removed on error if the error is from the server currently in cache


[ 
https://issues.apache.org/jira/browse/HBASE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564036#comment-13564036
 ] 

Lars Hofhansl commented on HBASE-7638:
--

What if the region has moved, the cache was not updated, and now the new RS 
returns an error?
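
The scenario in this question can be made concrete with a toy cache. The sketch 
below is not the attached patch: RegionCacheSketch and its method names are 
hypothetical, and regions and servers are plain strings. It only illustrates the 
rule from the issue title, removing an entry only when the error comes from the 
server that is currently cached.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy region location cache; names are illustrative, not HBase client APIs.
class RegionCacheSketch {
    private final ConcurrentMap<String, String> regionToServer = new ConcurrentHashMap<>();

    void cache(String region, String server) {
        regionToServer.put(region, server);
    }

    String lookup(String region) {
        return regionToServer.get(region);
    }

    // Evicts the entry only if the cached location is still failingServer.
    // Returns true when an entry was actually removed.
    boolean onError(String region, String failingServer) {
        return regionToServer.remove(region, failingServer);
    }

    public static void main(String[] args) {
        RegionCacheSketch cache = new RegionCacheSketch();
        cache.cache("r1", "rs-A");
        // The region moved to rs-B, the cache still says rs-A, and rs-B reports an error.
        System.out.println(cache.onError("r1", "rs-B")); // false
        System.out.println(cache.lookup("r1"));          // rs-A
    }
}
```

Under this rule the entry for rs-A survives an error reported by rs-B, so the 
stale location stays in the cache until rs-A itself reports an error.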

 [0.94] region cache entry should only be removed on error if the error is 
 from the server currently in cache
 

 Key: HBASE-7638
 URL: https://issues.apache.org/jira/browse/HBASE-7638
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Fix For: 0.94.5

 Attachments: HBASE-7638-v0.patch


 See HBASE-7268. 


[jira] [Updated] (HBASE-7638) [0.94] region cache entry should only be removed on error if the error is from the server currently in cache


 [ 
https://issues.apache.org/jira/browse/HBASE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-7638:
-

Fix Version/s: (was: 0.94.5)
   0.94.6

 [0.94] region cache entry should only be removed on error if the error is 
 from the server currently in cache
 

 Key: HBASE-7638
 URL: https://issues.apache.org/jira/browse/HBASE-7638
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Fix For: 0.94.6

 Attachments: HBASE-7638-v0.patch


 See HBASE-7268. 


[jira] [Created] (HBASE-7690) Improve metadata printing in HFilePrettyPrinter

Nick Dimiduk created HBASE-7690:
---

 Summary: Improve metadata printing in HFilePrettyPrinter
 Key: HBASE-7690
 URL: https://issues.apache.org/jira/browse/HBASE-7690
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor


The pretty printer could do a better job with metadata. For example:

{pre}
...
Fileinfo:
BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
EXCLUDE_FROM_MINOR_COMPACTION = \x00
KEY_VALUE_VERSION = \x00\x00\x00\x01
MAJOR_COMPACTION_KEY = \xFF
MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
TIMERANGE = 13593468698301359346869830
hfile.AVG_KEY_LEN = 19
hfile.AVG_VALUE_LEN = 2
hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
...
{pre}

Many of these fields could be cleaned up to print in human-readable values.


[jira] [Updated] (HBASE-7690) Improve metadata printing in HFilePrettyPrinter


 [ 
https://issues.apache.org/jira/browse/HBASE-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7690:


Description: 
The pretty printer could do a better job with metadata. For example:

{noformat}
...
Fileinfo:
BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
EXCLUDE_FROM_MINOR_COMPACTION = \x00
KEY_VALUE_VERSION = \x00\x00\x00\x01
MAJOR_COMPACTION_KEY = \xFF
MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
TIMERANGE = 13593468698301359346869830
hfile.AVG_KEY_LEN = 19
hfile.AVG_VALUE_LEN = 2
hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
...
{noformat}

Many of these fields could be cleaned up to print in human-readable values.

  was:
The pretty printer could do a better job with metadata. For example:

{pre}
...
Fileinfo:
BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
EXCLUDE_FROM_MINOR_COMPACTION = \x00
KEY_VALUE_VERSION = \x00\x00\x00\x01
MAJOR_COMPACTION_KEY = \xFF
MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
TIMERANGE = 13593468698301359346869830
hfile.AVG_KEY_LEN = 19
hfile.AVG_VALUE_LEN = 2
hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
...
{pre}

Many of these fields could be cleaned up to print in human-readable values.


 Improve metadata printing in HFilePrettyPrinter
 ---

 Key: HBASE-7690
 URL: https://issues.apache.org/jira/browse/HBASE-7690
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor

 The pretty printer could do a better job with metadata. For example:
 {noformat}
 ...
 Fileinfo:
 BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
 BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
 DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
 EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
 EXCLUDE_FROM_MINOR_COMPACTION = \x00
 KEY_VALUE_VERSION = \x00\x00\x00\x01
 MAJOR_COMPACTION_KEY = \xFF
 MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
 TIMERANGE = 13593468698301359346869830
 hfile.AVG_KEY_LEN = 19
 hfile.AVG_VALUE_LEN = 2
 hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
 ...
 {noformat}
 Many of these fields could be cleaned up to print in human-readable values.


[jira] [Updated] (HBASE-7690) Improve metadata printing in HFilePrettyPrinter


 [ 
https://issues.apache.org/jira/browse/HBASE-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7690:


Attachment: 0001-HBASE-7690-Improve-printing-of-HFile-metadata.patch

This improves the output a bit:

{noformat}
Fileinfo:
BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
BULKLOAD_TIMESTAMP = 1359346878350
DELETE_FAMILY_COUNT = 0
EARLIEST_PUT_TS = 1359346869830
EXCLUDE_FROM_MINOR_COMPACTION = false
KEY_VALUE_VERSION = 1
MAJOR_COMPACTION_KEY = true
MAX_MEMSTORE_TS_KEY = 0
TIMERANGE = 13593468698301359346869830
hfile.AVG_KEY_LEN = 19
hfile.AVG_VALUE_LEN = 2
hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
{noformat}
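
The kind of conversion behind this output can be sketched in plain Java. This is 
not the attached patch: FileInfoDecodeSketch and its helpers are hypothetical, 
and the sketch assumes the values are stored big-endian with a boolean encoded 
as one byte that is non-zero for true. The 8-byte array below is the big-endian 
encoding of the decimal BULKLOAD_TIMESTAMP shown above.

```java
import java.nio.ByteBuffer;

// Decodes raw file-info bytes into readable values; helper names are illustrative.
class FileInfoDecodeSketch {
    // Interprets 8 bytes as a big-endian long (ByteBuffer's default byte order).
    static long toLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    // Interprets 4 bytes as a big-endian int.
    static int toInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    // Assumption: a single byte, non-zero meaning true.
    static boolean toBoolean(byte[] raw) {
        return raw[0] != 0;
    }

    public static void main(String[] args) {
        byte[] bulkloadTs = {0x00, 0x00, 0x01, 0x3C, 0x7F, 0x63, 0x47, (byte) 0x8E};
        System.out.println("BULKLOAD_TIMESTAMP = " + toLong(bulkloadTs));
        System.out.println("KEY_VALUE_VERSION = " + toInt(new byte[] {0, 0, 0, 1}));
        System.out.println("MAJOR_COMPACTION_KEY = " + toBoolean(new byte[] {(byte) 0xFF}));
        System.out.println("EXCLUDE_FROM_MINOR_COMPACTION = " + toBoolean(new byte[] {0x00}));
    }
}
```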

 Improve metadata printing in HFilePrettyPrinter
 ---

 Key: HBASE-7690
 URL: https://issues.apache.org/jira/browse/HBASE-7690
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-7690-Improve-printing-of-HFile-metadata.patch


 The pretty printer could do a better job with metadata. For example:
 {noformat}
 ...
 Fileinfo:
 BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
 BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
 DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
 EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
 EXCLUDE_FROM_MINOR_COMPACTION = \x00
 KEY_VALUE_VERSION = \x00\x00\x00\x01
 MAJOR_COMPACTION_KEY = \xFF
 MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
 TIMERANGE = 13593468698301359346869830
 hfile.AVG_KEY_LEN = 19
 hfile.AVG_VALUE_LEN = 2
 hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
 ...
 {noformat}
 Many of these fields could be cleaned up to print in human-readable values.


[jira] [Updated] (HBASE-7690) Improve metadata printing in HFilePrettyPrinter


 [ 
https://issues.apache.org/jira/browse/HBASE-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7690:


Status: Patch Available  (was: Open)

 Improve metadata printing in HFilePrettyPrinter
 ---

 Key: HBASE-7690
 URL: https://issues.apache.org/jira/browse/HBASE-7690
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-7690-Improve-printing-of-HFile-metadata.patch


 The pretty printer could do a better job with metadata. For example:
 {noformat}
 ...
 Fileinfo:
 BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
 BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
 DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
 EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
 EXCLUDE_FROM_MINOR_COMPACTION = \x00
 KEY_VALUE_VERSION = \x00\x00\x00\x01
 MAJOR_COMPACTION_KEY = \xFF
 MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
 TIMERANGE = 13593468698301359346869830
 hfile.AVG_KEY_LEN = 19
 hfile.AVG_VALUE_LEN = 2
 hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
 ...
 {noformat}
 Many of these fields could be cleaned up to print in human-readable values.


[jira] [Commented] (HBASE-7690) Improve metadata printing in HFilePrettyPrinter


[ 
https://issues.apache.org/jira/browse/HBASE-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564051#comment-13564051
 ] 

Matteo Bertozzi commented on HBASE-7690:


The HFile doesn't know anything about its content.

Most of the fields are written by the store file. 
If we want to keep that separation, only the store file knows the meaning of 
each field.
(Also, with HBASE-7603, probably only the right StoreFileManager will be able to 
dump the metadata in a human-readable way.)

 Improve metadata printing in HFilePrettyPrinter
 ---

 Key: HBASE-7690
 URL: https://issues.apache.org/jira/browse/HBASE-7690
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-7690-Improve-printing-of-HFile-metadata.patch


 The pretty printer could do a better job with metadata. For example:
 {noformat}
 ...
 Fileinfo:
 BULKLOAD_SOURCE_TASK = attempt_201301272014_0001_r_00_0
 BULKLOAD_TIMESTAMP = \x00\x00\x01\x7FcG\x8E
 DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
 EARLIEST_PUT_TS = \x00\x00\x01\x7FcF
 EXCLUDE_FROM_MINOR_COMPACTION = \x00
 KEY_VALUE_VERSION = \x00\x00\x00\x01
 MAJOR_COMPACTION_KEY = \xFF
 MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
 TIMERANGE = 13593468698301359346869830
 hfile.AVG_KEY_LEN = 19
 hfile.AVG_VALUE_LEN = 2
 hfile.LASTKEY = \x00\x04row9\x01dc2\x00\x00\x01\x7FcF\x04
 ...
 {noformat}
 Many of these fields could be cleaned up to print in human-readable values.


[jira] [Created] (HBASE-7691) A couple docbook updates

Nick Dimiduk created HBASE-7691:
---

 Summary: A couple docbook updates
 Key: HBASE-7691
 URL: https://issues.apache.org/jira/browse/HBASE-7691
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Trivial


I've noticed a couple mistakes in the book.

- link to external (hadoop) documentation which is no longer available on 
{{hadoop.apache.org/common/docs/current}}.
- small mistake in instructions for generating a release tarball.
- mistake in descriptions of test profiles.


[jira] [Updated] (HBASE-7691) A couple docbook updates


 [ 
https://issues.apache.org/jira/browse/HBASE-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7691:


Attachment: 0001-HBASE-7691-A-couple-docbook-updates.patch

 A couple docbook updates
 

 Key: HBASE-7691
 URL: https://issues.apache.org/jira/browse/HBASE-7691
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Trivial
 Attachments: 0001-HBASE-7691-A-couple-docbook-updates.patch


 I've noticed a couple mistakes in the book.
 - link to external (hadoop) documentation which is no longer available on 
 {{hadoop.apache.org/common/docs/current}}.
 - small mistake in instructions for generating a release tarball.
 - mistake in descriptions of test profiles.


[jira] [Updated] (HBASE-7691) A couple docbook updates


 [ 
https://issues.apache.org/jira/browse/HBASE-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7691:


Status: Patch Available  (was: Open)

 A couple docbook updates
 

 Key: HBASE-7691
 URL: https://issues.apache.org/jira/browse/HBASE-7691
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Trivial
 Attachments: 0001-HBASE-7691-A-couple-docbook-updates.patch


 I've noticed a couple mistakes in the book.
 - link to external (hadoop) documentation which is no longer available on 
 {{hadoop.apache.org/common/docs/current}}.
 - small mistake in instructions for generating a release tarball.
 - mistake in descriptions of test profiles.


[jira] [Commented] (HBASE-7671) Flushing memstore again after last failure could cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564052#comment-13564052
 ] 

chunhui shen commented on HBASE-7671:
-

Each flushed file has a sequence id, and all HLog entries with a sequence id 
smaller than the max sequence id are skipped when replaying edit logs.

This assumes that, at flush time, all data in the memstore below that sequence 
id has been flushed to a store file. That assumption is broken when snapshot is 
called again without clearing the previous snapshot after the last flush failed.


bq.Could somehow store the correct sequenceId with the snapshot?
I have considered this solution, but it does not seem convenient with multiple 
stores. We would have to maintain a sequence id for each memstore rather than 
one per region.

Snapshot means creating a snapshot of the current memstore. If the data belonged 
to the memstore before a successful flush, then IMO copying any new KVs into the 
snapshot is reasonable.

 Flushing memstore again after last failure could cause data loss
 

 Key: HBASE-7671
 URL: https://issues.apache.org/jira/browse/HBASE-7671
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7671.patch, HBASE-7671v2.patch, HBASE-7671v3.patch


 See the following logs first:
 {code}
 2013-01-23 18:58:38,801 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Flushed , sequenceid=9746535080, memsize=101.8m, into tmp file 
 hdfs://dw77.kgb.sqa.cm4:9900/hbase-test3/writetest1/8dc14e35b4d7c0e481e0bb30849cff7d/.tmp/bebeeecc56364b6c8126cf1dc6782a25
 2013-01-23 18:58:41,982 WARN org.apache.hadoop.hbase.regionserver.MemStore: 
 Snapshot called again without clearing previous. Doing nothing. Another 
 ongoing flush or did we fail last attempt?
 2013-01-23 18:58:43,274 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Flushed , sequenceid=9746599334, memsize=101.8m, into tmp file 
 hdfs://dw77.kgb.sqa.cm4:9900/hbase-test3/writetest1/8dc14e35b4d7c0e481e0bb30849cff7d/.tmp/4eede32dc469480bb3d469aaff332313
 {code}
 The first memstore flush fails in commitFile() (the first log entry above) and 
 triggers a server abort, but another flush arrives immediately (possibly caused 
 by a move or split; the third log entry above) and succeeds.
 For the same memstore snapshot we get two different sequence ids, which causes 
 data loss when replaying log edits.
 See the unit test case in the patch for details.
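
 The data loss can be illustrated with a toy model. This is a sketch under 
 stated assumptions, not the unit test from the patch: FlushReplaySketch and its 
 methods are hypothetical, the sequence ids are small made-up numbers, and 
 replay is assumed to skip every edit at or below the sequence id recorded in 
 the flushed file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of flush plus log replay; sequence ids are small made-up numbers.
class FlushReplaySketch {
    // Replay restores only log edits above the sequence id recorded in the flushed file.
    static List<Long> replayed(List<Long> logSeqIds, long flushedSeqId) {
        List<Long> restored = new ArrayList<>();
        for (long id : logSeqIds) {
            if (id > flushedSeqId) {
                restored.add(id);
            }
        }
        return restored;
    }

    // An edit is lost when it is neither in the flushed snapshot nor restored by replay.
    static List<Long> lost(List<Long> logSeqIds, List<Long> snapshot, long flushedSeqId) {
        List<Long> restored = replayed(logSeqIds, flushedSeqId);
        List<Long> missing = new ArrayList<>();
        for (long id : logSeqIds) {
            if (!snapshot.contains(id) && !restored.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Long> log = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        // Snapshot taken by the first (failed) flush at sequence id 3.
        List<Long> snapshot = Arrays.asList(1L, 2L, 3L);
        System.out.println(lost(log, snapshot, 3L)); // []
        // The second flush writes the same snapshot but records sequence id 5.
        System.out.println(lost(log, snapshot, 5L)); // [4, 5]
    }
}
```

 In the model, edits 4 and 5 arrived after the snapshot was taken, so they are 
 not in the flushed file, yet replay skips them because the file claims 
 sequence id 5.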


[jira] [Commented] (HBASE-6335) Switching log-splitting policy after last failure master start may cause data loss

[
https://issues.apache.org/jira/browse/HBASE-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564056#comment-13564056
]

chunhui shen commented on HBASE-6335:
-

This exists in 0.96 as well.

I don't have any good ideas for this problem...

Switching log-splitting policy after last failure master start may cause data
loss
--

Key: HBASE-6335
URL: https://issues.apache.org/jira/browse/HBASE-6335
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.94.6

How does it happen?
If server A is down and has three log files, with all the data coming from one
region:
File 1: kv01 kv02 kv03
File 2: kv04 kv05 kv06
File 3: kv07 kv08 kv09
Here kv01 means that its log seqID is 01.
Case: switching from distributed-log-splitting to master-local-log-splitting
1. The master finds that server A is down and starts to split its log files
using distributed-log-splitting.
2. It successfully splits log file 2, moves it to oldLogs, and generates one
edits file named 06 in the region's recovered.edits dir.
3. The master restarts with the log-splitting policy changed to
master-local-log-splitting, and starts to split file 1 and file 3.
4. It successfully splits log file 1 and file 3, and generates one edits file
named 09 in the region's recovered.edits dir.
5. The region replays edits from edits files 06 and 09. The region's seqID is 06
after it replays file 06, so when replaying file 09 it skips kv01, kv02, and
kv03, and that data is lost.
Likewise, switching from master-local-log-splitting to
distributed-log-splitting could also cause data loss.
Should we fix this bug or avoid the case? I'm not sure...
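
The five steps above can be traced with a small simulation. This is a sketch 
only: RecoveredEditsSketch and its replay() method are hypothetical, edits are 
represented by their seqIDs, and the region is assumed to advance its seqID to 
the file name after replaying each edits file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates replaying recovered edits files; file names are the max seqID they hold.
class RecoveredEditsSketch {
    // Replays files in ascending name order, skipping edits at or below the region's seqID.
    static List<Integer> replay(TreeMap<Integer, List<Integer>> editsFiles) {
        List<Integer> applied = new ArrayList<>();
        int regionSeqId = 0;
        for (Map.Entry<Integer, List<Integer>> file : editsFiles.entrySet()) {
            for (int seqId : file.getValue()) {
                if (seqId > regionSeqId) {
                    applied.add(seqId);
                }
            }
            // Assumption: after a file is replayed, the region's seqID becomes the file name.
            regionSeqId = Math.max(regionSeqId, file.getKey());
        }
        return applied;
    }

    public static void main(String[] args) {
        // Policy switched mid-way: file 06 holds kv04-06, file 09 holds the rest.
        TreeMap<Integer, List<Integer>> switched = new TreeMap<>();
        switched.put(6, Arrays.asList(4, 5, 6));
        switched.put(9, Arrays.asList(1, 2, 3, 7, 8, 9));
        System.out.println(replay(switched)); // [4, 5, 6, 7, 8, 9]

        // One policy for all three logs: a single file 09 holds every edit.
        TreeMap<Integer, List<Integer>> single = new TreeMap<>();
        single.put(9, Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println(replay(single)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```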

[jira] [Commented] (HBASE-7507) Make memstore flush be able to retry after exception


[ 
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564061#comment-13564061
 ] 

chunhui shen commented on HBASE-7507:
-

I have no objection to the revert.

 Make memstore flush be able to retry after exception
 

 Key: HBASE-7507
 URL: https://issues.apache.org/jira/browse/HBASE-7507
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch, 
 7507-trunkv3.patch


 We abort the regionserver if a memstore flush throws an exception.
 I think we could retry to make the regionserver more stable, because the file 
 system may be unavailable for a short time, e.g. while switching namenodes in a 
 NameNode HA environment.
 {code}
 HRegion#internalFlushcache(){
 ...
 try {
 ...
 }catch(Throwable t){
 DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
   Bytes.toStringBinary(getRegionName()));
 dse.initCause(t);
 throw dse;
 }
 ...
 }
 MemStoreFlusher#flushRegion(){
 ...
 try {
 region.flushcache();
 ...
 }catch(DroppedSnapshotException ex){
 server.abort("Replay of HLog required. Forcing server shutdown", ex);
 }
 ...
 }
 {code}
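
 The retry idea can be sketched as a bounded loop around the flush. This is a 
 minimal sketch, not the attached patch: FlushRetrySketch, FlushAction, and the 
 maxAttempts and pauseMillis parameters are hypothetical names, and the sketch 
 shows only the control flow, not how the snapshot or sequence id is handled 
 between attempts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Bounded retry around a flush-like action; all names here are illustrative.
class FlushRetrySketch {
    interface FlushAction {
        void run() throws IOException;
    }

    // Runs the action up to maxAttempts times and returns the attempt that succeeded.
    static int flushWithRetries(FlushAction flush, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flush.run();
                return attempt;
            } catch (IOException e) {
                last = e; // possibly transient, e.g. a namenode switch
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        // Retries exhausted: the caller would fall back to aborting the server.
        throw new UncheckedIOException("flush failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        int used = flushWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("file system temporarily unavailable");
            }
        }, 5, 10L);
        System.out.println("succeeded on attempt " + used); // succeeded on attempt 3
    }
}
```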


[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Description: 
We hit a NullPointerException when using SecureClient to access HBase. From the 
source code, we find that receiveResponse() does not check call != null before 
invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). However, as 
explained in receiveResponse() of HBaseClient, the call may have been cleaned 
up due to an RPC timeout; therefore, it would be better to check call != null 
before invoking call.setValue(...) in SecureClient.
The exception stack trace is: 
  2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
  java.util.concurrent.ExecutionException: java.io.IOException: Call to 
10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
Unexpected exception receiving call responses
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
at 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
at 
com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
at 
com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
at 
com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
  Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed on 
local exception: java.io.IOException: Unexpected exception receiving call 
responses
at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
at 
org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
at $Proxy7.multi(Unknown Source)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
... 3 more
  Caused by: java.io.IOException: Unexpected exception receiving call responses
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:610)
  Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:378)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)

  was:We hit a NullPointerException when using SecureClient to access HBase. From 
the source code, we find that receiveResponse() does not check call != null 
before invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). 
However, as explained in receiveResponse() of HBaseClient, the call may have 
been cleaned up due to an RPC timeout; therefore, it would be better to check 
call != null before invoking call.setValue(...) in SecureClient.
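
The suggested null check can be sketched in isolation. This is a plain-Java 
model, not the SecureClient source: ReceiveResponseSketch, its Call class, and 
the timeOut() method are hypothetical stand-ins for the pending-call table 
described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Models a pending-call table where a timeout may remove a call before its response arrives.
class ReceiveResponseSketch {
    static final class Call {
        Object value;
        boolean done;

        void setValue(Object v) {
            value = v;
            done = true;
        }
    }

    private final ConcurrentMap<Integer, Call> calls = new ConcurrentHashMap<>();

    Call register(int id) {
        Call call = new Call();
        calls.put(id, call);
        return call;
    }

    // Simulates the cleanup that happens when an RPC times out.
    void timeOut(int id) {
        calls.remove(id);
    }

    // Returns false and drops the response when the call has already been cleaned up.
    boolean receiveResponse(int id, Object value) {
        Call call = calls.remove(id);
        if (call == null) {
            return false; // without this check, call.setValue(...) would throw an NPE
        }
        call.setValue(value);
        return true;
    }

    public static void main(String[] args) {
        ReceiveResponseSketch client = new ReceiveResponseSketch();
        client.register(1);
        client.timeOut(1);
        System.out.println(client.receiveResponse(1, "late response")); // false
    }
}
```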


 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684

[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Description: 
We hit a NullPointerException when using SecureClient to access HBase. From the 
source code, we find that receiveResponse() does not check call != null before 
invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). However, as 
explained in receiveResponse() of HBaseClient, the call may have been cleaned 
up due to an RPC timeout; therefore, it would be better to check call != null 
before invoking call.setValue(...) in SecureClient.
The exception stack trace is: 
  2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
java.util.concurrent.ExecutionException: java.io.IOException: Call to 
10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
Unexpected exception receiving call responses
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
  at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
  at 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
  at 
com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
  at 
com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
  at 
com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed 
on local exception: java.io.IOException: Unexpected exception receiving call 
responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
  at 
org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
  at $Proxy7.multi(Unknown Source)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
  at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  ... 3 more
Caused by: java.io.IOException: Unexpected exception receiving call 
responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:610)
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:378)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)

  was:
We find NullPointException when using SecureClient to access hbase. From the 
source code, we find that receiveResponse() will not check call != null before 
invoke call.setValue(...)(line 378 and 380 in SecureClient.java). However, as 
explained in receiveResponse() of HBaseClient, the call may have been cleaned 
up due to RPC timeout; therefore, it should be better to check call != null

[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Description: 
We hit a NullPointerException when using SecureClient to access HBase. From the 
source code, we find that receiveResponse() does not check call != null before 
invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). However, as 
explained in receiveResponse() of HBaseClient, the call may have been cleaned 
up due to an RPC timeout; therefore, it would be better to check call != null 
before invoking call.setValue(...) in SecureClient.
The exception stack trace is:
{code}
  2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
Failed all from 
region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
hostname=10.20.2.72, port=11600
java.util.concurrent.ExecutionException: java.io.IOException: Call to 
10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
Unexpected exception receiving call responses
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
  at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
  at 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
  at 
com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
  at 
com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
  at 
com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 failed 
on local exception: java.io.IOException: Unexpected exception receiving call 
responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
  at 
org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
  at $Proxy7.multi(Unknown Source)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
  at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
  at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  ... 3 more
Caused by: java.io.IOException: Unexpected exception receiving call 
responses
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:610)
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.receiveResponse(SecureClient.java:378)
  at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
{code}

  was:
We find NullPointException when using SecureClient to access hbase. From the 
source code, we find that receiveResponse() will not check call != null before 
invoke call.setValue(...)(line 378 and 380 in SecureClient.java). However, as 
explained in receiveResponse() of HBaseClient, the call may have been cleaned 
up due to RPC timeout; therefore, it should be better to check

[jira] [Created] (HBASE-7692) Add utility class to generate ordered byte[] serialization

Nick Dimiduk created HBASE-7692:
---

 Summary: Add utility class to generate ordered byte[] serialization
 Key: HBASE-7692
 URL: https://issues.apache.org/jira/browse/HBASE-7692
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


The current Bytes utility class works, but produces output that does not 
maintain the native sort ordering of the input value. This results in, for 
example, a negative value that does not necessarily sort before a positive 
value. HBase should provide a canonical implementation of such a serialization 
format so that third parties can reliably build on top of HBase. This will 
allow an implementation for HIVE-3634, HIVE-2599, or HIVE-2903 that is 
compatible with similar features in Pig.
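
The ordering problem described above can be illustrated with a minimal, self-contained sketch. It does not use the HBase Bytes class. Instead it assumes that a plain 4-byte big-endian two's-complement layout is representative of the current encoding, and that keys are compared as unsigned bytes in lexicographic order. The class and method names (OrderedIntSketch, plain, ordered, compareBytes) are hypothetical, and the sign-bit flip is one common order-preserving technique, not necessarily the format this issue will adopt.

```java
import java.nio.ByteBuffer;

public class OrderedIntSketch {
    /** Plain big-endian two's-complement encoding (assumed stand-in for the current behavior). */
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Order-preserving variant: flip the sign bit so negative values sort first. */
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // With the plain encoding, -1 (FF FF FF FF) sorts after 1 (00 00 00 01).
        System.out.println(compareBytes(plain(-1), plain(1)) < 0);     // prints false
        // With the sign bit flipped, -1 (7F FF FF FF) sorts before 1 (80 00 00 01).
        System.out.println(compareBytes(ordered(-1), ordered(1)) < 0); // prints true
    }
}
```

Other types (floating point, variable-length strings, descending order) would need their own encodings; this sketch covers only fixed-width signed integers.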


[jira] [Updated] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


 [ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-7684:
--

Attachment: check_call_null_before_set_value_in_secure_client.txt

Reformatted with dev-support/hbase_eclipse_formatter.xml.

 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: check_call_null_before_set_in_secure_client.diff, 
 check_call_null_before_set_value_in_secure_client.txt


 We hit a NullPointerException when using SecureClient to access HBase. From the 
 source code, we find that receiveResponse() does not check call != null 
 before invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). 
 However, as explained in receiveResponse() of HBaseClient, the call may have 
 been cleaned up due to an RPC timeout; therefore, it would be better to check 
 call != null before invoking call.setValue(...) in SecureClient.
 The exception stack trace is:
 {code}
   2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
 Failed all from 
 region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
 hostname=10.20.2.72, port=11600
 2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
 Failed all from 
 region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
 hostname=10.20.2.72, port=11600
 java.util.concurrent.ExecutionException: java.io.IOException: Call to 
 10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
 Unexpected exception receiving call responses
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
   at 
 org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
   at 
 com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
   at 
 com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
   at 
 com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 
 failed on local exception: java.io.IOException: Unexpected exception 
 receiving call responses
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
   at $Proxy7.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   ... 3 more
 Caused by: java.io.IOException: Unexpected exception

[jira] [Updated] (HBASE-7691) A couple docbook updates

2013-01-27 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7691:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Will show on the site next time we push. Thanks, Nick.

 A couple docbook updates
 

 Key: HBASE-7691
 URL: https://issues.apache.org/jira/browse/HBASE-7691
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Trivial
 Fix For: 0.96.0

 Attachments: 0001-HBASE-7691-A-couple-docbook-updates.patch


 I've noticed a couple mistakes in the book.
 - link to external (hadoop) documentation which is no longer available on 
 {{hadoop.apache.org/common/docs/current}}.
 - small mistake in instructions for generating a release tarball.
 - mistake in descriptions of test profiles.


[jira] [Commented] (HBASE-7684) NullPointerException in SecureClient when Call is cleaned up due to RPC timeout


[ 
https://issues.apache.org/jira/browse/HBASE-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564069#comment-13564069
 ] 

cuijianwei commented on HBASE-7684:
---

Thanks for your concern. I used the source code of SecureClient in 0.94.3 and 
got the exception stack trace shown above. The diff patch has been reformatted 
using dev-support/hbase_eclipse_formatter.xml.
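
The null check proposed in this issue can be sketched as follows. This is a minimal, self-contained illustration of the pattern only, not the attached patch or the actual SecureClient code: the Call class, the calls map, and the receiveResponse(int, Object) signature are hypothetical stand-ins, and the timeout is simulated by removing the entry by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullCheckSketch {
    /** Hypothetical stand-in for the client's pending-call record. */
    static final class Call {
        volatile Object value;
        volatile boolean done;

        void setValue(Object v) {
            value = v;
            done = true;
        }
    }

    // Pending calls keyed by call id; a timeout handler may remove entries at any time.
    static final Map<Integer, Call> calls = new ConcurrentHashMap<>();

    /** Returns true if the response was delivered, false if the call was already cleaned up. */
    static boolean receiveResponse(int id, Object payload) {
        Call call = calls.remove(id);
        if (call == null) {
            // The call timed out and was cleaned up: drop the late response
            // instead of dereferencing null. (A real client would presumably
            // still have to consume the response bytes from the stream.)
            return false;
        }
        call.setValue(payload);
        return true;
    }

    public static void main(String[] args) {
        calls.put(1, new Call());
        System.out.println(receiveResponse(1, "ok"));   // prints true

        calls.put(2, new Call());
        calls.remove(2);                                // simulate RPC timeout cleanup
        System.out.println(receiveResponse(2, "late")); // prints false, no NullPointerException
    }
}
```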

 NullPointerException in SecureClient when Call is cleaned up due to RPC  
 timeout
 

 Key: HBASE-7684
 URL: https://issues.apache.org/jira/browse/HBASE-7684
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.3
Reporter: cuijianwei
 Attachments: check_call_null_before_set_in_secure_client.diff, 
 check_call_null_before_set_value_in_secure_client.txt


 We hit a NullPointerException when using SecureClient to access HBase. From the 
 source code, we find that receiveResponse() does not check call != null 
 before invoking call.setValue(...) (lines 378 and 380 in SecureClient.java). 
 However, as explained in receiveResponse() of HBaseClient, the call may have 
 been cleaned up due to an RPC timeout; therefore, it would be better to check 
 call != null before invoking call.setValue(...) in SecureClient.
 The exception stack trace is:
 {code}
   2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
 Failed all from 
 region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
 hostname=10.20.2.72, port=11600
 2013-01-28 12:11:20,060 [request-queuepool-10-thread-1] WARN  
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - 
 Failed all from 
 region=sms_message,98,1358995149987.0c18f72df3f3f398f9f6dc83fe65afad., 
 hostname=10.20.2.72, port=11600
 java.util.concurrent.ExecutionException: java.io.IOException: Call to 
 10.20.2.72/10.20.2.72:11600 failed on local exception: java.io.IOException: 
 Unexpected exception receiving call responses
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:937)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
   at 
 org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
   at 
 com.xiaomi.infra.hbase.business.dao.sms.SmsMessage.updateStatus(SmsMessage.java:245)
   at 
 com.xiaomi.infra.hbase.business.client.sms.HMessageDao.doUpdateMessage(HMessageDao.java:165)
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 com.xiaomi.infra.hbase.business.client.AsyncDaoWrapper.wrapperRunMethod(AsyncDaoWrapper.java:97)
   at 
 com.xiaomi.infra.hbase.business.client.HBaseRequest.run(AsyncDaoWrapper.java:135)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Call to 10.20.2.72/10.20.2.72:11600 
 failed on local exception: java.io.IOException: Unexpected exception 
 receiving call responses
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:165)
   at $Proxy7.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at

[jira] [Commented] (HBASE-7692) Add utility class to generate ordered byte[] serialization

2013-01-27 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564070#comment-13564070
 ] 

Anoop Sam John commented on HBASE-7692:
---

So the utility will need some type information about the value, right?

 Add utility class to generate ordered byte[] serialization
 --

 Key: HBASE-7692
 URL: https://issues.apache.org/jira/browse/HBASE-7692
 Project: HBase
  Issue Type: Improvement
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk

 The current Bytes utility class works, but produces output that does not 
 maintain the native sort ordering of the input value. This results in, for 
 example, a negative value that does not necessarily sort before a positive 
 value. HBase should provide a canonical implementation of such a 
 serialization format so that third parties can reliably build on top of 
 HBase. This will allow an implementation for HIVE-3634, HIVE-2599, or 
 HIVE-2903 that is compatible with similar features in Pig.


[jira] [Commented] (HBASE-1212) merge tool expects regions all have different sequence ids