[jira] [Commented] (HBASE-7958) Statistics per-column family per-region

2013-04-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624358#comment-13624358
 ] 

Elliott Clark commented on HBASE-7958:
--

bq. Yeah, they are certainly pretty, but IMO pretty useless for anyone but the 
most novice user.
I think we've been pretty remiss in ignoring the novice user.  If those graphs 
could be the thing that gets users to use HBase, then they've done their job.

 Statistics per-column family per-region
 ---

 Key: HBASE-7958
 URL: https://issues.apache.org/jira/browse/HBASE-7958
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.95.2
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.95.1

 Attachments: hbase-7958_rough-cut-v0.patch, 
 hbase-7958-v0-parent.patch, hbase-7958-v0.patch


 Originating from this discussion on the dev list: 
 http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
 Essentially, we should have built-in statistics gathering for HBase tables. 
 This allows clients to have a better understanding of the distribution of 
 keys within a table and a given region. We could also surface this 
 information via the UI.
 There are a couple of different proposals from the email; the overview is 
 this: we add something on compactions that gathers stats about the keys that 
 are written, and then we surface them to a table.
 The possible proposals include:
 *How to implement it?*
 # Coprocessors - 
 ** advantage - it easily plugs in and people could pretty easily add their 
 own statistics (see the sketch below). 
 ** disadvantage - UI elements would also require this, we get into dependent 
 loading, which leads down the OSGi path. Also, these CPs need to be installed 
 _after_ all the other CPs on compaction to ensure they see exactly what gets 
 written (doable, but a pain)
 # Built into HBase as a custom scanner
 ** advantage - always goes in the right place and no need to muck about with 
 loading CPs etc.
 ** disadvantage - less pluggable, at least for the initial cut
 *Where do we store data?*
 # .META.
 ** advantage - it's an existing table, so we can jam it into another CF there
 ** disadvantage - this would make META much larger, possibly leading to 
 splits AND will make it much harder for other processes to read the info
 # A new stats table
 ** advantage - cleanly separates out the information from META
 ** disadvantage - needs a 'system table' notion to prevent accidental 
 deletion or manipulation by arbitrary clients, while still allowing clients 
 to read it.
 Once we have this framework, we can then move to an actual implementation of 
 various statistics.
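 For illustration, a rough sketch of option 1 (coprocessor-based gathering), 
 assuming the RegionObserver compaction hook; StatsCollectingScanner is a 
 hypothetical helper, not part of any attached patch:
 {code}
 public class StatsCoprocessor extends BaseRegionObserver {
   @Override
   public InternalScanner preCompact(
       ObserverContext<RegionCoprocessorEnvironment> c, Store store,
       InternalScanner scanner) {
     // Wrap the compaction scanner so every KeyValue written out is also
     // fed to a per-CF, per-region statistics collector (hypothetical).
     return new StatsCollectingScanner(store, scanner);
   }
 }
 {code}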

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8273) HColumnDescriptor setters should return void

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624412#comment-13624412
 ] 

Ted Yu commented on HBASE-8273:
---

I logged this JIRA with an attempt to solve the binary compatibility issue.

After further investigation, it turns out binary compatibility can be achieved 
between 0.92.3 and 0.94.7 in the following way:
1. introduce an interface (call it X) in 0.92(.3) with the setters for 
HColumnDescriptor
2. introduce an interface (call it Y) in 0.94(.7) with the setters for 
HColumnDescriptor where the builder pattern is supported
3. write Java assembly (bytecode) in 0.94(.7) for HColumnDescriptor so that 
it supports both interfaces (sketched below)

Client code does not need reflection to use those methods.
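A rough sketch of the two interfaces described above (X and Y are the 
placeholder names from this comment; only one setter is shown):
{code}
interface X {                                   // 0.92-style setters
  void setMaxVersions(int maxVersions);
}
interface Y {                                   // 0.94-style builder setters
  HColumnDescriptor setMaxVersions(int maxVersions);
}
// Plain Java source cannot implement both interfaces in one class, since
// the methods differ only in return type; the JVM permits such methods at
// the bytecode level, which is why step 3 above resorts to Java assembly.
{code}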

 HColumnDescriptor setters should return void
 

 Key: HBASE-8273
 URL: https://issues.apache.org/jira/browse/HBASE-8273
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 See discussion on dev@ mailing list, entitled 'Does compatibility between 
 versions also mean binary compatibility?'
 Synopsis from that thread:
 HBASE-5357 'Use builder pattern in HColumnDescriptor' changed the
 method signatures from void to HColumnDescriptor, so they are not
 the same methods anymore.
 If you invoke those setters
 on HColumnDescriptor, you'll get:
 java.lang.NoSuchMethodError:
 org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V
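 For illustration, a minimal client snippet that would trigger this when 
 compiled against 0.92.x and run against 0.94.x (hedged; the column family 
 name is arbitrary):
 {code}
 HColumnDescriptor hcd = new HColumnDescriptor("cf");
 hcd.setMaxVersions(3);  // compiled as setMaxVersions(I)V against 0.92;
                         // 0.94 only has the HColumnDescriptor-returning
                         // variant, hence the NoSuchMethodError at runtime
 {code}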

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Closed] (HBASE-7870) Delete(byte [] row, long timestamp) constructor is deprecated and should not be.

2013-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl closed HBASE-7870.



 Delete(byte [] row, long timestamp) constructor is deprecated and should not be.
 

 Key: HBASE-7870
 URL: https://issues.apache.org/jira/browse/HBASE-7870
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Critical
 Fix For: 0.94.6

 Attachments: HBASE-7870-v0-0.94.patch


 Most probably a copy/paste issue. Patch is coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624477#comment-13624477
 ] 

Ted Yu commented on HBASE-8240:
---

[~lhofhansl]:
Are you okay with this going into 0.94?

Thanks

 CompoundConfiguration should implement Iterable
 ---

 Key: HBASE-8240
 URL: https://issues.apache.org/jira/browse/HBASE-8240
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.95.1, 0.98.0

 Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 
 8240-v6.txt, 8240-v7.txt


 Here is from the hadoop Configuration class:
 {code}
 public class Configuration implements Iterable<Map.Entry<String,String>>,
 {code}
 There are 3 addXX() methods for CompoundConfiguration:
 {code}
   public CompoundConfiguration add(final Configuration conf) {
   public CompoundConfiguration addWritableMap(
       final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) {
   public CompoundConfiguration addStringMap(final Map<String, String> map) {
 {code}
 Parameters to these methods all support iteration.
 We can enhance ImmutableConfigMap with the following new method:
 {code}
   public abstract java.util.Iterator iterator();
 {code}
 Then the following method of CompoundConfiguration can be implemented:
 {code}
   public Iterator<Map.Entry<String, String>> iterator() {
 {code}
 This enhancement would be useful in scenarios where a mutable Configuration 
 is required.
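 A hedged sketch of what that iterator() could look like (illustrative only; 
 the 'configs' list of added ImmutableConfigMaps is an assumption, and 
 precedence between maps is handled naively here):
 {code}
   public Iterator<Map.Entry<String, String>> iterator() {
     Map<String, String> merged = new HashMap<String, String>();
     for (ImmutableConfigMap m : this.configs) {
       java.util.Iterator it = m.iterator();
       while (it.hasNext()) {
         // later-added maps simply overwrite earlier entries in this merge
         Map.Entry<String, String> e = (Map.Entry<String, String>) it.next();
         merged.put(e.getKey(), e.getValue());
       }
     }
     return merged.entrySet().iterator();
   }
 {code}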

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)
Varun Sharma created HBASE-8285:
---

 Summary: HBaseClient never recovers for single HTable.get() calls 
when regions move
 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.7


Steps to reproduce this bug:
1) Gracefully restart a region server, causing regions to get redistributed.
2) Client calls to this region keep failing, since the META cache is never 
purged on the client for the region that moved.

Reason behind the bug:
1) The client continues to hit the old region server.
2) The old region server throws NotServingRegionException, which is not 
handled correctly, and the META cache entries are never purged for that 
server, causing the client to keep hitting the old server.

The reason lies in the ServerCallable code: we only purge META cache entries 
when there is a RetriesExhaustedException, SocketTimeoutException or 
ConnectException. There is no case check for NotServingRegionException(s).

Why is this not a problem for Scan(s) and Put(s)?

a) If a region server is not hosting a region/scanner, an 
UnknownScannerException is thrown, which triggers a relocateRegion() call and 
a refresh of the META cache for that particular region.
b) For put(s), the processBatchCallback() interface in HConnectionManager is 
used, which clears out META cache entries for all kinds of exceptions except 
DoNotRetryException.
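For illustration, a hedged sketch of the gap (simplified, not the actual 
ServerCallable code):
{code}
// Today the META cache is purged only for these exception types:
if (t instanceof SocketTimeoutException
    || t instanceof ConnectException
    || t instanceof RetriesExhaustedException) {
  getConnection().clearCaches(server);  // drop cached locations for server
}
// No branch handles NotServingRegionException, so the stale location of
// the moved region survives and the client keeps retrying the old server.
{code}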


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8285:


Attachment: 8285-0.94.txt

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.7

 Attachments: 8285-0.94.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8285:


Attachment: (was: 8285-0.94.txt)

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8285:


Attachment: 8285-trunk.txt

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8285:


Fix Version/s: 0.98.0
   Status: Patch Available  (was: Open)

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8285:


Attachment: 8285-0.94.txt

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624494#comment-13624494
 ] 

Hadoop QA commented on HBASE-8285:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577392/8285-0.94.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5162//console

This message is automatically generated.

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624495#comment-13624495
 ] 

Ted Yu commented on HBASE-8285:
---

Nice analysis, Varun.
{code}
public void clearCaches(final ServerName serverName){
...
if (serverName.equals(e.getValue().getServerName())) {
  tableLocations.remove(e.getKey());
{code}
We clear all entries on the server (serverName) regardless of region.

The location contains region information. When a NotServingRegionException 
happens, should we just clear the entries for that region?

BTW the 0.94 patch is more recent than the trunk patch. Hadoop QA is down; 
when it comes back up, the 0.94 patch will be picked up.
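A hypothetical region-scoped variant, for illustration only (no such API 
exists on HConnection at this point; the helpers named here are assumptions):
{code}
// Drop only the cached location of the region containing 'row' rather
// than every entry cached for the server:
public void clearRegionCache(final byte[] tableName, final byte[] row) {
  HRegionLocation loc = getCachedLocation(tableName, row); // assumed helper
  if (loc != null) {
    deleteCachedLocation(tableName, row);                  // assumed helper
  }
}
{code}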

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8285:
--

Fix Version/s: 0.95.1

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624499#comment-13624499
 ] 

Varun Sharma commented on HBASE-8285:
-

I don't see an API to clear a specific region here. We might consider adding 
one...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnection.html

I make this mistake of attaching in the wrong order all the time. Should I 
remove the attachments and reattach?

Varun

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624501#comment-13624501
 ] 

Ted Yu commented on HBASE-8285:
---

I think we should add another API - this should be feasible for 0.95 / trunk.
For 0.94, please run the test suite with your 0.94 patch yourself.

bq. Should I remove the attachments and reattach?

Please attach a new version of the trunk patch.

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624502#comment-13624502
 ] 

Varun Sharma commented on HBASE-8285:
-

Actually, I think we should use the relocateRegion() API; let me update this...
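A hedged sketch of that direction (simplified; not the attached patch):
{code}
} else if (t instanceof NotServingRegionException) {
  // Re-read .META. for just this region, refreshing the cached location
  // instead of purging every entry for the server:
  getConnection().relocateRegion(tableName, row);
}
{code}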

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never 
 purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not 
 handled correctly, and the META cache entries are never purged for that 
 server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code: we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. There is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, an 
 UnknownScannerException is thrown, which triggers a relocateRegion() call and 
 a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used, which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7801) Allow a deferred sync option per Mutation.

2013-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624505#comment-13624505
 ] 

Hadoop QA commented on HBASE-7801:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577359/7801-0.96-v6.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 156 
new or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.protobuf.TestProtobufUtil

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5163//console

This message is automatically generated.

 Allow a deferred sync option per Mutation.
 --

 Key: HBASE-7801
 URL: https://issues.apache.org/jira/browse/HBASE-7801
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.95.0, 0.94.6
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 7801-0.94-v1.txt, 7801-0.94-v2.txt, 7801-0.94-v3.txt, 
 7801-0.96-full-v2.txt, 7801-0.96-full-v3.txt, 7801-0.96-full-v4.txt, 
 7801-0.96-full-v5.txt, 7801-0.96-v1.txt, 7801-0.96-v6.txt


 Won't have time for the parent. But a deferred sync option on a per-operation 
 basis comes up quite frequently.
 In 0.96 this can be handled cleanly via protobufs; in 0.94 we can have a 
 special mutation attribute.
 For batch operations we'd take the safest sync option of any of the 
 mutations, i.e. if there is at least one that wants to be flushed, we'd sync 
 the batch; if there is none of those but at least one that wants deferred 
 flush, we defer-flush the batch, etc.
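 A hedged sketch of that batch rule (the enum and accessor are illustrative 
 assumptions, not the actual patch):
 {code}
 enum SyncOption { DEFERRED_FLUSH, SYNC }      // ordered least to most safe

 SyncOption syncOptionFor(List<Mutation> batch) {
   SyncOption safest = SyncOption.DEFERRED_FLUSH;
   for (Mutation m : batch) {
     SyncOption o = getSyncOption(m);          // assumed attribute accessor
     if (o.ordinal() > safest.ordinal()) {
       safest = o;                             // keep the safest option seen
     }
   }
   return safest;
 }
 {code}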

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable

2013-04-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624528#comment-13624528
 ] 

stack commented on HBASE-8240:
--

-1 for 0.94.  It is not even needed in 0.95 it seems.

 CompoundConfiguration should implement Iterable
 ---

 Key: HBASE-8240
 URL: https://issues.apache.org/jira/browse/HBASE-8240
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.95.1, 0.98.0

 Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 
 8240-v6.txt, 8240-v7.txt


 Here is from the hadoop Configuration class:
 {code}
 public class Configuration implements Iterable<Map.Entry<String,String>>,
 {code}
 There are 3 addXX() methods for CompoundConfiguration:
 {code}
   public CompoundConfiguration add(final Configuration conf) {
   public CompoundConfiguration addWritableMap(
       final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) {
   public CompoundConfiguration addStringMap(final Map<String, String> map) {
 {code}
 Parameters to these methods all support iteration.
 We can enhance ImmutableConfigMap with the following new method:
 {code}
   public abstract java.util.Iterator iterator();
 {code}
 Then the following method of CompoundConfiguration can be implemented:
 {code}
   public Iterator<Map.Entry<String, String>> iterator() {
 {code}
 This enhancement would be useful in scenarios where a mutable Configuration 
 is required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8286) Move site back up out of hbase-assembly; bad idea

2013-04-06 Thread stack (JIRA)
stack created HBASE-8286:


 Summary: Move site back up out of hbase-assembly; bad idea
 Key: HBASE-8286
 URL: https://issues.apache.org/jira/browse/HBASE-8286
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack


Redoing the packaging, I thought it smart to put it all under the new 
hbase-assembly module.  A few weeks later, it is not looking like a good idea 
(Enis suggested moving the site was maybe 'odd').  Here are a few issues:

+ The generated reports will have mention of hbase-assembly because that is 
where they are being generated (Elliott pointed out that the scm links mention 
hbase-assembly instead of trunk).
+ Doug Meil wants to build and deploy the site, but currently deploy is a bit 
awkward to explain because the site presumes it is top level, so staging goes 
to the wrong place and you have to come along and do fixup afterward; it 
shouldn't be so hard.

A possible downside is that we will break Nick's site check added to hadoopqa, 
because site is an aggregating build plugin... but let's see.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8187) trunk/0.95 tarball packaging

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8187:
-

Release Note: 
To build a tgz that has source only:

$ mvn clean install -DskipTests -Dassembly.file=src/main/assembly/src.xml assembly:single

The product can be found in hbase-assembly/target

To build a binary tgz:

$ mvn clean install -DskipTests javadoc:aggregate site assembly:single

For hadoop2.0:

$ mvn clean install -DskipTests -Dhadoop.profile=2.0 javadoc:aggregate site assembly:single

See HBASE-8224 for how to add hadoop1 and hadoop2 labels to the jars.

Again the product will be in hbase-assembly/target

  was:
To build a tgz that has source only:

$ mvn clean install -DskipTests -Dassembly.file=src/assembly/src.xml assembly:single

The product can be found in hbase-assembly/target

To build a binary tgz:

$ mvn clean install -DskipTests javadoc:aggregate site assembly:single

For hadoop2.0:

$ mvn clean install -DskipTests -Dhadoop.profile=2.0 javadoc:aggregate site assembly:single

See HBASE-8224 for how to add hadoop1 and hadoop2 labels to the jars.

Again the product will be in hbase-assembly/target


 trunk/0.95 tarball packaging
 

 Key: HBASE-8187
 URL: https://issues.apache.org/jira/browse/HBASE-8187
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.95.0, 0.98.0

 Attachments: 095tarballs.sh, 8187.txt, 8187v2.txt, 8187v3.txt, 
 8187v5.095.addendum.txt, 8187v5.095.txt, 8187v5.txt


 Packaging needs work now that we have gone maven multi-module.  Sounds like 
 folks want a package for hadoop1 and hadoop2.  Also want a source package and 
 maybe even a package that includes test jars so integration tests can be run.
 Let this be the umbrella issue for package fixes.  Will hang smaller issues 
 off this one as we figure them out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624543#comment-13624543
 ] 

Hudson commented on HBASE-8208:
---

Integrated in hbase-0.95 #130 (See 
[https://builds.apache.org/job/hbase-0.95/130/])
HBASE-8208 In some situations data is not replicated to slaves when 
deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465175)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 In some situations data is not replicated to slaves when deferredLogSync is 
 enabled
 ---

 Key: HBASE-8208
 URL: https://issues.apache.org/jira/browse/HBASE-8208
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, 
 hbase-8208-v1.patch, hbase-8208_v2.patch


 This is a subtle issue. When deferredLogSync is enabled, there are chances we 
 could flush data before syncing all HLog entries. Suppose we flush the 
 internal cache and the server dies with some unsynced hlog entries: 
 data is not lost at the source cluster, but since replication is based on WAL 
 files, some changes we flushed at the source won't be replicated to the slave 
 clusters. 
 Although enabling deferredLogSync implies some tolerance of data loss, this 
 breaks the replication assumption that whatever is persisted in the source 
 should be replicated to its slave clusters. 
 In short, the slave cluster could end up with double losses: the data lost in 
 the source, and some data stored in the source cluster that may not be 
 replicated to slaves either.
 The fix isn't hard. Basically, we can invoke sync during each 
 flush when replication is enabled for a region server. Since sync returns 
 immediately when there is nothing to sync, there should be no performance impact.
 Please let me know what you think!
 Thanks,
 -Jeffrey
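 A hedged sketch of the fix direction (simplified; the real change lives in 
 HRegion/FSHLog per the attached patches, and isReplicationEnabled() is an 
 assumed helper):
 {code}
 // At the end of the flush path: force a WAL sync so that no flushed edit
 // can be missing from the log files that replication ships.
 if (this.deferredLogSyncEnabled && isReplicationEnabled()) {
   this.log.sync();   // returns immediately when there is nothing to sync
 }
 {code}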

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624549#comment-13624549
 ] 

Hudson commented on HBASE-8208:
---

Integrated in HBase-TRUNK #4022 (See 
[https://builds.apache.org/job/HBase-TRUNK/4022/])
HBASE-8208 In some situations data is not replicated to slaves when 
deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465176)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 In some situations data is not replicated to slaves when deferredLogSync is 
 enabled
 ---

 Key: HBASE-8208
 URL: https://issues.apache.org/jira/browse/HBASE-8208
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, 
 hbase-8208-v1.patch, hbase-8208_v2.patch


 This is a subtle issue. When deferredLogSync is enabled, there are chances we 
 could flush data before syncing all HLog entries. Suppose we flush the 
 internal cache and the server dies with some unsynced hlog entries: 
 data is not lost at the source cluster, but since replication is based on WAL 
 files, some changes we flushed at the source won't be replicated to the slave 
 clusters. 
 Although enabling deferredLogSync implies some tolerance of data loss, this 
 breaks the replication assumption that whatever is persisted in the source 
 should be replicated to its slave clusters. 
 In short, the slave cluster could end up with double losses: the data lost in 
 the source, and some data stored in the source cluster that may not be 
 replicated to slaves either.
 The fix isn't hard. Basically, we can invoke sync during each 
 flush when replication is enabled for a region server. Since sync returns 
 immediately when there is nothing to sync, there should be no performance impact.
 Please let me know what you think!
 Thanks,
 -Jeffrey

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624562#comment-13624562
 ] 

Hudson commented on HBASE-8208:
---

Integrated in HBase-0.94 #949 (See 
[https://builds.apache.org/job/HBase-0.94/949/])
HBASE-8208 In some situations data is not replicated to slaves when 
deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465174)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


 In some situations data is not replicated to slaves when deferredLogSync is 
 enabled
 ---

 Key: HBASE-8208
 URL: https://issues.apache.org/jira/browse/HBASE-8208
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, 
 hbase-8208-v1.patch, hbase-8208_v2.patch


 This is a subtle issue. When deferredLogSync is enabled, there are chances we 
 could flush data before syncing all HLog entries. Suppose we flush the 
 internal cache and the server dies with some unsynced hlog entries: 
 data is not lost at the source cluster, but since replication is based on WAL 
 files, some changes we flushed at the source won't be replicated to the slave 
 clusters. 
 Although enabling deferredLogSync implies some tolerance of data loss, this 
 breaks the replication assumption that whatever is persisted in the source 
 should be replicated to its slave clusters. 
 In short, the slave cluster could end up with double losses: the data lost in 
 the source, and some data stored in the source cluster that may not be 
 replicated to slaves either.
 The fix isn't hard. Basically, we can invoke sync during each 
 flush when replication is enabled for a region server. Since sync returns 
 immediately when there is nothing to sync, there should be no performance impact.
 Please let me know what you think!
 Thanks,
 -Jeffrey

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8276) Backport hbase-6738 to 0.94 Too aggressive task resubmission from the distributed log manager

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624563#comment-13624563
 ] 

Hudson commented on HBASE-8276:
---

Integrated in HBase-0.94 #949 (See 
[https://builds.apache.org/job/HBase-0.94/949/])
HBASE-8276 Backport hbase-6738 to 0.94 Too aggressive task resubmission 
from the distributed log manager (Jeffrey) (Revision 1465161)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java


 Backport hbase-6738 to 0.94 Too aggressive task resubmission from the 
 distributed log manager
 ---

 Key: HBASE-8276
 URL: https://issues.apache.org/jira/browse/HBASE-8276
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.7

 Attachments: hbase-8276.patch, hbase-8276-v1.patch


 In recent tests, we found situations where some data nodes are down and 
 file operations are slow, depending on the underlying hdfs timeouts (normally 
 30 secs, with the socket connection timeout maybe around 1 min). Since the 
 split log task heartbeat timeout is only 25 secs, a split log task will be 
 preempted by the SplitLogManager and assigned to someone else after those 25 
 secs. On a small cluster, you'll see the same task keep bouncing back and 
 forth for a while. I pasted a snippet of the related logs below. You can 
 search for "preempted from" to see a task being preempted.
 {code}
 2013-04-01 21:22:08,599 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159,
  length=127639653
 2013-04-01 21:22:08,599 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
 Recovering file 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159
 2013-04-01 21:22:09,603 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
 Finished lease recover attempt for 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159
 2013-04-01 21:22:09,629 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/73387f8d327a45bacf069bd631d70b3b/recovered.edits/3703447.temp,
  length=0
 2013-04-01 21:22:09,629 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/b749cbceaaf037c97f70cc2a6f48f2b8/recovered.edits/3703446.temp,
  length=0
 2013-04-01 21:22:09,630 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/c26b9d4a042d42c1194a8c2f389d33c8/recovered.edits/3703448.temp,
  length=0
 2013-04-01 21:22:09,666 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/adabdb40ccd52140f09f953ff41fd829/recovered.edits/3703451.temp,
  length=0
 2013-04-01 21:22:09,722 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/19f463fe74f4365e7df3e5fdb13aecad/recovered.edits/3703468.temp,
  length=0
 2013-04-01 21:22:09,734 WARN 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old 
 edits file. It could be the result of a previous failed split attempt. 
 Deleting 
 

[jira] [Commented] (HBASE-7961) truncate on disabled table should throw TableNotEnabledException.

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624564#comment-13624564
 ] 

Hudson commented on HBASE-7961:
---

Integrated in HBase-0.94 #949 (See 
[https://builds.apache.org/job/HBase-0.94/949/])
HBASE-7961 truncate on disabled table should throw 
TableNotEnabledException. (rajeshbabu) (Revision 1465186)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/ruby/hbase/admin.rb


 truncate on disabled table should throw TableNotEnabledException.
 -

 Key: HBASE-7961
 URL: https://issues.apache.org/jira/browse/HBASE-7961
 Project: HBase
  Issue Type: Bug
  Components: Admin
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: HBASE-7961_94.patch, HBASE-7961.patch


 Presently, truncate on a disabled table deletes the existing table and 
 recreates it (ENABLED).
 The disable(table_name) call in truncate returns if the table is disabled, 
 without notifying the user.
 {code}
 def disable(table_name)
   tableExists(table_name)
   return if disabled?(table_name)
   @admin.disableTable(table_name)
 end
 {code}
 One more thing: we are calling tableExists in disable(table_name) as well as in drop(table_name), which is unnecessary.
 Anyway, the HTable object creation below will check whether the table exists:
 {code}
 h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name)
 {code}
 We can change it to:
 {code}
   h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name)
   table_description = h_table.getTableDescriptor()
   yield 'Disabling table...' if block_given?
   @admin.disableTable(table_name)
   yield 'Dropping table...' if block_given?
   @admin.deleteTable(table_name)
   yield 'Creating table...' if block_given?
   @admin.createTable(table_description)
 {code}
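 For contrast, a hedged Java sketch of the same semantics via the client API (this is not the committed Ruby change): HBaseAdmin#disableTable surfaces TableNotEnabledException when the table is not enabled, so a truncate only has to let it propagate instead of silently returning.
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTable;

 public class TruncateSketch {
   // Disable, drop, and recreate; an already-disabled table makes
   // disableTable() throw TableNotEnabledException, which we propagate.
   public static void truncate(Configuration conf, String tableName)
       throws Exception {
     HTable hTable = new HTable(conf, tableName);  // also verifies the table exists
     HTableDescriptor desc = hTable.getTableDescriptor();
     HBaseAdmin admin = new HBaseAdmin(conf);
     admin.disableTable(tableName);
     admin.deleteTable(tableName);
     admin.createTable(desc);
   }
 }
 {code}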

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624565#comment-13624565
 ] 

Hudson commented on HBASE-6738:
---

Integrated in HBase-0.94 #949 (See 
[https://builds.apache.org/job/HBase-0.94/949/])
HBASE-8276 Backport hbase-6738 to 0.94 Too aggressive task resubmission 
from the distributed log manager (Jeffrey) (Revision 1465161)

 Result = FAILURE

 Too aggressive task resubmission from the distributed log manager
 -

 Key: HBASE-6738
 URL: https://issues.apache.org/jira/browse/HBASE-6738
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.94.1, 0.95.2
 Environment: 3 nodes cluster test, but can occur as well on a much 
 bigger one. It's all luck!
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.95.0

 Attachments: 6738.v1.patch


 With default settings of hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3.
 On the tests mentioned in HBASE-5843, I see variations around this scenario (0.94 + HDFS 1.0.3):
 The regionserver in charge of the split does not answer within 25s, so it gets interrupted but actually continues. Sometimes we run out of retries, sometimes not; sometimes we're out of retries but, as the interrupts were ignored, we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of race conditions.
 Details:
 t0: unplug a box with DN+RS.
 t + x: other boxes are already connected, so their connections start to die. Nevertheless, they don't consider this node suspect.
 t + 180s: ZooKeeper, then the master, detects the node as dead; recovery starts. It can be less than 180s; sometimes it's around 150s.
 t + 180s: distributed split starts. There is only 1 task, and it's immediately acquired by one RS.
 t + 205s: the RS hits multiple errors while splitting, because a datanode is missing as well. The master decides to give the task to someone else, but often the task continues on the first RS. Interrupts are often ignored, as is well stated in the code (// TODO interrupt often gets swallowed, do what else?)
 {code}
 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
 {code}
 t + 211s: two regionservers are processing the same task. They fight for the leases:
 {code}
 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125
 {code}
  They can fight like this over many files, until the tasks finally get interrupted or finish.
  The task on the second box can be cancelled as well. In this case, the task is created again for a new box.
  The master seems to stop after 3 attempts. It can also give up on splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split finishes despite what the master thinks and logs. In that case, the assignment starts; in the other, we've got a problem.
 {code}
 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 Skipping resubmissions of task 
 /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832
  because threshold 3 reached 
 {code}
 t + 300s: split is finished; assignment starts.
 t + 330s: assignment is finished; regions are available again.
 There are many possible subcases depending on the number of log files, of region servers, and so on.
 The issues are:
 1) It's difficult, in HBase but not only there, to interrupt a task. The pattern is often:
 {code}
  void f() throws IOException {
    try {
      // ... whatever throws InterruptedException ...
    } catch (InterruptedException ie) {
      throw new InterruptedIOException();
    }
  }

  boolean g() {
    int nbRetry = 0;
    for (;;) {
      try {
        f();
        return true;
      } catch (IOException e) {
        nbRetry++;
        if (nbRetry > maxRetry) return false;
      }
    }
  }
 {code}
 This typically swallows the interrupt. There are other variations, but this one seems to be the standard.
 Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven.
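 A sketch (mine, not from any attached patch) of an interrupt-preserving variant of the pattern above, reusing f() and maxRetry: stop retrying as soon as the failure was interrupt-induced, and re-assert the thread's interrupt status for the layers above.
 {code}
 boolean g() {
   int nbRetry = 0;
   for (;;) {
     try {
       f();
       return true;
     } catch (InterruptedIOException e) {   // java.io.InterruptedIOException
       Thread.currentThread().interrupt();  // restore the flag instead of swallowing it
       return false;                        // and stop retrying immediately
     } catch (IOException e) {
       nbRetry++;
       if (nbRetry > maxRetry) return false;
     }
   }
 }
 {code}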
 2) 25s is very aggressive, considering that 

[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-7824:
-

Release Note: The master now won't wait for existing or previously left log splitting work (except for the ROOT and META region servers) to finish before it can start up. Log splitting work can sometimes take many minutes. In the normal case, the master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by setting the configuration value hbase.master.wait.for.log.splitting to true (the default is false).
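For reference, a hedged example of flipping that flag programmatically; the property can equally be set in hbase-site.xml. This is illustrative client/ops code, not part of the patch.
{code}
Configuration conf = HBaseConfiguration.create();
// Restore the old blocking behavior: wait for log splitting at startup.
conf.setBoolean("hbase.master.wait.for.log.splitting", true);
{code}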

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624566#comment-13624566
 ] 

Hudson commented on HBASE-8208:
---

Integrated in hbase-0.95-on-hadoop2 #58 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/58/])
HBASE-8208 In some situations data is not replicated to slaves when 
deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465175)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 In some situations data is not replicated to slaves when deferredLogSync is 
 enabled
 ---

 Key: HBASE-8208
 URL: https://issues.apache.org/jira/browse/HBASE-8208
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, 
 hbase-8208-v1.patch, hbase-8208_v2.patch


 This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Suppose we flush the internal cache and then the server dies with some unsynced hlog entries.
 Data is not lost at the source cluster, but since replication is based on WAL files, some changes we flushed at the source won't be replicated to the slave clusters.
 Although enabling deferredLogSync implies some tolerance of data loss, it breaks the replication assumption that whatever is persisted in the source should be replicated to its slave clusters.
 In short, the slave cluster could end up losing twice over: besides the data loss at the source, some data stored in the source cluster may not be replicated to the slaves either.
 The fix for the issue isn't hard: basically, we can invoke sync during each flush when replication is enabled for a region server. Since sync returns immediately when there is nothing to sync, there should be no performance impact.
 Please let me know what you think!
 Thanks,
 -Jeffrey
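 A self-contained sketch of that idea; the Wal interface and replicationEnabled flag below are stand-ins for illustration, not the actual HRegion/FSHLog code.
 {code}
 import java.io.IOException;

 interface Wal {
   void sync() throws IOException;  // returns at once if all edits are already synced
 }

 class FlushSketch {
   private final Wal wal;
   private final boolean replicationEnabled;

   FlushSketch(Wal wal, boolean replicationEnabled) {
     this.wal = wal;
     this.replicationEnabled = replicationEnabled;
   }

   // Force a WAL sync before flushing, so every edit about to be persisted
   // in an HFile is also present in the WAL files replication ships from.
   void flush() throws IOException {
     if (replicationEnabled) {
       wal.sync();  // cheap no-op when there is nothing left to sync
     }
     // ... write the memstore snapshot out to an HFile ...
   }
 }
 {code}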

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624568#comment-13624568
 ] 

Jeffrey Zhong commented on HBASE-7824:
--

I have run the whole test suite with the patch 4 times in a row. Three runs were clean, as follows, and one had a single failure that has also happened in recent builds.
{code}
Results :
Tests run: 1339, Failures: 0, Errors: 0, Skipped: 13

Integration Test: IntegrationTestDataIngestWithChaosMonkey
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 905.014 sec
{code}

I also provided a release note in the JIRA.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-7824:
-

Release Note: The master now won't wait for existing or previously left log splitting work (except for the ROOT and META region servers) to finish before it can start up. Log splitting work can sometimes take many minutes. In the normal case, the master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by setting the configuration value hbase.master.wait.for.log.splitting to true (the default is false).  (was: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minuets. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by set configuration value hbase.master.wait.for.log.splitting to true.(default value is false))

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-7824:
-

Release Note: The master now won't wait for existing or previously left log splitting work (except for the ROOT and META region servers) to finish before it can start up. Log splitting work can sometimes take many minutes. In the normal case, the master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by setting the configuration value hbase.master.wait.for.log.splitting to true (the default is false).  (was: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minutes. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by set configuration value hbase.master.wait.for.log.splitting to true.(default value is false))

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624572#comment-13624572
 ] 

chunhui shen commented on HBASE-7824:
-

[~jeffreyz]
IMO, it can still cause META data loss, as I mentioned in HBASE-8251:
1. Assign ROOT to the RS where META is.
2. Enable SSH for ROOT.
3. Assign META.

If the META RS (which is also the ROOT RS) dies between step 2 and step 3, MetaSSH starts splitting its hlog.
However, step 3 will assign META directly (because HMaster#splitLogAndExpireIfOnline will return null), which means META will lose the data from the hlog.

Correct me if I'm wrong, thanks.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled

2013-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624576#comment-13624576
 ] 

Hudson commented on HBASE-8208:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #480 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/480/])
HBASE-8208 In some situations data is not replicated to slaves when 
deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465176)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


 In some situations data is not replicated to slaves when deferredLogSync is 
 enabled
 ---

 Key: HBASE-8208
 URL: https://issues.apache.org/jira/browse/HBASE-8208
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, 
 hbase-8208-v1.patch, hbase-8208_v2.patch


 This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Suppose we flush the internal cache and then the server dies with some unsynced hlog entries.
 Data is not lost at the source cluster, but since replication is based on WAL files, some changes we flushed at the source won't be replicated to the slave clusters.
 Although enabling deferredLogSync implies some tolerance of data loss, it breaks the replication assumption that whatever is persisted in the source should be replicated to its slave clusters.
 In short, the slave cluster could end up losing twice over: besides the data loss at the source, some data stored in the source cluster may not be replicated to the slaves either.
 The fix for the issue isn't hard: basically, we can invoke sync during each flush when replication is enabled for a region server. Since sync returns immediately when there is nothing to sync, there should be no performance impact.
 Please let me know what you think!
 Thanks,
 -Jeffrey

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8219) Align Offline Merge with Online Merge

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624579#comment-13624579
 ] 

chunhui shen commented on HBASE-8219:
-

[~mbertozzi]
What do you think about the latest patch?

 Align Offline Merge with Online Merge
 -

 Key: HBASE-8219
 URL: https://issues.apache.org/jira/browse/HBASE-8219
 Project: HBase
  Issue Type: Task
  Components: regionserver
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: chunhui shen
 Attachments: hbase-8219v1.patch, hbase-8219v2.patch, 
 hbase-8219v3.patch


 After HBASE-7403 we now have two different tools for online and offline merge, and the results produced by the two differ (the online one works with snapshots, the offline one does not).
 We should remove the offline one, or align it with the online code.
 Most of the offline code in HRegion.merge() can be replaced with the code in RegionMergeTransaction, used by the online version.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624583#comment-13624583
 ] 

Jeffrey Zhong commented on HBASE-7824:
--

[~zjushch] The patch already covers the case you mentioned; you can check both the v7 & v8 patches. The reason I didn't mention the scenario in the suggestion above is to make the idea easier to accept.

Yesterday I replied to you on HBASE-8251 about the ROOT & META collocating on one RS scenario. You can check the details in MetaSSH in the patch. Basically, we only recover the ROOT portion and leave the META part until the master's META assignment completes. Below is the related pseudo-code snippet; please let me know if you have more questions. Thanks.
{code}
  ...
  // re-assign ROOT
  ...

  if (!this.services.isServerShutdownHandlerEnabled()) {
    // Resubmit in case we're in master initialization and SSH hasn't
    // been enabled yet.
    this.services.getExecutorService().submit(this);
    this.deadServers.add(serverName);
    return;
  }
  ...
  // re-assign META
  ...
{code}


 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624587#comment-13624587
 ] 

chunhui shen commented on HBASE-7824:
-

I don't think your patch fixes the problem.

For the case mentioned above, I have two questions:

1. What will the master initialization thread do when assigning META?
2. What is the value of isCarryingMeta in ServerManager#expireServer? I think it's false rather than true.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable

2013-04-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624588#comment-13624588
 ] 

Lars Hofhansl commented on HBASE-8240:
--

The patch is simple enough; I am fine on that front.
But I, too, do not understand what this is needed for.


 CompoundConfiguration should implement Iterable
 ---

 Key: HBASE-8240
 URL: https://issues.apache.org/jira/browse/HBASE-8240
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.95.1, 0.98.0

 Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 
 8240-v6.txt, 8240-v7.txt


 Here is the relevant part of the Hadoop Configuration class:
 {code}
 public class Configuration implements Iterable<Map.Entry<String,String>>,
 {code}
 There are 3 addXX() methods for CompoundConfiguration:
 {code}
   public CompoundConfiguration add(final Configuration conf) {
   public CompoundConfiguration addWritableMap(
       final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) {
   public CompoundConfiguration addStringMap(final Map<String, String> map) {
 {code}
 Parameters to these methods all support iteration.
 We can enhance ImmutableConfigMap with the following new method:
 {code}
   public abstract java.util.Iterator<Map.Entry<String, String>> iterator();
 {code}
 Then the following method of CompoundConfiguration can be implemented:
 {code}
   public Iterator<Map.Entry<String, String>> iterator() {
 {code}
 This enhancement would be useful in scenarios where a mutable Configuration is required.
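 As a hedged usage sketch of what the Iterable implementation would enable (assuming the usual imports and that compoundConf is a CompoundConfiguration with the proposed iterator): copying the merged view into a plain mutable Configuration.
 {code}
 Configuration mutable = new Configuration(false);  // empty, no defaults loaded
 for (Map.Entry<String, String> e : compoundConf) {
   mutable.set(e.getKey(), e.getValue());           // now freely mutable
 }
 {code}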

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624590#comment-13624590
 ] 

Lars Hofhansl commented on HBASE-8285:
--

Wow, really?! That seems so core to HBase that it's almost unbelievable that this would have gone undetected for so long!

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not handled correctly, and the META cache entries are never purged for that server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code, since we only purge META cache entries when there is a RetriesExhaustedException, SocketTimeoutException or ConnectException. However, there is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, then an UnknownScannerException is thrown, which causes a relocateRegion() call, causing a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is used, which clears out META cache entries for all kinds of exceptions except DoNotRetryException.
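 A sketch of the fix idea (not the exact attached diff): treat NotServingRegionException like the other "region moved" exceptions by purging the cached location, so the next retry re-reads .META. Here connection, tableName and callable stand for the surrounding ServerCallable state.
 {code}
 try {
   return callable.call();
 } catch (NotServingRegionException nsre) {
   connection.clearRegionCache(tableName);  // drop the stale location
   throw nsre;                              // let the retry loop go again
 }
 {code}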

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624596#comment-13624596
 ] 

Jeffrey Zhong commented on HBASE-7824:
--

Let me first add more clarification to your comments, to see if you agree:

{quote}
If the META RS (which is also the ROOT RS) dies between step 2 and step 3, MetaSSH starts splitting its hlog.
{quote}
The ROOT recovery portion in MetaSSH will complete log splitting for both the ROOT and META regions. Therefore, all recovered-edits files are created before the META region can be assigned, because the META region can be assigned only when ROOT is online. By then, all recovered-edits files exist, and they will be replayed when the META region is opened during assignment.

{quote}
However, step 3 will assign META directly (because HMaster#splitLogAndExpireIfOnline will return null), which means META will lose the data from the hlog.
{quote}
That is all right, because the existing log splitting work has already been done and will be replayed during the META region open phase.

Thanks for your feedback.



 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7937) Retry log rolling to support HA NN scenario

2013-04-06 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624597#comment-13624597
 ] 

Liang Xie commented on HBASE-7937:
--

Hi [~himan...@cloudera.com], I can make a new patch per the comments if you are busy. I am working on log-rolling retry in our internal codebase, so it's handy for me.

 Retry log rolling to support HA NN scenario
 ---

 Key: HBASE-7937
 URL: https://issues.apache.org/jira/browse/HBASE-7937
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.5
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.1

 Attachments: HBASE-7937-trunk.patch, HBASE-7937-v1.patch


 A failure in log rolling causes a regionserver abort. In the case of an HA NN, it would be good to have a retry mechanism for rolling the logs, as sketched below.
 The corresponding JIRA for MemStore retries is HBASE-7507.
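 A hedged sketch of such a retry loop; hlog.rollWriter(), the retry bounds, and abort() are illustrative stand-ins, not the patch's actual code.
 {code}
 IOException last = null;
 for (int attempt = 0; attempt < maxRollRetries; attempt++) {
   try {
     hlog.rollWriter();  // may fail transiently during an HA NN failover
     last = null;
     break;
   } catch (IOException ioe) {
     last = ioe;
     try {
       Thread.sleep(rollRetryPauseMs);  // back off before the next attempt
     } catch (InterruptedException ie) {
       Thread.currentThread().interrupt();
       break;
     }
   }
 }
 if (last != null) {
   abort("log rolling failed after retries", last);  // old behavior, as a last resort
 }
 {code}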

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624599#comment-13624599
 ] 

chunhui shen commented on HBASE-6870:
-

bq. How come useCache being true gets translated into includeEndKey being true?
The useCache flag is not related to includeEndKey; includeEndKey is always true in HTable#coprocessorService.

bq. I wonder if we should provide a mechanism for coprocessors to specify whether a fresh .META. scan is wanted.
My initial purpose for the useCache flag was performance comparison. I think there's no need to provide a choice for a fresh .META. scan, since it is still possible to get an invalid region location from .META.

 HTable#coprocessorExec always scan the whole table 
 ---

 Key: HBASE-6870
 URL: https://issues.apache.org/jira/browse/HBASE-6870
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Affects Versions: 0.94.1, 0.95.0, 0.95.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.1, 0.98.0

 Attachments: 6870-v4.txt, HBASE-6870.patch, 
 HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, 
 hbase-6870v5.patch


 In the current logic, HTable#coprocessorExec always scans the whole table; this is inefficient and will affect the regionserver carrying .META. under a large volume of coprocessorExec requests.
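 For context, a hedged usage sketch of the 0.94-era API this issue is about; MyProtocol and its count() method are hypothetical. With the fix, restricting the call to [startKey, endKey] should resolve just the regions in that range from cached locations rather than scanning all of .META.
 {code}
 // Hypothetical endpoint protocol for the sketch.
 interface MyProtocol extends CoprocessorProtocol {
   long count() throws IOException;
 }

 Map<byte[], Long> results = table.coprocessorExec(
     MyProtocol.class, startKey, endKey,
     new Batch.Call<MyProtocol, Long>() {
       public Long call(MyProtocol instance) throws IOException {
         return instance.count();
       }
     });
 {code}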

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6870) HTable#coprocessorExec always scan the whole table

2013-04-06 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6870:


Attachment: hbase-6870v5.patch

Patch v5 fixes the warnings.

 HTable#coprocessorExec always scan the whole table 
 ---

 Key: HBASE-6870
 URL: https://issues.apache.org/jira/browse/HBASE-6870
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Affects Versions: 0.94.1, 0.95.0, 0.95.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.1, 0.98.0

 Attachments: 6870-v4.txt, HBASE-6870.patch, 
 HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, 
 hbase-6870v5.patch


 In the current logic, HTable#coprocessorExec always scans the whole table; this is inefficient and will affect the regionserver carrying .META. under a large volume of coprocessorExec requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624601#comment-13624601
 ] 

Ted Yu commented on HBASE-6870:
---

{code}
+if (useCache) {
+  return new ArrayList<byte[]>(getKeysToRegionsInRange(start, end, true)
+  .keySet());
{code}
When useCache is true, what caching mechanism would getKeysToRegionsInRange() use?

 HTable#coprocessorExec always scan the whole table 
 ---

 Key: HBASE-6870
 URL: https://issues.apache.org/jira/browse/HBASE-6870
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Affects Versions: 0.94.1, 0.95.0, 0.95.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.1, 0.98.0

 Attachments: 6870-v4.txt, HBASE-6870.patch, 
 HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, 
 hbase-6870v5.patch


 In the current logic, HTable#coprocessorExec always scans the whole table; this is inefficient and will affect the regionserver carrying .META. under a large volume of coprocessorExec requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624602#comment-13624602
 ] 

chunhui shen commented on HBASE-7824:
-

bq. Therefore, all recovered-edits files are created before the META region can be assigned, because the META region can be assigned only when ROOT is online.

We can open the META region on an RS when ROOT is offline.
What if the ROOT RS is killed between getMetaLocationOrReadLocationFromRoot and assignMeta?

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master start up waits till all log 
 split work completes even though the log split has nothing to do with meta 
 region servers.
 It's a bad behavior considering a master node can run when log split is 
 happening while its start up is blocking by log split work. 
 Since master is kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624604#comment-13624604
 ] 

chunhui shen commented on HBASE-7824:
-

It means the META region is opening on the RS before SSH has completed the log split.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624605#comment-13624605
 ] 

chunhui shen commented on HBASE-6870:
-

bq. When useCache is true, what caching mechanism would getKeysToRegionsInRange() use?
We will use the cached region locations in HConnectionImplementation.

 HTable#coprocessorExec always scan the whole table 
 ---

 Key: HBASE-6870
 URL: https://issues.apache.org/jira/browse/HBASE-6870
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Affects Versions: 0.94.1, 0.95.0, 0.95.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.1, 0.98.0

 Attachments: 6870-v4.txt, HBASE-6870.patch, 
 HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, 
 hbase-6870v5.patch


 In the current logic, HTable#coprocessorExec always scans the whole table; this is inefficient and will affect the regionserver carrying .META. under a large volume of coprocessorExec requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010

2013-04-06 Thread chunhui shen (JIRA)
chunhui shen created HBASE-8287:
---

 Summary: TestRegionMergeTransactionOnCluster failed in trunk build 
#4010
 Key: HBASE-8287
 URL: https://issues.apache.org/jira/browse/HBASE-8287
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.95.2, 0.98.0


From the log of trunk build #4010:
{code}
2013-04-04 05:45:59,396 INFO  [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms

2013-04-04 05:45:59,396 INFO  [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790
{code}

There's a small probability of failing to move the merging regions together onto the same regionserver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010

2013-04-06 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-8287:


Priority: Minor  (was: Major)

 TestRegionMergeTransactionOnCluster failed in trunk build #4010
 ---

 Key: HBASE-8287
 URL: https://issues.apache.org/jira/browse/HBASE-8287
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Minor
 Fix For: 0.95.2, 0.98.0


 From the log of trunk build #4010:
 {code}
 2013-04-04 05:45:59,396 INFO  [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms

 2013-04-04 05:45:59,396 INFO  [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790
 {code}
 There's a small probability of failing to move the merging regions together onto the same regionserver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624608#comment-13624608
 ] 

Ted Yu commented on HBASE-8285:
---

@Varun:
If you can add a test case to show this scenario, that would be great.

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing, since the META cache is never purged on the client for the region that moved.
 Reason behind the bug:
 1) The client continues to hit the old region server.
 2) The old region server throws NotServingRegionException, which is not handled correctly, and the META cache entries are never purged for that server, causing the client to keep hitting the old server.
 The reason lies in the ServerCallable code, since we only purge META cache entries when there is a RetriesExhaustedException, SocketTimeoutException or ConnectException. However, there is no case check for NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s)?
 a) If a region server is not hosting a region/scanner, then an UnknownScannerException is thrown, which causes a relocateRegion() call, causing a refresh of the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is used, which clears out META cache entries for all kinds of exceptions except DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624612#comment-13624612
 ] 

Jeffrey Zhong commented on HBASE-7824:
--

{quote}
We can open the META region on an RS when ROOT is offline.
{quote}
We need to update the META location in the ROOT RS when the META region is opening on a newly assigned RS, don't we? Therefore, the ROOT RS has to be online for a successful META assignment.

So the open will fail: even if the META region can be opened, no one can access it, because ROOT is offline and the old location isn't updated to the newly assigned location. Later, the META region will be re-assigned by MetaSSH.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010

2013-04-06 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-8287:


Status: Patch Available  (was: Open)

 TestRegionMergeTransactionOnCluster failed in trunk build #4010
 ---

 Key: HBASE-8287
 URL: https://issues.apache.org/jira/browse/HBASE-8287
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Minor
 Fix For: 0.95.2, 0.98.0

 Attachments: hbase-trunk-8287.patch


 From the log of trunk build #4010:
 {code}
 2013-04-04 05:45:59,396 INFO  [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms

 2013-04-04 05:45:59,396 INFO  [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790
 {code}
 There's a small probability of failing to move the merging regions together onto the same regionserver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010

2013-04-06 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-8287:


Attachment: hbase-trunk-8287.patch

The root cause of the bug, in DispatchMergingRegionHandler#process:
{code}
region_b_location = masterServices.getAssignmentManager()
    .getRegionStates().getRegionServerOfRegion(region_b);
onSameRS = region_a_location.equals(region_b_location);
if (onSameRS || !regionStates.isRegionInTransition(region_b)) {
  // Regions are on the same RS, or region_b is not in
  // RegionInTransition any more
  break;
}
{code}

The value of onSameRS can be false even though the regions are already on the same regionserver, as in the following interleaving:

1. getRegionServerOfRegion(region_b)
2. region_b comes online
3. isRegionInTransition(region_b)

Each of the above steps is synchronized on RegionStates.
Step 1 returns the old server of region_b, so the value of onSameRS is false;
step 3 returns false, since region_b came online in step 2.
Finally, the while loop breaks with onSameRS false, even though the merging regions are in fact on the same regionserver.
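One way to close that race (a sketch, not necessarily what the attached patch does) is to re-read region_b's location after the in-transition check, so a move that completed between the two calls is observed:
{code}
if (!regionStates.isRegionInTransition(region_b)) {
  // Re-read the location: region_b may have just finished opening on
  // region_a's server between the first lookup and the RIT check.
  region_b_location = regionStates.getRegionServerOfRegion(region_b);
  onSameRS = region_a_location.equals(region_b_location);
  break;
}
{code}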

 TestRegionMergeTransactionOnCluster failed in trunk build #4010
 ---

 Key: HBASE-8287
 URL: https://issues.apache.org/jira/browse/HBASE-8287
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Minor
 Fix For: 0.95.2, 0.98.0

 Attachments: hbase-trunk-8287.patch


 From the log of trunk build #4010:
 {code}
 2013-04-04 05:45:59,396 INFO  [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms

 2013-04-04 05:45:59,396 INFO  [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790
 {code}
 There's a small probability of failing to move the merging regions together onto the same regionserver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-04-06 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624622#comment-13624622
 ] 

chunhui shen commented on HBASE-7824:
-

bq. So the open will fail: even if the META region can be opened, no one can access it, because ROOT is offline
We will retry if we fail to update the META location in the ROOT RS.
ROOT will come online eventually; however, the META region won't be re-assigned.

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.8

 Attachments: hbase-7824.patch, hbase-7824_v2.patch, 
 hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch


 When there is log split work going on, master startup waits until all the log split work completes, even when the log splitting has nothing to do with the META region servers.
 That's bad behavior, considering a master node could be running while log splitting happens, yet its startup is blocked by the log split work.
 Since the master is a kind of single point of failure, we should start it ASAP.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7801) Allow a deferred sync option per Mutation.

2013-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-7801:
-

Attachment: 7801-0.96-v7.txt

Fixed TestProtobufUtil

 Allow a deferred sync option per Mutation.
 --

 Key: HBASE-7801
 URL: https://issues.apache.org/jira/browse/HBASE-7801
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.95.0, 0.94.6
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 7801-0.94-v1.txt, 7801-0.94-v2.txt, 7801-0.94-v3.txt, 
 7801-0.96-full-v2.txt, 7801-0.96-full-v3.txt, 7801-0.96-full-v4.txt, 
 7801-0.96-full-v5.txt, 7801-0.96-v1.txt, 7801-0.96-v6.txt, 7801-0.96-v7.txt


 Won't have time for the parent, but a deferred-sync option on a per-operation basis comes up quite frequently.
 In 0.96 this can be handled cleanly via protobufs; in 0.94 we can have a special mutation attribute.
 For a batch operation we'd take the safest sync option of any of the mutations: i.e., if at least one mutation wants to be synced we sync the batch; if there is none of those but at least one that wants deferred flush, we defer-flush the batch, etc.
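 A self-contained sketch of that "safest option wins" rule; the enum and its ordering are illustrative, not the actual API this issue adds.
 {code}
 enum SyncOption { SKIP_WAL, DEFERRED_FLUSH, SYNC }  // ordered least to most safe

 static SyncOption effectiveFor(Iterable<SyncOption> batch) {
   SyncOption safest = SyncOption.SKIP_WAL;
   for (SyncOption o : batch) {
     if (o.ordinal() > safest.ordinal()) {
       safest = o;  // one stricter mutation upgrades the whole batch
     }
   }
   return safest;
 }
 {code}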

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624625#comment-13624625
 ] 

Varun Sharma commented on HBASE-8285:
-

Are there mocks for the region server or the HRegion interface? I'm wondering if there is a way to write an end-to-end test for this.

Thanks!

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing since the META cache is never purged 
 on the client for the region that moved.
 Reason behind the bug:
 1) Client continues to hit the old region server.
 2) The old region server throws NotServingRegionException which is not 
 handled correctly and the META cache entries are never purged for that server 
 causing the client to keep hitting the old server.
 The reason lies in ServerCallable code since we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. However, there is no case check for 
 NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s) ?
 a) If a region server is not hosting a region/scanner, then an 
 UnknownScannerException is thrown, which causes a relocateRegion() call 
 that refreshes the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.
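 A minimal sketch of the kind of handling the report suggests (the method shape is 
 illustrative, not the actual ServerCallable code):
 {code}
 // Sketch: clear the cached region location when the server says it no longer
 // hosts the region, then rethrow so the retry loop does a fresh META lookup.
 void handleNotServing(HConnection connection, byte[] tableName, byte[] row,
     NotServingRegionException nsre) throws IOException {
   connection.clearRegionCache(tableName);   // coarse purge; a per-row purge would be finer
   connection.relocateRegion(tableName, row);
   throw nsre;
 }
 {code}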

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move

2013-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624627#comment-13624627
 ] 

Ted Yu commented on HBASE-8285:
---

Take a look at TestRegionSplitPolicy#setupMocks() for how mockRegion is set up.
Also refer to MockRegionServerServices.java

 HBaseClient never recovers for single HTable.get() calls when regions move
 --

 Key: HBASE-8285
 URL: https://issues.apache.org/jira/browse/HBASE-8285
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8285-0.94.txt, 8285-trunk.txt


 Steps to reproduce this bug:
 1) Gracefully restart a region server, causing regions to get redistributed.
 2) Client calls to this region keep failing since the META cache is never purged 
 on the client for the region that moved.
 Reason behind the bug:
 1) Client continues to hit the old region server.
 2) The old region server throws NotServingRegionException which is not 
 handled correctly and the META cache entries are never purged for that server 
 causing the client to keep hitting the old server.
 The reason lies in ServerCallable code since we only purge META cache entries 
 when there is a RetriesExhaustedException, SocketTimeoutException or 
 ConnectException. However, there is no case check for 
 NotServingRegionException(s).
 Why is this not a problem for Scan(s) and Put(s) ?
 a) If a region server is not hosting a region/scanner, then an 
 UnknownScannerException is thrown, which causes a relocateRegion() call 
 that refreshes the META cache for that particular region.
 b) For put(s), the processBatchCallback() interface in HConnectionManager is 
 used which clears out META cache entries for all kinds of exceptions except 
 DoNotRetryException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table

2013-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624631#comment-13624631
 ] 

Hadoop QA commented on HBASE-6870:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577422/hbase-6870v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5164//console

This message is automatically generated.

 HTable#coprocessorExec always scan the whole table 
 ---

 Key: HBASE-6870
 URL: https://issues.apache.org/jira/browse/HBASE-6870
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Affects Versions: 0.94.1, 0.95.0, 0.95.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.1, 0.98.0

 Attachments: 6870-v4.txt, HBASE-6870.patch, 
 HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, 
 hbase-6870v5.patch


 In the current logic, HTable#coprocessorExec always scans the whole table; its 
 efficiency is low, and it will affect the RegionServer carrying .META. under 
 heavy coprocessorExec request load.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8286) Move site back up out of hbase-assembly; bad idea

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8286:
-

Status: Patch Available  (was: Open)

 Move site back up out of hbase-assembly; bad idea
 -

 Key: HBASE-8286
 URL: https://issues.apache.org/jira/browse/HBASE-8286
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 8286.txt


 Redoing the packaging, I thought it smart putting it all under the new 
 hbase-assembly module.  A few weeks later, it is not looking like a good idea 
 (Enis suggested moving the site was maybe 'odd').  Here are a few issues:
 + The generated reports will mention hbase-assembly because that is 
 where they are being generated (Elliott pointed out that the scm links mention 
 hbase-assembly instead of trunk).
 + Doug Meil wants to build and deploy the site, but currently deploy is a bit 
 awkward to explain: the site presumes it is top level, so staging goes to 
 the wrong place and you have to come along and do fixup afterward; it 
 shouldn't be so hard.
 A possible downside is that we will break Nick's site check added to hadoopqa 
 because site is an aggregating build plugin... but let's see.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8286) Move site back up out of hbase-assembly; bad idea

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8286:
-

Attachment: 8286.txt

Just moves stuff around:

+ moves book, site, and xslt back up to src/main/ in the top-level module.
+ adjusts paths.

I played w/ assemblies. They still work.

 Move site back up out of hbase-assembly; bad idea
 -

 Key: HBASE-8286
 URL: https://issues.apache.org/jira/browse/HBASE-8286
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 8286.txt


 Redoing the packaging, I thought it smart putting it all under the new 
 hbase-assembly module.  A few weeks later, it is not looking like a good idea 
 (Enis suggested moving the site was maybe 'odd').  Here are a few issues:
 + The generated reports will mention hbase-assembly because that is 
 where they are being generated (Elliott pointed out that the scm links mention 
 hbase-assembly instead of trunk).
 + Doug Meil wants to build and deploy the site, but currently deploy is a bit 
 awkward to explain: the site presumes it is top level, so staging goes to 
 the wrong place and you have to come along and do fixup afterward; it 
 shouldn't be so hard.
 A possible downside is that we will break Nick's site check added to hadoopqa 
 because site is an aggregating build plugin... but let's see.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6496) Example ZK based scan policy

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6496:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Example ZK based scan policy
 

 Key: HBASE-6496
 URL: https://issues.apache.org/jira/browse/HBASE-6496
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.2

 Attachments: 6496-0.94.txt, 6496-0.96.txt, 6496-0.96-v2.txt, 
 6496.txt, 6496-v2.txt


 Provide an example of a RegionServer that listens to a ZK node to learn about 
 what set of KVs can safely be deleted during a compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5292:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
 Fix For: 0.94.2

 Attachments: 
 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 
 5292-0.94.txt, ASF.LICENSE.NOT.GRANTED--D1527.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D1527.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D1527.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D1527.4.patch, 
 ASF.LICENSE.NOT.GRANTED--D1617.1.patch, 
 jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated both for client-initiated 
 Get/Scan operations and for compaction-related reads. The metric is 
 updated in StoreScanner.java:next() when the scan query matcher returns an 
 INCLUDE* code via:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.
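 A minimal sketch of that guard (the {{isCompaction}} flag here is an assumed 
 field, not the actual StoreScanner member):
 {code}
 if (!isCompaction) {
   // Count bytes only for client-initiated Get/Scan reads, not compactions.
   HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 }
 {code}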

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5549:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, Zookeeper
Affects Versions: 0.95.2
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 0.94.2

 Attachments: 5549_092.txt, 5549_094.txt, 5549.v10.patch, 
 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, 
 nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can happen in real life; it can happen when:
 - master & zookeeper start
 - the zookeeper connection is cut
 - master enters the retry loop
 - in the meantime the session expires
 - the network comes back, the session is recreated
 - the retries continue, but on the wrong object, hence they fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5997:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Fix concerns raised in HBASE-5922 related to HalfStoreFileReader
 

 Key: HBASE-5997
 URL: https://issues.apache.org/jira/browse/HBASE-5997
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.95.2
Reporter: ramkrishna.s.vasudevan
Assignee: Anoop Sam John
 Fix For: 0.94.2

 Attachments: 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, 
 5997v3_trunk.txt, HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch, 
 HBASE-5997_94 V3.patch, Testcase.patch.txt


 Please refer to the comment
 https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346.
 Raised this issue to address that comment, just so we don't forget it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6165:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Replication can overrun .META. scans on cluster re-start
 

 Key: HBASE-6165
 URL: https://issues.apache.org/jira/browse/HBASE-6165
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2

 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, 
 HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, 
 HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch


 When restarting a large set of regions on a reasonably small cluster, 
 replication from another cluster tied up every xceiver, meaning nothing could 
 be brought online.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6211) Put latencies in jmx

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6211:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Put latencies in jmx
 

 Key: HBASE-6211
 URL: https://issues.apache.org/jira/browse/HBASE-6211
 Project: HBase
  Issue Type: Bug
  Components: metrics, monitoring
 Environment: RegionServerMetrics pushes latency histograms to hadoop 
 metrics, but they are not getting into jmx.
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.94.2

 Attachments: 6211_092.txt, HBASE-6211-0.patch, HBASE-6211-1.patch, 
 HBASE-6211-2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6263) Use default mode for HBase Thrift gateway if not specified

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6263:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Use default mode for HBase Thrift gateway if not specified
 --

 Key: HBASE-6263
 URL: https://issues.apache.org/jira/browse/HBASE-6263
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Affects Versions: 0.94.0, 0.95.2
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
  Labels: noob
 Fix For: 0.94.2

 Attachments: HBASE-6263-0.94.patch, HBASE-6263.patch


 The Thrift gateway should start with a default mode if one is not selected. 
 Currently, instead we see:
 {noformat}
 Exception in thread "main" java.lang.AssertionError: Exactly one option out 
 of [-hsha, -nonblocking, -threadpool, -threadedselector] has to be specified
   at 
 org.apache.hadoop.hbase.thrift.ThriftServerRunner$ImplType.setServerImpl(ThriftServerRunner.java:201)
   at 
 org.apache.hadoop.hbase.thrift.ThriftServer.processOptions(ThriftServer.java:169)
   at 
 org.apache.hadoop.hbase.thrift.ThriftServer.doMain(ThriftServer.java:85)
   at 
 org.apache.hadoop.hbase.thrift.ThriftServer.main(ThriftServer.java:192)
 {noformat}
 See also BIGTOP-648. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6299) RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6299:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 RS starting region open while failing ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems
 

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Fix For: 0.94.2

 Attachments: 6299v4.txt, 6299v4.txt, 6299v4.txt, HBASE-6299.patch, 
 HBASE-6299-v2.patch, HBASE-6299-v3.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign for a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, 
 {code}

[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6321:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 ReplicationSource dies reading the peer's id
 

 Key: HBASE-6321
 URL: https://issues.apache.org/jira/browse/HBASE-6321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.2

 Attachments: HBASE-6321-0.92.patch, HBASE-6321-0.94.patch, 
 HBASE-6321.patch


 This is what I saw:
 {noformat}
 2012-07-01 05:04:01,638 ERROR 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
 source 8 because an error occurred: Could not read peer's cluster id
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /va1-backup/hbaseid
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
 at 
 org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
 {noformat}
 The session should just be reopened.
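 A rough sketch of that recovery (shape assumed; the committed patch may differ):
 {code}
 try {
   ZKUtil.checkExists(peerZkw, peerIdZNode);
 } catch (KeeperException.SessionExpiredException e) {
   // Reopen the session on the peer instead of closing the source.
   peerZkw = new ZooKeeperWatcher(conf, "replication-peer", abortable);
 }
 {code}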

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6359) KeyValue may return incorrect values after readFields()

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6359:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 KeyValue may return incorrect values after readFields()
 ---

 Key: HBASE-6359
 URL: https://issues.apache.org/jira/browse/HBASE-6359
 Project: HBase
  Issue Type: Bug
Reporter: Dave Revell
Assignee: Dave Revell
  Labels: noob
 Fix For: 0.94.2

 Attachments: HBASE-6359-trunk-v1.diff


 When the same KeyValue object is used multiple times for deserialization 
 using readFields, some methods may return incorrect values. Here is a 
 sequence of operations that will reproduce the problem:
  # A KeyValue is created whose key has length 10. The private field keyLength 
 is initialized to 0.
  # KeyValue.getKeyLength() is called. This reads the key length 10 from the 
 backing array and caches it in keyLength.
  # KeyValue.readFields() is called to deserialize a new value. The keyLength 
 field is not cleared and keeps its value of 10, even though this value is 
 probably incorrect.
  # If getKeyLength() is called, the value 10 will be returned.
 For example, in a reducer with Iterable<KeyValue>, all values after the first 
 one from the iterable are likely to return incorrect values from 
 getKeyLength().
 The solution is to clear all memoized values in KeyValue.readFields(). I'll 
 write a patch for this soon.
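 A minimal sketch of that fix (field names taken from the description; the exact 
 serialization format is assumed):
 {code}
 public void readFields(final DataInput in) throws IOException {
   this.keyLength = 0;  // clear the memoized key length before deserializing
   int len = in.readInt();
   this.bytes = new byte[len];
   in.readFully(this.bytes);
   this.offset = 0;
   this.length = len;
 }
 {code}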

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6364:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Powering down the server host holding the .META. table causes HBase Client to 
 take excessively long to recover and connect to reassigned .META. table
 -

 Key: HBASE-6364
 URL: https://issues.apache.org/jira/browse/HBASE-6364
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Suraj Varma
Assignee: Nicolas Liochon
  Labels: client
 Fix For: 0.94.2

 Attachments: 6364.94.v2.nolargetest.patch, 
 6364.94.v2.nolargetest.security-addendum.patch, 
 6364-host-serving-META.v1.patch, 6364.v11.nolargetest.patch, 6364.v1.patch, 
 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
 6364.v7.withtests.patch, 6364.v8.withtests.patch, 6364.v9.patch, 
 stacktrace.txt


 When a server host with a Region Server holding the .META. table is powered 
 down on a live cluster, while the HBase cluster itself detects and reassigns 
 the .META. table, connected HBase clients take an excessively long time to 
 detect this and re-discover the reassigned .META. 
 Workaround: Decrease ipc.socket.timeout on the HBase client side to a low 
 value (the default of 20s leads to a 35 minute recovery time; we were able to 
 get acceptable results with 100ms, getting a 3 minute recovery). 
 This was found during some hardware failure testing scenarios. 
 Test Case:
 1) Apply load via client app on HBase cluster for several minutes
 2) Power down the region server holding the .META. server (i.e. power off ... 
 and keep it off)
 3) Measure how long it takes for cluster to reassign META table and for 
 client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
 and DN on that host).
 Observation:
 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
 recover (i.e. for the thread count to go back to normal) - no client calls 
 are serviced - they just back up on a synchronized method (see #2 below)
 2) All the client app threads queue up behind the 
 oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
 After taking several thread dumps we found that the thread within this 
 synchronized method was blocked on  NetUtils.connect(this.socket, 
 remoteId.getAddress(), getSocketTimeout(conf));
 The client thread that gets the synchronized lock would try to connect to the 
 dead RS (till socket times out after 20s), retries, and then the next thread 
 gets in and so forth in a serial manner.
 Workaround:
 ---
 Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
 (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
 the client threads recovered in a couple of minutes by failing fast and 
 re-discovering the .META. table on a reassigned RS.
 Assumption: This ipc.socket.timeout is only ever used during the initial 
 HConnection setup via NetUtils.connect, and should only ever come into play 
 when connectivity to a region server is lost and needs to be re-established; 
 i.e. it does not affect normal RPC activity, as this is just the connect 
 timeout.
 During RS GC periods, any _new_ clients trying to connect will fail and will 
 require .META. table re-lookups.
 This above timeout workaround is only for the HBase client side.
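 For reference, the workaround as a client-side hbase-site.xml entry (value in 
 milliseconds; tune to your environment):
 {code}
 <property>
   <name>ipc.socket.timeout</name>
   <value>1000</value>
 </property>
 {code}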

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6432:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 HRegionServer doesn't properly set clusterId in conf
 

 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.95.2
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.94.2

 Attachments: HBASE-6432_94.patch, HBASE-6432.patch


 ClusterId is normally set into the passed conf during instantiation of an 
 HTable class. In the case of an HRegionServer this is bypassed and the 
 clusterId is left at default, since getMaster() uses HBaseRPC to create 
 the proxy directly and bypasses the class which retrieves and sets the 
 correct clusterId. 
 This becomes a problem for clients (i.e. within a coprocessor) using 
 delegation tokens for authentication: the token's service will be the 
 correct clusterId, while the TokenSelector is looking for one with service 
 default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6437:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Avoid admin.balance during master initialize
 

 Key: HBASE-6437
 URL: https://issues.apache.org/jira/browse/HBASE-6437
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.94.2

 Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, 
 HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk_2.patch, 
 HBASE-6437_trunk.patch, HBASE-6437_trunk.patch


 In HBASE-5850 many of the admin operations were blocked till the master 
 initializes, but the balancer was not.  So this JIRA extends the 
 PleaseHoldException to admin.balance() calls made before the master is 
 initialized.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6438:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 RegionAlreadyInTransitionException needs to give more info to avoid 
 assignment inconsistencies
 --

 Key: HBASE-6438
 URL: https://issues.apache.org/jira/browse/HBASE-6438
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.94.2

 Attachments: 6438-0.92.txt, 6438.addendum, 6438-addendum.94, 
 6438-trunk_2.patch, HBASE-6438_2.patch, HBASE-6438_94_3.patch, 
 HBASE-6438_94_4.patch, HBASE-6438_94.patch, HBASE-6438-trunk_2.patch, 
 HBASE-6438_trunk.patch


 Seeing some of the recent issues in region assignment, 
 RegionAlreadyInTransitionException is one reason after which the region 
 assignment may or may not happen (in the sense that we need to wait for the 
 TM to assign).
 In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on 
 master restart.
 Consider the following case: due to some reason like a master restart or an 
 external assign call, we try to assign a region that is already being 
 opened on an RS.
 Now the next call to assign has already changed the state of the znode, so 
 the current assign that is going on in the RS is affected and it fails.  The 
 second assignment that started also fails, getting a RAITE exception.  
 Finally, neither assignment carries on.  The idea is to find out whether any 
 such RAITE exception can be retried or not.
 Here again we have the following cases:
 - The znode is yet to be transitioned from OFFLINE to OPENING in the RS
 - The RS may be in the step of openRegion.
 - The RS may be trying to transition OPENING to OPENED.
 - The RS is yet to add the region to its online regions.
 In openRegion() and updateMeta(), on any failure we move the znode to 
 FAILED_OPEN.  So in these cases getting a RAITE should be ok.  But in the 
 other cases the assignment is stopped.
 The idea is to just add the current state of the region assignment to the RIT 
 map on the RS side, and using that info we can determine whether the 
 assignment can be retried or not on getting a RAITE.
 Considering the current work going on in AM, please do share whether this is 
 needed, at least in the 0.92/0.94 versions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6460) hbck -repairHoles usage inconsistent with -fixHdfsOrphans

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6460:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 hbck -repairHoles usage inconsistent with -fixHdfsOrphans
 -

 Key: HBASE-6460
 URL: https://issues.apache.org/jira/browse/HBASE-6460
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.94.0, 0.95.2
Reporter: Jie Huang
Assignee: Jie Huang
Priority: Minor
 Fix For: 0.94.2

 Attachments: hbase-6460.patch


 According to hbck's help info, the shortcut -repairHoles will enable 
 -fixHdfsOrphans, as below.
 {noformat}
  -repairHoles  Shortcut for -fixAssignments -fixMeta -fixHdfsHoles 
 -fixHdfsOrphans
 {noformat}
 However, in the implementation, fsck.setFixHdfsOrphans(false) 
 is called for -repairHoles. This is not consistent with the usage 
 information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6471) Performance regression caused by HBASE-4054

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6471:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Performance regression caused by HBASE-4054
 ---

 Key: HBASE-6471
 URL: https://issues.apache.org/jira/browse/HBASE-6471
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.92.0
Reporter: Lars George
Assignee: Jimmy Xiang
Priority: Critical
 Fix For: 0.94.2

 Attachments: trunk-6471.patch, trunk-6471.patch, trunk-6471_v2.patch


 The patch in HBASE-4054 switches the PooledHTable to extend HTable as opposed 
 to implement HTableInterface.
 Since HTable does not have an empty constructor, the patch added a call to 
 the super() constructor, which does, however, trigger the ZooKeeper and META 
 scan, causing a considerable delay. 
 With multiple threads using the pool in parallel, the first thread is holding 
 up all the subsequent ones, in effect it negates the whole reason we have a 
 HTable pool.
 We should complete HBASE-5728, or alternatively add a protected, empty 
 constructor to HTable. I am +1 for the former.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar occasionally fails

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6478:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 TestClassLoading.testClassLoadingFromLibDirInJar occasionally fails
 ---

 Key: HBASE-6478
 URL: https://issues.apache.org/jira/browse/HBASE-6478
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: zhou wenjian
 Fix For: 0.94.2

 Attachments: HBASE-6478-trunk.patch, HBASE-6478-trunk-v2.patch, 
 HBASE-6478-trunk-v3.patch, HBASE-6478-trunk-v4.patch


 When hudson ran for HBASE-6459, it encountered a failed testcase in 
 org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar.
  The link is 
 https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/
 I checked the log and found that the function waitTableAvailable only 
 checks the meta table; when the RS opens the region and updates the meta 
 location in meta, the region may not yet have been added to the online 
 regions on the RS.
 for (HRegion region:
 hbase.getRegionServer(0).getOnlineRegionsLocalContext()) {
 This loop will be skipped, and found1 will be false altogether; that's why 
 the testcase failed.
 So maybe we should have some stricter check when a table is created.
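 A hedged sketch of such a stricter check (variable names assumed; not the 
 committed patch): poll the region server's online regions instead of only META.
 {code}
 boolean found = false;
 long deadline = System.currentTimeMillis() + 30000;
 while (!found && System.currentTimeMillis() < deadline) {
   for (HRegion region : hbase.getRegionServer(0).getOnlineRegionsLocalContext()) {
     if (Bytes.equals(region.getTableDesc().getName(), tableName)) {
       found = true;  // the region is actually online on the RS
       break;
     }
   }
   if (!found) Thread.sleep(200);
 }
 {code}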

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6488:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 HBase wont run on IPv6 on OSes that use zone-indexes
 

 Key: HBASE-6488
 URL: https://issues.apache.org/jira/browse/HBASE-6488
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ryan rawson
Assignee: ryan rawson
 Fix For: 0.94.2

 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt


 In IPv6, an address may have a zone-index, which is specified with a percent, 
 e.g. ...%0. This looks like a format string, and thus, in a part of the code 
 that uses the hostname as a prefix to another string interpreted with 
 String.format, you end up with an exception:
 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.util.UnknownFormatConversionException: Conversion = '0'
 at java.util.Formatter.checkText(Formatter.java:2503)
 at java.util.Formatter.parse(Formatter.java:2467)
 at java.util.Formatter.format(Formatter.java:2414)
 at java.util.Formatter.format(Formatter.java:2367)
 at java.lang.String.format(String.java:2769)
 at 
 com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
 at 
 org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
 at 
 org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:680)
 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
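 A minimal illustration of the failure mode and one possible guard (the example 
 address and the guard are assumptions, not necessarily the committed fix):
 {code}
 String host = "fe80::1%0";                      // zone-indexed IPv6 address
 // String.format(host + "-%d", 1) would throw UnknownFormatConversionException
 String safe = host.replace("%", "%%");          // escape '%' before use as a format prefix
 String threadName = String.format(safe + "-%d", 1);  // "fe80::1%0-1"
 {code}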

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6504:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Adding GC details prevents HBase from starting in non-distributed mode
 --

 Key: HBASE-6504
 URL: https://issues.apache.org/jira/browse/HBASE-6504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Benoit Sigoure
Assignee: Michael Drzal
Priority: Trivial
  Labels: noob
 Fix For: 0.94.2

 Attachments: HBASE-6504-output.txt, HBASE-6504.patch, 
 HBASE-6504-v2.patch


 The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out 
 examples of variables that could be useful, such as adding 
 {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}.  This has 
 the annoying side effect that the JVM prints a summary of memory usage when 
 it exits, and it does so on stdout:
 {code}
 $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool 
 hbase.cluster.distributed
 false
 Heap
  par new generation   total 19136K, used 4908K [0x00073a20, 
 0x00073b6c, 0x00075186)
   eden space 17024K,  28% used [0x00073a20, 0x00073a6cb0a8, 
 0x00073b2a)
   from space 2112K,   0% used [0x00073b2a, 0x00073b2a, 
 0x00073b4b)
   to   space 2112K,   0% used [0x00073b4b, 0x00073b4b, 
 0x00073b6c)
  concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 
 0x0007556c, 0x0007f5a0)
  concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 
 0x0007f6ec, 0x0008)
 $ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool 
 hbase.cluster.distributed >/dev/null
 (nothing printed)
 {code}
 And this confuses {{bin/start-hbase.sh}} when it does
 {{distMode=`$bin/hbase --config $HBASE_CONF_DIR 
 org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, 
 because then the {{distMode}} variable is not just set to {{false}}, it also 
 contains all this JVM spam.
 If you don't pay enough attention and realize that 3 processes are getting 
 started (ZK, HM, RS) instead of just one (HM), then you end up with this 
 confusing error message:
 {{Could not start ZK at requested port of 2181.  ZK was started at port: 
 2182.  Aborting as clients (e.g. shell) will not be able to find this ZK 
 quorum.}}, which is even more puzzling because when you run {{netstat}} to 
 see who owns that port, then you won't find any rogue process other than the 
 one you just started.
 I'm wondering if the fix is not to just change the {{if [ $distMode == 
 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work 
 around this annoying JVM misfeature that pollutes stdout.
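 A sketch of that suggestion in shell (form assumed, not the committed fix):
 {code}
 case "$distMode" in
   false*)
     # Output starting with "false" means non-distributed mode, even when the
     # JVM appends its heap summary to stdout.
     distMode=false
     ;;
 esac
 {code}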

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6512) Incorrect OfflineMetaRepair log class name

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6512:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Incorrect OfflineMetaRepair log class name
 --

 Key: HBASE-6512
 URL: https://issues.apache.org/jira/browse/HBASE-6512
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.94.0, 0.94.1, 0.94.2, 0.95.2
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.94.2

 Attachments: HBASE-6512.diff


 At the beginning of OfflineMetaRepair.java, we can observe:
 private static final Log LOG = LogFactory.getLog(HBaseFsck.class.getName());
 It would be better to change it to:
 private static final Log LOG = 
 LogFactory.getLog(OfflineMetaRepair.class.getName());

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6514) unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6514:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 unknown metrics type: 
 org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
 

 Key: HBASE-6514
 URL: https://issues.apache.org/jira/browse/HBASE-6514
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.2, 0.94.0
 Environment: MacOS 10.8
 Oracle JDK 1.7
Reporter: Archimedes Trajano
Assignee: Elliott Clark
 Fix For: 0.94.2

 Attachments: FrameworkTest.java, FrameworkTest.java, 
 HBASE-6514-94-0.patch, HBASE-6514-trunk-0.patch, out.txt


 When trying to run a unit test that just starts up and shuts down the server, 
 the following errors occur on System.out:
 01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: 
 org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
 01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: 
 org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
 01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: 
 org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
 01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: 
 org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6516:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 hbck cannot detect any IOException while .tableinfo file is missing
 -

 Key: HBASE-6516
 URL: https://issues.apache.org/jira/browse/HBASE-6516
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.94.0, 0.95.2
Reporter: Jie Huang
Assignee: Jie Huang
 Fix For: 0.94.2

 Attachments: hbase-6516-94-v5.patch, hbase-6516.patch, 
 hbase-6516-v2.patch, hbase-6516-v3.patch, hbase-6516-v4.patch, 
 hbase-6516-v5a.patch, hbase-6516-v5.patch


 HBaseFsck checks for missing .tableinfo files in the loadHdfsRegionInfos() 
 function. However, no IOException will be caught while .tableinfo is 
 missing, since FSTableDescriptors.getTableDescriptor doesn't throw any 
 IOException.
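 A hedged sketch of the kind of explicit check hbck could add (the reporter call 
 and method signature are assumed, not the committed patch):
 {code}
 HTableDescriptor htd = FSTableDescriptors.getTableDescriptor(fs, hbaseRoot, tableName);
 if (htd == null) {
   // getTableDescriptor returns nothing instead of throwing,
   // so flag the missing .tableinfo explicitly.
   errors.reportError("No .tableinfo file found for table " + tableName);
 }
 {code}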

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6525) bin/replication/copy_tables_desc.rb references non-existent class

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6525:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 bin/replication/copy_tables_desc.rb references non-existent class
 -

 Key: HBASE-6525
 URL: https://issues.apache.org/jira/browse/HBASE-6525
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.95.2
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Trivial
  Labels: noob
 Fix For: 0.94.2

 Attachments: HBASE-6525.patch


 $ hbase org.jruby.Main copy_tables_desc.rb
 NameError: cannot load Java class 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper
   get_proxy_or_package_under_package at 
 org/jruby/javasupport/JavaUtilities.java:54
   method_missing at 
 file:/mnt/data/hbase/lib/jruby-complete-1.6.5.jar!/builtin/javasupport/java.rb:51
   (root) at copy_tables_desc.rb:35
 Removing the line that references the non-existent class seems to make the 
 script work without any visible side-effects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6520) MSLab May cause the Bytes.toLong not work correctly for increment

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6520:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 MSLab May cause the Bytes.toLong not work correctly for increment
 -

 Key: HBASE-6520
 URL: https://issues.apache.org/jira/browse/HBASE-6520
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Fix For: 0.94.2

 Attachments: HBASE-6520-0.94-v1.patch, HBASE-6520-trunk-v1.patch


 When MemStoreLAB is in use, the KeyValues share the byte array allocated by 
 the MemStoreLAB; all the KeyValues' bytes attributes point at the same byte 
 array. Consider functions such as Bytes.toLong(byte[] bytes, int offset):
 {code}
   public static long toLong(byte[] bytes, int offset) {
     return toLong(bytes, offset, SIZEOF_LONG);
   }
   public static long toLong(byte[] bytes, int offset, final int length) {
     if (length != SIZEOF_LONG || offset + length > bytes.length) {
       throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_LONG);
     }
     long l = 0;
     for (int i = offset; i < offset + length; i++) {
       l <<= 8;
       l ^= bytes[i] & 0xFF;
     }
     return l;
   }
 {code}
 If we do not put a long value into the KeyValue, but read it as a long value 
 in HRegion.increment(), the check 
 {code}
 offset + length > bytes.length
 {code}
 takes no effect, because bytes.length is not equal to 
 keyLength + valueLength; it is in fact the MemStoreLAB chunkSize, which 
 defaults to 2048 * 1024.
 I will paste the patch later.
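 To illustrate why the guard misses (numbers hypothetical):
 {code}
 byte[] chunk = new byte[2048 * 1024];  // one shared MemStoreLAB chunk
 int valueOffset = 100;                 // a 4-byte int value stored at this offset
 // The guard offset + 8 > bytes.length passes because chunk.length is huge,
 // so toLong reads 8 bytes even though the stored value is only 4 bytes long.
 long bogus = Bytes.toLong(chunk, valueOffset);
 {code}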

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6529:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 With HFile v2, the region server will always perform an extra copy of source 
 files
 --

 Key: HBASE-6529
 URL: https://issues.apache.org/jira/browse/HBASE-6529
 Project: HBase
  Issue Type: Bug
  Components: Performance, regionserver
Affects Versions: 0.94.0, 0.95.2
Reporter: Jason Dai
Assignee: Jie Huang
 Fix For: 0.94.2

 Attachments: hbase-6529.094.txt, hbase-6529.diff


 With HFile v2 implementation in HBase 0.94 & 0.96, the region server will use 
 HFileSystem as its {color:blue}fs{color}. When it performs bulk load in 
 Store.bulkLoadHFile(), it checks if its {color:blue}fs{color} is the same as 
 {color:blue}srcFs{color}, which however will be DistributedFileSystem. 
 Consequently, it will always perform an extra copy of source files.
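 A hedged sketch of a possible fix (an assumption, not the attached patch): 
 unwrap the HFileSystem before the same-filesystem check in 
 Store.bulkLoadHFile(), so files already on DFS are renamed rather than copied:
 {code}
 // Compare the backing filesystem, not the HFileSystem wrapper; getBackingFs()
 // returns the wrapped DistributedFileSystem in this scenario.
 FileSystem desFs = fs instanceof HFileSystem ? ((HFileSystem) fs).getBackingFs() : fs;
 if (!srcFs.getUri().equals(desFs.getUri())) {
   // filesystems genuinely differ: copy the source file before the move
 }
 {code}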

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6552:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 TestAcidGuarantees system test should flush more aggressively
 -

 Key: HBASE-6552
 URL: https://issues.apache.org/jira/browse/HBASE-6552
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.2, 0.94.1, 0.95.2
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.94.2

 Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch


 HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
 the call to util.flush().
 It would be better to go through the HBaseAdmin interface to force flushes.  
 This would unify the code path between the unit test and the system test, as 
 well as forcing more frequent flushes, which have previously been the source 
 of ACID guarantee problems, e.g. HBASE-2856.
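 A minimal sketch of that approach, assuming the 0.94 HBaseAdmin API:
 {code}
 // Force a flush through the public admin interface so the unit test and the
 // system test exercise the same code path.
 HBaseAdmin admin = new HBaseAdmin(conf);
 admin.flush(TABLE_NAME);   // asynchronous flush request handled by the RS
 {code}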

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an endless loop

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6561:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Gets/Puts with many columns send the RegionServer into an endless loop
 

 Key: HBASE-6561
 URL: https://issues.apache.org/jira/browse/HBASE-6561
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.2

 Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 
 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt


 This came from the mailing list:
 We were able to replicate this behavior in a pseudo-distributed hbase
 (hbase-0.94.1) environment. We wrote a test program that creates a test
 table MyTestTable and populates it with random rows, then it creates a
 row with 60,000 columns and repeatedly updates it. Each column has a 18
 byte qualifier and a 50 byte value. In our tests, when we ran the
 program, we usually never got beyond 15 updates before it would flush
 for a really long time. The rows that are being updated are about 4MB
 each (minus any hbase metadata).
 It doesn't seem like it's caused by GC. I turned on gc logging, and
 didn't see any long pauses. This is the gc log during the flush.
 http://pastebin.com/vJKKXDx5
 This is the regionserver log with debug on during the same flush
 http://pastebin.com/Fh5213mg
 This is the test program we wrote.
 http://pastebin.com/aZ0k5tx2
 You should be able to just compile it, and run it against a running
 HBase cluster.
 $ java TestTable
 Carlos
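 In case the pastebin links rot, a minimal sketch of the wide-row update 
 described above (table, family, and row names are assumptions; the column 
 sizes match the report):
 {code}
 HTable table = new HTable(conf, "MyTestTable");
 byte[] family = Bytes.toBytes("f");              // assumed family name
 byte[] value = new byte[50];                     // 50-byte value
 Put put = new Put(Bytes.toBytes("wide-row"));    // assumed row key
 for (int i = 0; i < 60000; i++) {
   // 18-byte qualifier: "qualifier-" plus a zero-padded counter
   put.add(family, Bytes.toBytes(String.format("qualifier-%08d", i)), value);
 }
 table.put(put);   // repeating this update triggered the very long flushes
 {code}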

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6565) Coprocessor exec result Map is not thread safe

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6565:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Coprocessor exec result Map is not thread safe
 --

 Key: HBASE-6565
 URL: https://issues.apache.org/jira/browse/HBASE-6565
 Project: HBase
  Issue Type: Bug
  Components: Client, Coprocessors
Affects Versions: 0.92.2, 0.94.0, 0.95.2
 Environment: hadoop1.0.2,hbase0.94,jdk1.6
Reporter: Yuan Kang
Assignee: Yuan Kang
  Labels: coprocessors, patch
 Fix For: 0.94.2

 Attachments: Coprocessor-result-thread unsafe-bug-fix.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I developed a coprocessor program, but found different results in repeated 
 tests. For example, normally the result's size is 10, but sometimes it is 9.
 Reading the HTable.java code, I found a TreeMap (which is not thread-safe) 
 being used in a multithreaded environment. That is what causes the bug.
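 A sketch of the fix shape (my assumption; the attached patch may differ): 
 gather the per-region exec results into a synchronized map, since the 
 callbacks run on multiple worker threads:
 {code}
 // R is the result type parameter of coprocessorExec(); a bare TreeMap here
 // can lose entries under concurrent put()s.
 final Map<byte[], R> results = Collections.synchronizedMap(
     new TreeMap<byte[], R>(Bytes.BYTES_COMPARATOR));
 {code}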

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6576) HBaseAdmin.createTable should wait until the table is enabled

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6576:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 HBaseAdmin.createTable should wait until the table is enabled
 -

 Key: HBASE-6576
 URL: https://issues.apache.org/jira/browse/HBASE-6576
 Project: HBase
  Issue Type: Bug
  Components: Client, test
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.94.2

 Attachments: HBASE-6576-92.patch, HBASE-6576-94.patch, 
 HBASE-6576-trunk.patch


 The function:
 {code}
 public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
 {code}
 in HBaseAdmin is synchronous and returns once all the regions of the table 
 are online, but does not wait for the table to be enabled, which is the last 
 step of table creation (see CreateTableHandler).
 This is confusing and leads to racy code because users do not realize that 
 this is the case.  For example, I saw the following test failure in 0.92 when 
 I ran:
 mvn test 
 -Dtest=org.apache.hadoop.hbase.client.TestAdmin#testEnableDisableAddColumnDeleteColumn
  
 {code}
 Error Message
 org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin at 
 org.apache.hadoop.hbase.master.handler.DisableTableHandler.<init>(DisableTableHandler.java:75)
  at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1154) at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597) at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
 Stacktrace
 org.apache.hadoop.hbase.TableNotEnabledException: 
 org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin
 at 
 org.apache.hadoop.hbase.master.handler.DisableTableHandler.<init>(DisableTableHandler.java:75)
 at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1154)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
 {code}
 The issue is that code will create a table and immediately disable it in 
 order to do some testing, for example, to test an operation that only works 
 when the table is disabled.  If the table has not been enabled yet, they will 
 get back a TableNotEnabledException.
 The specific test above was fixed in HBASE-5206, but other examples exist in 
 the code, for example the following:
 {code}
 hbase org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat newtable asdf14
 {code}
 The code in question is:
 {code}
 byte[] tname = args[1].getBytes();
 HTable table = util.createTable(tname, FAMILIES);
 HBaseAdmin admin = new HBaseAdmin(conf);
 admin.disableTable(tname);
 {code}
 It would be better if createTable just waited until the table was enabled, or 
 threw a TableNotEnabledException if it exhausted the configured number of 
 retries.
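 Until then, a caller-side workaround sketch (polling interval and error 
 handling are assumptions):
 {code}
 HBaseAdmin admin = new HBaseAdmin(conf);
 admin.createTable(desc, splitKeys);
 // createTable returns once regions are online, but enabling may still be in
 // flight; wait for it before disabling.
 while (!admin.isTableEnabled(desc.getName())) {
   Thread.sleep(100);
 }
 admin.disableTable(desc.getName());
 {code}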

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6579) Unnecessary KV order check in StoreScanner

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6579:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Unnecessary KV order check in StoreScanner
 --

 Key: HBASE-6579
 URL: https://issues.apache.org/jira/browse/HBASE-6579
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.2

 Attachments: 6579.txt


 In StoreScanner.next(List<KeyValue>, int, String) I find this code:
 {code}
   // Check that the heap gives us KVs in an increasing order.
   if (prevKV != null && comparator != null
       && comparator.compare(prevKV, kv) > 0) {
     throw new IOException("Key " + prevKV + " followed by a " +
         "smaller key " + kv + " in cf " + store);
   }
   prevKV = kv;
 {code}
 So this checks for bugs in the HFiles or the scanner code. It needs to 
 compare each KV with its predecessor. This seems unnecessary now; I propose 
 that we remove this.
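 If outright removal seems too aggressive, one middle-ground sketch (an 
 alternative I am suggesting, not what this issue proposes) keeps the check as 
 an assertion, so it still runs under -ea in tests but costs nothing in 
 production:
 {code}
 assert prevKV == null || comparator == null
     || comparator.compare(prevKV, kv) <= 0 : "Key " + prevKV
     + " followed by a smaller key " + kv + " in cf " + store;
 prevKV = kv;
 {code}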

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6587:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Region would be assigned twice in the case of all RS offline
 

 Key: HBASE-6587
 URL: https://issues.apache.org/jira/browse/HBASE-6587
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.1
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.94.2

 Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch


 In the TimeoutMonitor, we would act on timeout for the regions if 
 (this.allRegionServersOffline && !noRSAvailable).
 The code is as follows:
 {code}
  if (regionState.getStamp() + timeout <= now ||
      (this.allRegionServersOffline && !noRSAvailable)) {
    // decide on action upon timeout or, if some RSs just came back
    // online, we can start the assignment
    actOnTimeOut(regionState);
  }
 {code}
 But we found a bug: it would act on timeout for a region that was assigned 
 just now, causing the region to be assigned twice.
 Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
 {code}
 2012-08-14 20:42:54,367 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan 
 to assign .META.,,1.1028785192 state=OFFLINE, ts=1
 344948174367, server=null
 2012-08-14 20:44:31,640 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for writete
 st,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so 
 generated a random one; 
 hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be13
 53b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, 
 available=1) available servers
 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x438f53bbf9b0acd Creating (or updating) unassigned node for 
 277b9b6df6de2b9be13
 53b4fa25f4222 with OFFLINE state
 2012-08-14 20:44:31,643 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa
 25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
 2012-08-14 20:44:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, 
 region=277b9b6df6de2b9be1353b4fa25f4222
 // abnormal timeout
 2012-08-14 20:44:32,518 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df
 6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, 
 server=dw92.kgb.sqa.cm4,60020,1344948267642
 2012-08-14 20:44:32,518 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for 
 too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,
 1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
 {code}
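 A heavily hedged sketch of one possible fix direction (my assumption; the 
 attached patches may take a different approach): only take the 
 all-servers-were-offline branch for regions whose state predates the servers 
 coming back, so a region assigned moments ago is left alone:
 {code}
 // serversCameBackAt is an assumed field recording when RSs rejoined.
 boolean waitingSinceBeforeServersReturned = regionState.getStamp() < serversCameBackAt;
 if (regionState.getStamp() + timeout <= now ||
     (this.allRegionServersOffline && !noRSAvailable
      && waitingSinceBeforeServersReturned)) {
   actOnTimeOut(regionState);
 }
 {code}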

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6603) RegionMetricsStorage.incrNumericMetric is called too often

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6603:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 RegionMetricsStorage.incrNumericMetric is called too often
 --

 Key: HBASE-6603
 URL: https://issues.apache.org/jira/browse/HBASE-6603
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl
Assignee: M. Chen
 Fix For: 0.94.2

 Attachments: 6503-0.96.txt, 6603-0.94.txt


 Running an HBase scan load through the profiler revealed that 
 RegionMetricsStorage.incrNumericMetric is called way too often.
 It turns out that we make this call for *each* KV in StoreScanner.next(...).
 Incrementing an AtomicLong requires expensive memory barriers.
 The observation here is that StoreScanner.next(...) can maintain a simple 
 long in its internal loop and only update the metric upon exit. Thus the 
 AtomicLong is not updated nearly as often.
 That cuts about 10% of the runtime from a scan-only load (I'll quantify this better 
 soon).
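 A minimal sketch of that change (assumed shape, not necessarily the attached 
 patch):
 {code}
 // Inside StoreScanner.next(List<KeyValue>, int, String); field names assumed.
 long cumulativeMetric = 0;
 try {
   // ... per-KV loop: accumulate locally instead of hitting the AtomicLong ...
   // cumulativeMetric += kv.getLength();
 } finally {
   // Publish once on exit: one memory barrier instead of one per KV.
   if (cumulativeMetric > 0 && metricNamePrefix != null) {
     RegionMetricsStorage.incrNumericMetric(metricNamePrefix, cumulativeMetric);
   }
 }
 {code}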

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6608) Fix for HBASE-6160, META entries from daughters can be deleted before parent entries, shouldn't compare HRegionInfo's

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6608:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Fix for HBASE-6160, META entries from daughters can be deleted before parent 
 entries, shouldn't compare HRegionInfo's
 -

 Key: HBASE-6608
 URL: https://issues.apache.org/jira/browse/HBASE-6608
 Project: HBase
  Issue Type: Bug
  Components: Client, regionserver
Affects Versions: 0.92.1, 0.94.2, 0.95.2
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.2

 Attachments: 6608-v2.patch, hbase-6608_v1-0.92+0.94.patch, 
 hbase-6608_v1.patch


 Our nightlies discovered that the patch for HBASE-6160 did not actually fix 
 the issue of META entries from daughters can be deleted before parent 
 entries. Instead of reopening the HBASE-6160, it is cleaner to track it 
 here. 
 The original issue is: 
 {quote}
 HBASE-5986 fixed and issue, where the client sees the META entry for the 
 parent, but not the children. However, after the fix, we have seen the 
 following issue in tests:
 Region A is split to - B, C
 Region B is split to - D, E
 After some time, META entry for B is deleted since it is not needed anymore, 
 but META entry for Region A stays in META (C still refers it). In this case, 
 the client throws RegionOfflineException for B.
 {quote}
 The problem with the fix seems to be that we keep and compare HRegionInfo's 
 in the HashSet at CatalogJanitor.java#scan(), but the HRIs that are compared 
 are not equal.  
 {code}
 HashSet<HRegionInfo> parentNotCleaned = new HashSet<HRegionInfo>(); // regions whose parents are still around
   for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
     if (!parentNotCleaned.contains(e.getKey()) && cleanParent(e.getKey(), e.getValue())) {
       cleaned++;
     } else {
 ...
 {code}
 In the above case, Meta row for region A will contain a serialized version of 
 B that is not offline. However Meta row for region B will contain a 
 serialized version of B that is offline (MetaEditor.offlineParentInMeta() 
 does that). So the deserialized version we put into the HashSet and the 
 deserialized version we query with contains() from the HashSet differ in the 
 offline field, thus HRI.equals() fails. 
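 A sketch of one fix direction (an assumption; see the attached patches for 
 the actual change): key the set by encoded region name, which is stable 
 across the offline flag, instead of relying on HRegionInfo equality:
 {code}
 HashSet<String> parentNotCleaned = new HashSet<String>();
 for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
   if (!parentNotCleaned.contains(e.getKey().getEncodedName())
       && cleanParent(e.getKey(), e.getValue())) {
     cleaned++;
   } else {
     // remember the daughters' encoded names so their parents are skipped
   }
 }
 {code}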

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6621) Reduce calls to Bytes.toInt

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6621:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Reduce calls to Bytes.toInt
 ---

 Key: HBASE-6621
 URL: https://issues.apache.org/jira/browse/HBASE-6621
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.2

 Attachments: 6621-0.96.txt, 6621-0.96-v2.txt, 6621-0.96-v3.txt, 
 6621-0.96-v4.txt


 Bytes.toInt shows up quite often in a profiler run.
 It turns out that one source is HFileReaderV2$ScannerV2.getKeyValue().
 Notice that we call the KeyValue(byte[], int) constructor, which forces the 
 constructor to determine the KeyValue's size by reading some of the header 
 information. In this case, however, we already know the size (from the call 
 to readKeyValueLen), so we could just use that.
 In the extreme case of 1000's of columns this noticeably reduces CPU. 
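 A sketch of the optimization (identifier names assumed from HFileReaderV2; 
 not necessarily the committed patch): pass the length we already computed so 
 the KeyValue constructor does not re-read the header:
 {code}
 // before: the constructor re-derives the size from the header
 // KeyValue kv = new KeyValue(blockBuffer.array(), offset);
 // after: reuse the lengths that readKeyValueLen() already produced
 KeyValue kv = new KeyValue(blockBuffer.array(), offset,
     KeyValue.KEYVALUE_INFRASTRUCTURE_SIZE + currKeyLen + currValueLen);
 {code}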

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6647) [performance regression] appendNoSync/HBASE-4528 doesn't take deferred log flush into account

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6647:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 [performance regression] appendNoSync/HBASE-4528 doesn't take deferred log 
 flush into account
 -

 Key: HBASE-6647
 URL: https://issues.apache.org/jira/browse/HBASE-6647
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.2

 Attachments: HBASE-6647-0.94.patch, HBASE-6647-0.94-v2.patch, 
 HBASE-6647-0.94-v3.patch, HBASE-6647.pach


 Since we upgraded to 0.94.1 from 0.92 I saw that our ICVs are about twice as 
 slow as they were. jstack'ing I saw that most of the time we are waiting on 
 sync()... but those tables have deferred log flush turned on so they 
 shouldn't even be calling it.
 HTD.isDeferredLogFlush is currently only called in the append() methods which 
 are pretty much not in use anymore.
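 A hedged sketch of the fix direction (assumed shape, not the attached patch): 
 consult the table's deferred-log-flush setting on the appendNoSync() path and 
 skip the blocking sync when deferral was requested:
 {code}
 long txid = this.log.appendNoSync(info, tableName, walEdit, clusterId, now, htd);
 // Only pay for sync() when the table did not opt into deferred flush; the
 // HLog's background flusher picks up deferred edits instead.
 if (!htd.isDeferredLogFlush()) {
   this.log.sync(txid);
 }
 {code}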

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6649:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
 ---

 Key: HBASE-6649
 URL: https://issues.apache.org/jira/browse/HBASE-6649
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Blocker
 Fix For: 0.94.2

 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
 6649-fix-io-exception-handling-1.patch, 
 6649-fix-io-exception-handling-1-trunk.patch, 
 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 
 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 
 #502 test - queueFailover [Jenkins].html


 Have seen it twice in the recent past: http://bit.ly/MPCykB & 
 http://bit.ly/O79Dq7 .. 
 Looking briefly at the logs hints at a pattern - in both the failed test 
 instances, there was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6671) Kerberos authenticated super user should be able to retrieve proxied delegation tokens

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6671:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Kerberos authenticated super user should be able to retrieve proxied 
 delegation tokens
 --

 Key: HBASE-6671
 URL: https://issues.apache.org/jira/browse/HBASE-6671
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.1
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.94.2

 Attachments: 6671-trunk-v2.txt, proxy_fix_94.patch, 
 proxy_fix_94.patch, proxy_fix_trunk.patch


 There are services, such as Oozie, which perform actions on behalf of the 
 user using proxy authentication. Retrieving delegation tokens should support 
 this behavior. 
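 A sketch of the intended usage (a hypothetical caller, not the attached 
 patch): the Kerberos-authenticated super user obtains an HBase delegation 
 token on behalf of the proxied user:
 {code}
 // conf is the client Configuration; "workflow-user" is a made-up proxied user.
 UserGroupInformation proxied = UserGroupInformation.createProxyUser(
     "workflow-user", UserGroupInformation.getLoginUser());
 Token<AuthenticationTokenIdentifier> token =
     proxied.doAs(new PrivilegedExceptionAction<Token<AuthenticationTokenIdentifier>>() {
       public Token<AuthenticationTokenIdentifier> run() throws Exception {
         return TokenUtil.obtainToken(conf);   // HBase security token utility
       }
     });
 {code}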

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6679) RegionServer aborts due to race between compaction and split

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6679:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 RegionServer aborts due to race between compaction and split
 

 Key: HBASE-6679
 URL: https://issues.apache.org/jira/browse/HBASE-6679
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.94.2

 Attachments: 6679-1.094.patch, 6679-1.patch, 
 rs-crash-parallel-compact-split.log


 In our nightlies, we have seen RS aborts due to compaction and split racing. 
 Original parent file gets deleted after the compaction, and hence, the 
 daughters don't find the parent data file. The RS kills itself when this 
 happens. Will attach a snippet of the relevant RS logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6685) Thrift DemoClient.pl got NullPointerException

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6685:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 Thrift DemoClient.pl got NullPointerException
 -

 Key: HBASE-6685
 URL: https://issues.apache.org/jira/browse/HBASE-6685
 Project: HBase
  Issue Type: Bug
  Components: Client, Thrift
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.94.2

 Attachments: trunk-6685.patch


 expected error: TSocket: Could not read 4 bytes from localhost:9090
 12/06/25 13:48:21 ERROR server.TThreadPoolServer: Error occurred during 
 processing of message. 
 java.lang.NullPointerException 
 at org.apache.hadoop.hbase.util.Bytes.getBytes(Bytes.java:765) 
 at 
 org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRowTs(ThriftServer.java:591)
  
 at 
 org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRow(ThriftServer.java:576)
  
 at 
 org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:3630)
  
 at 
 org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:3618)
  
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) 
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) 
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
  
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  
 at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6686) HFile Quarantine fails with missing dirs in hadoop 2.0

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6686:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 HFile Quarantine fails with missing dirs in hadoop 2.0 
 ---

 Key: HBASE-6686
 URL: https://issues.apache.org/jira/browse/HBASE-6686
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.92.2, 0.94.2, 0.95.2
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.94.2

 Attachments: hbase-6686-94-92.patch


 Two unit tests fail because listStatus's semantics changed between hadoop 1.0 
 and hadoop 2.0 (specifically -- hadoop 1.0 returns an empty array if used on 
 a dir that does not exist, but hadoop 2.0 throws FileNotFoundException).
 here's the exception:
 {code}
 2012-08-28 16:01:19,789 WARN  [Thread-3155] hbck.HFileCorruptionChecker(230): 
 Failed to quaratine an HFile in regiondir 
 hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669
 java.io.FileNotFoundException: File 
 hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669/fam
  does not exist.
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:406)
   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1341)
   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1381)
   at 
 org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:152)
   at 
 org.apache.hadoop.hbase.util.TestHBaseFsck$2$1.checkColFamDir(TestHBaseFsck.java:1401)
   at 
 org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:185)
   at 
 org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:267)
   at 
 org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:258)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
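 A compatibility-shim sketch (an assumption about the fix shape): restore the 
 hadoop 1.0 behavior on both versions by mapping FileNotFoundException to an 
 empty listing:
 {code}
 public static FileStatus[] listStatusSafe(FileSystem fs, Path dir) throws IOException {
   try {
     FileStatus[] status = fs.listStatus(dir);
     return status == null ? new FileStatus[0] : status;
   } catch (FileNotFoundException fnfe) {
     return new FileStatus[0];   // hadoop 2.0 throws where 1.0 returned empty
   }
 }
 {code}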

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6688) folder referred by thrift demo app instructions is outdated

2013-04-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6688:
-

Fix Version/s: (was: 0.95.0)
   0.94.2

Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by 
Lars Hofhansl)

 folder referred by thrift demo app instructions is outdated
 ---

 Key: HBASE-6688
 URL: https://issues.apache.org/jira/browse/HBASE-6688
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.2
Reporter: Jimmy Xiang
Assignee: stack
Priority: Minor
 Fix For: 0.94.2

 Attachments: thrift094.txt, thrift.txt


 Due to the source tree module change for 0.96, the instructions in the thrift 
 demo example don't match the folder structure any more.
 In the instructions, it refers to:
 ../../../src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
 it should be
 ../../hbase-server/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

