[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Status: Open  (was: Patch Available)

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
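
The ordering problem described above can be sketched in isolation. This is a minimal, hypothetical Java model, not HBase code: `RitModel`, `startOpen`, `handleOpenedAsReported`, `handleOpenedReordered`, and `onNodeDeleted` are names invented for illustration, and the ZooKeeper watcher is simulated as a synchronous callback so that the "notification arrives before the region leaves RIT" interleaving is deterministic. It only shows that the warning depends on the order of the two steps. It does not claim to reflect what the attached patches do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the master's bookkeeping; not HBase code. */
class RitModel {
  /** Regions currently in transition (stand-in for RegionsInTransition). */
  final Map<String, String> regionsInTransition = new HashMap<>();
  /** Simulated znodes, keyed by region name. */
  final Set<String> znodes = new HashSet<>();
  /** Warnings that the delete notification would have logged. */
  final List<String> warnings = new ArrayList<>();

  /** A region starts opening: it gets a znode and an entry in RIT. */
  void startOpen(String region) {
    znodes.add(region);
    regionsInTransition.put(region, "OPEN");
  }

  /** Simulated ZK delete; the watcher fires immediately (worst-case timing). */
  void deleteZnode(String region) {
    if (znodes.remove(region)) {
      onNodeDeleted(region);
    }
  }

  /** Stand-in for the handler of the delete notification. */
  void onNodeDeleted(String region) {
    String state = regionsInTransition.get(region);
    if (state != null) {
      warnings.add("Node deleted but still in RIT: " + region + " state=" + state);
    }
  }

  /** Order described in the report: delete the znode, then clear RIT. */
  void handleOpenedAsReported(String region) {
    deleteZnode(region);
    regionsInTransition.remove(region);
  }

  /** One possible reordering: clear RIT first, then delete the znode. */
  void handleOpenedReordered(String region) {
    regionsInTransition.remove(region);
    deleteZnode(region);
  }

  public static void main(String[] args) {
    RitModel reported = new RitModel();
    reported.startOpen("region-a");
    reported.handleOpenedAsReported("region-a");
    System.out.println("as reported: " + reported.warnings);

    RitModel reordered = new RitModel();
    reordered.startOpen("region-a");
    reordered.handleOpenedReordered("region-a");
    System.out.println("reordered:   " + reordered.warnings);
  }
}
```

In a real master the reordering has its own consequences (for example, what happens if the znode delete then fails), so this is an illustration of the race rather than a recommended fix.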

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Status: Patch Available  (was: Open)





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Attachment: HBASE-4308_2.patch

Updated patch addressing Stack's comments.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155762#comment-13155762
 ] 

stack commented on HBASE-4308:
--

Is this check the wrong way round Ram?

{code}
+    if (!openedNodeDeleted) {
+      if (this.assignmentManager.getZKTable().isDisablingOrDisabledTable(
+          regionInfo.getTableNameAsString())) {
+        debugLog(regionInfo, "Opened region "
+            + regionInfo.getRegionNameAsString() + " but "
+            + "this table is disabled, triggering close of region");
+        assignmentManager.unassign(regionInfo);
+      }
     }
{code}

If we failed to delete the znode, only then do you check whether the table is 
disabled?  Won't openedNodeDeleted be true if all goes well, and isn't that 
when you want to check whether the region belongs to a disabling table?

It looks like the old code checked for table disabling whether or not the 
znode delete succeeded.

Otherwise, I'm +1 on this patch (you can do the fixup if I'm right and go ahead 
and commit).







[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Attachment: 4853-v4.txt

Working patch.  Not done yet.  Also has unit test to show hole (an edit is 
getting in and its seqid is sticking around in lastSeqid for the region w/o 
being cleared).

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.
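
To make concrete why an over-pruned lastSeqWritten map matters, here is a simplified, hypothetical Java model of how the oldest outstanding sequence id can gate WAL removal. It is a sketch, not the HLog implementation: `WalCleanupSketch`, `oldestOutstandingSeqNum`, and `logsSafeToRemove` are illustrative names. It assumes each WAL file is keyed by the highest seqid it contains, that the map holds the oldest unflushed seqid per region, and that a file is removable only when its key is below the minimum value in the map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical, simplified model of seqid-gated WAL cleanup; not HLog. */
class WalCleanupSketch {
  /** Smallest unflushed seqid across all regions, or null if none is tracked. */
  static Long oldestOutstandingSeqNum(Map<String, Long> lastSeqWritten) {
    return lastSeqWritten.isEmpty() ? null : Collections.min(lastSeqWritten.values());
  }

  /**
   * Returns the WAL files that look safe to remove.
   * walFiles maps the highest seqid in a file to the file name (assumption).
   */
  static List<String> logsSafeToRemove(SortedMap<Long, String> walFiles,
      Map<String, Long> lastSeqWritten) {
    Long oldest = oldestOutstandingSeqNum(lastSeqWritten);
    if (oldest == null) {
      // Nothing is tracked as unflushed, so every file looks removable.
      return new ArrayList<>(walFiles.values());
    }
    // headMap excludes the bound: only files strictly older than 'oldest'.
    return new ArrayList<>(walFiles.headMap(oldest).values());
  }

  public static void main(String[] args) {
    SortedMap<Long, String> wal = new TreeMap<>();
    wal.put(10L, "wal-a");
    wal.put(20L, "wal-b");
    wal.put(30L, "wal-c");

    Map<String, Long> lastSeqWritten = new HashMap<>();
    lastSeqWritten.put("r1", 25L);
    lastSeqWritten.put("r2", 40L);
    System.out.println(logsSafeToRemove(wal, lastSeqWritten));

    // If r1's entry is pruned while r1 still has unflushed edits,
    // wal-c becomes removable even though it holds those edits.
    lastSeqWritten.remove("r1");
    System.out.println(logsSafeToRemove(wal, lastSeqWritten));
  }
}
```

In this model, dropping an entry too early raises the minimum and makes files removable while they still hold unflushed edits. An empty map makes every file removable, which is the situation the 'Deleting all old hlogs' message points at.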





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155771#comment-13155771
 ] 

Hadoop QA commented on HBASE-4853:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504860/4853-v4.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/346//console

This message is automatically generated.





[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155783#comment-13155783
 ] 

Hudson commented on HBASE-4851:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4851 hadoop maven dependency needs to be an optional one

stack : 
Files : 
* /hbase/trunk/pom.xml


 hadoop maven dependency needs to be an optional one
 ---

 Key: HBASE-4851
 URL: https://issues.apache.org/jira/browse/HBASE-4851
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0, 0.94.0, 0.92.1
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4851.92.patch.txt, HBASE-4851.trunk.patch.txt


 Given that HBase 0.92/0.94 is likely to be used with at least 3 different 
 versions of Hadoop (0.20, 0.22 and 0.23) it seems appropriate to make hadoop 
 maven dependencies into optional ones (IOW, the build of HBase will see NO 
 changes in behavior, but any component that has HBase as a dependency will be 
 in control of what version of Hadoop gets used).





[jira] [Commented] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155785#comment-13155785
 ] 

Hudson commented on HBASE-4825:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4825 TestRegionServersMetrics and TestZKLeaderManager are not 
categorized (small/medium/large); ADDENDUM; PARTIAL REVERT; MISTAKENLY 
COMMITTED TestCatalogTracker change
HBASE-4825 TestRegionServersMetrics and TestZKLeaderManager are not categorized 
(small/medium/large)

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKLeaderManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java


 TestRegionServersMetrics and TestZKLeaderManager are not categorized 
 (small/medium/large)
 -

 Key: HBASE-4825
 URL: https://issues.apache.org/jira/browse/HBASE-4825
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.94.0

 Attachments: 4825_trunk_java.patch


 see title





[jira] [Commented] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155787#comment-13155787
 ] 

Hudson commented on HBASE-4849:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker can fail if an existing zookeeper running
 

 Key: HBASE-4849
 URL: https://issues.apache.org/jira/browse/HBASE-4849
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4849.txt


 This fact sank my attempt at building an RC.  Fix.





[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155784#comment-13155784
 ] 

Hudson commented on HBASE-4854:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase 
behaviour

stack : 
Files : 
* /hbase/trunk/bin/hbase


 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
 --

 Key: HBASE-4854
 URL: https://issues.apache.org/jira/browse/HBASE-4854
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4854.patch.txt


 It looks like HBASE-3465 introduced a slight change in behavior. The ordering 
 of classpath elements puts the Hadoop ones before the HBase ones, which leads 
 to log4j properties being picked up from the wrong place, etc. It seems that 
 the easiest way to fix that would be to revert the ordering of the classpath.





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155786#comment-13155786
 ] 

Hudson commented on HBASE-4842:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The assignment in ZK sometimes points to the same RS, 
 but sometimes it moves to another RS.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155788#comment-13155788
 ] 

Hudson commented on HBASE-4797:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4797 [availability] Skip recovered.edits files with edits we know 
older than what region currently has (Jimmy Jiang)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.
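
The idea in the issue title, skipping recovered.edits files whose edits are all older than what the region already has, can be sketched as a simple filter. This is a hypothetical Java illustration, not the committed patch: `RecoveredEditsFilter`, `EditsFile`, `maxSeqId`, and `filesToReplay` are invented names. It assumes the highest sequence id in each file is known without reading the file, and that the region knows the highest sequence id it has already persisted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of skipping already-applied recovered.edits files. */
class RecoveredEditsFilter {
  /** Illustrative descriptor for one recovered.edits file. */
  static final class EditsFile {
    final String name;
    final long maxSeqId; // highest sequence id contained in the file (assumed known)

    EditsFile(String name, long maxSeqId) {
      this.name = name;
      this.maxSeqId = maxSeqId;
    }
  }

  /**
   * Keeps only files that may contain edits newer than regionSeqId,
   * the highest sequence id the region has already persisted.
   */
  static List<EditsFile> filesToReplay(List<EditsFile> files, long regionSeqId) {
    List<EditsFile> toReplay = new ArrayList<>();
    for (EditsFile f : files) {
      if (f.maxSeqId > regionSeqId) {
        toReplay.add(f);
      }
      // Otherwise every edit in the file is already reflected in the region.
    }
    return toReplay;
  }

  public static void main(String[] args) {
    List<EditsFile> files = Arrays.asList(
        new EditsFile("edits-1", 100L),
        new EditsFile("edits-2", 200L),
        new EditsFile("edits-3", 300L));
    for (EditsFile f : filesToReplay(files, 200L)) {
      System.out.println("replay " + f.name);
    }
  }
}
```

The filter only avoids work for files that are entirely stale. A file that straddles the region's sequence id still has to be replayed.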





[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155789#comment-13155789
 ] 

Hudson commented on HBASE-4848:
---

Integrated in HBase-TRUNK-security #6 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4848 TestScanner failing because hostname can't be null

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java


 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5

 Attachments: 4848-092.txt, 4848.txt








[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155792#comment-13155792
 ] 

ramkrishna.s.vasudevan commented on HBASE-4308:
---

@Stack

Thanks for your review
{code}
+  private void makeRegionOnline(RegionState rs, HRegionInfo regionInfo) {
+    regionOnline(regionInfo, rs.serverName);
+    LOG.info("The master has opened the region "
+        + regionInfo.getRegionNameAsString() + " that was online on "
+        + rs.serverName);
+    if (this.getZKTable().isDisablingOrDisabledTable(
+        regionInfo.getTableNameAsString())) {
+      debugLog(regionInfo, "Opened region "
+          + regionInfo.getRegionNameAsString() + " but "
+          + "this table is disabled, triggering close of region");
+      unassign(regionInfo);
+    }
+  }
{code}
I have not broken the logic of unassigning the region if the table is 
disabled.  In OpenedRegionHandler the same code is present even if deletion of 
the node fails.
In the same way, if the callback comes on successful deletion, this code is 
present there too.  Is it ok, Stack? I will commit after your confirmation :)







[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155795#comment-13155795
 ] 

Hadoop QA commented on HBASE-4308:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504855/HBASE-4308_2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/345//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/345//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/345//console

This message is automatically generated.





[jira] [Commented] (HBASE-4519) 25s sleep when expiring sessions in tests

2011-11-23 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155815#comment-13155815
 ] 

nkeywal commented on HBASE-4519:


Fixed in HBASE-4798. We now set a ZooKeeper timeout of 0.5 s, then we wait 
7 seconds. It works.

 25s sleep when expiring sessions in tests
 -

 Key: HBASE-4519
 URL: https://issues.apache.org/jira/browse/HBASE-4519
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: nkeywal
 Fix For: 0.92.0


 There's a hardcoded 25 seconds sleep in HBaseTestingUtility.expireSession: 
 {code}
 int sessionTimeout = 5 * 1000; // 5 seconds
 ...
 final long sleep = sessionTimeout * 5L;
 LOG.info("ZK Closed Session 0x" + Long.toHexString(sessionID) +
   "; sleeping=" + sleep);
 Thread.sleep(sleep);
 {code}
 I'm pretty sure this can be lowered a lot, and it would speed up a couple of 
 tests. The only thing I'm afraid of is if this was made to accommodate flaky 
 tests.
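
The 25-second figure follows directly from the two constants in the snippet above. A minimal Java sketch of that arithmetic, using a hypothetical helper (`ExpireSessionSleep.sleepMillis` is not part of HBaseTestingUtility), shows how the sleep scales with the session timeout when the multiplier stays fixed:

```java
/** Hypothetical helper that reproduces the sleep arithmetic; not HBase code. */
class ExpireSessionSleep {
  /** Sleep duration in milliseconds for a given session timeout and multiplier. */
  static long sleepMillis(int sessionTimeoutMillis, long multiplier) {
    return sessionTimeoutMillis * multiplier;
  }

  public static void main(String[] args) {
    // Values from the snippet: 5 second timeout, multiplied by 5.
    System.out.println("sleep ms: " + sleepMillis(5 * 1000, 5L));
    // A shorter timeout with the same multiplier gives a proportionally shorter sleep.
    System.out.println("sleep ms: " + sleepMillis(500, 5L));
  }
}
```

The multiplier of 5 is taken from the snippet. The 500 ms value is only an example input.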





[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155823#comment-13155823
 ] 

Hudson commented on HBASE-4851:
---

Integrated in HBase-0.92-security #8 (See 
[https://builds.apache.org/job/HBase-0.92-security/8/])
HBASE-4851 hadoop maven dependency needs to be an optional one

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml






[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155824#comment-13155824
 ] 

Hudson commented on HBASE-4854:
---

Integrated in HBase-0.92-security #8 (See 
[https://builds.apache.org/job/HBase-0.92-security/8/])
HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase 
behaviour

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/bin/hbase


 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
 --

 Key: HBASE-4854
 URL: https://issues.apache.org/jira/browse/HBASE-4854
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4854.patch.txt


 It looks like HBASE-3465 introduced a slight change in behavior. The ordering 
 of classpath elements makes Hadoop ones go before the HBase ones, which leads 
 to log4j properties being picked up from the wrong place, etc. It seems that the 
 easiest way to fix that would be to revert the ordering of the classpath.
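
The effect described above follows from how Java resolves resources: the first classpath entry that provides a given name wins. The sketch below is a toy model of that lookup, not the logic of bin/hbase; the class name, entry labels and resource contents are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of classpath lookup: the first entry that holds a resource wins. */
final class ClasspathOrderSketch {

    /** One hypothetical classpath entry: a label plus the resources it provides. */
    static final class Entry {
        final String label;
        final Map<String, String> resources = new LinkedHashMap<String, String>();

        Entry(String label, String resourceName, String content) {
            this.label = label;
            this.resources.put(resourceName, content);
        }
    }

    /** Returns the label of the first entry providing the resource, or null if none does. */
    static String resolve(List<Entry> classpath, String resourceName) {
        for (Entry e : classpath) {
            if (e.resources.containsKey(resourceName)) {
                return e.label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry hadoop = new Entry("hadoop/conf", "log4j.properties", "hadoop settings");
        Entry hbase = new Entry("hbase/conf", "log4j.properties", "hbase settings");
        // Hadoop entries first: the Hadoop copy of log4j.properties is picked up.
        System.out.println(resolve(Arrays.asList(hadoop, hbase), "log4j.properties"));
        // HBase entries first: the HBase copy is picked up.
        System.out.println(resolve(Arrays.asList(hbase, hadoop), "log4j.properties"));
    }
}
```

Swapping the order of the two entries is the only change needed to flip which file is found, which is why reverting the ordering is a plausible fix.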





[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155867#comment-13155867
 ] 

Hudson commented on HBASE-4851:
---

Integrated in HBase-TRUNK #2474 (See 
[https://builds.apache.org/job/HBase-TRUNK/2474/])
HBASE-4851 hadoop maven dependency needs to be an optional one

stack : 
Files : 
* /hbase/trunk/pom.xml


 hadoop maven dependency needs to be an optional one
 ---

 Key: HBASE-4851
 URL: https://issues.apache.org/jira/browse/HBASE-4851
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0, 0.94.0, 0.92.1
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4851.92.patch.txt, HBASE-4851.trunk.patch.txt


 Given that HBase 0.92/0.94 is likely to be used with at least 3 different 
 versions of Hadoop (0.20, 0.22 and 0.23) it seems appropriate to make hadoop 
 maven dependencies into optional ones (IOW, the build of HBase will see NO 
 changes in behavior, but any component that has HBase as a dependency will be 
 in control of what version of Hadoop gets used).





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155869#comment-13155869
 ] 

Hudson commented on HBASE-4797:
---

Integrated in HBase-TRUNK #2474 (See 
[https://builds.apache.org/job/HBase-TRUNK/2474/])
HBASE-4797 [availability] Skip recovered.edits files with edits we know 
older than what region currently has (Jimmy Jiang)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 While testing 0.92, I crashed all the servers.  Another bug keeps WALs from 
 getting cleaned, so I had 7000 regions to replay.  The distributed split 
 code did a nice job and the cluster came back, but interestingly some hot 
 regions ended up with loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file takes about a 
 second to process (though each seems to hold only about 30 edits).  The 
 region is unavailable during this time.
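
The idea in the issue title, skipping files whose edits are all older than what the region already has, can be sketched as a simple filter. This is a minimal illustration under the assumption that the highest sequence id in each file is known up front; EditsFile, maxSeqId and filesToReplay are hypothetical names, not the HRegion code touched by the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecoveredEditsFilterSketch {

    /** Hypothetical descriptor: a file name plus the highest sequence id it contains. */
    static final class EditsFile {
        final String name;
        final long maxSeqId;

        EditsFile(String name, long maxSeqId) {
            this.name = name;
            this.maxSeqId = maxSeqId;
        }
    }

    /** Keeps only the files that may hold edits newer than what the region already has. */
    static List<EditsFile> filesToReplay(List<EditsFile> files, long regionSeqId) {
        List<EditsFile> keep = new ArrayList<EditsFile>();
        for (EditsFile f : files) {
            if (f.maxSeqId > regionSeqId) {
                keep.add(f);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<EditsFile> files = Arrays.asList(
            new EditsFile("a", 100L), new EditsFile("b", 200L), new EditsFile("c", 300L));
        // With the region already at sequence id 200, only file "c" needs replaying.
        for (EditsFile f : filesToReplay(files, 200L)) {
            System.out.println(f.name);
        }
    }
}
```

With roughly a second per file, every skipped file shortens the window in which the region is unavailable.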





[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155868#comment-13155868
 ] 

Hudson commented on HBASE-4854:
---

Integrated in HBase-TRUNK #2474 (See 
[https://builds.apache.org/job/HBase-TRUNK/2474/])
HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase 
behaviour

stack : 
Files : 
* /hbase/trunk/bin/hbase


 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
 --

 Key: HBASE-4854
 URL: https://issues.apache.org/jira/browse/HBASE-4854
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4854.patch.txt


 It looks like HBASE-3465 introduced a slight change in behavior. The ordering 
 of classpath elements makes Hadoop ones go before the HBase ones, which leads 
 to log4j properties being picked up from the wrong place, etc. It seems that the 
 easiest way to fix that would be to revert the ordering of the classpath.





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-23 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155919#comment-13155919
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

Will dig in more tomorrow.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly.  I have not yet dug 
 in fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Created] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-23 Thread ramkrishna.s.vasudevan (Created) (JIRA)
SplitLogManager hangs on cluster restart. 
--

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Start a master and RS
RS goes down (kill -9)
Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is there 
it cannot be processed.
Restart both master and bring up an RS.
The master hangs in SplitLogManager.waitforTasks().

I feel that batch.done is not getting incremented properly.  I have not yet dug in 
fully.

This may be the reason for occasional failure of 
TestDistributedLogSplitting.testWorkerAbort(). 






[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-23 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155921#comment-13155921
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

{code}
java.lang.AssertionError
at org.apache.hadoop.hbase.master.SplitLogManager.heartbeat(SplitLogManager.java:466)
at org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:401)
at org.apache.hadoop.hbase.master.SplitLogManager.access$14(SplitLogManager.java:388)
at org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:914)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{code}

Sometimes on restart I also get this stack trace.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly.  I have not yet dug 
 in fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155933#comment-13155933
 ] 

stack commented on HBASE-4308:
--

+1 on commit.

I see now that the effect is the same.

In ORH, we'd run the disabling code regardless of whether we deleted the znode or 
not and whether the region was in RIT or not.  I see now that the disabling code 
will still work for all three possible conditions -- it's just that one of the 
handlings has been moved up into AM; only two are done in ORH now.

Good work Ram.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
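
The ordering problem in the description can be shown with a small model. This is a sketch only: it is not the committed patch, the class and method names are hypothetical, and the "node deleted" notification is simulated synchronously at the worst possible moment instead of arriving on a ZooKeeper event thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RitOrderingSketch {
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<String, String>();
    final List<String> warnings = new ArrayList<String>();

    /** Stand-in for the "node deleted" notification arriving at the assignment manager. */
    void nodeDeleted(String region) {
        String state = regionsInTransition.get(region);
        if (state != null) {
            warnings.add("Node deleted but still in RIT: " + region + " state=" + state);
        }
    }

    /** Order from the report: the znode is deleted first, RIT is cleared afterwards. */
    void handleOpenedDeleteFirst(String region) {
        nodeDeleted(region);                 // notification lands before RIT is cleared
        regionsInTransition.remove(region);
    }

    /** Alternative order: RIT is cleared first, so the notification finds nothing. */
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        nodeDeleted(region);
    }

    public static void main(String[] args) {
        RitOrderingSketch racy = new RitOrderingSketch();
        racy.regionsInTransition.put(".META.,,1", "OPEN");
        racy.handleOpenedDeleteFirst(".META.,,1");
        System.out.println(racy.warnings);   // one warning

        RitOrderingSketch ordered = new RitOrderingSketch();
        ordered.regionsInTransition.put(".META.,,1", "OPEN");
        ordered.handleOpenedRemoveFirst(".META.,,1");
        System.out.println(ordered.warnings); // no warnings
    }
}
```

Either way the region ends up out of RIT; only the delete-first order leaves a window in which the warning can be logged.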





[jira] [Resolved] (HBASE-4519) 25s sleep when expiring sessions in tests

2011-11-23 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4519.
--

   Resolution: Fixed
Fix Version/s: (was: 0.92.0)
   0.94.0

Fixed by HBASE-4798.

 25s sleep when expiring sessions in tests
 -

 Key: HBASE-4519
 URL: https://issues.apache.org/jira/browse/HBASE-4519
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: nkeywal
 Fix For: 0.94.0


 There's a hardcoded 25-second sleep in HBaseTestingUtility.expireSession: 
 {code}
 int sessionTimeout = 5 * 1000; // 5 seconds
 ...
 final long sleep = sessionTimeout * 5L;
 LOG.info("ZK Closed Session 0x" + Long.toHexString(sessionID) +
   "; sleeping=" + sleep);
 Thread.sleep(sleep);
 {code}
 I'm pretty sure this can be lowered a lot, and it would speed up a couple of 
 tests. The only thing I'm afraid of is that this was added to accommodate flaky 
 tests.
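
One common alternative to a fixed sleep is to poll for the condition the test actually cares about and give up at a deadline. The sketch below shows that pattern in general terms; it is not what HBaseTestingUtility does and not necessarily how the issue was resolved, and waitFor and its parameters are names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

final class WaitForSketch {

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs has elapsed.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.nanoTime();
        // The condition becomes true after about 50 ms, so the wait ends long before 25 s.
        boolean met = waitFor(25_000L, 10L, () -> System.nanoTime() - start > 50_000_000L);
        System.out.println(met);
    }
}
```

A test written this way only pays the full timeout when something is actually wrong, which keeps a generous upper bound for flaky environments without slowing the common case.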





[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4783:
--

Status: Open  (was: Patch Available)

 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 
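
The requested behaviour, counting only the rows whose keys fall in a given range, can be sketched against an in-memory sorted map standing in for a table. This is an illustration only: it is not the RowCounter MapReduce job or its command-line interface, and countRows is a hypothetical helper. The start-inclusive, stop-exclusive convention mirrors how HBase scans treat their bounds.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeRowCountSketch {

    /**
     * Counts keys in [startRow, stopRow). A null bound leaves that side open,
     * so (null, null) is a full scan. Expects startRow <= stopRow when both are set.
     */
    static <V> long countRows(NavigableMap<String, V> table, String startRow, String stopRow) {
        NavigableMap<String, V> view = table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);   // inclusive start
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);   // exclusive stop
        }
        return view.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<String, String>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        System.out.println(countRows(table, null, null));     // full scan: 5
        System.out.println(countRows(table, "row2", "row4")); // row2 and row3: 2
    }
}
```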





[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4783:
--

Status: Patch Available  (was: Open)

 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 





[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4783:
--

Attachment: 4783.txt

Patch usable by HadoopQA

 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 





[jira] [Commented] (HBASE-4811) Support reverse Scan

2011-11-23 Thread John Carrino (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155961#comment-13155961
 ] 

John Carrino commented on HBASE-4811:
-

Digging a little deeper, it appears that this was already planned when the V2 
HFile format was written.  The header of each block holds the offset of the 
previous block of the same type.  I think this is currently used to support 
efficient lookups when seeking to a location, but it could also easily be used 
for reverse scan.
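
To make the idea concrete, here is a minimal sketch of walking blocks backwards through a previous-block offset. It is a simplified model, not the HFile v2 on-disk layout or reader API: Block, prevOffset, NO_PREVIOUS and the in-memory offset map are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReverseBlockWalkSketch {
    /** Marker used here for "no previous block"; an assumption of this sketch. */
    static final long NO_PREVIOUS = -1L;

    /** Hypothetical block header: own offset, previous block's offset, first key. */
    static final class Block {
        final long offset;
        final long prevOffset;
        final String firstKey;

        Block(long offset, long prevOffset, String firstKey) {
            this.offset = offset;
            this.prevOffset = prevOffset;
            this.firstKey = firstKey;
        }
    }

    /** Starts at lastOffset and follows prevOffset links, collecting first keys in reverse. */
    static List<String> reverseKeys(Map<Long, Block> blocksByOffset, long lastOffset) {
        List<String> keys = new ArrayList<String>();
        long offset = lastOffset;
        while (offset != NO_PREVIOUS) {
            Block b = blocksByOffset.get(offset);
            if (b == null) {
                throw new IllegalStateException("no block at offset " + offset);
            }
            keys.add(b.firstKey);
            offset = b.prevOffset;
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<Long, Block> blocks = new HashMap<Long, Block>();
        blocks.put(0L, new Block(0L, NO_PREVIOUS, "a"));
        blocks.put(64L, new Block(64L, 0L, "m"));
        blocks.put(128L, new Block(128L, 64L, "t"));
        System.out.println(reverseKeys(blocks, 128L)); // [t, m, a]
    }
}
```

A real reverse scan would also have to iterate the keys inside each block in descending order; the sketch only covers the block-to-block step that the header offset makes cheap.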

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build two tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.





[jira] [Created] (HBASE-4856) Unit tests under security profile need more heap space

2011-11-23 Thread Ted Yu (Created) (JIRA)
Unit tests under security profile need more heap space
--

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


In more than one 0.92-security build (e.g. build #9), we saw the following:
{code}
Running org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:112)
Running org.apache.hadoop.hbase.master.TestMasterFailover
Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
{code}
We should increase the maximum heap for tests under the security profile.





[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2011-11-23 Thread Terry Siu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155973#comment-13155973
 ] 

Terry Siu commented on HBASE-3792:
--

Bryan, would you be able to post a patch of the changes you are using for 0.90.4? I 
applied the trunk patch to 0.90.4 and, aside from one minor flub, the patch was 
very clean. I left my mapreduce jobs running overnight and am seeing ZK 
connections accumulating again, but at a slower rate, so now I'm wondering 
what differences exist between the changes you made for 0.90.4 and the ones 
you posted. Thanks!

 TableInputFormat leaks ZK connections
 -

 Key: HBASE-3792
 URL: https://issues.apache.org/jira/browse/HBASE-3792
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.1
 Environment: Java 1.6.0_24, Mac OS X 10.6.7
Reporter: Bryan Keller
 Attachments: tableinput.patch


 The TableInputFormat creates an HTable using a new Configuration object, and 
 it never cleans it up. When running a Mapper, the TableInputFormat is 
 instantiated and the ZK connection is created. While this connection is not 
 explicitly cleaned up, the Mapper process eventually exits and thus the 
 connection is closed. Ideally the TableRecordReader would close the 
 connection in its close() method rather than relying on the process to die 
 for connection cleanup. This is fairly easy to implement by overriding 
 TableRecordReader, and also overriding TableInputFormat to specify the new 
 record reader.
 The leak occurs when the JobClient is initializing and needs to retrieve the 
 splits. To get the splits, it instantiates a TableInputFormat. Doing so 
 creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
 my job client process does not die. Thus the ZK connections accumulate.
 I was able to fix the problem by writing my own TableInputFormat that does 
 not initialize the HTable in the getConf() method and does not have an HTable 
 member variable. Rather, it has a variable for the table name. The HTable is 
 instantiated where needed and then cleaned up. For example, in the 
 getSplits() method, I create the HTable, then close the connection once the 
 splits are retrieved. I also create the HTable when creating the record 
 reader, and I have a record reader that closes the connection when done.
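
The workaround described above, keeping only the table name and opening the table just where it is needed, can be sketched as follows. This is not the attached patch: FakeTable is a hypothetical stand-in for a table handle that owns a ZK connection, and the counter exists only so the example can show that nothing stays open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ScopedTableSketch {
    /** Number of simulated connections currently open. */
    static final AtomicInteger openConnections = new AtomicInteger();

    /** Hypothetical stand-in for a table handle that holds a ZK connection. */
    static final class FakeTable implements AutoCloseable {
        final String name;

        FakeTable(String name) {
            this.name = name;
            openConnections.incrementAndGet();
        }

        List<String> regionStartKeys() {
            return Arrays.asList("", "m");
        }

        @Override
        public void close() {
            openConnections.decrementAndGet();
        }
    }

    /** Opens the table only for the duration of the split computation, then closes it. */
    static List<String> getSplits(String tableName) {
        try (FakeTable table = new FakeTable(tableName)) {
            return table.regionStartKeys();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            getSplits("mytable");
        }
        // A long-lived job client can call getSplits repeatedly without accumulating connections.
        System.out.println(openConnections.get());
    }
}
```

The same scoping applies to the record reader: it would open its own handle and release it in close() rather than relying on the process exiting.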





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-23 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Status: Patch Available  (was: Open)

It seems to be OK. I will change the category of the failing test from small 
to medium, and then we will be able to push it.

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 
 4847_pom.v2.patch


 This alone will not revolutionize performance. We will gain between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive as they do not create a new jvm: it's a 
 continuous win
  - it will allow us to push it to the dev env, keeping the same env on dev and on 
 build, and 3 minutes are 10% of the small + medium tests' execution time.
 I will submit the patch a few times to see if it works well before asking for the 
 real commit.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-23 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Attachment: 4847_pom.v2.patch

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 
 4847_pom.v2.patch


 This alone will not revolutionize performance. We will gain between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive as they do not create a new jvm: it's a 
 continuous win
  - it will allow us to push it to the dev env, keeping the same env on dev and on 
 build, and 3 minutes are 10% of the small + medium tests' execution time.
 I will submit the patch a few times to see if it works well before asking for the 
 real commit.





[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4783:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 





[jira] [Created] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Gary Helmling (Created) (JIRA)
Recursive loop on KeeperException in 
AuthenticationTokenSecretManager/ZKLeaderManager
-

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0


Looking through stack traces for {{TestMasterFailover}}, I see a case where the 
leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when 
a {{KeeperException}} is encountered:
{noformat}
"Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:302)
at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{noformat}

The {{KeeperException}} causes {{ZKLeaderManager}} to call 
{{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
{{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
{{KeeperException}}, and so on...
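
One general way to break a stop/step-down cycle like this is a re-entrancy guard, so that only the first call to stop() does any work. The sketch below models the cycle from the trace and shows the guard ending it; it is an illustration of the pattern, not the fix applied to AuthenticationTokenSecretManager or ZKLeaderManager, and the class and field names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class StopGuardSketch {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    int stepDownCalls = 0;

    /** Stand-in for stepping down; its failure path asks the owner to stop, as in the trace. */
    void stepDownAsLeader() {
        stepDownCalls++;
        // Simulate the ZooKeeper error path: the handler calls stop() again.
        stop();
    }

    /** Only the first caller proceeds; nested or repeated calls return immediately. */
    void stop() {
        if (!stopping.compareAndSet(false, true)) {
            return;
        }
        stepDownAsLeader();
    }

    public static void main(String[] args) {
        StopGuardSketch elector = new StopGuardSketch();
        elector.stop();
        // Without the guard this would recurse until the stack overflowed.
        System.out.println(elector.stepDownCalls);
    }
}
```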





[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2011-11-23 Thread Bryan Keller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155997#comment-13155997
 ] 

Bryan Keller commented on HBASE-3792:
-

Sure, I'll post a patch for 0.90.4 in a bit. There have been quite a few 
changes to ZK connection handling in trunk (deep compare of configs, reference 
counting), so it is possible the patch might need to be tweaked or the leak is 
somewhere else.

 TableInputFormat leaks ZK connections
 -

 Key: HBASE-3792
 URL: https://issues.apache.org/jira/browse/HBASE-3792
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.1
 Environment: Java 1.6.0_24, Mac OS X 10.6.7
Reporter: Bryan Keller
 Attachments: tableinput.patch


 The TableInputFormat creates an HTable using a new Configuration object, and 
 it never cleans it up. When running a Mapper, the TableInputFormat is 
 instantiated and the ZK connection is created. While this connection is not 
 explicitly cleaned up, the Mapper process eventually exits and thus the 
 connection is closed. Ideally the TableRecordReader would close the 
 connection in its close() method rather than relying on the process to die 
 for connection cleanup. This is fairly easy to implement by overriding 
 TableRecordReader, and also overriding TableInputFormat to specify the new 
 record reader.
 The leak occurs when the JobClient is initializing and needs to retrieve the 
 splits. To get the splits, it instantiates a TableInputFormat. Doing so 
 creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
 my job client process does not die. Thus the ZK connections accumulate.
 I was able to fix the problem by writing my own TableInputFormat that does 
 not initialize the HTable in the getConf() method and does not have an HTable 
 member variable. Rather, it has a variable for the table name. The HTable is 
 instantiated where needed and then cleaned up. For example, in the 
 getSplits() method, I create the HTable, then close the connection once the 
 splits are retrieved. I also create the HTable when creating the record 
 reader, and I have a record reader that closes the connection when done.
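
The approach described above can be sketched roughly as follows. This is not the attached tableinput.patch: FakeTable is a hypothetical stand-in for an HTable and its ZK connection, NameOnlyInputFormat is an invented name, and the try-with-resources syntax assumes Java 7 or later. The point is only that the input format keeps the table name, opens the table where it is needed, and closes it before returning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an HTable plus its ZK connection (not an HBase class).
class FakeTable implements AutoCloseable {
  // Counts connections that are currently open, so a leak is observable.
  static final AtomicInteger OPEN = new AtomicInteger();
  private final String name;

  FakeTable(String name) {
    this.name = name;
    OPEN.incrementAndGet();
  }

  // Pretend the table has two regions.
  List<String> regionStartKeys() {
    List<String> keys = new ArrayList<String>();
    keys.add(name + ":a");
    keys.add(name + ":m");
    return keys;
  }

  @Override
  public void close() {
    OPEN.decrementAndGet();
  }
}

// Keeps only the table name; no table member variable is held between calls.
class NameOnlyInputFormat {
  private final String tableName;

  NameOnlyInputFormat(String tableName) {
    this.tableName = tableName;
  }

  List<String> getSplits() {
    // The table lives only for the duration of this call and is always closed.
    try (FakeTable table = new FakeTable(tableName)) {
      return table.regionStartKeys();
    }
  }
}
```

With this shape, a long-lived job client can call getSplits() repeatedly without the open-connection count growing.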





[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Gary Helmling (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-4857:
-

Attachment: HBASE-4857.patch

The simple fix is to recognize when we are already stopping.
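
A minimal sketch of what "recognize when we are already stopping" could look like. It is not the code from HBASE-4857.patch: GuardedLeaderElector and stepDown() are hypothetical stand-ins for LeaderElector.stop() and ZKLeaderManager.stepDownAsLeader(), and the simulated failure stands in for the real KeeperException path.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the leader elector; not the actual HBase class.
class GuardedLeaderElector {
  private final AtomicBoolean stopping = new AtomicBoolean(false);
  // Exposed so callers can observe how often the step-down path ran.
  final AtomicInteger stepDownCalls = new AtomicInteger();

  // Mirrors stop(): steps down unless a stop is already in progress.
  void stop() {
    // Only the first caller flips the flag; re-entrant calls return at once.
    if (!stopping.compareAndSet(false, true)) {
      return;
    }
    stepDown();
  }

  // Mirrors stepDownAsLeader(): the simulated ZooKeeper failure leads back
  // into stop(), which is the cycle seen in the stack trace.
  private void stepDown() {
    stepDownCalls.incrementAndGet();
    try {
      throw new IllegalStateException("simulated ZooKeeper failure");
    } catch (IllegalStateException e) {
      stop(); // without the guard above, this would recurse indefinitely
    }
  }

  boolean isStopping() {
    return stopping.get();
  }
}
```

The flag is never cleared in this sketch, so a second external stop() is also a no-op; whether that is desirable in the real class is a separate design question.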

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 
 waiting on condition [0x7f9fab376000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at java.lang.Thread.sleep(Thread.java:302)
 at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
 at 
 org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
 at 
 org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
 at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
 at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156007#comment-13156007
 ] 

Ted Yu commented on HBASE-4857:
---

Good catch, Gary.
+1 on patch.

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch







[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156011#comment-13156011
 ] 

Nicolas Spiegelberg commented on HBASE-4785:


@stack : You're correct about the missing entrySet().  There was a previous 
commit in 89-fb (r1181942) that I could not find a use for.  I guess it's this 
feature.

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.
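
As an illustration of the idea, and not of the attached patch, the sketch below clears every cached location that points at a dead server in one pass. RegionLocationCache, its String keys and values, and clearServer() are hypothetical; the real client cache is structured differently.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical client-side cache: region start key -> server name.
class RegionLocationCache {
  private final Map<String, String> locations =
      new ConcurrentSkipListMap<String, String>();

  void put(String regionStartKey, String server) {
    locations.put(regionStartKey, server);
  }

  String get(String regionStartKey) {
    return locations.get(regionStartKey);
  }

  // Drops every entry whose value is the dead server and returns how many
  // were removed, so later lookups go back to META instead of timing out.
  int clearServer(String deadServer) {
    int removed = 0;
    Iterator<Map.Entry<String, String>> it = locations.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().equals(deadServer)) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
```

Note that the sweep iterates over entrySet(), which is why a working entrySet() on the underlying map matters for this feature.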





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156017#comment-13156017
 ] 

ramkrishna.s.vasudevan commented on HBASE-4857:
---

+1

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch







[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156018#comment-13156018
 ] 

Hadoop QA commented on HBASE-4783:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504889/4783.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/347//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/347//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/347//console

This message is automatically generated.

 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 
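
To make the requested behaviour concrete, here is a small in-memory sketch rather than the MapReduce job itself. RangeRowCounter and countRows() are hypothetical names, rows are modelled as a sorted map with String keys, and the range is assumed to be start-inclusive and stop-exclusive.

```java
import java.util.NavigableMap;

// Hypothetical in-memory model: the table is a sorted map keyed by row key.
class RangeRowCounter {
  // Counts rows with startRow <= key < stopRow (assumed range convention).
  static long countRows(NavigableMap<String, ?> table, String startRow, String stopRow) {
    // subMap is a view over the sorted keys, so no rows outside the range are visited.
    return table.subMap(startRow, true, stopRow, false).size();
  }
}
```

A full scan corresponds to passing the smallest and largest possible keys; a narrower range touches only the rows inside it.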





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156030#comment-13156030
 ] 

Ted Yu commented on HBASE-4857:
---

Since zookeeper 3.4 is released, should we change the following in pom.xml as 
well?
{code}
<zookeeper.version>3.4.0-SNAPSHOT</zookeeper.version>
{code}

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch







[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156043#comment-13156043
 ] 

Todd Lipcon commented on HBASE-4820:


Can we meet in the middle with this patch? A few suggestions that would make 
the patch more trivial to review:
- don't do the whitespace-only fixes in parts of the code you're not touching
- don't expand out the import foo.*s
- don't move the callback code to different parts of the file
- *do* fix variable names to conform to style, remove dead code, add javadoc, 
rename classes, etc.

This should make the patch very easy to look over and make sure it doesn't 
break anything. It'll then be easy for FB to pull it into their branch if they 
want.

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4785:
---

Attachment: HBASE-4785.patch

Fixes SoftValueSortedMap.  Internal comments:

Currently SoftValueSortedMap.entrySet() tries to iterate through the entry set 
of the underlying map, and put all the values (SoftValue<K,V>) to a newly 
created TreeSet<Entry<K,V>>. The entry set of SortedMap is already sorted, so 
it's not necessary to have a TreeSet to sort those entries again upon adding. 
This gets rid of the runtime class cast exception because it does not require 
comparing anymore.
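
A sketch of the idea, under stated assumptions and not taken from the patch: SoftValueEntries is a hypothetical helper, and java.lang.ref.SoftReference stands in for HBase's SoftValue. A TreeSet with natural ordering has to compare its elements, and Map.Entry is not Comparable, hence the ClassCastException; an insertion-ordered set sidesteps the comparison while keeping the order the SortedMap already provides.

```java
import java.lang.ref.SoftReference;
import java.util.AbstractMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

// Hypothetical helper; SoftReference stands in for HBase's SoftValue.
class SoftValueEntries {
  // Builds an entry set in the key order of the underlying sorted map.
  static <K, V> Set<Map.Entry<K, V>> entrySet(SortedMap<K, SoftReference<V>> internal) {
    // LinkedHashSet keeps insertion order and never compares elements.
    Set<Map.Entry<K, V>> result = new LinkedHashSet<Map.Entry<K, V>>();
    for (Map.Entry<K, SoftReference<V>> e : internal.entrySet()) {
      V value = e.getValue().get();
      if (value != null) { // skip entries whose value has been collected
        result.add(new AbstractMap.SimpleImmutableEntry<K, V>(e.getKey(), value));
      }
    }
    return result;
  }
}
```

Unlike a live view, the returned set is a snapshot, so removing from it does not modify the underlying map.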

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.





[jira] [Updated] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints

2011-11-23 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4823:
---

Attachment: HBASE-4823.D519.1.patch

aaiyer requested code review of HBASE-4823 [jira] long running scans lose 
benefit of bloomfilters and timerange hints.
Reviewers: JIRA

  Changes to the StoreScanner so that whenever we do a resetScannerStack
  we use the same getScanner() method as done in the constructor to ignore
  files that are not going to be touched by the scan.

  Includes a test to ensure correctness.

  When you have a long running scan due to say an MR job, you can lose the 
benefit of timerange hints & bloom filters midway if your scanner gets reset. 
[Note: The scanners can get reset say due to a flush or compaction].

  In one of our workloads, we periodically want to do rollups on recent 15 
minutes of data in a column family... but the timerange hint benefit is lost 
midway when this resetScannerStack (shown below) happens. And end result-- we 
end up reading all the old HFiles rather than just the recent HFiles.
{code}
private void resetScannerStack(KeyValue lastTopKey) throws IOException {
  if (heap != null) {
    throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
  }
  /* When we have the scan object, should we not pass it to getScanners()
   * to get a limited set of scanners? We did so in the constructor and we
   * could have done it now by storing the scan object from the constructor
   */
  List<KeyValueScanner> scanners = getScanners();
{code}

  The comment in the code seems to be aware of this issue and even has the 
suggested fix!

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D519

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerResets.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/1149/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 long running scans lose benefit of bloomfilters and timerange hints
 ---

 Key: HBASE-4823
 URL: https://issues.apache.org/jira/browse/HBASE-4823
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Amitanand Aiyer
 Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt


 When you have a long running scan due to say an MR job, you can lose the 
 benefit of timerange hints & bloom filters midway if your scanner gets reset. 
 [Note: The scanners can get reset say due to a flush or compaction].
 In one of our workloads, we periodically want to do rollups on recent 15 
 minutes of data in a column family... but the timerange hint benefit is lost 
 midway when this resetScannerStack (shown below) happens. And end result-- we 
 end up reading all the old HFiles rather than just the recent HFiles.
 {code}
 private void resetScannerStack(KeyValue lastTopKey) throws IOException {
   if (heap != null) {
     throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
   }
   /* When we have the scan object, should we not pass it to getScanners()
    * to get a limited set of scanners? We did so in the constructor and we
    * could have done it now by storing the scan object from the constructor
    */
   List<KeyValueScanner> scanners = getScanners();
 {code}
 The comment in the code seems to be aware of this issue and even has the 
 suggested fix!
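
The suggested fix can be sketched as below. This is an illustration under assumptions, not the StoreScanner change itself: FileInfo, RangeAwareScanner, and selectFiles() are hypothetical names, and files are modelled only by their minimum and maximum timestamps. The scanner remembers the scan's time range and applies the same file selection in the constructor and on every reset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file with its timestamp bounds.
class FileInfo {
  final String name;
  final long minTs;
  final long maxTs;

  FileInfo(String name, long minTs, long maxTs) {
    this.name = name;
    this.minTs = minTs;
    this.maxTs = maxTs;
  }
}

// Hypothetical scanner that keeps the scan's time range for later resets.
class RangeAwareScanner {
  private final List<FileInfo> storeFiles; // shared, live list of the store's files
  private final long scanMinTs;            // inclusive
  private final long scanMaxTs;            // exclusive
  private List<FileInfo> active;

  RangeAwareScanner(List<FileInfo> storeFiles, long scanMinTs, long scanMaxTs) {
    this.storeFiles = storeFiles;
    this.scanMinTs = scanMinTs;
    this.scanMaxTs = scanMaxTs;
    this.active = selectFiles();
  }

  // Single selection routine used by both the constructor and reset().
  private List<FileInfo> selectFiles() {
    List<FileInfo> selected = new ArrayList<FileInfo>();
    for (FileInfo f : storeFiles) {
      // Keep a file only if its timestamp bounds overlap the scan's range.
      if (f.maxTs >= scanMinTs && f.minTs < scanMaxTs) {
        selected.add(f);
      }
    }
    return selected;
  }

  // Called after a flush or compaction changes the set of files.
  void reset() {
    active = selectFiles();
  }

  List<FileInfo> activeFiles() {
    return active;
  }
}
```

Because reset() goes through the same selection as the constructor, old files outside the time range stay excluded after a flush or compaction.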





[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4856:
--

Description: 
Zookeeper 3.4.0 has been released.
We should upgrade.

  was:
In more than one 0.92-security builds (build #9, e.g.), we had the following:
{code}
Running org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java 
heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at 
org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
at 
org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
at 
org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:112)
Running org.apache.hadoop.hbase.master.TestMasterFailover
Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java 
heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
at 
org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
at 
org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
{code}
We should increase maximum heap for tests under security profile

Summary: Upgrade zookeeper to 3.4.0 release  (was: Unit tests under 
security profile need more heap space)

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu

 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-23 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156201#comment-13156201
 ] 

Jean-Daniel Cryans commented on HBASE-4739:
---

bq. Do we need make a patch for 0.90.5 ? 

Like you said earlier:

bq. In 0.90 version, I think there is no this scenario, The closing zk node is 
only created by RS.

So we should be fine without it.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, 
 HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Assigned] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4856:
-

Assignee: Ted Yu

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu

 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4856:
--

Attachment: 4856.txt

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4856:
--

Fix Version/s: 0.92.0

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints

2011-11-23 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156204#comment-13156204
 ] 

Phabricator commented on HBASE-4823:


Kannan has accepted the revision HBASE-4823 [jira] long running scans lose 
benefit of bloomfilters and timerange hints.

  Super!

  +1 for commit.

REVISION DETAIL
  https://reviews.facebook.net/D519


 long running scans lose benefit of bloomfilters and timerange hints
 ---

 Key: HBASE-4823
 URL: https://issues.apache.org/jira/browse/HBASE-4823
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Amitanand Aiyer
 Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt


 When you have a long running scan due to, say, an MR job, you can lose the 
 benefit of timerange hints and bloom filters midway if your scanner gets reset. 
 [Note: The scanners can get reset, say, due to a flush or compaction.]
 In one of our workloads, we periodically want to do rollups on the most recent 15 
 minutes of data in a column family, but the timerange hint benefit is lost 
 midway when this resetScannerStack (shown below) happens. The end result: we 
 end up reading all the old HFiles rather than just the recent HFiles.
 {code}
   private void resetScannerStack(KeyValue lastTopKey) throws IOException {
     if (heap != null) {
       throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
     }
     /* When we have the scan object, should we not pass it to getScanners()
      * to get a limited set of scanners? We did so in the constructor and we
      * could have done it now by storing the scan object from the constructor
      */
     List<KeyValueScanner> scanners = getScanners();
 {code}
 The comment in the code seems to be aware of this issue and even has the 
 suggested fix!
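
To make the suggested fix concrete, here is a minimal standalone sketch of the idea: keep the scan's time range around and reapply it when the scanner stack is rebuilt, so files that cannot contain matching data are skipped again. It is not the HBase code or the attached patch. The names FileTimeRange, ScannerResetSketch and selectFiles are hypothetical, and the half-open [scanMin, scanMax) convention is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a store file and the timestamp range it covers. */
final class FileTimeRange {
    final String name;
    final long minTs; // smallest timestamp in the file, inclusive
    final long maxTs; // largest timestamp in the file, inclusive

    FileTimeRange(String name, long minTs, long maxTs) {
        this.name = name;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }
}

/** Hypothetical sketch: reapply the scan's time range whenever scanners are rebuilt. */
final class ScannerResetSketch {
    /**
     * Returns the names of the files whose timestamp range overlaps the
     * scan range [scanMin, scanMax). Files entirely outside the range are
     * skipped, which is the benefit that is lost if the reset path asks
     * for all files without consulting the scan.
     */
    static List<String> selectFiles(List<FileTimeRange> files, long scanMin, long scanMax) {
        List<String> selected = new ArrayList<>();
        for (FileTimeRange f : files) {
            boolean overlaps = f.maxTs >= scanMin && f.minTs < scanMax;
            if (overlaps) {
                selected.add(f.name);
            }
        }
        return selected;
    }
}
```

In this sketch the constructor path and the reset path would both call selectFiles with the stored scan range, so a reset in the middle of a scan over recent data keeps ignoring the old files.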





[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4785:
--

Status: Patch Available  (was: Open)

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.
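
A rough standalone sketch of the cache clearing described above, assuming the client keeps a map from region name to server location. The class and method names here (RegionLocationCacheSketch, clearCachedLocationsForServer) are hypothetical and are not taken from the attached patch or from the HBase client API.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side cache: region name -> server believed to host it. */
final class RegionLocationCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void put(String region, String server) {
        cache.put(region, server);
    }

    /** Returns the cached server for a region, or null if it must be looked up again. */
    String get(String region) {
        return cache.get(region);
    }

    /**
     * Drops every cached location whose value is the dead server, so that
     * only the first failed call pays the timeout and the remaining regions
     * are looked up again directly. Returns the number of entries removed.
     */
    int clearCachedLocationsForServer(String deadServer) {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            if (deadServer.equals(it.next().getValue())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

After one call times out against a server, the client would call clearCachedLocationsForServer once for that server; later lookups for its other regions then miss the cache and go to META instead of waiting for another timeout.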





[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4785:
--

Status: Open  (was: Patch Available)

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156208#comment-13156208
 ] 

Todd Lipcon commented on HBASE-4856:


If we're separating a security build and non-security build, I'd recommend 
keeping the non-secure one at the 3.3 series. 3.4 has a lot of new features and 
my hunch is that there are going to be some bugs that shake out over the next 
few months.

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156209#comment-13156209
 ] 

Hadoop QA commented on HBASE-4847:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504894/4847_pom.v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.util.TestFSTableDescriptors

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/348//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/348//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/348//console

This message is automatically generated.

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 
 4847_pom.v2.patch


 This alone will not revolutionize performance. We will gain between 1 and 4 
 minutes.
 But we also gain the following:
  - it's a step towards parallelizing the tests
  - new tests are less expensive as they do not create a new jvm: it's a 
 continuous win
  - it will allow pushing it to the dev env while having the same env on dev and on 
 build, and 3 minutes are 10% of small + medium tests execution time.
 I will submit a few patches to see if it works well before asking for the 
 real commit.





[jira] [Assigned] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4857:
-

Assignee: Gary Helmling

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 "Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at java.lang.Thread.sleep(Thread.java:302)
  at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
  at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
  at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
  at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
  at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
  at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...
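
One common way to break this kind of cycle is to make the stop path idempotent. The sketch below models the two methods with a guard flag; it is an illustration only and is not claimed to be what HBASE-4857.patch does. LeaderStopSketch is a hypothetical class, and the IllegalStateException merely stands in for the KeeperException.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the stop() / stepDownAsLeader() cycle with a re-entrancy guard. */
final class LeaderStopSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    int stepDownCalls = 0; // counts how often stepping down was attempted

    /** Models LeaderElector.stop(): only the first caller proceeds. */
    void stop() {
        if (!stopped.compareAndSet(false, true)) {
            return; // already stopping, so the error path cannot recurse
        }
        stepDownAsLeader();
    }

    /** Models ZKLeaderManager.stepDownAsLeader() while ZooKeeper is unavailable. */
    void stepDownAsLeader() {
        stepDownCalls++;
        try {
            // Stand-in for the ZooKeeper call that fails with a KeeperException.
            throw new IllegalStateException("simulated KeeperException");
        } catch (IllegalStateException e) {
            // The error path calls back into stop(), as in the stack trace.
            stop();
        }
    }
}
```

With the guard in place, a single stop() call attempts the step-down once and the nested stop() returns immediately; without it, the two methods call each other until ZooKeeper recovers or the stack overflows.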





[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156207#comment-13156207
 ] 

Ted Yu commented on HBASE-4785:
---

+1 on patch v2.

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.





[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4857:
--

Status: Patch Available  (was: Open)

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 "Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at java.lang.Thread.sleep(Thread.java:302)
  at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
  at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
  at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
  at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
  at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
  at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
  at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
  at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156220#comment-13156220
 ] 

Ted Yu commented on HBASE-4856:
---

My reasoning was that the 3.4.0 zookeeper release would be more stable than the 
3.4.0-SNAPSHOT build, which would change after we release 0.92.

When I switched zookeeper to 3.3.3 for non-secure build, I got:
{code}
[ERROR] /Users/zhihyu/92hbase/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java:[145,40] cannot find symbol
[ERROR] symbol  : class NIOServerCnxnFactory
[ERROR] location: class org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster
{code}

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4820:
---

Status: Patch Available  (was: Open)

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4820:
---

Status: Open  (was: Patch Available)

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4820:
---

Attachment: 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156235#comment-13156235
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/
---

(Updated 2011-11-23 19:58:09.915833)


Review request for hbase, Todd Lipcon and Jonathan Robie.


Changes
---

Per Todd's suggestion, the patch is enhanced for easy back porting.


Summary
---

Distributed log splitting coding enhancement to make it easier to understand, 
no semantics change.
This came up during the code review while backporting this feature to 
CDH.


This addresses bug HBASE-4820.
https://issues.apache.org/jira/browse/HBASE-4820


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 2101054 
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java d7a648d 
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec 

Diff: https://reviews.apache.org/r/2895/diff


Testing
---

Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which 
are not related to this change.


Thanks,

Jimmy



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156254#comment-13156254
 ] 

stack commented on HBASE-4856:
--

We can't do 3.3.3 zk and have a secure zk.  See conversation over in tail of 
HBASE-2418.

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156258#comment-13156258
 ] 

Hudson commented on HBASE-4308:
---

Integrated in HBase-TRUNK #2475 (See 
[https://builds.apache.org/job/HBase-TRUNK/2475/])
HBASE-4308 Race between RegionOpenedHandler and AssignmentManager(Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java


 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
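
 The ordering problem described above can be pictured with the sketch below. It 
 is not the HBASE-4308 patch: the class and method names (RitOrderingSketch, 
 handleOpenedDeleteFirst, handleOpenedRemoveFirst, onNodeDeleted) are made up 
 for illustration, a plain map stands in for RegionsInTransition, and the 
 delete notification is simulated synchronously as the worst case. It only 
 shows that removing the region from RIT before deleting the znode leaves 
 nothing stale for the notification to find.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the master-side bookkeeping; not HBase code.
class RitOrderingSketch {
    // Region name -> state, playing the role of RegionsInTransition.
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    // Warnings that would otherwise go to the log.
    final List<String> warnings = new ArrayList<>();

    // Called when the (simulated) ZK delete notification arrives.
    void onNodeDeleted(String region) {
        String state = regionsInTransition.get(region);
        if (state != null) {
            warnings.add("Node deleted but still in RIT: " + region + " state=" + state);
        }
    }

    // Racy order: delete the znode first, clean up RIT afterwards.
    void handleOpenedDeleteFirst(String region) {
        onNodeDeleted(region);               // notification fires before cleanup
        regionsInTransition.remove(region);
    }

    // Safer order: clean up RIT first, then delete the znode.
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        onNodeDeleted(region);               // no stale entry left to warn about
    }

    public static void main(String[] args) {
        RitOrderingSketch s = new RitOrderingSketch();
        s.regionsInTransition.put(".META.,,1", "OPEN");
        s.handleOpenedDeleteFirst(".META.,,1");
        System.out.println("delete first: " + s.warnings.size() + " warning(s)");

        s.warnings.clear();
        s.regionsInTransition.put(".META.,,1", "OPEN");
        s.handleOpenedRemoveFirst(".META.,,1");
        System.out.println("remove first: " + s.warnings.size() + " warning(s)");
    }
}
```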





[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156259#comment-13156259
 ] 

Hudson commented on HBASE-4783:
---

Integrated in HBase-TRUNK #2475 (See 
[https://builds.apache.org/job/HBase-TRUNK/2475/])
HBASE-4783 Improve RowCounter to count rows in a specific key range.

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in MR package is a very simple map only job that does a 
 full scan of a table. Enhance the utility to let the user specify a key range 
 and count the number of rows in this range. 
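
 As a rough illustration of counting rows in a key range, the sketch below uses 
 a sorted in-memory map in place of an HBase table. The class and method names 
 are hypothetical, and the half-open range (start key inclusive, stop key 
 exclusive, null meaning unbounded) is an assumption of this sketch, not a 
 statement about the options RowCounter exposes.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: a sorted in-memory map stands in for an HBase table.
class RowRangeCountSketch {
    // Counts row keys in [startRow, stopRow); a null bound means "unbounded".
    static long countRows(NavigableMap<String, String> table, String startRow, String stopRow) {
        // An inverted range contains no rows.
        if (startRow != null && stopRow != null && startRow.compareTo(stopRow) > 0) {
            return 0;
        }
        NavigableMap<String, String> view = table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);   // start key is inclusive
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);   // stop key is exclusive
        }
        return view.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(row, "value-" + row);
        }
        System.out.println("full scan: " + countRows(table, null, null));
        System.out.println("range [b, d): " + countRows(table, "b", "d"));
    }
}
```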





[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156269#comment-13156269
 ] 

Hadoop QA commented on HBASE-4785:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504907/HBASE-4785.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/349//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/349//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/349//console

This message is automatically generated.

 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in cache 
 that have the dead region server as their values.
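
 The cache cleanup proposed above might look roughly like the following. This 
 is a minimal sketch, not the attached patch: the region-to-server map and the 
 method name clearCachedLocations are assumptions made for illustration, and 
 the real client cache is structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side location cache: region name -> region server.
class DeadServerCacheSketch {
    // Drops every cached location that points at deadServer and returns
    // how many entries were removed.
    static int clearCachedLocations(Map<String, String> regionToServer, String deadServer) {
        int before = regionToServer.size();
        regionToServer.values().removeIf(deadServer::equals);
        return before - regionToServer.size();
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("region-1", "rs-a");
        cache.put("region-2", "rs-b");
        cache.put("region-3", "rs-a");
        int removed = clearCachedLocations(cache, "rs-a");
        System.out.println("removed " + removed + ", remaining " + cache.keySet());
    }
}
```

 With the stale entries gone, the next lookup for any of those regions goes 
 back to META once instead of waiting for a timeout per region.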





[jira] [Created] (HBASE-4858) hbase-site.xml example in quickstart doesn't work in Linux

2011-11-23 Thread Bryce Allen (Created) (JIRA)
hbase-site.xml example in quickstart doesn't work in Linux
--

 Key: HBASE-4858
 URL: https://issues.apache.org/jira/browse/HBASE-4858
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: java version 1.6.0_23
OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-1)
OpenJDK Client VM (build 20.0-b11, mixed mode, sharing)
Reporter: Bryce Allen
Priority: Minor


Under Linux with OpenJDK 1.6, using a file:///XX URL in the config file creates 
a directory called 'file:' in the hbase root directory. If I use a standard 
Unix absolute path, it works as expected. This may work on other platforms, but 
it would be good to add a note in the example:

{code}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Depending on your platform, this may create a 'file:' directory
         in hbase home instead of the desired behavior. Try using a standard
         platform specific absolute path instead. -->
    <value>file:///DIRECTORY/hbase</value>
  </property>
</configuration>
{code}





[jira] [Commented] (HBASE-4605) Constraints

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156280#comment-13156280
 ] 

Ted Yu commented on HBASE-4605:
---

@Jesse:
Patch v6 doesn't apply cleanly:
{code}
Hunk #13 FAILED at 1135.
1 out of 13 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java.rej
{code}
Do you mind uploading a patch (--no-prefix) that applies to TRUNK so that 
HadoopQA can run through it?

Thanks

 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: constraint_as_cp.txt, java_Constraint_v2.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table, but if not, people can throw an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}
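
 To make the proposal concrete, here is a small sketch of what such an 
 interface could look like. None of these names or signatures are taken from 
 the attached patches: SimpleConstraint, ConstraintViolation, and the toy Table 
 class are hypothetical, and an in-memory map stands in for the real table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-table constraints; not the HBASE-4605 API.
class ConstraintSketch {
    // Thrown back to the caller to explain why a put was rejected.
    static class ConstraintViolation extends Exception {
        ConstraintViolation(String message) { super(message); }
    }

    // The minimal interface a user would implement.
    interface SimpleConstraint {
        void check(String rowKey, String value) throws ConstraintViolation;
    }

    // Example constraint: the value must parse as an integer.
    static final SimpleConstraint INTEGER_ONLY = (rowKey, value) -> {
        try {
            Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new ConstraintViolation(
                "value for row " + rowKey + " is not an integer: " + value);
        }
    };

    // Toy table that runs every registered constraint before storing a put.
    static class Table {
        final List<SimpleConstraint> constraints = new ArrayList<>();
        final Map<String, String> rows = new TreeMap<>();

        void put(String rowKey, String value) throws ConstraintViolation {
            for (SimpleConstraint c : constraints) {
                c.check(rowKey, value);   // an invalid put never reaches the map
            }
            rows.put(rowKey, value);
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.constraints.add(INTEGER_ONLY);
        try {
            table.put("row1", "42");
            table.put("row2", "not-a-number");
        } catch (ConstraintViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("stored rows: " + table.rows.keySet());
    }
}
```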





[jira] [Resolved] (HBASE-4722) TestGlobalMemStoreSize has started failing

2011-11-23 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4722.
--

Resolution: Won't Fix

This committed fix has done damage.  See HBASE-4853.  Closing as won't fix.

 TestGlobalMemStoreSize has started failing
 --

 Key: HBASE-4722
 URL: https://issues.apache.org/jira/browse/HBASE-4722
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: 4722.txt, logging-v2.txt, logging.txt


 I'm digging in.  It fails occasionally for me locally too.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156283#comment-13156283
 ] 

stack commented on HBASE-4853:
--

Looks like this commit by me broke our memstore sizing: HBASE-4722.  It reads 
the memstore flush size outside of the update lock (more edits may have come 
in in the meantime).

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 
 b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
// Cleaning up of lastSeqWritten is in the finally clause because we
// don't want to confuse getOldestOutstandingSeqNum()
this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -  Long l = this.lastSeqWritten.remove(encodedRegionName);
 -  if (l != null) {
 -    LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -      Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -  }
this.cacheFlushLock.unlock();
  }
}
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.





[jira] [Commented] (HBASE-4605) Constraints

2011-11-23 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156287#comment-13156287
 ] 

Jesse Yates commented on HBASE-4605:


Yeah, sure. I actually just ran into the same issue trying to work on the shell 
stuff. 

Pushing up new version shortly.

 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: constraint_as_cp.txt, java_Constraint_v2.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table, but if not, people can throw an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}





[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2011-11-23 Thread Terry Siu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156289#comment-13156289
 ] 

Terry Siu commented on HBASE-3792:
--

Thanks, Bryan, looking forward to getting the 0.90.4 patch.

 TableInputFormat leaks ZK connections
 -

 Key: HBASE-3792
 URL: https://issues.apache.org/jira/browse/HBASE-3792
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.1
 Environment: Java 1.6.0_24, Mac OS X 10.6.7
Reporter: Bryan Keller
 Attachments: tableinput.patch


 The TableInputFormat creates an HTable using a new Configuration object, and 
 it never cleans it up. When running a Mapper, the TableInputFormat is 
 instantiated and the ZK connection is created. While this connection is not 
 explicitly cleaned up, the Mapper process eventually exits and thus the 
 connection is closed. Ideally the TableRecordReader would close the 
 connection in its close() method rather than relying on the process to die 
 for connection cleanup. This is fairly easy to implement by overriding 
 TableRecordReader, and also overriding TableInputFormat to specify the new 
 record reader.
 The leak occurs when the JobClient is initializing and needs to retrieve the 
 splits. To get the splits, it instantiates a TableInputFormat. Doing so 
 creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
 my job client process does not die. Thus the ZK connections accumulate.
 I was able to fix the problem by writing my own TableInputFormat that does 
 not initialize the HTable in the getConf() method and does not have an HTable 
 member variable. Rather, it has a variable for the table name. The HTable is 
 instantiated where needed and then cleaned up. For example, in the 
 getSplits() method, I create the HTable, then close the connection once the 
 splits are retrieved. I also create the HTable when creating the record 
 reader, and I have a record reader that closes the connection when done.
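
 The workaround described above (keep only the table name, open the table where 
 it is needed, and close it right after) can be sketched as follows. FakeTable 
 is a hypothetical stand-in for an HTable and its ZK connection, and the open 
 connection counter exists only so the sketch can show that nothing leaks. It 
 uses try-with-resources for brevity; a try/finally block would do the same 
 job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: no HBase classes are used here.
class ScopedConnectionSketch {
    // Tracks how many simulated connections are currently open.
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    // Hypothetical stand-in for an HTable that holds a ZK connection.
    static class FakeTable implements AutoCloseable {
        final String name;

        FakeTable(String name) {
            this.name = name;
            OPEN_CONNECTIONS.incrementAndGet();   // "connect"
        }

        List<String> regionStartKeys() {
            return Arrays.asList("", "m");
        }

        @Override
        public void close() {
            OPEN_CONNECTIONS.decrementAndGet();   // "disconnect"
        }
    }

    // Holds no table member: the table is opened per call and always closed.
    static List<String> getSplits(String tableName) {
        try (FakeTable table = new FakeTable(tableName)) {
            return table.regionStartKeys();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            getSplits("mytable");
        }
        System.out.println("open connections after 3 calls: " + OPEN_CONNECTIONS.get());
    }
}
```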





[jira] [Commented] (HBASE-4605) Constraints

2011-11-23 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156290#comment-13156290
 ] 

jirapos...@reviews.apache.org commented on HBASE-4605:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2579/
---

(Updated 2011-11-23 21:19:56.263794)


Review request for hbase.


Changes
---

Updating to current trunk to take into account changes in HTD and for Hadoop 
QA. Otherwise, no changes from last diff.


Summary
---

Most of the implementation for adding constraints as a coprocessor. 

Looking for general comments on style/structure, though nitpicks are ok too. 

Currently missing implementation for disableConstraints() since that will 
require adding removeCoprocessor() to HTD (also comments on if this is worth it 
would be good). 


This addresses bug HBASE-4605.
https://issues.apache.org/jira/browse/HBASE-4605


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 84a0d1a 
  src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/ConstraintException.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/ConstraintProcessor.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/IntegerConstraint.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/constraint/package-info.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/AllFailConstraint.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/AllPassConstraint.java 
PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java
 PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/constraint/IntegrationTestConstraint.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/RuntimeFailConstraint.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/TestConstraints.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/TestIntegerConstraint.java 
PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/constraint/WorksConstraint.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/2579/diff


Testing
---

Adding IntegrationTestConstraint and unit tests for Constraints and 
IntegerConstraint. All of those pass.


Thanks,

Jesse



 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: constraint_as_cp.txt, java_Constraint_v2.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table, but if not, people can throw an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156292#comment-13156292
 ] 

Hadoop QA commented on HBASE-4857:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504898/HBASE-4857.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.client.TestInstantSchemaChange

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/350//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/350//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/350//console

This message is automatically generated.

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 
 waiting on condition [0x7f9fab376000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at java.lang.Thread.sleep(Thread.java:302)
 at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
 at 
 org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
 at 
 org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
 at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
 at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} 

[jira] [Updated] (HBASE-4605) Constraints

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4605:
--

Attachment: 4605.v7

 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table, but if not, people can throw an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}





[jira] [Updated] (HBASE-4605) Constraints

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4605:
--

Status: Patch Available  (was: Open)

Patch testing v7.

 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table, but if not, people can throw an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-23 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156319#comment-13156319
 ] 

Lars Hofhansl commented on HBASE-4838:
--

With the above scenario what I found is this:
o the table is populated with only two KV: aaa and aab.
o after the split there are two regions: ['', aaa) and [aaa,'')
x the client scanner first tries the 1st region
o then it tries the 2nd region

The X is where the difference is. In trunk (and unpatched 0.92), the region's 
internal scanner finds no KVs (as it should) and returns an empty result to the 
client scanner, which then proceeds to the next region.
In 0.92 with this patch, the region's internal scanner actually finds both aaa 
and aab in the 1st region (which is wrong), and then again the 2nd region 
(which is correct).

I don't know, yet, why this is happening, though. Maybe the scanner picks up 
the wrong store files, or there is a problem with flushes or compactions.
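
For reference, the expected behaviour of the two region ranges can be written 
down as a small check. This is only a sketch: the class and method names are 
made up, and String comparison stands in for HBase's byte-wise row comparison, 
which gives the same ordering for these ASCII keys.

```java
// Illustrative range check for half-open region boundaries; not HBase code.
class RegionRangeSketch {
    // True if rowKey falls in [startKey, endKey).
    // An empty key means "unbounded" on that side, as in ['', aaa) and [aaa, '').
    static boolean contains(String startKey, String endKey, String rowKey) {
        boolean afterStart = startKey.isEmpty() || rowKey.compareTo(startKey) >= 0;
        boolean beforeEnd = endKey.isEmpty() || rowKey.compareTo(endKey) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        for (String row : new String[] {"aaa", "aab"}) {
            System.out.println(row + " in ['', aaa): " + contains("", "aaa", row));
            System.out.println(row + " in [aaa, ''): " + contains("aaa", "", row));
        }
    }
}
```

Under this rule both aaa and aab belong only to the 2nd region, so a scan of 
the 1st region should come back empty.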


 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156323#comment-13156323
 ] 

Ted Yu commented on HBASE-4855:
---

The above assertion error means there was a duplicate heartbeat:
{code}
  assert false;
  LOG.warn("got dup heartbeat for " + path + " ver = " + new_version);
{code}
We should either ignore the dup heartbeat or make the assertion message clearer.
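
As a rough illustration of the "ignore the dup heartbeat" option, the sketch below tracks the last version seen per task path and logs a duplicate instead of asserting. The class, method and field names are hypothetical and this is not the SplitLogManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the SplitLogManager code: tracks the last znode
// version seen per task path and treats a repeated version as a harmless duplicate.
final class HeartbeatTrackerSketch {
    private final Map<String, Integer> lastVersion = new HashMap<>();

    /** Returns true if the heartbeat carried a new version, false if it was a duplicate. */
    boolean heartbeat(String path, int newVersion) {
        Integer previous = lastVersion.get(path);
        if (previous != null && previous == newVersion) {
            // Log and ignore instead of failing on a bare "assert false".
            System.err.println("ignoring dup heartbeat for " + path + " ver = " + newVersion);
            return false;
        }
        lastVersion.put(path, newVersion);
        return true;
    }
}
```

If the assertion is kept instead, Java's `assert false : "message"` form would at least carry the path and version into the AssertionError.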

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there, they cannot be processed.
 Restart the master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly.  I have not yet dug 
 in fully.
 This may be the reason for the occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156326#comment-13156326
 ] 

Ted Yu commented on HBASE-4855:
---

@Ramkrishna:
Can you post from the log file the following:
{code}
  status.setStatus("Waiting for distributed tasks to finish. "
  + " scheduled=" + batch.installed
  + " done=" + batch.done
  + " error=" + batch.error);
{code}
It is interesting that neither done nor error counts increased. Or maybe their 
sum became greater than batch.installed?
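
The second possibility can be sketched as follows. This assumes the wait loop exits only when done + error equals installed exactly; that condition, and the BatchSketch class itself, are assumptions for illustration rather than quotes from SplitLogManager.

```java
// Hypothetical sketch of the accounting behind the status message above.
final class BatchSketch {
    int installed;
    int done;
    int error;

    // Assumed loop condition: keep waiting until done + error equals installed exactly.
    boolean stillWaitingExact() {
        return (done + error) != installed;
    }

    // Defensive variant: stop waiting once the sum reaches or passes installed.
    boolean stillWaitingDefensive() {
        return (done + error) < installed;
    }
}
```

With installed=2, done=2, error=1 the exact check keeps waiting forever while the defensive check stops. If neither counter ever increases, both variants hang, which is why the logged values matter.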





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Attachment: 4853-v5.txt

Here's a fix.  I need a review given how this patch is actually a revert of two 
commits I've made -- one recent and another a couple of months ago.

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853-v5.txt, 4853.txt


 Working w/ J-D on a failing replication test turned up a hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -       }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but the above is no good w/o figuring out why WALs are not being rotated off.





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Assignee: stack
  Status: Open  (was: Patch Available)





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Status: Patch Available  (was: Open)





[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Andrew Purtell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4857:
--

Priority: Critical  (was: Major)

Nice catch. +1 on commit and raise to Critical.

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 "Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at java.lang.Thread.sleep(Thread.java:302)
     at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
     at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
     at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
     at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
     at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
     at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
     at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
     at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
     at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
     at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
     at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
     at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
     at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
     at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...
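
The mutual recursion described above can be modelled with a small sketch. The method names mirror the stack trace, but the bodies are stand-ins, the exception is simulated, and the "stopped" guard is only one possible way to break the cycle; it is not taken from HBASE-4857.patch.

```java
// Hypothetical sketch: stop() calls stepDownAsLeader(), whose error handling
// calls stop() again. A "stopped" flag is one way to break that cycle.
final class LeaderStopSketch {
    private final boolean guarded;
    private boolean stopped = false;
    int stopCalls = 0;

    LeaderStopSketch(boolean guarded) {
        this.guarded = guarded;
    }

    void stop() {
        if (guarded && stopped) {
            return; // already stopping: do not re-enter
        }
        stopped = true;
        stopCalls++;
        if (stopCalls >= 5) {
            return; // demo-only cap so the unguarded variant terminates
        }
        stepDownAsLeader();
    }

    void stepDownAsLeader() {
        try {
            // Stand-in for the ZooKeeper call that keeps failing.
            throw new IllegalStateException("simulated KeeperException");
        } catch (IllegalStateException e) {
            stop(); // error handling routes back into stop()
        }
    }
}
```

With the guard enabled, stop() runs its body once; without it, the calls continue until the demo cap is reached.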





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156332#comment-13156332
 ] 

stack commented on HBASE-4853:
--

Here's some explanation:

M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  On flush of memstores, we were decrementing the global region 
  memory size by the size of the global memstore AT THE TIME OF
  THE DECREMENT rather than decrementing by the flush size (some
  edits may very well have come in in between the setup of flush
  and decrement time).  This change undoes a brain-dead change
  of mine in hbase-4722.  That broke this.
M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
  Remove flagging of the original problem, our leaving an old
  edit id in the lastSeqWritten for a region that was offline.

I tried to write a test but it's too tough at the moment.  You need to get some edits 
into the memstore AFTER the update lock is freed down in internalFlushCache but 
BEFORE we decrement memstore size.  The only way to make it work would be by 
mod'ing HRegion to insert a do-nothing method.  Too dumb.
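
The accounting difference can be shown outside HRegion with a simplified, single-counter sketch. All names here are hypothetical; it only models "decrement by the size at decrement time" versus "decrement by the flush size".

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the accounting race described above; not HRegion code.
final class MemstoreAccountingSketch {
    private final AtomicLong memstoreSize = new AtomicLong();

    void addEdit(long bytes) {
        memstoreSize.addAndGet(bytes);
    }

    /** Size captured when the flush is set up; only this much is actually flushed. */
    long snapshotForFlush() {
        return memstoreSize.get();
    }

    /** Buggy variant: subtracts whatever the size happens to be at decrement time. */
    long finishFlushByCurrentSize() {
        return memstoreSize.addAndGet(-memstoreSize.get());
    }

    /** Intended variant: subtracts only the size that was flushed. */
    long finishFlushByFlushSize(long flushSize) {
        return memstoreSize.addAndGet(-flushSize);
    }
}
```

If an 8-byte edit arrives after a 100-byte snapshot, the buggy variant reports 0 even though 8 bytes are still unflushed, while the intended variant reports 8.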





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Status: Open  (was: Patch Available)





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Attachment: 4853-v6.txt

Same patch with better variable naming.





[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4853:
-

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-23 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156350#comment-13156350
 ] 

Todd Lipcon commented on HBASE-4838:


Maybe a problem with the HalfHFile references? After a compaction of the split 
daughters, does the doubling persist?





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156356#comment-13156356
 ] 

Ted Yu commented on HBASE-4853:
---

With patch v5, I got the following:
{code}
testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize)  Time elapsed: 11.516 sec  <<< FAILURE!
java.lang.AssertionError: Server=10.246.204.31,62993,1322086547613, i=0 expected:<0> but was:<608>
{code}
Here is the tail of the test output:
{code}
2011-11-23 14:15:55,955 INFO  [main] regionserver.Store(631): Added hdfs://localhost:62971/user/zhihyu/.META./1028785192/info/6d51d01d9498464eb025ca045e696ce4, entries=47, sequenceid=36, filesize=8.4k
2011-11-23 14:15:55,956 INFO  [main] regionserver.HRegion(1396): Finished memstore flush of ~17.2k/17608 for region .META.,,1.1028785192 in 44ms, sequenceid=36, compaction requested=false
2011-11-23 14:15:55,956 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush .META.,,1.1028785192 on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,,1322086555196.e2b7276e785c7f6213a5bdd08a54cf8e. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,c,P\xE3+,1322086555201.2c847584e6af6e64f3bae631bd722934. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,q\x83\xCC\xF1{,1322086555217.f5079469f9fa696de61b9db6364cd6e7. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(101): Post flush on 10.246.204.31,62993,1322086547613
{code}
Basically there was no mention of flush completion for the 
TestGlobalMemStoreSize table.

I think we should add a log before the assertion so that we know how long we 
spent waiting in the while loop:
{code}
  assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
    server.getRegionServerAccounting().getGlobalMemstoreSize());
{code}
We should increase the wait time beyond 3 seconds.
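
One way the suggested logging could look is sketched below: a polling helper that reports how long it waited before the caller asserts. The helper name, its parameters and the use of a LongSupplier are assumptions for illustration; the real test reads the size from the region server accounting object.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of TestGlobalMemStoreSize.
final class WaitForZeroSketch {
    /** Polls until size reaches 0 or timeoutMs elapses; returns the elapsed milliseconds. */
    static long waitForZero(LongSupplier size, long timeoutMs, long pollMs) {
        long start = System.nanoTime();
        long elapsedMs = 0;
        while (size.getAsLong() != 0 && elapsedMs < timeoutMs) {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        }
        // Logged before the caller's assertion, so a failure shows how long we waited.
        System.out.println("Waited " + elapsedMs + "ms, size=" + size.getAsLong());
        return elapsedMs;
    }
}
```

The caller would then run its assertEquals on the size after this returns, with the timeout set above the current 3 seconds.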





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-23 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156355#comment-13156355
 ] 

Gary Helmling commented on HBASE-4857:
--

The TestMasterObserver failure from hadoopqa is odd, but doesn't seem to be 
caused by this patch.  The TestAdmin failure is from exhausted file handles:

{noformat}
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.initPipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
    at java.nio.channels.Selector.open(Selector.java:209)
    at org.apache.zookeeper.ClientCnxnSocketNIO.<init>(ClientCnxnSocketNIO.java:42)
    at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1737)
    ... 55 more
{noformat}

Going to go ahead with commit.

 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 
 waiting on condition [0x7f9fab376000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at java.lang.Thread.sleep(Thread.java:302)
 at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
 at 
 org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
 at 
 org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
 at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
 at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...
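
 One common way to break a cycle like the one described above is a re-entrancy guard on the stop path. The following is only a minimal sketch under that assumption: {{GuardedLeader}} and its methods are hypothetical stand-ins, not the actual {{ZKLeaderManager}} or {{LeaderElector}} code, and it is not necessarily how this issue was resolved.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a leader manager; NOT the HBase ZKLeaderManager. */
class GuardedLeader {
  // Set once by the first stop() call; later calls see it and return early.
  private final AtomicBoolean stopping = new AtomicBoolean(false);
  private int stepDownAttempts = 0;

  /** Simulates a step-down that always fails, as when the ZK session is gone. */
  void stepDownAsLeader() {
    stepDownAttempts++;
    try {
      throw new IllegalStateException("simulated ZooKeeper failure");
    } catch (IllegalStateException e) {
      // In the reported trace, the failure handling ends up calling stop() again.
      stop();
    }
  }

  /** Only the first caller proceeds; re-entrant calls return immediately. */
  void stop() {
    if (!stopping.compareAndSet(false, true)) {
      return; // already stopping: breaks the stop -> stepDown -> stop cycle
    }
    stepDownAsLeader();
  }

  int getStepDownAttempts() {
    return stepDownAttempts;
  }

  public static void main(String[] args) {
    GuardedLeader leader = new GuardedLeader();
    leader.stop();
    System.out.println("step-down attempts: " + leader.getStepDownAttempts());
  }
}
```

 With the guard in place, the second call to {{stop()}} returns at once, so the failing step-down is attempted a single time instead of recursing until the stack overflows.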


[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156357#comment-13156357
 ] 

stack commented on HBASE-4838:
--

Yeah, take a look to see whether TRUNK has a fix in Reference or HalfStoreFileReader.

 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156359#comment-13156359
 ] 

Hadoop QA commented on HBASE-4820:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504918/0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/351//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/351//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/351//console

This message is automatically generated.

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues that make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156366#comment-13156366
 ] 

stack commented on HBASE-4853:
--

Hmm... that doesn't fail for me, and the change shouldn't affect this test.

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853-v5.txt, 4853-v6.txt, 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.
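
 To illustrate why an over-eager removal from {{lastSeqWritten}} matters, here is a deliberately simplified model. It is a sketch under stated assumptions, not the real {{HLog}} code: the class {{SeqIdTracker}} and all of its methods are hypothetical, and only the names {{lastSeqWritten}} and the idea of an "oldest outstanding sequence number" are borrowed from the diff above. The assumption is that a log file may be discarded only when every edit in it is older than the oldest unflushed edit of any region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Simplified, hypothetical model of per-region sequence id tracking. */
class SeqIdTracker {
  // region name -> sequence id of that region's oldest unflushed edit
  private final Map<String, Long> lastSeqWritten = new HashMap<>();

  /** Records the first unflushed edit for a region; later edits do not overwrite it. */
  void recordEdit(String region, long seqId) {
    lastSeqWritten.putIfAbsent(region, seqId);
  }

  /** Drops the entry for a region, as would happen once its edits are flushed. */
  void remove(String region) {
    lastSeqWritten.remove(region);
  }

  /** Smallest outstanding sequence id, or empty if nothing is tracked. */
  OptionalLong oldestOutstandingSeqNum() {
    return lastSeqWritten.values().stream().mapToLong(Long::longValue).min();
  }

  /** A log is deletable if all of its edits precede the oldest outstanding edit. */
  boolean canDeleteLog(long highestSeqIdInLog) {
    OptionalLong oldest = oldestOutstandingSeqNum();
    // With an empty map nothing appears outstanding, so every old log looks deletable.
    return !oldest.isPresent() || highestSeqIdInLog < oldest.getAsLong();
  }

  public static void main(String[] args) {
    SeqIdTracker tracker = new SeqIdTracker();
    tracker.recordEdit("regionA", 10L);
    System.out.println("log up to 12 deletable: " + tracker.canDeleteLog(12L));
    tracker.remove("regionA"); // entry pruned while edits may still be unflushed
    System.out.println("log up to 12 deletable: " + tracker.canDeleteLog(12L));
  }
}
```

 In this model, pruning an entry while its edits are still unflushed empties the map, and a log that still holds needed edits becomes eligible for deletion. That is the kind of hole the 'Last sequenceid written is empty' message hints at.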





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156367#comment-13156367
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3492
---



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
https://reviews.apache.org/r/2895/#comment7770

put the edits where?


- Todd


On 2011-11-23 19:58:09, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2895/
bq.  ---
bq.  
bq.  (Updated 2011-11-23 19:58:09)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Distributed log splitting coding enhancement to make it easier to understand, no semantics change.
bq.  It addresses issues raised during the code review while backporting this feature to CDH.
bq.  
bq.  
bq.  This addresses bug HBASE-4820.
bq.  https://issues.apache.org/jira/browse/HBASE-4820
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 2101054 
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java d7a648d 
bq.    src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec 
bq.  
bq.  Diff: https://reviews.apache.org/r/2895/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues that make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156368#comment-13156368
 ] 

Ted Yu commented on HBASE-4853:
---

By increasing timeout to 6 seconds (Pardon me, N), I wasn't able to reproduce 
failure in TestGlobalMemStoreSize after 20 iterations:
{code}
Index: src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
===
--- src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java   (revision 1205638)
+++ src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java   (working copy)
@@ -100,11 +100,12 @@
       }
       LOG.info("Post flush on " + server.getServerName());
       long now = System.currentTimeMillis();
-      long timeout = now + 3000;
+      long timeout = now + 6000;
       while(server.getRegionServerAccounting().getGlobalMemstoreSize() != 0 &&
           timeout > System.currentTimeMillis()) {
         Threads.sleep(10);
       }
+      LOG.info("About to check GlobalMemstoreSize");
       assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
         server.getRegionServerAccounting().getGlobalMemstoreSize());
     }
{code}
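
The polling loop in the patch above hard-codes its deadline. As a rough sketch only, the same wait-until-true-or-timeout pattern can be pulled into a small helper so the timeout becomes a parameter. {{WaitUtil}} and {{waitFor}} are hypothetical names introduced here for illustration; they are not part of the HBase test utilities.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; illustrates the poll-with-deadline pattern only. */
class WaitUtil {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    // nanoTime is monotonic, so the deadline is unaffected by wall-clock changes.
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in for "global memstore size has dropped to zero".
    boolean drained = waitFor(6000, 10, () -> System.currentTimeMillis() - start >= 50);
    System.out.println("condition met before timeout: " + drained);
  }
}
```

A test could then assert on the return value directly, which separates "the flush never finished" from "the size was wrong" when diagnosing a flaky run.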

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853-v5.txt, 4853-v6.txt, 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-23 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156369#comment-13156369
 ] 

Lars Hofhansl commented on HBASE-4838:
--

Reference.java and HalfStoreFileReader.java are identical between 0.92 and 
trunk (and neither Reference nor HalfStoreFileReader appear in this patch), so 
that is likely not the cause.

I also verified now that it picks up the correct store file (judged by the 
filename), which means the content of the store file is not correct. I thought 
maybe it had to do with ignoring the version counts in the ColumnTrackers, but 
that does not appear to be the problem.

... going to have to shelve this for a bit to work on some other stuff.


 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156370#comment-13156370
 ] 

Ted Yu commented on HBASE-4853:
---

We should make TestGlobalMemStoreSize pass consistently. HBASE-4722 tried to 
solve this issue.

 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 
 4853-v5.txt, 4853-v6.txt, 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.





[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-23 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4820:
---

Attachment: 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch, 
 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues that make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.




