[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228233#comment-13228233 ] chunhui shen commented on HBASE-5270: - I have submitted a patch here: HBASE-5270v11.patch. I only added some notes for joinCluster(), changed some comments explaining what deadNotExpired servers are in ServerManager, and fixed the spaces around '='. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17: isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for the meta version. The method param names are not right; 'definitiveRootServer': what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if it's carrying root and meta? What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? It should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure.
God love the poor noob that comes a-wandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed log splitting is configured, with this patch we will do single-process splitting in the master under some conditions. It's not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting work together? Why would we have dead servers in progress here in master startup? Because a ServerShutdownHandler fired? This patch is different from the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228236#comment-13228236 ] stack commented on HBASE-5569: -- Ugh. Indexing JIRA lost my comment. Looking at builds, we don't have much of a history on trunk builds, but TestAtomicOperation started failing today when HBASE-5399 'Cut the link between the client and the zookeeper ensemble' went in (among others). I see over in the hadoopqa builds that it doesn't fail if I go back twenty-odd builds. It did break here, https://builds.apache.org/view/G-L/view/HBase/job/PreCommit-HBASE-Build/1168/, and on a later build. Should I try reverting it? TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally --- Key: HBASE-5569 URL: https://issues.apache.org/jira/browse/HBASE-5569 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor What I have pieced together so far is that it is the *scanning* side that has problems sometimes. Every time I see an assertion failure in the log, I see this before it: {quote} 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0 {quote} The order of the Put and the Delete is sometimes reversed. The test threads should always see exactly one KV: if the 'before' was the Put, the threads see 0 KVs; if the 'before' was the Delete, the threads see 2 KVs. This debug message comes from StoreScanner#checkReseek. It seems we still have some consistency issue with scanning sometimes :(
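As a rough illustration of the invariant the test asserts, here is a simplified model (a sketch under stated assumptions, NOT HBase code; the class and its in-memory cell list are invented): each mutation writes the new Put and the DeleteColumn of the old version as one atomic unit, so a concurrent scanner must always count exactly one live version. Observing the Put without the Delete gives 2 KVs; the Delete without the Put gives 0 KVs -- the two failure modes described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the TestAtomicOperation invariant (hypothetical,
// not HBase code). A cell version is a positive timestamp; a negative
// entry -ts models a DeleteColumn masking the Put at ts.
public class ScanInvariant {
    private final List<Long> cells = new ArrayList<>();

    // One mutation = Put of the new version + DeleteColumn of the old,
    // applied as a single atomic unit (modeled here with synchronized).
    public synchronized void mutate(long oldTs, long newTs) {
        cells.add(newTs);
        cells.add(-oldTs);
    }

    // "Scanner": count Put versions not masked by a DeleteColumn.
    public synchronized int liveVersions() {
        Set<Long> deletes = new HashSet<>();
        for (long c : cells) if (c < 0) deletes.add(-c);
        int live = 0;
        for (long c : cells) if (c > 0 && !deletes.contains(c)) live++;
        return live;
    }

    public static void main(String[] args) throws Exception {
        ScanInvariant s = new ScanInvariant();
        s.mutate(0, 1); // seed: version 1 is live
        Thread writer = new Thread(() -> {
            for (long ts = 1; ts < 500; ts++) s.mutate(ts, ts + 1);
        });
        writer.start();
        for (int i = 0; i < 500; i++) {
            int n = s.liveVersions();
            // The invariant: never 0, never 2 -- always exactly 1.
            if (n != 1) throw new AssertionError("scanner saw " + n + " versions");
        }
        writer.join();
        System.out.println("scanner always saw exactly 1 live version");
    }
}
```

If the two `cells.add` calls were visible to readers separately (as the flaky test suggests happens in the real scanner), the reader loop would intermittently observe 0 or 2 versions.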
[jira] [Commented] (HBASE-5059) Tests for: Support deleted rows in CopyTable
[ https://issues.apache.org/jira/browse/HBASE-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228241#comment-13228241 ] stack commented on HBASE-5059: -- @Evan If you poke around, you can find the logs of your test run up on the apache build server. If you run the test locally, does it pass? To retry your patch against hadoopqa, hit 'cancel patch' above, then reattach, and then hit 'submit patch' again. See if your patch fails a second or third time; if it does, add debug to your patch to help figure out what's going on. Thanks. Tests for: Support deleted rows in CopyTable Key: HBASE-5059 URL: https://issues.apache.org/jira/browse/HBASE-5059 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Evan Beard Priority: Minor Fix For: 0.94.0 Attachments: TestCopyTable_HBASE_5059.patch
[jira] [Updated] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5434: - Attachment: HBASE-5434.trunk.v2.patch [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch /status/cluster shows only {code} stores=2 storefiles=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region, but the master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard.
[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5270: - Status: Open (was: Patch Available)
[jira] [Resolved] (HBASE-5314) Gracefully rolling restart region servers in rolling-restart.sh
[ https://issues.apache.org/jira/browse/HBASE-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5314. -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Committed to trunk. Thanks for the patch Yifeng. Gracefully rolling restart region servers in rolling-restart.sh --- Key: HBASE-5314 URL: https://issues.apache.org/jira/browse/HBASE-5314 Project: HBase Issue Type: Improvement Components: scripts Reporter: Yifeng Jiang Priority: Minor Fix For: 0.96.0 Attachments: HBASE-5314.patch, HBASE-5314.patch.2 The rolling-restart.sh script has a --rs-only option which simply restarts all region servers in the cluster. Consider improving it to gracefully restart region servers, to avoid offline time for the regions deployed on each server, and to keep the region distribution the same as it was before the restart.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228251#comment-13228251 ] Hadoop QA commented on HBASE-5270: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518153/HBASE-5270v11.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1172//console This message is automatically generated. 
[jira] [Updated] (HBASE-5571) Table will be disabling forever
[ https://issues.apache.org/jira/browse/HBASE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5571: Attachment: HBASE-5571.patch Table will be disabling forever --- Key: HBASE-5571 URL: https://issues.apache.org/jira/browse/HBASE-5571 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5571.patch If we restart the master while it is disabling a table, the table will stay in the disabling state forever. With the current logic, the region CLOSE RPC will always return NotServingRegionException, because the RS had already closed the region before we restarted the master. So the table will be disabling forever, because the region stays in RIT all along. In another case, AssignmentManager#rebuildUserRegions() puts parent regions into AssignmentManager.regions, so if we disable the table we cannot close these parent regions until they are purged by the CatalogJanitor.
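The first failure mode above suggests the shape of a fix: during disable, treat NotServingRegionException from the close RPC as "already closed" rather than as a failure, so the region can leave RIT. The following is a hypothetical sketch of that idea only, not the actual HBASE-5571 patch; the exception stub and method names are invented stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (NOT the HBASE-5571 patch): during table disable,
// a NotServingRegionException from the close RPC means the RS already
// closed the region (e.g. before a master restart), so the master should
// count the close as done instead of leaving the region stuck in RIT.
public class DisableCloseSketch {
    static class NotServingRegionException extends Exception {}

    // Stand-in for the region server's close RPC: the region is gone.
    static void closeRegionRpc(String region) throws NotServingRegionException {
        throw new NotServingRegionException();
    }

    private final Set<String> regionsInTransition = new HashSet<>();

    public void closeForDisable(String region) {
        regionsInTransition.add(region);
        try {
            closeRegionRpc(region);
        } catch (NotServingRegionException e) {
            // RS no longer serves the region: it is effectively closed,
            // so fall through and clear RIT rather than retrying forever.
        }
        regionsInTransition.remove(region);
    }

    public boolean stuck(String region) {
        return regionsInTransition.contains(region);
    }

    public static void main(String[] args) {
        DisableCloseSketch m = new DisableCloseSketch();
        m.closeForDisable("region-1");
        if (m.stuck("region-1")) throw new AssertionError("region stuck in RIT");
        System.out.println("region-1 left RIT despite NotServingRegionException");
    }
}
```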
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228297#comment-13228297 ] Laxman commented on HBASE-5564: --- Scope of this issue: 1) Avoid the behavioral inconsistency with the timestamp parameter. {noformat} Currently in code, a) If the timestamp parameter is configured, duplicate records will be overwritten. b) If not configured, some duplicate records are maintained as different versions. {noformat} This fix should be in line with the expectation Todd has mentioned: bq. The whole point is that, in a bulk-load-only workflow, you can identify each bulk load exactly, and correlate it to the MR job that inserted it. 2) Provide an option to look up the timestamp column value from the input data (like the ROWKEY column). Example: importtsv.columns='HBASE_ROW_KEY, HBASE_TS_KEY, emp:name,emp:sal,dept:code' I will submit a patch with the above-mentioned approach. Any other add-ons? Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Duplicate records are getting discarded when duplicate records exist in the same input file, and more specifically when they exist in the same split. Duplicate records are preserved only when they come from different splits. Version under test: HBase 0.92
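Point 2 above (an HBASE_TS_KEY marker column resolved from the input data, like HBASE_ROW_KEY) could look roughly like this. This is a hypothetical illustration of the proposed option, not the actual importtsv implementation; the class and method names are invented.

```java
// Hypothetical sketch of the proposed HBASE_TS_KEY option for importtsv
// (NOT the real importtsv parser): given the configured column spec and
// one input line, extract the timestamp field so duplicate rows loaded
// in one bulk load can carry distinct, data-driven timestamps.
public class TsvTsKeyParser {

    // Returns the timestamp from the HBASE_TS_KEY column, or -1 if the
    // spec has no such column (caller would fall back to the job-wide
    // timestamp, as importtsv does today).
    public static long extractTimestamp(String columnsSpec, String line, char sep) {
        String[] cols = columnsSpec.split(",");
        String[] fields = line.split(String.valueOf(sep));
        for (int i = 0; i < cols.length && i < fields.length; i++) {
            if (cols[i].trim().equals("HBASE_TS_KEY")) {
                return Long.parseLong(fields[i].trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long ts = extractTimestamp(
            "HBASE_ROW_KEY, HBASE_TS_KEY, emp:name", "row1\t1331595000\tbob", '\t');
        if (ts != 1331595000L) throw new AssertionError();
        System.out.println("ts=" + ts);
    }
}
```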
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228300#comment-13228300 ] nkeywal commented on HBASE-5399: @stack Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well. I've seen it failing previously; I think it has been flaky for at least a month (and maybe much more). git log: {noformat} commit 0f3e025a62f89763fffbf8298d565a6c4e5b7d06 Date: Sat Mar 10 02:27:05 2012 + {noformat} With the same stack as in trunk #2676: {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.444 sec <<< FAILURE! testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation) Time elapsed: 7.083 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<0> but was:<1> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testMultiRowMutationMultiThreads(TestAtomicOperation.java:416) {noformat} Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered as
an issue, for various reasons, one of them being that there is a limit on the number of connections that ZK can manage. Stack also suggested removing the link to the master from HConnection. There are choices to be made considering the existing API (which we don't want to break). The first patches I will submit to hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it. - read the master address to create a master => now done with a temporary zookeeper connection. - read the root location => now done with a temporary zookeeper connection, but questionable. Used in the public function locateRegion. To be reworked. - read the cluster id => now done once with a temporary zookeeper connection. - check if the base node is available => now done once with a zookeeper connection given as a parameter. - isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection. Called internally from HBaseAdmin and HTable. - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection. Master is used for: - getMaster: public getter, as for ZooKeeper => we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin. - getHTableDescriptor*: public functions offering access to the master => we could make them use a temporary master connection as well. Main points are: - hbase class for ZooKeeper: ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but it requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that).
Anyway, a non-connected client will always be much slower, because establishing a TCP connection is slow. - having a link between ZK and all the clients seems to make sense for some use cases; however, it won't scale if a TCP connection is required for every client. - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic, however.
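The "temporary zookeeper connection" pattern the list above keeps referring to can be sketched as follows. This is a minimal illustration of the idea only; the ZkSession type and its readValue method are invented stand-ins, not the real ZooKeeperWatcher API.

```java
// Sketch of the "temporary zookeeper connection" approach: instead of a
// long-lived watcher, open a short-lived session, read one value (master
// address, root location, cluster id), and close it, so the client holds
// no permanent slot in the ZK ensemble. Hypothetical stand-in types.
public class TemporaryConnection {
    static class ZkSession implements AutoCloseable {
        private boolean open = true;
        String readValue(String znode) {
            if (!open) throw new IllegalStateException("session closed");
            return "value-at-" + znode; // stand-in for a real ZK read
        }
        @Override public void close() { open = false; }
    }

    static String readOnce(String znode) {
        // try-with-resources guarantees the session (and its TCP
        // connection) is released as soon as the read completes.
        try (ZkSession zk = new ZkSession()) {
            return zk.readValue(znode);
        }
    }

    public static void main(String[] args) {
        String master = readOnce("/hbase/master");
        if (!master.equals("value-at-/hbase/master")) throw new AssertionError();
        System.out.println("read master address via temporary session: " + master);
    }
}
```

The trade-off the comment itself notes: each such read pays a fresh TCP setup, which is why a non-connected client is inherently slower.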
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228406#comment-13228406 ] Laxman commented on HBASE-5564: --- While testing the patch locally, I'm getting the following error on trunk. Any hints on this, please? {noformat} java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:0 failed on local exception: java.net.BindException: Cannot assign requested address: no further information at org.apache.hadoop.mapred.MiniMRCluster.waitUntilIdle(MiniMRCluster.java:323) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:524) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:462) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:454) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:446) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:436) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:426) at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:417) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniMapReduceCluster(HBaseTestingUtility.java:1269) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniMapReduceCluster(HBaseTestingUtility.java:1255) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:189) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:162) {noformat}
[jira] [Created] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Synthesis: 1) TestMasterZKSessionRecovery distinguishes two cases on SessionExpiredException. One is explicitly not managed; however, it seems that there is no reason for this. 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function with a useless recursive call. 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired. 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above. Changes are: 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and removing the recursion. 1) Removing TestMasterZKSessionRecovery. Detailed justification: testMasterZKSessionRecoveryFailure says: {noformat} /** * Negative test of master recovery from zk session expiry. * * Starts with one master. Fakes the master zk session expired. * Ensures the master cannot recover the expired zk session since * the master zk node is still there. */ public void testMasterZKSessionRecoveryFailure() throws Exception { MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); HMaster m = cluster.getMaster(); m.abort("Test recovery from zk session expired", new KeeperException.SessionExpiredException()); assertTrue(m.isStopped()); } {noformat} This test works, i.e. the assertion is always verified. But do we really want this behavior? When looking at the code, we see that what's happening is strange: - HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master. - HMaster#abortNow checks the exception type.
As it's a SessionExpiredException, it will try to recover, calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false (and that will make HMaster#abort stop the master). - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeper connection and then tries to become the active master. If it cannot, it will return false (and that will make HMaster#abort stop the master). - HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster. blockUntilBecomingActiveMaster says it will return false if there is any error preventing it from becoming the active master. - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master address. If it's the same host and port, it deletes the node, which starts a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. This result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master. In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there", but we're actually doing a check just for this and deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, and the test would fail.
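The control-flow bug described above is easy to see in miniature. The following is a hypothetical model, not the real ActiveMasterManager code: the recursive call succeeds and returns true, but the first invocation drops that result and returns false, so the caller stops a master that actually did become active.

```java
// Minimal model of the recursion bug in blockUntilBecomingActiveMaster
// (hypothetical stand-in, NOT the real ActiveMasterManager code).
public class RecursionBug {
    static boolean staleZnodePresent;

    // Buggy shape: deletes the stale master znode, recurses, then
    // ignores the recursive result and reports failure.
    static boolean blockUntilActiveBuggy() {
        if (staleZnodePresent) {
            staleZnodePresent = false;   // delete our stale znode
            blockUntilActiveBuggy();     // recursion succeeds...
            return false;                // ...but its result is dropped
        }
        return true;                     // became the active master
    }

    // Fixed shape: a single loop, no recursion, so the success result
    // cannot be lost on the way back up.
    static boolean blockUntilActiveFixed() {
        while (staleZnodePresent) {
            staleZnodePresent = false;   // delete stale znode and retry
        }
        return true;
    }

    public static void main(String[] args) {
        staleZnodePresent = true;
        boolean buggy = blockUntilActiveBuggy();
        staleZnodePresent = true;
        boolean fixed = blockUntilActiveFixed();
        if (buggy || !fixed) throw new AssertionError();
        System.out.println("buggy path reported failure despite success; fixed path reported success");
    }
}
```

This is exactly why the "negative" test always passes: the false propagates up through tryRecoveringExpiredZKSession, and HMaster#abort stops the master.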
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Attachment: 5572.v1.patch
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228421#comment-13228421 ] ramkrishna.s.vasudevan commented on HBASE-5206: --- For the 0.92 version the test case passed even with your updated patch Ted. The change that you made is needed. Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v2.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228453#comment-13228453 ] stack commented on HBASE-5270: -- I committed to trunk. Can you make a version for 0.92 please Chunhui? The trunk patch does not seem to apply. Thank you. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for the meta version. This method's param names are not right: 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if it's carrying root and meta? What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. 
It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, with this patch we will do single-process splitting in the master under some conditions. It's not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a ServerShutdownHandler fired? This patch is different from the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228460#comment-13228460 ] stack commented on HBASE-5572: -- @N Thanks for digging in. So, it looks like your patch retains the behavior where if the current master has the same host and port, we'll expire it, and then try to register ourselves (because we go around to the top of your new while loop)? Is that so? I believe we have a test to ensure this behavior IIRC. Patch looks good. +1.
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228464#comment-13228464 ] stack commented on HBASE-5399: -- bq. Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well. How do you do this? You run the test multiple times? bq. I think it's flaky for at the very least a month (and may be much more) In your estimation, we broke this a while back. Any clue what did it? Thanks. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered an issue, for various reasons, one of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to the master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever it wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it. 
- read the master address to create a master connection => now done with a temporary zookeeper connection - read root location => now done with a temporary zookeeper connection, but questionable. Used in the public function locateRegion. To be reworked. - read cluster id => now done once with a temporary zookeeper connection. - check if the base node is available => now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection. Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection Master is used for: - getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin - getHTableDescriptor*: public functions offering access to the master => we could make them use a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it - we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic however.
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228467#comment-13228467 ] stack commented on HBASE-5564: -- Googling it, it's either something already listening on the port, or your 127.0.0.1 has been removed? See http://www-01.ibm.com/support/docview.wss?uid=swg21233733 Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Duplicate records are getting discarded when duplicate records exist in the same input file, and more specifically when they exist in the same split. Duplicate records are retained if the records come from different splits. Version under test: HBase 0.92
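One plausible mechanism for the same-split-only loss Laxman reports (a hedged sketch, not the actual bulkloader code) is a sort step that collects the records of one group into a set: two identical records arriving in the same group collapse to one, while identical records routed through different splits reach different groups and both survive. The `sortGroup` helper and its string "records" below are purely illustrative stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: a TreeSet-based sort step silently drops duplicates that land
// in the same group, while duplicates in separate groups both survive.
public class DedupSketch {
    // Sort one group of records; TreeSet keeps exactly one copy per
    // distinct element, so in-group duplicates vanish.
    public static List<String> sortGroup(List<String> recordsInOneGroup) {
        return new ArrayList<>(new TreeSet<>(recordsInOneGroup));
    }

    public static void main(String[] args) {
        // Two identical records in the SAME split/group: one is discarded.
        List<String> sameSplit = List.of("row1/cf:q/ts=5/valA", "row1/cf:q/ts=5/valA");
        System.out.println(sortGroup(sameSplit).size()); // 1

        // The same two records arriving via DIFFERENT splits are processed
        // as separate groups, so both survive.
        int total = sortGroup(List.of("row1/cf:q/ts=5/valA")).size()
                  + sortGroup(List.of("row1/cf:q/ts=5/valA")).size();
        System.out.println(total); // 2
    }
}
```

If the real bulkloader sorts with a set like this, a fix would need a collection that tolerates equal elements (for example a sorted list) or a tie-breaker in the comparison.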
[jira] [Commented] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228476#comment-13228476 ] nkeywal commented on HBASE-5572: Yes. I've done 3 modifications in the code, two like for like (hopefully!) and one with a different behavior. I: - removed the variable named cleanSetOfActiveMaster, replaced by return true or return false. - replaced the recursive call by a while(true) loop. - implicitly (it's hidden because there is no recursive call anymore) changed the function behavior: we now return the final result. For this reason the function behaves differently (we return true instead of false), but it's more in line with the method contract. This change breaks testMasterZKSessionRecoveryFailure, because it does not fail anymore. TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure was testing explicitly the behavior with both SessionExpired AND a master with the same host and port. I removed it, but I can move it to TestZooKeeper (to save a cluster start/stop) and reverse the assertion in the test (now it does not fail).
[jira] [Commented] (HBASE-5571) Table will be disabling forever
[ https://issues.apache.org/jira/browse/HBASE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228484#comment-13228484 ] stack commented on HBASE-5571: -- Oh, any chance of a test? Table will be disabling forever --- Key: HBASE-5571 URL: https://issues.apache.org/jira/browse/HBASE-5571 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5571.patch If we restart the master while it is disabling a table, the table will be disabling forever. In the current logic, the region CLOSE RPC will always return NotServingRegionException because the RS already closed the region before we restarted the master. So the table will be disabling forever because the region will be in RIT all along. In another case, when AssignmentManager#rebuildUserRegions() runs, it puts parent regions into AssignmentManager.regions, so we can't close these parent regions until they are purged by CatalogJanitor if we disable the table.
[jira] [Commented] (HBASE-5571) Table will be disabling forever
[ https://issues.apache.org/jira/browse/HBASE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228482#comment-13228482 ] stack commented on HBASE-5571: -- This patch has much merit. Thanks for digging in on it. I like how you made a method cancelClosingRegionIfDisabling to hold a bunch of code that was inside a catch block. I see why you want to know if a region is splitting -- it makes it so we can remove the comments where we speculate a region is splitting -- but I don't think keeping a list of outstanding regions in the HRS is the right place for it... in particular I don't think we should keep this state in the OnlineRegions interface (splitting regions are not online). When we go to split a region, we put this fact up into zk. Why not check there rather than trying to keep around a collection of splitting regions?
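Stack's point can be sketched with a minimal, self-contained Java model. This is an illustration only: the znode store is simulated with a map, the znode path is hypothetical, and none of this is the actual HBase or ZooKeeper API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Contrasts two ways of answering "is this region splitting?":
// a per-regionserver collection versus consulting the shared (simulated)
// znode store that the split workflow already writes to.
public class SplitStateSketch {
    // Local collection kept in the RS -- duplicated state that can drift.
    private final Set<String> splittingRegions = new HashSet<>();

    // Simulated znode store standing in for the ZooKeeper ensemble.
    private final Map<String, String> znodes = new HashMap<>();

    // Illustrative path layout, not the actual HBase znode structure.
    private static String splitZnode(String region) {
        return "/hbase/splitting/" + region;
    }

    public void startSplit(String region) {
        znodes.put(splitZnode(region), "SPLITTING"); // authoritative state
        splittingRegions.add(region);                // duplicated local state
    }

    public void finishSplitInZkOnly(String region) {
        // If only the znode is cleaned up, the local copy is now stale.
        znodes.remove(splitZnode(region));
    }

    public boolean isSplittingPerLocalList(String region) {
        return splittingRegions.contains(region);
    }

    public boolean isSplittingPerZk(String region) {
        return znodes.containsKey(splitZnode(region));
    }
}
```

In this model, a split that finishes only in zk leaves isSplittingPerLocalList returning a stale true while isSplittingPerZk reflects reality: having one authoritative store is the reason to "check there" instead of maintaining a second collection.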
[jira] [Commented] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228487#comment-13228487 ] stack commented on HBASE-5572: -- Above sounds good. Would suggest we retain the test that verifies that we expire the znode if same host and port. That behavior can be useful in the single-master case.
If we were not ignoring the result, we would return true, so we would not stop the master, so the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
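The walkthrough above describes a recursive retry whose boolean result is thrown away. A minimal Java sketch of that shape (hypothetical, self-contained code, not the actual ActiveMasterManager; the method names are illustrative) shows why the buggy variant reports failure even when the retry succeeded, and how the single-loop shape the synthesis proposes avoids it:

```java
// Hypothetical sketch, not HBase code: a retry that deletes a stale node and
// recurses, but drops the recursive call's result on the floor.
public class BlockingRetrySketch {

    // Buggy shape from the report: the second (recursive) attempt succeeds and
    // returns true, but the first call ignores it and falls through to false.
    static boolean becomeActiveBuggy(boolean staleNodeIsOurs, int depth) {
        if (depth > 0) {
            return true; // retry succeeded: we became the active master
        }
        if (staleNodeIsOurs) {
            becomeActiveBuggy(staleNodeIsOurs, depth + 1); // result ignored!
        }
        return false; // caller concludes failure and aborts the master
    }

    // Fixed shape: one loop, no recursion, and the outcome of the retry is
    // what the method actually returns.
    static boolean becomeActiveFixed(boolean staleNodeIsOurs) {
        for (int attempt = 0; attempt < 2; attempt++) {
            if (attempt > 0) {
                return true; // stale node was deleted; second attempt succeeds
            }
            if (!staleNodeIsOurs) {
                return false; // someone else's node: we really cannot proceed
            }
            // delete the stale node here, then loop and retry
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(becomeActiveBuggy(true, 0)); // false, despite success
        System.out.println(becomeActiveFixed(true));    // true
    }
}
```

In the buggy shape the caller chain (abort -> abortNow -> tryRecoveringExpiredZKSession -> becomeActiveMaster) sees false and stops the master even though the retry became active, which is exactly the behavior the test asserts.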
[jira] [Assigned] (HBASE-5570) Compression tool section is referring to wrong link in HBase Book.
[ https://issues.apache.org/jira/browse/HBASE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-5570: Assignee: Doug Meil Ok if I assign this to you Mr. Doug? Compression tool section is referring to wrong link in HBase Book. -- Key: HBASE-5570 URL: https://issues.apache.org/jira/browse/HBASE-5570 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Reporter: Laxman Assignee: Doug Meil Priority: Trivial Labels: documentaion http://hbase.apache.org/book/ops_mgt.html#compression.tool The above section refers to itself (recursively) in the HBase book. This needs to be corrected.
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228488#comment-13228488 ] Lars Hofhansl commented on HBASE-5399: -- Only testMultiRowMutationMultiThreads is failing, which I added recently. I now think the test always had this problem. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered an issue, for various reasons. One of them is that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for:
- public getter, to let the client do whatever they want, and close ZooKeeper when closing the connection = we have to deprecate this but keep it.
- read the master address to create a master = now done with a temporary zookeeper connection
- read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked.
- read cluster id = now done once with a temporary zookeeper connection.
- check if the base node is available = now done once with a zookeeper connection given as a parameter
- isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads = now done with a temporary zookeeper connection
Master is used for:
- getMaster: public getter, as for ZooKeeper = we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge and HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master = we could make them use a temporary master connection as well.
Main points are:
- hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be slower, because establishing a tcp connection is slow.
- having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client.
- if we move the table descriptor part away from the client, we need to find a new place for it.
- we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic, however.
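Several of the bullets above replace a long-lived client connection with a "temporary zookeeper connection". A sketch of that pattern (the ZkClient class below is a hypothetical stand-in, not the real org.apache.zookeeper API): open a short-lived session, read one value, close it, so the ensemble never accumulates idle client sessions:

```java
// Hypothetical illustration of the temporary-connection pattern; ZkClient is
// a fake stand-in for a ZooKeeper client, used only to show the lifecycle.
public class TemporaryConnectionSketch {

    static class ZkClient implements AutoCloseable {
        static int openConnections = 0; // what a real ensemble has to track
        ZkClient() { openConnections++; }
        String read(String znode) { return "value-of:" + znode; }
        @Override public void close() { openConnections--; }
    }

    // One short-lived session per read; try-with-resources guarantees release
    // even if the read throws.
    static String readOnce(String znode) {
        try (ZkClient zk = new ZkClient()) {
            return zk.read(znode);
        }
    }

    public static void main(String[] args) {
        System.out.println(readOnce("/hbase/master"));
        System.out.println(ZkClient.openConnections); // nothing left open
    }
}
```

The trade-off noted in the description applies here too: every read pays a TCP connection setup, so a non-connected client is slower per operation even though it holds no standing resources.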
[jira] [Commented] (HBASE-5570) Compression tool section is referring to wrong link in HBase Book.
[ https://issues.apache.org/jira/browse/HBASE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228491#comment-13228491 ] Doug Meil commented on HBASE-5570: -- that's fine Compression tool section is referring to wrong link in HBase Book. -- Key: HBASE-5570 URL: https://issues.apache.org/jira/browse/HBASE-5570 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Reporter: Laxman Assignee: Doug Meil Priority: Trivial Labels: documentaion
[jira] [Created] (HBASE-5573) Replace client ZooKeeper watchers by simple ZooKeeper reads
Replace client ZooKeeper watchers by simple ZooKeeper reads --- Key: HBASE-5573 URL: https://issues.apache.org/jira/browse/HBASE-5573 Project: HBase Issue Type: Improvement Components: client, zookeeper Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Some code in the package needs to read data in ZK. This could be done by a simple read, but is actually implemented with a watcher. This holds ZK resources. Fixing this could also be an opportunity to remove the need for the client to provide the master address and port.
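The resource cost the issue describes can be sketched as follows (FakeZk is a hypothetical stand-in, not the org.apache.zookeeper.ZooKeeper API): a watch leaves server-side state behind until it fires, while a plain one-shot read holds nothing after it returns:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why a one-shot read is cheaper than a watch: a watch
// registers server-side state that lives until it fires; a plain read does not.
public class WatchVsReadSketch {

    static class FakeZk {
        final Map<String, String> data = new HashMap<>();
        int registeredWatches = 0; // server-side resource the issue wants to avoid

        String read(String znode) {            // simple read: no lingering state
            return data.get(znode);
        }

        String readAndWatch(String znode) {    // watcher read: state retained
            registeredWatches++;
            return data.get(znode);
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.data.put("/hbase/master", "host:60000");
        zk.read("/hbase/master");                 // client learns the address...
        System.out.println(zk.registeredWatches); // ...and the ensemble holds 0 watches
        zk.readAndWatch("/hbase/master");
        System.out.println(zk.registeredWatches); // the watcher variant leaves 1 behind
    }
}
```

With one watch per client per znode, the retained state grows with the client population, which is the scaling concern behind replacing watchers with simple reads.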
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228494#comment-13228494 ] Lars Hofhansl commented on HBASE-5270: -- Stack, should I apply to 0.94? Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17: isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for the meta version. The method param names are not right: 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if it's carrying root and meta? What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in.
Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed log splitting is configured, we will do single-process splitting in the master under some conditions with this patch. It's not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting work? Why would we have dead servers in progress here in master startup? Because a ServerShutdownHandler fired? This patch is different to the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228496#comment-13228496 ] stack commented on HBASE-5270: -- @Lars Yeah. Let me try this trunk patch. It's good for a master joining a cluster w/ concurrent crashing regionservers. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228498#comment-13228498 ] Lars Hofhansl commented on HBASE-5569: -- I'll run this in a loop on my work machine (8 core + hyperthreading), which should increase the likelihood of this happening. Will then avoid the parallel flushing and see if that fixes the problem. I think the test always had this problem. On the other hand, I do think this indicates a problem with scanning. This is suspicious, and the code producing this was also added relatively recently: {quote} Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen {quote} TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally --- Key: HBASE-5569 URL: https://issues.apache.org/jira/browse/HBASE-5569 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor What I have pieced together so far is that it is the *scanning* side that has problems sometimes. Every time I see an assertion failure in the log I see this before it: {quote} 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0 {quote} The order of the Put and Delete is sometimes reversed. The test threads should always see exactly one KV; if the 'before' was the Put the threads see 0 KVs, and if the 'before' was the Delete the threads see 2 KVs. This debug message comes from StoreScanner's checkReseek. It seems we still have some consistency issue with scanning sometimes :(
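The "exactly one KV" expectation follows from HBase's delete-masking rule: a DeleteColumn at timestamp T hides every Put with timestamp <= T. A small sketch of that rule (a hypothetical helper, not the actual StoreScanner logic), using the timestamps from the log line above:

```java
import java.util.List;

// Hypothetical sketch of the DeleteColumn masking rule as it applies to the
// report: a DeleteColumn at timestamp T hides every Put with ts <= T, so a
// Put strictly newer than the delete survives.
public class DeleteMaskSketch {

    static long visiblePuts(List<Long> putTimestamps, long deleteColumnTs) {
        // Only Puts strictly newer than the delete remain visible.
        return putTimestamps.stream().filter(ts -> ts > deleteColumnTs).count();
    }

    public static void main(String[] args) {
        // Timestamps from the log: Put@75366, DeleteColumn@75203.
        System.out.println(visiblePuts(List.of(75366L), 75203L)); // 1 KV visible
        // If the scanner's view flipped so the delete covered the Put, the
        // same row would show 0 KVs -- one of the failure modes described.
        System.out.println(visiblePuts(List.of(75366L), 75366L)); // 0
    }
}
```

The 2-KV failure mode corresponds to the opposite error: the delete is scanned but applied to nothing, so both the masked and unmasked versions are returned.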
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228499#comment-13228499 ] stack commented on HBASE-5270: -- The trunk patch went into 0.94. I committed it. So, just need patch for 0.92.2 now. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt
[jira] [Commented] (HBASE-5573) Replace client ZooKeeper watchers by simple ZooKeeper reads
[ https://issues.apache.org/jira/browse/HBASE-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228501#comment-13228501 ] stack commented on HBASE-5573: -- Hurray! Replace client ZooKeeper watchers by simple ZooKeeper reads --- Key: HBASE-5573 URL: https://issues.apache.org/jira/browse/HBASE-5573 Project: HBase Issue Type: Improvement Components: client, zookeeper Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228526#comment-13228526 ] ramkrishna.s.vasudevan commented on HBASE-5270: --- So this one went in. Thanks Stack, Ted and mainly Chunhui for pursuing on this. The 0.90 patch that Chunhui gave in HBASE-5179 is running in our test clusters. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Attachment: 5572.v2.patch KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5572.v1.patch, 5572.v2.patch
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Status: Open (was: Patch Available) KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5572.v1.patch, 5572.v2.patch
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Status: Patch Available (was: Open) KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5572.v1.patch, 5572.v2.patch Synthesis: 1) TestMasterZKSessionRecovery distinguish two cases on SessionExpiredException. One is explicitly not managed. However, is seems that there is no reason for this. 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function, with a useless recursive call. 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above. Changes are: 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and remove recursion 1) Removing TestMasterZKSessionRecovery Detailed justification: testMasterZKSessionRecoveryFailure says: {noformat} /** * Negative test of master recovery from zk session expiry. * * Starts with one master. Fakes the master zk session expired. * Ensures the master cannot recover the expired zk session since * the master zk node is still there. */ public void testMasterZKSessionRecoveryFailure() throws Exception { MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); HMaster m = cluster.getMaster(); m.abort(Test recovery from zk session expired, new KeeperException.SessionExpiredException()); assertTrue(m.isStopped()); } {noformat} This tests works, i.e. the assertion is always verified. But do we really want this behavior? 
When looking at the code, what's happening is strange: - HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master. - HMaster#abortNow checks the exception type. As it's a SessionExpiredException, it will try to recover by calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false (and that will make HMaster#abort stop the master). - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeper connection and then tries to become the active master. If it cannot, it will return false (and that will make HMaster#abort stop the master). - HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster. blockUntilBecomingActiveMaster says it will return false if there is any error preventing it from becoming the active master. - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master address. If it's the same host and port, it deletes the node, which triggers a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. But this result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master. In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there.", but we're actually checking just for this and then deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, and the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
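The control flow described above can be reduced to a small sketch (hypothetical, simplified names; not the actual HBase classes): the buggy shape makes the recursive retry, which succeeds, and then discards its result, while the fixed shape uses a single loop with one exit, as the patch proposes.

```java
// Minimal model of the bug (hypothetical names, not the real HBase code).
class ActiveMasterModel {
    private boolean staleNodePresent = true;  // our old master znode still exists

    // Buggy shape: the recursive retry succeeds, but its result is discarded,
    // so the caller concludes we failed to become the active master.
    public boolean blockBuggy() {
        if (staleNodePresent) {
            staleNodePresent = false;  // delete the stale znode
            blockBuggy();              // retry succeeds and returns true...
            return false;              // ...but we ignore it and report failure
        }
        return true;                   // became the active master
    }

    // Fixed shape: a single loop, no recursion, result never discarded.
    public boolean blockFixed() {
        while (staleNodePresent) {
            staleNodePresent = false;  // delete the stale znode and retry
        }
        return true;
    }
}
```

In this model `blockBuggy()` returns false even though the retry became the active master, which is exactly the behavior the justification above describes.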
[jira] [Created] (HBASE-5574) DEFAULT_MAX_FILE_SIZE defaults to a negative value
DEFAULT_MAX_FILE_SIZE defaults to a negative value -- Key: HBASE-5574 URL: https://issues.apache.org/jira/browse/HBASE-5574 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Michael Drzal Assignee: Michael Drzal HBASE-4365 changed the value of DEFAULT_MAX_FILE_SIZE from 256MB to 10G. Here is the line of code: public static final long DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024 * 1024; The problem is that Java evaluates the constant expression as an int, which wraps, and the wrapped value gets assigned to the long. I verified this with a test. The quick fix is to change the last operand to 1024L.
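The wrap-around is easy to reproduce; this sketch (class name is illustrative) shows the 32-bit overflow and the `L`-suffix fix:

```java
// Java evaluates 10 * 1024 * 1024 * 1024 entirely in 32-bit int arithmetic
// (10737418240 mod 2^32 = 2^31, i.e. Integer.MIN_VALUE) and only then
// widens the already-wrapped result to long.
class MaxFileSizeDemo {
    static final long BUGGY = 10 * 1024 * 1024 * 1024;   // -2147483648
    // One long operand (1024L) promotes the whole expression to 64 bits.
    static final long FIXED = 10 * 1024 * 1024 * 1024L;  // 10737418240

    public static void main(String[] args) {
        System.out.println(BUGGY + " vs " + FIXED);
    }
}
```

Note the overflow is silent: the compiler accepts the wrapping constant expression without any warning, which is why the bug survived review.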
[jira] [Updated] (HBASE-5574) DEFAULT_MAX_FILE_SIZE defaults to a negative value
[ https://issues.apache.org/jira/browse/HBASE-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-5574: - Attachment: HBASE-5574.patch Changes constant evaluation to a long.
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228546#comment-13228546 ] Lars Hofhansl commented on HBASE-5569: -- I cannot make this test fail locally, it seems. Running in a loop for an hour now (the test takes ~12s on my machine). TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally --- Key: HBASE-5569 URL: https://issues.apache.org/jira/browse/HBASE-5569 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor What I pieced together so far is that it is the *scanning* side that has problems sometimes. Every time I see an assertion failure in the log I see this before: {quote} 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6, and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0 {quote} The order of the Put and Delete is sometimes reversed. The test threads should always see exactly one KV: if the 'before' was the Put, the threads see 0 KVs; if the 'before' was the Delete, the threads see 2 KVs. This debug message comes from StoreScanner#checkReseek. It seems we still have some consistency issue with scanning sometimes :(
[jira] [Commented] (HBASE-5574) DEFAULT_MAX_FILE_SIZE defaults to a negative value
[ https://issues.apache.org/jira/browse/HBASE-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228548#comment-13228548 ] stack commented on HBASE-5574: -- +1 Thanks Michael
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228553#comment-13228553 ] Lars Hofhansl commented on HBASE-5569: -- Are my assumptions about scanning wrong here? The test works as follows: A bunch of threads alternate putting a column on RowA and deleting that column on RowB in a transaction (next time the delete is on RowA and the put on RowB). Then they each scan starting with RowA and expect to always see exactly one KV (either the column in RowA or the one in RowB). So this relies on a scan providing an atomic view over the two rows (which I think should work if both RowA and RowB are rolled forward with the same MVCC writepoint).
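The invariant the test relies on can be sketched with a toy model (hypothetical, not the HBase API): when the multi-row mutation and the scan each see one consistent view, a scan over the two rows always counts exactly one value.

```java
// Toy model of the test's invariant: the column lives in exactly one of
// RowA/RowB at any time, and a transaction moves it atomically.
class TwoRowModel {
    private boolean valueInRowA = true;

    // "Multi-row mutation": atomically delete from one row, put in the other.
    public synchronized void swap() { valueInRowA = !valueInRowA; }

    // "Scan" both rows against one consistent snapshot: always exactly 1 KV.
    public synchronized int scanBothRows() {
        int kvsInRowA = valueInRowA ? 1 : 0;
        int kvsInRowB = valueInRowA ? 0 : 1;
        return kvsInRowA + kvsInRowB;
    }
}
```

Seeing 0 or 2 KVs, as in the failures above, corresponds to the scanner observing the two rows at different points, i.e. the Delete and the Put becoming visible separately rather than under the same MVCC writepoint.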
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228554#comment-13228554 ] Lars Hofhansl commented on HBASE-5569: -- Ok... Failed locally once now as well.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228558#comment-13228558 ] Hudson commented on HBASE-5179: --- Integrated in HBase-0.94 #30 (See [https://builds.apache.org/job/HBase-0.94/30/]) HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1300222) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.2 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 
5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If the master's failover processing and ServerShutdownHandler's processing happen concurrently, the following case may appear: 1. The master completes splitLogAfterStartup(). 2. RegionserverA restarts, and ServerShutdownHandler is processing it. 3. The master starts to rebuildUserRegions, and RegionserverA is considered a dead server. 4. The master starts to assign the regions of RegionserverA because it is a dead server by step 3. However, when doing step 4 (assigning regions), ServerShutdownHandler may still be splitting the log; therefore, it may cause data loss.
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228563#comment-13228563 ] Phabricator commented on HBASE-5542: sc has commented on the revision HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow(). I think I should use a thread pool to at least cache the threads for the time-bound case. I will make a quick change and update this. REVISION DETAIL https://reviews.facebook.net/D2217 Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code.
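A sketch of the "thread pool to cache the threads for the time-bound case" idea mentioned above (hypothetical helper, not the actual patch): submit the row-processing step to a reusable cached pool and bound the wait, instead of spawning a fresh thread per call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper showing the time-bounded execution pattern.
class TimeBoundedRunner {
    // Cached pool: idle worker threads are reused across calls instead of
    // being created and destroyed for every time-bounded operation.
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public <T> T runWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);  // interrupt the worker if we gave up waiting
            throw e;
        }
    }
}
```

The caller still gets a hard time bound via `Future.get(timeout)`, but the per-call thread-creation cost is amortized by the cached pool.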
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Attachment: 5572.v2.patch
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Fix Version/s: 0.96.0 Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228587#comment-13228587 ] nkeywal commented on HBASE-5399: @Stack; I think it fails 20% of the time. I run it alone, i.e. with -Dtest=TestAtomicOperation#testMultiRowMutationMultiThreads with nothing else running on the machine, and a mvn clean. No clue on when it started to happen. @Lars: I'm not sure I haven't seen failures on testRowMutationMultiThreads as well, I will launch a few tests to see if it happens. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. 
- read the master address to create a master connection = now done with a temporary zookeeper connection - read the root location = now done with a temporary zookeeper connection, but questionable. Used in the public function locateRegion. To be reworked. - read the cluster id = now done once with a temporary zookeeper connection. - check if the base node is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads = now done with a temporary zookeeper connection - Master is used for: - getMaster: public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin - getHTableDescriptor*: public functions offering access to the master = we could make them use a temporary master connection as well. Main points are: - the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be slower, because it needs a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
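The "temporary zookeeper connection" items in the list above all follow one pattern, sketched here with a hypothetical interface (not the real HConnection/ZooKeeperWatcher API): open a short-lived connection, read one value, and close it, rather than holding a watcher open for the client's lifetime.

```java
// Hypothetical reader interface standing in for a ZooKeeper client.
class TemporaryConnectionDemo {
    interface ZkReader extends AutoCloseable {
        String read(String path) throws Exception;
    }

    // Open, read once, close: the connection lives only for this call, so
    // no long-lived TCP connection is held per client.
    static String readOnce(java.util.concurrent.Callable<ZkReader> connect, String path)
            throws Exception {
        try (ZkReader zk = connect.call()) {
            return zk.read(path);
        }
    }
}
```

The trade-off the comment names is visible in the shape: every call pays the TCP-connection setup cost, which is why a non-connected client is inherently slower.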
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228592#comment-13228592 ] stack commented on HBASE-5569: -- hbase-5121 does mess w/ scanners... Seems like a real issue though, what hbase-5121 is trying to solve. Pity it's so hard verifying this started the failures, else we could back it out for now. Should we back it out anyway and see if we get failures over the next few days?
[jira] [Commented] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228598#comment-13228598 ] stack commented on HBASE-5572: -- +1 on patch. Waiting on hadoopqa... KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch Synthesis: 1) TestMasterZKSessionRecovery distinguishes two cases on SessionExpiredException. One is explicitly not managed; however, it seems that there is no reason for this. 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function with a useless recursive call. 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above. Changes are: 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and remove the recursion 1) Removing TestMasterZKSessionRecovery Detailed justification: testMasterZKSessionRecoveryFailure says: {noformat} /** * Negative test of master recovery from zk session expiry. * * Starts with one master. Fakes the master zk session expired. * Ensures the master cannot recover the expired zk session since * the master zk node is still there. */ public void testMasterZKSessionRecoveryFailure() throws Exception { MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); HMaster m = cluster.getMaster(); m.abort("Test recovery from zk session expired", new KeeperException.SessionExpiredException()); assertTrue(m.isStopped()); } {noformat} This test works, i.e. the assertion is always verified. But do we really want this behavior?
When looking at the code, we see that what's happening is strange:
- HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master.
- HMaster#abortNow checks the exception type. As it's a SessionExpiredException, it will try to recover by calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeper connection and then tries to become the active master. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster, which says it will return false if there is any error preventing it from becoming the active master.
- ActiveMasterManager#blockUntilBecomingActiveMaster reads the master address from ZK. If it's our own host and port, it deletes the node, which triggers a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. But this result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master.

In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there", but we're actually checking just for this case and deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, and the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
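The ignored-result bug in the walkthrough above can be sketched in a few lines. This is a hypothetical simplification, not the real ActiveMasterManager: a plain Map stands in for the ZK ensemble, and the class/znode names only mirror the ones the issue discusses.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the control flow described in the issue: on finding a
// stale node with our own address, we delete it and recurse, but the recursive
// call's (successful) result is discarded.
class ActiveMasterManagerSketch {
    private final String myAddress;
    private final Map<String, String> zk; // stands in for the ZK ensemble

    ActiveMasterManagerSketch(String myAddress, Map<String, String> zk) {
        this.myAddress = myAddress;
        this.zk = zk;
    }

    boolean blockUntilBecomingActiveMaster() {
        if (!zk.containsKey("/hbase/master")) {
            zk.put("/hbase/master", myAddress); // we become the active master
            return true;
        }
        if (zk.get("/hbase/master").equals(myAddress)) {
            // Stale node left by our expired session: delete it and retry.
            zk.remove("/hbase/master");
            blockUntilBecomingActiveMaster(); // succeeds and returns true...
            return false; // ...but the result is discarded: caller sees failure
        }
        return false; // another master legitimately owns the node
    }
}
```

Running this with a stale node owned by our own address returns false even though the node ends up re-created by us, which is exactly why HMaster#abort stops a master that had in fact recovered.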
[jira] [Resolved] (HBASE-5574) DEFAULT_MAX_FILE_SIZE defaults to a negative value
[ https://issues.apache.org/jira/browse/HBASE-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5574. -- Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Applied to trunk and the 0.94 branch. DEFAULT_MAX_FILE_SIZE defaults to a negative value -- Key: HBASE-5574 URL: https://issues.apache.org/jira/browse/HBASE-5574 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Michael Drzal Assignee: Michael Drzal Fix For: 0.94.0 Attachments: HBASE-5574.patch HBASE-4365 changed the value of DEFAULT_MAX_FILE_SIZE from 256MB to 10G. Here is the line of code: public static final long DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024 * 1024; The problem is that Java evaluates the right-hand side in int arithmetic, so the product wraps around before it gets assigned to the long. I verified this with a test. The quick fix is to change the last operand to 1024L. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
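The overflow is easy to reproduce in isolation. The constants below mirror the expression quoted in the issue; everything else is illustrative scaffolding:

```java
// BROKEN is the buggy form from the issue; FIXED is the one-character fix.
class MaxFileSizeOverflow {
    // All operands are int, so the product is computed in 32 bits and wraps
    // to Integer.MIN_VALUE before being widened to long.
    static final long BROKEN = 10 * 1024 * 1024 * 1024;
    // Making one operand long forces 64-bit arithmetic: 10 GB, as intended.
    static final long FIXED = 10 * 1024 * 1024 * 1024L;

    public static void main(String[] args) {
        System.out.println(BROKEN); // -2147483648
        System.out.println(FIXED);  // 10737418240
    }
}
```

10 * 1024^3 = 10737418240, which modulo 2^32 lands exactly on Integer.MIN_VALUE, hence the "negative value" in the issue title.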
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: (was: 5206_trunk-v2.patch) Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v2.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: 5206_trunk-v2.patch Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v2.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228634#comment-13228634 ] Hadoop QA commented on HBASE-5206: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518221/5206_trunk-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1174//console This message is automatically generated. Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v2.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: (was: 5206_trunk-v2.patch) Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: 5206_trunk-v3.patch Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518221/5206_trunk-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1174//console This message is automatically generated.) Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5574) DEFAULT_MAX_FILE_SIZE defaults to a negative value
[ https://issues.apache.org/jira/browse/HBASE-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228642#comment-13228642 ] Hudson commented on HBASE-5574: --- Integrated in HBase-0.94 #31 (See [https://builds.apache.org/job/HBase-0.94/31/]) HBASE-5574 DEFAULT_MAX_FILE_SIZE defaults to a negative value (Revision 1300289) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java DEFAULT_MAX_FILE_SIZE defaults to a negative value -- Key: HBASE-5574 URL: https://issues.apache.org/jira/browse/HBASE-5574 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Michael Drzal Assignee: Michael Drzal Fix For: 0.94.0 Attachments: HBASE-5574.patch HBASE-4365 changed the value of DEFAULT_MAX_FILE_SIZE from 256MB to 10G. Here is the line of code: public static final long DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024 * 1024; The problem is that Java evaluates the right-hand side in int arithmetic, so the product wraps around before it gets assigned to the long. I verified this with a test. The quick fix is to change the last operand to 1024L. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228662#comment-13228662 ] nkeywal commented on HBASE-5399: @Lars, Stack: After 50 tries, on trunk (fbd4bebd5cca129f49e91ec9936f604998a7025a) + 5572 I got it: testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): expected:0 but was:5 at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331) So the probability for testRowMutationMultiThreads is about 10 times lower than for testMultiRowMutationMultiThreads, but it can occur as well. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered an issue, for various reasons, one of them being that there is a limit on the number of connections that ZK can manage. Stack suggested as well removing the link to the master from HConnection. There are choices to be made considering the existing API (which we don't want to break). The first patches I will submit to hadoop-qa should not be committed: they are here to show the progress in the direction taken.
ZooKeeper is used for:
- public getter, to let the client do whatever it wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
- read the master address to create a master => now done with a temporary ZooKeeper connection
- read the root location => now done with a temporary ZooKeeper connection, but questionable. Used in the public function locateRegion. To be reworked.
- read the cluster id => now done once with a temporary ZooKeeper connection.
- check if the base node is available => now done once with a ZooKeeper connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a temporary ZooKeeper connection. Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary ZooKeeper connection

The Master is used for:
- getMaster: public getter, as for ZooKeeper => we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge and HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master => we could make them use a temporary master connection as well.

Main points are:
- the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a TCP connection, and establishing a TCP connection is slow.
- having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client.
- if we move the table descriptor part away from the client, we need to find a new place for it.
- we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection.
That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
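The "temporary connection" pattern the list above keeps referring to can be sketched as follows. The interface and znode path here are hypothetical stand-ins, not the real HBase or ZooKeeper API; the point is only the lifecycle: connect, read one value, close, rather than holding a session open for the client's lifetime.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical short-lived connection: opened per operation, closed right after.
interface ShortLivedZk extends AutoCloseable {
    byte[] read(String znode);
    @Override void close(); // narrowed: this sketch's close() throws nothing
}

class ClusterIdReader {
    private final Supplier<ShortLivedZk> connector;

    ClusterIdReader(Supplier<ShortLivedZk> connector) {
        this.connector = connector;
    }

    String readClusterId() {
        // The TCP connection lives only for this one read; as the description
        // notes, paying the connection-setup cost each time is the trade-off.
        try (ShortLivedZk zk = connector.get()) {
            return new String(zk.read("/hbase/hbaseid"), StandardCharsets.UTF_8);
        }
    }
}
```

This is why the issue notes a non-connected client "will always be really slower": every such read pays TCP connection establishment, in exchange for not consuming one of ZK's limited long-lived connections per client.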
[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228665#comment-13228665 ] Lars Hofhansl commented on HBASE-5569: -- I can try to back out HBASE-5121 and see if I can still get this to fail. I do think my assumptions about scanning were wrong, though. HBASE-5229 is still valid (in that it makes a bunch of operations across multiple rows either all fail or all succeed); it's just that there is currently no way to get a consistent scan over *multiple* rows when flushing is involved (which is OK, because the scanner contract never guaranteed that). If that is the case I should disable the test. TestAtomicOperation.testRowMutationMultiThreads basically does the same thing, only within the same row; I have never seen that one fail. TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally --- Key: HBASE-5569 URL: https://issues.apache.org/jira/browse/HBASE-5569 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor What I pieced together so far is that it is the *scanning* side that has problems sometimes. Every time I see an assertion failure in the log I see this before: {quote} 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0 {quote} The order of the Put and the Delete is sometimes reversed. The test threads should always see exactly one KV: if the "before" was the Put the threads see 0 KVs; if the "before" was the Delete the threads see 2 KVs. This debug message comes from StoreScanner#checkReseek. It seems we still have some consistency issues with scanning sometimes :( -- This message is automatically generated by JIRA. 
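To see why the debug message above signals trouble, consider the ordering of the two cells it names. For the same row/family/qualifier, cells sort by descending timestamp, so the Put at ts=75366 should always precede the DeleteColumn at ts=75203 in a stable view; a peek() that jumps from one to the other mid-scan means the store's cell list changed underneath the scanner. The comparator below is a simplified illustration (it ignores column family and the type tie-break the real KeyValue comparator also applies):

```java
// Simplified cell key: same-row/same-qualifier cells order by ts descending.
class CellKey implements Comparable<CellKey> {
    final String row;
    final String qualifier;
    final long ts;
    final String type;

    CellKey(String row, String qualifier, long ts, String type) {
        this.row = row;
        this.qualifier = qualifier;
        this.ts = ts;
        this.type = type;
    }

    @Override
    public int compareTo(CellKey o) {
        int c = row.compareTo(o.row);
        if (c != 0) return c;
        c = qualifier.compareTo(o.qualifier);
        if (c != 0) return c;
        return Long.compare(o.ts, ts); // descending: newer cells sort first
    }
}
```

Sorting the two cells from the log with this comparator always puts the newer Put first, which is the "before" the test expects; the reversed order in the failing runs is the symptom, not a legal ordering.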
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228675#comment-13228675 ] Hadoop QA commented on HBASE-5206: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518229/5206_trunk-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1175//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1175//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1175//console This message is automatically generated. Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228676#comment-13228676 ] Hadoop QA commented on HBASE-5542: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518234/HBASE-5542.D2217.11.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1176//console This message is automatically generated. Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
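The generalization the issue proposes can be sketched as follows. This is a hypothetical toy, not HRegion's actual implementation: lock a sorted set of rows, run one atomic read-modify-write over all of them, then unlock, so that both multi-row mutation and single-row read-modify-write become special cases of one method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a processRowsWithLocks()-style primitive.
class RegionSketch {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    interface RowProcessor {
        void process(Map<String, byte[]> view); // runs while all row locks are held
    }

    void processRowsWithLocks(SortedSet<String> rows, RowProcessor p) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // Acquire in sorted row order so concurrent callers cannot deadlock.
            for (String row : rows) {
                ReentrantLock l = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
                l.lock();
                held.add(l);
            }
            // Both mutateRowsWithLocks() and processRow() reduce to this call:
            // a mutation-only processor, or a single-row read-modify-write.
            p.process(data);
        } finally {
            for (ReentrantLock l : held) {
                l.unlock();
            }
        }
    }
}
```

Passing a one-row set recovers processRow()'s behavior, and a processor that only writes recovers mutateRowsWithLocks(), which is the redundancy the issue wants to fold together.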
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Attachment: 5572.v2.patch KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 5572.v2.patch Synthesis: 1) TestMasterZKSessionRecovery distinguish two cases on SessionExpiredException. One is explicitly not managed. However, is seems that there is no reason for this. 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function, with a useless recursive call. 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above. Changes are: 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and remove recursion 1) Removing TestMasterZKSessionRecovery Detailed justification: testMasterZKSessionRecoveryFailure says: {noformat} /** * Negative test of master recovery from zk session expiry. * * Starts with one master. Fakes the master zk session expired. * Ensures the master cannot recover the expired zk session since * the master zk node is still there. */ public void testMasterZKSessionRecoveryFailure() throws Exception { MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); HMaster m = cluster.getMaster(); m.abort(Test recovery from zk session expired, new KeeperException.SessionExpiredException()); assertTrue(m.isStopped()); } {noformat} This tests works, i.e. the assertion is always verified. But do we really want this behavior? 
When looking at the code, we see that what's happening is strange:
- HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master.
- HMaster#abortNow checks the exception type. As it's a SessionExpiredException, it will try to recover by calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeper connection and then tries to become the active master. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster, which says it will return false if there is any error preventing it from becoming the active master.
- ActiveMasterManager#blockUntilBecomingActiveMaster reads the master address from ZK. If it's our own host and port, it deletes the node, which triggers a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. But this result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master.

In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there", but we're actually checking just for this case and deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, and the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5572: --- Status: Open (was: Patch Available) KeeperException.SessionExpiredException management could be improved in Master -- Key: HBASE-5572 URL: https://issues.apache.org/jira/browse/HBASE-5572 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 5572.v2.patch Synthesis: 1) TestMasterZKSessionRecovery distinguish two cases on SessionExpiredException. One is explicitly not managed. However, is seems that there is no reason for this. 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function, with a useless recursive call. 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above. Changes are: 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and remove recursion 1) Removing TestMasterZKSessionRecovery Detailed justification: testMasterZKSessionRecoveryFailure says: {noformat} /** * Negative test of master recovery from zk session expiry. * * Starts with one master. Fakes the master zk session expired. * Ensures the master cannot recover the expired zk session since * the master zk node is still there. */ public void testMasterZKSessionRecoveryFailure() throws Exception { MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster(); HMaster m = cluster.getMaster(); m.abort(Test recovery from zk session expired, new KeeperException.SessionExpiredException()); assertTrue(m.isStopped()); } {noformat} This tests works, i.e. the assertion is always verified. But do we really want this behavior? 
When looking at the code, we see that what's happening is strange:
- HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master.
- HMaster#abortNow checks the exception type. As it's a SessionExpiredException, it will try to recover by calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeper connection and then tries to become the active master. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster, which says it will return false if there is any error preventing it from becoming the active master.
- ActiveMasterManager#blockUntilBecomingActiveMaster reads the master address from ZK. If it's our own host and port, it deletes the node, which triggers a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. But this result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master.

In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there", but we're actually checking just for this case and deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, and the test would fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228689#comment-13228689 ] nkeywal commented on HBASE-5572: for an unknown reason the first two patches didn't make it to hadoop-qa. Rewriting once again.
[jira] [Updated] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5542: --- Attachment: HBASE-5542.D2217.12.patch sc updated the revision "HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()". Reviewers: tedyu, lhofhansl, JIRA Rebase against trunk REVISION DETAIL https://reviews.facebook.net/D2217 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRowProcessorEndpoint.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessorProtocol.java src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/MultiRowMutationProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessor.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestProcessRowEndpoint.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.12.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code.
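The unification idea in the description above can be sketched without any HBase dependencies: one generic entry point that atomically applies a read-modify-write across a set of rows, with both old methods becoming thin wrappers. All names and the toy Map-based "table" below are illustrative, not the real HRegion API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of the proposed generalization: processRowsWithLocks() is the
// single atomic primitive; the two existing methods reduce to special cases.
public class RowOps {
    final Map<String, Long> table = new HashMap<>();

    // Generalized primitive: atomically read-modify-write every listed row
    // (here, "locking" is modeled by synchronizing on the whole table).
    synchronized void processRowsWithLocks(Collection<String> rows,
                                           UnaryOperator<Long> rmw) {
        for (String row : rows) {
            table.put(row, rmw.apply(table.getOrDefault(row, 0L)));
        }
    }

    // Old single-row processRow() becomes a one-row call.
    void processRow(String row, UnaryOperator<Long> rmw) {
        processRowsWithLocks(Arrays.asList(row), rmw);
    }

    // Old mutateRowsWithLocks() (blind multi-row writes) is the special case
    // where the "read" half of the read-modify-write is ignored.
    void mutateRowsWithLocks(Collection<String> rows, long value) {
        processRowsWithLocks(rows, old -> value);
    }
}
```

This is why the patch can delete the redundant code paths: both operations share one locking and atomicity story.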
[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228732#comment-13228732 ] Phabricator commented on HBASE-5335: nspiegelberg has commented on the revision "[jira] [HBASE-5335] Dynamic Schema Config". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:736 okay. maybe getValues since that's the variable name? src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java:571 yeah, I think that HTD/HCD need a lot of unification work beyond this. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:938 I'll write more comments; there's a nasty issue here: when you split, you take the 'conf' from the parent region and pass it into the daughter region's constructor. If you passed in the CompoundConfiguration, you would end up using both the HTD of the parent region and the new HTD of the daughter region. You really need to pass the base Configuration object used by HRegionServer to the daughter regions to avoid a tricky dedupe problem. REVISION DETAIL https://reviews.facebook.net/D2247 Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, schema Attachments: D2247.1.patch Currently, the ability for a core developer to add per-table or per-CF configuration settings is very heavyweight. You need to add a reserved keyword all the way up the stack, and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with Configuration.get() being used a lot, because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put/read arbitrary KV settings in the HBase schema.
Combined with online schema change, this will allow us to safely iterate on configuration settings.
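The lookup order this feature implies — a per-column-family KV overrides a per-table KV, which overrides the base Configuration default — can be sketched as follows. This mirrors the CompoundConfiguration idea discussed in the review, but the class and field names below are stand-ins, not the actual HBase types.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of layered config resolution: most specific scope wins.
public class CompoundConfigSketch {
    final Map<String, String> base = new HashMap<>();     // hbase-site.xml level
    final Map<String, String> tableKVs = new HashMap<>(); // HTableDescriptor values
    final Map<String, String> cfKVs = new HashMap<>();    // HColumnDescriptor values

    String get(String key, String defaultValue) {
        if (cfKVs.containsKey(key)) return cfKVs.get(key);       // per-CF wins
        if (tableKVs.containsKey(key)) return tableKVs.get(key); // then per-table
        return base.getOrDefault(key, defaultValue);             // then base conf
    }
}
```

The split bug noted in the inline comments follows from this layering: if the daughter region is constructed from the parent's already-compounded view instead of the base Configuration, the parent's table-level values leak into the daughter's resolution chain.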
[jira] [Updated] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk
[ https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-5364: -- Fix Version/s: 0.90.6 I marked this as fixed for 0.90.6 but I'm not changing the title since it's all over the CHANGES.txt files. Fix source files missing licenses in 0.92 and trunk --- Key: HBASE-5364 URL: https://issues.apache.org/jira/browse/HBASE-5364 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Priority: Blocker Fix For: 0.90.6, 0.92.1, 0.94.0 Attachments: HBASE-5364-1.patch, hbase-5364-0.90.patch, hbase-5364-0.92.patch, hbase-5364-v2.patch Running 'mvn rat:check' shows that a few files have snuck in that do not have proper Apache licenses. Ideally we should fix these before we cut another release/release candidate. This is a blocker for 0.94, and probably should be for the other branches as well.
[jira] [Created] (HBASE-5575) Configure Arcanist lint engine for HBase
Configure Arcanist lint engine for HBase Key: HBASE-5575 URL: https://issues.apache.org/jira/browse/HBASE-5575 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin We need to enable the Arcanist lint engine in HBase, so that a commit can be checked by running 'arc lint'.
[jira] [Created] (HBASE-5576) Configure Arcanist lint engine for HBase
Configure Arcanist lint engine for HBase Key: HBASE-5576 URL: https://issues.apache.org/jira/browse/HBASE-5576 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin We need to be able to use arc lint to check a patch for code style errors before submission.
[jira] [Resolved] (HBASE-5576) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin resolved HBASE-5576. --- Resolution: Duplicate See HBASE-5575. For some reason new JIRAs are not immediately visible, so I ended up creating a duplicate.
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228743#comment-13228743 ] Phabricator commented on HBASE-5542: tedyu has commented on the revision "HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:4417 Exception includes IOE. Should we check against IOE so that we don't wrap again? REVISION DETAIL https://reviews.facebook.net/D2217
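The rethrow pattern tedyu is suggesting — check for IOException first so an existing IOE propagates as-is rather than being wrapped in another IOE — can be sketched like this. The helper name is hypothetical, not the actual HRegion code.

```java
import java.io.IOException;

// Sketch of the "don't double-wrap" idiom for a catch block that sees a
// broad Exception but must surface an IOException to its caller.
public class RethrowSketch {
    static IOException normalize(Exception e) {
        if (e instanceof IOException) {
            return (IOException) e;   // already an IOE: propagate unchanged
        }
        return new IOException(e);    // anything else: wrap once
    }
}
```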
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228753#comment-13228753 ] Lars Hofhansl commented on HBASE-5399: -- Arrgghhh... That's not good! Do you see the above message in the logs in that case? Do you still have the log output? (Feel free to send a zip via email or attach here). Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered an issue, for various reasons. One of them is that there is a limit on the number of connections that ZK can manage. Stack also suggested removing the link to the master from HConnection. There are choices to be made considering the existing API (which we don't want to break). The first patches I will submit to hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for:
- public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
- read the master address to create a master => now done with a temporary zookeeper connection
- read the root location => now done with a temporary zookeeper connection, but questionable. Used in the public function locateRegion. To be reworked.
- read the cluster id => now done once with a temporary zookeeper connection.
- check if the base znode is available => now done once with a zookeeper connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection. Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection
Master is used for:
- getMaster: public getter, as for ZooKeeper => we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge and HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master => we could make them use a temporary master connection as well.
Main points are:
- the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be slower, because establishing a TCP connection is slow.
- having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client.
- if we move the table descriptor part away from the client, we need to find a new place for it.
- we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic, however.
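The "temporary zookeeper connection" approach used for one-shot reads like the cluster id can be sketched with a short-lived session that is opened, read, and closed, instead of a watcher kept alive for the client's lifetime. ZkSession below is a hypothetical stand-in type, not a real ZooKeeper or HBase API.

```java
// Sketch of the short-lived connection pattern: no session survives the read.
public class TempConnSketch {
    static class ZkSession implements AutoCloseable {
        static int openSessions = 0;                 // for demonstration only
        ZkSession() { openSessions++; }              // "connect"
        String read(String znode) { return "value-of-" + znode; }
        @Override public void close() { openSessions--; }  // "disconnect"
    }

    // e.g. reading the cluster id once, leaving no lingering ZK session
    static String readClusterId() {
        try (ZkSession zk = new ZkSession()) {
            return zk.read("/hbase/hbaseid");
        }
    }
}
```

The trade-off named in the main points is visible here: every such call pays a TCP connection setup, which is why a non-connected client is inherently slower.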
[jira] [Commented] (HBASE-5575) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228761#comment-13228761 ] Nicolas Spiegelberg commented on HBASE-5575: +1. This should make every patch writer's job easier, and committers can run 'arc lint' before checkin instead of having to focus on style issues during peer review.
[jira] [Updated] (HBASE-5559) --presplit option creates a first split with rowkey-end=0
[ https://issues.apache.org/jira/browse/HBASE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujee Maniyam updated HBASE-5559: - Attachment: HBASE-5559-v2.patch revised patch --presplit option creates a first split with rowkey-end=0 - Key: HBASE-5559 URL: https://issues.apache.org/jira/browse/HBASE-5559 Project: HBase Issue Type: Bug Components: util Reporter: Sujee Maniyam Assignee: Sujee Maniyam Priority: Trivial Labels: benchmark Attachments: 5559_v1.patch, HBASE-5559-v2.patch HBASE-4440 adds a 'presplit' option to the PerformanceEvaluation utility. When the splits are generated, the first split has row-end-key=0 (zero); hence this split doesn't get any data. For example, if the total keyspace is 100 and the splits requested are 5, generated splits = [0, 20, 40, 60, 80]; it should be [20, 40, 60, 80, 100].
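The off-by-one in the report can be sketched without any HBase dependencies; the method names below are illustrative, not the actual PerformanceEvaluation code. The buggy version starts numbering boundaries at 0, so the first region's end key is 0 and it can never receive data.

```java
// Dependency-free sketch of the split-point calculation described above.
public class PresplitSketch {
    // Buggy shape: boundaries i * size for i = 0..n-1 -> leading 0.
    static long[] buggySplits(long totalRows, int numSplits) {
        long[] s = new long[numSplits];
        long size = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) s[i] = i * size;
        return s;                                   // [0, 20, 40, 60, 80]
    }

    // Fixed shape: boundaries (i + 1) * size -> first real boundary onward.
    static long[] fixedSplits(long totalRows, int numSplits) {
        long[] s = new long[numSplits];
        long size = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) s[i] = (i + 1) * size;
        return s;                                   // [20, 40, 60, 80, 100]
    }
}
```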
[jira] [Updated] (HBASE-5559) --presplit option creates a first split with rowkey-end=0
[ https://issues.apache.org/jira/browse/HBASE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujee Maniyam updated HBASE-5559: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5559) --presplit option creates a first split with rowkey-end=0
[ https://issues.apache.org/jira/browse/HBASE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujee Maniyam updated HBASE-5559: - Status: Patch Available (was: Open) HBASE-5559-v2.patch
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228798#comment-13228798 ] Lars Hofhansl commented on HBASE-5399: -- Yeah. Let's move the discussion to HBASE-5569. Thanks for doing this Nicolas (pardon me if I misspelled your name). You don't have to, though. Your change here is almost certainly not causing this problem.
[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5569: --- Attachment: TestAtomicOperation-output.trunk_120313.rar testRowMutationMultiThreads logs, on trunk as of today. It failed after 200 iterations.
{noformat}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.007 sec  FAILURE!
testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation)  Time elapsed: 8.651 sec  FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<8>
	at junit.framework.Assert.fail(Assert.java:50)
	at junit.framework.Assert.failNotEquals(Assert.java:287)
	at junit.framework.Assert.assertEquals(Assert.java:67)
	at junit.framework.Assert.assertEquals(Assert.java:199)
	at junit.framework.Assert.assertEquals(Assert.java:205)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331)
{noformat}
TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally --- Key: HBASE-5569 URL: https://issues.apache.org/jira/browse/HBASE-5569 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: TestAtomicOperation-output.trunk_120313.rar What I have pieced together so far is that it is the *scanning* side that has problems sometimes. Every time I see an assertion failure in the log I see this before it:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6, and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of the Put and Delete is sometimes reversed. The test threads should always see exactly one KV: if the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs. This debug message comes from StoreScanner's checkReseek.
It seems we still have some consistency issues with scanning sometimes :(
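The final "expected 0" assertion comes from a common multi-threaded test structure: worker threads verify an invariant (here, "a scan sees exactly one KV") and record violations in a shared counter, and the test asserts the counter is zero at the end. The sketch below shows only that skeleton with a stand-in invariant check, not the real scanner code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Dependency-free skeleton of the failure-counting test pattern.
public class FailureCountSketch {
    static int run(int nThreads) {
        AtomicInteger failures = new AtomicInteger();
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                int kvsSeen = 1;            // stand-in for scanning the row
                if (kvsSeen != 1) {         // the invariant under test
                    failures.incrementAndGet();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures.get();              // the test asserts this is 0
    }
}
```

In the failing run above, 8 threads observed either 0 or 2 KVs instead of 1, so the final count was 8 rather than 0.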
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228816#comment-13228816 ] nkeywal commented on HBASE-5399: It's ok, we're all in the same boat :-) I've got the test running on a 2 weeks old version of the trunk, I will have the result tomorrow.
[jira] [Commented] (HBASE-5559) --presplit option creates a first split with rowkey-end=0
[ https://issues.apache.org/jira/browse/HBASE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228818#comment-13228818 ] Hadoop QA commented on HBASE-5559: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518265/HBASE-5559-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1178//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1178//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1178//console This message is automatically generated. --presplit option creates a first split with rowkey-end=0 - Key: HBASE-5559 URL: https://issues.apache.org/jira/browse/HBASE-5559 Project: HBase Issue Type: Bug Components: util Reporter: Sujee Maniyam Assignee: Sujee Maniyam Priority: Trivial Labels: benchmark Attachments: 5559_v1.patch, HBASE-5559-v2.patch HBASE-4440 adds a 'presplit' option to PerformanceEvaluation utility. when the splits are generated, the first split has row-end-key=0 (zero). Hence this split doesn't get any data. 
For example, if the total keyspace is 100 and 5 splits are requested, the generated splits are [0, 20, 40, 60, 80]; they should be [20, 40, 60, 80, 100]. 
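The off-by-one described above can be sketched as follows (illustrative only, not PerformanceEvaluation's actual code): emitting `i * size` as the i-th boundary makes the first split end at 0, so it can never receive a row; `(i + 1) * size` yields the expected boundaries.

```java
// Sketch of split end-key generation for a keyspace of totalKeys rows.
// The buggy variant used ends[i] = i * size, producing [0, 20, 40, 60, 80];
// using (i + 1) * size produces [20, 40, 60, 80, 100] as the report expects.
class PresplitSketch {
    static int[] endKeys(int totalKeys, int numSplits) {
        int size = totalKeys / numSplits;
        int[] ends = new int[numSplits];
        for (int i = 0; i < numSplits; i++) {
            ends[i] = (i + 1) * size;
        }
        return ends;
    }
}
```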
[jira] [Updated] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4608: - Attachment: 4608v23.txt Renamed method enableCompression in all places to be setCompressionContext Made all instances of compression contexts have the same name rather than a new name every time used. Cleaned up unused 'compression' data member flags or moved them local from being data members when only used by a single method. Removed define of TRUE and repeat of ENABLE_WAL_COMPRESSION key from SequenceFileLogReader. No longer needed. Rather than have the sequencefile metadata-making code sprinkled over the reader and writer, do it all in the writer and have the reader use the writer's methods. Added a global WAL type as metadata. Added a compression type to metadata. Renamed method WALCompressionEnabled as isWALCompressionEnabled. Added some small tests to TestLRUDictionary and a new TestCompressor that taught me how this stuff works. Added documentation to methods where I was surprised; e.g. addEntry will happily add a new entry even though it already has a dictionary entry. Miscellaneous cleanup. I ran this compression on one of our production logs and it halved its size. See below. I then decompressed and then recompressed and I got the same size back. {code} -rwxrwxrwx 1 stack staff 28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out -rwxrwxrwx 1 stack staff 64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out -rwxrwxrwx 1 stack staff 28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out -rw-r--r-- 1 stack staff 64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339 {code} Will run more of our production logs through the compressor this evening to see if I can turn up bugs. 
HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
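The dictionary idea in the description can be illustrated with a toy encoder (this is not HBase's real LRUDictionary, just the principle): the first occurrence of a repeated field such as a table or column-family name is written as a literal and remembered; every repeat is replaced by a small index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary encoder: repeated strings shrink to index tokens.
class DictSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    String encode(String s) {
        Integer i = index.get(s);
        if (i != null) return "IDX:" + i;   // repeat: cheap index token
        index.put(s, entries.size());
        entries.add(s);
        return "LIT:" + s;                  // first occurrence: full literal
    }
}
```

Since every WAL entry repeats the same table/region/cf names, almost all occurrences collapse to indexes, which is consistent with the roughly 2x shrinkage reported on the production log above.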
[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228837#comment-13228837 ] Jonathan Hsieh commented on HBASE-5128: --- @Lars I believe the ports to 0.94.0 and 0.92.x are likely identical and nearly trivial, and I was intending on doing it. The initial version was also for 0.90.x and a version for that will be ported as well since my crew will be supporting that version for a while. I may try to do a 0.90.x release at some point. [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online. - Key: HBASE-5128 URL: https://issues.apache.org/jira/browse/HBASE-5128 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5128-trunk.patch The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues. Here's the approach (from the comment at the top of the new version of the file). {code} /** * HBaseFsck (hbck) is a tool for checking and repairing region consistency and * table integrity. * * Region consistency checks verify that META, region deployment on * region servers and the state of data in HDFS (.regioninfo files) all are in * accordance. * * Table integrity checks verify that all possible row keys can resolve to * exactly one region of a table. This means there are no individual degenerate * or backwards regions; no holes between regions; and that there are no overlapping * regions. 
* * The general repair strategy works in these steps. * 1) Repair Table Integrity on HDFS. (merge or fabricate regions) * 2) Repair Region Consistency with META and assignments * * For table integrity repairs, the tables' region directories are scanned * for .regioninfo files. Each table's integrity is then verified. If there * are any orphan regions (regions with no .regioninfo files), or holes, new * regions are fabricated. Backwards regions are sidelined, as are empty * degenerate (endkey==startkey) regions. If there are any overlapping regions, * a new region is created and all data is merged into the new region. * * Table integrity repairs deal solely with HDFS and can be done offline -- the * hbase region servers or master do not need to be running. These phases can be * used to completely reconstruct the META table in an offline fashion. * * Region consistency requires three conditions -- 1) valid .regioninfo file * present in an hdfs region dir, 2) valid row with .regioninfo data in META, * and 3) a region is deployed only at the regionserver that it was assigned to. * * Region consistency requires hbck to contact the HBase master and region * servers, so connect() must first be called successfully. Much of the * region consistency information is transient and less risky to repair. */ {code}
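The integrity invariant the javadoc states -- every row key resolves to exactly one region -- can be sketched as a linear scan over regions sorted by start key: each end key must equal the next start key; anything less is a hole, anything more is an overlap, and endKey == startKey is a degenerate region. (Illustrative only; hbck's real checks also handle empty boundary keys and byte[] comparisons.)

```java
import java.util.ArrayList;
import java.util.List;

// Toy table-integrity checker over regions given as {startKey, endKey} pairs,
// already sorted by start key.
class IntegritySketch {
    static List<String> check(String[][] regions) {
        List<String> problems = new ArrayList<>();
        for (String[] r : regions) {
            if (r[0].equals(r[1])) problems.add("degenerate region at " + r[0]);
        }
        for (int i = 1; i < regions.length; i++) {
            int cmp = regions[i - 1][1].compareTo(regions[i][0]);
            if (cmp < 0) problems.add("hole before " + regions[i][0]);      // gap in keyspace
            else if (cmp > 0) problems.add("overlap at " + regions[i][0]);  // double coverage
        }
        return problems;
    }
}
```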
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228842#comment-13228842 ] Zhihong Yu commented on HBASE-4608: --- In isWALCompressionEnabled(): {code} +if (txt == null || Integer.parseInt(txt.toString()) != VERSION) return false; {code} What would happen when we have a newer version for WAL_VERSION_KEY? Looks like the following check should suffice for isWALCompressionEnabled(): {code} +txt = metadata.get(WAL_COMPRESSION_TYPE_KEY); +return txt != null && txt.equals(DICTIONARY_COMPRESSION_TYPE); {code} HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
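A hedged sketch of the check Zhihong proposes: skip the version comparison entirely and treat compression as enabled iff the metadata records the dictionary compression type. The key and value strings below stand in for WAL_COMPRESSION_TYPE_KEY and DICTIONARY_COMPRESSION_TYPE, whose actual contents the thread does not show.

```java
import java.util.Map;

// Sketch of the simplified isWALCompressionEnabled() check suggested above.
class WalMetadataSketch {
    static final String WAL_COMPRESSION_TYPE_KEY = "compression.type"; // assumed value
    static final String DICTIONARY_COMPRESSION_TYPE = "dictionary";    // assumed value

    static boolean isWALCompressionEnabled(Map<String, String> metadata) {
        String txt = metadata.get(WAL_COMPRESSION_TYPE_KEY);
        return txt != null && txt.equals(DICTIONARY_COMPRESSION_TYPE);
    }
}
```

The advantage of this form is the one Zhihong raises: a WAL written by a newer version that still uses dictionary compression would not be misread as uncompressed.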
[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId
[ https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228846#comment-13228846 ] Jonathan Hsieh commented on HBASE-5563: --- @chunhui Check out the review patch in HBASE-5128 (it's big but in there) -- there is a fix for TestRegionObserverInterface in that patch which can probably go over here. We need to look into the other failures as well (the one I fixed was a Medium test -- the others are likely Large tests that aren't run until small and medium pass) HRegionInfo#compareTo add the comparison of regionId Key: HBASE-5563 URL: https://issues.apache.org/jira/browse/HBASE-5563 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5563.patch, HBASE-5563v2.patch, HBASE-5563v2.patch In the one-region-multi-assigned case, we could find that two regions have the same table name, same startKey, same endKey, and different regionId, so these two regions are the same in a TreeMap but different in a HashMap.
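The TreeMap/HashMap inconsistency described above falls out whenever compareTo() ignores a field that equals()/hashCode() include. A minimal illustration with a toy stand-in for HRegionInfo (RegionKey and its fields are illustrative, not HBase code):

```java
import java.util.Objects;

// compareTo() ignores regionId while equals()/hashCode() include it, so two
// regions differing only in regionId collapse into one TreeMap entry but
// remain two distinct HashMap entries -- the bug this issue fixes.
class RegionKey implements Comparable<RegionKey> {
    final String table, startKey, endKey;
    final long regionId;

    RegionKey(String table, String startKey, String endKey, long regionId) {
        this.table = table; this.startKey = startKey;
        this.endKey = endKey; this.regionId = regionId;
    }

    @Override public int compareTo(RegionKey o) {   // regionId omitted: the bug
        int c = table.compareTo(o.table);
        if (c == 0) c = startKey.compareTo(o.startKey);
        if (c == 0) c = endKey.compareTo(o.endKey);
        return c;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof RegionKey)) return false;
        RegionKey r = (RegionKey) o;
        return table.equals(r.table) && startKey.equals(r.startKey)
            && endKey.equals(r.endKey) && regionId == r.regionId;
    }

    @Override public int hashCode() { return Objects.hash(table, startKey, endKey, regionId); }
}
```

The fix is to make compareTo() fall back to comparing regionId, restoring the consistency-with-equals that sorted collections assume.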
[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228848#comment-13228848 ] Jonathan Hsieh commented on HBASE-5128: --- @Zhihong No problem -- I intend to address the reviews. Sorry about the test failures -- these are actually related to HBASE-5563 -- I'll help chunhui there. I've been in 0.92 and 0.90 land and then away for a little bit and didn't realize that a failure in medium skips all the large tests. (I fixed the medium and expected it to pass, but then the large tests ran and failed.) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online. - Key: HBASE-5128 URL: https://issues.apache.org/jira/browse/HBASE-5128 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5128-trunk.patch The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues. Here's the approach (from the comment at the top of the new version of the file). {code} /** * HBaseFsck (hbck) is a tool for checking and repairing region consistency and * table integrity. * * Region consistency checks verify that META, region deployment on * region servers and the state of data in HDFS (.regioninfo files) all are in * accordance. * * Table integrity checks verify that all possible row keys can resolve to * exactly one region of a table. 
This means there are no individual degenerate * or backwards regions; no holes between regions; and that there are no overlapping * regions. * * The general repair strategy works in these steps. * 1) Repair Table Integrity on HDFS. (merge or fabricate regions) * 2) Repair Region Consistency with META and assignments * * For table integrity repairs, the tables' region directories are scanned * for .regioninfo files. Each table's integrity is then verified. If there * are any orphan regions (regions with no .regioninfo files), or holes, new * regions are fabricated. Backwards regions are sidelined, as are empty * degenerate (endkey==startkey) regions. If there are any overlapping regions, * a new region is created and all data is merged into the new region. * * Table integrity repairs deal solely with HDFS and can be done offline -- the * hbase region servers or master do not need to be running. These phases can be * used to completely reconstruct the META table in an offline fashion. * * Region consistency requires three conditions -- 1) valid .regioninfo file * present in an hdfs region dir, 2) valid row with .regioninfo data in META, * and 3) a region is deployed only at the regionserver that it was assigned to. * * Region consistency requires hbck to contact the HBase master and region * servers, so connect() must first be called successfully. Much of the * region consistency information is transient and less risky to repair. */ {code}
[jira] [Created] (HBASE-5577) improve 'patch submission' section in HBase book
improve 'patch submission' section in HBase book Key: HBASE-5577 URL: https://issues.apache.org/jira/browse/HBASE-5577 Project: HBase Issue Type: Improvement Components: documentation Reporter: Sujee Maniyam Assignee: Sujee Maniyam Improve patch section in the book http://hbase.apache.org/book/submitting.patches.html
[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5270: Attachment: HBASE-5270-92v11.patch patchv11 for 0.92 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.2 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, HBASE-5270-92v11.patch, HBASE-5270v11.patch, hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? 
Are we doing it again here? Though distributed split log is configured, we will do single-process splitting in the master under some conditions with this patch. It's not explained in the code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228931#comment-13228931 ] Hadoop QA commented on HBASE-4608: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518270/4608v23.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//console This message is automatically generated. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. 
Also, HLog format may be changed in other ways to produce a smaller HLog.
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228933#comment-13228933 ] Phabricator commented on HBASE-5542: sc has commented on the revision HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow(). INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:4417 I just checked the javadoc. It seems that it throws TimeoutException, ExecutionException and InterruptedException: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/FutureTask.html#get(long, java.util.concurrent.TimeUnit) But I can see your point. If the exception is wrapped too many times, it will be hard to debug. REVISION DETAIL https://reviews.facebook.net/D2217 Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.12.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code.
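The generalization the description asks for -- one entry point that does atomic read-modify-writes over N rows -- can be sketched with a toy region (ToyRegion is a stand-in for HRegion, not its real locking code): acquire the per-row locks in sorted row order so concurrent callers cannot deadlock, run the caller's body over all rows as one atomic step, then release everything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Toy processRowsWithLocks(): subsumes both multi-row mutation and
// single-row read-modify-write (a one-element row set).
class ToyRegion {
    final Map<String, Long> store = new HashMap<>();
    private final Map<String, ReentrantLock> locks = new HashMap<>();

    private synchronized ReentrantLock lockFor(String row) {
        return locks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    void processRowsWithLocks(SortedSet<String> rows, Consumer<Map<String, Long>> body) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String row : rows) {       // SortedSet => consistent lock order
                ReentrantLock l = lockFor(row);
                l.lock();
                held.add(l);
            }
            body.accept(store);             // atomic with respect to these rows
        } finally {
            for (ReentrantLock l : held) l.unlock();
        }
    }
}
```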
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228936#comment-13228936 ] Laxman commented on HBASE-5564: --- Thanks Stack. Let me give it a try. Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Duplicate records are getting discarded when duplicate records exist in the same input file, and more specifically if they exist in the same split. Duplicate records are considered if the records are from different splits. Version under test: HBase 0.92
[jira] [Updated] (HBASE-5571) Table will be disabling forever
[ https://issues.apache.org/jira/browse/HBASE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5571: Attachment: BASE-5571v2.patch Table will be disabling forever --- Key: HBASE-5571 URL: https://issues.apache.org/jira/browse/HBASE-5571 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: BASE-5571v2.patch, HBASE-5571.patch If we restart the master when it is disabling one table, the table will be disabling forever. In the current logic, the Region CLOSE RPC will always return NotServingRegionException because the RS has already closed the region before we restart the master. So the table will be disabling forever because the region will be in RIT all along. In another case, AssignmentManager#rebuildUserRegions() will put parent regions into AssignmentManager.regions, so we can't close these parent regions until they are purged by CatalogJanitor if we disable the table.
[jira] [Updated] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4608: - Attachment: 4608v24.txt Address Ted's comments. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
[jira] [Commented] (HBASE-5571) Table will be disabling forever
[ https://issues.apache.org/jira/browse/HBASE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228942#comment-13228942 ] chunhui shen commented on HBASE-5571: - Patch v2 checks whether a region is splitting through AssignmentManager#RegionState. Also, I fix another bug where a region is disabling and a split is rolled back: in the current logic, if the parent region is rolled back, it will not be closed if the table is disabling. Table will be disabling forever --- Key: HBASE-5571 URL: https://issues.apache.org/jira/browse/HBASE-5571 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: BASE-5571v2.patch, HBASE-5571.patch If we restart the master when it is disabling one table, the table will be disabling forever. In the current logic, the Region CLOSE RPC will always return NotServingRegionException because the RS has already closed the region before we restart the master. So the table will be disabling forever because the region will be in RIT all along. In another case, AssignmentManager#rebuildUserRegions() will put parent regions into AssignmentManager.regions, so we can't close these parent regions until they are purged by CatalogJanitor if we disable the table.
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228943#comment-13228943 ] Phabricator commented on HBASE-5542: sc has commented on the revision HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow(). @Ted: I think there are some problems with file renaming on reviews.facebook.net. My git patch actually works fine. It applies to trunk. If this patch doesn't apply again, I will manually upload the patch to JIRA. REVISION DETAIL https://reviews.facebook.net/D2217 Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.12.patch, HBASE-5542.D2217.13.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code.
[jira] [Updated] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5542: --- Attachment: HBASE-5542.D2217.13.patch sc updated the revision HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow(). Reviewers: tedyu, lhofhansl, JIRA Log the IOE for easier debugging. REVISION DETAIL https://reviews.facebook.net/D2217 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRowProcessorEndpoint.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessorProtocol.java src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/MultiRowMutationProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessor.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestProcessRowEndpoint.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.12.patch, HBASE-5542.D2217.13.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the codes. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
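The proposed processRowsWithLocks() generalization above can be sketched as follows. This is an illustrative stand-alone example of the locking pattern involved, not the actual HBase API: per-row locks are acquired in sorted row order (so concurrent callers cannot deadlock), then a single read-modify-write body runs atomically over all the rows. All class and method names here are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

/** Sketch of a processRowsWithLocks-style helper over a toy key-value store. */
public class MultiRowProcessor {
    private final Map<String, Long> store = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Atomically applies {@code body} as one read-modify-write over {@code rows}. */
    public void processRowsWithLocks(Set<String> rows,
                                     UnaryOperator<Map<String, Long>> body) {
        // Sort the rows so every caller acquires locks in the same order.
        TreeMap<String, ReentrantLock> ordered = new TreeMap<>();
        for (String r : rows) {
            ordered.put(r, locks.computeIfAbsent(r, k -> new ReentrantLock()));
        }
        for (ReentrantLock l : ordered.values()) l.lock();
        try {
            // Read the current values, apply the caller's mutation, write back.
            Map<String, Long> snapshot = new TreeMap<>();
            for (String r : rows) snapshot.put(r, store.getOrDefault(r, 0L));
            store.putAll(body.apply(snapshot));
        } finally {
            // Release in reverse acquisition order.
            for (ReentrantLock l : ordered.descendingMap().values()) l.unlock();
        }
    }

    public long get(String row) {
        return store.getOrDefault(row, 0L);
    }
}
```

With this shape, mutateRowsWithLocks() is the special case where the body ignores the read snapshot, and processRow() is the special case of a single-row set.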
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228946#comment-13228946 ] stack commented on HBASE-4608: -- Here's my compressing, decompressing, compressing again, decompressing again, then recompressing a random log file from our front end: {code}
-rw-r--r--  1 stack  staff  64928728 Mar 13 20:43 sv4r25s8%3A60020.1331661889339
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 20:48 sv4r25s8%3A60020.1331661889339.compressed
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 20:58 sv4r25s8%3A60020.1331661889339.compressed.again
-rwxrwxrwx  1 stack  staff  28540761 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.compressed.again.again
-rwxrwxrwx  1 stack  staff  64945799 Mar 13 20:57 sv4r25s8%3A60020.1331661889339.decompressed
-rwxrwxrwx  1 stack  staff  64945799 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.decompressed.again
{code} It's 44% of the original size. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: stack Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. The current plan involves using a dictionary to compress the table name, region id, cf name, and possibly other bits of repeated data. Also, the HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
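The dictionary scheme described in the issue can be sketched in miniature. This is an illustrative example of the general idea, not the actual HBase WAL codec: the first time a repeated value (table name, region id, cf name) appears it is emitted in full and assigned an index; later occurrences are replaced by that index, and the decoder rebuilds the same dictionary as it reads literals. The token format and class name here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy dictionary coder for repeated WAL entry fields. */
public class WalDictionary {
    private final Map<String, Integer> toIndex = new HashMap<>();
    private final List<String> byIndex = new ArrayList<>();

    /** Returns "idx:N" for an already-seen value; else records it and returns "lit:value". */
    public String encode(String value) {
        Integer idx = toIndex.get(value);
        if (idx != null) {
            return "idx:" + idx;
        }
        toIndex.put(value, byIndex.size());
        byIndex.add(value);
        return "lit:" + value;
    }

    /** Inverse of encode; must see tokens in the same order the encoder produced them. */
    public String decode(String token) {
        if (token.startsWith("idx:")) {
            return byIndex.get(Integer.parseInt(token.substring(4)));
        }
        String value = token.substring(4);
        // The decoder rebuilds the dictionary from literals, so no table is shipped.
        toIndex.put(value, byIndex.size());
        byIndex.add(value);
        return value;
    }
}
```

Since a region server writes the same table and cf names into nearly every edit, long literals collapse to short indices after their first appearance, which is where savings like the 44% figure above come from.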
[jira] [Updated] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5542: --- Attachment: HBASE-5542.D2217.14.patch sc updated the revision HBASE-5542 [jira] Unify HRegion.mutateRowsWithLocks() and HRegion.processRow(). Reviewers: tedyu, lhofhansl, JIRA Fixed a typo. Sorry. REVISION DETAIL https://reviews.facebook.net/D2217 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRowProcessorEndpoint.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessorProtocol.java src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/MultiRowMutationProcessor.java src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java src/main/java/org/apache/hadoop/hbase/coprocessor/RowProcessor.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestProcessRowEndpoint.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.10.patch, HBASE-5542.D2217.11.patch, HBASE-5542.D2217.12.patch, HBASE-5542.D2217.13.patch, HBASE-5542.D2217.14.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch, HBASE-5542.D2217.8.patch, HBASE-5542.D2217.9.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the code. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira