[jira] [Updated] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5677:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522500/5677-proposal.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.replication.TestMasterReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.regionserver.wal.TestHLog
  org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1502//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1502//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1502//console

This message is automatically generated.)

> The master never does balance because duplicate openhandled the one region
> --
>
> Key: HBASE-5677
> URL: https://issues.apache.org/jira/browse/HBASE-5677
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
> Environment: 0.90
>Reporter: xufeng
>Assignee: xufeng
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, 
> HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
> surefire-report_patched_v1.html
>
>
> If a region is assigned while the master is initializing (before 
> processFailover runs), the region is open-handled twice, because the 
> unassigned node in ZooKeeper is handled again in 
> AssignmentManager#processFailover().
> This leaves the region in RIT (region in transition), so the master never 
> runs the balancer.
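
The double handling described above can be pictured with a small guard that remembers which regions were already handled during initialization. This is only a sketch under assumptions: it is not the attached patch, and `FailoverOpenTracker`, `handleOpened` and `processFailover` here are hypothetical names, not the AssignmentManager API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names below are illustrative, not HBase APIs.
final class FailoverOpenTracker {
  // Regions whose "opened" event was already handled during master initialization.
  private final Set<String> alreadyHandled = new HashSet<>();

  /** Records an open event; returns true only the first time a region is seen. */
  boolean handleOpened(String regionName) {
    return alreadyHandled.add(regionName);
  }

  /** Replays unassigned nodes found at failover, skipping regions handled earlier. */
  List<String> processFailover(List<String> unassignedNodes) {
    List<String> handledNow = new ArrayList<>();
    for (String region : unassignedNodes) {
      if (handleOpened(region)) {
        handledNow.add(region);
      }
    }
    return handledNow;
  }
}
```

With such a guard, a region opened before failover processing would be handled once, so it would not be left in RIT by a second open event.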

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5778:
--

Attachment: 5778.addendum

Is this what you had for TestHLog?

> Turn on WAL compression by default
> --
>
> Key: HBASE-5778
> URL: https://issues.apache.org/jira/browse/HBASE-5778
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Assignee: Lars Hofhansl
>Priority: Blocker
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5778.addendum, HBASE-5778.patch
>
>
> I ran some tests to verify if WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude 
> bigger than the keys), the insert time wasn't different and the CPU usage was 
> 15% higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in the insert 
> run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure 
> WAL compression accounts for all the additional CPU usage; it might just be 
> that we're able to insert faster and so spend more time in the MemStore per 
> second (because our MemStores are bad when they contain tens of thousands of 
> values).
> Those are two extremes, but it shows that for the price of some CPU we can 
> save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
> CPUs.
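
The two CPU overhead figures quoted above follow directly from the reported usage numbers. A minimal check of the arithmetic, where `CpuOverhead` and `percentHigher` are illustrative names only:

```java
// Relative CPU overhead computed from the figures quoted in the description.
final class CpuOverhead {
  /** Percentage by which {@code with} exceeds {@code without}. */
  static double percentHigher(double with, double without) {
    return (with - without) / without * 100.0;
  }

  public static void main(String[] args) {
    // Large values: 150% CPU with compression vs. 130% without.
    System.out.println(percentHigher(150, 130));
    // Small values: 600% CPU with compression vs. 450% without.
    System.out.println(percentHigher(600, 450));
  }
}
```

The results are roughly 15.4 and 33.3, matching the 15% and 33% in the report.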





[jira] [Updated] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-04-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5656:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

Latest patch applies to trunk cleanly.

> LoadIncrementalHFiles createTable should detect and set compression algorithm
> -
>
> Key: HBASE-5656
> URL: https://issues.apache.org/jira/browse/HBASE-5656
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.92.1
>Reporter: Cosmin Lehene
>Assignee: Cosmin Lehene
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, 
> HBASE-5656-0.92.patch, HBASE-5656-0.92.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> LoadIncrementalHFiles doesn't set compression when creating the table.
> This can be detected from the files within each family dir.





[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-04-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5689:
--

Attachment: HBASE-5689.patch

Re-attaching Chunhui's patch

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5689-simplified.txt, 5689-testcase.patch, 
> HBASE-5689.patch, HBASE-5689.patch
>
>
> Consider the following scenario:
> 1. The region is on server A.
> 2. Put KV(r1->v1) to the region.
> 3. Move the region from server A to server B.
> 4. Put KV(r2->v2) to the region.
> 5. Move the region from server B to server A.
> 6. Put KV(r3->v3) to the region.
> 7. kill -9 server B and start it.
> 8. kill -9 server A and start it.
> 9. Scan the region: we get only two KVs (r1->v1, r2->v2); the third, 
> KV(r3->v3), is lost.
> Let's analyse the scenario above from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2. When we split server B's hlog file in ServerShutdownHandler, we create one 
> RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in ServerShutdownHandler, we create 
> another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the 
> region, in HRegion#replayRecoveredEditsIfAny:
> {code}
>   for (Path edits: files) {
>     if (edits == null || !this.fs.exists(edits)) {
>       LOG.warn("Null or non-existent edits file: " + edits);
>       continue;
>     }
>     if (isZeroLengthThenDelete(this.fs, edits)) continue;
>     if (checkSafeToSkip) {
>       Path higher = files.higher(edits);
>       long maxSeqId = Long.MAX_VALUE;
>       if (higher != null) {
>         // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>         String fileName = higher.getName();
>         maxSeqId = Math.abs(Long.parseLong(fileName));
>       }
>       if (maxSeqId <= minSeqId) {
>         String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>             + ", skipped the whole file, path=" + edits;
>         LOG.debug(msg);
>         continue;
>       } else {
>         checkSafeToSkip = false;
>       }
>     }
> {code}
>  
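
To see how the skip check in the quoted snippet can drop a file that still holds a needed edit, the loop can be modelled on plain numbers. This is a simplified sketch, not HBase code. It assumes each recovered-edits file is named after the lowest sequence id it contains, and the ids (r1=1, r2=2, r3=3) and the flushed id are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified model of the skip check quoted above. File names stand in for
// recovered-edits files; the sequence ids are made up for illustration.
final class RecoveredEditsSkipDemo {
  /** Returns the file names the skip check would drop without replaying. */
  static List<Long> skippedFiles(TreeSet<Long> fileNames, long minSeqId) {
    List<Long> skipped = new ArrayList<>();
    boolean checkSafeToSkip = true;
    for (Long edits : fileNames) {
      if (checkSafeToSkip) {
        Long higher = fileNames.higher(edits);
        // The next file's name is taken as an upper bound on this file's ids.
        long maxSeqId = (higher == null) ? Long.MAX_VALUE : Math.abs(higher);
        if (maxSeqId <= minSeqId) {
          skipped.add(edits);
          continue;
        }
        checkSafeToSkip = false;
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    // Server A's log yields file "1" holding edits {1, 3} (r1 and r3);
    // server B's log yields file "2" holding edit {2} (r2).
    TreeSet<Long> files = new TreeSet<>();
    files.add(1L);
    files.add(2L);
    // Assume the region had flushed through id 2 before the crashes.
    System.out.println(skippedFiles(files, 2L));
  }
}
```

Under these assumptions file "1" is skipped because the next file name (2) is not above the flushed id, even though file "1" also holds id 3. The upper-bound assumption only holds when the files do not overlap in sequence ids, which is not the case once edits come from two servers.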





[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5606:
--

Attachment: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch

Re-attaching Prakash's patch.

> SplitLogManger async delete node hangs log splitting when ZK connection is 
> lost 
> 
>
> Key: HBASE-5606
> URL: https://issues.apache.org/jira/browse/HBASE-5606
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Gopinathan A
>Priority: Critical
> Fix For: 0.92.2
>
> Attachments: 
> 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
> 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch
>
>
> 1. One RS died; the ServerShutdownHandler detected it and started 
> distributed log splitting.
> 2. All tasks failed because the ZK connection was lost, so all the tasks were 
> deleted asynchronously.
> 3. The ServerShutdownHandler retried the log splitting.
> 4. The asynchronous deletion from step 2 finally happened, but it hit the new 
> task.
> 5. This left the SplitLogManager hanging.
> As a result, the .META. region stays unassigned for a long time.
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(55413,79):2012-03-14 
> 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
> splitlog task at znode 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89303,79):2012-03-14 
> 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
> splitlog task at znode 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(80417,99):2012-03-14 
> 19:34:31,196 DEBUG 
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89456,99):2012-03-14 
> 19:34:32,497 DEBUG 
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}
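
The race in steps 2 to 4 is a late delete acting on a task that was resubmitted under the same path. One generic guard is to tag every submission with an incarnation number and let a delete take effect only if the incarnation still matches. The sketch below is an in-memory analogy only: it is not Prakash's patch and not the SplitLogManager API, and `TaskRegistry` with its methods is hypothetical. With ZooKeeper itself, a comparable guard would be a version-conditional `delete(path, version)`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, not the attached patch and not the SplitLogManager API.
final class TaskRegistry {
  // Task path -> incarnation number of the currently installed task.
  private final ConcurrentMap<String, Long> tasks = new ConcurrentHashMap<>();
  private long nextIncarnation = 1;

  /** Installs (or re-installs) a task and returns its incarnation. */
  synchronized long submit(String path) {
    long incarnation = nextIncarnation++;
    tasks.put(path, incarnation);
    return incarnation;
  }

  /**
   * Completion callback of an asynchronous delete issued for a given incarnation.
   * Returns true if the task was removed, false if the delete was stale.
   */
  boolean onDeleted(String path, long incarnation) {
    // remove(key, value) only removes when the incarnation still matches.
    return tasks.remove(path, incarnation);
  }

  /** True while a task is installed for the path. */
  boolean isPending(String path) {
    return tasks.containsKey(path);
  }
}
```

A delete issued for the first submission then leaves the retried task in place instead of silently removing it.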





[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5606:
--

Attachment: (was: 5606.txt)

> SplitLogManger async delete node hangs log splitting when ZK connection is 
> lost 
> 
>
> Key: HBASE-5606
> URL: https://issues.apache.org/jira/browse/HBASE-5606
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0
>Reporter: Gopinathan A
>Priority: Critical
> Fix For: 0.92.2
>
> Attachments: 
> 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch
>
>
> 1. One RS died; the ServerShutdownHandler detected it and started 
> distributed log splitting.
> 2. All tasks failed because the ZK connection was lost, so all the tasks were 
> deleted asynchronously.
> 3. The ServerShutdownHandler retried the log splitting.
> 4. The asynchronous deletion from step 2 finally happened, but it hit the new 
> task.
> 5. This left the SplitLogManager hanging.
> As a result, the .META. region stays unassigned for a long time.
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(55413,79):2012-03-14 
> 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
> splitlog task at znode 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89303,79):2012-03-14 
> 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
> splitlog task at znode 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}
> {noformat}
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(80417,99):2012-03-14 
> 19:34:31,196 DEBUG 
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> hbase-root-master-HOST-192-168-47-204.log.2012-03-14"(89456,99):2012-03-14 
> 19:34:32,497 DEBUG 
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
> /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
> {noformat}





[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()

2012-03-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5544:
--

Fix Version/s: 0.96.0

> Add metrics to HRegion.processRow()
> ---
>
> Key: HBASE-5544
> URL: https://issues.apache.org/jira/browse/HBASE-5544
> Project: HBase
>  Issue Type: New Feature
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.96.0
>
> Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch
>
>
> Add metrics of
> 1. time for waiting for the lock
> 2. processing time (scan time)
> 3. time spent while holding the lock
> 4. total call time
> 5. number of failures / calls
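
The five measurements listed above can be collected around a locked section roughly as follows. This is a sketch under assumptions, not the HRegion.processRow() implementation: `RowProcessorMetrics` and `timed` are hypothetical names, and here all processing happens while the lock is held, so processing time and lock-hold time nearly coincide.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of the five measurements; not the HRegion implementation.
final class RowProcessorMetrics {
  final AtomicLong lockWaitNanos = new AtomicLong();
  final AtomicLong processNanos = new AtomicLong();
  final AtomicLong lockHoldNanos = new AtomicLong();
  final AtomicLong totalNanos = new AtomicLong();
  final AtomicLong calls = new AtomicLong();
  final AtomicLong failures = new AtomicLong();

  /** Runs {@code work} under {@code lock}, recording each timing. */
  <T> T timed(ReentrantLock lock, Supplier<T> work) {
    long start = System.nanoTime();
    calls.incrementAndGet();
    lock.lock();
    long acquired = System.nanoTime();
    lockWaitNanos.addAndGet(acquired - start);
    try {
      T result = work.get();
      processNanos.addAndGet(System.nanoTime() - acquired);
      return result;
    } catch (RuntimeException e) {
      failures.incrementAndGet();
      throw e;
    } finally {
      lock.unlock();
      long end = System.nanoTime();
      lockHoldNanos.addAndGet(end - acquired);
      totalNanos.addAndGet(end - start);
    }
  }
}
```

The failure ratio in item 5 is then `failures / calls`, read from the two counters.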





[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()

2012-03-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5544:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> Add metrics to HRegion.processRow()
> ---
>
> Key: HBASE-5544
> URL: https://issues.apache.org/jira/browse/HBASE-5544
> Project: HBase
>  Issue Type: New Feature
>Reporter: Scott Chen
>Assignee: Scott Chen
> Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch
>
>
> Add metrics of
> 1. time for waiting for the lock
> 2. processing time (scan time)
> 3. time spent while holding the lock
> 4. total call time
> 5. number of failures / calls





[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-2600:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12519878/HBASE-2600%2B5217-Sun-Mar-25-2012.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1302//console

This message is automatically generated.)

> Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
> tablename+ENDROW+randomid
> 
>
> Key: HBASE-2600
> URL: https://issues.apache.org/jira/browse/HBASE-2600
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Alex Newman
> Attachments: 
> 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
> 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, jenkins.pdf
>
>
> This is an idea that Ryan and I have been kicking around on and off for a 
> while now.
> If regionnames were made of tablename+endrow instead of tablename+startrow, 
> then in the metatables, doing a search for the region that contains the 
> wanted row, we'd just have to open a scanner using passed row and the first 
> row found by the scan would be that of the region we need (If offlined 
> parent, we'd have to scan to the next row).
> If we redid the meta tables in this format, we'd be using an access that is 
> natural to hbase, a scan as opposed to the perverse, expensive 
> getClosestRowBefore we currently have that has to walk backward in meta 
> finding a containing region.
> This issue is about changing the way we name regions.
> If we were using scans, prewarming client cache would be near costless (as 
> opposed to what we'll currently have to do which is first a 
> getClosestRowBefore and then a scan from the closestrowbefore forward).
> Converting to the new method, we'd have to run a migration on startup 
> changing the content in meta.
> Up to this, the randomid component of a region name has been the timestamp of 
> region creation.   HBASE-2531 "32-bit encoding of regionnames waaay 
> too susceptible to hash clashes" proposes changing the randomid so that it 
> contains actual name of the directory in the filesystem that hosts the 
> region.  If we had this in place, I think it would help with the migration to 
> this new way of doing the meta because as is, the region name in fs is a hash 
> of regionname... changing the format of the regionname would mean we generate 
> a different hash... so we'd need hbase-2531 to be in place before we could do 
> this change.
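
The difference between the two naming schemes can be shown with a sorted map standing in for the meta table. This is a sketch under assumptions, not the HBase client API: `MetaLookup` is hypothetical, row keys are reduced to the bare start or end key (no table name or randomid), and end keys are treated as exclusive.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a meta table; not the HBase client API.
final class MetaLookup {
  /** Keyed by start row: find the greatest start key <= row (a backward search). */
  static String byStartKey(TreeMap<String, String> metaByStart, String row) {
    Map.Entry<String, String> e = metaByStart.floorEntry(row);
    return e == null ? null : e.getValue();
  }

  /**
   * Keyed by end row: the first entry strictly after the row is the match,
   * which is what a forward scan starting at the row would return. End keys
   * are exclusive; lastRegion stands in for the region with no end key.
   */
  static String byEndKey(TreeMap<String, String> metaByEnd, String lastRegion, String row) {
    Map.Entry<String, String> e = metaByEnd.higherEntry(row);
    return e == null ? lastRegion : e.getValue();
  }
}
```

For regions A ["", "g"), B ["g", "p") and C ["p", end of table), both lookups return B for row "h". The start-key form has to find the closest key at or before the row, while the end-key form only moves forward from it.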





[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-2600:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12519877/0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 64 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1301//console

This message is automatically generated.)

> Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
> tablename+ENDROW+randomid
> 
>
> Key: HBASE-2600
> URL: https://issues.apache.org/jira/browse/HBASE-2600
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Alex Newman
> Attachments: 
> 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
> 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, jenkins.pdf
>
>
> This is an idea that Ryan and I have been kicking around on and off for a 
> while now.
> If regionnames were made of tablename+endrow instead of tablename+startrow, 
> then in the metatables, doing a search for the region that contains the 
> wanted row, we'd just have to open a scanner using passed row and the first 
> row found by the scan would be that of the region we need (If offlined 
> parent, we'd have to scan to the next row).
> If we redid the meta tables in this format, we'd be using an access that is 
> natural to hbase, a scan as opposed to the perverse, expensive 
> getClosestRowBefore we currently have that has to walk backward in meta 
> finding a containing region.
> This issue is about changing the way we name regions.
> If we were using scans, prewarming client cache would be near costless (as 
> opposed to what we'll currently have to do which is first a 
> getClosestRowBefore and then a scan from the closestrowbefore forward).
> Converting to the new method, we'd have to run a migration on startup 
> changing the content in meta.
> Up to this, the randomid component of a region name has been the timestamp of 
> region creation.   HBASE-2531 "32-bit encoding of regionnames waaay 
> too susceptible to hash clashes" proposes changing the randomid so that it 
> contains actual name of the directory in the filesystem that hosts the 
> region.  If we had this in place, I think it would help with the migration to 
> this new way of doing the meta because as is, the region name in fs is a hash 
> of regionname... changing the format of the regionname would mean we generate 
> a different hash... so we'd need hbase-2531 to be in place before we could do 
> this change.





[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-2600:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510928/0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 31 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/800//console

This message is automatically generated.)

> Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
> tablename+ENDROW+randomid
> 
>
> Key: HBASE-2600
> URL: https://issues.apache.org/jira/browse/HBASE-2600
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Alex Newman
> Attachments: 
> 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
> 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, jenkins.pdf
>
>
> This is an idea that Ryan and I have been kicking around on and off for a 
> while now.
> If regionnames were made of tablename+endrow instead of tablename+startrow, 
> then in the metatables, doing a search for the region that contains the 
> wanted row, we'd just have to open a scanner using passed row and the first 
> row found by the scan would be that of the region we need (If offlined 
> parent, we'd have to scan to the next row).
> If we redid the meta tables in this format, we'd be using an access that is 
> natural to hbase, a scan as opposed to the perverse, expensive 
> getClosestRowBefore we currently have that has to walk backward in meta 
> finding a containing region.
> This issue is about changing the way we name regions.
> If we were using scans, prewarming client cache would be near costless (as 
> opposed to what we'll currently have to do which is first a 
> getClosestRowBefore and then a scan from the closestrowbefore forward).
> Converting to the new method, we'd have to run a migration on startup 
> changing the content in meta.
> Up to this, the randomid component of a region name has been the timestamp of 
> region creation.   HBASE-2531 "32-bit encoding of regionnames waaay 
> too susceptible to hash clashes" proposes changing the randomid so that it 
> contains actual name of the directory in the filesystem that hosts the 
> region.  If we had this in place, I think it would help with the migration to 
> this new way of doing the meta because as is, the region name in fs is a hash 
> of regionname... changing the format of the regionname would mean we generate 
> a different hash... so we'd need hbase-2531 to be in place before we could do 
> this change.





[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5615:
--

Fix Version/s: 0.94.0
   0.92.2

> the master never does balance because of balancing the parent region
> 
>
> Key: HBASE-5615
> URL: https://issues.apache.org/jira/browse/HBASE-5615
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.7
>Reporter: xufeng
>Assignee: xufeng
>Priority: Critical
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
> NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html
>
>
> The master never balances because, when it runs rebuildUserRegions(), it 
> adds the parent region to AssignmentManager#servers.
> If the balancer then tries to move the parent region, the parent stays in RIT 
> forever, so balancing is never executed.
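
One way to picture the fix direction is to leave offlined split parents out of the set of regions the balancer may move. The sketch below is not the attached patch. `BalanceCandidates` and `RegionRecord` are hypothetical types, with `offline` and `split` standing in for the corresponding region state flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; RegionRecord is an illustrative type, not HRegionInfo.
final class BalanceCandidates {
  static final class RegionRecord {
    final String name;
    final boolean offline;
    final boolean split;

    RegionRecord(String name, boolean offline, boolean split) {
      this.name = name;
      this.offline = offline;
      this.split = split;
    }
  }

  /** Keeps only regions that may be moved, dropping offlined split parents. */
  static List<String> movable(List<RegionRecord> scanned) {
    List<String> result = new ArrayList<>();
    for (RegionRecord r : scanned) {
      if (r.offline && r.split) {
        continue; // split parent: its daughters serve the data
      }
      result.add(r.name);
    }
    return result;
  }
}
```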





[jira] [Updated] (HBASE-5190) Limit the IPC queue size based on calls' payload size

2012-03-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5190:
--

Attachment: 5190.addendum

Suggested addendum.

@J-D:
Please take a look.

> Limit the IPC queue size based on calls' payload size
> -
>
> Key: HBASE-5190
> URL: https://issues.apache.org/jira/browse/HBASE-5190
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.5
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, 
> HBASE-5190.patch
>
>
> Currently we limit the IPC queue only by the number of calls it holds. 
> It used to be really high and was dropped down recently to num_handlers * 10 
> (so 100 by default) because it was easy to OOME yourself when huge calls were 
> being queued. It's still possible to hit this problem if you use really big 
> values and/or a lot of handlers, so the idea is that we should take into 
> account the payload size. I can see 3 solutions:
>  - Do the accounting outside of the queue itself for all calls coming in and 
> out and when a call doesn't fit, throw a retryable exception.
>  - Same accounting but instead block the call when it comes in until space is 
> made available.
>  - Add a new parameter for the maximum size (in bytes) of a Call and then set 
> the size of the IPC queue (in terms of the number of items) so that it can 
> only contain as many items as fit within some predefined maximum size (in 
> bytes) for the whole queue.
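
The first option above (accounting outside the queue, rejecting a call that does not fit) could look roughly like this. It is a sketch under assumptions, not the attached patch: `SizeBoundedCallQueue` is a hypothetical class, calls are reduced to their payload bytes, and rejection is signalled by returning false rather than by throwing a retryable exception.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the first option (accounting outside the queue,
// reject when full); names and the limit are illustrative only.
final class SizeBoundedCallQueue {
  private final ConcurrentLinkedQueue<byte[]> queue = new ConcurrentLinkedQueue<>();
  private final AtomicLong queuedBytes = new AtomicLong();
  private final long maxQueuedBytes;

  SizeBoundedCallQueue(long maxQueuedBytes) {
    this.maxQueuedBytes = maxQueuedBytes;
  }

  /** Returns false when the payload does not fit; the caller may retry later. */
  boolean offer(byte[] payload) {
    long after = queuedBytes.addAndGet(payload.length);
    if (after > maxQueuedBytes) {
      queuedBytes.addAndGet(-payload.length); // roll back the reservation
      return false;
    }
    queue.add(payload);
    return true;
  }

  /** Removes the next call, releasing its bytes; null when the queue is empty. */
  byte[] poll() {
    byte[] payload = queue.poll();
    if (payload != null) {
      queuedBytes.addAndGet(-payload.length);
    }
    return payload;
  }

  /** Bytes currently reserved by queued calls. */
  long currentBytes() {
    return queuedBytes.get();
  }
}
```

The second option in the list would keep the same accounting but block in `offer` until enough bytes are released.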





[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5615:
--

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

> the master never does balance because of balancing the parent region
> 
>
> Key: HBASE-5615
> URL: https://issues.apache.org/jira/browse/HBASE-5615
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.7
>Reporter: xufeng
>Assignee: xufeng
>Priority: Critical
> Fix For: 0.90.7, 0.96.0
>
> Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
> NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html
>
>
> The master never balances: when the master runs rebuildUserRegions(), it 
> adds the parent region to AssignmentManager#servers. If the balancer then 
> tries to move the parent region, the parent stays in RIT (regions in 
> transition) forever, so balancing is never executed.
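
One way to picture a fix for the problem described above is to skip split parents when rebuilding the set of regions the balancer may move. The Java sketch below assumes a split parent can be recognized as a region that is both offline and split. RegionStub and BalanceCandidates are hypothetical stand-ins, and this is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a region record read from the catalog. */
final class RegionStub {
    final String name;
    final boolean offline;
    final boolean split;

    RegionStub(String name, boolean offline, boolean split) {
        this.name = name;
        this.offline = offline;
        this.split = split;
    }

    /** Assumption: a split parent is marked both offline and split. */
    boolean isSplitParent() {
        return offline && split;
    }
}

/** Hypothetical helper that filters out regions the balancer must not move. */
final class BalanceCandidates {
    static List<RegionStub> assignable(List<RegionStub> fromCatalog) {
        List<RegionStub> result = new ArrayList<>();
        for (RegionStub region : fromCatalog) {
            if (!region.isSplitParent()) {
                result.add(region);
            }
        }
        return result;
    }
}
```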





[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5615:
--

Attachment: 5615-trunk.txt

> the master never does balance because of balancing the parent region
> 
>
> Key: HBASE-5615
> URL: https://issues.apache.org/jira/browse/HBASE-5615
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.7
>Reporter: xufeng
>Assignee: xufeng
>Priority: Critical
> Fix For: 0.90.7, 0.96.0
>
> Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
> NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html
>
>
> The master never balances: when the master runs rebuildUserRegions(), it 
> adds the parent region to AssignmentManager#servers. If the balancer then 
> tries to move the parent region, the parent stays in RIT (regions in 
> transition) forever, so balancing is never executed.





[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5615:
--

Fix Version/s: 0.90.7
 Hadoop Flags: Reviewed
  Summary: the master never does balance because of balancing the 
parent region  (was: the master never do balance becauseof  balance the parent 
region)

> the master never does balance because of balancing the parent region
> 
>
> Key: HBASE-5615
> URL: https://issues.apache.org/jira/browse/HBASE-5615
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.7
>Reporter: xufeng
>Assignee: xufeng
>Priority: Critical
> Fix For: 0.90.7
>
> Attachments: HBASE-5615-90.patch, HBASE-5615.patch, 
> NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html
>
>
> The master never balances: when the master runs rebuildUserRegions(), it 
> adds the parent region to AssignmentManager#servers. If the balancer then 
> tries to move the parent region, the parent stays in RIT (regions in 
> transition) forever, so balancing is never executed.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: 3996-v4.txt

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: (was: 3996-v4.txt)

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: 3996-v4.txt

Latest patch from review board.

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Status: Patch Available  (was: Open)

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: 3996-v3.txt

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: (was: 3996-v3.txt)

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Attachment: 3996-v3.txt

Patch v3 compiles

I reformatted some of the new code.

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3996:
--

Fix Version/s: 0.94.0

Adding 0.94 according to Lars' feedback.

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> --
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Eran Kutner
>Assignee: Eran Kutner
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 3996-v2.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.





[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context

2012-03-17 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5521:
--

Fix Version/s: 0.96.0

> Move compression/decompression to an encoder specific encoding context
> --
>
> Key: HBASE-5521
> URL: https://issues.apache.org/jira/browse/HBASE-5521
> Project: HBase
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.96.0
>
> Attachments: HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, 
> HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, 
> HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, 
> HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch
>
>
> As part of working on HBASE-5313, we want to add a new columnar 
> encoder/decoder. It makes sense to move compression to be part of 
> encoder/decoder:
> 1) a scanner for a columnar encoded block can do lazy decompression to a 
> specific part of a key value object
> 2) avoid an extra byte copy from the encoder to the hblock-writer. 
> If there is no encoder specified for a writer, the HBlock.Writer will use a 
> default compression-context to do something very similar to today's code.





[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context

2012-03-17 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5521:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12517319/HBASE-5521.D2097.3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -125 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.encoding.TestDataBlockEncoders

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1122//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1122//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1122//console

This message is automatically generated.)

> Move compression/decompression to an encoder specific encoding context
> --
>
> Key: HBASE-5521
> URL: https://issues.apache.org/jira/browse/HBASE-5521
> Project: HBase
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, 
> HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, 
> HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, 
> HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch
>
>
> As part of working on HBASE-5313, we want to add a new columnar 
> encoder/decoder. It makes sense to move compression to be part of 
> encoder/decoder:
> 1) a scanner for a columnar encoded block can do lazy decompression to a 
> specific part of a key value object
> 2) avoid an extra byte copy from the encoder to the hblock-writer. 
> If there is no encoder specified for a writer, the HBlock.Writer will use a 
> default compression-context to do something very similar to today's code.





[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK

2012-03-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5206:
--

Attachment: 5206_trunk-v2.patch

With patch v2, TestDrainingServer passes.

> Port HBASE-5155 to 0.92 and TRUNK
> -
>
> Key: HBASE-5206
> URL: https://issues.apache.org/jira/browse/HBASE-5206
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.2, 0.96.0
>Reporter: Zhihong Yu
> Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
> 5206_trunk-v2.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch
>
>
> This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
> not happen parallely leading to recreation of regions that were deleted) to 
> 0.92 and TRUNK





[jira] [Updated] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5563:
--

Status: Open  (was: Patch Available)

> HRegionInfo#compareTo add the comparison of regionId
> 
>
> Key: HBASE-5563
> URL: https://issues.apache.org/jira/browse/HBASE-5563
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: HBASE-5563.patch
>
>
> In the case where one region is assigned multiple times, we can find two 
> regions with the same table name, startKey, and endKey but different 
> regionIds, so the two regions are the same key in a TreeMap but different 
> keys in a HashMap.
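
The mismatch described above arises whenever compareTo() and equals()/hashCode() disagree about which fields identify a region. The following Java sketch reproduces it with a stand-in class. RegionKey is hypothetical and is not HRegionInfo; it assumes that equals()/hashCode() take regionId into account while compareTo() does not.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical stand-in for a region identity; not the real HRegionInfo. */
final class RegionKey implements Comparable<RegionKey> {
    final String table;
    final String startKey;
    final String endKey;
    final long regionId;

    RegionKey(String table, String startKey, String endKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.endKey = endKey;
        this.regionId = regionId;
    }

    /** Natural ordering that ignores regionId, mirroring the reported behavior. */
    @Override
    public int compareTo(RegionKey other) {
        int c = table.compareTo(other.table);
        if (c != 0) {
            return c;
        }
        c = startKey.compareTo(other.startKey);
        if (c != 0) {
            return c;
        }
        return endKey.compareTo(other.endKey);
    }

    /** Equality that does include regionId (an assumption of this sketch). */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RegionKey)) {
            return false;
        }
        RegionKey r = (RegionKey) o;
        return table.equals(r.table) && startKey.equals(r.startKey)
            && endKey.equals(r.endKey) && regionId == r.regionId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, startKey, endKey, regionId);
    }

    /** Ordering that also compares regionId, as the issue title proposes. */
    static final Comparator<RegionKey> WITH_REGION_ID = (x, y) -> {
        int c = x.compareTo(y);
        return c != 0 ? c : Long.compare(x.regionId, y.regionId);
    };
}
```

Putting two RegionKey instances that differ only in regionId into a TreeMap with natural ordering leaves a single entry, a HashMap keeps both, and a TreeMap built with WITH_REGION_ID also keeps both.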





[jira] [Updated] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()

2012-03-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5542:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12517843/HBASE-5542.D2217.6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -120 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1156//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1156//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1156//console

This message is automatically generated.)

> Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
> 
>
> Key: HBASE-5542
> URL: https://issues.apache.org/jira/browse/HBASE-5542
> Project: HBase
>  Issue Type: Improvement
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.96.0
>
> Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.2.patch, 
> HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, 
> HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch
>
>
> mutateRowsWithLocks() does atomic mutations on multiple rows.
> processRow() does atomic read-modify-writes on a single row.
> It will be useful to generalize both and have a
> processRowsWithLocks() that does atomic read-modify-writes on multiple rows.
> This also helps reduce some redundancy in the code.
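
To make the proposed generalization concrete, here is a minimal Java sketch of an atomic read-modify-write over several rows. MultiRowProcessor, its in-memory row map, and the processor callback are all hypothetical; the sketch only illustrates taking every row lock (in sorted order, to avoid deadlock) before reading and writing, and it does not reflect the HRegion implementation or the attached patches.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory store with per-row locks. */
final class MultiRowProcessor {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Returns the current value of a row, or 0 if the row is absent. */
    long get(String row) {
        Long value = rows.get(row);
        return value == null ? 0L : value;
    }

    /** Locks all rows, hands a snapshot to the processor, then writes its result. */
    void processRowsWithLocks(Collection<String> rowKeys,
                              UnaryOperator<Map<String, Long>> processor) {
        // Sort and de-duplicate so every caller acquires locks in the same order.
        List<String> sorted = new ArrayList<>(new TreeSet<>(rowKeys));
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : sorted) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            Map<String, Long> snapshot = new HashMap<>();
            for (String key : sorted) {
                snapshot.put(key, get(key));
            }
            rows.putAll(processor.apply(snapshot));
        } finally {
            for (ReentrantLock lock : held) {
                lock.unlock();
            }
        }
    }
}
```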





[jira] [Updated] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5563:
--

Status: Patch Available  (was: Open)

> HRegionInfo#compareTo add the comparison of regionId
> 
>
> Key: HBASE-5563
> URL: https://issues.apache.org/jira/browse/HBASE-5563
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: HBASE-5563.patch
>
>
> In the case where one region is assigned multiple times, we can find two 
> regions with the same table name, startKey, and endKey but different 
> regionIds, so the two regions are the same key in a TreeMap but different 
> keys in a HashMap.





[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4951:
--

Status: Patch Available  (was: Open)

> master process can not be stopped when it is initializing
> -
>
> Key: HBASE-4951
> URL: https://issues.apache.org/jira/browse/HBASE-4951
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.3
>Reporter: xufeng
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.6
>
> Attachments: HBASE-4951.patch
>
>
> This is easy to reproduce with the following steps:
> step 1: Start the master process (do not start any regionserver process in 
> the cluster). The master will wait for the regionservers to check in:
> org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
> checkin
> step 2: Stop the master with the shell command bin/hbase master stop
> result: The master process never dies because the catalogTracker.waitForRoot() 
> method blocks until the root region is assigned.
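
The hang described above comes from a wait that never consults the stop request. A minimal Java sketch of a stop-aware wait follows. StoppableRootWait and its methods are hypothetical, and this is not the code in HBASE-4951.patch.

```java
/** Hypothetical wait helper that gives up as soon as a stop is requested. */
final class StoppableRootWait {
    private boolean stopped;      // guarded by this
    private boolean rootAssigned; // guarded by this

    /** Called by the shutdown path; wakes any thread blocked in the wait. */
    synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    /** Called once the root region has been assigned. */
    synchronized void markRootAssigned() {
        rootAssigned = true;
        notifyAll();
    }

    /**
     * Waits until the root region is assigned or a stop is requested.
     * Returns true only if the root region was assigned.
     */
    synchronized boolean waitForRootOrStop(long pollMillis) {
        try {
            while (!rootAssigned && !stopped) {
                wait(pollMillis);
            }
        } catch (InterruptedException e) {
            // Treat interruption like a stop request and preserve the flag.
            Thread.currentThread().interrupt();
        }
        return rootAssigned;
    }
}
```

A caller that gets false back would skip the rest of initialization and proceed to shut down instead of blocking.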





[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-12 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4951:
--

Comment: was deleted

(was: {code}
"master-linux76,6,1323451554081" prio=10 tid=0x085cdc00 nid=0x685f waiting 
on condition [0x6fae9000..0x6fae9e50]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:530)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:489)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:336)
at java.lang.Thread.run(Thread.java:619)
{code}
One thread dump)

> master process can not be stopped when it is initializing
> -
>
> Key: HBASE-4951
> URL: https://issues.apache.org/jira/browse/HBASE-4951
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.3
>Reporter: xufeng
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.6
>
> Attachments: HBASE-4951.patch
>
>
> This is easy to reproduce with the following steps:
> step 1: Start the master process (do not start any regionserver process in 
> the cluster). The master will wait for the regionservers to check in:
> org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
> checkin
> step 2: Stop the master with the shell command bin/hbase master stop
> result: The master process never dies because the catalogTracker.waitForRoot() 
> method blocks until the root region is assigned.





[jira] [Updated] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to des

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4946:
--

Fix Version/s: 0.94.0
   0.92.0

> HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
> dynamically loaded coprocessors (from hdfs or local system), because the RPC 
> system tries to deserialize an unknown class. 
> -
>
> Key: HBASE-4946
> URL: https://issues.apache.org/jira/browse/HBASE-4946
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Andrei Dragomir
>Assignee: Andrei Dragomir
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4946-v4.txt, 4946-v5.txt, HBASE-4946-v2.patch, 
> HBASE-4946-v3.patch, HBASE-4946.patch
>
>
> Loading coprocessor jars from HDFS works fine. I load the jar from the shell, 
> after setting the attribute, and it gets loaded:
> {noformat}
> INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor 
> config now ...
> INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class 
> com.MyCoprocessorClass needs to be loaded from a file - 
> hdfs://localhost:9000/coproc/rt-0.0.1-SNAPSHOT.jar.
> INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: 
> com.MyCoprocessorClass
> INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: 
> RegionEnvironment createEnvironment
> DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol 
> handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. 
> protocol=com.MyCoprocessorClassProtocol
> INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load 
> coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
> {noformat}
> The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
> with a dynamic method. When calling this method from the client with 
> HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
> cannot be deserialized from writables. 
> The problem is that Exec tries to do an "early" resolve of the coprocessor 
> class. The coprocessor class is loaded, but it is in the context of the 
> HRegionServer / HRegion. So, the call fails:
> {noformat}
> 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Error in readFields
> java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
>   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
>   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)
> Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:247)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
>   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
>   ... 10 more
> {noformat}
> Probably the correct way to fix this is to make Exec really smart, so that it 
> knows all the class definitions loaded in CoprocessorHost(s).
> I created a small patch that simply doesn't resolve the class definition in 
> Exec, instead passing it as a string down to the HRegion layer. This layer 
> knows all the definitions and simply loads the class by name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/Con

[jira] [Updated] (HBASE-4120) isolation and allocation

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4120:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504702/TablePriority_v8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -143 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 88 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap
  org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
  org.apache.hadoop.hbase.master.TestRestartCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/328//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/328//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/328//console

This message is automatically generated.)

> isolation and allocation
> 
>
> Key: HBASE-4120
> URL: https://issues.apache.org/jira/browse/HBASE-4120
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver
>Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
>Reporter: Liu Jia
>Assignee: Liu Jia
> Fix For: 0.94.0
>
> Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
> Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
> HBase_isolation_and_allocation_user_guide.pdf, 
> Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
> TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, 
> TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, 
> TablePrioriy_v9.patch
>
>
> The HBase isolation and allocation tool is designed to help users manage 
> cluster resources among different applications and tables.
> When a large HBase cluster has many applications running on it, problems 
> arise. At Taobao there is a cluster that many departments use to test the 
> performance of their HBase-based applications. With one cluster of 12 
> servers, only one application can run exclusively on the cluster at a time, 
> and the other applications must wait until the previous test has finished.
> After we add the allocation management function to the cluster, applications 
> can share the cluster and run concurrently. Also, if a test engineer wants 
> to make sure there is no interference, he or she can move other tables out 
> of this group.
> Within a group we use table priority to allocate resources; when the system 
> is busy, we can make sure high-priority tables are not affected by 
> lower-priority tables.
> Different groups can have different region server configurations: groups 
> optimized for reading can have a large block cache, and groups optimized 
> for writing can have a large memstore. 
> Tables and region servers can be moved easily between groups; after changing 
> the configuration, a group can be restarted alone instead of restarting the 
> whole cluster.
> git entry: https://github.com/ICT-Ope/HBase_allocation
> We hope our work is helpful.
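
The description does not say how table priority is enforced inside a group; the details are in the attached design documents. As a rough illustration only, one way to make sure high-priority tables are served first when the system is busy is a priority-ordered request queue. Every name below (PrioritizedCall, TablePriorityQueueSketch) is hypothetical, and the convention that a smaller number means a higher priority is an assumption, not something taken from the patch.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** One queued request; every name here is hypothetical, not taken from the patch. */
final class PrioritizedCall {
    final String table;
    final int priority;   // assumption: a smaller number means a higher priority
    final long sequence;  // arrival order, used to keep equal-priority calls FIFO

    PrioritizedCall(String table, int priority, long sequence) {
        this.table = table;
        this.priority = priority;
        this.sequence = sequence;
    }
}

/** Serves calls for high-priority tables first, in arrival order within a priority. */
final class TablePriorityQueueSketch {
    private final PriorityQueue<PrioritizedCall> queue = new PriorityQueue<>(
        Comparator.comparingInt((PrioritizedCall c) -> c.priority)
                  .thenComparingLong(c -> c.sequence));
    private long nextSequence = 0;

    void submit(String table, int priority) {
        queue.add(new PrioritizedCall(table, priority, nextSequence++));
    }

    /** Returns the table of the next call to serve, or null if nothing is queued. */
    String next() {
        PrioritizedCall call = queue.poll();
        return call == null ? null : call.table;
    }
}
```

A strict ordering like this can starve low-priority tables under sustained load, so a real scheduler would probably need some form of aging or quota on top of it.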

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4120) isolation and allocation

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4120:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502140/TablePriority_v8_for_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -145 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 68 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/154//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/154//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/154//console

This message is automatically generated.)






[jira] [Updated] (HBASE-4120) isolation and allocation

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4120:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504706/TablePriority_v8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -143 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 88 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
  org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/330//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/330//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/330//console

This message is automatically generated.)






[jira] [Updated] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to des

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4946:
--

Attachment: 4946-v5.txt

Patch v5 removes commented-out code.

> HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
> dynamically loaded coprocessors (from hdfs or local system), because the RPC 
> system tries to deserialize an unknown class. 
> -
>
> Key: HBASE-4946
> URL: https://issues.apache.org/jira/browse/HBASE-4946
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
>Reporter: Andrei Dragomir
>Assignee: Andrei Dragomir
> Attachments: 4946-v4.txt, 4946-v5.txt, HBASE-4946-v2.patch, 
> HBASE-4946-v3.patch, HBASE-4946.patch
>
>
> Loading coprocessor jars from HDFS works fine. I load the jar from the shell, 
> after setting the table attribute, and it gets loaded:
> {noformat}
> INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor 
> config now ...
> INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class 
> com.MyCoprocessorClass needs to be loaded from a file - 
> hdfs://localhost:9000/coproc/rt-0.0.1-SNAPSHOT.jar.
> INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: 
> com.MyCoprocessorClass
> INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: 
> RegionEnvironment createEnvironment
> DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol 
> handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. 
> protocol=com.MyCoprocessorClassProtocol
> INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load 
> coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
> {noformat}
> The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
> with a dynamic method. When calling this method from the client with 
> HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
> cannot be deserialized from writables. 
> The problem is that Exec tries to do an "early" resolve of the coprocessor 
> class. The coprocessor class is loaded, but it is in the context of the 
> HRegionServer / HRegion. So, the call fails:
> {noformat}
> 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
> Error in readFields
> java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
>   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
>   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)
> Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:247)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
>   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
>   ... 10 more
> {noformat}
> Probably the correct way to fix this is to make Exec smart enough to know 
> all the class definitions loaded in the CoprocessorHost(s).
> I created a small patch that simply doesn't resolve the class definition in 
> Exec, instead passing it as a string down to the HRegion layer. This layer 
> knows all the definitions and simply loads the class by name. 
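
The idea of carrying the protocol as a name and resolving it late can be sketched as follows. This is a minimal illustration, not the attached patch: ProtocolRegistrySketch stands in for the region layer that registered the protocol handler, LazyExecSketch stands in for the call payload, and both names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the region layer, which knows the registered protocols. */
final class ProtocolRegistrySketch {
    private final Map<String, Class<?>> protocolsByName = new ConcurrentHashMap<>();

    /** Called when a coprocessor is loaded and registers its protocol. */
    void register(Class<?> protocol) {
        protocolsByName.put(protocol.getName(), protocol);
    }

    /** Looks the protocol up by name among the classes this layer already loaded. */
    Class<?> resolve(String protocolName) {
        Class<?> protocol = protocolsByName.get(protocolName);
        if (protocol == null) {
            throw new IllegalArgumentException("No registered protocol: " + protocolName);
        }
        return protocol;
    }
}

/** Hypothetical call payload: carries the protocol as a plain name, not a Class. */
final class LazyExecSketch {
    final String protocolName;
    final String methodName;

    LazyExecSketch(String protocolName, String methodName) {
        this.protocolName = protocolName;
        this.methodName = methodName;
    }

    /** Resolution happens only here, against the layer that loaded the coprocessor. */
    Class<?> resolveProtocol(ProtocolRegistrySketch registry) {
        return registry.resolve(protocolName);
    }
}
```

Because deserialization now only reads two strings, the RPC layer never needs the coprocessor's class loader; an unknown protocol is reported when the call reaches the region instead of failing inside readFields.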


[jira] [Updated] (HBASE-4987) wrong use of incarnation var in SplitLogManager

2011-12-08 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4987:
--

Status: Patch Available  (was: Open)

> wrong use of incarnation var in SplitLogManager
> ---
>
> Key: HBASE-4987
> URL: https://issues.apache.org/jira/browse/HBASE-4987
> Project: HBase
>  Issue Type: Bug
>Reporter: Prakash Khemani
>Assignee: Prakash Khemani
> Attachments: HBASE-4987.D675.1.patch, HBASE-4987.D675.2.patch
>
>
> @Ramakrishna found and analyzed an issue in SplitLogManager. But I don't 
> think that the fix is correct. Will upload a patch shortly.





[jira] [Updated] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-12-06 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4880:
--

Status: Patch Available  (was: Open)

> Region is on service before completing openRegionHanlder, may cause data loss
> -
>
> Key: HBASE-4880
> URL: https://issues.apache.org/jira/browse/HBASE-4880
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-4880.patch, hbase-4880v2.patch
>
>
> OpenRegionHandler in the regionserver proceeds through the following steps:
> {code}
> 1. openRegion() (after it, closed = false, closing = false)
> 2. addToOnlineRegions(region)
> 3. update the .META. table
> 4. update ZK's node state to RS_ZK_REGION_OPENED
> {code}
> The region is therefore in service before step 4 completes.
> That means a client could put data to this region after step 3.
> What happens if step 4 fails?
> OpenRegionHandler#cleanupFailedOpen is executed, which closes the region, 
> and the master assigns the region to another regionserver.
> If closing the region fails, the data put between step 3 and step 4 may be 
> lost, because the region has been opened on another regionserver and has 
> received new data. That data may then not be recoverable through 
> replayRecoveredEdit(), because the edit's LogSeqId is smaller than the 
> region's current SeqId.
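
One way to close the window described above is to make the region visible to clients only after the last step has succeeded. The sketch below illustrates that ordering under simplifying assumptions; it is not necessarily what the attached patches do, and OpenSequenceSketch and its parameters are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical model of an open sequence that publishes the region last. */
final class OpenSequenceSketch {
    /** Regions that clients are allowed to read from and write to. */
    final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();

    /**
     * Runs the bookkeeping steps first and adds the region to the online set
     * only if both succeed. Returns true if the region ended up online.
     */
    boolean open(String region, BooleanSupplier updateMeta, BooleanSupplier transitionZkToOpened) {
        if (!updateMeta.getAsBoolean() || !transitionZkToOpened.getAsBoolean()) {
            // The region was never served, so no client edits can be stranded here.
            return false;
        }
        onlineRegions.add(region);
        return true;
    }
}
```

With this ordering a failed ZooKeeper transition leaves nothing to clean up on the serving side, at the cost of the region becoming available slightly later.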





[jira] [Updated] (HBASE-4942) HMaster is unable to start of HFile V1 is used

2011-12-04 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4942:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> HMaster is unable to start of HFile V1 is used
> --
>
> Key: HBASE-4942
> URL: https://issues.apache.org/jira/browse/HBASE-4942
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 0.92.0
>Reporter: Ted Yu
>Assignee: honghua zhu
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBase_0.92.0_HBASE-4942, HBase_0.94.0_HBASE-4942
>
>
> This was reported by HH Zhu (zhh200...@gmail.com)
> Specify the following in hbase-site.xml:
> {code}
> <property>
>   <name>hfile.format.version</name>
>   <value>1</value>
> </property>
> {code}
> Then clear the HDFS directory given by "hbase.rootdir" so that 
> MasterFileSystem.bootstrap() is executed.
> You would see:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(HFileReaderV1.java:358)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1083)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:570)
> at org.apache.hadoop.hbase.regionserver.Store.close(Store.java:441)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:782)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:717)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:688)
> at 
> org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:390)
> at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:356)
> at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
> at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:113)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> The above exception would lead to:
> {code}
> java.lang.RuntimeException: HMaster Aborted
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:152)
> at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1512)
> {code}
> In org.apache.hadoop.hbase.master.HMaster.HMaster(Configuration conf), we 
> have:
> {code}
> this.conf.setFloat(CacheConfig.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f);
> {code}
> When CacheConfig is instantiated, the following is called:
> {code}
> org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(Configuration conf)
> {code}
> Since "hfile.block.cache.size" is 0.0, instantiateBlockCache() returns 
> null, leaving the blockCache field of CacheConfig null.
> When the master closes the ROOT region, 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(boolean evictOnClose) 
> is called. cacheConf.getBlockCache() returns null, which causes the 
> NullPointerException above and the master abort.
> The following check should be made in HFileReaderV1.close(), similar to the 
> code in HFileReaderV2.close():
> {code}
> if (evictOnClose && cacheConf.isBlockCacheEnabled())
> {code}
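
The effect of that guard can be shown with a small self-contained model. CacheConfSketch and ReaderSketch are hypothetical stand-ins for CacheConfig and HFileReaderV1, reduced to the parts the description mentions; the real classes do much more.

```java
/** Hypothetical stand-in for CacheConfig: the cache is null when its size is 0. */
final class CacheConfSketch {
    private final Object blockCache;

    CacheConfSketch(float blockCacheSize) {
        this.blockCache = blockCacheSize > 0.0f ? new Object() : null;
    }

    boolean isBlockCacheEnabled() {
        return blockCache != null;
    }

    Object getBlockCache() {
        return blockCache;
    }
}

/** Hypothetical stand-in for a reader whose close() may evict cached blocks. */
final class ReaderSketch {
    private final CacheConfSketch cacheConf;
    int evictions = 0;

    ReaderSketch(CacheConfSketch cacheConf) {
        this.cacheConf = cacheConf;
    }

    void close(boolean evictOnClose) {
        if (evictOnClose && cacheConf.isBlockCacheEnabled()) {
            // Dereferencing the cache is safe only inside this guard.
            cacheConf.getBlockCache().hashCode();
            evictions++;
        }
    }
}
```

With the cache size set to 0.0, as the master does, close(true) skips the eviction instead of dereferencing a null cache.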





[jira] [Updated] (HBASE-4944) Optionally verify bulk loaded HFiles

2011-12-03 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4944:
--

Attachment: 4944.txt

Patch from Andy.

> Optionally verify bulk loaded HFiles
> 
>
> Key: HBASE-4944
> URL: https://issues.apache.org/jira/browse/HBASE-4944
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.92.0, 0.94.0, 0.90.5
>Reporter: Andrew Purtell
>Priority: Minor
> Attachments: 4944.txt
>
>
> We rely on users to produce properly formatted HFiles for bulk import. 
> Attached patch adds an optional code path, toggled by a configuration 
> property, that verifies the HFile under consideration for import is properly 
> sorted. The default maintains the current behavior, which does not scan the 
> file for correctness.
> Patch is against trunk but can apply against all active branches.
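
The kind of check described above, confirming that the keys of a file appear in sorted order, can be sketched as below. This is an illustration only: SortedKeyCheck is a hypothetical name, keys are modelled as plain byte arrays compared as unsigned bytes, and equal neighbouring keys are accepted, which may differ from what the attached patch does with real HFile keys.

```java
import java.util.List;

/** Hypothetical helper that checks a sequence of keys for ascending order. */
final class SortedKeyCheck {
    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // A key that is a strict prefix of another sorts first.
        return a.length - b.length;
    }

    /** Returns the index of the first key that sorts before its predecessor, or -1. */
    static int firstOutOfOrder(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compare(keys.get(i - 1), keys.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }
}
```

The scan is linear in the number of keys, which is why it makes sense to keep it behind a configuration property that is off by default.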





[jira] [Updated] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2011-12-01 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4415:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505810/HBASE-4415-8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -160 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 71 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestRollingRestart
  org.apache.hadoop.hbase.util.TestRegionSplitter
  org.apache.hadoop.hbase.io.hfile.TestLruBlockCache
  org.apache.hadoop.hbase.client.TestMultiParallel
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
  org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.regionserver.TestStore
  org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
  org.apache.hadoop.hbase.rest.TestGzipFilter
  org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
  org.apache.hadoop.hbase.master.TestLogsCleaner
  org.apache.hadoop.hbase.regionserver.TestAtomicOperation
  org.apache.hadoop.hbase.rest.TestScannersWithFilters
  org.apache.hadoop.hbase.TestInfoServers
  org.apache.hadoop.hbase.regionserver.TestParallelPut
  org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
  org.apache.hadoop.hbase.filter.TestColumnRangeFilter
  org.apache.hadoop.hbase.mapred.TestTableInputFormat
  org.apache.hadoop.hbase.client.TestHCM
  
org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
  org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.rest.TestStatusResource
  org.apache.hadoop.hbase.TestRegionRebalancing
  org.apache.hadoop.hbase.regionserver.TestMultiColumnScanner
  org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
  org.apache.hadoop.hbase.regionserver.TestSeekOptimizations
  org.apache.hadoop.hbase.rest.TestVersionResource
  org.apache.hadoop.hbase.client.TestScannerTimeout
  org.apache.hadoop.hbase.client.TestFromClientSide
  org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
  org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
  org.apache.hadoop.hbase.regionserver.TestSplitTransaction
  org.apache.hadoop.hbase.rest.TestRowResource
  org.apache.hadoop.hbase.rest.TestScannerResource
  org.apache.hadoop.hbase.ipc.TestDelayedRpc
  org.apache.hadoop.hbase.rest.client.TestRemoteAdmin
  org.apache.hadoop.hbase.util.TestFSUtils
  org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting
  org.apache.hadoop.hbase.rest.TestTableResource
  org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
  org.apache.hadoop.hbase.master.TestHMasterRPCException
  org.apache.hadoop.hbase.util.TestIdLock
  org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster
  org.apache.hadoop.hbase.regionserver.TestMemStore
  org.apache.hadoop.hbase.rest.TestTransform
  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
  org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
  org.apache.hadoop.hbase.regionserver.TestHRegion
  
org.apache.hadoop.hbase.regionserver.TestReadWriteConsistencyControl
  org.apache.hadoop.hb

[jira] [Updated] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment

2011-11-29 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4899:
--

Status: Patch Available  (was: Open)

> Region would be assigned twice easily with continually  killing server and 
> moving region in testing environment
> ---
>
> Key: HBASE-4899
> URL: https://issues.apache.org/jira/browse/HBASE-4899
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-4899.patch
>
>
> Before assigning region in ServerShutdownHandler#process, it will check 
> whether region is in RIT,
> however, this checking doesn't work as the excepted in the following case:
> 1.move region A from server B to server C
> 2.kill server B
> 3.start server B immediately
> Let's see what happen in the code for the above case
> {code}
> for step1:
> 1.1 server B close the region A,
> 1.2 master setOffline for region 
> A,(AssignmentManager#setOffline:this.regions.remove(regionInfo))
> 1.3 server C start to open region A.(Not completed)
> for step3:
> master ServerShutdownHandler#process() for server B
> {
> ..
> splitlog()
> ...
> List regionsInTransition =
> this.services.getAssignmentManager()
> .processServerShutdown(this.serverName);
> ...
> Skip regions that were in transition unless CLOSING or PENDING_CLOSE
> ...
> assign region
> }
> {code}
> In fact, when ServerShutdownHandler#process() calls 
> this.services.getAssignmentManager().processServerShutdown(this.serverName), 
> region A is in RIT (step 1.3 has not completed), but the returned List 
> regionsInTransition does not contain it, because region A was removed from 
> AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
> Therefore, region A will be assigned twice.
> In practice, killing and restarting one server twice will also easily cause 
> a region to be assigned twice.
> Apart from the above, there is another possibility: 
> when ServerShutdownHandler#process() calls MetaReader.getServerUserRegions, 
> the result includes a region that is in RIT at that moment. 
> But after MetaReader.getServerUserRegions completes, the region has been 
> opened on another server and is no longer in RIT.
> In our testing environment, where balancing, moving and killing are executed 
> periodically, regions are often assigned twice, which is a nuisance because 
> it affects other test cases.
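
The first race in the report (steps 1.2 to 3) can be illustrated with a minimal Java sketch. It assumes a heavily simplified model of the master's bookkeeping: the class `AssignmentModel` and all of its methods are hypothetical names for illustration and are not the HBase AssignmentManager API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the master's bookkeeping; not HBase code.
final class AssignmentModel {
    // region name -> hosting server (stands in for AssignmentManager.regions)
    final Map<String, String> regions = new HashMap<>();
    // regions currently in transition (RIT)
    final Set<String> regionsInTransition = new HashSet<>();

    // Step 1.2: setOffline removes the region from the regions map.
    void setOffline(String region) {
        regions.remove(region);
    }

    // Step 1.3 begins: the region is opening elsewhere, so it is in RIT.
    void startOpening(String region) {
        regionsInTransition.add(region);
    }

    // Mirrors the reported behaviour: only regions still mapped to the dead
    // server are reported as in transition, so an offlined region is missed.
    List<String> ritOnDeadServer(String deadServer) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (e.getValue().equals(deadServer)
                    && regionsInTransition.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // One possible guard: consult the RIT set directly before assigning.
    boolean shouldAssign(String region) {
        return !regionsInTransition.contains(region);
    }

    public static void main(String[] args) {
        AssignmentModel m = new AssignmentModel();
        m.regions.put("A", "B");   // region A is hosted on server B
        m.setOffline("A");         // step 1.2
        m.startOpening("A");       // step 1.3, not yet completed
        System.out.println(m.ritOnDeadServer("B")); // region A is not listed
        System.out.println(m.shouldAssign("A"));    // the direct check says skip
    }
}
```

In this model the list built from the regions map misses region A, while a direct lookup in the RIT set still sees it. Whether the attached patch takes this approach is not stated in the report.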

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment

2011-11-29 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4899:
--

  Description: 
Before assigning a region, ServerShutdownHandler#process checks whether the 
region is in transition (RIT).
However, this check does not work as expected in the following case:
1. Move region A from server B to server C.
2. Kill server B.
3. Start server B immediately.

Let's see what happens in the code for the above case:
{code}
for step 1:
1.1 server B closes region A
1.2 master calls setOffline for region A
    (AssignmentManager#setOffline: this.regions.remove(regionInfo))
1.3 server C starts to open region A (not completed)
for step 3:
master runs ServerShutdownHandler#process() for server B
{
...
splitlog()
...
List regionsInTransition =
this.services.getAssignmentManager()
.processServerShutdown(this.serverName);
...
skip regions that were in transition unless CLOSING or PENDING_CLOSE
...
assign region
}
{code}
In fact, when ServerShutdownHandler#process() calls 
this.services.getAssignmentManager().processServerShutdown(this.serverName), 
region A is in RIT (step 1.3 has not completed), but the returned List 
regionsInTransition does not contain it, because region A was removed from 
AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
Therefore, region A will be assigned twice.

In practice, killing and restarting one server twice will also easily cause 
a region to be assigned twice.
Apart from the above, there is another possibility: 
when ServerShutdownHandler#process() calls MetaReader.getServerUserRegions, 
the result includes a region that is in RIT at that moment. 
But after MetaReader.getServerUserRegions completes, the region has been 
opened on another server and is no longer in RIT.

In our testing environment, where balancing, moving and killing are executed 
periodically, regions are often assigned twice, which is a nuisance because 
it affects other test cases.
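
The second possibility in the description, a region list read from META going stale before assignment, can be sketched as follows. This is an illustration under assumptions: `StaleSnapshotGuard` and its methods are hypothetical names, and the sketch only shows the idea of re-checking current state just before assigning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and not HBase APIs.
final class StaleSnapshotGuard {
    // region name -> server that currently has the region open
    private final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    void markOpened(String region, String server) {
        onlineRegions.put(region, server);
    }

    // Filters a possibly stale list of the dead server's regions: a region is
    // reassigned only if no other server has opened it in the meantime.
    List<String> regionsToAssign(List<String> snapshotFromMeta, String deadServer) {
        List<String> toAssign = new ArrayList<>();
        for (String region : snapshotFromMeta) {
            String current = onlineRegions.get(region);
            if (current == null || current.equals(deadServer)) {
                toAssign.add(region);
            }
        }
        return toAssign;
    }

    public static void main(String[] args) {
        StaleSnapshotGuard guard = new StaleSnapshotGuard();
        // The snapshot was taken while region A was still in RIT.
        List<String> snapshot = Arrays.asList("A", "D");
        // Region A finishes opening on server C after the snapshot was read.
        guard.markOpened("A", "C");
        System.out.println(guard.regionsToAssign(snapshot, "B"));
    }
}
```

A check like this narrows the window but does not close it on its own; the lookup and the assignment would still need to happen under the same lock or be otherwise serialized.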

  was:
Before assigning a region, ServerShutdownHandler#process checks whether the 
region is in transition (RIT).
However, this check does not work as expected in the following case:
1. Move region A from server B to server C.
2. Kill server B.
3. Start server B immediately.

Let's see what happens in the code for the above case:
{code}
for step 1:
1.1 server B closes region A
1.2 master calls setOffline for region A
    (AssignmentManager#setOffline: this.regions.remove(regionInfo))
1.3 server C starts to open region A (not completed)
for step 3:
master runs ServerShutdownHandler#process() for server B
{
...
splitlog()
...
List regionsInTransition =
this.services.getAssignmentManager()
.processServerShutdown(this.serverName);
...
skip regions that were in transition unless CLOSING or PENDING_CLOSE
...
assign region
}

In fact, when ServerShutdownHandler#process() calls 
this.services.getAssignmentManager().processServerShutdown(this.serverName), 
region A is in RIT (step 1.3 has not completed), but the returned List 
regionsInTransition does not contain it, because region A was removed from 
AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
Therefore, region A will be assigned twice.
{code}

In practice, killing and restarting one server twice will also easily cause 
a region to be assigned twice.
Apart from the above, there is another possibility: 
when ServerShutdownHandler#process() calls MetaReader.getServerUserRegions, 
the result includes a region that is in RIT at that moment. 
But after MetaReader.getServerUserRegions completes, the region has been 
opened on another server and is no longer in RIT.

In our testing environment, where balancing, moving and killing are executed 
periodically, regions are often assigned twice, which is a nuisance because 
it affects other test cases.

Affects Version/s: 0.92.0

> Region would be assigned twice easily with continually  killing server and 
> moving region in testing environment
> ---
>
> Key: HBASE-4899
> URL: https://issues.apache.org/jira/browse/HBASE-4899
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: chunhui shen
> Attachments: hbase-4899.patch
>
>
> Before assigning a region, ServerShutdownHandler#process checks whether the 
> region is in transition (RIT).
> However, this check does not work as expected in the following case:
> 1. Move region A from server B to server C.
> 2. Kill server B.
> 3. Start server B immediately.
> Let's see what happens in the code for the above case:
> {code}
> for step 1:
> 1.1 server B closes region A
> 1.2 master calls setOffline for region A
>     (AssignmentManager#setOffline: this.regions.remove(regionInfo))
> 1.3 server C starts to open region A (not completed)
> for step 3:
> master runs ServerShutdownHandler#process() for server B
> {
> ..
> splitlog()
> ...
> List regionsInTransition =
> this.services.

[jira] [Updated] (HBASE-4120) isolation and allocation

2011-11-29 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4120:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505326/TablePrioriy_v9.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -140 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 84 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/393//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/393//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/393//console

This message is automatically generated.)

> isolation and allocation
> 
>
> Key: HBASE-4120
> URL: https://issues.apache.org/jira/browse/HBASE-4120
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver
>Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
>Reporter: Liu Jia
>Assignee: Liu Jia
> Fix For: 0.94.0
>
> Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
> Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
> HBase_isolation_and_allocation_user_guide.pdf, 
> Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
> TablePriority_v8.patch, TablePriority_v8.patch, 
> TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch
>
>
> The HBase isolation and allocation tool is designed to help users manage 
> cluster resources among different applications and tables.
> A large HBase cluster with many applications running on it runs into many 
> problems. At Taobao there is a cluster on which many departments test the 
> performance of their HBase-based applications. Although the cluster has 12 
> servers, only one application can run on it at a time, exclusively, and the 
> other applications must wait until the previous test has finished.
> After we add the allocation management function to the cluster, applications 
> can share the cluster and run concurrently. If a test engineer wants to 
> make sure there is no interference, he or she can move other tables out of 
> the group.
> Within a group we use table priority to allocate resources; when the system 
> is busy, we can make sure high-priority tables are not affected by 
> lower-priority tables.
> Different groups can have different region server configurations: groups 
> optimized for reading can have a large block cache, and groups optimized 
> for writing can have a large memstore.
> Tables and region servers can be moved easily between groups; after changing 
> the configuration, a group can be restarted alone instead of restarting the 
> whole cluster.
> Git repository: https://github.com/ICT-Ope/HBase_allocation
> We hope our work is helpful.
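
The table-priority idea in the description can be pictured with a small Java sketch. Everything here is an assumption made for illustration: the class names, the convention that a smaller number means a higher priority, and the first-in first-out tie-breaking are not taken from the attached patches or design documents.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical request record; a smaller priority value is served first.
final class TableRequest {
    final String table;
    final int priority;
    final long sequence; // arrival order, keeps equal priorities first-in first-out

    TableRequest(String table, int priority, long sequence) {
        this.table = table;
        this.priority = priority;
        this.sequence = sequence;
    }
}

// Hypothetical queue that serves pending requests by table priority.
final class PriorityRequestQueue {
    private final PriorityQueue<TableRequest> queue = new PriorityQueue<>(
            Comparator.comparingInt((TableRequest r) -> r.priority)
                      .thenComparingLong(r -> r.sequence));
    private long nextSequence = 0;

    void submit(String table, int priority) {
        queue.add(new TableRequest(table, priority, nextSequence++));
    }

    // Returns the table of the next request to serve, or null if none is pending.
    String nextTable() {
        TableRequest r = queue.poll();
        return r == null ? null : r.table;
    }

    public static void main(String[] args) {
        PriorityRequestQueue q = new PriorityRequestQueue();
        q.submit("batch_table", 10);
        q.submit("online_table", 1);
        // When the system is busy, the higher-priority table is served first.
        System.out.println(q.nextTable());
        System.out.println(q.nextTable());
    }
}
```

A strict priority order like this can starve low-priority tables under sustained load, so a real scheduler would likely need some form of aging or quota.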





[jira] [Updated] (HBASE-4883) TestCatalogTracker failing for me on ubuntu

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4883:
--

Attachment: 4883.addendum

> TestCatalogTracker failing for me on ubuntu
> ---
>
> Key: HBASE-4883
> URL: https://issues.apache.org/jira/browse/HBASE-4883
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Attachments: 4883.addendum, 4883.txt, tct.txt
>
>
> {code}
> ---
> Test set: org.apache.hadoop.hbase.catalog.TestCatalogTracker
> ---
> Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.276 sec <<< FAILURE!
> testNoTimeoutWaitForMeta(org.apache.hadoop.hbase.catalog.TestCatalogTracker)  Time elapsed: 1.051 sec  <<< ERROR!
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> Result cannot be returned by getConfiguration()
> getConfiguration() should return Configuration
> at org.apache.hadoop.hbase.catalog.TestCatalogTracker.testNoTimeoutWaitForMeta(TestCatalogTracker.java:378)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> {code}
> The above is strange since it seems to pass on jenkins and on macosx.





[jira] [Updated] (HBASE-4878) Master crash when splitting hlog may cause data loss

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4878:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Master crash when splitting hlog may cause data loss
> 
>
> Key: HBASE-4878
> URL: https://issues.apache.org/jira/browse/HBASE-4878
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0
>
> Attachments: hbase-4878.diff, hbase-4878v2.patch
>
>
> Let's look at the code of HLogSplitter#splitLog(final FileStatus[] logfiles):
> {code}
> private List splitLog(final FileStatus[] logfiles) throws IOException {
>   try {
>     for (FileStatus log : logfiles) {
>       parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
>     }
>     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
>   } finally {
>     status.setStatus("Finishing writing output logs and closing down.");
>     splits = outputSink.finishWritingAndClose();
>   }
> }
> {code}
> If the master is killed after archiveLogs(srcDir, corruptedLogs, 
> processedLogs, oldLogDir, fs, conf) has finished 
> but before splits = outputSink.finishWritingAndClose() has finished, 
> log data would be lost!
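
The ordering problem described above can be made concrete with a small Java sketch. It is only a model: `SplitOrderingSketch` and its step methods are hypothetical stand-ins that record the order of operations, and the second method shows one possible safer ordering rather than the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ordering issue; not the HLogSplitter code.
final class SplitOrderingSketch {
    // Records the order in which the steps ran, for illustration only.
    final List<String> steps = new ArrayList<>();

    void parseLogs()             { steps.add("parse"); }
    void finishWritingAndClose() { steps.add("close-outputs"); }
    void archiveLogs()           { steps.add("archive"); }

    // Reported ordering: the source logs are archived inside the try block and
    // the outputs are closed in finally, that is, after the archive step.
    void splitAsReported() {
        try {
            parseLogs();
            archiveLogs();
        } finally {
            finishWritingAndClose();
        }
    }

    // One possible safer ordering: make the split output durable first and
    // archive the source logs only after that has succeeded.
    void splitOutputsFirst() {
        parseLogs();
        finishWritingAndClose();
        archiveLogs();
    }

    public static void main(String[] args) {
        SplitOrderingSketch reported = new SplitOrderingSketch();
        reported.splitAsReported();
        System.out.println(reported.steps);

        SplitOrderingSketch safer = new SplitOrderingSketch();
        safer.splitOutputsFirst();
        System.out.println(safer.steps);
    }
}
```

With the second ordering, a crash between the two final steps leaves the source logs in place, so the split can be repeated instead of losing edits.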





[jira] [Updated] (HBASE-4885) Building against Hadoop 0.23 uses out-of-date MapReduce artifacts

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4885:
--

Status: Patch Available  (was: Open)

I tried the patch on TRUNK codebase.
Tests ran smoothly (I didn't finish all tests).

Submit for patch testing.

> Building against Hadoop 0.23 uses out-of-date MapReduce artifacts
> -
>
> Key: HBASE-4885
> URL: https://issues.apache.org/jira/browse/HBASE-4885
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.94.0
>
> Attachments: HBASE-4885.patch
>
>
> The "hadoop-mapred" artifacts have been replaced by "hadoop-mapreduce-*" 
> artifacts in 0.23 onwards.





[jira] [Updated] (HBASE-4885) Building against Hadoop 0.23 uses out-of-date MapReduce artifacts

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4885:
--

Fix Version/s: 0.94.0

> Building against Hadoop 0.23 uses out-of-date MapReduce artifacts
> -
>
> Key: HBASE-4885
> URL: https://issues.apache.org/jira/browse/HBASE-4885
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.94.0
>
> Attachments: HBASE-4885.patch
>
>
> The "hadoop-mapred" artifacts have been replaced by "hadoop-mapreduce-*" 
> artifacts in 0.23 onwards.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 
> 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, 
> hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>
> Case description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and during 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. Of course, the 
> filesystem is fine, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file to which the split hlog thread is still appending.
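
One way to keep a reader from deleting a file that is still being appended to is to write under a temporary name and rename on completion. The Java sketch below shows that idea on the local filesystem. It is an assumption-laden illustration: the `.temp` suffix, the class `RecoveredEditsSketch` and its methods are hypothetical and are not taken from the attached patches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch on the local filesystem; the suffix and method names
// are assumptions made for illustration.
final class RecoveredEditsSketch {
    static final String IN_PROGRESS_SUFFIX = ".temp";

    // Splitter side: write under a temporary name, then rename when complete.
    static Path writeEdits(Path dir, String name, byte[] edits) throws IOException {
        Path tmp = dir.resolve(name + IN_PROGRESS_SUFFIX);
        Files.write(tmp, edits);
        return Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Region-open side: only completed files are eligible for replay and
    // deletion; files still carrying the in-progress suffix are left alone.
    static List<Path> completedEditFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(p -> !p.getFileName().toString().endsWith(IN_PROGRESS_SUFFIX))
                 .forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        // A file that the splitter is still appending to.
        Files.write(dir.resolve("123456" + IN_PROGRESS_SUFFIX), new byte[] {1});
        // A file that the splitter has finished.
        writeEdits(dir, "654321", new byte[] {2});
        // Only the finished file is listed.
        System.out.println(completedEditFiles(dir));
    }
}
```

The naming convention works across processes, which matters here because the splitter and the regionserver do not share memory.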





[jira] [Updated] (HBASE-4869) Backport to 0.92: HBASE-4797 [availability] Skip recovered.edits files with edits older than what region currently has

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4869:
--

Summary: Backport to 0.92: HBASE-4797 [availability] Skip recovered.edits 
files with edits older than what region currently has  (was: Backport to 0.92: 
HBASE-4797 [availability] Skip recovered.edits files with edits we know older 
than what region currently has.)

> Backport to 0.92: HBASE-4797 [availability] Skip recovered.edits files with 
> edits older than what region currently has
> --
>
> Key: HBASE-4869
> URL: https://issues.apache.org/jira/browse/HBASE-4869
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 0.90.2
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4869-Backport-to-0.92-HBASE-4797-availability-.patch
>
>






[jira] [Updated] (HBASE-4878) Master crash when splitting hlog may cause data loss

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4878:
--

Summary: Master crash when splitting hlog may cause data loss  (was: Master 
crash when spliting hlog may cause data loss)

> Master crash when splitting hlog may cause data loss
> 
>
> Key: HBASE-4878
> URL: https://issues.apache.org/jira/browse/HBASE-4878
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0
>
> Attachments: hbase-4878.diff, hbase-4878v2.patch
>
>
> Let's look at the code of HLogSplitter#splitLog(final FileStatus[] logfiles):
> {code}
> private List splitLog(final FileStatus[] logfiles) throws IOException {
>   try {
>     for (FileStatus log : logfiles) {
>       parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
>     }
>     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
>   } finally {
>     status.setStatus("Finishing writing output logs and closing down.");
>     splits = outputSink.finishWritingAndClose();
>   }
> }
> {code}
> If the master is killed after archiveLogs(srcDir, corruptedLogs, 
> processedLogs, oldLogDir, fs, conf) has finished 
> but before splits = outputSink.finishWritingAndClose() has finished, 
> log data would be lost!





[jira] [Updated] (HBASE-4773) HBaseAdmin may leak ZooKeeper connections

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4773:
--

Summary: HBaseAdmin may leak ZooKeeper connections  (was: HBaseAdmin leaks 
ZooKeeper connections)

> HBaseAdmin may leak ZooKeeper connections
> -
>
> Key: HBASE-4773
> URL: https://issues.apache.org/jira/browse/HBASE-4773
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch
>
>
> When the master crashes, HBaseAdmin leaks ZooKeeper connections.
> I think we should close the zk connection when MasterNotRunningException is 
> thrown:
>   public HBaseAdmin(Configuration c)
>       throws MasterNotRunningException, ZooKeeperConnectionException {
>     this.conf = HBaseConfiguration.create(c);
>     this.connection = HConnectionManager.getConnection(this.conf);
>     this.pause = this.conf.getLong("hbase.client.pause", 1000);
>     this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
>     this.retryLongerMultiplier =
>         this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
>     // we should add this code and close the zk connection
>     try {
>       this.connection.getMaster();
>     } catch (MasterNotRunningException e) {
>       HConnectionManager.deleteConnection(conf, false);
>       throw e;
>     }
>   }
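
The general pattern behind the proposal, releasing a resource when a constructor fails after acquiring it, can be shown in isolation. The sketch below uses hypothetical stand-in types (`FakeConnection`, `AdminSketch`) and an unchecked exception in place of the HBase classes, so it illustrates the pattern only and is not the HBaseAdmin code.

```java
// Hypothetical stand-in for a connection that holds an external resource.
final class FakeConnection {
    boolean closed = false;
    private final boolean masterRunning;

    FakeConnection(boolean masterRunning) {
        this.masterRunning = masterRunning;
    }

    // Stands in for the master lookup; fails when the master is down.
    void getMaster() {
        if (!masterRunning) {
            throw new IllegalStateException("master not running");
        }
    }

    void close() {
        closed = true;
    }
}

// Hypothetical stand-in for the admin object that owns the connection.
final class AdminSketch {
    private final FakeConnection connection;

    // If the master check fails, release the connection before rethrowing,
    // because the caller never receives an object it could close.
    AdminSketch(FakeConnection connection) {
        this.connection = connection;
        try {
            this.connection.getMaster();
        } catch (IllegalStateException e) {
            this.connection.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection(false);
        try {
            new AdminSketch(conn);
        } catch (IllegalStateException e) {
            // Construction failed, but the connection was released first.
            System.out.println("closed=" + conn.closed);
        }
    }
}
```

Without the catch block, a failed construction would leave the connection open with no reference the caller could use to close it, which matches the leak described in the report.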





[jira] [Updated] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections

2011-11-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4773:
--

Status: Patch Available  (was: Open)

> HBaseAdmin leaks ZooKeeper connections
> --
>
> Key: HBASE-4773
> URL: https://issues.apache.org/jira/browse/HBASE-4773
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch
>
>
> When the master crashes, HBaseAdmin leaks ZooKeeper connections.
> I think we should close the zk connection when MasterNotRunningException is 
> thrown:
>   public HBaseAdmin(Configuration c)
>       throws MasterNotRunningException, ZooKeeperConnectionException {
>     this.conf = HBaseConfiguration.create(c);
>     this.connection = HConnectionManager.getConnection(this.conf);
>     this.pause = this.conf.getLong("hbase.client.pause", 1000);
>     this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
>     this.retryLongerMultiplier =
>         this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
>     // we should add this code and close the zk connection
>     try {
>       this.connection.getMaster();
>     } catch (MasterNotRunningException e) {
>       HConnectionManager.deleteConnection(conf, false);
>       throw e;
>     }
>   }





[jira] [Updated] (HBASE-4878) Master crash when spliting hlog may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4878:
--

Fix Version/s: 0.94.0
   0.92.0

> Master crash when spliting hlog may cause data loss
> ---
>
> Key: HBASE-4878
> URL: https://issues.apache.org/jira/browse/HBASE-4878
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0
>
> Attachments: hbase-4878.diff, hbase-4878v2.patch
>
>
> Let's look at the code of HLogSplitter#splitLog(final FileStatus[] logfiles):
> {code}
> private List splitLog(final FileStatus[] logfiles) throws IOException {
>   try {
>     for (FileStatus log : logfiles) {
>       parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
>     }
>     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
>   } finally {
>     status.setStatus("Finishing writing output logs and closing down.");
>     splits = outputSink.finishWritingAndClose();
>   }
> }
> {code}
> If the master is killed after archiveLogs(srcDir, corruptedLogs, 
> processedLogs, oldLogDir, fs, conf) has finished 
> but before splits = outputSink.finishWritingAndClose() has finished, 
> log data would be lost!





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862-0.92.txt

Patch for 0.92 branch.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch, 
> 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, 
> hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>
> Case description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and during 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. Of course, the 
> filesystem is fine, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file to which the split hlog thread is still appending.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505283/4862-v6-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 67 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/387//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/387//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/387//console

This message is automatically generated.)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 
> 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff, 
> hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
>
> Case description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and during 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. Of course, the 
> filesystem is fine, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file to which the split hlog thread is still appending.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: (was: 4862-v6-trunk.txt)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 
> 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862-v6-trunk.patch

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 
> 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Open  (was: Patch Available)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 
> 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Patch Available  (was: Open)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.patch, 4862.patch, 
> 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4336) Convert source tree into maven modules

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4336:
--

 Priority: Critical  (was: Major)
Fix Version/s: 0.94.0

Raising priority.
I think this work is important so that security related classes can be packaged 
in their own jar.

> Convert source tree into maven modules
> --
>
> Key: HBASE-4336
> URL: https://issues.apache.org/jira/browse/HBASE-4336
> Project: HBase
>  Issue Type: Task
>  Components: build
>Reporter: Gary Helmling
>Priority: Critical
> Fix For: 0.94.0
>
>
> When we originally converted the build to maven we had a single "core" module 
> defined, but later reverted this to a module-less build for the sake of 
> simplicity.
> It now looks like it's time to re-address this, as we have an actual need for 
> modules to:
> * provide a trimmed down "client" library that applications can make use of
> * more cleanly support building against different versions of Hadoop, in 
> place of some of the reflection machinations currently required
> * incorporate the secure RPC engine that depends on some secure Hadoop classes
> I propose we start simply by refactoring into two initial modules:
> * core - common classes and utilities, and client-side code and interfaces
> * server - master and region server implementations and supporting code
> This would also lay the groundwork for incorporating the HBase security 
> features that have been developed.  Once the module structure is in place, 
> security-related features could then be incorporated into a third module -- 
> "security" -- after normal review and approval.  The security module could 
> then depend on secure Hadoop, without modifying the dependencies of the rest 
> of the HBase code.





[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4868-0.92.txt, 4868-v3.txt, HBASE-4868_trial.patch, 
> HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the method makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Issue Type: Test  (was: Bug)

Integrated to 0.92 and TRUNK.

Thanks for the patch, Jinchao.

Thanks for the review Jonathan.

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4868-0.92.txt, 4868-v3.txt, HBASE-4868_trial.patch, 
> HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the method makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505254/4868-0.92.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/382//console

This message is automatically generated.)

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4868-0.92.txt, 4868-v3.txt, HBASE-4868_trial.patch, 
> HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the method makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Attachment: 4868-0.92.txt

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4868-0.92.txt, 4868-v3.txt, HBASE-4868_trial.patch, 
> HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the method makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Attachment: 4868-v3.txt

Patch v3 applies same change to TestOfflineMetaRebuildHole and 
TestOfflineMetaRebuildOverlap.

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4868-v3.txt, HBASE-4868_trial.patch, 
> HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the method makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862-v6-90.txt

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-90.txt, 4862-v6-trunk.txt, 4862.patch, 4862.txt, 
> hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
> trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, 
> hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, 
> hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-27 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862-v6-trunk.txt

Patch v6 with javadoc updated according to reviews

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-v6-trunk.txt, 4862.patch, 4862.txt, hbase-4862v1 
> for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, 
> hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, 
> hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, 
> hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, data 
> for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen in the following way:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4833) HRegionServer stops could be 0.5s faster

2011-11-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4833:
--

Hadoop Flags: Reviewed
 Summary: HRegionServer stops could be 0.5s faster  (was: HRegionServer 
stops could be 0,5s faster)

> HRegionServer stops could be 0.5s faster
> 
>
> Key: HBASE-4833
> URL: https://issues.apache.org/jira/browse/HBASE-4833
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, test
>Affects Versions: 0.94.0
> Environment: all
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
> Attachments: 4833_trunk_hregionserver.patch, 
> 4833_trunk_hregionserver.v2.patch
>
>
> The current implementation of HRegionServer#stop is
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     synchronized (this) {
>       // Wakes run() if it is sleeping
>       notifyAll(); // FindBugs NN_NAKED_NOTIFY
>     }
>   }
> {noformat}
> The notification is sent on the wrong object and does nothing. As a 
> consequence, the region server continues to sleep instead of waking up and 
> stopping immediately. A correct implementation is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     // Wakes run() if it is sleeping
>     sleeper.skipSleepCycle();
>   }
> {noformat}
> Then the region server stops immediately. This makes the region server stop 
> 0.5s faster on average, which is quite useful for unit tests.
> However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
> not work.
> This is likely because the code does not expect the region server to stop 
> that fast. See HBASE-4832.
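
The difference between the two versions of stop() quoted above can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not HBase code: SketchSleeper is a hypothetical stand-in for the sleeper, which waits on its own private lock, and WakeDemo.timeSleep is a made-up helper that measures how long one sleep cycle lasts. A notifyAll() on any other object leaves the sleeper waiting for the full period, while skipSleepCycle() notifies the lock that is actually being waited on.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for a sleeper that waits on a private monitor.
class SketchSleeper {
  private final Object sleepLock = new Object();
  private boolean triggerWake = false;

  // Sleeps up to periodMs, returning early if skipSleepCycle() is called.
  // Returns the number of milliseconds actually spent sleeping.
  long sleep(long periodMs) {
    long start = System.nanoTime();
    long deadline = start + TimeUnit.MILLISECONDS.toNanos(periodMs);
    synchronized (sleepLock) {
      try {
        while (!triggerWake) {
          long remainingMs =
              TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
          if (remainingMs <= 0) {
            break;
          }
          sleepLock.wait(remainingMs);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      triggerWake = false;
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  // Wakes the sleeping thread by notifying the monitor it waits on.
  void skipSleepCycle() {
    synchronized (sleepLock) {
      triggerWake = true;
      sleepLock.notifyAll();
    }
  }
}

class WakeDemo {
  // Starts one sleep cycle, applies wakeUp after delayMs, and returns
  // how long the sleeper thread really slept.
  static long timeSleep(long periodMs, long delayMs,
      Consumer<SketchSleeper> wakeUp) {
    SketchSleeper sleeper = new SketchSleeper();
    long[] slept = new long[1];
    Thread t = new Thread(() -> slept[0] = sleeper.sleep(periodMs));
    t.start();
    try {
      Thread.sleep(delayMs);
      wakeUp.accept(sleeper);
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return slept[0];
  }

  public static void main(String[] args) {
    // Notification on an unrelated object: nobody waits on it.
    long naked = timeSleep(1000, 100, s -> {
      Object other = new Object();
      synchronized (other) {
        other.notifyAll();
      }
    });
    // Notification on the monitor the sleeper actually waits on.
    long skipped = timeSleep(1000, 100, SketchSleeper::skipSleepCycle);
    System.out.println("notifyAll on another object: slept " + naked + " ms");
    System.out.println("skipSleepCycle(): slept " + skipped + " ms");
  }
}
```

The flag checked inside the loop also covers the case where the wake-up request arrives before the sleeper has started waiting, which a bare notifyAll() would miss.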





[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4832:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
> stops too fast
> ---
>
> Key: HBASE-4832
> URL: https://issues.apache.org/jira/browse/HBASE-4832
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.94.0
>Reporter: nkeywal
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
> HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch
>
>
> The current implementation of HRegionServer#stop is
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     synchronized (this) {
>       // Wakes run() if it is sleeping
>       notifyAll(); // FindBugs NN_NAKED_NOTIFY
>     }
>   }
> {noformat}
> The notification is sent on the wrong object and does nothing. As a 
> consequence, the region server continues to sleep instead of waking up and 
> stopping immediately. A correct implementation is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     // Wakes run() if it is sleeping
>     sleeper.skipSleepCycle();
>   }
> {noformat}
> Then the region server stops immediately. This makes the region server stop 
> 0.5s faster on average, which is quite useful for unit tests.
> However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
> not work.
> This is likely because the code does not expect the region server to stop 
> that fast.
> The exception is:
> {noformat}
> testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
>   Time elapsed: 30.06 sec  <<< ERROR!
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.lang.Throwable.fillInStackTrace(Native Method)
>   at java.lang.Throwable.<init>(Throwable.java:196)
>   at java.lang.Exception.<init>(Exception.java:41)
>   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
>   at 
> org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
>   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
>   at 
> org.apache.hadoop.hbase.coprocessor.TestRegi

[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4832:
--

Fix Version/s: 0.94.0
 Release Note:   (was: This incorporates nkeywal's earlier patch to this 
JIRA, and allows TestRegionServerCoprocessortWithAbort() to work with it. It 
changes the test to use a Zookeeper watcher in a separate thread to watch for 
the regionserver to abort. (This is also what is currently done with 
TestMasterCoprocessorWithAbort()).

In my testing, repeated iterations (30+) of 
TestRegionServerCoprocessortWithAbort() succeed.)
 Hadoop Flags: Reviewed

This incorporates nkeywal's earlier patch to this JIRA, and allows 
TestRegionServerCoprocessortWithAbort() to work with it. It changes the test to 
use a Zookeeper watcher in a separate thread to watch for the regionserver to 
abort. (This is also what is currently done with 
TestMasterCoprocessorWithAbort()).

In Eugene's testing, repeated iterations (30+) of 
TestRegionServerCoprocessortWithAbort() succeed.

> TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
> stops too fast
> ---
>
> Key: HBASE-4832
> URL: https://issues.apache.org/jira/browse/HBASE-4832
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, test
>Affects Versions: 0.94.0
>Reporter: nkeywal
>Assignee: Eugene Koontz
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
> HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch
>
>
> The current implementation of HRegionServer#stop is
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     synchronized (this) {
>       // Wakes run() if it is sleeping
>       notifyAll(); // FindBugs NN_NAKED_NOTIFY
>     }
>   }
> {noformat}
> The notification is sent on the wrong object and does nothing. As a 
> consequence, the region server continues to sleep instead of waking up and 
> stopping immediately. A correct implementation is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     // Wakes run() if it is sleeping
>     sleeper.skipSleepCycle();
>   }
> {noformat}
> Then the region server stops immediately. This makes the region server stop 
> 0.5s faster on average, which is quite useful for unit tests.
> However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
> not work.
> This is likely because the code does not expect the region server to stop 
> that fast.
> The exception is:
> {noformat}
> testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
>   Time elapsed: 30.06 sec  <<< ERROR!
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.lang.Throwable.fillInStackTrace(Native Method)
>   at java.lang.Throwable.<init>(Throwable.java:196)
>   at java.lang.Exception.<init>(Exception.java:41)
>   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
>   at 
> org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
>   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
>   at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConne
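
A minimal Java sketch of the wake-up problem described above, assuming a sleeper that waits on its own private lock. SkippableSleeper and DemoServer are hypothetical names used only for illustration and are not HBase's actual classes. Only the idea comes from the description: notifyAll() on the server object cannot wake a thread that waits on a different monitor, whereas skipSleepCycle() signals the monitor the sleeper really uses.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a sleeper that waits on its own private lock. */
final class SkippableSleeper {
    private final Object sleepLock = new Object();
    private final long periodMs;
    private boolean wakeRequested = false;

    SkippableSleeper(long periodMs) {
        this.periodMs = periodMs;
    }

    /** Sleeps for up to periodMs unless a wake-up was requested; returns elapsed ms. */
    long sleep() {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(periodMs);
        synchronized (sleepLock) {
            try {
                while (!wakeRequested) {
                    long remainingMs =
                        TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        break; // the full period has elapsed
                    }
                    sleepLock.wait(remainingMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            wakeRequested = false;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Signals the monitor that sleep() actually waits on. */
    void skipSleepCycle() {
        synchronized (sleepLock) {
            wakeRequested = true;
            sleepLock.notifyAll();
        }
    }
}

/** Hypothetical server with the two stop() variants from the description. */
final class DemoServer {
    private final SkippableSleeper sleeper;
    private volatile boolean stopped = false;

    DemoServer(long periodMs) {
        this.sleeper = new SkippableSleeper(periodMs);
    }

    void stopWithNakedNotify() {
        stopped = true;
        synchronized (this) {
            notifyAll(); // no thread waits on the DemoServer monitor, so this is a no-op
        }
    }

    void stopWithSkip() {
        stopped = true;
        sleeper.skipSleepCycle(); // reaches the sleeper's own lock
    }

    boolean isStopped() {
        return stopped;
    }

    long sleepOnce() {
        return sleeper.sleep();
    }
}
```

With a 200 ms period, a DemoServer stopped through stopWithSkip() returns from sleepOnce() almost at once, while one stopped through stopWithNakedNotify() still sleeps for the whole period.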

[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-26 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505162/4862.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 67 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/371//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/371//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/371//console

This message is automatically generated.)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for 
> trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() will delete the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> data in other regions in this log file will be lost.
> 4. Or, if skipError = false, it will check the filesystem. Of course, the 
> filesystem is OK, so it only prints an error log and continues assigning 
> regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split hlog thread.
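
The description proposes forbidding deletion of a recovered edits file while the split thread is still appending to it. One way to get that effect, sketched here as an assumption and not as the patch attached to this issue, is to write under an in-progress name and rename when done, so the replaying side only sees (and deletes) finished files. The class name, the ".temp" suffix and all method names below are hypothetical, and a local temporary directory stands in for the region's edits directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RecoveredEditsSketch {
    /** Hypothetical marker for files the splitter has not finished writing. */
    static final String IN_PROGRESS_SUFFIX = ".temp";

    /** Creates a scratch directory standing in for the region's edits directory. */
    static Path newEditsDir() {
        try {
            return Files.createTempDirectory("recovered.edits");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splitter side: write the entries under the in-progress name. */
    static Path startWrite(Path dir, String name, List<String> entries) {
        try {
            Path inProgress = dir.resolve(name + IN_PROGRESS_SUFFIX);
            return Files.write(inProgress, entries, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splitter side: publish the file by renaming it to its final name. */
    static Path finishWrite(Path inProgress) {
        String tempName = inProgress.getFileName().toString();
        String finalName =
            tempName.substring(0, tempName.length() - IN_PROGRESS_SUFFIX.length());
        try {
            return Files.move(inProgress, inProgress.resolveSibling(finalName),
                StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Region-open side: replay and delete finished files only. */
    static List<String> replayAndDelete(Path dir) {
        try {
            List<Path> finished;
            try (Stream<Path> files = Files.list(dir)) {
                finished = files
                    .filter(p -> !p.getFileName().toString().endsWith(IN_PROGRESS_SUFFIX))
                    .sorted()
                    .collect(Collectors.toList());
            }
            List<String> replayed = new ArrayList<>();
            for (Path p : finished) {
                replayed.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
                Files.delete(p); // safe: the splitter no longer has this file open
            }
            return replayed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a region open that runs between startWrite() and finishWrite() replays nothing and leaves the in-progress file alone, so the splitter never sees its file disappear under it.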

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Summary: TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails  
(was: testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails)

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4868_trial.patch, HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the approach makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4875) ZKLeaderManager.handleLeaderChange() doesn't handle KeeperException$SessionExpiredException

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4875:
--

Description: 
TestMasterFailover#testSimpleMasterFailover has failed twice in a row for 
builds 15 and 16.

From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/16/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testSimpleMasterFailover/:
{code}
2011-11-26 01:34:49,217 DEBUG 
[RegionServer:0;hemera.apache.org,57516,1322271278190-EventThread] 
zookeeper.ZooKeeperWatcher(257): regionserver:57516-0x133dd828133 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/tokenauth/keymaster
2011-11-26 01:34:49,217 WARN  [Thread-1-EventThread] zookeeper.ZKUtil(234): 
master:52934-0x133dd828131 Unable to set watcher on znode 
/hbase/tokenauth/keymaster
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/tokenauth/keymaster
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:85)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:281)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2011-11-26 01:34:49,218 ERROR [Thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(403): master:52934-0x133dd828131 Received 
unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/tokenauth/keymaster
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:85)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:281)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2011-11-26 01:34:49,216 DEBUG 
[RegionServer:2;hemera.apache.org,44702,1322271278232-EventThread] 
zookeeper.ZKUtil(230): hconnection-0x133dd828139 /hbase/master does not 
exist. Watcher is set.
2011-11-26 01:34:49,215 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(230): 
master:44883-0x133dd828132 /hbase/master does not exist. Watcher is set.
2011-11-26 01:34:49,219 DEBUG [Thread-1-EventThread] 
master.ActiveMasterManager(104): No master available. Notifying waiting threads
2011-11-26 01:34:49,215 INFO  [Master:1;hemera.apache.org,52934,1322271278115] 
master.HMaster(338): HMaster main thread exiting
{code}


  was:
TestMasterFailover#testSimpleMasterFailover has failed twice in a row for 
builds 15 and 16.

From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/16/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testSimpleMasterFailover/:
{code}
2011-11-26 01:34:49,218 ERROR [Thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(403): master:52934-0x133dd828131 Received 
unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/tokenauth/keymaster
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:85)
at 
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
at 
org.apache

[jira] [Updated] (HBASE-4874) TestHCM#testClosing fails if not enough entropy is available

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4874:
--

Description: TestHCM#testClosing fails if not enough entropy is available  
(was: TestHCM#testClosing fails on Linux)
Summary: TestHCM#testClosing fails if not enough entropy is available  
(was: TestHCM#testClosing fails on Linux)

> TestHCM#testClosing fails if not enough entropy is available
> 
>
> Key: HBASE-4874
> URL: https://issues.apache.org/jira/browse/HBASE-4874
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ted Yu
>
> TestHCM#testClosing fails if not enough entropy is available





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Priority: Critical  (was: Major)

Lifting priority as Ramkrishna suggested.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4863) Make Thrift server thread pool bounded and add a command-line UI test

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4863:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Addendum applied to TRUNK.

> Make Thrift server thread pool bounded and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.94.0
>
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
> 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 4863.addendum, 
> D531.1.patch, D531.2.patch, D531.3.patch, D531.4.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer to the HBase codebase, 
> and made the following parameters configurable both from the command line and 
> as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates a new thread for every connection 
> until the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.
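
The policy described above (threads up to minWorkerThreads, then a bounded queue, then extra threads up to maxWorkerThreads, then dropping) closely matches the documented behavior of java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below uses that JDK class as a stand-in and is not the HBaseThreadPoolServer code. BoundedPoolSketch and its methods are hypothetical names, and rejection by the executor stands in for "the server drops the connection".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {
    /** Builds a pool bounded in both thread count and queue length. */
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads, maxWorkerThreads,
            60L, TimeUnit.SECONDS, // idle timeout for threads above the minimum
            new LinkedBlockingQueue<Runnable>(maxQueuedRequests));
    }

    /**
     * Submits `requests` tasks that block until released and reports
     * {pool size, queue length, rejected count} at the point of saturation.
     */
    static int[] saturate(int minWorkerThreads, int maxWorkerThreads,
                          int maxQueuedRequests, int requests) {
        ThreadPoolExecutor pool =
            newPool(minWorkerThreads, maxWorkerThreads, maxQueuedRequests);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a long-lived connection
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool and queue are both full: drop the request
                }
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size(), rejected};
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = saturate(2, 4, 3, 10);
        System.out.println("threads=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

With minWorkerThreads=2, maxWorkerThreads=4, maxQueuedRequests=3 and 10 blocking requests, the first two requests get threads, the next three are queued, two more get extra threads, and the last three are rejected, so main prints threads=4 queued=3 rejected=3.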





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Patch Available  (was: Open)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Open  (was: Patch Available)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862.txt

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: (was: 4862.txt)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4863) Make Thrift server thread pool bounded and add a command-line UI test

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4863:
--

Attachment: 4863.addendum

Addendum to add category for TestThreads

> Make Thrift server thread pool bounded and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.94.0
>
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
> 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 4863.addendum, 
> D531.1.patch, D531.2.patch, D531.3.patch, D531.4.patch
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Patch Available  (was: Open)

TestHLogSplit passed on MacBook.

Rerun test suite on Jenkins.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Attachment: 4862.txt

I ran a few tests based on patch for TRUNK and didn't see failure.
Reattaching patch for TRUNK.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>




[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Open  (was: Patch Available)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for trunk.diff
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. The regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, the file is added to the corrupt logs; however, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split-hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Status: Patch Available  (was: Open)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
> trunk.diff
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. The regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, the file is added to the corrupt logs; however, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split-hlog thread.





[jira] [Updated] (HBASE-4863) Make Thrift server thread pool bounded and add a command-line UI test

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4863:
--

Summary: Make Thrift server thread pool bounded and add a command-line UI 
test  (was: Make HBase Thrift server more configurable and add a command-line 
UI test)

> Make Thrift server thread pool bounded and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.94.0
>
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
> 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch, D531.4.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase 
> codebase, and made the following parameters configurable both from the 
> command line and as config settings: minWorkerThreads, maxWorkerThreads, and 
> maxQueuedRequests. Under an increasing load, the server creates a new thread 
> for every connection until the pool size reaches minWorkerThreads. After 
> that, the server puts new connections into the queue and only creates a new 
> thread when the queue is full. If an attempt to create a new thread fails, 
> the server drops the connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.
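
The queueing policy in the description (threads up to minWorkerThreads, then a bounded queue, then further threads only when the queue is full) is the same policy that the standard java.util.concurrent.ThreadPoolExecutor applies with a bounded queue. A minimal sketch under that assumption follows. The class BoundedPoolSketch, its helper methods, and the numeric values are hypothetical; this is not the HBaseThreadPoolServer code from the patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a bounded worker pool with a bounded request queue.
class BoundedPoolSketch {
    /** Threads up to min, then queue up to maxQueuedRequests, then threads up to max, then reject. */
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads, maxWorkerThreads,
            60L, TimeUnit.SECONDS,                        // idle timeout for extra threads
            new ArrayBlockingQueue<>(maxQueuedRequests),  // bounded queue
            new ThreadPoolExecutor.AbortPolicy());        // reject instead of growing
    }

    /** Submits tasks that block until release is counted down; returns how many were rejected. */
    static int countRejected(ThreadPoolExecutor pool, int requests, CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // the equivalent of dropping the connection
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newPool(2, 4, 3);
        int rejected = countRejected(pool, 10, release);
        // 2 core threads + 3 queued + 2 extra threads = 7 accepted, 3 rejected
        System.out.println("threads=" + pool.getPoolSize()
            + " queued=" + pool.getQueue().size() + " rejected=" + rejected);
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With the illustrative values 2, 4 and 3, ten long-running requests leave four threads busy, three requests queued and three rejected, so memory use stays bounded instead of growing with the number of connections.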





[jira] [Updated] (HBASE-4868) testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Fix Version/s: 0.92.0

From 
https://builds.apache.org/job/PreCommit-HBASE-Build/365//testReport/org.apache.hadoop.hbase.replication/TestMultiSlaveReplication/testMultiSlaveReplication/
I see:
{code}
2011-11-25 12:15:36,018 ERROR 
[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@c7057c] 
datanode.DataXceiverServer(145): DatanodeRegistration(127.0.0.1:56231, 
storageID=DS-225434506-67.195.138.20-56231-133310312, infoPort=41654, 
ipcPort=53271):DataXceiveServer: Exiting due to:java.lang.OutOfMemoryError: 
unable to create new native thread
at java.lang.Thread.start0(Native Method)
{code}
The other two failed tests were due to 'Too many open files'

> testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4868_trial.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the approach makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4863:
--

Fix Version/s: 0.94.0

+1 on patch v2.
The test failures reported by HadoopQA weren't related to the patch.

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.94.0
>
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
> 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch, D531.4.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase 
> codebase, and made the following parameters configurable both from the 
> command line and as config settings: minWorkerThreads, maxWorkerThreads, and 
> maxQueuedRequests. Under an increasing load, the server creates a new thread 
> for every connection until the pool size reaches minWorkerThreads. After 
> that, the server puts new connections into the queue and only creates a new 
> thread when the queue is full. If an attempt to create a new thread fails, 
> the server drops the connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.





[jira] [Updated] (HBASE-4868) testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails

2011-11-25 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4868:
--

Status: Patch Available  (was: Open)

> testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: HBASE-4868_trial.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the approach makes sense. 
> If it does, I will check the other cases.





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and an RS.
> The RS goes down (kill -9).
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> available, they cannot be processed.
> Restart the master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I suspect that batch.done is not being incremented properly; I have not yet 
> dug into it fully.
> This may be the reason for the occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 
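
The issue summary attributes the hang to batch.installed being counted twice. A wait loop that exits only when the finished count catches up with the installed count can then never exit. A minimal sketch of that arithmetic follows. The field names are modelled on those in the report, but the class TaskBatchSketch is hypothetical and heavily simplified; it is not the SplitLogManager code.

```java
// Hypothetical, simplified model of the counters a wait loop would compare.
class TaskBatchSketch {
    int installed; // tasks the master believes it has handed out
    int done;      // tasks reported as finished
    int error;     // tasks reported as failed

    /** A wait loop can only exit once every installed task is accounted for. */
    boolean isComplete() {
        return done + error == installed;
    }

    /** Installs the given number of tasks, optionally counting each twice, then finishes them all. */
    static TaskBatchSketch run(int tasks, boolean countTwice) {
        TaskBatchSketch batch = new TaskBatchSketch();
        for (int i = 0; i < tasks; i++) {
            batch.installed++;
            if (countTwice) {
                batch.installed++; // the suspected bug: the same task is counted again
            }
        }
        for (int i = 0; i < tasks; i++) {
            batch.done++; // every real task completes exactly once
        }
        return batch;
    }
}
```

With three tasks counted once, done reaches installed and the batch completes. Counted twice, installed is 6 while done stops at 3, so the completion check stays false no matter how long the caller waits.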





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

Summary: SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted  (was: SplitLogManager hangs on cluster restart. )

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and an RS.
> The RS goes down (kill -9).
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> available, they cannot be processed.
> Restart the master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I suspect that batch.done is not being incremented properly; I have not yet 
> dug into it fully.
> This may be the reason for the occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Fix Version/s: 0.90.5
   0.94.0
   0.92.0

> Split hlog and open region concurrently happend may cause data loss
> ---
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. The regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, the file is added to the corrupt logs; however, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split-hlog thread.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Summary: Splitting hlog and opening region concurrently may cause data loss 
 (was: Split hlog and open region concurrently happend may cause data loss)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. The regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, the file is added to the corrupt logs; however, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is still being appended to by the split-hlog thread.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set. I made a patch; please review.
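
Waiting for a region to appear in the online region set is usually done in a test by polling with a timeout rather than asserting immediately. A minimal sketch of such a helper follows. WaitForSketch and waitFor are hypothetical names chosen for illustration; this is not the code in the attached patch.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for tests that must wait for asynchronous state.
class WaitForSketch {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the condition
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A test could then call something like `waitFor(10000, 50, () -> onlineRegions.contains(regionName))` before making its assertions, where `onlineRegions` and `regionName` stand in for whatever the test actually tracks.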





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

  Issue Type: Test  (was: Bug)
Hadoop Flags: Reviewed

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set. I made a patch; please review.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Summary: TestMasterObserver#testRegionTransitionOperations occasionally 
fails  (was: testRegionTransitionOperations occasional failures)

> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set. I made a patch; please review.





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Status: Patch Available  (was: Open)

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set. I made a patch; please review.





[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-23 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4739:
--

Fix Version/s: (was: 0.90.5)
 Hadoop Flags: Reviewed

> Master dying while going to close a region can leave it in transition forever
> -
>
> Key: HBASE-4739
> URL: https://issues.apache.org/jira/browse/HBASE-4739
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
> HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, 
> HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, 
> HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday. When 
> the master died it had just created the RIT znode for a region but hadn't 
> yet told the RS to close it.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
> state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
> too long, this should eventually complete or the server will expire, doing 
> nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare 
> unless you hit other bugs while trying out stuff to root bugs out.




