[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-03-31 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243668#comment-13243668
 ] 

xufeng commented on HBASE-5677:
---

We can reproduce this issue on 0.90 with the following steps:

step 1: start a cluster and create a table that has many regions.
step 2: disable the table created in step 1 via the shell.
step 3: kill the active master.
step 4: the backup master becomes the active one; while the new master is still
checking in regionservers, enable the table via the shell.

result: the duplicate open problem occurs.


I think the master should not provide service before it has completed its
initialization.
We could add a method to HMasterInterface
like:
{noformat}
// in HMasterInterface:
public boolean isMasterAvailable();

// in HMaster: the master is running and can provide service
public boolean isMasterAvailable() {
  return !isStopped() && isActiveMaster() && isInitialized();
}
{noformat}


When the client calls getMaster, we can check it; for example:
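A rough sketch of such a client-side check (isMasterAvailable() is the proposed
method and does not exist yet; the rest is the 0.90 client API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.ipc.HMasterInterface;

public class WaitForMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HConnection conn = HConnectionManager.getConnection(conf);
    HMasterInterface master = conn.getMaster();
    // Poll until the master is active and fully initialized before issuing
    // admin operations such as enable/disable table.
    while (!master.isMasterAvailable()) {   // proposed method
      Thread.sleep(1000);
    }
  }
}
{code}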

Please give me your suggestions, thanks.

> The master never does balance because duplicate openhandled the one region
> --
>
> Key: HBASE-5677
> URL: https://issues.apache.org/jira/browse/HBASE-5677
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
> Environment: 0.90
>Reporter: xufeng
>Assignee: xufeng
>
> If a region is assigned while the master is doing initialization (before it 
> runs processFailover), the region will be open-handled twice, because the 
> unassigned node in ZooKeeper will be handled again in 
> AssignmentManager#processFailover().
> This leaves the region in RIT, so the master never does balance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5638:
-

Fix Version/s: (was: 0.94.1)
   0.96.0

> Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
> --
>
> Key: HBASE-5638
> URL: https://issues.apache.org/jira/browse/HBASE-5638
> Project: HBase
>  Issue Type: Sub-task
>  Components: zookeeper
>Affects Versions: 0.90.6, 0.92.1
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch, 
> HBASE-5638-0.90-v1.patch, HBASE-5638-0.90-v2.patch, HBASE-5638-0.92-v1.patch, 
> HBASE-5638-0.92-v2.patch, HBASE-5638-trunk-v1.patch, HBASE-5638-trunk-v2.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5671) hbase.metrics.showTableName should be true by default

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5671:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> hbase.metrics.showTableName should be true by default
> -
>
> Key: HBASE-5671
> URL: https://issues.apache.org/jira/browse/HBASE-5671
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Critical
> Fix For: 0.94.0, 0.96.0
>
> Attachments: HBASE-5671_v1.patch
>
>
> HBASE-4768 added per-cf metrics and a new configuration option 
> "hbase.metrics.showTableName". We should switch the conf option to true by 
> default, since it is not intuitive (at least to me) to aggregate per-cf 
> across tables by default, and it seems confusing to report on cf's without 
> table names. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243665#comment-13243665
 ] 

Lars Hofhansl commented on HBASE-5682:
--

bq. You think this should go into 0.92?
Probably. I guess most folks have clients that they restart frequently, or use 
thrift or asynchbase. But in its current form, using the standard HBase client 
in an app server is very error-prone if the HBase/ZK cluster is ever serviced 
without bringing the app server down in lock step.

bq. Didn't we add a check for if the connection is bad?
Yeah, with HBASE-5153, but in 0.90 only. At some point we decided the fix there 
wasn't good and Ram patched it up for 0.90.
This should subsume HBASE-5153. I'm happy to even put this in 0.90, but that's 
up to Ram.

bq. I'm interested in problems you see in hbase-5153 or issues you have w/ the 
implementation there that being the 0.96 client.

What I saw in 0.96 is that the client was blocked for a very long time (gave up 
after a few minutes), even though I had set all timeouts to low values. This is 
also deadly in an app server setting. Might be a simple fix there, didn't dig 
deeper.


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243662#comment-13243662
 ] 

Lars Hofhansl edited comment on HBASE-5682 at 4/1/12 5:49 AM:
--

Found the problem.
The ClusterId could remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (existing problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.

  was (Author: lhofhansl):
Found the problem.
The ClusterId could be remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (exiting problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.
  
> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Priority: Critical  (was: Major)

Upped to "critical". Without this the HBase client is pretty much useless in an 
AppServer setting where client can outlive the HBase cluster and ZK ensemble.
(Testing within the Salesforce AppServer is how I noticed the problem 
initially.)


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all-v2.txt

Found the problem.
The ClusterId could remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zookeeper, and hence not reset clusterid in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (existing problem).

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.
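
As an aside, a generic illustration of that visibility concern (not the actual
patch; the types here are placeholders):
{code}
// Illustration only: fields that a ZK callback thread may replace are marked
// volatile so client threads see the latest reference without locking every read.
class ConnectionStateSketch {
  private volatile String clusterId;
  private volatile Object rootRegionTracker;   // placeholder for the tracker type

  // Called from the watcher/callback thread after the session is re-established.
  void reset(String newClusterId, Object newTracker) {
    this.clusterId = newClusterId;
    this.rootRegionTracker = newTracker;
  }

  // Called from client threads; the volatile read returns the latest value.
  String clusterId() {
    return clusterId;
  }
}
{code}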

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4348) Add metrics for regions in transition

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4348:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

> Add metrics for regions in transition
> -
>
> Key: HBASE-4348
> URL: https://issues.apache.org/jira/browse/HBASE-4348
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Himanshu Vashishtha
>Priority: Minor
>  Labels: noob
> Fix For: 0.96.0
>
> Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
> RITs.png, RegionInTransitions2.png, metrics-v2.patch
>
>
> The following metrics would be useful for monitoring the master:
> - the number of regions in transition
> - the number of regions in transition that have been in transition for more 
> than a minute
> - how many seconds has the oldest region-in-transition been in transition

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243648#comment-13243648
 ] 

Zhihong Yu commented on HBASE-5689:
---

Some minor comments about Chunhui's patch:
{code}
+  LOG.debug("Rename " + wap.p + " to " + dst);
{code}
The log should begin with "Renamed ".
{code}
+private final Map<byte[], Long> regionMaximumEditLogSeqNum = Collections
+.synchronizedMap(new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
{code}
Is a TreeMap needed above? We're just remembering the mapping, right?
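
For context, byte[] does not provide content-based equals/hashCode, so a
comparator-backed map is a common choice for byte[] keys even when ordering
itself is not needed. A minimal illustration (the key value is made up):
{code}
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayKeyDemo {
  public static void main(String[] args) {
    // A HashMap keyed by byte[] would compare keys by identity; the comparator
    // makes lookups compare by content.
    Map<byte[], Long> seqNums = Collections.synchronizedMap(
        new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
    seqNums.put(Bytes.toBytes("region-a"), 42L);
    System.out.println(seqNums.get(Bytes.toBytes("region-a"))); // 42, content-based lookup
  }
}
{code}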

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5689-simplified.txt, 5689-testcase.patch, 
> HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2186) hbase master should publish more stats

2012-03-31 Thread Otis Gospodnetic (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243642#comment-13243642
 ] 

Otis Gospodnetic commented on HBASE-2186:
-

Just saw some other JIRA issue (don't recall the number) that mentioned master 
having more data so I wonder - is this issue still relevant?  Hasn't been 
touched in over 2 years.  Thanks.

> hbase master should publish more stats
> --
>
> Key: HBASE-2186
> URL: https://issues.apache.org/jira/browse/HBASE-2186
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: ryan rawson
>
> hbase master only publishes cluster.requests to ganglia. we should also 
> publish regionserver count and other interesting metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-31 Thread Otis Gospodnetic (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243641#comment-13243641
 ] 

Otis Gospodnetic commented on HBASE-4348:
-

Fix Version/s is set to None.  Is this for 0.96?

> Add metrics for regions in transition
> -
>
> Key: HBASE-4348
> URL: https://issues.apache.org/jira/browse/HBASE-4348
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Himanshu Vashishtha
>Priority: Minor
>  Labels: noob
> Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
> RITs.png, RegionInTransitions2.png, metrics-v2.patch
>
>
> The following metrics would be useful for monitoring the master:
> - the number of regions in transition
> - the number of regions in transition that have been in transition for more 
> than a minute
> - how many seconds has the oldest region-in-transition been in transition

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5692) Add real action time for HLogPrettyPrinter

2012-03-31 Thread Xing Shi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Shi updated HBASE-5692:


Attachment: HBASE-5692.patch

> Add real action time for HLogPrettyPrinter
> --
>
> Key: HBASE-5692
> URL: https://issues.apache.org/jira/browse/HBASE-5692
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Xing Shi
>Priority: Minor
> Attachments: HBASE-5692.patch
>
>
> Now the HLogPrettyPrinter prints the log without the real op time, only the timestamp
> {quote}
> Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5
>   Action:
> row: r
> column: f3:q
> at time: Thu Jan 01 08:02:03 CST 1970
> {quote}
> Maybe we need to know the real op time like this
> {quote}
> Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: 
> 
>   Action:
> row: r
> column: f3:q
> timestamp: Thu Jan 01 08:02:03 CST 1970
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5213) "hbase master stop" does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5213:
--

Fix Version/s: 0.92.2
   0.90.7

> "hbase master stop" does not bring down backup masters
> --
>
> Key: HBASE-5213
> URL: https://issues.apache.org/jira/browse/HBASE-5213
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>Priority: Minor
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
> HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch
>
>
> Typing "hbase master stop" produces the following message:
> "stop   Start cluster shutdown; Master signals RegionServer shutdown"
> It seems like backup masters should be considered part of the cluster, but 
> they are not brought down by "hbase master stop".
> "stop-hbase.sh" does correctly bring down the backup masters.
> The same behavior is observed when a client app makes use of the client API 
> HBaseAdmin.shutdown() 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
>  -- this isn't too surprising since I think "hbase master stop" just calls 
> this API.
> It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5213) "hbase master stop" does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5213:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> "hbase master stop" does not bring down backup masters
> --
>
> Key: HBASE-5213
> URL: https://issues.apache.org/jira/browse/HBASE-5213
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>Priority: Minor
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
> HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch
>
>
> Typing "hbase master stop" produces the following message:
> "stop   Start cluster shutdown; Master signals RegionServer shutdown"
> It seems like backup masters should be considered part of the cluster, but 
> they are not brought down by "hbase master stop".
> "stop-hbase.sh" does correctly bring down the backup masters.
> The same behavior is observed when a client app makes use of the client API 
> HBaseAdmin.shutdown() 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
>  -- this isn't too surprising since I think "hbase master stop" just calls 
> this API.
> It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5213) "hbase master stop" does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243639#comment-13243639
 ] 

Jonathan Hsieh commented on HBASE-5213:
---

Committed to 0.92 and 0.90.  Thanks Greg.

> "hbase master stop" does not bring down backup masters
> --
>
> Key: HBASE-5213
> URL: https://issues.apache.org/jira/browse/HBASE-5213
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>Priority: Minor
> Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
> HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch
>
>
> Typing "hbase master stop" produces the following message:
> "stop   Start cluster shutdown; Master signals RegionServer shutdown"
> It seems like backup masters should be considered part of the cluster, but 
> they are not brought down by "hbase master stop".
> "stop-hbase.sh" does correctly bring down the backup masters.
> The same behavior is observed when a client app makes use of the client API 
> HBaseAdmin.shutdown() 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
>  -- this isn't too surprising since I think "hbase master stop" just calls 
> this API.
> It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5213) "hbase master stop" does not bring down backup masters

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243638#comment-13243638
 ] 

Jonathan Hsieh commented on HBASE-5213:
---

Committed to 0.92 and 0.90.  Thanks Greg.

> "hbase master stop" does not bring down backup masters
> --
>
> Key: HBASE-5213
> URL: https://issues.apache.org/jira/browse/HBASE-5213
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>Priority: Minor
> Fix For: 0.94.0, 0.96.0
>
> Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
> HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch
>
>
> Typing "hbase master stop" produces the following message:
> "stop   Start cluster shutdown; Master signals RegionServer shutdown"
> It seems like backup masters should be considered part of the cluster, but 
> they are not brought down by "hbase master stop".
> "stop-hbase.sh" does correctly bring down the backup masters.
> The same behavior is observed when a client app makes use of the client API 
> HBaseAdmin.shutdown() 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
>  -- this isn't too surprising since I think "hbase master stop" just calls 
> this API.
> It seems like HBASE-1448 addresses this; perhaps there was a regression?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5692) Add real action time for HLogPrettyPrinter

2012-03-31 Thread Xing Shi (Created) (JIRA)
Add real action time for HLogPrettyPrinter
--

 Key: HBASE-5692
 URL: https://issues.apache.org/jira/browse/HBASE-5692
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Xing Shi
Priority: Minor


Now the HLogPrettyPrinter prints the log without the real op time, only the timestamp

{quote}
Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5
  Action:
row: r
column: f3:q
at time: Thu Jan 01 08:02:03 CST 1970
{quote}
Maybe we need to know the real op time like this
{quote}
Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: 

  Action:
row: r
column: f3:q
timestamp: Thu Jan 01 08:02:03 CST 1970
{quote}
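
In WAL terms the two times come from different places; a rough sketch (API
names as I recall them from this era of the code, so treat them as assumptions):
{code}
// Rough sketch; 'reader' is an HLog.Reader opened over the log file.
// HLogKey carries the time the edit was written to the log (the "real op time"
// asked for above), while each KeyValue carries its own timestamp (what the
// printer shows today).
for (HLog.Entry entry = reader.next(); entry != null; entry = reader.next()) {
  long writeTime = entry.getKey().getWriteTime();   // real op time, per sequence
  for (KeyValue kv : entry.getEdit().getKeyValues()) {
    long cellTimestamp = kv.getTimestamp();         // the cell's own timestamp
  }
}
{code}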


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243637#comment-13243637
 ] 

Lars Hofhansl commented on HBASE-5682:
--

I am not envisioning any API changes, just that the HConnection would no longer 
be ripped from under any HTables where there is a ZK connection loss.

I ran all tests again, and TestReplication and TestZookeeper have some failures 
that are related. Looking.

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243625#comment-13243625
 ] 

chunhui shen commented on HBASE-5689:
-

@Ram
The analysis is correct.
However, I think your solution does not use the optimization made in HBASE-4797.
In my HBASE-5689.patch, I just change the file name of the edit log to the
maximum edit log sequence number.
In this issue, with the patch we will not skip the RecoveredEdits file, because we 
shouldn't. However, in other cases, for example, if we first put a lot of data to RS1, 
then move the region to RS2 and put a lot of data again, and both RS1 and RS2 die, we 
would still skip the edit log file from RS1.

bq.Chunhui's patch also makes sense but any way the idea is not to skip any 
recovered.edits file.
That is not quite right: my approach keeps skipping recovered.edits files, except in
cases such as this issue.
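
To illustrate the idea (a conceptual sketch with made-up numbers, not the patch
itself): if each recovered.edits file is named after the maximum sequence id it
contains, the replay loop can decide per file whether skipping is safe.
{code}
public class SkipDecisionSketch {
  public static void main(String[] args) {
    // Assume the region has already persisted everything up to sequence id 1050.
    long flushedSeqId = 1050L;
    // File names encode the maximum sequence id each file holds.
    for (String fileName : new String[] {"1042", "1100"}) {
      long maxSeqIdInFile = Long.parseLong(fileName);
      if (maxSeqIdInFile <= flushedSeqId) {
        System.out.println(fileName + ": every edit is older, safe to skip");
      } else {
        System.out.println(fileName + ": may hold newer edits, must replay");
      }
    }
  }
}
{code}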

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5689-simplified.txt, 5689-testcase.patch, 
> HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243620#comment-13243620
 ] 

stack commented on HBASE-5682:
--

I looked at the 'all' patch.  Looks good to me.  Am interested in how it 
changes API usage (if at all).

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243619#comment-13243619
 ] 

stack commented on HBASE-5682:
--

bq. The problem I have seen in 0.94/0.92 without this patch even with managed 
connections is that after HConnection times out, it is unusable and even 
getting a new HTable does not fix the problem since behind the scenes the same 
HConnection is retrieved.

Didn't we add a check for if the connection is bad?

bq. Will think about an automated test. Do you like the version better that 
always does the recheck (and hence all the conditional for "managed" go away)?

How does this work in trunk?  In trunk the work has been done so we don't 
really keep a zk session open any more.  For the sake of making tests run 
smoother, we do keep-alive on the zk session, hold it open for 5 minutes, and let 
it go if unused.

I'm +1 on making our stuff more resilient.  Reusing a dud hconnection, either 
because the connection is dead or the zk session died, is hard to figure.

How will this change a user's perception of how this stuff is used?  If 
your answer is that it helps in the extreme case where the connection goes dead, and 
that's the only change a user perceives, then let's commit.  But we should 
include a test?  If you describe one, I can try to help write it?

You think this should go into 0.92?


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243618#comment-13243618
 ] 

jirapos...@reviews.apache.org commented on HBASE-5688:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4600/
---

Review request for hbase.


Summary
---

Changes the content of the root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to remove it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class).
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new


This addresses bug hbase-5688.
https://issues.apache.org/jira/browse/hbase-5688


Diffs
-

  src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java c90864a 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java b2a5463 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 64def15 
  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 9c215b4 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 2f05005 
  src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 
33e4e71 
  src/main/protobuf/ZooKeeper.proto PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 533b2bf 
  
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java 
fe37156 
  src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java 2132036 
  src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/4600/diff


Testing
---


Thanks,

Michael



> Convert zk root-region-server znode content to pb
> -
>
> Key: HBASE-5688
> URL: https://issues.apache.org/jira/browse/HBASE-5688
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: 5688.txt, 5688v4.txt
>
>
> Move the root-region-server znode content from the versioned bytes that 
> ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243616#comment-13243616
 ] 

stack edited comment on HBASE-5688 at 4/1/12 12:18 AM:
---

Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.
{code}
D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to remove it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class).
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.
{code}

  was (Author: stack):
Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to removed it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.
  
> Convert zk root-region-server znode content to pb
> -
>
> 

[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5688:
-

Attachment: 5688v4.txt

Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.

D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
  Removed. Had two methods, one to add root-region-server znode and another
  to remove it.  Rather, put these methods in RootRegionTracker.  It
  tracks root-region-server znode.  Having all to do w/ root-region-server
  is more cohesive.  Also makes it so can encapsulate in one class
  all to do w/ create, delete, and reading of root-region-server.
  We also want to purge the catalog package (See note at head of
  CatalogTracker).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  Utility to do w/ protobuf handling.  Has methods to help prefixing
  and stripping from serialized protobuf messages some 'magic'.
A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
  PB generated.
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
  Use new RootRegionTracker method for getting content of znode rather
  than do it all here (going via RootRegionTracker, we can keep how
  the znode content is serialized private to the RootRegionTracker class).
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Has the methods that used to be in RootLocationEditor plus a new
  getRootRegionLocation method (does not set watcher).
A src/main/protobuf/ZooKeeper.proto
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java
M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
  Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
  Test of dataToServerName method in RootRegionTracker.
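
For illustration, the 'PBUF' framing described above amounts to the following
(sketch only; the class and method names here are not from the patch):
{code}
import java.util.Arrays;

public class PbufMagicSketch {
  static final byte[] MAGIC = new byte[] {'P', 'B', 'U', 'F'};

  // Prefix an already-serialized protobuf message with the four magic bytes.
  static byte[] prependMagic(byte[] pbBytes) {
    byte[] out = new byte[MAGIC.length + pbBytes.length];
    System.arraycopy(MAGIC, 0, out, 0, MAGIC.length);
    System.arraycopy(pbBytes, 0, out, MAGIC.length, pbBytes.length);
    return out;
  }

  // Verify and strip the magic prefix before handing the bytes to the pb parser.
  static byte[] stripMagic(byte[] znodeContent) {
    if (znodeContent.length < MAGIC.length
        || !Arrays.equals(Arrays.copyOfRange(znodeContent, 0, MAGIC.length), MAGIC)) {
      throw new IllegalArgumentException("znode content is not pb-encoded");
    }
    return Arrays.copyOfRange(znodeContent, MAGIC.length, znodeContent.length);
  }
}
{code}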

> Convert zk root-region-server znode content to pb
> -
>
> Key: HBASE-5688
> URL: https://issues.apache.org/jira/browse/HBASE-5688
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.96.0
>
> Attachments: 5688.txt, 5688v4.txt
>
>
> Move the root-region-server znode content from the versioned bytes that 
> ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-03-31 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5688:
-

Fix Version/s: 0.96.0
 Assignee: stack
 Release Note: 
Changes the content of the  root location znode, root-region-server, to be
four magic bytes ('PBUF') followed by a protobuf message that holds the
ServerName of the server currently hosting root.
   Status: Patch Available  (was: Open)

Trying against hadoopqa.  Let me put it up on review board too because I would 
appreciate some review.

> Convert zk root-region-server znode content to pb
> -
>
> Key: HBASE-5688
> URL: https://issues.apache.org/jira/browse/HBASE-5688
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: 5688.txt, 5688v4.txt
>
>
> Move the root-region-server znode content from the versioned bytes that 
> ServerName.getVersionedBytes outputs to instead be pb.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243605#comment-13243605
 ] 

Lars Hofhansl commented on HBASE-5682:
--

The more I look at it, the more I like the patch that changes the behavior in 
all cases.
It's simple and low risk: just recheck the ZK trackers before they are needed.
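
Roughly, the recheck pattern looks like this (a sketch with placeholder types;
the real change is in HConnectionManager.HConnectionImplementation):
{code}
// Sketch only: re-create ZK-backed state lazily instead of only at construction.
class RecheckSketch {
  private volatile Object zooKeeper;          // ZooKeeperWatcher in the real code
  private volatile Object rootRegionTracker;  // RootRegionTracker in the real code

  // Invoked at the start of every operation that needs ZK, so a connection that
  // lost its session can heal itself on the next use.
  synchronized void ensureZookeeperTrackers() {
    if (zooKeeper == null) {
      zooKeeper = connectToZooKeeper();
    }
    if (rootRegionTracker == null) {
      rootRegionTracker = startRootRegionTracker();
    }
  }

  // Invoked when the ZK session expires: drop the state so the next use rebuilds it.
  synchronized void resetZooKeeperTrackers() {
    rootRegionTracker = null;
    zooKeeper = null;
  }

  private Object connectToZooKeeper() { return new Object(); }      // placeholder
  private Object startRootRegionTracker() { return new Object(); }  // placeholder
}
{code}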


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1

2012-03-31 Thread Jonathan Hsieh (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243593#comment-13243593
 ] 

Jonathan Hsieh edited comment on HBASE-5680 at 3/31/12 11:24 PM:
-

The master runs on top of hadoop 0.23.x in these cases:
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23



  was (Author: jmhsieh):
The master starts on top of hadoop 0.23.x
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23


  
> Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 
> --
>
> Key: HBASE-5680
> URL: https://issues.apache.org/jira/browse/HBASE-5680
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Kristam Subba Swathi
>
> HMaster is not able to start because of the following error.
> Please find the error below:
> 
> 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
>   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   ... 7 more
> There is a change in the FSConstants

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread Todd Lipcon (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HBASE-5691:



> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-5691.


Resolution: Not A Problem

> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1

2012-03-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243593#comment-13243593
 ] 

Jonathan Hsieh commented on HBASE-5680:
---

The master starts on top of hadoop 0.23.x
apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with 
-Dhadoop.profile=23
apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and  
with -Dhadoop.profile=23 

The master fails on top of hadoop 0.23.x with the class not found error in 
these cases:
apache 0.92.1 right out of tarball
apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23
apache 0.94.0rc0 right out of tarball.
apache 0.94.0rc0 security right out of tarball.
apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23



> Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 
> --
>
> Key: HBASE-5680
> URL: https://issues.apache.org/jira/browse/HBASE-5680
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Kristam Subba Swathi
>
> HMaster is not able to start because of the following error.
> Please find the error below: 
> 
> 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
>   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   ... 7 more
> There is a change in the FSConstants

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243591#comment-13243591
 ] 

debarshi basak commented on HBASE-5691:
---

Thanks a lot. It worked.

> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

debarshi basak resolved HBASE-5691.
---

Resolution: Fixed

> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243589#comment-13243589
 ] 

Lars Hofhansl commented on HBASE-5682:
--

"perversion" is a hard word. :) It is just rechecking before each use whether the 
trackers are still usable. The timeout is handled through the HConnection's 
abort().

The testing I've done:
# ZK down, HBase down, start a client. Then start ZK, then HBase.
# ZK up, HBase down, start client. Then start HBase
# both ZK and HBase up, start client, kill HBase, restart HBase
# both ZK and HBase up, start client, kill ZK and HBase restart

The client just creates a new HTable and then tries to get some rows in a loop.
In all cases the client should successfully be able to reconnect when both ZK 
and HBase are up.

The problem I have seen in 0.94/0.92 without this patch, even with managed 
connections, is that after the HConnection times out it is unusable, and even 
getting a new HTable does not fix the problem since behind the scenes the same 
HConnection is retrieved.

Will think about an automated test. Do you like the version better that always 
does the recheck (and hence all the conditionals for "managed" go away)?


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all.txt

Here's a patch that always attempts reconnecting to ZK when a ZK connection is 
needed.

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243314#comment-13243314
 ] 

stack commented on HBASE-5691:
--

It calls System.exit at the end: System.exit(job.waitForCompletion(true) ? 
0 : 1);. How are you invoking it? Do you call main? Why not call 
createSubmittableJob(conf, otherArgs), pass the returned job to 
job.waitForCompletion(true), and show error or success depending on what is 
returned? Is it even a good idea calling this script from a servlet?
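For illustration, a servlet could drive the same job without ever touching ImportTsv.main() 
and its System.exit. This is only a sketch of the suggestion above, not tested code; the 
column spec, table name, and input path are placeholder arguments:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper a servlet could call instead of ImportTsv.main(),
// so the hosting JVM (Tomcat) is never killed by System.exit.
public static boolean runImportTsv(String table, String inputDir) throws Exception {
  Configuration conf = HBaseConfiguration.create();
  conf.set("importtsv.columns", "HBASE_ROW_KEY,f:col1");  // placeholder column mapping
  String[] args = new String[] { table, inputDir };       // same args main() would parse
  Job job = ImportTsv.createSubmittableJob(conf, args);
  return job.waitForCompletion(true);  // report success/failure to the caller instead of exiting
}
{code}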

> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243315#comment-13243315
 ] 

stack commented on HBASE-5691:
--

Oh, please close this issue if the above suggestion works for you.

> Importtsv stops the webservice from which it is evoked
> --
>
> Key: HBASE-5691
> URL: https://issues.apache.org/jira/browse/HBASE-5691
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: debarshi basak
>Priority: Minor
>
> I was trying to run importtsv from a servlet. Every time after the completion 
> of the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243312#comment-13243312
 ] 

stack commented on HBASE-5682:
--

This is a perversion.

If we pass in a connection from outside, then down in the guts we do special handling 
that makes the connection and ZooKeeper handling reconnect. It's as if we should be 
passing in an Interface made at a higher level of abstraction, and then in the 
implementation it would do this fixup when the connection breaks.

With that out of the way, do whatever you need to make it work. Patch looks 
fine. How did you test? Would it be hard to make a unit test of it? A unit 
test codifying this perversion would be good, since it will be brittle, being not 
what's expected.

I'm against changing the behavior of the default case in 0.92/0.94. I'm 
interested in problems you see in HBASE-5153, or issues you have with the 
implementation there, that being the 0.96 client.
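One way to read the higher-level-interface suggestion, with entirely made-up names (this is 
not a proposed API, just an illustration of the abstraction described above):

{code}
// Hypothetical: callers hold a small reconnect-aware interface instead of
// reaching into HConnectionImplementation's guts.
public interface RecoverableClusterConnection {
  /** Return a usable HConnection, transparently rebuilding ZK state if it was lost. */
  HConnection get() throws IOException;
}
{code}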

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5691) Importtsv stops the webservice from which it is evoked

2012-03-31 Thread debarshi basak (Created) (JIRA)
Importtsv stops the webservice from which it is evoked
--

 Key: HBASE-5691
 URL: https://issues.apache.org/jira/browse/HBASE-5691
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: debarshi basak
Priority: Minor


I was trying to run importtsv from a servlet. Every time after the completion of 
the job, the Tomcat server was shut down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243310#comment-13243310
 ] 

Zhihong Yu commented on HBASE-5666:
---

HConnectionImplementation.checkIfBaseNodeAvailable() doesn't take Abortable.
We can limit the scope of change in this JIRA.

Is that okay?

> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243302#comment-13243302
 ] 

Matteo Bertozzi commented on HBASE-5666:


I was thinking of patching ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and 
HConnectionImplementation.checkIfBaseNodeAvailable() instead of only 
HRegionServer...

what do you think?
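As a rough illustration of what a retrying base-node check could look like (the configuration 
keys and retry defaults below are assumptions, not part of any attached patch):

{code}
// Sketch only: retry the base-znode existence check instead of giving up
// after a single attempt. Assumes a ZooKeeperWatcher 'zkw' is available.
private void waitForBaseZNodeAvailable(ZooKeeperWatcher zkw, Configuration conf)
    throws IOException, InterruptedException {
  int maxRetries = conf.getInt("hbase.zookeeper.basenode.retries", 10);        // assumed key
  long sleepMs = conf.getLong("hbase.zookeeper.basenode.retry.sleep", 1000L);  // assumed key
  for (int i = 0; i < maxRetries; i++) {
    try {
      if (ZKUtil.checkExists(zkw, zkw.baseZNode) != -1) {
        return;  // base znode (zookeeper.znode.parent) exists, we can proceed
      }
    } catch (KeeperException e) {
      // transient ZK error; fall through and retry
    }
    Thread.sleep(sleepMs);
  }
  throw new IOException("Base znode " + zkw.baseZNode + " still missing after "
      + maxRetries + " attempts; check 'zookeeper.znode.parent'.");
}
{code}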


> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243296#comment-13243296
 ] 

Zhihong Yu commented on HBASE-5666:
---

Patch makes sense.
Can you integrate it into HRegionServer?

Thanks

> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5669) AggregationClient fails validation for open stoprow scan

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5669:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> AggregationClient fails validation for open stoprow scan
> 
>
> Key: HBASE-5669
> URL: https://issues.apache.org/jira/browse/HBASE-5669
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.1
> Environment: n/a
>Reporter: Brian Rogers
>Assignee: Mubarak Seyed
>Priority: Minor
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5669.trunk.v1.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> AggregationClient.validateParameters throws an exception when the Scan has a 
> valid startrow but an unset endrow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5084) Allow different HTable instances to share one ExecutorService

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5084:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> Allow different HTable instances to share one ExecutorService
> -
>
> Key: HBASE-5084
> URL: https://issues.apache.org/jira/browse/HBASE-5084
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5084-0.94.txt, 5084-trunk.txt
>
>
> This came out of Lily 1.1.1 release:
> Use a shared ExecutorService for all HTable instances, leading to better (or 
> actual) thread reuse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4398:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> If HRegionPartitioner is used in MapReduce, client side configurations are 
> overwritten by hbase-site.xml.
> -
>
> Key: HBASE-4398
> URL: https://issues.apache.org/jira/browse/HBASE-4398
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.4
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
> Fix For: 0.92.2, 0.94.0
>
> Attachments: HBASE-4398.patch
>
>
> If HRegionPartitioner is used in MapReduce, client side configurations are 
> overwritten by hbase-site.xml.
> We can reproduce the problem by the following instructions:
> {noformat}
> - Add HRegionPartitioner.class to the 4th argument of
> TableMapReduceUtil#initTableReducerJob()
> at line around 133
> in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java
> - Change or remove "hbase.zookeeper.property.clientPort" property
> in hbase-site.xml ( for example, changed to 12345 ).
> - run testMultiRegionTable()
> {noformat}
> Then I got error messages as following:
> {noformat}
> 2011-09-12 22:28:51,020 DEBUG [Thread-832] zookeeper.ZKUtil(93): hconnection 
> opening connection to ZooKeeper with ensemble (localhost:12345)
> 2011-09-12 22:28:51,022 INFO  [Thread-832] 
> zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
> 43200@imac.local
> 2011-09-12 22:28:51,123 WARN  [Thread-832] 
> zookeeper.RecoverableZooKeeper(161): Possibly transient ZooKeeper exception: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /hbase/master
> 2011-09-12 22:28:51,123 INFO  [Thread-832] 
> zookeeper.RecoverableZooKeeper(173): The 1 times to retry ZooKeeper after 
> sleeping 1000 ms
>  =
> 2011-09-12 22:29:02,418 ERROR [Thread-832] mapreduce.HRegionPartitioner(125): 
> java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2e54e48d
>  closed
> 2011-09-12 22:29:02,422 WARN  [Thread-832] mapred.LocalJobRunner$Job(256): 
> job_local_0001
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.mapreduce.HRegionPartitioner.setConf(HRegionPartitioner.java:128)
>at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> {noformat}
> I think HTable should connect to ZooKeeper at port 21818 configured at client 
> side instead of 12345 in hbase-site.xml
> and it might be caused by "HBaseConfiguration.addHbaseResources(conf);" in 
> HRegionPartitioner#setConf(Configuration).
> And this might mean that all client-side configurations that are also set 
> in hbase-site.xml are overwritten because of this problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5670) Have Mutation implement the Row interface.

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5670:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> Have Mutation implement the Row interface.
> --
>
> Key: HBASE-5670
> URL: https://issues.apache.org/jira/browse/HBASE-5670
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5670-0.94.txt, 5670-trunk.txt, 5670-trunk.txt
>
>
> In HBASE-4347 I factored some code from Put/Delete/Append in Mutation.
> In a discussion with a co-worker I noticed that Put/Delete/Append still 
> implement the Row interface, but Mutation does not.
> In a trivial change I would like to move that interface up to Mutation, along 
> with changing HTable.batch(List<Row>) to HTable.batch(List<? extends Row>) 
> (HConnection.processBatch takes List<? extends Row> already anyway), so that 
> HTable.batch can be used with a list of Mutations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: zk-exists-refactor-v0.patch

Don't know... I've tried to refactor the method to get something useful and 
shared...
The problem is that checkExists(), called by checkIfBaseNodeAvailable(), uses a 
ZooKeeperWatcher and calls exists() on a RecoverableZooKeeper object, while 
waitForBaseZNode() has a plain ZooKeeper handle... 
so the checkExists(ZooKeeperWatcher) implementation relies on the fact that 
RecoverableZooKeeper.exists() is implemented as RZK.getZooKeeper().exists(), 
which I don't like...
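For what it's worth, one possible shape for a shared low-level check, assuming the caller 
passes the raw ZooKeeper handle (a sketch of the refactoring idea only, not the attached 
zk-exists-refactor patch):

{code}
// Sketch: a low-level exists() check both call sites could share, whether they
// hold a RecoverableZooKeeper (via getZooKeeper()) or a plain ZooKeeper handle,
// so nothing has to rely on how RecoverableZooKeeper.exists() is implemented.
static boolean znodeExists(ZooKeeper zk, String znode)
    throws KeeperException, InterruptedException {
  return zk.exists(znode, false) != null;  // no watch is set
}

// Hypothetical call sites:
//   znodeExists(zkw.getRecoverableZooKeeper().getZooKeeper(), zkw.baseZNode);
//   znodeExists(plainZooKeeper, baseZNode);
{code}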

> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243246#comment-13243246
 ] 

Hudson commented on HBASE-5690:
---

Integrated in HBase-0.94 #72 (See 
[https://builds.apache.org/job/HBase-0.94/72/])
HBASE-5690 compression does not work in Store.java of 0.94 (Honghua Zhu) 
(Revision 1307851)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Assignee: honghua zhu
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Store.patch
>
>
> HBASE-5442 The store.createWriterInTmp method missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243239#comment-13243239
 ] 

Zhihong Yu commented on HBASE-5666:
---

Refactoring ZKUtil.waitForBaseZNode() so that it can be used by the region server 
and the test would be good.

> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243236#comment-13243236
 ] 

Matteo Bertozzi commented on HBASE-5666:


Diving into the code, I've also noticed that there's a 
ZKUtil.waitForBaseZNode() that already has the retry logic, but:
 - It takes a Configuration object
 - The timeout is set internally to 1ms
 - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java


> RegionServer doesn't retry to check if base node is available
> -
>
> Key: HBASE-5666
> URL: https://issues.apache.org/jira/browse/HBASE-5666
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, zookeeper
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
> hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
> hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed 
> mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and 
> HRegionServer.initializeZooKeeper() checks just once whether the base node is 
> available.
> {code}
> 2012-03-28 21:54:05,013 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
> one configured in the master.
> 2012-03-28 21:54:08,598 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
> RS.
> java.io.IOException: Received the shutdown message while waiting.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
>   at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu resolved HBASE-5690.
---

Resolution: Fixed

> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Assignee: honghua zhu
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Store.patch
>
>
> HBASE-5442 The store.createWriterInTmp method missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243229#comment-13243229
 ] 

Zhihong Yu commented on HBASE-5690:
---

Integrated to 0.94 branch.

Thanks for the patch Honghua.

> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Assignee: honghua zhu
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Store.patch
>
>
> HBASE-5442 The store.createWriterInTmp method missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243227#comment-13243227
 ] 

Zhihong Yu commented on HBASE-5682:
---

Applying this to other HConnections makes sense.

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5097:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch
>
>
> In HRegionServer.java openScanner()
> {code}
>   r.prepareScanner(scan);
>   RegionScanner s = null;
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().preScannerOpen(scan);
>   }
>   if (s == null) {
> s = r.getScanner(scan);
>   }
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().postScannerOpen(scan, s);
>   }
> {code}
> If we dont have implemention for postScannerOpen the RegionScanner is null 
> and so throwing nullpointer 
> {code}
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Making this defect as blocker.. Pls feel free to change the priority if am 
> wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5656:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> LoadIncrementalHFiles createTable should detect and set compression algorithm
> -
>
> Key: HBASE-5656
> URL: https://issues.apache.org/jira/browse/HBASE-5656
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.92.1
>Reporter: Cosmin Lehene
>Assignee: Cosmin Lehene
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5656-0.92.patch, HBASE-5656-0.92.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> LoadIncrementalHFiles doesn't set compression when creating the the table.
> This can be detected from the files within each family dir. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-3134:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> [replication] Add the ability to enable/disable streams
> ---
>
> Key: HBASE-3134
> URL: https://issues.apache.org/jira/browse/HBASE-3134
> Project: HBase
>  Issue Type: New Feature
>  Components: replication
>Reporter: Jean-Daniel Cryans
>Assignee: Teruyoshi Zenmyo
>Priority: Minor
>  Labels: replication
> Fix For: 0.94.0
>
> Attachments: 3134-v2.txt, 3134-v3.txt, 3134-v4.txt, 3134.txt, 
> HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch
>
>
> This jira was initially in the scope of HBASE-2201, but was pushed out since 
> it has low value compared to the required effort (and when want to ship 
> 0.90.0 rather soonish).
> We need to design a way to enable/disable replication streams in a 
> determinate fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5689:
-

Fix Version/s: 0.94.0

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: 5689-simplified.txt, 5689-testcase.patch, 
> HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region in 
> HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5690:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Assignee: honghua zhu
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Store.patch
>
>
> HBASE-5442 The store.createWriterInTmp method missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.0
>
> Attachments: 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243222#comment-13243222
 ] 

Lars Hofhansl commented on HBASE-5682:
--

Thanks Ted.
The last question is: Should we do this for all HConnections (not just for 
unmanaged ones)? It means that the HConnection would be able to recover from loss 
of the ZK connection, and the abort() method would only clear out the ZK trackers 
and never close or abort the connection. I'd be in favor of that.

@Ram and @Jieshan: Since this would be a more robust version of HBASE-5153, could you 
have a look at this?
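A minimal sketch of what such an abort() could look like; the field names are illustrative 
and this is not the attached patch:

{code}
// Illustrative only: abort() resets the ZK-related state so the connection can
// lazily reconnect later, instead of marking the whole HConnection as dead.
public void abort(final String msg, Throwable t) {
  LOG.info("ZooKeeper session lost (" + msg + "); resetting ZK trackers", t);
  synchronized (this) {
    if (masterAddressTracker != null) { masterAddressTracker.stop(); masterAddressTracker = null; }
    if (rootRegionTracker != null) { rootRegionTracker.stop(); rootRegionTracker = null; }
    if (zooKeeper != null) { zooKeeper.close(); zooKeeper = null; }
  }
  // Note: the connection itself is NOT closed or aborted here; the next caller
  // that needs ZK would rebuild the trackers on demand.
}
{code}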


> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.1
>
> Attachments: 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Attachment: 5689-simplified.txt

Removing the check should be enough.

With the attached patch, TestHRegion#testDataCorrectnessReplayingRecoveredEdits 
passes.
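
For clarity, removing the check roughly gives the replay loop in 
HRegion#replayRecoveredEditsIfAny the following shape. This is a sketch for 
illustration; variable and helper names (seqid, reporter, replayRecoveredEdits) 
follow the surrounding 0.94 code but are approximations, not the exact contents 
of 5689-simplified.txt:

{code}
// Sketch only: replay every non-empty recovered.edits file and rely on the
// per-edit sequence id filtering inside replayRecoveredEdits() to drop edits
// that were already flushed, instead of skipping whole files by name.
for (Path edits : files) {
  if (edits == null || !this.fs.exists(edits)) {
    LOG.warn("Null or non-existent edits file: " + edits);
    continue;
  }
  if (isZeroLengthThenDelete(this.fs, edits)) continue;
  // no checkSafeToSkip / maxSeqId shortcut here
  seqid = replayRecoveredEdits(edits, seqid, reporter);
}
{code}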

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Attachments: 5689-simplified.txt, 5689-testcase.patch, 
> HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5636) TestTableMapReduce doesn't work properly.

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5636:
-

Assignee: Takuya Ueshin

> TestTableMapReduce doesn't work properly.
> -
>
> Key: HBASE-5636
> URL: https://issues.apache.org/jira/browse/HBASE-5636
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
> Attachments: HBASE-5636-v2.patch, HBASE-5636.patch
>
>
> No map function is called because no test data is put before the test 
> starts.
> The following three tests are in the same situation:
> - org.apache.hadoop.hbase.mapred.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Fix Version/s: 0.96.0
   0.94.0

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5689:
--

Priority: Critical  (was: Major)

Making it critical as it is data loss related.

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243205#comment-13243205
 ] 

Zhihong Yu commented on HBASE-5663:
---

{code}
Results :

Tests run: 541, Failures: 0, Errors: 0, Skipped: 0
...
Results :

Tests run: 615, Failures: 0, Errors: 0, Skipped: 2

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 35:45.595s
{code}

I will integrate Monday if there is no objection.

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5663:
-

Assignee: Takuya Ueshin

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
> Fix For: 0.94.0, 0.96.0
>
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243204#comment-13243204
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

@Ted
What do you feel, Ted? Chunhui's patch also makes sense, but anyway the idea is 
not to skip any recovered.edits file, so I felt removing the check would be 
enough.


> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243191#comment-13243191
 ] 

Takuya Ueshin commented on HBASE-5663:
--

Oh, I misread what you wrote.
Thank you for your support.

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243190#comment-13243190
 ] 

Zhihong Yu commented on HBASE-5663:
---

I ran the following tests based on the combined patch and they passed:
TestMultithreadedTableMapper, TestColumnSeeking, TestTableMapReduce

Looks like Hadoop QA is not working. I am running the test suite on Linux.

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Attachment: 5663+5636.txt

Combined patch for HBASE-5663 and HBASE-5636

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: 5663+5636.txt, HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243182#comment-13243182
 ] 

Zhihong Yu commented on HBASE-5663:
---

So the comment in the patch about Hadoop versions can be removed.
See the error I got when using Hadoop 1.0:
https://issues.apache.org/jira/browse/HBASE-5636?focusedCommentId=13243180&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243180

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5636) TestTableMapReduce doesn't work properly.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243180#comment-13243180
 ] 

Zhihong Yu commented on HBASE-5636:
---

Without the patch from HBASE-5663, I saw the following in the test output:
{code}
2012-03-31 07:55:46,743 DEBUG [main] mapreduce.TableInputFormatBase(194): 
getSplits: split -> 24 -> 192.168.0.17:yyy,
java.io.IOException: java.lang.NoSuchMethodException: 
org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
 org.apache.hadoop.mapred.TaskAttemptID, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, 
org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
 org.apache.hadoop.hbase.mapreduce.TableSplit)
  at 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
  at 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
{code}

> TestTableMapReduce doesn't work properly.
> -
>
> Key: HBASE-5636
> URL: https://issues.apache.org/jira/browse/HBASE-5636
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5636-v2.patch, HBASE-5636.patch
>
>
> No map function is called because no test data is put before the test 
> starts.
> The following three tests are in the same situation:
> - org.apache.hadoop.hbase.mapred.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243172#comment-13243172
 ] 

Takuya Ueshin commented on HBASE-5663:
--

No, any MapReduce job using the current MultithreadedTableMapper would not work.

When the mapper starts, it creates some worker threads, the 'original' Mapper 
objects to be called by those threads, and the MapperContext objects used by 
the original mappers.
But because the reflective constructor lookup for MapperContext is wrong, it 
throws the NoSuchMethodException.

As I reported in HBASE-5636, the trunk version's test case 
TestMulitthreadedTableMapper has a bug, so please test with the new version of 
the test attached there.
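
The patch itself is attached to this issue; as a general illustration of why 
such a reflective lookup can miss, note that Class.getConstructor() matches the 
declared parameter types exactly (and, for a non-static inner class like 
Mapper.Context, the compiled constructor also takes the enclosing instance as 
its first parameter). The following standalone example, using only JDK classes, 
shows the same NoSuchMethodException failure mode:

{code}
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class ReflectionLookupDemo {
  public static void main(String[] args) throws Exception {
    // BufferedInputStream declares a constructor BufferedInputStream(InputStream).
    // Looking it up with the exact declared parameter type succeeds:
    BufferedInputStream.class.getConstructor(InputStream.class);
    // Looking it up with a more specific (but assignable) type still throws
    // NoSuchMethodException, because reflection does not widen argument types:
    BufferedInputStream.class.getConstructor(FilterInputStream.class);
  }
}
{code}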

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Affects Version/s: 0.94.0

> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5689:
--

Summary: Skipping RecoveredEdits may cause data loss  (was: Skip 
RecoveredEdits may cause data loss)

> Skipping RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243154#comment-13243154
 ] 

Zhihong Yu commented on HBASE-5689:
---

From the above analysis, it looks like this bug was caused by the optimization 
made in HBASE-4797.

> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243057#comment-13243057
 ] 

Zhihong Yu edited comment on HBASE-5689 at 3/31/12 2:00 PM:


@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just trying to 
understand more on what is happening there.  

  was (Author: ram_krish):
@Chunhui

Good test case.  Yes able to reproduce the problem :).. Just try to understand 
more on what is happening there.  
{Edit}Good test case.  Yes able to reproduce the problem :).. Just trying to 
understand more on what is happening there.  
  
> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2.when we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3.when we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4.however, RecoveredEdits file f2 will be skipped when initializing the region 
> in HRegion#replayRecoveredEditsIfAny
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243150#comment-13243150
 ] 

Zhihong Yu commented on HBASE-5663:
---

We could check whether the Exception was caused by NoSuchMethodException.
Since the retry is in the MapRunner ctor, the above may not be needed.
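
If that check were wanted, a small helper along these lines would do it 
(illustrative only, not part of any attached patch):

{code}
// Illustrative only: walk the cause chain to distinguish the expected
// reflection failure from other errors before retrying an alternate path.
public static boolean causedByNoSuchMethod(Throwable t) {
  for (Throwable cur = t; cur != null; cur = cur.getCause()) {
    if (cur instanceof NoSuchMethodException) {
      return true;
    }
  }
  return false;
}
{code}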

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5663:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5663.patch
>
>
> MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243147#comment-13243147
 ] 

Zhihong Yu commented on HBASE-5635:
---

Then we should log a message every X minutes saying what we are waiting for, so 
that the user doesn't have to use jstack.
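
A sketch of the kind of change being discussed is below; the exitWorker flag 
and the roughly once-a-minute logging interval are assumptions based on the 
snippet quoted in the description, not the attached patch:

{code}
// Sketch only: keep retrying instead of returning null after a fixed number of
// attempts, and log periodically so the operator can tell what the worker is
// waiting for without taking a jstack.
private List<String> getTaskList() {
  int attempts = 0;
  while (!exitWorker) {  // assumed shutdown flag on SplitLogWorker
    try {
      return ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
          this.watcher.splitLogZNode);
    } catch (KeeperException e) {
      LOG.warn("Could not get children of znode " + this.watcher.splitLogZNode, e);
    }
    if (++attempts % 60 == 0) {
      // with a 1s sleep this logs about once a minute
      LOG.info("Still waiting to read task list under " + this.watcher.splitLogZNode);
    }
    try {
      Thread.sleep(1000);
    } catch (InterruptedException ie) {
      LOG.warn("Interrupted while trying to get task list ...", ie);
      Thread.currentThread().interrupt();
      return null;
    }
  }
  return null;
}
{code}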

> If getTaskList() returns null splitlogWorker is down. It wont serve any 
> requests. 
> --
>
> Key: HBASE-5635
> URL: https://issues.apache.org/jira/browse/HBASE-5635
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.1
>Reporter: Kristam Subba Swathi
> Attachments: HBASE-5635.1.patch, HBASE-5635.patch
>
>
> During the hlog split operation, if all the zookeepers are down, then the 
> paths will be returned as null and the splitworker thread will exit.
> Now this regionserver will not be able to acquire any other tasks since the 
> splitworker thread has exited.
> Please find the attached code for more details:
> {code}
> private List<String> getTaskList() {
> for (int i = 0; i < zkretries; i++) {
>   try {
> return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
> this.watcher.splitLogZNode));
>   } catch (KeeperException e) {
> LOG.warn("Could not get children of znode " +
> this.watcher.splitLogZNode, e);
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException e1) {
>   LOG.warn("Interrupted while trying to get task list ...", e1);
>   Thread.currentThread().interrupt();
>   return null;
> }
>   }
> }
> {code}
> in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5690:
-

Assignee: honghua zhu

> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Assignee: honghua zhu
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: Store.patch
>
>
> After HBASE-5442, the store.createWriterInTmp method is missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94

2012-03-31 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5690:
--

Hadoop Flags: Reviewed
 Summary: compression does not work in Store.java of 0.94  (was: 
compression unavailable)

> compression does not work in Store.java of 0.94
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: Store.patch
>
>
> After HBASE-5442, the store.createWriterInTmp method is missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5690) compression unavailable

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243142#comment-13243142
 ] 

Zhihong Yu commented on HBASE-5690:
---

+1 on patch.

HBASE-5605 fixed the same problem in trunk.

> compression unavailable
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: Store.patch
>
>
> After HBASE-5442, the store.createWriterInTmp method is missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243141#comment-13243141
 ] 

Zhihong Yu commented on HBASE-5682:
---

+1 on patch.

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
> only)
> --
>
> Key: HBASE-5682
> URL: https://issues.apache.org/jira/browse/HBASE-5682
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.1
>
> Attachments: 5682-v2.txt, 5682.txt
>
>
> Just realized that without this, HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be 
> rendered permanently unusable when the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
> backport).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

honghua zhu updated HBASE-5690:
---

Fix Version/s: (was: 0.94.0)
   0.94.1

> compression unavailable
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: Store.patch
>
>
> After HBASE-5442, the store.createWriterInTmp method is missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243109#comment-13243109
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

I just want to understand: would removing the 'continue' in the following block solve the problem?
{code}
if (maxSeqId <= minSeqId) {
  String msg = "Maximum possible sequenceid for this log is " + maxSeqId
      + ", skipped the whole file, path=" + edits;
  LOG.debug(msg);
  continue;
}
{code}
Inside replayRecoveredEdits we have this code:
{code}
// Now, figure if we should skip this edit.
if (key.getLogSeqNum() <= currentEditSeqId) {
  skippedEdits++;
  continue;
}
{code}
which will in any case skip unnecessary edits.  I tried commenting out the 
'continue' and the test case that you gave passed.  That is my 
thought; your comments are welcome.
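
A rough sketch of the alternative described above, for discussion only (this is not the attached patch; the replayRecoveredEdits call, seqid and reporter are placeholders approximating the surrounding HRegion code):
{code}
// Sketch: drop the whole-file skip and let replayRecoveredEdits() filter
// stale entries via the per-edit "key.getLogSeqNum() <= currentEditSeqId" check.
for (Path edits : files) {
  if (edits == null || !this.fs.exists(edits)) {
    LOG.warn("Null or non-existent edits file: " + edits);
    continue;
  }
  if (isZeroLengthThenDelete(this.fs, edits)) continue;
  // No maxSeqId-based whole-file skip here: even if every edit in this file is
  // older than the store files, the per-edit check inside replayRecoveredEdits
  // will simply skip them all.
  seqid = replayRecoveredEdits(edits, seqid, reporter);
}
{code}
The open question would be whether re-reading large, fully stale files is cheap enough to give up the whole-file skip.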

> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny:
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243106#comment-13243106
 ] 

ramkrishna.s.vasudevan commented on HBASE-5689:
---

@Chunhui
Thanks for the patch.
I would like to share my analysis of this:
-> Suppose the current seq for RS1 is 4.
When the first row kv KV(r1->v1) is inserted, the current seq becomes 5.
On moving the region to RS2 the store gets flushed, and when RS2 opens the 
region the next seq it can use will be 6.
Now we make the next kv entry KV(r2->v2); the current seq for this entry is 
6. When the region is moved again to RS1, another store file is created by 
RS2.
Now when RS1 opens the region, the seq number it can use will be 7.

We now add an entry KV(r3->v3) again on RS1, so it will have 7 in it (in the WAL).

Kill RS2 first.  This will create a recovered.edits file named 06.

Kill RS1.  This will create a recovered.edits file named 5.

Now when the region is finally opened on a new RS, it will have the 2 store 
files and the max seq id from them will be 6.  The recovered.edits file names will 
also give 6 as the highest seq.
{code}
if (maxSeqId <= minSeqId) {
  String msg = "Maximum possible sequenceid for this log is " + maxSeqId
      + ", skipped the whole file, path=" + edits;
  LOG.debug(msg);
  continue;
}
{code}
Correct me if my analysis is wrong.
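
To make the walkthrough concrete, here is the skip check evaluated with the ids above (values come from this analysis, not from an actual test run):
{code}
// Store files carry sequence ids up to 5 (RS1 flush) and 6 (RS2 flush):
long minSeqId = 6;                              // max seq id across the store files
// recovered.edits files are named "5" (RS1; holds edits 5 and 7) and "06" (RS2):
long maxSeqId = Math.abs(Long.parseLong("06")); // name of the next ("higher") file
boolean skipWholeFile = maxSeqId <= minSeqId;   // 6 <= 6 -> true
// => file "5" is skipped entirely even though it also contains the edit with
//    seq id 7 (the third KV), which is therefore never replayed.
{code}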



> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny:
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions

2012-03-31 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243092#comment-13243092
 ] 

Uma Maheswara Rao G commented on HBASE-5536:


Yes, Ram. typo :-).

Here is what I meant:
We should not access any non-exposed APIs from HDFS, at least from this version on. 
We should raise with HDFS the requirement about HBase's usage of and need for those 
APIs, and get the access through some public interface. That way, even if HDFS 
changes its internal APIs, the public interface remains the same, so HBase and other 
dependent users need not end up with reflection for every compatibility change.
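
For illustration, this is the kind of reflection dance being argued against, loosely modelled on the hflush/sync probing HBase already does for the WAL; out stands for the underlying FSDataOutputStream and the details are simplified:
{code}
// Probe an output-stream capability via reflection because no public,
// version-stable HDFS interface exposes it; every internal rename or signature
// change in HDFS forces this code to be revisited.
Method syncMethod = null;
try {
  syncMethod = out.getClass().getMethod("hflush");   // newer Hadoop
} catch (NoSuchMethodException e) {
  try {
    syncMethod = out.getClass().getMethod("sync");   // older Hadoop
  } catch (NoSuchMethodException e2) {
    LOG.warn("Neither hflush nor sync found on " + out.getClass().getName());
  }
}
{code}
With a public, stable HDFS interface for the capability, this would collapse to a single direct call.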

> Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no 
> longer work on older versions
> --
>
> Key: HBASE-5536
> URL: https://issues.apache.org/jira/browse/HBASE-5536
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Priority: Blocker
> Fix For: 0.96.0
>
>
> Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 
> should be fine?  See 
> http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96&subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243087#comment-13243087
 ] 

ramkrishna.s.vasudevan commented on HBASE-5635:
---

bq. Do we need a timeout for the newly added loop?
If we have a timeout and the zk connection takes a little longer than that 
timeout, how do we keep the worker thread alive?  That is the reason an 
infinite loop was tried.

> If getTaskList() returns null splitlogWorker is down. It wont serve any 
> requests. 
> --
>
> Key: HBASE-5635
> URL: https://issues.apache.org/jira/browse/HBASE-5635
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.1
>Reporter: Kristam Subba Swathi
> Attachments: HBASE-5635.1.patch, HBASE-5635.patch
>
>
> During the hlog split operation, if all the zookeepers are down, then the 
> paths will be returned as null and the splitworker thread will exit.
> This regionserver will then not be able to acquire any other tasks, since the 
> splitworker thread has exited.
> Please find the attached code for more details
> {code}
> private List<String> getTaskList() {
> for (int i = 0; i < zkretries; i++) {
>   try {
> return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
> this.watcher.splitLogZNode));
>   } catch (KeeperException e) {
> LOG.warn("Could not get children of znode " +
> this.watcher.splitLogZNode, e);
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException e1) {
>   LOG.warn("Interrupted while trying to get task list ...", e1);
>   Thread.currentThread().interrupt();
>   return null;
> }
>   }
> }
> {code}
> in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions

2012-03-31 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243085#comment-13243085
 ] 

ramkrishna.s.vasudevan commented on HBASE-5536:
---

bq.So, Hbase/Other dependant users need need end up in reflections for every 
version compatibility
I think you meant need 'not'

> Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no 
> longer work on older versions
> --
>
> Key: HBASE-5536
> URL: https://issues.apache.org/jira/browse/HBASE-5536
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Priority: Blocker
> Fix For: 0.96.0
>
>
> Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 
> should be fine?  See 
> http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96&subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-31 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243080#comment-13243080
 ] 

xufeng commented on HBASE-5673:
---

@Stack @Ted
I analyzed the problem with my patch.
Here is the result:
I wrapped every exception in IOException; this IOException cannot be handled in 
CatalogTracker#getCachedConnection(ServerName sn),
so the master aborts and the test cases fail.


I will submit the next patch together with the test results.
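
A schematic of the failure mode described here; the connectTo helper and the exception types are illustrative, and the real CatalogTracker code distinguishes more cases than this:
{code}
// If the IPC layer wraps every failure, including "unable to create new native
// thread", in a plain IOException, a caller that only knows how to recover from
// specific subtypes can no longer tell a retriable condition from a fatal one.
private HRegionInterface getCachedConnection(ServerName sn) throws IOException {
  try {
    return connectTo(sn);               // hypothetical helper for the RPC lookup
  } catch (SocketTimeoutException e) {
    return null;                        // known-retriable: the caller retries
  } catch (IOException e) {
    // A generic wrapper ends up here and is treated as unrecoverable,
    // which is why the master aborted and the test cases failed.
    throw e;
  }
}
{code}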

> The OOM problem of IPC client call  cause all handle block
> --
>
> Key: HBASE-5673
> URL: https://issues.apache.org/jira/browse/HBASE-5673
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6
> Environment: 0.90.6
>Reporter: xufeng
>Assignee: xufeng
> Fix For: 0.90.7, 0.92.2, 0.94.1
>
> Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch
>
>
> If HBaseClient meets an "unable to create new native thread" exception, the call 
> will never complete because it gets lost in the calls queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.

2012-03-31 Thread Takuya Ueshin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated HBASE-5663:
-

Attachment: HBASE-5663.patch

I attached a patch file.

Could someone please review and refactor it?
I think the code is a little dirty, but I couldn't find a better way.

> MultithreadedTableMapper doesn't work.
> --
>
> Key: HBASE-5663
> URL: https://issues.apache.org/jira/browse/HBASE-5663
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.94.0
>Reporter: Takuya Ueshin
> Attachments: HBASE-5663.patch
>
>
> A MapReduce job using MultithreadedTableMapper goes down throwing the following 
> Exception:
> {noformat}
> java.io.IOException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.mapred.TaskAttemptID, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
>  
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
>  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
>  org.apache.hadoop.hbase.mapreduce.TableSplit)
>   at java.lang.Class.getConstructor0(Class.java:2706)
>   at java.lang.Class.getConstructor(Class.java:1657)
>   at 
> org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
>   ... 8 more
> {noformat}
> This occurred when the tasks were creating MapRunner threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

honghua zhu updated HBASE-5690:
---

Attachment: Store.patch

> compression unavailable
> ---
>
> Key: HBASE-5690
> URL: https://issues.apache.org/jira/browse/HBASE-5690
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
> Environment: all
>Reporter: honghua zhu
>Priority: Critical
> Fix For: 0.94.0
>
> Attachments: Store.patch
>
>
> After HBASE-5442, the store.createWriterInTmp method is missing "compression"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5690) compression unavailable

2012-03-31 Thread honghua zhu (Created) (JIRA)
compression unavailable
---

 Key: HBASE-5690
 URL: https://issues.apache.org/jira/browse/HBASE-5690
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
 Environment: all
Reporter: honghua zhu
Priority: Critical
 Fix For: 0.94.0


After HBASE-5442, the store.createWriterInTmp method is missing "compression"
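
As a hedged sketch of the intent only (the exact createWriterInTmp signature in 0.94 is approximated here, so this is illustrative rather than the attached Store.patch):
{code}
// Make sure the writer created under .tmp uses the column family's configured
// compression instead of an implicit "no compression" default.
Compression.Algorithm compression = this.family.getCompression();
StoreFile.Writer writer = createWriterInTmp(maxKeyCount, compression,
    false /* not a compaction */);
{code}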

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5689:


Attachment: HBASE-5689.patch

In the patch,
I make the region's MaximumEditLogSeqNum contained in the RecoveredEdits file the file 
name.
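
A rough sketch of that direction; the method and variable names are illustrative and not taken from HBASE-5689.patch:
{code}
// Name the recovered.edits file after the largest sequence id it contains, so
// the "maxSeqId <= minSeqId" check in replayRecoveredEditsIfAny can never
// discard a file that still holds edits newer than the store files.
Path getRecoveredEditsPath(Path regionEditsDir, long maximumEditLogSeqNum) {
  // e.g. .../recovered.edits/00000000000000000007 when the highest contained edit is 7
  return new Path(regionEditsDir, String.format("%019d", maximumEditLogSeqNum));
}
{code}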

> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny:
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-31 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243070#comment-13243070
 ] 

chunhui shen commented on HBASE-5689:
-

That is, the file name becomes the MaximumEditLogSeqNum rather than the current MinimumEditLogSeqNum.


> Skip RecoveredEdits may cause data loss
> ---
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
>
> Let's see the following scenario:
> 1.Region is on the server A
> 2.put KV(r1->v1) to the region
> 3.move region from server A to server B
> 4.put KV(r2->v2) to the region
> 5.move region from server B to server A
> 6.put KV(r3->v3) to the region
> 7.kill -9 server B and start it
> 8.kill -9 server A and start it 
> 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third 
> KV(r3->v3) is lost.
> Let's analyse the above scenario from the code:
> 1. The edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same 
> hlog file on server A.
> 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
> we create one RecoveredEdits file f1 for the region.
> 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
> we create another RecoveredEdits file f2 for the region.
> 4. However, RecoveredEdits file f2 will be skipped when initializing the region in
> HRegion#replayRecoveredEditsIfAny:
> {code}
>  for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
> LOG.warn("Null or non-existent edits file: " + edits);
> continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
> Path higher = files.higher(edits);
> long maxSeqId = Long.MAX_VALUE;
> if (higher != null) {
>   // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>   String fileName = higher.getName();
>   maxSeqId = Math.abs(Long.parseLong(fileName));
> }
> if (maxSeqId <= minSeqId) {
>   String msg = "Maximum possible sequenceid for this log is " + 
> maxSeqId
>   + ", skipped the whole file, path=" + edits;
>   LOG.debug(msg);
>   continue;
> } else {
>   checkSafeToSkip = false;
> }
>   }
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira