[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243668#comment-13243668 ] xufeng commented on HBASE-5677: --- We can reproduce this issue on 0.90 with the following steps: step1: start a cluster and create a table that has many regions. step2: disable the table created in step1 via the shell. step3: kill the active master. step4: the backup master becomes the active one; while it is checking in regionservers, enable the table via the shell. result: the duplicate-open problem happens. I think the master should not serve requests before it has completed initialization. We could add a method to HMasterInterface like: {noformat} // true if the master is running and can serve requests public boolean isMasterAvailable() { return !isStopped() && isActiveMaster() && isInitialized(); } {noformat} The client could check this in getMaster(). Please give me your suggestions, thanks. > The master never does balance because duplicate openhandled the one region > -- > > Key: HBASE-5677 > URL: https://issues.apache.org/jira/browse/HBASE-5677 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6 > Environment: 0.90 >Reporter: xufeng >Assignee: xufeng > > If a region is assigned while the master is still initializing (before > processFailover runs), the region will be opened twice, because the > unassigned node in ZooKeeper is handled again in > AssignmentManager#processFailover(). > This leaves the region in RIT, so the master never balances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
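The proposed guard could look like the standalone sketch below. Everything here (the MasterSketch class, the awaitAvailable helper, the polling interval) is illustrative and not part of the actual HMasterInterface; only the isMasterAvailable body mirrors the snippet above:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the proposed guard: the master reports itself available only once
// it is active, initialized, and not stopped. Fields are public volatile purely
// to keep the sketch short; the real master tracks this state internally.
public class MasterSketch {
    public volatile boolean stopped = false;
    public volatile boolean activeMaster = false;
    public volatile boolean initialized = false;

    // the check xufeng proposes adding to HMasterInterface
    public boolean isMasterAvailable() {
        return !stopped && activeMaster && initialized;
    }

    // how a client-side getMaster() could poll before handing out the master stub
    public boolean awaitAvailable(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (isMasterAvailable()) return true;
            Thread.sleep(10);
        }
        return isMasterAvailable();
    }

    public static void main(String[] args) throws InterruptedException {
        MasterSketch m = new MasterSketch();
        m.activeMaster = true;                      // backup master just became active...
        System.out.println(m.isMasterAvailable());  // false: not initialized yet
        m.initialized = true;                       // ...initialization completes
        System.out.println(m.isMasterAvailable());  // true
        System.out.println(m.awaitAvailable(100));  // true
    }
}
```

The point of the guard is exactly the reproduction above: between "became active" and "initialization complete" the master answers RPCs but must not be treated as available.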
[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5638: - Fix Version/s: (was: 0.94.1) 0.96.0 > Backport to 0.90 and 0.92 - NPE reading ZK config in HBase > -- > > Key: HBASE-5638 > URL: https://issues.apache.org/jira/browse/HBASE-5638 > Project: HBase > Issue Type: Sub-task > Components: zookeeper >Affects Versions: 0.90.6, 0.92.1 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch, > HBASE-5638-0.90-v1.patch, HBASE-5638-0.90-v2.patch, HBASE-5638-0.92-v1.patch, > HBASE-5638-0.92-v2.patch, HBASE-5638-trunk-v1.patch, HBASE-5638-trunk-v2.patch > >
[jira] [Updated] (HBASE-5671) hbase.metrics.showTableName should be true by default
[ https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5671: - Fix Version/s: (was: 0.94.1) 0.94.0 > hbase.metrics.showTableName should be true by default > - > > Key: HBASE-5671 > URL: https://issues.apache.org/jira/browse/HBASE-5671 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Enis Soztutar >Assignee: Enis Soztutar >Priority: Critical > Fix For: 0.94.0, 0.96.0 > > Attachments: HBASE-5671_v1.patch > > > HBASE-4768 added per-cf metrics and a new configuration option > "hbase.metrics.showTableName". We should switch the conf option to true by > default, since it is not intuitive (at least to me) to aggregate per-cf > across tables by default, and it seems confusing to report on cf's without > table names.
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243665#comment-13243665 ] Lars Hofhansl commented on HBASE-5682: -- bq. You think this should go into 0.92? Probably. I guess most folks have clients that they restart frequently, use Thrift, or asynchbase. But in its current form, using the standard HBase client in an app server is very error prone if the HBase/ZK cluster is ever serviced without bringing the app server down in lock step. bq. Didn't we add a check for if the connection is bad? Yeah, with HBASE-5153, but in 0.90 only. At some point we decided the fix there wasn't good and Ram patched it up for 0.90. This should subsume HBASE-5153. I'm happy to even put this in 0.90, but that's up to Ram. bq. I'm interested in problems you see in hbase-5153 or issues you have w/ the implementation there, that being the 0.96 client. What I saw in 0.96 is that the client was blocked for a very long time (gave up after a few minutes), even though I had set all timeouts to low values. This is also deadly in an app-server setting. Might be a simple fix there; I didn't dig deeper. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Issue Comment Edited] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243662#comment-13243662 ] Lars Hofhansl edited comment on HBASE-5682 at 4/1/12 5:49 AM: -- Found the problem. The ClusterId could remain null permanently if HConnection.getZookeeperWatcher() was called. That would initialize HConnectionImplementation.zookeeper, and hence not reset clusterId in ensureZookeeperTrackers. TestZookeeper.testClientSessionExpired does that. Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING rather than CONNECTED depending on timing. Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, and zooKeeper volatile, because they can be modified by a different thread but are not exclusively accessed in a synchronized block (an existing problem). New patch fixes the problem and passes all tests. TestZookeeper seems to have good coverage. If I can think of more tests, I'll add them there.
> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
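The volatile change described above follows a common pattern: fields written under a lock in one thread but read without the lock in others must be volatile, or readers may keep seeing a stale value after a reset. A minimal self-contained sketch of the pattern (ConnectionSketch and its members are illustrative stand-ins, not the actual HConnectionImplementation fields):

```java
// Sketch: state is (re)built under a lock after a ZK session expiry, but
// hot-path reads skip the lock, so the field must be volatile for the
// newly written value to be guaranteed visible to other threads.
public class ConnectionSketch {
    private volatile String clusterId;  // read without synchronization on hot paths

    // called lazily to (re)build connection state; stand-in for ensureZookeeperTrackers
    public synchronized void ensureTrackers() {
        if (clusterId == null) {
            clusterId = "cluster-" + System.nanoTime(); // stand-in for reading the id from ZK
        }
    }

    // called when the ZK session expires: force a re-read on next ensureTrackers()
    public synchronized void resetOnExpiry() {
        clusterId = null;
    }

    // unsynchronized read: without volatile this could legally observe a stale value
    public String getClusterId() {
        return clusterId;
    }

    public static void main(String[] args) {
        ConnectionSketch c = new ConnectionSketch();
        c.ensureTrackers();
        System.out.println(c.getClusterId() != null);  // true
        c.resetOnExpiry();
        c.ensureTrackers();
        System.out.println(c.getClusterId() != null);  // true: rebuilt after "expiry"
    }
}
```

The bug described in the comment is the other half of this pattern: if some path initializes the field without going through the reset logic, it can stay null (or stale) permanently.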
[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5682: - Priority: Critical (was: Major) Upped to "critical". Without this, the HBase client is pretty much useless in an app-server setting where the client can outlive the HBase cluster and ZK ensemble. (Testing within the Salesforce AppServer is how I noticed the problem initially.) > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5682: - Attachment: 5682-all-v2.txt Found the problem. The ClusterId could remain null permanently if HConnection.getZookeeperWatcher() was called. That would initialize HConnectionImplementation.zookeeper, and hence not reset clusterId in ensureZookeeperTrackers. TestZookeeper.testClientSessionExpired does that. Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING rather than CONNECTED depending on timing. Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, and zooKeeper volatile, because they can be modified by a different thread but are not exclusively accessed in a synchronized block (an existing problem). New patch fixes the problem and passes all tests. TestZookeeper seems to have good coverage. If I can think of more tests, I'll add them there. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Updated] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-4348: -- Fix Version/s: 0.96.0 Hadoop Flags: Reviewed > Add metrics for regions in transition > - > > Key: HBASE-4348 > URL: https://issues.apache.org/jira/browse/HBASE-4348 > Project: HBase > Issue Type: Improvement > Components: metrics >Affects Versions: 0.92.0 >Reporter: Todd Lipcon >Assignee: Himanshu Vashishtha >Priority: Minor > Labels: noob > Fix For: 0.96.0 > > Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, > RITs.png, RegionInTransitions2.png, metrics-v2.patch > > > The following metrics would be useful for monitoring the master: > - the number of regions in transition > - the number of regions in transition that have been in transition for more > than a minute > - how many seconds has the oldest region-in-transition been in transition
[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243648#comment-13243648 ] Zhihong Yu commented on HBASE-5689: --- Some minor comments about Chunhui's patch:
{code}
+ LOG.debug("Rename " + wap.p + " to " + dst);
{code}
The log should begin with "Renamed ".
{code}
+private final Map<byte[], Long> regionMaximumEditLogSeqNum = Collections
+    .synchronizedMap(new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR));
{code}
Is a TreeMap needed above? We're just remembering the mapping, right?
> Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5689-simplified.txt, 5689-testcase.patch, > HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KVs (r1->v1, r2->v2); the third > KV(r3->v3) is lost. > Let's analyse the above scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 3.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. > 4.however, RecoveredEdits file f2 will be skipped when initializing the region > in HRegion#replayRecoveredEditsIfAny:
> {code}
> for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
>     LOG.warn("Null or non-existent edits file: " + edits);
>     continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
>     Path higher = files.higher(edits);
>     long maxSeqId = Long.MAX_VALUE;
>     if (higher != null) {
>       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>       String fileName = higher.getName();
>       maxSeqId = Math.abs(Long.parseLong(fileName));
>     }
>     if (maxSeqId <= minSeqId) {
>       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>           + ", skipped the whole file, path=" + edits;
>       LOG.debug(msg);
>       continue;
>     } else {
>       checkSafeToSkip = false;
>     }
>   }
> {code}
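One answer to the TreeMap question in the comment above: with byte[] keys, a plain HashMap compares keys by identity (arrays do not override equals/hashCode), so a comparator-based TreeMap is one of the few stock maps that works here. A standalone illustration, substituting java.util.Arrays::compare (Java 9+) for HBase's Bytes.BYTES_COMPARATOR:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ByteArrayKeyDemo {
    public static void main(String[] args) {
        byte[] region = "r1".getBytes();

        // HashMap: byte[] keys use identity hashCode/equals, so a lookup with an
        // equal-content copy of the key misses.
        Map<byte[], Long> hash = new HashMap<>();
        hash.put(region, 42L);
        System.out.println(hash.get("r1".getBytes()));  // null: different array instance

        // TreeMap with a content comparator (HBase uses Bytes.BYTES_COMPARATOR here)
        Map<byte[], Long> tree =
            Collections.synchronizedMap(new TreeMap<>(Arrays::compare));
        tree.put(region, 42L);
        System.out.println(tree.get("r1".getBytes()));  // 42: compared by contents
    }
}
```

So the sorted order is incidental; the comparator is what makes byte[] keys usable at all, unless the keys were wrapped (e.g. in ByteBuffer or String) first.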
[jira] [Commented] (HBASE-2186) hbase master should publish more stats
[ https://issues.apache.org/jira/browse/HBASE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243642#comment-13243642 ] Otis Gospodnetic commented on HBASE-2186: - Just saw some other JIRA issue (don't recall the number) that mentioned master having more data, so I wonder - is this issue still relevant? Hasn't been touched in over 2 years. Thanks. > hbase master should publish more stats > -- > > Key: HBASE-2186 > URL: https://issues.apache.org/jira/browse/HBASE-2186 > Project: HBase > Issue Type: Bug > Components: metrics >Reporter: ryan rawson > > hbase master only publishes cluster.requests to ganglia. we should also > publish regionserver count and other interesting metrics.
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243641#comment-13243641 ] Otis Gospodnetic commented on HBASE-4348: - Fix Version/s is set to None. Is this for 0.96? > Add metrics for regions in transition > - > > Key: HBASE-4348 > URL: https://issues.apache.org/jira/browse/HBASE-4348 > Project: HBase > Issue Type: Improvement > Components: metrics >Affects Versions: 0.92.0 >Reporter: Todd Lipcon >Assignee: Himanshu Vashishtha >Priority: Minor > Labels: noob > Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, > RITs.png, RegionInTransitions2.png, metrics-v2.patch > > > The following metrics would be useful for monitoring the master: > - the number of regions in transition > - the number of regions in transition that have been in transition for more > than a minute > - how many seconds has the oldest region-in-transition been in transition
[jira] [Updated] (HBASE-5692) Add real action time for HLogPrettyPrinter
[ https://issues.apache.org/jira/browse/HBASE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Shi updated HBASE-5692: Attachment: HBASE-5692.patch > Add real action time for HLogPrettyPrinter > -- > > Key: HBASE-5692 > URL: https://issues.apache.org/jira/browse/HBASE-5692 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: Xing Shi >Priority: Minor > Attachments: HBASE-5692.patch > > > Now the HLogPrettyPrinter prints the log without the real op time, only the > timestamp: > {quote} > Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 > Action: > row: r > column: f3:q > at time: Thu Jan 01 08:02:03 CST 1970 > {quote} > Maybe we need to know the real op time, like this: > {quote} > Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: > > Action: > row: r > column: f3:q > timestamp: Thu Jan 01 08:02:03 CST 1970 > {quote}
[jira] [Updated] (HBASE-5213) "hbase master stop" does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5213: -- Fix Version/s: 0.92.2 0.90.7 > "hbase master stop" does not bring down backup masters > -- > > Key: HBASE-5213 > URL: https://issues.apache.org/jira/browse/HBASE-5213 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 >Reporter: Gregory Chanan >Assignee: Gregory Chanan >Priority: Minor > Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, > HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch > > > Typing "hbase master stop" produces the following message: > "stop Start cluster shutdown; Master signals RegionServer shutdown" > It seems like backup masters should be considered part of the cluster, but > they are not brought down by "hbase master stop". > "stop-hbase.sh" does correctly bring down the backup masters. > The same behavior is observed when a client app makes use of the client API > HBaseAdmin.shutdown() > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown() > -- this isn't too surprising since I think "hbase master stop" just calls > this API. > It seems like HBASE-1448 addressed this; perhaps there was a regression?
[jira] [Updated] (HBASE-5213) "hbase master stop" does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5213: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > "hbase master stop" does not bring down backup masters > -- > > Key: HBASE-5213 > URL: https://issues.apache.org/jira/browse/HBASE-5213 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 >Reporter: Gregory Chanan >Assignee: Gregory Chanan >Priority: Minor > Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, > HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch > > > Typing "hbase master stop" produces the following message: > "stop Start cluster shutdown; Master signals RegionServer shutdown" > It seems like backup masters should be considered part of the cluster, but > they are not brought down by "hbase master stop". > "stop-hbase.sh" does correctly bring down the backup masters. > The same behavior is observed when a client app makes use of the client API > HBaseAdmin.shutdown() > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown() > -- this isn't too surprising since I think "hbase master stop" just calls > this API. > It seems like HBASE-1448 addressed this; perhaps there was a regression?
[jira] [Commented] (HBASE-5213) "hbase master stop" does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243639#comment-13243639 ] Jonathan Hsieh commented on HBASE-5213: --- Committed to 0.92 and 0.90. Thanks Greg. > "hbase master stop" does not bring down backup masters > -- > > Key: HBASE-5213 > URL: https://issues.apache.org/jira/browse/HBASE-5213 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 >Reporter: Gregory Chanan >Assignee: Gregory Chanan >Priority: Minor > Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, > HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch > > > Typing "hbase master stop" produces the following message: > "stop Start cluster shutdown; Master signals RegionServer shutdown" > It seems like backup masters should be considered part of the cluster, but > they are not brought down by "hbase master stop". > "stop-hbase.sh" does correctly bring down the backup masters. > The same behavior is observed when a client app makes use of the client API > HBaseAdmin.shutdown() > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown() > -- this isn't too surprising since I think "hbase master stop" just calls > this API. > It seems like HBASE-1448 addressed this; perhaps there was a regression?
[jira] [Commented] (HBASE-5213) "hbase master stop" does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243638#comment-13243638 ] Jonathan Hsieh commented on HBASE-5213: --- Committed to 0.92 and 0.90. Thanks Greg. > "hbase master stop" does not bring down backup masters > -- > > Key: HBASE-5213 > URL: https://issues.apache.org/jira/browse/HBASE-5213 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 >Reporter: Gregory Chanan >Assignee: Gregory Chanan >Priority: Minor > Fix For: 0.94.0, 0.96.0 > > Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, > HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch > > > Typing "hbase master stop" produces the following message: > "stop Start cluster shutdown; Master signals RegionServer shutdown" > It seems like backup masters should be considered part of the cluster, but > they are not brought down by "hbase master stop". > "stop-hbase.sh" does correctly bring down the backup masters. > The same behavior is observed when a client app makes use of the client API > HBaseAdmin.shutdown() > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown() > -- this isn't too surprising since I think "hbase master stop" just calls > this API. > It seems like HBASE-1448 addressed this; perhaps there was a regression?
[jira] [Created] (HBASE-5692) Add real action time for HLogPrettyPrinter
Add real action time for HLogPrettyPrinter -- Key: HBASE-5692 URL: https://issues.apache.org/jira/browse/HBASE-5692 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Xing Shi Priority: Minor Now the HLogPrettyPrinter prints the log without the real op time, only the timestamp: {quote} Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 Action: row: r column: f3:q at time: Thu Jan 01 08:02:03 CST 1970 {quote} Maybe we need to know the real op time, like this: {quote} Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: Action: row: r column: f3:q timestamp: Thu Jan 01 08:02:03 CST 1970 {quote}
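The proposal above distinguishes two different times: when the edit was written to the WAL ("real op time") versus the KV's client-visible timestamp, which may be arbitrary (here, seconds after the 1970 epoch). A minimal, self-contained sketch of printing both; the field names writeTime and cellTimestamp and the sample values are illustrative, not HLogPrettyPrinter's actual API:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class WalTimeDemo {
    // Format an epoch-millis value the way the quoted output does ("Thu Jan 01 ...").
    static String fmt(long millis) {
        SimpleDateFormat f = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(millis));
    }

    public static void main(String[] args) {
        long cellTimestamp = 123_000L;        // a client-supplied cell timestamp (near epoch)
        long writeTime = 1_333_238_400_000L;  // when the edit hit the WAL (illustrative value)
        System.out.println("Sequence 4 from region ... in table t5 at time: " + fmt(writeTime));
        System.out.println("  Action: row: r column: f3:q");
        System.out.println("    timestamp: " + fmt(cellTimestamp));
    }
}
```

Printing only the cell timestamp (the current behavior) is misleading exactly when clients set their own timestamps, as the near-epoch value in the quoted output shows.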
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243637#comment-13243637 ] Lars Hofhansl commented on HBASE-5682: -- I am not envisioning any API changes, just that the HConnection would no longer be ripped out from under any HTables when there is a ZK connection loss. I ran all tests again, and TestReplication and TestZookeeper have some related failures. Looking. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243625#comment-13243625 ] chunhui shen commented on HBASE-5689: - @Ram The analysis is correct. However, I think your solution is not use the optimization made in HBASE-4797. In my HBASE-5689.patch, I just change the file name of edit log to MaximumEditLogSeqNum。 In this issue, with the patch it will not skip RecoveredEdits file because we shouldn't, however, in other case, for example, we first put many data to RS1, then move regon to RS2 and put many data again. If RS1 and RS2 died, we would skip the edit log file from RS1. bq.Chunhui's patch also makes sense but any way the idea is not to skip any recovered.edits file. So, it is wrong, my way is keeping skip recovered.edits file, except the case such as this issue. > Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5689-simplified.txt, 5689-testcase.patch, > HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the upper scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. 
> 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. > 3.however, RecoveredEdits file f2 will be skipped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code} > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
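The false skip can be illustrated with a small model of the quoted heuristic. The seqid values below are hypothetical, and `safeToSkip` is a simplification of the loop above, not the actual HRegion code:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SkipCheckDemo {
    // Simplified model of the check above: an edits file may be skipped when
    // the NEXT file's name, parsed as a sequenceid, is <= the region's
    // last-flushed sequenceid (minSeqId).
    static boolean safeToSkip(TreeSet<Long> fileNames, long fileName, long minSeqId) {
        Long higher = fileNames.higher(fileName);
        long maxSeqId = (higher != null) ? Math.abs(higher) : Long.MAX_VALUE;
        return maxSeqId <= minSeqId;
    }

    public static void main(String[] args) {
        // Hypothetical seqids for the scenario in the issue: f2 (split of
        // server A's hlog) holds edits 1 (r1->v1) and 3 (r3->v3); f1 (split of
        // server B's hlog) holds edit 2 (r2->v2). Edits 1 and 2 were flushed
        // before the crashes, so minSeqId = 2.
        long minSeqId = 2L;

        // Pre-patch, f2's name reflects its FIRST edit, i.e. "1".
        TreeSet<Long> prePatch = new TreeSet<>(Arrays.asList(1L, 2L));
        // f2 ("1") is wrongly skipped because the next file's name 2 <= 2,
        // even though f2 still holds the unflushed edit 3: r3->v3 is lost.
        System.out.println(safeToSkip(prePatch, 1L, minSeqId)); // prints "true"

        // With the HBASE-5689.patch idea, a file is named after its MAXIMUM
        // seqid, so f2 becomes "3" and the same check no longer discards it.
        TreeSet<Long> postPatch = new TreeSet<>(Arrays.asList(2L, 3L));
        System.out.println(safeToSkip(postPatch, 3L, minSeqId)); // prints "false"
    }
}
```

With max-seqid names the check stays conservative: a file can only be judged skippable when every edit it may contain is provably already flushed.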
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243620#comment-13243620 ] stack commented on HBASE-5682: -- I looked at the 'all' patch. Looks good to me. Am interested in how it changes API usage (if at all). > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243619#comment-13243619 ] stack commented on HBASE-5682: -- bq. The problem I have seen in 0.94/0.92 without this patch even with managed connections is that after HConnection times out, it is unusable and even getting a new HTable does not fix the problem since behind the scenes the same HConnection is retrieved. Didn't we add a check for whether the connection is bad? bq. Will think about an automated test. Do you like the version better that always does the recheck (and hence all the conditionals for "managed" go away)? How does this work in trunk? In trunk the work has been done so we don't really keep open a zk session any more. For the sake of making tests run smoother, we'll do keep alive on the zk session and hold it open 5 minutes and let it go if unused. I'm +1 on making our stuff more resilient. Reusing a dud hconnection, either because the connection is dead or the zk session died, is hard to figure. How will this change a user's perception about how this stuff is used? If your answer is that it helps in the extreme where the connection goes dead, and that's the only change a user perceives, then let's commit. But we should include a test? If you describe one, I can try to help write it? You think this should go into 0.92? > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. 
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb
[ https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243618#comment-13243618 ] jirapos...@reviews.apache.org commented on HBASE-5688: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4600/ --- Review request for hbase. Summary --- Changes the content of the root location znode, root-region-server, to be four magic bytes ('PBUF') followed by a protobuf message that holds the ServerName of the server currently hosting root. D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java Removed. Had two methods, one to add the root-region-server znode and another to remove it. Rather, put these methods in RootRegionTracker. It tracks the root-region-server znode. Having all to do w/ root-region-server is more cohesive. Also makes it so we can encapsulate in one class all to do w/ create, delete, and reading of root-region-server. We also want to purge the catalog package (See note at head of CatalogTracker). M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java Utility to do w/ protobuf handling. Has methods to help prefixing and stripping some 'magic' from serialized protobuf messages. A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java PB generated. M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java Use new RootRegionTracker method for getting content of znode rather than do it all here (going via RootRegionTracker, we can keep how the znode content is serialized private to the RootRegionTracker class). 
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java Has the methods that used to be in RootLocationEditor plus a new getRootRegionLocation method (does not set watcher). This addresses bug hbase-5688. https://issues.apache.org/jira/browse/hbase-5688 Diffs - src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java c90864a src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java b2a5463 src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 64def15 src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 9c215b4 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 2f05005 src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 33e4e71 src/main/protobuf/ZooKeeper.proto PRE-CREATION src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 533b2bf src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java fe37156 src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java 2132036 src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java PRE-CREATION Diff: https://reviews.apache.org/r/4600/diff Testing --- Thanks, Michael > Convert zk root-region-server znode content to pb > - > > Key: HBASE-5688 > URL: https://issues.apache.org/jira/browse/HBASE-5688 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.96.0 > > Attachments: 5688.txt, 5688v4.txt > > > Move the root-region-server znode content from the versioned bytes that > ServerName.getVersionedBytes outputs to instead be pb.
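The 'magic' handling the review describes for ProtobufUtil can be sketched as follows; the method names and exception behavior here are illustrative guesses based on the summary above, not the committed API:

```java
import java.util.Arrays;

public class PBMagicDemo {
    // Four magic bytes written before the serialized protobuf message so that
    // readers can distinguish pb content from the old serialization.
    static final byte[] PB_MAGIC = new byte[] {'P', 'B', 'U', 'F'};

    // Prefix a serialized protobuf message with the magic bytes.
    static byte[] prependPBMagic(byte[] serialized) {
        byte[] out = new byte[PB_MAGIC.length + serialized.length];
        System.arraycopy(PB_MAGIC, 0, out, 0, PB_MAGIC.length);
        System.arraycopy(serialized, 0, out, PB_MAGIC.length, serialized.length);
        return out;
    }

    // Check whether znode content starts with the magic bytes.
    static boolean isPBMagicPrefix(byte[] bytes) {
        if (bytes == null || bytes.length < PB_MAGIC.length) return false;
        return Arrays.equals(Arrays.copyOf(bytes, PB_MAGIC.length), PB_MAGIC);
    }

    // Strip the magic bytes, leaving the raw protobuf message to parse.
    static byte[] stripPBMagic(byte[] bytes) {
        if (!isPBMagicPrefix(bytes)) {
            throw new IllegalArgumentException("Missing PBUF magic prefix");
        }
        return Arrays.copyOfRange(bytes, PB_MAGIC.length, bytes.length);
    }

    public static void main(String[] args) {
        byte[] payload = "example-servername".getBytes(); // stand-in for a pb ServerName
        byte[] znodeContent = prependPBMagic(payload);
        System.out.println(isPBMagicPrefix(znodeContent)); // prints "true"
        System.out.println(new String(stripPBMagic(znodeContent))); // prints "example-servername"
    }
}
```

Keeping the prefix/strip logic in one utility is what lets RootRegionTracker keep the znode's serialization format private to itself, as the review notes.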
[jira] [Issue Comment Edited] (HBASE-5688) Convert zk root-region-server znode content to pb
[ https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243616#comment-13243616 ] stack edited comment on HBASE-5688 at 4/1/12 12:18 AM: --- Changes the content of the root location znode, root-region-server, to be four magic bytes ('PBUF') followed by a protobuf message that holds the ServerName of the server currently hosting root. {code} D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java Removed. Had two methods, one to add root-region-server znode and another to removed it. Rather, put these methods in RootRegionTracker. It tracks root-region-server znode. Having all to do w/ root-region-server is more cohesive. Also makes it so can encapsulate in one class all to do w/ create, delete, and reading of root-region-server. We also want to purge the catalog package (See note at head of CatalogTracker). M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java Utility to do w/ protobuf handling. Has methods to help prefixing and stripping from serialized protobuf messages some 'magic'. A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java PB generated. M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java Use new RootRegionTracker method for getting content of znode rather than do it all here (going via RootRegionTracker, we can keep how the znode content is serialized private to the RootRegionTracker class. M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java Has the methods that used to be in RootLocationEditor plus a new getRootRegionLocation method (does not set watcher). 
A src/main/protobuf/ZooKeeper.proto M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java Test of dataToServerName method in RootRegionTracker. {code} was (Author: stack): Changes the content of the root location znode, root-region-server, to be four magic bytes ('PBUF') followed by a protobuf message that holds the ServerName of the server currently hosting root. D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java Removed. Had two methods, one to add root-region-server znode and another to removed it. Rather, put these methods in RootRegionTracker. It tracks root-region-server znode. Having all to do w/ root-region-server is more cohesive. Also makes it so can encapsulate in one class all to do w/ create, delete, and reading of root-region-server. We also want to purge the catalog package (See note at head of CatalogTracker). M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java Utility to do w/ protobuf handling. Has methods to help prefixing and stripping from serialized protobuf messages some 'magic'. A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java PB generated. 
M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java Use new RootRegionTracker method for getting content of znode rather than do it all here (going via RootRegionTracker, we can keep how the znode content is serialized private to the RootRegionTracker class. M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java Has the methods that used to be in RootLocationEditor plus a new getRootRegionLocation method (does not set watcher). A src/main/protobuf/ZooKeeper.proto M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java Test of dataToServerName method in RootRegionTracker. > Convert zk root-region-server znode content to pb > - > >
[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb
[ https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5688: - Attachment: 5688v4.txt Changes the content of the root location znode, root-region-server, to be four magic bytes ('PBUF') followed by a protobuf message that holds the ServerName of the server currently hosting root. D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java Removed. Had two methods, one to add the root-region-server znode and another to remove it. Rather, put these methods in RootRegionTracker. It tracks the root-region-server znode. Having all to do w/ root-region-server is more cohesive. Also makes it so we can encapsulate in one class all to do w/ create, delete, and reading of root-region-server. We also want to purge the catalog package (See note at head of CatalogTracker). M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java Utility to do w/ protobuf handling. Has methods to help prefixing and stripping some 'magic' from serialized protobuf messages. A src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java PB generated. M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java Use new RootRegionTracker method for getting content of znode rather than do it all here (going via RootRegionTracker, we can keep how the znode content is serialized private to the RootRegionTracker class). M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java Has the methods that used to be in RootLocationEditor plus a new getRootRegionLocation method (does not set watcher). 
A src/main/protobuf/ZooKeeper.proto M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java M src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java Get root region location from RootRegionTracker rather than from RootLocationEditor. A src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java Test of dataToServerName method in RootRegionTracker. > Convert zk root-region-server znode content to pb > - > > Key: HBASE-5688 > URL: https://issues.apache.org/jira/browse/HBASE-5688 > Project: HBase > Issue Type: Task >Reporter: stack > Fix For: 0.96.0 > > Attachments: 5688.txt, 5688v4.txt > > > Move the root-region-server znode content from the versioned bytes that > ServerName.getVersionedBytes outputs to instead be pb.
[jira] [Updated] (HBASE-5688) Convert zk root-region-server znode content to pb
[ https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5688: - Fix Version/s: 0.96.0 Assignee: stack Release Note: Changes the content of the root location znode, root-region-server, to be four magic bytes ('PBUF') followed by a protobuf message that holds the ServerName of the server currently hosting root. Status: Patch Available (was: Open) Trying against hadoopqa. Let me put it up on review board too because I would appreciate some review. > Convert zk root-region-server znode content to pb > - > > Key: HBASE-5688 > URL: https://issues.apache.org/jira/browse/HBASE-5688 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.96.0 > > Attachments: 5688.txt, 5688v4.txt > > > Move the root-region-server znode content from the versioned bytes that > ServerName.getVersionedBytes outputs to instead be pb.
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243605#comment-13243605 ] Lars Hofhansl commented on HBASE-5682: -- The more I look at it, the more I like the patch that changes the behavior in all cases. It's simple and low risk: just recheck the ZK trackers before they are needed. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
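The "recheck the ZK trackers before they are needed" idea reduces to a check-and-reconnect guard in front of every use. A generic sketch of that pattern (all names here are hypothetical; the real patch operates on HConnectionImplementation's internal trackers):

```java
import java.util.function.Supplier;

public class ReconnectingHolder {
    // Stand-in for a ZK-backed tracker: all the guard needs is liveness.
    public interface Session {
        boolean isAlive();
    }

    private final Supplier<Session> connector; // how to (re)establish a session
    private Session session;

    public ReconnectingHolder(Supplier<Session> connector) {
        this.connector = connector;
    }

    // Recheck before each use: if the cached session died (e.g. ZK connection
    // loss), transparently build a new one instead of staying broken forever.
    public synchronized Session getChecked() {
        if (session == null || !session.isAlive()) {
            session = connector.get();
        }
        return session;
    }

    public static void main(String[] args) {
        final int[] connects = {0};
        // A connector whose sessions die immediately: every use reconnects.
        ReconnectingHolder flaky = new ReconnectingHolder(() -> {
            connects[0]++;
            return () -> false; // isAlive() == false
        });
        flaky.getChecked();
        flaky.getChecked();
        System.out.println(connects[0]); // prints "2": one reconnect per use
    }
}
```

Because the guard is unconditional, the "managed" conditionals can go away entirely, which is what makes the '-all' variant of the patch simple and low risk.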
[jira] [Issue Comment Edited] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1
[ https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243593#comment-13243593 ] Jonathan Hsieh edited comment on HBASE-5680 at 3/31/12 11:24 PM: - The master runs on top of hadoop 0.23.x apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 The master fails on top of hadoop 0.23.x with the class not found error in these cases: apache 0.92.1 right out of tarball apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23 apache 0.94.0rc0 right out of tarball. apache 0.94.0rc0 security right out of tarball. apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23 was (Author: jmhsieh): The master starts on top of hadoop 0.23.x apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 The master fails on top of hadoop 0.23.x with the class not found error in these cases: apache 0.92.1 right out of tarball apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23 apache 0.94.0rc0 right out of tarball. apache 0.94.0rc0 security right out of tarball. apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23 > Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 > -- > > Key: HBASE-5680 > URL: https://issues.apache.org/jira/browse/HBASE-5680 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Kristam Subba Swathi > > Hmaster is not able to start because of the following error > Please find the following error > > 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.NoClassDefFoundError: > org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction > at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524) > at > org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324) > at > org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127) > at > org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction > at java.net.URLClassLoader$1.run(URLClassLoader.java:202) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:190) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:247) > ... 7 more > There is a change in the FSConstants
[jira] [Reopened] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HBASE-5691: > Importtsv stops the webservice from which it is evoked > -- > > Key: HBASE-5691 > URL: https://issues.apache.org/jira/browse/HBASE-5691 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: debarshi basak >Priority: Minor > > I was trying to run importtsv from a servlet. Everytime after the completion > of job, the tomcat server was shutdown.
[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HBASE-5691. Resolution: Not A Problem > Importtsv stops the webservice from which it is evoked > -- > > Key: HBASE-5691 > URL: https://issues.apache.org/jira/browse/HBASE-5691 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: debarshi basak >Priority: Minor > > I was trying to run importtsv from a servlet. Everytime after the completion > of job, the tomcat server was shutdown.
[jira] [Commented] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1
[ https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243593#comment-13243593 ] Jonathan Hsieh commented on HBASE-5680: --- The master starts on top of hadoop 0.23.x apache 0.92.1 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 apache 0.94.0rc0 recompiled with the security profile *on* (-Psecurity) and with -Dhadoop.profile=23 The master fails on top of hadoop 0.23.x with the class not found error in these cases: apache 0.92.1 right out of tarball apache 0.92.1 with the security profile *off* with -Dhadoop.profile=23 apache 0.94.0rc0 right out of tarball. apache 0.94.0rc0 security right out of tarball. apache 0.94.0rc0 with the security profile *off* with -Dhadoop.profile=23 > Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 > -- > > Key: HBASE-5680 > URL: https://issues.apache.org/jira/browse/HBASE-5680 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Kristam Subba Swathi > > Hmaster is not able to start because of the following error > Please find the following error > > 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: > Unhandled exception. Starting shutdown. 
> java.lang.NoClassDefFoundError: > org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction > at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524) > at > org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324) > at > org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127) > at > org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction > at java.net.URLClassLoader$1.run(URLClassLoader.java:202) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:190) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:247) > ... 7 more > There is a change in the FSConstants
[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243591#comment-13243591 ] debarshi basak commented on HBASE-5691: --- Thanks a lot. It worked. > Importtsv stops the webservice from which it is evoked > -- > > Key: HBASE-5691 > URL: https://issues.apache.org/jira/browse/HBASE-5691 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: debarshi basak >Priority: Minor > > I was trying to run importtsv from a servlet. Everytime after the completion > of job, the tomcat server was shutdown.
[jira] [Resolved] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] debarshi basak resolved HBASE-5691. --- Resolution: Fixed > Importtsv stops the webservice from which it is evoked > -- > > Key: HBASE-5691 > URL: https://issues.apache.org/jira/browse/HBASE-5691 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: debarshi basak >Priority: Minor > > I was trying to run importtsv from a servlet. Everytime after the completion > of job, the tomcat server was shutdown.
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243589#comment-13243589 ] Lars Hofhansl commented on HBASE-5682: -- "perversion" is a hard word. :) It is just rechecking before each use whether the trackers are still usable. The timeout is handled through the HConnection's abort(). The testing I've done: # ZK down, HBase down, start a client. Then start ZK, then HBase. # ZK up, HBase down, start client. Then start HBase # both ZK and HBase up, start client, kill HBase, restart HBase # both ZK and HBase up, start client, kill ZK and HBase, restart The client just creates a new HTable and then tries to get some rows in a loop. In all cases the client should successfully be able to reconnect when both ZK and HBase are up. The problem I have seen in 0.94/0.92 without this patch even with managed connections is that after HConnection times out, it is unusable and even getting a new HTable does not fix the problem since behind the scenes the same HConnection is retrieved. Will think about an automated test. Do you like the version better that always does the recheck (and hence all the conditionals for "managed" go away)? > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5682: - Attachment: 5682-all.txt Here's a patch that always attempts to reconnect to ZK when a ZK connection is needed. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243314#comment-13243314 ] stack commented on HBASE-5691: -- It calls System.exit at the end: System.exit(job.waitForCompletion(true) ? 0 : 1);. How are you invoking it? Do you call main? Why not call createSubmittableJob(conf, otherArgs) and then call job.waitForCompletion(true) on the returned job... and show error or success depending on what is returned. Is it even a good idea calling this script from a servlet? > Importtsv stops the webservice from which it is evoked > -- > > Key: HBASE-5691 > URL: https://issues.apache.org/jira/browse/HBASE-5691 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: debarshi basak >Priority: Minor > > I was trying to run importtsv from a servlet. Every time after the completion > of the job, the tomcat server was shut down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5691) Importtsv stops the webservice from which it is evoked
[ https://issues.apache.org/jira/browse/HBASE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243315#comment-13243315 ] stack commented on HBASE-5691: -- Oh, please close this issue if the above suggestion works for you.
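The pattern stack suggests, submitting the job and reporting its status to the caller instead of calling System.exit (which takes down the embedding Tomcat JVM), can be sketched as follows. The Job interface here is a minimal stand-in modeling only waitForCompletion; this is not the actual ImportTsv or Hadoop code.

```java
// Hedged sketch: run a MapReduce-style job from an embedding process (e.g.
// a servlet) and return an exit status rather than calling System.exit,
// which would shut down the whole container.
public class EmbeddedJobRunner {
    // Minimal stand-in for org.apache.hadoop.mapreduce.Job; only the one
    // call used in the comment above is modeled.
    interface Job { boolean waitForCompletion(boolean verbose) throws Exception; }

    static int runJob(Job job) {
        try {
            // Report status to the caller instead of exiting the JVM.
            return job.waitForCompletion(true) ? 0 : 1;
        } catch (Exception e) {
            return 1; // surface failure without killing the container
        }
    }
}
```

The servlet can then map the returned status to an HTTP response instead of losing the whole process.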
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243312#comment-13243312 ] stack commented on HBASE-5682: -- This is a perversion. If we pass in a connection from outside, then down in the guts we do special handling that makes the connection and zookeeper handling reconnect. It's like we should be passing an Interface made at a higher level of abstraction, and then in the implementation it did this fixup when the connection breaks. With that out of the way, do whatever you need to make it work. Patch looks fine. How did you test? Would it be hard to make a unit test of it? A unit test would be good for codifying this perversion, since it will be brittle, being not what's expected. I'm against changing the behavior of the default case in 0.92/0.94. I'm interested in problems you see in hbase-5153 or issues you have w/ the implementation there, that being the 0.96 client.
[jira] [Created] (HBASE-5691) Importtsv stops the webservice from which it is evoked
Importtsv stops the webservice from which it is evoked -- Key: HBASE-5691 URL: https://issues.apache.org/jira/browse/HBASE-5691 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: debarshi basak Priority: Minor I was trying to run importtsv from a servlet. Every time after the completion of the job, the tomcat server was shut down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243310#comment-13243310 ] Zhihong Yu commented on HBASE-5666: --- HConnectionImplementation.checkIfBaseNodeAvailable() doesn't take Abortable. We can limit the scope of change in this JIRA. Is that okay? > RegionServer doesn't retry to check if base node is available > - > > Key: HBASE-5666 > URL: https://issues.apache.org/jira/browse/HBASE-5666 > Project: HBase > Issue Type: Bug > Components: regionserver, zookeeper >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, > hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, > hbase-zookeeper.log, zk-exists-refactor-v0.patch > > > I've a script that starts hbase and a couple of region servers in distributed > mode (hbase.cluster.distributed = true) > {code} > $HBASE_HOME/bin/start-hbase.sh > $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 > {code} > but the region servers are not able to start... > It seems that during the RS start the znode is still not available, and > HRegionServer.initializeZooKeeper() checks just once if the base node is > available. > {code} > 2012-03-28 21:54:05,013 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value > configured in 'zookeeper.znode.parent'. There could be a mismatch with the > one configured in the master. > 2012-03-28 21:54:08,598 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > localhost,60202,133296824: Initialization of RS failed. Hence aborting > RS. > java.io.IOException: Received the shutdown message while waiting. 
> at > org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243302#comment-13243302 ] Matteo Bertozzi commented on HBASE-5666: I was thinking to patch ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and HConnectionImplementation.checkIfBaseNodeAvailable() instead of only the HRegionServer... what do you think?
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243296#comment-13243296 ] Zhihong Yu commented on HBASE-5666: --- Patch makes sense. Can you integrate it into HRegionServer? Thanks
[jira] [Updated] (HBASE-5669) AggregationClient fails validation for open stoprow scan
[ https://issues.apache.org/jira/browse/HBASE-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5669: - Fix Version/s: (was: 0.94.1) 0.94.0 > AggregationClient fails validation for open stoprow scan > > > Key: HBASE-5669 > URL: https://issues.apache.org/jira/browse/HBASE-5669 > Project: HBase > Issue Type: Bug > Components: coprocessors >Affects Versions: 0.92.1 > Environment: n/a >Reporter: Brian Rogers >Assignee: Mubarak Seyed >Priority: Minor > Fix For: 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5669.trunk.v1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > AggregationClient.validateParameters throws an exception when the Scan has a > valid startrow but an unset endrow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5084) Allow different HTable instances to share one ExecutorService
[ https://issues.apache.org/jira/browse/HBASE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5084: - Fix Version/s: (was: 0.94.1) 0.94.0 > Allow different HTable instances to share one ExecutorService > - > > Key: HBASE-5084 > URL: https://issues.apache.org/jira/browse/HBASE-5084 > Project: HBase > Issue Type: Task >Reporter: Zhihong Yu >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5084-0.94.txt, 5084-trunk.txt > > > This came out of Lily 1.1.1 release: > Use a shared ExecutorService for all HTable instances, leading to better (or > actual) thread reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.
[ https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4398: - Fix Version/s: (was: 0.94.1) 0.94.0 > If HRegionPartitioner is used in MapReduce, client side configurations are > overwritten by hbase-site.xml. > - > > Key: HBASE-4398 > URL: https://issues.apache.org/jira/browse/HBASE-4398 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.90.4 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 0.92.2, 0.94.0 > > Attachments: HBASE-4398.patch > > > If HRegionPartitioner is used in MapReduce, client side configurations are > overwritten by hbase-site.xml. > We can reproduce the problem by the following instructions: > {noformat} > - Add HRegionPartitioner.class to the 4th argument of > TableMapReduceUtil#initTableReducerJob() > at line around 133 > in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java > - Change or remove "hbase.zookeeper.property.clientPort" property > in hbase-site.xml ( for example, changed to 12345 ). 
> - run testMultiRegionTable() > {noformat} > Then I got error messages as following: > {noformat} > 2011-09-12 22:28:51,020 DEBUG [Thread-832] zookeeper.ZKUtil(93): hconnection > opening connection to ZooKeeper with ensemble (localhost:12345) > 2011-09-12 22:28:51,022 INFO [Thread-832] > zookeeper.RecoverableZooKeeper(89): The identifier of this process is > 43200@imac.local > 2011-09-12 22:28:51,123 WARN [Thread-832] > zookeeper.RecoverableZooKeeper(161): Possibly transient ZooKeeper exception: > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for /hbase/master > 2011-09-12 22:28:51,123 INFO [Thread-832] > zookeeper.RecoverableZooKeeper(173): The 1 times to retry ZooKeeper after > sleeping 1000 ms > = > 2011-09-12 22:29:02,418 ERROR [Thread-832] mapreduce.HRegionPartitioner(125): > java.io.IOException: > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2e54e48d > closed > 2011-09-12 22:29:02,422 WARN [Thread-832] mapred.LocalJobRunner$Job(256): > job_local_0001 > java.lang.NullPointerException >at > org.apache.hadoop.hbase.mapreduce.HRegionPartitioner.setConf(HRegionPartitioner.java:128) >at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) >at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) >at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:527) >at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) >at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) > {noformat} > I think HTable should connect to ZooKeeper at port 21818 configured at client > side instead of 12345 in hbase-site.xml > and It might be caused by "HBaseConfiguration.addHbaseResources(conf);" in > HRegionPartitioner#setConf(Configuration). 
> And this might mean that all client-side configurations that are also set > in hbase-site.xml are overwritten because of this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
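The overwrite described above comes down to merge order: if the hbase-site.xml resources are applied after the client's settings, they clobber them. A minimal, HBase-free sketch of the safe order (site defaults first, client values second) using plain maps; the real Configuration resource loading is more involved than this, and the property name is taken from the report above.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the config-merge ordering bug: merging site defaults
// *after* client settings overwrites them; merging defaults first and
// client values second preserves the client's intent.
public class ConfMerge {
    static Map<String, String> merge(Map<String, String> siteDefaults,
                                     Map<String, String> clientConf) {
        Map<String, String> merged = new HashMap<>(siteDefaults); // defaults first
        merged.putAll(clientConf);                                // client values win
        return merged;
    }
}
```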
[jira] [Updated] (HBASE-5670) Have Mutation implement the Row interface.
[ https://issues.apache.org/jira/browse/HBASE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5670: - Fix Version/s: (was: 0.94.1) 0.94.0 > Have Mutation implement the Row interface. > -- > > Key: HBASE-5670 > URL: https://issues.apache.org/jira/browse/HBASE-5670 > Project: HBase > Issue Type: Improvement >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Trivial > Fix For: 0.94.0, 0.96.0 > > Attachments: 5670-0.94.txt, 5670-trunk.txt, 5670-trunk.txt > > > In HBASE-4347 I factored some code from Put/Delete/Append in Mutation. > In a discussion with a co-worker I noticed that Put/Delete/Append still > implement the Row interface, but Mutation does not. > In a trivial change I would like to move that interface up to Mutation, along > with changing HTable.batch(List) to HTable.batch(List) > (HConnection.processBatch takes List already anyway), so that > HTable.batch can be used with a list of Mutations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5666: --- Attachment: zk-exists-refactor-v0.patch don't know... I've tried to refactor the method to get something useful and shared... The problem is that checkExists(), called by checkIfBaseNodeAvailable(), uses a ZooKeeperWatcher and calls exists() on a RecoverableZooKeeper object, while waitForBaseZNode() has a plain ZooKeeper object... so the checkExists(ZooKeeperWatcher) implementation relies on the fact that RecoverableZooKeeper.exists() is implemented as RZK.getZooKeeper().exists(), which I don't like...
[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243246#comment-13243246 ] Hudson commented on HBASE-5690: --- Integrated in HBase-0.94 #72 (See [https://builds.apache.org/job/HBase-0.94/72/]) HBASE-5690 compression does not work in Store.java of 0.94 (Honghua Zhu) (Revision 1307851) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java > compression does not work in Store.java of 0.94 > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Assignee: honghua zhu >Priority: Critical > Fix For: 0.94.0 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243239#comment-13243239 ] Zhihong Yu commented on HBASE-5666: --- Refactoring ZKUtil.waitForBaseZNode() so that it can be used by region server and the test would be good.
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243236#comment-13243236 ] Matteo Bertozzi commented on HBASE-5666: diving into the code, I've also noticed that there's a ZKUtil.waitForBaseZNode() that has already the retry logic but: - It takes a Configuration object - The timeout is set internally to 1ms - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java
[jira] [Resolved] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu resolved HBASE-5690. --- Resolution: Fixed
[jira] [Commented] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243229#comment-13243229 ] Zhihong Yu commented on HBASE-5690: --- Integrated to 0.94 branch. Thanks for the patch Honghua.
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243227#comment-13243227 ] Zhihong Yu commented on HBASE-5682: --- Application to other HConnection makes sense.
[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5097: - Fix Version/s: (was: 0.94.1) 0.94.0 > RegionObserver implementation whose preScannerOpen and postScannerOpen Impl > return null can stall the system initialization through NPE > --- > > Key: HBASE-5097 > URL: https://issues.apache.org/jira/browse/HBASE-5097 > Project: HBase > Issue Type: Bug > Components: coprocessors >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch > > > In HRegionServer.java openScanner() > {code} > r.prepareScanner(scan); > RegionScanner s = null; > if (r.getCoprocessorHost() != null) { > s = r.getCoprocessorHost().preScannerOpen(scan); > } > if (s == null) { > s = r.getScanner(scan); > } > if (r.getCoprocessorHost() != null) { > s = r.getCoprocessorHost().postScannerOpen(scan, s); > } > {code} > If we don't have an implementation for postScannerOpen, the RegionScanner is null > and so a NullPointerException is thrown: > {code} > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) > {code} > Marking this defect as blocker. Please feel free to change the priority if I am > wrong. 
Also correct me if my way of trying out coprocessors without > implementing postScannerOpen is wrong. I am just a learner.
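The flow quoted in the issue trusts whatever postScannerOpen returns, so a RegionObserver that returns null leaves the scanner null. Below is a minimal, self-contained sketch of one possible guard, using stand-in interfaces rather than the real HBase types (and not necessarily the fix that was actually committed): keep the scanner already obtained when a hook yields null.

```java
// Stand-in types for illustration only; the real HBase interfaces differ.
interface RegionScanner {}

interface CoprocessorHost {
    RegionScanner preScannerOpen();                 // may return null
    RegionScanner postScannerOpen(RegionScanner s); // a RegionObserver may return null here
}

public class ScannerOpenSketch {
    // Guarded flow: a null result from postScannerOpen never replaces a valid scanner.
    static RegionScanner openScanner(CoprocessorHost host, RegionScanner defaultScanner) {
        RegionScanner s = null;
        if (host != null) {
            s = host.preScannerOpen();
        }
        if (s == null) {
            s = defaultScanner;  // stands in for r.getScanner(scan) in the real code
        }
        if (host != null) {
            RegionScanner wrapped = host.postScannerOpen(s);
            if (wrapped != null) {  // the guard the quoted code is missing
                s = wrapped;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        RegionScanner base = new RegionScanner() {};
        // An observer that returns null from both hooks, as described in the issue.
        CoprocessorHost nullObserver = new CoprocessorHost() {
            public RegionScanner preScannerOpen() { return null; }
            public RegionScanner postScannerOpen(RegionScanner s) { return null; }
        };
        RegionScanner s = openScanner(nullObserver, base);
        System.out.println(s == base);  // prints "true": the default scanner survives
    }
}
```

With the guard, addScanner never receives a null scanner, so the ConcurrentHashMap.put NPE in the quoted stack trace cannot occur.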
[jira] [Updated] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5656: - Fix Version/s: (was: 0.94.1) 0.94.0 > LoadIncrementalHFiles createTable should detect and set compression algorithm > - > > Key: HBASE-5656 > URL: https://issues.apache.org/jira/browse/HBASE-5656 > Project: HBase > Issue Type: Bug > Components: util >Affects Versions: 0.92.1 >Reporter: Cosmin Lehene >Assignee: Cosmin Lehene > Fix For: 0.92.2, 0.94.0, 0.96.0 > > Attachments: HBASE-5656-0.92.patch, HBASE-5656-0.92.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > LoadIncrementalHFiles doesn't set compression when creating the table. > This can be detected from the files within each family dir.
[jira] [Updated] (HBASE-3134) [replication] Add the ability to enable/disable streams
[ https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-3134: - Fix Version/s: (was: 0.94.1) 0.94.0 > [replication] Add the ability to enable/disable streams > --- > > Key: HBASE-3134 > URL: https://issues.apache.org/jira/browse/HBASE-3134 > Project: HBase > Issue Type: New Feature > Components: replication >Reporter: Jean-Daniel Cryans >Assignee: Teruyoshi Zenmyo >Priority: Minor > Labels: replication > Fix For: 0.94.0 > > Attachments: 3134-v2.txt, 3134-v3.txt, 3134-v4.txt, 3134.txt, > HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch > > > This jira was initially in the scope of HBASE-2201, but was pushed out since > it has low value compared to the required effort (and we wanted to ship > 0.90.0 rather soonish). > We need to design a way to enable/disable replication streams in a > determinate fashion.
[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5689: - Fix Version/s: 0.94.0 > Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen >Priority: Critical > Fix For: 0.94.0 > > Attachments: 5689-simplified.txt, 5689-testcase.patch, > HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the above scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skipped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code}
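The quoted loop bounds each file's maximum possible sequence id by the *name of the next file*, which goes wrong exactly when one server's hlog mixes very old and very new edits, as in the scenario above. The following self-contained toy model (plain Java with hypothetical sequence numbers, not HBase code) reproduces the loss and contrasts it with per-edit filtering:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecoveredEditsSkipDemo {
    public static void main(String[] args) {
        // Hypothetical recovered-edits files keyed by file name (= first sequence id
        // in the file, matching HLog's "-?[0-9]+" naming). Server A's single hlog
        // yielded a file spanning an old edit (seq 1, flushed) and a new one (seq 30).
        TreeMap<Long, long[]> files = new TreeMap<>();
        files.put(1L, new long[] {1, 30});  // f2: from server A's log (r1 and r3)
        files.put(10L, new long[] {10});    // f1: from server B's log (r2)
        long minSeqId = 20;                 // everything <= 20 is already in store files

        // The skip heuristic from HRegion#replayRecoveredEditsIfAny: bound a file's
        // max seq id by the NEXT file's name, and drop the whole file if it is covered.
        List<Long> replayedBySkip = new ArrayList<>();
        boolean checkSafeToSkip = true;
        for (Map.Entry<Long, long[]> e : files.entrySet()) {
            if (checkSafeToSkip) {
                Long higher = files.higherKey(e.getKey());
                long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
                if (maxSeqId <= minSeqId) continue;  // f2 dropped entirely: seq 30 lost
                checkSafeToSkip = false;
            }
            for (long seq : e.getValue()) if (seq > minSeqId) replayedBySkip.add(seq);
        }

        // The safe approach (what removing the check amounts to): never skip a whole
        // file; filter edit by edit against minSeqId instead.
        List<Long> replayedSafely = new ArrayList<>();
        for (long[] edits : files.values())
            for (long seq : edits) if (seq > minSeqId) replayedSafely.add(seq);

        System.out.println(replayedBySkip);  // prints "[]": the heuristic loses the edit
        System.out.println(replayedSafely);  // prints "[30]": per-edit filtering keeps it
    }
}
```

The file names are not a safe upper bound because a region can return to a server whose hlog already holds its old edits, so sequence ids are not monotonic across recovered-edits files; that is why simply removing the skip check (as discussed in the comments) is the conservative fix.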
[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5690: - Fix Version/s: (was: 0.94.1) 0.94.0 > compression does not work in Store.java of 0.94 > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Assignee: honghua zhu >Priority: Critical > Fix For: 0.94.0 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression"
[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5682: - Fix Version/s: (was: 0.94.1) 0.94.0 > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.0 > > Attachments: 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243222#comment-13243222 ] Lars Hofhansl commented on HBASE-5682: -- Thanks Ted. The last question is: Should we do this for all HConnections (not just for unmanaged ones)? It means that HConnection would be able to recover from loss of the ZK connection, and the abort() method would only clear out the ZK trackers and never close or abort the connection. I'd be in favor of that. @Ram and @Jieshan: Since this would be a more robust version of HBASE-5153, could you have a look at this? > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.1 > > Attachments: 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5689: -- Attachment: 5689-simplified.txt Removing the check should be enough. With the attached patch, TestHRegion#testDataCorrectnessReplayingRecoveredEdits passes. > Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen >Priority: Critical > Attachments: 5689-simplified.txt, 5689-testcase.patch, > HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the above scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skipped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code}
[jira] [Assigned] (HBASE-5636) TestTableMapReduce doesn't work properly.
[ https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5636: - Assignee: Takuya Ueshin > TestTableMapReduce doesn't work properly. > - > > Key: HBASE-5636 > URL: https://issues.apache.org/jira/browse/HBASE-5636 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.92.1, 0.94.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Attachments: HBASE-5636-v2.patch, HBASE-5636.patch > > > No map function is called because no test data is put in before the test > starts. > The following three tests are in the same situation: > - org.apache.hadoop.hbase.mapred.TestTableMapReduce > - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper
[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5663: -- Fix Version/s: 0.96.0 0.94.0 > MultithreadedTableMapper doesn't work. > -- > > Key: HBASE-5663 > URL: https://issues.apache.org/jira/browse/HBASE-5663 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.0 >Reporter: Takuya Ueshin > Fix For: 0.94.0, 0.96.0 > > Attachments: 5663+5636.txt, HBASE-5663.patch > > > MapReduce job using MultithreadedTableMapper goes down throwing the following > Exception: > {noformat} > java.io.IOException: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > 
org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getConstructor(Class.java:1657) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241) > ... 8 more > {noformat} > This occurred when the tasks are creating MapRunner threads.
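One classic reason such a reflective constructor lookup throws NoSuchMethodException, and plausibly a factor here (the actual HBase fix may differ), is that Mapper.Context is a non-static inner class: its compiled constructor takes the enclosing Mapper instance as a hidden first parameter, so looking it up by the declared parameter list alone fails. A self-contained illustration:

```java
import java.lang.reflect.Constructor;

public class InnerCtorDemo {
    static class Outer {
        class Inner {        // non-static inner class, like Mapper.Context
            Inner(int x) {}
        }
    }

    public static void main(String[] args) throws Exception {
        // The "obvious" lookup fails: a non-static inner class constructor carries
        // the enclosing instance as an implicit first parameter.
        try {
            Outer.Inner.class.getDeclaredConstructor(int.class);
            System.out.println("found");
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException");
        }
        // The compiled signature includes Outer as the first parameter.
        Constructor<Outer.Inner> c =
            Outer.Inner.class.getDeclaredConstructor(Outer.class, int.class);
        System.out.println(c.getParameterCount());  // prints "2"
    }
}
```

Note also that Class.getConstructor (seen in the quoted trace) only finds public constructors; getDeclaredConstructor is needed for non-public ones such as Mapper.Context's.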
[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5689: -- Priority: Critical (was: Major) Making it critical as it is data loss related. > Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen >Priority: Critical > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the above scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skipped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code}
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243205#comment-13243205 ] Zhihong Yu commented on HBASE-5663: --- {code} Results : Tests run: 541, Failures: 0, Errors: 0, Skipped: 0 ... Results : Tests run: 615, Failures: 0, Errors: 0, Skipped: 2 [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 35:45.595s {code} I will integrate Monday if there is no objection. > MultithreadedTableMapper doesn't work. > -- > > Key: HBASE-5663 > URL: https://issues.apache.org/jira/browse/HBASE-5663 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.0 >Reporter: Takuya Ueshin > Fix For: 0.94.0, 0.96.0 > > Attachments: 5663+5636.txt, HBASE-5663.patch > > > MapReduce job using MultithreadedTableMapper goes down throwing the following > Exception: > {noformat} > java.io.IOException: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getConstructor(Class.java:1657) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241) > ... 8 more > {noformat} > This occurred when the tasks are creating MapRunner threads.
[jira] [Assigned] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5663: - Assignee: Takuya Ueshin > MultithreadedTableMapper doesn't work. > -- > > Key: HBASE-5663 > URL: https://issues.apache.org/jira/browse/HBASE-5663 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 0.94.0, 0.96.0 > > Attachments: 5663+5636.txt, HBASE-5663.patch > > > MapReduce job using MultithreadedTableMapper goes down throwing the following > Exception: > {noformat} > java.io.IOException: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > 
org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getConstructor(Class.java:1657) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241) > ... 8 more > {noformat} > This occurred when the tasks are creating MapRunner threads.
[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243204#comment-13243204 ] ramkrishna.s.vasudevan commented on HBASE-5689: --- @Ted What do you feel Ted? Chunhui's patch also makes sense, but anyway the idea is not to skip any recovered.edits file. So I felt removing the check will be enough. > Skipping RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the above scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skipped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code}
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243191#comment-13243191 ] Takuya Ueshin commented on HBASE-5663: -- Oh, I misread what you wrote. Thank you for your support. > MultithreadedTableMapper doesn't work. > -- > > Key: HBASE-5663 > URL: https://issues.apache.org/jira/browse/HBASE-5663 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.0 >Reporter: Takuya Ueshin > Attachments: 5663+5636.txt, HBASE-5663.patch > > > MapReduce job using MultithreadedTableMapper goes down throwing the following > Exception: > {noformat} > java.io.IOException: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > 
org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getConstructor(Class.java:1657) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.(MultithreadedTableMapper.java:241) > ... 8 more > {noformat} > This occured when the tasks are creating MapRunner threads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243190#comment-13243190 ]

Zhihong Yu commented on HBASE-5663:
-----------------------------------

I ran the following tests based on the combined patch and they passed:
TestMultithreadedTableMapper, TestColumnSeeking, TestTableMapReduce

Looks like Hadoop QA is not working. I am running the test suite on Linux.
[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5663:
------------------------------

Attachment: 5663+5636.txt

Combined patch for HBASE-5663 and HBASE-5636
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243182#comment-13243182 ]

Zhihong Yu commented on HBASE-5663:
-----------------------------------

So the comment in the patch about hadoop versions can be removed. See the error I got when using hadoop 1.0:
https://issues.apache.org/jira/browse/HBASE-5636?focusedCommentId=13243180&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13243180
[jira] [Commented] (HBASE-5636) TestTableMapReduce doesn't work properly.
[ https://issues.apache.org/jira/browse/HBASE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243180#comment-13243180 ]

Zhihong Yu commented on HBASE-5636:
-----------------------------------

Without the patch from HBASE-5663, I saw the following in the test output:
{code}
2012-03-31 07:55:46,743 DEBUG [main] mapreduce.TableInputFormatBase(194): getSplits: split -> 24 -> 192.168.0.17:yyy,
java.io.IOException: java.lang.NoSuchMethodException:
org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
    org.apache.hadoop.mapred.TaskAttemptID,
    org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
    org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
    org.apache.hadoop.hbase.mapreduce.TableOutputCommitter,
    org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
    org.apache.hadoop.hbase.mapreduce.TableSplit)
	at org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
	at org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
{code}

> TestTableMapReduce doesn't work properly.
> -----------------------------------------
>
> Key: HBASE-5636
> URL: https://issues.apache.org/jira/browse/HBASE-5636
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.92.1, 0.94.0
> Reporter: Takuya Ueshin
> Attachments: HBASE-5636-v2.patch, HBASE-5636.patch
>
> No map function is called because no test data are put in before the test starts.
> The following three tests are in the same situation:
> - org.apache.hadoop.hbase.mapred.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243172#comment-13243172 ]

Takuya Ueshin commented on HBASE-5663:
--------------------------------------

No, no MapReduce job using the current MultithreadedTableMapper would work.

When the Mapper starts, it creates some threads, the 'original' Mapper objects to be called by those threads, and the MapperContext objects used by the original mappers. But because the constructor call of MapperContext through reflection is wrong, it throws the NoSuchMethodException.

As I reported in issue HBASE-5636, the trunk version's test case TestMulitthreadedTableMapper has a bug, so please test with the new version of the test attached there.
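The failure mode Takuya describes can be reproduced in miniature: `Class.getConstructor`/`getDeclaredConstructor` match parameter types exactly, so asking for a constructor using the runtime subclasses (like `SubMapRecordReader`) instead of the declared parameter types throws `NoSuchMethodException`. The sketch below is illustrative only; the class names are made up and this is not the HBase code:

```java
import java.lang.reflect.Constructor;

public class ReflectionLookup {
    static class Reader {}
    static class SubReader extends Reader {}

    // Stand-in for Mapper.Context: its constructor declares the base type.
    static class Context {
        Context(Reader r) {}
    }

    // Lookup with the declared parameter type succeeds.
    public static boolean lookupWithExactType() {
        try {
            Constructor<Context> c = Context.class.getDeclaredConstructor(Reader.class);
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Lookup with the runtime subclass fails: reflection does not widen
    // parameter types the way an ordinary constructor call would.
    public static boolean lookupWithSubType() {
        try {
            Constructor<Context> c = Context.class.getDeclaredConstructor(SubReader.class);
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupWithExactType()); // true
        System.out.println(lookupWithSubType());   // false
    }
}
```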
[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

Affects Version/s: 0.94.0

> Skip RecoveredEdits may cause data loss
> ---------------------------------------
>
> Key: HBASE-5689
> URL: https://issues.apache.org/jira/browse/HBASE-5689
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.0
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 5689-testcase.patch, HBASE-5689.patch
>
> Consider the following scenario:
> 1. The region is on server A.
> 2. Put KV (r1->v1) to the region.
> 3. Move the region from server A to server B.
> 4. Put KV (r2->v2) to the region.
> 5. Move the region from server B to server A.
> 6. Put KV (r3->v3) to the region.
> 7. kill -9 server B and start it.
> 8. kill -9 server A and start it.
> 9. Scan the region: we get only two KVs (r1->v1, r2->v2); the third KV (r3->v3) is lost.
>
> Analysing this scenario from the code:
> 1. The edit logs of KV (r1->v1) and KV (r3->v3) are both recorded in the same hlog file on server A.
> 2. When we split server B's hlog file in ServerShutdownHandler, we create one RecoveredEdits file, f1, for the region.
> 3. When we split server A's hlog file in ServerShutdownHandler, we create another RecoveredEdits file, f2, for the region.
> 4. However, RecoveredEdits file f2 is skipped when the region is initialized, in HRegion#replayRecoveredEditsIfAny:
> {code}
> for (Path edits: files) {
>   if (edits == null || !this.fs.exists(edits)) {
>     LOG.warn("Null or non-existent edits file: " + edits);
>     continue;
>   }
>   if (isZeroLengthThenDelete(this.fs, edits)) continue;
>   if (checkSafeToSkip) {
>     Path higher = files.higher(edits);
>     long maxSeqId = Long.MAX_VALUE;
>     if (higher != null) {
>       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
>       String fileName = higher.getName();
>       maxSeqId = Math.abs(Long.parseLong(fileName));
>     }
>     if (maxSeqId <= minSeqId) {
>       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
>           + ", skipped the whole file, path=" + edits;
>       LOG.debug(msg);
>       continue;
>     } else {
>       checkSafeToSkip = false;
>     }
>   }
> {code}
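The skip check quoted above can be modeled with just the file names to see why f2 is dropped. In this simplified sketch (assumed behavior, not the actual HRegion code) each recovered-edits file is represented by its name, which is its first sequence id; the loop assumes a file's largest seqid is bounded by the *name* of the next file, which breaks once a region has bounced between servers and one file holds both very old and very new edits:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SkipRecoveredEdits {
    // Returns the names of the files the loop would actually replay.
    public static List<Long> filesToReplay(TreeSet<Long> fileNames, long minSeqId) {
        List<Long> replay = new ArrayList<>();
        boolean checkSafeToSkip = true;
        for (long name : fileNames) {
            if (checkSafeToSkip) {
                Long higher = fileNames.higher(name);
                // Assumed max seqid of this file = name of the next file.
                long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
                if (maxSeqId <= minSeqId) {
                    continue; // whole file skipped
                }
                checkSafeToSkip = false;
            }
            replay.add(name);
        }
        return replay;
    }

    public static void main(String[] args) {
        // f2 is named 10 (first seqid = r1) but also holds edit 30 (r3);
        // f1 is named 20 (r2). Everything up to seqid 20 was flushed,
        // so minSeqId = 20.
        TreeSet<Long> files = new TreeSet<>(Arrays.asList(10L, 20L));
        // f2 (name 10) is skipped because f1's name (20) <= minSeqId,
        // even though f2 still contains the unflushed edit 30 -> data loss.
        System.out.println(filesToReplay(files, 20L)); // [20]
        // With a lower minSeqId nothing is skipped.
        System.out.println(filesToReplay(files, 5L));  // [10, 20]
    }
}
```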
[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5689:
------------------------------

Summary: Skipping RecoveredEdits may cause data loss (was: Skip RecoveredEdits may cause data loss)
[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243154#comment-13243154 ]

Zhihong Yu commented on HBASE-5689:
-----------------------------------

From the above analysis, it looks like this bug was caused by the optimization made in HBASE-4797.
[jira] [Issue Comment Edited] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243057#comment-13243057 ]

Zhihong Yu edited comment on HBASE-5689 at 3/31/12 2:00 PM:
------------------------------------------------------------

@Chunhui
Good test case. Yes, I am able to reproduce the problem :) Just trying to understand more about what is happening there.

was (Author: ram_krish):
@Chunhui
Good test case. Yes able to reproduce the problem :).. Just try to understand more on what is happening there.
[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243150#comment-13243150 ]

Zhihong Yu commented on HBASE-5663:
-----------------------------------

We could check that the Exception was caused by a NoSuchMethodException. Since the retry is in the MapRunner ctor, the above may not be needed.
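Checking whether an exception "was caused by" NoSuchMethodException, as suggested above, amounts to walking the Throwable cause chain. A small sketch (the helper name is made up; only `Throwable#getCause` is real API):

```java
import java.io.IOException;

public class CauseCheck {
    // Walks the cause chain of t and reports whether any link is of the
    // given type, e.g. a NoSuchMethodException wrapped in an IOException.
    public static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable e = new IOException(new NoSuchMethodException("Mapper$Context.<init>"));
        System.out.println(causedBy(e, NoSuchMethodException.class));  // true
        System.out.println(causedBy(e, IllegalStateException.class));  // false
    }
}
```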
[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5663:
------------------------------

Hadoop Flags: Reviewed
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
[ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243147#comment-13243147 ] Zhihong Yu commented on HBASE-5635: --- Then we should log a message every X minutes saying what we are waiting for, so that the user doesn't have to use jstack. > If getTaskList() returns null splitlogWorker is down. It wont serve any > requests. > -- > > Key: HBASE-5635 > URL: https://issues.apache.org/jira/browse/HBASE-5635 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 0.92.1 >Reporter: Kristam Subba Swathi > Attachments: HBASE-5635.1.patch, HBASE-5635.patch > > > During the hlog split operation, if all the ZooKeeper servers are down, then the > paths will be returned as null and the splitworker thread will exit. > Now this regionserver will not be able to acquire any other tasks since the > splitworker thread has exited. > Please find the attached code for more details > {code} > private List<String> getTaskList() { > for (int i = 0; i < zkretries; i++) { > try { > return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher, > this.watcher.splitLogZNode)); > } catch (KeeperException e) { > LOG.warn("Could not get children of znode " + > this.watcher.splitLogZNode, e); > try { > Thread.sleep(1000); > } catch (InterruptedException e1) { > LOG.warn("Interrupted while trying to get task list ...", e1); > Thread.currentThread().interrupt(); > return null; > } > } > } > {code} > in the org.apache.hadoop.hbase.regionserver.SplitLogWorker >
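The retry shape suggested in the comment above can be sketched as follows. This is an illustrative sketch only, not the actual SplitLogWorker code: the `Supplier` stands in for `ZKUtil.listChildrenAndWatchForNewChildren`, a `RuntimeException` stands in for a `KeeperException`, and the class and constant names are hypothetical. The point is that the worker retries until interrupted instead of giving up after a fixed count, and logs periodic progress so an operator doesn't need jstack to see what it is blocked on.

```java
import java.util.List;
import java.util.function.Supplier;

public class TaskListRetrySketch {
  static final int LOG_EVERY_N_ATTEMPTS = 5;

  // Keep retrying until we get a task list or the thread is interrupted,
  // rather than returning null after a fixed number of retries (which is
  // what lets the worker thread die in the issue above).
  public static List<String> getTaskList(Supplier<List<String>> zkFetch,
                                         long sleepMillis) throws InterruptedException {
    int attempt = 0;
    while (!Thread.currentThread().isInterrupted()) {
      try {
        List<String> children = zkFetch.get();
        if (children != null) {
          return children;
        }
      } catch (RuntimeException e) {
        // Periodic progress message so the user can see what we wait for.
        if (attempt % LOG_EVERY_N_ATTEMPTS == 0) {
          System.err.println("Still waiting for children of the splitlog znode, attempt " + attempt);
        }
      }
      attempt++;
      Thread.sleep(sleepMillis);
    }
    return null; // reached only if the thread is interrupted
  }
}
```

The trade-off discussed later in this thread applies here too: an unbounded loop keeps the worker alive through arbitrarily long ZK outages, at the cost of never timing out on its own.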
[jira] [Assigned] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5690: - Assignee: honghua zhu > compression does not work in Store.java of 0.94 > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Assignee: honghua zhu >Priority: Critical > Fix For: 0.94.1 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5690) compression does not work in Store.java of 0.94
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5690: -- Hadoop Flags: Reviewed Summary: compression does not work in Store.java of 0.94 (was: compression unavailable) > compression does not work in Store.java of 0.94 > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Priority: Critical > Fix For: 0.94.1 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5690) compression unavailable
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243142#comment-13243142 ] Zhihong Yu commented on HBASE-5690: --- +1 on patch. HBASE-5605 fixed the same problem in trunk. > compression unavailable > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Priority: Critical > Fix For: 0.94.1 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression"
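The one-line description above ("createWriterInTmp method missing compression") is an instance of a common bug shape: a factory accepts a configured option but never forwards it to the object it builds. The sketch below is hypothetical, not the actual Store.java code; the enum, class, and method names are stand-ins chosen to mirror the issue.

```java
public class CompressionPlumbingSketch {
  enum Compression { NONE, GZ, LZO }

  static class Writer {
    final Compression compression;
    Writer(Compression compression) { this.compression = compression; }
  }

  // Buggy shape: the configured compression is accepted but dropped, so
  // every file is silently written uncompressed.
  static Writer createWriterInTmpBuggy(Compression configured) {
    return new Writer(Compression.NONE);
  }

  // Fixed shape: the configured value is threaded through to the writer.
  static Writer createWriterInTmpFixed(Compression configured) {
    return new Writer(configured);
  }
}
```

Bugs of this shape are easy to miss because nothing fails: the data is simply larger than expected, which is why the issue was only caught by inspection against HBASE-5442.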
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243141#comment-13243141 ] Zhihong Yu commented on HBASE-5682: --- +1 on patch. > Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 > only) > -- > > Key: HBASE-5682 > URL: https://issues.apache.org/jira/browse/HBASE-5682 > Project: HBase > Issue Type: Improvement > Components: client >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.1 > > Attachments: 5682-v2.txt, 5682.txt > > > Just realized that without this HBASE-4805 is broken. > I.e. there's no point keeping a persistent HConnection around if it can be > rendered permanently unusable if the ZK connection is lost temporarily. > Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to > backport)
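The idea in the issue above can be sketched generically: a long-lived handle should rebuild its underlying ZooKeeper session after expiry instead of becoming permanently unusable. This is a hedged sketch, not HConnectionImplementation itself; the generic "session" type, its factory, and the method names are hypothetical stand-ins.

```java
import java.util.function.Supplier;

public class ReconnectingHandleSketch<S> {
  private final Supplier<S> sessionFactory;
  private S session;
  private boolean expired = true; // no session created yet

  public ReconnectingHandleSketch(Supplier<S> sessionFactory) {
    this.sessionFactory = sessionFactory;
  }

  // Would be called by a watcher when a session-Expired event arrives.
  public synchronized void markExpired() {
    expired = true;
  }

  // Recreates the underlying session lazily instead of aborting, so
  // callers keep using the same handle across temporary ZK losses.
  public synchronized S session() {
    if (expired) {
      session = sessionFactory.get();
      expired = false;
    }
    return session;
  }
}
```

The design point matches the issue description: without something like `markExpired()`/lazy recreation, a temporary ZK outage turns a persistent connection object into a permanently dead one.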
[jira] [Updated] (HBASE-5690) compression unavailable
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] honghua zhu updated HBASE-5690: --- Fix Version/s: (was: 0.94.0) 0.94.1 > compression unavailable > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Priority: Critical > Fix For: 0.94.1 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243109#comment-13243109 ] ramkrishna.s.vasudevan commented on HBASE-5689: --- I just want to understand whether removing the 'continue' in {code} if (maxSeqId <= minSeqId) { String msg = "Maximum possible sequenceid for this log is " + maxSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; {code} should solve the problem. Inside replayRecoveredEdits we have this code {code} // Now, figure if we should skip this edit. if (key.getLogSeqNum() <= currentEditSeqId) { skippedEdits++; continue; } {code} which will anyway skip unnecessary edits. I tried commenting out the 'continue' in the code, and the test case that you gave passed. That is my thought; your comments are welcome. > Skip RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the upper scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skiped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code} > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243106#comment-13243106 ] ramkrishna.s.vasudevan commented on HBASE-5689: --- @Chunhui Thanks for the patch. Let me share my analysis: Suppose the current seq for RS1 is 4. When the first row KV was inserted, KV(r1->v1), the current seq is 5. On moving the region to RS2 the store gets flushed, and when RS2 opens the region the next seq it can use will be 6. Now we make the next KV entry, KV(r2->v2); the current seq for this entry is 6. Now when the region is moved again to RS1, another store file is created by RS2. When RS1 opens the region, the seq number it can use will be 7. We now add an entry KV(r3->v3) again in RS1, so it will have 7 in it (in the WAL). Kill RS2 first. This will create a recovered.edits file with name 06. Kill RS1. This will create a recovered.edits file with name 5. Now when the region is finally opened in a new RS, I will have the 2 store files and the max seq id from them will be 6. The recovered.edits will also give me 6 as the highest seq, so this check skips the file: {code} if (maxSeqId <= minSeqId) { String msg = "Maximum possible sequenceid for this log is " + maxSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; {code} Correct me if my analysis is wrong. 
> Skip RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the upper scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skiped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code} > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
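The walkthrough above can be reproduced numerically. The sketch below is an illustration of the skip decision from the quoted loop, not the HRegion code: file names are plain longs instead of znode paths, and everything else is stripped away. The two recovered-edits files are named after their MINIMUM edit seqid (file 5, which also holds edit 7, and file 6), while the store files' max flushed seqid is 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RecoveredEditsSkipSketch {
  // Mirrors the quoted loop: a file is skipped wholesale when the NEXT
  // file's name (its neighbour's minimum seqid) is <= the store files'
  // max flushed seqid (minSeqId here, as in the original code).
  public static List<Long> filesReplayed(TreeSet<Long> filesByMinSeqId, long minSeqId) {
    List<Long> replayed = new ArrayList<>();
    boolean checkSafeToSkip = true;
    for (Long edits : filesByMinSeqId) {
      if (checkSafeToSkip) {
        Long higher = filesByMinSeqId.higher(edits);
        long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
        if (maxSeqId <= minSeqId) {
          continue; // whole file skipped -- where edit 7 is lost
        }
        checkSafeToSkip = false;
      }
      replayed.add(edits);
    }
    return replayed;
  }
}
```

With files {5, 6} and a flushed seqid of 6, file 5 is skipped even though it contains the unflushed edit 7 -- exactly the data loss in the scenario. Naming files by their maximum seqid (the patch's direction) removes this ambiguity, because the name then truly bounds what the file can contain.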
[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions
[ https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243092#comment-13243092 ] Uma Maheswara Rao G commented on HBASE-5536: Yes, Ram. Typo :-). Here is what I meant: We should not access any non-exposed APIs from HDFS, at least from this version. We should raise with HDFS the usage of, and need for, those APIs in HBase, and get access through some public interface. That way, even though HDFS changes internal APIs, the public interface would remain the same, so HBase and other dependent users need not end up with reflection for every compatibility change. > Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no > longer work on older versions > -- > > Key: HBASE-5536 > URL: https://issues.apache.org/jira/browse/HBASE-5536 > Project: HBase > Issue Type: Task >Reporter: stack >Priority: Blocker > Fix For: 0.96.0 > > > Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 > should be fine? See > http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96&subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+
[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
[ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243087#comment-13243087 ] ramkrishna.s.vasudevan commented on HBASE-5635: --- bq. Do we need a timeout for the newly added loop? If we have a timeout and the ZK connection takes a little longer than that timeout, how do we keep the worker thread alive? That is the reason an infinite loop was tried. > If getTaskList() returns null splitlogWorker is down. It wont serve any > requests. > -- > > Key: HBASE-5635 > URL: https://issues.apache.org/jira/browse/HBASE-5635 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 0.92.1 >Reporter: Kristam Subba Swathi > Attachments: HBASE-5635.1.patch, HBASE-5635.patch > > > During the hlog split operation, if all the ZooKeeper servers are down, then the > paths will be returned as null and the splitworker thread will exit. > Now this regionserver will not be able to acquire any other tasks since the > splitworker thread has exited. > Please find the attached code for more details > {code} > private List<String> getTaskList() { > for (int i = 0; i < zkretries; i++) { > try { > return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher, > this.watcher.splitLogZNode)); > } catch (KeeperException e) { > LOG.warn("Could not get children of znode " + > this.watcher.splitLogZNode, e); > try { > Thread.sleep(1000); > } catch (InterruptedException e1) { > LOG.warn("Interrupted while trying to get task list ...", e1); > Thread.currentThread().interrupt(); > return null; > } > } > } > {code} > in the org.apache.hadoop.hbase.regionserver.SplitLogWorker >
[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions
[ https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243085#comment-13243085 ] ramkrishna.s.vasudevan commented on HBASE-5536: --- bq.So, Hbase/Other dependant users need need end up in reflections for every version compatibility I think you meant need 'not' > Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no > longer work on older versions > -- > > Key: HBASE-5536 > URL: https://issues.apache.org/jira/browse/HBASE-5536 > Project: HBase > Issue Type: Task >Reporter: stack >Priority: Blocker > Fix For: 0.96.0 > > > Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 > should be fine? See > http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96&subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243080#comment-13243080 ] xufeng commented on HBASE-5673: --- @Stack @Ted I analyzed the problem with my patch. This is the result: I wrapped all exceptions in IOException, but this IOException cannot be handled in CatalogTracker's private HRegionInterface getCachedConnection(ServerName sn), so the master will abort and the test cases will fail. I will submit an updated patch together with the test results. > The OOM problem of IPC client call cause all handle block > -- > > Key: HBASE-5673 > URL: https://issues.apache.org/jira/browse/HBASE-5673 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6 > Environment: 0.90.6 >Reporter: xufeng >Assignee: xufeng > Fix For: 0.90.7, 0.92.2, 0.94.1 > > Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch > > > if HBaseClient meets an "unable to create new native thread" exception, the call > will never complete because it is lost in the calls queue.
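The failure mode in this issue (a call lost in the calls queue after an OOM) can be sketched with a simplified call table. This is not the real HBaseClient; the class and method names are hypothetical. The point is that if starting the connection thread throws an Error such as "unable to create new native thread" and nothing completes the Call, the handler waiting on it blocks forever; catching Throwable, removing the call from the table, and completing it with a wrapped IOException lets the waiter observe the failure instead.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

public class CallCompletionSketch {
  public static class Call {
    public IOException error;
    public boolean done;
    synchronized void setException(IOException e) {
      error = e;
      done = true;
      notifyAll(); // wake any thread blocked waiting on this call
    }
  }

  public final ConcurrentHashMap<Integer, Call> calls = new ConcurrentHashMap<>();

  public Call sendCall(int id, Runnable startConnectionThread) {
    Call call = new Call();
    calls.put(id, call);
    try {
      startConnectionThread.run();
    } catch (Throwable t) {
      // Without this block the call would stay in `calls` forever and
      // the handler blocked on it would never return.
      calls.remove(id);
      call.setException(new IOException("call could not be started", t));
    }
    return call;
  }
}
```

As the comment above notes, how the resulting IOException is handled upstream matters too: if a caller like CatalogTracker treats it as fatal, wrapping alone can trade a hung handler for an aborted master.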
[jira] [Updated] (HBASE-5663) MultithreadedTableMapper doesn't work.
[ https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated HBASE-5663: - Attachment: HBASE-5663.patch I attached a patch file. Could someone please review and refactor it? I think this is a little dirty code but I couldn't find a better way. > MultithreadedTableMapper doesn't work. > -- > > Key: HBASE-5663 > URL: https://issues.apache.org/jira/browse/HBASE-5663 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.0 >Reporter: Takuya Ueshin > Attachments: HBASE-5663.patch > > > MapReduce job using MultithreadedTableMapper goes down throwing the following > Exception: > {noformat} > java.io.IOException: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration, > org.apache.hadoop.mapred.TaskAttemptID, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter, > org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter, > org.apache.hadoop.hbase.mapreduce.TableSplit) > at java.lang.Class.getConstructor0(Class.java:2706) > at java.lang.Class.getConstructor(Class.java:1657) > at > org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241) > ... 8 more > {noformat} > This occurred when the tasks are creating MapRunner threads.
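The NoSuchMethodException in the trace above comes from looking up a constructor reflectively with an exact parameter list: `Class.getConstructor` only returns PUBLIC constructors, so a non-public framework constructor is invisible to it. The sketch below demonstrates the mechanism with a hypothetical `Target` class (it is not the Mapper.Context lookup itself, and not the attached patch); `getDeclaredConstructor` plus `setAccessible` is the usual fallback when wrapping framework internals this way.

```java
import java.lang.reflect.Constructor;

public class ReflectiveCtorSketch {
  static class Target {
    final String s;
    Target(String s, int n) { this.s = s; } // package-private constructor
  }

  public static Object instantiate(String s, int n) throws Exception {
    try {
      // Fails: the two-arg constructor exists but is not public, so
      // getConstructor throws NoSuchMethodException (as in the trace).
      Constructor<Target> c = Target.class.getConstructor(String.class, int.class);
      return c.newInstance(s, n);
    } catch (NoSuchMethodException e) {
      // Fallback: reach the non-public constructor explicitly.
      Constructor<Target> c = Target.class.getDeclaredConstructor(String.class, int.class);
      c.setAccessible(true);
      return c.newInstance(s, n);
    }
  }
}
```

The deeper fragility, which this thread's patch has to work around, is that any reflective lookup of an exact constructor signature breaks silently when the framework changes that signature between versions.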
[jira] [Updated] (HBASE-5690) compression unavailable
[ https://issues.apache.org/jira/browse/HBASE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] honghua zhu updated HBASE-5690: --- Attachment: Store.patch > compression unavailable > --- > > Key: HBASE-5690 > URL: https://issues.apache.org/jira/browse/HBASE-5690 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.0 > Environment: all >Reporter: honghua zhu >Priority: Critical > Fix For: 0.94.0 > > Attachments: Store.patch > > > HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5690) compression unavailable
compression unavailable --- Key: HBASE-5690 URL: https://issues.apache.org/jira/browse/HBASE-5690 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Environment: all Reporter: honghua zhu Priority: Critical Fix For: 0.94.0 HBASE-5442 The store.createWriterInTmp method missing "compression" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5689: Attachment: HBASE-5689.patch In the patch, I make the region's MaximumEditLogSeqNum in the RecoveredEdit file as the file name. > Skip RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the upper scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skiped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code} > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5689) Skip RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243070#comment-13243070 ] chunhui shen commented on HBASE-5689: - rather than the current MinimumEditLogSeqNum as the file name. > Skip RecoveredEdits may cause data loss > --- > > Key: HBASE-5689 > URL: https://issues.apache.org/jira/browse/HBASE-5689 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: 5689-testcase.patch, HBASE-5689.patch > > > Let's see the following scenario: > 1.Region is on the server A > 2.put KV(r1->v1) to the region > 3.move region from server A to server B > 4.put KV(r2->v2) to the region > 5.move region from server B to server A > 6.put KV(r3->v3) to the region > 7.kill -9 server B and start it > 8.kill -9 server A and start it > 9.scan the region, we could only get two KV(r1->v1,r2->v2), the third > KV(r3->v3) is lost. > Let's analyse the upper scenario from the code: > 1.the edit logs of KV(r1->v1) and KV(r3->v3) are both recorded in the same > hlog file on server A. > 2.when we split server B's hlog file in the process of ServerShutdownHandler, > we create one RecoveredEdits file f1 for the region. > 2.when we split server A's hlog file in the process of ServerShutdownHandler, > we create another RecoveredEdits file f2 for the region. 
> 3.however, RecoveredEdits file f2 will be skiped when initializing region > HRegion#replayRecoveredEditsIfAny > {code} > for (Path edits: files) { > if (edits == null || !this.fs.exists(edits)) { > LOG.warn("Null or non-existent edits file: " + edits); > continue; > } > if (isZeroLengthThenDelete(this.fs, edits)) continue; > if (checkSafeToSkip) { > Path higher = files.higher(edits); > long maxSeqId = Long.MAX_VALUE; > if (higher != null) { > // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+" > String fileName = higher.getName(); > maxSeqId = Math.abs(Long.parseLong(fileName)); > } > if (maxSeqId <= minSeqId) { > String msg = "Maximum possible sequenceid for this log is " + > maxSeqId > + ", skipped the whole file, path=" + edits; > LOG.debug(msg); > continue; > } else { > checkSafeToSkip = false; > } > } > {code} > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira