Re: [DISCUSS] Separate Git Repository for HBCK2
On Tue, Jul 24, 2018 at 10:27 AM Andrew Purtell wrote:

> If we do this can we also move out hbck version 1? It would be really weird in my opinion to have v2 in a separate repo but v1 shipping with the 1.x releases. That would be a source of understandable confusion.

My sense is that hbck1 is not externalizable; we'd not be able to move it out of core because it does dirty tricks all over the shop. But let's see...

S

> I believe our compatibility guidelines allow us to upgrade interface annotations from Private to LP or Public and from LP to Public. These are not changes that impact source or binary compatibility. They only change the promises we make going forward about their stability. I believe we can allow these in new minors, so we could potentially move hbck out in a 1.5.0.
>
> On Mon, Jul 23, 2018 at 4:46 PM Stack wrote:
>
> > On Thu, Jul 19, 2018 at 2:09 PM Umesh Agashe wrote:
> >
> > > Hi,
> > >
> > > I've had the opportunity to talk about HBCK2 with a few of you. One of the suggestions is to have a separate git repository for HBCK2. Let's discuss it.
> > >
> > > In the past when bugs were found in hbck, there was no easy way to release a patched version of just hbck (without patching HBase). If HBCK2 has a separate git repo, HBCK2 versions will not be tightly coupled to HBase versions. Fixing and releasing hbck2 may not require patching HBase. Though the tight coupling will be somewhat loosened, HBCK2 will still depend on HBase APIs/code. Caution will be required going forward regarding compatibility.
> > >
> > > What do you all think?
> >
> > I think this is the way to go.
> >
> > We'd make a new hbase-hbck2 repo as we did for hbase-thirdparty?
> >
> > We'd use the hbase JIRA for hbase-hbck2 issues?
> >
> > We'd make hbase-hbck2 releases on occasion that the PMC voted on?
> >
> > Sounds great!
> >
> > St.Ack
>
> > > Thanks,
> > > Umesh
> > >
> > > JIRA: https://issues.apache.org/jira/browse/HBASE-19121.
> > > Doc: https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
> - A23, Crosstalk
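Andrew's point about promoting interface annotations (Private to LimitedPrivate to Public) refers to HBase's audience annotations. As a self-contained sketch (the real annotations live in the hbase-annotations module, not here), this shows how such a marker communicates a compatibility promise without affecting source or binary compatibility; the `MetaFixupHelper` class and the "HBCK" audience string are hypothetical examples:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Self-contained stand-in for HBase's audience annotations (the real ones are
// InterfaceAudience.Public / LimitedPrivate / Private in the hbase-annotations
// module). Promoting a class from Private to LimitedPrivate to Public changes
// only the stability promise, not source or binary compatibility.
public class AudienceSketch {
  @Retention(RetentionPolicy.RUNTIME) @interface Public {}
  @Retention(RetentionPolicy.RUNTIME) @interface LimitedPrivate { String value(); }
  @Retention(RetentionPolicy.RUNTIME) @interface Private {}

  // Hypothetical example: a helper promoted so external hbck tooling may rely on it.
  @LimitedPrivate("HBCK")
  static class MetaFixupHelper {
    static String describe() { return "callable by hbck tooling; may change between minors"; }
  }

  public static void main(String[] args) {
    // The annotation is discoverable at runtime but has no effect on linkage.
    LimitedPrivate lp = MetaFixupHelper.class.getAnnotation(LimitedPrivate.class);
    System.out.println(lp.value()); // HBCK
  }
}
```

The point of such a promotion in a new minor release is exactly what Andrew describes: existing callers keep compiling and linking; only the forward-looking stability contract changes.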
Re: [DISCUSS] Separate Git Repository for HBCK2
On Tue, Jul 24, 2018 at 8:53 AM Josh Elser wrote:

> (-cc user as this is getting purely into code development topics)
>
> First off, thanks for working on an hbck2, Umesh!
>
> I like the idea of having a separate repository for tracking HBCK and the flexibility it gives us for making releases at a cadence of our choosing.
>
> There are two worries that come to mind immediately:
>
> * How often does HBCK make decisions on how to implement a correction based on some known functionality (e.g. a bug) in specific version(s) of HBase? Concretely, would HBCK need to make corrections to an HBase installation that are specific to a subset of HBase 2.x.y versions and may not be valid for other 2.x.y versions?

hbck2 should be able to do this -- execute a fix ONLY if the version matches. I added your suggestion to Umesh's attached doc.

S

> * How often does HBCK need to re-use methods and constants from code in hbase-common, hbase-server, etc.?
>   - Related: Is it a goal to firm up API stability around this shared code, or are you planning to just copy needed code to the HBCK2 repo? I think you are saying that this *is* a goal -- could/should we introduce some new level of InterfaceAudience to assert that we don't inadvertently break HBCK2?
>
> Thanks!
>
> On 7/19/18 5:09 PM, Umesh Agashe wrote:
> > Hi,
> >
> > I've had the opportunity to talk about HBCK2 with a few of you. One of the suggestions is to have a separate git repository for HBCK2. Let's discuss it.
> >
> > In the past when bugs were found in hbck, there was no easy way to release a patched version of just hbck (without patching HBase). If HBCK2 has a separate git repo, HBCK2 versions will not be tightly coupled to HBase versions. Fixing and releasing hbck2 may not require patching HBase. Though the tight coupling will be somewhat loosened, HBCK2 will still depend on HBase APIs/code. Caution will be required going forward regarding compatibility.
> >
> > What do you all think?
> >
> > Thanks,
> > Umesh
> >
> > JIRA: https://issues.apache.org/jira/browse/HBASE-19121.
> > Doc: https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing
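Stack's point above -- execute a fix ONLY if the version matches -- can be sketched as a simple guard around each repair. This is an illustrative stand-in, not actual hbck2 code; the version range and method names are hypothetical:

```java
import java.util.Arrays;

// Illustrative sketch (not actual hbck2 code): gate a repair so it only runs
// against the HBase versions it is known to be valid for.
public class VersionGatedFix {
  // Parse "2.0.1" -> [2, 0, 1] for numeric comparison.
  static int[] parse(String version) {
    return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  // Standard component-wise version comparison; missing components count as 0.
  static int compare(String a, String b) {
    int[] x = parse(a), y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) return Integer.compare(xi, yi);
    }
    return 0;
  }

  // True only when clusterVersion falls within [minInclusive, maxInclusive].
  static boolean fixApplies(String clusterVersion, String minInclusive, String maxInclusive) {
    return compare(clusterVersion, minInclusive) >= 0
        && compare(clusterVersion, maxInclusive) <= 0;
  }

  public static void main(String[] args) {
    // A fix valid only for a hypothetical 2.0.0 through 2.0.2 window:
    System.out.println(fixApplies("2.0.1", "2.0.0", "2.0.2")); // true
    System.out.println(fixApplies("2.1.0", "2.0.0", "2.0.2")); // false
  }
}
```

Numeric comparison matters here: a naive string compare would order "2.0.10" before "2.0.9", which is exactly the kind of mistake a version-gated fix cannot afford.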
Re: [DISCUSS] Kafka Connection, HBASE-15320
On Tue, Jul 24, 2018 at 10:01 PM Misty Linville wrote:

> I like the idea of a separate connectors repo/release vehicle, but I'm a little concerned about the need to release all together to update just one of the connectors. How would that work? What kind of compatibility guarantees are we signing up for?

I hate responses that begin "Good question" -- so fawning -- but, ahem, good question Misty (in the literal, not flattering, sense).

I think hbase-connectors will be like hbase-thirdparty. The latter includes netty, pb, guava and a few other bits and pieces, so yeah, sometimes a netty upgrade or an improvement on our patch to pb will require us releasing all though we are fixing one lib only. Usually, if bothering to make a release, we'll check for fixes or updates we can do in the other bundled components.

On the rate of releases, I foresee a flurry of activity around launch as we fill in missing bits and address critical bug fixes, but then it will settle down to be boring, with just the occasional update. Thrift and REST have been stable for a good while now (not saying this is a good thing). Our Sean just suggested moving mapreduce to connectors too -- an interesting idea -- and it has been stable too (at least until recently with the shading work). We should talk about the Spark connector when it comes time. It might not be as stable as the others.

On the compatibility guarantees, we'll semver it, so if there is an incompatible change in a connector, or if the connectors have to change to match a new version of hbase, we'll make sure the hbase-connector version number is changed appropriately. On the backend, what Mike says: connectors use HBase Public APIs (else they can't be moved to the hbase-connector repo).

S

> On Tue, Jul 24, 2018, 9:41 PM Stack wrote:
>
> > Grand. I filed https://issues.apache.org/jira/browse/HBASE-20934. Let me have a go at making the easy one work first (the kafka proxy). Lets see how it goes. I'll report back here.
> >
> > S
> >
> > On Tue, Jul 24, 2018 at 2:43 PM Sean Busbey wrote:
> >
> > > Key functionality for the project's adoption should be in the project. Please do not suggest we donate things to Bahir.
> > >
> > > I apologize if this is brisk. I have had previous negative experiences with folks that span our communities trying to move work I spent a lot of time contributing to within HBase over to Bahir in an attempt to bypass an agreed upon standard of quality.
> > >
> > > On Tue, Jul 24, 2018 at 3:38 PM, Artem Ervits wrote:
> > > > Why not just donate the connector to http://bahir.apache.org/ ?
> > > >
> > > > On Tue, Jul 24, 2018, 12:51 PM Lars Francke wrote:
> > > >
> > > >> I'd love to have the Kafka Connector included.
> > > >>
> > > >> @Mike thanks so much for the contribution (and your planned ones)
> > > >>
> > > >> I'm +1 on adding it to the core but I'm also +1 on having a separate repository under Apache governance
> > > >>
> > > >> On Tue, Jul 24, 2018 at 6:01 PM, Josh Elser wrote:
> > > >>
> > > >> > +1 to the great point by Duo about use of non-IA.Public classes
> > > >> >
> > > >> > +1 for Apache for the governance (although, I wouldn't care if we use Github PRs to try to encourage more folks to contribute), a repo with the theme of "connectors" (to include Thrift, REST, and the like). Spark too -- I think we had suggested that prior, but it could be a mental invention of mine..
> > > >> >
> > > >> > On 7/24/18 10:16 AM, Hbase Janitor wrote:
> > > >> >
> > > >> >> Hi everyone,
> > > >> >>
> > > >> >> I'm the author of the patch. A separate repo for all the connectors is a great idea! I can make whatever changes necessary to the patch to help.
> > > >> >>
> > > >> >> I have several other integration type projects like this planned.
> > > >> >>
> > > >> >> Mike
> > > >> >>
> > > >> >> On Tue, Jul 24, 2018, 00:03 Mike Drob wrote:
> > > >> >>
> > > >> >>> I would be ok with all of the connectors in a single repo. Doing a repo per connector seems like a large amount of overhead work.
> > > >> >>>
> > > >> >>> On Mon, Jul 23, 2018, 9:12 PM Clay B. wrote:
> > > >> >>>
> > > >> >>>> [Non-binding]
> > > >> >>>>
> > > >> >>>> I am all for the Kafka Connect(er) as indeed it makes HBase "more relevant" and generates buzz to help me sell HBase adoption in my endeavors.
> > > >> >>>>
> > > >> >>>> Also, I would like to see a connectors repo a lot as I would expect it can make the HBase source and releases more obvious in what is changing. Not to distract from Kafka, but Spark has in the past been a hang-up and seems a good
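Stack's compatibility answer above -- "we'll semver it" -- implies a mechanical rule for the bundled connectors: the kind of change dictates which component of the version must bump, even if only one connector changed. A minimal sketch of that rule (illustrative only, not project release tooling):

```java
// Illustrative sketch (not project tooling): under semantic versioning, the
// kind of change dictates which component of MAJOR.MINOR.PATCH must bump.
public class SemverBump {
  enum Change { INCOMPATIBLE_API, COMPATIBLE_FEATURE, BUG_FIX }

  static String nextVersion(String current, Change change) {
    String[] p = current.split("\\.");
    int major = Integer.parseInt(p[0]);
    int minor = Integer.parseInt(p[1]);
    int patch = Integer.parseInt(p[2]);
    switch (change) {
      case INCOMPATIBLE_API:   return (major + 1) + ".0.0";
      case COMPATIBLE_FEATURE: return major + "." + (minor + 1) + ".0";
      default:                 return major + "." + minor + "." + (patch + 1);
    }
  }

  public static void main(String[] args) {
    // An incompatible change in one bundled connector forces a major bump for
    // the whole hbase-connectors release, even if other connectors are untouched.
    System.out.println(nextVersion("1.0.3", Change.INCOMPATIBLE_API)); // 2.0.0
    System.out.println(nextVersion("1.0.3", Change.BUG_FIX));          // 1.0.4
  }
}
```

This is why bundling several connectors in one repo carries the cost Misty asks about: the release version advertises the most disruptive change in the bundle.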
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
On Wed, Jul 25, 2018 at 11:55 AM Josh Elser wrote:

> ...
> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.

No need for apologies. There was healthy back and forth. You read the feedback and took it on board. (See below.)

> My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.

Go for it. Would the branch have WAL API changes only, or would it include Ratis WAL dev? (If the latter, would that be better done over on the Ratis project?)

S

> Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)
> >
> > Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief "refresher" on what some of the big goals for this effort are. Please check it out at your leisure and let me know what you think. Would like to start getting some fingers behind this all and pump out some code :)
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
> >
> > - Josh
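The tl;dr of the thread above -- revisit the WAL API so the current HDFS-backed implementation and a Ratis-backed one can both plug in -- can be sketched as a narrow interface with swappable implementations. This is a hypothetical shape for illustration only; it is not HBase's actual WAL interface, and the names below are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pluggable WAL API (not HBase's real interface):
// callers append edits and sync; the backing store (HDFS today, a Ratis-backed
// LogService tomorrow) hides behind the interface.
public class WalSketch {
  interface WriteAheadLog {
    long append(byte[] edit);  // returns a sequence id for the edit
    void sync();               // durably persist everything appended so far
  }

  // Stand-in implementation backed by memory; a Ratis-backed impl would
  // instead replicate each append through a RAFT quorum before acking.
  static class InMemoryWal implements WriteAheadLog {
    private final List<byte[]> edits = new ArrayList<>();
    private int synced = 0;

    @Override public synchronized long append(byte[] edit) {
      edits.add(edit);
      return edits.size() - 1;  // sequence ids are monotonically increasing
    }

    @Override public synchronized void sync() {
      synced = edits.size();    // a real impl would flush/fsync here
    }

    synchronized int durableEditCount() { return synced; }
  }

  public static void main(String[] args) {
    WriteAheadLog wal = new InMemoryWal();
    long seq1 = wal.append("put:row1".getBytes());
    long seq2 = wal.append("put:row2".getBytes());
    wal.sync();
    System.out.println(seq2 > seq1); // true: sequence ids increase
  }
}
```

Keeping the interface this narrow is what makes the "experimental" Ratis work low-risk: the current implementation plugs into the revised API unchanged, and the Ratis impl can be developed on a feature branch behind it.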
[jira] [Created] (HBASE-20943) Add offline/online region count into metrics
Tianying Chang created HBASE-20943:
--

             Summary: Add offline/online region count into metrics
                 Key: HBASE-20943
                 URL: https://issues.apache.org/jira/browse/HBASE-20943
             Project: HBase
          Issue Type: Improvement
          Components: metrics
    Affects Versions: 1.2.6.1, 2.0.0
            Reporter: Tianying Chang

We use metrics intensively to monitor the health of our HBase production clusters. We have seen some regions of a table get stuck and fail to come online due to an AWS issue that corrupted some log files. It would be good if we could catch this early. Although the web UI has this information, it is not useful for automated monitoring. By adding this metric, we can easily monitor region counts with our monitoring system.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
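The requested metric boils down to tallying regions by state so the totals can be exported as gauges. The sketch below is illustrative only: the `RegionState` enum and metric shape are stand-ins, not HBase's actual region states or its Hadoop metrics2 registration code.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: tally regions by state so the counts can be published
// as gauge metrics. Not actual HBase code; the real implementation would
// publish through the Hadoop metrics2 framework.
public class RegionCountMetrics {
  enum RegionState { ONLINE, OFFLINE, TRANSITIONING }

  static Map<RegionState, Long> countByState(List<RegionState> regions) {
    Map<RegionState, Long> counts = new EnumMap<>(RegionState.class);
    for (RegionState s : RegionState.values()) counts.put(s, 0L); // report zeros, too
    for (RegionState s : regions) counts.merge(s, 1L, Long::sum);
    return counts;
  }

  public static void main(String[] args) {
    List<RegionState> regions = List.of(
        RegionState.ONLINE, RegionState.ONLINE, RegionState.OFFLINE);
    Map<RegionState, Long> counts = countByState(regions);
    // An alerting system could fire when OFFLINE stays above zero for too long,
    // catching the stuck-region scenario described in the issue.
    System.out.println(counts.get(RegionState.OFFLINE)); // 1
  }
}
```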
[jira] [Created] (HBASE-20942) Make RpcServer trace log length configurable
Mike Drob created HBASE-20942:
--

             Summary: Make RpcServer trace log length configurable
                 Key: HBASE-20942
                 URL: https://issues.apache.org/jira/browse/HBASE-20942
             Project: HBase
          Issue Type: Task
            Reporter: Esteban Gutierrez

We truncate RpcServer output to 1000 characters for trace logging. It would be better if that value were configurable.

Esteban mentioned this to me earlier, so I'm crediting him as the reporter.

cc: [~elserj]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
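A minimal sketch of the requested behavior: read a limit from configuration, defaulting to the current 1000, and truncate trace output to it. The property key `hbase.ipc.trace.log.maxlength` is hypothetical (the actual patch may choose a different name), and the `Map` stands in for HBase's `Configuration` object:

```java
import java.util.Map;

// Illustrative sketch (not the actual RpcServer patch): truncate trace-log
// output to a configurable maximum length, defaulting to the current 1000.
public class TraceLogTruncation {
  static final int DEFAULT_MAX_LENGTH = 1000;

  // Stand-in for reading an int from HBase Configuration; the property key
  // below is hypothetical, for illustration only.
  static int maxLength(Map<String, String> conf) {
    String v = conf.get("hbase.ipc.trace.log.maxlength");
    return v == null ? DEFAULT_MAX_LENGTH : Integer.parseInt(v);
  }

  static String truncateForTrace(String msg, int max) {
    // Short messages pass through; long ones are cut and marked as truncated.
    return msg.length() <= max ? msg : msg.substring(0, max) + "...";
  }

  public static void main(String[] args) {
    int max = maxLength(Map.of("hbase.ipc.trace.log.maxlength", "10"));
    System.out.println(truncateForTrace("abcdefghijklmnop", max)); // abcdefghij...
  }
}
```

Reading the limit once at server startup (rather than per log call) would keep the hot logging path cheap.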
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks, Zach!

I like your suggestion about project updates. I sincerely hope that this can be something transparent enough that folks who want to follow on and participate in implementation can do so. Let me think about how to drive this better.

On 7/25/18 3:55 PM, Zach York wrote:
> +1 to starting the work. I think most of the concerns can be figured out on the JIRAs and we can have a project update every X weeks if enough people are interested.
>
> I also agree we should frame the feature correctly. Decoupling from a HDFS WAL or WAL on Ratis would be more appropriate names that would better convey the scope. I think there are a number of projects necessary to complete "HBase on Cloud", with this being one of those.
>
> Thanks for driving this initiative!
>
> Zach
>
> On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser wrote:
> > Let me give an update on-list for everyone:
> >
> > First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised. Some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.
> >
> > My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.
> >
> > My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
> >
> > That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
> >
> > * Sync with Ram and Anoop re: in-memory WAL [1]
> > * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
> >
> > There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
> >
> > * Maturity of the Ratis community
> > * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> > * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> > * I/O amplification on WAL retention for backup and replication ("logstream export")
> > * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> > * Ability to build krb5 authn into Ratis (really, gRPC)
> >
> > I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
> >
> > All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.
> >
> > Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
> >
> > Thanks once again for everyone's participation.
> >
> > [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
> >
> > On 2018/07/13 20:15:45, Josh Elser wrote:
> > > Hi all,
> > >
> > > A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
> > >
> > > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> > >
> > > A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)
> > >
> > > Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks, Andrew.

I was really upset that I was butting heads with you when I would have previously thought that I had a design which was in line with something you would have called "good". I will wholly take the blame for not having an as-clear-as-possible design doc. I am way down in the weeds and didn't bring myself up for air before trying to write something consumable for everyone else.

Making a good API is my biggest goal for the HBase side, and my hope is that it will support this experiment, enable others who want to try out other systems, and simplify our existing WAL implementations.

Thanks for the reply.

On 7/25/18 3:50 PM, Andrew Purtell wrote:
> > My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out.
>
> Understanding this now helps a lot to understand better the positions taken in the doc. At first glance it read as an initially interesting document that quickly went to a weird place where there was a preconceived solution working backward toward a problem, engineering run in reverse.
>
> I think it's perfectly fine if the Ratis podling and those associated with it want to drive development and/or adoption by finding candidate use cases in other ecosystem projects. As long as we have good interfaces which don't leak internals, no breaking core changes, no hard dependencies on incubating artifacts, and at least a potential path forward to alternate implementations, it's all good!
>
> On Wed, Jul 25, 2018 at 11:55 AM Josh Elser wrote:
> > Let me give an update on-list for everyone:
> >
> > First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised. Some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.
> >
> > My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.
> >
> > My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
> >
> > That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
> >
> > * Sync with Ram and Anoop re: in-memory WAL [1]
> > * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
> >
> > There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
> >
> > * Maturity of the Ratis community
> > * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> > * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> > * I/O amplification on WAL retention for backup and replication ("logstream export")
> > * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> > * Ability to build krb5 authn into Ratis (really, gRPC)
> >
> > I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
> >
> > All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.
> >
> > Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
> >
> > Thanks once again for everyone's participation.
> >
> > [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
> >
> > On 2018/07/13 20:15:45, Josh Elser wrote:
> > > Hi all,
> > >
> > > A long time ago, I shared a document about a (I'll call it..) "vision" where we make
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
+1 to starting the work. I think most of the concerns can be figured out on the JIRAs and we can have a project update every X weeks if enough people are interested.

I also agree we should frame the feature correctly. Decoupling from a HDFS WAL or WAL on Ratis would be more appropriate names that would better convey the scope. I think there are a number of projects necessary to complete "HBase on Cloud", with this being one of those.

Thanks for driving this initiative!

Zach

On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser wrote:

> Let me give an update on-list for everyone:
>
> First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised. Some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.
>
> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.
>
> My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.
>
> Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)
> >
> > Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief "refresher" on what some of the big goals for this effort are. Please check it out at your leisure and let me know what
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out.

Understanding this now helps a lot to understand better the positions taken in the doc. At first glance it read as an initially interesting document that quickly went to a weird place where there was a preconceived solution working backward toward a problem, engineering run in reverse.

I think it's perfectly fine if the Ratis podling and those associated with it want to drive development and/or adoption by finding candidate use cases in other ecosystem projects. As long as we have good interfaces which don't leak internals, no breaking core changes, no hard dependencies on incubating artifacts, and at least a potential path forward to alternate implementations, it's all good!

On Wed, Jul 25, 2018 at 11:55 AM Josh Elser wrote:

> Let me give an update on-list for everyone:
>
> First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised. Some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.
>
> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.
>
> My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.
>
> Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've
[jira] [Created] (HBASE-20941) Cre
Umesh Agashe created HBASE-20941:
Summary: Cre
Key: HBASE-20941
URL: https://issues.apache.org/jira/browse/HBASE-20941
Project: HBase
Issue Type: Sub-task
Reporter: Umesh Agashe
Assignee: Umesh Agashe

Create HbckService in master and implement the following methods:
# purgeProcedure/s(): some procedures do not support abort at every step. When these procedures get stuck, they cannot be aborted or make further progress. The corrective action is to purge these procedures from the ProcWAL. Provide an option to purge sub-procedures as well.
# setTable/RegionState(): if table/region states are inconsistent with the actions/procedures working on them, manipulating their states in meta sometimes fixes things.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
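The two proposed methods can be pictured as a small service surface. The sketch below is purely illustrative — the names, signatures, and the toy implementation are invented here and are not the actual HBase RPC definitions settled under HBASE-20941:

```java
// Hypothetical sketch of the HbckService surface described above.
// Names and signatures are illustrative only, not the real HBase API.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface HbckService {
  // Purge stuck procedures (optionally their sub-procedures) from the ProcWAL.
  List<Long> purgeProcedures(List<Long> procIds, boolean purgeSubProcedures);

  // Force a table's state in meta when it disagrees with reality.
  void setTableState(String tableName, String state);

  // Force a region's state in meta.
  void setRegionState(String encodedRegionName, String state);
}

// A toy in-memory implementation, just to show the intended call pattern.
class ToyHbck implements HbckService {
  final Map<String, String> states = new HashMap<>();

  @Override
  public List<Long> purgeProcedures(List<Long> procIds, boolean purgeSubProcedures) {
    return procIds; // pretend every requested procedure was purged
  }

  @Override
  public void setTableState(String tableName, String state) {
    states.put(tableName, state);
  }

  @Override
  public void setRegionState(String encodedRegionName, String state) {
    states.put(encodedRegionName, state);
  }
}

public class HbckSketch {
  public static void main(String[] args) {
    HbckService hbck = new ToyHbck();
    List<Long> purged = hbck.purgeProcedures(Arrays.asList(4105L, 4106L), true);
    hbck.setTableState("IntegrationTestBigLinkedList", "ENABLED");
    System.out.println("purged=" + purged.size());
  }
}
```

The point of the interface split is that an external hbck2 tool would only call such an operator-facing service, rather than reaching into master internals the way hbck1 does.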
Re: [DISCUSS] Kafka Connection, HBASE-15320
Hi Misty, As long as the connectors use a public API, we can be flexible. We get the same guarantees app programmers get. Mike On Wed, Jul 25, 2018, 01:01 Misty Linville wrote: > I like the idea of a separate connectors repo/release vehicle, but I'm a > little concerned about the need to release all together to update just one > of the connectors. How would that work? What kind of compatibility > guarantees are we signing up for?
Re: [DISCUSS] Separate Git Repository for HBCK2
bq. Seems like you're saying it's not a problem now, but you're not sure if it would become a problem. Regardless of that, it's a goal to not be version-specific (and thus, we can have generic hbck-v1 and hbck-v2 tools). LMK if I misread, please :) Thats right. On Wed, Jul 25, 2018 at 11:11 AM Josh Elser wrote: > Thanks, Umesh. Seems like you're saying it's not a problem now, but > you're not sure if it would become a problem. Regardless of that, it's a > goal to not be version-specific (and thus, we can have generic hbck-v1 > and hbck-v2 tools). LMK if I misread, please :) > > One more thought, it would be nice to name this repository as > "operator-tools" or similar (instead of hbck). A separate repo on its > own release cadence is a nice vehicle for random sorts of recovery, > slice-and-dice, one-off tools. I think HBCK is one example of > administrator/operator tooling we provide (certainly, the most used), > but we have the capacity to provide more than just that. > > On 7/24/18 5:55 PM, Umesh Agashe wrote: > > Thanks Stack, Josh and Andrew for your suggestions and concerns. > > > > I share Stack's suggestions. This would be similar to hbase-thirdparty. > The > > new repo could be hbase-hbck/hbase-hbck2. As this tool will be used by > > hbase users/ developers, hbase JIRA can be used for hbck issues. > > > > bq. How often does HBCK need to re-use methods and constants from code > > in hbase-common, hbase-server, etc? > > bq. Is it a goal to firm up API stability around this shared code. > > > > bq. If we do this can we also move out hbck version 1? > > > > As HBCK2 tool will be freshly written, we can try to achieve this goal. I > > think its great idea to move hbck1 to new repo as well. Though I think > its > > more involved with hbck1 as the existing code already uses what it can > from > > hbase-common and hbase-server etc. modules. > > > > bq. How often does HBCK make decisions on how to implement a correction > > based on some known functionality (e.g. 
a bug) in a specific version(s) > > of HBase. Concretely, would HBCK need to make corrections to an HBase > > installation that are specific to a subset of HBase 2.x.y versions that > > may not be valid for other 2.x.y versions? > > > > I see if this happens too often, compatibility metrics will be > complicated. > > > > Thanks, > > Umesh > > > > > > On Tue, Jul 24, 2018 at 10:27 AM Andrew Purtell > wrote: > > > >> If we do this can we also move out hbck version 1? It would be really > weird > >> in my opinion to have v2 in a separate repo but v1 shipping with the 1.x > >> releases. That would be a source of understandable confusion. > >> > >> I believe our compatibility guidelines allow us to upgrade interface > >> annotations from private to LP or Public and from LP to Public. These > are > >> not changes that impact source or binary compatibility. They only change > >> the promises we make going forward about their stability. I believe we > can > >> allow these in new minors, so we could potentially move hbck out in a > >> 1.5.0. > >> > >> > >> On Mon, Jul 23, 2018 at 4:46 PM Stack wrote: > >> > >>> On Thu, Jul 19, 2018 at 2:09 PM Umesh Agashe > >> > >>> wrote: > >>> > Hi, > > I've had the opportunity to talk about HBCK2 with a few of you. One of > >>> the > suggestions is to to have a separate git repository for HBCK2. Lets > >>> discuss > about it. > > In the past when bugs were found in hbck, there is no easy way to > >> release > patched version of just hbck (without patching HBase). If HBCK2 has a > separate git repo, HBCK2 versions will not be tightly related to HBase > versions. Fixing and releasing hbck2, may not require patching HBase. > Though tight coupling will be somewhat loosened, HBCK2 will still > >> depend > >>> on > HBase APIs/ code. Caution will be required going forward regarding > compatibility. > > What you all think? > > > >>> I think this the way to go. > >>> > >>> We'd make a new hbase-hbck2 repo as we did for hbase-thirdparty? 
> >>> > >>> We'd use the hbase JIRA for hbase-hbck2 issues? > >>> > >>> We'd make hbase-hbck2 releases on occasion that the PMC voted on? > >>> > >>> Sounds great! > >>> St.Ack > >>> > >>> Thanks, > Umesh > > JIRA: https://issues.apache.org/jira/browse/HBASE-19121. > Doc: > > > >>> > >> > https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing > > >>> > >> > >> > >> -- > >> Best regards, > >> Andrew > >> > >> Words like orphans lost among the crosstalk, meaning torn from truth's > >> decrepit hands > >> - A23, Crosstalk > >> > > >
Re: [DISCUSS] Separate Git Repository for HBCK2
Thanks Josh! A separate 'operator-tools' repo for hbase tools is a great suggestion. We can work towards it, starting with hbck2. Each existing tool needs to be looked at in detail regarding how much code it shares with HBase. On Wed, Jul 25, 2018 at 11:11 AM Josh Elser wrote: > Thanks, Umesh. Seems like you're saying it's not a problem now, but > you're not sure if it would become a problem. Regardless of that, it's a > goal to not be version-specific (and thus, we can have generic hbck-v1 > and hbck-v2 tools). LMK if I misread, please :) > > One more thought, it would be nice to name this repository as > "operator-tools" or similar (instead of hbck). A separate repo on its > own release cadence is a nice vehicle for random sorts of recovery, > slice-and-dice, one-off tools. I think HBCK is one example of > administrator/operator tooling we provide (certainly, the most used), > but we have the capacity to provide more than just that. > > On 7/24/18 5:55 PM, Umesh Agashe wrote: > > Thanks Stack, Josh and Andrew for your suggestions and concerns. > > > > I share Stack's suggestions. This would be similar to hbase-thirdparty. > The > > new repo could be hbase-hbck/hbase-hbck2. As this tool will be used by > > hbase users/ developers, hbase JIRA can be used for hbck issues. > > > > bq. How often does HBCK need to re-use methods and constants from code > > in hbase-common, hbase-server, etc? > > bq. Is it a goal to firm up API stability around this shared code. > > > > bq. If we do this can we also move out hbck version 1? > > > > As HBCK2 tool will be freshly written, we can try to achieve this goal. I > > think its great idea to move hbck1 to new repo as well. Though I think > its > > more involved with hbck1 as the existing code already uses what it can > from > > hbase-common and hbase-server etc. modules. > > > > bq. How often does HBCK make decisions on how to implement a correction > > based on some known functionality (e.g. 
a bug) in a specific version(s) > > of HBase. Concretely, would HBCK need to make corrections to an HBase > > installation that are specific to a subset of HBase 2.x.y versions that > > may not be valid for other 2.x.y versions? > > > > I see if this happens too often, compatibility metrics will be > complicated. > > > > Thanks, > > Umesh > > > > > > On Tue, Jul 24, 2018 at 10:27 AM Andrew Purtell > wrote: > > > >> If we do this can we also move out hbck version 1? It would be really > weird > >> in my opinion to have v2 in a separate repo but v1 shipping with the 1.x > >> releases. That would be a source of understandable confusion. > >> > >> I believe our compatibility guidelines allow us to upgrade interface > >> annotations from private to LP or Public and from LP to Public. These > are > >> not changes that impact source or binary compatibility. They only change > >> the promises we make going forward about their stability. I believe we > can > >> allow these in new minors, so we could potentially move hbck out in a > >> 1.5.0. > >> > >> > >> On Mon, Jul 23, 2018 at 4:46 PM Stack wrote: > >> > >>> On Thu, Jul 19, 2018 at 2:09 PM Umesh Agashe > >> > >>> wrote: > >>> > Hi, > > I've had the opportunity to talk about HBCK2 with a few of you. One of > >>> the > suggestions is to to have a separate git repository for HBCK2. Lets > >>> discuss > about it. > > In the past when bugs were found in hbck, there is no easy way to > >> release > patched version of just hbck (without patching HBase). If HBCK2 has a > separate git repo, HBCK2 versions will not be tightly related to HBase > versions. Fixing and releasing hbck2, may not require patching HBase. > Though tight coupling will be somewhat loosened, HBCK2 will still > >> depend > >>> on > HBase APIs/ code. Caution will be required going forward regarding > compatibility. > > What you all think? > > > >>> I think this the way to go. > >>> > >>> We'd make a new hbase-hbck2 repo as we did for hbase-thirdparty? 
> >>> > >>> We'd use the hbase JIRA for hbase-hbck2 issues? > >>> > >>> We'd make hbase-hbck2 releases on occasion that the PMC voted on? > >>> > >>> Sounds great! > >>> St.Ack > >>> > >>> Thanks, > Umesh > > JIRA: https://issues.apache.org/jira/browse/HBASE-19121. > Doc: > > > >>> > >> > https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing > > >>> > >> > >> > >> -- > >> Best regards, > >> Andrew > >> > >> Words like orphans lost among the crosstalk, meaning torn from truth's > >> decrepit hands > >> - A23, Crosstalk > >> > > >
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Let me give an update on-list for everyone:

First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised. Some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.

My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.

My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.

That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:

* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?

There are also long-term concerns which I don't think I have an answer for yet (for reasons either out of my control or a lack of technical understanding):

* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to provide that perf (areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup and replication ("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)

I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.

All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgment that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.

Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.

Thanks once again for everyone's participation.

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM

On 2018/07/13 20:15:45, Josh Elser wrote:
> Hi all,
>
> A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
>
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
>
> A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)
>
> Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief "refresher" on what some of the big goals for this effort are. Please check it out at your leisure and let me know what you think. Would like to start getting some fingers behind this all and pump out some code :)
>
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
>
> - Josh
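The "revisit WAL API" step in the plan above can be pictured as extracting a small provider interface that both the current filesystem-backed WAL and a future Ratis-backed one could satisfy. The sketch below is hypothetical — none of these names are the real HBase WAL interfaces, and the in-memory implementation is a stand-in for both backends:

```java
// Hypothetical sketch of a pluggable WAL provider, illustrating the
// "revisit WAL API, plug in current impl, add a Ratis-backed impl" plan.
// All names are invented for illustration; the real HBase interfaces differ.
import java.util.ArrayList;
import java.util.List;

interface WriteAheadLog {
  long append(byte[] edit);   // returns a sequence id for the edit
  void sync(long seqId);      // durably persist everything up to seqId
  void close();
}

interface WalProvider {
  WriteAheadLog createWal(String regionServerId);
}

// Stand-in for the current filesystem-backed WAL implementation.
class FsWalProvider implements WalProvider {
  @Override
  public WriteAheadLog createWal(String regionServerId) {
    return new InMemoryWal();
  }
}

// Stand-in for a future Ratis LogService-backed WAL implementation.
class RatisWalProvider implements WalProvider {
  @Override
  public WriteAheadLog createWal(String regionServerId) {
    return new InMemoryWal(); // would delegate to a Ratis LogStream instead
  }
}

// Toy backend shared by both providers, purely for this sketch.
class InMemoryWal implements WriteAheadLog {
  private final List<byte[]> edits = new ArrayList<>();
  @Override public long append(byte[] edit) { edits.add(edit); return edits.size(); }
  @Override public void sync(long seqId) { /* no-op in the toy */ }
  @Override public void close() { edits.clear(); }
}

public class WalSketch {
  public static void main(String[] args) {
    WalProvider provider = new FsWalProvider(); // swap for RatisWalProvider
    WriteAheadLog wal = provider.createWal("rs-1");
    long seq = wal.append("put row1".getBytes());
    wal.sync(seq);
    System.out.println("lastSeq=" + seq);
    wal.close();
  }
}
```

With this kind of seam, the Ratis work can stay "experimental" behind a provider selection, exactly as the message above proposes, without forcing all of HBase onto Ratis-backed WALs.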
Prep for release candidate 1.5.0 RC0
I would like to put up the first release candidate for 1.5.0 by the end of August. To that end over the next couple of weeks I will be evaluating test stability, cluster stability under chaos testing, and performance differences (if any) with the latest 1.2, 1.3, and 1.4 releases as measured by the open source benchmarking tools at our disposal, PE, LTT, and YCSB. If you have any backport work to branch-1 pending please consider finishing it up and getting it in within the next couple of weeks. However, if the changes are likely to have a significant impact (for example, it conforms to compatibility guidelines for a minor release, but not a patch release) then you might want to hold off until after branch-1.5 has been branched, so it can go into branch-1 for a 1.6.0 release toward the end of the year. Use your best judgement is all I ask. -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
[VOTE] The first HBase 1.4.6 release candidate (RC0) is available
The first HBase 1.4.6 release candidate (RC0) is available for download at https://dist.apache.org/repos/dist/dev/hbase/hbase-1.4.6RC0/ and Maven artifacts are available in the temporary repository https://repository.apache.org/content/repositories/orgapachehbase-1226/ . The git tag corresponding to the candidate is '1.4.6RC0' (a55bcbd4fc). A detailed source and binary compatibility report for this release is available for your review at https://dist.apache.org/repos/dist/dev/hbase/hbase-1.4.6RC0/compat-check-report.html . There is an added method to the LimitedPrivate interface ReplicationPeer which will not cause binary compatibility issues but will require source changes at recompilation. This type of additive change is allowed. The internal utility class Base64 has been made private and so the related changes are allowed. A list of the 34 issues resolved in this release can be found at https://s.apache.org/kolm . Please try out the candidate and vote +1/0/-1. This vote will be open for at least 72 hours. Unless there is an objection, I will try to close it Monday, July 30, 2018 if we have sufficient votes. Prior to making this announcement I made the following preflight checks:

* RAT check passes (7u80)
* Unit test suite passes (7u80)
* LTT load 1M rows with 100% verification and 20% updates (8u172)
* ITBLL Loop 1 500M rows with serverKilling monkey (8u172)

-- Best regards, Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
[jira] [Reopened] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing
[ https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-20893: --- Reopening to look at these logs I see running this patch on cluster (Its great it detected recovered.edits... but it looks like the patch causes us to hit CODE-BUG... though we seem to be ok...Minimally it will freak-out an operator): {code} 2018-07-25 06:46:56,692 ERROR [PEWorker-3] assignment.SplitTableRegionProcedure: Error trying to split region 2cb977a87bc6bdf90ef7fc71320d7b50 in the table IntegrationTestBigLinkedList (in state=SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS) java.io.IOException: Recovered.edits are found in Region: {ENCODED => 2cb977a87bc6bdf90ef7fc71320d7b50, NAME => 'IntegrationTestBigLinkedList,z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC,1531911202047.2cb977a87bc6bdf90ef7fc71320d7b50.', STARTKEY => 'z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC', ENDKEY => '{\x8D\xF2?'}, abort split to prevent data loss at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkClosedRegion(SplitTableRegionProcedure.java:151) at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:259) at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:92) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760) 2018-07-25 06:46:56,934 INFO [PEWorker-3] 
procedure.MasterProcedureScheduler: pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658 checking lock on 2cb977a87bc6bdf90ef7fc71320d7b50 2018-07-25 06:46:56,934 ERROR [PEWorker-3] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658 java.lang.UnsupportedOperationException: Unhandled state REGION_TRANSITION_FINISH; there is no rollback for assignment unless we cancel the operation by dropping/disabling the table at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412) at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95) at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760) 2018-07-25 06:46:57,088 ERROR [PEWorker-3] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658 java.lang.UnsupportedOperationException:
Re: [DISCUSS] Separate Git Repository for HBCK2
Yes, and in that vein also VerifyReplication and tools of that nature. On Wed, Jul 25, 2018 at 11:11 AM Josh Elser wrote: > Thanks, Umesh. Seems like you're saying it's not a problem now, but > you're not sure if it would become a problem. Regardless of that, it's a > goal to not be version-specific (and thus, we can have generic hbck-v1 > and hbck-v2 tools). LMK if I misread, please :) > > One more thought, it would be nice to name this repository as > "operator-tools" or similar (instead of hbck). A separate repo on its > own release cadence is a nice vehicle for random sorts of recovery, > slice-and-dice, one-off tools. I think HBCK is one example of > administrator/operator tooling we provide (certainly, the most used), > but we have the capacity to provide more than just that. > > On 7/24/18 5:55 PM, Umesh Agashe wrote: > > Thanks Stack, Josh and Andrew for your suggestions and concerns. > > > > I share Stack's suggestions. This would be similar to hbase-thirdparty. > The > > new repo could be hbase-hbck/hbase-hbck2. As this tool will be used by > > hbase users/ developers, hbase JIRA can be used for hbck issues. > > > > bq. How often does HBCK need to re-use methods and constants from code > > in hbase-common, hbase-server, etc? > > bq. Is it a goal to firm up API stability around this shared code. > > > > bq. If we do this can we also move out hbck version 1? > > > > As HBCK2 tool will be freshly written, we can try to achieve this goal. I > > think its great idea to move hbck1 to new repo as well. Though I think > its > > more involved with hbck1 as the existing code already uses what it can > from > > hbase-common and hbase-server etc. modules. > > > > bq. How often does HBCK make decisions on how to implement a correction > > based on some known functionality (e.g. a bug) in a specific version(s) > > of HBase. 
Concretely, would HBCK need to make corrections to an HBase > > installation that are specific to a subset of HBase 2.x.y versions that > > may not be valid for other 2.x.y versions? > > > > I see if this happens too often, compatibility metrics will be > complicated. > > > > Thanks, > > Umesh > > > > > > On Tue, Jul 24, 2018 at 10:27 AM Andrew Purtell > wrote: > > > >> If we do this can we also move out hbck version 1? It would be really > weird > >> in my opinion to have v2 in a separate repo but v1 shipping with the 1.x > >> releases. That would be a source of understandable confusion. > >> > >> I believe our compatibility guidelines allow us to upgrade interface > >> annotations from private to LP or Public and from LP to Public. These > are > >> not changes that impact source or binary compatibility. They only change > >> the promises we make going forward about their stability. I believe we > can > >> allow these in new minors, so we could potentially move hbck out in a > >> 1.5.0. > >> > >> > >> On Mon, Jul 23, 2018 at 4:46 PM Stack wrote: > >> > >>> On Thu, Jul 19, 2018 at 2:09 PM Umesh Agashe > >> > >>> wrote: > >>> > Hi, > > I've had the opportunity to talk about HBCK2 with a few of you. One of > >>> the > suggestions is to to have a separate git repository for HBCK2. Lets > >>> discuss > about it. > > In the past when bugs were found in hbck, there is no easy way to > >> release > patched version of just hbck (without patching HBase). If HBCK2 has a > separate git repo, HBCK2 versions will not be tightly related to HBase > versions. Fixing and releasing hbck2, may not require patching HBase. > Though tight coupling will be somewhat loosened, HBCK2 will still > >> depend > >>> on > HBase APIs/ code. Caution will be required going forward regarding > compatibility. > > What you all think? > > > >>> I think this the way to go. > >>> > >>> We'd make a new hbase-hbck2 repo as we did for hbase-thirdparty? 
> >>> > >>> We'd use the hbase JIRA for hbase-hbck2 issues? > >>> > >>> We'd make hbase-hbck2 releases on occasion that the PMC voted on? > >>> > >>> Sounds great! > >>> St.Ack > >>> > >>> Thanks, > Umesh > > JIRA: https://issues.apache.org/jira/browse/HBASE-19121. > Doc: > > > >>> > >> > https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing > > >>> > >> > >> > >> -- > >> Best regards, > >> Andrew > >> > >> Words like orphans lost among the crosstalk, meaning torn from truth's > >> decrepit hands > >> - A23, Crosstalk > >> > > > -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
Re: [DISCUSS] Separate Git Repository for HBCK2
Thanks, Umesh. Seems like you're saying it's not a problem now, but you're not sure if it would become a problem. Regardless of that, it's a goal to not be version-specific (and thus, we can have generic hbck-v1 and hbck-v2 tools). LMK if I misread, please :) One more thought, it would be nice to name this repository as "operator-tools" or similar (instead of hbck). A separate repo on its own release cadence is a nice vehicle for random sorts of recovery, slice-and-dice, one-off tools. I think HBCK is one example of administrator/operator tooling we provide (certainly, the most used), but we have the capacity to provide more than just that. On 7/24/18 5:55 PM, Umesh Agashe wrote: Thanks Stack, Josh and Andrew for your suggestions and concerns. I share Stack's suggestions. This would be similar to hbase-thirdparty. The new repo could be hbase-hbck/hbase-hbck2. As this tool will be used by hbase users/ developers, hbase JIRA can be used for hbck issues. bq. How often does HBCK need to re-use methods and constants from code in hbase-common, hbase-server, etc? bq. Is it a goal to firm up API stability around this shared code. bq. If we do this can we also move out hbck version 1? As HBCK2 tool will be freshly written, we can try to achieve this goal. I think its great idea to move hbck1 to new repo as well. Though I think its more involved with hbck1 as the existing code already uses what it can from hbase-common and hbase-server etc. modules. bq. How often does HBCK make decisions on how to implement a correction based on some known functionality (e.g. a bug) in a specific version(s) of HBase. Concretely, would HBCK need to make corrections to an HBase installation that are specific to a subset of HBase 2.x.y versions that may not be valid for other 2.x.y versions? I see if this happens too often, compatibility metrics will be complicated. Thanks, Umesh On Tue, Jul 24, 2018 at 10:27 AM Andrew Purtell wrote: If we do this can we also move out hbck version 1? 
It would be really weird in my opinion to have v2 in a separate repo but v1 shipping with the 1.x releases. That would be a source of understandable confusion. I believe our compatibility guidelines allow us to upgrade interface annotations from private to LP or Public and from LP to Public. These are not changes that impact source or binary compatibility. They only change the promises we make going forward about their stability. I believe we can allow these in new minors, so we could potentially move hbck out in a 1.5.0. On Mon, Jul 23, 2018 at 4:46 PM Stack wrote: On Thu, Jul 19, 2018 at 2:09 PM Umesh Agashe wrote: Hi, I've had the opportunity to talk about HBCK2 with a few of you. One of the suggestions is to to have a separate git repository for HBCK2. Lets discuss about it. In the past when bugs were found in hbck, there is no easy way to release patched version of just hbck (without patching HBase). If HBCK2 has a separate git repo, HBCK2 versions will not be tightly related to HBase versions. Fixing and releasing hbck2, may not require patching HBase. Though tight coupling will be somewhat loosened, HBCK2 will still depend on HBase APIs/ code. Caution will be required going forward regarding compatibility. What you all think? I think this the way to go. We'd make a new hbase-hbck2 repo as we did for hbase-thirdparty? We'd use the hbase JIRA for hbase-hbck2 issues? We'd make hbase-hbck2 releases on occasion that the PMC voted on? Sounds great! St.Ack Thanks, Umesh JIRA: https://issues.apache.org/jira/browse/HBASE-19121. Doc: https://docs.google.com/document/d/1NxSFu4TKQ6lY-9J5qsCcJb9kZOnkfX66KMYsiVxBy0Y/edit?usp=sharing -- Best regards, Andrew Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands - A23, Crosstalk
Re: HBase nightly job failing forever
> On Jul 25, 2018, at 10:48 AM, Chris Lambertus wrote:
>
> On-demand resources are certainly being considered (and we had these in the past), but I will point out that ephemeral (“on-demand”) cloud builds are in direct opposition to some of the points brought up by Allen in the other jenkins storage thread, in that they tend to rely on persistent object storage in their workspaces to improve the efficiency of their builds. Perhaps this would be less of an issue with an on-demand instance which would theoretically have no resource contention?

Likely. A lot of work went into greatly reducing the amount of time Hadoop spent in the build queue and running on the nodes. It was “the big one”, but I feel like that's not so true, or at least harder to prove, anymore. I estimate we shaved days off the queue compared to 5 years ago. Part of that was keeping caches, since the ‘Hadoop’ queue nodes were large. But I feel like significantly more work went into reducing the stupidity in the CI jobs. Two examples:

* For source changes, only build and unit test the relevant parts of a patch. E.g., a patch that changes code in module A should only see module A's unit tests run. Let the nightlies sort out any inter-module brokenness post-commit.
* If a patch is for documentation, only run mvn site. If a patch is for shell code, only run shellcheck and the relevant unit tests. Running the java unit tests is pointless.

Building everything every time is a waste of time for modularized source trees. Combined with the walls put up around the docker containers (e.g., limiting how many processes can be launched at one time, memory limits, etc.), I personally felt much better that, other than disk space, the Hadoop jobs were being exemplary citizens vs. pre-Yetus.
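The module-scoping idea above can be sketched as a tiny patch classifier: derive the touched top-level modules from the changed file paths, then emit only the checks they need. This is a rough illustration, not Yetus's actual logic; the file-extension rules and Maven invocations below are simplified assumptions.

```python
def changed_modules(changed_paths):
    """Map changed file paths to the set of top-level modules they touch."""
    return {p.split("/", 1)[0] for p in changed_paths if "/" in p}

def plan_checks(changed_paths):
    """Pick a minimal set of precommit checks for a patch (a sketch).

    Doc-only patches get just the site build, shell-only patches get
    shellcheck, and everything else unit-tests only the touched modules.
    """
    docs = changed_paths and all(p.endswith((".md", ".adoc")) for p in changed_paths)
    shell = changed_paths and all(p.endswith(".sh") for p in changed_paths)
    if docs:
        return ["mvn site"]
    if shell:
        return ["shellcheck"]
    mods = sorted(changed_modules(changed_paths))
    # A root-level change (e.g. pom.xml) falls back to the full test run.
    return ["mvn test -pl " + ",".join(mods)] if mods else ["mvn test"]

print(plan_checks(["src/site/intro.adoc"]))  # ['mvn site']
print(plan_checks(["hbase-common/src/A.java", "hbase-server/src/B.java"]))
# ['mvn test -pl hbase-common,hbase-server']
```

The payoff is exactly what the mail describes: a patch touching one module never pays for the whole tree's tests pre-commit, while nightlies still catch cross-module breakage.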
Re: HBase nightly job failing forever
> On Jul 25, 2018, at 10:34 AM, Andrew Purtell wrote:
>
> public clouds instead. I'm not sure if the ASF is set up to manage on demand billing for test resources but this could be advantageous. It would track actual usage not fixed costs. To avoid budget overrun there would be caps and limits. Eventually demand would hit this new ceiling but the

On-demand resources are certainly being considered (and we had these in the past), but I will point out that ephemeral (“on-demand”) cloud builds are in direct opposition to some of the points brought up by Allen in the other jenkins storage thread, in that they tend to rely on persistent object storage in their workspaces to improve the efficiency of their builds. Perhaps this would be less of an issue with an on-demand instance which would theoretically have no resource contention?

-Chris
ASF Infra
Re: HBase nightly job failing forever
Thanks Joan and Bertrand.

> The number of failed builds in our stream that are directly related to this "tragedy of the commons" far exceeds the number of successful builds at this point, and unfortunately Travis CI is having parallel capacity issues that prevent us from moving to them wholesale as well.

This has been my experience. So at one point years ago I moved my work off the shared pool at the ASF as an individual contributor and have been funding the testing I personally do up on EC2 out of pocket. This isn't a general solution for our project, though, as it depends on my time and ability to contribute, and focuses only on what I'm doing at the moment, maybe not what the project would like to see happen most. I will look into a targeted donation at my employer but am not optimistic.

It might be better to look at decommissioning some if not most of the overutilized fixed test resources and use on-demand executors launched on public clouds instead. I'm not sure if the ASF is set up to manage on-demand billing for test resources but this could be advantageous. It would track actual usage, not fixed costs. To avoid budget overrun there would be caps and limits. Eventually demand would hit this new ceiling, but the impact would be longer queue waiting times, not job failures due to environmental stress, so that would be an improvement. Each job would run in its own virtual server or container, so would be free of many of the environmental issues we see now.

Or, to get the same improvement on the resources we have now, limit executor parallelism. Better to have a job wait in queue than to run and fail anyway because the host environment is under stress.

For what it's worth.

On Wed, Jul 25, 2018 at 10:20 AM Joan Touzet wrote:
> I'll speak to CouchDB - the donation is directly in the form of a Jenkins build agent with our tag; no money changes hands. The donor received a letter from fundraising@a.o allowing for a tax deduction on the equivalent amount that the ASF leasing the machine would have cost for a year's donation. We have 24x7 support on the node from the provider, who performs all sysadmin (rather than burdening Infra with having to run puppet on our build machine). This was arranged so we could have a FreeBSD node in the build array.
>
> We have another donor in the wings who will be adding a build node for us; at that point, we expect to move all of our builds to our own Jenkins build agents and won't be in the common pool any longer. The number of failed builds in our stream that are directly related to this "tragedy of the commons" far exceeds the number of successful builds at this point, and unfortunately Travis CI is having parallel capacity issues that prevent us from moving to them wholesale as well.
>
> -Joan
Re: HBase nightly job failing forever
I'll speak to CouchDB - the donation is directly in the form of a Jenkins build agent with our tag; no money changes hands. The donor received a letter from fundraising@a.o allowing for a tax deduction on the equivalent amount that the ASF leasing the machine would have cost for a year's donation. We have 24x7 support on the node from the provider, who performs all sysadmin (rather than burdening Infra with having to run puppet on our build machine). This was arranged so we could have a FreeBSD node in the build array.

We have another donor in the wings who will be adding a build node for us; at that point, we expect to move all of our builds to our own Jenkins build agents and won't be in the common pool any longer. The number of failed builds in our stream that are directly related to this "tragedy of the commons" far exceeds the number of successful builds at this point, and unfortunately Travis CI is having parallel capacity issues that prevent us from moving to them wholesale as well.

-Joan

- Original Message -
From: "Andrew Purtell"
To: ipv6g...@gmail.com
Cc: "Andrew Purtell", "dev", bui...@apache.org
Sent: Wednesday, July 25, 2018 12:22:08 PM
Subject: Re: HBase nightly job failing forever
Re: HBase nightly job failing forever
Hi,

On Wed, Jul 25, 2018 at 6:22 PM Andrew Purtell wrote:
> ...How does a targeted hardware donation work? I was under the impression that targeted donations are not accepted by the ASF

This has changed, last year IIRC - there's a bit of information at https://www.apache.org/foundation/contributing under "targeted sponsor program". I suppose fundraising@a.o is best for more specific questions.

Targeted sponsors are listed at http://www.apache.org/foundation/thanks.html

-Bertrand
Re: HBase nightly job failing forever
How does a targeted hardware donation work? I was under the impression that targeted donations are not accepted by the ASF. Maybe it is different in infrastructure, but this is the first time I've heard of it. Who does the donation on those projects? DataStax for Cassandra? Who for CouchDB? Google for Beam? By what process are the donations made, and how are they audited to confirm the donation is spent on the desired resources? Can we get a contact for one of them for a testimonial regarding this process? Is this process documented?

On Tue, Jul 24, 2018 at 4:27 PM Gav wrote:
> Hi Andrew,
>
> On Wed, Jul 25, 2018 at 3:21 AM Andrew Purtell wrote:
>
>> Thanks for this note.
>>
>> I'm release managing the 1.4 release. I have been running the unit test suite on reasonably endowed EC2 instances and there are no observed always-failing tests. A few can be flaky. In comparison, the Apache test resources have been heavily resource constrained for years and frequently suffer from environmental effects like botched settings, disk space issues, and contention with other test executors.
>
> Our Jenkins nodes are configured via puppet these days and are pretty stable; which settings do you know of that might (still) be botched? Yes, resources are shared and on occasion run to capacity. This is one reason for my initial mail - these HBase builds are consuming 10 or more executors -at the same time- and are starving executors for other builds. The fact that these tests have been failing for well over a month, and that you mention below you will be ignoring them, does not make for good cross-ASF community spirit; we are all in this together and every little bit helps. This is not a target at one project; others will be getting a similar note and I hope we can come to a resolution suitable for all.
>
> Disk space issues, yes, but not on most of the Hadoop and related projects' nodes - H0-H12 do not have disk space issues. As a Hadoop-related project HBase should really be concentrating its builds there.
>
>> I think a 1.4 release will happen regardless of the job test results on Apache infrastructure. I tend to ignore them as noisy and low signal. Others in the HBase community don't necessarily feel the same, so please don't take my viewpoint as particularly representative. We could try Alan's suggestion first, before ignoring them outright.
>
> No problem
>
>> Has anyone given thought toward expanding the pool of test build resources? Or roping in cloud instances on demand? Jenkins has support for that.
>
> We currently have 19 Hadoop-specific nodes available, H0-H19, and another 28 or so general-use 'ubuntu' nodes for all to use. In addition we have projects that have targeted donated resources, and the likes of Cassandra, CouchDB and Beam all have multiple nodes on which they have priority. I'll throw an idea out there that perhaps HBase could do something similar, to increase our node pool and at the same time have priority on a few nodes of their own via a targeted hardware donation.
>
> Cloud on demand was tried a year or two ago; we will revisit this also soon.
>
> Summary then: we currently have over 80 nodes connected to our Jenkins master - what figure did you have in mind when you say 'expanding the pool of test build resources'?
>
> Thanks
>
> Gav...
>
>> On Tue, Jul 24, 2018 at 9:16 AM Allen Wittenauer wrote:
>>
>>> I suspect the bigger issue is that the hbase tests are running on the ‘ubuntu’ machines. Since they only have ~300GB for workspaces, the hbase tests are eating a significant majority of it and likely could be dying randomly due to space issues. [All the hbase workspace directories + the yetus-m2 shared mvn cache dirs easily consume 20%+ of the space. Significantly more than the 50 or so other jobs that run on those machines.]
>>>
>>> By comparison, most of the ‘Hadoop’ nodes have 2-3TB for the big jobs to consume….
>>>
>>> > On Jul 24, 2018, at 8:58 AM, Josh Elser wrote:
>>> >
>>> > Yep, sadly this is a very long tent-pole for us. There are many involved who have invested countless hours in making this better.
>>> >
>>> > Specific to that job you linked earlier, 3 test failures out of our total 4958 tests (0.06% failure rate) is all but "green" in my mind. I would ask that you keep that in mind, too.
>>> >
>>> > To that extent, others have also built another job specifically to find tests which are failing intermittently: https://builds.apache.org/job/HBase-Find-Flaky-Tests/25513/artifact/dashboard.html. I mention this as evidence to prove to you that this is not a baseless request from the HBase PMC ;)
>>> >
>>> > On 7/24/18 3:14 AM, Gav wrote:
>>> >> Ok, good enough, will wait, please also note 'master' branch and a few
>>> >> others have been failing for over a month also.
>>> >> I will check in again next month to see
[jira] [Resolved] (HBASE-20746) Release 2.1.0
[ https://issues.apache.org/jira/browse/HBASE-20746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-20746.
---
Resolution: Fixed

> Release 2.1.0
> -
>
> Key: HBASE-20746
> URL: https://issues.apache.org/jira/browse/HBASE-20746
> Project: HBase
> Issue Type: Umbrella
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> After HBASE-20708 I do not think we will have unresolvable problems for the 2.1.0 release any more. So let's create an issue to track the release process.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: [DISCUSS] test-for-tests in precommit
Circling back on this: be aware that precommit has been updated so that test-for-tests won't vote -1 on a contribution now. If the plugin can't find tests, it'll give an advisory -0.

On Fri, Jul 13, 2018 at 10:28 AM, Sean Busbey wrote:
> Hi folks!
>
> Given how often we end up accepting contributions despite test-for-tests complaining about a lack of changed or new tests, would anyone be opposed to me changing its vote from -1 to -0?
>
> The rationale for discounting its -1 looked reasonable in the issues I sampled. It usually was either some change that fixes a problem we can't test due to limitations in our test suite, or an optimization that's covered by existing tests.
>
> Maybe in the future, if we get to a point where we're including nightly feature-specific cluster tests, we could update it to recognize changes to those and then turn it back to having a vote that can fail the precommit test run.
>
> --
> Sean
[jira] [Created] (HBASE-20940) HStore.canSplit should not allow a split to happen if it has references
Vishal Khandelwal created HBASE-20940:
-

Summary: HStore.canSplit should not allow a split to happen if it has references
Key: HBASE-20940
URL: https://issues.apache.org/jira/browse/HBASE-20940
Project: HBase
Issue Type: Bug
Affects Versions: 1.3.2
Reporter: Vishal Khandelwal
Assignee: Vishal Khandelwal

When a split happens and immediately another split happens, it may result in a split of a region that still has references to its parent. More details about the scenario can be found in HBASE-20933.

HStore.hasReferences should check fs.storefile rather than in-memory objects.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
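The fix direction proposed here — trust the filesystem rather than possibly stale in-memory store file objects — can be sketched generically. The naming convention and helper names below are illustrative assumptions for this sketch, not HBase's actual implementation (the real code is Java and parses store file names itself):

```python
import os
import tempfile

def has_references(store_dir):
    """True if any store file in `store_dir` looks like a reference file.

    Assumed, simplified convention: a reference file's name carries a
    '.<parentRegion>' suffix, while a plain hfile name has no dot. The
    point (per HBASE-20940) is to ask the filesystem directly instead of
    consulting an in-memory list that may lag behind a recent split.
    """
    return any("." in name for name in os.listdir(store_dir))

def can_split(store_dir):
    # A region still holding references to its parent must not split
    # again; it has to wait for compaction to rewrite the references.
    return not has_references(store_dir)

# Demo with a throwaway store directory.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "0f3a9c"), "w").close()               # plain hfile
    print(can_split(d))                                        # True
    open(os.path.join(d, "0f3a9c.parentregion"), "w").close()  # reference
    print(can_split(d))                                        # False
```

Listing the directory on every check is costlier than reading a cached field, but for a rare operation like a split decision, correctness after a racing split matters more than the extra filesystem round trip.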
[jira] [Created] (HBASE-20939) There will be a race when we call suspendIfNotReady and then throw ProcedureSuspendedException
Duo Zhang created HBASE-20939:
-

Summary: There will be a race when we call suspendIfNotReady and then throw ProcedureSuspendedException
Key: HBASE-20939
URL: https://issues.apache.org/jira/browse/HBASE-20939
Project: HBase
Issue Type: Sub-task
Reporter: Duo Zhang

This is a very typical usage in our procedure implementation. For example, in AssignProcedure we will call AM.queueAssign and then suspend ourselves to wait until the AM finishes processing our assign request.

But there could be races. Think of this:
1. We call suspendIfNotReady on an event, and it returns true, so we need to wait.
2. The event is woken up, and the procedure is added back to the scheduler.
3. A worker picks up the procedure and finishes it.
4. We finally throw ProcedureSuspendedException, and the ProcedureExecutor suspends us and stores the state in the procedure store.

So we have a half-done procedure in the procedure store forever. This may cause an assertion failure when loading procedures. And maybe the worker cannot finish the procedure, as when suspending we need to restore some state, for example, add something to RootProcedureState. But anyway, it will still lead to assertions or other unexpected errors.

And this cannot be fixed by simply adding a lock in the procedure, as most of the work is done in the ProcedureExecutor after we throw ProcedureSuspendedException.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
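The four-step interleaving above is a classic check-then-act race: the decision to suspend and the act of suspending are not atomic, so a wakeup that lands in between gets applied to a procedure that has already moved on. A deliberately sequential Python sketch (hypothetical names; the real code is Java inside ProcedureExecutor) replays exactly that schedule:

```python
class Event:
    def __init__(self):
        self.ready = False
        self.waiters = []

    def suspend_if_not_ready(self, proc):
        # Step 1: the check. True means "caller should suspend itself".
        if self.ready:
            return False
        self.waiters.append(proc)
        return True

    def wake(self, scheduler):
        # Step 2: the event fires; queued procedures go back to the scheduler.
        self.ready = True
        scheduler.extend(self.waiters)
        self.waiters.clear()

class Procedure:
    def __init__(self):
        self.state = "RUNNABLE"

proc, event, scheduler = Procedure(), Event(), []

must_wait = event.suspend_if_not_ready(proc)  # step 1: decide to suspend
event.wake(scheduler)                         # step 2: wakeup races in
worker = scheduler.pop()                      # step 3: a worker runs it...
worker.state = "FINISHED"                     # ...and finishes it
if must_wait:                                 # step 4: stale decision applied
    proc.state = "SUSPENDED"

# The finished procedure is now persisted as suspended, forever.
print(proc.state)  # SUSPENDED
```

As the report notes, the fix is not a lock inside the procedure: steps 1 and 4 straddle the executor itself, so the check and the suspend have to be made atomic at the executor/scheduler level.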
Re: HBase nightly job failing forever
On Wed, Jul 25, 2018 at 2:36 AM Robert Munteanu wrote:
> Hi,
>
> On Wed, 2018-07-25 at 09:27 +1000, Gav wrote:
> > Disk space issues, yes, not on most of the Hadoop and related projects nodes - H0-H12 do not have disk space issues. As a Hadoop related project HBase should really be concentrating its builds there.
>
> A suggestion from the sidelines. We could add a 'large-disk-space' label to the jobs that use a lot of disk space and then also attach it to the executors that offer a lot of disk space.

Sure ... though I think it is the small disk that is the odd man out. The nodes usually have lots of space, but some don't. And we're also provisioning with much larger disks nowadays.

Cheers,
-g
Re: HBase nightly job failing forever
Hi,

On Wed, 2018-07-25 at 09:27 +1000, Gav wrote:
> Disk space issues, yes, not on most of the Hadoop and related projects nodes - H0-H12 do not have disk space issues. As a Hadoop related project HBase should really be concentrating its builds there.

A suggestion from the sidelines. We could add a 'large-disk-space' label to the jobs that use a lot of disk space and then also attach it to the executors that offer a lot of disk space.

Robert
[jira] [Created] (HBASE-20938) Set version to 2.1.1-SNAPSHOT for branch-2.1
Duo Zhang created HBASE-20938:
-

Summary: Set version to 2.1.1-SNAPSHOT for branch-2.1
Key: HBASE-20938
URL: https://issues.apache.org/jira/browse/HBASE-20938
Project: HBase
Issue Type: Sub-task
Components: build
Reporter: Duo Zhang
Assignee: Duo Zhang

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Reopened] (HBASE-20746) Release 2.1.0
[ https://issues.apache.org/jira/browse/HBASE-20746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-20746:
---

> Release 2.1.0
> -
>
> Key: HBASE-20746
> URL: https://issues.apache.org/jira/browse/HBASE-20746
> Project: HBase
> Issue Type: Umbrella
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> After HBASE-20708 I do not think we will have unresolvable problems for the 2.1.0 release any more. So let's create an issue to track the release process.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HBASE-20937) Update the support matrix in our ref guide about the recent hadoop releases
Duo Zhang created HBASE-20937:
-

Summary: Update the support matrix in our ref guide about the recent hadoop releases
Key: HBASE-20937
URL: https://issues.apache.org/jira/browse/HBASE-20937
Project: HBase
Issue Type: Task
Components: documentation
Reporter: Duo Zhang
Fix For: 3.0.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)