I've added an implementation plan as a comment [0] along with checklist items. It has not yet been groomed, but I think it is ready to be.
It was tempting to make this comment the body of the issue, but I went for a comment instead. I think issue descriptions should describe the problem not the solution. I did put an update on the body identifying the comment specifically as the implementation plan which I think is a healthy practice. [0]: https://pulp.plan.io/issues/2186#note-21 On Tue, Dec 13, 2016 at 2:25 PM, Michael Hrivnak <[email protected]> wrote: > That all sounds great. I think it would be fine as a single issue, since > the increase in heartbeat frequency is specifically motivated by this use > case, even if it may have minor benefit in other ways. > > Michael > > On Tue, Dec 13, 2016 at 2:04 PM, Brian Bouterse <[email protected]> > wrote: > >> Great, let's go with option 2. I plan to update the story with details, >> but a few more questions first. >> >> To support this plan we have to make a change to the celerybeat heartbeat >> timings. I think we would do well to make the heartbeats occur more >> frequently across the board and consistent. >> >> Currently, pulp workers and resource manager heartbeat every 30 seconds, >> and the resource manager heartbeats every 90 seconds. I propose we make all >> 3 of these components heartbeat every 20 seconds. That way when we have >> pulp-manage-db wait up to 60 seconds after the most recent timestamp, at >> least 2 are expected to have occurred. The existing timings for failover >> can remain unchanged. It also sets us up nicely to have Pulp failover >> quicker in the future, but that is separate change. >> >> So two questions... Does ^ all still sound good? Would you want that as 2 >> issues (one for timings change) with a blocking relationship, or do you >> want it all recorded as a single issue? >> >> Thank you for the feedback, I think we're almost at the detailed plan we >> need! >> >> Brian >> >> >> >> On Sun, Dec 11, 2016 at 10:36 PM, Michael Hrivnak <[email protected]> >> wrote: >> >>> I like option 2 a lot. I'm in favor of revising the story to have it >>> implement this. It should display to the user how long it's going to wait, >>> as in: "Waiting 17 more seconds to ensure no Pulp workers are running." I >>> don't think we need a fancy countdown of any kind. Maybe print a "." every >>> second if you want extra credit. >>> >>> Option 1 could make it more difficult for someone to automate the >>> upgrade. The overall experience of option 2 seems more user-friendly. >>> >>> I like that both of these can be implemented and make a difference for a >>> user's next upgrade, rather than needing to use another upgrade just to get >>> data quality straightened out. >>> >>> Thanks for adding new ideas! >>> >>> Michael >>> >>> On Fri, Dec 9, 2016 at 4:07 PM, Brian Bouterse <[email protected]> >>> wrote: >>> >>>> Looking at the process list is easy, but Pulp is a clustered software >>>> so it isn't effective for those users. In terms of the value in checking >>>> for running workers, a clustered installation is really where a user would >>>> especially want this feature. Let's think through some more ideas. Here are >>>> two options, I'm a big fan of option 2. >>>> >>>> Option 1: >>>> * bring the feature back that checks the database (this is easy because >>>> we already have the code) >>>> * have the check consider worker records dead if they are older than 1 >>>> minute (not 5) >>>> * also update the worker failover time to be 1 minute instead of 5 >>>> (also easy, and not a requirement since the worker check and worker >>>> failover do not have to use the same timing) >>>> * do not prompt for y/N but instead gave the user output and accepted >>>> an --ignore-workers flag to override >>>> * Make the output from the check be more verbose and identify the >>>> problem, the argument they could pass, and explain that they can wait 60 >>>> seconds, and also list the worker entries that are still running. >>>> >>>> Option 2: >>>> As a variation on the above, instead of failing and having a flag, have >>>> pulp-manage-db wait until 1-minute after the latest timestamp and check the >>>> db again. This would potentially be < 20 seconds allowing 40 seconds for >>>> package upgrade. This would be a great user experience because there are no >>>> flags, it would just work and take a little longer. >>>> >>>> I think a user who sees an error they experience during upgrade and can >>>> wait 20ish seconds for it to be confirmed as a real issue is an ok user >>>> experience. I think an interactive user would even be hard pressed to >>>> experience this delay by upgrading the packages and running pulp-manage-db >>>> in under 60 seconds. >>>> >>>> What do you all think about ^? >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Dec 8, 2016 at 5:25 PM, Michael Hrivnak <[email protected]> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Dec 8, 2016 at 1:49 PM, Bihan Zhang <[email protected]> >>>>> wrote: >>>>> >>>>>> The changes made for #2186 [0] was pulled from the 2.11.0 release >>>>>> yesterday, and we should talk about how to implement it for 2.12 >>>>>> >>>>>> From what I can see there are 2 ways to move forward with #2186 >>>>>> >>>>>> *1.* We can fix the pulp worker db record cleanup so that >>>>>> pulp_celerybeat exits cleanly (aka put this back: [1]) and make new >>>>>> changes >>>>>> to clean up pulp_workers with a SIGTERM handler in a 2.11.z release. >>>>>> We can then re-revert the commit and put the feature back in 2.12 >>>>>> with little effort. >>>>>> >>>>> >>>>> We would also need a way for pulp-manage-db to know if a record in the >>>>> Worker collection was created by a version of pulp that does not do full >>>>> cleanup. As described in a previous email, one way would be to add some >>>>> kind of new field to the Worker model, such as the version of pulp the >>>>> worker was running. >>>>> >>>>> >>>>>> >>>>>> The original reason #2186 was implemented using the db records was so >>>>>> we can support a clustered pulp installation. >>>>>> But this approach would make migration to 2.12 more difficult, since >>>>>> users now have to upgrade to the 2.11.z release first before going to >>>>>> 2.12 >>>>>> >>>>>> >>>>> Perhaps the original intent was lost. This feature was motivated >>>>> primarily by katello deployments (all of which are single-system), where >>>>> users either forgot to stop pulp services or didn't know that was a >>>>> requirement before running pulp-manage-db. The normal upgrade process for >>>>> them handles it automatically, but when manual intervention is required, >>>>> that's where we've seen people running pulp-manage-db while pulp services >>>>> are running. >>>>> >>>>> >>>>>> *2. *We can rethinking our approach to #2186 and perform the check >>>>>> against the process list >>>>>> Upgrade-wise implementing it this way is a lot easier for users, >>>>>> since they can do a straight upgrade to 2.12 without going through an >>>>>> intermediary release. >>>>>> The downside is that in clustered environments this would not catch >>>>>> every potential error. But #2186 is a best effort story, and if this is >>>>>> the >>>>>> best effort I am ok with it. >>>>>> >>>>> >>>>> This is easy and effective. I think we should do it, and >>>>> augment/replace it in the future with the worker checking. >>>>> >>>>> >>>>> >>>>>> Regardless of which option we go with I think we should get the pulp >>>>>> worker db cleanup in. >>>>>> We should also have a --ignore-running-worker flag [2] to prevent >>>>>> automated upgrade problems. >>>>>> >>>>>> >>>>>> [0] https://pulp.plan.io/issues/2186 >>>>>> [1] https://github.com/werwty/pulp/commit/4f43a85dd568f4a0b5 >>>>>> 0ae9b07bbec7138861e92b#diff-80e8b96df1f5da9a551bb6ff18dea352 >>>>>> [2] https://pulp.plan.io/issues/2469 >>>>>> >>>>>> _______________________________________________ >>>>>> Pulp-dev mailing list >>>>>> [email protected] >>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Pulp-dev mailing list >>>>> [email protected] >>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> Pulp-dev mailing list >>> [email protected] >>> https://www.redhat.com/mailman/listinfo/pulp-dev >>> >>> >> >
_______________________________________________ Pulp-dev mailing list [email protected] https://www.redhat.com/mailman/listinfo/pulp-dev
