Re: [Pulp-dev] #2186 (pulp-manage-db) options

Brian Bouterse Tue, 13 Dec 2016 14:14:41 -0800

I've added an implementation plan as a comment [0] along with checklist
items. It has not yet been groomed, but I think it is ready to be.


It was tempting to make this comment the body of the issue, but I went for
a comment instead. I think issue descriptions should describe the problem
not the solution. I did put an update on the body identifying the comment
specifically as the implementation plan which I think is a healthy practice.

[0]: https://pulp.plan.io/issues/2186#note-21

On Tue, Dec 13, 2016 at 2:25 PM, Michael Hrivnak <[email protected]>
wrote:

> That all sounds great. I think it would be fine as a single issue, since
> the increase in heartbeat frequency is specifically motivated by this use
> case, even if it may have minor benefit in other ways.
>
> Michael
>
> On Tue, Dec 13, 2016 at 2:04 PM, Brian Bouterse <[email protected]>
> wrote:
>
>> Great, let's go with option 2. I plan to update the story with details,
>> but a few more questions first.
>>
>> To support this plan we have to make a change to the celerybeat heartbeat
>> timings. I think we would do well to make the heartbeats occur more
>> frequently across the board and consistent.
>>
>> Currently, pulp workers and resource manager heartbeat every 30 seconds,
>> and the resource manager heartbeats every 90 seconds. I propose we make all
>> 3 of these components heartbeat every 20 seconds. That way when we have
>> pulp-manage-db wait up to 60 seconds after the most recent timestamp, at
>> least 2 are expected to have occurred. The existing timings for failover
>> can remain unchanged. It also sets us up nicely to have Pulp failover
>> quicker in the future, but that is separate change.
>>
>> So two questions... Does ^ all still sound good? Would you want that as 2
>> issues (one for timings change) with a blocking relationship, or do you
>> want it all recorded as a single issue?
>>
>> Thank you for the feedback, I think we're almost at the detailed plan we
>> need!
>>
>> Brian
>>
>>
>>
>> On Sun, Dec 11, 2016 at 10:36 PM, Michael Hrivnak <[email protected]>
>> wrote:
>>
>>> I like option 2 a lot. I'm in favor of revising the story to have it
>>> implement this. It should display to the user how long it's going to wait,
>>> as in: "Waiting 17 more seconds to ensure no Pulp workers are running." I
>>> don't think we need a fancy countdown of any kind. Maybe print a "." every
>>> second if you want extra credit.
>>>
>>> Option 1 could make it more difficult for someone to automate the
>>> upgrade. The overall experience of option 2 seems more user-friendly.
>>>
>>> I like that both of these can be implemented and make a difference for a
>>> user's next upgrade, rather than needing to use another upgrade just to get
>>> data quality straightened out.
>>>
>>> Thanks for adding new ideas!
>>>
>>> Michael
>>>
>>> On Fri, Dec 9, 2016 at 4:07 PM, Brian Bouterse <[email protected]>
>>> wrote:
>>>
>>>> Looking at the process list is easy, but Pulp is a clustered software
>>>> so it isn't effective for those users. In terms of the value in checking
>>>> for running workers, a clustered installation is really where a user would
>>>> especially want this feature. Let's think through some more ideas. Here are
>>>> two options, I'm a big fan of option 2.
>>>>
>>>> Option 1:
>>>> * bring the feature back that checks the database (this is easy because
>>>> we already have the code)
>>>> * have the check consider worker records dead if they are older than 1
>>>> minute (not 5)
>>>> * also update the worker failover time to be 1 minute instead of 5
>>>> (also easy, and not a requirement since the worker check and worker
>>>> failover do not have to use the same timing)
>>>> * do not prompt for y/N but instead gave the user output and accepted
>>>> an --ignore-workers flag to override
>>>> * Make the output from the check be more verbose and identify the
>>>> problem, the argument they could pass, and explain that they can wait 60
>>>> seconds, and also list the worker entries that are still running.
>>>>
>>>> Option 2:
>>>> As a variation on the above, instead of failing and having a flag, have
>>>> pulp-manage-db wait until 1-minute after the latest timestamp and check the
>>>> db again. This would potentially be < 20 seconds allowing 40 seconds for
>>>> package upgrade. This would be a great user experience because there are no
>>>> flags, it would just work and take a little longer.
>>>>
>>>> I think a user who sees an error they experience during upgrade and can
>>>> wait 20ish seconds for it to be confirmed as a real issue is an ok user
>>>> experience. I think an interactive user would even be hard pressed to
>>>> experience this delay by upgrading the packages and running pulp-manage-db
>>>> in under 60 seconds.
>>>>
>>>> What do you all think about ^?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Dec 8, 2016 at 5:25 PM, Michael Hrivnak <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 1:49 PM, Bihan Zhang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The changes made for #2186 [0] was pulled from the 2.11.0 release
>>>>>> yesterday, and we should talk about how to implement it for 2.12
>>>>>>
>>>>>> From what I can see there are 2 ways to move forward with #2186
>>>>>>
>>>>>> *1.* We can fix the pulp worker db record cleanup so that
>>>>>> pulp_celerybeat exits cleanly (aka put this back: [1]) and make new 
>>>>>> changes
>>>>>> to clean up pulp_workers with a SIGTERM handler in a 2.11.z release.
>>>>>> We can then re-revert the commit and put the feature back in 2.12
>>>>>> with little effort.
>>>>>>
>>>>>
>>>>> We would also need a way for pulp-manage-db to know if a record in the
>>>>> Worker collection was created by a version of pulp that does not do full
>>>>> cleanup. As described in a previous email, one way would be to add some
>>>>> kind of new field to the Worker model, such as the version of pulp the
>>>>> worker was running.
>>>>>
>>>>>
>>>>>>
>>>>>> The original reason #2186 was implemented using the db records was so
>>>>>> we can support a clustered pulp installation.
>>>>>> But this approach would make migration to 2.12 more difficult, since
>>>>>> users now have to upgrade to the 2.11.z release first before going to 
>>>>>> 2.12
>>>>>>
>>>>>>
>>>>> Perhaps the original intent was lost. This feature was motivated
>>>>> primarily by katello deployments (all of which are single-system), where
>>>>> users either forgot to stop pulp services or didn't know that was a
>>>>> requirement before running pulp-manage-db. The normal upgrade process for
>>>>> them handles it automatically, but when manual intervention is required,
>>>>> that's where we've seen people running pulp-manage-db while pulp services
>>>>> are running.
>>>>>
>>>>>
>>>>>> *2. *We can rethinking our approach to #2186 and perform the check
>>>>>> against the process list
>>>>>> Upgrade-wise implementing it this way is a lot easier for users,
>>>>>> since they can do a straight upgrade to 2.12 without going through an
>>>>>> intermediary release.
>>>>>> The downside is that in clustered environments this would not catch
>>>>>> every potential error. But #2186 is a best effort story, and if this is 
>>>>>> the
>>>>>> best effort I am ok with it.
>>>>>>
>>>>>
>>>>> This is easy and effective. I think we should do it, and
>>>>> augment/replace it in the future with the worker checking.
>>>>>
>>>>>
>>>>>
>>>>>> Regardless of which option we go with I think we should get the pulp
>>>>>> worker db cleanup in.
>>>>>> We should also have a --ignore-running-worker flag [2] to prevent
>>>>>> automated upgrade problems.
>>>>>>
>>>>>>
>>>>>> [0] https://pulp.plan.io/issues/2186
>>>>>> [1] https://github.com/werwty/pulp/commit/4f43a85dd568f4a0b5
>>>>>> 0ae9b07bbec7138861e92b#diff-80e8b96df1f5da9a551bb6ff18dea352
>>>>>> [2] https://pulp.plan.io/issues/2469
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> [email protected]
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> [email protected]
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> [email protected]
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>

_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] #2186 (pulp-manage-db) options

Reply via email to