Re: [openstack-dev] [trove] scheduled tasks redux
On Jan 23, 2014, at 3:41 PM, Michael Basnight mbasni...@gmail.com wrote:

Will we be doing more complex things than every day at some time? i.e., does the user base see value in configuring backups every 12th day of every other month? I think it is easy to write the schedule code, but I fear that it will be hard to build a smarter scheduler that would only allow X tasks in a given hour for a window. If we limit to daily at X time, it seems easier to estimate how a given window for backup will look for now and into the future given a constant user base :P Plz note, I think it's viable to schedule more than 1 per day, in cron * 0,12 or * */12.

Will we be using this as a single task service as well? So if we assume the first paragraph is true, that tasks are scheduled daily, single task services would be scheduled once, and could use the same crontab fields. But at this point, we only really care about the minute, hour, and _frequency_, which is daily or once. Feel free to add 12 scheduled tasks for every 2 hours if you want to back it up that often, or a single task as * 0/2. From the backend, I see that as 12 tasks created, one for each 2 hours.

I hadn't really considered anything but repeated use, so that's a good point. I'll have to think on that more. I do think that the frequency won't only be daily or once. It's not uncommon to have weekly or monthly maintenance tasks, which, as I understood it, was something we wanted to cover with this spec. I'll do some research to see if there is a suitable standard format besides cron that works well for both repeated and scheduled singular tasks.

But this doesn't take windows into account. When you say you want a cron style 2pm backup, that's really just during some available window. Would it make more sense for an operator to configure a time window, and then let users choose a slot within a time window (and say there are a finite number of slots in a time window)? The slotting would be done behind the scenes and a user would only be able to select a window, and if the slots are all taken, it won't be shown in the get available time windows. The available time windows could be smart, in that your available time window _could be_ based on the location of the hardware your VM is sitting on (or some other rule…). Think network saturation if everyone on host A is doing a backup to Swift.

I don't think having windows will solve as much as we hope it will, and it's a tricky problem to get right, as the number of tasks that can run per window is highly variable. I'll have to gather my thoughts on this more and post another message when I've got something more to say than "my gut says this doesn't feel right".

Greg

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
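For readers following along, the operator-defined window idea discussed in this message could be sketched roughly as below. This is only an illustration under stated assumptions: the names TimeWindow, WindowScheduler, available_windows and book are hypothetical and do not come from the Trove blueprint, and a real implementation would also have to decide how many slots a window can safely hold, which is exactly the hard part raised above.

```python
from dataclasses import dataclass, field


@dataclass
class TimeWindow:
    """An operator-defined window with a finite number of slots (hypothetical)."""
    name: str
    start_hour: int
    end_hour: int
    slots: int
    booked: list = field(default_factory=list)

    def is_full(self):
        return len(self.booked) >= self.slots


class WindowScheduler:
    """Hands out slots behind the scenes; users only ever pick a window."""

    def __init__(self, windows):
        self._windows = {w.name: w for w in windows}

    def available_windows(self):
        # Windows whose slots are all taken are simply not listed.
        return [w.name for w in self._windows.values() if not w.is_full()]

    def book(self, window_name, instance_id):
        window = self._windows[window_name]
        if window.is_full():
            raise ValueError(f"window {window_name!r} has no free slots")
        window.booked.append(instance_id)
        # The slot index is an internal detail the user never chooses.
        return len(window.booked) - 1
```

An operator would register a handful of windows, and the API would expose only available_windows() and book(); rules such as "base availability on the host the VM lives on" would go into available_windows().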
Re: [openstack-dev] [trove] scheduled tasks redux
Would it make more sense for an operator to configure a time window, and then let users choose a slot within a time window (and say there are a finite number of slots in a time window)? The slotting would be done behind the scenes and a user would only be able to select a window, and if the slots are all taken, it won't be shown in the get available time windows. The available time windows could be smart, in that your available time window _could be_ based on the location of the hardware your VM is sitting on (or some other rule…). Think network saturation if everyone on host A is doing a backup to Swift.

Allowing operators to define time windows seems preferable to me; I think a cron-like system might be too granular. Having windows seems easier to schedule and would enable an operator to change things in a pinch.

From: Michael Basnight [mbasni...@gmail.com]
Sent: Thursday, January 23, 2014 3:41 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [trove] scheduled tasks redux

On Jan 23, 2014, at 12:20 PM, Greg Hill wrote:

The blueprint is here: https://wiki.openstack.org/wiki/Trove/scheduled-tasks

So I have basically two questions:

1. Does anyone see a problem with defining the repeating options as a single field rather than multiple fields?

I'm fine w/ a single field, but more explanation below.

2. Should we use the crontab format for this or is that too terse? We could go with a more fluid style like "Every Wednesday/Friday/Sunday at 12:00PM" but that's English-centric and much more difficult to parse programmatically. I'd welcome alternate suggestions.

Will we be doing more complex things than every day at some time? i.e., does the user base see value in configuring backups every 12th day of every other month? I think it is easy to write the schedule code, but I fear that it will be hard to build a smarter scheduler that would only allow X tasks in a given hour for a window. If we limit to daily at X time, it seems easier to estimate how a given window for backup will look for now and into the future given a constant user base :P Plz note, I think it's viable to schedule more than 1 per day, in cron * 0,12 or * */12.

Will we be using this as a single task service as well? So if we assume the first paragraph is true, that tasks are scheduled daily, single task services would be scheduled once, and could use the same crontab fields. But at this point, we only really care about the minute, hour, and _frequency_, which is daily or once. Feel free to add 12 scheduled tasks for every 2 hours if you want to back it up that often, or a single task as * 0/2. From the backend, I see that as 12 tasks created, one for each 2 hours.

But this doesn't take windows into account. When you say you want a cron style 2pm backup, that's really just during some available window. Would it make more sense for an operator to configure a time window, and then let users choose a slot within a time window (and say there are a finite number of slots in a time window)? The slotting would be done behind the scenes and a user would only be able to select a window, and if the slots are all taken, it won't be shown in the get available time windows. The available time windows could be smart, in that your available time window _could be_ based on the location of the hardware your VM is sitting on (or some other rule…). Think network saturation if everyone on host A is doing a backup to Swift.

/me puts down wrench
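As a rough illustration of the "one schedule entry becomes 12 backend tasks" point in the message above, here is a minimal sketch that expands a cron-style hour field into concrete daily run times. The function names expand_hours and daily_tasks are hypothetical, and the parser is an assumption, not Trove code. Note that in a standard five-field crontab the minute comes first, so "midnight and noon" would normally be written 0 0,12 * * * or 0 */12 * * *; the N/step form (as in 0/2) is not accepted by every cron implementation and is treated here as "from hour N to the end of the day".

```python
def expand_hours(field):
    """Expand a cron-style hour field into the sorted list of hours it matches.

    Supports '*', 'N', 'A-B', comma lists, and '/step' suffixes.
    Range validation (0-23) is left out to keep the sketch short.
    """
    hours = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            first, last = 0, 23
        elif "-" in base:
            low, high = base.split("-")
            first, last = int(low), int(high)
        else:
            first = int(base)
            # "N/step" runs from N to the end of the day; a bare "N" is just N.
            last = 23 if step_text else first
        hours.update(range(first, last + 1, step))
    return sorted(hours)


def daily_tasks(minute, hour_field):
    """Return one HH:MM run time per matching hour, i.e. one backend task each."""
    return [f"{hour:02d}:{minute:02d}" for hour in expand_hours(hour_field)]
```

With this, an "every 2 hours" schedule yields twelve run times per day, while the daily and twice-daily cases stay at one or two tasks, which is what makes the per-hour load easy to count.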
Re: [openstack-dev] [trove] scheduled tasks redux
Will we be doing more complex things than every day at some time? i.e., does the user base see value in configuring backups every 12th day of every other month? I think it is easy to write the schedule code, but I fear that it will be hard to build a smarter scheduler that would only allow X tasks in a given hour for a window. If we limit to daily at X time, it seems easier to estimate how a given window for backup will look for now and into the future given a constant user base :P Plz note, I think it's viable to schedule more than 1 per day, in cron * 0,12 or * */12.

Scheduling tasks on something other than a daily basis can be a legitimate requirement. It's not uncommon for organizations to have audit dates where they need to be able to snapshot their data at regular, non-daily intervals (quarterly, annually, etc.).

I also like the idea of windows. I know that one of the features that has also been requested, and that might be satisfied by this, is allowing operators to define maintenance windows for users to select. Maintenance could simply be a task that a user schedules in an available window.

If the concern about allowing hard times for scheduled tasks (backups being the given example) is saturation of external resources like networking or Swift, then it might be more beneficial to use windows as a way of scheduling the availability of task artifacts rather than the task itself.

When I used to work in government/higher education, for example, there were multiple dates throughout the year where the organization was mandated by the state to provide figures for particular calendar dates. The systems they used to manage these figures typically did not provide any means of retrieving historic values (think trying to audit the state of a Trove instance at a specific point in the past). As a result they would use automated backups to create a snapshot of their data for the state-mandated reporting. For them, the backup had to begin at 00:00 because waiting until 04:00 would result in skewed figures. I'm not certain how common this scenario is in other industries, but I thought I should mention it.
[openstack-dev] [trove] scheduled tasks redux
The blueprint is here: https://wiki.openstack.org/wiki/Trove/scheduled-tasks

I've been working on the REST API portion of this project, and as I was working on the client, a part of it didn't sit quite right. As it is specified, it calls for two fields to define when and how often to run the task:

    frequency : hourly|daily|weekly|monthly,
    time_window:2012-03-28T22:00Z/2012-03-28T23:00Z,

The concept of combining two datetimes into a single field feels awkward when using the API from a client perspective. I originally thought I'd just split it into two datetimes, but that still felt wrong. We had some internal discussion here at Rackspace, and it was brought up that the date doesn't actually matter in this scenario. All we care about is what time to run the task and how frequently to repeat it. Apparently some of the original discussion was more around a crontab style entry, but with a time window rather than a fixed time; that didn't get put into the spec.

For those who might be wondering, the purpose of the window rather than a fixed time is to give the system some leeway so it does not overload things when everyone wants to fire off a backup at midnight. There could be a configurable minimum window size defaulting to 2 hours, so by default we'd only guarantee a task was run within a two hour window, which could be adjusted up or down by operators.

So I have basically two questions:

1. Does anyone see a problem with defining the repeating options as a single field rather than multiple fields?

2. Should we use the crontab format for this or is that too terse? We could go with a more fluid style like "Every Wednesday/Friday/Sunday at 12:00PM" but that's English-centric and much more difficult to parse programmatically. I'd welcome alternate suggestions.

Thanks in advance.

Greg
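To make the time_window field and the minimum-window idea above a little more concrete, here is a small sketch. It is an assumption-laden illustration, not the spec's implementation: parse_time_window, pick_start and MIN_WINDOW are hypothetical names, the 2 hour default is taken from the message above, and picking a uniformly random start inside the window is just one possible way to spread the load.

```python
import random
from datetime import datetime, timedelta, timezone

# Assumed default minimum window; the message suggests operators could tune it.
MIN_WINDOW = timedelta(hours=2)


def parse_time_window(value):
    """Split 'START/END' (the spec's time_window format) into two UTC datetimes."""
    start_text, end_text = value.split("/")
    fmt = "%Y-%m-%dT%H:%MZ"
    start = datetime.strptime(start_text, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_text, fmt).replace(tzinfo=timezone.utc)
    if end <= start:
        raise ValueError("time_window must end after it starts")
    return start, end


def pick_start(start, end, rng=random):
    """Pick a run time inside the window, widening it to MIN_WINDOW if shorter."""
    length = max(end - start, MIN_WINDOW)
    offset_seconds = rng.uniform(0, length.total_seconds())
    return start + timedelta(seconds=offset_seconds)
```

For example, the one hour window from the spec would be widened to two hours, and each task would land somewhere between 22:00 and 00:00 UTC rather than all firing at 22:00. Passing a seeded random.Random instance as rng makes the choice reproducible in tests.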