Re: [openstack-dev] [trove] scheduled tasks redux

2014-01-29 Thread Greg Hill

On Jan 23, 2014, at 3:41 PM, Michael Basnight mbasni...@gmail.com wrote:

 
 Will we be doing more complex things than every day at some time? That is, does
 the user base see value in configuring backups every 12th day of every other
 month? I think the schedule code is easy to write, but I fear that it
 will be hard to build a smarter scheduler that would only allow X tasks in a
 given hour for a window. If we limit to daily at X time, it seems easier to
 estimate how a given window for backup will look now and into the future
 given a constant user base :P Please note, I think it's viable to schedule more
 than 1 per day, in cron * 0,12 or * */12.

 
 Will we be using this as a single-task service as well? So if we assume the
 first paragraph is true, that tasks are scheduled daily, single-task services
 would be scheduled once, and could use the same crontab fields. But at this
 point, we only really care about the minute, hour, and _frequency_, which is
 daily or once. Feel free to add 12 scheduled tasks for every 2 hours if you
 want to back it up that often, or a single task as * 0/2. From the backend, I
 see that as 12 tasks created, one for each 2 hours.

I hadn't really considered anything but repeated use, so that's a good point.
I'll have to think about that more.  I do think that the frequency won't only be
daily or once.  It's not uncommon to have weekly or monthly maintenance
tasks, which I understood we wanted to cover with this spec.  I'll do some
research to see if there is a suitable standard format besides cron that works
well for both repeated and one-off scheduled tasks.
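One standard candidate is the ISO 8601 repeating-interval notation, `R[n]/<start>/<duration>`, which can describe both a one-off run and an open-ended repetition in a single string. The sketch below is a minimal illustration under stated assumptions, not a full ISO 8601 parser: it handles only week, day, hour and minute durations (calendar months, as monthly tasks would need, are left out), expects UTC start times ending in `Z`, and treats `n` as the number of runs. `parse_duration` and `occurrences` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List

# Subset of ISO 8601 durations: PnW, or PnD and/or TnH/TnM (no years or months).
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W|(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?)$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a supported ISO 8601 duration such as 'P1W' or 'PT12H'."""
    match = _DURATION.match(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def occurrences(spec: str, limit: int = 10) -> List[datetime]:
    """Run times for 'R[n]/<start>/<duration>'; n is treated as the number of runs."""
    repeat, start_text, duration_text = spec.split("/")
    if not repeat.startswith("R"):
        raise ValueError(f"not a repeating interval: {spec!r}")
    runs = int(repeat[1:]) if len(repeat) > 1 else limit  # bare 'R' = unbounded
    start = datetime.strptime(start_text, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)
    step = parse_duration(duration_text)
    return [start + i * step for i in range(min(runs, limit))]


print(occurrences("R1/2014-02-01T02:00Z/P1D"))           # a single run
print(occurrences("R/2014-02-01T02:00Z/P1W", limit=3))   # weekly, capped at 3
```

A bare `R` repeats indefinitely, so the sketch caps the result with `limit` rather than generating an unbounded list.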

 But this doesn't take windows into account: when you say you want a cron-style
 2pm backup, that's really just during some available window. Would it make
 more sense for an operator to configure a time window, and then let users
 choose a slot within a time window (and say there are a finite number of
 slots in a time window)? The slotting would be done behind the scenes and a
 user would only be able to select a window, and if the slots are all taken,
 it won't be shown in "get available time windows." The available time
 windows could be smart, in that your available time window _could be_ based on
 the location of the hardware your VM is sitting on (or some other rule…).
 Think network saturation if everyone on host A is doing a backup to Swift.

I don't think having windows will solve as much as we hope it will, and it's a
tricky problem to get right, as the number of tasks that can run per window is
highly variable.  I'll have to gather my thoughts on this more and post another
message when I've got something more to say than "my gut says this doesn't feel
right."

Greg
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [trove] scheduled tasks redux

2014-01-24 Thread Tim Simpson
 Would it make more sense for an operator to configure a time window, and
 then let users choose a slot within a time window (and say there are a
 finite number of slots in a time window)? The slotting would be done behind
 the scenes and a user would only be able to select a window, and if the
 slots are all taken, it won't be shown in "get available time windows."
 The available time windows could be smart, in that your available time
 window _could be_ based on the location of the hardware your VM is sitting
 on (or some other rule…). Think network saturation if everyone on host A is
 doing a backup to Swift.

Allowing operators to define time windows seems preferable to me; I think a 
cron-like system might be too granular. Having windows seems easier to schedule
and would enable an operator to change things in a pinch.

From: Michael Basnight [mbasni...@gmail.com]
Sent: Thursday, January 23, 2014 3:41 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [trove] scheduled tasks redux

On Jan 23, 2014, at 12:20 PM, Greg Hill wrote:

 The blueprint is here:

 https://wiki.openstack.org/wiki/Trove/scheduled-tasks

 So I have basically two questions:

 1. Does anyone see a problem with defining the repeating options as a single 
 field rather than multiple fields?

I'm fine with a single field, but see more explanation below.

 2. Should we use the crontab format for this or is that too terse?  We could
 go with a more fluid style like "Every Wednesday/Friday/Sunday at 12:00PM",
 but that's English-centric and much more difficult to parse programmatically.
 I'd welcome alternate suggestions.

Will we be doing more complex things than every day at some time? That is, does
the user base see value in configuring backups every 12th day of every other
month? I think the schedule code is easy to write, but I fear that it will
be hard to build a smarter scheduler that would only allow X tasks in a given
hour for a window. If we limit to daily at X time, it seems easier to estimate
how a given window for backup will look now and into the future given a
constant user base :P Please note, I think it's viable to schedule more than 1 per
day, in cron * 0,12 or * */12.
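The estimation argument above can be illustrated with a small sketch: if every schedule is "daily at these hours", admission control reduces to counting schedules per hour of the day, and that count is the same every day. `DailyCapacity` and `max_per_hour` (standing in for the "X tasks in a given hour") are hypothetical names, and a real scheduler would likely also need per-window or per-host limits.

```python
from collections import Counter
from typing import Iterable


class DailyCapacity:
    """Counts daily schedules per hour of day and enforces a per-hour limit."""

    def __init__(self, max_per_hour: int) -> None:
        self.max_per_hour = max_per_hour
        self.load: Counter = Counter()  # hour of day (0-23) -> scheduled tasks

    def admit(self, hours: Iterable[int]) -> bool:
        """Accept a 'daily at these hours' schedule if every hour has room."""
        wanted = set(hours)
        if any(not 0 <= h <= 23 for h in wanted):
            raise ValueError("hours must be in 0-23")
        # Daily schedules repeat identically, so today's load is every day's load.
        if any(self.load[h] >= self.max_per_hour for h in wanted):
            return False
        for h in wanted:
            self.load[h] += 1
        return True


capacity = DailyCapacity(max_per_hour=2)
print(capacity.admit([0, 12]))  # twice a day, like cron hours 0,12
print(capacity.admit([0]))
print(capacity.admit([0]))      # rejected: hour 0 already holds two tasks
```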

Will we be using this as a single-task service as well? So if we assume the
first paragraph is true, that tasks are scheduled daily, single-task services
would be scheduled once, and could use the same crontab fields. But at this
point, we only really care about the minute, hour, and _frequency_, which is
daily or once. Feel free to add 12 scheduled tasks for every 2 hours if you
want to back it up that often, or a single task as * 0/2. From the backend, I
see that as 12 tasks created, one for each 2 hours.
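A sketch of the backend view just described, in which "every 2 hours" becomes twelve daily tasks. It assumes the `start/step` hour notation used in the message (`0/2`), plus `*/12` and comma lists, and a `frequency` that is either `daily` or `once`; `ScheduledTask`, `expand_hours`, and `expand_schedule` are hypothetical and not part of the Trove blueprint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduledTask:
    minute: int
    hour: int
    frequency: str  # "daily" or "once" in this sketch


def expand_hours(field: str) -> List[int]:
    """Expand an hour field: '*', '14', '0,12', or 'start/step' such as '0/2' or '*/12'."""
    if field == "*":
        return list(range(24))
    if "/" in field:
        start, step = field.split("/")
        first = 0 if start == "*" else int(start)
        return list(range(first, 24, int(step)))
    return sorted({int(h) for h in field.split(",")})


def expand_schedule(minute: int, hour_field: str, frequency: str) -> List[ScheduledTask]:
    """Create one backend task per hour the schedule fires in."""
    if frequency not in ("daily", "once"):
        raise ValueError(f"unsupported frequency: {frequency!r}")
    hours = expand_hours(hour_field)
    if not 0 <= minute <= 59 or any(not 0 <= h <= 23 for h in hours):
        raise ValueError("minute or hour out of range")
    return [ScheduledTask(minute, h, frequency) for h in hours]


every_two_hours = expand_schedule(0, "0/2", "daily")
print(len(every_two_hours))  # 12 tasks: hours 0, 2, ..., 22
```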

But this doesn't take windows into account: when you say you want a cron-style 2pm
backup, that's really just during some available window. Would it make more
sense for an operator to configure a time window, and then let users choose a
slot within a time window (and say there are a finite number of slots in a time
window)? The slotting would be done behind the scenes and a user would only be
able to select a window, and if the slots are all taken, it won't be shown in
"get available time windows." The available time windows could be smart,
in that your available time window _could be_ based on the location of the
hardware your VM is sitting on (or some other rule…). Think network saturation
if everyone on host A is doing a backup to Swift.
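The window-and-slot proposal could look roughly like the following sketch, assuming operator-defined windows with a fixed slot count and capacity tracked per host (one possible "other rule"). A window drops out of the available list once its slots on that host are taken. All class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import time
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Window:
    name: str
    start: time
    end: time
    slots: int  # finite number of tasks per host in this window


class WindowScheduler:
    def __init__(self, windows: List[Window]) -> None:
        self.windows = {w.name: w for w in windows}
        # (host, window name) -> slots already taken on that host
        self.taken: Dict[Tuple[str, str], int] = defaultdict(int)

    def available_windows(self, host: str) -> List[str]:
        """Windows with a free slot for an instance on this host; full ones are hidden."""
        return [name for name, w in self.windows.items()
                if self.taken[(host, name)] < w.slots]

    def reserve(self, host: str, window_name: str) -> bool:
        """The user picks a window; the slot itself is assigned behind the scenes."""
        if window_name not in self.available_windows(host):
            return False
        self.taken[(host, window_name)] += 1
        return True


scheduler = WindowScheduler([
    Window("overnight", time(0, 0), time(2, 0), slots=1),
    Window("early", time(2, 0), time(4, 0), slots=2),
])
scheduler.reserve("host-a", "overnight")
print(scheduler.available_windows("host-a"))  # ['early']
print(scheduler.available_windows("host-b"))  # ['overnight', 'early']
```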

/me puts down wrench



Re: [openstack-dev] [trove] scheduled tasks redux

2014-01-24 Thread Kevin Conway

Will we be doing more complex things than every day at some time? That is,
does the user base see value in configuring backups every 12th day of
every other month? I think the schedule code is easy to write, but I
fear that it will be hard to build a smarter scheduler that would only
allow X tasks in a given hour for a window. If we limit to daily at X
time, it seems easier to estimate how a given window for backup will look
now and into the future given a constant user base :P Please note, I
think it's viable to schedule more than 1 per day, in cron
* 0,12 or * */12.

Scheduling tasks on something other than a daily basis can be a legitimate
requirement. It's not uncommon for organizations to have audit dates
where they need to be able to snapshot their data on non-daily, regular
intervals (quarterly, annually, etc.).
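Quarterly or annual schedules cannot be expressed as a fixed number of days, since months differ in length. A minimal standard-library sketch of calendar-aware stepping; the helper names are hypothetical, and clamping to the last day of a shorter month is an assumed policy that a spec would need to settle explicitly:

```python
import calendar
from datetime import datetime
from typing import List


def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole months, clamping the day to the target month's length."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


def every_n_months(start: datetime, n: int, count: int) -> List[datetime]:
    """First `count` run times, n months apart (n=3 quarterly, n=12 annually)."""
    # Each run is computed from `start`, not from the previous run, so clamping
    # (e.g. Jan 31 -> Apr 30) does not drift later runs to the 30th.
    return [add_months(start, n * i) for i in range(count)]


quarterly = every_n_months(datetime(2014, 1, 31), 3, 4)
print([d.date().isoformat() for d in quarterly])
# ['2014-01-31', '2014-04-30', '2014-07-31', '2014-10-31']
```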

I also like the idea of windows. I know that one of the features that has
also been requested that might be satisfied by this is allowing operators
to define maintenance windows for users to select. Maintenance could
simply be a task that a user schedules in an available window.

If the concern about allowing fixed times for scheduled tasks (backups, in
the given example) is saturation of external resources like networking or
Swift, then it might be more beneficial to use windows as a way of
scheduling the availability of task artifacts rather than the task itself.
When I used to work in government/higher education, for example, there
were multiple dates throughout the year where the organization was
mandated by the state to provide figures for particular calendar dates.
The systems they used to manage these figures typically did not provide
any means of retrieving historical values (think trying to audit the state
of a Trove instance at a specific point in the past). As a result, they
would use automated backups to create a snapshot of their data for the
state-mandated reporting. For them, the backup had to begin at 00:00
because waiting until 04:00 would result in skewed figures.
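Kevin's suggestion separates two things: the snapshot, which has to start at an exact time, and the transfer of its artifact (for example to Swift), which could be deferred into an operator-defined window. A hypothetical sketch of that split, assuming a daily upload window that does not cross midnight and a known snapshot duration; `plan_backup` is an illustrative name only:

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def plan_backup(snapshot_at: datetime, snapshot_duration: timedelta,
                window_start: time, window_end: time) -> Tuple[datetime, datetime]:
    """Return (snapshot start, earliest upload start) for a daily upload window.

    The snapshot always starts exactly at `snapshot_at`; only the transfer of
    the resulting artifact is moved into the window.
    """
    if window_start >= window_end:
        raise ValueError("this sketch expects window_start < window_end")
    done = snapshot_at + snapshot_duration
    opens = datetime.combine(done.date(), window_start)
    closes = datetime.combine(done.date(), window_end)
    if done >= closes:  # today's window is over; wait for tomorrow's
        opens += timedelta(days=1)
    return snapshot_at, max(done, opens)


start, upload = plan_backup(datetime(2014, 3, 31, 0, 0), timedelta(minutes=30),
                            time(2, 0), time(4, 0))
print(start, upload)  # snapshot at 00:00, upload deferred to 02:00
```

The point of the design is that the figures are captured at 00:00 regardless of load, while the network-heavy part is what gets spread out.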

I'm not certain how common this scenario is in other industries, but I
thought I should mention it.





[openstack-dev] [trove] scheduled tasks redux

2014-01-23 Thread Greg Hill
The blueprint is here:

https://wiki.openstack.org/wiki/Trove/scheduled-tasks

I've been working on the REST API portion of this project, and as I was working 
on the client, a part of it didn't sit quite right.  As it is specified, it 
calls for two fields to define when and how often to run the task:

"frequency": "hourly|daily|weekly|monthly",
"time_window": "2012-03-28T22:00Z/2012-03-28T23:00Z",
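To illustrate why, as the message goes on to argue, the date portion carries no information for a repeating task, here is a minimal sketch that splits the `time_window` value on `/` and keeps only the time-of-day range and the window length. It assumes both endpoints are UTC with a trailing `Z`, which it rewrites to `+00:00` because `datetime.fromisoformat` only accepts `Z` from Python 3.11 onward; `parse_time_window` is a hypothetical name.

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def parse_time_window(value: str) -> Tuple[time, time, timedelta]:
    """Split 'start/end' into daily start and end times plus the window length."""
    start_text, end_text = value.split("/")
    start = datetime.fromisoformat(start_text.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_text.replace("Z", "+00:00"))
    if end <= start:
        raise ValueError("window end must be after its start")
    # For a repeating task only the time of day matters, so the date is dropped.
    return start.timetz(), end.timetz(), end - start


print(parse_time_window("2012-03-28T22:00Z/2012-03-28T23:00Z"))
```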

The concept of combining two datetimes into a single field feels awkward when
using the API from a client perspective.  I originally thought I'd just split
it into two datetimes, but that still felt wrong.  We did some internal
discussion here at Rackspace, and it was brought up that the date doesn't
actually matter in this scenario.  All we care about is what time to run the
task and how frequently to repeat it.  Apparently some of the original
discussion was more around a crontab-style entry, but with a time window rather
than a fixed time, but that didn't get put into the spec.  For those who might
be wondering, the purpose of the window rather than a fixed time is to give
the system some leeway to avoid overloading things when everyone wants to fire
off a backup at midnight.  There could be a configurable minimum window size,
defaulting to 2 hours, so by default we'd only guarantee a task was run
within a two-hour window; operators could adjust that up or down.
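The configurable minimum window could be enforced roughly as follows. This is a sketch under assumptions: `MIN_WINDOW` is a hypothetical operator setting defaulting to the two hours mentioned above, a daily window may wrap past midnight, and tasks are spread across the window by hashing an instance ID, which is only one possible way to keep everyone from firing at midnight. None of these names come from Trove.

```python
import hashlib
from datetime import time, timedelta

MIN_WINDOW = timedelta(hours=2)  # hypothetical operator setting; default 2 hours
DAY_MINUTES = 24 * 60


def _minutes(t: time) -> int:
    return t.hour * 60 + t.minute


def window_length(start: time, end: time) -> timedelta:
    """Length of a daily window; end earlier than start wraps past midnight."""
    minutes = (_minutes(end) - _minutes(start)) % DAY_MINUTES
    return timedelta(minutes=minutes or DAY_MINUTES)  # start == end: whole day


def validate_window(start: time, end: time, minimum: timedelta = MIN_WINDOW) -> None:
    if window_length(start, end) < minimum:
        raise ValueError(f"window must be at least {minimum}")


def run_time(instance_id: str, start: time, end: time) -> time:
    """Pick a stable minute inside the window for this instance."""
    length = int(window_length(start, end).total_seconds() // 60)
    digest = hashlib.sha256(instance_id.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % length
    total = (_minutes(start) + offset) % DAY_MINUTES
    return time(total // 60, total % 60)


validate_window(time(23, 0), time(1, 0))  # 2-hour window across midnight: accepted
print(run_time("instance-1", time(23, 0), time(1, 0)))  # somewhere in 23:00-00:59
```

A stable hash (SHA-256 here) is used rather than Python's built-in `hash()`, whose string hashing is randomized per process, so the same instance keeps the same run time across restarts.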

So I have basically two questions:

1. Does anyone see a problem with defining the repeating options as a single 
field rather than multiple fields?
2. Should we use the crontab format for this or is that too terse?  We could go
with a more fluid style like "Every Wednesday/Friday/Sunday at 12:00PM", but
that's English-centric and much more difficult to parse programmatically.  I'd
welcome alternate suggestions.
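If the single field follows crontab's five-field layout (minute, hour, day of month, month, day of week), validation can stay small. A minimal sketch, assuming only numbers, `*`, comma lists, `a-b` ranges and `/n` steps, with day of week limited to 0-6 (Sunday = 0); month and weekday names and `@daily`-style shortcuts are not handled, and `parse_crontab` is a hypothetical name:

```python
from typing import Dict, List, Set

# (field name, lowest value, highest value) in crontab order
FIELDS = [("minute", 0, 59), ("hour", 0, 23), ("day_of_month", 1, 31),
          ("month", 1, 12), ("day_of_week", 0, 6)]


def parse_crontab(spec: str) -> Dict[str, List[int]]:
    """Expand a five-field crontab string into the sorted values of each field."""
    parts = spec.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    result: Dict[str, List[int]] = {}
    for text, (name, low, high) in zip(parts, FIELDS):
        values: Set[int] = set()
        for item in text.split(","):
            base, _, step_text = item.partition("/")
            step = int(step_text) if step_text else 1
            if step < 1:
                raise ValueError(f"bad step in {name}: {item!r}")
            if base == "*":
                first, last = low, high
            elif "-" in base:
                first, last = (int(x) for x in base.split("-"))
            else:
                first = int(base)
                last = high if step_text else first  # 'a/n' runs to the maximum
            if not low <= first <= last <= high:
                raise ValueError(f"{name} value out of range: {item!r}")
            values.update(range(first, last + 1, step))
        result[name] = sorted(values)
    return result


print(parse_crontab("0 0,12 * * *")["hour"])          # twice daily
print(parse_crontab("0 12 * * 3,5,0")["day_of_week"])  # Wed/Fri/Sun at noon
```

The last example also shows the terseness concern from question 2: `0 12 * * 3,5,0` is the crontab spelling of "Every Wednesday/Friday/Sunday at 12:00PM", but nothing in it says so to a human reader.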

Thanks in advance.

Greg