Massimo, yes there are a ton of issues in my example - but it was a run-once upgrade script that i monitored for completion... and at the same time i was not overly concerned if a couple of records fell through the cracks. all valid and correct points though for building something that runs repeatedly and is expected to be 100% accurate. i just wanted to provide an example of something simple that uses the task queue to run in a script-like fashion.
and yes, i relied on GAE's retry on fail - even though i catch the timeout exception, sometimes it was not thrown with enough time left to save and re-queue the task before being killed.

re development: you never get the timeout exception on the development server either, so it's kinda hard to test all of this out. for this one i created a second application in GAE, used the bulkloader tool to download production data and copy it to my new test environment, and then tested it there. i'm sure google does not approve of this approach though.

cfh

On May 11, 4:30 pm, mdipierro <[email protected]> wrote:
> How safe is this approach? You process 100 records at a time and
> call the function itself again until done. If this fails for any
> reason (like time constraints imposed by GAE or other GAE db access
> failures), it is not going to call itself again and it will never
> complete. Is it a possibility?
>
> Another issue: your looping condition is the creation time of the
> record. That works. If the condition were different, nothing would
> prevent a new record inserted by a different user from falling
> through the cracks in the list of records that were already
> processed. This only works for simple queries.
>
> Massimo
>
> On May 11, 5:38 pm, howesc <[email protected]> wrote:
> > yes, it applies to the dev environment as well.
> >
> > i have not used bulk inserts yet, but they would have to run as a
> > controller that is accessible via URL (at least as far as i can
> > tell). Even the bulkloader.py tool distributed with the SDK talks to
> > a particular app URL and does everything in 30-second chunks.
> >
> > here is an example of an upgrade i had to do where i needed to add a
> > new field to the database and make sure all existing records had a
> > default value for that field (because queries with filters on unset
> > values don't work on google app engine).
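The "catch the deadline, save a cursor, re-queue" pattern being discussed can be sketched without GAE at all. In this pure-Python stand-in, `FakeDeadline` plays the role of `DeadlineExceededError` and a plain list plays the task queue; every name here is hypothetical, and it only models the happy path where the exception arrives with enough time left to re-queue:

```python
# Pure-Python sketch of the re-queue-on-deadline pattern. FakeDeadline
# stands in for GAE's DeadlineExceededError; `queue` (a plain list)
# stands in for the task queue. None of these names are GAE APIs.

class FakeDeadline(Exception):
    pass

def run_task(records, queue, start=0, budget=1000):
    """Mark records processed from index `start`; after `budget` updates
    a simulated deadline fires and the cursor is pushed onto `queue`."""
    handled = 0
    i = start
    try:
        while i < len(records):
            if handled == budget:
                raise FakeDeadline()
            records[i]["processed"] = True
            i += 1
            handled += 1
    except FakeDeadline:
        # analogous to taskqueue.add(url=..., params={'id': last_id})
        queue.append(i)

def drain(records, budget):
    """Keep re-running the task until no re-queue happens."""
    queue = [0]
    while queue:
        run_task(records, queue, start=queue.pop(), budget=budget)
```

Because each run starts from the saved cursor and the update is idempotent, repeating a run (GAE's retry-on-fail) does no harm.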
> > i ran it by hitting the URL, and when it ran out of time it just
> > queued itself to keep going. if you want to cron it, look at
> > http://code.google.com/appengine/docs/python/config/cron.html
> >
> > i put the controller in default.py, so it was accessed at
> > <appname>.appspot.com/app/default/task.html
> >
> > ----
> > import logging
> >
> > def task():
> >     """
> >     today (19-mar-2010) we are going to use this to add the
> >     processed_audio field to recordings
> >     """
> >     # return  # uncomment to disable the task once the upgrade is done
> >     from google.appengine.api.labs import taskqueue
> >     from google.appengine.runtime import DeadlineExceededError
> >     logging.info("in da task")
> >
> >     last_id = request.vars.id.split(".")[0]
> >
> >     # fetch the fields the loop below actually reads
> >     rows = db(db.recording.created_time >= last_id).select(
> >         db.recording.id,
> >         db.recording.media,
> >         db.recording.created_time,
> >         orderby=db.recording.created_time,
> >         limitby=(0, 100))
> >     try:
> >         # update processed_audio
> >         for r in rows:
> >             media_ids = r['media'].split('|')
> >             processed = False
> >             if media_ids:
> >                 processed = True
> >             db(db.recording.id == r.id).update(processed_audio=processed)
> >             last_id = r.created_time
> >     except DeadlineExceededError:
> >         logging.info("cutoff at %s" % repr(last_id))
> >         taskqueue.add(url='/default/task', params={'id': last_id})
> >         return
> >
> >     if len(rows) < 100:
> >         logging.info("no more rows to process")
> >         return
> >
> >     logging.info("finished at %s" % repr(last_id))
> >     taskqueue.add(url='/default/task', params={'id': last_id})
> >     return
> > ----
> >
> > good luck!
> >
> > cfh
> >
> > On May 10, 7:14 pm, Matthew <[email protected]> wrote:
> > > Does this apply to the dev environment as well? Just fire it up and
> > > run it via localhost?
> > >
> > > If that's the case, would you mind providing an example or the proper
> > > documentation link to help me get started?
> > >
> > > Also, since bulk inserts are now possible in GAE
> > > http://groups.google.com/group/web2py/browse_thread/thread/93d3dad847...,
> > > does that mean they're only possible from within the application
> > > itself (not via script)?
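Massimo's point about the creation-time cursor is easy to see in a small in-memory stand-in for the query loop (all names illustrative, nothing here is the GAE datastore API). Because the next batch starts at `created_time >= last_id`, the boundary record is fetched by two consecutive batches - harmless only because setting `processed_audio` twice is idempotent:

```python
# In-memory stand-in for the batched query in the task above.
# `records` is a list of dicts with "id" and "created_time" keys;
# fetch_batch mirrors the select() with orderby and limitby.

def fetch_batch(records, last_created, limit=100):
    """Up to `limit` records with created_time >= last_created,
    ordered by created_time."""
    eligible = sorted((r for r in records if r["created_time"] >= last_created),
                      key=lambda r: r["created_time"])
    return eligible[:limit]

def process_all(records):
    """Drain the records batch by batch, the way the task re-queues
    itself with the last created_time as the cursor."""
    last_created = 0
    seen = []
    while True:
        batch = fetch_batch(records, last_created)
        for r in batch:
            seen.append(r["id"])       # the real task updates processed_audio here
            last_created = r["created_time"]
        if len(batch) < 100:           # a short batch means we hit the end
            return seen
```

With 250 records this visits every id, but the records at the batch boundaries appear twice in `seen` - which is why the pattern, as Massimo notes, only works for simple, idempotent updates.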
> > > Thanks,
> > > Matthew
> > >
> > > On May 10, 7:04 pm, howesc <[email protected]> wrote:
> > > > everything on GAE must be called via a URL (it must be a
> > > > controller/function). if you need to run it periodically, look up
> > > > how to do cron on GAE and create your cron.yaml.
> > > >
> > > > @auth.requires_membership() is your friend in this case to limit
> > > > who can call your controller. :)
> > > >
> > > > i have several of these sorts of things running on GAE, and it
> > > > seems to work quite well.
> > > >
> > > > On May 9, 7:14 pm, mdipierro <[email protected]> wrote:
> > > > > no
> > > > >
> > > > > On May 9, 8:33 pm, Matthew <[email protected]> wrote:
> > > > > > You can run a script with Postgres or MySQL using this syntax:
> > > > > >
> > > > > > python web2py.py -S myapp -M -R applications/myapp/modules/myscript.py
> > > > > >
> > > > > > Can a script be run in this way using App Engine as the datastore?
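For the cron route mentioned in the thread, a minimal cron.yaml might look like the sketch below. The URL is an assumption based on the `/app/default/task` routing discussed above, and the schedule is only an example:

```yaml
cron:
- url: /app/default/task
  schedule: every 24 hours
  description: backfill processed_audio on recordings
```

Pairing this with `@auth.requires_membership()` on the controller, as suggested above, keeps arbitrary visitors from triggering the task.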

