Massimo,

yes there are a ton of issues in my example - but it was a run-one-
time upgrade script that i monitored for completion....and at the same
time was not overly concerned if a couple of records fell through the
cracks.  all valid and correct points though for building something
that runs repeatedly and is expected to be 100% accurate.  I just
wanted to provide an example of something simple that uses the task
queue to run in a script-like fashion.

and yes, i relied on GAE's retry on fail - even though i catch the
timeout exception, sometimes it was not thrown early enough to save my
place and re-queue the task before the request was killed.

re development:  you never get the timeout exception on the
development server either, so it's kinda hard to test all of this
out.  for this one i created a second application in GAE, used the
bulkloader tool to download production data and copy it to my new test
environment, and then tested it there.  i'm sure google does not
approve of this approach though.

cfh

On May 11, 4:30 pm, mdipierro <[email protected]> wrote:
> How safe is this approach? You process 100 records at a time and
> call the function itself again until done. If this fails for any
> reason (like time constraints imposed by GAE or other GAE db access
> failures), it is not going to call itself again and will never
> complete. Is that a possibility?
>
> Another issue: your looping condition is the creation time of the
> record. That works. If the condition were different, nothing would
> prevent a new record from being inserted by a different user and
> falling through the cracks in the list of records that were already
> processed. This only works for simple queries.
>
> Massimo
>
> On May 11, 5:38 pm, howesc <[email protected]> wrote:
>
> > yes, it applies to the dev environment as well.
>
> > i have not used bulk inserts yet, but they would have to run as a
> > controller that is accessible via URL.  (at least as far as i can
> > tell).  Even the bulkloader.py tool distributed with the SDK talks to
> > a particular app URL and does everything in 30 second chunks.
>
> > here is an example of an upgrade i had to do where i needed to add a
> > new field to the database, and make sure all existing records had a
> > default value for that field (because queries with filters on unset
> > values don't work on google app engine).  I ran it by hitting the URL,
> > and when it ran out of time it just queued itself to keep going.  if
> > you want to cron it look at
> > http://code.google.com/appengine/docs/python/config/cron.html
>
> > i put the controller in default.py, so it was accessed at
> > <appname>.appspot.com/app/default/task.html
>
> > ----
> > def task():
> >     """
> >     today (19-mar-2010) we are going to use this to add processed_audio
> >     field to recordings
> >     """
> >     import logging
> >     from google.appengine.api.labs import taskqueue
> >     from google.appengine.runtime import DeadlineExceededError
> >     logging.info("in da task")
> >
> >     # resume from the creation time passed by the previous task run
> >     last_id = request.vars.id.split(".")[0]
> >
> >     # select id and created_time too, since the loop below uses them
> >     rows = db(db.recording.created_time >= last_id).select(
> >         db.recording.id, db.recording.media, db.recording.created_time,
> >         orderby=db.recording.created_time, limitby=(0, 100))
> >     try:
> >         # update processed_audio
> >         for r in rows:
> >             media_ids = r['media'].split('|')
> >             processed = False
> >             if media_ids:
> >                 processed = True
> >             db(db.recording.id == r.id).update(processed_audio=processed)
> >             last_id = r.created_time
> >     except DeadlineExceededError:
> >         # out of time: queue a continuation from the last saved record
> >         logging.info("cutoff at %s" % repr(last_id))
> >         taskqueue.add(url='/default/task', params={'id': last_id})
> >         return
> >
> >     if len(rows) < 100:
> >         logging.info("no more rows to process")
> >         return
> >
> >     logging.info("finished at %s" % repr(last_id))
> >     taskqueue.add(url='/default/task', params={'id': last_id})
> >     return
> > ----
>
> > good luck!
>
> > cfh
>
> > On May 10, 7:14 pm, Matthew <[email protected]> wrote:
>
> > > Does this apply to the dev environment as well? Just fire it up and
> > > run it via localhost?
>
> > > If that's the case, would you mind providing an example or the proper
> > > documentation link to help me get started?
>
> > > Also, since bulk inserts are now possible in GAE
> > > (http://groups.google.com/group/web2py/browse_thread/thread/93d3dad847...),
> > > does that mean they're only possible from within the application
> > > itself (not via script)?
>
> > > Thanks,
> > > Matthew
>
> > > On May 10, 7:04 pm, howesc <[email protected]> wrote:
>
> > > > everything on GAE must be called via a URL (it must be a controller/
> > > > function).  if you need to run it periodically look up how to do cron
> > > > on GAE and create your cron.yaml.
>
> > > > @auth.requires_membership() is your friend in this case to limit who
> > > > can call your controller. :)
>
> > > > i have several of these sorts of things running on GAE, and it seems
> > > > to work quite well.
>
> > > > On May 9, 7:14 pm, mdipierro <[email protected]> wrote:
>
> > > > > no
>
> > > > > On May 9, 8:33 pm, Matthew <[email protected]> wrote:
>
> > > > > > You can run a script with Postgres or MySQL using this syntax:
>
> > > > > >     python web2py.py -S myapp -M -R applications/myapp/modules/
> > > > > > myscript.py
>
> > > > > > Can a script be run in this way using App Engine as the datastore?
