so it turns out that GAE itself fails when i pass an iterator over a large list to gae.delete(). so i've tweaked the implementation to skip the count() call but still tally the number of entries deleted as it goes, and it seems to be working.
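For reference, a minimal sketch of what that tweak might look like, assuming the `google.appengine.ext.db` API (the function and parameter names here are illustrative, not the actual web2py adapter code): delete in keys-only batches and keep a running total instead of calling count() up front.

```python
from google.appengine.ext import db

def delete_with_count(model_class, batch_size=1000):
    """Delete all entities of model_class in batches, returning the
    total deleted, without calling count() or loading full rows."""
    deleted = 0
    while True:
        # keys_only=True fetches bare keys, so no entity data is loaded
        keys = model_class.all(keys_only=True).fetch(batch_size)
        if not keys:
            break
        db.delete(keys)        # 1 datastore small operation per key
        deleted += len(keys)
    return deleted
```

Batching sidesteps the failure seen with one huge iterator, and summing len(keys) per batch preserves the row count that web2py's delete() is expected to return.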
suggested patch included in http://code.google.com/p/web2py/issues/detail?id=1134

thanks!

cfh

On Saturday, October 20, 2012 6:18:23 PM UTC-7, howesc wrote:
>
> sure. i'll make a patch soon...
>
> thanks for the input!
>
> cfh
>
> On 10/20/12 13:29, Massimo Di Pierro wrote:
> > I meant to skip count.
> >
> > On Saturday, 20 October 2012 15:28:56 UTC-5, Massimo Di Pierro wrote:
> >>
> >> How about adding a gae-only parameter to the gae adapter_args that
> >> tells it to skip fetch?
> >>
> >> On Saturday, 20 October 2012 11:25:51 UTC-5, howesc wrote:
> >>>
> >>> It appears that the most efficient way to delete on app engine is to:
> >>> - build a query object, like we are doing now
> >>> - call run() with keys_only=True
> >>>   (https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_run),
> >>>   which returns an iterator
> >>> - pass that iterator to the datastore delete method
> >>>   (https://developers.google.com/appengine/docs/python/datastore/functions#delete)
> >>>
> >>> this avoids the cost of loading the rows into memory, decreases the
> >>> likelihood of timeout, and has the cost of 1 datastore small
> >>> operation per row. but it does prevent us from getting a count of
> >>> rows deleted.
> >>>
> >>> the way we do it now:
> >>> - run count() on the query. this has a cost (time and money) of
> >>>   iterating over all the rows that match the query on GAE
> >>>   (1 datastore small operation per row)
> >>> - run fetch(limit=1000) and call delete() successively until no more
> >>>   rows. this has the cost of running a full query (at least
> >>>   1 datastore read operation per row), loading the result set into
> >>>   memory, and then deleting the results.
> >>>
> >>> in my case i'm timing out on the count() call, so i don't even start
> >>> the delete. from an efficiency standpoint i'd rather have more rows
> >>> deleted for less cost than get a count... but this may not be
> >>> acceptable for all. at a minimum i think we should switch to using
> >>> keys_only=True for the fetch, skip the leading count() call, and just
> >>> sum the number of rows returned by each fetch. we may also consider
> >>> catching the datastore timeout error and trying to handle a partial
> >>> delete more gracefully (or continue to let the user catch the error).
> >>>
> >>> what is the "right" approach for web2py? if the approach with count
> >>> is correct, could i propose a gae bulk_delete method that does not
> >>> return a count but uses my first method?
> >>>
> >>> thanks for the input!
> >>>
> >>> cfh
> >>>
> >>> On Saturday, October 20, 2012 7:58:56 AM UTC-7, Massimo Di Pierro wrote:
> >>>>
> >>>> Delete should return the number of deleted records. What is your
> >>>> proposal?
> >>>>
> >>>> On Wednesday, 17 October 2012 17:30:22 UTC-5, howesc wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'm trying to clean up old expired sessions... but i waited a long
> >>>>> time to get to this and now my GAE delete is just timing out.
> >>>>> Reading the GAE docs, there appear to be some improvements we can
> >>>>> make to the query delete method on GAE that will make it faster and
> >>>>> cheaper. what we lose then is the count of the number of rows
> >>>>> deleted.
> >>>>>
> >>>>> my question is: does having a db(db.table.something==True).delete()
> >>>>> that does not return a count break the web2py API contract, or
> >>>>> break anyone's applications?
> >>>>>
> >>>>> thanks,
> >>>>>
> >>>>> christian
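For contrast, a sketch of the keys-only iterator approach proposed at the start of the thread (again with a hypothetical function name; the thread only says the iterator is passed to the datastore delete function). It is the cheapest form, but as the message at the top notes, it can fail on very large result sets and cannot report how many rows were removed.

```python
from google.appengine.ext import db

def delete_no_count(model_class):
    """Cheapest possible bulk delete: stream bare keys straight into
    db.delete(). No count is available, and (per the top of this
    thread) GAE can fail when the iterator covers a huge list."""
    q = db.Query(model_class, keys_only=True)
    db.delete(q.run())  # q.run() yields keys lazily, 1 small op per row
```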

