Re: [web2py] how to loop through tables with 1M records?

2012-10-21 Thread Niphlod
a subtle bug appeared in the last patch, please re-download scheduler.py, all should be OK right now (as of revision 1cc2decfddb4ec2b9a2cd8e098754504856f1990) On Sunday, October 21, 2012 4:10:07 AM UTC+2, Adi wrote: hmm... seems like we still have the same problem, unless I was supposed to

Re: [web2py] how to loop through tables with 1M records?

2012-10-21 Thread Adnan Smajlovic
will do it right now :) On Sun, Oct 21, 2012 at 10:30 AM, Niphlod niph...@gmail.com wrote: a subtle bug appeared in the last patch, please re-download scheduler.py, all should be OK right now (as of revision 1cc2decfddb4ec2b9a2cd8e098754504856f1990) On Sunday, October 21, 2012 4:10:07 AM

Re: [web2py] how to loop through tables with 1M records?

2012-10-21 Thread Adnan Smajlovic
Confirming that it works PERFECTLY :) handling all three queues (groups) concurrently, as it should. Now loading the first 600k tasks to see if it will degrade performance, and if OK, then a couple more of around 2-3M each... Niphlod and Massimo, thank you! When do you plan to include the new scheduler into

Re: [web2py] how to loop through tables with 1M records?

2012-10-21 Thread Massimo Di Pierro
It is already in web2py 2.2.1 ;-) On Sunday, 21 October 2012 14:06:53 UTC-5, Adi wrote: Confirming that it works PERFECTLY :) handling all three queues (groups) concurrently, as it should. Now loading the first 600k tasks to see if it will degrade performance, and if OK, then a couple more

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Niphlod
No priority available (it's hard to manage: does a task queued 3 hours ago with prio 7 come before or after one with prio 8 queued 2 hours ago?). Hackish way: tasks are picked up ordered by next_run_time. So, queue your tasks with next_run_time = request.now - datetime.timedelta(hours=1), kinda
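A minimal sketch of that hack, assuming tasks are queued by inserting directly into the scheduler_task table from a web2py model or controller (the function name and payload below are made up for illustration):

import datetime

# Workers pick up tasks ordered by next_run_time, so backdating it
# pushes a task to the front of its group's queue.
db.scheduler_task.insert(
    function_name='send_confirmation',   # hypothetical task function
    vars='{"order_id": 123}',            # args/vars are stored as JSON strings
    group_name='fast_track',
    next_run_time=request.now - datetime.timedelta(hours=1),
)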

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Adnan Smajlovic
All clear :) In the process of implementing. Is the new API defined in scheduler.py? I don't see it in there (2.1.1 (2012-10-17 17:00:46) dev), but I'm modifying the existing code to employ fast_track, since order confirmations are getting behind. This will be really good :) Thanks again, and

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Adnan Smajlovic
A couple of things happened... The main group worker got created even though I didn't call it... Not sure why, but I guess it's because there are a lot of leftover tasks queued (500k) and some were assigned when I stopped the process. Even though the fast_track worker started, nothing is getting picked or assigned

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Adnan Smajlovic
I can confirm that the size of the queued records has something to do with the delay in processing different queues... once I deleted all outstanding records from the main group, the fast_track group started working as expected... sorry for the long thread, but I think it's a very neat idea to load the scheduler with lots

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Niphlod
You're right, there's a bug: with zillions of queued tasks at the same priority (i.e. queued first), the bunch we assign on every loop doesn't take into account that there might be 10 or 20 tasks to assign and execute at a faster pace in the following bunch(es). Nice catch! Reviewing the

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Niphlod
Just sent the patch to Massimo. If you're in a hurry, as soon as it is committed just replace your gluon/scheduler.py with the one from trunk. Thanks for pointing out this misbehaviour of the scheduler. On Saturday, October 20, 2012 8:25:12 PM UTC+2, Niphlod wrote: You're right, there's a

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Niphlod
The main group worker got created even though I didn't call it... Not sure why, but I guess it's because there are a lot of leftover tasks queued (500k) and some were assigned when I stopped the process. Remember that a group_name for tasks is required for the scheduler to work. However, the
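For reference, a minimal sketch of the group wiring this thread relies on, using the group names mentioned above (the task name is illustrative):

# in a model file: declare the scheduler and the groups workers may serve
from gluon.scheduler import Scheduler
scheduler = Scheduler(db, group_names=['main', 'fast_track', 'slow_track'])

# every task carries a group_name; it defaults to 'main' when not set
db.scheduler_task.insert(function_name='send_confirmation',  # hypothetical
                         group_name='fast_track')

Workers are then started one per group, e.g. python web2py.py -K myapp:fast_track (app name hypothetical).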

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Adnan Smajlovic
Will try replacing scheduler.py in production and load some serious data again, since all is set up there for the full process, so we can have a real test :) I understand the concept of main being the default group, but wasn't sure if I was doing something wrong. All clear now :) Thanks for

Re: [web2py] how to loop through tables with 1M records?

2012-10-20 Thread Adnan Smajlovic
hmm... seems like we still have the same problem, unless I was supposed to copy more files than just scheduler.py. Loaded around 12,000 records into slow_track, while fast_track has very few, but some should have been executed by now... 3 workers are properly running (main, slow_track, fast_track), but

[web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adi
I just tried to perform a select() on a MySQL table with 700k records. All available memory on my Mac was consumed, to the point that I had to kill the process. I reduced the number of fields to just id, and got the results after some time, but I'm wondering if there is some better approach in order

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Vasile Ermicioi
_last_id = 0
_items_per_page = 1000
for row in db(db.table.id > _last_id).select(limitby=(0, _items_per_page),
                                             orderby=db.table.id):
    # do something
    _last_id = row.id

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adi
Thank you Vasile for your quick response. This will be perfect. On Friday, October 19, 2012 2:02:41 PM UTC-4, Vasile Ermicioi wrote: _last_id = 0 _items_per_page = 1000 for row in db(db.table.id > _last_id).select(limitby=(0, _items_per_page), orderby=db.table.id): # do something

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Niphlod
The set returned by select is always a full result set, because it is extracted and parsed altogether. Slicing with limits is good (and recommended, if possible). Just remember that you can save a lot of time and memory by passing cacheable=True to the select() function. References will be
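A quick illustration of that flag, using the generic table from this thread:

# cacheable=True returns lighter Row objects (no update_record /
# delete_record helpers attached), saving time and memory on big selects
rows = db(db.table.id > 0).select(db.table.id,
                                  limitby=(0, 1000),
                                  cacheable=True)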

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adnan Smajlovic
I'm afraid limitby will not work, since it returns a limited set, and I guess it's not possible to dynamically change the limit, so I'll have to sort of loop through some kind of subqueries, or use the original query with a limited set of fields (takes 60 secs for 700k records, not ready to test on

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Vasile Ermicioi
_last_id = 0
_items_per_page = 1000
for row in db(db.table.id > _last_id).select(limitby=(0, _items_per_page),
                                             orderby=db.table.id):
    # do something
    _last_id = row.id

Have you tried it and it doesn't work? Do you understand the logic?

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Vasile Ermicioi
increase _items_per_page to 20000

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adnan Smajlovic
Yes Vasile, I tried, and I understand the logic... I may change it slightly and use it as a subquery with an offset. The problem is that I'm dealing with legacy tables that go up to 3 million rows and have a lot of columns that need to be checked, so your solution will work, and I will be loading data in

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Vasile Ermicioi
_last_id = 0
_items_per_page = 1000
for row in db(db.table.id > _last_id).select(limitby=(0, _items_per_page),
                                             orderby=db.table.id):
    # do something
    _last_id = row.id

you don't need to change anything to load all data, this code is loading everything in slices as you need, all records

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adnan Smajlovic
I put it exactly as it is, but it stopped working after 1000 records... will double-check again. On Fri, Oct 19, 2012 at 3:47 PM, Vasile Ermicioi elff...@gmail.com wrote: _last_id = 0 _items_per_page = 1000 for row in db(db.table.id > _last_id).select(limitby=(0, _items_per_page),

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Vasile Ermicioi
also, _last_id = row.id after your code inside the loop is required

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Niphlod
It's missing the outer loop.

_last_id = 0
_items_per_page = 1000
while True:
    rows = db(db.table.id > _last_id).select(limitby=(0, _items_per_page),
                                             orderby=db.table.id)
    if len(rows) == 0:
        break
    for row in rows:
        # do something
        _last_id = row.id

Should work. On Friday,

Re: [web2py] how to loop through tables with 1M records?

2012-10-19 Thread Adnan Smajlovic
It does work. Thank you both very much! Now that I have thousands of queued/backlogged tasks in the scheduler, I noticed that my regular tasks, which are of higher priority, will be on hold until everything else gets processed. Maybe it would be a good idea to have a field for the priority of a task?