Bump
On 27 February 2011 20:32, David Mitchell <[email protected]> wrote:
> Hello everyone,
>
> I've read through the message archive and there seems to be a fairly clear
> message: don't use the multiprocessing module within web2py.
>
> However, I'm hoping I might have a use case that's a bit different...
>
> I've got an app that basically does analytics on moderately large datasets.
> I've got a number of controller methods that look like the following:
>
> def my_method():
>     # Note: all data of interest has previously been loaded into 'session.data'
>     results = []
>     d = local_import('analysis')
>     results += d.my_1st_analysis_method(session)
>     results += d.my_2nd_analysis_method(session, date=date)
>     results += d.my_3rd_analysis_method(session)
>     results += d.my_4th_analysis_method(session, date=date)
>     results += d.my_5th_analysis_method(session, date=date)
>     return dict(results=results)
>
> The problem I have is that all of the methods in my 'analysis' module, when
> run in sequence as per the above, simply take too long to execute and give
> me a browser timeout. I can mitigate this to some extent by extending the
> timeout on my browser, but I need to be able to use an iPad's Safari browser
> and it appears to be impossible to increase the browser timeout on the iPad.
> Even if it can be done, that approach seems pretty ugly and I'd rather not
> have to do it. What I really want to do is run all of these analysis
> methods *simultaneously*, capturing the results of each analysis_method into
> a single variable once they've finished.
>
> All of the methods within the 'analysis' module are designed to run
> concurrently - although they reference session variables, I've consciously
> avoided updating any session variables within any of these methods. While
> all the data is stored in a database, it's loaded into a session variable
> (session.data) before my_method is called; this data never gets changed as
> part of the analysis.
>
> Is it reasonable to replace the above code with something like this:
>
> def my_method():
>     import multiprocessing
>     d = local_import('analysis')
>
>     tasks = [
>         {'job': d.my_1st_analysis_method, 'kwargs': {}},
>         {'job': d.my_2nd_analysis_method, 'kwargs': {'date': date}},
>         {'job': d.my_3rd_analysis_method, 'kwargs': {}},
>         {'job': d.my_4th_analysis_method, 'kwargs': {'date': date}},
>         {'job': d.my_5th_analysis_method, 'kwargs': {'date': date}},
>     ]
>
>     result_queue = multiprocessing.Queue()
>
>     def run_task(job, kwargs):
>         # Each worker runs one analysis function and reports back on the queue
>         result_queue.put(job(session, **kwargs))
>
>     workers = []
>     for t in tasks:
>         # multiprocessing has no Worker class; Process is the closest equivalent
>         worker = multiprocessing.Process(target=run_task,
>                                          args=(t['job'], t['kwargs']))
>         worker.start()
>         workers.append(worker)
>
>     results = []
>     for t in tasks:
>         results += result_queue.get()  # blocks until each worker finishes
>
>     for w in workers:
>         w.join()
>
>     return dict(results=results)
>
> Note: I haven't tried anything using the multiprocessing module before, so
> if you've got any suggestions as to how to improve the above code, I'd
> greatly appreciate it...
>
> Is introducing multiprocessing as I've outlined above a reasonable way to
> optimise code in this scenario, or is there something in web2py that makes
> this a bad idea? If it's a bad idea, do you have any suggestions what else
> I could try?
>
> Thanks in advance
>
> David Mitchell
>
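For reference, here is a minimal, runnable sketch of the fan-out/collect pattern described above, outside web2py. The analysis functions are hypothetical stand-ins for the real `analysis` module, and the workers receive plain data rather than the session object itself, on the assumption that whatever is passed to a `Process` must be picklable and that web2py's session is not safe to share across processes:

```python
import multiprocessing

# Hypothetical stand-ins for the functions in the 'analysis' module;
# in the real app these would come from local_import('analysis').
def my_1st_analysis_method(data):
    return [('first', sum(data))]

def my_2nd_analysis_method(data, date=None):
    return [('second', max(data), date)]

def _run(func, kwargs, data, result_queue):
    # Each worker runs one analysis function and puts its result list
    # on the shared queue.
    result_queue.put(func(data, **kwargs))

def run_analyses(data, date=None):
    tasks = [
        (my_1st_analysis_method, {}),
        (my_2nd_analysis_method, {'date': date}),
    ]
    result_queue = multiprocessing.Queue()

    workers = []
    for func, kwargs in tasks:
        p = multiprocessing.Process(target=_run,
                                    args=(func, kwargs, data, result_queue))
        p.start()
        workers.append(p)

    # Collect one result list per task; get() blocks until a worker reports.
    results = []
    for _ in tasks:
        results += result_queue.get()

    for p in workers:
        p.join()
    return results
```

Note that results arrive in completion order, not task order, and that any per-worker changes to `data` are lost when the worker exits, which matches the read-only use of `session.data` described in the original message.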