The problem for which I have to use multiprocessing + Queue is:

I am developing a file indexer for a digital forensics firm. It extracts
huge archives of files, including .pst files (MS Outlook archives), reads
them, parses them, and extracts MIME parts and indexable text from as many
different document types as possible, then puts the results into a database
for later indexing by a search engine.

The target is to process over 10GB of files at once.

When we tested an 800MB archive, it took over 20 minutes (without
multiprocessing).

The machine is a dual-core Core 2 with 8GB of DDR RAM.

The problems are:
1. It stops web2py from responding to both normal and ajax requests, which
makes it impossible to show progress to the user.
2. It also stops Rocket's keepalive responses for 20 minutes, making the
browser think the server is dead.

My solution is to move all processing into a separate Python process,
spawned via subprocess.Popen, without waiting for it to end.
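A minimal sketch of that spawn step, assuming a hypothetical indexer script
and progress-file path (the names `start_indexer`, `idx.py`, and the
argument layout are illustrative, not the real code):

```python
import subprocess
import sys

def start_indexer(script, archive_path, progress_file):
    """Spawn the indexer script as a separate process and return at once.

    `script`, `archive_path`, and `progress_file` are hypothetical names
    for illustration; the real indexer would write its progress there.
    """
    proc = subprocess.Popen(
        [sys.executable, script, archive_path, progress_file],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # No proc.wait(): the web2py action returns immediately; an ajax
    # poller reads progress_file later to drive the progress bar.
    return proc.pid
```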

With that I can use the multiprocessing module to spread the load across 4
processes without bombarding web2py. web2py can then function well while
the processing runs in the background, and with ajax I can show a progress
bar by checking the progress report of the separate process.
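On the web2py side, the ajax poll can be a trivial action that just echoes
whatever the background process last wrote; a minimal sketch, assuming a
hypothetical progress-file path and JSON format:

```python
import json
import os

def progress(path="/tmp/indexer_progress.json"):
    """Return the latest progress report as a JSON string.

    The path and the JSON payload format are hypothetical; the background
    process is assumed to overwrite this file as it works.
    """
    if not os.path.exists(path):
        return json.dumps({"status": "not started"})
    with open(path) as f:
        return f.read()
```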

The remaining problems are:

1. As it runs outside of web2py, there is no way of knowing whether an
error occurred unless I monitor the output of the web2py process.
2. The only way to communicate back to web2py is via files. I write each
process's progress or errors to 4 different files, which the ajax requests
monitor for progress.
3. Between the 4 processes it is easy to exchange parsed results via a
multiprocessing Queue, but since they run outside of web2py's scope, I
cannot use a Queue to communicate with web2py itself.
4. The DAL also has to be used outside of web2py to put the results back
into the database, and that is ugly.
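A sketch of the worker side under those constraints: each of the 4
processes shares parsed results over a multiprocessing Queue and reports
progress through its own file for the ajax poller. The function names, the
per-worker JSON progress format, and the stubbed-out "parsing" are all
hypothetical, not the firm's real code:

```python
import json
import multiprocessing as mp

def worker(worker_id, files, result_queue, progress_path):
    """Parse one slice of the files; send results over the Queue and
    overwrite a per-worker progress file (hypothetical format)."""
    for i, path in enumerate(files, 1):
        text = "indexable text for %s" % path  # stand-in for real parsing
        result_queue.put((path, text))
        with open(progress_path, "w") as f:
            json.dump({"worker": worker_id, "done": i, "total": len(files)}, f)

def run_indexer(all_files, nworkers=4, progress_dir="."):
    """Fan the file list out to nworkers processes and collect results."""
    queue = mp.Queue()
    chunks = [all_files[i::nworkers] for i in range(nworkers)]
    procs = [
        mp.Process(
            target=worker,
            args=(i, chunks[i], queue,
                  "%s/progress_%d.json" % (progress_dir, i)),
        )
        for i in range(nworkers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining, so no worker blocks on a full pipe.
    results = [queue.get() for _ in range(len(all_files))]
    for p in procs:
        p.join()
    return results
```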


If Celery-style tasks and queues were integrated, there would be many
benefits. It would be easy to load-balance across multiple machines, too,
and to communicate back and forth.


On Sun, Nov 21, 2010 at 6:06 AM, mdipierro <[email protected]> wrote:

> If we were to integrate a queue functionality in web2py, what features
> would you consider most valuable?
>
> I can see three different applications:
> 1) a message queue (one app sends a message, another one receives it)
> 2) a task queue (an app can submit a task to be executed later): task
> is executed by a remote process (or cloud)
> 3) a task queue: task is executed by the app itself (in a separate
> thread) but triggered by a queue callback.
>
> There is some overlap but they are subject to different optimizations.
> 2) could be a compatibility layer on top of google's task queue.
>
