2) could be a compatibility layer on top of google's task queue.

Google's task queue? On GAE?

For many organizations and companies, GAE or any other public cloud is not
an option. They want their sensitive data to stay private and are too paranoid
to put it online. The firm I am working for is building its own private cloud,
for itself and for its clients. My extractor will need to run on it too, later.

So many private or government organisations will want technologies that do
not depend on a public cloud.


On Sun, Nov 21, 2010 at 7:49 AM, Phyo Arkar <[email protected]> wrote:

> My problem, for which I have to use multiprocessing + Queue, is this:
>
> I am developing a file indexer for a digital forensics firm. It extracts
> huge archives of files, such as pst (MS Outlook archives), reads them,
> parses them, and extracts MIME content and indexable text from as many
> different types of documents as possible. It then puts the results into the
> database for later indexing by a search engine.
>
> The target is to process over 10GB of files at once.
>
> When we tested 800MB archives, processing took over 20 minutes (without
> multiprocessing).
>
> The machine is a dual Core2 with 8GB of DDR RAM.
>
> The problems are:
> 1. It stops web2py from responding to both normal requests and Ajax
> requests, which makes it impossible to show progress to the user.
> 2. It also stops Rocket's keep-alive responses for 20 minutes, making the
> browser think the server is dead.
>
> My solution is to move all the processing into a separate Python
> process, spawned via subprocess.Popen without waiting for it to end.
>
> That way I can use the multiprocessing module to spread the load across 4
> processes without bombarding web2py.
> web2py can then function well while the process runs in the background,
> and with Ajax I can build a progress bar by checking the progress report of
> the separate process.
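
A minimal sketch of the detached-worker approach quoted above, assuming Python 3 and a JSON progress file. The function names, the file layout and the worker script path are hypothetical and are not part of web2py.

```python
import json
import os
import subprocess
import sys


def start_worker(script_path, progress_path, *args):
    """Spawn the worker script and return at once, without waiting for it to end."""
    cmd = [sys.executable, script_path, progress_path] + list(args)
    return subprocess.Popen(cmd)


def write_progress(progress_path, done, total, error=None):
    """Called by the worker: write to a temp file, then rename it over the old one."""
    tmp_path = progress_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"done": done, "total": total, "error": error}, f)
    # The rename keeps readers from seeing a half-written report.
    os.replace(tmp_path, progress_path)


def read_progress(progress_path):
    """Called by the Ajax handler: return the latest report, or None if there is none."""
    try:
        with open(progress_path) as f:
            return json.load(f)
    except (IOError, ValueError):
        return None  # file not written yet, or not valid JSON
```

A controller action would call start_worker() once and return immediately. A second action, polled by Ajax, would return read_progress() so the page can draw the progress bar.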
>
> The problems there are:
>
> 1. As it is outside of web2py, there is no way of knowing if an error
> occurred unless I monitor the output of the web2py process.
> 2. The only way to communicate back to web2py is via files. I write each
> process's progress or errors to 4 different files, which Ajax requests
> monitor for progress.
> 3. Between the 4 processes it is easy to communicate parsed results via a
> multiprocessing Queue, but as they are outside of web2py's scope, they
> cannot communicate with web2py using a Queue.
> 4. The DAL also has to be used outside of web2py to put results back into
> the database, and that is ugly.
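
Points 1 and 3 above can be sketched with the standard library, assuming Python 3. Note that this uses Pool.imap_unordered in place of a hand-managed multiprocessing Queue, and that extract_text is a hypothetical stand-in for the real parser. Errors are caught inside the worker and sent back as data, so the parent always learns about them.

```python
import multiprocessing
import traceback


def extract_text(path):
    """Hypothetical stand-in for the real parser (pst, MIME, documents)."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8", "replace")


def safe_extract(path):
    """Runs in a worker; never raises, so failures travel back as a traceback string."""
    try:
        return path, extract_text(path), None
    except Exception:
        return path, None, traceback.format_exc()


def index_files(paths, processes=4, pool_factory=multiprocessing.Pool):
    """Spread paths across workers and yield (done, total, path, text, error)."""
    pool = pool_factory(processes)
    try:
        total = len(paths)
        results = pool.imap_unordered(safe_extract, paths)
        for done, (path, text, error) in enumerate(results, 1):
            yield done, total, path, text, error
    finally:
        pool.close()
        pool.join()
```

The parent process can write done/total to a single progress file and insert text into the database as each result arrives. Passing multiprocessing.dummy.Pool as pool_factory runs the same code on threads, which is handy for debugging.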
>
>
> If Celery's tasks and queues were integrated, there would be many benefits.
> It could easily load balance across multiple machines too, and communicate
> to and fro easily.
>
>
>
> On Sun, Nov 21, 2010 at 6:06 AM, mdipierro <[email protected]> wrote:
>
>> If we were to integrate queue functionality in web2py, what features
>> would you consider most valuable?
>>
>> I can see three different applications:
>> 1) a message queue (one app sends a message, another one receives it)
>> 2) a task queue (an app can submit a task to be executed later): task
>> is executed by a remote process (or cloud)
>> 3) a task queue: task is executed by the app itself (in a separate
>> thread) but triggered by a queue callback.
>>
>> There is some overlap but they are subject to different optimizations.
>> 2) could be a compatibility layer on top of google's task queue.
>>
>
>
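
Option 3 in the quoted list (a task executed by the app itself in a separate thread) could look roughly like the sketch below, assuming Python 3. TaskQueue is a hypothetical class, not a web2py API, and the completion callback is only one possible reading of "triggered by a queue callback".

```python
import queue
import threading


class TaskQueue:
    """Hypothetical in-process task queue: one daemon thread runs submitted callables."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, func, *args, callback=None):
        """Queue func(*args) for later; callback(result, error) is called when it finishes."""
        self._tasks.put((func, args, callback))

    def _run(self):
        while True:
            func, args, callback = self._tasks.get()
            try:
                result, error = func(*args), None
            except Exception as exc:
                result, error = None, exc
            if callback is not None:
                callback(result, error)  # assumed not to raise
            self._tasks.task_done()

    def wait(self):
        """Block until every submitted task has been processed."""
        self._tasks.join()
```

Because the worker is a thread inside the web server process, this only suits short or I/O-bound tasks. A CPU-bound job like the 20-minute extraction would still need a separate process.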
