I have a concern about my current design; it's not strictly web2py related:

My processor engine has to parse a huge set of files (10 GB or more).

The current architecture uses the multiprocessing module to split the
workload across 4 parser processes.
Those processes put their parsed data onto multiprocessing queues; I
retrieve the results into lists as each process finishes,
and I merge those lists before inserting them into the database.
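The worker/queue setup described above looks roughly like this (parse_file is just a stand-in for my real parser, and the chunking is illustrative):

```python
from multiprocessing import Process, Queue

def parse_file(path):
    # stand-in for the real parser: returns a list of row dicts for one file
    return [{"filename": path, "data": "parsed"}]

def worker(file_chunk, out_queue):
    # each worker parses its chunk and puts one merged list on the queue
    results = []
    for f in file_chunk:
        results.extend(parse_file(f))
    out_queue.put(results)

def parse_all(files, n_procs=4):
    out_queue = Queue()
    chunks = [files[i::n_procs] for i in range(n_procs)]
    procs = [Process(target=worker, args=(c, out_queue)) for c in chunks]
    for p in procs:
        p.start()
    # drain the queue BEFORE joining: a worker blocked on a full queue
    # never exits, so calling join() first can deadlock
    merged = []
    for _ in procs:
        merged.extend(out_queue.get())
    for p in procs:
        p.join()
    return merged
```

Note the order: results are drained from the queue before join(), since a child that still holds data in the queue's pipe buffer will not terminate.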

Since I have read that multiprocessing queues do not work well with the DAL
(or is that PostgreSQL only?),
I do not put results into the database directly, but only after all parsing
processes are done and the results from the different processes have been merged.

So I do this:

attachs = merge_processed_results(parsed_results)

casedb = DAL(...)
for attach in attachs:
    casedb.attach_data.insert(**attach)

casedb.commit()

My concerns are:

1. I store all parsed results inside a list. If the parsed data (all the
textual data from those 10 GB of files) exceeds the server's 4 GB of RAM,
will it exhaust all the memory?
2. Is it a better idea to insert each parsed result into the database and
commit as I go, or to commit once after all parsed results are inserted?
3. Will it cause problems with multiprocessing if I call
casedb.attach_data.insert(**attach) inside the sub-processes?
