On 2023-01-08 13:49:38 +0200, jacob kruger wrote:
> Ok, the specific usage case right now is that I need to set up a process
> pulling contents of e-mail messages from an IMAP protocol mail server, which
> I then populate into a postgresql database, and, since this is the inbox of
> a relatively large-scale CRM/support system, there are currently over 2.5
> million e-mails in the inbox, but, it can grow by over 50000 per day.
This is probably I/O-bound. You will likely spend much more time waiting
for the IMAP server or the database than parsing the messages. So you
probably don't need multi-processing just to utilize all your cores.
On the other hand you have some nicely separated tasks which can be
parallelized, so multi-threading should help (async probably would work
just as well or as badly as multi-threading but I find that harder to
understand so I would discard it at this point).
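
To make the threading idea concrete, here is a minimal sketch using
concurrent.futures from the standard library. fetch_and_store() and
process_batch() are hypothetical names, and the sleep only simulates
waiting for the IMAP server or the database. In real code each worker
would probably want its own IMAP and database connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_and_store(uid):
    """Hypothetical stand-in: fetch one message and insert it into the DB."""
    time.sleep(0.05)  # simulates blocking on the network or the database
    return uid


def process_batch(uids, workers=8):
    # While one thread is blocked on I/O the others can run, so the
    # waiting times overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the input.
        return list(pool.map(fetch_and_store, uids))


if __name__ == "__main__":
    start = time.perf_counter()
    done = process_batch(range(40))
    elapsed = time.perf_counter() - start
    print(f"{len(done)} messages in {elapsed:.2f} s")
```

With 8 workers the 40 simulated messages should take far less than the
2 seconds a sequential loop would need, which is the effect to expect
as long as the real work is dominated by waiting.
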
I might be mistaken, though: Depending on how much processing you need
to do on these messages it might be worth it to split the work across
multiple processes. Check the CPU-usage of your process: If it's close
to 100% you will probably gain significantly from multi-processing.
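
One way to check this from inside the program, instead of watching top,
is to compare CPU time with wall-clock time. This is only a sketch:
cpu_fraction(), io_like() and cpu_like() are made-up names, and the two
workloads are artificial stand-ins for the real message processing.

```python
import time


def cpu_fraction(work):
    """Run work() and return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def io_like():
    time.sleep(0.2)  # mostly waiting, like a slow IMAP fetch


def cpu_like():
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass  # busy loop, like heavy parsing


if __name__ == "__main__":
    print("I/O-like work:", round(cpu_fraction(io_like), 2))
    print("CPU-like work:", round(cpu_fraction(cpu_like), 2))
```

A value near 0 suggests the process is mostly waiting, so threads
should be enough. A value near 1 suggests it is CPU-bound, and that is
the case where multiple processes are likely to pay off.
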
> I already have the basic process operating, using imap_tools, but, wanted to
> enable you to query the process during run-time, without needing to either
> check logs, or query the database itself while it is on-the-go
[...]
> Also wanted to offer the ability to either pause, or terminate processes
> while it's busy batch processing large chunks of e-mail messages
So that would be an http (or other socket-based) interface? Should also
be possible to add as an additional thread (or process).
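
A rough sketch of what such a control thread could look like, using
only http.server and threading.Event from the standard library. The
Worker class, the URL paths and start_control_server() are all invented
for illustration, and the sleep stands in for handling one message.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Worker:
    def __init__(self):
        self.processed = 0
        self.lock = threading.Lock()
        self.running = threading.Event()  # set = work, clear = paused
        self.running.set()
        self.stop = threading.Event()

    def run(self, items):
        for _ in items:
            self.running.wait()  # blocks here while paused
            if self.stop.is_set():
                break
            time.sleep(0.01)  # stand-in for processing one message
            with self.lock:
                self.processed += 1


def make_handler(worker):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):  # GET for everything, to keep the sketch short
            if self.path == "/pause":
                worker.running.clear()
            elif self.path == "/resume":
                worker.running.set()
            elif self.path == "/stop":
                worker.stop.set()
                worker.running.set()  # wake a paused worker so it can exit
            elif self.path != "/status":
                self.send_error(404)
                return
            with worker.lock:
                state = {"processed": worker.processed,
                         "paused": not worker.running.is_set(),
                         "stopped": worker.stop.is_set()}
            body = json.dumps(state).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    return Handler


def start_control_server(worker, port=8765):
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(worker))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    worker = Worker()
    start_control_server(worker)
    worker.run(range(3000))  # query http://127.0.0.1:8765/status meanwhile
```

The worker only looks at the two events between messages, so a pause or
stop takes effect after the current message is finished. The server
binds to localhost only and has no authentication, which a real
deployment would need to address.
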
> So, I think that for now, threading is probably the simplest to look into.
I agree with that assessment.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
-- https://mail.python.org/mailman/listinfo/python-list
