Re: my weekend project: a streaming POP3 fetcher, replacing fetchmail/getmail

2021-04-07 Thread Kurt Hackenberg

On 2021/04/07 18:03, Cameron Simpson wrote:


>Just tried it on the satellite link with an overnight load of messages,
>normally a 10 minute exercise with getmail (give or take). 411 messages,
>8.5 seconds.


Nice.

>...regexps...are appalling for email addresses. When testing
>addresses, my filer does a correct address parse and compares the
>inner component (eg "c...@cskk.id.au" from "Cameron Simpson
><c...@cskk.id.au>"), and can do set membership tests on that eg "is
>this address in this group?".

I agree that regexps on raw header values are not good enough for
addresses, and that parsing and some kind of custom matching is the way
to go. Set membership sounds useful.


Re: my weekend project: a streaming POP3 fetcher, replacing fetchmail/getmail

2021-04-07 Thread Cameron Simpson
On 06Apr2021 23:12, Kurt Hackenberg wrote:
>On Wed, Apr 07, 2021 at 09:43:36AM +1000, Cameron Simpson wrote:
>>My new tool streams the fetches: it issues RETRs for every message up
>>front at maximum network speed - fully buffered and with no waits. A
>>parallel worker thread collects the messages as they come in at full
>>speed (the upstream server likely also gets to fully buffer); it issues
>>DELEtes as each message is saved, also fully buffered.
>
>Slick.  Clearly the right way to handle that high latency.

Just tried it on the satellite link with an overnight load of messages, 
normally a 10 minute exercise with getmail (give or take). 411 messages, 
8.5 seconds.

>Have you ever tried the program fdm?  It fetches, filters, and
>delivers mail, like getmail and procmail combined.  I haven't tried
>it, but it looks interesting.

I have not.

My mail filer is decoupled from my fetcher: it monitors spool Maildirs 
(which also means I can refile a message just by saving it to a spool 
Maildir).  And it has its own syntax to my liking; other tools 
inherently will not :-)

And looking at the conf file, it seems that (like procmail, which I 
abandoned years ago) it matches using regexps. These are appalling for 
email addresses. When testing addresses, my filer does a correct address 
parse and compares the inner component (eg "c...@cskk.id.au" from "Cameron
Simpson <c...@cskk.id.au>"), and can do set membership tests on that eg
"is this address in this group?".
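
To illustrate the parse-then-compare idea, here is a minimal sketch using
Python's standard email.utils module. It is not the filer's actual code: the
helper names, the FRIENDS group and the example.com addresses are all made up
for illustration, and lowercasing the whole address is a simplifying
assumption.

```python
from email.utils import getaddresses

# Hypothetical group; these are placeholder addresses, not real ones.
FRIENDS = {"alice@example.com", "bob@example.org"}


def core_addresses(header_value):
    """Return the set of bare, lowercased addresses found in a header value."""
    return {
        addr.lower()
        for _name, addr in getaddresses([header_value])
        if addr
    }


def in_group(header_value, group):
    """True if any address in the header value belongs to the group."""
    return not core_addresses(header_value).isdisjoint(group)


# The comma inside the quoted display name is what tends to trip up a regexp
# applied to the raw header; a real address parse treats it as part of the name.
header = '"Smith, Alice" <Alice@Example.COM>, Carol <carol@example.net>'
print(sorted(core_addresses(header)))
print(in_group(header, FRIENDS))
```

Because the comparison is done on the parsed inner address, the display name
and its punctuation never take part in the match.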

>This paragraph in its manual sounds like it might stream fetching like
>your program:
>
>"fdm tries to queue a number of mails simultaneously, so that older
>can be delivered while waiting for the server to provide the next. The
>maximum length of the queue for each account is set by the
>'queue-high' option (the default is two) and the maximum mail size
>accepted by the 'maximum-size' option (the default is 32 MB). In
>addition, the 'rewrite' action requires an additional temporary
>mail. Although fdm will fail rather than dropping mail if the disk
>becomes full, users should bear in mind the possibility and set the
>size of the temporary directory and the fdm options according to their
>needs."

I've been reading the manual. I think it does not. I think it actually 
allows the filing/saving to proceed while requesting/fetching the next 
message. So a simple form of parallelism, but not one which reduces the 
fetch latency between requests.

>fdm is at github:
>
>
>The paragraph quoted above is at about line 300 in the manual, which is here:
>

Thanks. An interesting read.

Cheers,
Cameron Simpson 


Re: my weekend project: a streaming POP3 fetcher, replacing fetchmail/getmail

2021-04-06 Thread Kurt Hackenberg
On Wed, Apr 07, 2021 at 09:43:36AM +1000, Cameron Simpson wrote:

>My new tool streams the fetches: it issues RETRs for every message up
>front at maximum network speed - fully buffered and with no waits. A
>parallel worker thread collects the messages as they come in at full
>speed (the upstream server likely also gets to fully buffer); it issues
>DELEtes as each message is saved, also fully buffered.

Slick.  Clearly the right way to handle that high latency.

Have you ever tried the program fdm?  It fetches, filters, and
delivers mail, like getmail and procmail combined.  I haven't tried
it, but it looks interesting.

This paragraph in its manual sounds like it might stream fetching like
your program:

"fdm tries to queue a number of mails simultaneously, so that older
can be delivered while waiting for the server to provide the next. The
maximum length of the queue for each account is set by the
'queue-high' option (the default is two) and the maximum mail size
accepted by the 'maximum-size' option (the default is 32 MB). In
addition, the 'rewrite' action requires an additional temporary
mail. Although fdm will fail rather than dropping mail if the disk
becomes full, users should bear in mind the possibility and set the
size of the temporary directory and the fdm options according to their
needs."

fdm is at github:


The paragraph quoted above is at about line 300 in the manual, which is here:



my weekend project: a streaming POP3 fetcher, replacing fetchmail/getmail

2021-04-06 Thread Cameron Simpson
Like several here, I fetch email from my ISP mail spool(s) and file 
messages locally. If my laptop's been offline overnight there can be 
hundreds of messages to fetch when I wake it up. On a satellite link 
(geostationary) with a ping time of over 600ms this can be many minutes 
of tedium.

The raw bandwidth is fine and my filing process is pretty expeditious;
the root cause of that tedium is network latency and the synchronous 
behaviour of getmail. Its cycle is like this:

- RETRieve the message, collect and save
- DELEte the message
- repeat for all the messages, then QUIT to commit the deletes

Each of steps 1 and 2 above incurs over a second just in network 
latency. That scales up over hundreds of messages.
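
As a rough check on that scaling, here is a back-of-envelope sketch. The
function name is made up; the 411-message count is the overnight load
reported elsewhere in this thread, and the per-step costs of 0.6 s and 1.0 s
are assumed figures taken from the ping time and from the "over a second"
estimate above.

```python
def synchronous_wait(messages, seconds_per_step, steps_per_message=2):
    """Seconds spent purely waiting on the network when every RETR and
    every DELE must be answered before the next command is sent."""
    return messages * steps_per_message * seconds_per_step


# 0.6 s and 1.0 s per step are assumptions, not measurements.
for per_step in (0.6, 1.0):
    wait = synchronous_wait(411, per_step)
    print(f"{per_step:.1f} s per step: {wait:.0f} s, about {wait / 60:.1f} minutes")
```

That works out to roughly 8 and 14 minutes of pure waiting, which is
consistent with latency rather than bandwidth accounting for nearly all of
the fetch time.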

My new tool streams the fetches: it issues RETRs for every message up 
front at maximum network speed - fully buffered and with no waits. A 
parallel worker thread collects the messages as they come in at full 
speed (the upstream server likely also gets to fully buffer); it issues 
DELEtes as each message is saved, also fully buffered.
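
The shape of that approach can be sketched roughly as below. This is not the
cs.pop3 code: the Pipeline class, fetch_all and the toy fake_server are
hypothetical names invented for a self-contained demo, and the sketch skips
the greeting, authentication, dot-unstuffing and error recovery that a real
client needs. The point it tries to show is that requests go out through a
buffered stream without waiting, while a worker thread matches replies to
requests in the order they were sent.

```python
import queue
import socket
import threading


def fake_server(sock, messages):
    """Toy stand-in for a POP3 server, only here to make the demo runnable."""
    with sock, sock.makefile("rb") as rf, sock.makefile("wb") as wf:
        for line in rf:
            verb, *args = line.decode("ascii").split()
            if verb == "RETR":
                wf.write(b"+OK\r\n" + messages[int(args[0]) - 1] + b".\r\n")
            else:  # DELE and QUIT just get a one-line reply
                wf.write(b"+OK\r\n")
            wf.flush()
            if verb == "QUIT":
                break


class Pipeline:
    """Send commands without waiting; read the replies in send order."""

    def __init__(self, sock):
        self.rf = sock.makefile("rb")
        self.wf = sock.makefile("wb")  # buffered, so writes do not wait
        self.lock = threading.Lock()
        self.pending = queue.Queue()  # requests awaiting a reply, in wire order

    def send(self, verb, n=None):
        line = verb if n is None else f"{verb} {n}"
        with self.lock:  # keep the wire order and the pending order identical
            self.wf.write(line.encode("ascii") + b"\r\n")
            self.pending.put((verb, n))

    def flush(self):
        with self.lock:
            self.wf.flush()

    def collect(self, count, saved):
        """Worker: save each message as it arrives, then queue its DELE."""
        deleted = 0
        while True:
            verb, n = self.pending.get()
            self.flush()  # make sure the request being awaited was really sent
            if not self.rf.readline().startswith(b"+OK"):
                raise RuntimeError(f"{verb} {n} failed")
            if verb == "RETR":
                body = []
                while (line := self.rf.readline()) not in (b".\r\n", b""):
                    body.append(line)
                saved[n] = b"".join(body)  # "saving" is just a dict here
                self.send("DELE", n)
            elif verb == "DELE":
                deleted += 1
                if deleted == count:
                    self.send("QUIT")
            else:  # the reply to QUIT: the session is over
                return


def fetch_all(sock, count):
    p = Pipeline(sock)
    saved = {}
    worker = threading.Thread(target=p.collect, args=(count, saved))
    worker.start()
    for n in range(1, count + 1):  # every RETR goes out up front
        p.send("RETR", n)
    if count == 0:
        p.send("QUIT")
    p.flush()
    worker.join()
    p.rf.close()
    p.wf.close()
    return saved


client, server = socket.socketpair()
mails = [b"Subject: one\r\n\r\nfirst\r\n", b"Subject: two\r\n\r\nsecond\r\n"]
threading.Thread(target=fake_server, args=(server, mails), daemon=True).start()
fetched = fetch_all(client, len(mails))
client.close()
print(sorted(fetched))
```

With the requests already queued at the server, the total wait is closer to
one round trip plus the transfer time than to two round trips per message.
The sketch is not hardened for very large mailboxes, where the interplay
between the lock and full socket buffers would need more care.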

The code's here:

https://hg.sr.ht/~cameron-simpson/css/browse/lib/python/cs/pop3.py?rev=tip

The cs.pop3 module is on PyPI here:

https://pypi.org/project/cs.pop3/

and can be installed with:

pip install cs.pop3

which also provides a "pop3" command in your Python environment.

Typical use is:

pop3 dl mylo...@mail.cskk.id.au ~/var/mail/spool

specifying my internet mail spool and the local Maildir to receive the 
messages.

Cheers,
Cameron Simpson 

If you cannot, in the long run, tell everyone what you have been doing, 
your doing has been worthless.   - Erwin Schrodinger