Auto saving attachment [was Re: Elimination of mime/html part]

2020-05-03 Thread Angel M Alganza

Hi again,

On Fri, May 01, 2020 at 11:20:54AM +1000, raf wrote:


> I wrote a tool for replacing non-text attachments in email with text
> versions. It's at http://raf.org/textmail.


As I said in my previous mail, textmail does a wonderful job reducing my
email list folders by eliminating all HTML from them.  But I also have
an archive Maildir folder with years of mail with attachments (PDF,
maybe some .DOC, too) that I'd like to have extracted from there as
independent files into an attachments directory before they get removed
or replaced with text, much like Eudora did (or does; I last used it in
1999).  Could textmail help with that part, too?  Or is there another
tool that you guys know of that could?
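
One possible way to do the extraction step, sketched in Python with only
the standard library. The function name save_attachments and the
file-naming scheme are assumptions made for illustration, not something
textmail provides. The sketch only copies attachments out; it leaves the
messages themselves untouched.

```python
import mailbox
from pathlib import Path


def save_attachments(maildir_path, out_dir):
    """Copy every named attachment in a Maildir folder into out_dir.

    Returns the list of files written. Messages are not modified.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.Maildir(maildir_path, create=False)
    for key, msg in box.iteritems():
        for index, part in enumerate(msg.walk()):
            filename = part.get_filename()
            if not filename:
                continue  # not a named attachment
            payload = part.get_payload(decode=True)
            if payload is None:
                continue  # container part, nothing to save
            # Keep only the base name and prefix it with the message key,
            # so that two attachments with the same name do not collide.
            safe_name = Path(filename).name
            target = out / f"{key}-{index}-{safe_name}"
            target.write_bytes(payload)
            written.append(target)
    return written
```

Running it against a copy of the archive folder first, and comparing the
saved files with what Mutt shows, would be a sensible check before
removing anything from the original messages.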

It'd be great if I could do the same for incoming mail.  At the moment,
I just download it with mbsync, read it with Mutt and manually save any
attachment I need/want to keep (files with forms I need to fill out or
print or sign, etc.) and then delete them from the mail to save space.
It'd be wonderful if I could have the attached files saved into my
attachments folder and then either replaced by text or deleted by
textmail.  I'd love it if that could be done via imapfilter directly on
the IMAP server (with the attached files being saved locally, of
course), but I could live with it being done locally and then synced
back to the IMAP server.

Any idea on that, please?

Cheers,
Ángel


Re: Elimination of mime/html part

2020-05-03 Thread Angel M Alganza

Hi raf,

On Fri, May 01, 2020 at 11:20:54AM +1000, raf wrote:


> I wrote a tool for replacing non-text attachments in email with text
> versions. It's at http://raf.org/textmail.


Thank you so much for writing textmail, which is exactly what I needed
and much more.  It is much cooler than I could ever imagine for a tool
like that.


> By default, it replaces HTML attachments with inline plain text
> attachments that contain just the text within the original document.
> It also reduces text-versus-html alternative attachments to just the
> text part. It needs lynx to be installed.
>
> If that's the only transformation you want, you'll need to supply lots
> of options to prevent other default actions. Something like this:
>
>   textmail -MWERPULIAVXBS


For the moment, to process the few mailboxes I already have (mainly
from email lists), that's the only transformation needed: there aren't
any images or attached files, just the nasty text/html that doubles (or
almost doubles) the size the mailboxes take.  In the future, I'll
probably be doing other transformations as well for my daily incoming
mail.


> That suppresses translating and deleting anything else, just HTML.


It works very nicely, thank you, except for 33 messages out of 3336 that
result in totally empty mail (OK), with no content (not even headers).


> You can use it in a procmail recipe or apply it to mbox files.


For the moment I'm using a bash for loop, since I have all my mail
stored using the Maildir format (one file per message).
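
A loop of that kind could also be written in Python, roughly as below.
This is only a sketch: the function name filter_maildir is made up, and
it assumes the filter command reads one message on standard input and
writes the result to standard output, as a tool used in a procmail
filter recipe would. It also reports which messages came back empty,
which makes cases like the 33 above easy to list.

```python
import subprocess
from pathlib import Path


def filter_maildir(src_dir, dst_dir, command):
    """Pipe each message file in src_dir/cur and src_dir/new through command.

    The output is written to dst_dir under the same file name; the
    originals are left untouched. Returns the names of the messages
    whose filtered output was empty.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    empty = []
    for sub in ("cur", "new"):
        folder = src / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if not path.is_file():
                continue
            result = subprocess.run(
                command,
                input=path.read_bytes(),
                stdout=subprocess.PIPE,
                check=True,  # stop on the first failing message
            )
            (dst / path.name).write_bytes(result.stdout)
            if not result.stdout.strip():
                empty.append(path.name)
    return empty
```

With the option string quoted earlier, the command argument would be
["textmail", "-MWERPULIAVXBS"]. Note that dst_dir here is a flat test
directory, not a complete Maildir with cur/new/tmp.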


> WARNING: Please verify it carefully and make sure it's doing what you
> want before deleting the original mailboxes. Anything that transforms
> your email should not be trusted until you have reason to trust it.


Sure. The first thing my imapfilter config file has is a recipe to make
a copy in a mail folder called bak on the IMAP server, and only then
does it start processing mail.

In this particular case, anyway, I'm reading mail from my local
original mailbox and writing the output of textmail to a local testing
mail folder, too.

Until now I haven't done any processing with imapfilter other than some
subject line substitutions and mail folder sorting, but I guess it
should be possible to use textmail with imapfilter like with procmail,
to preprocess email on the IMAP server before fetching it with mbsync,
right?

Again thank you so much for such a great tool!

Cheers,
Ángel


Re: Elimination of mime/html part

2020-05-02 Thread Derek Martin
On Thu, Apr 30, 2020 at 03:27:29PM +0100, Sam Kuper wrote:
> Unfortunately, one of the weaknesses in Python's email handling (which
> might be related to some ambiguities or flaws in the RFCs on which it
> is based - I'm not sure) relates to the problem of identifying a
> "primary" (for want of a better word) text/plain part.

It's not a weakness in Python, per se.  There isn't such a thing.
That's one of the points I was trying to make before.  MIME allows for
the BODY of your message to be literally anything.  You can't
heuristically determine what the "main" part is because it doesn't
exist, except in the minds of the humans who interact with the
message... and as we've seen, their ideas about what the main part is
may differ.  And in any event, you cannot do this without potentially
losing some information that is important, since many such messages
will have plain text parts that contain only garbage, where the actual
content is only in the HTML part (or even some other piece).

[The tools could assume the first plain-text part is the "main" part,
and that would be a reasonable assumption, but some of the time it
will be wrong.]
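
The "first plain-text part" assumption can be sketched in Python as
follows. The helper name first_plain_part and the sample message are
invented for illustration; the sample is built so that the assumption
picks the placeholder text rather than the real content, which is the
failure mode described above.

```python
from email import message_from_string


def first_plain_part(msg):
    """Return the first text/plain part not marked as an attachment, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        return part
    return None


# A made-up message whose plain-text alternative is only a placeholder.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "Your mail client cannot display this message.\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>The actual content lives here.</p>\n"
    "--b--\n"
)

msg = message_from_string(SAMPLE)
# Prints the placeholder, not the content that only exists in the HTML part.
print(first_plain_part(msg).get_payload())
```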

A likely better option is to compress the folder.  If HTML truly is
the bloat, it should compress very efficiently.
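
How well a given message compresses can be estimated before committing
to that route. The snippet below is a rough sketch, with a made-up
helper name (gzip_ratio) and synthetic markup standing in for real mail;
actual ratios will vary, and base64-encoded attachments in particular
compress far less well than repetitive HTML.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(gzip.compress(data)) / len(data)


# Synthetic, highly repetitive markup; real messages should be read
# from the mail folder instead.
synthetic_html = b'<td style="font-family: Arial">cell</td>\n' * 500
print(f"{gzip_ratio(synthetic_html):.3f}")
```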

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.





Re: Elimination of mime/html part

2020-04-30 Thread raf
Angel M Alganza wrote:

> Hello:
> 
> I keep a few mailbox folders containing a large amount of multipart mail
> with a text/plain part and a text/html part.  I would like to eliminate
> the text/html part in order to reduce the amount of disk space used (on
> a small device and, mainly, on the IMAP server I keep synced with).
> 
> I've been looking for ways to remove the text/html part with tools like
> imapfilter or mbsync/isync, which I use to manipulate, sort and
> synchronise mail, but I haven't had any success.  I can remove it by hand
> with Mutt, but that'd be really tedious, since I have thousands of
> messages.
> 
> Is there an automated way that you know of to get that done?  Or do you
> know of any third party program that could help me out?
> 
> Thank you very much in advance for your help.
> 
> Regards,
> Ángel

Hi Ángel,

I wrote a tool for replacing non-text attachments in
email with text versions. It's at http://raf.org/textmail.

By default, it replaces HTML attachments with inline
plain text attachments that contain just the text
within the original document. It also reduces
text-versus-html alternative attachments to just the
text part. It needs lynx to be installed.

If that's the only transformation you want, you'll need
to supply lots of options to prevent other default
actions. Something like this:

  textmail -MWERPULIAVXBS

That suppresses translating and deleting anything else,
just HTML.

You can use it in a procmail recipe or apply it to mbox
files.

WARNING: Please verify it carefully and make sure it's
doing what you want before deleting the original
mailboxes. Anything that transforms your email should
not be trusted until you have reason to trust it.

cheers,
raf



Re: Elimination of mime/html part

2020-04-30 Thread Sam Kuper
On Thu, Apr 30, 2020 at 08:54:25AM +0200, Angel M Alganza wrote:
> [...] I've been looking for ways to remove the text/html part [...]
> 
> Is there an automated way that you know of to get that done?  Or do
> you know of any third party program that could help me out?

Python and some other programming/scripting languages have built-in or
third-party email handling libraries that can be used to iterate over
messages in a mailbox (mbox, maildir, MH, etc), processing each message
in turn.

Unfortunately, one of the weaknesses in Python's email handling (which
might be related to some ambiguities or flaws in the RFCs on which it
is based - I'm not sure) relates to the problem of identifying a
"primary" (for want of a better word) text/plain part.

So, if you just want to remove text/html parts from each message that
also has a text/plain part, you'll probably find Python adequate.
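
For that simpler case, a minimal sketch with the standard library might
look like this. The function names are invented for illustration. It
only drops text/html parts that sit directly inside a
multipart/alternative next to a text/plain part; HTML wrapped in a
multipart/related container is left alone, and the (now single-part)
multipart/alternative wrapper is kept rather than flattened.

```python
def strip_html_alternatives(msg):
    """Drop text/html parts from multipart/alternative containers that
    also hold a text/plain part.

    Modifies msg in place and returns the number of parts removed.
    """
    removed = 0
    for part in list(msg.walk()):
        if part.get_content_type() != "multipart/alternative":
            continue
        children = part.get_payload()
        if not isinstance(children, list):
            continue  # malformed container, leave it alone
        types = [child.get_content_type() for child in children]
        if "text/plain" not in types:
            continue  # no plain-text fallback, keep the HTML
        kept = [c for c in children if c.get_content_type() != "text/html"]
        removed += len(children) - len(kept)
        part.set_payload(kept)
    return removed


def strip_mailbox(box):
    """Apply strip_html_alternatives to every message in a mailbox.Mailbox.

    Returns the number of messages that were rewritten.
    """
    changed = 0
    for key in box.keys():
        msg = box[key]
        if strip_html_alternatives(msg):
            box[key] = msg  # write the modified message back
            changed += 1
    box.flush()
    return changed
```

strip_mailbox expects an object from the standard mailbox module, for
example mailbox.Maildir(path, create=False), and is best tried on a copy
of the folder first.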

But if your use case involves being sure that your script has correctly
identified the "primary" text/plain part, then you may have to work
around shortcomings in Python's email objects/functions.

If you come up with a solution that works for you, please post a
follow-up in this thread, ideally with a copy of your source code (or a
link to it) so that others can benefit.  I would be very interested to
see your approach.

Good luck!

-- 
A: When it messes up the order in which people normally read text.
Q: When is top-posting a bad thing?

()  ASCII ribbon campaign. Please avoid HTML emails & proprietary
/\  file formats. (Why? See e.g. https://v.gd/jrmGbS ). Thank you.


Elimination of mime/html part

2020-04-30 Thread Angel M Alganza

Hello:

I keep a few mailbox folders containing a large amount of multipart mail
with a text/plain part and a text/html part.  I would like to eliminate
the text/html part in order to reduce the amount of disk space used (on
a small device and, mainly, on the IMAP server I keep synced with).

I've been looking for ways to remove the text/html part with tools like
imapfilter or mbsync/isync, which I use to manipulate, sort and
synchronise mail, but I haven't had any success.  I can remove it by hand
with Mutt, but that'd be really tedious, since I have thousands of
messages.

Is there an automated way that you know of to get that done?  Or do you
know of any third party program that could help me out?

Thank you very much in advance for your help.

Regards,
Ángel