Auto saving attachment [was Re: Elimination of mime/html part]
Hi again,

On Fri, May 01, 2020 at 11:20:54AM +1000, raf wrote:
> I wrote a tool for replacing non-text attachments in email with text
> versions. It's at http://raf.org/textmail.

As I said in my previous mail, textmail does a wonderful job of reducing my email list folders by eliminating all HTML from them.

But I also have an archive Maildir folder with years of mail with attachments (PDF, maybe some .DOC, too) that I'd like to have extracted from there as independent files into an attachments directory before they get removed or replaced with text, much like Eudora did (or does; I last used it in 1999). Could textmail help with that part, too? Or is there another tool that you guys know of that could?

It'd be great if I could do the same for incoming mail. At the moment, I just download it with mbsync, read it with Mutt and manually save any attachment I need/want to keep (files with forms I need to fill out or print or sign, etc.) and then delete them from the mail to save space. It'd be wonderful if I could have the attached files saved into my attachments folder and then either replaced by text or deleted by textmail.

I'd love it if it could be done via imapfilter directly on the IMAP server (with the attached files being saved locally, of course), but I could live with it if it needed to be done locally and then synced back to the IMAP server.

Any ideas on that, please?

Cheers,
Ángel
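A minimal sketch of the extraction step asked about above, using only Python's standard `mailbox` and `email` modules rather than textmail (whether textmail can do this itself is not established in this thread). The function name `save_attachments`, its two directory arguments, and the choice to prefix each saved file with the Maildir message key are illustrative assumptions. It treats any part that declares a filename as an attachment, so unnamed parts are skipped.

```python
import mailbox
import os


def save_attachments(maildir_path, out_dir):
    """Save every MIME part that carries a filename into out_dir.

    Returns the list of paths written. The messages themselves are not
    modified; removing or replacing the parts is a separate step.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    box = mailbox.Maildir(maildir_path, create=False)
    for key, msg in box.iteritems():
        for part in msg.walk():
            if part.is_multipart():
                continue  # containers hold no data of their own
            name = part.get_filename()
            if not name:
                continue  # no filename: assume body text, not an attachment
            payload = part.get_payload(decode=True)  # undo base64/quoted-printable
            if payload is None:
                continue
            # Drop any directory components the sender put in the name, and
            # prefix the message key so equal names from different mails
            # do not overwrite each other.
            safe_name = os.path.basename(name.replace("\\", "/"))
            target = os.path.join(out_dir, f"{key}-{safe_name}")
            with open(target, "wb") as fh:
                fh.write(payload)
            saved.append(target)
    return saved
```

Calling `save_attachments("/path/to/Maildir", "/path/to/attachments")` (both paths hypothetical) before any filtering keeps the original files around, in line with the advice elsewhere in the thread to verify results before deleting anything.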
Re: Elimination of mime/html part
Hi raf,

On Fri, May 01, 2020 at 11:20:54AM +1000, raf wrote:
> I wrote a tool for replacing non-text attachments in email with text
> versions. It's at http://raf.org/textmail.

Thank you so much for writing textmail, which is exactly what I needed and much more. It is much cooler than I could ever have imagined for a tool like that.

> By default, it replaces HTML attachments with inline plain text
> attachments that contain just the text within the original document.
> It also reduces text-versus-html alternative attachments to just the
> text part. It needs lynx to be installed. If that's the only
> transformation you want, you'll need to supply lots of options to
> prevent other default actions. Something like this:
>
>   textmail -MWERPULIAVXBS

For the moment, to process the few mailboxes I already have (mainly from email lists), that's the only transformation needed (there aren't any images or attached files, just the nasty text/html that doubles (or almost) the size the mailboxes take). In the future, I'll probably be doing other transformations as well for my daily incoming mail.

> That suppresses translating and deleting anything else, just HTML.

It works very nicely, thank you, except for 33 messages out of 3336 that result in totally empty mail (OK), with no content (not even headers).

> You can use it in a procmail recipe or apply it to mbox files.

For the moment I'm using a bash for loop, since I have all my mail stored using the Maildir format (one file per message).

> WARNING: Please verify it carefully and make sure it's doing what you
> want before deleting the original mailboxes. Anything that transforms
> your email should not be trusted until you have reason to trust it.

Sure. The first thing my imapfilter config file has is a recipe to make a copy in a mail folder called bak on the IMAP server, and only then does it start processing mail.

In any case, for this particular job, I'm reading mail from my local original mailbox and also writing the output of textmail into a local testing mail folder.

Until now I haven't done any processing with imapfilter other than some subject line substitutions and mail folder sorting, but I guess it should be possible to use textmail with imapfilter like with procmail, to preprocess email on the IMAP server before fetching it with mbsync, right?

Again, thank you so much for such a great tool!

Cheers,
Ángel
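The per-message loop described above could be sketched in Python roughly as follows. This is not the bash loop actually used. It assumes that textmail behaves as a stdin-to-stdout filter when handed a single message, as a procmail filter recipe would call it. The name `filter_maildir` and the `cmd` parameter are hypothetical. The sketch also guards against the empty-output case reported above by copying the original message through unchanged and reporting its name.

```python
import os
import subprocess

# Assumption: textmail reads one message on stdin and writes the
# transformed message to stdout. The options are the ones quoted above.
TEXTMAIL_CMD = ["textmail", "-MWERPULIAVXBS"]


def filter_maildir(src_dir, dst_dir, cmd=TEXTMAIL_CMD):
    """Pipe each file in src_dir through cmd and write the result to dst_dir.

    src_dir is a directory of one-message-per-file mail, e.g. the cur/
    subdirectory of a Maildir. Returns the names of messages whose filtered
    output was empty or whose filter exited with an error; those messages
    are written to dst_dir unmodified so nothing is lost.
    """
    os.makedirs(dst_dir, exist_ok=True)
    suspicious = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, "rb") as fh:
            original = fh.read()
        result = subprocess.run(cmd, input=original, stdout=subprocess.PIPE)
        output = result.stdout
        if result.returncode != 0 or not output.strip():
            suspicious.append(name)
            output = original  # keep the untouched message, not an empty file
        with open(os.path.join(dst_dir, name), "wb") as fh:
            fh.write(output)
    return suspicious
```

Writing into a separate `dst_dir` mirrors the testing-folder approach described in the message, and the returned list gives a starting point for inspecting the messages that came back empty.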
Re: Elimination of mime/html part
On Thu, Apr 30, 2020 at 03:27:29PM +0100, Sam Kuper wrote:
> Unfortunately, one of the weaknesses in Python's email handling (which
> might be related to some ambiguities or flaws in the RFCs on which they
> are based - I'm not sure) relates to the problem of identifying a
> "primary" (for want of a better word) text/plain part.

It's not a weakness in Python, per se. There isn't such a thing as a "primary" part. That's one of the points I was trying to make before. MIME allows for the BODY of your message to be literally anything. You can't heuristically determine what the "main" part is because it doesn't exist, except in the minds of the humans who interact with the message... and as we've seen, their ideas about what the main part is may differ.

And in any event, you cannot do this without potentially losing some information that is important, since many such messages will have plain text parts that contain only garbage, where the actual content is only in the HTML part (or even some other piece).

[The tools could assume the first plain-text part is the "main" part, and that would be a reasonable assumption, but some of the time it will be wrong.]

A likely better option is to compress the folder. If HTML truly is the bloat, it should compress very efficiently.

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
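One way to check the compression suggestion before committing to it is to measure how well the folder compresses. The sketch below uses Python's standard `zlib` module; the name `compression_estimate` is hypothetical. It compresses each file on its own, so the figure is only a rough indication: a single archive of the whole folder would likely do at least as well, but that is not measured here.

```python
import os
import zlib


def compression_estimate(folder):
    """Return (raw_bytes, compressed_bytes) summed over all files in folder.

    Each file is compressed separately at the highest zlib level, so the
    result approximates per-message compression, not one big archive.
    """
    raw = 0
    packed = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                data = fh.read()
            raw += len(data)
            packed += len(zlib.compress(data, 9))
    return raw, packed
```

Comparing the two numbers for a given mail folder shows whether compression alone saves enough space, without touching the content of any message.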
Re: Elimination of mime/html part
Angel M Alganza wrote:

> Hello:
>
> I keep a few mailbox folders containing a large amount of multipart mail
> with a text/plain part and a text/html part; I would like to eliminate
> the latter in order to reduce the amount of disk space used (on a small
> device and, mainly, on the IMAP server I keep it synced with).
>
> I've been looking for ways to remove the text/html part with tools like
> imapfilter or mbsync/isync that I use to manipulate, sort and
> synchronise my mail, but I haven't had any success. I can eliminate it
> by hand with Mutt, but that'd be really tedious, since I have thousands
> of messages.
>
> Is there an automated way that you know of to get that done? Or do you
> know of any third party program that could help me out?
>
> Thank you very much in advance for your help.
>
> Regards,
> Ángel

Hi Ángel,

I wrote a tool for replacing non-text attachments in email with text versions. It's at http://raf.org/textmail.

By default, it replaces HTML attachments with inline plain text attachments that contain just the text within the original document. It also reduces text-versus-html alternative attachments to just the text part. It needs lynx to be installed.

If that's the only transformation you want, you'll need to supply lots of options to prevent other default actions. Something like this:

  textmail -MWERPULIAVXBS

That suppresses translating and deleting anything else, just HTML.

You can use it in a procmail recipe or apply it to mbox files.

WARNING: Please verify it carefully and make sure it's doing what you want before deleting the original mailboxes. Anything that transforms your email should not be trusted until you have reason to trust it.

cheers,
raf
Re: Elimination of mime/html part
On Thu, Apr 30, 2020 at 08:54:25AM +0200, Angel M Alganza wrote:
> [...] I've been looking for ways to remove the text/html part [...]
>
> Is there an automated way that you know of to get that done? Or do
> you know of any third party program that could help me out?

Python and some other programming/scripting languages have built-in or third-party email handling libraries that can be used to iterate over messages in a mailbox (mbox, maildir, MH, etc), processing each message in turn.

Unfortunately, one of the weaknesses in Python's email handling (which might be related to some ambiguities or flaws in the RFCs on which they are based - I'm not sure) relates to the problem of identifying a "primary" (for want of a better word) text/plain part.

So, if you just want to remove text/html parts from each message that also has a text/plain part, you'll probably find Python adequate. But if your use case involves being sure that your script has correctly identified the "primary" text/plain part, then you may have to work around shortcomings in Python's email objects/functions.

If you come up with a solution that works for you, please post a follow-up in this thread, ideally with a copy of your source code (or a link to it) so that others can benefit. I would be very interested to see your approach.

Good luck!

-- 
A: When it messes up the order in which people normally read text.
Q: When is top-posting a bad thing?

()  ASCII ribbon campaign. Please avoid HTML emails & proprietary
/\  file formats. (Why? See e.g. https://v.gd/jrmGbS ). Thank you.
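The simpler case described above (remove text/html only where a text/plain sibling exists) might look roughly like this with Python's standard `email` package. The function name `strip_html_alternatives` is hypothetical. The sketch makes no attempt to decide which part is "primary": it only touches multipart/alternative containers, leaves HTML-only messages alone, and does not handle HTML wrapped inside a multipart/related part.

```python
import email


def strip_html_alternatives(msg):
    """Remove text/html children of multipart/alternative parts in place.

    A container is only modified if it also holds a text/plain child, so
    messages whose only body is HTML keep that body. Returns the number
    of parts removed.
    """
    removed = 0
    # Snapshot the tree first so changing payloads cannot disturb iteration.
    for part in list(msg.walk()):
        if part.get_content_type() != "multipart/alternative":
            continue
        children = part.get_payload()
        if not isinstance(children, list):
            continue  # malformed container: leave it untouched
        types = [child.get_content_type() for child in children]
        if "text/plain" not in types:
            continue
        kept = [c for c in children if c.get_content_type() != "text/html"]
        removed += len(children) - len(kept)
        part.set_payload(kept)
    return removed
```

The function works on any parsed message, for example one returned by `email.message_from_bytes` or yielded while iterating over a `mailbox.Maildir` or `mailbox.mbox`. Writing the modified messages back, and keeping a backup first, is left to the caller.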
Elimination of mime/html part
Hello:

I keep a few mailbox folders containing a large amount of multipart mail with a text/plain part and a text/html part; I would like to eliminate the latter in order to reduce the amount of disk space used (on a small device and, mainly, on the IMAP server I keep it synced with).

I've been looking for ways to remove the text/html part with tools like imapfilter or mbsync/isync that I use to manipulate, sort and synchronise my mail, but I haven't had any success. I can eliminate it by hand with Mutt, but that'd be really tedious, since I have thousands of messages.

Is there an automated way that you know of to get that done? Or do you know of any third party program that could help me out?

Thank you very much in advance for your help.

Regards,
Ángel