RE: filtering HTML tags from email

2005-02-24 Thread Ted Mittelstaedt


 -----Original Message-----
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]] On Behalf Of Mike Hauber
 Sent: Wednesday, February 23, 2005 4:19 AM
 To: freebsd-questions@freebsd.org
 Subject: Re: filtering HTML tags from email
 
 
 Just after destroying the headers in who-knows-how-many emails 
 (backed up...  whew!), I finally realized that piping the 
 messages through html2text (or lynx or w3m) was probably not such 
 a great idea after all.  :)
 
 This is something that really should be implemented as part of 
 kmail itself (it would help to remain compatible with both 
 maildir/mbox).  I'll continue to be frustrated with html2text for 
 a while (it's a pretty cool tool), and who knows...  Mayhaps I'll 
 figure out a reasonable way to set it up so that everything is 
 done automatically.

Mike, why are you torturing yourself when http://www.mimedefang.org/
does this?  Afraid of Sendmail?

Ted
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: filtering HTML tags from email

2005-02-23 Thread Simon Barner
Mike Hauber wrote:
  Mutt saves to a temp file then calls the following command:
  lynx -localhost -dump %s
  where '%s' is the temporary file you saved it to.
 
  You could also just pipe it to the following:
  lynx -localhost -dump -stdin
 
  the -localhost argument prevents lynx from simply following
  links external to your machine - helpful to avoid generating
  hits for unscrupulous spammers that get paid for hits on a URL.
 
  Just make sure lynx is installed.
 
  Lou
 
 Okay, so to be sure, there is no filter (as of yet) to simply open 
 an email file, strip the HTML tags, and resave it?  I'm not 
 complaining, as this may actually be something I'm capable of 
 creating myself.  (I'll make this my first python project. :) )
 
 I'm just making sure I'm not missing anything obvious before I 
 start working on it.  It's irritating to spend time on something 
 only to find out that it's already been done.

You could probably also do it with procmail + lynx (or w3m) during the
delivery process.

Another possibility is to have the following entries in your ~/.mailcap
file, which converts html, doc and rtf to plain text.

text/html; w3m -dump -T text/html; copiousoutput;
application/msword; antiword %s; copiousoutput
application/rtf; rtfreader %s; copiousoutput

As for your Python script: I don't think that just stripping everything
matching the following expressions is correct, because they might appear
in non-HTML emails, too: <.*> <\/.*> (perl syntax).

At least, you'd need a list of valid HTML tags, i.e. a regular grammar
for HTML: <b> | </b> | <i> | </i> | ... (BNF notation).

While this is not too hard to implement (and possibly a good project for
learning a new programming language), this would be too much work for
something that can be achieved more easily with existing tools (that is,
for me, personally ;-)
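
A minimal Python sketch of the tag-list idea, for illustration only: it
removes a tag only when its name is on a known list, so a stray "<" or ">"
in a plain-text message is left alone. The names KNOWN_TAGS and
strip_known_tags and the short tag list are assumptions made for this
example, not part of any existing tool. A regular expression like this can
still be fooled (for instance by text such as "a<b and c>d"), so a real
HTML parser is likely the safer route.

```python
import re

# Illustrative subset only; a real filter would list every valid HTML tag.
KNOWN_TAGS = ("html", "head", "title", "body", "p", "br", "b", "i", "a", "div", "span")

# Matches <tag ...>, </tag> and <tag/> for names in KNOWN_TAGS, case-insensitively.
TAG_RE = re.compile(r"</?(?:%s)\b[^<>]*>" % "|".join(KNOWN_TAGS), re.IGNORECASE)


def strip_known_tags(text):
    """Remove known HTML tags and leave all other text untouched."""
    return TAG_RE.sub("", text)


# Unknown "tags" and bare comparison signs survive.
print(strip_known_tags("<b>bold</b> and <i>it</i>: 1 < 2, <foo> stays"))
# prints: bold and it: 1 < 2, <foo> stays
```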

Simon




Re: filtering HTML tags from email

2005-02-23 Thread Mike Hauber
On Wednesday 23 February 2005 04:43 am, Simon Barner wrote:
   You could also just pipe it to the following:
   lynx -localhost -dump -stdin
  
   Lou
 
  Okay, so to be sure, there is no filter (as of yet) to simply
  open an email file, strip the HTML tags, and resave it?  I'm
  not complaining, as this may actually be something I'm
  capable of creating myself.  (I'll make this my first python
  project. :) )
 

 You probably could do it also with procmail + lynx (or w3m)
 during the delivery process.

 Another possibility is to have the following entries in your
 ~/.mailcap file, which converts html, doc and rtf to plain
 text.

 text/html; w3m -dump -T text/html; copiousoutput;
 application/msword; antiword %s; copiousoutput
 application/rtf; rtfreader %s; copiousoutput

 Simon

Just after destroying the headers in who-knows-how-many emails 
(backed up...  whew!), I finally realized that piping the 
messages through html2text (or lynx or w3m) was probably not such 
a great idea after all.  :)

This is something that really should be implemented as part of 
kmail itself (it would help to remain compatible with both 
maildir/mbox).  I'll continue to be frustrated with html2text for 
a while (it's a pretty cool tool), and who knows...  Mayhaps I'll 
figure out a reasonable way to set it up so that everything is 
done automatically.
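
One way to avoid clobbering the headers, sketched in Python under some
assumptions: parse the message first and rewrite only the text/html parts,
leaving the headers and any other parts alone. The function names
strip_html_parts and html_to_text are made up for this example, the tag
removal is deliberately crude (it keeps the text between tags and nothing
else), and it handles one message at a time, not a whole mbox file.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text between tags (a crude conversion)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with the tags dropped."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)


def strip_html_parts(raw_bytes):
    """Replace each text/html part with text/plain; headers stay as parsed."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            text = html_to_text(part.get_content())
            # set_content() resets only the Content-* headers of this part,
            # so From, To, Subject and friends are left in place.
            part.set_content(text, subtype="plain")
    return msg.as_bytes()
```

As a pipe filter this could be fed sys.stdin.buffer.read() and its result
written to sys.stdout.buffer. Whether kmail's pipe-through filter is happy
with the rewritten message has not been tested here.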

Thanks for the feeds.

Mike


filtering html tags from email

2005-02-22 Thread Mike Hauber
Without going through the hassle of setting up proxy servers, 
isn't there a way that one can filter out HTML tags from a 
message (say, by piping the email through a filter from kmail)?

Perhaps I'm looking too hard for it, but I didn't see anything in 
the ports tree except for /mail/nohtml.  I tried to pipe an HTML 
message through nohtml.py from kmail, but it doesn't seem to work 
(although I'm getting no errors from kmail's filter log).

Any ideas?  Thx.


Mike


Re: filtering html tags from email

2005-02-22 Thread Louis LeBlanc
On 02/22/05 11:16 PM, Mike Hauber sat at the `puter and typed:
 Without going through the hassle of setting up proxy servers, 
 isn't there a way that one can filter out html tags from a 
 message (say, pipe the email through the filter from kmail for 
 instance?)
 
 Perhaps I'm looking too hard for it, but I didn't see anything in 
 the ports tree except for /mail/nohtml.  I tried to pipe a html 
 message through nohtml.py from kmail, but doesn't seem to work 
 (although I'm getting no errors from kmail's filter log).
 
 Any ideas?  Thx.

Mutt saves to a temp file then calls the following command:
lynx -localhost -dump %s
where '%s' is the temporary file you saved it to.

You could also just pipe it to the following:
lynx -localhost -dump -stdin

The -localhost argument prevents lynx from following links to hosts
other than your own machine - helpful to avoid generating hits for
unscrupulous spammers who get paid for hits on a URL.

Just make sure lynx is installed.
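
A rough Python equivalent of the pipe above, in case the filter is driven
from a script. The helper name pipe_through is an assumption for this
sketch; the lynx flags are the ones given above, and lynx has to be
installed for the default command to work. It is probably best to feed it
only the HTML body, not a whole message with its headers.

```python
import subprocess

# Flags taken from the command above; requires lynx on the PATH.
LYNX_CMD = ("lynx", "-localhost", "-dump", "-stdin")


def pipe_through(data, cmd=LYNX_CMD):
    """Feed bytes to cmd on stdin and return what it writes to stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Passing a different cmd makes it easy to swap in w3m or html2text; a
non-zero exit status raises subprocess.CalledProcessError.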

Lou
-- 
Louis LeBlanc  FreeBSD-at-keyslapper-DOT-net
Fully Funded Hobbyist,   KeySlapper Extrordinaire :)
Please send off-list email to: leblanc at keyslapper d.t net
Key fingerprint = C5E7 4762 F071 CE3B ED51  4FB8 AF85 A2FE 80C8 D9A2

Habit is habit, and not to be flung out of the window by any man, but
coaxed down-stairs a step at a time.
-- Mark Twain, Pudd'nhead Wilson's Calendar




Re: filtering HTML tags from email

2005-02-22 Thread Mike Hauber
On Wednesday 23 February 2005 12:50 am, Louis LeBlanc wrote:
 On 02/22/05 11:16 PM, Mike Hauber sat at the `puter and typed:
  Without going through the hassle of setting up proxy servers,
  isn't there a way that one can filter out html tags from a
  message (say, pipe the email through the filter from kmail
  for instance?)
 
  Perhaps I'm looking too hard for it, but I didn't see
  anything in the ports tree except for /mail/nohtml.  I tried
  to pipe a html message through nohtml.py from kmail, but
  doesn't seem to work (although I'm getting no errors from
  kmail's filter log).
 
  Any ideas?  Thx.

 Mutt saves to a temp file then calls the following command:
 lynx -localhost -dump %s
 where '%s' is the temporary file you saved it to.

 You could also just pipe it to the following:
 lynx -localhost -dump -stdin

 the -localhost argument prevents lynx from simply following
 links external to your machine - helpful to avoid generating
 hits for unscrupulous spammers that get paid for hits on a URL.

 Just make sure lynx is installed.

 Lou

Okay, so to be sure, there is no filter (as of yet) to simply open 
an email file, strip the HTML tags, and resave it?  I'm not 
complaining, as this may actually be something I'm capable of 
creating myself.  (I'll make this my first python project. :) )

I'm just making sure I'm not missing anything obvious before I 
start working on it.  It's irritating to spend time on something 
only to find out that it's already been done.

Thanks,

Mike
