> On Wed, Oct 27, 2004 at 09:35:11AM -0400, Keith Hackworth wrote:
>> > I'm guessing you want PMS::get_decoded_stripped_body_text_array().
>>
>> Thanks, Theo - this may work for HTML-only messages, which might be good
>> enough for what I'm trying to do. I need just the HTML version of the
>> email. No attachments, just the HTML body. If the 1st part is multipart,
>> I need the 1st HTML part.
>
> If you want to limit what you're looking at in that way, you'd need to
> access the Message object directly and use find_parts to grab just the
> first matching part you're interested in. The PMS functions work on all
> text/* parts, and aren't limited to HTML.
>
>> Here's what I'm trying to do:
>> I'm trying to find invalid HTML tags and, if there are too many, bump the
>> SA score up a bit. I noticed a bunch of messages come in with obfu like
>> this "v-wo<notatag>rd" in the body of the HTML message, which shows up as
>> "v-word" on a normal webmail or Outlook email client. I want to see how
>> many "notatag"s we're getting in a message. I got the code on how to do
>> it and it works fine, but it's just WAY too slow using
>> PMS::get_message().
>
> Yeah, that'll get you a bunch of stuff you really don't care about.
> get_decoded_stripped... is also not the right thing, since it will have
> stripped all the HTML tags. I'd try get_decoded_body_text_array(),
> or since you're doing code anyway, just use find_parts and grab the
> [EMAIL PROTECTED]/[EMAIL PROTECTED] parts of the message. You can then
> easily call decode() on them (object function) and get the raw HTML out.
>
> Just curious though, why limit yourself to invalid HTML tags? Why not
> just target the html-tag-in-middle-of-word behavior? And isn't this the
> same idea as the backhair code?
>
> --
> Randomly Generated Tagline:
> "Exactly what it should've been, give people what they expect. The third
> one can be clever." - John Hughes about Home Alone 2

Wow! I guess if I'd RTFM'd a little better, I'd have saved myself a lot of
trouble. I didn't realize backhair did this already.

On to [EMAIL PROTECTED] [EMAIL PROTECTED], which catches
"[EMAIL PROTECTED] l|k3 th!s" in the subject. Yes - I know chicken pox does
this already, but I have many custom rules built on my server for this one
and it seems to be much more accurate.

Thanks!

Keith
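
For anyone following the thread, the two checks discussed above (counting unrecognized tags, and counting any tag wedged into the middle of a word) can be sketched roughly as below. This is a standalone Python illustration that works on an already-decoded HTML string, not SpamAssassin plugin code. The function names, the regexes, and the deliberately tiny KNOWN_TAGS set are all assumptions made up for the example.

```python
import re

# Hypothetical, deliberately incomplete list of real HTML tag names.
# A real check would need a much fuller list.
KNOWN_TAGS = {
    "a", "b", "body", "br", "div", "font", "html", "i",
    "img", "p", "span", "table", "td", "tr",
}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")

# Matches any tag that has a letter directly before and directly after it,
# e.g. the "<notatag>" in "v-wo<notatag>rd".
MID_WORD_TAG_RE = re.compile(r"(?<=[A-Za-z])<[^<>]+>(?=[A-Za-z])")


def count_unknown_tags(html):
    """Count tags whose name is not in KNOWN_TAGS."""
    return sum(
        1 for m in TAG_RE.finditer(html)
        if m.group(1).lower() not in KNOWN_TAGS
    )


def count_mid_word_tags(html):
    """Count tags embedded in the middle of a word, valid or not."""
    return len(MID_WORD_TAG_RE.findall(html))


sample = "<html><body><p>Buy v-wo<notatag>rd and ch<xyz>eap stuff</p></body></html>"
unknown = count_unknown_tags(sample)
mid_word = count_mid_word_tags(sample)
```

The mid-word check does not care whether the tag is valid, so it would also count something like "wo<b>rd". That matches Theo's suggestion to target the behavior instead of the tag name, but it may need tuning against real mail before it feeds into a score.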