Re: [PHP-DB] Email and HTML Parser Library

2002-11-09 Thread Marco Tabini
Have you considered using PHP's IMAP extension? It would solve pretty
much all your problems with regard to "interpreting" the contents of a
message. It's a bit slow, though.
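
For readers outside PHP, here is a rough Python analogue of what "interpreting" a message involves. It is not PHP's IMAP extension, which exposes structure through functions such as `imap_fetchstructure()` and `imap_fetchbody()`. The sketch lists each leaf MIME part with an IMAP-style section number such as "1.2", using only the standard-library `email` package. The function name `list_parts` is hypothetical. Treating message/rfc822 parts as leaves is a simplification.

```python
from email import message_from_string


def list_parts(msg, prefix=""):
    """Yield (section, content_type) for each leaf part, numbered IMAP-style.

    Simplification (assumption): message/rfc822 parts are treated as leaves
    instead of being descended into.
    """
    if not msg.is_multipart() or msg.get_content_type() == "message/rfc822":
        # A non-multipart top-level message has a single part, "1".
        yield (prefix or "1", msg.get_content_type())
        return
    # Children of a multipart are numbered 1, 2, ... under their parent.
    for i, sub in enumerate(msg.get_payload(), start=1):
        section = f"{prefix}.{i}" if prefix else str(i)
        yield from list_parts(sub, section)
```

For a multipart/mixed message whose first part is a multipart/alternative (plain text plus HTML) and whose second part is a GIF, this yields sections "1.1", "1.2" and "2".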

As for finding the hrefs and img srcs, you can easily get away with a
couple of regular expressions.
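
Below is a minimal Python sketch of the "couple of regular expressions" approach. It adds `urllib.parse.urljoin` to make relative links fully qualified. The names `extract_urls`, `HREF_RE` and `IMG_RE` are hypothetical, and the patterns assume quoted attribute values. The caller is assumed to supply the base URL, for example from a `<base href>` in the message. Regexes are not a real HTML parser, so unusual markup will slip through.

```python
import re
from urllib.parse import urljoin

# Hypothetical patterns: they only match quoted values such as href="..."
# or src='...'; unquoted attribute values are not picked up.
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
IMG_RE = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_urls(html, base_url):
    """Return (hrefs, img_srcs) as absolute URLs resolved against base_url."""
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    hrefs = [urljoin(base_url, u) for u in HREF_RE.findall(html)]
    imgs = [urljoin(base_url, u) for u in IMG_RE.findall(html)]
    return hrefs, imgs
```

For example, `extract_urls('<img src="../gfx/icons/new.gif">', "http://www.purplecow.com/hhs/")` returns `([], ['http://www.purplecow.com/gfx/icons/new.gif'])`.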

Hope this helps.


Marco

-- 

php|architect - The magazine for PHP Professionals
The first monthly worldwide magazine dedicated to PHP programmers
Check us out on the web at http://www.phparch.com

On Sat, 2002-11-09 at 15:57, Peter Beckman wrote:
> Hey Folks:
> 
> I admit, I haven't searched for this anywhere yet, but I thought I'd ask
> for opinions first.
> 
> I'm looking to parse an email.  Some emails are HTML, some are not.
> 
> What I want to do with an email is:
> 
> 1. Split the headers from the body
> 2. Remove MIME attachments that aren't plain text or HTML from the body
> 3. Grab all the HREF URLs in the body as well as image SRCs (fully
>    qualified, so that an img found relative to "http://www.purplecow.com/hhs/"
>    is stored as a full URL such as http://purplecow.com/gfx/icons/new.gif)
> 
> Are there any good libraries of functions that would aid me in this effort?
> The idea is to store the headers in the DB, store the body in another
> table, and store hrefs and image URLs in another table.  The URLs will be
> used to see if there are redirects to somewhere else and make a "parent"
> association.
> 
> Any thoughts, pointers, urls or code would be appreciated!
> 
> Peter
> ---
> Peter Beckman    Systems Engineer, Fairfax Cable Access Corporation
> [EMAIL PROTECTED] http://www.purplecow.com/
> ---
> 
> 
> -- 
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 






[PHP-DB] Email and HTML Parser Library

2002-11-09 Thread Peter Beckman
Hey Folks:

I admit, I haven't searched for this anywhere yet, but I thought I'd ask
for opinions first.

I'm looking to parse an email.  Some emails are HTML, some are not.

What I want to do with an email is:

1. Split the headers from the body
2. Remove MIME attachments that aren't plain text or HTML from the body
3. Grab all the HREF URLs in the body as well as image SRCs (fully
   qualified, so that an img found relative to "http://www.purplecow.com/hhs/"
   is stored as a full URL such as http://purplecow.com/gfx/icons/new.gif)
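
Steps 1 and 2 can be sketched with Python's standard-library `email` package. It is swapped in here for illustration, since the thread is about PHP. `split_message` and `KEEP_TYPES` are hypothetical names. The sketch assumes the raw message is available as a string. It keeps text/plain and text/html parts whether or not they are marked as attachments.

```python
from email import message_from_string

# Content types to keep; every other leaf part (images, PDFs, ...) is dropped.
KEEP_TYPES = {"text/plain", "text/html"}


def split_message(raw):
    """Return (headers, bodies) for a raw RFC 822 message string.

    headers: list of (name, value) pairs from the top-level message.
    bodies: list of (content_type, text) for text/plain and text/html parts.
    """
    msg = message_from_string(raw)
    headers = list(msg.items())
    bodies = []
    # walk() yields the message itself and every nested MIME part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        ctype = part.get_content_type()
        if ctype not in KEEP_TYPES:
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("latin-1")
        bodies.append((ctype, text))
    return headers, bodies
```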

Are there any good libraries of functions that would aid me in this effort?
The idea is to store the headers in the DB, store the body in another
table, and store hrefs and image URLs in another table.  The URLs will be
used to see if there are redirects to somewhere else and make a "parent"
association.
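
Below is a minimal sketch of that three-table layout, with Python's `sqlite3` module standing in for whatever database is actually used. All table, column and function names are hypothetical. The redirect check itself, an HTTP request, is left out. `record_redirect` assumes the target URL has already been found and links it back to the original URL through `parent_id`. That is one possible reading of the "parent" association.

```python
import sqlite3

# Hypothetical schema: headers, bodies and URLs each get their own table.
SCHEMA = """
CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE headers (
    message_id INTEGER REFERENCES messages(id),
    name TEXT,
    value TEXT);
CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    message_id INTEGER REFERENCES messages(id),
    kind TEXT,                             -- 'href' or 'img'
    url TEXT,
    parent_id INTEGER REFERENCES urls(id)); -- set on redirect targets
"""


def store_message(conn, headers, body, hrefs, imgs):
    """Insert one parsed message with its headers and URLs; return its id."""
    cur = conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    msg_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO headers (message_id, name, value) VALUES (?, ?, ?)",
        [(msg_id, name, value) for name, value in headers])
    conn.executemany(
        "INSERT INTO urls (message_id, kind, url) VALUES (?, ?, ?)",
        [(msg_id, "href", u) for u in hrefs] + [(msg_id, "img", u) for u in imgs])
    conn.commit()
    return msg_id


def record_redirect(conn, url_id, target):
    """Store a redirect target for urls.id == url_id, linked via parent_id."""
    message_id, kind = conn.execute(
        "SELECT message_id, kind FROM urls WHERE id = ?", (url_id,)).fetchone()
    cur = conn.execute(
        "INSERT INTO urls (message_id, kind, url, parent_id) VALUES (?, ?, ?, ?)",
        (message_id, kind, target, url_id))
    conn.commit()
    return cur.lastrowid
```

Typical use is `conn = sqlite3.connect(":memory:")` followed by `conn.executescript(SCHEMA)`, then one `store_message` call per parsed email.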

Any thoughts, pointers, urls or code would be appreciated!

Peter
---
Peter Beckman    Systems Engineer, Fairfax Cable Access Corporation
[EMAIL PROTECTED] http://www.purplecow.com/
---

