Ahmed Abdel-Aliem wrote:
> Does anyone know a good tutorial for parsing HTML files?
> I have an HTML page and I want to parse information from it to insert
> into MySQL.
> I have good experience in PHP, but I haven't written a parser before.
> Can anyone help, please?
TidyHTML is supposed to be good at that. I've never actually tried it, but
John Coggeshall's presentation a few months ago at the Chicago PHP User
Group meeting was pretty compelling.
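
For what it's worth, here is a rough sketch of what cleaning up a page with
the Tidy extension might look like. I'm assuming the extension is loaded and
using its tidy_repair_string() function; clean_html() is just a made-up
helper name, and the sample markup is invented. If Tidy isn't installed, the
helper simply hands the input back unchanged.

```php
<?php
// Sketch: clean up sloppy HTML with the Tidy extension when it is loaded.
// clean_html() is a hypothetical helper name, not part of Tidy itself.
function clean_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // Tidy not installed: hand the input back unchanged
    }
    $config = array(
        'output-xhtml'   => true, // ask for well-formed, closed tags
        'show-body-only' => true, // skip the <html>/<head> wrapper
    );
    $clean = tidy_repair_string($html, $config, 'utf8');
    return $clean === false ? $html : $clean;
}

// Invented sample of the kind of unclosed markup real pages contain.
$messy = '<p>Hello <b>world<p>Second paragraph';
echo clean_html($messy), "\n";
```

Once the markup is well-formed, it is much easier to take apart with
whatever parsing approach you prefer.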
If you only need a few small bits of information from web pages whose
format doesn't change often, you can probably get it done quickly and
easily with http://php.net/explode.
I've scraped a lot of stuff that way myself.
You simply have to search the HTML for a distinctive tag that is unlikely
to change often and is shortly before the content you want.
Then use http://php.net/explode with that tag. For example, on a site
with calendar events, you might use:
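<?php
$file = file('http://example.com/');
$html = implode('', $file);
//'<div class="event">' is a placeholder for the distinctive tag before each event
$parts = explode('<div class="event">', $html);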
//Prepend
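
To make the explode() approach concrete, here is a minimal, self-contained
sketch. The sample HTML, the <div class="event"> marker, and the variable
names are all invented for illustration; on a real page you would pick
whatever distinctive tag sits just before the content you want.

```php
<?php
// Hypothetical page content; in practice this would come from file() or
// file_get_contents() on the real URL.
$html = '<html><body><h1>Calendar</h1>'
      . '<div class="event">Jan 5: Open Mic</div>'
      . '<div class="event">Jan 9: Jazz Night</div>'
      . '</body></html>';

// Assumed marker: a tag that appears right before each event.
$marker = '<div class="event">';

$parts = explode($marker, $html);
// $parts[0] is everything before the first marker, so drop it.
array_shift($parts);

$events = array();
foreach ($parts as $part) {
    // Each part starts with the event text; cut it off at the closing tag.
    $pieces = explode('</div>', $part, 2);
    $events[] = trim(strip_tags($pieces[0]));
}

print_r($events);
```

If the marker is not found at all, explode() returns a single element, so
$events ends up empty. That makes an empty result a cheap signal that the
site has changed its layout.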
MOST sites with content you want to scrape on a routine basis are pretty
predictable. CSS classes can be particularly useful for finding the
bits you want to scrape.
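
As a sketch of how CSS classes can serve as markers, the helper below pulls
out the text between an opening tag carrying a given class and the next
closing tag. The function name scrape_between(), the class names, and the
sample markup are all hypothetical, and it only uses plain string functions,
so it assumes the opening tag is written exactly the same way every time.

```php
<?php
/**
 * Return the text found between each occurrence of $openTag and the next
 * $closeTag. Hypothetical helper; plain string functions only.
 */
function scrape_between($html, $openTag, $closeTag)
{
    $found = array();
    $offset = 0;
    while (($start = strpos($html, $openTag, $offset)) !== false) {
        $start += strlen($openTag);
        $end = strpos($html, $closeTag, $start);
        if ($end === false) {
            break; // unterminated element: stop rather than guess
        }
        $found[] = trim(strip_tags(substr($html, $start, $end - $start)));
        $offset = $end + strlen($closeTag);
    }
    return $found;
}

// Made-up markup where CSS classes label the interesting bits.
$html = '<ul>'
      . '<li><span class="date">Jan 5</span> <span class="title">Open Mic</span></li>'
      . '<li><span class="date">Jan 9</span> <span class="title">Jazz <b>Night</b></span></li>'
      . '</ul>';

$dates  = scrape_between($html, '<span class="date">', '</span>');
$titles = scrape_between($html, '<span class="title">', '</span>');
```

This breaks down if the same tag is nested inside itself (a span inside a
span, say), since it stops at the first closing tag it sees.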
Occasionally I run across one where it's hand-edited and completely
unpredictable -- and usually not worth scraping, in my experience.