Re: [PHP] html parser tutorial

2004-12-07 Thread Richard Lynch
Ahmed Abdel-Aliem wrote:
> Does anyone know of a good tutorial for parsing HTML files?
> I have an HTML page and I want to parse information from it to insert
> into MySQL.
> I have good experience in PHP, but I haven't written a parser before.
> Can anyone help, please?

HTML Tidy (available in PHP through the tidy extension) is supposed to be
good at that.  I have never actually tried it, but John Coggeshall's
presentation a few months ago at the Chicago PHP User Group meeting was
pretty compelling.
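
For what it's worth, here is a minimal sketch of how the tidy extension can
repair sloppy markup before you pull data out of it.  The helper name
cleanHtml() and the sample markup are made up for illustration, and the
fallback for a missing extension is my own assumption, not something from
the presentation.

```php
<?php
// Hypothetical helper: repairs sloppy markup with the tidy extension when it
// is loaded, and returns the input unchanged otherwise.
function cleanHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html;  // assumption: callers can cope with unrepaired HTML
    }
    $config = [
        'output-xhtml'   => true,  // emit well-formed XHTML
        'show-body-only' => true,  // skip the <html>/<head> wrapper
        'wrap'           => 0,     // do not re-wrap long lines
    ];
    $clean = tidy_repair_string($html, $config, 'utf8');
    return $clean === false ? $html : $clean;
}

// Unclosed <b> and <li> tags, as often seen in hand-written pages.
$messy = '<ul><li>Concert <b>tonight<li>Workshop tomorrow</ul>';
$clean = cleanHtml($messy);
echo $clean, "\n";
```

Once the markup is well-formed, the closing tags you split on are much more
likely to actually be there.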

If you only need a few small bits of information from web pages whose
format doesn't change often, you can probably get it done quickly and
easily with http://php.net/explode.

I've scraped a lot of stuff that way myself.

You simply have to search the HTML for a distinctive tag that is unlikely
to change often and is shortly before the content you want.

Then use http://php.net/explode with that tag.  For example, on a site
with calendar events, you might use:

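<?php
  $file = file('http://example.com/');
  $html = implode('', $file);
  // Tags were lost from this archived copy; the ones below are hypothetical.
  $events = explode('<td class="event">', $html);
  $event = $events[1];  // text after the first occurrence of the tag
  $parts = explode('</td>', $event);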
//Prepend 
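
The explode approach described above can be sketched as a small
self-contained script.  This is only an illustration: the function name
scrapeBetween(), the '<td class="event">' marker and the sample page are
all assumptions, and it works on an inline string so it runs without
fetching anything.

```php
<?php
// Hypothetical helper: returns the text between each occurrence of $openTag
// and the next $closeTag in $html.
function scrapeBetween(string $html, string $openTag, string $closeTag): array
{
    $chunks = explode($openTag, $html);
    array_shift($chunks);  // drop everything before the first marker

    $results = [];
    foreach ($chunks as $chunk) {
        // Keep only what precedes the first closing tag in this chunk.
        $parts = explode($closeTag, $chunk, 2);
        $results[] = trim(strip_tags($parts[0]));
    }
    return $results;
}

// Made-up sample page standing in for the fetched HTML.
$html = '<html><body><table><tr>'
      . '<td class="event">Dec 10: <b>Open Mic</b></td>'
      . '<td class="event">Dec 17: Holiday Show</td>'
      . '</tr></table></body></html>';

$events = scrapeBetween($html, '<td class="event">', '</td>');
print_r($events);
```

If the marker never appears, the function returns an empty array, which is
a handy signal that the site changed its layout.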

MOST sites with content you want to scrape on a routine basis are pretty
predictable.  CSS classes can be particularly useful for finding the right
bits you want to scrape.
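
As a rough sketch of using a CSS class as the marker, the snippet below
splits on the class attribute itself instead of a whole tag, so it still
works when the tag carries other attributes.  The function name
textByClass(), the class names and the sample markup are hypothetical, and
it assumes the class is written exactly as class="name" with double quotes.

```php
<?php
// Hypothetical helper: returns the text that directly follows each opening
// tag carrying class="$class".
function textByClass(string $html, string $class): array
{
    $chunks = explode('class="' . $class . '"', $html);
    array_shift($chunks);  // drop everything before the first match

    $texts = [];
    foreach ($chunks as $chunk) {
        // Skip the rest of the opening tag, then read up to the next tag.
        $afterTag = explode('>', $chunk, 2);
        if (count($afterTag) < 2) {
            continue;  // the opening tag was never closed
        }
        $content = explode('<', $afterTag[1], 2);
        $texts[] = trim($content[0]);
    }
    return $texts;
}

// Made-up sample markup.
$html = '<ul>'
      . '<li class="event" id="e1"> Open Mic </li>'
      . '<li class="venue">Main Hall</li>'
      . '<li class="event">Holiday Show</li>'
      . '</ul>';

$names = textByClass($html, 'event');
print_r($names);
```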

Occasionally I run across one where it's hand-edited and completely
unpredictable -- and usually not worth scraping, in my experience.

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] html parser tutorial

2004-12-07 Thread rouvas
On Tuesday 07 December 2004 19:09, Ahmed Abdel-Aliem wrote:
> Does anyone know of a good tutorial for parsing HTML files?
> I have an HTML page and I want to parse information from it to insert
> into MySQL.

Check out:
http://0x00.org/php/phpHTMLparse/index.php

-Stathis




[PHP] html parser tutorial

2004-12-07 Thread Ahmed Abdel-Aliem
Does anyone know of a good tutorial for parsing HTML files?
I have an HTML page and I want to parse information from it to insert
into MySQL.
I have good experience in PHP, but I haven't written a parser before.
Can anyone help, please?
