Re: [PHP] Re: Static and/or Dynamic site scraping using PHP
On Thu, Apr 30, 2009 at 8:03 AM, 9el wrote:
> On Thu, Apr 30, 2009 at 3:33 AM, 9el wrote:
>> I just got a project to do in PHP: scraping the body items from
>> static sites or just HTML sites.
>> Could you experts please suggest some quick resources?
>>
>> I have to make a WP plugin with the data as well.
>
> Any expert there yet? Was looking for urgent advice on accomplishing the
> task.

http://www.regular-expressions.info and preg_match are your best friend(s).

--
// Todd

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
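To illustrate the preg_match suggestion, here is a minimal sketch, assuming the page has already been fetched into a string. The sample markup and the function name extract_body() are invented for this example and do not come from the thread. Regular expressions are fragile on real-world HTML, so this only handles a single, well-formed <body> element.

```php
<?php
// Hypothetical helper (not from the thread): returns the inner HTML of the
// first <body> element, or null if no <body> element is found.
function extract_body(string $html): ?string
{
    // 'i' makes the match case-insensitive, 's' lets '.' match newlines.
    // [^>]* allows attributes on the <body> tag; .*? is non-greedy.
    if (preg_match('/<body\b[^>]*>(.*?)<\/body>/is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Sample markup invented for this example.
$html = "<html><head><title>Demo</title></head>\n"
      . "<body class=\"home\">\n<p>Hello, world.</p>\n</body></html>";

echo extract_body($html), "\n";
```

On a live page the string would come from a fetch such as file_get_contents(); it is inlined here so the sketch runs without network access.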
Re: Re: [PHP] Re: Static and/or Dynamic site scraping using PHP
On Sat, 2009-05-02 at 13:08 -0400, Paul M Foster wrote:
> On Sat, May 02, 2009 at 10:40:04PM +0600, Lenin wrote:
>
> > On Sat, May 2, 2009 at 10:01 PM, wrote:
> >
> > > Je suis actuellement absent du bureau aussi TEST !
> >
> > I don't get why I get this automated mail every time I send a message to
> > this thread. :-/
>
> My French is rusty, but it looks like it says something like "I'm out of
> the office". So it would appear this person has an autoreply
> going.
>
> Paul
>
> --
> Paul M. Foster

I've got it for every email I send to the list as well! It's annoying,
but the TEST bit just makes it funny!

Ash
www.ashleysheridan.co.uk
Re: Re: [PHP] Re: Static and/or Dynamic site scraping using PHP
On Sat, May 02, 2009 at 10:40:04PM +0600, Lenin wrote:
> On Sat, May 2, 2009 at 10:01 PM, wrote:
>
> > Je suis actuellement absent du bureau aussi TEST !
>
> I don't get why I get this automated mail every time I send a message to
> this thread. :-/

My French is rusty, but it looks like it says something like "I'm out of
the office". So it would appear this person has an autoreply going.

Paul

--
Paul M. Foster
Re: Re: [PHP] Re: Static and/or Dynamic site scraping using PHP
On Sat, May 2, 2009 at 10:01 PM, wrote:
> Je suis actuellement absent du bureau aussi TEST !

I don't get why I get this automated mail every time I send a message to
this thread. :-/
Re: [PHP] Re: Static and/or Dynamic site scraping using PHP
I thought more experts would weigh in with insight on scraping methods.

I want to grab the body content of pages, say WordPress pages, but not
through RSS. I am assuming the pages are static. I want to scrape the body
content while skipping the sidebar, footer, header, etc.

I tried the DOM and it's fun, but I would like to hear from people with
experience specific to this problem.

Thanks in advance.
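A minimal sketch of the DOM approach for the problem described here: select the one container that holds the post body, so the header, sidebar, and footer are never touched. The class name entry-content is an assumption (many WordPress themes use it, but each target site's markup has to be checked), and the sample HTML and the function name extract_content_text() are invented for illustration.

```php
<?php
// Hypothetical helper: returns the text of the first element whose class
// list contains $class, or null if there is no such element.
function extract_content_text(string $html, string $class): ?string
{
    // Real pages are rarely valid HTML, so collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match $class as a whole word inside the class attribute.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Collapse runs of whitespace so the result is easy to store or compare.
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

// Sample page invented for this example.
$html = <<<HTML
<html><body>
  <div id="header">Site title</div>
  <div class="post entry-content"><p>Only the article text.</p></div>
  <div id="sidebar">Links</div>
  <div id="footer">Copyright</div>
</body></html>
HTML;

echo extract_content_text($html, 'entry-content'), "\n";
```

Selecting the wanted container is usually simpler than trying to strip every unwanted one, since themes differ in how they mark up sidebars and footers.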
[PHP] Re: Static and/or Dynamic site scraping using PHP
9el wrote:
> On Thu, Apr 30, 2009 at 3:33 AM, 9el wrote:
>> I just got a project to do in PHP: scraping the body items from
>> static sites or just HTML sites.
>> Could you experts please suggest some quick resources?
>>
>> I have to make a WP plugin with the data as well.
>
> Any expert there yet? Was looking for urgent advice on accomplishing the
> task.
>
> Thanks
>
> Lenin
>
> www.twitter.com/nine_L

If you're just capturing and using the body, then load the page with
file_get_contents() and use preg_match() to select the body or individual
tags, etc.

For more control, maybe try this:

    $doc = new DOMDocument();
    $doc->loadHTMLFile('http://example.com/page.html');

Then use: http://php.net/manual/book.dom.php

--
Thanks!
-Shawn
http://www.spidean.com
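Continuing from the DOMDocument suggestion, a small sketch of reading individual tags once the document is loaded. It parses an inline string with loadHTML() so that it runs without network access; loadHTMLFile() with a URL, as in the message, would take its place for a live page. The sample markup and the function name paragraph_texts() are invented for illustration.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <p> element,
// in document order.
function paragraph_texts(string $html): array
{
    // Collect parse warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Sample markup invented for this example.
$html = '<html><body><h1>Title</h1><p>First.</p>'
      . '<div><p>Second.</p></div></body></html>';

print_r(paragraph_texts($html));
```

getElementsByTagName() finds matching elements at any depth, which is why the paragraph nested inside the <div> is included.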
[PHP] Re: Static and/or Dynamic site scraping using PHP
On Thu, Apr 30, 2009 at 3:33 AM, 9el wrote:
> I just got a project to do in PHP: scraping the body items from
> static sites or just HTML sites.
> Could you experts please suggest some quick resources?
>
> I have to make a WP plugin with the data as well.

Any expert there yet? Was looking for urgent advice on accomplishing the
task.

Thanks

Lenin

www.twitter.com/nine_L