[PHP-DB] Re: Subject: Searching remote web sites for content
At 06:26 23/10/2005, you wrote:

> Date: Sat, 22 Oct 2005 13:21:26 -0400
> From: Joseph Crawford [EMAIL PROTECTED]
> To: [PHP-DB] Mailing List php-db@lists.php.net
> Subject: Re: [PHP-DB] Re: Subject: Searching remote web sites for content
>
> why do all that,

Oh, it's far less work than the method you're proposing - you only have one site to fopen(), not many dozens. There's no 'all that' to it - it's the same method we're discussing, but more optimal (see point 3).

> if you know the address of the page that the link will reside on just curl that page for the results and preg_match that.

Ref the OP:

> I ask them to nominate where the link back page is, and I could check this manually. But is there a way to check whether the remote page links back using a php script, so that I could get a report and follow up on exceptions, without having to check all pages that say they link to my site?

Three reasons:

1 is that the nomination process might be poorly understood by the nominee, or they could be inept and place the link somewhere other than where they specified (or move it about once nominated). You'd need to be able to crawl their entire site in order to automate the scan on a regular basis, or you're back to "and I could check this manually".

2 is that unless you want to write a very, very robust parser, you may as well rely on Google's hard work writing such a parser. You can't be sure *how* the referring webmaster has set up his links (re: inept), so they could occur in a wide range of formats. The results from Google come in a regular format, so they're easy to parse - and you said yourself you're not too certain of the regex you'd need - why complicate it by having to cover dozens of eventualities?

3 is that the point of the exercise is to ensure good SE rankings by having referring links of high relevance. Only Google knows how that relevance ranking results in a search index placement based on link popularity - and that includes how it handles hidden links used to 'spam' the search engine, which you don't want. So, relying on Google to spider the remote site is a way to ensure your QA process for the link referrals really does result in a usable link:mysite index in the search engine - which of course is *the whole point of the exercise*!

HTH
Cheers - Neil

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
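The "report and follow up on exceptions" step could be sketched roughly as below. This is only an illustration, not code from the thread: the function names hostOf() and missingBackLinks() and the example URLs are hypothetical, and it assumes you already hold two lists, the pages your partners nominated and the referring URLs collected from the link: results. Comparing by host name rather than by exact page follows point 1, since the link may sit somewhere other than the nominated page.

```php
<?php
// Hypothetical helper: reduce a URL to its lower-cased host name,
// or null if the URL has no host part (e.g. a relative path).
function hostOf(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

// Hypothetical helper: return the partner URLs whose host never
// appears among the referring URLs. These are the exceptions to
// follow up by hand.
function missingBackLinks(array $partnerUrls, array $referringUrls): array
{
    // Index every referring host for quick lookup.
    $seen = [];
    foreach ($referringUrls as $url) {
        $host = hostOf($url);
        if ($host !== null) {
            $seen[$host] = true;
        }
    }

    // Keep each partner whose host was not seen (or could not be parsed).
    $missing = [];
    foreach ($partnerUrls as $url) {
        $host = hostOf($url);
        if ($host === null || !isset($seen[$host])) {
            $missing[] = $url;
        }
    }
    return $missing;
}
```

Calling missingBackLinks() with the nominated pages as the first argument and the scraped referrers as the second returns only the partners that still need a manual check. A host that is absent may simply not have been re-spidered yet, so the result is a list of candidates, not of offenders.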
[PHP-DB] Re: Subject: Searching remote web sites for content
At 16:17 22/10/2005, you wrote:

> From: ioannes [EMAIL PROTECTED]
> To: php-db@lists.php.net
> Date: Sat, 22 Oct 2005 16:17:22 +0100
> Subject: Searching remote web sites for content
>
> I have a web site and google likes to count the inbound links. I have set up a way for people to add links from my site to theirs, however I would like to check whether they have linked back to my site. I ask them to nominate where the link back page is, and I could check this manually. But is there a way to check whether the remote page links back using a php script, so that I could get a report and follow up on exceptions, without having to check all pages that say they link to my site?

Yes, you can - exploit Google's search to do this. You need to run a query for link:mysite.mydomain.com and then screen-scrape the results.

I.e. you'd curl or fopen() the results pages with, for example:

http://www.google.co.uk/search?q=link:www.captionkit.com&hl=en&lr=&start=10&sa=N

Then, for each page returned, use a regex to extract the links from the HTML returned from Google, e.g. on

<p class=g><a href="http://archive.netbsd.se/?ml=php-database&a=2004-10&m=430433" onmousedown="return clk(this.href,'res','18','')">archive.netbsd.se - NetBSD Sverige</a>

You just want a capture pattern to extract the href value, which you then store in your database.

Before you accuse anybody of anything, ensure you've waited a few days for Google to re-spider their site. If their site doesn't appear in the index at all, it may be because Google doesn't or can't spider it, rather than because the back link isn't there - but in that case their link popularity is ineffectual and may as well be ignored!

HTH
Cheers - Neil
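A minimal sketch of the capture pattern described above, offered as an illustration rather than as Neil's actual code. The function names buildLinkQueryUrl() and extractResultHrefs() are made up for this example, and the pattern assumes each result is marked up as a p element with class g followed directly by the result link, as in the sample in the message. If the results markup differs, the pattern needs adjusting.

```php
<?php
// Hypothetical helper: build the results-page URL for a link: query.
// The parameter names (q, hl, start) are copied from the example URL
// in the message; $start is the offset of the first result.
function buildLinkQueryUrl(string $site, int $start = 0): string
{
    $params = ['q' => 'link:' . $site, 'hl' => 'en', 'start' => $start];
    return 'http://www.google.co.uk/search?' . http_build_query($params, '', '&');
}

// Hypothetical helper: extract the href of every result link from one
// page of results HTML.
function extractResultHrefs(string $html): array
{
    // Assumption: a result starts with <p class=g><a href=...>.
    // The href value may be double-quoted, single-quoted or unquoted.
    $pattern = '~<p\s+class=["\']?g["\']?\s*>\s*<a\s+href='
             . '(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $hrefs = [];
    foreach ($matches as $m) {
        // Only one of the three capture groups is non-empty per match,
        // so joining them yields the captured href.
        $href = implode('', array_slice($m, 1));
        // Turn entities such as &amp; back into plain characters.
        $hrefs[] = html_entity_decode($href);
    }
    return $hrefs;
}
```

The HTML passed to extractResultHrefs() would come from whatever you use to fetch the results page (curl or fopen(), as suggested above), and each returned href is what you would then store in your database.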
Re: [PHP-DB] Re: Subject: Searching remote web sites for content
why do all that? If you know the address of the page that the link will reside on, just curl that page for the results and preg_match that.

--
Joseph Crawford Jr.
Zend Certified Engineer
Codebowl Solutions, Inc.
1-802-671-2021
[EMAIL PROTECTED]
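For comparison, the direct approach suggested here (curl the nominated page, then pattern-match it) might look something like the sketch below. It is an illustration under stated assumptions, not Joseph's code: fetchPage() and pageLinksTo() are hypothetical names, the cURL extension is assumed to be installed, and only anchors whose href host matches your host exactly are counted, so www.example.com and example.com would be treated as different sites.

```php
<?php
// Hypothetical helper: fetch a page with cURL.
// Returns the response body, or null if the request failed.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,    // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: true if $html contains an <a> element whose
// href points at $myHost.
function pageLinksTo(string $html, string $myHost): bool
{
    // Drop HTML comments so a commented-out link does not count.
    $html = preg_replace('~<!--.*?-->~s', '', $html) ?? $html;

    // The href value may be double-quoted, single-quoted or unquoted.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*'
             . '(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Only one capture group is non-empty per match.
        $href = html_entity_decode(implode('', array_slice($m, 1)));
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $myHost) === 0) {
            return true;
        }
    }
    return false;
}
```

A check for one partner would then fetch the nominated page with fetchPage() and, if the result is not null, pass it to pageLinksTo() along with your own host name. This only inspects the single nominated page and says nothing about whether a search engine has indexed the link.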