[PHP-DB] Re: Subject: Searching remote web sites for content

2005-10-23 Thread Neil Smith [MVP, Digital media]

At 06:26 23/10/2005, you wrote:

Date: Sat, 22 Oct 2005 13:21:26 -0400
From: Joseph Crawford [EMAIL PROTECTED]
To: [PHP-DB] Mailing List php-db@lists.php.net
Subject: Re: [PHP-DB] Re: Subject: Searching remote web sites for content

> why do all that,


Oh, it's far less work than the method you're proposing - you only 
have one site to fopen(), not many dozens. There's no 'all that' to it 
- it's the same method we're discussing, but more optimal (see point 3).



> if you know the address of the page that the link will
> reside on, just curl that page for the results and preg_match that.



Ref the OP :

> I ask them to nominate where the link back page is, and I could
> check this manually.  But is there a way to check whether the
> remote page links back using a php script, so that I could get a
> report and follow up on exceptions, without having to check all
> pages that say they link to my site?


Three reasons : 1 is that the nomination process might be poorly 
understood by the nominee, or they could be inept and place the link 
somewhere other than where they specified (or move it about once 
nominated). You'd need to be able to crawl their entire site in order 
to automate the scan on a regular basis, or you're back to "and I 
could check this manually".


2 is that unless you want to write a very, very robust parser, you may 
as well rely on google's hard work writing such a parser. You can't 
be sure *how* the referring webmaster has set up his links (re: inept), 
so they could occur in a wide range of formats. The results from 
google come in a regular format, so they're easy to parse - and you 
said yourself you're not too certain of the regex you'd need - why 
complicate it by having to cover dozens of eventualities ?


3 is that the point of the exercise is to ensure good SE rankings by 
having referring links of high relevance. Only google knows how that 
relevance ranking results in a search index placement based on link 
popularity - and that includes using hidden links to 'spam' the 
search engine, which you don't want.


So, relying on google to spider the remote site is a way to ensure 
your QA process for the link referrals really does result in a usable 
link:mysite index in the search engine - which of course is *the 
whole point of the exercise* !


HTH
Cheers - Neil  


--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DB] Re: Subject: Searching remote web sites for content

2005-10-22 Thread Neil Smith [MVP, Digital media]

At 16:17 22/10/2005, you wrote:

From: ioannes [EMAIL PROTECTED]
To: php-db@lists.php.net
Date: Sat, 22 Oct 2005 16:17:22 +0100
Subject: Searching remote web sites for content

> I have a web site and google likes to count the inbound links.  I
> have set up a way for people to add links from my site to theirs,
> however I would like to check whether they have linked back to my
> site.  I ask them to nominate where the link back page is, and I
> could check this manually.  But is there a way to check whether the
> remote page links back using a php script, so that I could get a
> report and follow up on exceptions, without having to check all
> pages that say they link to my site?


Yes, you can - exploit Google's search to do this.

You need to run a query for link:mysite.mydomain.com and then 
screen-scrape the results. I.e. you'd curl or fopen() the pages with, for example


http://www.google.co.uk/search?q=link:www.captionkit.com&hl=en&lr=&start=10&sa=N
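
A minimal sketch of the fetch step, assuming the curl extension is 
available (the hostname is just the example above, and the helper 
function names are mine, not a standard API - Google may also throttle 
automated queries, so rate-limit in practice):

```php
<?php
// Build the Google query URL for one results page. urlencode() escapes
// the ':' in the link: operator; $start pages through results.
function google_link_query($site, $start = 0) {
    return 'http://www.google.co.uk/search?q=' . urlencode('link:' . $site)
         . '&hl=en&start=' . $start;
}

// Fetch a URL with curl and return the HTML as a string.
function fetch_page($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return, don't echo
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0');  // some hosts refuse a blank UA
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

$url = google_link_query('www.captionkit.com');
// $html = fetch_page($url);  // uncomment to actually hit the network
```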

Then for each page returned, use a regex to extract the links from the 
HTML returned by Google, eg on


<p class="g"><a 
href="http://archive.netbsd.se/?ml=php-database&a=2004-10&m=430433" 
onmousedown="return clk(this.href,'res','18','')">archive.netbsd.se - 
NetBSD Sverige</a>


You just want a capture pattern to extract the href value, which you 
then store in your database. Before you accuse anybody of anything, 
ensure you've waited a few days for google to re-spider their site. 
If their site doesn't appear in the index at all, it may be because 
google doesn't or can't spider it, rather than because the back link 
isn't there - but in that case their link popularity is ineffectual 
and may as well be ignored !
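
One way that capture pattern could look, run against a hypothetical 
fragment in the shape of the markup above (the real markup Google 
emits may differ, so treat the pattern as a starting point):

```php
<?php
// Hypothetical fragment of a Google results page, shaped like the
// example above; adjust the pattern if the real markup differs.
$html = '<p class="g"><a '
      . 'href="http://archive.netbsd.se/?ml=php-database&a=2004-10&m=430433" '
      . 'onmousedown="return clk(this.href,\'res\',\'18\',\'\')">'
      . 'archive.netbsd.se - NetBSD Sverige</a>';

// Capture every href value from the result anchors.
preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $html, $m);
$links = $m[1];   // array of URLs ready to store in the database
```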


HTH
Cheers - Neil




Re: [PHP-DB] Re: Subject: Searching remote web sites for content

2005-10-22 Thread Joseph Crawford
why do all that? If you know the address of the page that the link will
reside on, just curl that page for the results and preg_match that.
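
A minimal sketch of that approach ($page is inlined here as a 
stand-in for the HTML you'd actually curl from the nominated URL, and 
www.captionkit.com is just the example site from earlier in the 
thread):

```php
<?php
// Does $page contain a link back to $mysite?  In practice $page would
// come from curl or file_get_contents() on the nominated URL.
$mysite = 'www.captionkit.com';
$page   = '<html><body>Links: '
        . '<a href="http://www.captionkit.com/">Caption Kit</a>'
        . '</body></html>';

// preg_quote() escapes the dots in the hostname; the pattern tolerates
// extra attributes before href and either http or https.
$pattern   = '/<a\s[^>]*href="https?:\/\/' . preg_quote($mysite, '/') . '/i';
$linksBack = (bool) preg_match($pattern, $page);
```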


--
Joseph Crawford Jr.
Zend Certified Engineer
Codebowl Solutions, Inc.
1-802-671-2021
[EMAIL PROTECTED]