[PHP] Blatantly Evil Question

2005-08-11 Thread Brian Dunning
What is the best way to cloak a site - send search engines different  
content than real users?


Yes, I know it's bad practice, and I know the domain will eventually  
be banned. I've found lots of different methods, including huge tables  
of all the possible client types sent by various spiders. I postulate  
that the simplest/fastest way to do it, and no less reliable, would  
be to simply consider any user whose client type includes msie,  
netscape, or safari to be a person, and let the rest go.


Does anyone have any practical, successful experience they'd like to  
share? I know there are plenty of negative stories and reasons NOT to  
do this, but there's no need to take up the bandwidth with that; heard 'em  
already. :)


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Brian Dunning

On Aug 11, 2005, at 3:44 PM, Evert | Collab wrote:


> Use robots.txt
> 'evil' search engines will spoof the user-agent string anyway


Can you be more specific about what you mean by use robots.txt?

I just want to cloak for Google, MSN, and Yahoo. I couldn't care less  
about what any other search engine (evil or not) does or sees.





Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Brian Dunning

On Aug 11, 2005, at 4:06 PM, Evert | Collab wrote:


> First hit on Google:
> http://www.searchengineworld.com/robots/robots_tutorial.htm
> Search engines check for a robots.txt file on your site; in that
> file you can specify that certain search engines, or all of them,
> shouldn't index your site.
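For reference, a minimal robots.txt of the kind Evert describes might look like this. The file is plain text served from the site root as /robots.txt. Crawlers that honour the Robots Exclusion Protocol read it and skip the disallowed paths. Strictly speaking, it controls crawling rather than indexing, and nothing forces a crawler to obey it. The /private/ path and the choice of Googlebot as the named crawler are illustrative assumptions only.

```text
# Hypothetical example: keep every compliant crawler out of /private/
User-agent: *
Disallow: /private/

# Keep Googlebot out of the whole site
# (an empty "Disallow:" line would instead allow everything)
User-agent: Googlebot
Disallow: /
```

A crawler follows the most specific `User-agent` group that matches it. Here, Googlebot would obey only its own group and stay off the whole site, while other compliant crawlers would avoid only /private/.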


I know what robots.txt is; I meant how would you use it to cloak  
the site? Put PHP code in robots.txt to log the IP of any requests  
to a db, and then use that db to cloak the rest of the site or not?





Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jasper Bryant-Greene

Brian Dunning wrote:

> On Aug 11, 2005, at 3:44 PM, Evert | Collab wrote:
>
>> Use robots.txt
>> 'evil' search engines will spoof the user-agent string anyway
>
> Can you be more specific about what you mean by use robots.txt?
>
> I just want to cloak for Google, MSN, and Yahoo. I couldn't care less
> about what any other search engine (evil or not) does or sees.




robots.txt will not do what you want it to.

Just sniff for those robots' User-Agents (Google, MSN and Yahoo all 
publish their UA strings on their websites, AFAIK) and send different 
content if it's one of those.


Jasper




Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jochem Maas

Jasper Bryant-Greene wrote:

> Brian Dunning wrote:
>
>> On Aug 11, 2005, at 3:44 PM, Evert | Collab wrote:
>>
>>> Use robots.txt
>>> 'evil' search engines will spoof the user-agent string anyway
>>
>> Can you be more specific about what you mean by use robots.txt?
>>
>> I just want to cloak for Google, MSN, and Yahoo. I couldn't care less
>> about what any other search engine (evil or not) does or sees.
>
> robots.txt will not do what you want it to.
>
> Just sniff for those robots' User-Agents (Google, MSN and Yahoo all
> publish their UA strings on their websites, AFAIK) and send different
> content if it's one of those.


they will hammer you for it eventually - AFAICT all major SEs send out their
spiders occasionally with faked user-agent strings - to catch out crap like
this.

oh and the guy that invented php is a really big cheese down at yahoo...
and he reads this list :-) though I doubt he has the time or desire to chase
you personally.

I would recommend you don't go down this road. it's bad for your business in the
longer term and it's bad for the web because you're filling it with shite.



> Jasper






Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jasper Bryant-Greene

Jochem Maas wrote:

> Jasper Bryant-Greene wrote:
>
>> robots.txt will not do what you want it to.
>>
>> Just sniff for those robots' User-Agents (Google, MSN and Yahoo all
>> publish their UA strings on their websites, AFAIK) and send different
>> content if it's one of those.
>
> they will hammer you for it eventually - AFAICT all major SEs send out their
> spiders occasionally with faked user-agent strings - to catch out crap like
> this.
>
> oh and the guy that invented php is a really big cheese down at yahoo...
> and he reads this list :-) though I doubt he has the time or desire to chase
> you personally.
>
> I would recommend you don't go down this road. it's bad for your business in the
> longer term and it's bad for the web because you're filling it with shite.


Of course it is, but in his original post he said that he realised that 
it was bad, and he didn't want to hear reasons not to do it.


I would never even attempt to do something like this on a website of my 
own -- as I said in an off-list email to this guy (it was OT for the 
list), it's going to harm his website more than help it. It's not exactly 
hard for the search engines to detect cloaking.


Jasper




Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Philip Hallstrom

>> robots.txt will not do what you want it to.
>>
>> Just sniff for those robots' User-Agents (Google, MSN and Yahoo all publish
>> their UA strings on their websites, AFAIK) and send different content if
>> it's one of those.
>
> they will hammer you for it eventually - AFAICT all major SEs send out their
> spiders occasionally with faked user-agent strings - to catch out crap like
> this.


Google AdSense won't. I explicitly asked them about this. Well, what I 
asked was: if I had a password-protected area, could I allow them 
access to spider the content so that normal users could see the ads? I 
told them the layout would be different, but the content the same.

They said that was fine.

2cents.




Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jochem Maas

Philip Hallstrom wrote:

>>> robots.txt will not do what you want it to.
>>>
>>> Just sniff for those robots' User-Agents (Google, MSN and Yahoo all
>>> publish their UA strings on their websites, AFAIK) and send different
>>> content if it's one of those.
>>
>> they will hammer you for it eventually - AFAICT all major SEs send out their
>> spiders occasionally with faked user-agent strings - to catch out crap like
>> this.
>
> Google AdSense won't. I explicitly asked them about this. Well, what I
> asked was: if I had a password-protected area, could I allow them
> access to spider the content so that normal users could see the ads? I
> told them the layout would be different, but the content the same.
>
> They said that was fine.


but you didn't ask - 'heh is it okay to fill my public page with SEO crud but
only if a spider comes round'

they might just take a different view on that :-)



> 2cents.






Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Matthew Weier O'Phinney
* Brian Dunning [EMAIL PROTECTED] :
> On Aug 11, 2005, at 4:06 PM, Evert | Collab wrote:
>
>> First hit on Google:
>> http://www.searchengineworld.com/robots/robots_tutorial.htm
>> Search engines check for a robots.txt file on your site; in that
>> file you can specify that certain search engines, or all of them,
>> shouldn't index your site.
>
> I know what robots.txt is; I meant how would you use it to cloak
> the site? Put PHP code in robots.txt to log the IP of any requests
> to a db, and then use that db to cloak the rest of the site or not?

If you want to dynamically determine what to disallow based on the
User-Agent string, simply tell Apache, via an .htaccess file, to pass
robots.txt to PHP for handling. Then have that script do the processing
and return output compatible with the robots.txt specification.
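Here is a sketch of the .htaccess half of Matthew's suggestion. It assumes Apache with mod_rewrite enabled and permitted in .htaccess files. The script name robots.php is hypothetical. That script would need to send a `Content-Type: text/plain` header and print rules in robots.txt syntax.

```apache
# Hypothetical .htaccess in the document root.
# Internally rewrite requests for /robots.txt to a PHP script,
# so the script's output is served in place of a static file.
RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]
```

The rewrite is internal (no redirect flag), so crawlers still request and receive /robots.txt at its usual URL.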

-- 
Matthew Weier O'Phinney
Zend Certified Engineer
http://weierophinney.net/matthew/




Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jasper Bryant-Greene

Evert | Collab wrote:

> Let's just put it this way:
>
> if you don't want your site indexed, use robots.txt
> if you want to hide your site from search engines [ which won't even
> touch your files if you use robots.txt ] check the UA string.
>
> I can't imagine a situation where you want to hide your content from the
> major search engines, since they all use robots.txt.


You misunderstand his original question. He wants to show different 
content to search engines than to users. He understands this is a bad 
thing to do, but just wants to know how to do it anyway.


Jasper




Re: [PHP] Blatantly Evil Question

2005-08-11 Thread Jochem Maas

Jasper Bryant-Greene wrote:

> Jochem Maas wrote:
>
>> Jasper Bryant-Greene wrote:
>>
>>> robots.txt will not do what you want it to.
>>>
>>> Just sniff for those robots' User-Agents (Google, MSN and Yahoo all
>>> publish their UA strings on their websites, AFAIK) and send different
>>> content if it's one of those.
>>
>> they will hammer you for it eventually - AFAICT all major SEs send out their
>> spiders occasionally with faked user-agent strings - to catch out crap like
>> this.
>>
>> oh and the guy that invented php is a really big cheese down at yahoo...
>> and he reads this list :-) though I doubt he has the time or desire to chase
>> you personally.
>>
>> I would recommend you don't go down this road. it's bad for your business in the
>> longer term and it's bad for the web because you're filling it with shite.
>
> Of course it is, but in his original post he said that he realised that
> it was bad, and he didn't want to hear reasons not to do it.


I know - I only really replied to voice my total disdain for idiots
who are filling the search engines with shite. that's bad for all of us
(well, those of us that use search engines - you get the impression that
some people here don't know what one is ;-)

if he didn't want to hear this stuff he should have googled - there is
tons of code that does this - he didn't really need the list at all to
figure out how to do it.



> I would never even attempt to do something like this on a website of my
> own -- as I said in an off-list email to this guy (it was OT for the

you have to go pretty far to be off topic for this list ;-)

> list), it's going to harm his website more than help it. It's not exactly
> hard for the search engines to detect cloaking.


I concur.



> Jasper


