Rick rick-at-napalmriot.com |nyphp dev/internal group use| wrote:
It's the webserver that is configured to look for default-index files,
such as index.html, and not search engines. Search engines only
attempt to access valid resources, such as the "fake" resource you
mentioned (which is quite valid and not fake at all).
--
Rick
http://www.sensual.jp
(Top-posting requires top-posting... sorry Michael.)
Yes, technically correct -- it is the webserver. BUT, to the traditional
search engine, the URL defines the resource. Every unique URL is
potentially a unique resource, and ideally they are all tested and
included in the index if unique.
As webmaster, in the eyes of the indexing search spider, you have
defined your "site" by the URL structure you used to define the
resources, and not by the content (regardless of how that content is
served... by the web server or your PHP scripts). So it becomes
important to control the URL even more carefully than the content in
many cases.
This is now changing, as we move away from the URL as the defining
name/label (Ajax, etc.). If the semantic web were more advanced, it might
work, but for now, it's a good thing we only have one search engine,
because its behavior is slowly becoming less standardized and more
customized over time (that was sarcasm... a little).
-=john andrews
inforequest wrote:
Kenneth Downs ken-at-secdat.com |nyphp dev/internal group use| wrote:
Let's say you use a friendly url (furl) system so that a url looks
like this:
www.example.com/furl/parm/value/parm/value
Because we are faking a nesting of folders and files here, will a
search bot expect to be able to find:
www.example.com/furl/parm/index.html?
and
www.example.com/furl/parm/value/index.html?
I have not seen any mention of this by search engines (or their human
representatives).
I don't think you are "faking" anything, though. It's a valid web
resource, right? Who said it had to represent files and folders?
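To make the "it's not files and folders" point concrete, here is a minimal sketch of how a script behind such a URL might read the path. It is only an illustration: the function name parse_furl, the "furl" prefix check, and the color/size parameter names are assumptions made up for this example, not anything from Kenneth's actual system.

```python
from urllib.parse import unquote, urlsplit


def parse_furl(path, prefix="furl"):
    """Turn /furl/name/value/name/value into {"name": "value", ...}.

    Hypothetical helper: returns None when the path is not a
    well-formed friendly URL for this sketch.
    """
    # Drop any query string, then ignore empty segments so that a
    # trailing slash does not change the result.
    segments = [unquote(s) for s in urlsplit(path).path.split("/") if s]
    if not segments or segments[0] != prefix:
        return None  # not handled by this sketch
    pairs = segments[1:]
    if len(pairs) % 2 != 0:
        return None  # a name without a value, e.g. /furl/color
    # Even positions are names, odd positions are values.
    return dict(zip(pairs[0::2], pairs[1::2]))


print(parse_furl("/furl/color/red/size/large"))
print(parse_furl("/furl/color/index.html"))
```

In this sketch a request for /furl/color/index.html is just read as color=index.html. There is no directory for an index file to live in, so what a bot gets back for such a URL depends entirely on what the script decides to serve.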
Lately it seems that they do some poking around when you do this:
www.example.com/furl/parm/value/parm/value
to try to determine the best way to grab that resource (slash or no
slash), but the name is just "value". Google has said it uses your
own internal linking style as a clue for your site, and also how
others link to you.
A quick check of Google shows this page ranking well:
http://www.phpwact.org/pattern/model_view_controller
with this snippet:
Model View *Controller* [Web Application Component Toolkit]
<http://www.phpwact.org/pattern/model_view_controller>
Application *Controller* Controls the flow of logic of a single
application. Because the popular MVC framework Java Struts from a
*PHP* Perspective implements a *...*
www.*php*wact.org/pattern/model_view_*controller* - 40k - Cached
<http://64.233.167.104/search?q=cache:AU1WIk8nh3MJ:www.phpwact.org/pattern/model_view_controller+php+controller&hl=en&ct=clnk&cd=9&gl=us>
- Similar pages
</search?hl=en&q=related:www.phpwact.org/pattern/model_view_controller>
A hit to the trailing-slash version returns a 200 OK but a mostly empty
template page, BUT it is in the Google index with this snippet:
Model View Controller [Web Application Component Toolkit]
<http://www.phpwact.org/pattern/model_view_controller>
You are here: Web Application Component Toolkit » pattern » Model
View Controller. Table of Contents. Model View Controller. Model.
Passive Model *...*
www.phpwact.org/pattern/model_view_controller - 40k - Cached
<http://64.233.167.104/search?q=cache:AU1WIk8nh3MJ:www.phpwact.org/pattern/model_view_controller+http://www.phpwact.org/pattern/model_view_controller/&hl=en&ct=clnk&cd=1&gl=us>
- Similar pages
</search?hl=en&q=related:www.phpwact.org/pattern/model_view_controller>
Notice that Google lists that trailing slash page with a URL that has
no trailing slash. That looks like a double listing of the
no-trailing-slash URL, with two snippets.
The contrived resource (with index.php appended) does the same (200 OK
but empty template page):
http://www.phpwact.org/pattern/model_view_controller/index.php
A search of Google for that contrived page
http://www.phpwact.org/pattern/model_view_controller/index.php shows
"no such page".
So? Google treats the trailing-slash and no-trailing-slash URLs as the
same resource (listing both under the no-trailing-slash URL), but
represents the two as different content in the index (which they are).
That suggests that Google is confused about that resource. A scan of the
rest of that result set shows some URLs have trailing slashes, some do
not:
http://www.google.com/search?q=php+controller&hl=en&start=10&sa=N
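One way a site owner might avoid serving two different pages for the slash and no-slash forms is to pick one form and redirect the other to it. The following is a minimal sketch, assuming the no-trailing-slash form is chosen as canonical. The function names canonical_url and redirect_target are hypothetical, and nothing in this thread establishes how Google would treat the listings afterwards.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return the no-trailing-slash form of url (the root "/" is kept)."""
    parts = urlsplit(url)
    # Strip trailing slashes from the path only; fall back to "/" so the
    # site root stays a valid path.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def redirect_target(url):
    """Return the URL to redirect to, or None if url is already canonical."""
    target = canonical_url(url)
    return target if target != url else None


print(redirect_target("http://www.phpwact.org/pattern/model_view_controller/"))
print(redirect_target("http://www.phpwact.org/pattern/model_view_controller"))
```

A front controller could call redirect_target on each request and answer with a permanent redirect whenever it returns a URL, so that only one of the two forms ever returns a 200 with content.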
What happens if I put a page on my site and 302 redirect to those
pages? Will Google take their content and index it as belonging to my
site? What if I put content at that empty template page that is
different from the content at the no-trailing-slash URL? Will Google
still list both pages as going to the no-trailing-slash URL? If one page
is on red widgets and the other on green baboons, will it affect the
relevance ranking of either page? That's for SEO homework.
-=john
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk