I think this is very important for developers of dynamic
sites to understand. I'd like to learn more about it, and
I think we could all benefit from hearing from anyone with
solid search engine experience.
I run a site with about 18,000 news articles. They are
stored in a database and dynamically generated (some
template elements update weekly). Since these articles
are mostly static once published, I generate a
Last-Modified header using the article's publish_date
(and zeros for the hour/min/sec). This Last-Modified
header is also used by the internal search engine
(ht://Dig) to make articles searchable by date.
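
As a minimal sketch of the approach described above: the helper name and the 'YYYY-MM-DD' format for publish_date are assumptions for illustration, not the site's actual code.

```php
<?php
// Hypothetical helper: build a Last-Modified value from an article's
// publish date, with the time portion fixed at 00:00:00 GMT.
// Assumes $publishDate is a 'YYYY-MM-DD' string.
function last_modified_from_publish_date(string $publishDate): string
{
    [$year, $month, $day] = array_map('intval', explode('-', $publishDate));
    // gmmktime() treats its arguments as GMT, so the result does not
    // depend on the server's timezone setting.
    $timestamp = gmmktime(0, 0, 0, $month, $day, $year);
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Sends "Last-Modified: Wed, 28 May 2003 00:00:00 GMT"
header('Last-Modified: ' . last_modified_from_publish_date('2003-05-28'));
```

The header() call has to run before any page output is sent.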
I'm finding that even though Google indexes the site
daily and grabs stories for news.google.com, MANY
of my pages are not appearing in the Google index. It
appears that these are not being updated in Google's
cache either (only a couple of months of data to go on).
I'm quite knowledgeable about search engine optimization
etc., but this has me confused.
To make sure that Google re-indexes every month, I
have thought of sending a Last-Modified header using
the year/month/day of the article and a random
hour/minute/second. But if this random
hour/minute/second is earlier than the one already
indexed, does the page not get re-indexed?
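
The idea could be sketched roughly like this. The helper name and the 'YYYY-MM-DD' date format are assumptions for illustration. It only shows how such a header might be built; it does not answer whether a crawler would treat the page as updated, and a randomly chosen time can of course come out earlier than the one sent previously.

```php
<?php
// Hypothetical helper: keep the article's publish date but pick a
// random time of day for the Last-Modified value.
// Assumes $publishDate is a 'YYYY-MM-DD' string.
function last_modified_with_random_time(string $publishDate): string
{
    [$year, $month, $day] = array_map('intval', explode('-', $publishDate));
    $timestamp = gmmktime(
        random_int(0, 23),  // hour
        random_int(0, 59),  // minute
        random_int(0, 59),  // second
        $month,
        $day,
        $year
    );
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

header('Last-Modified: ' . last_modified_with_random_time('2003-05-28'));
```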
olinux
On Wed, 28 May 2003 09:31:11 -0500, Jay Blanchard
wrote:
I wouldn't go as far as using the
auto_prepend_file.
Neither would I in this case, Jay. It was simply an
example of what could be done, not necessarily what
SHOULD be done. I did, however, use auto_prepend_file
in a .htaccess file for a somewhat similar case.
I have a site with about 90 pseudo-static pages (the
page is static but I use PHP to include the header and
footer) and a handful of fully dynamic pages. I REALLY
want this site to be regularly updated in the search
engines but, unfortunately, many search engines only
spider pages that are newer than what they have in their
database. Since PHP is dynamic, it doesn't report a
Last-Modified header so the search engine doesn't think
anything has been updated. Hence stale search engine
results.
To force all of the pages (both pseudo-static and
dynamic) to generate a Last-Modified header, I set up a
prepend.php script, which is configured as the
directory-level (.htaccess) value of auto_prepend_file.
Here is the content of prepend.php:
<?php
// Send Last-Modified based on the requested script's modification time.
header( "Last-Modified: " .
gmdate( "D, d M Y H:i:s",
filemtime( $_SERVER['SCRIPT_FILENAME'] ) ) .
" GMT" );
?>
For my truly dynamic pages, I figured out that only the
last call to header() for a given header name actually
shows up in the real headers that make it to the browser
(or search engine), so I can create a more specific
Last-Modified header as part of the dynamic pages (like
when the database is updated or whatever makes sense) and
it will overwrite the automatically generated one.
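
A minimal sketch of that override, assuming the page already has a Unix timestamp for its latest database update. The http_date() helper and the $dbUpdatedAt variable are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: format a Unix timestamp as an HTTP date.
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Roughly what the prepended script has already sent:
// a value based on the script file's modification time.
header('Last-Modified: ' . http_date(filemtime(__FILE__)));

// Assumption: a timestamp the dynamic page looked up in its database.
// A fixed value is used here so the example is self-contained.
$dbUpdatedAt = gmmktime(12, 0, 0, 5, 28, 2003);

// header() replaces an earlier header of the same name by default,
// so only this second value is sent to the client.
header('Last-Modified: ' . http_date($dbUpdatedAt));
```

Both calls must happen before any output is sent. Passing false as the second argument to header() would add a second header line instead of replacing the first.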
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php