Salvador Ramirez <[EMAIL PROTECTED]> wrote:

> I see no reason for indexing web pages created from data coming from
> a database. Two problems I see on this: a) you can expect big volumes
> of data (HTML web pages) probably with almost the same information
> except for certain fields.

If you're so worried about conserving disk space, this would easily be
solved by using compression. Otherwise, this claim can apply to most major
web pages, especially those using SSI. Why bother indexing the page? The
majority of the page (the header, sidebar, footer) is the same except for
some parts. Well, the truth is that those parts contain the important
content, which is exactly what a search engine should index. Again, I
repeat: if the content isn't important, or is part of a game or some such,
the Robots Exclusion Protocol should be used.
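
To illustrate, a site could keep crawlers out of its unimportant generated
pages with a robots.txt file along these lines (the paths are hypothetical,
just for the example):

```
# Hypothetical robots.txt: block crawlers from unimportant
# database-generated pages; everything else remains indexable.
User-agent: *
Disallow: /cgi-bin/
Disallow: /game/
```

Pages not matched by a Disallow line stay fair game for any well-behaved
robot, which is the point: the site owner, not the engine, decides what is
worth indexing.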

> b) why indexing something that is already
> indexed (a database) in another place. I'd prefer to save disk space
> to web pages already created so you can expect not much change.

Because the goal of a search engine is to try to index as much as possible.
Of course, if we used the Site File protocol I proposed a while back, people
could share their indexes.

--
Aaron Swartz <[EMAIL PROTECTED]>|   <http://www.theinfo.org>
<http://www.swartzfam.com/aaron/> | community of knowledge-workers
 lambda.moo.mud.org:8888 - Aaron  |     - Douglas Engelbart
