Salvador Ramirez <[EMAIL PROTECTED]> wrote:

> I see no reason for indexing web pages created from data coming from
> a database. Two problems I see on this: a) you can expect big volumes
> of data (HTML web pages) probably with almost the same information
> except for certain fields.
If you're so worried about conserving disk space, that is easily solved
by using compression. Otherwise, this objection applies to most major
web pages, especially those using SSI. Why bother indexing the page?
The majority of it (the header, sidebar, footer) is the same, and only
some parts change. Well, the truth is that those changing parts contain
the important content, which is exactly what a search engine should
index. Again, I repeat: if a page isn't important, or is part of a game
or some such, the Robots Exclusion Protocol should be used.

> b) why indexing something that is already
> indexed (a database) in another place. I'd prefer to save disk space
> to web pages already created so you can expect not much change.

Because the goal of a search engine is to index as much as possible.
Of course, if we used the Site File protocol I proposed a while back,
people could share their indexes.

--
Aaron Swartz <[EMAIL PROTECTED]>    |        <http://www.theinfo.org>
  <http://www.swartzfam.com/aaron/> |  community of knowledge-workers
lambda.moo.mud.org:8888 - Aaron     |            - Douglas Engelbart
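P.S. To make the compression point concrete, here is a minimal sketch
(the page contents are invented for illustration): a pile of
near-identical database-generated pages, each sharing the same
header/sidebar/footer boilerplate, compresses to a small fraction of
its raw size.

```python
import gzip

# Hypothetical example: 100 database-generated pages that share the
# same boilerplate and differ only in one field per page.
boilerplate = "<html><head><title>Catalog</title></head><body>" + "x" * 2000
pages = [
    (boilerplate + f"<p>item {i}</p></body></html>").encode()
    for i in range(100)
]

raw = b"".join(pages)
compressed = gzip.compress(raw)

# Because the pages are nearly identical, gzip collapses the repeated
# boilerplate and the compressed size is a tiny fraction of the raw size.
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

So the disk-space objection largely disappears once the index store is
compressed.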