On Aug 12, 2009, at 12:44 AM, Alan Coopersmith wrote:

Elaine Ashton wrote:

Sorry, I was just pointing out how many, if not most, folks likely
manage to find content on the site, even the stuff with non-broken
links.

So you admit you'll break most folks since their Google searches will
return the old links, at least until Google re-indexes every single
page on the site, and we have no idea how long that will take.

I'm not admitting to anything as I'm just the plumber down in the tubes. If it were up to me, we'd all still be using Gopher. :)

I've been actively removing a lot of the spam from the mailing list archives, which requires me to rebuild the pipermail indices and which breaks the links the spammers have so assiduously trained Google to return on various random searches. The drawback is that it renumbers the links to all of the individual messages in the archive. It's a trade-off, since I loathe spammers far more than broken links, and remapping the old URLs to the new ones for the individual messages just isn't possible.

As for Google, I can submit a new sitemap and request a full reindexing at any time, though this can take a day or three to complete, and I can also submit a list of broken URLs to be removed.
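For the curious, generating the sitemap itself is the easy part, assuming we can dump the list of live URLs out of the db. Something along these lines would do it (the file names here are made up):

from xml.sax.saxutils import escape

# Rough sketch: write a sitemap.xml from a plain list of page URLs,
# one URL per line. "new_urls.txt" is a placeholder for whatever the
# db dump ends up being called.
with open("new_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write("  <url><loc>%s</loc></url>\n" % escape(url))
    out.write("</urlset>\n")

The slow part is on Google's end, not ours.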

But I will point out that remapping every URL from the current site to the new URLs is not very realistic.

Why? You've got all the information you need to do it already, since
you know the old URL and new URL of every page you're converting with
the transition script. Outputting that to a list of redirects seems
trivial, and far more realistic than expecting anyone outside your team
to think it's a good idea to break all the existing links in books, CDs,
slide decks, blogs, web pages, source code (looked at the ON files for
the license information lately?), search engines, and other web sites.

What information do you suspect we have? :) Getting to the URLs programmatically from inside the db is not as simple as it might seem. Also, I'll have to revisit the redirect handling for Apache and Tomcat to see how and when the remapping table is scanned, but when you're talking about 10,000-20,000 pages to be individually remapped, it will not be without a cost to the performance of the site as a whole. And if you maintain the old links via redirects, you commit to maintaining them in perpetuity, whereas if they simply break, people fix them and move on. I believe most links will get you to the top-level project page, which is a reasonable compromise.
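If we did go down the remapping road, the sketch below is roughly the shape it would take, assuming we can get old-url/new-url pairs out of the db at all. The file names are placeholders, and the dbm comment is only a guess at what would keep the per-request lookup cost tolerable:

# Rough sketch: turn a tab-separated old-path/new-path dump into a
# mod_rewrite RewriteMap text file plus the matching rewrite rules.
# "url_dump.tsv" is a placeholder; each line is "old-path<TAB>new-path".

REWRITE_CONF = """\
RewriteEngine On
# Converting the txt map to dbm with httxt2dbm should keep the
# per-request lookup cheap even with 10,000-20,000 entries.
RewriteMap legacy "txt:/etc/apache2/legacy-urls.txt"
RewriteCond ${legacy:%{REQUEST_URI}} !=""
RewriteRule ^ ${legacy:%{REQUEST_URI}} [R=301,L]
"""

with open("url_dump.tsv") as dump, open("legacy-urls.txt", "w") as remap:
    for line in dump:
        if not line.strip():
            continue
        old, new = line.rstrip("\n").split("\t")
        remap.write("%s %s\n" % (old, new))

with open("legacy-rewrites.conf", "w") as conf:
    conf.write(REWRITE_CONF)

Even with a hashed map, Apache still has to consult it on every request, which is the cost I was getting at, and none of this helps Tomcat, which would need its own remapping.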

Links change constantly, and we can't control who points to what in various media; this has been an endemic problem in hypertext since there were more than two pages on the internet that linked to each other.

e.
_______________________________________________
website-discuss mailing list
[email protected]