On 4 Aug 2010, at 2:53 AM, Elizabeth Lloyd wrote:
My query specifically relates to 'orphan files', e.g.
'Guidelines on Dissemination of Information through Government
Websites':
http://www.gov.hk/en/about/accessibility/docs/disseminationguidelines.pdf
"33. Any out-dated or obsolete web pages should be removed from the
production site. If these orphan pages are still retained on-line,
they may be accessible through the search results from search engines
though no navigation path to the obsolete page is available. This may
result in users getting incorrect or outdated information from the
website."
Scenario: a web page is deleted from the website, but a file it links
to (e.g. a PDF file) is not removed from the web server. Later, this
file turns up in a search result and loads in the web browser from a
valid URL that gives the path to the file's location on the web
server.
What web server file management processes, etc., should be put in
place to prevent this file from remaining publicly accessible?
Are there procedures, processes, or protocols for ensuring that
obsolete or orphaned files previously linked from a web page do not
remain on the server?
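One rough way to find files like the one in this scenario is to treat a file as orphaned when no chain of links leads to it from the site's home page, then compare that set against everything on disk. This is a minimal sketch under assumptions not in the original: a static site whose URL paths map directly onto files under a document root, a home page at `/index.html`, and `index.html` as the default page for directory URLs. The names `find_orphans` and `url_to_file` are hypothetical. Links produced by scripts or server-side code will be missed, so treat the output as a list to review, not a list to delete.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def url_to_file(doc_root, url_path):
    """Map a site-relative URL path to a file path under doc_root."""
    if url_path.endswith("/"):
        url_path += "index.html"  # assumed default page for directory URLs
    rel = unquote(url_path).lstrip("/")
    return os.path.normpath(os.path.join(doc_root, rel))


def find_orphans(doc_root, start_pages=("/index.html",)):
    """Return files under doc_root that no link path from start_pages reaches.

    Assumes URL paths map directly onto files (no rewrites, no generated pages).
    """
    doc_root = os.path.normpath(doc_root)
    reachable = set()
    queue = deque(start_pages)
    while queue:  # breadth-first walk of the site's link graph
        url_path = queue.popleft()
        path = url_to_file(doc_root, url_path)
        if path in reachable or not os.path.isfile(path):
            continue
        reachable.add(path)
        if not path.lower().endswith((".html", ".htm")):
            continue  # only HTML pages are scanned for further links
        collector = LinkCollector()
        with open(path, encoding="utf-8", errors="replace") as f:
            collector.feed(f.read())
        for link in collector.links:
            target = urlparse(urljoin(url_path, link))
            if target.scheme or target.netloc:
                continue  # external or mailto: link, not a file on this server
            queue.append(target.path)

    all_files = set()
    for dirpath, _dirs, filenames in os.walk(doc_root):
        for name in filenames:
            all_files.add(os.path.normpath(os.path.join(dirpath, name)))
    return sorted(all_files - reachable)
```

For example, `find_orphans("/var/www/html")` (a hypothetical document root) returns the paths of unreachable files. Note that a leftover HTML page which nobody links to, and any PDF that only it links to, are both reported, which matches the guideline's notion of pages with no navigation path.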
Hello, Elizabeth. Apologies for the delay in replying.
What good practice standards in web server file management exist,
specifically relating to the currency of files? For example, how do we
keep a file that is years out of date from remaining on the server,
where a search engine can find it and it can deliver inaccurate
information?
I don't have a lot of experience with this, and I am not familiar with
useful resources on this topic. (I welcome input from others on this
list.) Here are a few ideas:
* Put status information in documents and act as though the first
time you publish them it will be the last time you publish them. There
should be enough status information so that if somebody finds the
document five years later, they understand the context in which it was
created, where to look for the most up-to-date information, and whom
to contact with questions.
If a document becomes outdated, you can update the status
information and leave the document on the site for historical
purposes. In this case, provide a link so that people can find more
up-to-date information.
* You might be able to run software over your site and look for
documents that have not changed in a long time AND that are not in a
list of documents known to be ok even if they haven't changed.
* Keeping information up to date seems to me to be largely a
social process. Documents that are published but that are not part of
any routine review are likely to become outdated. Documents that are
reviewed regularly have a better chance of being kept up to date or of
being explicitly marked as outdated.
* Lastly, HTTP offers redirect mechanisms so that at the server
level you can automatically redirect people from URIs that are no
longer maintained to ones that are. Of course, this relies on a social
process of knowing which URIs you want to redirect.
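The software check in the second point could look something like this minimal Python sketch. It assumes that a file's modification time is a good enough proxy for "last changed" (copying or restoring files from backup can reset it), and `find_stale`, `max_age_days`, and `known_ok` are hypothetical names, not an existing tool.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale(doc_root, max_age_days=365, known_ok=(), now=None):
    """List (path, age_in_days) for files under doc_root unchanged for
    more than max_age_days, skipping paths listed in known_ok.

    Paths are relative to doc_root and '/'-separated. Modification time
    is used as an assumed proxy for when a document last changed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    known_ok = set(known_ok)
    stale = []
    for dirpath, _dirs, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            if rel in known_ok:
                continue  # reviewed earlier and known to be fine as-is
            mtime = os.path.getmtime(path)
            if mtime < cutoff:
                age_days = int((now - mtime) // SECONDS_PER_DAY)
                stale.append((rel, age_days))
    # Oldest first, so the most likely outdated files are reviewed first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical document root and allow-list entry.
    for rel, age in find_stale("/var/www/html", max_age_days=730,
                               known_ok={"legal/terms-2005.pdf"}):
        print(f"{rel}: unchanged for {age} days")
```

Each reported file can then go through the social review described above: update it, mark it as outdated with a pointer to current information, remove it, or redirect its URI.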
Hope that helps,
_ Ian
Kind regards,
Elizabeth Lloyd
--
Ian Jacobs ([email protected]) http://www.w3.org/People/Jacobs/
Tel: +1 718 260 9447