On 8/16/06, Joshua Schachter <[EMAIL PROTECTED]> wrote:
> All kinds of subtlety here. For example, what to do if the site happens
> to be down while we check it? What about respecting robots.txt etc?
Site down? Tag it system:dead or system:unresponsive, and let the tag last until the next check (automatic or manual) shows the site is available again.

robots.txt? As far as I know, robots.txt relates to bulk spidering, not to fetching specific URLs, which is the case here. robots.txt would have the same relationship to this process as it does when I run one of the many link-checker applications from my PC. To the site, del.icio.us would look no different from the caching proxy of any significant ISP. However, if a site operator was concerned enough to specify "User-agent: del.icio.us", then obviously (to me) del.icio.us should respect that.

In any case, all this raises the question of how much integration is occurring, or has existed, between del.icio.us and one or more of the search engines. Social search, among its many meanings, may imply using the aggregated click streams of sites like del.icio.us (or digg, techmeme, rojo, et al.) to drive priority indexing. There might be a correlation between what appears on del.icio.us/popular and what people are searching for, so early availability of search results for those links (or for the entire site) could give the search operator an advantage at very little cost.

Further, association with a search engine using this approach would mean del.icio.us could utilise that search engine's cache, so that even if a link died, del.icio.us could offer the cached version of the page, all the usual caveats applying.

My untested feeling is that del.icio.us searches more than the fields in the post-link form, which suggests that some caching, or access to a cache, is occurring.

> Joshua

Hamish.
-- 
http://del.icio.us/Hamish.MacEwan

Yahoo! Groups Links
<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/ydn-delicious/
<*> To unsubscribe from this group, send an email to:
    [EMAIL PROTECTED]
<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
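For illustration only, here is a minimal Python sketch of the two checks proposed in this post: honouring a robots.txt block that names del.icio.us specifically, and mapping the result of a link check to a system tag that is replaced on every subsequent check. The agent string `del.icio.us`, the helper names `allowed_to_check` and `system_tag`, and the mapping of HTTP statuses to tags are all hypothetical assumptions, not anything del.icio.us is known to do. The robots.txt handling uses Python's standard `urllib.robotparser`, and no network access is needed.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical agent string; the post only suggests operators might name "del.icio.us".
USER_AGENT = "del.icio.us"


def allowed_to_check(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Return False only if robots.txt disallows this specific URL for `agent`."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt body as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def system_tag(status: Optional[int]) -> Optional[str]:
    """Map the latest check result to a system tag (assumed mapping).

    `status` is the HTTP status code, or None if the request got no response.
    Returning None means the link is available, which clears any earlier tag.
    """
    if status is None:
        return "system:unresponsive"   # no answer at all: maybe only temporarily down
    if status in (404, 410):
        return "system:dead"           # server says the page is gone
    if 200 <= status < 400:
        return None                    # reachable: drop any stale system tag
    return "system:unresponsive"       # other errors (e.g. 5xx): retry on next check


if __name__ == "__main__":
    robots = "User-agent: del.icio.us\nDisallow: /private/\n"
    print(allowed_to_check(robots, "http://example.com/private/page.html"))  # False
    print(allowed_to_check(robots, "http://example.com/public/page.html"))   # True
    print(system_tag(None), system_tag(410), system_tag(200))
```

Because `system_tag` returns `None` for a reachable link, re-running it on each automatic or manual check clears a stale `system:dead` or `system:unresponsive` tag, which matches the "lasts until next check" behaviour suggested in the post. A robots.txt that never mentions the del.icio.us agent leaves every URL checkable, in line with treating a single link check like an ordinary ISP proxy fetch.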

