From my perspective as the designer of a system that both consumes and publishes data, the load/burden issue here is not at all particular to the semantic web. Needle obeys robots.txt rules, but that's a small deal compared to the difficulty of extracting complete datasets from sites set up to deliver them only in tiny pieces. About 98% of the time I can describe the data I want from a site with a single conceptual query. Indeed, once I've got the data into Needle I can almost always actually run that query. But on the source site I usually can't, and so we are forced to waste everybody's time navigating machines through superfluous presentation rendering designed for people: 10-at-a-time results lists, interminable AJAX refreshes, animated DIV reveals, grafting back together the splintered pieces of tree traversals, and so on.

This is all absurdly unnecessary. Why is anybody having to "crawl" an open semantic-web dataset? Isn't there a "download" link, and/or a SPARQL endpoint? If there isn't, why not? We're the Semantic Web, dammit. If we aren't the masters of data interoperability, what are we?
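To make the contrast concrete: against a SPARQL endpoint, the "one conceptual query" is literally one HTTP request. Here is a minimal sketch using only the Python standard library; the endpoint URL is a hypothetical stand-in, and any SPARQL 1.1 service accepts the same request shape:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL -- substitute a real SPARQL service.
ENDPOINT = "https://example.org/sparql"

def sparql_request(query: str) -> Request:
    """Build a GET request asking for the complete result set as JSON."""
    params = urlencode({"query": query})
    return Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

# One query fetches every matching row in a single response -- no
# 10-at-a-time paging, no AJAX refreshes, no DOM scraping.
req = sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1000")
```

Passing `req` to `urllib.request.urlopen` would return all the rows at once, which is exactly the "download link" experience that scraping a rendered results list denies you.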
glenn (www.needlebase.com)
