I know it's only tangentially related to Nutch, but is no one else interested in this? I've read through the APIs and a couple of news stories about it, and it looks like you can download the crawled data (for a relatively small fee: $1/GB).
This could be the thing that changes everything. With Nutch, the barrier to entry to this field was already fairly low, but building up a decent-sized index takes time and a decent number of machines. Now you can buy the crawled data and literally get a custom search engine running overnight. I'm guessing that many would choose not to host their front-end search on Alexa. In that case, Nutch/Lucene would come in very handy: just cram the Alexa data into a Lucene index and use Nutch as the front end. Instant search engine...

Howie
It doesn't sound like they are offering the data itself, only access to it, CPU cycles used for accessing it, upload of your own data, and such. In other words, it doesn't sound like you can just download a chunk of data and do your own processing with it. That would be one mighty chunk! :)

Otis
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
