I wrote a couple of spiders at the start of the year and collected 10,000,000+ links with them. This is all in aid of a hobby project to build a vector space search engine in Perl and C++. I then started saving the pages to disk and had a couple of million of them before a 20GB SCSI disk failed and took a lot of my data with it.
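(For anyone unfamiliar, the vector space side boils down to scoring documents by cosine similarity between term-weight vectors. Here is a minimal Perl sketch of that scoring step, with made-up term weights; it's illustrative only, not my actual engine:)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Cosine similarity between two term-weight vectors (hashes of
    # term => weight) -- the core scoring step of a vector space engine.
    sub cosine {
        my ($u, $v) = @_;
        my ($dot, $nu, $nv) = (0, 0, 0);
        for my $term (keys %$u) {
            $dot += $u->{$term} * ($v->{$term} || 0);
            $nu  += $u->{$term} ** 2;
        }
        $nv += $_ ** 2 for values %$v;
        return 0 unless $nu && $nv;
        return $dot / (sqrt($nu) * sqrt($nv));
    }

    my %query = (perl => 1, search => 1);
    my %doc   = (perl => 2, engine => 1, search => 1);
    printf "similarity: %.3f\n", cosine(\%query, \%doc);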

To cut a long story short, I was wondering whether this data would be of any use to you. I intend to start the robots again (I have bought a 160GB SATA disk for this), and I still have the 10M links in a Postgres dump, although they are a bit out of date now.

If I do start the robots again, could you use the data they collect? I ask because I noticed on your website that you want to be able to "fetch several billion pages per month". If enough people provided you with pages, I think that could be achieved, and without using *your* bandwidth.

I have the tools to provide the pages with or without the HTML, and I also keep a record of which page each page was found on (this is all in the Postgres dump I mentioned earlier).
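(Recording that provenance is just a matter of inserting (found_on, url) pairs as links are discovered. A minimal Perl/DBI sketch, with a hypothetical links(found_on, url) table; the real dump's schema may well differ:)

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical layout: a links(found_on, url) table in a "crawl" DB.
    my $dbh = DBI->connect('dbi:Pg:dbname=crawl', 'harry', '',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare(
        'INSERT INTO links (found_on, url) VALUES (?, ?)');

    # Record that $url was discovered on page $found_on.
    sub record_link {
        my ($found_on, $url) = @_;
        $sth->execute($found_on, $url);
    }

    record_link('http://example.com/', 'http://example.com/about');
    $dbh->disconnect;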

Harry



