Hi all,

I have since modified Nutch so that it can fetch FTP sites. This lets me build an intranet search engine for files on both HTTP and FTP servers. The fetcher has been running stably for weeks over a few million URLs.
In my modification, the FTP response mimics the HTTP one, so that code changes are kept to a minimum (for the purpose of fetching). Specifically:

(1) Response.java is made an interface instead of a class.
(2) HostQueueKey is tweaked to include the URL scheme (protocol) and port.
(3) Of course, there are light changes to HttpResponse.java, Http.java and others.

I would like to submit a patch if the core developers think this approach is sensible. My base code is nutch-2003-11-17.

Thanks,

John

On Sun, Dec 21, 2003 at 04:34:00PM -0800, [EMAIL PROTECTED] wrote:
> Hi, All,
>
> Has anyone made nutch capable of fetching ftp sites besides http ones?
> Nutch uses its own http class when dealing with http fetches.
> Is there an ftp class in the works too?
>
> Thanks,
>
> John
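
For illustration only, changes (1) and (2) described in this message could look roughly like the Java sketch below. It is not the actual patch: the method names on Response, the FtpResponse class, the HostQueueKey.of factory and the default-port table are all assumptions made up for this example, and the real classes in nutch-2003-11-17 may look quite different.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Assumed shape of Response once it is an interface; method names are hypothetical.
interface Response {
    int getCode();                  // HTTP-style status code
    String getHeader(String name);  // null when the header is not available
    byte[] getContent();
}

// Hypothetical FTP response that reports itself in HTTP terms,
// so the fetcher can treat both protocols through the same interface.
final class FtpResponse implements Response {
    private final int code;
    private final byte[] content;

    FtpResponse(int code, byte[] content) {
        this.code = code;
        this.content = content.clone();
    }

    public int getCode() { return code; }

    public String getHeader(String name) {
        // Only a synthetic Content-Length is offered in this sketch.
        if ("Content-Length".equalsIgnoreCase(name)) {
            return Integer.toString(content.length);
        }
        return null;
    }

    public byte[] getContent() { return content.clone(); }
}

// Queue key that tells apart http and ftp servers on the same host.
final class HostQueueKey {
    private final String scheme;
    private final String host;
    private final int port;

    private HostQueueKey(String scheme, String host, int port) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
    }

    // Hypothetical factory; expects an absolute URL with a host part.
    static HostQueueKey of(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort() != -1 ? uri.getPort() : defaultPort(scheme);
        return new HostQueueKey(scheme, host, port);
    }

    // Assumed defaults: 21 for ftp, 80 for http, -1 for anything else.
    private static int defaultPort(String scheme) {
        switch (scheme) {
            case "ftp":  return 21;
            case "http": return 80;
            default:     return -1;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof HostQueueKey)) return false;
        HostQueueKey k = (HostQueueKey) o;
        return port == k.port && scheme.equals(k.scheme) && host.equals(k.host);
    }

    @Override
    public int hashCode() { return Objects.hash(scheme, host, port); }

    @Override
    public String toString() { return scheme + "://" + host + ":" + port; }
}
```

With a key like this, http://files.example.com/ and ftp://files.example.com/ would land in separate queues, while ftp://files.example.com/a and ftp://files.example.com:21/b would share one.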
