Good afternoon,

Newbie question here. Nutch 0.9 works fine; the issue is that I would like to store locally the *.html files Nutch fetches. That is, for each URL in my list, I want Nutch to save the fetched *.html file in a directory of my choosing.
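Whichever Nutch class ends up being changed, the write step itself is small. Below is a minimal sketch in plain Java. It assumes the page's URL and raw bytes are already available, and getting them out of Nutch is the actual open question here. `PageSaver`, `fileNameFor` and `save` are hypothetical names, not part of Nutch. The code uses `java.nio.file`, which needs Java 7 or later (newer than what Nutch 0.9 originally targeted).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not a Nutch class): writes each fetched page to its own file. */
public class PageSaver {

    private final Path outputDir;

    public PageSaver(Path outputDir) throws IOException {
        this.outputDir = outputDir;
        Files.createDirectories(outputDir); // create the chosen directory if missing
    }

    /** Maps a URL to a file name using only characters safe on common filesystems. */
    public static String fileNameFor(String url) {
        String name = url.replaceFirst("^[a-zA-Z]+://", "")  // drop the scheme, e.g. "http://"
                         .replaceAll("[^A-Za-z0-9._-]", "_"); // replace '/', '?', '=' etc.
        if (!name.endsWith(".html") && !name.endsWith(".htm")) {
            name = name + ".html";
        }
        return name;
    }

    /** Writes the raw page bytes to outputDir and returns the file's path. */
    public Path save(String url, byte[] content) throws IOException {
        Path target = outputDir.resolve(fileNameFor(url));
        Files.write(target, content);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Output directory from the command line, or "pages" by default.
        PageSaver saver = new PageSaver(Paths.get(args.length > 0 ? args[0] : "pages"));
        Path p = saver.save("http://www.example.com/index.html",
                "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + p);
    }
}
```

The name mapping is deliberately simple. Two different URLs can end up with the same file name (for example `.../a/b` and `.../a_b`), and very long URLs can exceed filesystem name limits. A fuller version might append a hash of the URL to each name.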
I read an earlier reply on the mailing list along these lines:

From: "Martin Kuen" <[EMAIL PROTECTED]>
..
> Hi,
>
> Thank you :)
> One more question for the fetched page reading: I prefer I can dump the
> fetched page into a single html file.

You could modify the Fetcher class (org.apache.nutch.fetch.Fetcher) to create a separate file for each downloaded file. You could modify the SegmentReader class (org.apache.nutch.segment.SegmentReader) if you want to do that.

Since I am not a Java expert, I was wondering whether somebody else has already tackled this issue. Also, would it be feasible to add this as a feature request for future releases? Nutch's fetch capability is very useful on its own, and it might not be that difficult to expose this feature through the nutch-site.xml file.

Regards.

--
Jose C. Lacal, Founder & Chief Vision Officer
Open Personalized Health Informatics "OpenPHI"
15625 NW 15th Avenue; Suite 15
Miami, FL 33169-5601 USA
www.OpenPHI.com
[O] +1 (305) 395-6091
[M] +1 (954) 553-1984
[EMAIL PROTECTED]
[F] +1 (954) 364-7144
