Hi there, I need to read some pages from segments to get the raw HTML.
Currently I do it like this:

    nutch-1.2/bin/nutch readseg -get /path/to/segment http://key.value.html -nofetch -nogenerate -noparse -noparsedata -noparsetext

That works fine, but it takes 2 or 3 full seconds per page! My very small test environment has about 20 crawled and indexed pages and runs on a single machine. A search over the Lucene index takes only milliseconds.

Is there a way to read segments faster? Would implementing something on top of SegmentReader.class be the right way to get the original HTML?

Best regards,
Thomas

GfK SE, Nuremberg, Germany

