Ok, I will try 1.4-dev I guess. After your comment on boilerpipe, I want to build a custom extractor that uses XPath expressions to extract certain parts of the pages.
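Roughly what I have in mind for the XPath part is sketched below. It is only a minimal sketch using the standard javax.xml.xpath classes, not Nutch code: the class name XPathExtractor, the sample page and the expression //div[@id='content']/p are all made up for illustration, and it assumes the page has already been parsed into a DOM tree. Hooking it into Nutch's parsing plugins is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch.
public class XPathExtractor {

    // Returns the trimmed text content of every node matching the expression.
    public static List<String> extract(Node root, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, root, XPathConstants.NODESET);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(nodes.item(i).getTextContent().trim());
        }
        return results;
    }

    // Parses well-formed markup into a DOM; real-world HTML would need a
    // tolerant HTML parser instead, which is outside this sketch.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page: keep the content paragraphs, skip the navigation.
        String page = "<html><body><div id=\"nav\">menu</div>"
                + "<div id=\"content\"><p>First paragraph.</p><p>Second paragraph.</p></div>"
                + "</body></html>";
        for (String text : extract(parse(page), "//div[@id='content']/p")) {
            System.out.println(text);
        }
    }
}
```

The idea would be to keep one expression per site or page type, so only the wanted parts of each page end up in the extracted text.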
By the way, can one machine run two copies of Nutch at the same time? I just copied /home/nutch to /home/hutch and started using hutch to crawl something else, with a different crawl directory. Can those two interfere with each other?

Best

On Sat, Jul 9, 2011 at 10:56 PM, Markus Jelsma <[email protected]> wrote:
> Nutch 2.0 has a few issues. Do not run it in production unless you are very
> aware of its problems. Nutch 1.4-dev will run even in production happily. I
> use the latest revision in a large production environment but 1.3 in a series
> of smaller production environments.
>
>> How usable are the development versions? like 1.4 or 2.0 - for example
>> could I use 1.4 for non-production use without problems?
>>
>> Best Regards,
>> C.B.
>

