Hey everyone, I'm using Nutch 1.5. I'm trying to crawl a local directory and index the files into HDFS, and then into Solr. I can successfully run a local crawl, which creates its output directory on the local filesystem, but I inevitably run out of disk space and/or get out-of-memory errors. What I really want is for the input paths to be on my local filesystem and the output paths to be on HDFS.
It seems like this shouldn't be so complicated, but I can't figure out what to modify in the Nutch source code. Does anyone have any pointers? Thanks, Casey

