Here is the example. It works. ./nutch segslice -filterUrlBy "-(.*dba.test.com/ftpinput/.*|.*home.in.test.com/sunrp/south/.*)" -o /apps/nutch/baseROOT/data/nutch/searchC/segments/20051003000141B -logLevel FINE /data/nutch/searchC/segments/20051003000141 > ./logs/slice.log 2>&1
A log level of FINE will log the URL entries it copies and skips. Why use this? I have written a Perl script that does pretty much what the Java crawl does, but in separate steps, and I also split the fetch step into two steps: fetch and parse. In my case the fetch works great, but the parse hung. So I used the slice to take out what I think are the offending URLs; I will re-run the parse and updatedb on the last segment, then go back to the standard crawl loop. -j.p.
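
For reference, here is a rough sketch of one pass of that step-by-step loop as a shell script. The exact sub-command names and options (generate, fetch -noParsing, parse, updatedb) differ between Nutch versions, and the db path is assumed, so treat this as an outline of the idea rather than the actual Perl script:

  #!/bin/sh
  # One pass of the manual crawl loop (paths and option names are assumptions).
  DB=/data/nutch/searchC/db
  SEGDIR=/data/nutch/searchC/segments

  ./nutch generate $DB $SEGDIR            # write a new fetchlist segment
  SEG=`ls -d $SEGDIR/2* | tail -1`        # pick up the segment just generated
  ./nutch fetch -noParsing $SEG           # step 1: fetch only
  ./nutch parse $SEG                      # step 2: parse as its own step, so a hang stays isolated
  ./nutch updatedb $DB $SEG               # fold the results back into the web db

If the parse step hangs, segslice (as in the example above) can be run on that segment to drop the suspect URLs before re-running parse and updatedb.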
