Here is the example. It works. ./nutch segslice -filterUrlBy "-(.*dba.test.com/ftpinput/.*|.*home.in.test.com/sunrp/south/.*)" -o /apps/nutch/baseROOT/data/nutch/searchC/segments/20051003000141B -logLevel FINE /data/nutch/searchC/segments/20051003000141 > ./logs/slice.log 2>&1
A log level of FINE will log the URL entries it copies and skips. Why use this? I have written a Perl script that does pretty much what the Java crawl does, but in separate steps, and I also split the fetch step into two steps: fetch and parse. In my case the fetch works great, but the parse hung. So I used the slice to take out what I think are the offending URLs; I will re-run the parse and updatedb on the last segment, then go back to the standard crawl loop. -j.p.
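
For reference, here is a rough sketch of one pass of that step-by-step loop as a shell script. The exact sub-command names and options (generate, fetch -noParsing, parse, updatedb) differ between Nutch versions, and the db path is assumed, so treat this as an outline of the idea rather than the actual Perl script:

  #!/bin/sh
  # One pass of the manual crawl loop (paths and option names are assumptions).
  DB=/data/nutch/searchC/db
  SEGDIR=/data/nutch/searchC/segments

  ./nutch generate $DB $SEGDIR            # write a new fetchlist segment
  SEG=`ls -d $SEGDIR/2* | tail -1`        # pick up the segment just generated
  ./nutch fetch -noParsing $SEG           # step 1: fetch only
  ./nutch parse $SEG                      # step 2: parse as its own step, so a hang stays isolated
  ./nutch updatedb $DB $SEG               # fold the results back into the web db

If the parse step hangs, segslice (as in the example above) can be run on that segment to drop the suspect URLs before re-running parse and updatedb.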
