Hi all,
I'm having the same trouble trying to crawl and recrawl my local 
filesystem. I'm using the script posted at 
http://wiki.apache.org/nutch/IntranetRecrawl


My filesystem is laid out like this:

../
../first/
../first/file1.pdf
../first/second/
../first/second/file2.pdf
../first/second/third/
../first/second/third/file2.pdf
../first/second/third/fourth/
../first/second/third/fourth/file4.pdf
../first/second/third/fourth/fifth/
../first/second/third/fourth/fifth/file5.pdf


On the first crawl round everything seems fine: it stops at the 
"first" directory (depth 1).
On the first recrawl (depth 3) it stops at the "third" directory and all 
the files seem to be indexed correctly.
On the second recrawl (again depth 3) it reaches the "fifth" directory 
but none of the files are indexed.

Any ideas?
Thanks,
Luca

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
