Jim Wilson wrote:
Thanks for responding Renaud,
I'm using Nutch 0.8, and I have a single file (urls.txt) in my urls
directory.
In it, I tried putting a line just like this:
file://///server/path/to/filename.doc
Folks,
Windows shares (CIFS / SMB shares) are accessible using the CIFS/SMB
protocol, not the file protocol. Under Windows you either "mount" them
under a local drive letter (and then you can access them using the file
protocol) or you use the double-backslash notation and access them
remotely through the SMB protocol. Windows Explorer tries to hide this
difference, but it does exist ...
Unfortunately, there is no SMB protocol plugin for Nutch yet, which
means that Nutch cannot reach remote shares through the double-backslash
notation, since that requires SMB. You have to mount the shares first
and then access them using the file protocol.
It wouldn't be too hard to write an implementation of protocol-cifs
using the JCIFS library ...
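
As a rough illustration of the first thing such a plugin would have to
deal with: JCIFS addresses remote files with smb:// URLs rather than
double-backslash paths, so the share location has to be rewritten before
it can go into a URL list. The sketch below uses only the JDK; the class
UncToSmbUrl and the method toSmbUrl are hypothetical names made up for
this example, not part of Nutch or JCIFS, and it does not open any
connection.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Nutch or JCIFS): rewrites a Windows
// UNC path such as \\server\path\to\filename.doc as an smb:// URL.
public class UncToSmbUrl {

    public static String toSmbUrl(String uncPath) {
        if (uncPath == null || !uncPath.startsWith("\\\\")) {
            throw new IllegalArgumentException("Not a UNC path: " + uncPath);
        }
        // Drop the leading double backslash and flip the separators.
        String rest = uncPath.substring(2).replace('\\', '/');
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException(
                "UNC path needs a server and a share: " + uncPath);
        }
        String host = rest.substring(0, slash);
        String path = rest.substring(slash);
        try {
            // The multi-argument URI constructor percent-encodes
            // characters such as spaces in the path.
            return new URI("smb", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for: " + uncPath, e);
        }
    }

    public static void main(String[] args) {
        // Prints smb://server/path/to/filename.doc
        System.out.println(toSmbUrl("\\\\server\\path\\to\\filename.doc"));
    }
}
```

A real protocol-cifs plugin would still have to implement the Nutch
protocol plugin interface and fetch the content through JCIFS; this only
covers the addressing step.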
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com