Thanks for clearing that up. So in short: it can't be done until somebody does some coding.
Unfortunately, mounting the shares is not a feasible option for the following reasons:

1) This would render useless the links served up through the Nutch search (end-users won't have the same shares mounted).
2) This method has an upper limit of about 24 shares.
3) As the Fetcher discovers new documents, they might reference documents in new shares that may not be mounted (this assumes that the *.doc interpreter follows Word hyperlinks).

I admit that the above is not a problem for a single-user scenario, or could be overcome through code, but the energy required to code a solution would be better spent on the aforementioned SMB implementation.

In my particular use case, people are fond of making links of the following form:

<a href="\\share\path\to\somefile.doc">Somefile.doc</a>

It would be nice if there were a parser hook that could interpret a pair of leading backslash characters as an SMB file link and follow it accordingly.

Anyway, that's probably enough ranting for now. I really do LOVE Nutch as it mostly solves my Intranet indexing problem ... mostly.

-- Jim

On 9/10/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Jim Wilson wrote:
> Thanks for responding Renaud,
>
> I'm using Nutch 0.8, and I have a single file (urls.txt) in my urls
> directory.
>
> In it, I tried putting a line just like this:
>
> file://///server/path/to/filename.doc
>

Folks,

Windows shares (CIFS / SMB shares) are accessible using the CIFS/SMB protocol, not the file protocol. Under Windows you either "mount" them under a local drive letter (and then you can access them using the file protocol) or you use the double backslash notation and access them remotely through the SMB protocol - Windows Explorer tries to hide this difference, but it does exist ...

Unfortunately, there is no SMB protocol plugin for Nutch yet - which means that unless you mount the remote shares you are not able to access them using the double-backslash notation, which requires using SMB.

It wouldn't be too hard to write an implementation of protocol-cifs using the JCIFS library ...

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
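
To make the double-backslash idea from this thread a bit more concrete, here is a rough sketch of the link-rewriting step only. It assumes the smb://server/share/path URL form that JCIFS uses. The class name UncLinks and the method toSmbUrl are made up for illustration and are not part of Nutch or JCIFS. It does not fetch anything, so a protocol plugin that understands smb:// URLs would still be needed.

```java
import java.util.Optional;

/** Hypothetical helper (not a Nutch or JCIFS class): rewrites UNC-style links as smb:// URLs. */
public final class UncLinks {
    private UncLinks() {}

    /**
     * Turns a link with two leading backslashes into an smb:// URL by swapping
     * every backslash for a forward slash. Returns empty for anything else,
     * including a bare pair of backslashes with no server name after it.
     * Assumption: no percent-encoding is attempted for spaces or other characters.
     */
    public static Optional<String> toSmbUrl(String href) {
        if (href == null) {
            return Optional.empty();
        }
        String link = href.trim();
        if (!link.startsWith("\\\\")) {
            return Optional.empty();          // not a double-backslash link
        }
        String rest = link.substring(2).replace('\\', '/');
        if (rest.isEmpty() || rest.startsWith("/")) {
            return Optional.empty();          // no server name present
        }
        return Optional.of("smb://" + rest);
    }

    public static void main(String[] args) {
        // The link form quoted earlier in the thread.
        System.out.println(
            toSmbUrl("\\\\share\\path\\to\\somefile.doc").orElse("(not a UNC link)"));
        // An ordinary web link is left alone.
        System.out.println(
            toSmbUrl("http://example.com/a.doc").orElse("(not a UNC link)"));
    }
}
```

Something like this could sit wherever outlinks are extracted, so that the rewritten smb:// URL is what gets queued for fetching. Where exactly that hook belongs in Nutch is left open here.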
