Thanks for clearing that up.  So in short: it can't be done until somebody
does some coding.

Unfortunately, mounting the shares is not a feasible option for the
following reasons:

1) This would render useless the links served up through the Nutch search
(end-users won't have the same shares mounted).
2) This method has an upper limit of about 24 shares.
3) As the Fetcher discovers new documents, they might reference documents in
new shares that may not be mounted (this assumes that the *.doc interpreter
follows Word hyperlinks).

I admit that the above is not a problem for a single-user scenario, or
could be overcome through code, but the energy required to code a solution
would be better spent on the aforementioned SMB implementation.

In my particular use case, people are fond of making links of the following
form:

<a href="\\share\path\to\somefile.doc">Somefile.doc</a>

It would be nice if there were a parser hook that could interpret a pair of
leading backslash characters as SMB file links and follow them accordingly.
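
To make the idea concrete, here is a minimal sketch of the rewriting step such a hook might perform. It is not Nutch code: the class name UncLinks and the method toSmbUrl are made up for illustration, and I am assuming the target form is the smb://server/share/path URL style that JCIFS uses. It only does the string conversion; fetching would still need an SMB protocol plugin.

```java
// Hypothetical helper, not part of Nutch: turns a UNC-style href into an smb:// URL.
public final class UncLinks {
    private UncLinks() {}

    /**
     * Returns an smb:// URL for an href that starts with two backslashes,
     * or null if the href is not a UNC-style link.
     */
    public static String toSmbUrl(String href) {
        if (href == null || !href.startsWith("\\\\")) {
            return null; // not a UNC link; leave it to the normal handling
        }
        // Drop the two leading backslashes and flip the remaining separators.
        String rest = href.substring(2).replace('\\', '/');
        if (rest.isEmpty() || rest.startsWith("/")) {
            return null; // no server name after the leading backslashes
        }
        return "smb://" + rest;
    }

    public static void main(String[] args) {
        // prints smb://share/path/to/somefile.doc
        System.out.println(toSmbUrl("\\\\share\\path\\to\\somefile.doc"));
    }
}
```

Note that this sketch does not percent-encode spaces or other special characters in the path, which real share names often contain.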

Anyway, that's probably enough ranting for now. I really do LOVE Nutch as it
mostly solves my Intranet indexing problem ... mostly.

-- Jim



On 9/10/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

Jim Wilson wrote:
> Thanks for responding Renaud,
>
> I'm using Nutch 0.8, and I have a single file (urls.txt) in my urls
> directory.
>
> In it, I tried putting a line just like this:
>
> file://///server/path/to/filename.doc
>


Folks,

Windows shares (CIFS / SMB shares) are accessible using CIFS/SMB
protocol, not the file protocol. Under Windows you either "mount" them
under a local drive letter (and then you can access them using the file
protocol) or you use the double backslash notation and access them
remotely through the SMB protocol - Windows Explorer tries to hide this
difference, but it does exist ...

Unfortunately, there is no SMB protocol plugin for Nutch yet - which
means that Nutch cannot access remote shares unless you mount them,
because the double-backslash notation requires SMB.

It wouldn't be too hard to write an implementation of protocol-cifs
using the JCIFS library ...

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


