The strength of web crawlers is their ability to follow links between
websites, exploring more and more of the landscape to index.
When you want to index file shares, such as NFS, potentially with access
rights included, you may want to use a different beast (we call it a
connector). The benefit is that a connector can simply follow the
directory structure; there is no need to discover new documents by
parsing the documents found so far.
Harald.
On 28.04.2014 12:54, Touretsky, Gregory wrote:
Hi,
I see multiple references to Web search implementation with Nutch.
Has anyone implemented large-scale (many TBs of data, millions of files) NFS
search?
Are there any alternative solutions for scale-out crawling and indexing of file
systems?
Thank you,
Gregory
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
--
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com