Re: [Nutch-general] Can I generate nutch index without crawling?

The Golden Condor ! Tue, 23 Jan 2007 16:32:00 -0800

If you have pre-downloaded pages on the local file system that you'd
like to index, the process is not as simple. You'd need to change the
fetcher code so that it opens and loads a local file instead of
opening an HTTP socket. Be careful not to just change the download
protocol to file://, though. The protocol is supported, but you will
get undesirable results for in-link and anchor text information.
Substituting http://blah with file://path/to/blah makes Nutch believe
that file://path/to/blah is the URL of the page, which is likely not
what you want.
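
To make the idea concrete, here is a rough, language-neutral sketch in
Python. It is not Nutch code and does not use any Nutch API. The
function names and the mirror layout (<root>/<host>/<path>) are
assumptions made up for the illustration. The point is only that the
bytes come from disk while the record stays keyed by the original
http:// URL, so in-links and anchor text still line up.

```python
import os
import tempfile
from urllib.parse import urlsplit


def local_path_for(url, mirror_root):
    """Map an http URL to a file under mirror_root.

    Hypothetical layout: <mirror_root>/<host>/<url path>, with
    directory-style URLs stored as index.html.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    return os.path.join(mirror_root, parts.netloc, *path.split("/"))


def fetch_from_disk(url, mirror_root):
    """Return (url, content): content is read from disk, the key stays the http URL."""
    with open(local_path_for(url, mirror_root), "rb") as f:
        return url, f.read()


if __name__ == "__main__":
    # Build a tiny throwaway mirror and "fetch" one page from it.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "example.com", "docs"))
    with open(os.path.join(root, "example.com", "docs", "a.html"), "wb") as f:
        f.write(b"<html>hello</html>")
    key, body = fetch_from_disk("http://example.com/docs/a.html", root)
    print(key, len(body))
```

A real change would live in the fetcher's Java code and would also need
to sanitize paths (for example, reject ".." segments) before touching
the disk.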


Cheers,

TAA

On 1/23/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> What exactly are you looking to do?
>
> If you don't crawl for anything, then what data are you looking to index?
>
> You can certainly take some other person's Nutch segment (that they crawled)
> and then index it yourself, on your machines.
>
>
> ----- Original Message ----
> From: Scott Green <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, January 23, 2007 12:08:31 PM
> Subject: Can I generate nutch index without crawling?
>
>
> Hi,
>
> I am now debugging the Nutch searcher and wondering whether I can
> generate a Nutch index without crawling. If yes, can you give me some
> hints? Thanks.
>
