Hi,
since my first experiments were successful, I'm now starting to
integrate Nutch into my Website Visualisation Tool.
So here are my first questions:
1. I put a class into my project that works similarly to the
main class of CrawlerTool.java.
This works fine if you have written the URLs into a file
like "urls". But now I want to crawl a site directly,
which means: instead of using

WebDBInjector.main(prependFileSystem(fs, nameserver, new String[] { db,
"-urlfile", rootUrlFile }));

I'd like to start crawling from, let's say, a String url that
holds the wanted URL itself, not a file that the URLs are in.
How could this be done?
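What I have in mind is something like the following rough, untested
sketch. It just reuses the -urlfile interface from the snippet above
(dropping the prependFileSystem wrapper for brevity); SingleUrlInject
and the temp-file trick are my own placeholders:

    import java.io.File;
    import java.io.FileWriter;

    public class SingleUrlInject {
        // Inject one URL by writing it into a temporary seed file
        // and handing that file to the stock injector, i.e. the
        // same WebDBInjector used in the snippet above.
        public static void injectUrl(String db, String url) throws Exception {
            File tmp = File.createTempFile("seed", ".txt");
            FileWriter out = new FileWriter(tmp);
            out.write(url + "\n");
            out.close();
            WebDBInjector.main(new String[] { db, "-urlfile", tmp.getPath() });
            tmp.delete();
        }
    }

But maybe there is a cleaner way that avoids the temporary file?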
Or the other solution: can I have a Nutch process running that
checks a certain file for new URLs and does the crawling and
indexing for me, so that I can add URLs to that file (acting
like a queue) from my program?
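Roughly this kind of loop is what I picture (again only a sketch of
the idea, not working code; the file name, the poll interval, and the
injectUrl helper from my sketch above are all placeholders of mine):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashSet;
    import java.util.Set;

    public class UrlQueueWatcher {
        public static void main(String[] args) throws Exception {
            Set<String> seen = new HashSet<String>();
            while (true) {
                // Re-read the queue file and inject only the URLs
                // we have not seen before.
                BufferedReader in =
                    new BufferedReader(new FileReader("url-queue.txt"));
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (line.length() > 0 && seen.add(line)) {
                        SingleUrlInject.injectUrl("db", line);
                    }
                }
                in.close();
                Thread.sleep(5000); // poll every five seconds
            }
        }
    }

Does something like this already exist in Nutch, or would I have to
build it myself?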
2. Using the given .war file to search the index after crawling works
fine, but where should I look to find out how it works?
At the moment I start Tomcat from the crawl directory where the
segments are, and searching works fine.
But I'd like to implement searching in my own application (I used
Lucene directly before), so it would be interesting to know how the
.war does it.
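From a quick look I guess the webapp's JSPs go through a searcher
bean, so I imagine the programmatic version looks roughly like this
(pure guesswork on my side; the class and method names are what I
think the net.nutch.searcher package offers, so please correct me):

    import java.io.File;
    import net.nutch.searcher.Hit;
    import net.nutch.searcher.HitDetails;
    import net.nutch.searcher.Hits;
    import net.nutch.searcher.NutchBean;
    import net.nutch.searcher.Query;

    public class SearchSketch {
        public static void main(String[] args) throws Exception {
            // Point the bean at the crawl directory holding the segments.
            NutchBean bean = new NutchBean(new File("crawl-dir"));
            Query query = Query.parse("some search terms");
            Hits hits = bean.search(query, 10); // top ten hits
            for (int i = 0; i < hits.getLength(); i++) {
                Hit hit = hits.getHit(i);
                HitDetails details = bean.getDetails(hit);
                System.out.println(details.getValue("url"));
            }
        }
    }

Is that roughly what the JSPs inside the .war do?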
That's it for the moment.
Thanks for any kind of help.

Greetings,
Nils