Re: nutch crawling with java (not shellscript)

2009-01-14 Thread Matthias W.
Message From: Matthias W. matthias.wang...@e-projecta.com
To: nutch-user@lucene.apache.org
Sent: Tuesday, January 13, 2009 7:17:50 AM
Subject: nutch crawling with java (not shellscript)
Hi, is there a tutorial, or can anyone explain if and how I can run the Nutch crawler via Java

Re: nutch crawling with java (not shellscript)

2009-01-14 Thread Matthias W.
Matthias W. matthias.wang...@e-projecta.com OK, thanks! But I decided against using the Nutch crawler. Building the index directly with Lucene is the better way for me, because I do not need to crawl (I'm also searching with Lucene). Now I use these parsers: PDFBox for PDF documents

Re: my own crawlscript.sh

2008-12-08 Thread Matthias W.
Dennis Kubes-2 wrote: Just having the URLs isn't the same as having an index. You would still need to crawl them. You can inject your URL list into a clean crawldb and fetch only those URLs with the inject, generate, and fetch commands. Then you can use the index command to index them.
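Inject reads a directory of plain-text files with one URL per line (the urls directory in the Nutch tutorial). Because the crawldb starts out clean, generate can only select the URLs you injected. Below is a minimal Java sketch, assuming that seed format, that turns an existing URL list into such a seed file. It trims lines, skips blank lines and lines starting with #, drops anything that is not an absolute http/https URL, and removes duplicates while keeping the original order. The class name SeedListBuilder and the file name seed.txt are hypothetical; the seed directory it writes is what you would pass to inject.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: prepares a seed directory for Nutch's inject step.
public class SeedListBuilder {

    // Reads urlList, keeps well-formed http/https URLs (trimmed, de-duplicated,
    // original order) and writes them one per line to seedDir/seed.txt.
    public static List<String> writeSeedList(Path urlList, Path seedDir) throws IOException {
        Set<String> urls = new LinkedHashSet<>();
        for (String line : Files.readAllLines(urlList, StandardCharsets.UTF_8)) {
            String candidate = line.trim();
            // Skip blank lines and comment lines.
            if (candidate.isEmpty() || candidate.startsWith("#")) {
                continue;
            }
            if (isHttpUrl(candidate)) {
                urls.add(candidate);
            }
        }
        Files.createDirectories(seedDir);
        List<String> result = new ArrayList<>(urls);
        Files.write(seedDir.resolve("seed.txt"), result, StandardCharsets.UTF_8);
        return result;
    }

    // True only for absolute http/https URLs that have a host part.
    private static boolean isHttpUrl(String s) {
        try {
            URI uri = new URI(s);
            String scheme = uri.getScheme();
            return ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: your URL text file; args[1]: seed directory to hand to inject.
        List<String> seeds = writeSeedList(Path.of(args[0]), Path.of(args[1]));
        System.out.println(seeds.size() + " URLs written");
    }
}
```

Run it as java SeedListBuilder urls.txt urls, then point inject at the urls directory.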

my own crawlscript.sh

2008-12-05 Thread Matthias W.
Hi, I've got a text file with all the URLs to index, and I don't want to crawl URLs before indexing. How do I do this? Also, I'm creating an index in a temporary folder, and on success I want to overwrite the old index. How do I check in the shell script whether the crawl (index) command was successful?
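In the shell script itself, the usual check is the command's exit status ($? right after the command, or if command; then ... fi). The Java sketch below does the same thing: it runs a command and only swaps the temporary index into place if the exit status is 0, renaming the old index to a backup first so it can be restored if the move fails. It assumes bin/nutch reports failure with a nonzero exit status (worth verifying for your Nutch version) and that both index directories sit on the same file system, so the moves are plain renames. IndexSwapper and the .old backup suffix are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: replace the live index only after a successful run.
public class IndexSwapper {

    // Runs a command (e.g. the bin/nutch call from your script), streaming its
    // output to this process, and returns the exit status; 0 conventionally means success.
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    // Moves newIndex to liveIndex if exitCode == 0; otherwise leaves both untouched.
    // Returns true if the swap happened.
    public static boolean swapIfSuccessful(int exitCode, Path newIndex, Path liveIndex) throws IOException {
        if (exitCode != 0 || !Files.isDirectory(newIndex)) {
            return false; // failed run: keep serving the old index
        }
        Path backup = liveIndex.resolveSibling(liveIndex.getFileName() + ".old");
        deleteRecursively(backup); // clear a leftover backup from an earlier run
        boolean hadLive = Files.exists(liveIndex);
        if (hadLive) {
            Files.move(liveIndex, backup);
        }
        try {
            Files.move(newIndex, liveIndex);
        } catch (IOException e) {
            if (hadLive) {
                Files.move(backup, liveIndex); // roll back to the old index
            }
            throw e;
        }
        deleteRecursively(backup);
        return true;
    }

    // Deletes a directory tree, children before parents.
    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Typical use: int status = IndexSwapper.run(command); followed by IndexSwapper.swapIfSuccessful(status, tmpIndex, liveIndex);, where command holds the same arguments your crawlscript.sh passes to bin/nutch.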

RE: Using Nutch for crawling and Lucene for searching (Wildcard/Fuzzy)

2008-11-03 Thread Matthias W.
Patrick Markiewicz wrote: I'm not sure what you're using for searching, but wherever you reference an analyzer in Lucene, you need to change it from StandardAnalyzer to AnalyzerFactory.get(NutchConfiguration.create()).get("en") (which may require importing Nutch-specific classes). I

searching by Id

2008-10-21 Thread Matthias W.
Hi, every document saved in the Nutch index has a unique ID, right? Is it possible to search the index by this unique ID (like 'id:123')? Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Using Nutch for crawling and Lucene for searching (Wildcard/Fuzzy)

2008-10-15 Thread Matthias W.
with Luke and the Nutch webapp I get results. Andrzej Bialecki wrote: Matthias W. wrote: Hi, I want to use Nutch for crawling content and the Lucene webapp to search the Nutch-created index. I thought Nutch creates a Lucene-interoperable index, but when I'm searching the index with the Lucene

Edit index structure

2008-09-11 Thread Matthias W.
Hi, is it possible to edit the index structure of Nutch? I have the following problem: the files will be indexed by Nutch, and the frontend will be implemented with Zend Framework 1.6.0 (Zend_Search_Lucene). IMO Zend_Search_Lucene doesn't support the Nutch index structure, so I can only read the title,