From: Matthias W. matthias.wang...@e-projecta.com
To: nutch-user@lucene.apache.org
Sent: Tuesday, January 13, 2009 7:17:50 AM
Subject: nutch crawling with java (not shellscript)
Hi,
Is there a tutorial, or can anyone explain whether and how I can run the Nutch
crawler via Java
Matthias W. matthias.wang...@e-projecta.com
Ok thanks!
But I decided against using the nutch crawler.
It will be better to build the index directly with Lucene, because I
do not need to crawl.
(I'm also searching with Lucene)
Now I use the parsers PDFBox for PDF-Documents
Dennis Kubes-2 wrote:
Just having the urls isn't the same as having an index. You would still
need to crawl them. You can inject your url list into a clean crawldb
and fetch only those urls with the inject, generate, fetch commands.
Then you can use the index command to index them.
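
The inject, generate, fetch, index sequence described above could be driven from Java roughly as in the sketch below. Everything specific here is an assumption to verify against your own installation: the `bin/nutch` launcher path, the directory names, and the argument order, which follows the command-line steps of the Nutch 0.8/0.9 tutorial (that tutorial also runs `updatedb` and `invertlinks` between fetch and index, so they are included). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds and runs the Nutch steps for a fixed URL list. */
final class NutchSteps {

    /** Steps that can be built before any segment exists: inject, then generate. */
    static List<List<String>> beforeFetch(String nutch, String crawlDb,
                                          String urlDir, String segmentsDir) {
        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList(nutch, "inject", crawlDb, urlDir));
        steps.add(Arrays.asList(nutch, "generate", crawlDb, segmentsDir));
        return steps;
    }

    /** Steps that need the segment directory created by the generate step. */
    static List<List<String>> afterGenerate(String nutch, String crawlDb, String linkDb,
                                            String segmentsDir, String segment,
                                            String indexDir) {
        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList(nutch, "fetch", segment));
        steps.add(Arrays.asList(nutch, "updatedb", crawlDb, segment));
        steps.add(Arrays.asList(nutch, "invertlinks", linkDb, "-dir", segmentsDir));
        steps.add(Arrays.asList(nutch, "index", indexDir, crawlDb, linkDb, segment));
        return steps;
    }

    /** Runs the commands in order; returns false at the first non-zero exit status. */
    static boolean runAll(List<List<String>> steps)
            throws IOException, InterruptedException {
        for (List<String> command : steps) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (process.waitFor() != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Only prints the planned commands; all paths are placeholders.
        List<List<String>> plan =
                beforeFetch("bin/nutch", "crawl/crawldb", "urls", "crawl/segments");
        plan.addAll(afterGenerate("bin/nutch", "crawl/crawldb", "crawl/linkdb",
                "crawl/segments", "crawl/segments/SEGMENT", "crawl/indexes"));
        for (List<String> command : plan) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

The segment name is only known after generate has run, which is why the plan is split in two: run `beforeFetch`, look up the newest directory under the segments folder, then run `afterGenerate` with it.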
Hi,
I've got a text file with all the URLs to index; I don't want to crawl for
URLs before indexing.
How can I do this?
Also, I'm creating the index in a temporary folder, and on success I want to
overwrite the old index.
How do I check in the shell script whether the crawl (index) command was
successful?
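
One way to approach the build-then-swap question above is to check the exit status of the indexing command and only then move the temporary index into place. The sketch below does this from Java; in a shell script the same check would be on `$?`. It assumes the command signals failure through a non-zero exit status (worth verifying for your Nutch version) and that all three folders live on the same filesystem so a directory can be renamed. The class name, method names, and folder names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: run an indexing command, then swap the index on success. */
final class IndexSwap {

    /** Runs the command and returns its exit status (0 means success by convention). */
    static int run(String... command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    /** Deletes a directory tree, children before parents; does nothing if absent. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }

    /** Moves the live index aside to backup, then moves the fresh index into place. */
    static void promote(Path fresh, Path live, Path backup) throws IOException {
        deleteTree(backup);
        if (Files.exists(live)) {
            Files.move(live, backup);
        }
        Files.move(fresh, live);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: IndexSwap <indexing command and arguments>");
            return;
        }
        // Placeholder folder names; the command is expected to write into index.tmp.
        Path fresh = Paths.get("index.tmp");
        Path live = Paths.get("index");
        Path backup = Paths.get("index.old");
        if (run(args) == 0) {
            promote(fresh, live, backup);
        } else {
            System.err.println("Indexing failed; keeping the old index.");
        }
    }
}
```

Keeping the previous index as a backup rather than deleting it first means a failed move still leaves a usable copy on disk.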
--
Patrick Markiewicz wrote:
I'm not sure what you're using for searching, but wherever you
reference an analyzer in Lucene, you need to change that from
StandardAnalyzer to
AnalyzerFactory.get(NutchConfiguration.create().get(en)) (which may
require importing nutch-specific classes).
I
Hi,
Does every document saved in the Nutch index have a unique Id?
Is it possible to search the index by this unique Id? (Like 'id:123')
--
View this message in context:
http://www.nabble.com/searching-by-Id-tp20092545p20092545.html
Sent from the Nutch - User mailing list archive at Nabble.com.
with Luke and the nutch webapp I get results.
Andrzej Bialecki wrote:
Matthias W. wrote:
Hi,
I want to use Nutch for crawling contents and Lucene webapp to search the
Nutch-created index.
I thought nutch creates a Lucene interoperable index, but when I'm searching
the index with the Lucene
Hi,
Is it possible to edit the index structure of Nutch?
I have the following problem:
The files will be indexed by Nutch, the frontend will be implemented with
Zend Framework 1.6.0 (Zend_Search_Lucene).
Zend_Search_Lucene IMO doesn't support the nutch index structure, so I can
only read the title,