Thanks Alexander.
-----Original Message-----
From: Alexander Aristov [mailto:alexander.aris...@gmail.com]
Sent: Wednesday, August 12, 2009 3:42 PM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch book
Wiki is the best resource
http://wiki.apache.org/nutch/
There are some presentations and
I like using Nutch for the crawlDB, scalability, threading, document
parsing, ... but crawling is not important to me as I index targeted
data sources.
Obviously, I'm using it with Solr for indexing and searching documents.
Fabrice
Alexander Aristov wrote:
Nutch primarily is a crawler. I
On Tue, Aug 11, 2009 at 2:10 PM, Paul Tomblin wrote:
> I want to iterate through all the documents that are in the crawl,
> programmatically. The only code I can find does searches. I don't
> want to search for a term, I want everything. Is there a way to do
> this?
To answer my own question, w
Nutch is primarily a crawler. I would suggest you take a look at Solr,
which is just an indexer and searcher. You may use its API as well as its open
interfaces
Best Regards
Alexander Aristov
2009/8/12 Fabrice Estiévenart
> Hello,
>
> How can I use Nutch Java objects to index one (or a very limite
Wiki is the best resource
http://wiki.apache.org/nutch/
There are some presentations and other interesting links.
A few days ago I found this "Getting started" guide, which is very simple and straightforward:
http://lucene.apache.org/nutch/tutorial8.html
Best Regards
Alexander Aristov
2009/8/12 Max S
> Does
Forwarding the ApacheCon announcement. Also note we have a lot of
Lucene ecosystem talks and a meetup scheduled, as well as training on
both Lucene and Solr, so I hope you will join us.
Cheers,
Grant
Begin forwarded message:
From: Sally Khudairi
Date: August 7, 2009 9:55:10 PM EDT
To: an
OK,
I'm trying to use the SolrIndexer with Nutch 1.0 and nothing seems to
be sent to Solr.
I've put some more debug logging into the SolrIndexer and SolrWriter
classes. It seems like although the SolrWriter class is told to open()
and close() it is never told to write() anything in between.
Why
I use JBoss for development; the only difference is that I have to start
JBoss from Cygwin, otherwise Nutch won't work.
On Wed, 2009-08-12 at 14:23 +0400, Alexander Aristov wrote:
> Nutch comes with all necessary libraries and I don't think this is a problem
> which would prevent running it. I am not s
Try looking at how the indexers work. They *do* iterate through all
the documents in the crawl (or rather one segment at a time). However
they do it in a Hadoop way...
2009/8/11 Paul Tomblin :
> I want to iterate through all the documents that are in the crawl,
> programmatically. The only code
Nutch comes with all necessary libraries and I don't think this is a problem
which would prevent running it. I am not sure whether all JBoss features
are compatible with Nutch.
Anyway, it's a minute's work to check if you already have JBoss installed and
configured. And you may share your experi
Hello,
How can I use Nutch Java objects to index one (or a very limited set of)
web page(s) without crawling them?
Do I need to use the crawling tools (such as Injector, Generator, ...)
or can I do it by means of lower-level objects (Content,
ParseResult, ...)?
Thanks for your help,
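
Since Solr is already part of the setup described in this thread, one possible route for a handful of pages is to skip the crawl tools and send the documents to Solr directly through its XML update format. The sketch below is not the Nutch-object approach asked about above; it only builds the `<add><doc>` message Solr accepts. The field names (`id`, `url`, `title`, `content`) and the update URL are assumptions and must match your own Solr schema and installation.

```python
from xml.sax.saxutils import escape, quoteattr
import urllib.request


def solr_add_xml(doc):
    """Build a Solr XML <add> message for one document given as a dict."""
    fields = []
    for name, value in doc.items():
        # Escape both the attribute and the text so the XML stays well-formed.
        fields.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
    return "<add><doc>" + "".join(fields) + "</doc></add>"


def post_to_solr(xml, update_url="http://localhost:8983/solr/update"):
    """POST an update message to Solr. The URL is an assumed default, not from the thread."""
    request = urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    return urllib.request.urlopen(request).read()


# Hypothetical page; the field names are assumptions about the Solr schema.
page = {
    "id": "http://example.com/",
    "url": "http://example.com/",
    "title": "Example & test",
    "content": "Hello <world>",
}
xml = solr_add_xml(page)
print(xml)
```

To make the document searchable you would call `post_to_solr(xml)` and then `post_to_solr("<commit/>")` against a running Solr instance. Fetching and parsing the page is left out here; that is the part Nutch's parsers would otherwise do.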