RE: Nutch book (Thanks)

2009-08-12 Thread Max S
Thanks Alexander. -----Original Message----- From: Alexander Aristov [mailto:alexander.aris...@gmail.com] Sent: Wednesday, August 12, 2009 3:42 PM To: nutch-user@lucene.apache.org Subject: Re: Nutch book Wiki is the best resource http://wiki.apache.org/nutch/ There are some presentations and

Re: Which Java objects to index a web page ?

2009-08-12 Thread Fabrice Estiévenart
I like using Nutch for the crawlDB, scalability, threading, document parsing, ... but crawling is not important to me, as I index targeted data sources. Obviously, I'm using it with Solr for indexing and searching documents. Fabrice Alexander Aristov wrote: Nutch is primarily a crawler. I

Re: How do I get all the documents in the index without searching?

2009-08-12 Thread Paul Tomblin
On Tue, Aug 11, 2009 at 2:10 PM, Paul Tomblin wrote: > I want to iterate through all the documents that are in the crawl, > programmatically. The only code I can find does searches. I don't > want to search for a term, I want everything. Is there a way to do > this? To answer my own question, w

Re: Which Java objects to index a web page ?

2009-08-12 Thread Alexander Aristov
Nutch is primarily a crawler. I would suggest you take a look at Solr, which is just an indexer and searcher. You may use its API as well as its open interfaces. Best Regards Alexander Aristov 2009/8/12 Fabrice Estiévenart > Hello, > > How can I use Nutch Java objects to index one (or a very limite
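As a rough illustration of the "open interfaces" suggestion above: Solr accepts documents as an XML update message (`<add><doc><field name="...">...</field></doc></add>`) sent over HTTP to its update handler. The sketch below only builds that message for a single page and does not send it. The class name `SolrAddXml`, the helper names, and the field names `id` and `title` are assumptions made for illustration; the real field names depend on your Solr schema, and the update URL depends on your installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Nutch or Solr): builds a Solr XML
// update message for one document from a map of field name -> value.
class SolrAddXml {

    // Escape the characters that are special in XML text and attributes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Render the fields as <add><doc><field name="...">...</field></doc></add>.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Field names here are assumptions; use the ones your schema defines.
        Map<String, String> page = new LinkedHashMap<String, String>();
        page.put("id", "http://example.com/page.html");
        page.put("title", "Example page");
        System.out.println(toAddXml(page));
    }
}
```

The resulting string would be POSTed to the Solr update handler with an XML content type, followed by a `<commit/>` message so the document becomes searchable. Fetching and parsing the page itself is left out here.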

Re: Nutch book

2009-08-12 Thread Alexander Aristov
The wiki is the best resource: http://wiki.apache.org/nutch/ There are some presentations and other interesting links. Some days ago I found this "Getting started" tutorial, which is very simple and straightforward: http://lucene.apache.org/nutch/tutorial8.html Best Regards Alexander Aristov 2009/8/12 Max S > Does

Fwd: Sign up for ApacheCon US by 14 August and save up to $500!

2009-08-12 Thread Grant Ingersoll
Forwarding the ApacheCon announcement. Also note we have a lot of Lucene ecosystem talks and a meetup scheduled, as well as training on both Lucene and Solr, so I hope you will join us. Cheers, Grant Begin forwarded message: From: Sally Khudairi Date: August 7, 2009 9:55:10 PM EDT To: an

Re: Nutch to SolR. First steps

2009-08-12 Thread Alex McLintock
OK, I'm trying to use the SolrIndexer with Nutch 1.0 and nothing seems to be sent to Solr. I've put some more debug logging into the SolrIndexer and SolrWriter classes. It seems like although the SolrWriter class is told to open() and close() it is never told to write() anything in between. Why

Re: nutch and JBoss

2009-08-12 Thread Fadzi Ushewokunze
I use JBoss for development; the only difference is that I have to start JBoss from Cygwin, otherwise Nutch won't work. On Wed, 2009-08-12 at 14:23 +0400, Alexander Aristov wrote: > Nutch comes with all necessary libraries and I don't think this is a problem > which would prevent running it. I am not s

Re: How do I get all the documents in the index without searching?

2009-08-12 Thread Alex McLintock
Try looking at how the indexers work. They *do* iterate through all the documents in the crawl (or rather one segment at a time). However, they do it the Hadoop way... 2009/8/11 Paul Tomblin : > I want to iterate through all the documents that are in the crawl, > programmatically. The only code

Re: nutch and JBoss

2009-08-12 Thread Alexander Aristov
Nutch comes with all the necessary libraries, and I don't think this is a problem that would prevent running it. I am not sure whether all JBoss features are compatible with Nutch. Anyway, it takes only a minute to check if you already have JBoss installed and configured. And you may share your experi

Which Java objects to index a web page ?

2009-08-12 Thread Fabrice Estiévenart
Hello, How can I use Nutch Java objects to index one (or a very limited set of) web page(s) without crawling them? Do I need to use the crawling tools (such as Injector, Generator, ...) or can I do it by means of lower-level objects (Content, ParseResult, ...)? Thanks for your help,