Thanks, my friend. This will help, I am sure. I am just getting out of a meeting and will start digging in. I have already started to look at some of the samples. Right now I am trying to get Nutch set up in Eclipse on Windows. That is what I am really hoping to get done in the few hours I have today to work on this. I just went through an article on the wiki that seems to have a good explanation. Well, I will let you know how it goes.

Thanks, Ismael
ray

-----Original Message-----
From: Ismael [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2008 3:58 AM
To: [email protected]
Subject: Re: Example in Java Please

Hello.

You should check the code in the package org.apache.nutch.crawl, the file Crawl.java. It is a good crawl example, with some comments, and clear enough (I think). It is the code used when running Nutch from the command line.

I hope this helps.

2008/11/10 Lukas, Ray <[EMAIL PROTECTED]>
> Thanks Hasan:
>
> Forgive me. First, your generosity is greatly appreciated. Please accept
> my thanks. I might be wrong, but... Hmm. I think that we are missing
> a few things here that I also need and that are, in fact, why I selected
> Nutch.
> Nutch does some things, like perform the post event, gather up and
> parse HTML, discover and follow the nested URL links recursively, and
> countless other things as well. It maintains a database of what was
> scanned and what should be scanned (WebDB), and I will let the experts
> expand on my limited feature list.
> If I do this, I am losing all of those things. I am thinking that
> there is a chunk of code that demonstrates how to call and use the Nutch
> Crawl object and indexers.
> I am, this very moment, going through the sample code (which needs to be
> commented, by the way, no offense to anyone) in hopes of understanding
> how this all works together.
>
> I am up on Lucene, well, kind of. Nutch is the boulder I have to move, or
> get around, you see. Do not be discouraged; kindness, and helping
> other people, is never a mistake.
>
>
> -----Original Message-----
> From: Hasan Diwan [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 10, 2008 1:55 PM
> To: [email protected]
> Subject: Re: Example in Java Please
>
> Ray,
> I am feeling charitable this morning, so have posted code to do what
> you desire at the end.
> 2008/11/10 Lukas, Ray <[EMAIL PROTECTED]>:
> > If you could, please. I am, as you probably are, or have been in the
> > recent past, short on time for my project. I need something very simple.
> > An example that goes to a single URL, parses the pages under it, gathers
> > up all the words (terms) and returns me a Lucene index of them so that I
> > can then say "do any of the words I am thinking (terms from my Oracle
> > database) appear in this index and how many times do they appear". That
> > is it, very simple. I would like to use Nutch.
> > I am going through the Nutch source code examples which require someone
> > to understand Hadoop. I would love to, if I had the time, which I do
> > not. So can someone post or point me to an example.
> > Sorry to bother you, but time is a problem, I hope that you understand,
>
> import org.cyberneko.pull.util.DefaultHandler;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
>
> public class HTMLParser extends DefaultHandler {
>     // Text accumulated from the character events of the parsed page.
>     private String text = "";
>
>     void handleCharacters(CharacterEvent event) {
>         this.text += event.text.toString();
>     }
>
>     void handleEndDocument(DocumentEvent event) {
>         // Put all of the page text into a single stored, tokenized field.
>         Document doc = new Document();
>         doc.add(new Field("all", this.text, Field.Store.YES,
>                 Field.Index.TOKENIZED));
>     }
> }
>
> This should get you started.
> --
> Cheers,
> Hasan Diwan <[EMAIL PROTECTED]>
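
The original question in this thread ("do any of the words I am thinking appear in this index and how many times do they appear") can be illustrated with a minimal sketch in plain Java. This is only a stand-in under stated assumptions: it uses neither Nutch nor Lucene, the class name TermCounter and the method countTerms are hypothetical, and the sample text is made up for illustration. It only shows the counting step on text that has already been fetched and parsed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; it does not use Nutch or Lucene.
public class TermCounter {

    // Lowercase the text, split it on anything that is not a letter or a
    // digit, and count how often each resulting term occurs.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the parsed content of a page.
        String pageText = "Nutch crawls pages. Lucene indexes them. Nutch parses HTML.";
        Map<String, Integer> counts = countTerms(pageText);

        // Terms to look up, e.g. loaded from a database in a real setup.
        for (String term : new String[] {"nutch", "lucene", "hadoop"}) {
            System.out.println(term + ": " + counts.getOrDefault(term, 0));
        }
    }
}
```

With a real Lucene index the same lookup would go through the index's term statistics instead of an in-memory map; the map is used here only to keep the sketch self-contained.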
