RE: Example in Java Please

Lukas, Ray Tue, 11 Nov 2008 11:41:59 -0800

Thanks, my friend. This will help, I am sure. I am just getting out of
a meeting and will start digging in there. I have already started to
look at some of the samples. Right now I am trying to get Nutch set up
in Eclipse on Windows. That is what I am really hoping to get done in
the few hours I have today to work on this. I just went through an
article on the wiki that seems to have a good explanation. Well, I will
let you know how it goes.
Thanks Ismael
ray


-----Original Message-----
From: Ismael [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 11, 2008 3:58 AM
To: [email protected]
Subject: Re: Example in Java Please

Hello. You should check the file Crawl.java in the package
org.apache.nutch.crawl. It is a good crawl example, with some comments,
and clear enough (I think). It is the code that runs when you use Nutch
from the command line. I hope this helps.


2008/11/10 Lukas, Ray <[EMAIL PROTECTED]>

> Thanks Hasan:
>
> Forgive me. First, your generosity is greatly appreciated. Please
> accept my thanks. I might be wrong, but... hmm, I think that we are
> missing a few things here that I also need and that are, in fact, why
> I selected Nutch.
> Nutch does some things, like perform the post event, gather up and
> parse HTML, discover and follow the nested URL links recursively, and
> countless other things as well. It maintains a database of what was
> scanned and what should be scanned (WebDB), and I will let the experts
> expand on my limited feature list.
> If I do this, I am losing all of those things. I am thinking that
> there is a chunk of code that demonstrates how to call and use the
> Nutch Crawl object and indexers.
> I am, this very moment, going through the sample code (which needs to
> be commented, by the way, no offense to anyone) in hopes of
> understanding how this all works together.
>
> I am up on Lucene, well, kind of. Nutch is the boulder I have to move
> or get around, you see. Do not be discouraged: kindness, and helping
> other people, is never a mistake.
>
>
> -----Original Message-----
> From: Hasan Diwan [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 10, 2008 1:55 PM
> To: [email protected]
> Subject: Re: Example in Java Please
>
> Ray,
> I am feeling charitable this morning, so have posted code to do what
> you desire at the end.
> 2008/11/10 Lukas, Ray <[EMAIL PROTECTED]>:
> > If you could, please. I am, as you probably are, or have been in the
> > recent past, short on time for my project. I need something very
> > simple: an example that goes to a single URL, parses the pages under
> > it, gathers up all the words (terms) and returns me a Lucene index
> > of them, so that I can then ask "do any of the words I am thinking
> > of (terms from my Oracle database) appear in this index, and how
> > many times do they appear?" That is it, very simple. I would like to
> > use Nutch.
> > I am going through the Nutch source code examples, which require
> > someone to understand Hadoop. I would love to, if I had the time,
> > which I do not. So can someone post or point me to an example?
> > Sorry to bother you, but time is a problem. I hope that you
> > understand.
>
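> import org.cyberneko.pull.util.DefaultHandler;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
>
> // CharacterEvent and DocumentEvent are not imported in the original
> // post; their packages are left unspecified here.
> public class HTMLParser extends DefaultHandler {
>     // Accumulates the character data seen so far (the original used
>     // an undeclared this.text).
>     private final StringBuilder text = new StringBuilder();
>
>     void handleCharacters(CharacterEvent event) {
>         this.text.append(event.text.toString());
>     }
>
>     void handleEndDocument(DocumentEvent event) {
>         // Put the whole page text into one stored, tokenized field.
>         Document doc = new Document();
>         doc.add(new Field("all", this.text.toString(), Field.Store.YES,
>                 Field.Index.TOKENIZED));
>         // Note: doc is not yet handed to an IndexWriter.
>     }
> }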
>
> This should get you started.
> --
> Cheers,
> Hasan Diwan <[EMAIL PROTECTED]>
>
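
For the check Ray describes in the original request (do any of my terms
appear in the crawled text, and how many times), here is a minimal
sketch in plain Java. It uses only the JDK, not Nutch or Lucene, so it
says nothing about how Nutch itself does this. TermCounter, countAll and
countTerms are made-up names for illustration, and the tokenizing rule
(lower-case, then split on anything that is not a letter or digit) is an
assumption, much simpler than a Lucene analyzer. It needs Java 8 or
later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Lucene.
public class TermCounter {

    /** Counts every token in the text (lower-cased, split on non-alphanumerics). */
    static Map<String, Integer> countAll(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Looks up each term of interest; terms that never occur map to 0. */
    static Map<String, Integer> countTerms(String text, List<String> terms) {
        Map<String, Integer> all = countAll(text);
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String term : terms) {
            result.put(term, all.getOrDefault(term.toLowerCase(Locale.ROOT), 0));
        }
        return result;
    }

    public static void main(String[] args) {
        String pageText = "Nutch crawls pages; Lucene indexes pages.";
        List<String> wanted = Arrays.asList("pages", "lucene", "oracle");
        // prints {pages=2, lucene=1, oracle=0}
        System.out.println(countTerms(pageText, wanted));
    }
}
```

In a real setup the text would presumably come from whatever the crawler
extracted for each page, and the list of terms from the database query;
both are stand-in literals here.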
