Software Therapist
E: [EMAIL PROTECTED] T:+44 (0) 1423 872988
W: www.idcl.co.uk
http://mvp.support.microsoft.com
-----Original Message-----
From: Fadzi Ushewokunze [mailto:[EMAIL PROTECTED]
Sent: 24 September 2006 04:03
To: nutch-user@lucene.apache.org
Subject: Re: crawl/index/search
Richard
Iain,
Ah thanks for that. I am actually playing with it right now.
Are you using it?
----- Original Message -----
From: Iain [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent: Sunday, September 24, 2006 6:26 PM
Subject: RE: crawl/index/search
-----Original Message-----
From: Fadzi Ushewokunze [mailto:[EMAIL PROTECTED]
Sent: 24 September 2006 10:19
To: nutch-user@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: crawl/index/search
Sent: Sunday, September 24, 2006 6:26 PM
Subject: RE: crawl/index/search
You might want to check out GATE from Sheffield University. It's like
UIMA in concept, but more mature and probably richer.
They've got a number of modules which integrate with Lucene, so
integration with Nutch should be easier.
***I might have posted this already; my mail server is playing up. Apologies
if so.***
Hi there,
Been playing with Nutch for a few weeks now, so I am starting to come up
with something usable, but I need some suggestions here.
Here's the problem - crawl the web (maybe 50 sites or so) and get
Getting other information out of the page requires parsing. In this case
you have to come up with some pretty complicated regular expressions,
unless the information that you want, like the company name, is going to
be in the same place on each site.
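
To give a rough idea of what such a regular expression might look like,
here is a minimal Python sketch. Everything in it is an assumption for
illustration: the names ADDRESS_RE, TAG_RE and extract_addresses are
hypothetical, and the pattern only covers one simple US-style layout
(number, street name, street type, optional "City, ST 12345"). Real pages
will need a much richer pattern or a proper extraction tool, and how this
would be wired into Nutch is not shown.

```python
import re
from html import unescape

# Hypothetical pattern (an assumption, not a general address grammar):
# street number, one to three capitalised words, a street-type word,
# then an optional ", City, ST 12345" tail.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+"                                    # street number
    r"(?:[A-Z][A-Za-z]*\s+){1,3}"                      # street name words
    r"(?:Street|St|Road|Rd|Avenue|Ave|Lane|Ln|Drive|Dr)\b\.?"
    r"(?:,\s*[A-Z][A-Za-z ]+,\s*[A-Z]{2}\s+\d{5})?"    # optional city/state/zip
)

# Crude tag stripper; a real crawler would use its HTML parser instead.
TAG_RE = re.compile(r"<[^>]+>")


def extract_addresses(html):
    """Return every substring of the page text that matches ADDRESS_RE."""
    text = unescape(TAG_RE.sub(" ", html))   # drop tags, decode entities
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]
```

For example, calling extract_addresses on
'<p>Visit us at 12 Main Street, Springfield, IL 62704 today.</p>'
returns ['12 Main Street, Springfield, IL 62704']. The extracted strings
could then be stored in their own index field so that searches run
against addresses only.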
I don't know how to tackle this problem
Hi there,
Been playing with Nutch for a few weeks now, so I am starting to come
up with something usable, but I need some suggestions here.
Here's the problem - crawl the web (maybe 50 sites or so) and get
physical addresses;
I want to index physical addresses found on the crawl, so my search