RE: crawl/index/search

2006-09-24 Thread Iain
Software Therapist E: [EMAIL PROTECTED] T:+44 (0) 1423 872988 W: www.idcl.co.uk http://mvp.support.microsoft.com -Original Message- From: Fadzi Ushewokunze [mailto:[EMAIL PROTECTED] Sent: 24 September 2006 04:03 To: nutch-user@lucene.apache.org Subject: Re: crawl/index/search Richard

Re: crawl/index/search

2006-09-24 Thread Fadzi Ushewokunze
Iain, Ah thanks for that. I am actually playing with it right now. Are you using it? - Original Message - From: Iain [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Sunday, September 24, 2006 6:26 PM Subject: RE: crawl/index/search You might want to check out GATE from

RE: crawl/index/search

2006-09-24 Thread Iain
Therapist E: [EMAIL PROTECTED] T:+44 (0) 1423 872988 W: www.idcl.co.uk http://mvp.support.microsoft.com -Original Message- From: Fadzi Ushewokunze [mailto:[EMAIL PROTECTED] Sent: 24 September 2006 10:19 To: nutch-user@lucene.apache.org; [EMAIL PROTECTED] Subject: Re: crawl/index/search

Re: crawl/index/search

2006-09-24 Thread Richard Braman
: Sunday, September 24, 2006 6:26 PM Subject: RE: crawl/index/search You might want to check out GATE from Sheffield University. It's like UIMA in concept, but more mature and probably richer. They've got a number of modules which integrate with Lucene, so integration with Nutch should be easier

crawl/index/search

2006-09-19 Thread Fadzi Ushewokunze
***I might have posted this already; my mail server is playing up. Apologies if so.*** Hi there, I've been playing with Nutch for a few weeks now and am starting to come up with something usable, but I need some suggestions. Here's the problem: crawl the web (maybe 50 sites or so) and get

Re: crawl/index/search

2006-09-19 Thread Richard Braman
Getting other information out of the page requires parsing. In this case you have to come up with some pretty complicated regular expressions, unless the information that you want, like the company name, is going to be in the same place on each site. I don't know how to tackle this problem
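
To make the regular-expression approach mentioned above concrete, here is a minimal sketch in Python. It is not part of Nutch and not something posted in this thread: the pattern, the street-suffix list, and the function name extract_addresses are all assumptions chosen for illustration, and the pattern only recognises simple "number, capitalised name, street suffix" forms. Real pages would likely need a much richer pattern or a dedicated extraction tool.

```python
import re

# Hypothetical pattern (an assumption, not from the thread): a house number,
# one to three capitalised words, then a common street suffix.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+"                # house number, e.g. "12"
    r"(?:[A-Z][a-z]+\s+){1,3}"     # street name words, e.g. "High "
    r"(?:Street|St|Road|Rd|Avenue|Ave|Lane|Ln|Drive|Dr)\b\.?"
)


def extract_addresses(text):
    """Return street-address-like strings found in already-parsed page text."""
    results = []
    for match in ADDRESS_RE.finditer(text):
        # Collapse internal whitespace and drop a trailing abbreviation dot.
        cleaned = " ".join(match.group(0).split()).rstrip(".")
        results.append(cleaned)
    return results


if __name__ == "__main__":
    sample = (
        "Visit Acme Ltd at 12 High Street, Harrogate. "
        "Our other office: 221 Baker St. Call us."
    )
    for address in extract_addresses(sample):
        print(address)
```

The extracted strings could then be stored in a separate index field so that searches can target addresses only. How that field is populated inside Nutch is left open here, since the thread does not say.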

crawl/index/search

2006-09-18 Thread dev
Hi there, I've been playing with Nutch for a few weeks now and am starting to come up with something usable, but I need some suggestions. Here's the problem: crawl the web (maybe 50 sites or so) and get physical addresses; I want to index physical addresses found on the crawl, so my search