Further on the information extraction idea, consider what the SIMILE
team at MIT are doing... http://simile.mit.edu
The lower-case semantic web is gaining a lot of momentum these days,
and I'm a strong proponent and student of it at the moment. Scraping
rich information from a site is certainly pragmatic, but
it is also highly fragile. SIMILE's Piggy Bank has a scraper
facility. In a more ideal world, computer shops, book stores,
libraries, and anyone with data to share would publish it in a
reusable and structured way (RDF seems to me to be the best way to do
this). Merging a full-text search engine with structured
information, though, is another tricky problem, and one that I am
working on myself at the moment.
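As a rough illustration of that merging problem, here is a minimal
Python sketch. It is not how Nutch, Lucene, or Piggy Bank do it; the
data, the function names, and the "price"/"shop" predicates are all
made up for the example. It matches free text against a small inverted
index, then filters the hits using RDF-like (subject, predicate,
object) triples.

```python
from collections import defaultdict

# Hypothetical data: free-text descriptions keyed by item id.
DOCS = {
    "item1": "Lightweight laptop with long battery life",
    "item2": "Gaming laptop with fast graphics card",
    "item3": "Laser printer for small offices",
}

# Hypothetical structured facts as RDF-like (subject, predicate, object) triples.
TRIPLES = [
    ("item1", "price", 900),
    ("item2", "price", 1500),
    ("item3", "price", 200),
    ("item1", "shop", "ShopA"),
    ("item2", "shop", "ShopB"),
    ("item3", "shop", "ShopA"),
]


def build_index(docs):
    """Map each lower-cased word to the set of ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def text_search(index, query):
    """Return ids whose text contains every word of the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def facts_for(triples, subject):
    """Collect the predicate/object pairs known for one subject."""
    return {p: o for s, p, o in triples if s == subject}


def search(index, triples, query, max_price=None):
    """Full-text match first, then filter on the structured price fact."""
    results = []
    for doc_id in sorted(text_search(index, query)):
        facts = facts_for(triples, doc_id)
        price = facts.get("price")
        if max_price is not None and (price is None or price > max_price):
            continue
        results.append((doc_id, facts))
    return results


if __name__ == "__main__":
    idx = build_index(DOCS)
    for doc_id, facts in search(idx, TRIPLES, "laptop", max_price=1000):
        print(doc_id, facts)
```

The tricky part in practice is that the two halves want different
things: the text index wants ranking and tokenization, while the
structured side wants exact values and joins. This sketch sidesteps
both by using plain set intersection and a linear scan over triples.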
I'd love to have more discussions along these lines.
Erik
On Jul 26, 2005, at 5:50 AM, Cuong Hoang wrote:
Hi Jack,
I've been doing research for the last few days, and I think that, once
successfully implemented, an information extraction system should be
able to extract information from various sources. I've started reading
about patterns, context-free grammars, and ontologies, which I think
will be the core of such a system. I intend to index computer shops.
Regards,
Cuong Hoang
-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 26 July 2005 6:16 PM
To: [email protected]; [email protected]
Subject: Re: Information extraction
Hi Cuong.
I am going to build a private book search engine, and I am facing the
same problem.
Could you describe more about the information you want to extract and
the website?
Regards
/Jack
On 7/26/05, Cuong Hoang <[EMAIL PROTECTED]> wrote:
Hi all,
Does anyone have experience with designing web information extraction
systems such as shopbots/pricebots? I'm currently doing research on
this topic and want to integrate Nutch. A few guidelines from anyone
who has designed this type of system would be really helpful to me.
Regards,
Cuong Hoang
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers