Following up on the information extraction idea, consider what the SIMILE team at MIT is doing: http://simile.mit.edu

The lower-case semantic web is gaining a lot of momentum these days, and at the moment I'm both a strong proponent and a student of it. Scraping rich information from a site is certainly pragmatic, but it is also highly fragile, because a scraper breaks whenever the site changes its page layout. SIMILE's Piggy Bank has a scraper facility. In a more ideal world, computer shops, book stores, libraries, and anyone else with data to share would publish it in a reusable, structured way (RDF seems to me to be the best way to do this). Merging a full-text search engine with structured information, though, is another tricky problem, and one I am working on myself at the moment.
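As a rough illustration of what publishing data "in a reusable and structured way" could look like, here is a minimal Python sketch that serializes a shop's product record as RDF in the N-Triples syntax, using only the standard library. The namespace http://example.org/shop/, the property names, and the sample record are hypothetical placeholders, not a real vocabulary; an actual publisher would more likely reuse an established vocabulary and an RDF library.

```python
# Minimal sketch: serialize a product record as RDF N-Triples (stdlib only).
# The namespace and property names below are HYPOTHETICAL, for illustration only.

EX = "http://example.org/shop/"  # hypothetical vocabulary namespace


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    # Backslash must be escaped first so later escapes are not doubled.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def product_to_ntriples(product_id: str, fields: dict) -> list:
    """Return one N-Triples statement (subject predicate "object" .) per field."""
    subject = f"<{EX}product/{product_id}>"
    lines = []
    for prop, value in fields.items():
        predicate = f"<{EX}{prop}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


if __name__ == "__main__":
    # Hypothetical sample record from a computer shop.
    record = {"name": "USB keyboard", "price": "29.95", "currency": "AUD"}
    for line in product_to_ntriples("kb-101", record):
        print(line)
```

Because each line is a single subject-predicate-object statement, a consumer can read the data without any page-specific scraping rules, which is the robustness argument for structured publishing over scraping.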

I'd love to have more discussions along these lines.

    Erik


On Jul 26, 2005, at 5:50 AM, Cuong Hoang wrote:

Hi Jack,

I've been doing research for the last few days, and I think that, once
successfully implemented, an information extraction system should be able to
extract information from a variety of sources. I've started reading about
patterns, context-free grammars, and ontologies, which I think will form the
core of such a system. I intend to index computer shops.

Regards,

Cuong Hoang

-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 26 July 2005 6:16 PM
To: [email protected]; [email protected]
Subject: Re: Information extraction

Hi Cuong.

I am going to build a private book search engine, and I am facing the same
problem. Could you tell me more about the information you want to extract
and the website?

Regards
/Jack

On 7/26/05, Cuong Hoang <[EMAIL PROTECTED]> wrote:

Hi all,

Does anyone have experience designing web information extraction systems such as shopbots/pricebots? I'm currently doing research on this topic and want to integrate it with Nutch. A few guidelines from anyone who has designed this type of system would be really helpful to me.

Regards,

Cuong Hoang

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
