A couple of useful papers:

(Quite well known) Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
http://www2003.org/cdrom/papers/refereed/p451/package/p451-dave.html
(I'd never seen this before - about using Hidden Markov Models) Information extraction from HTML product catalogues: coupling quantitative and knowledge-based approaches
http://rainbow.vse.cz/dags05.pdf

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 26 July 2005 11:42 PM
> To: [email protected]
> Subject: Re: Information extraction
>
> Further on the information extraction idea, consider what the
> SIMILE team at MIT are doing... http://simile.mit.edu
>
> The lower-case semantic web is gaining a lot of momentum
> these days, and I'm a strong proponent and student of it at
> the moment. Scraping rich information from a site is
> certainly reasonably pragmatic, but it is also highly
> fragile. SIMILE's Piggy Bank has a scraper facility. In a
> more ideal world, computer shops, book stores, libraries, and
> anyone with data to share would publish it in a reusable and
> structured way (RDF seems to me to be the best way to do
> this). Merging a full-text search engine with structured
> information, though, is yet another tricky thing that I am
> myself working with at the moment.
>
> I'd love to have more discussions along these lines.
>
> Erik
>
>
> On Jul 26, 2005, at 5:50 AM, Cuong Hoang wrote:
>
> > Hi Jack,
> >
> > I've been doing research the last few days and I think that, once
> > successfully implemented, an information extraction system should be
> > able to extract information from various sources. I've started reading
> > about patterns, context-free grammars and ontologies, which I think
> > will be the core of such a system. I intend to index computer shops.
> >
> > Regards,
> >
> > Cuong Hoang
> >
> > -----Original Message-----
> > From: Jack Tang [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, 26 July 2005 6:16 PM
> > To: [email protected]; [email protected]
> > Subject: Re: Information extraction
> >
> > Hi Cuong.
> >
> > I am going to build a private book search engine, and I am facing the
> > same problem.
> > Could you describe more about the information you want to extract and
> > the website?
> >
> > Regards
> > /Jack
> >
> > On 7/26/05, Cuong Hoang <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all,
> >>
> >> Does anyone have experience with designing web information extraction
> >> systems such as shopbots/pricebots? I'm currently doing research on this
> >> topic and want to integrate Nutch. A few guidelines from anyone who
> >> has designed this type of system will really be helpful to me.
> >>
> >> Regards,
> >>
> >> Cuong Hoang
> >
> > --
> > Keep Discovering ... ...
> > http://www.jroller.com/page/jmars
