WebDataKit (http://www.lotontech.com/wdbc.html) is free to download.
It is a kind of SQL for HTML (it can even query across different
web sites, concatenate results, etc.). Interesting... I also need to
search specific places within HTML.
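
For the case Jack describes below (keeping only the content inside
<div class="start-here"> and dropping the rest), here is a minimal sketch
using only Python's standard html.parser module. It is not Nutch code and
not how parse-html works; the names RegionExtractor and extract_region are
made up for illustration, and the tag/class pair is just the example from
his message.

```python
from html.parser import HTMLParser


class RegionExtractor(HTMLParser):
    """Collect text found inside <tag class="css_class"> ... </tag>.

    Hypothetical helper for illustration; tag and css_class default to
    the example markers from the original message.
    """

    def __init__(self, tag="div", css_class="start-here"):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside the marked region
        self.chunks = []    # text fragments collected from the region

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Already inside the region: count nested tags of the same
            # kind so the matching end tag can be found.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_region(html, tag="div", css_class="start-here"):
    """Return the whitespace-normalized text of the marked region(s)."""
    parser = RegionExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = (
        '<div class="nav">menu</div>'
        '<div class="start-here">Hello <div>inner</div> world</div>'
        '<div>footer</div>'
    )
    print(extract_region(page))
```

The depth counter is there so that a nested <div> inside the marked
region does not end the extraction early. Text inside <script> or <style>
within the region would also be collected; a real filter would probably
want to skip those.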


Some sites are very well designed and have explicit meta tags. If you
are working on an intranet, that's the easiest solution:
<title>TOSHIBA TECRA S2 Pentium M 15.0&quot; nVIDIA GeForce Go 6600
NoteBook - Retail at Newegg.com</title>
<meta name="description" content="Buy TOSHIBA TECRA S2 Pentium M
15.0&quot; nVIDIA GeForce Go 6600 NoteBook - Retail Online" />  
<meta name="keywords" content="Buy TOSHIBA TECRA S2 Pentium M 15.0&quot;
nVIDIA GeForce Go 6600 NoteBook - Retail Cheap" />
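
Pulling these fields out is straightforward. Below is a minimal sketch,
again with Python's standard html.parser rather than anything from Nutch;
MetaExtractor and extract_meta are hypothetical names chosen for this
example. It collects the <title> text and every <meta name=... content=...>
pair, decoding entities such as &quot; and collapsing line breaks.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name/content> pairs.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            content = attr.get("content")
            if name and content is not None:
                # Attribute values arrive with entities already decoded;
                # collapse whitespace left over from wrapped lines.
                self.meta[name.lower()] = " ".join(content.split())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_meta(html):
    """Return (title, meta_dict) for the given HTML text."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    return title, parser.meta


if __name__ == "__main__":
    sample = (
        "<title>TOSHIBA TECRA S2 Pentium M 15.0&quot; nVIDIA GeForce Go 6600\n"
        "NoteBook - Retail at Newegg.com</title>\n"
        '<meta name="description" content="Buy TOSHIBA TECRA S2 Pentium M\n'
        '15.0&quot; nVIDIA GeForce Go 6600 NoteBook - Retail Online" />\n'
    )
    title, meta = extract_meta(sample)
    print(title)
    print(meta.get("description"))
```

This only helps when the pages really carry meaningful meta tags, which
is why it suits an intranet where you control the templates.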



-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 18, 2005 10:15 PM
To: [email protected]
Subject: Parse-html should be enhanced!


Hi Nutchers

I think the parse-html parser should be enhanced. In some of my
projects (an intranet search engine), we only need the content inside
specified detectors, with the junk filtered out: say, the content between
<div class="start-here"> and </div>, or detectors such as XPath. Any
thoughts on this enhancement?

Regards
/Jack
-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers