Do you mean that you can search for "<table>" and find it? Some 'tags' should end up in the index, such as the content of <![CDATA[<table>]]>, and possibly XML comments (I am not sure about those). For the simplest HTML page and default Nutch settings, Nutch should index all plain text after determining the language settings and removing all HTML tags. For example, the 'Body' text of <a href="#">Body</a> will be indexed.
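
To illustrate what "removing all HTML tags" should leave for the indexer, here is a minimal Python sketch. It is not Nutch's parser (Nutch itself is written in Java); it only uses Python's standard html.parser module, and the names TextExtractor and extract_text are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the plain text of an HTML fragment, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(extract_text('<p>See <a href="#">Body</a> here</p>'))
```

If a search for "<table>" returns hits from ordinary markup, the text reaching the indexer probably still contains tags, unlike the output of a step like this one.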
-----Original Message-----
From: Damian Florczyk [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 06, 2006 9:19 AM
To: [email protected]
Subject: Nutch crawler problem

Hi there,

I have a small problem: when I'm indexing a few sites, the crawler indexes HTML tags too, and when I search using this index, some of the results come from inside HTML tags. What can I do to remove those tags?

--
Damian Florczyk
Gentoo/NetBSD Development Lead
