Do you mean that you can search for "<table>" and find it?
Some 'tags' should be included in the index, such as the value of
<![CDATA[<table>]]>, and possibly XML comments (I'm not sure)...
Even for the simplest HTML page, Nutch with default settings should index
all of the plain text after determining the language settings and removing
all HTML tags... The 'Body' of a tag such as <a href="#">Body</a> will be
indexed.
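
To make the expected behaviour concrete, here is a rough sketch of tag
stripping using Python's standard html.parser module. It is not Nutch's
parser and none of these names come from Nutch (TextExtractor and
extract_text are made up for illustration); it only shows the idea that
markup is dropped while the text between tags is kept. Comments are
dropped in this sketch, and CDATA sections are not handled at all.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; start/end tags never arrive here.
        text = data.strip()
        if text:
            self.chunks.append(text)

    # handle_comment is left at its default (a no-op), so comments are dropped.


def extract_text(html):
    """Return only the plain text of an HTML string, joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = '<html><body><p>Hello</p><!-- note --><a href="#">Body</a></body></html>'
print(extract_text(page))
```

If the text that ends up in your index still contains things like
'<a href=' after a step equivalent to this, then the tags are probably
arriving escaped (for example as &lt;a href=...&gt;) or the page is not
being parsed as HTML in the first place.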


-----Original Message-----
From: Damian Florczyk [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 06, 2006 9:19 AM
To: [email protected]
Subject: Nutch crawler problem


Hi there,

I have a small problem: when I index a few sites, the crawler indexes HTML
tags too, and when I search using this index, some of the results come from
inside HTML tags. What can I do to remove those tags?


-- 
Damian Florczyk
Gentoo/NetBSD Development Lead

