From the limited HTML I've seen, I can only assume that the offending
XHTML is in the content field.

If this is the case, you will need to write a custom plugin
implementation that removes it. There is plenty of information on our
wiki [0] to get you up to speed with plugins.
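As a rough illustration of the "removes this" part only, here is a minimal Java sketch of the stripping logic such a plugin might call. It assumes the leftover XHTML shows up as literal tags inside the content string. The class name MarkupStripper and its strip method are hypothetical, not part of Nutch. The Nutch plugin wiring (implementing an extension point such as HtmlParseFilter or IndexingFilter, plus plugin.xml) is deliberately left out; see the wiki for that.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): strips leftover (X)HTML tags
 * from an extracted text field such as "content".
 * Naive regex approach, not a real HTML parser: a ">" inside a quoted
 * attribute value, comments or CDATA sections are not handled.
 */
public final class MarkupStripper {
    // Anything that looks like a tag: "<", then any non-">" chars, then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Runs of whitespace left behind once tags have been removed.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private MarkupStripper() {}

    /** Returns the input with tags removed and whitespace collapsed; null stays null. */
    public static String strip(String content) {
        if (content == null) {
            return null;
        }
        // Replace each tag with a space so adjacent blocks don't run together.
        String noTags = TAG.matcher(content).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "<div xmlns=\"http://www.w3.org/1999/xhtml\"><p>Hello <b>world</b></p></div>";
        System.out.println(strip(raw)); // prints "Hello world"
    }
}
```

Replacing each tag with a space (rather than an empty string) keeps text from separate paragraphs apart, at the cost of splitting a word that has a tag in its middle. Inside a real plugin you would presumably apply something like this to the content value before it is indexed.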

Once you have something you need help with, get back on the list and let
us know.

Lewis

[0] http://wiki.apache.org/nutch/PluginCentral

On Sat, Apr 7, 2012 at 2:33 PM, alessio crisantemi <
[email protected]> wrote:

> Maybe it's caused by my schema?
> I chose to index only title, author and content.
>
> Can you help me set up a parse filter?
> Thank you,
> alessio
>
>
