Hey,

I'm new to the Nutch project and have just started testing some things.
I followed the example at
http://wiki.apache.org/nutch/WritingPluginExample and implemented my
own HtmlParseFilter.

My custom MyHtmlParseFilter works fine on most pages, but it isn't
called at all on others. (I also implemented an IndexingFilter, which
works just fine.)

The goal was to add a new field to the search index. For most pages
my filter is called and adds the custom field to the resulting index
documents. For a few pages, though, my code is ignored and the field
is missing from the index documents.

To sum up: my ParseFilter isn't called at all for a few seemingly
random pages ... why!?!

I guess this may be related to the MIME type of the pages being
parsed? Does anyone have an idea what might cause this?

Regards,
mana

# I'm using Nutch 1.3 (stable)
