Hey,
I'm new to the Nutch project and have just started testing a few things. I followed the example at http://wiki.apache.org/nutch/WritingPluginExample and implemented my own HtmlParseFilter. I also implemented an IndexingFilter, which works just fine. The goal was to add a new field to the search index.

My custom MyHtmlParseFilter works on most pages: it gets called, and the custom field shows up in the resulting index documents. For a few pages, however, my code is ignored and the field is missing from their index documents.

To sum up: my parse filter is never called at all for a few seemingly random pages. Why? I suspect this may be related to the MIME type of the pages being parsed. Does anyone have an idea what might cause this?

Regards,
mana

P.S. I'm using Nutch v1.3 stable.

