Hi Mana,

I think it would be best if you could provide details on the following:
- What the HtmlParseFilter plugin does.
- Some log data showing how it works with some URLs but not with others, so we can see the nature of the URLs it fails on and of those it works on.
- Which version of Nutch you are using.

Some comments on your indexing plugin: in my opinion it is much easier to create fields to be indexed if we write them into our mapping schema and our Solr implementation. My assumption is that you are not using Solr for indexing, and that this is why you are having problems getting your fields to map to the index. Would it be convenient for you to try Solr? Without access to the code for your plugin, it is extremely hard to root out the problem you are experiencing.

On Thu, Jun 23, 2011 at 12:16 PM, Matthias Naber <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hey,
>
> I'm new to the nutch project and just started to test some things. So
> I followed this example
> http://wiki.apache.org/nutch/WritingPluginExample and implemented my
> own HtmlParseFilter.
>
> My custom MyHtmlParseFilter works fine on most of the pages - but
> isn't called at all on others. (I also implemented an IndexingFilter
> that works just fine)
>
> The goal was to add a new field to the search index. For most of the
> pages my stuff is called what adds a custom field to the later
> search-index-documents. For some few pages, my code is ignored and I
> don't see this field in the index-documents.
>
> To sum this up: my ParseFilter doesn't get called at all for only a
> few random pages ... why!?!
>
> I guess this may be related to the MIME-type of the pages to be
> parsed? Has anyone an idea what may cause this?
>
> Regards,
> mana
>
> # I'm using nutch v.1.3 stable
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk4DkP0ACgkQzp84az+gLK3GIgCgimSSrsREQYqh3vWbf3ywaX5S
> HxcAnjqJgOML/a/NR6Q80PjC9EhU2MFS
> =jY8Z
> -----END PGP SIGNATURE-----
>

--
*Lewis*

