Hi Mana,

I think it would be best if you could provide details on the following:

- What the HtmlParseFilter plugin does.
- Some log data showing how it works with some URLs but not with others,
  so we can see the nature of the URLs it does and does not work with.
- Which version of Nutch you are using.

Some comments on your indexing plugin: in my opinion it is much easier to
create fields to be indexed if we write them into our mapping schema and
into our Solr implementation. My assumption is that you are not using Solr
for indexing, and that this is why you are having problems getting your
fields to map to the index. Would it be convenient for you to try Solr?
Without access to the code for your plugin, it is extremely hard to root
out the problem you are experiencing.
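
As a rough sketch of the mapping approach, a custom field would typically be
declared in two places. The field name "myfield" is hypothetical, and the
layout assumes the stock conf/solrindex-mapping.xml shipped with Nutch and a
standard Solr schema.xml, so adjust it to your own setup:

```xml
<!-- conf/solrindex-mapping.xml (Nutch side): maps a field produced by an
     IndexingFilter to a field in the Solr index.
     "myfield" is a hypothetical name used only for illustration. -->
<mapping>
  <fields>
    <field dest="myfield" source="myfield"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>

<!-- schema.xml (Solr side): the destination field must also be declared
     here, otherwise Solr rejects documents that contain it. -->
<field name="myfield" type="string" stored="true" indexed="true"/>
```

The two fragments belong in separate files; they are shown together only to
make the correspondence between the dest name and the Solr field name clear.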

On Thu, Jun 23, 2011 at 12:16 PM, Matthias Naber <
[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hey,
>
> I'm new to the nutch project and just started to test some things. So
> I followed this example
> http://wiki.apache.org/nutch/WritingPluginExample and implemented my
> own HtmlParseFilter.
>
> My custom MyHtmlParseFilter works fine on most of the pages - but
> isn't called at all on others. (I also implemented an IndexingFilter
> that works just fine)
>
> The goal was to add a new field to the search index. For most of the
> pages my stuff is called what adds a custom field to the later
> search-index-documents. For some few pages, my code is ignored and I
> don't see this field in the index-documents.
>
> To sum this up: my ParseFilter doesn't get called at all for only a
> few random pages ... why!?!
>
> I guess this may be related to the MIME-type of the pages to be
> parsed? Has anyone an idea what may cause this?
>
> Regards,
> mana
>
> # I'm using nutch v.1.3 stable
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk4DkP0ACgkQzp84az+gLK3GIgCgimSSrsREQYqh3vWbf3ywaX5S
> HxcAnjqJgOML/a/NR6Q80PjC9EhU2MFS
> =jY8Z
> -----END PGP SIGNATURE-----
>
>


-- 
*Lewis*
