Hey,

I'm using nutch-0.9 and a custom IndexingFilter which writes out crawled
pages as XML documents.

Now I'm trying to access the content length of the web page (or PDF) inside
my IndexingFilter, but both of these calls return null:

parse.getData().getContentMeta().get(Response.CONTENT_LENGTH) -> null
parse.getData().getMeta(Response.CONTENT_LENGTH) -> null

I know the index-more plugin (MoreIndexingFilter) uses
data.getMeta(Response.CONTENT_LENGTH), but how can I access this value in my
own IndexingFilter?

Btw, I'm using protocol-httpclient to fetch HTTPS pages.

Can someone give me a hint?

Thanks

Hannes
