Hey, I'm using nutch-0.9 and a custom IndexingFilter which writes out crawled pages as XML documents.
Now I'm trying to access the content length of the web page (or PDF) inside my IndexingFilter, but both of these calls return null:

    parse.getData().getContentMeta().get(Response.CONTENT_LENGTH)
    parse.getData().getMeta(Response.CONTENT_LENGTH)

I know that the index-more plugin (MoreIndexingFilter) uses data.getMeta(Response.CONTENT_LENGTH), but how can I access this value in my own IndexingFilter?

By the way, I'm using protocol-httpclient to fetch HTTPS pages.

Can someone give me a hint?

Thanks,
Hannes

