Gal Nitzan wrote:
> Sorry :) no.
Hmm, OK. :) But I think that patch is needed anyway, because right now
we silently assume that parse plugins will always copy all Content
metadata to ParseData.metadata. That may not be the case, and it
certainly does not happen if there is a parse error. This patch fixes it.
Later on, the Indexer tries to retrieve these values from
parseData.metadata, not from content.metadata (we try to avoid reading
too much data, so the content part of a segment is not accessed during
indexing).
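
To illustrate the failure mode described above, here is a minimal,
self-contained sketch. It is not Nutch code: java.util.Properties stands
in for the parse metadata, and the class name, the key string and the
readScore helper are all made up for the example. It only shows one way
a missing key could surface as an NPE on the reading side.

```java
import java.util.Properties;

public class MissingMetadataSketch {
    // Hypothetical key string; a stand-in for the real SCORE_KEY constant.
    static final String SCORE_KEY = "score";

    /** Mimics a reader that trusts the parse metadata to carry the score. */
    static float readScore(Properties parseMetadata) {
        // getProperty returns null for a missing key, and
        // Float.parseFloat(null) throws NullPointerException.
        return Float.parseFloat(parseMetadata.getProperty(SCORE_KEY));
    }

    public static void main(String[] args) {
        // Case 1: the parse plugin copied the score into the parse metadata.
        Properties copied = new Properties();
        copied.setProperty(SCORE_KEY, "1.0");
        System.out.println("score = " + readScore(copied));

        // Case 2: the parse failed, so the metadata is empty.
        Properties emptyAfterParseError = new Properties();
        try {
            readScore(emptyAfterParseError);
        } catch (NullPointerException e) {
            System.out.println("NPE: score missing from parse metadata");
        }
    }
}
```

Only the documents whose metadata lacks the key take the second path,
which would match an NPE that shows up for just a few documents.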
> I run fetcher with parse.
> This NPE happens for only a few documents and that is the problem :)
OK, then I think I know what is going on. Please try this patch - it's
the same problem, actually: these few documents failed to parse, and we
got an empty parseData. In this case that also means empty metadata, so
there is no segment name or score in parseData.metadata.
Please test it and report whether it helps.
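
The idea behind the patch can be sketched on its own, outside Nutch, as
follows. Again java.util.Properties stands in for the parse metadata;
the class name, the key strings, the stamp helper and the sample segment
name are assumptions made for the example, not the real API.

```java
import java.util.Properties;

public class StampMetadataSketch {
    // Hypothetical key strings; stand-ins for the real constants.
    static final String SEGMENT_NAME_KEY = "segment";
    static final String SCORE_KEY = "score";

    /**
     * Writes segment name and score into the parse metadata itself,
     * regardless of what the parse plugin did or did not copy.
     */
    static Properties stamp(Properties parseMetadata, String segmentName, float score) {
        parseMetadata.setProperty(SEGMENT_NAME_KEY, segmentName);
        parseMetadata.setProperty(SCORE_KEY, Float.toString(score));
        return parseMetadata;
    }

    public static void main(String[] args) {
        // An empty metadata object, as a failed parse might leave behind.
        Properties empty = new Properties();
        stamp(empty, "example-segment", 1.0f);

        // Both values can now be read back without touching the content.
        String segment = empty.getProperty(SEGMENT_NAME_KEY);
        float score = Float.parseFloat(empty.getProperty(SCORE_KEY));
        System.out.println(segment + " " + score);
    }
}
```

Because the values are written by the caller rather than by the parse
plugin, a reader no longer depends on the plugin having copied them.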
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Index: Fetcher.java
===================================================================
--- Fetcher.java (revision 367099)
+++ Fetcher.java (working copy)
@@ -223,6 +223,9 @@
parse.getData().getMetadata().setProperty(SIGNATURE_KEY,
StringUtil.toHexString(signature));
datum.setSignature(signature);
}
+ // add segment name and score to parseData metadata
+ parse.getData().getMetadata().setProperty(SEGMENT_NAME_KEY, segmentName);
+ parse.getData().getMetadata().setProperty(SCORE_KEY, Float.toString(datum.getScore()));
try {
output.collect