OK, thanks for the patch.
I'll apply it tonight and, I promise :), let you know how it goes.
Gal.
On Mon, 2006-01-09 at 10:53 +0100, Andrzej Bialecki wrote:
> Gal Nitzan wrote:
>
> >Sorry :) no.
> >
>
> Hmm. ok. :) But I think that patch is needed anyway, because now we
> silently assume that parse plugins will always copy all Content metadata
> to ParseData.metadata, while it may not be the case - and it certainly
> does not happen if there is a parse error ... and this patch fixes it.
> Later on, Indexer tries to retrieve these values from
> parseData.metadata, and not from the content.metadata (because we try to
> avoid reading too much data, so the content part of a segment is not
> accessed during indexing).
>
> >I run fetcher with parse.
> >
> >This NPE happens for only a few documents and that is the problem :)
> >
>
> OK, then I think I know what is going on... Please try this patch -
> it's the same problem, actually: these few documents failed to parse,
> and we got an empty parseData - but in this case that also means empty
> metadata, which means neither segment name nor score in parseData.metadata.
>
> Please test and report if it helps.
>
> plain text document attachment (patch)
> Index: Fetcher.java
> ===================================================================
> --- Fetcher.java (revision 367099)
> +++ Fetcher.java (working copy)
> @@ -223,6 +223,9 @@
> parse.getData().getMetadata().setProperty(SIGNATURE_KEY, StringUtil.toHexString(signature));
> datum.setSignature(signature);
> }
> + // add segment name and score to parseData metadata
> + parse.getData().getMetadata().setProperty(SEGMENT_NAME_KEY, segmentName);
> + parse.getData().getMetadata().setProperty(SCORE_KEY, Float.toString(datum.getScore()));
>
> try {
> output.collect
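
For anyone following the thread, here is a minimal, self-contained sketch of the failure mode described above and of what the patch changes. It is not the actual Nutch code: java.util.Properties stands in for the parse metadata object, the key strings and the readScore/addFetchInfo helpers are hypothetical names made up for illustration, and the segment name is an arbitrary example value. The only point is that a reader which looks up the score in empty metadata gets null and fails, while metadata filled in by the fetcher itself does not depend on the parse plugin.

```java
import java.util.Properties;

public class ParseMetadataSketch {
    // Hypothetical key values; the real constants are defined in the fetcher code.
    static final String SEGMENT_NAME_KEY = "segment.name";
    static final String SCORE_KEY = "score";

    // Stand-in for the indexing side: it reads only the parse metadata,
    // never the original content metadata.
    static float readScore(Properties parseMetadata) {
        // getProperty returns null for a missing key, and
        // Float.parseFloat(null) throws NullPointerException.
        return Float.parseFloat(parseMetadata.getProperty(SCORE_KEY));
    }

    // Stand-in for the lines the patch adds: the fetcher writes the values
    // itself instead of relying on the parse plugin to copy them.
    static void addFetchInfo(Properties parseMetadata, String segmentName, float score) {
        parseMetadata.setProperty(SEGMENT_NAME_KEY, segmentName);
        parseMetadata.setProperty(SCORE_KEY, Float.toString(score));
    }

    public static void main(String[] args) {
        // A failed parse leaves us with empty metadata.
        Properties failedParse = new Properties();

        try {
            readScore(failedParse);
        } catch (NullPointerException e) {
            System.out.println("Without the extra properties: NullPointerException");
        }

        // With the fetcher filling in the values, the lookup succeeds
        // even though the parse itself produced nothing.
        addFetchInfo(failedParse, "example-segment", 1.0f);
        System.out.println("With the extra properties: score = " + readScore(failedParse));
    }
}
```

Setting the values in the fetcher, after the parse step, covers both the successful and the failed parse with the same code path, which is why only a few documents showed the problem before.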