Sorry, I meant it doesn't get to doc.add.
David
On 24 Nov 2009, at 11:27, "david.stu...@progressivealliance.co.uk" <david.stu...@progressivealliance.co.uk> wrote:
I thought I did, but I assumed that before I ran bin/nutch index (or
solrindex) the value would already be stored somewhere. It does seem to be
getting to the doc.add bit, which makes me think the variable is empty.
{code}
public void addIndexBackendOptions(Configuration conf) {
  LOG.warn("+_+_You called me _+_+");
  LuceneWriter.addFieldOptions("html_filter_data", STORE.YES,
      INDEX.UNTOKENIZED, conf);
}

public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
  LOG.warn("________________________FILTER_______________________");
  String html_filter_data = parse.getData().getMeta("html_filter_data");
  if (html_filter_data != null) {
    LOG.warn("________________________Adding filter data_______________________");
    doc.add("html_filter_data", html_filter_data);
  }
  return doc;
}
{code}
On 24 November 2009 at 12:05 Andrzej Bialecki <a...@getopt.org> wrote:
> david.stu...@progressivealliance.co.uk wrote:
> > Hi All,
> >
> > I think I have just about finished my plugin (Nutch 1.0), which adds extra
> > metadata during parsing. The problem I am having is that it doesn't seem to
> > be adding the data to the system (checked via Luke or readseg). I looked at
> > the wiki, but it seems to be for 0.9 and the syntax looks different.
> >
> > {code}
> > public ParseResult filter(Content content, ParseResult parseResult,
> >     HTMLMetaTags metaTags, DocumentFragment doc) {
> >   Metadata metadata = new Metadata();
> >   // parse the content
> >   DocumentFragment root;
> >   String docTrans;
> >   try {
> >     byte[] contentInOctets = content.getContent();
> >     String input = new String(contentInOctets);
> >     XSLTSimpleTransform DocTransform = new XSLTSimpleTransform();
> >     docTrans = DocTransform.doTransform(input);
> >     Parse parse = parseResult.get(content.getUrl());
> >     metadata = parse.getData().getParseMeta();
> >     metadata.add("filter_html_data", docTrans);
> >
> >   } catch (Exception e) {
> >     e.printStackTrace(LogUtil.getWarnStream(LOG));
> >   }
> >
> >   return parseResult;
> > }
> > {code}
>
> Did you declare that you are adding this field in the
> IndexingFilter.addIndexBackendOptions(..) ? See how other indexing
> plugins do this.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
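
One thing that stands out when the two snippets are read side by side: the parse filter stores the value under "filter_html_data", while the indexing filter looks it up under "html_filter_data". If the lookup is by exact key, that alone could explain a null value and doc.add never being reached. Below is a minimal sketch of the effect, not Nutch code: a plain java.util.Map stands in for the parse metadata, and MetaKeyDemo, HTML_FILTER_KEY, parseStep and indexStep are hypothetical names made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaKeyDemo {
    // Hypothetical shared constant: one definition used by both the writer and the reader.
    static final String HTML_FILTER_KEY = "html_filter_data";

    // Stand-in for the parse filter: stores the transformed text under the given key.
    static void parseStep(Map<String, String> parseMeta, String key, String value) {
        parseMeta.put(key, value);
    }

    // Stand-in for the indexing filter: reports whether a field would be added,
    // i.e. whether the lookup under the given key returns a non-null value.
    static boolean indexStep(Map<String, String> parseMeta, String key) {
        String value = parseMeta.get(key);
        return value != null;
    }

    public static void main(String[] args) {
        // Keys as written in the thread: written under one name, read under another.
        Map<String, String> mismatched = new HashMap<>();
        parseStep(mismatched, "filter_html_data", "transformed");
        System.out.println(indexStep(mismatched, "html_filter_data")); // prints "false"

        // Same key on both sides via the shared constant.
        Map<String, String> matched = new HashMap<>();
        parseStep(matched, HTML_FILTER_KEY, "transformed");
        System.out.println(indexStep(matched, HTML_FILTER_KEY)); // prints "true"
    }
}
```

Keeping the key in a single constant that both plugins reference makes this kind of mismatch hard to introduce. Whether it is the whole story here is an assumption; the segment would also need to be re-parsed after the change so the new key is actually present.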