Sorry to keep pressing, but I don't quite understand how the metadata is passed from the parse to the index. In my public ParseResult filter... I do this:

{code}
Parse parse = parseResult.get(content.getUrl());
metadata = parse.getData().getParseMeta();
metadata.add("filter_html_data", docTrans);
{code}

and then return parseResult:

{code}
return parseResult;
{code}

Is the data passed by reference into parseResult? I ask because when I try to retrieve it in public NutchDocument filter... by doing

{code}
String html_filter_data = parse.getData().getMeta("html_filter_data");
LOG.warn(html_filter_data);
if (html_filter_data != null) {
  LOG.warn("________________________Adding filter data_______________________");
  doc.add("html_filter_data", html_filter_data);
}
{code}

I never reach the add, because the variable html_filter_data is empty. Any ideas?

Thanks for your help

On 24 November 2009 at 12:27 "david.stu...@progressivealliance.co.uk" <david.stu...@progressivealliance.co.uk> wrote:

> I thought I did, but I thought that before I ran bin/nutch index (or solrindex) it
> would be stored somewhere. It doesn't seem to be getting to the doc.add bit, which
> makes me think the variable is empty.
> {code}
> public void addIndexBackendOptions(Configuration conf) {
>   LOG.warn("+_+_You called me _+_+");
>   LuceneWriter.addFieldOptions("html_filter_data", STORE.YES,
>       INDEX.UNTOKENIZED, conf);
> }
>
> public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>     CrawlDatum datum, Inlinks inlinks) throws IndexingException {
>   LOG.warn("________________________FILTER_______________________");
>   String html_filter_data = parse.getData().getMeta("html_filter_data");
>   if (html_filter_data != null) {
>     LOG.warn("________________________Adding filter data_______________________");
>     doc.add("html_filter_data", html_filter_data);
>   }
>   return doc;
> }
> {code}
> On 24 November 2009 at 12:05 Andrzej Bialecki <a...@getopt.org> wrote:
>
> > david.stu...@progressivealliance.co.uk wrote:
> > > Hi All,
> > >
> > > I think I am just about finished with my plugin (Nutch 1.0), which adds extra
> > > metadata during parsing. The problem I am having is that it doesn't seem to
> > > be adding the data to the system
> > > (checking via Luke or readseg). I looked in
> > > the wiki, but it seems to be for 0.9 and the syntax looks different.
> > >
> > > {code}
> > > public ParseResult filter(Content content, ParseResult parseResult,
> > >     HTMLMetaTags metaTags, DocumentFragment doc) {
> > >   Metadata metadata = new Metadata();
> > >   // parse the content
> > >   DocumentFragment root;
> > >   String docTrans;
> > >   try {
> > >     byte[] contentInOctets = content.getContent();
> > >     String input = new String(contentInOctets);
> > >     XSLTSimpleTransform DocTransform = new XSLTSimpleTransform();
> > >     docTrans = DocTransform.doTransform(input);
> > >     Parse parse = parseResult.get(content.getUrl());
> > >     metadata = parse.getData().getParseMeta();
> > >     metadata.add("filter_html_data", docTrans);
> > >   } catch (Exception e) {
> > >     e.printStackTrace(LogUtil.getWarnStream(LOG));
> > >   }
> > >
> > >   return parseResult;
> > > }
> > > {code}
> >
> > Did you declare that you are adding this field in
> > IndexingFilter.addIndexBackendOptions(..)? See how other indexing
> > plugins do this.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> > Information Retrieval, Semantic Web
> > Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
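One detail stands out when the snippets in this thread are compared: the parse filter writes its value under the key filter_html_data, while the indexing filter reads (and registers in addIndexBackendOptions) the key html_filter_data. On the reference question, in Nutch 1.x ParseData.getParseMeta() appears to return the Metadata instance the ParseData holds rather than a copy, so adding to it should change the Parse stored inside parseResult. The standalone sketch below illustrates both points with hypothetical stand-in classes: SimpleMetadata, SimpleParseData, parseFilter and HTML_FILTER_DATA_KEY are invented for illustration and are not Nutch APIs. A value added through the returned object is visible afterwards, but only when it is read back under the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Nutch's Metadata/ParseData, written only to
// illustrate reference semantics and key matching. NOT the real Nutch classes.
public class MetaKeyDemo {

  // Hypothetical shared constant, so the parse filter and the indexing
  // filter cannot drift apart on the key name.
  static final String HTML_FILTER_DATA_KEY = "html_filter_data";

  // Simplified: one value per key (real Nutch Metadata can hold several).
  static class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    void add(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }
  }

  static class SimpleParseData {
    private final SimpleMetadata parseMeta = new SimpleMetadata();

    // Returns the live object, not a copy, so callers can mutate it.
    SimpleMetadata getParseMeta() { return parseMeta; }

    String getMeta(String name) { return parseMeta.get(name); }
  }

  // Mirrors the parse-filter step: fetch the metadata object and add to it.
  // Nothing is copied back; the change lands in 'data' itself.
  static void parseFilter(SimpleParseData data, String key, String docTrans) {
    SimpleMetadata metadata = data.getParseMeta();
    metadata.add(key, docTrans);
  }

  public static void main(String[] args) {
    SimpleParseData data = new SimpleParseData();

    // Written under one key, read under another: the lookup finds nothing.
    parseFilter(data, "filter_html_data", "<transformed/>");
    System.out.println(data.getMeta("html_filter_data"));   // prints "null"

    // Same constant on both sides: the value is visible via the original object.
    parseFilter(data, HTML_FILTER_DATA_KEY, "<transformed/>");
    System.out.println(data.getMeta(HTML_FILTER_DATA_KEY)); // prints "<transformed/>"
  }
}
```

Defining the key once (for example as a constant shared by the parse filter and the indexing filter in the same plugin) turns a silent null at indexing time into something the compiler can help catch.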