I thought I did. I also thought the data would be stored somewhere before I
ran bin/nutch index (or solrindex). It doesn't seem to be getting to the
doc.add bit, which makes me think the variable is empty.
{code}
    public void addIndexBackendOptions(Configuration conf) {
      LOG.warn("+_+_You called me _+_+");
      LuceneWriter.addFieldOptions("html_filter_data", STORE.YES, INDEX.UNTOKENIZED, conf);
    }

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
        CrawlDatum datum, Inlinks inlinks) throws IndexingException {
      LOG.warn("________________________FILTER_______________________");
      String html_filter_data = parse.getData().getMeta("html_filter_data");
      if (html_filter_data != null) {
          LOG.warn("________________________Adding filter data_______________________");
          doc.add("html_filter_data", html_filter_data);
      }
      return doc;
    }
{code}
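One thing that may be worth checking: the parse filter quoted below stores its value under the key "filter_html_data", while the indexing filter above looks up "html_filter_data". If that difference is not intentional, the lookup would return null and the doc.add branch would never run. The sketch below illustrates the idea of sharing one key constant between the two steps. It uses a plain java.util.Map as a stand-in for Nutch's parse metadata, and the class, method, and constant names are hypothetical, not part of Nutch.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaKeySketch {
    // Hypothetical shared constant: both steps refer to the same key,
    // so the writer and the reader cannot drift apart.
    static final String HTML_FILTER_DATA_KEY = "html_filter_data";

    // Stand-in for the parse filter: stores the transformed document.
    static void parseStep(Map<String, String> parseMeta, String transformed) {
        parseMeta.put(HTML_FILTER_DATA_KEY, transformed);
    }

    // Stand-in for the indexing filter: reads the value back (null if absent).
    static String indexStep(Map<String, String> parseMeta) {
        return parseMeta.get(HTML_FILTER_DATA_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();
        parseStep(parseMeta, "<p>transformed</p>");

        // Lookup with the shared key finds the stored value.
        System.out.println(indexStep(parseMeta));

        // Lookup with a differently spelled key finds nothing.
        System.out.println(parseMeta.get("filter_html_data"));
    }
}
```

Run as written, the first lookup prints the stored string and the second prints null, which is the same symptom as a doc.add branch that is never reached.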
On 24 November 2009 at 12:05 Andrzej Bialecki <a...@getopt.org> wrote:

> david.stu...@progressivealliance.co.uk wrote:
> >   Hi All,
> > 
> > I think I have just about finished my plugin (Nutch 1.0), which adds extra
> > metadata during parsing. The problem I am having is that it doesn't seem to
> > be adding the data to the system (checked via Luke or readseg). I looked at
> > the wiki, but it seems to be for 0.9 and the syntax looks different.
> > 
> > {code}       
> >   public ParseResult filter(Content content, ParseResult parseResult,
> >       HTMLMetaTags metaTags, DocumentFragment doc) {
> >       Metadata metadata = new Metadata();
> >       // parse the content
> >       DocumentFragment root;   
> >       String docTrans;
> >       try {
> >         byte[] contentInOctets = content.getContent();
> >         String input = new String(contentInOctets);
> >         XSLTSimpleTransform DocTransform = new XSLTSimpleTransform();
> >         docTrans = DocTransform.doTransform(input);
> >         Parse parse = parseResult.get(content.getUrl());
> >         metadata = parse.getData().getParseMeta();
> >         metadata.add("filter_html_data", docTrans);
> > 
> >       } catch (Exception e) {
> >         e.printStackTrace(LogUtil.getWarnStream(LOG));
> >       }
> >      
> >     return parseResult;
> >   }
> > {code}
> 
> Did you declare that you are adding this field in the 
> IndexingFilter.addIndexBackendOptions(..) ? See how other indexing 
> plugins do this.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
