Figured it out: the issue was that I was not explicitly declaring the metadata
field in my indexing filter. I had to add the following:

private static final Collection<WebPage.Field> FIELDS =
    new HashSet<WebPage.Field>();

static {
   FIELDS.add(WebPage.Field.METADATA);
}

@Override
public Collection<WebPage.Field> getFields() {
   return FIELDS;
}

I don't think this is well documented; at least, I did not realize for some
time that it was necessary. Previous plugins I used were able to read data
from the "page" variable without explicitly declaring fields, albeit only
basic data (baseUrl), and I was basing my plugin on another plugin that was
also doing this incorrectly.
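
For anyone hitting the same thing, here is a rough, self-contained sketch of what I believe was going on. It does not use the Nutch classes: Field, Page and load() are hypothetical stand-ins I made up for illustration, and it assumes the job only populates the page fields that were requested. Under that assumption, a filter that never asks for the metadata field sees an empty map even though the data is in the database.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FieldLoadingSketch {

    // Hypothetical stand-in for WebPage.Field; NOT the real Nutch enum.
    enum Field { BASE_URL, METADATA }

    // Hypothetical stand-in for the "page" object handed to a filter.
    static final class Page {
        String baseUrl;
        final Map<String, String> metadata = new HashMap<>();
    }

    // Simulates a store that reads only the requested fields (an assumption
    // about the behaviour, not the actual Nutch or Gora code).
    static Page load(Set<Field> requested) {
        Page page = new Page();
        if (requested.contains(Field.BASE_URL)) {
            page.baseUrl = "http://example.com/";
        }
        if (requested.contains(Field.METADATA)) {
            page.metadata.put("myKey", "myValue");
        }
        return page;
    }

    public static void main(String[] args) {
        // Metadata not requested: the map stays empty.
        Page without = load(EnumSet.of(Field.BASE_URL));
        System.out.println("metadata empty: " + without.metadata.isEmpty());

        // Metadata requested: the stored value is visible.
        Page with = load(EnumSet.of(Field.BASE_URL, Field.METADATA));
        System.out.println("myKey = " + with.metadata.get("myKey"));
    }
}
```

If that assumption holds, it would also explain why baseUrl was readable in my earlier plugins: something else had presumably already requested that field.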
