On 2010-06-29 08:23, Torsten Krah wrote:
> Hi,
>
> Before 1.0 it was possible to have dynamic attributes in Lucene, because
> the API let you do such things (direct access to the Lucene document).
>
> How do I do the same in 1.0 and later? In 1.1 the NutchDocument API only
> knows name and value, but if I don't know the name in advance (dynamic
> attributes from HtmlParser, meta tag indexing), how can I still index
> them? Or is this impossible with the Lucene backend now?
It's still possible to do this, but it's undocumented...
Here's a quick howto: in your IndexingFilter, whenever you want to add a
previously undeclared field, you need to declare its Lucene options at the
per-document level, like this:
    String fieldName = "myMetaField";
    String value = "undeclared meta value";
    Metadata meta = nutchDocument.getDocumentMeta();
    meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
             LuceneConstants.STORE_YES);
    meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
             LuceneConstants.INDEX_TOKENIZED);
    //... etc, add those field options that you want
    // and add the field value
    nutchDocument.add(fieldName, value);
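
Since your field names are only known at parse time, the same pattern would
probably sit inside a loop over the parsed meta tags. Below is a rough,
self-contained sketch of that loop. It does not use the Nutch classes: the
maps stand in for the document metadata and the document fields, the
constant values are placeholders rather than the real LuceneConstants
strings, and the names DynamicFieldSketch, add and addDynamicFields are
made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicFieldSketch {
    // Placeholder values (assumptions); the real strings are defined by
    // Nutch's LuceneConstants and are not reproduced here.
    static final String FIELD_PREFIX = "field.";
    static final String STORE_YES = "store.yes";
    static final String INDEX_TOKENIZED = "index.tokenized";

    // Multi-valued put, standing in for a Metadata-style add(name, value).
    static void add(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // For every meta tag found at parse time, declare the field options
    // under the prefixed name, then add the field value itself.
    static void addDynamicFields(Map<String, String> metaTags,
                                 Map<String, List<String>> docMeta,
                                 Map<String, List<String>> docFields) {
        for (Map.Entry<String, String> tag : metaTags.entrySet()) {
            String fieldName = tag.getKey();
            add(docMeta, FIELD_PREFIX + fieldName, STORE_YES);
            add(docMeta, FIELD_PREFIX + fieldName, INDEX_TOKENIZED);
            add(docFields, fieldName, tag.getValue());
        }
    }

    public static void main(String[] args) {
        // Hypothetical meta tags, as a parser might hand them over.
        Map<String, String> metaTags = new LinkedHashMap<>();
        metaTags.put("author", "Jane Doe");
        metaTags.put("keywords", "nutch, lucene");

        Map<String, List<String>> docMeta = new LinkedHashMap<>();
        Map<String, List<String>> docFields = new LinkedHashMap<>();
        addDynamicFields(metaTags, docMeta, docFields);

        System.out.println(docMeta);
        System.out.println(docFields);
    }
}
```

In a real IndexingFilter the two add(docMeta, ...) calls would become the
meta.add(...) calls shown above, and add(docFields, ...) would become
nutchDocument.add(fieldName, value).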
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com