On 2010-06-29 08:23, Torsten Krah wrote:
> Hi,
> 
> Before 1.0 it was possible to have dynamic attributes in Lucene, because
> the API gave you direct access to the Lucene document.
> 
> How can I do the same in 1.0 and later? In 1.1 the NutchDocument API only
> knows name and value, so if I don't know the field name in advance (a
> dynamic attribute from HtmlParser, meta tag indexing), how can I still
> index it? Or is this impossible with the Lucene backend now?

It's still possible to do this, but it's undocumented...

Here's a quick howto: in your IndexingFilter, whenever you want to add a
previously undeclared field you need to declare its Lucene options on a
per-document level like this:

        String fieldName = "myMetaField";
        String value = "undeclared meta value";
        Metadata meta = nutchDocument.getDocumentMeta();
        meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                 LuceneConstants.STORE_YES);
        meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                 LuceneConstants.INDEX_TOKENIZED);
        //... etc, add those field options that you want
        // and add the field value
        nutchDocument.add(fieldName, value);
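
For the case in the question, where the field names are only known at
parse time, the same two steps can be applied in a loop over whatever meta
tags were extracted. The following is only a rough, self-contained sketch
of that pattern, not Nutch code: MultiMap is a hypothetical stand-in for
both the document metadata and the NutchDocument, the three constants are
placeholder strings rather than the real LuceneConstants values, and
addDynamicFields is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicFieldsSketch {
    // Placeholders only (assumption): in Nutch these would be
    // LuceneConstants.FIELD_PREFIX, STORE_YES and INDEX_TOKENIZED.
    // The real string values are not reproduced here.
    public static final String FIELD_PREFIX = "field-prefix.";
    public static final String STORE_YES = "store-yes";
    public static final String INDEX_TOKENIZED = "index-tokenized";

    // Hypothetical stand-in for a multi-valued name/value container,
    // used here in place of both Metadata and NutchDocument.
    public static class MultiMap {
        private final Map<String, List<String>> data = new LinkedHashMap<>();

        public void add(String name, String value) {
            data.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        public List<String> get(String name) {
            List<String> values = data.get(name);
            return values == null ? Collections.<String>emptyList() : values;
        }
    }

    // For every parsed meta tag: first declare the per-document field
    // options under the prefixed name, then add the field value itself.
    public static void addDynamicFields(Map<String, String> metaTags,
                                        MultiMap docMeta, MultiMap doc) {
        for (Map.Entry<String, String> tag : metaTags.entrySet()) {
            String fieldName = tag.getKey();
            docMeta.add(FIELD_PREFIX + fieldName, STORE_YES);
            docMeta.add(FIELD_PREFIX + fieldName, INDEX_TOKENIZED);
            doc.add(fieldName, tag.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> metaTags = new LinkedHashMap<>();
        metaTags.put("author", "someone");
        metaTags.put("keywords", "nutch, lucene");

        MultiMap docMeta = new MultiMap();
        MultiMap doc = new MultiMap();
        addDynamicFields(metaTags, docMeta, doc);

        System.out.println(docMeta.get(FIELD_PREFIX + "author"));
        System.out.println(doc.get("keywords"));
    }
}
```

In a real IndexingFilter the two MultiMap arguments would be replaced by
nutchDocument.getDocumentMeta() and nutchDocument itself, as in the
snippet above.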



-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
