Stefan Groschupf wrote:


public class MyDocumentFactory extends DocumentFactoryImpl {
  public Document getDocument(String seg, long doc, FetcherOutput fo,
                              ParseText t, ParseData d) {

    Document result = super.getDocument(seg, doc, fo, t, d);

    // add my field
    result.add(Field.Keyword("myMetaField", d.get("myMetaField")));

    return result;
  }
}

So I have to provide my own DocumentFactory every time I want to add custom metadata? Sorry for being confused.


Why don't we do this instead:

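public Document getDocument(String seg, long doc, FetcherOutput fo,
                            ParseText t, ParseData d, Properties properties) {
  Document result = super.getDocument(seg, doc, fo, t, d);
  // loop through all properties: each key names a field, each value names the metadata entry to copy
  for (Enumeration e = properties.propertyNames(); e.hasMoreElements();) {
    String fieldName = (String) e.nextElement();
    String value = d.get(properties.getProperty(fieldName));
    if (value != null) { // skip metadata this document doesn't have
      result.add(Field.Keyword(fieldName, value));
    }
  }
  return result;
}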


Hmmm. This is perhaps easier for the casual user of this extension point, but it doesn't allow putting more logic inside the method. E.g., based on the metadata, I may need to do some processing inside getDocument() to decide which additional fields to add. Examples include, as before, language detection, keyphrase extraction, classification, extracting additional metadata for specific content types, etc.
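To make the difference concrete, here is a minimal, self-contained sketch. It is not Nutch code: BaseFactory, MyFactory and guessLanguage are hypothetical names, and both the document and the parse metadata are modeled as plain String maps instead of a Lucene Document and a Nutch ParseData. The subclass inspects the content type first and only then decides which fields to add, which is the kind of logic a fixed property-to-field mapping can't express.

```java
import java.util.HashMap;
import java.util.Map;

public class ConditionalFieldsDemo {

  // Hypothetical stand-in for DocumentFactoryImpl: documents and parse
  // metadata are both modeled as String-to-String maps.
  static class BaseFactory {
    Map<String, String> getDocument(Map<String, String> meta) {
      Map<String, String> doc = new HashMap<>();
      // Default behavior: copy the title if the parser found one.
      String title = meta.get("title");
      if (title != null) {
        doc.put("title", title);
      }
      return doc;
    }
  }

  // Subclass that looks at the metadata first and then decides which
  // extra fields to add, instead of copying a fixed list of keys.
  static class MyFactory extends BaseFactory {
    @Override
    Map<String, String> getDocument(Map<String, String> meta) {
      Map<String, String> doc = super.getDocument(meta);
      String type = meta.getOrDefault("Content-Type", "");
      if (type.startsWith("application/pdf")) {
        // Content-type-specific metadata: index the author for PDFs only.
        String author = meta.get("Author");
        if (author != null) {
          doc.put("author", author);
        }
      } else if (type.startsWith("text/html")) {
        // Derived field: computed from the text, not copied from metadata.
        doc.put("lang", guessLanguage(meta.getOrDefault("text", "")));
      }
      return doc;
    }

    // Toy placeholder for real language detection (hypothetical heuristic).
    static String guessLanguage(String text) {
      String padded = " " + text.toLowerCase() + " ";
      return padded.contains(" und ") ? "de" : "en";
    }
  }

  public static void main(String[] args) {
    Map<String, String> html = new HashMap<>();
    html.put("Content-Type", "text/html");
    html.put("title", "Beispiel");
    html.put("text", "Hund und Katze");
    System.out.println(new MyFactory().getDocument(html));
  }
}
```

Running main prints the HTML document's fields, including a lang value that was computed from the text rather than looked up under a configured key.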


--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
