Re: Tika use cases

kbennett Mon, 10 Sep 2007 11:10:05 -0700

Jukka -

Thanks for responding.  What you said made perfect sense.  My domain
knowledge in this area is very limited, so I apologize in advance for that.


So a given parser (e.g. an MS Word document parser) might be instantiated at
its first use with "global" options, that is, options that apply to every
parse, and then each call to extractMetadata would use that instance and be
given file-specific options?  It might look something like this:

// In some parser factory class, named, say, ParserFactory:

private SomeTikaInterface msWordParser;

// Lazily creates the shared parser on first use.
SomeTikaInterface getMSWordParser() {
    if (msWordParser == null) {
        msWordParser = new MSWordParser( /* the global config options */ );
    }
    return msWordParser;
}

// ----------------- and then, where the actual parse needs to be done:

InputStream stream = ...;
Metadata metadata = new Metadata();
myParserFactoryInstance.getMSWordParser().extractMetadata(stream, metadata);

?

- Keith 


Jukka Zitting wrote:
> 
> 
> There are really two kinds of options that could affect the way a
> parser would work. The first kind are generic options like the maximum
> amount of memory or time to use, the location of any temporary files
> to be used, etc. that don't have any direct relation to the specific
> document being parsed. The other kind are parsing hints related to the
> parsed document, like the name (and extension) of the file that
> contains the document, any MIME headers associated with the document
> (for example from a HTTP request or an email body part), etc.
> 
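
One way to picture the two kinds of options described in the quoted passage is the rough Java sketch below. It is only an illustration under stated assumptions: ParserConfig, DocumentHints and SketchParser are hypothetical names, not Tika classes, and the "parse" does nothing more than count bytes. The generic options are fixed when the parser is constructed, while the document hints are passed with each call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: generic options, unrelated to any single document.
final class ParserConfig {
    final long maxBytes; // upper bound on how much input a parse may read

    ParserConfig(long maxBytes) {
        this.maxBytes = maxBytes;
    }
}

// Hypothetical: hints that describe one specific document.
final class DocumentHints {
    final String fileName;    // e.g. taken from the file system
    final String contentType; // e.g. taken from a MIME header

    DocumentHints(String fileName, String contentType) {
        this.fileName = fileName;
        this.contentType = contentType;
    }
}

// Hypothetical parser: configured once, reused for many documents.
final class SketchParser {
    private final ParserConfig config;

    SketchParser(ParserConfig config) {
        this.config = config;
    }

    // Records the hints and reads at most config.maxBytes bytes of input.
    Map<String, String> extractMetadata(InputStream in, DocumentHints hints) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("resourceName", hints.fileName);
        metadata.put("contentType", hints.contentType);

        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            while (total < config.maxBytes) {
                int want = (int) Math.min(buffer.length, config.maxBytes - total);
                int n = in.read(buffer, 0, want);
                if (n == -1) {
                    break; // end of stream reached before the limit
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        metadata.put("bytesRead", Long.toString(total));
        return metadata;
    }
}

class OptionsSketchDemo {
    public static void main(String[] args) {
        // Generic options are supplied once, at construction time.
        SketchParser parser = new SketchParser(new ParserConfig(1024));

        // Document hints are supplied with each call.
        InputStream stream = new ByteArrayInputStream(new byte[300]);
        DocumentHints hints = new DocumentHints("report.doc", "application/msword");
        System.out.println(parser.extractMetadata(stream, hints));
    }
}
```

Keeping the hints in their own object means one parser instance can be shared across documents without any per-file state leaking into it.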

-- 
View this message in context: 
http://www.nabble.com/Tika-use-cases-tf4287938.html#a12599158
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
