Hi, all.  I'm new here, so if I don't know what I'm talking about, feel free
to correct me. :)

It seems to me that options going into the parser are logically different
from metadata coming out of the parser, and that to maximize the code's
cohesion (see http://en.wikipedia.org/wiki/Cohesion_%28computer_science%29),
it would be preferable to express them as two different objects.

Also, if the metadata is the only output of the parser (as it appears to be
in the use case), why not have the parser create the metadata object itself,
and return it as the return value?  This would seem like a more natural
interface.

So, using this approach, the code would look something like this:

InputStream stream = ...;
ParseOptions parseOptions = ...;
SomeTikaInterface parser = new SomeTikaClass();
Metadata metadata = parser.extractMetadata(stream, parseOptions);...

... or, alternatively, the ParseOptions might be used to instantiate the
parser instead of being passed to the extractMetadata() method.
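
To make the two variants concrete, here is a rough, self-contained sketch. Every type in it (ParseOptions, Metadata, MetadataExtractor, ByteCountingExtractor, ConfiguredExtractor) and the "bytes-read" key are hypothetical names invented for illustration, not existing Tika classes, and the "parser" only counts bytes so that the example stays runnable:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical input-side object: settings that go INTO the parser.
final class ParseOptions {
    private final int maxBytes;
    ParseOptions(int maxBytes) { this.maxBytes = maxBytes; }
    int maxBytes() { return maxBytes; }
}

// Hypothetical output-side object: values that come OUT of the parser.
final class Metadata {
    private final Map<String, String> values = new LinkedHashMap<>();
    void set(String name, String value) { values.put(name, value); }
    String get(String name) { return values.get(name); }
}

// Variant 1: options are passed per call, and the parser creates and
// returns the metadata object itself.
interface MetadataExtractor {
    Metadata extractMetadata(InputStream stream, ParseOptions options) throws IOException;
}

// Toy implementation (an assumption for the sketch): it records how many
// bytes it read, stopping at the limit given in the options.
final class ByteCountingExtractor implements MetadataExtractor {
    @Override
    public Metadata extractMetadata(InputStream stream, ParseOptions options) throws IOException {
        int count = 0;
        while (count < options.maxBytes() && stream.read() != -1) {
            count++;
        }
        Metadata metadata = new Metadata();
        metadata.set("bytes-read", Integer.toString(count));
        return metadata;
    }
}

// Variant 2: options are supplied once, when the parser is instantiated.
final class ConfiguredExtractor {
    private final MetadataExtractor delegate = new ByteCountingExtractor();
    private final ParseOptions options;
    ConfiguredExtractor(ParseOptions options) { this.options = options; }
    Metadata extractMetadata(InputStream stream) throws IOException {
        return delegate.extractMetadata(stream, options);
    }
}

final class OptionsVsMetadataSketch {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Variant 1: options travel with each call.
        Metadata perCall = new ByteCountingExtractor()
                .extractMetadata(new ByteArrayInputStream(data), new ParseOptions(3));
        System.out.println(perCall.get("bytes-read"));

        // Variant 2: options are fixed on the parser instance.
        Metadata perInstance = new ConfiguredExtractor(new ParseOptions(100))
                .extractMetadata(new ByteArrayInputStream(data));
        System.out.println(perInstance.get("bytes-read"));
    }
}
```

In both variants the caller never constructs a Metadata object, and nothing the caller passes in is modified by the parser. Which variant fits better probably depends on whether the options are expected to change from one document to the next.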

- Keith



Jukka Zitting wrote:
> 
> Hi,
> 
> On 8/25/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
>> On 8/24/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
>> > ...Extract metadata:
>> >
>> >     InputStream stream = ...;
>> >     Metadata metadata = new Metadata();
>> >     SomeTikaInterface parser = new SomeTikaClass();
>> >     parser.extractMetadata(stream, metadata);...
>>
>> Maybe this (and extractContent() as well) need an additional
>> TikaParseOptions parameter that sets options just for this parsing
>> call?
> 
> Good point, though we could also pass all such options as a part of
> the metadata argument. If the options affect just this one document,
> then I would argue that those options might as well be a part of the
> document-specific metadata.
> 
> More generic options, like the XML parser options to use when parsing
> application/xml documents, should probably be handled as JavaBean
> properties of the instantiated parser objects.
> 
> BR,
> 
> Jukka Zitting
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Tika-use-cases-tf4287938.html#a12596742
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
