Hi, all. I'm new here, so if I don't know what I'm talking about, feel free to correct me. :)
It seems to me that options going into the parser are logically different from metadata coming out of the parser, and that to maximize the code's cohesion (see http://en.wikipedia.org/wiki/Cohesion_%28computer_science%29), it would be preferable to express them as two different objects.

Also, if the metadata is the only output of the parser (as it appears to be in the use case), why not have the parser create the metadata object itself and return it? That seems like a more natural interface. Using this approach, the code would look something like this:

    InputStream stream = ...;
    ParseOptions parseOptions = ...;
    SomeTikaInterface parser = new SomeTikaClass();
    Metadata metadata = parser.extractMetadata(stream, parseOptions);
    ...

...or, alternatively, the ParseOptions might be used to instantiate the parser instead of being passed to the extractMetadata() method.

- Keith

Jukka Zitting wrote:
>
> Hi,
>
> On 8/25/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
>> On 8/24/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
>> > ...Extract metadata:
>> >
>> > InputStream stream = ...;
>> > Metadata metadata = new Metadata();
>> > SomeTikaInterface parser = new SomeTikaClass();
>> > parser.extractMetadata(stream, metadata);...
>>
>> Maybe this (and extractContent() as well) need an additional
>> TikaParseOptions parameter that sets options just for this parsing
>> call?
>
> Good point, though we could also pass all such options as a part of
> the metadata argument. If the options affect just this one document,
> then I would argue that those options might as well be a part of the
> document-specific metadata.
>
> More generic options, like the XML parser options to use when parsing
> application/xml documents, should probably be handled as JavaBean
> properties of the instantiated parser objects.
>
> BR,
>
> Jukka Zitting
>

--
View this message in context: http://www.nabble.com/Tika-use-cases-tf4287938.html#a12596742
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
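For illustration, here is a minimal Java sketch of the interface Keith proposes, where the parse options go in as one object and a fresh metadata object comes back as the return value. The sketch also shows the alternative where the options are fixed when the parser is instantiated. All of the class names here (ParseOptions, Metadata, MetadataExtractor, FirstLineExtractor, ConfiguredExtractor, ExtractorDemo) are hypothetical stand-ins invented for this sketch, not Tika's actual API. SomeTikaInterface and SomeTikaClass in the thread are placeholders too. The `maxBytes` option and the "first line is the title" rule are likewise made up purely to give the parser something to do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical input-side object: options that control one parse call. */
final class ParseOptions {
    private final int maxBytes; // hypothetical option: upper bound on bytes read

    ParseOptions(int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    int getMaxBytes() {
        return maxBytes;
    }
}

/** Hypothetical output-side object: name/value metadata produced by a parser. */
final class Metadata {
    private final Map<String, String> values = new LinkedHashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }

    Map<String, String> asMap() {
        return Collections.unmodifiableMap(values);
    }
}

/** Options go in as their own argument; a new Metadata comes back. */
interface MetadataExtractor {
    Metadata extractMetadata(InputStream stream, ParseOptions options) throws IOException;
}

/** Toy extractor: treats the first line of the UTF-8 text as its "title". */
final class FirstLineExtractor implements MetadataExtractor {
    @Override
    public Metadata extractMetadata(InputStream stream, ParseOptions options) throws IOException {
        // Read at most maxBytes from the stream, stopping early at end of stream.
        byte[] buffer = new byte[options.getMaxBytes()];
        int total = 0;
        int n;
        while (total < buffer.length
                && (n = stream.read(buffer, total, buffer.length - total)) != -1) {
            total += n;
        }
        String text = new String(buffer, 0, total, StandardCharsets.UTF_8);
        int newline = text.indexOf('\n');

        Metadata metadata = new Metadata(); // created by the parser, not the caller
        metadata.set("title", newline >= 0 ? text.substring(0, newline) : text);
        metadata.set("bytesRead", Integer.toString(total));
        return metadata;
    }
}

/** The alternative: ParseOptions are supplied when the parser is instantiated. */
final class ConfiguredExtractor {
    private final ParseOptions options;
    private final MetadataExtractor delegate = new FirstLineExtractor();

    ConfiguredExtractor(ParseOptions options) {
        this.options = options;
    }

    Metadata extractMetadata(InputStream stream) throws IOException {
        return delegate.extractMetadata(stream, options);
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws IOException {
        InputStream stream = new ByteArrayInputStream(
                "Hello Tika\nbody text".getBytes(StandardCharsets.UTF_8));
        Metadata metadata = new FirstLineExtractor().extractMetadata(stream, new ParseOptions(1024));
        System.out.println(metadata.asMap()); // prints {title=Hello Tika, bytesRead=20}
    }
}
```

Keeping ParseOptions and Metadata as distinct types means the compiler rejects a call that passes output where input is expected. The trade-off Jukka raises still applies, though: per-document options then travel separately from the per-document metadata instead of inside it.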
