On Tue, Aug 30, 2011 at 4:30 PM, Jukka Zitting <[email protected]> wrote:
>> I think Tika.parseToString (static sugar method) closes the
>> InputStream for you, while the Parser.parse method does not?
>> Kinda confusing!
>
> It is, but there's a design behind this. :-) The basic idea behind
> resource handling in Tika is that whoever opens a stream or another
> resource is also responsible for properly closing or releasing it.
> This (I think) is also pretty well documented in Tika's key
> interfaces.

I think that's a good approach in general (if you open it, you close
it). It does look like it's well documented, which is great.

> The exception to this rule is the Tika.parseToString(InputStream,
> Metadata) method. The reason for making an exception here is that the
> Tika facade class was designed primarily for convenience and to
> minimize the amount of code that the consumer of the API needs to
> write. The proper resource management pattern in this case would have
> been:
>
> InputStream stream = ...;
> try {
> return tika.parseToString(stream, ...);
> } finally {
> stream.close();
> }
>
> However, since in this specific case the client application hardly
> ever needs to use the stream for anything else and since in pretty
> much all cases the stream in question is constructed right when the
> parseToString call is made, it makes more sense for the
> parseToString() method to take care of closing the stream. The result
> is that the above code can be reduced to:
>
> return tika.parseToString(..., ...);

I can appreciate the motivation here, i.e. "convenience trumps
consistency".

Still, this inconsistency can lead to confusion and to bugs, but since
only one API is "the exception" I think it's OK?
But maybe we can beef up its javadocs a bit, saying "NOTE: unlike all
other APIs parsing from an InputStream, this API closes the incoming
InputStream for you for convenience" or something?
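
To make the difference concrete, here is a rough, self-contained sketch
of the two contracts. Everything in it is a hypothetical stand-in, not
Tika code: CloseTrackingStream, parseLeavingOpen and parseAndClose only
mimic "caller closes" (the Parser.parse style) versus "callee closes"
(the Tika.parseToString style), so that the behavior can be checked
without Tika on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Tika): remembers whether close() was called.
class CloseTrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    CloseTrackingStream(String text) {
        super(text.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class StreamOwnershipDemo {
    // Drains the stream into a String; never closes it.
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Stand-in for the general rule: the caller opened the stream, so the
    // caller closes it. This method leaves the stream open.
    static String parseLeavingOpen(InputStream in) throws IOException {
        return readAll(in);
    }

    // Stand-in for the convenience exception: the method closes the stream
    // itself, even if reading fails.
    static String parseAndClose(InputStream in) throws IOException {
        try {
            return readAll(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // General rule: the caller wraps the call and closes the stream.
        CloseTrackingStream first = new CloseTrackingStream("hello");
        try {
            System.out.println(parseLeavingOpen(first));
        } finally {
            first.close();
        }

        // Convenience exception: no try/finally needed at the call site.
        CloseTrackingStream second = new CloseTrackingStream("world");
        System.out.println(parseAndClose(second));
        System.out.println("second closed by callee: " + second.closed);
    }
}
```

The bug risk I have in mind is the mix-up: code written against the
"callee closes" method and later switched to the "caller closes" one
will silently leak the stream, which is why a loud javadoc note seems
worth it.
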
Mike McCandless
http://blog.mikemccandless.com