Jukka -

The ParserPostProcessor creates a TeeHandler that sends output to the
caller's handler and, in addition, its own WriteOutContentHandler.  So, if I
understand correctly, if the caller's handler is also using a
WriteOutContentHandler or equivalent, then the full text is being saved in
two StringWriters, no?
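
To make the concern concrete, here is a minimal sketch that uses only the
JDK's SAX classes. TextCollector and Tee are hypothetical stand-ins for
WriteOutContentHandler and TeeHandler, not Tika's actual implementations,
and feed() is just a helper for the demo:

```java
import java.io.StringWriter;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TeeMemoryDemo {

    /** Hypothetical stand-in for WriteOutContentHandler: buffers all character data. */
    static class TextCollector extends DefaultHandler {
        final StringWriter writer = new StringWriter();

        @Override
        public void characters(char[] ch, int start, int length) {
            writer.write(ch, start, length);
        }
    }

    /** Hypothetical stand-in for TeeHandler: forwards character events to two handlers. */
    static class Tee extends DefaultHandler {
        private final ContentHandler first;
        private final ContentHandler second;

        Tee(ContentHandler first, ContentHandler second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            first.characters(ch, start, length);
            second.characters(ch, start, length);
        }
    }

    /** Demo helper: pushes one chunk of text through a handler. */
    static void feed(ContentHandler handler, String text) {
        char[] chars = text.toCharArray();
        try {
            handler.characters(chars, 0, chars.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TextCollector callersHandler = new TextCollector();   // the caller's buffer
        TextCollector internalHandler = new TextCollector();  // the post-processor's buffer
        feed(new Tee(callersHandler, internalHandler), "the full document text");

        // Both writers now hold their own complete copy of the text.
        System.out.println(callersHandler.writer.getBuffer().length());
        System.out.println(internalHandler.writer.getBuffer().length());
    }
}
```

If the real classes behave like these stand-ins, each buffer grows with the
document, so the text is held twice for as long as both handlers are alive.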

Regards,
Keith




Jukka Zitting wrote:
> 
> Hi,
> 
> On 10/22/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>> I was thinking that as a ContentHandler, the user could choose to place all
>> the data in memory, and there would be a single copy of the full text.
>>
>> As the ParserPostProcessor, if I understand correctly, the user is bound to
>> consume the extra memory if using the AutoDetectParser, and we are probably
>> consuming twice as much memory to do so, since we would be saving the full
>> text in two different string writers.
> 
> I don't quite follow you. AutoDetectParser never reads the full
> content into memory (of course unless an underlying parser does it).
> 
>> So I was thinking of moving the existing logic from the ParserPostProcessor
>> to a ContentHandler implementation.
> 
> Sure, why not.
> 
> If I understand you correctly, you'd prefer something like this:
> 
>     Parser parser = ...;
>     Metadata metadata = new Metadata();
>     parser.parse(..., new FullTextContentHandler(metadata), metadata);
> 
> over:
> 
>     Parser parser = new ParserPostProcessor(...);
>     Metadata metadata = new Metadata();
>     parser.parse(..., new DefaultHandler(), metadata);
> 
> BR,
> 
> Jukka Zitting
> 
> 
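
For reference, the ContentHandler-based alternative discussed above could
look roughly like the sketch below. It is only an illustration: the class
name, the "fulltext" key, and the use of a plain Map in place of Tika's
Metadata are all assumptions, chosen so the example depends on nothing but
the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

public class FullTextHandlerSketch extends DefaultHandler {

    /** Hypothetical property name; the real key would be decided in Tika. */
    static final String FULLTEXT_KEY = "fulltext";

    // A plain Map stands in for Tika's Metadata object (assumption).
    private final Map<String, String> metadata;
    private final StringBuilder text = new StringBuilder();

    public FullTextHandlerSketch(Map<String, String> metadata) {
        this.metadata = metadata;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Single in-memory copy of the text, kept only because the caller
        // chose this handler.
        text.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // Publish the collected text once parsing has finished.
        metadata.put(FULLTEXT_KEY, text.toString());
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        FullTextHandlerSketch handler = new FullTextHandlerSketch(metadata);

        char[] chunk = "some extracted text".toCharArray();
        handler.characters(chunk, 0, chunk.length);
        handler.endDocument();

        System.out.println(metadata.get(FULLTEXT_KEY));
    }
}
```

With this shape, a caller who does not want the full text in memory simply
passes a different handler, and no second buffer is created behind the
scenes.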

-- 
View this message in context: 
http://www.nabble.com/Fulltext-Metadata-Property--tf4643633.html#a13352591
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
