Hi,

On 9/23/07, kbennett <[EMAIL PROTECTED]> wrote:
> 1) I suggest we create a class to store the parsed document content, rather
> than just a Map.  The class could have convenience methods such as
> getStringContent(), and possibly hold onto a resource identifier that could
> be set.  We might also want to make the parsed values immutable.

This is what I had in mind for the Metadata instance in my proposed
Parser interface design. I think I have a reasonable evolutionary path
designed for transforming the current Parser interfaces to this
proposed model. Something like this:

    current: List<Content> getContents();
    TIKA-26: Map<String,Content> getContents();
    TIKA-n1: Map<String,Content> parse(InputStream stream);
    TIKA-n2: String parse(InputStream stream, Map<String,Content> metadata);
    TIKA-n3: String parse(InputStream stream, Metadata metadata);
    TIKA-n4: void parse(InputStream stream, ContentHandler handler,
                        Metadata metadata);
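
To illustrate why the last two steps can coexist during a transition, here is a rough sketch of an adapter that offers the String-returning signature (TIKA-n3) on top of the handler-based one (TIKA-n4). The Metadata class body, the Parser interface as written, and the StringParserAdapter name are all assumptions made up for this example; only the method signatures come from the list above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the proposed Metadata class.
class Metadata {
    private final Map<String, String> values = new HashMap<String, String>();

    void set(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }
}

// Assumed shape of the final interface (the TIKA-n4 step).
interface Parser {
    void parse(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException;
}

// Hypothetical adapter: exposes the TIKA-n3 style signature by collecting
// the characters() events of a handler-based parser into one String.
class StringParserAdapter {
    private final Parser parser;

    StringParserAdapter(Parser parser) { this.parser = parser; }

    String parse(InputStream stream, Metadata metadata)
            throws IOException, SAXException {
        final StringBuilder text = new StringBuilder();
        parser.parse(stream, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        }, metadata);
        return text.toString();
    }
}
```

With something like this, callers written against the String-returning method would keep working while parser implementations move to the handler-based method.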

> 2) If we make the Parser stateless, how will we deal with the chunking of
> large documents?

By making the parse method emit SAX events instead of returning a single
String that contains the text content of the entire document. A consumer
can then process the content incrementally as it arrives.
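
A minimal sketch of the idea, assuming a plain-text input decoded as UTF-8: the parser keeps no state in fields and forwards each chunk it reads as a SAX characters() event. The class names StreamingTextParser and CountingHandler and the 4096-character buffer size are hypothetical, and the Metadata argument is left out to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stateless parser: no fields, so one instance can be shared.
class StreamingTextParser {

    // Reads the stream in fixed-size chunks and forwards each chunk as a
    // characters() event instead of building one large String.
    void parse(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        handler.startDocument();
        int n;
        while ((n = reader.read(buffer)) != -1) {
            handler.characters(buffer, 0, n);
        }
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        CountingHandler handler = new CountingHandler();
        InputStream in = new ByteArrayInputStream(
                "hello world".getBytes(StandardCharsets.UTF_8));
        new StreamingTextParser().parse(in, handler);
        System.out.println(handler.count);
    }
}

// Example consumer that only counts characters; it never holds the whole
// document, so its memory use does not grow with the input size.
class CountingHandler extends DefaultHandler {
    long count = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        count += length;
    }
}
```

The chunking decision then belongs to the consumer: a handler can index, write out, or discard each chunk as it sees fit, and the parser itself never needs to remember where it was.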

BR,

Jukka Zitting
