On 08/12/2011 11:13, Nick Burch wrote:
> On Wed, 7 Dec 2011, Andrzej Bialecki wrote:
>> However, I'd like to have an option to avoid recursing into compound
>> documents, while still being able to process nested archives (like
>> zip, tgz, etc). Is there any easy way to express this preference? I
>> thought about using the type of handler passed to the
>> RecursiveParser.parse(..) to decide when to stop recursing, but I
>> noticed that in both cases (embedded components and entries in
>> archives) an EmbeddedContentHandler is passed to the parse(...) method.
>
> I'd suggest you just put the logic into your nested parser. What I'd
> suggest is that you look at the mimetype of the source document, and use
> that to decide if you supply the recursing parser or not on the parse
> context.

I guess that could work, but it would be very messy: I would have to keep a list of all potentially interesting MIME types in my code, which is difficult to maintain.

It would be much better if the parent parser passed a token in the metadata, basically saying "this is invoked from an XXXParser", so that I could detect that it was the PackageParser that invoked the method, and act accordingly.
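
A rough sketch of that idea, in plain Java with a Map standing in for the metadata object. This is not existing Tika code: the key name X-Invoked-By-Parser and the RecursionPolicy class are hypothetical, and the sketch assumes the parent parser would write its own simple class name under that key before handing off the embedded document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RecursionPolicy {
    // Hypothetical metadata key; Tika does not define such a token.
    static final String INVOKED_BY = "X-Invoked-By-Parser";

    // Parent parsers whose entries we still want to recurse into
    // (archive-style containers). The name is matched as a plain string.
    static final Set<String> CONTAINER_PARSERS =
            new HashSet<>(Arrays.asList("PackageParser"));

    // Recurse only when the parent identified itself as a container parser.
    // A missing token means "unknown parent", so we do not recurse.
    static boolean shouldRecurse(Map<String, String> metadata) {
        String parent = metadata.get(INVOKED_BY);
        return parent != null && CONTAINER_PARSERS.contains(parent);
    }

    public static void main(String[] args) {
        Map<String, String> zipEntry = new HashMap<>();
        zipEntry.put(INVOKED_BY, "PackageParser");

        Map<String, String> embeddedObject = new HashMap<>();
        embeddedObject.put(INVOKED_BY, "OfficeParser"); // example name only

        System.out.println(shouldRecurse(zipEntry));       // prints "true"
        System.out.println(shouldRecurse(embeddedObject)); // prints "false"
    }
}
```

With something like this, the only list to maintain is the short set of container parsers, instead of every MIME type that might contain embedded components.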


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
