Re: Help needed for special byte collecting input stream

Nick Burch Mon, 19 Oct 2015 07:10:28 -0700

On Sun, 18 Oct 2015, Vjeran Marcinko wrote:

Well, the problem is that I don't need to collect raw content of every possible file type, just some predefined file types. And some parsed files can be veeeeeery large, like some big archives, and I don't want to collect these raw bytes for such files (memory issue). But problem is that general Tika API offers file's "content-type" as part of Metadata that is populated *only after Parser.parse has finished*.

Why not do detection first? Then wrap if needed, then set the result of detection onto the Metadata object, then finally call DefaultParser. It's basically what AutoDetectParser does internally, but this way you get to change your logic post-detection and pre-parsing.


Nick
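
A minimal sketch of the "wrap if needed" step, using only the JDK so it can be run on its own. The names CollectingInputStream, PostDetectionWrapping and wrapIfWanted are hypothetical and do not come from Tika or from the original thread. The Tika calls that would surround this step are shown only as comments and are assumed to follow the order described above (detect, wrap, set the type on Metadata, parse).

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;

/** Hypothetical wrapper: keeps an in-memory copy of every byte read through it. */
class CollectingInputStream extends FilterInputStream {
    private final ByteArrayOutputStream collected = new ByteArrayOutputStream();

    CollectingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            collected.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            collected.write(buf, off, n);
        }
        return n;
    }

    // A reset would replay bytes and duplicate them in the copy, so
    // mark/reset is reported as unsupported. Skipped bytes are not collected.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    byte[] collectedBytes() {
        return collected.toByteArray();
    }
}

final class PostDetectionWrapping {
    private PostDetectionWrapping() {}

    /** Wraps the stream only when the detected type is one whose raw bytes are wanted. */
    static InputStream wrapIfWanted(InputStream in, String detectedType, Set<String> wantedTypes) {
        return wantedTypes.contains(detectedType) ? new CollectingInputStream(in) : in;
    }

    // Assumed surrounding flow with Tika (shown as comments, not compiled here):
    //   MediaType type = detector.detect(stream, metadata);  // stream must support mark/reset
    //   metadata.set(Metadata.CONTENT_TYPE, type.toString());
    //   InputStream toParse = wrapIfWanted(stream, type.toString(), wantedTypes);
    //   parser.parse(toParse, handler, metadata, new ParseContext());
}
```

Because the decision is made after detection but before parsing, large archives pass through unwrapped and nothing is buffered for them. After parsing, the raw bytes are available only when the returned stream is a CollectingInputStream.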
