On Mon, 12 Sep 2016, Sergey Beryozkin wrote:
> By the way, I've found out AutoDetectParser may not work if the (pdf) stream is an attachment stream which may not support a mark.
The simplest fix would probably be to wrap it in a TikaInputStream, which would handle any buffering/marking as needed.
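
On the Tika side the wrapping call would be TikaInputStream.get(stream). To show why wrapping helps at all, here is a rough sketch using only java.io. The class and method names are made up for the example, and BufferedInputStream is just a stand-in for the buffering TikaInputStream provides, not Tika's actual implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkSupportSketch {

    /** Hypothetical stand-in for an attachment stream that cannot mark/reset. */
    static InputStream noMarkStream(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override public boolean markSupported() { return false; }
            @Override public void mark(int readlimit) { /* ignored */ }
            @Override public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Wraps the stream only if it cannot mark, so peeking does not lose data. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes and rewinds, roughly what a magic-based detector needs. */
    static String peek(InputStream in, int n) throws IOException {
        in.mark(n);
        byte[] head = new byte[n];
        int len = 0;
        while (len < n) {
            int r = in.read(head, len, n - len);
            if (r < 0) {
                break; // stream ended before n bytes were available
            }
            len += r;
        }
        in.reset(); // the caller can still read the stream from the start
        return new String(head, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        InputStream wrapped = ensureMarkSupported(noMarkStream(data));
        System.out.println(peek(wrapped, 5));
    }
}
```

Without the wrapper, the reset() after peeking would fail on a stream like this, which may be what you are hitting with the attachment stream.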
> I've been wondering, would it make sense to pass a MediaType identifying the data format as either a ParseContext or Metadata property for AutoDetectParser to avoid trying to read the stream?
If you pass in the filename and disable all detectors other than the magic one, or pass in the mime type and disable all detectors, the AutoDetectParser ought to skip anything to do with the stream at detection time.
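
To make the idea concrete, here is a toy sketch of hint-only detection in plain Java. It is not Tika code: the class, the detect method and the two-entry extension check are all hypothetical, and it only illustrates that a detector driven purely by the declared type or the file name never needs to touch the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

class HintOnlyDetectionSketch {

    /**
     * Hypothetical detector that decides from hints alone.
     * The stream parameter is deliberately never read.
     */
    static String detect(InputStream stream, String declaredType, String fileName) {
        // 1. A type supplied by the caller wins outright.
        if (declaredType != null && !declaredType.isEmpty()) {
            return declaredType;
        }
        // 2. Otherwise fall back to the file name extension.
        if (fileName != null) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".pdf")) {
                return "application/pdf";
            }
            if (lower.endsWith(".txt")) {
                return "text/plain";
            }
        }
        // 3. No usable hint: report the generic type rather than sniffing bytes.
        return "application/octet-stream";
    }

    /** A stream that fails loudly if anything tries to read it. */
    static InputStream untouchable() {
        return new InputStream() {
            @Override public int read() throws IOException {
                throw new IOException("stream was read during detection");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(detect(untouchable(), "application/pdf", null));
        System.out.println(detect(untouchable(), null, "report.PDF"));
    }
}
```

Whether the real detectors behave this way for your configuration is worth checking against the Tika version you are running.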
Nick
