>Maybe a patch is possible for the narrow use case the Tika user has

Yes. I will take a closer look at HPSF and POIFS. This definitely belongs in POI.
Thank you!
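
For the "first block or two" idea, here is a minimal sketch of what a
header-only check could look like. It uses only the JDK, not POI or Tika, and
the class and method names (OleSignatureSniffer, looksLikeOle2) are
hypothetical. It assumes the caller passes a stream that supports mark/reset.
It reads just the 8-byte OLE2 header signature, so it can only say "this looks
like an OLE2 container"; it cannot tell Word from Excel, which still needs the
document/directory names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of POI or Tika.
class OleSignatureSniffer {

    // The 8-byte signature at the start of an OLE2 compound file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /**
     * Returns true if the stream starts with the OLE2 signature.
     * Only the first 8 bytes are consumed, and the stream is reset
     * afterwards so a later parser can start from the beginning.
     */
    static boolean looksLikeOle2(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_SIGNATURE.length);
        try {
            byte[] head = new byte[OLE2_SIGNATURE.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // stream is shorter than the signature
                }
                read += n;
            }
            return Arrays.equals(head, OLE2_SIGNATURE);
        } finally {
            in.reset();
        }
    }
}
```

A caller would wrap the network stream in a BufferedInputStream before calling
looksLikeOle2, so that mark/reset is available and nothing beyond the header
has to be pulled from the slow drive for this first check.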

On Tue, Apr 16, 2019 at 3:45 PM Dave Fisher <[email protected]> wrote:

> Hi -
>
> Well it’s early POI stuff. Maybe a patch is possible for the narrow use
> case the Tika user has.
>
> I assume that all you need is the first block or two to confirm this looks
> like an OLE document.
>
> Regards,
> Dave
>
> > On Apr 16, 2019, at 12:29 PM, Tim Allison <[email protected]> wrote:
> >
> > Thank you, Dave!  The reading examples use POIFSReader, which I had hoped
> > was truly streaming, but it creates a POIFS, which requires a read/skip of
> > the entire stream IIUC, and then iterates... Or am I missing something?
> >
> > I didn’t try POIFSReader by specifying a subdoc to process, but it looks
> > like it opens a POIFS first no matter how you register a listener.
> >
> > On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <[email protected]> wrote:
> >
> >> Hi Tim,
> >>
> >> Maybe the answer is using HPSF -
> >>
> >> https://poi.apache.org/components/hpsf/how-to.html
> >>
> >> Regards,
> >> Dave
> >>
> >>> On Apr 16, 2019, at 11:47 AM, Tim Allison <[email protected]> wrote:
> >>>
> >>> All,
> >>> In Tika, when we do file type detection of OLE files
> >>> (POIFSContainerDetector), we spool the file to disk, open a POIFS and
> >>> make a decision based on document/directory names.  A user on
> >>> TIKA-2849 does not want to copy the full file from a slow network
> >>> drive for detection.  When I tried using a BoundedInputStream with
> >>> POIFS, not surprisingly, I got EOF exceptions.
> >>> Question: is there any way to do detection in a streaming mode for
> >>> OLE files?  Or, is this the best we can do?  Thank you!
> >>>
> >>>      Best,
> >>>
> >>>                    Tim
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>
> >>
>
>
