> Maybe a patch is possible for the narrow use case the Tika user has

Yes. Will take a closer look at HPSF and POIFS. Definitely belongs in POI. Thank you!

On Tue, Apr 16, 2019 at 3:45 PM Dave Fisher <[email protected]> wrote:

> Hi -
>
> Well it’s early POI stuff. Maybe a patch is possible for the narrow use
> case the Tika user has.
>
> I assume that all you need is the first block or two to confirm this looks
> like an OLE document.
>
> Regards,
> Dave
>
> > On Apr 16, 2019, at 12:29 PM, Tim Allison <[email protected]> wrote:
> >
> > Thank you, Dave! The reading examples use POIFSReader, which I had hoped
> > was truly streaming, but it creates a POIFS, which requires a read/skip of
> > the entire stream IIUC, and then iterates...Or, am I missing something?
> >
> > I didn’t try POIFSReader by specifying a subdoc to process, but it looks
> > like it opens a POIFS first no matter how you register a listener.
> >
> > On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <[email protected]> wrote:
> >
> >> Hi Tim,
> >>
> >> Maybe the answer is using HPSF -
> >>
> >> https://poi.apache.org/components/hpsf/how-to.html
> >>
> >> Regards,
> >> Dave
> >>
> >>> On Apr 16, 2019, at 11:47 AM, Tim Allison <[email protected]> wrote:
> >>>
> >>> All,
> >>> In Tika, when we do file type detection of OLE files
> >>> (POIFSContainerDetector), we spool the file to disk, open a POIFS and
> >>> make a decision based on document/directory names. A user on
> >>> TIKA-2849 does not want to copy the full file from a slow network
> >>> drive for detection. When I tried using a BoundedInputStream with
> >>> POIFS, not surprisingly, I got EOF exceptions.
> >>> Question: is there any way to do detection in a streaming mode for
> >>> OLE files? Or, is this the best we can do? Thank you!
> >>>
> >>> Best,
> >>>
> >>> Tim
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
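
For reference, Dave's suggestion that the first block is enough to confirm "this looks like an OLE document" can be sketched roughly as below. This is only an illustration, not code from POI or Tika: the class name `Ole2Sniffer` and method `looksLikeOle2` are made up for this example. It reads at most 8 bytes and compares them with the OLE2 compound file signature, so it never touches the rest of the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of POI or Tika. */
final class Ole2Sniffer {
    // OLE2 compound file signature: the first 8 bytes of the header block.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private Ole2Sniffer() {}

    /**
     * Consumes at most 8 bytes from the stream and reports whether they
     * match the OLE2 signature. The caller is responsible for mark/reset
     * or buffering if the bytes are needed again afterwards.
     */
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int filled = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (filled < head.length) {
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) {
                return false; // stream ended before 8 bytes were available
            }
            filled += n;
        }
        return Arrays.equals(head, OLE2_MAGIC);
    }
}
```

Note that this only answers "is it an OLE2 container?". Telling a Word file from an Excel file still depends on the document/directory names, which is the part that currently needs a full POIFS.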
