I can list some of the formats that currently need temp files: jpeg, zip (for detection) and its derived formats (docx, xlsx, pptx), ole2 (for detection) and its derived formats (doc, xls, ppt), mdb, pst, rar, 7zip, sqlite...
But quoting Tim Allison, that can change depending on dependencies. For example, in the past PDF needed temp files, in recent versions it was stored in memory, now it is configurable...

Luis

2018-01-11 18:02 GMT-02:00 Allison, Timothy B. <[email protected]>:

> I'm not aware of such a list. Part of the challenge is that we don't know
> when our dependencies might choose to create a temp file.
>
> Sorry!
>
> -----Original Message-----
> From: Van Tassell, Kristian [mailto:[email protected]]
> Sent: Thursday, January 11, 2018 1:42 PM
> To: [email protected]
> Subject: RE: Parse file without creating tmp file
>
> Apologies for bumping such an old thread, but is there an official list
> somewhere of those filetypes that require the temporary file being created?
>
> Thanks!
>
> -----Original Message-----
> From: Nick Burch [mailto:[email protected]]
> Sent: Tuesday, July 11, 2017 4:23 AM
> To: [email protected]
> Subject: Re: Parse file without creating tmp file
>
> On Tue, 11 Jul 2017, aravinth thangasami wrote:
> > Recently I have noticed tika creates a tmp file in before parsing the
> > stream.
>
> Only for certain formats, generally where the underlying parsing library
> requires a file for random-access
>
> > I don't have much experience in Tika but I feel it is an overhead.
> > Can we achieve file parsing without writing to tmp file?
>
> For some files, no, not without re-writing other open source libraries
>
> For most, it isn't needed and Tika won't do it
>
> Nick
