Hello both,

Thanks for the reply. VisualVM shows that 8GB is reserved (Xmx is set to 8GB). Used memory quickly climbs to around 6GB and eventually reaches 8GB, at which point the program crashes. Triggering garbage collection manually does not free any memory.

The files themselves are a mixture of PDF, JPG, and Office documents. The largest PDF file is 20MB and the largest DOCX is 600KB. I have done some testing and it is the PDF files that cause the issue (running only the JPG and Office files causes no memory problems).

I am using Tika 1.14. I had thought that by disposing of the Tika facade on each loop iteration, any memory used by the previous parse would be freed (sorry, I am a bit new to Java)?

Thank you

On 3 January 2017 at 17:56, Allison, Timothy B. <[email protected]> wrote:

> Concur with Markus.
>
> Also, what type of files are these? We know that very large .docx (think
> "War and Peace") and .pptx can use up a crazy amount of memory. We've
> added new experimental parsers to handle those via SAX in trunk (coming in
> v 1.15), and these parsers decrease memory usage dramatically.
>
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Tuesday, January 3, 2017 12:23 PM
> To: [email protected]
> Subject: RE: Memory issues with the Tika Facade
>
> Hello - what is a large amount of memory, how do you determine it (make
> sure you look at RES, not VIRT) and what are your JVM settings.
>
> It is not uncommon for programs to allocate much memory if the default max
> heap is used, 2 GB in my case. If your JVM eats too much, limit it by
> setting Xmx to a lower level.
>
> Markus
>
> -----Original message-----
> > From:Will Jones <[email protected]>
> > Sent: Tuesday 3rd January 2017 18:14
> > To: [email protected]
> > Subject: Memory issues with the Tika Facade
> >
> > Hi,
> >
> > Big fan of what you are doing with Apache Tika. I have been using the
> > Tika facade to fetch metadata on each file in a directory containing a
> > large number of files.
> >
> > It returns the data I need, but the running process very quickly
> > consumes a large amount of memory as it proceeds through the files.
> >
> > What am I doing wrong? I have attached the code required to reproduce my
> > problem below.
> >
> >
> > public class TikaTest {
> >
> >     public void tikaProcess(Path filePath) {
> >         Tika t = new Tika();
> >         try {
> >             Metadata metadata = new Metadata();
> >
> >             String result = t.parse(filePath, metadata).toString();
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> >
> >     public static void main(String[] args) {
> >         TikaTest tt = new TikaTest();
> >         try {
> >             Files.list(Paths.get("g:/somedata/")).forEach(
> >                 path -> tt.tikaProcess(path)
> >             );
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
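
One thing that may be worth checking in the quoted code, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: the Reader returned by t.parse(filePath, metadata) is never read or closed, and the stream from Files.list is never closed either. The sketch below shows only the resource-handling pattern (try-with-resources around the directory stream and around each Reader) using the JDK alone. ClosingLoop, ReaderSource, and processAll are hypothetical names introduced for illustration; ReaderSource stands in for a call such as tika.parse(path, metadata) so that the example runs without Tika on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

class ClosingLoop {

    /** Hypothetical stand-in for a call that returns a Reader, e.g. tika.parse(path, metadata). */
    interface ReaderSource {
        Reader open(Path path) throws IOException;
    }

    /** Drains one Reader per regular file in dir, closing each; returns total characters read. */
    static long processAll(Path dir, ReaderSource source) throws IOException {
        long total = 0;
        // Files.list keeps a directory handle open until the stream is closed.
        try (Stream<Path> paths = Files.list(dir)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                // try-with-resources closes the Reader even if reading throws.
                try (Reader r = source.open(p)) {
                    char[] buf = new char[8192];
                    int n;
                    while ((n = r.read(buf)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a plain JDK reader in place of the Tika call (assumption for illustration).
        Path dir = Files.createTempDirectory("closing-loop");
        Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        long chars = processAll(dir, Files::newBufferedReader);
        System.out.println("characters read: " + chars);
    }
}
```

With Tika on the classpath, the second argument would presumably be something like path -> tika.parse(path, new Metadata()), with a single Tika instance reused across files. Whether this changes the memory growth seen with the PDF files has not been tested here.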
