Hello - you should set Xmx yourself; 100 MB should be OK, depending on the size 
of your documents. Finding the optimal Xmx is iterative: as long as no 
OutOfMemoryError occurs, your Xmx is either too high or just right. If you hit 
an OutOfMemoryError regardless of Xmx, there is probably a leak, but that rarely 
happens.

Having 8 GB of heap is not a good idea; the JVM can easily eat it all, whether 
it needs it or not.
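
To see what the JVM was actually given while you iterate on Xmx, something like 
the following can help. It is only a sketch using the standard Runtime API; the 
class name HeapCheck and its helper methods are made up for illustration.

```java
// Hypothetical helper, not part of Tika: reports the heap limit and current usage.
class HeapCheck {

    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MB (governed by -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use, in MB: allocated heap minus its free portion. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

Running it with, say, `java -Xmx100m HeapCheck` should report a max heap close 
to 100 MB (the JVM may round it somewhat), which confirms the setting took effect.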

Markus

-----Original message-----
> From:Will Jones <[email protected]>
> Sent: Tuesday 3rd January 2017 19:41
> To: [email protected]
> Subject: Re: Memory issues with the Tika Facade
> 
> Hello Both, 
> 
> Thanks for the reply. VisualVM shows that 8GB is being reserved (8GB Xmx); 
> the used memory quickly climbs to around 6GB and eventually to 8GB, at which 
> point the program crashes. Triggering garbage collection does not free any 
> memory. 
> 
> The files themselves are a mixture of PDF, JPG, and Office. The largest PDF 
> file is 20MB, the largest DOCX is 600KB. I have done some testing and it is 
> the PDF files that cause the issue (only running the JPG and Office files 
> causes no memory problems). 
> 
> I am using Tika 1.14. I had thought that disposing of the Tika facade on each 
> loop iteration would have freed up any memory used by the previous parse 
> (sorry, I am a bit new to Java)? 
> 
> Thank you 
> 
> 
> 
> On 3 January 2017 at 17:56, Allison, Timothy B. <[email protected]> wrote:
> Concur with Markus.
> 
> Also, what type of files are these?  We know that very large .docx (think 
> "War and Peace") and .pptx can use up a crazy amount of memory.  We've added 
> new experimental parsers to handle those via SAX in trunk (coming in v 1.15), 
> and these parsers decrease memory usage dramatically.
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Tuesday, January 3, 2017 12:23 PM
> To: [email protected]
> Subject: RE: Memory issues with the Tika Facade
> 
> Hello - what is a large amount of memory, how do you determine it (make sure 
> you look at RES, not VIRT), and what are your JVM settings?
> 
> It is not uncommon for programs to allocate a lot of memory if the default max 
> heap is used, 2 GB in my case. If your JVM eats too much, limit it by setting 
> Xmx to a lower value.
> 
> Markus
> 
> -----Original message-----
> > From:Will Jones <[email protected]>
> > Sent: Tuesday 3rd January 2017 18:14
> > To: [email protected]
> > Subject: Memory issues with the Tika Facade
> >
> > Hi,
> >
> > Big fan of what you are doing with Apache Tika. I have been using the Tika 
> > facade to fetch metadata on each file in a directory containing a large 
> > number of files. 
> >
> > It returns the data I need, but the running process very quickly consumes a 
> > large amount of memory as it proceeds through the files.
> >
> > What am I doing wrong? I have attached the code required to reproduce my 
> > problem below.
> >
> >
> > public class TikaTest {
> >
> >     public void tikaProcess(Path filePath) {
> >         Tika t = new Tika();
> >         try {
> >             Metadata metadata = new Metadata();
> >
> >             String result = t.parse(filePath, metadata).toString();
> >         }catch (Exception e){
> >             e.printStackTrace();
> >         }
> >     }
> >
> >     public static void main(String[] args) {
> >         TikaTest tt = new TikaTest();
> >         try {
> >             Files.list(Paths.get("g:/somedata/")).forEach(
> >                     path -> tt.tikaProcess(path)
> >             );
> >         }catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> 
> 
