Lucene OOM
Hi all,

We have a search engine service built with Lucene 4.7, and it seems that Lucene consumes too much memory. We have approximately 10 million documents, and the index size on disk is approximately 750G. My question is: why do the FST$Arc objects consume so much memory? Please refer to the following jmap histogram. I hope somebody can give me some suggestions.

 num     #instances         #bytes  class name
----------------------------------------------
   1:       4346283     2294837424  [Lorg.apache.lucene.util.fst.FST$Arc;
   2:      25918804     2023475632  [C
   3:      17450041     1014051416  [B
   4:      25878734      621089616  java.lang.String
   5:      18634803      596313696  java.util.HashMap$Node
   6:      14039862      561594480  java.util.TreeMap$Entry
   7:       4346283      452013432  org.apache.lucene.util.fst.FST
   8:       4522836      424741520  [Ljava.util.HashMap$Node;
   9:       4346283      347702640  org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
  10:       4683616      337220352  org.apache.lucene.util.fst.FST$Arc
  11:      12947467      310739208  org.apache.lucene.util.BytesRef
  12:        790283      280383040  [J
  13:       4359111      245496264  [Ljava.lang.Object;
  14:       4545337      218176176  java.util.HashMap
  15:       4510384      216498432  org.apache.lucene.index.FieldInfo
  16:       4359066      199713232  [I
  17:       4346283      173851320  org.apache.lucene.util.fst.BytesStore
  18:       4510400      144332800  java.util.Collections$UnmodifiableMap
  19:       4354347      104504328  java.util.ArrayList
  20:       5736589       91785424  java.lang.Integer
  21:        822685       59233320  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$NumericEntry
  22:        428313       13706016  org.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry
  23:        420547       13457504  org.wltea.analyzer.dic.DictSegment
  24:        177039        5665248  [Lorg.wltea.analyzer.dic.DictSegment;
  25:            20        5112128  [Lorg.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry;
  26:         42454        2377424  org.apache.lucene.store.RAMInputStream
  27:         50054        2002160  org.apache.lucene.util.packed.Packed64
  28:         44036        1761440  org.apache.lucene.util.packed.DirectPackedReader
  29:         33013        1056416  java.util.concurrent.ConcurrentHashMap$Node
  30:         43957        1054968  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$2

Thanks & Best Regards!
lubin
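One way to read the histogram is to divide #bytes by #instances for the rows that share the count 4,346,283. The following is only a minimal sketch of that arithmetic in Java; the class name HistoArithmetic and its helper methods are made up for illustration, and the figures are copied from the jmap output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: per-instance size and combined footprint of the classes that
// appear exactly 4,346,283 times in the jmap histogram.
class HistoArithmetic {

    // Instance count shared by the four rows below (taken from the histogram).
    static final long INSTANCES = 4346283L;

    // Average bytes per instance for one histogram row.
    static long bytesPerInstance(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    // Rows copied from the jmap output: class name -> total bytes.
    static Map<String, Long> rows() {
        Map<String, Long> rows = new LinkedHashMap<>();
        rows.put("[Lorg.apache.lucene.util.fst.FST$Arc;", 2294837424L);
        rows.put("org.apache.lucene.util.fst.FST", 452013432L);
        rows.put("org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader", 347702640L);
        rows.put("org.apache.lucene.util.fst.BytesStore", 173851320L);
        return rows;
    }

    // Sum of the bytes reported for the four rows.
    static long totalBytes() {
        long sum = 0;
        for (long bytes : rows().values()) {
            sum += bytes;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> row : rows().entrySet()) {
            System.out.printf("%-60s %4d bytes each%n",
                    row.getKey(), bytesPerInstance(row.getValue(), INSTANCES));
        }
        System.out.printf("combined: %.2f GiB%n",
                totalBytes() / (1024.0 * 1024 * 1024));
    }
}
```

The four rows work out to 528, 104, 80 and 40 bytes per instance, about 3 GiB combined. The identical instance counts suggest, but do not prove, that each open FieldReader holds one FST with its own Arc array, so the total would scale with the number of fields per segment rather than with the number of documents.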
Re: Lucene OOM
> ... only have 10 million documents!
>
> Are those documents huge and have lots of indexed text content, possibly
> OCR/scanned stuff? If this is the case, the term dictionary may get huge
> because of many terms with incorrect spelling.
>
> Please also give us a "ls -lh" of your index directory to make a guess.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: dawn breaks [mailto:2005dawnbre...@gmail.com]
> > Sent: Thursday, January 11, 2018 3:40 AM
> > To: java-user@lucene.apache.org
> > Subject: Lucene OOM
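The "ls -lh" request is about seeing how large each kind of index file is. In the Lucene 4.x default codec the term dictionary lives in the .tim files and the term index in the .tip files, so summing sizes per extension gives a rough picture. Below is a minimal sketch using only the JDK; the class IndexDirSummary and its method names are hypothetical, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Sketch: total on-disk bytes per file extension in one index directory.
class IndexDirSummary {

    // Returns the text after the last dot, or "" if the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Sums the sizes of the regular files directly inside indexDir, by extension.
    static Map<String, Long> bytesByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String ext = extensionOf(file.getFileName().toString());
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        bytesByExtension(dir).forEach((ext, bytes) ->
                System.out.printf("%-6s %,d bytes%n", ext, bytes));
    }
}
```

Run against one index directory, the "tim" and "tip" totals can then be compared with the heap taken by the FST-related classes in the histogram.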
Re: Lucene OOM
Hi Uwe,

Yes, all indexes are running in the same JVM with 14 GiB of heap space, but the JVM heap usage is up to 95%. I am sure that all IndexReaders/IndexSearchers have been closed properly.

On 11 January 2018 at 20:46, Uwe Schindler wrote:

> Hi lubin,
>
> the terms dictionary is using the "tim" and "tip" files. It should be
> approximately in the dimension of the FST.
>
> Do you have all indexes running in the same JVM or is it 10 servers?
> Because then the numbers look correct. If you really want to have such a
> large index in a single machine using a single JVM, you should plan for
> more heap space. I'd start with 12 GiB of heap space to run this index.
>
> A last recommendation: If you update your index during runtime, make sure
> that you correctly close the outdated IndexReaders/IndexSearchers (e.g.
> using SearcherManager), so you don't have orphaned instances of IndexReader
> consuming heap space and disk space, because the files can't be fully
> deleted as long as those are open!
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: dawn breaks [mailto:2005dawnbre...@gmail.com]
> > Sent: Thursday, January 11, 2018 10:22 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene OOM
> >
> > Hi, Uwe
> > Thanks for your timely reply. Yes, those documents are huge text. We
> > have ten indices, and each of them is approximately 75G on disk.
> > Following is the directory content of one of the indices.
> >
> > Thanks & Best Regards!
> > lubin
> >
> > total 74G
> > -rw-r--r-- 1 root root 100K Jan 10 16:11 _2ncr_4k.del
> > -rw-r--r-- 1 root root 1.1G Aug  4 12:52 _2ncr.fdt
> > -rw-r--r-- 1 root root 468K Aug  4 12:52 _2ncr.fdx
> > -rw-r--r-- 1 root root 636K Aug  4 12:55 _2ncr.fnm
> > -rw-r--r-- 1 root root 398M Aug  4 12:53 _2ncr_Lucene41_0.doc
> > -rw-r--r-- 1 root root 712M Aug  4 12:53 _2ncr_Lucene41_0.pay
> > -rw-r--r-- 1 root root 744M Aug  4 12:53 _2ncr_Lucene41_0.pos
> > -rw-r--r-- 1 root root 129M Aug  4 12:53 _2ncr_Lucene41_0.tim
> > -rw-r--r-- 1 root root 3.1M Aug  4 12:53 _2ncr_Lucene41_0.tip
> > -rw-r--r-- 1 root root 822M Aug  4 12:54 _2ncr_Lucene45_0.dvd
> > -rw-r--r-- 1 root root 210K Aug  4 12:54 _2ncr_Lucene45_0.dvm
> > -rw-r--r-- 1 root root  540 Aug  4 12:55 _2ncr.si
> > -rw-r--r-- 1 root root 1.5G Aug  4 12:55 _2ncr.tvd
> > -rw-r--r-- 1 root root 441K Aug  4 12:55 _2ncr.tvx
> > -rw-r--r-- 1 root root  98K Jan 11 11:43 _555c_5x.del
> > -rw-r--r-- 1 root root 1.1G Aug 25 12:51 _555c.fdt
> > -rw-r--r-- 1 root root 457K Aug 25 12:51 _555c.fdx
> > -rw-r--r-- 1 root root 872K Aug 25 12:54 _555c.fnm
> > -rw-r--r-- 1 root root 389M Aug 25 12:52 _555c_Lucene41_0.doc
> > -rw-r--r-- 1 root root 713M Aug 25 12:52 _555c_Lucene41_0.pay
> > -rw-r--r-- 1 root root 750M Aug 25 12:52 _555c_Lucene41_0.pos
> > -rw-r--r-- 1 root root 136M Aug 25 12:52 _555c_Lucene41_0.tim
> > -rw-r--r-- 1 root root 3.2M Aug 25 12:52 _555c_Lucene41_0.tip
> > -rw-r--r-- 1 root root 1.1G Aug 25 12:53 _555c_Lucene45_0.dvd
> > -rw-r--r-- 1 root root 442K Aug 25 12:53 _555c_Lucene45_0.dvm
> > -rw-r--r-- 1 root root  540 Aug 25 12:54 _555c.si
> > -rw-r--r-- 1 root root 1.4G Aug 25 12:54 _555c.tvd
> > -rw-r--r-- 1 root root 422K Aug 25 12:54 _555c.tvx
> > -rw-r--r-- 1 root root  93K Jan 10 16:38 _790n_5s.del
> > -rw-r--r-- 1 root root 1.1G Sep  6 14:17 _790n.fdt
> > -rw-r--r-- 1 root root 438K Sep  6 14:17 _790n.fdx
> > -rw-r--r-- 1 root root 1.1M Sep  6 14:20 _790n.fnm
> > -rw-r--r-- 1 root root 380M Sep  6 14:18 _790n_Lucene41_0.doc
> > -rw-r--r-- 1 root root 684M Sep  6 14:18 _790n_Lucene41_0.pay
> > -rw-r--r-- 1 root root 746M Sep  6 14:18 _790n_Lucene41_0.pos
> > -rw-r--r-- 1 root root 141M Sep  6 14:18 _790n_Lucene41_0.tim
> > -rw-r--r-- 1 root root 3.5M Sep  6 14:18 _790n_Lucene41_0.tip
> > -rw-r--r-- 1 root root 1.2G Sep  6 14:20 _790n_Lucene45_0.dvd
> > -rw-r--r-- 1 root root 550K Sep  6 14:20 _790n_Lucene45_0.dvm
> > -rw-r--r-- 1 root root  540 Sep  6 14:20 _790n.si
> > -rw-r--r-- 1 root root 1.4G Sep  6 14:20 _790n.tvd
> > -rw-r--r-- 1 root root 412K Sep  6 14:20 _790n.tvx
> > -rw-r--r-- 1 root root  82K Jan 10 16:38 _bv18_8d.del
> > -rw-r--r-- 1 root root 1.1G Oct 10 12:17 _bv18.fdt
> > -rw-r--r-- 1 root root 425K Oct 10 12:17 _bv18.fdx
> > -rw-r--r-- 1 root root 1.4M Oct 10 12:20 _bv18.fnm
> > -rw-r--r-- 1 root root 363M Oct 10 12:18 _bv18_Lucene41_0.doc