Hi all,
When we run the first query after starting up Solr, memory use goes up from
about 1GB to 15GB and never goes below that level. In debugging a recent OOM
problem I ran jmap with the output appended below. Not surprisingly, given the
size of our indexes, it looks like the TermInfo and Term data structures which
are the in-memory representation of the tii file are taking up most of the
memory. This is Solr running under Tomcat with 16GB allocated to the JVM, and 3
shards, each with a tii file of about 600MB.
Total index size is about 400GB for each shard (we are indexing about 600,000
full-text books in each shard).
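We haven't changed the term index interval or divisor from the defaults. If it's
relevant, my understanding is that the divisor can be raised in solrconfig.xml
roughly like this (an untested sketch; the factory class and parameter name are
my reading of the Solr 1.4 docs, so please correct me if that's wrong):

```xml
<!-- Load only every Nth entry of the tii file into memory.
     A divisor of 4 should cut term-index memory roughly 4x,
     at the cost of somewhat slower term lookups. -->
<indexReaderFactory name="IndexReaderFactory"
                    class="solr.StandardIndexReaderFactory">
  <int name="setTermIndexDivisor">4</int>
</indexReaderFactory>
```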
In interpreting the jmap output, can we assume that the entries for character
arrays ("[C"), java.lang.String, long arrays ("[J"), and int arrays ("[I") are
all part of the data structures involved in representing the tii file in
memory?
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
(jmap output, commas in numbers added)
 num     #instances          #bytes  class name
------------------------------------------------------
   1:    82,496,803   4,273,137,904  [C
   2:    82,498,673   3,299,946,920  java.lang.String
   3:    27,810,887   1,112,435,480  org.apache.lucene.index.TermInfo
   4:    27,533,080   1,101,323,200  org.apache.lucene.index.TermInfo
   5:    27,115,577   1,084,623,080  org.apache.lucene.index.TermInfo
   6:    27,810,894     889,948,608  org.apache.lucene.index.Term
   7:    27,533,088     881,058,816  org.apache.lucene.index.Term
   8:    27,115,589     867,698,848  org.apache.lucene.index.Term
   9:           148     659,685,520  [J
  10:             2     222,487,072  [Lorg.apache.lucene.index.Term;
  11:             2     222,487,072  [Lorg.apache.lucene.index.TermInfo;
  12:             2     220,264,600  [Lorg.apache.lucene.index.Term;
  13:             2     220,264,600  [Lorg.apache.lucene.index.TermInfo;
  14:             2     216,924,560  [Lorg.apache.lucene.index.Term;
  15:             2     216,924,560  [Lorg.apache.lucene.index.TermInfo;
  16:       737,060     155,114,960  [I
  17:       627,793      35,156,408  java.lang.ref.SoftReference
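For anyone double-checking our interpretation, here is the back-of-envelope
arithmetic on the counts above (a rough sketch; it assumes one text String,
backed by one char[], per Term, which I believe holds since field-name Strings
are interned and so contribute few extra instances):

```python
# jmap instance counts from the listing above.
term_counts = [27_810_894, 27_533_088, 27_115_589]  # Term instances per shard
terms = sum(term_counts)
strings = 82_498_673       # java.lang.String instances
char_arrays = 82_496_803   # [C instances
char_bytes = 4_273_137_904 # total bytes in char arrays

print(f"Terms: {terms:,}  Strings: {strings:,}  char[]: {char_arrays:,}")
# The three counts agree to within a fraction of a percent, which is
# consistent with the term dictionary accounting for nearly all Strings
# and char arrays in the heap.
print(f"Strings per Term: {strings / terms:.4f}")
print(f"avg bytes per char[]: {char_bytes / char_arrays:.1f}")
```

If that reading is right, almost all of the String and char[] memory belongs
to the in-memory tii representation.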