Re: Solr and jvm Garbage Collection tuning
I think this could help : http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Cheers

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
RE: Solr and jvm Garbage Collection tuning
Thanks Kent for your info.

We are not doing any faceting, sorting, or much else. My guess is that most of the memory increase is just the data structures created when parts of the frq and prx files get read into memory. Our frq files are about 77GB and the prx files are about 260GB per shard, and we are running 3 shards per machine. I suspect that the document cache and query result cache don't take up that much space, but I will try a run with those caches set to 0, just to see.

We have dual 4-core processors and 74GB total memory. We want to leave a significant amount of memory free for OS disk caching. We tried increasing the heap from 20GB to 28GB and adding the -XX:MaxGCPauseMillis=1000 flag, but that seemed to have no effect. Currently I'm testing ConcurrentMarkSweep, and that's looking much better, although I don't understand why it has sized the Eden space down into the 20MB range. However, I am very new to Java memory management. Does anyone know whether, when using ConcurrentMarkSweep, it's better to let the JVM size the Eden space or better to give it some hints?

Once we get some decent JVM settings we can put into production, I'll be testing termIndexInterval with Solr 1.4.1 on our test server.

Tom

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]

> What are your current GC settings? Also, I guess I'd look at ways you can reduce the heap size needed: caching, field type choices, faceting choices. Also could try playing with the termIndexInterval which will load fewer terms into memory at the cost of longer seeks. At some point, though, you just may need more shards and the resulting smaller indexes. How many CPU cores do you have on each machine?
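The termIndexInterval experiment Tom mentions can be sanity-checked with rough arithmetic. A sketch, assuming Lucene's default termIndexInterval of 128 and the ~600MB per-shard tii size quoted elsewhere in this thread; the scaling model ignores per-term overhead, so treat the numbers as illustrative only:

```python
# Rough sketch: the in-heap term index holds every termIndexInterval-th
# term (the .tii contents), so raising the interval shrinks it roughly
# proportionally, at the cost of scanning more terms per dictionary lookup.

def term_index_heap_mb(tii_mb: float, default_interval: int,
                       new_interval: int) -> float:
    """Estimate in-heap term-index size after changing termIndexInterval.

    Assumes size scales as default_interval / new_interval; treat as a
    rough guide, not a measurement.
    """
    return tii_mb * default_interval / new_interval

# ~600MB tii per shard at the assumed default interval of 128:
for interval in (128, 256, 512, 1024):
    print(interval, round(term_index_heap_mb(600, 128, interval), 1))
```

Each doubling of the interval roughly halves the resident term index, which is why it trades heap for seek time.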
Re: Solr and jvm Garbage Collection tuning
On Mon, Sep 13, 2010 at 6:45 PM, Burton-West, Tom tburt...@umich.edu wrote:

> Currently I'm testing using the ConcurrentMarkSweep and that's looking much better although I don't understand why it has sized the Eden space down into the 20MB range. However, I am very new to Java memory management. Anyone know if when using ConcurrentMarkSweep it's better to let the JVM size the Eden space or better to give it some hints?

Really the best thing to do is to run the system for a while with GC logging on and then look at how often young generation GC is occurring. A set of parameters like:

-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

should give you some indication of how often the young-gen GC is occurring. If it's often, you can try increasing the size of the young generation. The option -Xloggc:<file> will dump this information to the specified file rather than sending it to standard error.

I've done this a few times with a variety of systems: sometimes you want to make the young gen bigger and sometimes you don't.

Steve
--
Stephen Green
http://thesearchguy.wordpress.com
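Steve's "look at how often the young generation GC is occurring" step can be scripted against the log produced by those flags. A sketch, assuming the classic -XX:+PrintGCTimeStamps format where each minor collection appears as `<seconds>: [GC ...]` and full collections as `[Full GC ...]`; the sample lines below are made up for illustration:

```python
import re

# Minor-GC lines under -verbose:gc -XX:+PrintGCTimeStamps look roughly like:
#   12.345: [GC 1048576K->524288K(2097152K), 0.0421 secs]
# Full GCs are tagged "[Full GC" instead, so this pattern skips them.
MINOR_GC = re.compile(r"^(\d+\.\d+): \[GC ")

def young_gc_intervals(log_lines):
    """Return seconds elapsed between consecutive young-gen collections."""
    stamps = [float(m.group(1)) for line in log_lines
              if (m := MINOR_GC.match(line))]
    return [b - a for a, b in zip(stamps, stamps[1:])]

sample = [
    "10.000: [GC 1048576K->524288K(2097152K), 0.0421 secs]",
    "12.500: [GC 1048576K->524288K(2097152K), 0.0388 secs]",
    "13.000: [Full GC 1900544K->900000K(2097152K), 4.2 secs]",
    "15.000: [GC 1048576K->524288K(2097152K), 0.0402 secs]",
]
print(young_gc_intervals(sample))  # gaps between minor GCs only
```

If the intervals come out short (a few seconds), that is the signal to try a bigger young generation, as Steve describes.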
Re: Solr and jvm Garbage Collection tuning
On Sep 10, 2010, at 7:01 PM, Burton-West, Tom wrote:

> We have noticed that when the first query hits Solr after starting it up, memory use increases significantly, from about 1GB to about 16GB, and then as queries are received it goes up to about 19GB, at which point there is a full garbage collection which takes about 30 seconds, and then memory use drops back down to 16GB. Under a relatively heavy load, the full GC happens about every 10-20 minutes.
>
> We are running 3 Solr shards under one Tomcat with 20GB allocated to the JVM. Each shard has a total index size of about 400GB and a tii size of about 600MB, and indexes about 650,000 full-text books. (The server has a total of 72GB of memory, so we are leaving quite a bit of memory for the OS disk cache.)
>
> Is there some argument we could give the JVM so that it would collect garbage more frequently? Or some other JVM tuning action that might reduce the amount of time where Solr is waiting on GC? If we could get the time for each GC under a second, with the trade-off being that GC would occur much more frequently, that would help us avoid the occasional query taking more than 30 seconds, at the cost of a larger number of queries taking at least a second.

What are your current GC settings?

Also, I guess I'd look at ways you can reduce the heap size needed: caching, field type choices, faceting choices. You could also try playing with the termIndexInterval, which will load fewer terms into memory at the cost of longer seeks. At some point, though, you just may need more shards and the resulting smaller indexes.

How many CPU cores do you have on each machine?
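Tom's figures already bound the cost of the current behaviour. A back-of-the-envelope sketch using only the numbers quoted above (heap settles at ~16GB after a full GC, triggers again near 19GB, one ~30-second full GC every 10-20 minutes); purely illustrative arithmetic, not a measurement:

```python
# Heap grows from ~16GB after a full GC to ~19GB before the next one,
# so roughly 3GB accumulates in the tenured region per full-GC cycle.
baseline_gb, trigger_gb = 16.0, 19.0
garbage_per_cycle_gb = trigger_gb - baseline_gb

pause_s = 30.0
for cycle_min in (10, 20):
    # Fraction of wall-clock time spent inside full-GC pauses.
    overhead = pause_s / (cycle_min * 60)
    # Implied steady-state fill rate of the tenured region.
    fill_mb_per_s = garbage_per_cycle_gb * 1024 / (cycle_min * 60)
    print(f"{cycle_min} min cycle: {overhead:.1%} of time paused, "
          f"~{fill_mb_per_s:.0f} MB/s into the tenured region")
```

The total pause overhead (a few percent) is modest; the real problem is that it arrives as one 30-second stall, which is why splitting it into many short young-gen collections, as discussed in this thread, is attractive.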
Re: Solr and jvm Garbage Collection tuning
Thanks for the real life examples. You would have to do a LOT of sharding to get that to work better.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/10/10, Kent Fitch kent.fi...@gmail.com wrote:

From: Kent Fitch kent.fi...@gmail.com
Subject: Re: Solr and jvm Garbage Collection tuning
To: solr-user@lucene.apache.org
Date: Friday, September 10, 2010, 10:45 PM
Re: Solr and jvm Garbage Collection tuning
Hi Tim,

For what it is worth, behind Trove (http://trove.nla.gov.au/) are 3 SOLR-managed indices and 1 Lucene index. None of ours is as big as one of your shards, and one of our SOLR-managed indices is tiny, but your experiences with long GC pauses are familiar to us.

One of the most difficult indices to tune is our bibliographic index of around 38M mostly metadata records, which is around 125GB with 97MB tii files. We need to commit updates and reopen the index every 90 seconds, and the facet recalculation (using UnInverted) was taking quite a lot of time, and seemed to generate lots of objects to be collected on each reopening. Although we've been through several rounds of tuning which have seemed to work, at least temporarily, a few months ago we started getting 12 sec full GC times every 90 secs, which was no good! We've noticed/did three things:

1) Optimise to 1 segment - we'd got to the stage where 50% of the documents had been updated (hence deleted), and the maxdocid was 50% bigger than it needed to be, and hence data structures whose size was proportional to maxdocid had increased a lot. Optimising to 1 segment greatly reduced full GC frequency and times.

2) For most of our facets, forcing the facets to be filters rather than uninverted happened to work better - but this depends on many factors, and certainly isn't a cure-all for all facets - uninverted often works much better than filters!

3) After lots of benchmarking real updates and queries on a dev system, we came up with this set of JVM parameters that worked best for our environment (at the moment!):

-Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode

I can't say exactly why, except that with this combination of parameters and our data, a much bigger newgen led to less movement of objects to oldgen, and non-full-GC collections on oldgen worked much better.
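What Kent's -XX:NewSize=3500M -XX:SurvivorRatio=3 combination implies for the young generation can be sketched from the standard HotSpot layout rule: with SurvivorRatio=R, Eden is R times one survivor space, and the young gen holds Eden plus two survivor spaces. The arithmetic below follows that rule; exact figures on a live JVM will differ slightly due to alignment:

```python
# HotSpot: SurvivorRatio=R means eden = R * (one survivor space), and
# the young generation is eden + two survivor spaces.
def young_gen_layout(new_size_mb: float, survivor_ratio: float):
    """Return (eden_mb, survivor_mb) for a given NewSize and SurvivorRatio."""
    survivor = new_size_mb / (survivor_ratio + 2)
    eden = new_size_mb - 2 * survivor
    return eden, survivor

# Kent's settings: -XX:NewSize=3500M -XX:SurvivorRatio=3
eden, survivor = young_gen_layout(3500, 3)
print(f"eden ~{eden:.0f}M, each survivor ~{survivor:.0f}M")
```

The low SurvivorRatio gives unusually large survivor spaces, which fits Kent's observation: objects get more chances to die young before being promoted, so less traffic reaches the old generation.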
Currently we are seeing less than 10 full GCs a day, and they almost always take less than 4 seconds. This index is running on an 8 core X5570 machine with 64GB, sharing it with a large/busy mysql instance and the Trove web server.

One of our other indices is only updated once per day, but is larger: 33.5M docs representing full text of archived web pages, 246GB, with a 36MB tii file. JVM params are -Xmx1M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. It also does less than 10 full GCs per day, taking less than 5 sec each.

Our other large index, newspapers, is a native Lucene index, about 180GB with a comparatively large tii of 280MB (probably for the same reason your tii is large - the contents of this database is mostly OCR'ed text). This index is updated/reopened every 3 minutes (to incorporate OCR text corrections and tagging), and we use a bitmap to represent all facet values, which typically takes 5 secs to rebuild on each reopen. JVM params: -mx15000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. Although this JVM usually does fewer than 5 GCs per day, these full GCs often take 20-30 seconds, and we need to test increasing the NewSize on this JVM to see if we can reduce these pauses. The web archive and newspaper indexes are running on an 8 core X5570 machine with 72GB.

We are also running a separate copy/version of this index behind the site http://newspapers.nla.gov.au/ - the main difference is that the Trove version uses shingling (inspired by the Hathi Trust results) to improve searches containing common words. This other version is running on a machine with 32GB and 8 X5460 cores and has JVM params: -mx11500M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.

Apart from the old newspapers index, all other SOLR/Lucene indices are maintained on SSDs (Intel X25-M 160GB), which, while not having anything to do with GCs, work very very well - we couldn't cope with our current query volumes on rotating disk without spending a great deal of money.
The old newspaper index is running on a SAN with 24 fast disks backing it, and we can't support the same query rate on it as we can with the other newspaper index on SSDs (even before the shingling change).

Kent Fitch
Trove development team
National Library of Australia