We use munin with the jmx plugin (http://munin-monitoring.org/) for monitoring all servers and Solr installations.
Only for short-term monitoring we also use jvisualvm, which is delivered with the Java SE JDK.

Regards,
Bernd
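(A side note on the monitoring question below: munin's jmx plugin and jvisualvm both read the JVM's platform MXBeans. The sketch here polls the heap of a remote JVM the same way; the host solr01, the port 18983 and the assumption that the Solr JVM exposes remote JMX at all (e.g. started with -Dcom.sun.management.jmxremote.port=18983 and authentication disabled) are hypothetical and not taken from this thread.)

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RemoteHeapCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical host/port; requires the Solr JVM to be started with remote JMX enabled.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://solr01:18983/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url, null);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                // Heap numbers comparable to what jvisualvm's Monitor tab displays.
                MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            } finally {
                connector.close();
            }
        }
    }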
On 25.03.2013 14:45, Arkadi Colson wrote:
> Thanks for the info!
> I just upgraded Java from 6 to 7...
> How exactly do you monitor the memory usage and the effect of the garbage collector?
>
>
> On 03/25/2013 01:18 PM, Bernd Fehling wrote:
>> The use of UseG1GC, yes,
>> but with Solr 4.x, Jetty 8.1.8 and Java HotSpot(TM) 64-Bit Server VM (1.7.0_07).
>> os.arch: amd64
>> os.name: Linux
>> os.version: 2.6.32.13-0.5-xen
>>
>> Only args are "-XX:+UseG1GC -Xms16g -Xmx16g".
>> Monitoring shows that 16g is a bit high; I might reduce it to 10g or 12g for the slaves.
>> Start is at 5g, runtime is between 6 and 8g with some peaks to 9.5g.
>> Single index, 130 GByte, 43.5 million documents.
>>
>> Regards,
>> Bernd
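(Again as an aside, not from the original mails: whether the G1 flags quoted above actually took effect can be verified from inside the running JVM, roughly like this.)

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcFlagsCheck {
        public static void main(String[] args) {
            // With -XX:+UseG1GC this lists "G1 Young Generation" and "G1 Old Generation",
            // together with collection counts and accumulated collection time.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            // Should roughly match the configured -Xmx (16g in the setup described above).
            System.out.println("max heap MB: " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        }
    }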
>>
>> On 25.03.2013 11:55, Arkadi Colson wrote:
>>> Is somebody using the UseG1GC garbage collector with Solr and Tomcat 7? Any extra options needed?
>>>
>>> Thanks...
>>>
>>> On 03/25/2013 08:34 AM, Arkadi Colson wrote:
>>>> I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters.
>>>> I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why?
>>>>
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.077962] java cpuset=/ mems_allowed=0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078019] Pid: 29339, comm: java Not tainted 2.6.32-5-amd64 #1
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078095] Call Trace:
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078155] [<ffffffff810b6324>] ? oom_kill_process+0x7f/0x23f
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078233] [<ffffffff810b6848>] ? __out_of_memory+0x12a/0x141
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078309] [<ffffffff810b699f>] ? out_of_memory+0x140/0x172
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078385] [<ffffffff810ba704>] ? __alloc_pages_nodemask+0x4ec/0x5fc
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078469] [<ffffffff812fb47a>] ? io_schedule+0x93/0xb7
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078541] [<ffffffff810bbc69>] ? __do_page_cache_readahead+0x9b/0x1b4
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078626] [<ffffffff81064fc0>] ? wake_bit_function+0x0/0x23
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078702] [<ffffffff810bbd9e>] ? ra_submit+0x1c/0x20
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078773] [<ffffffff810b4a72>] ? filemap_fault+0x17d/0x2f6
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078849] [<ffffffff810ca9e2>] ? __do_fault+0x54/0x3c3
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078921] [<ffffffff810ccd36>] ? handle_mm_fault+0x3b8/0x80f
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.078999] [<ffffffff8101166e>] ? apic_timer_interrupt+0xe/0x20
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079078] [<ffffffff812febf6>] ? do_page_fault+0x2e0/0x2fc
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079153] [<ffffffff812fca95>] ? page_fault+0x25/0x30
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079222] Mem-Info:
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079261] Node 0 DMA per-cpu:
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079310] CPU 0: hi: 0, btch: 1 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079374] CPU 1: hi: 0, btch: 1 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079439] CPU 2: hi: 0, btch: 1 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079527] CPU 3: hi: 0, btch: 1 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079591] Node 0 DMA32 per-cpu:
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079642] CPU 0: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079706] CPU 1: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079770] CPU 2: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079834] CPU 3: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079899] Node 0 Normal per-cpu:
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.079951] CPU 0: hi: 186, btch: 31 usd: 17
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080015] CPU 1: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080079] CPU 2: hi: 186, btch: 31 usd: 2
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080142] CPU 3: hi: 186, btch: 31 usd: 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080209] active_anon:2638016 inactive_anon:388557 isolated_anon:0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080209]  active_file:68 inactive_file:236 isolated_file:0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080210]  unevictable:0 dirty:5 writeback:5 unstable:0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080211]  free:16573 slab_reclaimable:2398 slab_unreclaimable:2335
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080212]  mapped:36 shmem:0 pagetables:24750 bounce:0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.080575] Node 0 DMA free:15796kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.081041] lowmem_reserve[]: 0 3000 12090 12090
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.081110] Node 0 DMA32 free:39824kB min:3488kB low:4360kB high:5232kB active_anon:2285240kB inactive_anon:520624kB active_file:0kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4152kB slab_unreclaimable:1640kB kernel_stack:1104kB pagetables:31100kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.081600] lowmem_reserve[]: 0 0 9090 9090
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.081664] Node 0 Normal free:10672kB min:10572kB low:13212kB high:15856kB active_anon:8266824kB inactive_anon:1033604kB active_file:292kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:9308160kB mlocked:0kB dirty:20kB writeback:20kB mapped:156kB shmem:0kB slab_reclaimable:5440kB slab_unreclaimable:7692kB kernel_stack:1280kB pagetables:67900kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:256 all_unreclaimable? no
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082171] lowmem_reserve[]: 0 0 0 0
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082240] Node 0 DMA: 1*4kB 2*8kB 2*16kB 0*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15796kB
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082394] Node 0 DMA32: 4578*4kB 2434*8kB 4*16kB 1*32kB 1*64kB 2*128kB 0*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 40248kB
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082555] Node 0 Normal: 1020*4kB 332*8kB 7*16kB 3*32kB 6*64kB 6*128kB 2*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 10656kB
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082715] 7069 total pagecache pages
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.082769] 6768 pages in swap cache
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.114216] 203 pages shared
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.114261] 3067047 pages non-shared
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.114314] Out of memory: kill process 29301 (java) score 37654 or a child
>>>> Mar 22 20:30:01 solr01-gs kernel: [716098.114401] Killed process 29301 (java)
>>>>
>>>> Thanks!
>>>>
>>>> On 03/14/2013 04:00 PM, Arkadi Colson wrote:
>>>>> On 03/14/2013 03:11 PM, Toke Eskildsen wrote:
>>>>>> On Thu, 2013-03-14 at 13:10 +0100, Arkadi Colson wrote:
>>>>>>> When I shut down Tomcat, free -m and top keep telling me the same values.
>>>>>>> Almost no free memory...
>>>>>>>
>>>>>>> Any idea?
>>>>>> Are you reading top & free right? It is standard behaviour for most modern operating systems to have very little free memory. As long as the sum of free memory and cache is high, everything is fine.
>>>>>>
>>>>>> Looking at the stats you gave previously we have
>>>>>>
>>>>>>>> *top*
>>>>>>>>   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
>>>>>>>> 13666 root   20   0 86.8g 4.7g 248m S  101 39.7 478:37.45
>>>>>> 4.7GB physical memory used and ~80GB used for memory mapping the index.
>>>>>>
>>>>>>>> *free -m*
>>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>>> Mem:         12047      11942        105          0        180       6363
>>>>>>>> -/+ buffers/cache:       5399       6648
>>>>>>>> Swap:          956         75        881
>>>>>> So 6648MB is used for either general disk cache or the memory mapped index. This really translates to 6648MB (plus the 105MB above) of available memory, as any application asking for memory will get it immediately from that pool (sorry if this is basic stuff for you).
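(To make the "memory mapping the index" point above concrete: on 64-bit JVMs Lucene's default directory maps the index files into the process address space, so their size shows up in VIRT and counts against ulimit -v, but not against the Java heap or resident memory. A minimal sketch, assuming Lucene 4.x and a hypothetical index path:)

    import java.io.File;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.MMapDirectory;

    public class MmapOpen {
        public static void main(String[] args) throws Exception {
            // Hypothetical index location; a 130GB index opened this way adds roughly
            // 130GB to the JVM's virtual size (VIRT), but not to heap usage or RSS.
            MMapDirectory dir = new MMapDirectory(new File("/path/to/solr/data/index"));
            DirectoryReader reader = DirectoryReader.open(dir);
            System.out.println("docs in index: " + reader.maxDoc());
            reader.close();
            dir.close();
        }
    }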
>>>>>>
>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>>>>> Caused by: java.lang.OutOfMemoryError
>>>>>>>>         at java.util.zip.ZipFile.open(Native Method)
>>>>>>>>         at java.util.zip.ZipFile.<init>(ZipFile.java:127)
>>>>>>>>         at java.util.zip.ZipFile.<init>(ZipFile.java:144)
>>>>>>>>         at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:157)
>>>>>> [...]
>>>>>>
>>>>>>>> Java HotSpot(TM) 64-Bit Server VM warning: Attempt to allocate stack guard pages failed.
>>>>>>>> mmap failed for CEN and END part of zip file
>>>>>> A quick search shows that other people have had problems with ZipFile in at least some sub-versions of Java 1.7. However, another very common cause for OOM with memory mapping is that the limit for allocating virtual memory is too low.
>>>>> We do not index zip files, so that cannot be the cause of the problem.
>>>>>
>>>>>> Try doing a
>>>>>>   ulimit -v
>>>>>> on the machine. If the number is somewhere around 100000000 (100GB), Lucene's memory mapping of your index (the 80GB) plus the ZipFile's memory mapping plus other processes might hit the ceiling. If that is the case, simply raise the limit.
>>>>>>
>>>>>> - Toke
>>>>>>
>>>>> ulimit -v shows me unlimited.
>>>>> I decreased the hard commit time to 10 seconds and set ramBufferSizeMB to 250. Hope this helps...
>>>>> Will keep you informed!
>>>>>
>>>>> Thanks for the explanation!
>>>>>

--
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)               LibTec - Library Technology
Universitätsstr. 25              and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************