Re: Solr and jvm Garbage Collection tuning

2013-09-30 Thread Alessandro Benedetti
I think this could help: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Cheers


2013/9/27 ewinclub7 ewincl...@hotmail.com





-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England



RE: Solr and jvm Garbage Collection tuning

2010-09-13 Thread Burton-West, Tom
Thanks Kent for your info.  

We are not doing any faceting, sorting, or much else.  My guess is that most of 
the memory increase is just the data structures created when parts of the frq 
and prx files get read into memory.  Our frq files are about 77GB and the prx 
files about 260GB per shard, and we are running 3 shards per machine.  I 
suspect that the document cache and query result cache don't take up that much 
space, but I will try a run with those caches set to 0, just to see.
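
A minimal sketch of what that test might look like in solrconfig.xml, using the 
stock LRUCache definitions with the sizes zeroed only for the experiment:

  <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>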

We have dual 4-core processors and 74GB total memory.  We want to leave a 
significant amount of memory free for OS disk caching.

We tried increasing the memory from 20GB to 28GB and adding the 
-XX:MaxGCPauseMillis=1000 flag, but that seemed to have no effect.

Currently I'm testing the ConcurrentMarkSweep collector and that's looking much 
better, although I don't understand why it has sized the Eden space down into 
the 20MB range. However, I am very new to Java memory management.

Anyone know if when using ConcurrentMarkSweep it's better to let the JVM size 
the Eden space or better to give it some hints?


Once we get some decent JVM settings that we can put into production, I'll 
test termIndexInterval with Solr 1.4.1 on our test server.

Tom

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 

What are your current GC settings?  Also, I guess I'd look at ways you can 
reduce the heap size needed: caching, field type choices, faceting choices.  
Also could try playing with the termIndexInterval, which will load fewer terms 
into memory at the cost of longer seeks.

 At some point, though, you just may need more shards and the resulting 
 smaller indexes.  How many CPU cores do you have on each machine?


Re: Solr and jvm Garbage Collection tuning

2010-09-13 Thread Stephen Green
On Mon, Sep 13, 2010 at 6:45 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Thanks Kent for your info.

 We are not doing any faceting, sorting, or much else.  My guess is that most 
 of the memory increase is just the data structures created when parts of the 
 frq and prx files get read into memory.  Our frq files are about 77GB  and 
 the prx files are about 260GB per shard and we are running 3 shards per 
 machine.   I suspect that the document cache and query result cache don't 
 take up that much space, but will try a run with those caches set to 0, just 
 to see.

 We have dual 4 core processors and 74GB total memory.  We want to leave a 
 significant amount of memory free for OS disk caching.

 We tried increasing the memory from 20GB to 28GB and adding the 
 -XX:MaxGCPauseMillis=1000 flag but that seemed to have no effect.

 Currently I'm testing using the ConcurrentMarkSweep and that's looking much 
 better although I don't understand why it has sized the Eden space down into 
 the 20MB range. However, I am very new to Java memory management.

 Anyone know if when using ConcurrentMarkSweep it's better to let the JVM size 
 the Eden space or better to give it some hints?

Really the best thing to do is to run the system for a while with GC
logging on and then look at how often the young generation GC is
occurring.  A set of parameters like:

-verbose:gc -XX:+PrintGCTimeStamps  -XX:+PrintGCDetails

should give you some indication of how often the young gen GC is
occurring.  If it's often, you can try increasing the size of the
young generation.  The option:

-Xloggc:<gc log file>

will dump this information to the specified file rather than sending
it to the standard error.
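
For example, a full invocation might look something like this (the heap sizes,
the log path, and the start.jar entry point are only placeholders for whatever
your Tomcat or Jetty setup actually uses):

  java -Xms20g -Xmx20g \
       -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
       -Xloggc:/var/log/solr/gc.log \
       -jar start.jar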

I've done this a few times with a variety of systems: sometimes you
want to make the young gen bigger and sometimes you don't.
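
If it does turn out that a bigger young gen helps, the usual knobs (the sizes
below are placeholders, not recommendations) look something like:

  -XX:NewSize=2g -XX:MaxNewSize=2g -XX:SurvivorRatio=4

which pins the young generation explicitly instead of letting the collector
pick it, and can be combined with -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.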

Steve
-- 
Stephen Green
http://thesearchguy.wordpress.com


Re: Solr and jvm Garbage Collection tuning

2010-09-12 Thread Grant Ingersoll

On Sep 10, 2010, at 7:01 PM, Burton-West, Tom wrote:

 We have noticed that when the first query hits Solr after starting it up, 
 memory use increases significantly, from about 1GB to about 16GB, and then as 
 queries are received it goes up to about 19GB at which point there is a Full 
 Garbage Collection which takes about 30 seconds and then memory use drops 
 back down to 16GB.  Under a relatively heavy load, the full GC happens about 
 every 10-20 minutes.
 
 We are running 3 Solr shards under one Tomcat with 20GB allocated to the jvm. 
 Each shard has a total index size of about 400GB and a tii size of about 
 600MB and indexes about 650,000 full-text books. (The server has a total of 
 72GB of memory, so we are leaving quite a bit of memory for the OS disk 
 cache).
 
 Is there some argument we could give the jvm so that it would collect garbage 
 more frequently? Or some other JVM tuning action that might reduce the amount 
 of time where Solr is waiting on GC?
 
 If we could get the time for each GC to take under a second, with the 
 trade-off being that GC  would occur much more frequently, that would help us 
 avoid the occasional query taking more than 30 seconds at the cost of a 
 larger number of queries taking at least a second.
 

What are your current GC settings?  Also, I guess I'd look at ways you can 
reduce the heap size needed: caching, field type choices, faceting choices.  
Also could try playing with the termIndexInterval, which will load fewer terms 
into memory at the cost of longer seeks.  At some point, though, you just may 
need more shards and the resulting smaller indexes.  How many CPU cores do you 
have on each machine?
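
As a rough illustration of the termIndexInterval idea (this assumes your Solr 
version exposes it as a setting in solrconfig.xml's <indexDefaults> section, 
which is worth checking against your version's example config), raising it from 
the Lucene default of 128 keeps fewer .tii entries in memory:

  <indexDefaults>
    <termIndexInterval>1024</termIndexInterval>
  </indexDefaults>

It is an index-time setting, though, so the index has to be rebuilt before a 
new value takes effect.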

Re: Solr and jvm Garbage Collection tuning

2010-09-11 Thread Dennis Gearon
Thanks for the real-life examples.

You would have to do a LOT of sharding to get that to work better.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 9/10/10, Kent Fitch kent.fi...@gmail.com wrote:




Re: Solr and jvm Garbage Collection tuning

2010-09-10 Thread Kent Fitch
Hi Tim,

For what it is worth,  behind Trove (http://trove.nla.gov.au/) are 3
SOLR-managed indices and 1 Lucene index. None of ours is as big as one
of your shards, and one of our SOLR-managed indices is tiny, but your
experiences with long GC pauses are familiar to us.

One of the most difficult indices to tune is our bibliographic index
of around 38M mostly metadata records which is around 125GB and 97MB
tii files.

We need to commit updates and reopen the index every 90 seconds, and
the facet recalculation (using UnInverted) was taking quite a lot of
time, and seemed to generate lots of objects to be collected on each
reopening.

Although we've been through several rounds of tuning which have seemed
to work, at least temporarily, a few months ago we started getting 12
sec full gc times every 90 secs, which was no good!

We've noticed/did three things:

1) optimise to 1 segment - we'd got to the stage where 50% of the
documents had been updated (hence deleted), and the maxdocid was 50%
bigger than it needed to be, and hence data structures whose size was
proportional to maxdocid had increased a lot.  Optimising to 1 segment
greatly reduced full GC frequency and times.

2) for most of our facets, forcing the facets to be filters rather
than uninverted happened to work better (see the request sketch after
this list) - but this depends on many factors, and certainly isn't a
cure-all for all facets - uninverted often works much better than filters!

3) after lots of benchmarking real updates and queries on a dev
system, we came up with this set of JVM parameters that worked best
for our environment (at the moment!):

-Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
-XX:+CMSIncrementalMode

I can't say exactly why, except that with this combination of
parameters and our data, a much bigger newgen led to less movement of
objects to oldgen, and non-full-GC collections on oldgen worked much
better.  Currently we are seeing less than 10 Full GC's a day, and
they almost always take less than 4 seconds.
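
On point 2, a rough sketch of what "filters rather than uninverted" maps to
in request terms (the facet field name here is purely hypothetical): adding
facet.method=enum to a faceted field makes Solr build a filter per term value
instead of using the UnInverted approach, e.g.

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=format&facet.method=enum

while facet.method=fc keeps the uninverted behaviour; as noted above, which
one wins is very data-dependent.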

This index is running on an 8 core X5570 machine with 64GB, sharing it
with a large/busy mysql instance and the Trove web server.

One of our other indices is only updated once per day, but is larger:
33.5M docs representing full text of archived web pages, 246GB, tii
file is 36MB.

JVM parms are  -Xmx1M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.

It also does less than 10 Full GC's per day, taking less than 5 sec each.

Our other large index, newspapers, is a native Lucene index, about
180GB with comparatively large tii of 280MB (probably for the same
reason your tii is large - the contents of this database is mostly
OCR'ed text).  This index is updated/reopened every 3 minutes (to
incorporate OCR text corrections and tagging) and we use a bitmap to
represent all facet values, which typically takes 5 secs to rebuild on
each reopen.

JVM parms: -mx15000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Although this JVM usually does fewer than 5 GC's per day, these Full
GC's often take 20-30 seconds, and we need to test increasing the
Newsize on this JVM to see if we can reduce these pauses.

The web archive and newspaper index are running on 8 core X5570
machine with 72GB.

We are also running a separate copy/version of this index behind the
site http://newspapers.nla.gov.au/ - the main difference is that the
Trove version uses shingling (inspired by the Hathi Trust results) to
improve searches containing common words.  This other version is
running on a machine with 32GB and 8 X5460 cores and  has JVM parms:
  -mx11500M  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC


Apart from the old newspapers index, all other SOLR/lucene indices are
maintained on SSDs (Intel x25m 160GB), which, whilst not having
anything to do with GCs, work very, very well - we couldn't cope with
our current query volumes on rotating disk without spending a great
deal of money.  The old newspaper index is running on a SAN with 24
fast disks backing it, and we can't support the same query rate on it
as we can with the other newspaper index on SSDs (even before the
shingling change).

Kent Fitch
Trove development team
National Library of Australia