Frequent garbage collections after a day of operation
Hey everyone,

we're running into some operational problems with our SOLR production setup here and were wondering if anyone else is affected or has solved these problems before. We're running a vanilla SOLR 3.4.0 in several Tomcat 6 instances, so nothing out of the ordinary, but after a day or so of operation we see increased response times from SOLR, up to 3x on average. During this time we see increased CPU load due to heavy garbage collection in the JVM, which bogs down the whole system, so throughput decreases, naturally. When restarting the slaves, everything goes back to normal, but that's more of a brute-force solution.

The thing is, we don't know what's causing this, and we don't have that much experience with Java stacks since we're for the most part a Rails company. Are Tomcat 6 or SOLR known to leak memory? Is anyone else seeing this, or can you think of a reason for it? Most of our queries to SOLR involve the DismaxHandler and the spatial search query components. We don't use any custom request handlers so far.

Thanks in advance,
-Matthias

--
Matthias Käppler
Lead Developer API Mobile

Qype GmbH
Großer Burstah 50-52
20457 Hamburg
Telephone: +49 (0)40 - 219 019 2 - 160
Skype: m_kaeppler
Email: matth...@qype.com

Managing Director: Ian Brotherston
Amtsgericht Hamburg
HRB 95913

This e-mail and its attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail and its attachments. Any unauthorized copying, disclosure or distribution of this e-mail and its attachments is strictly forbidden. This notice also applies to future messages.
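[Editor's note: a first diagnostic step for a problem like this is usually to make the collector's behavior visible. The fragment below is a sketch, assuming a Sun/Oracle HotSpot JVM and a standard Tomcat layout; flag names and heap sizes are illustrative, not a recommendation for this specific setup:]

```shell
# bin/setenv.sh -- sourced by catalina.sh on startup (create it if absent).
# HotSpot-specific flags: pin the heap size and log every GC event with
# timestamps, so growth in GC frequency/duration over a day becomes measurable.
export CATALINA_OPTS="$CATALINA_OPTS \
  -Xms2g -Xmx2g \
  -XX:+UseConcMarkSweepGC \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -Xloggc:$CATALINA_BASE/logs/gc.log"
```

If the log shows full GCs becoming more frequent while reclaiming less memory each time, that points at a genuine leak rather than an undersized heap.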
Re: Improving performance for SOLR geo queries?
hey, thanks all for the suggestions, didn't have time to look into them yet as we're feature-sprinting for MWC, but will report back with some feedback over the next weeks (we will have a few more performance sprints in March).

Best,
Matthias

On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>> One way to speed up numeric range queries (at the cost of increased
>> index size) is to lower the precisionStep. You could try changing this
>> from 8 to 4 and then re-indexing to see how that affects your query speed.
>
> Your issue, and the fact that I had been looking at the post-filtering
> code again for another client, reminded me that I had been planning on
> implementing post-filtering for spatial. It's now checked into trunk.
> If you have the ability to use trunk, you can add a high cost (like
> cost=200) along with cache=false to trigger it. More details here:
> http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
>
> -Yonik
> lucidimagination.com
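[Editor's note: for later readers, lowering precisionStep as Yonik suggests is a schema.xml change on the trie type backing LatLonType's sub-fields, followed by a full re-index. A sketch based on the Solr 3.x example schema; field and type names may differ in your setup:]

```xml
<!-- schema.xml: trie double type used by LatLonType's coordinate sub-fields.
     precisionStep="4" indexes more terms per value than the default 8,
     speeding up range queries at the cost of a larger index. -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="4"
           omitNorms="true"/>

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```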
Re: Improving performance for SOLR geo queries?
Hi Ryan,

> I'm trying to understand how you have your data indexed so we can give
> reasonable direction. What field type are you using for your locations?
> Is it using the solr spatial field types? What do you see when you look
> at the debug information from debugQuery=true?

we query against a LatLonType field using plain latitudes and longitudes and the bbox function. We send the bbox filter in a filter query that is uncached (we had to do this in order to get the eviction rate down in the filter cache, we had problems with that). Our filter cache is set up as follows:

  ConcurrentLRUCache(maxSize=32768, initialSize=8192, minSize=29491,
    acceptableSize=31129, cleanupThread=false, autowarmCount=8192,
    regenerator=org.apache.solr.search.SolrIndexSearcher$2@2fd1fc5c)

We've just restarted the slaves 30 minutes ago, so these values are not really giving away much, but we see a hit rate of up to 97% on the filter caches:

  lookups : 13003
  hits : 12440
  hitratio : 0.95
  inserts : 563
  evictions : 0
  size : 8927
  warmupTime : 116891
  cumulative_lookups : 9990103
  cumulative_hits : 9583913
  cumulative_hitratio : 0.95
  cumulative_inserts : 406191
  cumulative_evictions : 0

The warmup time looks a bit worrying, is that a high value in your experience?

As for debugQuery, here's the relevant snippet for the kind of geo queries we send:

  <arr name="filter_queries">
    <str>{!bbox cache=false d=50 sfield=location_ll pt=54.1434,-0.452322}</str>
  </arr>
  <arr name="parsed_filter_queries">
    <str>WrappedQuery({!cache=false cost=0}
      +location_ll_0_coordinate:[53.69373983225355 TO 54.59306016774645]
      +location_ll_1_coordinate:[-1.2199462259963294 TO 0.31530222599632934])</str>
  </arr>

> From my experience, there is no single best practice for spatial
> queries -- it will depend on your data density and distribution.
> You may also want to look at:
> http://code.google.com/p/lucene-spatial-playground/
> but note this is off lucene trunk -- the geohash queries are super
> fast though

thanks, I will look into that!
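[Editor's note: the parsed filter above is just the lat/lon bounding box that LatLonType derives from the center point and distance. A rough sketch of that computation, assuming the mean earth radius Solr's DistanceUtils uses; the longitude span widens by 1/cos(lat) away from the equator:]

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0087714  # mean earth radius, as in Solr's DistanceUtils

def bbox(lat, lon, d_km):
    """Approximate the lat/lon ranges of a bounding box of radius d_km
    around (lat, lon), as produced by a {!bbox} filter on LatLonType."""
    delta_lat = math.degrees(d_km / EARTH_MEAN_RADIUS_KM)
    delta_lon = delta_lat / math.cos(math.radians(lat))
    return (lat - delta_lat, lat + delta_lat), (lon - delta_lon, lon + delta_lon)

lat_range, lon_range = bbox(54.1434, -0.452322, 50)
# lat_range ~ (53.6937, 54.5931), lon_range ~ (-1.2200, 0.3153),
# matching the location_ll_*_coordinate ranges in the debug output above.
```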
I still haven't really considered geohashes. As far as I understand, documents with a lat/lon are already assigned a geohash upon indexing, is that correct? In which way does a query get faster when I query by a geohash rather than a lat/lon? Doesn't local lucene already map documents to a cartesian grid upon indexing, thus reducing lookup time? Moreover, will this mean the results get less accurate, since different lat/lons may collapse into the same hash?

Thanks!
-Matthias
Re: Improving performance for SOLR geo queries?
Hi Erick,

if we're not doing geo searches, we filter by location tags that we attach to places. This is simply a hierarchical regional id, which is simple to filter for, but much less flexible. We use that on Web a lot, but not on mobile, where we want to perform searches in arbitrary radii around arbitrary positions.

For those location-tag kinds of queries, the average time spent in SOLR is 43 msec (I'm looking at the New Relic snapshot of the last 12 hours). I disabled our optimization again just yesterday, so for the bbox queries we're now at an average of 220 ms (same time window). That's a 5-fold increase in response time, and in peak hours it's worse than that.

I've also found a blog post from 3 years ago which outlines the inner workings of the SOLR spatial indexing and searching: http://www.searchworkings.org/blog/-/blogs/23842 From that it seems as if SOLR already performs, during the index step, a similar optimization to the one we had in mind, so if I understand correctly, it doesn't even search over all records, only those that were mapped to the grid box identified during indexing.

What I would love to see is the suggested way to perform a geo query on SOLR, considering that they're so difficult to cache and expensive to run. Is the best approach to restrict the candidate set as much as possible using cheap filter queries, so that SOLR merely has to do the geo search against these subsets? How does the query planner work here? I see there's a cost attached to a filter query, but one can only set it when cache is set to false? Are cached geo queries executed last when there are cheaper filter queries to cut down on documents? If you have a real-world practical setup to share, one that performs well in a production environment serving requests in the millions per day, that would be great.

I'd love to contribute documentation, by the way; if you knew me you'd know I'm an avid open source contributor and actually run several open source projects myself.
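[Editor's note: on the cost/cache question, per the advanced-filter-caching post Yonik links elsewhere in this thread: cached fq's are computed as doc sets and intersected, non-cached fq's are ordered by their cost, and on trunk a non-cached fq with cost >= 100 runs as a post-filter, checked only against documents that matched everything cheaper. A hypothetical request combining a cheap cached filter with a post-filtered bbox might look like:]

```
fq=category:restaurant          <-- cached doc set, applied first
fq={!bbox cache=false cost=200 d=10 sfield=location_ll pt=49.14839,8.5691}
                                <-- post-filter: evaluated per document, only
                                    for docs that passed the filters above
```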
But tell me, how can I possibly contribute answers to questions I don't have an answer to? That's why I'm here, remember :) So please, these kinds of snippy replies are not helping anyone.

Thanks,
-Matthias

On Tue, Feb 7, 2012 at 3:06 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> So the obvious question is what is your performance like without the
> distance filters? Without that knowledge, we have no clue whether the
> modifications you've made had any hope of speeding up your response
> times. As for the docs, any improvements you'd like to contribute would
> be happily received.
>
> Best,
> Erick
>
> 2012/2/6 Matthias Käppler <matth...@qype.com>:
>> Hi,
>>
>> we need to perform fast geo lookups on an index of ~13M places, and
>> we're running into performance problems here with SOLR. We haven't done
>> a lot of query optimization / SOLR tuning up until now, so there are
>> probably a lot of things we're missing. I was wondering if you could
>> give me some feedback on the way we do things, whether they make sense,
>> and especially why a supposed optimization we implemented recently
>> seems to have no effect, when we actually thought it would help a lot.
>>
>> What we do is this: our API is built on a Rails stack and talks to
>> SOLR via a Ruby wrapper. We have a few filters that almost always
>> apply, which we put in filter queries. Filter cache hit rate is
>> excellent, about 97%, and cache size caps at 10k filters (max size is
>> 32k, but it never seems to reach that many, probably because we
>> replicate / delta update every few minutes). Still, geo queries are
>> slow, about 250-500 msec on average. We send them with cache=false, so
>> as to not flood the fq cache and cause undesirable evictions.
>>
>> Now our idea was this: while the actual geo queries are poorly
>> cacheable, we could clearly identify geographical regions which are
>> queried more often than others (naturally, since we're a user-driven
>> service).
>> Therefore, we partition Earth into a static grid of overlapping boxes,
>> where the grid size (the distance of the nodes) depends on the maximum
>> allowed search radius. That way, for every user query, we can always
>> identify a single bounding box that covers it. This larger bounding
>> box (200 km edge length) we send to SOLR as a cached filter query,
>> along with the actual user query, which is still sent uncached. Ex:
>> the user asks for places within 10 km around 49.14839,8.5691, then
>> what we send to SOLR is something like this:
>>
>> fq={!bbox cache=false d=10 sfield=location_ll pt=49.14839,8.5691}
>> fq={!bbox cache=true d=100.0 sfield=location_ll
>>    pt=49.4684836290799,8.31165802979391} <-- this one we derive automatically
>>
>> That way SOLR would intersect the two filters and return the same
>> results as when only looking at the smaller bounding box, but keep the
>> larger box in cache and speed up subsequent geo queries in the same
>> regions. Or so we thought; unfortunately this approach did not help
>> query execution times get better, at all.
Improving performance for SOLR geo queries?
Hi,

we need to perform fast geo lookups on an index of ~13M places, and we're running into performance problems here with SOLR. We haven't done a lot of query optimization / SOLR tuning up until now, so there are probably a lot of things we're missing. I was wondering if you could give me some feedback on the way we do things, whether they make sense, and especially why a supposed optimization we implemented recently seems to have no effect, when we actually thought it would help a lot.

What we do is this: our API is built on a Rails stack and talks to SOLR via a Ruby wrapper. We have a few filters that almost always apply, which we put in filter queries. Filter cache hit rate is excellent, about 97%, and cache size caps at 10k filters (max size is 32k, but it never seems to reach that many, probably because we replicate / delta update every few minutes). Still, geo queries are slow, about 250-500 msec on average. We send them with cache=false, so as to not flood the fq cache and cause undesirable evictions.

Now our idea was this: while the actual geo queries are poorly cacheable, we could clearly identify geographical regions which are queried more often than others (naturally, since we're a user-driven service). Therefore, we partition Earth into a static grid of overlapping boxes, where the grid size (the distance of the nodes) depends on the maximum allowed search radius. That way, for every user query, we can always identify a single bounding box that covers it. This larger bounding box (200 km edge length) we send to SOLR as a cached filter query, along with the actual user query, which is still sent uncached.
Ex: the user asks for places within 10 km around 49.14839,8.5691, then what we send to SOLR is something like this:

fq={!bbox cache=false d=10 sfield=location_ll pt=49.14839,8.5691}
fq={!bbox cache=true d=100.0 sfield=location_ll pt=49.4684836290799,8.31165802979391} <-- this one we derive automatically

That way SOLR would intersect the two filters and return the same results as when only looking at the smaller bounding box, but keep the larger box in cache and speed up subsequent geo queries in the same regions. Or so we thought; unfortunately, this approach did not help query execution times at all.

Question is: why does it not help? Shouldn't it be faster to search on a cached bbox with only a few hundred thousand places? Is it a good idea to make these kinds of optimizations in the app layer (we do this as part of resolving the SOLR query in Ruby), and does it make sense at all? We're not sure what kind of optimizations SOLR already does in its query planner. The documentation is (sorry) miserable, and debugQuery yields no insight into which optimizations are performed. So this has been a hit-and-miss game for us, which is very ineffective considering that it takes considerable time to build these kinds of optimizations in the app layer.

Would be glad to hear your opinions / experience around this.

Thanks!
-Matthias
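[Editor's note: the grid trick described above can be sketched roughly as follows. This is a hypothetical reconstruction, not Qype's actual Ruby code; the grid spacing and snapping scheme are assumptions. The point is that snapping every query to the nearest node of a fixed grid makes repeated queries in the same region hit the same cached coarse filter:]

```python
# Hypothetical sketch of the app-layer grid optimization described above.
# Each query point is snapped to the nearest node of a fixed lat/lon grid;
# the node becomes the center of a large, cacheable bbox filter, while the
# user's actual radius is sent as a separate, uncached filter.
GRID_STEP_DEG = 0.9  # assumed spacing; must be small enough that the coarse
                     # box (d=100 km) covers any allowed fine query radius

def snap_to_grid(lat, lon, step=GRID_STEP_DEG):
    """Map a point to its nearest grid node."""
    return round(lat / step) * step, round(lon / step) * step

def filter_queries(lat, lon, d_km):
    node_lat, node_lon = snap_to_grid(lat, lon)
    return [
        f"{{!bbox cache=false d={d_km} sfield=location_ll pt={lat},{lon}}}",
        f"{{!bbox cache=true d=100.0 sfield=location_ll pt={node_lat},{node_lon}}}",
    ]

# Nearby queries resolve to the same node, so they reuse one cached filter:
assert snap_to_grid(49.14839, 8.5691) == snap_to_grid(49.20, 8.60)
```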