Re: commit time and lock
Thanks, this is helpful. I do indeed periodically update or delete just about every doc in the index, so it makes sense that optimization might be necessary even post-1.4. But I'm still on 1.4 -- add this to the list of things to look into rather than assume after I upgrade. Indeed I was aware that it would trigger a pretty complete index replication, but since it seemed to greatly improve performance (in 1.4), so it goes. But yes, I'm STILL only updating once a day, even with all that. (And in fact, I'm only replicating once a day too, ha.)

On 7/25/2011 10:50 AM, Erick Erickson wrote:
Yeah, the 1.4 code base is older. That is, optimization will have more effect on that vintage code than on 3.x and trunk code. I should have been a bit more explicit in that other thread.

In the case where you add a bunch of documents, optimization doesn't buy you all that much currently. If you delete a bunch of docs (or update a bunch of existing docs), then optimization will reclaim resources. So you *could* have a case where the size of your index shrank drastically after optimization (say you updated the same 100K documents 10 times, then optimized). But even that is "it depends" (tm). The new segment merging, as I remember, will possibly reclaim deleted resources, but I'm parroting people who actually know, so you might want to verify that.

Optimization will almost certainly trigger a complete index replication to any slaves configured, though. So the usual advice is to optimize maybe once a day or week during off hours as a starting point, unless and until you can verify that your particular situation warrants optimizing more frequently.

Best
Erick

On Fri, Jul 22, 2011 at 11:53 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
How old is 'older'? I'm pretty sure I'm still getting much faster performance on an optimized index in Solr 1.4.
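Since an optimize rewrites every segment, slaves pulling from a master will fetch essentially the whole index afterward. Whether slaves replicate after every commit or also after optimizes is controlled by replicateAfter in the master's ReplicationHandler section of solrconfig.xml; the confFiles value below is illustrative, not a recommendation:

```xml
<!-- Master-side replication config (solrconfig.xml) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- slaves poll after commits; add a second entry with "optimize"
         only if slaves should also pull the fully rewritten index -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

Omitting "optimize" from replicateAfter does not avoid the transfer forever: the next commit-triggered replication after an optimize still ships the rewritten segments.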
Re: commit time and lock
Yeah, the 1.4 code base is older. That is, optimization will have more effect on that vintage code than on 3.x and trunk code. I should have been a bit more explicit in that other thread.

In the case where you add a bunch of documents, optimization doesn't buy you all that much currently. If you delete a bunch of docs (or update a bunch of existing docs), then optimization will reclaim resources. So you *could* have a case where the size of your index shrank drastically after optimization (say you updated the same 100K documents 10 times, then optimized). But even that is "it depends" (tm). The new segment merging, as I remember, will possibly reclaim deleted resources, but I'm parroting people who actually know, so you might want to verify that.

Optimization will almost certainly trigger a complete index replication to any slaves configured, though. So the usual advice is to optimize maybe once a day or week during off hours as a starting point, unless and until you can verify that your particular situation warrants optimizing more frequently.

Best
Erick

On Fri, Jul 22, 2011 at 11:53 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
How old is 'older'? I'm pretty sure I'm still getting much faster performance on an optimized index in Solr 1.4. This could be due to the nature of my index and queries (which include some medium-sized stored fields, and extensive faceting -- faceting on up to a dozen fields in every request, where each field can include millions of unique values. Amazing I can do this with good performance at all!). It's also possible I'm wrong about that faster performance; I haven't done robustly valid benchmarking on a clone of my production index yet. But it really looks that way to me, from what investigation I have done. If the answer is that optimization is believed no longer necessary on versions LATER than 1.4, that might be the simplest explanation.
Re: commit time and lock
What do the committers think about adding an index queue in Solr? Then we could have lots of one-off index requests that would queue up...

On Fri, Jul 22, 2011 at 3:14 AM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Solr still responds to search queries during commit; only new indexing requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response time during commits (unless your server is much undersized).

Pierre

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
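The timing code discussed in this thread measures add() and commit() in one lump, which is why the two measurements disagree so wildly; timing each phase separately makes it obvious which one dominates. A minimal runnable sketch of that pattern, with a stubbed stand-in for the SolrJ server (the IndexServer interface and the sleep durations are hypothetical, not SolrJ API):

```java
import java.util.List;

public class CommitTiming {
    // Minimal stand-in for the SolrJ server, so the timing pattern is
    // runnable here without a live Solr instance (hypothetical interface).
    public interface IndexServer {
        void add(List<String> docs) throws InterruptedException;
        void commit() throws InterruptedException;
    }

    // Time add() and commit() separately, so you can see which phase dominates.
    public static long[] timePhases(IndexServer server, List<String> docs)
            throws InterruptedException {
        long t0 = System.currentTimeMillis();
        server.add(docs);                 // buffer/flush documents
        long t1 = System.currentTimeMillis();
        server.commit();                  // make them searchable
        long t2 = System.currentTimeMillis();
        return new long[] { t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake server: add takes ~50 ms, commit ~100 ms.
        IndexServer fake = new IndexServer() {
            public void add(List<String> docs) throws InterruptedException { Thread.sleep(50); }
            public void commit() throws InterruptedException { Thread.sleep(100); }
        };
        long[] ms = timePhases(fake, List.of("doc1", "doc2"));
        System.out.println("add: " + ms[0] + " ms, commit: " + ms[1] + " ms");
    }
}
```

With real SolrJ the same bracketing around server.add(docs) and server.commit() answers Jonty's question directly: most of the 9 seconds is spent in add(), not in the commit itself.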
RE: commit time and lock
Solr still responds to search queries during commit; only new indexing requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response time during commits (unless your server is much undersized).

Pierre

-Message d'origine-
De : Jonty Rhods [mailto:jonty.rh...@gmail.com]
Envoyé : jeudi 21 juillet 2011 20:27
À : solr-user@lucene.apache.org
Objet : Re: commit time and lock

Actually I'm worried about the response time. I'm committing around 500 docs every 5 minutes. As I understand it (correct me if I'm wrong), at commit time the Solr server stops responding. My concern is how to minimize the response time so users don't have to wait, or whether some other approach is needed for my case. Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson <erickerick...@gmail.com> wrote:
What is it you want help with? You haven't told us what the problem you're trying to solve is. Are you asking how to speed up indexing? What have you tried? Have you looked at http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:
I am using solrj to index the data. I have around 5000 docs indexed. Since the server stops responding during commit (due to the lock), I was measuring the commit time:

    double starttemp = System.currentTimeMillis();
    server.add(docs);
    server.commit();
    System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure whether the index lock starts at server.add(docs) or only at server.commit(). If I change the above to the following:

    server.add(docs);
    double starttemp = System.currentTimeMillis();
    server.commit();
    System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
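An alternative to issuing an explicit commit every 5 minutes from the client is Solr's autoCommit, configured in solrconfig.xml under the update handler; the thresholds below are illustrative values matching the 500-docs/5-minutes pattern in this thread, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after this many pending documents... -->
    <maxDocs>500</maxDocs>
    <!-- ...or after this many milliseconds (5 minutes), whichever comes first -->
    <maxTime>300000</maxTime>
  </autoCommit>
</updateHandler>
```

This moves the commit cost off the indexing client's critical path, though the server-side work of the commit is the same.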
Re: commit time and lock
Thanks for the clarity. One more thing I want to know about optimization. Right now I am planning to optimize the server every 24 hours. Optimization also takes time (last time it took around 13 minutes), so I want to know:

1. While optimization is in process, will the Solr server respond or not?
2. If the server will not respond, how can optimization be done faster, or some other way, so our users don't have to wait for the optimization process to finish?

regards
Jonty

On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Solr still responds to search queries during commit; only new indexing requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response time during commits (unless your server is much undersized).

Pierre
RE: commit time and lock
Solr will respond to searches during optimization, but commits will have to wait for the end of the optimization process. During optimization a new index is generated on disk by merging every single file of the current index into one big file, so your server will be busy, especially regarding disk access. This may affect your response time, and it has a very negative effect on index replication if you have a master/slave architecture.

I've read here that optimization is not always a requirement for an efficient index, due to some low-level changes in Lucene 3.x, so maybe you don't really need optimization. What version of Solr are you using?

Maybe someone can point toward a relevant link about optimization other than the Solr wiki: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre

-Message d'origine-
De : Jonty Rhods [mailto:jonty.rh...@gmail.com]
Envoyé : vendredi 22 juillet 2011 12:45
À : solr-user@lucene.apache.org
Objet : Re: commit time and lock

Thanks for the clarity. One more thing I want to know about optimization. Right now I am planning to optimize the server every 24 hours.
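The "optimize during off hours" advice in this thread can be automated with a scheduler. A runnable sketch of the delay computation such a scheduler needs (the 2 AM target is an arbitrary example; the actual optimize call, e.g. SolrJ's server.optimize(), is omitted since it needs a live server):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class OffHoursOptimize {
    // Compute how long to wait from 'now' until the next occurrence of
    // 'hour' (0-23), e.g. to feed a ScheduledExecutorService that runs a
    // nightly optimize at 2 AM.
    static Duration delayUntilHour(LocalDateTime now, int hour) {
        LocalDateTime next = now.truncatedTo(ChronoUnit.HOURS).withHour(hour);
        if (!next.isAfter(now)) next = next.plusDays(1); // already past today
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        // From 14:30, the next 2 AM is 11h30m away.
        LocalDateTime now = LocalDateTime.of(2011, 7, 22, 14, 30);
        System.out.println(delayUntilHour(now, 2)); // PT11H30M
    }
}
```

Pair this with scheduleAtFixedRate at a 24-hour period and the optimize (plus the replication it triggers) lands in the quiet window instead of on top of peak traffic.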
Re: commit time and lock
Hello,

Pierre, can you tell us where you read that? "I've read here that optimization is not always a requirement to have an efficient index, due to some low level changes in lucene 3.xx"

Marc.

On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Solr will respond to searches during optimization, but commits will have to wait for the end of the optimization process. During optimization a new index is generated on disk by merging every single file of the current index into one big file, so your server will be busy, especially regarding disk access.
RE: commit time and lock
Hi Marc,

I've read that in a thread titled "Weird optimize performance degradation", where Erick Erickson states that "Older versions of Lucene would search faster on an optimized index, but this is no longer necessary", and more recently in a thread you initiated a month ago, "Question about optimization".

I'll also be very interested if anyone has a more precise idea/data on the benefits and tradeoffs of optimize vs merge...

Pierre

-Message d'origine-
De : Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
Envoyé : vendredi 22 juillet 2011 15:45
À : solr-user@lucene.apache.org
Objet : Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?

Marc.
Re: commit time and lock
On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
I've read that in a thread titled "Weird optimize performance degradation", where Erick Erickson states that "Older versions of Lucene would search faster on an optimized index, but this is no longer necessary", and more recently in a thread you initiated a month ago, "Question about optimization". I'll also be very interested if anyone has a more precise idea/data on the benefits and tradeoffs of optimize vs merge...

My most recent testing has been with Solr 3.2.0. I have noticed some speedup after optimizing an index, but the gain is not earth-shattering.

My index consists of 7 shards. One of them is small, and receives all new documents every two minutes. The others are large, and aside from deletes, are mostly static. Once a day, the oldest data is distributed from the small shard to its proper place in the other six shards. The small shard is optimized once an hour, and usually takes less than a minute. I optimize one large shard every day, so each one gets optimized once every six days. That optimize takes 10-15 minutes.

The only reason that I optimize is to remove deleted documents; whatever speedup I get is just icing on the cake. Deleted documents take up space and continue to influence the relevance scoring of queries, so I want to remove them.

Thanks,
Shawn
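Shawn's "optimize mainly to purge deletes" policy can be driven by the deleted-document count, which Solr exposes as the gap between maxDoc and numDocs (visible via the Luke request handler or the stats page). A small runnable sketch of such a trigger; the 10% threshold is an arbitrary illustration, not a recommendation:

```java
public class OptimizeTrigger {
    // deleted docs = maxDoc - numDocs (as reported by Solr); optimize when
    // their share of the index exceeds maxDeletedRatio.
    public static boolean shouldOptimize(int maxDoc, int numDocs, double maxDeletedRatio) {
        if (maxDoc <= 0) return false;
        int deleted = maxDoc - numDocs;
        return (double) deleted / maxDoc > maxDeletedRatio;
    }

    public static void main(String[] args) {
        System.out.println(shouldOptimize(1000, 850, 0.10)); // 15% deleted -> true
        System.out.println(shouldOptimize(1000, 980, 0.10)); // 2% deleted  -> false
    }
}
```

Checking the ratio per shard would let a schedule like Shawn's skip shards that happen to have few deletes that day.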
RE: commit time and lock
Does merging not happen often enough to keep the deleted-document count low enough? Maybe there's a need for partial optimization in Solr, meaning that segments with too many deleted documents could be copied to new files without the unnecessary data. That way, cleaning out deleted data could be compatible with lightweight replication.

I'm worried by this idea of deleted documents influencing relevance scores; any pointer to how important this influence may be?

Pierre

-Message d'origine-
De : Shawn Heisey [mailto:s...@elyograg.org]
Envoyé : vendredi 22 juillet 2011 16:42
À : solr-user@lucene.apache.org
Objet : Re: commit time and lock

My most recent testing has been with Solr 3.2.0. I have noticed some speedup after optimizing an index, but the gain is not earth-shattering.
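Something close to the "partial optimization" described here already exists as the expungeDeletes flag on commit, which asks Lucene to merge only segments containing deleted documents rather than rewriting the whole index (exact behavior varies by Solr/Lucene version, so treat this as a sketch to verify against your release):

```xml
<!-- Posted to /update: a commit that merges out segments holding deletes -->
<commit expungeDeletes="true"/>
```

Because only the affected segments are rewritten, the follow-on replication ships far less data than a full optimize would.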
RE: commit time and lock
How old is 'older'? I'm pretty sure I'm still getting much faster performance on an optimized index in Solr 1.4. This could be due to the nature of my index and queries (which include some medium sized stored fields, and extensive facetting -- facetting on up to a dozen fields in every request, where each field can include millions of unique values. Amazing I can do this with good performance at all!). It's also possible i'm wrong about that faster performance, i haven't done robustly valid benchmarking on a clone of my production index yet. But it really looks like that way to me, from what investigation I have done. If the answer is that optimization is believed no longer neccesary on versions LATER than 1.4, that might be the simplest explanation. From: Pierre GOSSE [pierre.go...@arisem.com] Sent: Friday, July 22, 2011 10:23 AM To: solr-user@lucene.apache.org Subject: RE: commit time and lock Hi Mark I've read that in a thread title Weird optimize performance degradation, where Erick Erickson states that Older versions of Lucene would search faster on an optimized index, but this is no longer necessary., and more recently in a thread you initiated a month ago Question about optimization. I'll also be very interested if anyone had a more precise idea/datas of benefits and tradeoff of optimize vs merge ... Pierre -Message d'origine- De : Marc SCHNEIDER [mailto:marc.schneide...@gmail.com] Envoyé : vendredi 22 juillet 2011 15:45 À : solr-user@lucene.apache.org Objet : Re: commit time and lock Hello, Pierre, can you tell us where you read that? I've read here that optimization is not always a requirement to have an efficient index, due to some low level changes in lucene 3.xx Marc. On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE pierre.go...@arisem.comwrote: Solr will response for search during optimization, but commits will have to wait the end of the optimization process. 
During optimization, a new index is generated on disk by merging every single file of the current index into one big file, so your server will be busy, especially regarding disk access. This may alter your response times, and it can have a very negative effect on index replication if you have a master/slave architecture.

I've read here that optimization is not always a requirement to have an efficient index, due to some low-level changes in Lucene 3.x, so maybe you don't really need optimization. What version of Solr are you using? Maybe someone can point toward a relevant link about optimization other than the Solr wiki: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Friday, July 22, 2011 12:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Thanks for the clarity. One more thing I want to know about optimization. Right now I am planning to optimize the server every 24 hours. Optimization also takes time (last time it took around 13 minutes), so I want to know:

1. When optimization is in process, will the Solr server respond or not?
2. If the server will not respond, then how can we make optimization faster, or do it some other way, so our users will not have to wait for the optimization process to finish?

regards
Jonty

On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:

Solr still responds to search queries during a commit; only new indexing requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response times during commits (unless your server is much undersized).

Pierre

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Thursday, July 21, 2011 20:27
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Actually I am worried about the response time. I am committing around 500 docs every 5 minutes. As I know (correct me if I am wrong), at the time of committing the Solr server stops responding. My concern is how to minimize the response time so users do not need to wait. Or will some other logic be required for my case? Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson <erickerick...@gmail.com> wrote:

What is it you want help with? You haven't told us what the problem you're trying to solve is. Are you asking how to speed up indexing? What have you tried? Have you looked at: http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:

I am using solrj to index the data. I have around 5000 docs indexed. As at the time of commit, due to the lock, the server stops giving responses, so I was measuring commit time:

  double starttemp = System.currentTimeMillis();
  server.add(docs);
  server.commit();
Re: commit time and lock
On 7/22/2011 9:32 AM, Pierre GOSSE wrote:

Merging does not happen often enough to keep deleted documents to a low enough count? Maybe there's a need for partial optimization in Solr, meaning that a segment with too many deleted documents could be copied to a new file without the unnecessary data. That way, cleaning out deleted data could be compatible with light replications. I'm worried by this idea of deleted documents influencing relevance scores; any pointer to how important this influence may be?

I've got a pretty high mergeFactor, for fast indexing. Also, I want to know for sure and control when merges happen, so I am not leaving it up to Lucene/Solr.

Right now the largest number of deleted documents on any shard at this moment is 45347. The shard (17.65GB) contains 9663271 documents, in six segments. That will be one HUGE segment (from the last optimize) and five very, very tiny segments, each with only a few thousand documents in them. Tonight, when the document distribution process runs, that index will be optimized again. Tomorrow night a different shard will be optimized.

Deleted documents can (and do) happen anywhere in the index, so even if I had a lot of largish segments rather than one huge segment, it's very likely that just expunging deletes would still result in the entire index being merged. So I am not losing anything by doing a full optimize, and I am gaining a small bit of performance.

The 45000 deletes mentioned above represent less than half a percent of the shard, so the influence on relevance is *probably* not large ... but that's not something I can say definitively. I think it all depends on what people are searching for and how common the terms in the deleted documents are.

Thanks,
Shawn
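[Editor's note: Shawn's "less than half a percent" figure checks out from the numbers he gives. A minimal sketch of that arithmetic, using his reported counts (45347 deletes, 9663271 documents):]

```java
public class DeletedRatio {
    // Percentage of the shard occupied by deleted-but-not-yet-merged-away
    // documents: 100 * deleted / total.
    static double deletedPercent(long deletedDocs, long totalDocs) {
        return 100.0 * deletedDocs / totalDocs;
    }

    public static void main(String[] args) {
        // Figures from Shawn's largest shard
        double pct = deletedPercent(45347, 9663271);
        System.out.println("deleted = " + pct + " %"); // roughly 0.47 %
    }
}
```

Whether a fraction that small measurably shifts relevance scores is, as Shawn says, workload-dependent; the arithmetic only confirms the size of the fraction.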
Re: commit time and lock
Actually I am worried about the response time. I am committing around 500 docs every 5 minutes. As I know (correct me if I am wrong), at the time of committing the Solr server stops responding. My concern is how to minimize the response time so users do not need to wait. Or will some other logic be required for my case? Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson <erickerick...@gmail.com> wrote:

What is it you want help with? You haven't told us what the problem you're trying to solve is. Are you asking how to speed up indexing? What have you tried? Have you looked at: http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:

I am using solrj to index the data. I have around 5000 docs indexed. As at the time of commit, due to the lock, the server stops giving responses, so I was measuring commit time:

  double starttemp = System.currentTimeMillis();
  server.add(docs);
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure about the index lock time: whether it starts at server.add(docs) or only at server.commit(). If I change the above to the following:

  server.add(docs);
  double starttemp = System.currentTimeMillis();
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
Re: commit time and lock
Dear all, kindly help me.

thanks

On Tuesday 21 June 2011 11:46 AM, Jonty Rhods wrote:

I am using solrj to index the data. I have around 5000 docs indexed. As at the time of commit, due to the lock, the server stops giving responses, so I was measuring commit time:

  double starttemp = System.currentTimeMillis();
  server.add(docs);
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure about the index lock time: whether it starts at server.add(docs) or only at server.commit(). If I change the above to the following:

  server.add(docs);
  double starttemp = System.currentTimeMillis();
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
commit time and lock
I am using solrj to index the data. I have around 5000 docs indexed. As at the time of commit, due to the lock, the server stops giving responses, so I was measuring commit time:

  double starttemp = System.currentTimeMillis();
  server.add(docs);
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure about the index lock time: whether it starts at server.add(docs) or only at server.commit(). If I change the above to the following:

  server.add(docs);
  double starttemp = System.currentTimeMillis();
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
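[Editor's note: one thing worth flagging in the snippet above is that `(System.currentTimeMillis() - starttemp)/1000` is long/int division, so it truncates: any commit under one second prints 0, which makes the second measurement look misleadingly instant. A minimal sketch that times the add and commit phases separately and keeps fractional seconds; the SolrJ calls are commented out as placeholders since this depends on a live server:]

```java
public class CommitTiming {
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        // server.add(docs);    // time spent sending and buffering documents
        long t1 = System.currentTimeMillis();
        // server.commit();     // time spent flushing segments and reopening searchers
        long t2 = System.currentTimeMillis();
        // Divide by 1000.0, not 1000: integer division truncates,
        // e.g. 900 ms would print as 0 seconds.
        System.out.println("add time    = " + (t1 - t0) / 1000.0 + " s");
        System.out.println("commit time = " + (t2 - t1) / 1000.0 + " s");
    }
}
```

Timing the two phases separately also answers Jonty's underlying question more directly than either of his two variants: each measurement isolates one call.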
Re: commit time and lock
What is it you want help with? You haven't told us what the problem you're trying to solve is. Are you asking how to speed up indexing? What have you tried? Have you looked at: http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:

I am using solrj to index the data. I have around 5000 docs indexed. As at the time of commit, due to the lock, the server stops giving responses, so I was measuring commit time:

  double starttemp = System.currentTimeMillis();
  server.add(docs);
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I am not sure about the index lock time: whether it starts at server.add(docs) or only at server.commit(). If I change the above to the following:

  server.add(docs);
  double starttemp = System.currentTimeMillis();
  server.commit();
  System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty