Re: commit time and lock

2011-07-25 Thread Jonathan Rochkind

Thanks, this is helpful.

I do indeed periodically update or delete just about every doc in the 
index, so it makes sense that optimization might be necessary even in 
versions after 1.4, but I'm still on 1.4 -- I'll add this to the list of 
things to check, rather than assume, after I upgrade.


Indeed I was aware that it would trigger a pretty complete index 
replication, but, since it seemed to greatly improve performance (in 
1.4), so it goes. But yes, I'm STILL only updating once a day, even with 
all that. (And in fact, I'm only replicating once a day too, ha).



Re: commit time and lock

2011-07-25 Thread Erick Erickson
Yeah, the 1.4 code base is older. That is, optimization will have more
effect on that vintage code than on 3.x and trunk code.

I should have been a bit more explicit in that other thread. In the case
where you add a bunch of documents, optimization doesn't buy you all
that much currently. If you delete a bunch of docs (or update a bunch of
existing docs), then optimization will reclaim resources. So you *could*
have a case where the size of your index shrank drastically after
optimization (say you updated the same 100K documents 10 times then
optimized).
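
As a rough illustration of the arithmetic behind that example, here is a minimal sketch. It assumes the worst case, where each update leaves the previous copy flagged as deleted and no background merge has reclaimed anything before the optimize; the class and method names are hypothetical, not part of Solr or Lucene.

```java
// Hypothetical helper: estimates how much of an index is stale before an optimize.
// Assumption: every update keeps the old copy around as a deleted entry,
// and no segment merge has reclaimed any of them yet.
final class DeletedDocEstimate {

    /** Fraction of index entries that are deleted copies. */
    static double deletedFraction(long liveDocs, int updatesPerDoc) {
        // One original copy plus one new copy per update.
        long totalEntries = liveDocs * (updatesPerDoc + 1L);
        long deletedEntries = totalEntries - liveDocs;
        return (double) deletedEntries / totalEntries;
    }

    public static void main(String[] args) {
        // The 100K documents updated 10 times from the example above.
        double fraction = deletedFraction(100_000L, 10);
        System.out.println("deleted fraction before optimize: " + fraction);
    }
}
```

Under those assumptions roughly ten of every eleven entries are dead weight, which is why the index could shrink drastically. With merging active in between, the real figure would likely be lower.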

But even that is "it depends" (tm). The new segment merging, as I remember,
will possibly reclaim deleted resources, but I'm parroting people who actually
know, so you might want to verify that if it

Optimization will almost certainly trigger a complete index replication to any
slaves configured, though.

So the usual advice is to optimize maybe once a day or week during off hours
as a starting point unless and until you can verify that your particular
situation warrants optimizing more frequently.
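
One way to act on the "off hours" advice is to compute how long to wait until the next quiet window and hand that delay to whatever scheduler triggers the optimize. The sketch below only does the time arithmetic; the 03:00 run time, the class name, and the method name are assumptions for illustration, and the actual optimize call is left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: how long until the next daily off-hours slot.
final class OffHoursSchedule {

    /** Delay from 'now' until the next occurrence of 'runAt' (today or tomorrow). */
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            // Today's slot has already passed (or is exactly now), so use tomorrow's.
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        // 03:00 is an assumed quiet hour; pick whatever fits your traffic.
        Duration delay = delayUntilNext(LocalDateTime.now(), LocalTime.of(3, 0));
        System.out.println("next optimize in " + delay.toMinutes() + " minutes");
    }
}
```

The returned delay could be passed as the initial delay of a daily scheduled task, or the whole thing could simply be replaced by a cron entry.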

Best
Erick

On Fri, Jul 22, 2011 at 11:53 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 How old is 'older'?  I'm pretty sure I'm still getting much faster 
 performance on an optimized index in Solr 1.4.

 This could be due to the nature of my index and queries (which include some 
 medium-sized stored fields, and extensive faceting -- faceting on up to a 
 dozen fields in every request, where each field can include millions of 
 unique values. Amazing I can do this with good performance at all!).

 It's also possible I'm wrong about that faster performance; I haven't done 
 robustly valid benchmarking on a clone of my production index yet. But it 
 really looks that way to me, from what investigation I have done.

 If the answer is that optimization is believed no longer necessary on 
 versions LATER than 1.4, that might be the simplest explanation.
 

Re: commit time and lock

2011-07-24 Thread William Bell
What do the committers think about adding an index queue in Solr?

Then we can have lots of one-off index requests that would queue up...

On Fri, Jul 22, 2011 at 3:14 AM, Pierre GOSSE pierre.go...@arisem.com wrote:
 Solr still responds to search queries during a commit; only new indexing 
 requests will have to wait (until the end of the commit?). So I don't think your 
 users will experience increased response time during commits (unless your 
 server is badly undersized).

 Pierre

 -Original Message-
 From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
 Sent: Thursday, July 21, 2011 20:27
 To: solr-user@lucene.apache.org
 Subject: Re: commit time and lock

 Actually, I'm worried about the response time. I am committing around 500
 docs every 5 minutes. As far as I know (correct me if I'm wrong), the
 Solr server stops responding while it is committing. My concern is how to
 minimize the response time so that users need not wait, or whether some
 other logic is required for my case. Please suggest.

 regards
 jonty

 On Tuesday, June 21, 2011, Erick Erickson erickerick...@gmail.com wrote:
 What is it you want help with? You haven't told us what the
 problem you're trying to solve is. Are you asking how to
 speed up indexing? What have you tried? Have you
 looked at: http://wiki.apache.org/solr/FAQ#Performance?

 Best
 Erick

 On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 I am using SolrJ to index the data. I have around 5 docs indexed. Because
 the server stops responding at commit time due to the lock, I was
 measuring the commit time:

 double starttemp = System.currentTimeMillis();
 server.add(docs);
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
 cannot confirm whether the index lock starts at server.add(docs); or only at
 server.commit();.

 If I change the above to the following:

 server.add(docs);
 double starttemp = System.currentTimeMillis();
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 then the commit time becomes less than 1 second. I am not sure which one is
 right.

 Please help.

 regards
 Jonty
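
The two measurements in that message differ only in where the timer starts: the first covers add plus commit, the second covers the commit alone. A minimal sketch of timing each phase separately is below. The `ThrowingTask` interface and `PhaseTimer` class are hypothetical helpers, and the sleeps are stand-ins for `server.add(docs)` and `server.commit()`, which would need a real SolrJ client.

```java
// Hypothetical helper for timing indexing phases separately.
final class PhaseTimer {

    /** A task that may throw, since client calls such as add/commit can throw checked exceptions. */
    interface ThrowingTask {
        void run() throws Exception;
    }

    /** Runs the task and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(ThrowingTask task) {
        long start = System.nanoTime();
        try {
            task.run();
        } catch (Exception e) {
            throw new RuntimeException("timed task failed", e);
        }
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-ins only: replace with server.add(docs) and server.commit().
        double addSeconds = timeSeconds(() -> Thread.sleep(50));
        double commitSeconds = timeSeconds(() -> Thread.sleep(10));
        System.out.println("add = " + addSeconds + " s, commit = " + commitSeconds + " s");
    }
}
```

Reporting the two numbers side by side shows how much of the roughly 9 seconds is spent sending documents and how much is the commit itself. It does not by itself show when any index lock is held.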






-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr still responds to search queries during a commit; only new indexing 
requests will have to wait (until the end of the commit?). So I don't think your 
users will experience increased response time during commits (unless your 
server is badly undersized).

Pierre





Re: commit time and lock

2011-07-22 Thread Jonty Rhods
Thanks for clarity.

One more thing I want to know about optimization.

Right now I am planning to optimize the server every 24 hours. Optimization
also takes time (last time it took around 13 minutes), so I want to know:

1. While optimization is in progress, will the Solr server respond or not?
2. If the server will not respond, how can I make optimization faster, or is
there another way to optimize so that our users will not have to wait for the
optimization process to finish?

regards
Jonty



 
 



RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr will respond to searches during optimization, but commits will have to 
wait for the end of the optimization process.

During optimization a new index is generated on disk by merging every single 
file of the current index into one big file, so your server will be busy, 
especially regarding disk access. This may affect your response time and has a 
very negative effect on index replication if you have a master/slave 
architecture.

I've read here that optimization is not always a requirement to have an 
efficient index, due to some low-level changes in Lucene 3.xx, so maybe you 
don't really need optimization. What version of Solr are you using? Maybe 
someone can point toward a relevant link about optimization other than the Solr 
wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre


 
 



Re: commit time and lock

2011-07-22 Thread Marc SCHNEIDER
Hello,

Pierre, can you tell us where you read that?
"I've read here that optimization is not always a requirement to have an
efficient index, due to some low level changes in lucene 3.xx"

Marc.

  
  
 



RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Hi Marc

I've read that in a thread titled "Weird optimize performance degradation", 
where Erick Erickson states that "Older versions of Lucene would search faster 
on an optimized index, but this is no longer necessary.", and more recently in 
a thread you initiated a month ago, "Question about optimization".

I'd also be very interested if anyone has a more precise idea, or data, about 
the benefits and tradeoffs of optimize vs merge ...

Pierre


  
  
 



Re: commit time and lock

2011-07-22 Thread Shawn Heisey

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:

I've read that in a thread titled "Weird optimize performance degradation", where Erick Erickson 
states that "Older versions of Lucene would search faster on an optimized index, but this is no longer 
necessary.", and more recently in a thread you initiated a month ago, "Question about 
optimization".

I'd also be very interested if anyone has a more precise idea, or data, about 
the benefits and tradeoffs of optimize vs merge ...


My most recent testing has been with Solr 3.2.0.  I have noticed some 
speedup after optimizing an index, but the gain is not 
earth-shattering.  My index consists of 7 shards.  One of them is small, 
and receives all new documents every two minutes.  The others are large, 
and aside from deletes, are mostly static.  Once a day, the oldest data 
is distributed from the small shard to its proper place in the other six 
shards.


The small shard is optimized once an hour, and usually takes less than a 
minute.  I optimize one large shard every day, so each one gets 
optimized once every six days.  That optimize takes 10-15 minutes.  The 
only reason that I optimize is to remove deleted documents; whatever 
speedup I get is just icing on the cake.  Deleted documents take up 
space and continue to influence the relevance scoring of queries, so I 
want to remove them.
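
The one-large-shard-per-day rotation described above can be expressed as a simple modulo over the day number. This is only a sketch of that idea: the class and method names are hypothetical, the shard indexes 0 to 5 stand in for the six large shards, and the actual optimize request is not shown.

```java
import java.time.LocalDate;

// Hypothetical helper: picks which large shard to optimize on a given day.
final class ShardRotation {

    /** Returns a shard index in [0, largeShards), advancing by one each day. */
    static int shardForDay(LocalDate day, int largeShards) {
        // The epoch day increases by one per calendar day, so each shard
        // comes up once every 'largeShards' days.
        return (int) Math.floorMod(day.toEpochDay(), (long) largeShards);
    }

    public static void main(String[] args) {
        int shard = shardForDay(LocalDate.now(), 6);
        System.out.println("optimize large shard #" + shard + " today");
    }
}
```

With six large shards, each one is selected once every six days, matching the schedule in the message. The hourly optimize of the small shard would be scheduled separately.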


Thanks,
Shawn



RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Doesn't merging happen often enough to keep the deleted-document count low 
enough?

Maybe there's a need to have partial optimization available in Solr, meaning 
that a segment with too many deleted documents could be copied to a new file 
without the unnecessary data. That way, cleaning out deleted data could be 
compatible with having light replications.
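
To make the partial-optimization idea concrete, here is a minimal sketch of the selection step only: given per-segment counts, pick the segments whose share of deleted documents exceeds a threshold. `SegmentStats`, `PartialOptimize`, the segment names, and the 50% threshold are all hypothetical illustrations, not an existing Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-segment statistics; not a Solr or Lucene class.
final class SegmentStats {
    final String name;
    final int maxDoc;       // total documents in the segment, including deleted ones
    final int deletedDocs;  // documents flagged as deleted

    SegmentStats(String name, int maxDoc, int deletedDocs) {
        this.name = name;
        this.maxDoc = maxDoc;
        this.deletedDocs = deletedDocs;
    }

    double deletedRatio() {
        return maxDoc == 0 ? 0.0 : (double) deletedDocs / maxDoc;
    }
}

final class PartialOptimize {

    /** Names of segments whose deleted ratio is above the threshold. */
    static List<String> segmentsToRewrite(List<SegmentStats> segments, double threshold) {
        List<String> picked = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.deletedRatio() > threshold) {
                picked.add(s.name);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = Arrays.asList(
                new SegmentStats("_a", 1000, 600),
                new SegmentStats("_b", 1000, 10));
        // 0.5 is an arbitrary example threshold.
        System.out.println(segmentsToRewrite(segments, 0.5));
    }
}
```

Only the selected segments would need rewriting, so the untouched segment files could stay as they are for replication purposes, which is the point of the suggestion.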

I'm worried by this idea of deleted documents influencing relevance scores. Any 
pointer to how significant this influence may be?

Pierre

-Message d'origine-
De : Shawn Heisey [mailto:s...@elyograg.org] 
Envoyé : vendredi 22 juillet 2011 16:42
À : solr-user@lucene.apache.org
Objet : Re: commit time and lock

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
 I've read that in a thread titled "Weird optimize performance degradation", 
 where Erick Erickson states that "Older versions of Lucene would search 
 faster on an optimized index, but this is no longer necessary", and more 
 recently in a thread you initiated a month ago, "Question about optimization".

 I'd also be very interested if anyone has more precise ideas or data on the 
 benefits and tradeoffs of optimize vs. merge ...

My most recent testing has been with Solr 3.2.0.  I have noticed some 
speedup after optimizing an index, but the gain is not 
earth-shattering.  My index consists of 7 shards.  One of them is small, 
and receives all new documents every two minutes.  The others are large, 
and aside from deletes, are mostly static.  Once a day, the oldest data 
is distributed from the small shard to its proper place in the other six 
shards.

The small shard is optimized once an hour, and usually takes less than a 
minute.  I optimize one large shard every day, so each one gets 
optimized once every six days.  That optimize takes 10-15 minutes.  The 
only reason that I optimize is to remove deleted documents; whatever 
speedup I get is just icing on the cake.  Deleted documents take up 
space and continue to influence the relevance scoring of queries, so I 
want to remove them.

Thanks,
Shawn



RE: commit time and lock

2011-07-22 Thread Jonathan Rochkind
How old is 'older'?  I'm pretty sure I'm still getting much faster performance 
on an optimized index in Solr 1.4. 

This could be due to the nature of my index and queries (which include some 
medium-sized stored fields and extensive faceting -- faceting on up to a 
dozen fields in every request, where each field can include millions of unique 
values. Amazing I can do this with good performance at all!). 

It's also possible I'm wrong about that faster performance; I haven't done 
robustly valid benchmarking on a clone of my production index yet. But it 
really looks that way to me, from what investigation I have done. 

If the answer is that optimization is believed no longer necessary on versions 
LATER than 1.4, that might be the simplest explanation. 

From: Pierre GOSSE [pierre.go...@arisem.com]
Sent: Friday, July 22, 2011 10:23 AM
To: solr-user@lucene.apache.org
Subject: RE: commit time and lock

Hi Mark

I've read that in a thread titled "Weird optimize performance degradation", 
where Erick Erickson states that "Older versions of Lucene would search faster 
on an optimized index, but this is no longer necessary", and more recently in 
a thread you initiated a month ago, "Question about optimization".

I'd also be very interested if anyone has more precise ideas or data on the 
benefits and tradeoffs of optimize vs. merge ...

Pierre


-----Original Message-----
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
Sent: Friday, July 22, 2011 15:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?
I've read here that optimization is not always a requirement to have an
efficient index, due to some low level changes in lucene 3.xx

Marc.

On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:

 Solr will respond to searches during optimization, but commits will have to
 wait for the end of the optimization process.

 During optimization a new index is generated on disk by merging every
 single file of the current index into one big file, so your server will be
 busy, especially regarding disk access. This may affect your response time
 and has a very negative effect on index replication if you have a
 master/slave architecture.

 I've read here that optimization is not always a requirement to have an
 efficient index, due to some low-level changes in Lucene 3.xx, so maybe you
 don't really need optimization. What version of Solr are you using? Maybe
 someone can point toward a relevant link about optimization other than the Solr
 wiki
 http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

 Pierre


 -----Original Message-----
 From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
 Sent: Friday, July 22, 2011 12:45
 To: solr-user@lucene.apache.org
 Subject: Re: commit time and lock

 Thanks for the clarification.

 One more thing I want to know about optimization.

 Right now I am planning to optimize the server every 24 hours. Optimization
 also takes time (last time it took around 13 minutes), so I want to know:

 1. While optimization is in progress, will the Solr server respond or not?
 2. If the server will not respond, how can I make optimization faster, or is
 there another way to optimize so that our users do not have to wait for the
 optimization process to finish?

 regards
 Jonty



 On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:

  Solr still responds to search queries during a commit; only new indexing
  requests will have to wait (until the end of the commit?). So I don't think your
  users will experience increased response time during commits (unless your
  server is badly undersized).
 
  Pierre
 
  -----Original Message-----
  From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
  Sent: Thursday, July 21, 2011 20:27
  To: solr-user@lucene.apache.org
  Subject: Re: commit time and lock
 
  Actually I am worried about the response time. I am committing around 500
  docs every 5 minutes. As far as I know (correct me if I am wrong), the Solr
  server stops responding while a commit is in progress. My concern is how to
  minimize the response time so that users do not need to wait, or whether some
  other logic is required for my case. Please suggest.
 
  regards
  jonty
 
  On Tuesday, June 21, 2011, Erick Erickson <erickerick...@gmail.com> wrote:
   What is it you want help with? You haven't told us what the
   problem you're trying to solve is. Are you asking how to
   speed up indexing? What have you tried? Have you
   looked at: http://wiki.apache.org/solr/FAQ#Performance?
  
   Best
   Erick
  
   On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:
   I am using SolrJ to index the data. I have around 5 docs indexed. Because
   the server stops responding at the time of commit due to the lock, I was
   calculating the commit time:
  
   double starttemp = System.currentTimeMillis();
   server.add(docs);
   server.commit

Re: commit time and lock

2011-07-22 Thread Shawn Heisey

On 7/22/2011 9:32 AM, Pierre GOSSE wrote:

Does merging not happen often enough to keep the number of deleted documents 
low enough?

Maybe there's a need for partial optimization in Solr, meaning 
that a segment with too many deleted documents could be copied to a new file without 
the unnecessary data. That way, cleaning out deleted data could be compatible with keeping 
replications light.

I'm worried by this idea of deleted documents influencing relevance scores. Any 
pointers to how significant this influence may be?


I've got a pretty high mergeFactor, for fast indexing. Also, I want to 
know for sure and control when merges happen, so I am not leaving it up 
to Lucene/Solr.


Right now the largest number of deleted documents on any shard at this 
moment is 45347.  The shard (17.65GB) contains 9663271 documents, in six 
segments.  That will be one HUGE segment (from the last optimize) and 
five very very tiny segments, each with only a few thousand documents in 
them.  Tonight when the document distribution process runs, that index 
will be optimized again.  Tomorrow night a different shard will be 
optimized.


Deleted documents can (and do) happen anywhere in the index, so even if 
I had a lot of largish segments rather than one huge segment, it's very 
likely that just expunging deletes would still result in the entire 
index being merged, so I am not losing anything by doing a full 
optimize, and I am gaining a small bit of performance.


The 45000 deletes mentioned above represent less than half a percent of 
the shard, so the influence on relevance is *probably* not large ... but 
that's not something I can say definitively.  I think it all depends on 
what people are searching for and how common the terms in the deleted 
documents are.


Thanks,
Shawn
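
As a quick check of the "less than half a percent" figure in the message above, the arithmetic with the quoted numbers (45347 deleted documents, 9663271 documents in the shard) can be written out as below. This is a sketch only: the class and method names are hypothetical, and it assumes the 9663271 count is the denominator Shawn had in mind. If that count excludes the deleted documents, the percentage changes only slightly.

```java
// Hypothetical sketch: share of a shard taken up by deleted documents.
final class DeletedFraction {
    // Returns deleted documents as a percentage of the total document count.
    static double deletedPercent(long deletedDocs, long totalDocs) {
        return 100.0 * deletedDocs / totalDocs;
    }

    public static void main(String[] args) {
        // Figures quoted in the message above.
        double percent = deletedPercent(45_347L, 9_663_271L);
        System.out.println(String.format("deleted documents: %.2f%% of the shard", percent));
    }
}
```

The result is roughly 0.47%, consistent with the claim in the message.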



Re: commit time and lock

2011-07-21 Thread Jonty Rhods
Actually I am worried about the response time. I am committing around 500
docs every 5 minutes. As far as I know (correct me if I am wrong), the Solr
server stops responding while a commit is in progress. My concern is how to
minimize the response time so that users do not need to wait, or whether some
other logic is required for my case. Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson erickerick...@gmail.com wrote:
 What is it you want help with? You haven't told us what the
 problem you're trying to solve is. Are you asking how to
 speed up indexing? What have you tried? Have you
 looked at: http://wiki.apache.org/solr/FAQ#Performance?

 Best
 Erick

 On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 I am using SolrJ to index the data. I have around 5 docs indexed. Because
 the server stops responding at the time of commit due to the lock, I was
 calculating the commit time:

 double starttemp = System.currentTimeMillis();
 server.add(docs);
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
 cannot confirm whether the index lock time starts at
 server.add(docs); or only at server.commit();.

 If I change the above to the following

 server.add(docs);
 double starttemp = System.currentTimeMillis();
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 then the commit time becomes less than 1 second. I am not sure which one is
 right.

 please help.

 regards
 Jonty




Re: commit time and lock

2011-06-22 Thread Ranveer

Dear all,

Kindly help me..

thanks

On Tuesday 21 June 2011 11:46 AM, Jonty Rhods wrote:

I am using SolrJ to index the data. I have around 5 docs indexed. Because
the server stops responding at the time of commit due to the lock, I was
calculating the commit time:

double starttemp = System.currentTimeMillis();
server.add(docs);
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
cannot confirm whether the index lock time starts at
server.add(docs); or only at server.commit();.

If I change the above to the following

server.add(docs);
double starttemp = System.currentTimeMillis();
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is
right.

please help.

regards
Jonty





commit time and lock

2011-06-21 Thread Jonty Rhods
I am using SolrJ to index the data. I have around 5 docs indexed. Because
the server stops responding at the time of commit due to the lock, I was
calculating the commit time:

double starttemp = System.currentTimeMillis();
server.add(docs);
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
cannot confirm whether the index lock time starts at
server.add(docs); or only at server.commit();.

If I change the above to the following

server.add(docs);
double starttemp = System.currentTimeMillis();
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is
right.

please help.

regards
Jonty
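
The two snippets in the message above measure different things: the first times server.add(docs) plus server.commit(), the second times server.commit() alone. One way to see where the 9 seconds go is to time each step separately. The sketch below does that with a small helper. The `CommitTiming` class, the `Step` interface, and `timeSeconds` are hypothetical names, and the `Thread.sleep` calls are stand-ins for the real SolrJ calls so that the example runs without a Solr server.

```java
// Hypothetical sketch: time the add step and the commit step separately.
final class CommitTiming {
    // A step that may throw a checked exception, as the SolrJ calls can.
    interface Step {
        void run() throws Exception;
    }

    // Runs the step and returns its elapsed wall-clock time in seconds.
    static double timeSeconds(Step step) {
        long start = System.nanoTime();
        try {
            step.run();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-ins only: replace with () -> server.add(docs) and () -> server.commit().
        double addSeconds = timeSeconds(() -> Thread.sleep(50));
        double commitSeconds = timeSeconds(() -> Thread.sleep(10));

        System.out.println("time in add    = " + addSeconds + " s");
        System.out.println("time in commit = " + commitSeconds + " s");
        System.out.println("total          = " + (addSeconds + commitSeconds) + " s");
    }
}
```

Reporting the two figures side by side shows how much of the total is spent sending and indexing the documents and how much is spent in the commit itself. It does not by itself say when any index lock is taken.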


Re: commit time and lock

2011-06-21 Thread Erick Erickson
What is it you want help with? You haven't told us what the
problem you're trying to solve is. Are you asking how to
speed up indexing? What have you tried? Have you
looked at: http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 I am using SolrJ to index the data. I have around 5 docs indexed. Because
 the server stops responding at the time of commit due to the lock, I was
 calculating the commit time:

 double starttemp = System.currentTimeMillis();
 server.add(docs);
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
 cannot confirm whether the index lock time starts at
 server.add(docs); or only at server.commit();.

 If I change the above to the following

 server.add(docs);
 double starttemp = System.currentTimeMillis();
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 then the commit time becomes less than 1 second. I am not sure which one is
 right.

 please help.

 regards
 Jonty