Re: [1.3] help with update timeout issue?

2010-01-20 Thread Jerome L Quinn


Lance Norskog goks...@gmail.com wrote on 01/16/2010 12:43:09 AM:

 If your indexing software does not have the ability to retry after a
 failure, you might wish to change the timeout from 20 seconds to, say,
 5 minutes.

I can make it retry, but the processes doing these updates are somewhat
real-time.  Does anyone push updates into a temporary file and then have an
async process push them to Solr, so that it can survive the lockups without
worry?  This seems like a real hack, but I don't want a timeout that long in
the program that currently pushes the data.
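
A rough sketch of the kind of thing I mean is below (untested; it uses SolrJ,
and the URL and class names are just placeholders).  The real-time producers
enqueue and return immediately, while a background thread absorbs the retries
and the long timeout:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AsyncSolrPusher implements Runnable {
    // Placeholder URL, not the real server.
    private static final String SOLR_URL = "http://localhost:8983/solr";
    private final BlockingQueue<SolrInputDocument> queue =
        new LinkedBlockingQueue<SolrInputDocument>();
    private final SolrServer server;

    public AsyncSolrPusher() throws Exception {
        // The long socket timeout lives here, not in the real-time code.
        CommonsHttpSolrServer s = new CommonsHttpSolrServer(SOLR_URL);
        s.setSoTimeout(5 * 60 * 1000);
        this.server = s;
    }

    /** Called by the real-time process; returns immediately. */
    public void enqueue(SolrInputDocument doc) {
        queue.offer(doc);
    }

    /** Background thread: push docs one at a time, retrying on failure. */
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                SolrInputDocument doc = queue.take();
                boolean sent = false;
                while (!sent) {
                    try {
                        server.add(doc);
                        sent = true;
                    } catch (Exception e) {
                        // Solr busy (commit/merge) or down: wait and retry.
                        Thread.sleep(15 * 1000);
                    }
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

An in-memory queue like this only rides out the pauses; spooling to a file
instead would also survive restarts of the pushing process.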

One thing that worries me is that Solr may not respond to searches during
these windows.  I'm basing that on the observation that search does not
respond while Solr is optimizing.

Can anyone offer me insight on why these delays happen?

Thanks,
Jerry

Re: [1.3] help with update timeout issue?

2010-01-15 Thread MitchK

If, and only if, you need to fix your problem as fast as possible, I would
think about virtualization.
You need to replicate your Solr instance and its index files.

The idea is quite simple: while one Solr server does its optimization, the
other one stays available for searching documents without any downtime. After
the first Solr server has finished, add every document that was added in the
meantime to the second Solr's index and then optimize the second one.
Afterwards you can choose one of these servers as your default server and
minimize the system resources given to the other one.

It's only an idea. I have never done something like that, but maybe it
helps.
For more information, consult the wiki on distributed search.

Kind regards
Mitch



Re: [1.3] help with update timeout issue?

2010-01-15 Thread Andre Parodi

Add these to your JAVA_OPTS when you start your JVM:
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails 
-Xloggc:/your/path/verbose-gc.log


Tail the verbose GC log to see whether the timing of your pauses corresponds
to a full GC.


On 15/01/10 03:59, Jerome L Quinn wrote:

Is this related to GC?


Re: [1.3] help with update timeout issue?

2010-01-15 Thread MitchK

The current topic "Need deployment strategy" may give you another answer
quite similar to mine. It sounds much cleaner.



Re: [1.3] help with update timeout issue?

2010-01-15 Thread Jerome L Quinn
Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 01/14/2010 10:07:15
PM:

 See those waitFlush=true,waitSearcher=true ?  Do things improve if
 you make them false? (not sure how with autocommit without looking
 at the config and not sure if this makes a difference when
 autocommit triggers commits)

Looking at DirectUpdateHandler2, it appears that those values are hardwired
to true for autocommit, unless there's another mechanism for changing them.
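
If I turned autocommit off and committed explicitly from the client instead,
SolrJ does let me pass those flags myself.  An untested sketch (the URL is a
placeholder), though whether that avoids the pause itself is another question:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ExplicitCommit {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // commit(waitFlush, waitSearcher): with both false the client call
        // returns without waiting for the flush or for the new searcher.
        server.commit(false, false);
    }
}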

 Re deleted docs, they are probably getting expunged, it's just that
 you always have more deleted docs, so those 2 numbers will never be
 the same without optimize.

I can accept that they will always be different, but that's a large
difference.
Hmm, a couple of weeks ago I manually deleted a bunch of docs whose
associated data had gotten corrupted.  Normally I'd only be deleting a day's
worth of docs at a time.  Is there a timeframe within which I could expect
the old stuff to get cleaned up without optimizing?

Thanks,
Jerry

Re: [1.3] help with update timeout issue?

2010-01-15 Thread Lance Norskog
If your indexing software does not have the ability to retry after a
failure, you might wish to change the timeout from 20 seconds to, say,
5 minutes.
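
If the indexer uses SolrJ, raising the timeout would look something like this
(untested sketch; the URL is a placeholder):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class LongTimeoutClient {
    public static CommonsHttpSolrServer create() throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setSoTimeout(5 * 60 * 1000);      // read timeout: 5 minutes instead of 20 seconds
        server.setConnectionTimeout(10 * 1000);  // connect timeout can stay short
        return server;
    }
}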

On Fri, Jan 15, 2010 at 1:20 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
 Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 01/14/2010 10:07:15
 PM:

 See those waitFlush=true,waitSearcher=true ?  Do things improve if
 you make them false? (not sure how with autocommit without looking
 at the config and not sure if this makes a difference when
 autocommit triggers commits)

 Looking at DirectUpdateHandler2, it appears that those values are hardwired
 to true for autocommit.  Unless there's another mechanism for changing
 that.

 Re deleted docs, they are probably getting expunged, it's just that
 you always have more deleted docs, so those 2 numbers will never be
 the same without optimize.

 I can accept that they will always be different, but that's a large
 difference.
 Hmm, a couple weeks ago, I manually deleted a bunch of docs that had
 associated
 data get corrupted.  Normally, I'd only be deleting a day's worth of docs
 at
 a time.  Is there a time I could expect the old stuff to get cleaned up by
 without optimizing?

 Thanks,
 Jerry



-- 
Lance Norskog
goks...@gmail.com


[1.3] help with update timeout issue?

2010-01-14 Thread Jerome L Quinn


Hi, folks,

I am using Solr 1.3 pretty successfully, but am running into an issue that
hits once in a long while.  I'm still using 1.3 since I have some custom
code I will have to port forward to 1.4.

My basic setup is that I have data sources continually pushing data into
Solr, around 20K adds per day.  The index is currently around 100G, stored
on local disk on a fast Linux server.  I'm trying to make new docs
searchable as quickly as possible, so I currently have autocommit set to
15s.  I originally had it at 3s, but that seemed to be a little too unstable.
I never optimize the index, since an optimize will lock things up solid for 2
hours, dropping docs until it completes.  I'm using the default segment
merging settings.
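
For reference, the autocommit is configured in solrconfig.xml roughly like
this (paraphrased, not a verbatim copy of the config):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>  <!-- commit at most every 15 seconds -->
  </autoCommit>
</updateHandler>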

Every once in a while I'm getting a socket timeout when trying to add a
document.  I traced it to a 20s timeout and then found the corresponding
point in the Solr log.

Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute
INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2
Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@26e926e9 main
Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush

Solr locked up for 41 seconds here while doing some of the commit work.
So, I have a few questions.

Is this related to GC?
Does Solr always lock up when merging segments and I just have to live with
losing the doc I want to add?
Is there a timeout that would guarantee me a write success?
Should I just retry in this situation? If so, how do I distinguish between
this and Solr just being down?
I have already had issues in the past with too many open files, so
increasing the merge factor isn't an option.


On a related note, I had previously asked about optimizing and was told
that segment merging would take care of cleaning up deleted docs.  However,
I have the following stats for my index:

numDocs : 2791091
maxDoc : 4811416

My understanding is that numDocs is the number of docs being searched and
maxDoc is the count including deleted docs that will only disappear after
optimization.  How do I get this cleanup without using optimize, since
optimizing locks up Solr for multiple hours?  I'm deleting old docs daily as
well.

Thanks for all the help,
Jerry