Mark,
Thanks for your help and lucid exposition of what may be going on. I will
wait to hear from you before I plough ahead with the latest trunk bits.
Best regards,
-- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hang on. The more I look at this, the more I am thinking that was not the
problem. Directories are pretty strictly managed by Lucene at the moment,
and it should actually be pretty difficult to have one closed out from
under you. They are singletons and reference counted. The IndexReader
would have to have been closed if it was responsible for closing the
Directory, and in that case, we would not be trying to reopen it. The
Searcher we get the IndexReader from has been inc ref'd to ensure it won't
close. All I can think is that something is grabbing/stealing a Directory
that didn't directly ask for it from FSDirectory.getDirectory, and is then
closing it. I'm trying to hunt down where that could be happening now. I
hope to spot something, but it appears pretty mysterious at the moment. I
suppose another option is a nasty little race condition. I've been trying
to reproduce the error by sending lots of update/search requests from
multiple threads, with no luck so far. Looks like I may have to throw in
the snap puller code (I haven't looked too heavily into any of that before).
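To make the reference-counting argument concrete, here is a minimal toy model in Java. It is only a sketch: ToyDirectory and RefCountDemo are hypothetical names invented for illustration, not Lucene classes, and the real FSDirectory bookkeeping may differ in detail. It shows the suspected failure mode, where a component that was handed the shared instance without going through the factory closes it and drops the count to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, NOT Lucene's FSDirectory.
class ToyDirectory {
    private static final Map<String, ToyDirectory> CACHE = new HashMap<>();

    private final String path;
    private int refCount = 0;
    private boolean open = true;

    private ToyDirectory(String path) {
        this.path = path;
    }

    // One shared instance per path; every caller takes a reference.
    static synchronized ToyDirectory getDirectory(String path) {
        ToyDirectory dir = CACHE.get(path);
        if (dir == null) {
            dir = new ToyDirectory(path);
            CACHE.put(path, dir);
        }
        dir.refCount++;
        return dir;
    }

    // Releases one reference; the directory really closes at zero.
    void close() {
        synchronized (ToyDirectory.class) {
            if (--refCount <= 0) {
                open = false;
                CACHE.remove(path);
            }
        }
    }

    String[] list() {
        synchronized (ToyDirectory.class) {
            if (!open) {
                throw new IllegalStateException("this Directory is closed");
            }
            return new String[0];
        }
    }
}

class RefCountDemo {
    // Two getDirectory calls, two closes: the first close must not
    // invalidate the instance for the other holder.
    static boolean balancedCloseIsSafe() {
        ToyDirectory a = ToyDirectory.getDirectory("/index-a");
        ToyDirectory b = ToyDirectory.getDirectory("/index-a"); // refCount = 2
        b.close();                                              // refCount = 1
        boolean sameInstance = (a == b);
        a.list();                                               // still open
        a.close();
        return sameInstance;
    }

    // A holder that never called getDirectory closes the shared instance,
    // so the legitimate owner sees it closed out from under it.
    static boolean strayCloseBreaksOwner() {
        ToyDirectory owner = ToyDirectory.getDirectory("/index-b"); // refCount = 1
        ToyDirectory borrowed = owner;  // handed over, no extra reference taken
        borrowed.close();               // refCount = 0, really closed
        try {
            owner.list();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("balanced close safe: " + balancedCloseIsSafe());
        System.out.println("stray close breaks owner: " + strayCloseBreaksOwner());
    }
}
```

In this model both methods return true: balanced acquire/close pairs are harmless, while a single unmatched close is enough to produce an "already closed" failure for the legitimate holder.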
At worst, if/when a fix is discovered, you will probably be able to apply
just the fix to the revision you're working with.
- Mark
William Pierce wrote:
Mark,
Thanks for your response --- I do appreciate all you volunteers working
to provide such a nice system!
Anyway, I will try the trunk bits as you said. The only problem is that
the later the trunk build I take, the more post-1.3 capability I get,
and I feel somewhat exposed running these bits in our production
environment...
Not sure if there's another approach?
Thanks,
-Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hey William, sorry about the trouble. I have to look at this further,
but I think the issue is fixed if you grab the latest trunk build.
SOLR-465 should inadvertently fix things - before that patch, a
deprecated constructor for solrsearcher was being called - this
constructor caused the underlying IndexReader to close its own
Directory, and since IndexReaders are reopened, we don't want that.
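The interaction described above can be sketched with a small toy model in Java. Everything here is an assumption made for illustration: SharedDir, ToyReader and the closeDirectory flag are hypothetical stand-ins, not the actual Lucene or Solr classes or constructor signatures. The point is only that a reopened reader keeps using the old reader's directory, so an old reader that closes "its own" directory breaks the next reopen.

```java
// Toy model only: hypothetical classes, not the real Lucene/Solr ones.
class SharedDir {
    private boolean open = true;

    void close() {
        open = false;
    }

    void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("this Directory is closed");
        }
    }
}

class ToyReader {
    private final SharedDir dir;
    // true models the behaviour attributed to the deprecated constructor:
    // the reader closes the directory when the reader itself is closed.
    private final boolean closeDirectory;

    ToyReader(SharedDir dir, boolean closeDirectory) {
        this.dir = dir;
        this.closeDirectory = closeDirectory;
    }

    // The reopened reader shares the directory of the reader it came from.
    ToyReader reopen() {
        dir.ensureOpen();
        return new ToyReader(dir, closeDirectory);
    }

    void close() {
        if (closeDirectory) {
            dir.close();
        }
    }
}

class ReopenDemo {
    // Reopen once, close the old reader, then try to reopen again.
    static boolean secondReopenSucceeds(boolean closeDirectory) {
        ToyReader old = new ToyReader(new SharedDir(), closeDirectory);
        ToyReader fresh = old.reopen();
        old.close();
        try {
            fresh.reopen();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reader closes directory:    " + secondReopenSucceeds(true));
        System.out.println("reader leaves directory be: " + secondReopenSucceeds(false));
    }
}
```

In this sketch the second reopen fails only when the reader is configured to close the directory, which is the shape of the problem the patch is thought to remove.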
Mark Miller wrote:
Looks like there might be an issue with the reopen - I'm not seeing
what it could be offhand though. Have to find what could be closing a
Directory unexpectedly...I'll try to take a further look over the
weekend.
- Mark
William Pierce wrote:
Folks:
I am using the nightly build of 1.3 as of Oct 23 so as to use the
replication handler. I am running on Windows Server 2003 with Tomcat
6.0.14. Everything was running fine until I noticed that certain
updated records were not showing up on the slave. Further
investigation showed me that the failures have been occurring since
early this morning with a fatal exception. Here is a segment of the
Tomcat log:
INFO: Total time taken for download : 0 secs
Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Conf files are not downloaded or are in sync
Nov 14, 2008 5:34:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler doSnapPull
SEVERE: SnapPull failed org.apache.solr.common.SolrException: Snappull failed :
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
    at org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
    at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
    ... 11 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
    at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
    at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
    at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
    at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
    ... 14 more
Nov 14, 2008 5:38:52 AM org.apache.solr.update.DirectUpdateHandler2 commit
Any ideas, anyone?
-- Bill