I meant patch as in a source code patch, so I'm afraid you're kind of in a
tough spot. That's part of the 'trunk running' risk unfortunately...
You've done it once though, so I am sure you can manage again, right?
I'm not sure exactly what state your checkout is in (though I suppose I can
guess close from the date), which makes producing a source patch
difficult, but essentially hiding the problem is pretty simple.
Take your checkout (which I hope you have or can get again...the only
safe way to play the trunk game) and make the following simple changes:
We need to stop using the SolrIndexSearcher constructors that take a
String or File rather than a Directory or Reader. The key spot is in
SolrCore:
line: approx 1010
try {
  newestSearcher = getNewestSearcher(false);
  if (newestSearcher != null) {
    IndexReader currentReader = newestSearcher.get().getReader();
    String newIndexDir = getNewIndexDir();
    if (new File(getIndexDir()).equals(new File(newIndexDir))) {
      IndexReader newReader = currentReader.reopen();
      if (newReader == currentReader) {
        currentReader.incRef();
      }
      tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true);
    } else {
      tmp = new SolrIndexSearcher(this, schema, "main", newIndexDir, true);
    }
  } else {
    tmp = new SolrIndexSearcher(this, schema, "main", getNewIndexDir(), true);
You can see the two lower SolrIndexSearchers are initialized with a String.
You want to initialize them with a Directory instead:

  tmp = new SolrIndexSearcher(this, schema, "main",
      FSDirectory.getDirectory(newIndexDir), true, true); // you will need to
  // add another true param here
By passing a Directory rather than a String, the underlying IndexReaders
will not try to close the Directory and you won't hit that error. Trunk
no longer has this problem exposed because we now supply Directories to
these constructors (though for a different reason).
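To make that ownership rule concrete, here is a rough plain-Java sketch (not actual Lucene code; the class and method names are invented for illustration) of why a reader built from a Directory leaves the Directory open, while one built from a String path closes it:

```java
// Toy model of IndexReader's Directory ownership. A reader that creates
// its own Directory (String/File constructors) closes it on close(); a
// reader handed an existing Directory leaves it to the caller to manage.
class Dir {
    boolean closed = false;
    void close() { closed = true; }
}

class Reader {
    private final Dir dir;
    private final boolean closeDir; // true only when the reader made the Dir

    // Mimics opening from a String path: the reader owns the Directory.
    static Reader open(String path) { return new Reader(new Dir(), true); }

    // Mimics opening from a Directory: the caller keeps ownership.
    static Reader open(Dir dir) { return new Reader(dir, false); }

    private Reader(Dir dir, boolean closeDir) {
        this.dir = dir;
        this.closeDir = closeDir;
    }

    void close() { if (closeDir) dir.close(); }
    Dir directory() { return dir; }
}

public class Ownership {
    public static void main(String[] args) {
        Dir shared = new Dir();
        Reader fromDir = Reader.open(shared);
        fromDir.close();
        System.out.println("still open after close: " + !shared.closed);

        Reader fromPath = Reader.open("/some/index");
        fromPath.close();
        System.out.println("path-opened reader closed its dir: "
            + fromPath.directory().closed);
    }
}
```

With the Directory-taking constructor, closing or reopening readers can never tear the Directory out from under another reader, which is exactly the property the patch above relies on.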
If you're hesitant about any of this, you might try trunk and just test it
out after looking at the changes that have been put in, or you might
email me privately and I may be able to point you to some alternate options.
- Mark
William Pierce wrote:
Mark,
That sounds great! Good luck with the cleaning :-)
Let me know how I can get a patch --- I'd prefer not to do a Solr build
from source since we are not Java savvy here....:-(
- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 12:43 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Okay, I'm fairly certain I've found it. As usual, take a walk and the
solution pops into your head out of the blue.
It looks like Lucene's IndexReader reopen call is not very friendly
with the FSDirectory implementation. If you call reopen and it
returns a new IndexReader, it creates a new reference on the
Directory - so if you reopen an IndexReader that was originally
opened with a non-Directory parameter (String or File instead), both
Readers (the reopened one and the one you're reopening) will close
the Directory when they close. That's not right. That's how we get to 0
faster than we should. So it's kind of a Lucene issue.
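A small self-contained model of that refcount arithmetic (assumed semantics; this is not Lucene source, and the names are invented): one getDirectory call, two closes, and the count hits zero while a reader still needs the Directory.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of FSDirectory's per-path singleton and its reference count.
class RefDir {
    private static final Map<String, RefDir> instances = new HashMap<>();
    private int refCount = 0;

    // Like FSDirectory.getDirectory(path): one instance per path, ref counted.
    static synchronized RefDir getDirectory(String path) {
        RefDir d = instances.computeIfAbsent(path, p -> new RefDir());
        d.refCount++;
        return d;
    }

    void close() { refCount--; }          // just drops a reference
    boolean isClosed() { return refCount <= 0; }
    int refCount() { return refCount; }
}

public class ReopenBug {
    public static void main(String[] args) {
        RefDir dir = RefDir.getDirectory("/index"); // refCount = 1

        // reopen() hands the same Directory to the new reader without a
        // matching getDirectory() call, so no extra reference is taken...
        RefDir handedToNewReader = dir;

        // ...yet both readers will close it: two closes against one open.
        dir.close();                      // refCount = 0 -- closed too early
        System.out.println("closed while a reader still holds it: "
            + handedToNewReader.isClosed());
    }
}
```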
My guess that this is hidden in the trunk was right, because I think
we are no longer using String, File based IndexReader opens, which
means our IndexReaders don't attempt to close their underlying
Directories now.
I can probably send you a patch for the revision you're on to hide this
as well, but I'm already in the doghouse on cleaning right now ; )
The way my brain works, I'll probably be back to this later though.
- Mark
William Pierce wrote:
Trunk may actually still hide the issue (possibly), but something
really funky seems to have gone on and I can't find it yet. Do you
have any custom code interacting with solr?
None whatsoever...I am using out-of-the-box Solr 1.3 (build of
10/23). I am using my C# app to send HTTP requests to my Solr instance.
Is there something you want me to try at my end that might give you
a clue? Let me know and I can try to help out.
Best,
- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 10:59 AM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Haven't given up, but this one really has me stumped so far. For every path,
FSDirectory allows just one instance of FSDirectory to exist, and
it keeps a ref count of how many have been returned from
openDirectory for a given path. An FSDirectory will not actually be
closed unless all references to it are released (it doesn't
actually even do anything in close, other than drop the reference).
So pretty much, the only way to get into trouble is to call close
enough times to equal how many times you called openDirectory, and
then try to use the FSDirectory again. This is what your stack
trace indicates is happening.
So we can get the call hierarchy for directory.close() in solr, and
we find that everything is pretty matched up...at worst it looks
like a reference might not be closed - but that doesn't hurt
anything...FSDirectory will just think there is something out there
holding onto a reference for that directory and allow you to
continue using it, even though no such reference might exist. It's
only when enough closes are called that an instance will be marked
as closed (a further openDirectory will return a new open
instance). So to get your error, something has to close a Directory
that it did not get from openDirectory.
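That asymmetry can be sketched in a few lines (a hypothetical model, not Lucene code): an unmatched openDirectory just leaves a harmless extra reference behind, while an unmatched close marks the instance closed under a live reader.

```java
// Toy model: leaked references are benign, extra closes are fatal.
class CountedDir {
    private int refs = 0;

    void open()  { refs++; }
    void close() { refs--; }              // the last close really closes
    boolean isClosed() { return refs <= 0; }

    void ensureOpen() {
        if (isClosed())
            throw new IllegalStateException("this Directory is closed");
    }
}

public class CloseAsymmetry {
    public static void main(String[] args) {
        CountedDir d = new CountedDir();

        // Unmatched open (a leaked reference): the Directory stays usable.
        d.open(); d.open(); d.close();
        d.ensureOpen();                   // fine, refs == 1

        // Unmatched close: count hits zero while a reader still holds it.
        d.close();                        // refs == 0
        try {
            d.ensureOpen();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```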
We know that the IndexReader that is trying to be reopened must
have called openDirectory (or something called openDirectory for
it) and we know that it hasn't called close (the IndexReader has
already passed an ensureOpen call on itself in that stack
trace). Something else must have closed it. I can't find this
happening. Nothing else calls close on Directory unless it called
openDirectory (that I can find using all of Eclipse's magical
goodness).
So how does the refcount on the Directory hit 0? I can't find it or
duplicate it yet...
Trunk may actually still hide the issue (possibly), but something
really funky seems to have gone on and I can't find it yet. Do you
have any custom code interacting with solr?
- Mark
William Pierce wrote:
Mark,
Thanks for your help and lucid exposition of what may be going
on....I will wait to hear from you before I plough ahead with the
latest trunk bits....
Best regards,
-- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hang on. The more I look at this, the more I am thinking that was
not the problem. Directories are pretty strictly managed by
Lucene at the moment, and it should actually be pretty difficult
to have one closed out from under you. They are singletons and
reference counted. The IndexReader would have to have been closed
if it was responsible for closing the Directory, and in that
case, we would not be trying to reopen it. The Searcher we get
the IndexReader from has been inc ref'd to ensure it won't close.
All I can think is that something is grabbing/stealing a
Directory that didn't directly ask for it from
FSDirectory.getDirectory, and is then closing it. I'm trying to
hunt down where that could be happening now. Hope to spot something,
but it appears pretty mysterious at the moment. I suppose another
option is a nasty little race condition - I've been trying to
repeat the error by sending lots of update/search requests from
multiple threads with no luck though. Looks like I may have to
throw in snap puller code (haven't looked too heavily into any of
that before).
At worst, if/when a fix is discovered, you will probably be able
to apply just the fix to the revision you're working with.
- Mark
William Pierce wrote:
Mark,
Thanks for your response --- I do appreciate all you volunteers
working to provide such a nice system!
Anyway, I will try the trunk bits as you said. The only
problem is that the later the trunk build I use after 1.3, the more
post-1.3 capability I get. And I feel somewhat exposed running
these bits in our production environment...
Not sure if there's another approach?
Thanks,
-Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hey William, sorry about the trouble. I have to look at this
further, but I think the issue is fixed if you grab the latest
trunk build. SOLR-465 should inadvertently fix things - before
that patch, a deprecated constructor for SolrIndexSearcher was being
called - this constructor caused the underlying IndexReader to
close its own Directory, and since IndexReaders are reopened,
we don't want that.
Mark Miller wrote:
Looks like there might be an issue with the reopen - I'm not
seeing what it could be offhand though. Have to find what
could be closing a Directory unexpectedly...I'll try to take a
further look over the weekend.
- Mark
William Pierce wrote:
Folks:
I am using the nightly build of 1.3 as of Oct 23 so as to use
the replication handler. I am running on windows 2003
server with tomcat 6.0.14. Everything was running fine
until I noticed that certain updated records were not showing
up on the slave. Further investigation showed me that the
failures have indeed been occurring since early this morning
with a fatal exception....here is a segment of the tomcat log:
INFO: Total time taken for download : 0 secs
Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller
fetchLatestIndex
INFO: Conf files are not downloaded or are in sync
Nov 14, 2008 5:34:24 AM
org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true)
Nov 14, 2008 5:34:24 AM
org.apache.solr.handler.ReplicationHandler doSnapPull
SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Snappull failed :
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
    at org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
    at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
    ... 11 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
    at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
    at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
    at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
    at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
    ... 14 more
Nov 14, 2008 5:38:52 AM
org.apache.solr.update.DirectUpdateHandler2 commit
Any ideas, anyone?
-- Bill