Great. Will merge that patch into trunk as soon as possible. -Johan
On Thu, Jun 16, 2011 at 10:21 PM, Jennifer Hickey <[email protected]> wrote:
> Hi Johan,
> Sorry for the delay. I was finally able to try out that patch (against 1.3)
> on our test environment, and things are running smoothly. I have not seen
> the ClosedChannelException (or any others) once in 24 hours. Previously on
> the same system I saw it frequently, as early as 15 minutes into the uptime.
> Thanks!
>
> Jennifer
> ________________________________________
> From: [email protected] [[email protected]] On Behalf
> Of Johan Svensson [[email protected]]
> Sent: Thursday, May 26, 2011 3:09 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] ClosedChannelExceptions in highly concurrent environment
>
> Hi Jennifer,
>
> Could you apply this patch to the kernel and then see if the problem
> still exists? If you want I can send you a jar but then I need to know
> what version of Neo4j you are using.
>
> Regards,
> Johan
>
>
> On Mon, May 23, 2011 at 6:50 PM, Jennifer Hickey <[email protected]> wrote:
>> Hi Tobias,
>>
>> Looks like the environment is still setup, so I should be able to attempt a
>> repro with a patched version. Let me know what you would like me to use.
>>
>> Thanks,
>> Jennifer
>> ________________________________________
>> From: [email protected] [[email protected]] On Behalf
>> Of Tobias Ivarsson [[email protected]]
>> Sent: Monday, May 16, 2011 11:01 PM
>> To: Neo4j user discussions
>> Subject: Re: [Neo4j] ClosedChannelExceptions in highly concurrent environment
>>
>> Hi Jennifer,
>>
>> Could you reproduce it on your side by doing the same kind of systems tests
>> again? If you could then I'd be very happy if you could try a patched
>> version that we have been working on and see if that fixes the issue.
>>
>> Cheers,
>> Tobias
>>
>> On Tue, May 17, 2011 at 2:49 AM, Jennifer Hickey <[email protected]> wrote:
>>
>>> Hi Tobias,
>>> Unfortunately I don't have an isolated test case, as I was doing a fairly
>>> involved system test at the time. I may be able to have a colleague work on
>>> reproducing it at a later date (I've been diverted to something else for the
>>> moment).
>>>
>>> I was remote debugging with Eclipse, so I toggled a method breakpoint on
>>> Thread.interrupt() and then inspected the stack once the breakpoint was hit.
>>>
>>> Sorry I don't have more information at the moment. I agree that
>>> eliminating the interrupts sounds like the best approach, if possible.
>>>
>>> Thanks,
>>> Jennifer
>>> ________________________________________
>>> From: [email protected] [[email protected]] On
>>> Behalf Of Tobias Ivarsson [[email protected]]
>>> Sent: Thursday, April 28, 2011 6:23 AM
>>> To: Neo4j user discussions
>>> Subject: Re: [Neo4j] ClosedChannelExceptions in highly concurrent
>>> environment
>>>
>>> Hi Jennifer,
>>>
>>> I'd first like to thank you for the testing and analysis you've done. Very
>>> useful stuff. Do you think you could send some test code our way that
>>> reproduces this issue?
>>>
>>> This is actually the first time this issue has been reported, so I wouldn't
>>> say it is a common issue. My guess is that your thread volume triggered a
>>> rare condition that wouldn't be encountered otherwise.
>>>
>>> I'm also curious to know how you found the source of the interruptions.
>>> When I debug thread interruptions I've never been able to find out where the
>>> thread got interrupted from without doing tedious procedures of breakpoint +
>>> logging + trying to match thread ids. If you have a better method for doing
>>> that I'd very much like to know.
>>>
>>> I think we should focus the effort on fixing the interruption issue if we
>>> can. And I believe we would be able to do that if the interruptions do in
>>> fact originate from where you say they do. But the suggestion of being able
>>> to switch the lucene directory implementation is still interesting, but as
>>> you point out since it has issues on some platforms it would be better if we
>>> could be rid of the interruption issue.
>>>
>>> Cheers,
>>> Tobias
>>>
>>> On Thu, Apr 28, 2011 at 12:41 AM, Jennifer Hickey <[email protected]> wrote:
>>>
>>> > Hello,
>>> > I've been running some tests w/approx 400 threads reading various indexed
>>> > property values. I'm running on 64 bit Linux. I was frequently seeing the
>>> > ClosedChannelException below. The javadoc on Lucene's NIOFSDirectory states
>>> > that "Accessing this class either directly or indirectly from a thread while
>>> > it's interrupted can close the underlying file descriptor immediately if at
>>> > the same time the thread is blocked on IO. The file descriptor will remain
>>> > closed and subsequent access to {@link NIOFSDirectory} will throw a {@link
>>> > ClosedChannelException}. If your application uses either {@link
>>> > Thread#interrupt()} or {@link Future#cancel(boolean)} you should use {@link
>>> > SimpleFSDirectory} in favor of {@link NIOFSDirectory}."
>>> >
>>> > A bit of debugging revealed that the Thread.interrupts were coming from
>>> > Neo4j, specifically in RWLock and MappedPersistenceWindow. So it seems like
>>> > this would be a common problem, though perhaps I am missing something?
>>> >
>>> > SimpleFSDirectory seems a bit of a performance bottleneck, so I switched to
>>> > MMapDirectory and the problem did go away. I didn't see a way to switch
>>> > implementations w/out modifying neo4j code, so I changed LuceneDataSource as
>>> > follows:
>>> >
>>> > static Directory getDirectory( String storeDir,
>>> >         IndexIdentifier identifier ) throws IOException
>>> > {
>>> >     MMapDirectory dir=new MMapDirectory(getFileDirectory( storeDir,
>>> >             identifier), null);
>>> >     if(MMapDirectory.UNMAP_SUPPORTED) {
>>> >         dir.setUseUnmap(true);
>>> >     }
>>> >     return dir;
>>> > }
>>> >
>>> > So I'm wondering if others have seen this problem and/or if there is a
>>> > recommended solution? Our product runs on quite a few different operating
>>> > systems, so I have some reservations about using MMapDirectory as well
>>> > (javadoc speaks of a few caveats on Windows, 64 vs 32, etc). Also, I'd
>>> > rather not maintain a patched version of the neo4j code if avoidable.
>>> >
>>> > Thanks!
>>> > Jennifer
>>> >
>>> > Exception:
>>> > Caused by: java.nio.channels.ClosedChannelException
>>> >     at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
>>> >     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:613)
>>> >     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
>>> >     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:139)
>>> >     at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:285)
>>> >     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:160)
>>> >     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>>> >     at org.apache.lucene.store.DataInput.readVInt(DataInput.java:86)
>>> >     at org.apache.lucene.index.codecs.DeltaBytesReader.read(DeltaBytesReader.java:40)
>>> >     at org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.next(PrefixCodedTermsReader.java:469)
>>> >     at org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.seek(PrefixCodedTermsReader.java:385)
>>> >     at org.apache.lucene.index.TermsEnum.seek(TermsEnum.java:68)
>>> >     at org.apache.lucene.index.Terms.docFreq(Terms.java:53)
>>> >     at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:898)
>>> >     at org.apache.lucene.index.IndexReader.docFreq(IndexReader.java:882)
>>> >     at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:687)
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
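
For anyone who wants to see the failure mode from the NIOFSDirectory javadoc without a full Neo4j/Lucene setup, it can probably be reproduced with the JDK alone. What follows is only a minimal sketch, not the patch discussed in this thread and not Lucene code. The class name InterruptClosesChannel and the method demo() are made up for illustration, and the sketch assumes a standard JDK (7 or later) where FileChannel behaves as an interruptible channel. It reads from a channel while the calling thread's interrupt flag is set, then reads again after clearing the flag.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Neo4j or Lucene.
class InterruptClosesChannel {

    /**
     * Returns the simple names of the exceptions thrown by two positional
     * reads: one made while the thread is interrupted, one made afterwards.
     */
    static String[] demo() throws IOException {
        Path file = Files.createTempFile("interrupt-demo", ".bin");
        Files.write(file, new byte[] {1, 2, 3, 4});
        String first = "none";
        String second = "none";
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Simulate an interrupt arriving from elsewhere (in the thread
            // above, reportedly from RWLock / MappedPersistenceWindow).
            Thread.currentThread().interrupt();
            try {
                // Positional read, the same kind of call seen in the stack trace.
                channel.read(ByteBuffer.allocate(4), 0);
            } catch (ClosedByInterruptException e) {
                first = e.getClass().getSimpleName();
            } finally {
                // Clear the interrupt flag so the next read is a "normal" one.
                Thread.interrupted();
            }
            try {
                // The thread is no longer interrupted, but the channel stays closed.
                channel.read(ByteBuffer.allocate(4), 0);
            } catch (ClosedChannelException e) {
                second = e.getClass().getSimpleName();
            }
        } finally {
            Files.deleteIfExists(file);
        }
        return new String[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        String[] seen = demo();
        System.out.println("read while interrupted: " + seen[0]);
        System.out.println("read afterwards:        " + seen[1]);
    }
}
```

On a standard JDK the first read should report ClosedByInterruptException and the second ClosedChannelException. That would explain why a single interrupt is enough to break every later read on a channel shared by many reader threads, and why removing the interrupts looks like a more portable fix than swapping the directory implementation.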

