[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-20 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446269#comment-16446269
 ] 

Christine Poerschke commented on SOLR-12232:


bq. ... Is this perhaps more properly a Lucene issue?

Good question. JIRA can support issue moves between projects -- I think, let me 
try that here, SOLR-12232 would become a forwarding link to the LUCENE issue.

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Jeff Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441909#comment-16441909
 ] 

Jeff Miller commented on SOLR-12232:


"Please don't shoot the messenger "

Sometimes it's about how you deliver it, not the message itself.  Your point is 
understood and appreciated.

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441898#comment-16441898
 ] 

Robert Muir commented on SOLR-12232:


{quote}
Understood. But we don't have to use NIO.
{quote}

Yes, use another lock factory or some alternative if you want. But this is NIO 
lock factory, and well it uses NIO. And its behavior is correct: its wrong to 
interrupt the NIO stuff. It is definitely OK to dictate that its wrong to 
interrupt NIO stuff, we document it that way for a reason, because its 
dangerous.

Lock validation and other checks here are important because they prevent screw 
crazy corruption-looking cases from showing up. Please don't shoot the 
messenger but fix the actual bugs instead (the perp calling interrupt on lucene 
threads).


> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Jeff Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441890#comment-16441890
 ] 

Jeff Miller commented on SOLR-12232:


The thread interrupting is purposeful for our solution and won't be changing 
anytime soon due to external requirements.  It worked just fine for quite a few 
years until the call to ensureValid was added.  Since I saw no specific 
requirement for this class to close its file channel due to an interrupt 
exception it seemed a decent solution incase anyone else out there uses 
interrupts in any manner without removing the ensureValid call for us.  If no 
one sees value in this for Solr then so be it.

 

 

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441889#comment-16441889
 ] 

David Smiley commented on SOLR-12232:
-

By trade-off, I mean an app loses raw search speed (and you explained this 
well) but the app gains the ability to interrupt (cancel) a search task that is 
taking too long.  However wise we may be, I don't think we ought to dictate to 
all users/apps that doing this is fundamentally wrong (what you call a bug).

bq. If you interrupt lucene threads using nio ...

Understood.  But we don't have to use NIO.

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441878#comment-16441878
 ] 

Robert Muir commented on SOLR-12232:


There isn't a real tradeoff. I'm not even sure it counts as a "workaround". RAF 
must synchronize all reads so its basically like just only having one thread, 
searches will pile up. 

It has nothing to do with what i like or don't like. If you interrupt lucene 
threads using nio its gonna look nasty, probably like index corruption. the 
whole point of lockfactory is to detect bugs in the code: it found one here in 
solr (or some plugin or something). That's what needs to be fixed.

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441875#comment-16441875
 ] 

David Smiley commented on SOLR-12232:
-

Disclaimer: I haven't studied the approach in the patch.

Lucene has {{RAFDirectory}} (in misc) for apps that want to trade-off raw 
performance for interruptibility.  It uses RandomAccessFile and not NIO.  
Wouldn't it be appropriate to have a LockFactory impl that supports (safe) 
interruptability too, so they can be used together?  I'm not sure I'm getting 
your point Rob... are you saying, indirectly, that interrupt-*safe* IO is 
impossible?  Or maybe you don't like interruption at all so, in your opinion, 
anyone using it has made an error in judgement for using it?

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441844#comment-16441844
 ] 

Shawn Heisey commented on SOLR-12232:
-

[~rcmuir] likely has a much better understanding of the gory details than I do.

I've only written a few multi-threaded apps ... but I have *never* used 
Thread.interrupt.  What little reading I've done on the subject tells me that 
doing so is likely to cause problems.

Even if I were to research it and learn how to use interrupting properly, I 
would never use it on a thread that I didn't create -- especially not those in 
a comprehensive system like Solr or Lucene.


> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441825#comment-16441825
 ] 

Robert Muir commented on SOLR-12232:


IMO it does not solve the problem. The correct fix is not to Thread.interrupt 
lucene threads using NIO apis. it is not safe to use Thread.interrupt with 
nio-based stuff with lucene: we document that. It is good that locking detected 
the error in the code (use of Thread.interrupt) because it can have much more 
dangerous impacts (e.g. loss of a reader). Asynchronous channels are too slow 
and wont help there. In the future, maybe its fixed in the JDK: 
http://mail.openjdk.java.net/pipermail/nio-dev/2018-March/004761.html

I don't think lucene should mask the problem here because it will not solve 
anything for these reasons. Please, fix the Thread.interrupt

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12232) NativeFSLockFactory loses the channel when a thread is interrupted and the SolrCore becomes unusable after

2018-04-17 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441816#comment-16441816
 ] 

Erick Erickson commented on SOLR-12232:
---

[~rcmuir ] [~mikemccand] Is this perhaps more properly a Lucene issue?

> NativeFSLockFactory loses the channel when a thread is interrupted and the 
> SolrCore becomes unusable after
> --
>
> Key: SOLR-12232
> URL: https://issues.apache.org/jira/browse/SOLR-12232
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.1.1
>Reporter: Jeff Miller
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: NativeFSLockFactory, locking
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The condition is rare for us and seems basically a race.  If a thread that is 
> running just happens to have the FileChannel open for NativeFSLockFactory and 
> is interrupted, the channel is closed since it extends 
> [AbstractInterruptibleChannel|https://docs.oracle.com/javase/7/docs/api/java/nio/channels/spi/AbstractInterruptibleChannel.html]
> Unfortunately this means the Solr Core has to be unloaded and reopened to 
> make the core usable again as the ensureValid check forever throws an 
> exception after.
> org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an 
> external force: 
> NativeFSLock(path=data/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807
>  exclusive invalid],creationTime=2018-04-06T21:45:11Z) at 
> org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:178)
>  at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:113)
>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
>  at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
>  
> Proposed solution is using AsynchronousFileChannel instead, since this is 
> only operating on a lock and .size method



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org