[jira] [Created] (LUCENE-10680) UnifiedHighlighter's term extraction not working for some query rewrites
Yannick Welsch created LUCENE-10680:
---

Summary: UnifiedHighlighter's term extraction not working for some query rewrites
Key: LUCENE-10680
URL: https://issues.apache.org/jira/browse/LUCENE-10680
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Yannick Welsch

UnifiedHighlighter rewrites the query against an empty index when extracting the terms from the query (see https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java#L149). Unfortunately, the rewrite step can drop the terms that are to be extracted.

Take for example the boolean query "+field:value -ConstantScore(FieldExistsQuery [field=other_field])" when highlighting on "field". On an empty index, the `FieldExistsQuery` rewrites to a `MatchAllDocsQuery`. Because it is a `MUST_NOT` clause, the overall boolean query is then rewritten to a `MatchNoDocsQuery`, which drops the `MUST` clause in the process, so the `field:value` term is never extracted.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
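The chain of rewrites reported in LUCENE-10680 can be traced with a toy model. This is only a sketch: the types below (`Q`, `Term`, `FieldExists`, `MatchAll`, `MatchNone`, `Bool`) are hypothetical stand-ins written for this illustration, not Lucene's query classes, and the ConstantScore wrapper is left out. The two rewrite rules are the ones named in the report; the rule that only the MUST clause contributes terms is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

public class RewriteDropsTermsSketch {

    /** Toy stand-in for a query (hypothetical; not Lucene's Query class). */
    interface Q {
        Q rewriteOnEmptyIndex();
        void collectTerms(List<String> out);
    }

    static final class Term implements Q {
        final String field;
        final String value;
        Term(String field, String value) { this.field = field; this.value = value; }
        public Q rewriteOnEmptyIndex() { return this; }
        public void collectTerms(List<String> out) { out.add(field + ":" + value); }
    }

    static final class MatchAll implements Q {
        public Q rewriteOnEmptyIndex() { return this; }
        public void collectTerms(List<String> out) { /* no terms */ }
    }

    static final class MatchNone implements Q {
        public Q rewriteOnEmptyIndex() { return this; }
        public void collectTerms(List<String> out) { /* no terms */ }
    }

    static final class FieldExists implements Q {
        // Rule from the report: on an empty index this rewrites to match-all.
        public Q rewriteOnEmptyIndex() { return new MatchAll(); }
        public void collectTerms(List<String> out) { /* no terms */ }
    }

    static final class Bool implements Q {
        final Q must;
        final Q mustNot;
        Bool(Q must, Q mustNot) { this.must = must; this.mustNot = mustNot; }

        public Q rewriteOnEmptyIndex() {
            Q rewrittenMustNot = mustNot.rewriteOnEmptyIndex();
            // Rule from the report: a match-all MUST_NOT clause turns the whole
            // query into match-none, discarding the MUST clause.
            if (rewrittenMustNot instanceof MatchAll) {
                return new MatchNone();
            }
            return new Bool(must.rewriteOnEmptyIndex(), rewrittenMustNot);
        }

        // Model assumption: only the MUST clause contributes terms.
        public void collectTerms(List<String> out) { must.collectTerms(out); }
    }

    static List<String> terms(Q query) {
        List<String> out = new ArrayList<>();
        query.collectTerms(out);
        return out;
    }

    /** Models "+field:value -FieldExists(other_field)". */
    static Q exampleQuery() {
        return new Bool(new Term("field", "value"), new FieldExists());
    }

    public static void main(String[] args) {
        Q query = exampleQuery();
        System.out.println("terms before rewrite: " + terms(query));
        System.out.println("terms after rewrite:  " + terms(query.rewriteOnEmptyIndex()));
    }
}
```

In this model, extracting terms from the original query finds `field:value`, while extracting them after the rewrite finds nothing, which is the symptom the report describes.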
[jira] [Created] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics
Yannick Welsch created LUCENE-10582:
---

Summary: CombinedFieldQuery fails with distributed field statistics
Key: LUCENE-10582
URL: https://issues.apache.org/jira/browse/LUCENE-10582
Project: Lucene - Core
Issue Type: Bug
Components: modules/sandbox
Reporter: Yannick Welsch

CombinedFieldQuery does not properly combine distributed collection statistics, resulting in an IllegalArgumentException during searches.

Originally surfaced in this Elasticsearch issue: https://github.com/elastic/elasticsearch/issues/82817
[jira] [Created] (LUCENE-10474) Avoid throwing StackOverflowError when creating RegExp
Yannick Welsch created LUCENE-10474:
---

Summary: Avoid throwing StackOverflowError when creating RegExp
Key: LUCENE-10474
URL: https://issues.apache.org/jira/browse/LUCENE-10474
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Yannick Welsch

Creating a regular expression using Lucene's RegExp class can easily result in a StackOverflowError being thrown, for example when the input is so long that parsing it exceeds the maximum stack depth. Throwing a StackOverflowError isn't something a user would expect, and it isn't documented either. StackOverflowError is a user-unfriendly exception: it does not convey that the user has done something wrong, but instead suggests a bug in the implementation.

I would like Lucene to follow the [approach taken by the JDK|https://github.com/openjdk/jdk/blob/cab4ff64541393a974ea91e35167668ef0036804/src/java.base/share/classes/java/util/regex/Pattern.java#L1441] and throw an IllegalArgumentException instead, to clearly mark this as an input that the implementation can't handle.
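The approach requested in LUCENE-10474 could look roughly like the following. This is a minimal sketch, not Lucene's RegExp code: `DepthSafeParserSketch`, `countOpen`, `parse` and `openParens` are hypothetical names, and the "parser" only counts leading parentheses so that it recurses once per input character. It assumes a default-sized thread stack, for which five million nested calls is far too deep.

```java
import java.util.Arrays;

public class DepthSafeParserSketch {

    /** Counts leading '(' characters recursively, using one stack frame per character. */
    private static int countOpen(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == '(') {
            return 1 + countOpen(s, pos + 1);
        }
        return 0;
    }

    /**
     * Entry point: a stack overflow inside the recursive parser is reported
     * as an IllegalArgumentException, so the caller sees an input problem
     * rather than what looks like an implementation bug.
     */
    static int parse(String input) {
        try {
            return countOpen(input, 0);
        } catch (StackOverflowError e) {
            throw new IllegalArgumentException(
                "input too complex to parse (length " + input.length() + ")", e);
        }
    }

    /** Builds a string of n opening parentheses. */
    static String openParens(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, '(');
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("depth: " + parse("((()"));
        try {
            parse(openParens(5_000_000));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Catching the error at the single public entry point keeps the recursive code unchanged; the original StackOverflowError is kept as the cause for debugging.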
[jira] [Created] (LUCENE-10235) LRUQueryCache should not count never-cacheable queries as a miss
Yannick Welsch created LUCENE-10235:
---

Summary: LRUQueryCache should not count never-cacheable queries as a miss
Key: LUCENE-10235
URL: https://issues.apache.org/jira/browse/LUCENE-10235
Project: Lucene - Core
Issue Type: Improvement
Reporter: Yannick Welsch

Hit and miss counts of a cache are typically used to check how effective a caching layer is. While looking at a system that exhibited a very high miss-to-hit ratio, I took a closer look at Lucene's LRUQueryCache and noticed that it counts as a miss even those queries that it would never consider caching in the first place (e.g. TermQuery and others mentioned in UsageTrackingQueryCachingPolicy.shouldNeverCache).

The reason these are counted as a miss is that LRUQueryCache (scorerSupplier and bulkScorer methods) first does a lookup on the cache, incrementing the hit or miss counter, and only upon a miss checks QueryCachingPolicy.shouldCache to decide whether that query should be put into the cache. The issue is made more complex by the fact that QueryCachingPolicy.shouldCache is a stateful method, and the cacheability of a query can change over time (e.g. after appearing N times).

I'm opening this issue to discuss whether others also feel that the current way of accounting misses is unintuitive / confusing. I would also like to put forward a proposal to:

* generalize the boolean QueryCachingPolicy.shouldCache method to return an enum instead (one of YES, NOT_RIGHT_NOW, NEVER), and only account queries that are (eventually) cacheable and not in the cache as a miss,
* optionally introduce another metric for queries that are never cacheable, e.g. "ignored", and
* optionally refine miss count into a count for items that are cacheable right away, and those that will eventually be cacheable.
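The accounting proposed in LUCENE-10235 can be sketched with a toy cache. Everything here is hypothetical and independent of Lucene: `CountingCacheSketch`, its `Policy`, and the string "queries" are made up for illustration, and the policy rules (queries starting with `term:` are never cached, others become cacheable on their second appearance) are assumptions chosen to mimic a stateful policy. Only the three enum values come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingCacheSketch {

    /** Three-valued answer from the proposal, replacing a boolean shouldCache. */
    enum Cacheability { YES, NOT_RIGHT_NOW, NEVER }

    /** Hypothetical stateful policy, consulted only after a failed cache lookup. */
    static final class Policy {
        private final Map<String, Integer> seen = new HashMap<>();

        Cacheability check(String query) {
            if (query.startsWith("term:")) {
                return Cacheability.NEVER;          // assumed never-cacheable kind
            }
            int count = seen.merge(query, 1, Integer::sum);
            return count >= 2 ? Cacheability.YES : Cacheability.NOT_RIGHT_NOW;
        }
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Policy policy = new Policy();
    long hits;
    long misses;
    long ignored;

    String execute(String query) {
        String cached = cache.get(query);
        if (cached != null) {
            hits++;
            return cached;
        }
        String result = "result(" + query + ")";   // stand-in for running the query
        Cacheability cacheability = policy.check(query);
        if (cacheability == Cacheability.NEVER) {
            ignored++;                              // not a miss: it could never have been a hit
            return result;
        }
        misses++;                                   // cacheable now or later, but not in the cache
        if (cacheability == Cacheability.YES) {
            cache.put(query, result);
        }
        return result;
    }

    public static void main(String[] args) {
        CountingCacheSketch c = new CountingCacheSketch();
        for (int i = 0; i < 3; i++) {
            c.execute("term:a");
            c.execute("range:x");
        }
        System.out.println("hits=" + c.hits + " misses=" + c.misses + " ignored=" + c.ignored);
    }
}
```

With this split, a workload dominated by never-cacheable queries raises only the `ignored` counter, so the hit and miss counts describe just the queries the cache could actually serve.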
[jira] [Commented] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053233#comment-17053233 ]

Yannick Welsch commented on LUCENE-9264:

I've opened a pull request for the removal (linked in this issue) and one for the deprecation (see sub-task).

> Remove SimpleFSDirectory in favor of NIOFsDirectory
> ---
>
> Key: LUCENE-9264
> URL: https://issues.apache.org/jira/browse/LUCENE-9264
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Yannick Welsch
> Priority: Minor
> Fix For: master (9.0)
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Created] (LUCENE-9265) Deprecate SimpleFSDirectory
Yannick Welsch created LUCENE-9265:
--

Summary: Deprecate SimpleFSDirectory
Key: LUCENE-9265
URL: https://issues.apache.org/jira/browse/LUCENE-9265
Project: Lucene - Core
Issue Type: Sub-task
Reporter: Yannick Welsch
[jira] [Updated] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
[ https://issues.apache.org/jira/browse/LUCENE-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yannick Welsch updated LUCENE-9264:
---
Fix Version/s: master (9.0)

> Remove SimpleFSDirectory in favor of NIOFsDirectory
> ---
>
> Key: LUCENE-9264
> URL: https://issues.apache.org/jira/browse/LUCENE-9264
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Yannick Welsch
> Priority: Minor
> Fix For: master (9.0)
[jira] [Created] (LUCENE-9264) Remove SimpleFSDirectory in favor of NIOFsDirectory
Yannick Welsch created LUCENE-9264:
--

Summary: Remove SimpleFSDirectory in favor of NIOFsDirectory
Key: LUCENE-9264
URL: https://issues.apache.org/jira/browse/LUCENE-9264
Project: Lucene - Core
Issue Type: Improvement
Reporter: Yannick Welsch

{{SimpleFSDirectory}} appears to duplicate what's already offered by {{NIOFsDirectory}}. The only difference is that {{SimpleFSDirectory}} uses non-positional reads on the {{FileChannel}} (i.e., stateful reads that change the current position), and therefore has to externally synchronize access to the read method.

On Windows, positional reads are not supported, which is why {{FileChannel}} already uses synchronization internally to guarantee that only one thread at a time performs a positional read (see {{read(ByteBuffer dst, long position)}} in {{FileChannelImpl}}, and {{FileDispatcher.needsPositionLock}}, which returns true on Windows). The JDK implementation for Windows emulates positional reads by using non-positional ones, see [http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/native/sun/nio/ch/FileDispatcherImpl.c#l139].

This means that on Windows, there should be no difference between {{NIOFsDirectory}} and {{SimpleFSDirectory}} in terms of performance (it should be equally poor, as both implementations only allow one thread at a time to read). On Linux/Mac, however, {{NIOFsDirectory}} is superior to {{SimpleFSDirectory}}, as positional reads (pread) can be done concurrently.

My proposal is to remove {{SimpleFSDirectory}} and replace its uses with {{NIOFsDirectory}}, given how similar these two directory implementations are ({{SimpleFSDirectory}} isn't really simpler).
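The two read styles contrasted in LUCENE-9264 can be shown with plain `java.nio.channels.FileChannel`. This is a sketch only and not how either directory class is implemented: `ReadStylesSketch`, its helper methods, the temporary file and its contents are all made up for illustration. The stateful variant has to lock around seek-and-read because both calls touch the channel's shared position; the positional variant passes the offset with each call and leaves that position alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadStylesSketch {

    /** Non-positional read: seek and read mutate the channel position, so the pair is locked. */
    static String readStateful(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(offset);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or end of file
            }
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    /** Positional read: the offset travels with each call, no lock in this code. */
    static String readPositional(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n == -1) {
                break; // end of file
            }
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    /** Returns {stateful text, position afterwards, positional text, position afterwards}. */
    static String[] demo() {
        try {
            Path file = Files.createTempFile("read-styles", ".txt");
            try {
                Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    String stateful = readStateful(ch, 4, 4);
                    long afterStateful = ch.position();
                    String positional = readPositional(ch, 10, 3);
                    long afterPositional = ch.position();
                    return new String[] {
                        stateful, Long.toString(afterStateful),
                        positional, Long.toString(afterPositional)
                    };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

The channel position moves after the stateful read and stays put after the positional one, which is why only the former needs external synchronization when several threads share one channel.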