[jira] [Resolved] (LUCENE-9422) Detailed logging for MergePolicy$MergeException stack trace
[ https://issues.apache.org/jira/browse/LUCENE-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viral Gandhi resolved LUCENE-9422.

Resolution: Invalid

I think this was happening due to our modification in ConcurrentMergeSchedulerWrapper. I'll reopen if we can reproduce on a clean Lucene clone.

> Detailed logging for MergePolicy$MergeException stack trace
>
> Key: LUCENE-9422
> URL: https://issues.apache.org/jira/browse/LUCENE-9422
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Viral Gandhi
> Priority: Minor
>
> We hit the following exception:
> {code:java}
> Uncaught exception: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095 in thread Thread[Lucene Merge Thread #456,5,main]
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
>   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
> Caused by: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
>   at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:176)
>   at org.apache.lucene.index.SegmentCommitInfo.files(SegmentCommitInfo.java:228)
>   at org.apache.lucene.index.IndexWriter$2.mergeFinished(IndexWriter.java:3181)
>   at org.apache.lucene.index.IndexWriter.closeMergeReaders(IndexWriter.java:)
>   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4744)
>   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4170)
>   at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
>   at com.amazon.lucene.index.ConcurrentMergeSchedulerWrapper.doMerge(ConcurrentMergeSchedulerWrapper.java:64)
>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
> {code}
>
> After a merge thread hit an exception, and while trying to throw that exception, Lucene called _SegmentInfo.files()_, which then threw another exception. Could this have caused the root-cause exception to be lost? Having more details about the root cause would have been helpful here.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9422) Detailed logging for MergePolicy$MergeException stack trace
Viral Gandhi created LUCENE-9422:

Summary: Detailed logging for MergePolicy$MergeException stack trace
Key: LUCENE-9422
URL: https://issues.apache.org/jira/browse/LUCENE-9422
Project: Lucene - Core
Issue Type: Improvement
Reporter: Viral Gandhi

We hit the following exception:
{code:java}
Uncaught exception: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095 in thread Thread[Lucene Merge Thread #456,5,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
  at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
  at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:176)
  at org.apache.lucene.index.SegmentCommitInfo.files(SegmentCommitInfo.java:228)
  at org.apache.lucene.index.IndexWriter$2.mergeFinished(IndexWriter.java:3181)
  at org.apache.lucene.index.IndexWriter.closeMergeReaders(IndexWriter.java:)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4744)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4170)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
  at com.amazon.lucene.index.ConcurrentMergeSchedulerWrapper.doMerge(ConcurrentMergeSchedulerWrapper.java:64)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}

After a merge thread hit an exception, and while trying to throw that exception, Lucene called _SegmentInfo.files()_, which then threw another exception. Could this have caused the root-cause exception to be lost? Having more details about the root cause would have been helpful here.
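The hypothesis in the description (a second exception thrown during cleanup hiding the first one) can be illustrated with plain Java. The sketch below is not Lucene code: the class and method names are hypothetical, and the exception messages are made up for the demonstration. It only shows the general mechanism: an exception thrown from a finally block replaces the one already in flight, while Throwable.addSuppressed keeps both.

```java
// Hypothetical stand-in for a merge whose cleanup step can itself fail.
// This is a generic Java illustration, not how IndexWriter is implemented.
final class MergeFailureDemo {

  /** Cleanup that always fails, like a files() call made before files are computed. */
  static void cleanup() {
    throw new IllegalStateException("files were not computed yet");
  }

  /** Naive pattern: the failure in finally replaces the original exception. */
  static void mergeLosingRootCause() {
    try {
      throw new IllegalArgumentException("root cause: merge failed");
    } finally {
      cleanup(); // throws, so the IllegalArgumentException above is discarded
    }
  }

  /** Safer pattern: keep the first exception and attach the cleanup failure to it. */
  static void mergeKeepingRootCause() {
    RuntimeException first = null;
    try {
      throw new IllegalArgumentException("root cause: merge failed");
    } catch (RuntimeException e) {
      first = e;
      throw e;
    } finally {
      try {
        cleanup();
      } catch (RuntimeException cleanupFailure) {
        if (first != null) {
          first.addSuppressed(cleanupFailure); // root cause stays the primary exception
        } else {
          throw cleanupFailure;
        }
      }
    }
  }

  public static void main(String[] args) {
    try {
      mergeLosingRootCause();
    } catch (RuntimeException e) {
      System.out.println("naive: " + e);
    }
    try {
      mergeKeepingRootCause();
    } catch (RuntimeException e) {
      System.out.println("safer: " + e + " (suppressed: " + e.getSuppressed().length + ")");
    }
  }
}
```

With the second pattern, a stack trace printed for the primary exception also lists the cleanup failure under "Suppressed:", so neither is lost.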
[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viral Gandhi updated LUCENE-9378:

Description:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

was:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

> Configurable compression for BinaryDocValues
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Viral Gandhi
> Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
>
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
>
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
>
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
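The shape of the suggested approach (a mode on the doc values format, chosen by a codec that subclasses the default) can be sketched as follows. Every type here is a hypothetical stand-in written for this illustration; none of them are Lucene classes, and the sketch does not claim to match how Lucene eventually exposed such a setting. Only the mode names COMPRESSED and UNCOMPRESSED come from the description above.

```java
// Hypothetical types for illustration only; none of these are Lucene classes.
final class SketchDocValuesFormat {
  /** The two modes named in the proposal. */
  enum Mode { COMPRESSED, UNCOMPRESSED }

  private final Mode mode;

  SketchDocValuesFormat(Mode mode) {
    this.mode = mode;
  }

  Mode mode() {
    return mode;
  }
}

/** Stand-in for a default codec that always compresses binary doc values. */
class SketchDefaultCodec {
  SketchDocValuesFormat docValuesFormat() {
    return new SketchDocValuesFormat(SketchDocValuesFormat.Mode.COMPRESSED);
  }
}

/** A user-defined codec that subclasses the default and opts out of compression. */
final class SketchUncompressedCodec extends SketchDefaultCodec {
  @Override
  SketchDocValuesFormat docValuesFormat() {
    return new SketchDocValuesFormat(SketchDocValuesFormat.Mode.UNCOMPRESSED);
  }
}

final class CodecModeDemo {
  public static void main(String[] args) {
    SketchDefaultCodec codec = new SketchUncompressedCodec();
    System.out.println(codec.docValuesFormat().mode()); // prints UNCOMPRESSED
  }
}
```

The point of the design is that the default behaviour stays unchanged, while an application that cares more about query-time cost than disk usage overrides a single method.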
[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viral Gandhi updated LUCENE-9378:

Description:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

was:
Lucene 8.5.1 includes a change to always compress BinaryDocValues. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

> Configurable compression for BinaryDocValues
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Viral Gandhi
> Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).
>
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
>
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
>
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viral Gandhi updated LUCENE-9378:

Description:
Lucene 8.5.1 includes a change to always compress BinaryDocValues. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

was:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

> Configurable compression for BinaryDocValues
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Viral Gandhi
> Priority: Minor
>
> Lucene 8.5.1 includes a change to always compress BinaryDocValues. This caused a ~30% reduction in our red-line QPS (throughput).
>
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.
>
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
>
> Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
[jira] [Commented] (LUCENE-9211) Adding compression to BinaryDocValues storage
[ https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113384#comment-17113384 ]

Viral Gandhi commented on LUCENE-9211:

This improvement had a negative impact on our internal benchmarking when we tried to upgrade to Lucene 8.5.1. I have created an issue for that: https://issues.apache.org/jira/browse/LUCENE-9378.

> Adding compression to BinaryDocValues storage
>
> Key: LUCENE-9211
> URL: https://issues.apache.org/jira/browse/LUCENE-9211
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Labels: pull-request-available
> Fix For: 8.5
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> While SortedSetDocValues can be used today to store identical values in a compact form, this is not effective for data with many unique values.
>
> The proposal is that BinaryDocValues should be stored in LZ4 compressed blocks, which can dramatically reduce disk storage costs in many cases. Blocks of a number of documents are stored as a single compressed blob, along with metadata that records the offsets where the original document values can be found in the uncompressed content.
>
> There's a trade-off here between efficient compression (more docs-per-block = better compression) and fast retrieval times (fewer docs-per-block = faster read access for single values). A fixed block size of 32 docs seems like it would be a reasonable compromise for most scenarios.
>
> A PR is up for review here: [https://github.com/apache/lucene-solr/pull/1234]
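The block layout described in the quoted issue can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Lucene's on-disk format: the class name is hypothetical, everything lives in memory, and java.util.zip.Deflater is swapped in for LZ4 because the JDK ships no LZ4 codec. The 32-docs-per-block figure and the per-block offsets metadata come from the description.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical class for illustration; DEFLATE stands in for LZ4.
final class BlockCompressedValues {
  static final int DOCS_PER_BLOCK = 32; // block size suggested in the description

  private static final class Block {
    final byte[] compressed; // all values of the block, concatenated then compressed
    final int[] offsets;     // value i spans offsets[i]..offsets[i + 1] in the raw bytes

    Block(byte[] compressed, int[] offsets) {
      this.compressed = compressed;
      this.offsets = offsets;
    }
  }

  private final List<Block> blocks = new ArrayList<>();

  BlockCompressedValues(List<byte[]> values) {
    for (int start = 0; start < values.size(); start += DOCS_PER_BLOCK) {
      int end = Math.min(start + DOCS_PER_BLOCK, values.size());
      ByteArrayOutputStream raw = new ByteArrayOutputStream();
      int[] offsets = new int[end - start + 1];
      for (int i = start; i < end; i++) {
        byte[] value = values.get(i);
        raw.write(value, 0, value.length);
        offsets[i - start + 1] = raw.size();
      }
      blocks.add(new Block(deflate(raw.toByteArray()), offsets));
    }
  }

  int blockCount() {
    return blocks.size();
  }

  /** Reading a single value requires decompressing its whole block first. */
  byte[] get(int docId) {
    Block block = blocks.get(docId / DOCS_PER_BLOCK);
    int i = docId % DOCS_PER_BLOCK;
    byte[] raw = inflate(block.compressed, block.offsets[block.offsets.length - 1]);
    return Arrays.copyOfRange(raw, block.offsets[i], block.offsets[i + 1]);
  }

  private static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  private static byte[] inflate(byte[] compressed, int rawLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] raw = new byte[rawLength];
      int filled = 0;
      while (filled < rawLength) {
        int n = inflater.inflate(raw, filled, rawLength - filled);
        if (n == 0) {
          throw new IllegalStateException("compressed block ended early");
        }
        filled += n;
      }
      return raw;
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block", e);
    } finally {
      inflater.end();
    }
  }
}
```

The get method makes the trade-off visible: a larger DOCS_PER_BLOCK gives the compressor more context, but every single-value read pays for decompressing the whole block, which is the query-time cost reported in the comment above.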
[jira] [Created] (LUCENE-9378) Configurable compression for BinaryDocValues
Viral Gandhi created LUCENE-9378:

Summary: Configurable compression for BinaryDocValues
Key: LUCENE-9378
URL: https://issues.apache.org/jira/browse/LUCENE-9378
Project: Lucene - Core
Issue Type: Improvement
Reporter: Viral Gandhi

Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]