[jira] [Resolved] (LUCENE-9422) Detailed logging for MergePolicy$MergeException stack trace

2020-06-30 Thread Viral Gandhi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viral Gandhi resolved LUCENE-9422.
--
Resolution: Invalid

I think this was caused by our modifications in
ConcurrentMergeSchedulerWrapper. I'll reopen the issue if we can reproduce it on
a clean Lucene clone.

> Detailed logging for MergePolicy$MergeException stack trace 
> 
>
> Key: LUCENE-9422
> URL: https://issues.apache.org/jira/browse/LUCENE-9422
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
>  
> We hit the following exception:
> {code:java}
> Uncaught exception: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095 in thread Thread[Lucene Merge Thread #456,5,main]
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
>  at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
>  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
> Caused by: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
>  at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:176)
>  at org.apache.lucene.index.SegmentCommitInfo.files(SegmentCommitInfo.java:228)
>  at org.apache.lucene.index.IndexWriter$2.mergeFinished(IndexWriter.java:3181)
>  at org.apache.lucene.index.IndexWriter.closeMergeReaders(IndexWriter.java:)
>  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4744)
>  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4170)
>  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
>  at com.amazon.lucene.index.ConcurrentMergeSchedulerWrapper.doMerge(ConcurrentMergeSchedulerWrapper.java:64)
>  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
> {code}
>  
> A merge thread hit an exception, and while Lucene was trying to propagate it,
> Lucene called _SegmentInfo.files()_, which threw another exception. Could this
> have caused the root-cause exception to be lost? More detail about the root
> cause would have been helpful here.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9422) Detailed logging for MergePolicy$MergeException stack trace

2020-06-29 Thread Viral Gandhi (Jira)
Viral Gandhi created LUCENE-9422:


 Summary: Detailed logging for MergePolicy$MergeException stack trace
 Key: LUCENE-9422
 URL: https://issues.apache.org/jira/browse/LUCENE-9422
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Viral Gandhi


 

We hit the following exception:
{code:java}
Uncaught exception: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095 in thread Thread[Lucene Merge Thread #456,5,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
 at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.lang.IllegalStateException: files were not computed yet; segment=_3g5 maxDoc=3095
 at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:176)
 at org.apache.lucene.index.SegmentCommitInfo.files(SegmentCommitInfo.java:228)
 at org.apache.lucene.index.IndexWriter$2.mergeFinished(IndexWriter.java:3181)
 at org.apache.lucene.index.IndexWriter.closeMergeReaders(IndexWriter.java:)
 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4744)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4170)
 at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
 at com.amazon.lucene.index.ConcurrentMergeSchedulerWrapper.doMerge(ConcurrentMergeSchedulerWrapper.java:64)
 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}
 

A merge thread hit an exception, and while Lucene was trying to propagate it,
Lucene called _SegmentInfo.files()_, which threw another exception. Could this
have caused the root-cause exception to be lost? More detail about the root
cause would have been helpful here.
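The issue was later resolved as Invalid (attributed to a local ConcurrentMergeSchedulerWrapper change), so whether Lucene itself lost a root cause here is not established. As a generic illustration of the pitfall the question raises: in Java, an exception thrown from cleanup code while another exception is propagating replaces the first one, unless the cleanup failure is attached with Throwable.addSuppressed. In the sketch below, runMerge() and cleanup() are hypothetical stand-ins, not Lucene APIs.

```java
// Generic Java sketch; runMerge() and cleanup() are hypothetical stand-ins, not Lucene APIs.
final class SuppressedDemo {

  // Stands in for the original merge failure (the root cause worth keeping).
  static void runMerge() {
    throw new IllegalArgumentException("root cause: merge failed");
  }

  // Stands in for a cleanup step that fails while the first exception is in flight,
  // similar in shape to SegmentInfo.files() throwing from mergeFinished in the trace.
  static void cleanup() {
    throw new IllegalStateException("files were not computed yet");
  }

  // Naive handling: an exception thrown from finally replaces the in-flight one.
  static RuntimeException naive() {
    try {
      try {
        runMerge();
      } finally {
        cleanup();
      }
    } catch (RuntimeException e) {
      return e; // only the cleanup exception survives; the root cause is gone
    }
    return null;
  }

  // Preserving handling: remember the primary failure, attach later ones as suppressed.
  static RuntimeException preserving() {
    RuntimeException primary = null;
    try {
      runMerge();
    } catch (RuntimeException e) {
      primary = e;
    } finally {
      try {
        cleanup();
      } catch (RuntimeException secondary) {
        if (primary != null) {
          primary.addSuppressed(secondary); // root cause stays the reported exception
        } else {
          primary = secondary;
        }
      }
    }
    return primary;
  }

  public static void main(String[] args) {
    System.out.println("naive: " + naive().getMessage());
    RuntimeException kept = preserving();
    System.out.println("preserving: " + kept.getMessage()
        + " (suppressed: " + kept.getSuppressed()[0].getMessage() + ")");
  }
}
```

With the preserving variant, a printed stack trace shows the root cause first and the cleanup failure under a "Suppressed:" section, so neither is lost.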

 






[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-22 Thread Viral Gandhi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viral Gandhi updated LUCENE-9378:
-
Description: 
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

  was:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]


> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
> This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression
> feature rather than having it always enabled, since it can have a substantial
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
> possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-21 Thread Viral Gandhi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viral Gandhi updated LUCENE-9378:
-
Description: 
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

  was:
Lucene 8.5.1 includes a change to always compress BinaryDocValues. This caused
a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]


> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
> This caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression
> feature rather than having it always enabled, since it can have a substantial
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
> possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-21 Thread Viral Gandhi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viral Gandhi updated LUCENE-9378:
-
Description: 
Lucene 8.5.1 includes a change to always compress BinaryDocValues. This caused
a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]

  was:
Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]


> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
>
> Lucene 8.5.1 includes a change to always compress BinaryDocValues. This
> caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression
> feature rather than having it always enabled, since it can have a substantial
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
> possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]






[jira] [Commented] (LUCENE-9211) Adding compression to BinaryDocValues storage

2020-05-21 Thread Viral Gandhi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113384#comment-17113384
 ] 

Viral Gandhi commented on LUCENE-9211:
--

This improvement had a negative impact on our internal benchmarks when we
tried to upgrade to Lucene 8.5.1. I have opened an issue about it:
https://issues.apache.org/jira/browse/LUCENE-9378.

> Adding compression to BinaryDocValues storage
> -
>
> Key: LUCENE-9211
> URL: https://issues.apache.org/jira/browse/LUCENE-9211
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While SortedSetDocValues can be used today to store identical values in a
> compact form, this is not effective for data with many unique values.
> The proposal is that BinaryDocValues should be stored in LZ4-compressed
> blocks, which can dramatically reduce disk storage costs in many cases. Under
> the proposal, each block of documents is stored as a single compressed blob,
> along with metadata that records the offsets at which the original document
> values can be found in the uncompressed content.
> There's a trade-off here between efficient compression (more docs-per-block =
> better compression) and fast retrieval times (fewer docs-per-block = faster
> read access for single values). A fixed block size of 32 docs seems like it
> would be a reasonable compromise for most scenarios.
> A PR is up for review here [https://github.com/apache/lucene-solr/pull/1234]
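The block layout described in this proposal can be sketched in self-contained Java: values are grouped into blocks of 32 documents, each block's concatenated bytes are compressed as one blob, and per-document offsets locate a value inside the decompressed block. This sketch uses java.util.zip.Deflater as a stand-in because LZ4 is not part of the JDK, and BlockCompressedBinaryValues is a hypothetical class; it does not reflect Lucene's actual on-disk format or the code in the linked PR.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: Deflater stands in for LZ4, and this is not Lucene's format.
final class BlockCompressedBinaryValues {
  static final int DOCS_PER_BLOCK = 32;

  private final List<byte[]> compressedBlocks = new ArrayList<>();
  // Per block: start offset of each doc in the uncompressed bytes, plus a final end offset.
  private final List<int[]> blockOffsets = new ArrayList<>();

  BlockCompressedBinaryValues(List<byte[]> values) {
    for (int start = 0; start < values.size(); start += DOCS_PER_BLOCK) {
      int end = Math.min(start + DOCS_PER_BLOCK, values.size());
      ByteArrayOutputStream raw = new ByteArrayOutputStream();
      int[] offsets = new int[end - start + 1];
      for (int doc = start; doc < end; doc++) {
        offsets[doc - start] = raw.size();
        byte[] value = values.get(doc);
        raw.write(value, 0, value.length);
      }
      offsets[end - start] = raw.size(); // last offset doubles as the uncompressed block size
      compressedBlocks.add(deflate(raw.toByteArray()));
      blockOffsets.add(offsets);
    }
  }

  int blockCount() {
    return compressedBlocks.size();
  }

  // Reading a single value means inflating its whole block: the speed/size trade-off.
  byte[] get(int docId) {
    int block = docId / DOCS_PER_BLOCK;
    int[] offsets = blockOffsets.get(block);
    byte[] raw = inflate(compressedBlocks.get(block), offsets[offsets.length - 1]);
    int i = docId % DOCS_PER_BLOCK;
    return Arrays.copyOfRange(raw, offsets[i], offsets[i + 1]);
  }

  private static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  private static byte[] inflate(byte[] input, int uncompressedSize) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(input);
      byte[] result = new byte[uncompressedSize];
      int filled = 0;
      while (filled < uncompressedSize) {
        int n = inflater.inflate(result, filled, uncompressedSize - filled);
        if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or corrupt block");
        }
        filled += n;
      }
      return result;
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block", e);
    } finally {
      inflater.end();
    }
  }
}
```

Raising DOCS_PER_BLOCK gives the compressor more context per blob, while every get() call must still inflate the full block to return one value; that per-lookup cost is the retrieval side of the trade-off described above.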






[jira] [Created] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-05-21 Thread Viral Gandhi (Jira)
Viral Gandhi created LUCENE-9378:


 Summary: Configurable compression for BinaryDocValues
 Key: LUCENE-9378
 URL: https://issues.apache.org/jira/browse/LUCENE-9378
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Viral Gandhi


Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211].
This caused a ~30% reduction in our red-line QPS (throughput).

We think users should be given some way to opt in to this compression feature
rather than having it always enabled, since it can have a substantial
query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED
and UNCOMPRESSED) and allow users to create a custom Codec that subclasses the
default Codec and picks the format they want.

The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
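A minimal, hypothetical Java sketch of the suggested shape: a doc values format that carries a COMPRESSED/UNCOMPRESSED mode, and a codec subclass that picks the format it wants. The class names (BinaryDocValuesFormatSketch, CodecSketch, PerFieldCodecSketch) are illustrative assumptions and do not model Lucene's real Codec or DocValuesFormat APIs. Choosing the mode per field is also an assumption; the proposal only says the custom Codec picks the format.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; none of these classes exist in Lucene.
// Models a doc values format whose binary-field compression is chosen by a mode.
final class BinaryDocValuesFormatSketch {
  enum Mode { COMPRESSED, UNCOMPRESSED }

  final Mode mode;

  BinaryDocValuesFormatSketch(Mode mode) {
    this.mode = mode;
  }
}

// Models a codec that applies one format (and therefore one mode) to every field.
class CodecSketch {
  private final BinaryDocValuesFormatSketch defaultFormat;

  CodecSketch(BinaryDocValuesFormatSketch.Mode mode) {
    this.defaultFormat = new BinaryDocValuesFormatSketch(mode);
  }

  BinaryDocValuesFormatSketch binaryFormatForField(String field) {
    return defaultFormat;
  }
}

// Models a user-defined subclass that overrides the choice for selected fields,
// e.g. keeping query-hot binary fields uncompressed while compressing the rest.
class PerFieldCodecSketch extends CodecSketch {
  private final Set<String> uncompressedFields;
  private final BinaryDocValuesFormatSketch uncompressedFormat =
      new BinaryDocValuesFormatSketch(BinaryDocValuesFormatSketch.Mode.UNCOMPRESSED);

  PerFieldCodecSketch(BinaryDocValuesFormatSketch.Mode defaultMode, Set<String> uncompressedFields) {
    super(defaultMode);
    this.uncompressedFields = new HashSet<>(uncompressedFields);
  }

  @Override
  BinaryDocValuesFormatSketch binaryFormatForField(String field) {
    return uncompressedFields.contains(field)
        ? uncompressedFormat
        : super.binaryFormatForField(field);
  }
}
```

Keeping the mode on the format, rather than as a global switch, mirrors the Lucene50StoredFieldsFormat comparison: the codec decides which trade-off to use, and the default can stay whichever one the project settles on.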

Here is a related issue for adding a benchmark covering BINARY doc values
query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]


