[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-12-04 Thread Joel Bernstein (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708899#comment-16708899
 ] 

Joel Bernstein edited comment on LUCENE-8374 at 12/4/18 3:47 PM:
-

I have a couple of thoughts about this ticket:

1) It probably should not have been committed to master so close to the 8.0 
release without consensus. I'm in agreement with reverting the commit unless 
consensus can be reached.

2) Is there a way to create another docValues type that users can select 
through configurations? This allows users to decide for themselves if they 
would like to make the tradeoffs that this patch introduces.



> Reduce reads for sparse DocValues
> -
>
> Key: LUCENE-8374
> URL: https://issues.apache.org/jira/browse/LUCENE-8374
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 7.5, master (8.0)
>Reporter: Toke Eskildsen
>Priority: Major
>  Labels: performance
> Attachments: LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, 
> LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, 
> LUCENE-8374_branch_7_3.patch, LUCENE-8374_branch_7_3.patch.20181005, 
> LUCENE-8374_branch_7_4.patch, LUCENE-8374_branch_7_5.patch, 
> LUCENE-8374_part_1.patch, LUCENE-8374_part_2.patch, LUCENE-8374_part_3.patch, 
> LUCENE-8374_part_4.patch, entire_index_logs.txt, 
> image-2018-10-24-07-30-06-663.png, image-2018-10-24-07-30-56-962.png, 
> single_vehicle_logs.txt, 
> start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png,
>  
> start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png
>
>
> The {{Lucene70DocValuesProducer}} has the internal classes 
> {{SparseNumericDocValues}} and {{BaseSortedSetDocValues}} (sparse code path), 
> both of which use {{IndexedDISI}} to handle the docID -> value-ordinal lookup. 
> The value-ordinal is the index of the docID assuming an abstract tightly 
> packed monotonically increasing list of docIDs: If the docIDs with 
> corresponding values are {{[0, 4, 1432]}}, their value-ordinals will be {{[0, 
> 1, 2]}}.
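The docID -> value-ordinal mapping above can be sketched as a search over the sorted list of docIDs that carry a value. This is an illustration of the mapping only (the class and method names are hypothetical); {{IndexedDISI}} computes the same ordinal without materializing such an array:

```java
import java.util.Arrays;

// Illustration: the value-ordinal of a docID is its index in the sorted
// list of docIDs that have a value; docIDs without a value map to -1.
public class OrdinalLookup {
    public static int ordinal(long[] docIdsWithValues, long docId) {
        int idx = Arrays.binarySearch(docIdsWithValues, docId);
        return idx >= 0 ? idx : -1; // negative result: docID has no value
    }

    public static void main(String[] args) {
        long[] docIds = {0, 4, 1432};        // docIDs with values, as in the example
        assert ordinal(docIds, 0) == 0;
        assert ordinal(docIds, 4) == 1;
        assert ordinal(docIds, 1432) == 2;
        assert ordinal(docIds, 7) == -1;     // docID 7 carries no value
    }
}
```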
> h2. Outer blocks
> The lookup structure of {{IndexedDISI}} consists of blocks of 2^16 values 
> (65536), where each block can be either {{ALL}}, {{DENSE}} (2^12 to 2^16 
> values) or {{SPARSE}} (< 2^12 values ~= 6%). Consequently, blocks vary considerably 
> in size and ordinal-resolving strategy.
> When a sparse Numeric DocValue is needed, the code first locates the block 
> containing the wanted docID flag. It does so by iterating blocks one-by-one 
> until it reaches the needed one, where each iteration requires a lookup in 
> the underlying {{IndexSlice}}. For a common memory mapped index, this 
> translates to either a cached request or a read operation. If a segment has 
> 6M documents, the worst case is 91 lookups. In our web archive, our segments have 
> ~300M values: a worst case of 4577 lookups!
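The worst-case counts above follow directly from the 2^16 block size: the block holding a docID is simply the docID divided by 65536, and reaching it from block 0 costs one slice lookup per block skipped. A quick arithmetic check (illustrative code, not Lucene's):

```java
public class BlockCount {
    static final int BLOCK_SIZE = 1 << 16; // 65536 docIDs per IndexedDISI block

    // Index of the block containing docId; iterating there from block 0
    // costs roughly one underlying slice lookup per block.
    public static int blockIndex(long docId) {
        return (int) (docId / BLOCK_SIZE);
    }

    public static void main(String[] args) {
        assert blockIndex(6_000_000L - 1) == 91;     // ~6M-doc segment: 91 skips
        assert blockIndex(300_000_000L - 1) == 4577; // ~300M-value segment: 4577 skips
    }
}
```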
> One obvious solution is to use a lookup-table for blocks: A long[]-array with 
> an entry for each block. For 6M documents, that is < 1KB and would allow for 
> direct jumping (a single lookup) in all instances. Unfortunately this 
> lookup-table cannot be generated upfront when the writing of values is purely 
> streaming. It can be appended to the end of the stream before it is closed, 
> but without knowing the position of the lookup-table the reader cannot seek 
> to it.
> One strategy for creating such a lookup-table would be to generate it during 
> reads and cache it for next lookup. This does not fit directly into how 
> {{IndexedDISI}} currently works (it is created anew for each invocation), but 
> could probably be added with a little work. An advantage to this is that this 
> does not change the underlying format and thus could be used with existing 
> indexes.
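The read-time caching strategy above can be sketched as follows. All names here are hypothetical; this only illustrates the idea of recording block offsets during a first sequential pass so later lookups jump directly:

```java
// Hypothetical sketch of a read-time jump table: a first sequential
// traversal records each block's start offset; subsequent lookups for
// cached blocks seek directly instead of iterating from block 0.
public class BlockOffsetCache {
    private final long[] blockOffset; // start offset per block; -1 = not yet seen

    public BlockOffsetCache(int numBlocks) {
        blockOffset = new long[numBlocks];
        java.util.Arrays.fill(blockOffset, -1L);
    }

    // Record a block's offset while iterating sequentially.
    public void record(int block, long offset) {
        blockOffset[block] = offset;
    }

    // Returns the cached offset, or -1 to signal "iterate from block 0".
    public long seekOffset(int block) {
        return blockOffset[block];
    }
}
```

For a 6M-document segment this is 92 blocks, i.e. 92 longs (736 bytes), matching the "< 1KB" figure above.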
> h2. The lookup structure inside each block
> If {{ALL}} of the 2^16 values are defined, the structure is empty and the 
> ordinal is simply the requested docID with some modulo and multiply math. 
> Nothing to improve there.
> If the block is {{DENSE}} (2^12 to 2^16 values are defined), a bitmap is used 
> and the number of set bits up to the wanted index (the docID modulo the block 
> origin) is counted. That bitmap is a long[1024], meaning that 
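The {{DENSE}} rank described above amounts to a popcount over the bitmap words up to the target index. A minimal sketch of that counting (illustrative only; Lucene's actual code differs):

```java
public class DenseRank {
    // Counts set bits in bits[0..index] inclusive, i.e. the ordinal offset
    // of a docID within a DENSE block's long[1024] bitmap.
    public static int rank(long[] bits, int index) {
        int word = index >>> 6; // which 64-bit word holds the index
        int count = 0;
        for (int i = 0; i < word; i++) {
            count += Long.bitCount(bits[i]);      // whole words before the target
        }
        long mask = -1L >>> (63 - (index & 63));  // bits 0..index within the word
        count += Long.bitCount(bits[word] & mask);
        return count;
    }

    public static void main(String[] args) {
        long[] bits = new long[1024];
        bits[0] = 0b10011L;             // docIDs 0, 1 and 4 have values
        assert rank(bits, 4) == 3;      // three values up to and including docID 4
        assert rank(bits, 3) == 2;      // docID 3 itself is unset
    }
}
```

A worst-case lookup near the end of a block must count bits across most of the 1024 words, which is what makes an in-block rank structure attractive.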

[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-12-04 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708764#comment-16708764
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 12/4/18 2:29 PM:
-

[~jpountz] something might have been lost in translation or culture here. For 
me, "preference" is a neutral term, used in this context to signal that I 
understand that you do not see having a search-time cache as the right choice.

We have:
 # A DV implementation (pre-LUCENE-8374 commit) that is O(n) and scales poorly
 # An instantly working band-aid
 # A plan to implement a better index-time solution for future segments

I don't claim that search-time is a superior solution to index-time, when 
compared neutrally. You are absolutely right that it lazy-loads data and that 
an index-time solution is technologically better. But that is a red herring: the 
issue is whether the band-aid makes sense for existing setups.

Now, this is all on master, so no need for panic. But at some point we will 
roll a 7.7 or an 8.0. Hopefully we will have the index-time implementation 
ready then. But on the off chance that we don't, let me ask you: Do you think 
it would be better to have no DV jump-tables at all, rather than have the 
search-time implementation only in a release?




[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-12-04 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708764#comment-16708764
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 12/4/18 2:29 PM:
-

[~jpountz] something might have been lost in translation or culture here. For 
me, "preference" is a neutral term, used in this context to signal that I 
understand that you do not see having a search-time cache as the right choice.

We have
 # A DV-implementation (pre LUCENE-8374 commit) that is O(n), with poor scaling 
up
 # An instantly working band aid
 # A plan to implement a better index-time solution for future segments

I don't claim that search-time is a superior solution to index-time, when 
compared neutrally. You are absolutely right that it does lazy-load of data and 
it is technologically better to do index-time. But it is a red herring. The 
issue is whether the band-aid makes sense for existing setups.

Now, this is all on master, so no need for panic. But at some point we will 
roll a 7.7 or an 8.0. Hopefully we will have the index-time implementation 
ready then. But on the off chance that we don't, let me ask you: Do you think 
it would be better to have no DV jump-tables at all, rather than have the 
search-time implementation only in a release?


was (Author: toke):
[~jpountz] something might have been lost in translation or culture here. For 
me, "preference" is a neutral term, used in this context to signal that I 
understand that you do not see having a search-time cache as the right choice.

We have
 # A DV-implementation (pre LUCENE-8374 commit) that is O(n), with poor scaling 
up
 # An instantly working band aid
 # A plan to implement a better index-time solution for future segments

I don't claim that search-time is a superior solution to index-time, when 
compared neutrally. You are absolutely right that it does lazy-load of data and 
it is technologically better to do index-time. But it is a red herring. The 
issue is whether the band-aid makes sense for existing setups.

Now, this is all on master, so no need for panic. But at some point we will 
roll a 7.7 or an 8.0. Hopefully we will have the index-time implementation 
ready then. But on the off chance that we don't, let me ask you: Do you think 
it would be better to have no DV jump-tables at all, rather than have the 
search-time implementation only in a release?

> Reduce reads for sparse DocValues
> -
>
> Key: LUCENE-8374
> URL: https://issues.apache.org/jira/browse/LUCENE-8374
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 7.5, master (8.0)
>Reporter: Toke Eskildsen
>Priority: Major
>  Labels: performance
> Attachments: LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, 
> LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, 
> LUCENE-8374_branch_7_3.patch, LUCENE-8374_branch_7_3.patch.20181005, 
> LUCENE-8374_branch_7_4.patch, LUCENE-8374_branch_7_5.patch, 
> LUCENE-8374_part_1.patch, LUCENE-8374_part_2.patch, LUCENE-8374_part_3.patch, 
> LUCENE-8374_part_4.patch, entire_index_logs.txt, 
> image-2018-10-24-07-30-06-663.png, image-2018-10-24-07-30-56-962.png, 
> single_vehicle_logs.txt, 
> start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png,
>  
> start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png
>
>
> The {{Lucene70DocValuesProducer}} has the internal classes 
> {{SparseNumericDocValues}} and {{BaseSortedSetDocValues}} (sparse code path), 
> which again uses {{IndexedDISI}} to handle the docID -> value-ordinal lookup. 
> The value-ordinal is the index of the docID assuming an abstract tightly 
> packed monotonically increasing list of docIDs: If the docIDs with 
> corresponding values are {{[0, 4, 1432]}}, their value-ordinals will be {{[0, 
> 1, 2]}}.
> h2. Outer blocks
> The lookup structure of {{IndexedDISI}} consists of blocks of 2^16 values 
> (65536), where each block can be either {{ALL}}, {{DENSE}} (2^12 to 2^16 
> values) or {{SPARSE}} (< 2^12 values ~= 6%). Consequently blocks vary quite a 
> lot in size and ordinal resolving strategy.
> When a sparse Numeric DocValue is needed, the code first locates the block 
> containing the wanted docID flag. It does so by iterating blocks one-by-one 
> until it reaches the needed one, where each iteration requires a lookup in 
> the underlying {{IndexSlice}}. For a common memory mapped index, this 
> translates to either a cached request or a read operation. If a segment has 
> 6M documents, worst-case is 91 lookups. In our web archive, our segments have 
> ~300M values: A worst-case of 4577 lookups!
> One obvious solution is to use a lookup-table for blocks: A 
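
The quoted description above explains the cost of iterating blocks one-by-one 
and hints at a per-block lookup table. A minimal Java sketch of the idea 
(hypothetical names and numbers, not the actual {{IndexedDISI}} API):

```java
// Sketch: why a per-block jump table turns an O(n) block scan into an O(1)
// lookup. Names are illustrative, not Lucene's real implementation.
public class BlockJumpSketch {
    static final int BLOCK_SIZE = 1 << 16; // 65536 docIDs per block

    // Linear scan, as in the pre-patch code: visit blocks one-by-one
    // until the block containing targetDoc is reached.
    static int blocksVisitedLinear(int targetDoc) {
        return targetDoc / BLOCK_SIZE + 1;
    }

    // With a jump table holding one entry per block, the block for any
    // docID is found directly from the high 16 bits of the docID.
    static int blockIndex(int targetDoc) {
        return targetDoc >>> 16;
    }

    public static void main(String[] args) {
        int maxDoc = 300_000_000; // segment size mentioned in the issue
        System.out.println("worst-case blocks visited (linear scan): "
                + blocksVisitedLinear(maxDoc - 1));
        System.out.println("blocks visited (jump table): 1, block index "
                + blockIndex(maxDoc - 1));
    }
}
```

For a 300M-doc segment this reproduces the order of magnitude quoted above: 
thousands of block visits in the worst case for the linear scan, one array 
read with a jump table.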

[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-10-25 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663475#comment-16663475
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 10/25/18 2:30 PM:
--

[~tpunder] Profiling, nice! Must. Resist. Urge. To. Look. At. 
{{MonotonicDocValues}} :)

The differences between the profiles for "Faceting on most of the index" seem 
like jitter to me. Good to have that verified.

"Faceting on data for a single vehicle" has multiple interesting measurements:

 * I am surprised that {{FieldCacheImpl#Cache.get}} takes 68,485ms for cached 
and only 4,792ms for non-cached. The LUCENE-8374 caches are lazy-created, so I 
would expect the start-up overhead for them to be paid under 
{{SparseNumericDocValues.advanceExact}} or similar, but not as part of 
requesting the cache. I wonder why there is such a difference?
 * {{SparseNumericDocValues.advanceExact}} is reduced by a factor of 6 with 
caching. This factor should scale with the number of documents in each segment: 
If you try the 300M parent/child structure and the number of segments stays 
about the same, I would expect the factor to increase to 15-20.
 * {{IndexedDISI.advanceBlock}} + {{IndexedDISI.rankSkip}} for cached is 
46,540ms while {{IndexedDISI.advanceBlock}} alone is 79,185ms for non-cached, 
telling me that the BLOCK-cache is a noticeable win in itself. Glad to see 
that, as my artificial test showed a smaller win.
 * I wonder why {{IndexedDISI.advanceExactWithinBlock}} is not visible in any 
of the profiles. Could it be that practically all your documents have values 
defined for the fields used for faceting?

In the current patch, statistics for LUCENE-8374 are written to stdout, which 
Solr collects in the logfile {{solr--console.log}}. If possible, I would 
like to see the content of that file as it contains a break-down of the 
DocValues-fields, thanks.


was (Author: toke):
[~tpunder] Profiling, nice! Must. Resist. Urge. To. Look. At. 
{{MonotonicDocValues}} :)

The differences for the profiles for "Faceting on most of the index" seems like 
jitter to me. Good to verify that.

"Faceting on data for a single vehicle" has multiple interesting measurements:

 * I am surprised that {FieldCacheImpl#Cache.get} takes 68,485ms for cached and 
only 4,792ms for non-cached. The LUCENE-8374 caches are lazy-created so I would 
expect the start-up overhead for them to be paid under 
{SparseNumericDocValues.advanceExact} or similar, but not as part of requesting 
the cache. I wonder why there is such a difference?
 * {SparseNumericDocValues.advanceExact} is reduced by a factor 6 with caching. 
This factor should scale to the number of documents in each segment: If you try 
the 300M parent/child structure and the number of segments stays about the 
same, I would expect the factor to increase to 15-20.
 * {IndexedDISI.advanceBlock} + {IndexedDISI.rankSkip} for cached is 46,540ms 
and {IndexedDISI.advanceBlock} is 79,185ms for non-cached, telling me that the 
BLOCK-cache is a noticeable win in itself. Glad to see that as my artificial 
test showed a lesser win.
 * I wonder why {{IndexedDISI.advanceExactWithinBlock} is not visible in any of 
the profiles. Could it be that practically all your documents has values 
defined for the fields used for faceting?

In the current patch, statistics for LUCENE-8374 are written to stdout, which 
Solr collects in the logfile {solr--console.log}. If possible, I would like 
to see the content of that file as it contains a break-down of the 
DocValues-fields, thanks.

> Reduce reads for sparse DocValues
> -
>
> Key: LUCENE-8374
> URL: https://issues.apache.org/jira/browse/LUCENE-8374
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 7.5, master (8.0)
>Reporter: Toke Eskildsen
>Priority: Major
>  Labels: performance
> Fix For: 7.6
>
> Attachments: LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, 
> LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374_branch_7_3.patch, 
> LUCENE-8374_branch_7_3.patch.20181005, LUCENE-8374_branch_7_4.patch, 
> LUCENE-8374_branch_7_5.patch, image-2018-10-24-07-30-06-663.png, 
> image-2018-10-24-07-30-56-962.png, 
> start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png,
>  
> start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png
>
>
> The {{Lucene70DocValuesProducer}} has the internal classes 
> {{SparseNumericDocValues}} and {{BaseSortedSetDocValues}} (sparse code path), 
> which again uses {{IndexedDISI}} to handle the docID -> value-ordinal lookup. 
> The value-ordinal is the index of the docID assuming an abstract tightly 
> packed monotonically increasing list of docIDs: If the docIDs with 
> 

[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-10-24 Thread Tim Underwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662370#comment-16662370
 ] 

Tim Underwood edited comment on LUCENE-8374 at 10/24/18 2:53 PM:
-

For what it's worth, my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with the caches 
enabled/disabled. I've excluded "[Wall Time] java.lang.Thread.sleep(long)", 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one): 
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).

Note: Image thumbnails do not seem to be working right, so I've included each 
file name so you can find the attachment that goes with the resized image.

h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png|width=600px!

{noformat}image-2018-10-24-07-30-06-663.png{noformat}

h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png|width=600px!

{noformat}image-2018-10-24-07-30-56-962.png{noformat}
 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  
{noformat}start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png{noformat}

h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  
{noformat}start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png{noformat}
 

 


was (Author: tpunder):
For what it's worth my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with cached 
enabled/disable.  I've excluded "[Wall Time] java.lang.Thread.sleep(long)" 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one):  
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).
h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png|width=600px!

 
h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png|width=600px!

 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  
h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  

 

 


[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-10-24 Thread Tim Underwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662370#comment-16662370
 ] 

Tim Underwood edited comment on LUCENE-8374 at 10/24/18 2:48 PM:
-

For what it's worth my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with cached 
enabled/disable.  I've excluded "[Wall Time] java.lang.Thread.sleep(long)" 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one):  
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).
h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png|width=600px!

 
h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png|width=600px!

 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  
h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|width=600px!
  

 

 


was (Author: tpunder):
For what it's worth my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with cached 
enabled/disable.  I've excluded "[Wall Time] java.lang.Thread.sleep(long)" 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one):  
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).
h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png|thumbnail!

 
h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png|thumbnail!

 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|thumbnail!
  
h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|thumbnail!
  

 

 


[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-10-24 Thread Tim Underwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662370#comment-16662370
 ] 

Tim Underwood edited comment on LUCENE-8374 at 10/24/18 2:47 PM:
-

For what it's worth my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with cached 
enabled/disable.  I've excluded "[Wall Time] java.lang.Thread.sleep(long)" 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one):  
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).
h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png|thumbnail!

 
h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png|thumbnail!

 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|thumbnail!
  
h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png|thumbnail!
  

 

 


was (Author: tpunder):
For what it's worth my original testing was with unpatched vs patched.

I did some quick YourKit CPU sampling on my 2 tests with cached 
enabled/disable.  I've excluded "[Wall Time] java.lang.Thread.sleep(long)" 
which is why the "CPU (What-If)" tab is active.

I should also note that all of my testing (both caches enabled and disabled) 
has been done with 3 other patches applied (in addition to this one):  
SOLR-12875 (ArrayIndexOutOfBoundsException in JSON Facets), SOLR-12878 (large 
performance issue in FacetFieldProcessorByHashDV) and SOLR-12882 (lambda memory 
allocation removal in FacetFieldProcessorByHashDV.collectValFirstPhase).
h2. Test 1 - Faceting on data for a single vehicle
h3. Caches Enabled:

 

!image-2018-10-24-07-30-06-663.png!

 
h3. Caches Disabled:

 

!image-2018-10-24-07-30-56-962.png!

 
h2. Test 2 - Faceting on most of the index

 
h3. Caches Enabled:

!start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png!
  
h3. Caches Disabled:

!start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png!
  

 

 


[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-08-27 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593536#comment-16593536
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 8/27/18 11:47 AM:
--

My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

{{Index: 890GB / 51 fields / 307M docs}}
{{ Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, 
EMPTY=5118) / 1866KB}}
{{ DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)}}
{{ vBPV(long): 7 / 654KB}}
{{ Total: 30653KB, 13 seconds start up}}
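
The DENSE numbers above can be cross-checked with simple arithmetic: a DENSE 
block covers 2^16 docIDs as a bitmap, i.e. 8KB of raw data per block. A small 
sketch (the 3.4% overhead figure is derived, not from the stats output):

```java
// Cross-check of the cache-size stats quoted above. A DENSE block covers
// 2^16 docIDs as a bitmap: 65536 bits = 8192 bytes = 8KB per block.
public class DenseStatsCheck {
    public static void main(String[] args) {
        int denseBlocks = 102_421;                 // from the stats output
        int rawKBPerBlock = (1 << 16) / 8 / 1024;  // 65536 bits -> 8KB
        int rawKB = denseBlocks * rawKBPerBlock;   // matches "819368KB"
        System.out.println("raw DENSE data: " + rawKB + "KB");
        // The 28118KB rank cache is a small fraction of the data it indexes.
        System.out.printf("rank cache overhead: %.1f%%%n",
                100.0 * 28_118 / rawKB);
    }
}
```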

DENSE is by far the biggest consumer of cache-space in our setup. Interestingly 
enough, the vBPV-caching was the one that gave us by far the biggest benefit, 
for the few long fields that we have.

I looked at skip lists, both in 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|https://en.wikipedia.org/wiki/Skip_list]. As I understand 
it, they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying size and mutability do not seem to be used/relevant for Lucene.

What I don't really understand is the benefit of a skip list's multi-level 
approach in this case. How would a skip list be better than the current 
direct lookup in the array of longs representing offset+bitcount? If the point 
is to save further memory, the block-offsets could be stated for every 4th 
block or so, just as the skip lists do. But the current overhead of 2MB for a 
rather large segment does not seem problematic to me, and it means that no 
superfluous blocks need to be visited.
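
The direct lookup described above can be sketched in a few lines of Java. The 
packing layout and field names here are hypothetical, not Lucene's actual 
on-disk format; the point is only that resolving a block is a single array 
read:

```java
// Sketch of a direct-lookup table: one long per 65536-doc block, packing the
// block's data offset (upper 33 bits) and the cumulative value count, i.e.
// the rank (lower 31 bits). Layout is illustrative, not Lucene's format.
public class BlockOffsetRank {
    private final long[] packed; // one entry per block

    BlockOffsetRank(long[] offsets, long[] ranks) {
        packed = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            packed[i] = (offsets[i] << 31) | ranks[i];
        }
    }

    // The block for a docID is its high 16 bits: no block iteration needed.
    long offset(int docID) { return packed[docID >>> 16] >>> 31; }
    long rank(int docID)   { return packed[docID >>> 16] & 0x7FFFFFFFL; }

    public static void main(String[] args) {
        // Two blocks: block 0 starts at byte 0 with 0 preceding values,
        // block 1 starts at byte 8192 with 1000 preceding values.
        BlockOffsetRank table = new BlockOffsetRank(
                new long[] {0, 8192}, new long[] {0, 1000});
        System.out.println(table.offset(70_000)); // docID 70000 is in block 1
        System.out.println(table.rank(70_000));
    }
}
```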

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linearly with segment size, it seems 
plausible that the people who will benefit the most from the patch are also 
people for whom a full re-index is not trivial. From my point of view, 
search-time caching should be available for existing segments, independently of 
codec-changes to future segments. The current patch needs polishing, but is 
function-wise ready and does exactly this.

As the search-time cache is non-destructive, rolling this out as step 1 would 
be a conservative update with easy rollback. Step 2 is of course to change the 
codec to embed the caching structures, if they prove their worth.


was (Author: toke):
My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

{{Index: 890GB / 51 fields / 307M docs}}
Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, 
EMPTY=5118) / 1866KB
DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)
 BPV(long): 7 / 654KB
 Total: 30653KB, 13 seconds start up

DENSE is by far the biggest consumer of cache-space in our setup. Interestingly 
enough, the vBPV-caching was the ones that gave us by far the biggest benefit, 
for the few long fields that we have.

I looked at skip lists, both in the 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|[https://en.wikipedia.org/wiki/Skip_list].] As I understand 
it they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying-size and mutability does not seem to be used/relevant for Lucene.

What I don't really understand is the benefit of skip list's multi-level 
approach in this case. How would a skip list be better than the current 
direct-lookup in the array of longs representing offset+bitcount? If the point 
is to save further memory, the block-offsets could be stated for every 4th 
block or so, just as the skip lists does. But the current overhead of 2MB for a 
rather large segment does not seem problematic to me and it does mean that 0 
superfluous blocks needs to be seeked.

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linear with segment size, it seems 
plausible that the people that will benefit the most from the patch are also 
people where a full re-index is not trivial. From my point of view, search-time 
caching should be present for present segments, independently of codec-changes 
to future segments. The current patch needs polishing, but is functionwise 
ready and does exactly this.

As the search-time cache is non-destructive, rolling this as step 1 would be a 
conservative update with easy rollback. Step 2 is of course to change the codec 
to embed the caching structures, if they prove their worth.


[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-08-27 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593536#comment-16593536
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 8/27/18 11:46 AM:
--

My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

{{Index: 890GB / 51 fields / 307M docs}}
Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, 
EMPTY=5118) / 1866KB
DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)
 BPV(long): 7 / 654KB
 Total: 30653KB, 13 seconds start up

DENSE is by far the biggest consumer of cache-space in our setup. Interestingly 
enough, the vBPV-caching was the ones that gave us by far the biggest benefit, 
for the few long fields that we have.

I looked at skip lists, both in the 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|[https://en.wikipedia.org/wiki/Skip_list].] As I understand 
it they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying-size and mutability does not seem to be used/relevant for Lucene.

What I don't really understand is the benefit of skip list's multi-level 
approach in this case. How would a skip list be better than the current 
direct-lookup in the array of longs representing offset+bitcount? If the point 
is to save further memory, the block-offsets could be stated for every 4th 
block or so, just as the skip lists does. But the current overhead of 2MB for a 
rather large segment does not seem problematic to me and it does mean that 0 
superfluous blocks needs to be seeked.

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linear with segment size, it seems 
plausible that the people that will benefit the most from the patch are also 
people where a full re-index is not trivial. From my point of view, search-time 
caching should be present for present segments, independently of codec-changes 
to future segments. The current patch needs polishing, but is functionwise 
ready and does exactly this.

As the search-time cache is non-destructive, rolling this as step 1 would be a 
conservative update with easy rollback. Step 2 is of course to change the codec 
to embed the caching structures, if they prove their worth.


was (Author: toke):
My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

{{Index: 890GB / 51 fields / 307M docs}}
{{ Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, 
EMPTY=5118) / 1866KB}}
{{ DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)}}
{{ vBPV(long): 7 / 654KB}}
{{ Total: 30653KB, 13 seconds start up}}

DENSE is by far the biggest consumer of cache-space in our setup. Interestingly 
enough, the vBPV-caching was the ones that gave us by far the biggest benefit, 
for the few long fields that we have.

I looked at skip lists, both in the 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|[https://en.wikipedia.org/wiki/Skip_list].] As I understand 
it they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying-size and mutability does not seem to be used/relevant for Lucene.

What I don't really understand is the benefit of skip list's multi-level 
approach in this case. How would a skip list be better than the current 
direct-lookup in the array of longs representing offset+bitcount? If the point 
is to save further memory, the block-offsets could be stated for every 4th 
block or so, just as the skip lists does. But the current overhead of 2MB for a 
rather large segment does not seem problematic to me and it does mean that 0 
superfluous blocks needs to be seeked.

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linear with segment size, it seems 
plausible that the people that will benefit the most from the patch are also 
people where a full re-index is not trivial. From my point of view, search-time 
caching should be present for present segments, independently of codec-changes 
to future segments. The current patch needs polishing, but is functionwise 
ready and does exactly this.

As the search-time cache is non-destructive, rolling this as step 1 would be a 
conservative update with easy rollback. Step 2 is of course to change the codec 
to embed the caching structures, if they prove their worth.


[jira] [Comment Edited] (LUCENE-8374) Reduce reads for sparse DocValues

2018-08-27 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593536#comment-16593536
 ] 

Toke Eskildsen edited comment on LUCENE-8374 at 8/27/18 11:45 AM:
--

My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

{{Index: 890GB / 51 fields / 307M docs}}
{{Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, EMPTY=5118) / 1866KB}}
{{DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)}}
{{vBPV(long): 7 / 654KB}}
{{Total: 30653KB, 13 seconds start up}}
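
A quick back-of-the-envelope check of the figures above (my own arithmetic, 
assuming the 8KB-per-DENSE-block size stated in the stats output):

```java
// Sanity check of the stats above: how big is the DENSE rank cache relative
// to the raw DENSE bitmaps it indexes? All input numbers come from the
// reported stats; 8KB is the assumed bitmap size of one DENSE block.
public class CacheOverhead {
  public static void main(String[] args) {
    long denseBlocks = 102_421;
    long rawDenseKB = denseBlocks * 8;              // 819368KB of raw DENSE bitmaps
    long rankCacheKB = 28_118;
    double pct = 100.0 * rankCacheKB / rawDenseKB;  // cache size vs raw data
    long bytesPerBlock = rankCacheKB * 1024 / denseBlocks;
    System.out.printf("rank cache: %.1f%% overhead, ~%d bytes per 8KB block%n",
        pct, bytesPerBlock);
  }
}
```

So the rank structures cost roughly 3.4% on top of the raw DENSE data, a few 
hundred bytes per block.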

DENSE is by far the biggest consumer of cache space in our setup. 
Interestingly enough, the vBPV caching was the one that gave us by far the 
biggest benefit, for the few long fields that we have.

I looked at skip lists, both in the 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|https://en.wikipedia.org/wiki/Skip_list]. As I understand 
it, they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying sizes and the mutability do not seem to be used or relevant for Lucene.

What I don't really understand is the benefit of the skip list's multi-level 
approach in this case. How would a skip list be better than the current direct 
lookup in the array of longs representing offset+bitcount? If the point is to 
save further memory, the block offsets could be stored for every 4th block or 
so, just as a skip list does. But the current overhead of 2MB for a rather 
large segment does not seem problematic to me, and it does mean that no 
superfluous blocks need to be read.
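
To make the direct-lookup alternative concrete, here is a minimal sketch of a 
rank cache over a single DENSE block's bitmap. This is illustrative only: the 
block layout (65536 docs as 1024 longs), the rank interval, and all class and 
field names are my assumptions, not the patch's actual code. The point is that 
a lookup is one array read plus a handful of popcounts, with no multi-level 
traversal:

```java
/**
 * Illustrative sketch (assumed names/sizes): a DENSE block covers 65536 docs
 * as a bitmap of 1024 longs (8KB). The rank cache stores the running set-bit
 * count at every RANK_INTERVAL-th long, so rank(doc) is one direct lookup
 * plus at most RANK_INTERVAL popcounts.
 */
public class DenseRankCache {
  static final int RANK_INTERVAL = 8; // one cache entry per 512 docs (assumed)

  private final long[] bits; // the block's bitmap, 1024 longs
  private final int[] rank;  // set bits before each RANK_INTERVAL boundary

  public DenseRankCache(long[] bits) {
    this.bits = bits;
    this.rank = new int[bits.length / RANK_INTERVAL];
    int running = 0;
    for (int i = 0; i < bits.length; i++) {
      if (i % RANK_INTERVAL == 0) {
        rank[i / RANK_INTERVAL] = running;
      }
      running += Long.bitCount(bits[i]);
    }
  }

  /** Set bits in [0, docInBlock): the doc's index into the block's values. */
  public int rank(int docInBlock) {
    int word = docInBlock >>> 6;
    int entry = word / RANK_INTERVAL;
    int r = rank[entry]; // direct lookup, no level descent
    for (int w = entry * RANK_INTERVAL; w < word; w++) {
      r += Long.bitCount(bits[w]); // at most RANK_INTERVAL - 1 of these
    }
    // plus the set bits in the partial word below docInBlock
    return r + Long.bitCount(bits[word] & ((1L << (docInBlock & 63)) - 1));
  }
}
```

With these assumed sizes the cache costs 512 bytes per 8KB block; a larger 
interval shrinks it further at the price of more popcounts per lookup, the 
same trade-off as sampling the block offsets every 4th block.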

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linearly with segment size, it seems 
plausible that the people who will benefit the most from the patch are also the 
ones for whom a full re-index is not trivial. From my point of view, 
search-time caching should be available for existing segments, independently of 
codec changes to future segments. The current patch needs polishing, but is 
function-wise ready and does exactly this.

As the search-time cache is non-destructive, rolling it out as step 1 would be 
a conservative update with easy rollback. Step 2 is of course to change the 
codec to embed the caching structures, if they prove their worth.


was (Author: toke):
My previous stats script was slightly off. I isolated the block / 
DENSE-structures and got

```
Index: 890GB / 51 fields / 307M docs
Blocks: Total 238971 (blocks: ALL=102693, SPARSE=28739, DENSE=102421, EMPTY=5118) / 1866KB
DENSE-Rank: 28118KB (raw DENSE data: 102421*8KB = 819368KB)
vBPV(long): 7 / 654KB
Total: 30653KB, 13 seconds start up
```

DENSE is by far the biggest consumer of cache space in our setup. 
Interestingly enough, the vBPV caching was the one that gave us by far the 
biggest benefit, for the few long fields that we have.

I looked at skip lists, both in the 
[MultiLevelSkipListWriter|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java]
 and [on Wikipedia|https://en.wikipedia.org/wiki/Skip_list]. As I understand 
it, they are essentially a {{rank+select}} implementation that allows for 
varying-size skips and works well with a linked list that can be modified. The 
varying sizes and the mutability do not seem to be used or relevant for Lucene.

What I don't really understand is the benefit of the skip list's multi-level 
approach in this case. How would a skip list be better than the current direct 
lookup in the array of longs representing offset+bitcount? If the point is to 
save further memory, the block offsets could be stored for every 4th block or 
so, just as a skip list does. But the current overhead of 2MB for a rather 
large segment does not seem problematic to me, and it does mean that no 
superfluous blocks need to be read.

New point: I would very much like to make this issue two-step.

As the performance regression gets worse linearly with segment size, it seems 
plausible that the people who will benefit the most from the patch are also the 
ones for whom a full re-index is not trivial. From my point of view, 
search-time caching should be available for existing segments, independently of 
codec changes to future segments. The current patch needs polishing, but is 
function-wise ready and does exactly this.

As the search-time cache is non-destructive, rolling it out as step 1 would be 
a conservative update with easy rollback. Step 2 is of course to change the 
codec to embed the caching structures, if they prove their worth.
