subject:"\[jira\] \[Comment Edited\] \(LUCENE\-10655\) can we optimize visited bitset usage in HNSW graph search\/indexing\?"

[jira] [Comment Edited] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

2022-07-15 Thread Michael Sokolov (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567358#comment-17567358
 ] 

Michael Sokolov edited comment on LUCENE-10655 at 7/15/22 6:41 PM:
---

OK I was confused, and in fact we already do use SparseFixedBitSet for every 
layer, and we re-use the same one for the lifetime of a HnswGraphBuilder, which 
processes all the vectors. And I tried allocating afresh rather than clear-ing, 
and it was a bit slower. FixedBitSet.clear() is a hot-spot but it's not really 
clear what's to be done about it.

Perhaps since we are re-using we could try using a fully-allocated FixedBitSet 
(not sparse) when indexing? My concern is that over the lifetime of indexing 
many vectors, the sparse bit set will eventually become dense, but 
inefficiently. Oh I see - in fact that *is* what we do. Okay, returning to this 
again, I think I will try using that one for the fully-populated level only


was (Author: sokolov):
OK I was confused, and in fact we already do use SparseFixedBitSet for every 
layer, and we re-use the same one for the lifetime of a HnswGraphBuilder, which 
processes all the vectors. And I tried allocating afresh rather than clear-ing, 
and it was a bit slower. FixedBitSet.clear() is a hot-spot but it's not really 
clear what's to be done about it.

Perhaps since we are re-using we could try using a fully-allocated FixedBitSet 
(not sparse) when indexing? My concern is that over the lifetime of indexing 
many vectors, the sparse bit set will eventually become dense, but 
inefficiently.

> can we optimize visited bitset usage in HNSW graph search/indexing?
> ---
>
> Key: LUCENE-10655
> URL: https://issues.apache.org/jira/browse/LUCENE-10655
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/hnsw
>Reporter: Michael Sokolov
>Priority: Major
>
> When running {{luceneutil}}  I noticed that {{FixedBitSet.clear()}} dominates 
> the CPU profiler output. I had a few ideas:
>  # In upper graph layers, the occupied nodes are very sparse - maybe 
> {{SparseFixedBitSet}} would be a better fit for those
>  # We are caching these bitsets, but they are only used for a single search 
> (single document insert, during indexing). Should we cache across searches? 
> We would need to pool them though, and they would vary by field since fields 
> can have different numbers of vector nodes. This starts to get complex
>  # Are we sure that clearing a bitset is more efficient than allocating a new 
> one? Maybe the JDK maintains a pool of already-zeroed memory for us
> I think we could try specializing the bitset type by graph level, and then I 
> think we ought to measure the performance of allocation vs the limited reuse 
> that we currently have.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

2022-07-15 Thread Michael Sokolov (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567358#comment-17567358
 ] 

Michael Sokolov edited comment on LUCENE-10655 at 7/15/22 6:39 PM:
---

OK I was confused, and in fact we already do use SparseFixedBitSet for every 
layer, and we re-use the same one for the lifetime of a HnswGraphBuilder, which 
processes all the vectors. And I tried allocating afresh rather than clear-ing, 
and it was a bit slower. FixedBitSet.clear() is a hot-spot but it's not really 
clear what's to be done about it.

Perhaps since we are re-using we could try using a fully-allocated FixedBitSet 
(not sparse) when indexing? My concern is that over the lifetime of indexing 
many vectors, the sparse bit set will eventually become dense, but 
inefficiently.


was (Author: sokolov):
OK I was confused, and in fact we already do use SparseFixedBitSet for every 
layer. And I tried allocating afresh rather than clear-ing, and it was a bit 
slower. FixedBitSet.clear() is a hot-spot but it's not really clear what's to 
be done about it.

> can we optimize visited bitset usage in HNSW graph search/indexing?
> ---
>
> Key: LUCENE-10655
> URL: https://issues.apache.org/jira/browse/LUCENE-10655
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/hnsw
>Reporter: Michael Sokolov
>Priority: Major
>
> When running {{luceneutil}}  I noticed that {{FixedBitSet.clear()}} dominates 
> the CPU profiler output. I had a few ideas:
>  # In upper graph layers, the occupied nodes are very sparse - maybe 
> {{SparseFixedBitSet}} would be a better fit for those
>  # We are caching these bitsets, but they are only used for a single search 
> (single document insert, during indexing). Should we cache across searches? 
> We would need to pool them though, and they would vary by field since fields 
> can have different numbers of vector nodes. This starts to get complex
>  # Are we sure that clearing a bitset is more efficient than allocating a new 
> one? Maybe the JDK maintains a pool of already-zeroed memory for us
> I think we could try specializing the bitset type by graph level, and then I 
> think we ought to measure the performance of allocation vs the limited reuse 
> that we currently have.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

[jira] [Comment Edited] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

2 matches

Site Navigation

Mail list logo

Footer information