Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
FYI I had filed https://issues.apache.org/jira/browse/INFRA-23503

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Jul 26, 2022 at 3:54 PM Michael Sokolov wrote:

> searching JIRA for "slkjfdf" I found a few issues in other projects,
> but none seems to be getting the same degree of spam love
>
> On Tue, Jul 26, 2022 at 3:50 PM Mike Sokolov (Jira) wrote:
> >
> > [ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571588#comment-17571588 ]
> >
> > Mike Sokolov commented on LUCENE-10054:
> > ---
> >
> > what is it with this issue that spammers love so much!? I wonder if we
> > could somehow lock it as read-only ...
> >
> > > Handle hierarchy in HNSW graph
> > > --
> > >
> > > Key: LUCENE-10054
> > > URL: https://issues.apache.org/jira/browse/LUCENE-10054
> > > Project: Lucene - Core
> > > Issue Type: Task
> > > Reporter: Mayya Sharipova
> > > Priority: Major
> > > Labels: vector-based-search
> > > Fix For: 9.1
> > >
> > > Time Spent: 20h 20m
> > > Remaining Estimate: 0h
> > >
> > > Currently HNSW graph is represented as a single layer graph.
> > > We would like to extend it to handle hierarchy as per
> > > [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
> > >
> > > TODO tasks:
> > > - add multiple layers in the HnswGraph class
> > > - modify the format in Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to handle multiple layers
> > > - modify graph construction and search algorithm to handle hierarchy
> > > - run benchmarks
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.20.10#820010)
> >
> > -
> > To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: issues-h...@lucene.apache.org
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571588#comment-17571588 ]

Mike Sokolov commented on LUCENE-10054:
---

what is it with this issue that spammers love so much!? I wonder if we could somehow lock it as read-only ...
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496910#comment-17496910 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit 458fb1abed45e2b7605b3e89d20ec0709ca755fd in lucene's branch refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=458fb1a ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the old format.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496888#comment-17496888 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit 4364bdd63ef58b2094dc252018d3c027302af4f4 in lucene's branch refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4364bdd ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the old format.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487278#comment-17487278 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit ff2189c477dd39acfc9233a05a6f529d2fde2bf0 in lucene's branch refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ff2189c ]

Add changes item for LUCENE-10054
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485084#comment-17485084 ]

Adrien Grand commented on LUCENE-10054:
---

It looks like this change made both indexing (+5%) and searching (+7%) vectors slightly faster.
http://people.apache.org/~mikemccand/lucenebench/VectorSearch.html
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483729#comment-17483729 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit 68beb1acb499b5759c9471dcc815e46071558371 in lucene's branch refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=68beb1a ]

LUCENE-10054 Make HnswGraph hierarchical (#608) (#629)

Currently HNSW has only a single layer. This patch makes HNSW graph multi-layered.
This PR is based on the following PRs: #250, #267, #287, #315, #536, #416

Main changes:
- Multi layers are introduced into HnswGraph and HnswGraphBuilder
- A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader and Lucene91HnswVectorsWriter are introduced to encode graph layers' information
- Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to backward_codecs to support reading and searching of graphs built in pre 9.1 version. Lucene90HnswVectorsWriter is deleted.
- For backwards compatible tests, previous Lucene90 graph reading and writing logic was copied into test files of Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter, Lucene90HnswGraphBuilder and Lucene90HnswRWGraph.

TODO: tests for KNN search for graphs built in pre 9.1 version; tests for merge of indices of pre 9.1 + current versions.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482182#comment-17482182 ]

Tomoko Uchida commented on LUCENE-10054:
---

Hi, just a quick note: could https://issues.apache.org/jira/browse/LUCENE-10389 be related to this?
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482049#comment-17482049 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit b0d6fe68d1f2b8a93c9fc22a6ccdedff94bf1fbb in lucene's branch refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b0d6fe6 ]

LUCENE-10054 Make HnswGraph hierarchical (#608)

Currently HNSW has only a single layer. This patch makes HNSW graph multi-layered.
This PR is based on the following PRs: #250, #267, #287, #315, #536, #416

Main changes:
- Multi layers are introduced into HnswGraph and HnswGraphBuilder
- A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader and Lucene91HnswVectorsWriter are introduced to encode graph layers' information
- Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to backward_codecs to support reading and searching of graphs built in pre 9.1 version. Lucene90HnswVectorsWriter is deleted.
- For backwards compatible tests, previous Lucene90 graph reading and writing logic was copied into test files of Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter, Lucene90HnswGraphBuilder and Lucene90HnswRWGraph.

TODO: tests for KNN search for graphs built in pre 9.1 version; tests for merge of indices of pre 9.1 + current versions.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430199#comment-17430199 ]

Mayya Sharipova commented on LUCENE-10054:
--

[~mikemccand] Thanks for checking. This issue is not resolved yet. We have implemented this as a [feature branch|https://github.com/apache/lucene/tree/hnsw], but it has not been merged to the `main` branch yet. Storing hierarchy on disk is already implemented in that feature branch as well. We had an idea to move some data off-heap, run more benchmarking tests, and then open a PR against the `main` branch.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429958#comment-17429958 ]

Michael McCandless commented on LUCENE-10054:
-

[~mayya] this looks like an awesome improvement! Can this maybe be resolved now? Or are we leaving it open to support the hierarchy also on-disk? Maybe open a separate spinoff issue for that?
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410532#comment-17410532 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit a2c8f2805e18895bdfcca568126353c86b95e50b in lucene's branch refs/heads/hnsw from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a2c8f28 ]

LUCENE-10054 Handle hierarchy in graph construction and search (#267)

This patch handles hierarchy in graph construction and search, but only in memory.

Changes:
- HnswGraphBuilder has an extra parameter, ml: the normalization factor for level generation. When ml = 0, the graph will have only a single layer: SNW. When ml > 0, the graph will have multiple layers.
- The recommended ml value from the 2018 HNSW paper is ml = 1 / Math.log(1.0 * maxConn), which was used for tests in the TestHnswGraph class.
- When ml = 0, we use the previous code in the method buildSNW.
- When ml > 0, the method buildHNSW is used, which, following the paper, generates a random level for every new node and then places the new node in those levels with its connections.
- HnswGraph's search method has also been modified to handle two cases:
  - When ml = 0, we use the previous code for a flat graph: generate boundedNumSeed random entry points, and use them to search a flat graph.
  - When ml > 0, we use the hierarchical graph: search from the max level down to level 1 using topK=1, and on level 0 using topK=boundedNumSeed.

Work left for future: handle hierarchy on disk
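The level generation described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from HnswGraphBuilder: the class and method names are hypothetical, and the draw floor(-ln(u) * ml) is the formula from the 2018 HNSW paper, which the commit message references but does not spell out, so the actual buildHNSW code may differ.

```java
import java.util.Random;

/** Hypothetical sketch of HNSW level assignment; not Lucene's HnswGraphBuilder. */
final class HnswLevelSketch {

  /** ml = 1 / ln(maxConn), the value the commit message cites from the 2018 HNSW paper. */
  static double recommendedMl(int maxConn) {
    return 1 / Math.log(1.0 * maxConn);
  }

  /**
   * Draws a level as floor(-ln(u) * ml) with u uniform in (0, 1].
   * With ml = 0 every node lands on level 0, i.e. a single-layer graph.
   */
  static int randomLevel(double ml, Random random) {
    double u = 1.0 - random.nextDouble(); // nextDouble() is in [0, 1), so u is in (0, 1]
    return (int) Math.floor(-Math.log(u) * ml);
  }

  public static void main(String[] args) {
    int maxConn = 16; // assumed value, for illustration only
    double ml = recommendedMl(maxConn);
    Random random = new Random(42);
    int[] histogram = new int[8];
    for (int i = 0; i < 100_000; i++) {
      int level = Math.min(randomLevel(ml, random), histogram.length - 1);
      histogram[level]++;
    }
    for (int level = 0; level < histogram.length; level++) {
      System.out.println("level " + level + ": " + histogram[level] + " nodes");
    }
  }
}
```

With ml = 1 / ln(maxConn), the probability that a node is placed above level 0 is exp(-1/ml) = 1/maxConn, so each layer is expected to hold roughly 1/maxConn of the nodes of the layer below it.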
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404019#comment-17404019 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit fb5ba4d86b930a8cbe7c37003fdb687845c237ff in lucene's branch refs/heads/hnsw from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb5ba4d ]

LUCENE-10054 Make HnswGraph hierarchical (#250)

Currently HNSW has only a single layer. This is the first part to make it multi-layered. To keep changes small, this PR only adds multiple layers in the HnswGraph class.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404007#comment-17404007 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit fc67d6aa6e2bf2ec8ff4b2b8e4a763f3f706de29 in lucene's branch refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fc67d6a ]

Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)"

This reverts commit 257d256defc47c446493ea99b841f58c543673c0.

We've decided to have a separate feature branch for HNSW, and put all related changes there.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403721#comment-17403721 ]

Mayya Sharipova commented on LUCENE-10054:
--

Julie, I think having a feature branch is a great idea. If everyone is ok, I can revert my last commit to main and put it instead to a feature branch.
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403443#comment-17403443 ]

Julie Tibshirani commented on LUCENE-10054:
---

[~mayya] just a suggestion, what would you think of developing this in a feature branch instead of main? We could still review the individual PRs against the branch to provide feedback. It might be nice to keep the changes separate until we have a fuller picture with benchmarks, so we could look at the trade-off in performance vs. complexity?
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403363#comment-17403363 ]

ASF subversion and git services commented on LUCENE-10054:
--

Commit 257d256defc47c446493ea99b841f58c543673c0 in lucene's branch refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=257d256 ]

LUCENE-10054 Make HnswGraph hierarchical (#250)

Currently HNSW has only a single layer. This is the first part to make it multi-layered. To keep changes small, this PR only adds multiple layers in the HnswGraph class.

TODO for following PRs:
- modify graph construction and search algorithm for a hierarchical graph.
- modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to write and read multiple layers
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402399#comment-17402399 ]

Mayya Sharipova commented on LUCENE-10054:
--

[~sokolov] Thank you for your feedback. I've modified the diagram to expand all abbreviations.

{quote}I wonder if we'll need {{ep}} as a special case – it's really just the single node in the max level isn't it?{quote}

Great comment, we don't need {{ep}}; I've removed it from the diagram. Although the max level may contain several nodes, it is the first node in the max level that serves as the graph's entry point.

{quote}Could you explain what is in each of the {{NodesLevelX}}? I guess it's a list of the ordinals in level 0 that are contained in the other level – but I wonder if we will need this{quote}

I've expanded the explanation, but I think we need this info, as I can't see how else we can find the closest nodes to a query on each level; these ordinals allow us to get the vector values for distance calculations.

{quote}Should we implement the in-memory version before defining the serialization?{quote}

Agree. Great suggestion.
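As an illustration of the points in the reply above (per-level node ordinals, and the first node of the max level acting as the entry point), here is a small in-memory sketch. All names are hypothetical and this is not the HnswGraph API; it assumes a node placed on level L is also present on every level below L, and it shows only the topK=1 descent from the max level down to level 1, not the final wider search on level 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory layered graph; not Lucene's HnswGraph. */
final class LayeredGraphSketch {
  // nodesByLevel.get(level): ordinals present on that level, in insertion order
  private final List<List<Integer>> nodesByLevel = new ArrayList<>();
  // neighbors.get(level): ordinal -> neighbor ordinals on that level
  private final List<Map<Integer, List<Integer>>> neighbors = new ArrayList<>();

  /** Adds the node on levels 0..topLevel (assumption: upper-level nodes exist on all lower levels). */
  void addNode(int topLevel, int ordinal) {
    while (nodesByLevel.size() <= topLevel) {
      nodesByLevel.add(new ArrayList<>());
      neighbors.add(new HashMap<>());
    }
    for (int level = 0; level <= topLevel; level++) {
      nodesByLevel.get(level).add(ordinal);
      neighbors.get(level).put(ordinal, new ArrayList<>());
    }
  }

  /** Adds an undirected edge between two nodes that both exist on the given level. */
  void connect(int level, int a, int b) {
    neighbors.get(level).get(a).add(b);
    neighbors.get(level).get(b).add(a);
  }

  int numLevels() {
    return nodesByLevel.size();
  }

  /** The entry point is the first node of the max level, so no separate field is stored. */
  int entryNode() {
    return nodesByLevel.get(numLevels() - 1).get(0);
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Greedy walk on one level (topK = 1): move to a closer neighbor until none is closer. */
  int greedyClosest(int level, int start, float[][] vectors, float[] query) {
    int best = start;
    float bestDistance = squaredDistance(vectors[best], query);
    boolean improved = true;
    while (improved) {
      improved = false;
      for (int candidate : neighbors.get(level).get(best)) {
        // the ordinal is what lets us fetch the vector for the distance calculation
        float distance = squaredDistance(vectors[candidate], query);
        if (distance < bestDistance) {
          best = candidate;
          bestDistance = distance;
          improved = true;
        }
      }
    }
    return best;
  }

  /** Descends from the max level to level 1 and returns the node to start from on level 0. */
  int descendToLevelZero(float[][] vectors, float[] query) {
    int current = entryNode();
    for (int level = numLevels() - 1; level >= 1; level--) {
      current = greedyClosest(level, current, vectors, query);
    }
    return current;
  }

  public static void main(String[] args) {
    float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}};
    LayeredGraphSketch graph = new LayeredGraphSketch();
    graph.addNode(2, 0);
    graph.addNode(0, 1);
    graph.addNode(0, 2);
    graph.addNode(0, 3);
    graph.addNode(1, 4);
    graph.connect(1, 0, 4);
    for (int i = 0; i < 4; i++) {
      graph.connect(0, i, i + 1);
    }
    int start = graph.descendToLevelZero(vectors, new float[] {2.2f});
    System.out.println("level-0 start node: " + start);
    System.out.println("closest on level 0: " + graph.greedyClosest(0, start, vectors, new float[] {2.2f}));
  }
}
```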
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402289#comment-17402289 ]

Michael Sokolov commented on LUCENE-10054:
--

Thanks for looking into this! Could you expand *all* the abbreviations in the text below the diagram? E.g., I think "VDOffset" is "vector data offset"? I think it will help to clarify for readers.

I wonder if we'll need {{ep}} as a special case -- it's really just the single node in the max level isn't it?

Could you explain what is in each of the {{NodesLevelX}}? I guess it's a list of the ordinals in level 0 that are contained in the other level -- but I wonder if we will need this.

Should we implement the in-memory version before defining the serialization?
[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph
[ https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401172#comment-17401172 ]

Mayya Sharipova commented on LUCENE-10054:
--

Proposed .vem index file structure:

+-------------+--------+----------+----------+----------+----------+------+--------+
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength | dims | docIds |
+-------------+--------+----------+----------+----------+----------+------+--------+

+-------------+--------------+-----+------------+---------------+-----+-------------+
| LevelsCount | SizeLevelmax | ... | SizeLevel0 | NodesLevelmax | ... | NodesLevel1 |
+-------------+--------------+-----+------------+---------------+-----+-------------+

+----------------------+-----+--------------------+
| graphOffsetsLevelmax | ... | graphOffsetsLevel0 |
+----------------------+-----+--------------------+
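To make the ordering of the per-level fields in the proposal above concrete, here is a rough round-trip sketch covering only LevelsCount, the SizeLevel entries and the NodesLevel entries. It is not the Lucene file format: the fixed-width int encoding, the class and method names, and the reading of the missing NodesLevel0 entry as "level 0 holds every ordinal 0..SizeLevel0-1" are all assumptions made for illustration. The graphOffsets and per-field metadata are left out.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoding of the per-level part of the proposed layout; not the Lucene format. */
final class VemLevelsSketch {

  /** nodesByLevel[level] holds the ordinals on that level; level 0 is assumed to hold all nodes. */
  static ByteBuffer writeLevels(int[][] nodesByLevel) {
    if (nodesByLevel.length == 0) {
      throw new IllegalArgumentException("at least level 0 is required");
    }
    int maxLevel = nodesByLevel.length - 1;
    int intCount = 1 + nodesByLevel.length; // LevelsCount plus one size per level
    for (int level = 1; level <= maxLevel; level++) {
      intCount += nodesByLevel[level].length;
    }
    ByteBuffer out = ByteBuffer.allocate(intCount * Integer.BYTES);
    out.putInt(nodesByLevel.length); // LevelsCount
    for (int level = maxLevel; level >= 0; level--) {
      out.putInt(nodesByLevel[level].length); // SizeLevelmax ... SizeLevel0
    }
    for (int level = maxLevel; level >= 1; level--) {
      for (int ordinal : nodesByLevel[level]) {
        out.putInt(ordinal); // NodesLevelmax ... NodesLevel1
      }
    }
    out.flip();
    return out;
  }

  static int[][] readLevels(ByteBuffer in) {
    int levelsCount = in.getInt();
    if (levelsCount <= 0) {
      throw new IllegalArgumentException("at least level 0 is required");
    }
    int[] sizes = new int[levelsCount];
    for (int level = levelsCount - 1; level >= 0; level--) {
      sizes[level] = in.getInt();
    }
    int[][] nodesByLevel = new int[levelsCount][];
    for (int level = levelsCount - 1; level >= 1; level--) {
      nodesByLevel[level] = new int[sizes[level]];
      for (int i = 0; i < sizes[level]; i++) {
        nodesByLevel[level][i] = in.getInt();
      }
    }
    // Assumption: level 0 is not stored because it is simply 0..SizeLevel0-1.
    nodesByLevel[0] = new int[sizes[0]];
    for (int i = 0; i < sizes[0]; i++) {
      nodesByLevel[0][i] = i;
    }
    return nodesByLevel;
  }

  public static void main(String[] args) {
    int[][] levels = {{0, 1, 2, 3, 4}, {0, 4}, {0}};
    ByteBuffer encoded = writeLevels(levels);
    System.out.println("encoded bytes: " + encoded.remaining());
    System.out.println("levels read back: " + readLevels(encoded).length);
  }
}
```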