Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-27 Thread David Smiley
FYI I had filed https://issues.apache.org/jira/browse/INFRA-23503

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jul 26, 2022 at 3:54 PM Michael Sokolov  wrote:

> searching JIRA for "slkjfdf" I found a few issues in other projects,
> but none seems to be getting the same degree of spam love
>
> On Tue, Jul 26, 2022 at 3:50 PM Mike Sokolov (Jira) 
> wrote:
> >
> >
> > [
> https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571588#comment-17571588
> ]
> >
> > Mike Sokolov commented on LUCENE-10054:
> > ---
> >
> > what is it with this issue that spammers love so much!? I wonder if we
> > could somehow lock it as read-only ...
> >
> >
> >
> > > Handle hierarchy in HNSW graph
> > > --
> > >
> > > Key: LUCENE-10054
> > > URL:
> https://issues.apache.org/jira/browse/LUCENE-10054
> > > Project: Lucene - Core
> > >  Issue Type: Task
> > >Reporter: Mayya Sharipova
> > >Priority: Major
> > >  Labels: vector-based-search
> > > Fix For: 9.1
> > >
> > >  Time Spent: 20h 20m
> > >  Remaining Estimate: 0h
> > >
> > > Currently HNSW graph is represented as a single layer graph.
> > >  We would like to extend it to handle hierarchy as per [discussion|
> https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216
> ].
> > >
> > > TODO tasks:
> > > - add multiple layers in the HnswGraph class
> > >  - modify the format in  Lucene90HnswVectorsWriter and
> Lucene90HnswVectorsReader to handle multiple layers
> > > - modify graph construction and search algorithm to handle hierarchy
> > >  - run benchmarks
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.20.10#820010)
> >
> > -
> > To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: issues-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-26 Thread Mike Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571588#comment-17571588
 ] 

Mike Sokolov commented on LUCENE-10054:
---

what is it with this issue that spammers love so much!? I wonder if we
could somehow lock it as read-only ...



> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Labels: vector-based-search
> Fix For: 9.1
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-02-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496910#comment-17496910
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 458fb1abed45e2b7605b3e89d20ec0709ca755fd in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=458fb1a ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the
old format.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-02-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496888#comment-17496888
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 4364bdd63ef58b2094dc252018d3c027302af4f4 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4364bdd ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the
old format.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-02-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487278#comment-17487278
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit ff2189c477dd39acfc9233a05a6f529d2fde2bf0 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ff2189c ]

Add changes item for LUCENE-10054


> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-01-31 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485084#comment-17485084
 ] 

Adrien Grand commented on LUCENE-10054:
---

It looks like this change made both indexing (+5%) and searching (+7%) vectors 
slightly faster. 
http://people.apache.org/~mikemccand/lucenebench/VectorSearch.html

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-01-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483729#comment-17483729
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 68beb1acb499b5759c9471dcc815e46071558371 in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=68beb1a ]

LUCENE-10054 Make HnswGraph hierarchical (#608) (#629)

Currently HNSW has only a single layer.
This patch makes HNSW graph multi-layered.

This PR is based on the following PRs:
 #250, #267, #287, #315, #536, #416

Main changes:
- Multi layers are introduced into HnswGraph and HnswGraphBuilder
- A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader
and Lucene91HnswVectorsWriter are introduced to encode graph
layers' information
- Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of
Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to
backward_codecs to support reading and searching of graphs built
in pre 9.1 version. Lucene90HnswVectorsWriter is deleted.
- For backwards compatible tests, previous Lucene90 graph reading and
writing logic was copied into test files of
Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter,
Lucene90HnswGraphBuilder and Lucene90HnswRWGraph.

TODO: tests for KNN search for graphs built in pre 9.1 version;
tests for merge of indices of pre 9.1 + current versions.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-01-25 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482182#comment-17482182
 ] 

Tomoko Uchida commented on LUCENE-10054:


Hi, 
just a quick note - https://issues.apache.org/jira/browse/LUCENE-10389 could 
relates to this ?

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482049#comment-17482049
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit b0d6fe68d1f2b8a93c9fc22a6ccdedff94bf1fbb in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b0d6fe6 ]

LUCENE-10054 Make HnswGraph hierarchical (#608)

Currently HNSW has only a single layer.
This patch makes HNSW graph multi-layered.

This PR is based on the following PRs:
 #250, #267, #287, #315, #536, #416

Main changes:
- Multi layers are introduced into HnswGraph and HnswGraphBuilder
- A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader
and Lucene91HnswVectorsWriter are introduced to encode graph
layers' information
- Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of
Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to
backward_codecs to support reading and searching of graphs built
in pre 9.1 version. Lucene90HnswVectorsWriter is deleted.
- For backwards compatible tests, previous Lucene90 graph reading and
writing logic was copied into test files of
Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter,
Lucene90HnswGraphBuilder and Lucene90HnswRWGraph.

TODO: tests for KNN search for graphs built in pre 9.1 version;
tests for merge of indices of pre 9.1 + current versions.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-10-18 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430199#comment-17430199
 ] 

Mayya Sharipova commented on LUCENE-10054:
--

[~mikemccand] Thanks for checking. This issue is not resolved yet.

We have implemented this as a [feature 
brach|https://github.com/apache/lucene/tree/hnsw], but it has not been merged 
to the `main` branch yet.  Storing hierarchy on disk is already implemented in 
that feature branch as well. 

We had an idea to move some data off-heap, and run more benchmarking tests, and 
then open a PR against a `main` branch.  

 

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-10-18 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429958#comment-17429958
 ] 

Michael McCandless commented on LUCENE-10054:
-

[~mayya] this looks like an awesome improvement!  Can this maybe be resolved 
now?  Or are we leaving it open to support the hierarchy also on-disk?  Maybe 
open a separate spinoff issue for that?

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-09-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410532#comment-17410532
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit a2c8f2805e18895bdfcca568126353c86b95e50b in lucene's branch 
refs/heads/hnsw from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a2c8f28 ]

LUCENE-10054 Handle hierarchy in graph construction and search (#267)

This patch handles hierarchy in graph construction and search,
but only in memory.

Changes:

- HnswGraphBuilder has an extra parameter :ml normalization factor for level
  generation. When ml=0, the graph will have only a single layer: SNW.
  When ml > 0, the graph will have multiple layers.
  - The recommended ml value from 2018 HNSW paper is :
ml = 1 / Math.log(1.0 * maxConn), which was used for tests in
TestHnswGraph class.
  - When ml = 0, we use the previous code in the method buildSNW
  - When ml > 0, the method buildHNSW is used, that according the paper,
for every new node: generates a random level for this node, and then
 places the new node in those levels with its connections.
- HnswGraph's search method has also been modified to handle two cases:
  - When ml = 0, we use the previous code for a flat graph: generate
boundedNumSeed number of random entry points, and use them to search
a flat graph
  - When ml > 0, we use the hierarchical graph: from max level till level 1
using topK=1, and on the level 0th using topK=boundedNumSeed

Work left for future: handle hierarchy on disk

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404019#comment-17404019
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit fb5ba4d86b930a8cbe7c37003fdb687845c237ff in lucene's branch 
refs/heads/hnsw from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb5ba4d ]

LUCENE-10054 Make HnswGraph hierarchical (#250)

Currently HNSW has only a single layer.
This is the first part to make it multi-layered.

To keep changes small, this PR only adds
 multiple layers in the HnswGraph class.


> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404007#comment-17404007
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit fc67d6aa6e2bf2ec8ff4b2b8e4a763f3f706de29 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fc67d6a ]

Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)"

This reverts commit 257d256defc47c446493ea99b841f58c543673c0.

We've decided to have a separate feature branch for HNSW,
and put all related changes there.


> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-24 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403721#comment-17403721
 ] 

Mayya Sharipova commented on LUCENE-10054:
--

Julie, I think having a feature branch is a great idea.

If everyone is ok, I can revert my last commit to main and put it instead to a 
feature branch.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-23 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403443#comment-17403443
 ] 

Julie Tibshirani commented on LUCENE-10054:
---

[~mayya] just a suggestion, what would you think of developing this in a 
feature branch instead of main? We could still review the individual PRs 
against the branch to provide feedback. It might be nice to keep the changes 
separate until we have a fuller picture with benchmarks, so we could look at 
the trade-off in performance vs. complexity?

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403363#comment-17403363
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 257d256defc47c446493ea99b841f58c543673c0 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=257d256 ]

LUCENE-10054 Make HnswGraph hierarchical (#250)

Currently HNSW has only a single layer.
This is the first part to make it multi-layered.

To keep changes small, this PR only adds
 multiple layers in the HnswGraph class.

TODO  for following PRs:
- modify graph construction and search algorithm for a hierarchical
graph.
- modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to
write and read multiple layers\

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-20 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402399#comment-17402399
 ] 

Mayya Sharipova commented on LUCENE-10054:
--

[~sokolov] Thanks your for your feedback. I've modified the diagram to expand 
all abbreviations.

 
{quote}I wonder if we'll need {{ep}} as a special case – it's really just the 
single node in the max level isn't it?
{quote}
Great comment, we don't need ep, I've removed it from the diagram.. Although 
the max level may contain several nodes, it is the first node in the max level 
that is served as the graph's entry point. 

 
{quote}Could you explain what is in each of the {{NodesLevelX}}? I guess it's a 
list of the ordinals in level 0 that are contained in the other level – but I 
wonder if we will need this
{quote}
I've expanded the explanation, but I think with need this info, as I can't see 
how we can find for each level the closest nodes to a query; these ordinals 
allow us to get the vector values for distance calculations. 

 
{quote}Should we implement the in-memory version before defining the 
serialization?
{quote}
 

Agree. Great suggestion.

 

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-20 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402289#comment-17402289
 ] 

Michael Sokolov commented on LUCENE-10054:
--

Thanks for looking into this!

could you expand *all* the abbreviations in the text below the diagram? EG I 
think "VDOffset" is "vector data offset"? I think it will help to clarify for 
readers.

I wonder if we'll need {{ep}} as a special case -- it's really just the single 
node in the max level isn't it?

Could you explain what is in each of the {{NodesLevelX}}? I guess it's a list 
of the ordinals in level 0 that are contained in the other level -- but I 
wonder if we will  need this. Should we  implement the in-memory version before 
defining the serialization?


> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2021-08-18 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401172#comment-17401172
 ] 

Mayya Sharipova commented on LUCENE-10054:
--

Proposed .vem index file structure:



+-++--+--+--+-+--+---+
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength| dims | 
docIds
+-++--+--+--+-+--+---+

--+-+--+-++---+-++
 | LevelsCount | SizeLevelmax | ... | SizeLevel0 | NodesLevelmax | ... | 
NodesLevel1
--+-+--+-++---+-++

--+--+-++
 | graphOffsetsLevelmax | ... | graphOffsetsLevel0 |
--+--+ ++

 

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org