Re: Filtering before a vector search.
+1 from me too, this will be a really helpful feature. I've done some background research and found a couple of aspects that are tricky.

If the filter matches only a small percentage of documents, HNSW search can quickly degrade into a brute-force scan. With live docs this isn't a big problem, because our merge policies help keep deleted docs down to a reasonable percentage. But with an arbitrary query, you could easily filter away most documents, which would lead to a surprisingly slow kNN search. This blog post about filtered HNSW search in the Weaviate engine has a graph showing a slowdown past ~20% filter selectivity: https://towardsdatascience.com/effects-of-filtered-hnsw-searches-on-recall-and-latency-434becf8041c

Looking forward to discussing more on the issue.

Julie

On Wed, Jan 19, 2022 at 12:10 PM Joel Bernstein wrote:
> https://issues.apache.org/jira/browse/LUCENE-10382
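One way to act on this concern is to measure what fraction of a segment's docs pass the filter and fall back to an exact scan of the matching docs when that fraction is small. The Java below is only an illustrative sketch under stated assumptions: FilterBits is a hypothetical stand-in mirroring the get/length methods of Lucene's Bits interface, FilterSelectivity, KnnStrategy and the minGraphSelectivity threshold are hypothetical names rather than Lucene API, and "selectivity" is taken here to mean the fraction of docs that match the filter.

```java
/** Hypothetical stand-in for Lucene's org.apache.lucene.util.Bits, so the sketch compiles on its own. */
interface FilterBits {
  boolean get(int index);

  int length();
}

/** The two strategies being weighed; names are illustrative, not Lucene API. */
enum KnnStrategy {
  EXACT_SCAN,
  GRAPH_SEARCH
}

final class FilterSelectivity {
  private FilterSelectivity() {}

  /** Counts accepted docs with a linear pass; a real bit set can usually count faster. */
  static int cardinality(FilterBits acceptDocs) {
    int count = 0;
    for (int doc = 0; doc < acceptDocs.length(); doc++) {
      if (acceptDocs.get(doc)) {
        count++;
      }
    }
    return count;
  }

  /** Fraction of docs that pass the filter, in [0, 1]. */
  static double selectivity(FilterBits acceptDocs) {
    int total = acceptDocs.length();
    return total == 0 ? 0.0 : (double) cardinality(acceptDocs) / total;
  }

  /**
   * Hypothetical heuristic: when only a small fraction of docs pass the filter, the graph
   * walk would visit and reject many nodes, so scanning the matching docs directly may be
   * cheaper. minGraphSelectivity is a tuning knob, not a recommended value.
   */
  static KnnStrategy choose(FilterBits acceptDocs, double minGraphSelectivity) {
    return selectivity(acceptDocs) < minGraphSelectivity
        ? KnnStrategy.EXACT_SCAN
        : KnnStrategy.GRAPH_SEARCH;
  }
}
```

With a threshold of 0.2, a filter matching 10 of 100 docs would pick the exact scan, while one matching half the docs would keep the graph search.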
Re: Filtering before a vector search.
https://issues.apache.org/jira/browse/LUCENE-10382

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jan 19, 2022 at 2:59 PM Joel Bernstein wrote:
> Ok, I can create the jira.
Re: Filtering before a vector search.
Ok, I can create the jira.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jan 19, 2022 at 2:49 PM Michael Sokolov wrote:
> +1 we should extend the functionality to support any Bits, not just
> liveDocs; we need to propose an API. The implementation should not be
> too hard: we need to intersect the user-supplied Bits with liveDocs
> and use that to filter.
Re: Filtering before a vector search.
+1 we should extend the functionality to support any Bits, not just liveDocs; we need to propose an API. The implementation should not be too hard: we need to intersect the user-supplied Bits with liveDocs and use the result to filter.

On Wed, Jan 19, 2022 at 1:42 PM Joel Bernstein wrote:
> I was wondering if there was a way to use KnnVectorQuery to filter the docs
> this query looks at. Right now the searchLeaf method passes the liveDocs
> to LeafReader.searchNearestVectors, but there appears to be no way to have
> KnnVectorQuery operate on a subset of liveDocs.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
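The intersection described above could look roughly like the following minimal sketch. It assumes a local stand-in, DocBits, that mirrors the get/length methods of Lucene's org.apache.lucene.util.Bits; BitsIntersection and its of helper are hypothetical names, not Lucene API. It follows Lucene's convention that a null liveDocs means the segment has no deletions, so the user filter can be used as-is.

```java
/** Hypothetical stand-in mirroring Lucene's org.apache.lucene.util.Bits (get/length). */
interface DocBits {
  boolean get(int index);

  int length();
}

final class BitsIntersection {
  private BitsIntersection() {}

  /**
   * Combines liveDocs with a user-supplied filter. A null argument means "accept all",
   * matching the convention that null liveDocs indicates no deletions.
   */
  static DocBits intersect(DocBits liveDocs, DocBits filter) {
    if (liveDocs == null) {
      return filter;
    }
    if (filter == null) {
      return liveDocs;
    }
    if (liveDocs.length() != filter.length()) {
      throw new IllegalArgumentException(
          "length mismatch: " + liveDocs.length() + " vs " + filter.length());
    }
    return new DocBits() {
      @Override
      public boolean get(int index) {
        // A doc is eligible only if it is live AND matches the user filter.
        return liveDocs.get(index) && filter.get(index);
      }

      @Override
      public int length() {
        return liveDocs.length();
      }
    };
  }

  /** Hypothetical helper for examples: wraps a boolean array as DocBits. */
  static DocBits of(boolean... bits) {
    return new DocBits() {
      @Override
      public boolean get(int index) {
        return bits[index];
      }

      @Override
      public int length() {
        return bits.length;
      }
    };
  }
}
```

Wrapping lazily like this avoids materializing a new bit set per query, at the cost of two lookups per visited doc.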
Filtering before a vector search.
Hi,

Thanks for all the work on the vector search!

I was wondering if there was a way to use KnnVectorQuery to filter the docs this query looks at. Right now the searchLeaf method passes the liveDocs to LeafReader.searchNearestVectors, but there appears to be no way to have KnnVectorQuery operate on a subset of liveDocs.

Thanks,

Joel Bernstein
http://joelsolr.blogspot.com/
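To illustrate what "operating on a subset of liveDocs" means, here is a minimal brute-force kNN in Java that only scores docs accepted by an acceptDocs filter, where a null filter accepts every doc. This is a sketch under stated assumptions, not HNSW and not Lucene's searchNearestVectors: AcceptBits is a hypothetical stand-in for Lucene's Bits, FilteredBruteForceKnn is a hypothetical name, and squared Euclidean distance is an assumed similarity.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for Lucene's org.apache.lucene.util.Bits. */
interface AcceptBits {
  boolean get(int index);

  int length();
}

final class FilteredBruteForceKnn {
  private FilteredBruteForceKnn() {}

  /** Squared Euclidean distance; smaller means closer. */
  static float squaredDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns up to k doc ids nearest to target, nearest first, visiting only accepted docs. */
  static int[] search(float[][] vectors, float[] target, int k, AcceptBits acceptDocs) {
    float[] dist = new float[vectors.length];
    // Max-heap on distance, so the current farthest candidate is evicted first.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Integer doc) -> dist[doc]).reversed());
    for (int doc = 0; doc < vectors.length; doc++) {
      if (acceptDocs != null && !acceptDocs.get(doc)) {
        continue; // deleted or not matching the filter: never scored
      }
      dist[doc] = squaredDistance(vectors[doc], target);
      heap.offer(doc);
      if (heap.size() > k) {
        heap.poll(); // drop the farthest candidate
      }
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // farthest comes out first, so fill from the back
    }
    return result;
  }
}
```

With one-dimensional vectors 0 through 4 and a target of 1.1, asking for k = 2 returns docs 1 and 2 unfiltered, but docs 2 and 0 when only even doc ids are accepted.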