Re: Join module dependency

2024-05-19 Thread Michael Sokolov
I'm pretty sure it's only in core that we follow the no dependencies rule. On Sat, May 18, 2024, 11:25 AM Bruno Roustant wrote: > The facet module has a dependency on com.carrotsearch:hppc. > > Is it possible to add the same dependency to the join module ? What is the > rule ? > > Thanks > >

Re: How much is ja.dict.UserDictionary used?

2024-05-18 Thread Michael Sokolov
We use it at Amazon. I can't really read it, so I'm not sure, but I think it's used to encode terms that come up and aren't handled well by the standard dictionary. On Sat, May 18, 2024 at 8:39 AM Bruno Roustant wrote: > > Hi, > > While looking at the various usages of Map with Integer keys, I

Re: beasting tests

2024-04-04 Thread Michael Sokolov
Thanks for the explanation. It makes sense that we start with a given seed and then each iteration is different because it re-uses the same Random instance (or whatever static state?) without re-initialization? On Wed, Apr 3, 2024 at 6:09 PM Dawid Weiss wrote: > > >> Now I just need to
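
A minimal sketch of the behavior described above, assuming a single `java.util.Random` that is seeded once with a master seed and then reused, never re-seeded, across iterations. The `IterationSeeds` class is hypothetical and only illustrates the idea; it is not how the randomizedtesting runner actually derives its seeds. Every iteration draws a different value, yet the whole run is reproducible from the master seed.

```java
import java.util.Random;

/** Hypothetical illustration: one master seed drives every iteration of a repeated test. */
final class IterationSeeds {
  private IterationSeeds() {}

  /**
   * Returns one derived seed per iteration. The same Random instance is reused,
   * so each iteration sees a different value, while the full sequence depends
   * only on masterSeed.
   */
  static long[] derive(long masterSeed, int iterations) {
    Random random = new Random(masterSeed);
    long[] seeds = new long[iterations];
    for (int i = 0; i < iterations; i++) {
      seeds[i] = random.nextLong(); // advances the shared state; no re-initialization
    }
    return seeds;
  }
}
```

Re-running with the same master seed replays exactly the same per-iteration values, which is what makes a single failing iteration reproducible.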

Re: beasting tests

2024-04-02 Thread Michael Sokolov
/apache/lucene/blob/main/gradle/testing/beasting.gradle#L62-L66> >> in beasting.gradle >> <https://github.com/apache/lucene/blob/main/gradle/testing/beasting.gradle> >> . >> >> - Shubham >> >> On Wed, Apr 3, 2024 at 1:49 AM Michael Sokolov >> w

Re: beasting tests

2024-04-02 Thread Michael Sokolov
Michael Sokolov wrote: > > Is there a convenient way to run a test multiple times with different > seeds? Do I need to write my own script? I feel like I used to be able > to do this in IntelliJ, but that option seems to have vanished, and I > don't see any such option in gradle tes

beasting tests

2024-04-02 Thread Michael Sokolov
Is there a convenient way to run a test multiple times with different seeds? Do I need to write my own script? I feel like I used to be able to do this in IntelliJ, but that option seems to have vanished, and I don't see any such option in gradle testOpts either. I tried -tests.iter but that

Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-17.0.9) - Build # 15969 - Unstable!

2024-04-01 Thread Michael Sokolov
This TestBooleanMinShouldMatch.testRandomQueries failure did not reproduce for me on branch_9x, with JDK 11 or JDK 17 or JDK 21. I ran it a few times. TestByteVectorSimilarityQuery.testSomeDeletes reproduces reliably - I'll see if I can find out why it's unstable On Mon, Apr 1, 2024 at 9:50 AM

Re: Lucene 10

2024-03-14 Thread Michael Sokolov
timing makes sense to me. +1 for having a deadline to reduce procrastination, but Adrien I don't honestly believe anyone who is paying attention thinks that is what you have been doing! On Wed, Mar 13, 2024 at 10:40 AM Adrien Grand wrote: > > Hello everyone! > > It's been ~2.5 years since we

Re: Announcing githubsearch!

2024-02-27 Thread Michael Sokolov
Chrome on a Macbook, it's super dark. I can make > it out but I gotta stare for a bit ... do they make light and dark mode > .ico files in one!? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Sun, Feb 25, 2024 at 6:05 PM Michael Sokolov > wrote: >

Re: Welcome Zhang Chao as Lucene committer

2024-02-25 Thread Michael Sokolov
Welcome and congratulations, Chao! On Sat, Feb 24, 2024 at 8:51 PM Christian Moen wrote: > > Congrats, Chao! > > On Wed, Feb 21, 2024 at 2:28 AM Adrien Grand wrote: >> >> I'm pleased to announce that Zhang Chao has accepted the PMC's >> invitation to become a committer. >> >> Chao, the

Re: [Vote] Bump the Lucene main branch to Java 21

2024-02-25 Thread Michael Sokolov
+1 On Fri, Feb 23, 2024 at 7:08 PM Stefan Vodita wrote: > > +1 > > On Fri, 23 Feb 2024 at 11:24, Chris Hegarty > wrote: >> >> Hi, >> >> Since the discussion on bumping the Lucene main branch to Java 21 is winding >> down, let's hold a vote on this important change. >> >> Once bumped, the next

Re: Announcing githubsearch!

2024-02-25 Thread Michael Sokolov
here is a favicon you might want to try: I cropped the "VL" from the Apache Lucene logo (ok I guess it's an AL) -- if you save it as favicon.ico in the root of your website (i.e. as URL /favicon.ico) it should show up in bookmarks, browser toolbars, etc. as a handy memory aid. Of course you might

Re: Announcing githubsearch!

2024-02-20 Thread Michael Sokolov
I love the gray all text UI. Don't change it! But I wonder if it's time for a favicon? On Tue, Feb 20, 2024, 4:40 AM Adrien Grand wrote: > Very cool, thank you Mike! > > On Mon, Feb 19, 2024 at 5:40 PM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Hi Team, >> >> ~1.5 years ago

Re: Welcome Stefan Vodita as Lucene committer

2024-01-19 Thread Michael Sokolov
Hello Stefan, welcome! On Fri, Jan 19, 2024 at 10:41 AM Martin Gainty wrote: > Congratulations Stefan! > > I look forward to reading your posts > > ~martin > -- > *From:* Michael McCandless > *Sent:* Thursday, January 18, 2024 10:53 AM > *To:* dev@lucene.apache.org

Re: [VOTE] Release Lucene 9.9.1 RC1

2023-12-14 Thread Michael Sokolov
+1 SUCCESS! [0:50:50.776559] Note: we did get some test fails on the mailing list this morning, but I believe they are not real bugs and will be resolved by tightening up our test assumptions On Thu, Dec 14, 2023 at 7:08 AM Guo Feng wrote: > +1 > > SUCCESS! [3:38:43.833896] > > On 2023/12/14

Re: [VOTE] Release Lucene 9.9.0 RC2

2023-11-30 Thread Michael Sokolov
SUCCESS! [0:46:20.693134] +1 On Thu, Nov 30, 2023 at 5:50 PM Tomás Fernández Löbbe wrote: > SUCCESS! [0:52:49.337126] > > +1 > > On Thu, Nov 30, 2023 at 12:05 PM Benjamin Trent > wrote: > >> SUCCESS! [0:44:05.132154] >> >> +1 >> >> On Thu, Nov 30, 2023 at 1:09 PM Chris Hegarty >> wrote: >>

Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-30 Thread Michael Sokolov
for the sake of posterity, I did get a successful smoketest: SUCCESS! [1:00:06.512261] but +0 to release I guess since it's moot... On Thu, Nov 30, 2023 at 10:38 AM Michael McCandless < luc...@mikemccandless.com> wrote: > On Thu, Nov 30, 2023 at 9:56 AM Chris Hegarty > wrote: > > P.S. I’m

Re: GDPR compliance

2023-11-29 Thread Michael Sokolov
Another way is to ensure that all documents get updated on a regular cadence whether there are changes in the underlying data or not. Or, regenerating the index from scratch all the time. Of course these approaches might be more costly for an index that has intrinsically low update rates, but they

Re: Lucene 9.9.0 Release

2023-11-22 Thread Michael Sokolov
+1 thanks for volunteering! Hijacking the thread a bit, sorry, I started looking into whether this is a good time to start looking ahead to 10? I know we had some rumblings about releasing that so we can start requiring newer JDKs. But looking at CHANGES it feels like we already back-ported most

Re: Test framework can't find SPI implementations from module sandbox

2023-11-21 Thread Michael Sokolov
did you add to the sandbox META-INF file? It looks like maybe sandbox is not included in the scope of the test, but you didn't say which test it was. Is the test also in the sandbox module? On Mon, Nov 20, 2023 at 6:56 PM Dongyu Xu wrote: > Hi devs, > > I tried to plug in my experimental

Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-12 Thread Michael Sokolov
Welcome, Patrick! On Sun, Nov 12, 2023, 2:12 AM Ignacio Vera wrote: > Welcome Patrick! > > On Sat, Nov 11, 2023 at 3:29 PM Uwe Schindler wrote: > >> Welcome Patrick! >> >> Uwe >> >> >> On November 10, 2023 21:04:32 CET, Michael McCandless < >> luc...@mikemccandless.com> wrote: >> >>> I'm

Re: Boolean field type

2023-11-09 Thread Michael Sokolov
Can you require the user to specify missing: true or missing: false semantics. With that you can decide what to do with the missing values On Thu, Nov 9, 2023, 7:55 AM Mikhail Khludnev wrote: > Hello Michael. > This optimization "NOT the less common value" assumes that boolean field > is
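
A minimal sketch of why explicit missing semantics matter, assuming a single-valued boolean field where a document may have no value (modeled as `null`). The class and method names are hypothetical. Once every missing value resolves to a user-chosen default, "matches true" is exactly the complement of "matches false", so the "NOT the less common value" rewrite stays correct. If missing values are simply ignored, the complement counts them and the rewrite changes the result.

```java
/** Hypothetical sketch: resolving missing boolean values with an explicit default. */
final class BooleanFieldSemantics {
  private BooleanFieldSemantics() {}

  /** A missing value (null) is treated as the user-specified default. */
  static boolean effectiveValue(Boolean stored, boolean missingAs) {
    return stored == null ? missingAs : stored;
  }

  /** Counts documents whose effective value equals target. */
  static int count(Boolean[] docs, boolean target, boolean missingAs) {
    int n = 0;
    for (Boolean stored : docs) {
      if (effectiveValue(stored, missingAs) == target) {
        n++;
      }
    }
    return n;
  }

  /** Counts documents that explicitly store target; missing values never match. */
  static int countExplicit(Boolean[] docs, boolean target) {
    int n = 0;
    for (Boolean stored : docs) {
      if (stored != null && stored == target) {
        n++;
      }
    }
    return n;
  }
}
```

With documents `{true, false, null, true}` and missing treated as `false`, "true" matches 2 and "false" matches 2, which add up to all 4 documents. Without a default, NOT(false) would match 3 documents while "true" matches only 2.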

Re: Bump minimum Java version requirement to 21

2023-11-06 Thread Michael Sokolov
It's not just you - we have an internal JDK11 fork at BIG COMPANY for some folks that can't get off the stick. To be fair it's challenging because they have to shift all their dependencies. I think Spark was the one mentioned by one group, but there is a JDK17-based release of Spark, so clearly

Re: Squash vs merge of PRs

2023-11-04 Thread Michael Sokolov
Personally for me it's about how meaningful the commit messages (and contents) are vs whether we use merge commits or not. If it's a long series of "fixed bug" "reformatted" "did stuff" "more stuff" "it finally works" and so on ... that doesn't smell good to me, but you know we all have done that

Re: Welcome Guo Feng to the Lucene PMC

2023-10-25 Thread Michael Sokolov
Welcome, gf2121! On Wed, Oct 25, 2023, 3:03 AM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Congratulations and welcome, Feng! > > On Tue, 24 Oct 2023 at 22:35, Adrien Grand wrote: > >> I'm pleased to announce that Guo Feng has accepted an invitation to join >> the Lucene PMC! >>

Re: Welcome Luca Cavanna to the Lucene PMC

2023-10-22 Thread Michael Sokolov
Congratulations and welcome, Luca! On Sun, Oct 22, 2023 at 1:42 PM Julie Tibshirani wrote: > > Congratulations Luca!! > > On Fri, Oct 20, 2023 at 1:45 AM Bruno Roustant > wrote: >> >> Welcome, congratulations! >> >> On Fri, Oct 20, 2023 at 10:02, Dawid Weiss wrote: >>> >>> >>>

Re: ByteBufferIndexInput.alreadyClosed creates an exception that doesn't track its cause

2023-10-22 Thread Michael Sokolov
> Uwe > > On 22.10.2023 at 01:37, Michael Sokolov wrote: > > Thanks for digging into this. I do think it will be helpful for > > developers that blithely access the IndexInput from multiple threads > > :) > > > > On Sat, Oct 21, 2023 at 3:53 PM Chris

Re: ByteBufferIndexInput.alreadyClosed creates an exception that doesn't track its cause

2023-10-21 Thread Michael Sokolov
Thanks for digging into this. I do think it will be helpful for developers that blithely access the IndexInput from multiple threads :) On Sat, Oct 21, 2023 at 3:53 PM Chris Hostetter wrote: > > > Uwe: In your PR, you should add these details to the javadocs of >

ByteBufferIndexInput.alreadyClosed creates an exception that doesn't track its cause

2023-10-17 Thread Michael Sokolov
I was messing around with something that was resulting in AlreadyClosedException being thrown and I noticed that we weren't tracking the exception that caused it. I found this in ByteBufferIndexInput: // the unused parameter is just to silence javac about unused variables
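
A minimal sketch of the change being discussed, assuming the closed input nulls out its buffer so that a later read fails with a `NullPointerException`. `ClosedInputException` and `SketchInput` are hypothetical stand-ins, not Lucene's `AlreadyClosedException` or `ByteBufferIndexInput`. The point is to pass the caught exception on as the cause instead of discarding it as an unused parameter.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for an "already closed" exception that keeps its cause. */
final class ClosedInputException extends IllegalStateException {
  ClosedInputException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical input that nulls its buffer on close. */
final class SketchInput {
  private ByteBuffer buffer;

  SketchInput(byte[] data) {
    this.buffer = ByteBuffer.wrap(data);
  }

  void close() {
    buffer = null; // later reads will hit a NullPointerException
  }

  byte readByte() {
    try {
      return buffer.get();
    } catch (NullPointerException npe) {
      throw alreadyClosed(npe);
    }
  }

  // Chain the triggering exception so its stack trace survives in the report.
  private ClosedInputException alreadyClosed(RuntimeException cause) {
    return new ClosedInputException("Already closed: " + this, cause);
  }
}
```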

Re: [VOTE] Release Lucene 9.8.0 RC1

2023-09-25 Thread Michael Sokolov
nager has done everything it should do: It detected an > illegal access. Mission achieved! You have to report this issue and patch > your tool so it works correctly with SecurityManager. > > Uwe > > On 24.09.2023 at 23:52, Michael Sokolov wrote: > > I ran t

Re: [VOTE] Release Lucene 9.8.0 RC1

2023-09-24 Thread Michael Sokolov
ok, I re-ran without the pesky log4j-thingy running and SUCCESS! [0:55:54.865250] +1 On Sun, Sep 24, 2023 at 5:52 PM Michael Sokolov wrote: > > I ran the smoketester and had a failure. It seems related to some > log4j hot patch script we are required to run at work which i

Re: [VOTE] Release Lucene 9.8.0 RC1

2023-09-24 Thread Michael Sokolov
I ran the smoketester and had a failure. It seems related to some log4j hot patch script we are required to run at work which is somehow conflicting with the security manager? I'm killing that and trying again, but I wonder if this is going to cause problems at runtime as well? How do we enable

Re: Lucene 9.8 Release

2023-09-18 Thread Michael Sokolov
+1 for a release soon, and thanks for volunteering, Patrick! On Tue, Sep 12, 2023 at 2:08 AM Patrick Zhai wrote: > > Hi all, > It's been a while since the last release and we have quite a few good changes > including new APIs, improvements and bug fixes. Should we release the 9.8? > > If

Re: [VOTE] Release Lucene 9.7.0 RC1

2023-06-22 Thread Michael Sokolov
I have /tmp symlinked to /local/tmp (to get more space) and this seems to cause some issue: On Thu, Jun 22, 2023 at 7:07 PM Michael Sokolov wrote: > > +0 > > I had some test failures. Maybe a problem with my setup? I'll see if I can > repro > > gradlew :lucene:re

Re: [VOTE] Release Lucene 9.7.0 RC1

2023-06-22 Thread Michael Sokolov
+0 I had some test failures. Maybe a problem with my setup? I'll see if I can repro gradlew :lucene:replicator:test --tests "org.apache.lucene.replicator.nrt.TestNRTReplication.testCrashPrimary1" -Ptests.jvms=8 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC

Re: Welcome Chris Hegarty to the Lucene PMC

2023-06-19 Thread Michael Sokolov
Welcome Chris! On Mon, Jun 19, 2023, 7:31 AM Michael McCandless wrote: > Welcome aboard Chris! > > Mike McCandless > > http://blog.mikemccandless.com > > > On Mon, Jun 19, 2023 at 7:16 AM Ishan Chattopadhyaya < > ichattopadhy...@gmail.com> wrote: > >> Congratulations Chris! >> >> On Mon, 19

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Sokolov
community what > they want to see so they are unblocked from their explorations of vector > search. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Wed, May 17, 2023 at 7:51 AM Michael Sokolov > wrote: >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Sokolov
I think I've said before on this list we don't actually enforce the limit in any way that can't easily be circumvented by a user. The codec already supports any size vector - it doesn't impose any limit. The way the API is written you can *already today* create an index with max-int sized vectors

Re: Running 10.0 build with a custom lucene 9.5

2023-05-15 Thread Michael Sokolov
nt lucene version differently >> than other dependencies... >> >> - Houston >> >> On Sat, May 13, 2023 at 10:14 AM Michael Sokolov wrote: >>> >>> doh I actually read your email and you said you already checked that - >>> I'm goi

Re: Running 10.0 build with a custom lucene 9.5

2023-05-13 Thread Michael Sokolov
doh I actually read your email and you said you already checked that - I'm going to send out one of those "sokolov would like to retract the previous email" emails. Does GMail even pretend to do that? I don't know what's going on there! sorry On Sat, May 13, 2023 at 10:13 AM Micha

Re: Running 10.0 build with a custom lucene 9.5

2023-05-13 Thread Michael Sokolov
sorry - META-INF not WEB-INF On Sat, May 13, 2023 at 10:12 AM Michael Sokolov wrote: > > You are probably missing the contents of WEB-INF in your custom jar? > Roughly speaking the files in there define run-time-bound "services" > that are looked up by name by the JD
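
A minimal sketch of the mechanism, using the JDK's `java.util.ServiceLoader` directly; `SketchCodec` and `SpiLookup` are hypothetical names. Lucene's named lookup of codecs and similar components builds on this service-loader mechanism. Providers are discovered from `META-INF/services/<binary name of the interface>` files on the classpath (or `provides` clauses in `module-info.java`), so a custom jar without those entries contributes nothing and the lookup comes back empty.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical service interface standing in for a pluggable component. */
interface SketchCodec {
  String name();
}

final class SpiLookup {
  private SpiLookup() {}

  /**
   * Finds a provider by name. A provider is only visible if some jar lists its class in
   * META-INF/services/SketchCodec (the interface's binary name in the default package)
   * or declares it with "provides" in module-info.java.
   */
  static Optional<SketchCodec> byName(String name) {
    for (SketchCodec codec : ServiceLoader.load(SketchCodec.class)) {
      if (codec.name().equals(name)) {
        return Optional.of(codec);
      }
    }
    return Optional.empty();
  }
}
```

With no registration file present, `SpiLookup.byName(...)` returns an empty `Optional`, which is the symptom of a jar whose `META-INF` contents were dropped.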

Re: Running 10.0 build with a custom lucene 9.5

2023-05-13 Thread Michael Sokolov
You are probably missing the contents of WEB-INF in your custom jar? Roughly speaking the files in there define run-time-bound "services" that are looked up by name by the JDK's service-loader API. On Sat, May 13, 2023 at 9:33 AM Gus Heck wrote: > > Cross posting to lucene on the possibility

Re: HNSW questions

2023-05-11 Thread Michael Sokolov
eldWriter, is that handled somewhere else? Or is it just up to the user to > make sure no documents end up with duplicate vectors? > > On Wed, Apr 19, 2023 at 5:07 AM Michael Sokolov wrote: >> >> Oh identical vectors. Basically unsupported. If you create a large index >> fi

BooleanQuery score aggregation

2023-04-28 Thread Michael Sokolov
I think that in BooleanQuery and related classes we mostly aggregate child scores by summing (although there is DisjunctionMaxScorer which doesn't exactly take the max?). I have a use case where I want to take the min score from a bunch of required terms. To do this I had to write a new query and
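
A minimal sketch contrasting three ways to combine the scores of matching child clauses, using plain arrays rather than Lucene's `Scorer` API; the `ScoreCombiners` class is hypothetical. `sum` matches the summing behavior described above. `disMax` shows why a disjunction-max combination is not exactly the max: a tie-breaker multiplier adds a fraction of the other clauses' scores. `min` is the aggregation this use case asks for. All three assume a non-empty array.

```java
/** Hypothetical helpers contrasting score-combination rules for matching child clauses. */
final class ScoreCombiners {
  private ScoreCombiners() {}

  /** Sum of child scores. */
  static float sum(float[] scores) {
    float total = 0f;
    for (float s : scores) {
      total += s;
    }
    return total;
  }

  /** Max child score plus tieBreaker times the sum of the remaining scores. */
  static float disMax(float[] scores, float tieBreaker) {
    float max = Float.NEGATIVE_INFINITY;
    float total = 0f;
    for (float s : scores) {
      max = Math.max(max, s);
      total += s;
    }
    return max + tieBreaker * (total - max);
  }

  /** Minimum child score: a document is only as good as its weakest required term. */
  static float min(float[] scores) {
    float m = Float.POSITIVE_INFINITY;
    for (float s : scores) {
      m = Math.min(m, s);
    }
    return m;
  }
}
```

For child scores `{1, 2, 3}`: `sum` gives 6, `min` gives 1, and `disMax` gives 3 with a tie-breaker of 0 but 4.5 with a tie-breaker of 0.5.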

Re: HNSW questions

2023-04-20 Thread Michael Sokolov
nstructor does not need to contain any values up front. Specifically, > Lucene95HnswVectorsWriter.FieldWriter adds vectors incrementally to the RAVV > that it gives to the builder as addValue is called. > > On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov wrote: >> >

Re: HNSW questions

2023-04-19 Thread Michael Sokolov
at the paper by Malkov and Yashunin, it looks like the algorithm > allows for building the hnsw graph incrementally. Why does our > implementation require specifying all the vectors up front to > HnswGraphBuilder.create? > > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov wrote: >

Re: Lucene 9.6 release

2023-04-19 Thread Michael Sokolov
Yes, thanks Alan! On Wed, Apr 19, 2023 at 3:41 PM Michael Wechner wrote: > > +1 > > Thanks! > > Michael > > On 19.04.23 at 18:09, Benjamin Trent wrote: > > +1 ! > > You rock Alan! > > On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera wrote: >> >> +1 >> >> Thanks Alan! >> >> On Wed, Apr 19, 2023 at

Re: HNSW questions

2023-04-19 Thread Michael Sokolov
Oh identical vectors. Basically unsupported. If you create a large index filled with identical vectors it leads to pathological behavior. Seems to be a weakness in the algorithm. If you have any idea how to improve that, it would be welcome. But in real world scenarios, it doesn't seem to arise?

Re: HNSW questions

2023-04-19 Thread Michael Sokolov
These vector values have internal buffers they use to return the vectors. In order to compare two vectors we need to use two independent sources so that one doesn't overwrite this internal state when fetching the second vector. Sorry I forgot the second question and can't see it on my phone. Brb
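
A minimal sketch of the hazard, assuming a reader that decodes each vector into one reused `float[]` to avoid per-call allocation. `ReusingVectorReader` is hypothetical, only loosely modeled on that pattern. Fetching a second vector through the same reader overwrites the first, so a similarity computed from a single reader compares a vector with itself. Two independent readers avoid this.

```java
/** Hypothetical reader that returns vectors through one reused buffer. */
final class ReusingVectorReader {
  private final float[][] storage; // stands in for on-disk vector data
  private final float[] buffer;

  ReusingVectorReader(float[][] storage) {
    this.storage = storage;
    this.buffer = new float[storage[0].length];
  }

  /** Copies vector ord into the shared buffer and returns that buffer, not a copy. */
  float[] vectorValue(int ord) {
    System.arraycopy(storage[ord], 0, buffer, 0, buffer.length);
    return buffer;
  }

  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

With orthogonal vectors `{1, 0}` and `{0, 1}`, calling `dotProduct(r.vectorValue(0), r.vectorValue(1))` on one reader returns 1 (vector 1 against itself), while using one reader per argument gives the expected 0.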

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-12 Thread Michael Sokolov
0.006306684575974941,0.020492585375905037,-0.029064252972602844 >>> >>> -0.08239810913801193,-0.01947402022778988,0.03827739879488945,-0.020566290244460106 >>> >>> -0.007012288551777601,-0.02666585892435,0.044495150446891785,-0.038030195981264114 >>

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-11 Thread Michael Sokolov
ngs I would consider to have technical merit that I don't hear: >>> >>> Impact on the speed of **other** indexing operations. (devaluation of other >>> functionality) >>> Actual scenarios that work when the limit is low and fail when the limit is >>> hi

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-10 Thread Michael Sokolov
ingface.co/sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco I did see some other larger-dimensional model, but they all seem to involve images+text. On Mon, Apr 10, 2023 at 9:54 AM Michael Sokolov wrote: > > I think concatenating word-embedding vectors is a reasonable thing to > do.

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-10 Thread Michael Sokolov
I think concatenating word-embedding vectors is a reasonable thing to do. It captures information about the sequence of tokens which is being lost by the current approach (summing them). Random article I found in a search
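
A minimal sketch of the argument, with hypothetical helper names. Summing per-token embeddings is order-invariant, so "dog bites man" and "man bites dog" get the same vector. Concatenation keeps token positions, at the cost of a dimension that grows with the number of tokens, which is presumably how such vectors run into a fixed dimension limit.

```java
/** Hypothetical helpers contrasting two ways to pool token embeddings. */
final class EmbeddingPooling {
  private EmbeddingPooling() {}

  /** Element-wise sum: the dimension stays fixed, but token order is lost. */
  static float[] sum(float[][] tokens) {
    float[] out = new float[tokens[0].length];
    for (float[] t : tokens) {
      for (int i = 0; i < out.length; i++) {
        out[i] += t[i];
      }
    }
    return out;
  }

  /** Concatenation: dimension = token count x per-token dimension; token order is kept. */
  static float[] concat(float[][] tokens) {
    int dim = tokens[0].length;
    float[] out = new float[tokens.length * dim];
    for (int t = 0; t < tokens.length; t++) {
      System.arraycopy(tokens[t], 0, out, t * dim, dim);
    }
    return out;
  }
}
```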

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Michael Sokolov
> >> > > >> Attacking me isn't helping the situation. > > >> > > >> PS: when i said the "one guy who wrote the code" I didn't mean it in > > >> any kind of demeaning fashion really. I meant to describe the current > > >> state

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Michael Sokolov
respect to indexing a few million docs with >>> high dimensions. You can scroll up the thread and see that at least >>> one other committer on the project experienced similar pain as me. >>> Then, think about users who aren't committers trying to use the >

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Michael Sokolov
What you said about increasing dimensions requiring a bigger ram buffer on merge is wrong. That's the point I was trying to make. Your concerns about merge costs are not wrong, but your conclusion that we need to limit dimensions is not justified. You complain that hnsw sucks it doesn't scale,

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Michael Sokolov
one more data point: 32M 100dim (fp32) vectors indexed in 1h20m (M=16, IW cache=1994, heap=4GB) On Fri, Apr 7, 2023 at 8:52 AM Michael Sokolov wrote: > > I also want to add that we do impose some other limits on graph > construction to help ensure that HNSW-based vector fiel

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Michael Sokolov
I also want to add that we do impose some other limits on graph construction to help ensure that HNSW-based vector fields remain manageable; M is limited to <= 512, and maximum segment size also helps limit merge costs On Fri, Apr 7, 2023 at 7:45 AM Michael Sokolov wrote: > > Thanks
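
A minimal sketch of the kind of guard described above, with hypothetical names. The bound of 512 is the one quoted in the message; the exact check, exception type, and message in Lucene's HNSW vectors format may differ.

```java
/** Hypothetical parameter check mirroring the "M <= 512" limit quoted above. */
final class HnswParams {
  static final int MAX_MAX_CONN = 512;

  private HnswParams() {}

  /** Returns maxConn unchanged if it lies in [1, MAX_MAX_CONN], otherwise throws. */
  static int checkMaxConn(int maxConn) {
    if (maxConn <= 0 || maxConn > MAX_MAX_CONN) {
      throw new IllegalArgumentException(
          "maxConn must be in [1, " + MAX_MAX_CONN + "], got " + maxConn);
    }
    return maxConn;
  }
}
```

Bounding M caps the number of neighbor links stored per node, which in turn bounds graph memory and merge work per vector.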

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Michael Sokolov
>>> >>> Great, thank you! >>> >>> How much RAM; etc. did you run this test on? >>> >>> Do the vectors really have to be based on real data for testing the >>> indexing? >>> I understand, if you want to test the quality of th

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Sokolov
I'm trying to run a test. I indexed 8M 100d float32 vectors in ~20 minutes with a single thread. I have some 256K vectors, but only about 2M of them. Can anybody point me to a large set (say 8M+) of 1024+ dim vectors I can use for testing? If all else fails I can test with noise, but that tends to

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Sokolov
Thanks > > Michael > > > > On 06.04.23 at 16:11, Michael Sokolov wrote: > > re: how does this HNSW stuff scale - I think people are calling out > > indexing memory usage here, so let's discuss some facts. During > > initial indexing we hold in RAM all th

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Sokolov
re: how does this HNSW stuff scale - I think people are calling out indexing memory usage here, so let's discuss some facts. During initial indexing we hold in RAM all the vector data and the graph constructed from the new documents, but this is accounted for and limited by the size of
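
A rough back-of-the-envelope sketch under a deliberately simplified, hypothetical model. Only raw float32 vector bytes are counted (graph, doc-ID, and other per-document overhead are ignored), and a new segment is flushed whenever that total fills a fixed RAM buffer, so the result is a lower bound on flushed segments. The inputs mirror the data point in this thread of 32M 100-dimensional float32 vectors with `IW cache=1994`, read here as a 1994 MiB writer buffer (an assumption).

```java
/** Hypothetical estimate of raw vector memory and RAM-buffer flushes during indexing. */
final class VectorRamEstimate {
  private VectorRamEstimate() {}

  /** Raw bytes for numVectors vectors of dim components, each bytesPerComponent wide. */
  static long rawVectorBytes(long numVectors, int dim, int bytesPerComponent) {
    return numVectors * dim * bytesPerComponent;
  }

  /** Lower bound on flushes if only raw vector bytes counted against the buffer. */
  static long minFlushes(long totalBytes, long ramBufferBytes) {
    return (totalBytes + ramBufferBytes - 1) / ramBufferBytes; // ceiling division
  }
}
```

Under these assumptions, 32M x 100 x 4 bytes is 12,800,000,000 bytes of raw vectors, or at least 7 flushes with a 1994 MiB buffer. The same formula gives 32,768,000,000 bytes (about 30.5 GiB) for 8M vectors of 1024 dimensions.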

Re: question about impacts use case

2023-04-01 Thread Michael Sokolov
in my wrapping Query to assert this, and I can see it has some effect. Anyway I am seeing *some* skipping, which is tantalizing. On Sat, Apr 1, 2023 at 10:00 AM Michael Sokolov wrote: > > Hi, I've been working on seeing whether we can make use of impacts in > Amazon search and I have some

question about impacts use case

2023-04-01 Thread Michael Sokolov
Hi, I've been working on seeing whether we can make use of impacts in Amazon search and I have some questions. To date, we haven't used Lucene's scoring APIs at all; all of our queries are constant score, we early terminate based on a sorted index rank and then re-rank using custom non-Lucene
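
A simplified, hypothetical sketch of the kind of skipping that impacts make possible; it is not Lucene's actual `ImpactsEnum`-based implementation. Each block of postings carries an upper bound on the scores inside it. Once k hits are collected, any block whose bound cannot beat the current k-th best score is never scored. Ties are assumed to keep the earlier document.

```java
import java.util.PriorityQueue;

/** Hypothetical block-max skipping: each block stores an upper bound on its scores. */
final class BlockMaxTopK {
  private BlockMaxTopK() {}

  /** The k best scores (descending) and how many blocks were actually scored. */
  static final class Result {
    final float[] topScores;
    final int blocksScored;

    Result(float[] topScores, int blocksScored) {
      this.topScores = topScores;
      this.blocksScored = blocksScored;
    }
  }

  /** blocks[b] holds the document scores of block b; blockMax[b] bounds them from above. */
  static Result topK(float[][] blocks, float[] blockMax, int k) {
    PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the k best scores
    int scored = 0;
    for (int b = 0; b < blocks.length; b++) {
      if (heap.size() == k && blockMax[b] <= heap.peek()) {
        continue; // nothing in this block can enter the top k
      }
      scored++;
      for (float score : blocks[b]) {
        if (heap.size() < k) {
          heap.add(score);
        } else if (score > heap.peek()) {
          heap.poll();
          heap.add(score);
        }
      }
    }
    float[] top = new float[heap.size()];
    for (int i = top.length - 1; i >= 0; i--) {
      top[i] = heap.poll(); // smallest first, filled from the end: descending order
    }
    return new Result(top, scored);
  }
}
```

With blocks `{5, 3}`, `{1, 2}`, `{4, 6}`, `{0.5}` and k = 2, only the first and third blocks are scored and the result is `{6, 5}`.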

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-01 Thread Michael Sokolov
I'm also in favor of raising this limit. We do see some datasets with higher than 1024 dims. I also think we need to keep a limit. For example we currently need to keep all the vectors in RAM while indexing and we want to be able to support reasonable numbers of vectors in an index segment. Also

Re: [GitHub] [lucene] david-sitsky commented on issue #12185: Using DirectIODirectory results in BufferOverflowException

2023-03-22 Thread Michael Sokolov
Using direct I/O with NFS makes no sense at all to me; I think that is the problem in a nutshell. Direct I/O tries to bypass the operating system's buffers, but that's not going to play nicely with NFS. On Wed, Mar 22, 2023, 4:38 PM david-sitsky (via GitHub) wrote: > > david-sitsky commented on

Re: Welcome Ben Trent as Lucene committer

2023-01-27 Thread Michael Sokolov
Welcome, Ben! Congratulations On Fri, Jan 27, 2023 at 4:52 PM Anshum Gupta wrote: > > Congratulations and welcome, Ben! > > On Fri, Jan 27, 2023 at 7:18 AM Adrien Grand wrote: >> >> I'm pleased to announce that Ben Trent has accepted the PMC's >> invitation to become a committer. >> >> Ben, the

Re: Is there a way to customize segment names?

2022-12-16 Thread Michael Sokolov
+1 trying to coordinate multiple writers running independently will not work. My 2c for availability: you can have a single primary active writer with a backup one waiting, receiving all the segments from the primary. Then if the primary goes down, the secondary one has the most recent commit

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-18 Thread Michael Sokolov
but it seems > like it's due to "can't connect to the agent: IPC connect call failed" > actually, which suggests an issue with the GPG agent? > > On Fri, Nov 18, 2022 at 3:00 PM Michael Sokolov wrote: >> >> I got this message when initially downloading the artifact

Re: [GitHub] [lucene] rmuir commented on pull request #11946: add similarity threshold for hnsw

2022-11-18 Thread Michael Sokolov
What I have in mind would be to implement entirely in the KnnVectorQuery. Since results are sorted by score, they can easily be post-filtered there: no need to implement anything at the codec layer I think. On Thu, Nov 17, 2022 at 10:10 AM GitBox wrote: > > > rmuir commented on PR #11946: > URL:
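
A minimal sketch of the post-filtering idea, assuming hits arrive sorted by descending score as described. The `Hit` class and `aboveThreshold` helper are hypothetical and not part of `KnnVectorQuery`. Because the list is sorted, filtering reduces to keeping the longest prefix whose scores meet the threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-filter for kNN hits already sorted by descending score. */
final class ThresholdFilter {
  private ThresholdFilter() {}

  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Keeps the prefix of hits with score >= minScore; stops at the first lower score. */
  static List<Hit> aboveThreshold(List<Hit> sortedHits, float minScore) {
    List<Hit> kept = new ArrayList<>();
    for (Hit hit : sortedHits) {
      if (hit.score < minScore) {
        break; // sorted descending, so no later hit can qualify
      }
      kept.add(hit);
    }
    return kept;
  }
}
```

This only trims the hits the graph search already returned. It cannot surface additional matches above the threshold beyond the top k.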

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-18 Thread Michael Sokolov
I got this message when initially downloading the artifacts: Downloading https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.2-RC1-rev-858d9b437047a577fa9457089afff43eefa461db/lucene/lucene-9.4.2-src.tgz.asc File:

Re: HNSW search with threshold

2022-11-11 Thread Michael Sokolov
ld be hard to predict whether a given radius would actually match a >>> small set of vectors. Should the query still require a `k` value in >>> addition to the radius to make sure it doesn't go wild? >>> >>> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko wrote:

Re: Release Lucene 9.4.2

2022-11-11 Thread Michael Sokolov
+1 makes sense. I do think given this is the second similar-flavored bug we've found that we should be thorough and try to get them all rather than having a 9.4.3 ... On Wed, Nov 9, 2022 at 10:25 AM Julie Tibshirani wrote: > > +1 from me for a bugfix release once we've solidified testing. Thanks

Re: HNSW search with threshold

2022-11-07 Thread Michael Sokolov
+1 to adding a scoring threshold. I think it could be another parameter to KnnVectorQuery. Do you want to have a try at adding this? If so, please feel free to open a PR and I will be happy to guide you. On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko wrote: > > Hi! > > There are some use cases

Re: Dense union of doc IDs

2022-11-04 Thread Michael Sokolov
It sounds like a lot of complexity to handle an unusual edge case, but ... I guess this actually happened? Can you give any sense of the end-user behavior that caused it? On Fri, Nov 4, 2022 at 2:26 AM Patrick Zhai wrote: > > Hi Froh, > > The idea sounds reasonable to me, altho I wonder whether

Re: HNSW and Multi-segments

2022-11-03 Thread Michael Sokolov
The way I think of this is that segmenting the graph will generally lead to higher recall and higher costs (at query time) for a given set of HNSW parameters. Indexing costs will tend to be lower for multiple segmented graphs. I don't think that increased irrelevant docs should be a concern since
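
A hypothetical sketch, not Lucene's actual search path, of why query cost and recall both tend to grow with the number of segments. Each segment's graph is searched for its own top k, so S segments produce up to S x k candidates that are then merged into one global top k.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical merge of per-segment top-k score lists into a global top k. */
final class SegmentMerge {
  private SegmentMerge() {}

  /** Each array holds one segment's top scores; returns the global top k, descending. */
  static float[] mergeTopK(float[][] perSegment, int k) {
    PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the best k seen so far
    for (float[] segment : perSegment) {
      for (float score : segment) {
        heap.add(score);
        if (heap.size() > k) {
          heap.poll(); // drop the current smallest
        }
      }
    }
    float[] out = new float[heap.size()];
    for (int i = out.length - 1; i >= 0; i--) {
      out[i] = heap.poll();
    }
    return out;
  }

  /** Candidates produced before merging: one top-k list per segment. */
  static int candidatesExamined(float[][] perSegment) {
    return Arrays.stream(perSegment).mapToInt(s -> s.length).sum();
  }
}
```

With three segments each returning k = 2 hits, six candidates are produced to fill two global slots. That is more work than a single graph would do, but also more chances to find the true neighbors.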

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
, especially when it > would only improve the ternary "if" feature in such cases. > > On Wed, Oct 26, 2022 at 10:23 AM Michael Sokolov wrote: > > > > see https://github.com/apache/lucene/pull/11878 ... it doesn't do what > > I initially asked for (still adv

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
see https://github.com/apache/lucene/pull/11878 ... it doesn't do what I initially asked for (still advances all of the operands), but it delays until doubleValue() is called, which is safe and could have some impact On Wed, Oct 26, 2022 at 9:58 AM Michael Sokolov wrote: > > Hi, yes, makes

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Michael Sokolov
s, and actually advancing on doubleValue() only. > > On Tue, Oct 25, 2022 at 8:13 PM Michael Sokolov wrote: >> >> ExpressionFunctionValueSource lazily evaluates in doubleValues: an >> expression like >> >>condition ? f1 : f2 >> >> will only eva

Expressions greedy advanceExact implementation

2022-10-25 Thread Michael Sokolov
ExpressionFunctionValueSource lazily evaluates in doubleValues: an expression like condition ? f1 : f2 will only evaluate one of f1 or f2. At the same time, the advanceExact() call is greedy -- when you advance that expression it will also advance both f1 and f2. But here's the thing: it
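
A minimal sketch of the lazy alternative, using a hypothetical `Values` interface loosely modeled on Lucene's `DoubleValues` (`advanceExact` followed by `doubleValue`), without its `IOException`. `LazyConditional` only records the target document in `advanceExact`. It advances the condition and just the selected branch when `doubleValue()` is called. This goes one step beyond the PR mentioned above, which still advances every operand but defers that until `doubleValue()`. The sketch treats every document as having a value and assumes documents are visited in increasing order.

```java
/** Hypothetical per-document value source, loosely shaped like Lucene's DoubleValues. */
interface Values {
  boolean advanceExact(int doc);

  double doubleValue();
}

/** Wraps an array of per-document values and counts advanceExact calls. */
final class CountingValues implements Values {
  private final double[] perDoc;
  private int doc = -1;
  int advances;

  CountingValues(double[] perDoc) {
    this.perDoc = perDoc;
  }

  @Override
  public boolean advanceExact(int target) {
    advances++;
    doc = target;
    return true;
  }

  @Override
  public double doubleValue() {
    return perDoc[doc];
  }
}

/** condition ? ifTrue : ifFalse, advancing only the operands that are actually read. */
final class LazyConditional implements Values {
  private final Values condition, ifTrue, ifFalse;
  private int doc = -1;

  LazyConditional(Values condition, Values ifTrue, Values ifFalse) {
    this.condition = condition;
    this.ifTrue = ifTrue;
    this.ifFalse = ifFalse;
  }

  @Override
  public boolean advanceExact(int target) {
    doc = target; // defer all operand advancing until a value is requested
    return true;
  }

  @Override
  public double doubleValue() {
    boolean cond = condition.advanceExact(doc) && condition.doubleValue() != 0;
    Values branch = cond ? ifTrue : ifFalse;
    return branch.advanceExact(doc) ? branch.doubleValue() : 0;
  }
}
```

Evaluating document 0 (condition true) and then document 1 (condition false) advances the condition twice but each branch only once.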

Re: [VOTE] Release Lucene 9.4.1 RC1

2022-10-21 Thread Michael Sokolov
SUCCESS! [0:49:28.580122] +1 On Fri, Oct 21, 2022 at 5:57 AM Robert Muir wrote: > > I change my vote to +1 based on Julie's test. It fails for me with > 9.4.0 and passes for me with 9.4.1 > > :lucene:core:test (SUCCESS): 1 test(s) > > > Task :lucene:core:wipeTaskTemp > The slowest tests

Re: call for 9.4.1 release (bug in vectors format)

2022-10-18 Thread Michael Sokolov
Oh no! Very sorry -- thank you for volunteering to fix (hangs head in shame). I guess I'll see where the bug is soon ... On Tue, Oct 18, 2022 at 2:50 PM Michael Wechner wrote: > > +1 :-) > > Thanks > > Michael > > On 18.10.22 at 19:52, Julie Tibshirani wrote: > > Hi everyone, > > > > We

Re: Welcome Luca Cavanna as Lucene committer

2022-10-06 Thread Michael Sokolov
Welcome Luca! On Thu, Oct 6, 2022, 1:05 AM 陆徐刚 wrote: > Welcome! > > Xugang > > https://github.com/LuXugang > > On Oct 6, 2022, at 13:59, Mikhail Khludnev wrote: > > > Welcome, Luca. > > On Wed, Oct 5, 2022 at 8:04 PM Adrien Grand wrote: > >> I'm pleased to announce that Luca Cavanna has

[ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Sokolov
The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting,

[RESULT] [VOTE] Release Lucene 9.4.0 RC3

2022-09-30 Thread Michael Sokolov
It's been >72h since the vote was initiated and the result is: +1: 8 (7 binding); 0: 0; -1: 0. This vote has PASSED.

Re: [VOTE] Release Lucene 9.4.0 RC3

2022-09-28 Thread Michael Sokolov
s >>>> >>>> http://blog.mikemccandless.com >>>> >>>> >>>> On Tue, Sep 27, 2022 at 3:45 PM Anshum Gupta >>>> wrote: >>>>> >>>>> +1 (binding) >>>>> >>>>> Smoketester is hap

[VOTE] Release Lucene 9.4.0 RC3

2022-09-27 Thread Michael Sokolov
Please vote for release candidate 3 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC3-rev-d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 You can run the smoke tester directly with this command: python3 -u

Re: [VOTE] Release Lucene 9.4.0 RC2

2022-09-27 Thread Michael Sokolov
>> LatLonPoint field, see https://github.com/apache/lucene/issues/11824. >>> >>> It feels like an important regression so it might be worth a respinning. >>> Sorry about that. >>> >>> >>> On Mon, Sep 26, 2022 at 10:30 PM Anshum Gupta

[VOTE] Release Lucene 9.4.0 RC2

2022-09-26 Thread Michael Sokolov
Please vote for release candidate 2 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC2-rev-0384b4fcad7856ddc574c8b994c814a568ce6d0a You can run the smoke tester directly with this command: python3 -u

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
On 26.09.2022 at 15:51, Michael Sokolov wrote: > > Hm the build failed with this: > > > > FAILURE: Build failed with an exception. > > > > * What went wrong: > > Execution failed for task ':lucene:core:compileMain19Java'. > >> Error while evaluating proper

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
in our build scripts? If I install will it autodetect?? On Mon, Sep 26, 2022 at 9:36 AM Michael Sokolov wrote: > > Nice! Thanks everyone, I will refresh and start building the artifacts > > On Mon, Sep 26, 2022 at 9:33 AM Uwe Schindler wrote: > > > > OK, > > > >

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
gt;>>> >>> >>>>> (no vote) >>> >>>>> >>> >>>>> SUCCESS! [1:12:31.588303] >>> >>>>> >>> >>>>> >>> >>>>> On Thu, Sep 22, 2022 at 2:27 AM Ignacio Ve

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-26 Thread Michael Sokolov
Michael McCandless >>>>>> wrote: >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>> >>>>>>> SUCCESS! [0:27:16.514391] >>>>>>> >>>>>>> >>>

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-21 Thread Michael Sokolov
ase. The vote is still ongoing, so we > > have all options. > > > > Uwe > > > > On 21.09.2022 at 14:05, Michael Sokolov wrote: > >> I see; I would kind of like to get the release out before ApacheCon > >> NA, which starts Oct 3. Do you think it's lik

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-21 Thread Michael Sokolov
ith JDK 19. No risk, it only activates when you enable it. > > Thoughts? > > Uwe > > On 02.09.2022 at 21:42, Michael Sokolov wrote: > > NOTICE: > > Branch branch_9_4 has been cut and versions updated to 9.5 on stable branch. > > Please observe the normal r

[VOTE] Release Lucene 9.4.0 RC1

2022-09-20 Thread Michael Sokolov
Please vote for release candidate 1 for Lucene 9.4.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.0-RC1-rev-f5d0646daa5651f2192282ac85551bca667e34f9 You can run the smoke tester directly with this command: python3 -u

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-20 Thread Michael Sokolov
ublish my local ann-benchmarks set-up so that >> it's not so fragile! >> >> In summary, with your latest fix the recall and QPS look good to me -- I >> don't detect any regression between 9.3 and 9.4. >> >> Julie >> >> On Mon, Sep 19, 2022 a

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-19 Thread Michael Sokolov
orrow to > double-check there's no drop. It would also be nice to formalize the > ann-benchmarks set-up and run it regularly (like we've discussed in > https://github.com/apache/lucene/issues/10665). > > Julie > > On Mon, Sep 19, 2022 at 10:33 AM Michael Sokolov > wro

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-19 Thread Michael Sokolov
95.236
> n_cands=120 0.843 948.908 0.843 525.914
> n_cands=200 0.878 671.781 0.878 351.529
> n_cands=400 0.918 392.265 0.918 207.854
> n_cands=600 0.937 282.403 0.937 144.311
> n_cands=800 0.949 214.620 0.949 116.875
> > On Sun, Sep 18, 2022 at 6:25 PM Michael Sokolov > wrote: >

Re: Subject: New branch and feature freeze for Lucene 9.4.0

2022-09-18 Thread Michael Sokolov
operations? It would be a little surprising if that were the case given the small number of branchings compared to the number of multiplies in dot-product though. On Sun, Sep 18, 2022 at 3:25 PM Michael Sokolov wrote: > > Thanks for the deep-dive Julie. I was able to reproduce the ch
