[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084450#comment-17084450 ] Erick Erickson commented on LUCENE-7788: [~dsmiley] Well, it _is_ silly to test error and fatal level messages so I won't flag those. Oh, and as I find egregious patterns that don't really count, I'm adding them to the list of things NOT to report. For instance, lots of the test log messages have timeunit conversions, which don't matter. > fail precommit on unparameterised log messages and examine for wasted > work/objects > -- > > Key: LUCENE-7788 > URL: https://issues.apache.org/jira/browse/LUCENE-7788 > Project: Lucene - Core > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Erick Erickson >Priority: Minor > Attachments: LUCENE-7788.patch, LUCENE-7788.patch, gradle_only.patch, > gradle_only.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > SOLR-10415 would be removing existing unparameterised log.trace messages use > and once that is in place then this ticket's one-line change would be for > 'ant precommit' to reject any future unparameterised log.trace message use. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084447#comment-17084447 ] Erick Erickson commented on LUCENE-7788: Fixed, gradle_only.patch
[jira] [Updated] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-7788: --- Attachment: gradle_only.patch
[jira] [Comment Edited] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084442#comment-17084442 ] Erick Erickson edited comment on LUCENE-7788 at 4/16/20, 12:09 AM: --- I'm in a bit of an awkward spot: my fork has a ton of changes unrelated to just incorporating Gradle. So I'm attaching a separate patch that only contains the gradle bits in the hope that [~dweiss] (or anyone more gradle-knowledgeable than me) will take a peek at it. If it's OK (or at least a good place to start), I'll fold it into my fork for the next commit, which will be Real Soon Now, like Friday. Please don't bother with how the check actually works; all I'm really asking is whether this looks like something that doesn't violate Gradle norms too violently. NOTE: Checking file paths rather than projects for inclusion/exclusion is temporary; going by project at this point is too big a chunk. I'll change it to be project-based before I'm done. Oh, I just noticed that the error message gets printed and the build stops even when executing other targets; I'm fixing that.
[jira] [Updated] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-7788: --- Attachment: gradle_only.patch Status: Open (was: Open) I'm in a bit of an awkward spot: my fork has a ton of changes unrelated to just incorporating Gradle. So I'm attaching a separate patch that only contains the gradle bits in the hope that [~dweiss] (or anyone more gradle-knowledgeable than me) will take a peek at it. If it's OK (or at least a good place to start), I'll fold it into my fork for the next commit, which will be Real Soon Now, like Friday. Please don't bother with how the check actually works; all I'm really asking is whether this looks like something that doesn't violate Gradle norms too violently. NOTE: Checking file paths rather than projects for inclusion/exclusion is temporary; going by project at this point is too big a chunk. I'll change it to be project-based before I'm done.
[jira] [Comment Edited] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084423#comment-17084423 ] Erick Erickson edited comment on LUCENE-7788 at 4/15/20, 11:02 PM: --- My current thinking:
1> I think I've got the gradle task in place. Sometime in the next couple of days I'll put up a preliminary version and ask for review.
2> Given that the Lucene code only takes a couple of seconds to run, I'll leave it in.
3> My current Gradle integration _requires_ a relative path, e.g. "gradlew validateLoggingCalls -Psolr/core/src/java/org/apache/solr/response". Before it's done, I'll change it to "the gradle way" of specifying a project rather than a directory, defaulting to all. But right now projects are too big.
3a> Really, if the path to any java file anywhere contains whatever srcDir is, it'll check the file. This is temporary so there's no need to refine it IMO.
4> This is not part of the standard check/precommit yet. It will be before I'm done.
5> I'm not happy at all with the //verify tag. First of all, exactly when it's OK to use it isn't clear at all. So I've changed my mind (again) and I'll change the check to be that the call must not have "+" signs or method calls _unless_ it's surrounded by "if (log.is*Enabled)". I think that's a much easier rule to understand. I'll also add a check that the log level corresponds to the if clause when used.
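For anyone trying to picture the rule from point 5 (no "+" signs or method calls in a log call unless it sits under a matching "if (log.is*Enabled)"), here is a rough, self-contained sketch. It is not the check from the patch: the class name LogCallRuleSketch, the line-by-line regex approach, the assumption that the logger variable is called "log", and the assumption that a guard only covers the same line or the very next line are all made up for illustration. Error-level calls are left out, in line with the comment above about not flagging error and fatal messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the rule in point 5; NOT the check from the patch.
// Assumes the logger variable is named "log" and each call fits on one line.
class LogCallRuleSketch {
  // Matches log.trace/debug/info/warn(...); error and fatal are deliberately not checked.
  private static final Pattern LOG_CALL =
      Pattern.compile("\\blog\\.(trace|debug|info|warn)\\((.*)\\);");
  // Matches a guard such as "if (log.isDebugEnabled())" and captures the level name.
  private static final Pattern GUARD =
      Pattern.compile("if\\s*\\(\\s*log\\.is(Trace|Debug|Info|Warn)Enabled\\(\\)\\s*\\)");
  // Crude stand-in for "contains a method call": an identifier followed by "(".
  private static final Pattern METHOD_CALL = Pattern.compile("[A-Za-z_]\\w*\\s*\\(");

  static List<String> check(List<String> lines) {
    List<String> problems = new ArrayList<>();
    String previousGuard = null; // guard level seen on the immediately preceding line
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      Matcher guard = GUARD.matcher(line);
      String sameLineGuard = guard.find() ? guard.group(1) : null;
      String activeGuard = sameLineGuard != null ? sameLineGuard : previousGuard;

      Matcher call = LOG_CALL.matcher(line);
      boolean isCall = call.find();
      if (isCall) {
        String level = call.group(1);
        // Blank out string literals so a "+" inside a message is not counted.
        String args = call.group(2).replaceAll("\"(\\\\.|[^\"\\\\])*\"", "\"\"");
        boolean wasteful = args.contains("+") || METHOD_CALL.matcher(args).find();
        if (activeGuard == null) {
          if (wasteful) {
            problems.add((i + 1) + ": unguarded concatenation or method call");
          }
        } else if (!activeGuard.equalsIgnoreCase(level)) {
          problems.add((i + 1) + ": guard level does not match call level");
        }
      }
      // A guard only carries over to the next line if this line was not itself a call.
      previousGuard = isCall ? null : sameLineGuard;
    }
    return problems;
  }
}
```

Feeding it a parameterised call such as log.info("count={}", count) reports nothing, while log.debug("x " + y) with no guard is reported, and a debug call under an isInfoEnabled guard is reported as a level mismatch.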
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084428#comment-17084428 ] Erick Erickson commented on LUCENE-7788: Why? What's the advantage of having yet another idiom that is sometimes one way and sometimes another? We have far too many nooks and crannies in Solr that are opaque, and I'm reluctant to add yet another one. And note that I'm proposing adding the "if" clause _only_ if the logging message contains a method call, or can't be rewritten to avoid the string concatenation. I'm not proposing wrapping every logging call in an if clause. Plus, the number of INFO level logging calls completely dominates finer-grained calls. It's perfectly reasonable to run at WARN level, expect that the WARN messages are something you should pay attention to, and turn on INFO when needed. The entire discussion about whether we should look at the thousands of logging calls and figure out which ones should be at a different level is another topic, maybe SOLR-11934.
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084424#comment-17084424 ] David Smiley commented on LUCENE-7788: -- Can we skip this for info, warn & error please? These are generally logged by default anyway, thus this new check will be of even less value for the hassle it brings.
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084423#comment-17084423 ] Erick Erickson commented on LUCENE-7788: My current thinking:
1> I think I've got the gradle target in place. Sometime in the next couple of days I'll put up a preliminary version and ask for review.
2> Given that the Lucene code only takes a couple of seconds to run, I'll leave it in.
3> My current Gradle integration _requires_ a relative path, e.g. "gradlew validateLoggingCalls -Psolr/core/src/java/org/apache/solr/response". Before it's done, I'll change it to "the gradle way" of specifying a project rather than a directory, defaulting to all. But right now projects are too big.
3a> Really, if the path to any java file anywhere contains whatever targetDir is, it'll check the file. This is temporary so there's no need to refine it IMO.
4> This is not part of the standard check/precommit yet. It will be before I'm done.
5> I'm not happy at all with the //verify tag. First of all, exactly when it's OK to use it isn't clear at all. So I've changed my mind (again) and I'll change the check to be that the call must not have "+" signs or method calls _unless_ it's surrounded by "if (log.is*Enabled)". I think that's a much easier rule to understand. I'll also add a check that the log level corresponds to the if clause when used.
[jira] [Commented] (LUCENE-9316) Incorporate all :precommit tasks into :check
[ https://issues.apache.org/jira/browse/LUCENE-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084410#comment-17084410 ] David Smiley commented on LUCENE-9316: -- Yay! So should I basically assume that {{gradlew check -x test}} will be checking everything {{ant precommit}} does? Sorry if I didn't read everything here. What remains for our CI master branch builds to use gradle? > Incorporate all :precommit tasks into :check > > > Key: LUCENE-9316 > URL: https://issues.apache.org/jira/browse/LUCENE-9316 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: master (9.0) > >
[jira] [Commented] (LUCENE-7701) Refactor grouping collectors
[ https://issues.apache.org/jira/browse/LUCENE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084365#comment-17084365 ] Mikhail Khludnev commented on LUCENE-7701: -- Hi, [~romseygeek], would you mind if I follow up here? It turns out that if:
# {{group.truncate=true}}
# {{group.sort=docvalues_enabled_field asc}}
the following hotspot pops up:
{code}
"stackTrace":["org.apache.lucene.store.ByteBufferIndexInput.slice(ByteBufferIndexInput.java:268)",
"org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.slice(ByteBufferIndexInput.java:347)",
"org.apache.lucene.store.IndexInput.randomAccessSlice(IndexInput.java:122)",
"org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.<init>(Lucene80DocValuesProducer.java:943)",
"org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer.getSorted(Lucene80DocValuesProducer.java:750)",
"org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.getSorted(PerFieldDocValuesFormat.java:329)",
"org.apache.lucene.index.DocValues.getSorted(DocValues.java:367)",
"org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:709)",
"org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:714)",
"org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:266)",
"org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHeadsCollector.newGroupHead(AllGroupHeadsCollector.java:250)",
"org.apache.lucene.search.grouping.AllGroupHeadsCollector.collect(AllGroupHeadsCollector.java:133)",
"org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:440)",
"org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:158)",
"org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1399)",
"org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:386)",
{code}
I read it as follows: when the group collector encounters a new group, it creates a field comparator for this value, which opens DocValues, and that turns out to be way more expensive to open than the old -good- {{FieldCache}}. I think DocValues should somehow be reused between groups. WDYT? > Refactor grouping collectors > > > Key: LUCENE-7701 > URL: https://issues.apache.org/jira/browse/LUCENE-7701 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Priority: Major > Fix For: 7.0 > > Attachments: LUCENE-7701.patch, LUCENE-7701.patch > > > Grouping currently works via abstract collectors, which need to be overridden > for each way of defining a group - currently we have two, 'term' (based on > SortedDocValues) and 'function' (based on ValueSources). These collectors > all have a lot of repeated code, and means that if you want to implement your > own group definitions, you need to override four or five different classes. > This would be easier to deal with if instead the 'group selection' code was > abstracted out into a single interface, and the various collectors were > changed to concrete implementations.
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r409079589
## File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java ##
@@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
Review comment: Thanks, makes sense. Addressed in d7ef9b6
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
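To make the default-method idea in the javadoc above a little more concrete, here is a small sketch that can be run without Lucene on the classpath. Everything in it is a stand-in and an assumption: IntStream plays the role of DocIdSetIterator, an IntPredicate plays the role of a comparator that knows which documents are competitive, and the names LeafCollectorSketch, CollectAllSketch, SkippingCollectorSketch and FilterIteratorDemo are hypothetical. It only shows the shape of "default returns the same iterator, a collector may override to skip", not the code in the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Stand-in types only: IntStream plays the role of DocIdSetIterator and
// IntPredicate the role of a comparator that knows which docs are competitive.
interface LeafCollectorSketch {
  void collect(int doc);

  // Default: return the same iterator, i.e. the collector filters nothing.
  default IntStream filterIterator(IntStream scorerIterator) {
    return scorerIterator;
  }
}

// Collects every doc it is given and keeps the default filterIterator.
class CollectAllSketch implements LeafCollectorSketch {
  final List<Integer> collected = new ArrayList<>();

  @Override
  public void collect(int doc) {
    collected.add(doc);
  }
}

// Overrides filterIterator to delegate to its (hypothetical) competitiveness test.
class SkippingCollectorSketch extends CollectAllSketch {
  private final IntPredicate competitive;

  SkippingCollectorSketch(IntPredicate competitive) {
    this.competitive = competitive;
  }

  @Override
  public IntStream filterIterator(IntStream scorerIterator) {
    return scorerIterator.filter(competitive);
  }
}

class FilterIteratorDemo {
  // Iterates the (possibly filtered) view of docs 0..maxDoc-1 and collects each one.
  static List<Integer> run(CollectAllSketch collector, int maxDoc) {
    collector.filterIterator(IntStream.range(0, maxDoc)).forEach(collector::collect);
    return collector.collected;
  }
}
```

With the default, every document reaches collect(); with the overriding collector, only documents that pass the predicate do.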
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084204#comment-17084204 ] Tomoko Uchida commented on LUCENE-9322: --- Hi [~jtibshirani], thank you for your hard work on this!
{code:java} TopDocs findNearestVectors(float[] queryVector, int k, int recallFactor) throws IOException; {code}
I like this interface. {{recallFactor}} might itself become an interface for further flexibility, but it's just an idea.
{quote}Why do we have different implementations of `VectorsFormat`, couldn’t we just add an enum to the field info like `Strategy.HNSW` and `Strategy.COARSE_QUANTIZATION`? {quote}
Personally I would prefer a unified file format for vectors, since it is (theoretically) independent of the higher-level ANN algorithms. Could we expose just one "Lucene90VectorsFormat" and low-level I/O, and make only the higher-level logic (o.a.l.a.index/document/search) customizable? Forward iteration is encouraged anyway...
{quote}What about different distance metrics like angular and L1 distance? {quote}
JFYI I previously implemented a switchable distance function on the HNSW branch, in case you have not noticed it: [https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/index/VectorValues.java]. It is implemented as an enum with a {{distance()}} function. Also, I think it would be good to persist (in the codec) which distance metric we use for the field.
{quote}How exactly is this used in a search? Where are the `Query` classes? This would be the next part of the API to design/ discuss. {quote}
We could follow o.a.l.a.index.PointValues's approach, in other words, concrete field classes with newXXXQuery() methods?
[https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/PointValues.java] The query part would also need some abstraction and there are many things to think through, so could we discuss it in a separate dedicated issue, to keep the scope here small? > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it.
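As a rough picture of what "an enum with a distance() function" could look like for the metrics mentioned in this thread, here is a minimal sketch. It is not the code in VectorValues.java on the HNSW branch: the enum name DistanceMetricSketch, the constant names, the helper checkSameLength, and the choice to define ANGULAR as the angle in radians between the two vectors are all assumptions made for illustration.

```java
// Hypothetical sketch of a switchable distance metric; names are illustrative only
// and do not mirror the enum on the HNSW branch.
enum DistanceMetricSketch {
  EUCLIDEAN {
    @Override
    double distance(float[] a, float[] b) {
      checkSameLength(a, b);
      double sum = 0;
      for (int i = 0; i < a.length; i++) {
        double d = a[i] - b[i];
        sum += d * d;
      }
      return Math.sqrt(sum);
    }
  },
  L1 {
    @Override
    double distance(float[] a, float[] b) {
      checkSameLength(a, b);
      double sum = 0;
      for (int i = 0; i < a.length; i++) {
        sum += Math.abs(a[i] - b[i]);
      }
      return sum;
    }
  },
  ANGULAR {
    // Assumption: "angular" here means the angle between the vectors, in radians.
    @Override
    double distance(float[] a, float[] b) {
      checkSameLength(a, b);
      double dot = 0, normA = 0, normB = 0;
      for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      if (normA == 0 || normB == 0) {
        throw new IllegalArgumentException("angle is undefined for a zero vector");
      }
      double cos = dot / Math.sqrt(normA * normB);
      // Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
      return Math.acos(Math.max(-1.0, Math.min(1.0, cos)));
    }
  };

  abstract double distance(float[] a, float[] b);

  static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " vs " + b.length);
    }
  }
}
```

One way the "persist which metric we use" idea might be approached is to store the constant's name() per field and resolve it again with valueOf(); that is only a guess at a possible design, not something the thread settles.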
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r40656
## File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java ##
@@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
Review comment: Oh that's a good point. +1 to just return an iterator based on the comparator, and do the conjunction/combination in `BulkScorer`
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408887800
## File path: lucene/core/src/java/org/apache/lucene/search/ScoreMode.java ##
@@ -24,37 +24,53 @@
   /**
    * Produced scorers will allow visiting all matches and get their score.
    */
-  COMPLETE {
-    @Override
-    public boolean needsScores() {
-      return true;
-    }
-  },
+  COMPLETE(true, true),
   /**
    * Produced scorers will allow visiting all matches but scores won't be
    * available.
    */
-  COMPLETE_NO_SCORES {
-    @Override
-    public boolean needsScores() {
-      return false;
-    }
-  },
+  COMPLETE_NO_SCORES(true, false),
   /**
    * Produced scorers will optionally allow skipping over non-competitive
    * hits using the {@link Scorer#setMinCompetitiveScore(float)} API.
    */
-  TOP_SCORES {
-    @Override
-    public boolean needsScores() {
-      return true;
-    }
-  };
+  TOP_SCORES(false, true),
+
+  /**
+   * ScoreMode for top field collectors that can provide their own iterators,
+   * to optionally allow to skip for non-competitive docs
+   */
+  TOP_DOCS(false, false),
+
+  /**
+   * ScoreMode for top field collectors that can provide their own iterators,
+   * to optionally allow to skip for non-competitive docs.
+   * This mode is used when there is a secondary sort by _score.
+   */
+  TOP_DOCS_WITH_SCORES(false, true);
Review comment: But `TOP_SCORES` and `TOP_DOCS_WITH_SCORES` have identical `needsScores()` and `isExhaustive()` values, so I'm not sure why we need both?
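The review comment's point can be checked mechanically with a stripped-down copy of the enum from the diff. This is a sketch, not the real ScoreMode: the name ScoreModeSketch and the helper indistinguishablePairs() are hypothetical, and the constructor argument order (isExhaustive first, needsScores second) is inferred from the values in the diff, where COMPLETE_NO_SCORES is (true, false).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the enum in the diff; the argument order
// (isExhaustive, needsScores) is an inference, not taken from the real source.
enum ScoreModeSketch {
  COMPLETE(true, true),
  COMPLETE_NO_SCORES(true, false),
  TOP_SCORES(false, true),
  TOP_DOCS(false, false),
  TOP_DOCS_WITH_SCORES(false, true);

  private final boolean isExhaustive;
  private final boolean needsScores;

  ScoreModeSketch(boolean isExhaustive, boolean needsScores) {
    this.isExhaustive = isExhaustive;
    this.needsScores = needsScores;
  }

  boolean isExhaustive() {
    return isExhaustive;
  }

  boolean needsScores() {
    return needsScores;
  }

  // Lists every pair of modes that the two flags alone cannot tell apart.
  static List<String> indistinguishablePairs() {
    List<String> pairs = new ArrayList<>();
    ScoreModeSketch[] modes = values();
    for (int i = 0; i < modes.length; i++) {
      for (int j = i + 1; j < modes.length; j++) {
        if (modes[i].isExhaustive == modes[j].isExhaustive
            && modes[i].needsScores == modes[j].needsScores) {
          pairs.add(modes[i] + "/" + modes[j]);
        }
      }
    }
    return pairs;
  }
}
```

Under these assumptions the only pair the helper reports is TOP_SCORES and TOP_DOCS_WITH_SCORES, so any code that needs to tell them apart would have to compare the enum constants themselves rather than the two flags.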
[jira] [Commented] (LUCENE-9317) Resolve package name conflicts for StandardAnalyzer to allow Java module system support
[ https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084123#comment-17084123 ] David Ryan commented on LUCENE-9317: I did a new experiment. It looks quite severe; however, it could make sense. [https://github.com/oobles/lucene-solr/commit/5e25a9a9f4af9641b2ca01565060d4cb244b9266] The main changes are to move the following packages from common analysis to core:
* org.apache.lucene.analysis.core
* org.apache.lucene.analysis.custom
* org.apache.lucene.analysis.standard (Just StandardTokenizerFactory)
* org.apache.lucene.analysis.util
This is based on the following comments from Uwe: {quote}One reason why the factories should move to core is that once we did this, one no longer needs to depend on analyzers-common anymore. If he has a set of factories and tokenizers/filters and otherwise only requires the default ones, he can completely remove the huge common.jar file! Also public and commonly used abstract base classes should not be part of an optional module! {quote}
Potentially not all of the classes in those packages need to be moved; someone with more knowledge than me would need to decide. Moving these to core has the benefit of not needing to change any of their names, and it leaves StandardAnalyzer in place. No need for constructor changes either. The factory test cases will need to be updated so they don't rely on all the classes in common analysis. As the Tokenizers and TokenFilters are split over both jars, I tested to ensure both were loaded in the common analysis test cases, which of course they were.
The next changes look severe, but I think they have fewer flow-on effects. I've renamed org.apache.lucene.analysis to org.apache.lucene.common.analysis. This has the benefit of now matching the jar name and future module name. It removes any conflicts for packages. I moved classic from standard to the classic package. I moved the UAX29* classes from standard to email.
Both of these could have been left in oal.common.analysis.standard too. There are a few test cases I would need to fix if you think this approach is worth continuing, but I think it generally makes a lot of sense. Most test cases are passing and I added Ignore to a few that need updating. Apologies that the commit is difficult to review. I staged the changes and moves in one commit when I should have done it as moves then changes. Let me know if you'd rather I redo it. > Resolve package name conflicts for StandardAnalyzer to allow Java module > system support > --- > > Key: LUCENE-9317 > URL: https://issues.apache.org/jira/browse/LUCENE-9317 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: master (9.0) >Reporter: David Ryan >Priority: Major > Labels: build, features > > > To allow Lucene to be modularised there are a few preparatory tasks to be > completed prior to this being possible. The Java module system requires that > jars do not use the same package name in different jars. The lucene-core and > lucene-analyzers-common both share the package > org.apache.lucene.analysis.standard. > Possible resolutions to this issue are discussed by Uwe on the mailing list > here: > > [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E] > {quote}About StandardAnalyzer: Unfortunately I aggressively complained a > while back when Mike McCandless wanted to move standard analyzer out of the > analysis package into core (“for convenience”). This was a bad step, and IMHO > we should revert that or completely rename the packages and everything. The > problem here is: As the analysis services are only part of lucene-analyzers, > we had to leave the factory classes there, but move the implementation > classes in core. The package has to be the same. 
The only way around that is > to move the analysis factory framework also to core (I would not be against > that). This would include all factory base classes and the service loading > stuff. Then we can move standard analyzer and some of the filters/tokenizers > including their factories to core and that problem would be solved. > {quote} > There are two options here, either move the factory framework into core or revert > StandardAnalyzer back to lucene-analyzers. In the email, the solution lands > on reverting back as per the task list: > {quote}Add some preparatory issues to cleanup class hierarchy: Move Analysis > SPI to core / remove StandardAnalyzer and related classes out of core back to > analysis > {quote} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
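The split-package constraint described in this issue can be illustrated with a small check. The following is only a sketch under stated assumptions: the class name `SplitPackageCheck` and its helper are hypothetical, and the package listing per jar is an abbreviated illustration rather than the real jar contents. It reports every package name that appears in more than one jar, which is the condition the Java module system rejects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene or its build.
public class SplitPackageCheck {

    /** Returns each package found in more than one jar, mapped to the jars that contain it. */
    static Map<String, Set<String>> findSplitPackages(Map<String, Set<String>> packagesByJar) {
        Map<String, Set<String>> jarsByPackage = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : packagesByJar.entrySet()) {
            for (String pkg : entry.getValue()) {
                jarsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // Keep only packages that are split across two or more jars.
        jarsByPackage.values().removeIf(jars -> jars.size() < 2);
        return jarsByPackage;
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative package listing (an assumption, not the full jar contents).
        Map<String, Set<String>> packagesByJar = new LinkedHashMap<>();
        packagesByJar.put("lucene-core",
            Set.of("org.apache.lucene.analysis", "org.apache.lucene.analysis.standard"));
        packagesByJar.put("lucene-analyzers-common",
            Set.of("org.apache.lucene.analysis.standard", "org.apache.lucene.analysis.core"));

        // prints {org.apache.lucene.analysis.standard=[lucene-analyzers-common, lucene-core]}
        System.out.println(findSplitPackages(packagesByJar));
    }
}
```

Either resolution discussed in the issue (moving the factories into core, or renaming the packages in analyzers-common) would make a check like this return an empty map.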
[jira] [Resolved] (LUCENE-9271) Make BufferedIndexInput work on a ByteBuffer
[ https://issues.apache.org/jira/browse/LUCENE-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9271. -- Fix Version/s: 8.6 master (9.0) Resolution: Fixed > Make BufferedIndexInput work on a ByteBuffer > > > Key: LUCENE-9271 > URL: https://issues.apache.org/jira/browse/LUCENE-9271 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: master (9.0), 8.6 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently {{BufferedIndexInput}} works on a {{byte[]}} but its main > implementation, in NIOFSDirectory, has to implement a hack to maintain a > ByteBuffer view of it that it can use in calls to the FileChannel API. Maybe > we should instead make {{BufferedIndexInput}} work directly on a > {{ByteBuffer}}? This would also help reuse the existing > {{ByteBuffer#get(|Short|Int|Long)}} methods instead of duplicating them from > {{DataInput}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
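To make the "reuse the existing ByteBuffer methods" point concrete, here is a minimal sketch, not the actual Lucene change. The class `ByteBufferReadSketch` and both helper methods are hypothetical, and a big-endian layout is assumed because that is `ByteBuffer`'s default byte order. It contrasts hand-decoding an int from a `byte[]` with calling the absolute `ByteBuffer#getInt(int)`.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; names are not taken from the Lucene code base.
public class ByteBufferReadSketch {

    /** Hand-rolled big-endian decode of four bytes, the kind of code a byte[]-backed reader needs. */
    static int readIntManually(byte[] buf, int pos) {
        return ((buf[pos] & 0xFF) << 24)
             | ((buf[pos + 1] & 0xFF) << 16)
             | ((buf[pos + 2] & 0xFF) << 8)
             | (buf[pos + 3] & 0xFF);
    }

    /** Same decode via the JDK; ByteBuffer is big-endian unless order(...) is changed. */
    static int readIntViaByteBuffer(ByteBuffer buf, int pos) {
        return buf.getInt(pos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x12, 0x34, 0x56, 0x78};
        ByteBuffer view = ByteBuffer.wrap(bytes);

        // Both lines print 12345678
        System.out.println(Integer.toHexString(readIntManually(bytes, 0)));
        System.out.println(Integer.toHexString(readIntViaByteBuffer(view, 0)));
    }
}
```

A reader backed directly by a `ByteBuffer` can also hand that same buffer to `FileChannel#read`, which is the part of the motivation about avoiding a separately maintained view.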
[jira] [Commented] (LUCENE-9260) Verify checksums of CFS files?
[ https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084115#comment-17084115 ] ASF subversion and git services commented on LUCENE-9260: - Commit 4a559ac0c43ae40b9a70db679cc84d05a2e5f440 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4a559ac ] LUCENE-9260: Verify checksums of CFS files. (#1311) > Verify checksums of CFS files? > -- > > Key: LUCENE-9260 > URL: https://issues.apache.org/jira/browse/LUCENE-9260 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > While CFS files write checksums in their footer, we never validate these > checksums. Can we verify them in LeafReader#checkIntegrity? > This checksum is a bit redundant with the checksums of the files that are > stored in the CFS file, but I'd rather verify some bytes multiple times than > have checksums that never get verified? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
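The idea of validating a checksum stored in a file footer can be sketched as follows. This is an illustration under assumptions, not Lucene's implementation: the class `FooterChecksumSketch` is hypothetical, and the layout used here (payload followed by an 8-byte CRC32 value) is a simplification rather than the real CFS footer format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of footer checksum verification.
public class FooterChecksumSketch {

    /** Appends an 8-byte CRC32 of the payload as a footer (assumed layout). */
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
            .put(payload)
            .putLong(crc.getValue())
            .array();
    }

    /** Recomputes the CRC32 over everything before the footer and compares it to the stored value. */
    static boolean verify(byte[] file) {
        if (file.length < Long.BYTES) {
            return false; // too short to even contain a footer
        }
        int payloadLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file).getLong(payloadLength);
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("compound file contents".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // prints true

        file[0] ^= 0x01; // simulate a single flipped bit in the payload
        System.out.println(verify(file)); // prints false
    }
}
```

As the issue notes, a check like this re-reads bytes that the inner files' own checksums already cover; the trade-off accepted there is some redundant work in exchange for never leaving a written checksum unverified.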
[jira] [Resolved] (LUCENE-9260) Verify checksums of CFS files?
[ https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9260. -- Fix Version/s: 8.6 master (9.0) Resolution: Fixed > Verify checksums of CFS files? > -- > > Key: LUCENE-9260 > URL: https://issues.apache.org/jira/browse/LUCENE-9260 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: master (9.0), 8.6 > > Time Spent: 20m > Remaining Estimate: 0h > > While CFS files write checksums in their footer, we never validate these > checksums. Can we verify them in LeafReader#checkIntegrity? > This checksum is a bit redundant with the checksums of the files that are > stored in the CFS file, but I'd rather verify some bytes multiple times than > have checksums that never get verified? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9307) Remove the ability to set the buffer size on an existing BufferedIndexInput
[ https://issues.apache.org/jira/browse/LUCENE-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9307. -- Fix Version/s: master (9.0) Resolution: Fixed > Remove the ability to set the buffer size on an existing BufferedIndexInput > --- > > Key: LUCENE-9307 > URL: https://issues.apache.org/jira/browse/LUCENE-9307 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > This feature is only used as an optimization when reading skip lists. Since > our default directory doesn't use buffering, I'd suggest removing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9260) Verify checksums of CFS files?
[ https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084070#comment-17084070 ] ASF subversion and git services commented on LUCENE-9260: - Commit 0aa4ba7ccb2f2c12e213ea34d76af378e55e3bf9 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0aa4ba7 ] LUCENE-9260: Verify checksums of CFS files. (#1311) > Verify checksums of CFS files? > -- > > Key: LUCENE-9260 > URL: https://issues.apache.org/jira/browse/LUCENE-9260 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > While CFS files write checksums in their footer, we never validate these > checksums. Can we verify them in LeafReader#checkIntegrity? > This checksum is a bit redundant with the checksums of the files that are > stored in the CFS file, but I'd rather verify some bytes multiple times than > have checksums that never get verified? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9307) Remove the ability to set the buffer size on an existing BufferedIndexInput
[ https://issues.apache.org/jira/browse/LUCENE-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084068#comment-17084068 ] ASF subversion and git services commented on LUCENE-9307: - Commit aa605b3c70fa5a4fa51761f318d134e387059e28 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=aa605b3 ] LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput (#1415) > Remove the ability to set the buffer size on an existing BufferedIndexInput > --- > > Key: LUCENE-9307 > URL: https://issues.apache.org/jira/browse/LUCENE-9307 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > This feature is only used as an optimization when reading skip lists. Since > our default directory doesn't use buffering, I'd suggest removing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz merged pull request #1311: LUCENE-9260: Verify checksums of CFS files.
jpountz merged pull request #1311: LUCENE-9260: Verify checksums of CFS files. URL: https://github.com/apache/lucene-solr/pull/1311 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz merged pull request #1415: LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput
jpountz merged pull request #1415: LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput URL: https://github.com/apache/lucene-solr/pull/1415 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14408) Refactor MoreLikeThisHandler Implementation
[ https://issues.apache.org/jira/browse/SOLR-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083995#comment-17083995 ] Nazerke Seidan commented on SOLR-14408: --- Just linked the PR. > Refactor MoreLikeThisHandler Implementation > --- > > Key: SOLR-14408 > URL: https://issues.apache.org/jira/browse/SOLR-14408 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: MoreLikeThis >Reporter: Nazerke Seidan >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The main goal of this refactoring is to improve the readability and accessibility of > the MoreLikeThisHandler class. The current MoreLikeThisHandler class contains two > static nested classes, which are later accessed in MoreLikeThisComponent. I > propose to make them separate public classes. > cc: [~abenedetti], as you made the recent commit for MLT, what do you > think about this? Anyway, the code is ready for review. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] NazerkeBS opened a new pull request #1433: SOLR-14408 Refactor MoreLikeThisHandler implementation
NazerkeBS opened a new pull request #1433: SOLR-14408 Refactor MoreLikeThisHandler implementation URL: https://github.com/apache/lucene-solr/pull/1433 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14408) Refactor MoreLikeThisHandler Implementation
[ https://issues.apache.org/jira/browse/SOLR-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083990#comment-17083990 ] Alessandro Benedetti commented on SOLR-14408: - Can you attach a pull request to review? Happy to take a look. I will actively work on the More Like This refactor to make it more usable. > Refactor MoreLikeThisHandler Implementation > --- > > Key: SOLR-14408 > URL: https://issues.apache.org/jira/browse/SOLR-14408 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: MoreLikeThis >Reporter: Nazerke Seidan >Priority: Minor > > The main goal of this refactoring is to improve the readability and accessibility of > the MoreLikeThisHandler class. The current MoreLikeThisHandler class contains two > static nested classes, which are later accessed in MoreLikeThisComponent. I > propose to make them separate public classes. > cc: [~abenedetti], as you made the recent commit for MLT, what do you > think about this? Anyway, the code is ready for review. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14408) Refactor MoreLikeThisHandler Implementation
Nazerke Seidan created SOLR-14408: - Summary: Refactor MoreLikeThisHandler Implementation Key: SOLR-14408 URL: https://issues.apache.org/jira/browse/SOLR-14408 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: MoreLikeThis Reporter: Nazerke Seidan The main goal of this refactoring is to improve the readability and accessibility of the MoreLikeThisHandler class. The current MoreLikeThisHandler class contains two static nested classes, which are later accessed in MoreLikeThisComponent. I propose to make them separate public classes. cc: [~abenedetti], as you made the recent commit for MLT, what do you think about this? Anyway, the code is ready for review. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408712107 ## File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java ## @@ -93,4 +93,16 @@ */ void collect(int doc) throws IOException; + /** + * Optionally creates a view of the scorerIterator where only competitive documents + * in the scorerIterator are kept and non-competitive are skipped. + * + * Collectors should delegate this method to their comparators if + * their comparators provide the skipping functionality over non-competitive docs. + * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents. + */ + default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) { +return scorerIterator; + } Review comment: This allows for some hacks like returning an iterator that matches more docs than the scorer. I liked the previous approach that returned an iterator better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
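The Javadoc in the diff above describes a filtered "view" over the scorer's iterator. A rough sketch of that idea follows, using simplified stand-in types: `DocIterator`, `Collector`, `competitiveView`, and the other names here are hypothetical and are not Lucene's `DocIdSetIterator` or `LeafCollector`. The default hook returns the iterator unchanged; an override wraps it so that non-competitive doc IDs are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-ins for illustration only; not the Lucene API.
public class FilterIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal doc ID iterator: returns NO_MORE_DOCS once exhausted. */
    interface DocIterator {
        int nextDoc();
    }

    /** Collector hook modelled on the default method shown in the diff. */
    interface Collector {
        default DocIterator filterIterator(DocIterator scorerIterator) {
            return scorerIterator; // default: the collector filters nothing
        }
    }

    /** View over 'in' that only surfaces docs accepted by the 'competitive' predicate. */
    static DocIterator competitiveView(DocIterator in, IntPredicate competitive) {
        return () -> {
            int doc;
            while ((doc = in.nextDoc()) != NO_MORE_DOCS) {
                if (competitive.test(doc)) {
                    return doc;
                }
            }
            return NO_MORE_DOCS;
        };
    }

    /** Iterator over a fixed, ascending list of doc IDs. */
    static DocIterator of(int... docs) {
        int[] pos = {0};
        return () -> pos[0] < docs.length ? docs[pos[0]++] : NO_MORE_DOCS;
    }

    /** Collects every doc ID the iterator yields. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        Collector passThrough = new Collector() {};
        Collector skipping = new Collector() {
            @Override
            public DocIterator filterIterator(DocIterator scorerIterator) {
                // Assumed competitiveness rule for the demo: only docs >= 5 can still compete.
                return competitiveView(scorerIterator, doc -> doc >= 5);
            }
        };
        System.out.println(drain(passThrough.filterIterator(of(1, 3, 5, 8)))); // prints [1, 3, 5, 8]
        System.out.println(drain(skipping.filterIterator(of(1, 3, 5, 8))));    // prints [5, 8]
    }
}
```

In this sketch the wrapped view can only return doc IDs produced by the underlying iterator. A hook that accepts and returns an arbitrary iterator has no such guarantee, which appears to be the concern raised in the review comment.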
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r403471265 ## File path: lucene/core/src/java/org/apache/lucene/search/SortField.java ## @@ -91,6 +91,7 @@ private String field; private Type type; // defaults to determining type dynamically boolean reverse = false; // defaults to natural order + private boolean skipNonCompetitiveDocs = false; // if true, sortField will use a comparator that can skip non-competitive docs Review comment: I'd rather not have this on SortField for now. This is an old API that never required fields to be indexed. I'd rather have new SortField implementations for now, and later look at how we can enable this in SortField. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408712107 ## File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java ## @@ -93,4 +93,16 @@ */ void collect(int doc) throws IOException; + /** + * Optionally creates a view of the scorerIterator where only competitive documents + * in the scorerIterator are kept and non-competitive are skipped. + * + * Collectors should delegate this method to their comparators if + * their comparators provide the skipping functionality over non-competitive docs. + * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents. + */ + default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) { +return scorerIterator; + } Review comment: This allows for some hacks like returning an iterator that matches more docs than the scorer. I liked the previous approach that returned an iterator better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14013) javabin performance regressions
[ https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083861#comment-17083861 ] ASF subversion and git services commented on SOLR-14013: Commit 5d3dfbd0ce8a2ad990635e71144615f1c4815d22 in lucene-solr's branch refs/heads/branch_7_7 from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5d3dfbd ] SOLR-14013: trying to port to SOlr 7.7 (#1254) > javabin performance regressions > --- > > Key: SOLR-14013 > URL: https://issues.apache.org/jira/browse/SOLR-14013 > Project: Solr > Issue Type: Bug >Affects Versions: 7.7 >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Blocker > Fix For: 8.4 > > Attachments: SOLR-14013.patch, SOLR-14013.patch, TestQuerySpeed.java, > test.json > > > As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became > orders of magnitude slower in certain cases since v7.7. The cases identified > so far include large numbers of values in a field. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] noblepaul merged pull request #1254: SOLR-14259: Back porting SOLR-14013 to Solr 7.7
noblepaul merged pull request #1254: SOLR-14259: Back porting SOLR-14013 to Solr 7.7 URL: https://github.com/apache/lucene-solr/pull/1254 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org