[jira] [Commented] (SOLR-13289) Support for BlockMax WAND
[ https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103601#comment-17103601 ] David Smiley commented on SOLR-13289: -

Should we really add {{numFoundExact="true"}} on responses where the user didn't even specify a parameter to control this new feature? I prefer not adding the noise.

I like the name {{numFoundExact}} in the response compared to others we explored – a last-minute change, I see. Wouldn't we want the controlling parameter to use "numFound" likewise, instead of "hits"? I propose {{minNumFoundToBeExact}}. The word "hits" isn't particularly widespread in Solr, except for cache hits.

I spent some time today reviewing what you pushed more closely, and especially testing my theory that there is a problem with interactions with the Collapse PostFilter/Collector. +There is, albeit not a big problem.+ Essentially, the Collapse PostFilter must see and cache all docs before passing those it deems appropriate on to the rest of the collectors. The TopDocs Collector is downstream of it, and the TDC tries to tell the Scorer to do approximation stuff, but it is in vain because by this point all the docs have already been accumulated and cached by Collapse. Other than a possible waste in computation, it ultimately results in Solr saying that the results weren't exact when they actually are exact. I pushed a commit to my fork to demonstrate the problem: [https://github.com/dsmiley/lucene-solr/commit/8803db97a5e4deb0ad5f3bdaabd02cd3b302a09f] Interestingly, I see some other test failures there.

I think the solution is in {{org.apache.solr.search.SolrIndexSearcher#getDocListNC}}, in the second half of the method ({{lastDocRequested <= 0}}, i.e. the top-X results case): right before {{buildTopDocsCollector}} is invoked, set {{cmd.setMinExactHits(Integer.MAX_VALUE);}} only if {{pf.postFilter.scoreMode}} isn't null and isn't TOP_SCORES, thus it's one of the two COMPLETE options. 
COMPLETE means the Scorer needs to yield all matching docs. > Support for BlockMax WAND > - > > Key: SOLR-13289 > URL: https://issues.apache.org/jira/browse/SOLR-13289 > Project: Solr > Issue Type: New Feature >Reporter: Ishan Chattopadhyaya >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Attachments: SOLR-13289.patch, SOLR-13289.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to > expose this via Solr. When enabled, the numFound returned will not be exact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
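The guard proposed in the comment above can be sketched in isolation. This is a hedged illustration only: the `ScoreMode` enum and `requiresExactCount` helper below are stand-ins for the real Lucene/Solr types, shown just to make the condition concrete (real code would set `cmd.setMinExactHits(Integer.MAX_VALUE)` when the helper returns true).

```java
// Hypothetical stand-in for org.apache.lucene.search.ScoreMode values
// relevant here; not the real Lucene enum.
enum ScoreMode { COMPLETE, COMPLETE_NO_SCORES, TOP_SCORES }

class PostFilterGuard {
    /**
     * Returns true when a post filter (such as Collapse) declares a score
     * mode that requires seeing every matching doc, in which case hit-count
     * approximation is wasted work and would mislabel the results as inexact.
     */
    static boolean requiresExactCount(ScoreMode postFilterScoreMode) {
        return postFilterScoreMode != null
            && postFilterScoreMode != ScoreMode.TOP_SCORES;
    }

    public static void main(String[] args) {
        // No post filter: approximation remains allowed.
        System.out.println(requiresExactCount(null));                 // false
        // Collapse-style post filter needing all docs: force exact counting.
        System.out.println(requiresExactCount(ScoreMode.COMPLETE));   // true
        // A post filter that itself supports skipping: approximation is fine.
        System.out.println(requiresExactCount(ScoreMode.TOP_SCORES)); // false
    }
}
```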
[jira] [Commented] (LUCENE-8716) Test logging can bleed from one suite to another, cause failures due to sysout limits
[ https://issues.apache.org/jira/browse/LUCENE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103497#comment-17103497 ] Erick Erickson commented on LUCENE-8716: This is still a puzzle; I have a filter set up to collect failures that mention this limit. I just upgraded log4j2 to 2.13.2; see SOLR-14466. I'll start looking again at these failures if/when more come through. > Test logging can bleed from one suite to another, cause failures due to > sysout limits > - > > Key: LUCENE-8716 > URL: https://issues.apache.org/jira/browse/LUCENE-8716 > Project: Lucene - Core > Issue Type: Test >Reporter: Chris M. Hostetter >Assignee: Erick Erickson >Priority: Major > Attachments: thetaphi_Lucene-Solr-master-Linux_23743.log.txt > > > in solr land, {{HLLUtilTest}} is an incredibly tiny, simple, test that tests > a utility method w/o using any other solr features or doing any logging - as > such it extends {{LuceneTestCase}} directly, and doesn't use any of the > typical solr test framework/plumbing or {{@SuppressSysoutChecks}} > on a recent jenkins build, {{HLLUtilTest}} failed due to too much sysoutput > -- all of which seems to have come from the previous test run on that JVM -- > {{TestStressReorder}} -- suggesting that somehow the sysout from one test > suite can bleed over into the next suite?
[jira] [Commented] (SOLR-13075) Harden SaslZkACLProviderTest.
[ https://issues.apache.org/jira/browse/SOLR-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103493#comment-17103493 ] Erick Erickson commented on SOLR-13075: --- It's pretty clear I won't get to this, so unassigning. > Harden SaslZkACLProviderTest. > - > > Key: SOLR-13075 > URL: https://issues.apache.org/jira/browse/SOLR-13075 > Project: Solr > Issue Type: Sub-task >Reporter: Mark Miller >Priority: Major > Attachments: SOLR-13075.patch > >
[jira] [Assigned] (SOLR-13075) Harden SaslZkACLProviderTest.
[ https://issues.apache.org/jira/browse/SOLR-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-13075: - Assignee: (was: Erick Erickson) > Harden SaslZkACLProviderTest. > - > > Key: SOLR-13075 > URL: https://issues.apache.org/jira/browse/SOLR-13075 > Project: Solr > Issue Type: Sub-task >Reporter: Mark Miller >Priority: Major > Attachments: SOLR-13075.patch > >
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1501: SOLR-13289: Add Refguide changes
dsmiley commented on a change in pull request #1501: URL: https://github.com/apache/lucene-solr/pull/1501#discussion_r422539014 ## File path: solr/solr-ref-guide/src/common-query-parameters.adoc ## @@ -361,3 +361,42 @@ This is what happens if a similar request is sent that adds `echoParams=all` to } } + +== minExactHits Parameter Review comment: Wouldn't "minNumFoundExact" be a better parameter name because it aligns with the "numFound" that we expose? I searched for "hits" in Solr string literals and the ref guide. It's not used much; more often it refers to cache hits. ## File path: solr/solr-ref-guide/src/common-query-parameters.adoc ## @@ -361,3 +361,42 @@ This is what happens if a similar request is sent that adds `echoParams=all` to } } + +== minExactHits Parameter +When this parameter is used, Solr will count the number of hits accurately at least up to this value. After that, Solr can skip over documents that don't have a score high enough to enter the top N. This can greatly improve the performance of search queries. On the other hand, when this parameter is used, the `numFound` may not be exact, and may instead be an approximation. +The `numFoundExact` boolean attribute is included in all responses, indicating whether the `numFound` value is exact or an approximation. If it's an approximation, the real number of hits for the query is guaranteed to be greater than or equal to `numFound`. + +More about approximate document counting and `minExactHits`: +* The documents returned in the response are guaranteed to be the docs with the top scores. This parameter will not skip documents that are to be returned in the response; it will only skip counting docs that match the query but whose score is too low to be in the top N. +* Providing `minExactHits` doesn't guarantee that Solr will use approximate hit counting (and thus provide the speedup). Some types of queries, or other parameters (such as requesting facets), will require accurate counting. 
The value of `numFoundExact` indicates whether the approximation was used or not. +* Approximate counting can only be used when sorting by `score desc` first (which is the default sort in Solr). Other fields can be used after `score desc`, but if any other type of sorting is used before score, then the approximation won't be applied. +* When doing distributed queries across multiple shards, each shard will accurately count hits up to `minExactHits` (which means the query could be hitting `numShards * minExactHits` docs and `numFound` in the response would still be accurate). +For example: + +[source,text] +q=quick brown fox&minExactHits=100&rows=10 + +[source,json] + +"response": { +"numFound": 153, +"start": 0, +"hitCountExact": false, Review comment: didn't we agree on "numFoundExact"? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
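The contract described in the ref-guide text above can be illustrated from the client side. The `HitCount` class below is a hypothetical helper, not a Solr or SolrJ API: it only encodes the rule that when `numFoundExact` is false, `numFound` is a lower bound on the true hit count.

```java
// Hypothetical client-side holder for the two response attributes;
// not part of SolrJ.
class HitCount {
    final long numFound;
    final boolean numFoundExact;

    HitCount(long numFound, boolean numFoundExact) {
        this.numFound = numFound;
        this.numFoundExact = numFoundExact;
    }

    /**
     * Summarizes the count per the documented contract: an inexact
     * numFound is guaranteed to be less than or equal to the real total.
     */
    String describe() {
        return numFoundExact
            ? ("exactly " + numFound + " hits")
            : ("at least " + numFound + " hits");
    }

    public static void main(String[] args) {
        // Mirrors the example response above: numFound=153, approximation used.
        System.out.println(new HitCount(153, false).describe()); // prints "at least 153 hits"
        System.out.println(new HitCount(153, true).describe());  // prints "exactly 153 hits"
    }
}
```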
[jira] [Commented] (SOLR-14426) forbidden api error during precommit DateMathFunction
[ https://issues.apache.org/jira/browse/SOLR-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103452#comment-17103452 ] David Smiley commented on SOLR-14426: - Moving these classes to within another class disturbed the API (i.e. the very identity of these classes). If it's only used within the same source file then this makes sense to me, but not otherwise. The other solution I don't see discussed here is giving them their own source file. When I look at FacetContext in particular, it appears it's used widely and thus would make a good top-level class, not one inside FacetRequest. Please revisit this, Mike. > forbidden api error during precommit DateMathFunction > - > > Key: SOLR-14426 > URL: https://issues.apache.org/jira/browse/SOLR-14426 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Build >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Time Spent: 50m > Remaining Estimate: 0h > > When running `./gradlew precommit` I'll occasionally see > {code} > * What went wrong: > Execution failed for task ':solr:contrib:analytics:forbiddenApisMain'. > > de.thetaphi.forbiddenapis.ForbiddenApiException: Check for forbidden API > > calls failed while scanning class > > 'org.apache.solr.analytics.function.mapping.DateMathFunction' > > (DateMathFunction.java): java.lang.ClassNotFoundException: > > org.apache.solr.analytics.function.mapping.DateMathValueFunction (while > > looking up details about referenced class > > 'org.apache.solr.analytics.function.mapping.DateMathValueFunction') > {code} > `./gradlew clean` fixes this, but I don't understand what or why this > happens. Feels like a gradle issue?
[jira] [Resolved] (SOLR-10562) TestSolrCLIRunExample failures indicating documents just indexed are not all searchable.
[ https://issues.apache.org/jira/browse/SOLR-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-10562. --- Resolution: Fixed This test hasn't failed since September of 2019, so I'll close it. Apparently SOLR-11035 and, perhaps, other changes since then have fixed this. > TestSolrCLIRunExample failures indicating documents just indexed are not all > searchable. > > > Key: SOLR-10562 > URL: https://issues.apache.org/jira/browse/SOLR-10562 > Project: Solr > Issue Type: Bug >Affects Versions: 6.6, 7.0 >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: 1_1.res, 1_2.res, 2_1.res, 2_2.res, debug.patch, > runcli_12.log > > > I've been beating the heck out of some test cases for fear that > SOLR-10007 really messed things up and I can get a pretty regular test > failure for TestSolrCLIRunExample.testInteractiveSolrCloudExample, but > it doesn't make sense. > So I went back to a revision _before_ SOLR-10007 and it still fails. > But the failure is "impossible". I put a bunch of log.error messages > in and, for experimental purposes, a for loop in the test. Here are the > lines that fail in the original: > {code} > for (idx = 0; idx < 10; ++idx) { > construct a SolrInputDoc and then: > cloudClient.add(SolrInputDoc); > } > cloudClient.commit(); > QueryResponse qr = cloudClient.query(new SolrQuery("str_s:a")); > if (qr.getResults().getNumFound() != numDocs) { > fail("Expected "+numDocs+" to be found in the "+collectionName+ > " collection but only found "+qr.getResults().getNumFound()); > } > {code} > If I put the above (not the commit, just the query and the test) in a > loop and check the query 10 times with a 1 second sleep if the numDocs > != getNumFound(). Quite regularly I'll see a message in the log file > that getNumFound() != numDocs, but after a few loops getNumFound() == > numDocs and the test succeeds. 
> cloudClient is what you'd expect: > cloudClient = > getCloudSolrClient(executor.solrCloudCluster.getZkServer().getZkAddress()); > So unless I'm hallucinating, any tests that rely on > cloudClient.commit() ensuring that all docs sent to the cluster are > searchable will potentially fail on occasion. > I looked over the JIRAs briefly and don't see any mentions of a > similar problem, but I may have missed it. > The logging I'm writing from the update handler _seems_ to show it to be > doing the right thing. Just late. > I'll attach some data along with a "patch" which generates the logging > information. I also attempted to submit a single batch rather than 10 > individual docs and that fails too.
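The query-and-sleep workaround described in the quoted test can be sketched generically. This is a hedged illustration: the `LongSupplier` stand-in below replaces a real `cloudClient.query(...)` call, and the method name `waitForCount` is invented for the example.

```java
import java.util.function.LongSupplier;

class WaitForCount {
    /**
     * Polls countSupplier until it reports the expected value or the attempt
     * budget is exhausted, sleeping between polls. Returns true if the
     * expected count was observed (i.e. the docs became searchable).
     */
    static boolean waitForCount(LongSupplier countSupplier, long expected,
                                int maxAttempts, long sleepMillis) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (countSupplier.getAsLong() == expected) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a search whose results become visible only after a
        // delay: the first two polls see 7 docs, later polls see all 10.
        final int[] calls = {0};
        LongSupplier lazySearcher = () -> (++calls[0] < 3) ? 7 : 10;
        System.out.println(waitForCount(lazySearcher, 10, 10, 1)); // prints "true"
    }
}
```

A loop like this masks the visibility lag rather than fixing it, which is why the issue stayed open until the underlying commit behavior changed.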
[jira] [Resolved] (SOLR-12301) Umbrella issue: parameterize logging calls in Solr, use consistent naming conventions for the logger
[ https://issues.apache.org/jira/browse/SOLR-12301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-12301. --- Fix Version/s: 8.6 Resolution: Fixed This will be slightly untidy for the rest of Solr 8x. LUCENE-7788 fixed all the logging calls in both master and 8.6 as per this JIRA. However, only the master gradle "check" and "precommit" actions have checks that will flag these kinds of problems and fail, so in the interim some of these may creep back into future 8x releases. This shouldn't be much of a problem for any code that has gradle check or precommit run before merging into 8x. Back-porting the checks to Ant is more trouble than this issue is worth, given 8x has been cleaned up. > Umbrella issue: parameterize logging calls in Solr, use consistent naming > conventions for the logger > > > Key: SOLR-12301 > URL: https://issues.apache.org/jira/browse/SOLR-12301 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Fix For: 8.6 > > > See the discussion at SOLR-12286 for a lot of background, but the short form > is that logging calls of the form > log.info("something" + "something"); > and > log.info("something {}", object.someFunction()); > where someFunction includes toString() > generate useless garbage/work even when the message is not printed. > log.info("something {}", "something"); > and > log.info("something {}", object); > do not. The first form is something of a relic, and there are even some uses > of the second. > This will entail a LOT of changes, but almost all secretarial. I'll take it > in chunks.
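The cost distinction behind this umbrella issue can be demonstrated without SLF4J at all. The stub logger below is a sketch, not the SLF4J API: string concatenation in the call site forces `toString()` before the logger can check its level, while the parameterized form passes the object as-is and only formats it if the message is actually emitted.

```java
class LoggingCostDemo {
    /** Counts how often its toString() is invoked. */
    static class Expensive {
        int toStringCalls = 0;
        @Override public String toString() {
            toStringCalls++;
            return "expensive";
        }
    }

    /** Stub logger whose INFO level is disabled, like a production WARN config. */
    static class StubLogger {
        boolean infoEnabled = false;
        void info(String msg) { if (infoEnabled) System.out.println(msg); }
        void info(String format, Object arg) {
            // toString() is deferred until the message is actually emitted.
            if (infoEnabled) System.out.println(format.replace("{}", arg.toString()));
        }
    }

    public static void main(String[] args) {
        StubLogger log = new StubLogger();
        Expensive obj = new Expensive();

        // Anti-pattern: the argument string is built (toString() called)
        // before info() even runs, though nothing is printed.
        log.info("value: " + obj);
        System.out.println(obj.toStringCalls); // prints 1

        // Parameterized form: the level check short-circuits formatting,
        // so no additional toString() call happens.
        log.info("value: {}", obj);
        System.out.println(obj.toStringCalls); // prints 1
    }
}
```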
[jira] [Resolved] (SOLR-11934) Visit Solr logging, it's too noisy.
[ https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-11934. --- Fix Version/s: 8.6 Resolution: Fixed After going around a few times, the egregious samples I analyzed were ones that fired off very frequent updates of single records, sometimes followed by external commits, so _of course_ the ratio of update messages to total messages was super-high. For ill-behaved applications like this, our advice should be to set the particular classes that report huge numbers of messages to "WARN" level. That said, opening a new searcher generated 5-6 different messages at INFO level, which is unnecessary. All but one of them has been changed to log at DEBUG level, and the one remaining was altered to include the autowarm time. > Visit Solr logging, it's too noisy. > --- > > Key: SOLR-11934 > URL: https://issues.apache.org/jira/browse/SOLR-11934 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Fix For: 8.6 > > Time Spent: 10m > Remaining Estimate: 0h > > I think we have way too much INFO level logging. Or, perhaps more correctly, > Solr logging needs to be examined and messages logged at an appropriate level. > We log every update at an INFO level for instance. But I think we log LIR at > INFO as well. As a sysadmin I don't care to have my logs polluted with a > message for every update, but if I'm trying to keep my system healthy I want > to see LIR messages and try to understand why. > Plus, in large installations logging at INFO level is creating a _LOT_ of > files. > What I want to discuss on this JIRA is > 1> What kinds of messages do we want log at WARN, INFO, DEBUG, and TRACE > levels? > 2> Who's the audience at each level? For a running system that's functioning, > sysops folks would really like WARN messages that mean something need > attention for instance. If I'm troubleshooting should I turn on INFO? DEBUG? > TRACE? 
> So let's say we get some kind of agreement as to the above. Then I propose > three things > 1> Someone (and probably me but all help gratefully accepted) needs to go > through our logging and assign appropriate levels. This will take quite a > while, I intend to work on it in small chunks. > 2> Actually answer whether unnecessary objects are created when something > like log.info("whatever {}", someObjectOrMethodCall); is invoked. Is this > independent on the logging implementation used? The SLF4J and log4j seem a > bit contradictory. > 3> Maybe regularize log, logger, LOG as variable names, but that's a nit. > As a tactical approach, I suggest we tag each LoggerFactory.getLogger in > files we work on with //SOLR-(whatever number is assigned when I create > this). We can remove them all later, but since I expect to approach this > piecemeal it'd be nice to keep track of which files have been done already. > Finally, I really really really don't want to do this all at once. There are > 5-6 thousand log messages. Even at 1,000 a week that's 6 weeks, even starting > now it would probably span the 7.3 release. > This will probably be an umbrella issue so we can keep all the commits > straight and people can volunteer to "fix the files in core" as a separate > piece of work (hint). > There are several existing JIRAs about logging in general, let's link them in > here as well. > Let the discussion begin!
[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.
[ https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103371#comment-17103371 ] ASF subversion and git services commented on SOLR-11934: Commit 67be31cdcf147630835ac9346ee72e934b124f39 in lucene-solr's branch refs/heads/branch_8x from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=67be31c ] SOLR-11934: Visit Solr logging, it's too noisy.
[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.
[ https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103370#comment-17103370 ] ASF subversion and git services commented on SOLR-11934: Commit 15be0db58696d379c6f7e1a6d0afa18dd7cdd43d in lucene-solr's branch refs/heads/master from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=15be0db ] SOLR-11934: Visit Solr logging, it's too noisy.
[jira] [Commented] (LUCENE-9365) Fuzzy query has a false negative when prefix length == search term length
[ https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103280#comment-17103280 ] Adrien Grand commented on LUCENE-9365: --

bq. Maybe we should disallow prefix == term.text().length() for FuzzyQuery? It is sort of strange to use FuzzyQuery in this way

I guess that the downside of this approach is that it pushes the burden to the application? I would expect users to have a global prefix length that they apply to all fuzzy queries. Disallowing prefix >= term.length would require consumers of the API to check the length of the string that they are searching for, which would be prone to the usual errors, e.g. whether it should be counting Java chars or Unicode code points.

bq. FuzzyQuery currently has checks for `prefix >= termLength` and collapses to a SingleTermEnum for that case.

Maybe that's the bug and it should only do it when the prefix length is strictly greater than the term length? > Fuzzy query has a false negative when prefix length == search term length > -- > > Key: LUCENE-9365 > URL: https://issues.apache.org/jira/browse/LUCENE-9365 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Reporter: Mark Harwood >Priority: Major > > When using FuzzyQuery the search string `bba` does not match doc value `bbab` > with an edit distance of 1 and prefix length of 3. > In FuzzyQuery an automaton is created for the "suffix" part of the search > string which in this case is an empty string. > In this scenario maybe the FuzzyQuery should rewrite to a WildcardQuery of > the following form : > {code:java} > searchString + "?" > {code} > .. where there's an appropriate number of ? characters according to the edit > distance.
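The reported false negative can be checked against a plain edit-distance computation. This is a self-contained sketch, not Lucene's automaton-based implementation: "bba" is within Levenshtein distance 1 of "bbab" (one insertion), so a fuzzy query with maxEdits=1 should match regardless of the prefix length covering the whole search term.

```java
class EditDistance {
    /** Classic dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Search term "bba" with prefixLength=3 covers the whole term,
        // yet the indexed term "bbab" is still only one insertion away.
        System.out.println(levenshtein("bba", "bbab")); // prints 1
    }
}
```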
[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues
[ https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103189#comment-17103189 ] Lucene/Solr QA commented on LUCENE-9328: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 8s{color} | {color:green} grouping in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 9s{color} | {color:red} join in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 11s{color} | {color:green} queries in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s{color} | {color:green} test-framework in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 5s{color} | {color:red} core in the patch failed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | lucene.search.join.TestBlockJoinSelector | | | solr.TestGroupingSearch | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | LUCENE-9328 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13002452/LUCENE-9328.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / d9f9d6dd47c | | ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 | | Default Java | LTS | | unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/272/artifact/out/patch-unit-lucene_join.txt | | unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/272/artifact/out/patch-unit-solr_core.txt | | Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/272/testReport/ | | modules | C: lucene/grouping lucene/join lucene/queries lucene/test-framework solr/core U: . | | Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/272/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. 
> SortingGroupHead to reuse DocValues > --- > > Key: LUCENE-9328 > URL: https://issues.apache.org/jira/browse/LUCENE-9328 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/grouping >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Minor > Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, > LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, > LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > That's why > https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365
[GitHub] [lucene-solr] CaoManhDat commented on a change in pull request #1470: SOLR-14354: Async or using threads in better way for HttpShardHandler
CaoManhDat commented on a change in pull request #1470: URL: https://github.com/apache/lucene-solr/pull/1470#discussion_r422459139 ## File path: solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java ## @@ -130,77 +134,64 @@ public void submit(final ShardRequest sreq, final String shard, final Modifiable final Tracer tracer = GlobalTracer.getTracer(); final Span span = tracer != null ? tracer.activeSpan() : null; -Callable task = () -> { +params.remove(CommonParams.WT); // use default (currently javabin) +params.remove(CommonParams.VERSION); +QueryRequest req = makeQueryRequest(sreq, params, shard); +req.setMethod(SolrRequest.METHOD.POST); - ShardResponse srsp = new ShardResponse(); - if (sreq.nodeName != null) { -srsp.setNodeName(sreq.nodeName); - } - srsp.setShardRequest(sreq); - srsp.setShard(shard); - SimpleSolrResponse ssr = new SimpleSolrResponse(); - srsp.setSolrResponse(ssr); - long startTime = System.nanoTime(); +LBSolrClient.Req lbReq = httpShardHandlerFactory.newLBHttpSolrClientReq(req, urls); + +ShardResponse srsp = new ShardResponse(); +if (sreq.nodeName != null) { + srsp.setNodeName(sreq.nodeName); +} +srsp.setShardRequest(sreq); +srsp.setShard(shard); +SimpleSolrResponse ssr = new SimpleSolrResponse(); +srsp.setSolrResponse(ssr); + +pending.incrementAndGet(); +// if there are no shards available for a slice, urls.size()==0 +if (urls.size() == 0) { + // TODO: what's the right error code here? We should use the same thing when Review comment: I do not, just copied and pasted from the old code.