[GitHub] [lucene-solr] noblepaul opened a new pull request #2359: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json (8x )
noblepaul opened a new pull request #2359: URL: https://github.com/apache/lucene-solr/pull/2359 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
dweiss commented on pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#issuecomment-778016468 I think it's good overall, but I'm wondering whether it makes sense to make that field volatile... do we want to allow changing listeners over the index writer's lifecycle? I think it should be a regular field and IW should just read it once (and set forever).
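A minimal sketch of the "regular field, read once" approach dweiss suggests. The type and method names below are illustrative stand-ins, not the actual Lucene IndexWriter/IndexWriterEventListener API: the listener reference is captured into a plain final field at construction, so no volatile is needed and later listener swaps are invisible to a live writer.

```java
// Illustrative sketch only -- not the real IndexWriter API.
interface EventListenerSketch {
  void onMergeOnFullFlush();
}

class WriterSketch {
  // Plain final field: read once at construction and fixed for the writer's
  // lifetime. No volatile needed because the reference never changes.
  private final EventListenerSketch eventListener;

  WriterSketch(EventListenerSketch listener) {
    this.eventListener = listener;
  }

  void fullFlush() {
    eventListener.onMergeOnFullFlush();
  }
}

class ListenerDemo {
  // Returns how many events the originally-supplied listener saw; because the
  // field is final, there is no way to re-point the writer at another listener.
  static int demo() {
    int[] count = {0};
    WriterSketch w = new WriterSketch(() -> count[0]++);
    w.fullFlush();
    w.fullFlush();
    return count[0];
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 2
  }
}
```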
[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283526#comment-17283526 ] ASF subversion and git services commented on SOLR-15136: Commit 759cb8079bd9f192936995a7d8cce4774b82 in lucene-solr's branch refs/heads/branch_8_8 from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=759cb80 ] SOLR-15136: Reduce excessive logging introduced with Per Replica States feature > need to audit for excessive INFO level logging after SOLR-15052 > -- > > Key: SOLR-15136 > URL: https://issues.apache.org/jira/browse/SOLR-15136 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Chris M. Hostetter > Assignee: Ishan Chattopadhyaya > Priority: Major > Fix For: 8.8.1 > > Attachments: SOLR-15136.patch > > > Markus Jelsma noted on a solr-user thread that 8.8 introduced an excessive > amount of INFO level logging, notably from ZkStateReader... > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202102.mbox/%3CCAJvgyxqDBNjPtzamszOf1ZZiZGPg42PYegYbHPkVMm5OF1%3DaVw%40mail.gmail.com%3E > This appears to be caused by SOLR-15052, but is not consistent between master > and branch_8x. At a glance it appears there was lots of new logging > introduced originally at the INFO level with "nocommit" comments to dial down > to debug -- some of which happened, but evidently a lot didn't. > I'm filing this issue in the hopes that the folks involved in SOLR-15052 will > please go back and do a thorough review of what logging is on master vs 8x > and carefully reconsider what log messages added for development purposes are > still useful, and what levels they should be at. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya resolved SOLR-15136. - Fix Version/s: 8.8.1 Resolution: Fixed Thanks [~markus17], [~hossman].
[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283524#comment-17283524 ] ASF subversion and git services commented on SOLR-15136: Commit db90ff541ec729dd04d2f293ae54467f2f3dc975 in lucene-solr's branch refs/heads/branch_8x from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db90ff5 ] SOLR-15136: Reduce excessive logging introduced with Per Replica States feature
[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283523#comment-17283523 ] ASF subversion and git services commented on SOLR-15136: Commit 938039a6889f3b9125d123cbe11e389d6cf714ba in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=938039a ] SOLR-15136: Reduce excessive logging introduced with Per Replica States feature
[jira] [Commented] (LUCENE-9762) FunctionScoreQuery can fail when the score is requested twice
[ https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283522#comment-17283522 ] David Smiley commented on LUCENE-9762: -- I filed a PR with a fix. The problem is actually not QueryValueSource's use of TwoPhaseIterator. That change increased the scenarios in which a _pre-existing bug_ in FunctionScoreQuery's score() method is exposed. It's valid for a Scorer's score() method to be called more than once, but FSQ's score() was calling a DoubleValues.advanceExact which is intolerant of that by contract. Many implementations allow it nevertheless but not QueryValueSource when the wrapped query is a PhraseQuery or perhaps some other TPI based queries. This bug was a tricky puzzle to track down! CC [~romseygeek] as you introduced FunctionScoreQuery. > FunctionScoreQuery can fail when the score is requested twice > - > > Key: LUCENE-9762 > URL: https://issues.apache.org/jira/browse/LUCENE-9762 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.8 >Reporter: Chris M. Hostetter >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-9762.patch > > Time Spent: 10m > Remaining Estimate: 0h > > As originally reported by Nicolás Lichtmaier on the java-user list, there are > some trivial situations which can trigger an assertion error in the > PostingsReader when enumerating PhrasePositions for a sloppy PhraseQuery... 
> {noformat} > Exception in thread "main" java.lang.AssertionError > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum.nextPosition(Lucene84PostingsReader.java:940) > at > org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57) > at > org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46) > at > org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368) > at > org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356) > at > org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153) > at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49) > at > org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631) > at > org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343) > at > org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53) > at > org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53) > at > org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270) > at > org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228) > at > org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344) > at > org.apache.lucene.search.DoubleValues$1.doubleValue(DoubleValues.java:48) > at > org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265) > at > org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229) > at > org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76) > at > 
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276) > at > org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:572) > at > org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:419) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:430) > at LuceneCrash.main(LuceneCrash.java:51) > {noformat} > http://mail-archives.apache.org/mod_mbox/lucene-java-user/202102.mbox/%3C177a65ec-5ec3-e1aa-99c3-b478e165d5e8%40wolfram.com%3E
[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283521#comment-17283521 ] Noble Paul commented on SOLR-15136: --- LGTM
[GitHub] [lucene-solr] dsmiley opened a new pull request #2358: LUCENE-9762: FunctionScoreQuery must guard score() called twice
dsmiley opened a new pull request #2358: URL: https://github.com/apache/lucene-solr/pull/2358 https://issues.apache.org/jira/browse/LUCENE-9762 The score() may be called multiple times. It should take care to call DoubleValues.advanceExact only the first time, or risk faulty behavior, including exceptions. There isn't an 8.8.1 section in CHANGES.txt on the master branch, but I can add it into branch_8x and branch_8_8, where it will eventually show up when the RM does a release? Or should I just add it?
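The guard described here can be sketched with stand-in types (these model the DoubleValues contract but are not the actual Lucene classes): score() caches the value per document, so a second call for the same doc does not repeat advanceExact.

```java
// Stand-in types modeling the DoubleValues contract -- not the real Lucene API.
interface DoubleValuesSketch {
  // Contract: at most one call per document, in forward doc-ID order.
  boolean advanceExact(int doc);
  double doubleValue();
}

class GuardedScorerSketch {
  private final DoubleValuesSketch values;
  private int lastScoredDoc = -1;
  private double cachedScore;

  GuardedScorerSketch(DoubleValuesSketch values) {
    this.values = values;
  }

  // score() may legitimately be called several times per doc; advance only on
  // the first call and return the cached value afterwards.
  double score(int doc) {
    if (doc != lastScoredDoc) {
      lastScoredDoc = doc;
      cachedScore = values.advanceExact(doc) ? values.doubleValue() : 0.0;
    }
    return cachedScore;
  }
}

class GuardDemo {
  // A strict implementation that fails on a repeated advanceExact for the same
  // doc, mimicking an intolerant DoubleValues over a two-phase query.
  static double demo() {
    DoubleValuesSketch strict = new DoubleValuesSketch() {
      private int lastAdvanced = -1;
      public boolean advanceExact(int doc) {
        if (doc <= lastAdvanced) {
          throw new IllegalStateException("advanceExact repeated for doc " + doc);
        }
        lastAdvanced = doc;
        return true;
      }
      public double doubleValue() {
        return lastAdvanced * 2.0;
      }
    };
    GuardedScorerSketch scorer = new GuardedScorerSketch(strict);
    scorer.score(5);        // first call advances
    return scorer.score(5); // second call hits the cache instead of re-advancing
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 10.0
  }
}
```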
[jira] [Updated] (LUCENE-9762) FunctionScoreQuery can fail when the score is requested twice
[ https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-9762: - Summary: FunctionScoreQuery can fail when the score is requested twice (was: AssertionError from Lucene84PostingsReader$EverythingEnum.nextPosition)
[jira] [Assigned] (LUCENE-9762) AssertionError from Lucene84PostingsReader$EverythingEnum.nextPosition
[ https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-9762: Assignee: David Smiley
[jira] [Updated] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052
[ https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-15136: Attachment: SOLR-15136.patch Assignee: Ishan Chattopadhyaya Status: Open (was: Open) Here's a patch reducing logging from ZkStateReader.
[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore
[ https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283493#comment-17283493 ] Ishan Chattopadhyaya commented on SOLR-15089: - I would strongly prefer for this to stay outside of solr-core, preferably in a solr-extras repo (when that's created). Having AWS libraries shipped with Solr by default would feel very awkward. > Allow backup/restoration to Amazon's S3 blobstore > -- > > Key: SOLR-15089 > URL: https://issues.apache.org/jira/browse/SOLR-15089 > Project: Solr > Issue Type: Sub-task > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Jason Gerlowski > Priority: Major > > Solr's BackupRepository interface provides an abstraction around the physical > location/format that backups are stored in. This allows plugin writers to > create "repositories" for a variety of storage mediums. It'd be nice if Solr > offered more mediums out of the box, though, such as some of the "blobstore" > offerings provided by various cloud providers. > This ticket proposes a "BackupRepository" implementation for Amazon's > popular 'S3' blobstore, so that Solr users can use it for backups without > needing to write their own code. > Amazon offers an S3 Java client with acceptable licensing, and the required > code is relatively simple. The biggest challenge in supporting this will > likely be procedural - integration testing requires S3 access, and S3 access > costs money. We can check with INFRA to see if there is any way to get cloud > credits for an integration test to run in nightly Jenkins runs on the ASF > Jenkins server. Alternatively, we can try to stub out the blobstore in some > reliable way.
[jira] [Commented] (SOLR-15011) /admin/logging handler should be able to configure logs on all nodes
[ https://issues.apache.org/jira/browse/SOLR-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283486#comment-17283486 ] ASF subversion and git services commented on SOLR-15011: Commit db6129759061e24bcdb90f3a1310bd25a004076b in lucene-solr's branch refs/heads/master from David Smiley [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db61297 ] SOLR-15011: Remove flawed test > /admin/logging handler should be able to configure logs on all nodes > > > Key: SOLR-15011 > URL: https://issues.apache.org/jira/browse/SOLR-15011 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: logging > Reporter: David Smiley > Assignee: David Smiley > Priority: Major > Fix For: master (9.0) > > Time Spent: 3.5h > Remaining Estimate: 0h > > The LoggingHandler registered at /admin/logging can configure log levels for > the current node. This is nice, but in SolrCloud what's needed is the ability > to change the level for _all_ nodes in the cluster. I propose that this be a > parameter named "distrib", defaulting to SolrCloud mode's status. An admin UI > could have a checkbox for it. I don't propose that the read operations be > changed -- they can continue to just look at the node you are hitting.
[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15132: -- Attachment: SOLR-15132.patch > Add window parameter to the nodes Streaming Expression > -- > > Key: SOLR-15132 > URL: https://issues.apache.org/jira/browse/SOLR-15132 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: streaming expressions > Reporter: Joel Bernstein > Priority: Major > Attachments: SOLR-15132.patch, SOLR-15132.patch, SOLR-15132.patch, > SOLR-15132.patch > > > The *nodes* Streaming Expression performs a breadth-first graph traversal. > This ticket will add a *window* parameter to allow the nodes expression to > traverse the graph within a window of time. > To take advantage of this feature you must index the content with a String > field which is an ISO timestamp truncated at ten seconds. Then the *window* > parameter can be applied to walk the graph within a *window prior* to a > specific ten-second window and perform aggregations. > *The main use case for this feature is auto-detecting lagged correlations.* > This is useful in many different fields. > Here is an example using Solr logs to answer the following question: > What types of log events occur most frequently in the 30-second window prior > to the 10-second windows with the most slow queries: > {code} > nodes(logs, > facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", > rows="25"), > walk="time_ten_seconds->time_ten_seconds", > window="3", > gather="type_s", > count(*)) > {code} > This ticket is phase 1. Phase 2 will auto-detect different ISO timestamp > truncations so that increments of one second, one minute, one day, etc. can > also be traversed using the same query syntax. There will be a follow-on > ticket for that after this ticket is completed. This will create a more > general-purpose time graph.
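The ten-second truncation the description relies on can be produced with simple string slicing over an ISO timestamp. This is a sketch under the assumption of `YYYY-MM-DDThh:mm:ssZ` input; the class and method names are illustrative, not part of Solr.

```java
// Illustrative helper for building a "time_ten_seconds"-style field value.
class TenSecondWindow {
  // Truncate an ISO-8601 timestamp (YYYY-MM-DDThh:mm:ssZ) to its ten-second
  // window by zeroing the ones digit of the seconds field.
  static String truncate(String isoTimestamp) {
    // Index 18 is the ones digit of the seconds: 2021-02-12T10:15:3[7]Z
    return isoTimestamp.substring(0, 18) + "0" + isoTimestamp.substring(19);
  }

  public static void main(String[] args) {
    System.out.println(truncate("2021-02-12T10:15:37Z")); // 2021-02-12T10:15:30Z
  }
}
```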
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
zacharymorn commented on a change in pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574958684 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3518,15 +3518,30 @@ private long prepareCommitInternal() throws IOException { } if (pointInTimeMerges != null) { +MergePolicy.OneMerge nextMerge = null; + +if (pendingMerges.size() > 0) { + // nocommit getting OneMerge instance here via mergeSource.getNextMerge() will Review comment: No problem! From the context I assume you are actually meaning to pass `pointInTimeMerges` into event listener, since `pendingMerges` is of type `Deque`? I've pushed a commit to use `pointInTimeMerges` for now. ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java ## @@ -388,6 +388,69 @@ public void testMergeOnCommit() throws IOException { dir.close(); } + private class TesIndexWriterEventListener implements IndexWriterEventListener { +private boolean beginMergeCalled = false; +private boolean endMergeCalled = false; + +@Override +public void beginMergeOnFullFlush(MergePolicy.OneMerge merge) { + beginMergeCalled = true; +} + +@Override +public void endMergeOnFullFlush(MergePolicy.OneMerge merge) { + endMergeCalled = true; +} + +public boolean isEventsRecorded() { + return beginMergeCalled && endMergeCalled; +} + } + + // Test basic semantics of merge on commit and events recording invocation + public void testMergeOnCommitWithEventListener() throws IOException { Review comment: Makes sense. Added.
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283450#comment-17283450 ] Naoto Minami commented on SOLR-15114: - I'm very glad to contribute to the community! Thanks [~janhoy], [~dsmiley] and [~tflobbe]. > WAND does not work correctly on multiple segments in Solr 8.6.3 > --- > > Key: SOLR-15114 > URL: https://issues.apache.org/jira/browse/SOLR-15114 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.6, 8.7, 8.8 >Reporter: Naoto Minami >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Blocker > Fix For: master (9.0), 8.9, 8.8.1 > > Attachments: wand.pdf > > Time Spent: 20m > Remaining Estimate: 0h > > In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each > index segment and remains zero until maxScore is updated. > There are two causes of this problem: > - MaxScoreCollector does not set minCompetitiveScore of > MinCompetitiveScoreAwareScorable newly generated for another index segment. > - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. > This behavior is correct considering the purpose of MaxScoreCollector. > For details, see the attached pdf. > *Note* > This problem occurs in distributed search (SolrCloud) or when the fl=score > parameter is specified. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-8711) Move the logic to score across multiple fields to Similarity?
[ https://issues.apache.org/jira/browse/LUCENE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani reassigned LUCENE-8711: Assignee: Julie Tibshirani > Move the logic to score across multiple fields to Similarity? > - > > Key: LUCENE-8711 > URL: https://issues.apache.org/jira/browse/LUCENE-8711 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Julie Tibshirani >Priority: Minor > > BlendedTermQuery and BM25FTermQuery both try to merge score contributions of > terms across multiple fields. Using them is still very manual. Is it > something that we could make similarities responsible for instead? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9725) Allow BM25FQuery to use other similarities
[ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani reassigned LUCENE-9725: Assignee: Julie Tibshirani > Allow BM25FQuery to use other similarities > -- > > Key: LUCENE-9725 > URL: https://issues.apache.org/jira/browse/LUCENE-9725 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Assignee: Julie Tibshirani >Priority: Major > Fix For: 8.9 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > From a high level, BM25FQuery works as follows: > # Given a list of fields and weights, it pretends there's a synthetic > combined field where all terms have been indexed. It computes new term and > collection statistics for this combined field. > # It uses a disjunction iterator and BM25Similarity to score the documents. > The steps are (1) compute statistics that represent the combined field > content, and (2) pass these to a similarity function. There is nothing really > specific to BM25Similarity in this approach. In step 2, we could use another > similarity, for example BooleanSimilarity or those based on language models > like LMDirichletSimilarity. The main restriction is that norms have to be > additive (the norm of the combined field must be the sum of the field norms). > Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the > one configured on IndexSearcher. We could think of this as providing a > sensible default approach to cross-field scoring for many similarities. It's > an incremental step towards LUCENE-8711, which would give similarities more > fine-grained control over how stats/ scores are combined across fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
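The two-step approach described above (merge per-field statistics into a synthetic combined field, then hand those stats to a pluggable similarity) can be sketched outside Lucene. This is an illustrative Python sketch, not Lucene code: the field names, weights, stat layout, and both scoring functions are simplified assumptions, but it shows the additive-norm restriction and why step 2 is not specific to BM25.

```python
# Sketch of cross-field scoring: (1) merge per-field term/length stats into
# one synthetic "combined" field, (2) score those stats with any similarity.
# All names and numbers here are hypothetical, not Lucene APIs.

def combined_stats(field_stats, weights):
    """Merge per-field stats into combined (term_freq, norm).

    Weighted frequencies and lengths are summed -- this is the additive-norm
    restriction: the combined norm must be the sum of the field norms.
    """
    tf = sum(weights[f] * s["term_freq"] for f, s in field_stats.items())
    norm = sum(weights[f] * s["field_len"] for f, s in field_stats.items())
    return tf, norm

def boolean_similarity(tf, norm):
    # Degenerate BooleanSimilarity-style scorer: any match scores 1.0.
    return 1.0 if tf > 0 else 0.0

def bm25_like(tf, norm, k1=1.2, b=0.75, avg_len=10.0):
    # BM25-shaped tf saturation over the combined stats (idf omitted).
    return (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * norm / avg_len))

stats = {"title": {"term_freq": 2, "field_len": 4},
         "body":  {"term_freq": 1, "field_len": 40}}
weights = {"title": 2.0, "body": 1.0}

tf, norm = combined_stats(stats, weights)
print(bm25_like(tf, norm), boolean_similarity(tf, norm))
```

Swapping `bm25_like` for `boolean_similarity` changes nothing in step 1, which is the point of unhardcoding the similarity.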
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283442#comment-17283442 ] Robert Muir commented on LUCENE-9754: - I attached a prototype, which should address the TODO. As for letting users customize this "chunking" in case they want it to work differently than sentence boundaries, I will look into it. > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before the preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document. > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
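The ripple effect of the space-based chunking described above can be sketched in a few lines. This is a hypothetical model, not the ICU tokenizer's actual code: it only assumes the behavior stated in the issue (cut chunks of at most ~4k characters, preferring the last space inside the limit) and shows why a tiny edit at the front of a document shifts every later chunk boundary.

```python
# Model of space-based chunking: cut at the last space within `limit`
# characters, falling back to a hard cut if no space is found. The exact
# ICU tokenizer logic differs; this only illustrates the boundary ripple.

def chunk_boundaries(text, limit=4096):
    bounds, start = [], 0
    while len(text) - start > limit:
        cut = text.rfind(" ", start, start + limit)
        if cut <= start:          # no usable space: hard cut at the limit
            cut = start + limit
        bounds.append(cut)
        start = cut
    return bounds

doc = ("word " * 2000).strip()          # ~10k chars of short tokens
before = chunk_boundaries(doc)
after = chunk_boundaries("x " + doc)    # tiny edit at the very front
print(before, after)                    # every downstream boundary moves
```

A two-character edit at position 0 moves both chunk boundaries, so any token that happens to straddle (or stop straddling) a boundary can tokenize differently thousands of characters later.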
[jira] [Updated] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9754: Attachment: LUCENE-9754_prototype.patch > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before the preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document. > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader
[ https://issues.apache.org/jira/browse/LUCENE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283419#comment-17283419 ] Michael McCandless commented on LUCENE-9751: {quote}It's definitely not a single huge document, Mike. {quote} OK, hrmph. {quote}I've tried reproducing last night (on a different machine but with the same heap/ threads setup) but no luck - it finished successfully. {quote} Also hrmph. {quote}I guess this means the problem is gone?... :) {quote} I wish! Non ignorance is non bliss! > Assertion error (int overflow) in ByteSliceReader > - > > Key: LUCENE-9751 > URL: https://issues.apache.org/jira/browse/LUCENE-9751 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.7 >Reporter: Dawid Weiss >Priority: Major > > New computers come with insane amounts of ram and heaps can get pretty big. > If you adjust per-thread buffers to larger values strange things start > happening. This happened to us today: > {code} > Caused by: java.lang.AssertionError > at > org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > 
atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440) > ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) > 
~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - > atrisharma - 2020-10-29 19:35:28] > ... 7 more > {code} > Likely an int overflow in TermsHashPerField: > {code} > reader.init(bytePool, > > postingsArray.byteStarts[termID]+stream*ByteBlockPool.FIRST_LEVEL_SIZE, > streamAddressBuffer[offsetInAddressBuffer+stream]); > {code} > Don't know if this can be prevented somehow. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
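The suspected overflow in the expression quoted above is easy to demonstrate. The offsets are Java ints, so once a very large per-thread buffer pushes a byte-pool offset near 2^31 - 1, the addition wraps negative and the `assert` in `ByteSliceReader.init` trips. `ByteBlockPool.FIRST_LEVEL_SIZE` is 5 in Lucene; the byte-start value below is a hypothetical one chosen to sit near the top of the int range.

```python
# Simulate Java's signed 32-bit arithmetic to show how
# byteStarts[termID] + stream * FIRST_LEVEL_SIZE can wrap negative.

FIRST_LEVEL_SIZE = 5  # ByteBlockPool.FIRST_LEVEL_SIZE

def as_java_int(x):
    """Wrap a Python integer into Java's signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

byte_start = 2**31 - 3          # hypothetical offset near Integer.MAX_VALUE
stream = 1
start_index = as_java_int(byte_start + stream * FIRST_LEVEL_SIZE)
print(start_index)              # negative -> the assertion in init() fires
```

With assertions disabled the same wrapped value would instead index the pool at a garbage position, which is why the int offsets themselves, not the assertion, are the underlying limit.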
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
mikemccand commented on a change in pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574877742 ## File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java ## @@ -388,6 +388,69 @@ public void testMergeOnCommit() throws IOException { dir.close(); } + private class TesIndexWriterEventListener implements IndexWriterEventListener { +private boolean beginMergeCalled = false; +private boolean endMergeCalled = false; + +@Override +public void beginMergeOnFullFlush(MergePolicy.OneMerge merge) { + beginMergeCalled = true; +} + +@Override +public void endMergeOnFullFlush(MergePolicy.OneMerge merge) { + endMergeCalled = true; +} + +public boolean isEventsRecorded() { + return beginMergeCalled && endMergeCalled; +} + } + + // Test basic semantics of merge on commit and events recording invocation + public void testMergeOnCommitWithEventListener() throws IOException { Review comment: You might also edit `LuceneTestCase.newIndexWriterConfig` to randomly swap in a `MockIndexWriterEventListener` just to exercise this listener in any tests using that API, which is quite a few. It could uncover times when we accidentally break something when this listener is invoked ... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
mikemccand commented on a change in pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574877053 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3518,15 +3518,30 @@ private long prepareCommitInternal() throws IOException { } if (pointInTimeMerges != null) { +MergePolicy.OneMerge nextMerge = null; + +if (pendingMerges.size() > 0) { + // nocommit getting OneMerge instance here via mergeSource.getNextMerge() will Review comment: How about passing the `pendingMerges` to the event listener instead? (Sorry if I asked for `OneMerge` on the issue! `MergeSpecification` is better since it can hold multiple merges, allows event listener to log things like how many merges were requested during `commit`, etc.). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-15129) Use the Solr TGZ artifact as Docker context
[ https://issues.apache.org/jira/browse/SOLR-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283393#comment-17283393 ] Houston Putman edited comment on SOLR-15129 at 2/11/21, 9:43 PM: - [~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded sha reference of the image built within elastic. {{FROM docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}} We could certainly make that a part of the ReleaseWizard. It would stop us from doing incremental updates however for base images. I don't think that's a sticking point though. As per Hoss' comments above about the git repository being hosted on apache hardware, and the binary release being hosted on mirrors, couldn't we use https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on apache hardware. I don't see a large difference in the security provided by the git repo vs the security provided by the tgz on apache hardware. I can summarize our master plan and include the options we are looking at (github and binary release). was (Author: houston): [~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded sha reference of the image built within elastic. {{FROM docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}} We could certainly make that a part of the ReleaseWizard. It would stop us from doing incremental updates however for base images. I don't think that's a sticking point though. As per Hoss' comments above about the git repository being hosted on apache hardware, and the binary release being hosted on mirrors, couldn't we use https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on apache hardware. I don't see a large difference in the security provided by the git repo vs the security provided by the tgz on apache hardware. 
I can summarize our master plan and have it be independent of which input we use (github or binary release), since I doubt that will make a difference in whether they accept it or not. > Use the Solr TGZ artifact as Docker context > --- > > Key: SOLR-15129 > URL: https://issues.apache.org/jira/browse/SOLR-15129 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Houston Putman >Priority: Major > > As discussed in SOLR-15127, there is a need for a unified Dockerfile that > allows for release and local builds. > This ticket is an attempt to achieve this by using the Solr distribution TGZ > as the docker context to build from. > Therefore release images would be completely reproducible by running: > {{docker build -f solr-9.0.0/Dockerfile > https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}} > The changes to the Solr distribution would include adding a Dockerfile at > {{solr-/Dockerfile}}, adding the docker scripts under > {{solr-/docker}}, and adding a version file at > {{solr-/VERSION.txt}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15129) Use the Solr TGZ artifact as Docker context
[ https://issues.apache.org/jira/browse/SOLR-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283393#comment-17283393 ] Houston Putman commented on SOLR-15129: --- [~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded sha reference of the image built within elastic. {{FROM docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}} We could certainly make that a part of the ReleaseWizard. It would stop us from doing incremental updates however for base images. I don't think that's a sticking point though. As per Hoss' comments above about the git repository being hosted on apache hardware, and the binary release being hosted on mirrors, couldn't we use https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on apache hardware. I don't see a large difference in the security provided by the git repo vs the security provided by the tgz on apache hardware. I can summarize our master plan and have it be independent of which input we use (github or binary release), since I doubt that will make a difference in whether they accept it or not. > Use the Solr TGZ artifact as Docker context > --- > > Key: SOLR-15129 > URL: https://issues.apache.org/jira/browse/SOLR-15129 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Houston Putman >Priority: Major > > As discussed in SOLR-15127, there is a need for a unified Dockerfile that > allows for release and local builds. > This ticket is an attempt to achieve this by using the Solr distribution TGZ > as the docker context to build from. 
> Therefore release images would be completely reproducible by running: > {{docker build -f solr-9.0.0/Dockerfile > https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}} > The changes to the Solr distribution would include adding a Dockerfile at > {{solr-/Dockerfile}}, adding the docker scripts under > {{solr-/docker}}, and adding a version file at > {{solr-/VERSION.txt}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15132: -- Attachment: SOLR-15132.patch > Add window parameter to the nodes Streaming Expression > -- > > Key: SOLR-15132 > URL: https://issues.apache.org/jira/browse/SOLR-15132 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Joel Bernstein >Priority: Major > Attachments: SOLR-15132.patch, SOLR-15132.patch, SOLR-15132.patch > > > The *nodes* Streaming Expression performs a breadth-first graph traversal. > This ticket will add a *window* parameter to allow the nodes expression to > traverse the graph within a window of time. > To take advantage of this feature you must index the content with a String > field which is an ISO timestamp truncated at ten seconds. Then the *window* > parameter can be applied to walk the graph within a *window prior* to a > specific ten second window and perform aggregations. > *The main use case for this feature is auto-detecting lagged correlations.* > This is useful in many different fields. > Here is an example using Solr logs to answer the following question: > What types of log events occur most frequently in the 30 second window prior > to 10 second windows with the most slow queries: > {code} > nodes(logs, > facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", > rows="25"), > walk="time_ten_seconds->time_ten_seconds", > window="3", > gather="type_s", > count(*)) > {code} > This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp > truncations so that increments of one second, one minute, one day etc... can > also be traversed using the same query syntax. There will be a follow-on > ticket for that after this ticket is completed. This will create a more > general purpose time graph. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
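The indexing requirement above (a string field holding an ISO timestamp truncated at ten seconds) can be sketched as follows. The exact string shape Solr's log indexing uses is not specified in the issue; this sketch assumes it means zeroing the final seconds digit, so that all events inside one ten-second window share a value.

```python
# Sketch of producing a ten-second-window key from a timestamp. The field
# name and exact format used by Solr are assumptions; the idea is only that
# timestamps within the same ten-second window map to the same string.

from datetime import datetime, timezone

def ten_second_window(ts: datetime) -> str:
    """Format a timestamp as an ISO string truncated at ten seconds."""
    truncated = ts.replace(second=ts.second - ts.second % 10, microsecond=0)
    return truncated.strftime("%Y-%m-%dT%H:%M:%S") + "Z"

t = datetime(2021, 2, 11, 21, 43, 57, tzinfo=timezone.utc)
print(ten_second_window(t))  # 2021-02-11T21:43:50Z
```

With such a field, `window="3"` in the expression above walks the three ten-second windows prior to each bucketed window.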
[jira] [Updated] (SOLR-15142) Allow the cat Streaming Expression to read gzip files
[ https://issues.apache.org/jira/browse/SOLR-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15142: -- Attachment: (was: SOLR-15132.patch) > Allow the cat Streaming Expression to read gzip files > - > > Key: SOLR-15142 > URL: https://issues.apache.org/jira/browse/SOLR-15142 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-15142.patch, SOLR-15142.patch > > > This ticket will allow the *cat* Streaming Expression to read gzip files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15142) Allow the cat Streaming Expression to read gzip files
[ https://issues.apache.org/jira/browse/SOLR-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15142: -- Attachment: SOLR-15132.patch > Allow the cat Streaming Expression to read gzip files > - > > Key: SOLR-15142 > URL: https://issues.apache.org/jira/browse/SOLR-15142 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-15132.patch, SOLR-15142.patch, SOLR-15142.patch > > > This ticket will allow the *cat* Streaming Expression to read gzip files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props
[ https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283373#comment-17283373 ] ASF subversion and git services commented on SOLR-15145: Commit 650b03752850c80a0bd3eb33fb2cdafc417b4ecc in lucene-solr's branch refs/heads/branch_8x from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=650b037 ] SOLR-15145: Fix up changes.txt for 8.8.1 release > Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url > for core node props > -- > > Key: SOLR-15145 > URL: https://issues.apache.org/jira/browse/SOLR-15145 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 8.8 >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Critical > Fix For: 8.8.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > From the mailing list: > {code} > Caused by: java.lang.NullPointerException > at > deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1540) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > ... 166 more > {code} > see: > https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2357: SOLR-15145: Fix up changes.txt in branch_8x for 8.8.1 release
thelabdude merged pull request #2357: URL: https://github.com/apache/lucene-solr/pull/2357 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2357: SOLR-15145: Fix up changes.txt in branch_8x for 8.8.1 release
thelabdude opened a new pull request #2357: URL: https://github.com/apache/lucene-solr/pull/2357 No code changes, just updating release note and changes.txt This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Closed] (SOLR-15143) edismax is ignoring qf if query term is *
[ https://issues.apache.org/jira/browse/SOLR-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed SOLR-15143. --- > edismax is ignoring qf if query term is * > - > > Key: SOLR-15143 > URL: https://issues.apache.org/jira/browse/SOLR-15143 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 7.4 > Environment: Solr 7.4.0 - cloud mode > Java runtime: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_271 > 25.271-b09 > OS: Linux >Reporter: Yogendra Kumar Soni >Priority: Major > > We are using Solr 7.4. When we use the edismax query parser and query for > * with the qf param: > {code:java} > {!edismax qf=field1}*{code} > we get all documents in the result. > The debugQuery output is below: > > {code:java} > {... > "rawquerystring":"{!edismax qf=field1}*", "querystring":"{!edismax > qf=field1}*", "parsedquery":"+MatchAllDocsQuery(*:*)", > "parsedquery_toString":"+*:*", "QParser":"ExtendedDismaxQParser" > ...} > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-15143) edismax is ignoring qf if query term is *
[ https://issues.apache.org/jira/browse/SOLR-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-15143. - Resolution: Not A Problem This is working as designed. An asterisk does not imply that there needs to be any data in any of the fields. If you want to require that, then you'll have to recognize this special case and submit a different query to Solr, i.e. {{q=theField:*}} Consider first posting to the solr-user list or Solr Slack ( https://apachesolr.slack.com ) before filing an issue. > edismax is ignoring qf if query term is * > - > > Key: SOLR-15143 > URL: https://issues.apache.org/jira/browse/SOLR-15143 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 7.4 > Environment: Solr 7.4.0 - cloud mode > Java runtime: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_271 > 25.271-b09 > OS: Linux >Reporter: Yogendra Kumar Soni >Priority: Major > > We are using Solr 7.4. When we use the edismax query parser and query for > * with the qf param: > {code:java} > {!edismax qf=field1}*{code} > we get all documents in the result. > The debugQuery output is below: > > {code:java} > {... > "rawquerystring":"{!edismax qf=field1}*", "querystring":"{!edismax > qf=field1}*", "parsedquery":"+MatchAllDocsQuery(*:*)", > "parsedquery_toString":"+*:*", "QParser":"ExtendedDismaxQParser" > ...} > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class
[ https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved LUCENE-9590. -- Resolution: Fixed Thanks Lu Xugang! Sorry for the long delay. > Add javadoc for Lucene86PointsFormat class > --- > > Key: LUCENE-9590 > URL: https://issues.apache.org/jira/browse/LUCENE-9590 > Project: Lucene - Core > Issue Type: Wish > Components: core/codecs >Reporter: Lu Xugang >Priority: Minor > Fix For: master (9.0) > > Attachments: 1.png > > Time Spent: 20m > Remaining Estimate: 0h > > I would like to add javadoc for Lucene86PointsFormat class, it is really > helpful for source reader to understand the data structure with point value, > is anyone doing this or plan? > The attachment list part of the data structure (filled with color means it > has sub data structure) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class
[ https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283347#comment-17283347 ] ASF subversion and git services commented on LUCENE-9590: - Commit 9837bc4a4da1c63088a101c20374591f62e0be08 in lucene-solr's branch refs/heads/master from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9837bc4 ] LUCENE-9590: Add javadoc for Lucene86PointsFormat class (#2194) to Lucene's Confluence. * also corrected some trivial errors in javadocs & comments > Add javadoc for Lucene86PointsFormat class > --- > > Key: LUCENE-9590 > URL: https://issues.apache.org/jira/browse/LUCENE-9590 > Project: Lucene - Core > Issue Type: Wish > Components: core/codecs >Reporter: Lu Xugang >Priority: Minor > Fix For: master (9.0) > > Attachments: 1.png > > Time Spent: 20m > Remaining Estimate: 0h > > I would like to add javadoc for Lucene86PointsFormat class, it is really > helpful for source reader to understand the data structure with point value, > is anyone doing this or plan? > The attachment list part of the data structure (filled with color means it > has sub data structure) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley merged pull request #2194: LUCENE-9590: Add javadoc for Lucene86PointsFormat class
dsmiley merged pull request #2194: URL: https://github.com/apache/lucene-solr/pull/2194 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr
[ https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283338#comment-17283338 ] David Smiley commented on SOLR-14920: - I'm very much looking forward to this some day, but I very much concur with this sentiment from Erick: bq. We've lived with the formatting anomalies for many years, I don't see the driver for pushing this forward before the reference impl is resolved, there are better places to spend the effort IMO. Maybe we could start with just the contribs & docker, leaving aside Solr Core & SolrJ? Just an idea to get some middle ground. > Format code automatically and enforce it in Solr > > > Key: SOLR-14920 > URL: https://issues.apache.org/jira/browse/SOLR-14920 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Priority: Major > Labels: codestyle, formatting > > See the discussion at: LUCENE-9564. > This is a placeholder for the present, I'm reluctant to do this to the Solr > code base until after: > * we have some Solr-specific consensus > * we have some clue what this means for the reference impl. > Reconciling the reference impl will be difficult enough without a zillion > format changes to add to the confusion. > So my proposal is > 1> do this. > 2> Postpone this until after the reference impl is merged. > 3> do this in one single commit for reasons like being able to conveniently > have this separated out from git blame. > Assigning to myself so it doesn't get lost, but anyone who wants to take it > over please feel free. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props
[ https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283330#comment-17283330 ] ASF subversion and git services commented on SOLR-15145: Commit 8662121ca527e456ba8f01f81e199a6c01322ac6 in lucene-solr's branch refs/heads/master from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8662121 ] SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x > Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url > for core node props > -- > > Key: SOLR-15145 > URL: https://issues.apache.org/jira/browse/SOLR-15145 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 8.8 >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Critical > Fix For: 8.8.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > From the mailing list: > {code} > Caused by: java.lang.NullPointerException > at > deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1540) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > ... 166 more > {code} > see: > https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x
thelabdude merged pull request #2355: URL: https://github.com/apache/lucene-solr/pull/2355 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani edited a comment on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat
jtibshirani edited a comment on pull request #2310: URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-46181 Moving these classes under versioned packages like `org.apache.lucene.codecs.lucene90` makes sense to me. I slightly prefer the name `Lucene90BlockTreeTermsReader` because it isn't always clear where to stick the version number. But no strong opinion, and I see we already have some classes like `Completion84PostingsFormat`. Also maybe we'll omit the number in smaller helper classes like `ForDeltaUtil`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283328#comment-17283328 ] David Smiley commented on SOLR-15051: - Linking SOLR-15089 for the S3 impl of BackupRepository, which is lightly a dependency to BlobDirectory. Some next steps, unordered: * Testing of what we have via existing tests. Maybe some sub-set of tests using MiniSolrCloudCluster? * Switch to BackupRepository API for backing storage API * Add first draft "Listings" component implementation with many limitations (trivial in-JVM static global (no ZK), no de-duping) Then: * Listings: ZK * Listings: De-duping > Shared storage -- BlobDirectory (de-duping) > --- > > Key: SOLR-15051 > URL: https://issues.apache.org/jira/browse/SOLR-15051 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This proposal is a way to accomplish shared storage in SolrCloud with a few > key characteristics: (A) using a Directory implementation, (B) delegates to a > backing local file Directory as a kind of read/write cache, (C) replicas have > their own "space", (D) de-duplication across replicas via reference > counting, (E) uses ZK but separately from SolrCloud stuff. > The Directory abstraction is a good one, and helps isolate shared storage > from the rest of SolrCloud that doesn't care. Using a backing normal file > Directory is faster for reads and is simpler than Solr's HDFSDirectory's > BlockCache. Replicas having their own space solves the problem of multiple > writers (e.g. of the same shard) trying to own and write to the same space, > and it implies that any of Solr's replica types can be used along with what > goes along with them like peer-to-peer replication (sometimes faster/cheaper > than pulling from shared storage).
A de-duplication feature solves needless > duplication of files across replicas and from parent shards (i.e. from shard > splitting). The de-duplication feature requires a place to cache directory > listings so that they can be shared across replicas and atomically updated; > this is handled via ZooKeeper. Finally, some sort of Solr daemon / > auto-scaling code should be added to implement "autoAddReplicas", especially > to provide for a scenario where the leader is gone and can't be replicated > from directly but we can access shared storage. > For more about shared storage concepts, consider looking at the description > in SOLR-13101 and the linked Google Doc. > *[PROPOSAL > DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
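The reference-counting idea behind the proposal's de-duplication feature (point D) can be illustrated with a minimal sketch. This is illustrative only, not the actual BlobDirectory design; the class and method names are invented:

```python
class SharedFileRegistry:
    """Invented illustration of de-duplication via reference counting:
    a file in shared storage is physically deleted only after the last
    replica (or split child shard) that references it releases it."""

    def __init__(self):
        self._refs = {}       # file name -> count of replicas referencing it
        self.deletable = []   # files whose last reference was released

    def acquire(self, name):
        self._refs[name] = self._refs.get(name, 0) + 1

    def release(self, name):
        self._refs[name] -= 1
        if self._refs[name] == 0:
            del self._refs[name]
            self.deletable.append(name)  # now safe to remove from shared storage

reg = SharedFileRegistry()
reg.acquire("_0.cfs")   # replica 1 of a shard references the segment file
reg.acquire("_0.cfs")   # a second replica shares the same file (no copy made)
reg.release("_0.cfs")   # replica 1 goes away; the file must survive
print(reg.deletable)    # []
reg.release("_0.cfs")   # last reference dropped
print(reg.deletable)    # ['_0.cfs']
```

In the real proposal the listing and counts would live in ZooKeeper so they can be shared across replicas and updated atomically; the sketch keeps everything in-process to show only the counting logic.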
[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore
[ https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283321#comment-17283321 ] Andy Throgmorton commented on SOLR-15089: - Great idea, [~gerlowskija]! How far along are you in coding this effort? I work at Salesforce and we wrote+use an S3 implementation of BackupRepository for a production Solr stack. It's not in an upstreamable state right now (e.g., uses some internal libraries for grabbing keys/secrets, etc.), but I would be happy to look into cleaning it up and submitting it for consideration if you haven't started yet. Or if you've already written the code, then feel free to add me on your code review. In regards to testing, we use the Adobe S3Mock (https://github.com/adobe/S3Mock) library for writing unit tests. Since this code is fairly simple, as you mentioned, the S3 APIs it uses are all mainstream and mockable with that framework. For larger, end-to-end integration tests, we've also started using Minio (https://min.io/) to emulate an S3 server, but I would think that's outside the scope of this ticket. > Allow backup/restoration to Amazon's S3 blobstore > -- > > Key: SOLR-15089 > URL: https://issues.apache.org/jira/browse/SOLR-15089 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > Solr's BackupRepository interface provides an abstraction around the physical > location/format that backups are stored in. This allows plugin writers to > create "repositories" for a variety of storage mediums. It'd be nice if Solr > offered more mediums out of the box though, such as some of the "blobstore" > offerings provided by various cloud providers. > This ticket proposes that a "BackupRepository" implementation for Amazon's > popular 'S3' blobstore, so that Solr users can use it for backups without > needing to write their own code. 
> Amazon offers a s3 Java client with acceptable licensing, and the required > code is relatively simple. The biggest challenge in supporting this will > likely be procedural - integration testing requires S3 access and S3 access > costs money. We can check with INFRA to see if there is any way to get cloud > credits for an integration test to run in nightly Jenkins runs on the ASF > Jenkins server. Alternatively we can try to stub out the blobstore in some > reliable way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr
[ https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283304#comment-17283304 ] Dawid Weiss commented on SOLR-14920: Just to follow-up to what Erick said (hi Erick, how are the squirrels? :) - it indeed takes some work to make sure nothing gets *broken* on that initial formatting. We did this for Lucene and we did find some code that formatting would have broken (code in comments without pre, manually-adjusted examples). Arguably you could do the formatting and then recover from history (if somebody spots something wrong) - many approaches are possible, I guess. > Format code automatically and enforce it in Solr > > > Key: SOLR-14920 > URL: https://issues.apache.org/jira/browse/SOLR-14920 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Priority: Major > Labels: codestyle, formatting > > See the discussion at: LUCENE-9564. > This is a placeholder for the present, I'm reluctant to do this to the Solr > code base until after: > * we have some Solr-specific consensus > * we have some clue what this means for the reference impl. > Reconciling the reference impl will be difficult enough without a zillion > format changes to add to the confusion. > So my proposal is > 1> do this. > 2> Postpone this until after the reference impl is merged. > 3> do this in one single commit for reasons like being able to conveniently > have this separated out from git blame. > Assigning to myself so it doesn't get lost, but anyone who wants to take it > over please feel free. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh opened a new pull request #2356: SOLR-15152: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
epugh opened a new pull request #2356: URL: https://github.com/apache/lucene-solr/pull/2356 # Description Export tool says it uses json, but it's actually a json lines format. It ignores anonymous and nested docs. # Solution * Tweaked the writer to properly handle anonymous and regular nested docs when exporting data. * Renamed the existing `json` format to `jsonl`, and introduced a proper `json` format. * Introduce explicit DocSinks per format, `json`, `jsonl`, and `javabin`. Now, with the `json` format you can export and then reimport the Solr docs, including with child docs. # Tests I've added a new `TestExportToolWithNestedDocs`, and extended the existing `TestExportTool` tests. The setup for the tests was quite different, so I didn't make them all one file. # Checklist Please review the following and check all that apply: - [ X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ X] I have created a Jira issue and added the issue ID to my pull request title. - [ X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ X] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ X] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gerlowskija edited a comment on pull request #2250: SOLR-13608: Incremental backup file format
gerlowskija edited a comment on pull request #2250: URL: https://github.com/apache/lucene-solr/pull/2250#issuecomment-00507 Totally agreed @epugh that we don't want the old format to linger. I tried to convey in documentation that it was deprecated and will be going away, hopefully those notes are sufficient. I thought about `deprecated` tags, but wasn't sure where to put them. The file format isn't a Java method API like those tags are typically used on. But if you have a place in mind you think makes sense, I'm happy to add them. Per the compatibility plan in the SIP, we'll maintain support for restoring the old format through the next major release line (9.x). Though conceivably it could be sooner if there's consensus that that's too long. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #214: extensions/v1beta1 Ingress is deprecated
HoustonPutman commented on issue #214: URL: https://github.com/apache/lucene-solr-operator/issues/214#issuecomment-777699761 We should go ahead and make sure that all dependent resources are up to date with the most recent version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'
[ https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283283#comment-17283283 ] Chris M. Hostetter commented on SOLR-15150: --- David: yeah, we probably intermix "atomic" and "partial" too much, and conflate the atomic nature of the partial updates with the atomic nature of optimistic concurrency updates – even when the users aren't doing that. So agreed: better to use "partial" here to clarify what aspect we're dealing with. Ishan: I had originally considered the verb "force" (and I realize now it still lingers in a test variable) but it felt really misleading and I'd prefer we avoid it... Through the lens of a novice user, "forceX" makes me think I'm telling Solr "Hey Solr, do X even if you wouldn't by default and even if it might break something" ... similar to {{"rm -f"}} or {{"git push --force"}} ... but what we want to convey is *NOT* that this is a way to _force_ the update to be done in-place (because we can't actually promise that) ... what we want to convey is that this is a way for the user to say "When I do a partial update, I expect that _either_ the update is done in place, or it must fail." "Require" felt a little better, because it seemed more like a _request_, if that makes sense? ... "I require that Solr do X" felt like a good English-language equivalent to the sentiment I was going for, because Solr can either satisfy the request, or say "I'm not capable of doing that" (i.e. fail). (I had briefly considered using the verb "assert", as in {{"assert.inplace.atomic.update"}} or {{"update.partial.assertInPlace"}}, but that felt too Java/C-ish for general Solr users.) Omitting any "verb" (either "force" or "require") and just going with something like {{"update.partial.inplace"}} is an interesting idea, but I feel like it's too ... "weak" is the closest word I can think of, I guess? ...
Compared to how this would behave: {{update.partial.inplace=true}} feels more like a way to override a default (one that might otherwise be heuristically determined), and makes me think that {{update.partial.inplace=false}} would be a way to indicate that Solr should _never_ do my update in-place, which isn't what we want the (default) {{"false"}} value to mean. So I think, on balance, at this point I like David's suggestion of {{"update.partial.requireInPlace"}} the best so far? > add request level option to fail an atomic update if it can't be done > 'in-place' > > > Key: SOLR-15150 > URL: https://issues.apache.org/jira/browse/SOLR-15150 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-15150.patch > > > When "In-Place" DocValue updates were added to Solr, the choice was made to > re-use the existing "Atomic Update" syntax, and use the DocValue updating > code if possible based on the index & schema, otherwise fall back to the > existing Atomic Update logic (to re-index the entire document).
In essence, > "In-Place Atomic Updates" are treated as a (possible) optimization to > "regular" Atomic Updates > This works fine, but it leaves open the possibility of a "gotcha" situation > where users may (reasonably) assume that an update can be done "In-Place" but > some aspect of the schema prevents it, and the performance of the updates > doesn't meet expectations (notably in the case of things like deeply nested > documents, where the re-indexing cost is multiplicative based on the total > size of the document tree) > I think it would be a good idea to support an optional request param users > can specify with the semantics that say "If this update is an Atomic Update, > fail to execute it unless it can be done In-Place" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
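The semantics under discussion above, failing the partial update instead of silently falling back to a full re-index, could look roughly like the sketch below. Note that the parameter name (`update.partial.requireInPlace`) is only one of the proposals being debated in this thread, not a shipped Solr option, and the function here is an invented illustration, not Solr code:

```python
def handle_partial_update(can_do_in_place: bool, params: dict) -> str:
    """Sketch of the *proposed* behavior: when the client sets the (proposed)
    update.partial.requireInPlace flag and the schema/index cannot support an
    in-place docvalues update, fail the request instead of quietly falling
    back to re-indexing the whole document."""
    if can_do_in_place:
        return "in-place docvalues update"
    if params.get("update.partial.requireInPlace", "false") == "true":
        raise ValueError("partial update cannot be done in-place")
    return "full atomic update (entire document re-indexed)"

# A document whose schema prevents in-place updates, with the flag set:
try:
    handle_partial_update(False, {"update.partial.requireInPlace": "true"})
except ValueError as e:
    print(e)   # partial update cannot be done in-place

# Default behavior is unchanged: fall back to a regular atomic update.
print(handle_partial_update(False, {}))
```

This matches the "either the update is done in place, or it must fail" framing in the comment: the flag never forces an in-place update to happen, it only forbids the fallback.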
[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #212: Solr cloud never gets deleted when reclaimPolicy is set to Delete
HoustonPutman commented on issue #212: URL: https://github.com/apache/lucene-solr-operator/issues/212#issuecomment-777699251 So yes, this is actually expected behavior. If you want to delete your SolrCloud after deleting the Solr Operator, you will need to remove the finalizer from the SolrCloud. Otherwise the SolrCloud will never be able to be deleted. ```bash kubectl patch solrcloud --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]' ``` That should do the trick. But I would recommend just deleting the SolrCloud objects before deleting the Solr Operator. It's the easiest way to go, and the PVCs will be handled correctly! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] anshumg commented on a change in pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x
anshumg commented on a change in pull request #2355: URL: https://github.com/apache/lucene-solr/pull/2355#discussion_r574731370 ## File path: solr/solr-ref-guide/src/solr-upgrade-notes.adoc ## @@ -100,6 +100,12 @@ The default Prometheus Exporter configuration includes metrics like queries-per- Plugin developers using `SolrPaths.locateSolrHome()` or 'new `SolrResourceLoader`' should check deprecation warnings as existing some existing functionality will be removed in 9.0. https://issues.apache.org/jira/browse/SOLR-14934[SOLR-14934] has more technical details about this change for those concerned. +*Removing base_url from Stored State* + +If you're able to upgrade SolrJ to 8.8.x for all of your client applications, then you can set `-Dsolr.storeBaseUrl=false` (introduced in Solr 8.8.1) +to better align the stored state in Zookeeper with future versions of Solr. However, if you are not able to upgrade SolrJ to 8.8.x for all client applications, then +leave the default `-Dsolr.storeBaseUrl=true` so that Solr will continue to store the `base_url` in Zookeeper. Review comment: Perhaps add a note about this going away completely in the next major release? It's obvious for the folks who know but would be good to highlight for newer users or people who've inherited old systems. ## File path: solr/CHANGES.txt ## @@ -238,13 +238,21 @@ Bug Fixes - * SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to differentiate '0' group from null group (hossman) -* SOLR-15114: Fix bug that caused WAND optimization to be disabled in cases where the max score is requested (such as Review comment: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15132: -- Attachment: SOLR-15132.patch > Add window paramater to the nodes Streaming Expression > -- > > Key: SOLR-15132 > URL: https://issues.apache.org/jira/browse/SOLR-15132 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Joel Bernstein >Priority: Major > Attachments: SOLR-15132.patch, SOLR-15132.patch > > > The *nodes* Streaming Expression performs a breadth first graph traversal. > This ticket will add a *window* parameter to allow the nodes expression to > traverse the graph within a window of time. > To take advantage of this feature you must index the content with a String > field which is an ISO timestamp truncated at ten seconds. Then the *window* > parameter can be applied to walk the graph within a *window prior* to a > specific ten second window and perform aggregations. > *The main use case for this feature is auto-detecting lagged correlations.* > This is useful in many different fields. > Here is an example using Solr logs to answer the following question: > What types of log events occur most frequently in the 30 second window prior > to 10 second windows with the most slow queries: > {code} > nodes(logs, > facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", > rows="25"), > walk="time_ten_seconds->time_ten_seconds", > window="3", > gather="type_s", > count(*)) > {code} > This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp > truncations so that increments of one second, one minute, one day etc... can > also be traversed using the same query syntax. There will be a follow-on > ticket for that after this ticket is completed. This will create a more > general purpose time graph. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat
jpountz commented on pull request #2310: URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-777690621 > Any thoughts on naming or package structure for these classes? What about putting the current blocktree classes into the `org.apache.lucene.backward_codecs.lucene40` (since they were introduced in Lucene 4.0) package and renaming the reader/writer classes to include the version too, ie. `BlockTree40TermsWriter` and `BlockTree40TermsReader`? And the new classes would be called `BlockTree90TermsWriter` and `BlockTree90TermsReader` and be under the `org.apache.lucene.codecs.lucene90` package?
[GitHub] [lucene-solr] jpountz commented on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat
jpountz commented on pull request #2310: URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-777686222 > I wonder if we should not version PFUtil classes, and instead move them to a package under Util and change the visibility of the methods. Those classes seem more like a utility to me. I've become a bit wary of having shared utility classes for codecs given how it makes the code harder to evolve (e.g. I have the FST and PackedInts classes in mind). I'd rather copy this utility class wherever it's needed so that every file format that uses bit packing can more easily update the logic to fit its own needs.
[jira] [Commented] (LUCENE-9755) Index Segment without DocValues May Cause Search to Fail
[ https://issues.apache.org/jira/browse/LUCENE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283264#comment-17283264 ] Mayya Sharipova commented on LUCENE-9755: - {quote}>> Consider the following scenario: {quote} {quote}>> all documents in the index have a field "numfield" indexed as IntPoint {quote} {quote}>> in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name {quote} [~tomhecker], I am working on LUCENE-9334, which will ensure that this never happens. That is, if a document has "numfield" indexed as IntPoint, it must also have "numfield" indexed as SortedNumericDocValuesField. In other words, there will be consistency between data structures on a per-field basis across all the documents of an index. But this will be from version 9.0; your point is still valid for 8.x. > Index Segment without DocValues May Cause Search to Fail > > > Key: LUCENE-9755 > URL: https://issues.apache.org/jira/browse/LUCENE-9755 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.x, 8.3.1, 8.8 >Reporter: Thomas Hecker >Priority: Minor > Labels: docValues, sorting > Attachments: DocValuesTest.java > > > Not sure if this can be considered a bug, but it is certainly a caveat that > may slip through testing due to its nature. > Consider the following scenario: > * all documents in the index have a field "numfield" indexed as IntPoint > * in addition, SOME of those documents are also indexed with a > SortedNumericDocValuesField using the same "numfield" name > The documents without the DocValues cannot be matched from any queries that > involve sorting, so we save some space by omitting the DocValues for those > documents. 
> This works perfectly fine, unless > * the index contains a segment that only contains documents without the > DocValues > In this case, running a query that sorts by "numfield" will throw the > following exception: > {noformat} > java.lang.IllegalStateException: unexpected docvalues type NONE for field > 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct > docvalues type. > at org.apache.lucene.index.DocValues.checkField(DocValues.java:317) > at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389) > at > org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159) > at > org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155){noformat} > I have included a minimal example program that demonstrates the issue. This > will > * create an index with two documents, each having "numfield" indexed > * add a DocValuesField "numfield" only for the first document > * force the two documents into separate index segments > * run a query that matches only the first document and sorts by "numfield" > This results in the aforementioned exception. > When removing the following lines from the code: > {code:java} > if (i==docCount/2) { > iw.commit(); > } > {code} > both documents get added to the same segment. When re-running the code with > a single index segment, the query works fine. > Tested with Lucene 8.3.1 and 8.8.0. > Like I said, this may not be considered a bug. But it has slipped through our > testing because the existence of such a DocValues-free segment is such a rare > and short-lived event. > We can avoid this issue in the future by using a different field name for the > DocValuesField. But for our production systems we have to patch > DocValues.checkField() to suppress the IllegalStateException as reindexing is > not an option right now. 
[GitHub] [lucene-solr] thelabdude opened a new pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x
thelabdude opened a new pull request #2355: URL: https://github.com/apache/lucene-solr/pull/2355 Align solr's changes.txt with fixes going into 8x / 8.8 and change the default value of the `solr.storeBaseUrl` sys prop for master.
[jira] [Updated] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe updated SOLR-15114: - Affects Version/s: (was: 8.6.3) (was: master (9.0)) 8.8 8.6 8.7 > WAND does not work correctly on multiple segments in Solr 8.6.3 > --- > > Key: SOLR-15114 > URL: https://issues.apache.org/jira/browse/SOLR-15114 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.6, 8.7, 8.8 >Reporter: Naoto Minami >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Blocker > Fix For: master (9.0), 8.9, 8.8.1 > > Attachments: wand.pdf > > Time Spent: 20m > Remaining Estimate: 0h > > In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each > index segment and remains zero until maxScore is updated. > There are two causes of this problem: > - MaxScoreCollector does not set minCompetitiveScore of > MinCompetitiveScoreAwareScorable newly generated for another index segment. > - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. > This behavior is correct considering the purpose of MaxScoreCollector. > For details, see the attached pdf. > *Note* > This problem occurs in distributed search (SolrCloud) or when the fl=score > parameter is specified.
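The failure mode in the issue description lends itself to a small, self-contained illustration. The classes below are a hedged sketch, not the actual Solr/Lucene `MaxScoreCollector` or `WANDScorer`: they only model the pattern of a per-segment scorer that must be handed the minimum competitive score already established by earlier segments.

```java
// Simplified model of the bug described above -- NOT the real Solr classes.
// A WAND-style scorer may skip any document whose score upper bound cannot
// beat minCompetitiveScore; each index segment gets a fresh scorer.
final class SegmentScorer {
    float minCompetitiveScore; // 0.0f by default: no skipping possible yet

    boolean canSkip(float upperBound) {
        return upperBound <= minCompetitiveScore;
    }
}

final class MaxScoreCollector {
    private final boolean propagateAcrossSegments; // false models the 8.6.3 behavior
    private float maxScore = Float.NEGATIVE_INFINITY;
    private SegmentScorer scorer;

    MaxScoreCollector(boolean propagateAcrossSegments) {
        this.propagateAcrossSegments = propagateAcrossSegments;
    }

    // Called once at the start of every segment with a newly created scorer.
    void setScorer(SegmentScorer scorer) {
        this.scorer = scorer;
        if (propagateAcrossSegments && maxScore != Float.NEGATIVE_INFINITY) {
            // The fix: carry the floor from earlier segments into this one.
            scorer.minCompetitiveScore = maxScore;
        }
    }

    void collect(float score) {
        if (score > maxScore) {
            maxScore = score;
            scorer.minCompetitiveScore = maxScore; // only raised on a new max
        }
    }

    SegmentScorer scorer() {
        return scorer;
    }
}
```

With `propagateAcrossSegments=false`, a second segment starts with a floor of 0, so no documents can be skipped there until a new maximum score is seen; with `true`, the floor survives the segment boundary, which is the behavior the patch restores.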
[jira] [Updated] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe updated SOLR-15114: - Fix Version/s: 8.9 Resolution: Fixed Status: Resolved (was: Patch Available) Merged. Thanks [~nminami]!
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283234#comment-17283234 ] ASF subversion and git services commented on SOLR-15114: Commit 33a1f7f6b2541c096c835aead86dbe3cff111df9 in lucene-solr's branch refs/heads/branch_8_8 from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=33a1f7f ] SOLR-15114: Add CHANGES entry
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283233#comment-17283233 ] ASF subversion and git services commented on SOLR-15114: Commit 04f92e613e0725c3dbcb4964b5f5886f84ffd847 in lucene-solr's branch refs/heads/branch_8_8 from Naoto MINAMI [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=04f92e6 ] SOLR-15114: WAND does not work correctly on multiple segments (#2259) In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index segment and remain zero until maxScore is updated. There are two causes of this problem: * MaxScoreCollector does not set minCompetitiveScore of MinCompetitiveScoreAwareScorable newly generated for another index segment. * MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. This behavior is correct considering the purpose of MaxScoreCollector. For details, see the attached pdf https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283229#comment-17283229 ] ASF subversion and git services commented on SOLR-15114: Commit 6a801e21520dc7d6e0720672d530f32ded25019b in lucene-solr's branch refs/heads/branch_8x from Naoto MINAMI [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a801e2 ] SOLR-15114: WAND does not work correctly on multiple segments (#2259) In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index segment and remain zero until maxScore is updated. There are two causes of this problem: * MaxScoreCollector does not set minCompetitiveScore of MinCompetitiveScoreAwareScorable newly generated for another index segment. * MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. This behavior is correct considering the purpose of MaxScoreCollector. For details, see the attached pdf https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283230#comment-17283230 ] ASF subversion and git services commented on SOLR-15114: Commit 3d6d92feb5bc532e71e6c8ee5f92396b7f65b982 in lucene-solr's branch refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3d6d92f ] SOLR-15114: Add CHANGES entry
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2354: LUCENE-9765: Hunspell: rename SpellChecker to Hunspell, fix test name…
donnerpeter opened a new pull request #2354: URL: https://github.com/apache/lucene-solr/pull/2354 …, update javadoc and CHANGES.txt # Description The class names are imperfect and the docs are outdated # Solution Fix that # Tests No behavior change. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props
[ https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283218#comment-17283218 ] ASF subversion and git services commented on SOLR-15145: Commit b000e56b8f3b74e211cea450026b2d2c8da078b5 in lucene-solr's branch refs/heads/branch_8_8 from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b000e56 ] SOLR-15145: System property to control whether base_url is stored in state.json to enable back-compat with older SolrJ versions > Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url > for core node props > -- > > Key: SOLR-15145 > URL: https://issues.apache.org/jira/browse/SOLR-15145 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 8.8 >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Critical > Fix For: 8.8.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > From the mailing list: > {code} > Caused by: java.lang.NullPointerException > at > deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1540) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159) > at > deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > ... 166 more > {code} > see: > https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E
[GitHub] [lucene-solr] thelabdude merged pull request #2346: SOLR-15145: System property to control whether base_url is stored in state.json to enable back-compat with older SolrJ versions
thelabdude merged pull request #2346: URL: https://github.com/apache/lucene-solr/pull/2346
[jira] [Created] (LUCENE-9765) Hunspell: rename SpellChecker to Hunspell, fix test name, update javadoc and CHANGES.txt
Peter Gromov created LUCENE-9765: Summary: Hunspell: rename SpellChecker to Hunspell, fix test name, update javadoc and CHANGES.txt Key: LUCENE-9765 URL: https://issues.apache.org/jira/browse/LUCENE-9765 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283204#comment-17283204 ] ASF subversion and git services commented on SOLR-15114: Commit b6db6c88d7bc4ee1757c450ad7a3df8b72add084 in lucene-solr's branch refs/heads/master from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b6db6c8 ] SOLR-15114: Add CHANGES entry
[jira] [Commented] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package
[ https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283195#comment-17283195 ] ASF subversion and git services commented on LUCENE-9705: - Commit 096f054d562978a768d346f66b50332c686919a0 in lucene-solr's branch refs/heads/master from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=096f054 ] LUCENE-9705: Reset internal version in Lucene90FieldInfosFormat. (#2339) Since this is a fresh format, we can remove older version logic and reset the internal version to 0. > Move all codec formats to the o.a.l.codecs.Lucene90 package > --- > > Key: LUCENE-9705 > URL: https://issues.apache.org/jira/browse/LUCENE-9705 > Project: Lucene - Core > Issue Type: Wish >Reporter: Ignacio Vera >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Current formats are distributed in different packages, prefixed with the > Lucene version they were created. With the upcoming release of Lucene 9.0, it > would be nice to move all those formats to just the o.a.l.codecs.Lucene90 > package (and of course moving the current ones to the backwards-codecs). > This issue would actually facilitate moving the directory API to little > endian (LUCENE-9047) as the only codecs that would need to handle backwards > compatibility will be the codecs in backwards codecs. > In addition, it can help formalising the use of internal versions vs format > versioning ( LUCENE-9616) > 
[GitHub] [lucene-solr] jtibshirani merged pull request #2339: LUCENE-9705: Reset internal version in Lucene90FieldInfosFormat.
jtibshirani merged pull request #2339: URL: https://github.com/apache/lucene-solr/pull/2339
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283192#comment-17283192 ] ASF subversion and git services commented on SOLR-15114: Commit 0cbb38ff4a38dea31324265da13412dd713d9a8e in lucene-solr's branch refs/heads/master from Naoto MINAMI [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0cbb38f ] SOLR-15114: WAND does not work correctly on multiple segments (#2259) In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index segment and remain zero until maxScore is updated. There are two causes of this problem: * MaxScoreCollector does not set minCompetitiveScore of MinCompetitiveScoreAwareScorable newly generated for another index segment. * MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. This behavior is correct considering the purpose of MaxScoreCollector. For details, see the attached pdf https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.
[GitHub] [lucene-solr] tflobbe merged pull request #2259: SOLR-15114: WAND does not work correctly on multiple segments in Solr 8.6.3
tflobbe merged pull request #2259: URL: https://github.com/apache/lucene-solr/pull/2259
[jira] [Comment Edited] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282783#comment-17282783 ] Tomas Eduardo Fernandez Lobbe edited comment on SOLR-15114 at 2/11/21, 5:08 PM:
I've run a perf test on the change using Gatling:
* Using a wikipedia snapshot (20M docs, shortened to 1k characters)
* Using Mike McCandless' [query set|https://github.com/mikemccand/luceneutil/blob/master/tasks/wikimedium.10M.tasks]
* 10k queries per type (180k queries total)
* 2 shards, 1 replica each (on the same node)
* Each shard has ~30 segments
* Search on the article body
* 10 parallel users
* Single Solr instance (iMac Pro 3.2 GHz 8-Core Intel Xeon W with 128 GB RAM)
* Gatling running on the same machine
* The default example Solr parameters
* Always used {{rows=2}}; in the WAND case I also added {{minExactCount=2}}
While there is some noise in the tests (I'd expect master WAND, master no-WAND and patch no-WAND to perform similarly), the WAND scenario with the patch applied is definitely faster:
||Stat||master WAND||master no-WAND||patch WAND||patch no-WAND||
|QPS|97.72|103.687|153.061|111.732|
|min|1|1|1|1|
|p50|39|39|23|36|
|p75|102|95|57|87|
|p95|387|350|245|322|
|p99|829|809|668|769|
|max|2405|2416|1331|2447|
|mean|95|89|59|82|
|std dev|155|147|110|139|
> WAND does not work correctly on multiple segments in Solr 8.6.3
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.6.3, master (9.0)
> Reporter: Naoto Minami
> Assignee: Tomas Eduardo Fernandez Lobbe
> Priority: Blocker
> Fix For: master (9.0), 8.8.1
> Attachments: wand.pdf
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
> - MaxScoreCollector does not set minCompetitiveScore of the MinCompetitiveScoreAwareScorable newly generated for another index segment.
> - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*: This problem occurs in distributed search (SolrCloud) or when the fl=score parameter is specified.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
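A quick sanity check of the headline numbers in the table above (plain Python, figures copied verbatim from the QPS row):

```python
# QPS figures copied from the Gatling results table above
qps = {
    "master WAND": 97.72,
    "master no-WAND": 103.687,
    "patch WAND": 153.061,
    "patch no-WAND": 111.732,
}

# Speedup of the patched WAND run over master WAND
wand_speedup = qps["patch WAND"] / qps["master WAND"]
print(f"WAND speedup: {wand_speedup:.2f}x")  # ~1.57x

# The non-WAND runs should be (and roughly are) unchanged, modulo noise
no_wand_ratio = qps["patch no-WAND"] / qps["master no-WAND"]
print(f"no-WAND ratio: {no_wand_ratio:.2f}x")  # ~1.08x
```

So the patch improves WAND throughput by roughly 57%, while the no-WAND runs differ by only the noise level the comment acknowledges.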
[jira] [Commented] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
[ https://issues.apache.org/jira/browse/SOLR-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283180#comment-17283180 ] David Eric Pugh commented on SOLR-15152:
[~noble] what do you think of adding an explicit {{-compress}} switch that would add the .gz to the file, versus the pattern of naming the file with {{-out /myexportdir/myoutput.gz}}?
> Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
> Key: SOLR-15152
> URL: https://issues.apache.org/jira/browse/SOLR-15152
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCLI
> Affects Versions: 8.8
> Reporter: David Eric Pugh
> Assignee: David Eric Pugh
> Priority: Major
>
> ExportTool doesn't properly handle anonymous child docs or nested docs. It also confuses the JSONL format with the JSON format.
> I'd like the JSON Lines format to output as .jsonl, which is the standard, and the JSON format to be a .json, which is the same output as if you wanted to post a Solr doc as JSON to upload the data... This will let us round-trip the data.
[jira] [Comment Edited] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
[ https://issues.apache.org/jira/browse/SOLR-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283180#comment-17283180 ] David Eric Pugh edited comment on SOLR-15152 at 2/11/21, 4:59 PM:
[~noble] what do you think of adding an explicit {{-compress}} switch that would add the .gz to the file, versus the pattern of naming the file with {{-out /myexportdir/myoutput.gz}}?
[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'
[ https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283179#comment-17283179 ] Ishan Chattopadhyaya commented on SOLR-15150:
+1 Hoss, this issue is very useful. How about "forceInPlaceUpdate" or "forceInPlace"? Or maybe just "inplace"?
> add request level option to fail an atomic update if it can't be done 'in-place'
> Key: SOLR-15150
> URL: https://issues.apache.org/jira/browse/SOLR-15150
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-15150.patch
>
> When "In-Place" DocValue updates were added to Solr, the choice was made to re-use the existing "Atomic Update" syntax, and use the DocValue updating code if possible based on the index & schema, otherwise fall back to the existing Atomic Update logic (to re-index the entire document). In essence, "In-Place Atomic Updates" are treated as a (possible) optimization to "regular" Atomic Updates.
> This works fine, but it leaves open the possibility of a "gotcha" situation where users may (reasonably) assume that an update can be done "In-Place" but some aspect of the schema prevents it, and the performance of the updates doesn't meet expectations (notably in the case of things like deeply nested documents, where the re-indexing cost is multiplicative based on the total size of the document tree).
> I think it would be a good idea to support an optional request param users can specify with the semantics: "If this update is an Atomic Update, fail to execute it unless it can be done In-Place."
[jira] [Commented] (SOLR-15118) Make /v2/collections APIs annotation-based
[ https://issues.apache.org/jira/browse/SOLR-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283167#comment-17283167 ] David Smiley commented on SOLR-15118:
I'm a huge fan of what you propose in your first comment!
> Make /v2/collections APIs annotation-based
> Key: SOLR-15118
> URL: https://issues.apache.org/jira/browse/SOLR-15118
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: v2 API
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Fix For: master (9.0)
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The {{ApiBag}} class used to register v2 APIs (and the {{PathTrie}} object underlying it) only holds a single {{Api}} object for a given "method" and "path" combination. In short, this means that API commands with the same method and path must be declared homogeneously: they either all have to be in the JSON spec, or all be in annotated Java classes.
> The SIP-12 proposal calls for new "list-backups" and "delete-backups" APIs. For these v2 APIs to be annotation-based, as is preferred going forward, all of the existing /v2/collections APIs must be changed to be annotation-based as well.
> It's worth noting that this will cause the introspection output to lose the "description" text for these APIs and their parameters, as there's no support for this yet for annotation-based v2 APIs. See SOLR-15117 for more details.
[jira] [Commented] (SOLR-15146) Distribute Collection API command execution
[ https://issues.apache.org/jira/browse/SOLR-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283160#comment-17283160 ] David Smiley commented on SOLR-15146:
Amazing work Ilan!!! I'm especially looking forward to easier debug-ability when things go wrong, by not having to chase stack traces/context through the Overseer. That will also make it easier to add more distributed tracing in Solr -- no need to inject and then extract spans for the ZK queue. That doesn't happen today, but it's something I (or a colleague) may add in the coming weeks.
> Distribute Collection API command execution
> Key: SOLR-15146
> URL: https://issues.apache.org/jira/browse/SOLR-15146
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: master (9.0)
> Reporter: Ilan Ginzburg
> Assignee: Ilan Ginzburg
> Priority: Major
> Labels: collection-api, overseer
>
> Building on the distributed cluster state update changes (SOLR-14928), this ticket will distribute the Collection API so that commands can execute on any node (i.e. the node handling the request through {{CollectionsHandler}}) without having to go through a Zookeeper queue and the Overseer.
> This is the second step (the first was SOLR-14928) after which the Overseer could be removed (the code keeps the existing execution options, so completing this by no means removes the Overseer, but it could be removed in a future release).
> There is a dependency on the distributed cluster state changes because the Overseer locking that protects same-collection (or same-shard) Collection API commands from executing concurrently will be replaced by optimistic locking of the collection {{state.json}} znodes (or other znodes that will eventually replace/augment {{state.json}}).
> The goal of this ticket is threefold:
> * Simplify the code (running synchronously and not going through the Zookeeper queues and the Overseer dequeue logic is much simpler),
> * Lead to improved performance for most/all use cases (although this is a secondary goal, as long as performance is not degraded), and
> * Allow a future change (in another future Jira) to the way cluster state is cached on the nodes of the cluster (keep less information, be less dependent on Zookeeper watches, do not care about collections not present on the node). This future work will aim to significantly increase the scale (number of collections) supported by SolrCloud.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'
[ https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283139#comment-17283139 ] David Smiley commented on SOLR-15150:
+1 LGTM, and great testing as usual.
RE "require.inplace.atomic.updates": honestly, I cringe seeing flags named like an English sentence. I prefer that the dots scope the module, with camelCase for the option, e.g. "update.partial.requireInPlace". I'm not a fan of Solr's overuse of the word "atomic" when really it's the "partial"-ness that is more perceivable by the user as what's happening. I view the "atomic"-ness as an implementation detail of the "partial" aspect. It could be argued the "atomic" aspect is more visible when users choose to specify a version constraint... but few users even do that, and even then I'd rather say something like "conditional update".
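For readers following the naming debate, here is a minimal sketch in Python of what such a request might look like. The parameter name `update.partial.requireInPlace` is only one candidate floated in this thread, not a committed API, and the document and field names are made up:

```python
import json

# Sketch of an atomic ("partial") update request body. For Solr to apply it
# in-place, the target field generally needs to be a single-valued,
# docValues-only, non-indexed, non-stored numeric field.
doc = {"id": "book-42", "popularity": {"set": 7}}

# Hypothetical request parameter from the naming discussion above; the final
# name (if the feature lands) may well differ.
params = {"update.partial.requireInPlace": "true"}

# Solr's JSON update endpoint accepts a list of documents
payload = json.dumps([doc])
print(payload)
```

With such a flag set, the proposal is that Solr would reject the update outright instead of silently falling back to a full re-index of the document.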
[GitHub] [lucene-solr-operator] HoustonPutman closed issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error
HoustonPutman closed issue #215: URL: https://github.com/apache/lucene-solr-operator/issues/215
[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error
HoustonPutman commented on issue #215: URL: https://github.com/apache/lucene-solr-operator/issues/215#issuecomment-777622617 Use `kubectl replace` instead, it's what we use in the Makefile now. For CRDs, `replace` is better than `apply`, even without the annotation.
[jira] [Created] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
David Eric Pugh created SOLR-15152:
Summary: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
Key: SOLR-15152
URL: https://issues.apache.org/jira/browse/SOLR-15152
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCLI
Affects Versions: 8.8
Reporter: David Eric Pugh
Assignee: David Eric Pugh

ExportTool doesn't properly handle anonymous child docs or nested docs. It also confuses the JSONL format with the JSON format.
I'd like the JSON Lines format to output as .jsonl, which is the standard, and the JSON format to be a .json, which is the same output as if you wanted to post a Solr doc as JSON to upload the data... This will let us round-trip the data.
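The format distinction being requested is easy to illustrate. The following Python sketch (with made-up documents, not actual ExportTool output) shows the .json vs .jsonl shapes:

```python
import json

# Hypothetical exported documents, including a nested child doc
docs = [
    {"id": "1", "title_s": "parent", "children": [{"id": "1.1"}]},
    {"id": "2", "title_s": "plain doc"},
]

# .json: one JSON array -- the same shape you could POST back to Solr,
# which is what enables round-tripping the data
as_json = json.dumps(docs, indent=2)

# .jsonl (JSON Lines): one complete JSON object per line
as_jsonl = "\n".join(json.dumps(d) for d in docs)

print(as_jsonl)
```

The .jsonl form is convenient for streaming line-by-line consumers; the .json form matches Solr's update-request body.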
[jira] [Commented] (SOLR-15132) Add window parameter to the nodes Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283120#comment-17283120 ] Joel Bernstein commented on SOLR-15132:
First patch; tests to follow.
> Add window parameter to the nodes Streaming Expression
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Joel Bernstein
> Priority: Major
> Attachments: SOLR-15132.patch
>
> The *nodes* Streaming Expression performs a breadth-first graph traversal. This ticket will add a *window* parameter to allow the nodes expression to traverse the graph within a window of time.
> To take advantage of this feature you must index the content with a String field which is an ISO timestamp truncated at ten seconds. Then the *window* parameter can be applied to walk the graph within a *window prior* to a specific ten-second window and perform aggregations.
> *The main use case for this feature is auto-detecting lagged correlations.* This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: what types of log events occur most frequently in the 30-second window prior to the 10-second windows with the most slow queries?
> {code}
> nodes(logs,
>       facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", rows="25"),
>       walk="time_ten_seconds->time_ten_seconds",
>       window="3",
>       gather="type_s",
>       count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO timestamp truncations so that increments of one second, one minute, one day, etc. can also be traversed using the same query syntax. There will be a follow-on ticket for that after this ticket is completed. This will create a more general-purpose time graph.
[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-15132:
Attachment: SOLR-15132.patch
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283116#comment-17283116 ] Robert Muir commented on LUCENE-9754:
We just use Unicode properties for that stuff, which is standard. Changing any of that here won't address the root problem: the chunking is bad for long documents and can cause other issues that look nothing like this, too. So let's fix the chunking here to be more sane for long documents. As far as allowing such tweaks, maybe there are improvements we can do, but I'd prefer to look at that in separate issues. That's not related to chunking but instead to splitting on scripts.
> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
> Reporter: Trey Jones
> Priority: Major
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system before the space is the same as the writing system after the number, then you get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does (trying to split on spaces to create <4k chunks) can create an artificial boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the unexpected split of the second token (_14th_). Because chunking changes can ripple through a long document, editing text or the effects of a character filter can cause changes in tokenization thousands of lines later in a document.
> My guess is that some "previous character set" flag is not reset at the space, and numbers are not in a character set, so _t_ is compared to _ァ_ and they are not the same—causing a token split at the character set change—but I'm not sure.
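To illustrate the chunking behavior the issue describes (the tokenizer pre-splits long input on spaces into chunks under 4k characters, so an early edit can shift every later chunk boundary), here is a toy sketch in Python. It only mimics the described splitting idea and is not the actual ICUTokenizer code:

```python
def chunk_on_spaces(text, max_chunk=4096):
    """Greedy sketch of 'split on spaces to create <4k chunks':
    cut at the last space before the limit, else cut hard at the limit."""
    chunks = []
    while len(text) > max_chunk:
        cut = text.rfind(" ", 0, max_chunk)
        if cut <= 0:
            cut = max_chunk  # no space available: hard cut mid-token
        chunks.append(text[:cut])
        text = text[cut:].lstrip(" ")
    if text:
        chunks.append(text)
    return chunks

# Editing text early in a document shifts every later chunk boundary,
# which is how a small change can ripple thousands of lines downstream.
doc = ("word " * 2000).strip()  # ~10k characters
print([len(c) for c in chunk_on_spaces(doc)])
```

A chunk boundary that happens to fall between two tokens (e.g. between _ァ_ and _14th_) hides the cross-script interaction entirely, which is why the observed tokenization is so sensitive to distant edits.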
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283110#comment-17283110 ] ASF subversion and git services commented on LUCENE-9322: - Commit 683a9bd78abcf486a668881bc3294847ce5d5d1a in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=683a9bd ] LUCENE-9322: Add Vectors format to CodecReader accounting methods (#2353) > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > Fix For: master (9.0) > > Time Spent: 10.5h > Remaining Estimate: 0h > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW (LUCENE-9004) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. 
> # Support for approximate nearest neighbor search -- given a query vector, return the indexed vectors that are closest to it.
[GitHub] [lucene-solr] iverase merged pull request #2353: LUCENE-9322: Add Vectors format to CodecReader accounting methods
iverase merged pull request #2353: URL: https://github.com/apache/lucene-solr/pull/2353
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283105#comment-17283105 ] Michael Sokolov commented on LUCENE-9754:
Would it make sense to have the ability to treat digits as Latin script? I think we ended up doing that in order to be able to apply (maybe anglo-euro-centric?) number constructs that nevertheless do appear in multilingual texts, like units (1", 1ft, 1m., etc.), ranges (1-10), and ordinals (1st, 2nd, etc.).
[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3
[ https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283104#comment-17283104 ] David Smiley commented on SOLR-15114:
I should second that sentiment as well; that PDF is amazing for anyone who wants a deeper look at BlockMax WAND! Kudos to Naoto!
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283103#comment-17283103 ] Thomas Mortagne commented on SOLR-14561:
I created SOLR-15151. I cannot work on a pull request right now, but I can definitely look at it in two weeks -- I just need to remember :)
> Validate parameters to CoreAdminAPI
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
> Issue Type: Improvement
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.6
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users can specify for at least the {{instanceDir}} and {{dataDir}} params, perhaps restricting them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[jira] [Created] (SOLR-15151) It's not possible anymore to indicate a relative path to the core data folder
Thomas Mortagne created SOLR-15151:
Summary: It's not possible anymore to indicate a relative path to the core data folder
Key: SOLR-15151
URL: https://issues.apache.org/jira/browse/SOLR-15151
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.6
Reporter: Thomas Mortagne

See https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.0/solr/core/src/java/org/apache/solr/core/SolrPaths.java#L124. SOLR-14561 introduced a check that forbids using a path starting with "../" for the data of a core. This makes it impossible to indicate a relative path to a data folder which is not stored in the core folder itself. Of course you can set an absolute path (provided it's part of the allowed paths), but then it becomes impossible to move the entire storage somewhere else, because doing so will break the stored path. IMO the check for the "../" prefix should be completely removed, and instead relative paths should be resolved to check whether they are part of the allowed paths. At least for my use case having to set allowed paths is fine, but it's possible some people might want to completely disable the allowed-path system.
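The current check and the proposed alternative can be sketched as follows. This is a Python illustration of the logic being discussed, not the actual SolrPaths.java code, and the directory names are hypothetical:

```python
from pathlib import Path

# Hypothetical allowed-paths configuration
ALLOWED = [Path("/mnt/solr-data")]

def rejected_by_prefix_check(data_dir):
    # Behavior the issue describes: any "../"-prefixed path is refused
    # outright, regardless of where it actually resolves.
    return data_dir.startswith("../")

def allowed_after_resolving(core_dir, data_dir):
    # Proposed behavior: resolve the (possibly relative) path against the
    # core directory, then check that it falls under an allowed path.
    resolved = (Path(core_dir) / data_dir).resolve()
    return any(allowed in resolved.parents or resolved == allowed
               for allowed in ALLOWED)

# A relative path that escapes the core dir but stays inside an allowed tree
# is refused by the prefix check yet passes the resolve-then-check approach.
print(rejected_by_prefix_check("../shared-data"))  # True: refused today
print(allowed_after_resolving("/mnt/solr-data/cores/c1", "../../shared-data"))
```

The resolve-then-check approach keeps the security property (paths outside the allowed trees are still rejected) while permitting relocatable relative configurations.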
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283102#comment-17283102 ] Robert Muir commented on LUCENE-9754: - Just to be clear, I'm thinking something like "subclass SegmentingTokenizerBase" as the fix here. The crazy chunking code here predates SegmentingTokenizerBase, but was never retrofitted with it afterwards. It could also lead to options for users to better control/customize the chunking. > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before the preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document. 
> My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. >
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283075#comment-17283075 ] Robert Muir commented on LUCENE-9754: - Yes, the issue is because of the chunking. Put aside long documents for a second and imagine a short document such as {{ァ 14th}}: it will always first be split as {{ァ 14|th}}. That's because this tokenizer first divides on scripts, and lets you use a different strategy per script. These numbers have a script code of "Common", and things like accent marks have a script code of "Inherited"; "Common/Inherited" are "sticky". So under normal conditions it does not break until it hits the 't' ("Latin"). Maybe that is seen as undesirable in this example, but it is just the tradeoff the tokenizer makes (splitting on scripts). You can find more discussion of that in the Notes section of https://unicode.org/reports/tr29/#Word_Boundary_Rules But if you feed it a super long document, we can't read the whole document into RAM at once, so we have to limit it to 4k chunks, and the chunking may split on that space before the script analysis runs: {{ァ|14th}}. This leads to the inconsistency that you see. For super long documents, the current behavior of this tokenizer will be annoying. The chunking/4k limit was written more as a failsafe than anything else: I don't think a little tweak here or there to this tokenizer will help. One idea: change the tokenizer to chunk "sentence-at-a-time" based on sentence boundaries first. It might add a little overhead, but then long documents would work consistently. The behavior of this chunking would also be easier for users to understand: the word segmenter only sees "one sentence" of context at a time. 
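The "sentence-at-a-time" chunking idea above can be sketched with the JDK's `BreakIterator`. This is an illustrative toy, not ICUTokenizer's actual code (the real tokenizer would presumably use ICU4J's own break iterators), and `SentenceChunker` is a made-up name:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch: chunk input at sentence boundaries instead of at an
// arbitrary 4k byte limit, so the word segmenter always sees one complete
// sentence of context and chunk boundaries can never fall mid-sentence.
public class SentenceChunker {
  public static List<String> chunkBySentence(String text) {
    BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
    sentences.setText(text);
    List<String> chunks = new ArrayList<>();
    int start = sentences.first();
    int end = sentences.next();
    while (end != BreakIterator.DONE) {
      // Each sentence becomes one chunk; a script change like "ァ 14th" can
      // no longer be split by a chunk boundary landing on the space.
      chunks.add(text.substring(start, end));
      start = end;
      end = sentences.next();
    }
    return chunks;
  }

  public static void main(String[] args) {
    System.out.println(chunkBySentence("First sentence. Second one with ァ 14th inside."));
  }
}
```

Concatenating the chunks always reproduces the original text, so this chunking is lossless; the tradeoff is that a pathological document with one enormous "sentence" would still need a fallback size limit.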
> ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 >
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283071#comment-17283071 ] Jan Høydahl commented on SOLR-14561: Ah, I see - you'd like to allow path traversal explicitly. Please file a new Jira suggesting that, and include a PR if you can. I suppose we could make it so that {{allowPaths=*}} is checked early and means allow anything, like before? We could also add a few explicit keywords such as {{allowPaths=..}} and {{allowPaths=_UNC_}} which would allow parent traversal and UNC paths respectively? > Validate parameters to CoreAdminAPI > --- > > Key: SOLR-14561 > URL: https://issues.apache.org/jira/browse/SOLR-14561 >
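The keyword scheme floated in this comment could be sketched as follows. Everything here is hypothetical: `AllowPathsFlags` and the exact semantics of `*`, `..`, and `_UNC_` are invented for illustration, since none of these keywords exist in Solr as described:

```java
import java.util.Set;

// Hypothetical sketch of the proposed allowPaths keywords: "*" means allow
// anything (checked early), ".." permits parent traversal, and "_UNC_"
// permits Windows UNC paths. Not Solr's actual configuration API.
public class AllowPathsFlags {
  public final boolean allowAll;
  public final boolean allowParentTraversal;
  public final boolean allowUncPaths;

  public AllowPathsFlags(Set<String> allowPaths) {
    this.allowAll = allowPaths.contains("*");
    this.allowParentTraversal = allowPaths.contains("..");
    this.allowUncPaths = allowPaths.contains("_UNC_");
  }

  public boolean isAllowed(String path) {
    if (allowAll) return true; // "*" short-circuits all other checks, like before
    if (path.startsWith("\\\\") && !allowUncPaths) return false; // UNC path
    if (path.contains("..") && !allowParentTraversal) return false; // parent traversal
    return true; // remaining checks (allowed-path membership, etc.) would go here
  }
}
```

The point of the `*` keyword is to restore the pre-8.6 "allow anything" behavior with a single explicit opt-in, while the other keywords loosen individual restrictions without disabling the whole system.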
[GitHub] [lucene-solr-operator] zhuopeng opened a new issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error
zhuopeng opened a new issue #215: URL: https://github.com/apache/lucene-solr-operator/issues/215 To reproduce: run `kubectl apply -f helm/solr-operator/crds/crds.yaml`, which fails with the error `customresourcedefinition.apiextensions.k8s.io/solrbackups.solr.bloomberg.com configured Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply customresourcedefinition.apiextensions.k8s.io/solrcollectionaliases.solr.bloomberg.com configured customresourcedefinition.apiextensions.k8s.io/solrcollections.solr.bloomberg.com configured customresourcedefinition.apiextensions.k8s.io/solrprometheusexporters.solr.bloomberg.com configured The CustomResourceDefinition "solrclouds.solr.bloomberg.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes` It happens for any version >0.2.6. It is likely related to https://github.com/kubernetes/kubectl/issues/712: when using kubectl apply to install a CRD, kubectl stores the previous CRD configuration in the "last-applied-configuration" annotation, and that annotation exceeds the size limit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2351: LUCENE-9763: Hunspell: fix FORBIDDENWORD support
donnerpeter commented on a change in pull request #2351: URL: https://github.com/apache/lucene-solr/pull/2351#discussion_r574552151 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -94,10 +93,6 @@ public Stemmer(Dictionary dictionary) { word = scratchBuffer; } -if (dictionary.isForbiddenWord(word, length)) { Review comment: From my correspondence with Hunspell's author, it seems to be a feature that stemming gives some results even for misspelled words.
[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr
[ https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283049#comment-17283049 ] David Eric Pugh commented on SOLR-14920: I've been relearning the code base, and with fresh eyes, it's shocking to see how many flavours of writing the code Solr has... I'm often faced with "Which way should I write this method" and then see multiple approaches ;) So I look forward to when this happens! > Format code automatically and enforce it in Solr > > > Key: SOLR-14920 > URL: https://issues.apache.org/jira/browse/SOLR-14920 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Priority: Major > Labels: codestyle, formatting > > See the discussion at: LUCENE-9564. > This is a placeholder for the present, I'm reluctant to do this to the Solr > code base until after: > * we have some Solr-specific consensus > * we have some clue what this means for the reference impl. > Reconciling the reference impl will be difficult enough without a zillion > format changes to add to the confusion. > So my proposal is > 1> do this. > 2> Postpone this until after the reference impl is merged. > 3> do this in one single commit for reasons like being able to conveniently > have this separated out from git blame. > Assigning to myself so it doesn't get lost, but anyone who wants to take it > over please feel free. --
[jira] [Resolved] (LUCENE-9763) Hunspell: fix FORBIDDENWORD support
[ https://issues.apache.org/jira/browse/LUCENE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9763. - Fix Version/s: master (9.0) Resolution: Fixed > Hunspell: fix FORBIDDENWORD support > --- > > Key: LUCENE-9763 > URL: https://issues.apache.org/jira/browse/LUCENE-9763 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Peter Gromov >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > --
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2318: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json
murblanc commented on a change in pull request #2318: URL: https://github.com/apache/lucene-solr/pull/2318#discussion_r574498232 ## File path: solr/core/src/java/org/apache/solr/cloud/RefreshCollectionMessage.java ##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cloud;
+
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.ZkStateReader;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.data.Stat;
+
+/** Refresh the Cluster State for a given collection */
+public class RefreshCollectionMessage implements Overseer.Message {
+  public final Operation operation;
+  public final String collection;
+
+  public RefreshCollectionMessage(String collection) {
+    this.operation = Operation.REFRESH_COLL;
+    this.collection = collection;
+  }
+
+  ClusterState run(ClusterState clusterState, Overseer overseer) throws InterruptedException, KeeperException {
Review comment: Can you please use another name than `run()`? That name is usually associated with a `Runnable`, which is not the case here.