[GitHub] [lucene-solr] noblepaul opened a new pull request #2359: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json (8x )

2021-02-11 Thread GitBox


noblepaul opened a new pull request #2359:
URL: https://github.com/apache/lucene-solr/pull/2359


   






[GitHub] [lucene-solr] dweiss commented on pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter

2021-02-11 Thread GitBox


dweiss commented on pull request #2342:
URL: https://github.com/apache/lucene-solr/pull/2342#issuecomment-778016468


   I think it's good overall, but I'm wondering whether it makes sense to make 
that field volatile... do we want to allow changing listeners over the index writer's 
lifecycle? I think it should be a regular field and IW should just read it once 
(and set forever).
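
   A minimal sketch of the "regular field, read once" approach being discussed; the holder class below is illustrative, not the PR's actual code:

{code}
// Illustrative only: the listener is captured once at construction time and never
// swapped afterwards, so no volatile/synchronization is needed for visibility.
public final class ListenerHolder {
  private final IndexWriterEventListener eventListener; // regular final field, set forever

  public ListenerHolder(IndexWriterEventListener eventListener) {
    // read the configured listener exactly once; later configuration changes are ignored
    this.eventListener = eventListener;
  }

  IndexWriterEventListener getEventListener() {
    return eventListener;
  }
}
{code}

   If hot-swapping listeners ever became a requirement, volatile would be the minimum; reading the field once keeps the semantics simpler.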






[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283526#comment-17283526
 ] 

ASF subversion and git services commented on SOLR-15136:


Commit 759cb8079bd9f192936995a7d8cce4774b82 in lucene-solr's branch 
refs/heads/branch_8_8 from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=759cb80 ]

SOLR-15136: Reduce excessive logging introduced with Per Replica States feature


> need to audit for excessive INFO level logging after SOLR-15052
> --
>
> Key: SOLR-15136
> URL: https://issues.apache.org/jira/browse/SOLR-15136
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.8.1
>
> Attachments: SOLR-15136.patch
>
>
> Markus Jelsma noted on a solr-user thread that 8.8 introduced an excessive 
> amount of INFO level logging, notably from ZkStateReader...
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202102.mbox/%3CCAJvgyxqDBNjPtzamszOf1ZZiZGPg42PYegYbHPkVMm5OF1%3DaVw%40mail.gmail.com%3E
> This appears to be caused by SOLR-15052, but is not consistent between master 
> and branch_8x.  At a glance it appears there was a lot of new logging 
> introduced originally at the INFO level with "nocommit" comments to dial it down 
> to debug -- some of which happened, but evidently a lot didn't.
> I'm filing this issue in the hopes that the folks involved in SOLR-15052 will 
> please go back and do a thorough review of what logging is on master vs 8x 
> and carefully reconsider which log messages added for development purposes are 
> still useful, and what levels they should be at.






[jira] [Resolved] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya resolved SOLR-15136.
-
Fix Version/s: 8.8.1
   Resolution: Fixed

Thanks [~markus17], [~hossman].







[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283524#comment-17283524
 ] 

ASF subversion and git services commented on SOLR-15136:


Commit db90ff541ec729dd04d2f293ae54467f2f3dc975 in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db90ff5 ]

SOLR-15136: Reduce excessive logging introduced with Per Replica States feature








[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283523#comment-17283523
 ] 

ASF subversion and git services commented on SOLR-15136:


Commit 938039a6889f3b9125d123cbe11e389d6cf714ba in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=938039a ]

SOLR-15136: Reduce excessive logging introduced with Per Replica States feature








[jira] [Commented] (LUCENE-9762) FunctionScoreQuery can fail when the score is requested twice

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283522#comment-17283522
 ] 

David Smiley commented on LUCENE-9762:
--

I filed a PR with a fix.  The problem is actually not QueryValueSource's use of 
TwoPhaseIterator; that change merely increased the scenarios in which a _pre-existing 
bug_ in FunctionScoreQuery's score() method is exposed.  It's valid for a 
Scorer's score() method to be called more than once, but FSQ's score() was 
calling DoubleValues.advanceExact, which by contract is intolerant of that.  
Many implementations tolerate it nevertheless, but not QueryValueSource when the 
wrapped query is a PhraseQuery or perhaps some other TPI-based queries.  This 
bug was a tricky puzzle to track down!

CC [~romseygeek] as you introduced FunctionScoreQuery.
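
A minimal sketch of the kind of guard the fix implies: cache the value per document so that a repeated score() call never re-advances the underlying DoubleValues (names below are illustrative, not the actual patch):

{code}
import java.io.IOException;
import org.apache.lucene.search.DoubleValues;

// Illustrative only: score() may legally be called more than once per document,
// but DoubleValues.advanceExact must only be called the first time for that doc.
final class OncePerDocValues {
  private final DoubleValues values;
  private int lastDoc = -1;   // doc we already advanced to
  private double lastValue;   // cached result of advanceExact/doubleValue

  OncePerDocValues(DoubleValues values) {
    this.values = values;
  }

  double valueFor(int doc) throws IOException {
    if (doc != lastDoc) {
      lastDoc = doc;
      lastValue = values.advanceExact(doc) ? values.doubleValue() : 0d;
    }
    return lastValue;
  }
}
{code}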

> FunctionScoreQuery can fail when the score is requested twice
> -
>
> Key: LUCENE-9762
> URL: https://issues.apache.org/jira/browse/LUCENE-9762
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.8
>Reporter: Chris M. Hostetter
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-9762.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As originally reported by Nicolás Lichtmaier on the java-user list, there are 
> some trivial situations which can trigger an assertion error in the 
> PostingsReader when enumerating PhrasePositions for a sloppy PhraseQuery...
> {noformat}
> Exception in thread "main" java.lang.AssertionError
>   at 
> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum.nextPosition(Lucene84PostingsReader.java:940)
>   at 
> org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
>   at 
> org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
>   at 
> org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
>   at 
> org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
>   at 
> org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
>   at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
>   at 
> org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
>   at 
> org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
>   at 
> org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>   at 
> org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>   at 
> org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
>   at 
> org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
>   at 
> org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
>   at 
> org.apache.lucene.search.DoubleValues$1.doubleValue(DoubleValues.java:48)
>   at 
> org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
>   at 
> org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
>   at 
> org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
>   at 
> org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
>   at 
> org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
>   at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:572)
>   at 
> org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:419)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:430)
>   at LuceneCrash.main(LuceneCrash.java:51)
> {noformat}
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/202102.mbox/%3C177a65ec-5ec3-e1aa-99c3-b478e165d5e8%40wolfram.com%3E






[jira] [Commented] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283521#comment-17283521
 ] 

Noble Paul commented on SOLR-15136:
---

LGTM







[GitHub] [lucene-solr] dsmiley opened a new pull request #2358: LUCENE-9762: FunctionScoreQuery must guard score() called twice

2021-02-11 Thread GitBox


dsmiley opened a new pull request #2358:
URL: https://github.com/apache/lucene-solr/pull/2358


   https://issues.apache.org/jira/browse/LUCENE-9762
   
   The score() may be called multiple times. It should take care to call 
DoubleValues.advanceExact only the first time for a given document, or risk faulty 
behavior, including exceptions.
   
   There isn't an 8.8.1 section in CHANGES.txt on the master branch, but I can add 
the entry into branch_8x and branch_8_8, where it will eventually show up when the 
RM does a release. Or should I just add it?






[jira] [Updated] (LUCENE-9762) FunctionScoreQuery can fail when the score is requested twice

2021-02-11 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9762:
-
Summary: FunctionScoreQuery can fail when the score is requested twice  
(was: AssertionError from Lucene84PostingsReader$EverythingEnum.nextPosition)







[jira] [Assigned] (LUCENE-9762) AssertionError from Lucene84PostingsReader$EverythingEnum.nextPosition

2021-02-11 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-9762:


Assignee: David Smiley







[jira] [Updated] (SOLR-15136) need to audit for excessive INFO level logging after SOLR-15052

2021-02-11 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-15136:

Attachment: SOLR-15136.patch
  Assignee: Ishan Chattopadhyaya
Status: Open  (was: Open)

Here's a patch reducing logging from ZkStateReader.
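
The patch itself is attached above; as an illustration only, the kind of change involved is dialing chatty INFO statements down to DEBUG (the class, method, and message below are invented for the example):

{code}
import java.lang.invoke.MethodHandles;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PerReplicaStatesLoggingExample {
  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

  void onStateChange(String collection) {
    // before: log.info("Fetching per-replica states for {}", collection);
    // after: only emitted when DEBUG is enabled, so steady-state traffic stays quiet
    if (log.isDebugEnabled()) {
      log.debug("Fetching per-replica states for {}", collection);
    }
  }
}
{code}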







[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore

2021-02-11 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283493#comment-17283493
 ] 

Ishan Chattopadhyaya commented on SOLR-15089:
-

I would strongly prefer for this to stay outside of solr-core, preferably in a 
solr-extras repo (when that's created). Having AWS libraries shipped with Solr 
by default would feel very awkward.

> Allow backup/restoration to Amazon's S3 blobstore 
> --
>
> Key: SOLR-15089
> URL: https://issues.apache.org/jira/browse/SOLR-15089
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Priority: Major
>
> Solr's BackupRepository interface provides an abstraction around the physical 
> location/format that backups are stored in.  This allows plugin writers to 
> create "repositories" for a variety of storage mediums.  It'd be nice if Solr 
> offered more mediums out of the box though, such as some of the "blobstore" 
> offerings provided by various cloud providers.
> This ticket proposes a "BackupRepository" implementation for Amazon's 
> popular 'S3' blobstore, so that Solr users can use it for backups without 
> needing to write their own code.
> Amazon offers an S3 Java client with acceptable licensing, and the required 
> code is relatively simple.  The biggest challenge in supporting this will 
> likely be procedural - integration testing requires S3 access and S3 access 
> costs money.  We can check with INFRA to see if there is any way to get cloud 
> credits for an integration test to run in nightly Jenkins runs on the ASF 
> Jenkins server.  Alternatively we can try to stub out the blobstore in some 
> reliable way.
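
For a sense of what the S3 side involves, a minimal sketch of the AWS SDK calls such a repository would wrap (bucket and key names are made up, and this is not an actual BackupRepository implementation):

{code}
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

class S3BackupSketch {
  private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
  private final String bucket = "my-solr-backups"; // assumed bucket name

  void upload(File backupFile, String key) {
    s3.putObject(bucket, key, backupFile);                  // store one backup file
  }

  void download(String key, File dest) {
    s3.getObject(new GetObjectRequest(bucket, key), dest);  // fetch it back
  }

  boolean exists(String key) {
    return s3.doesObjectExist(bucket, key);
  }
}
{code}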






[jira] [Commented] (SOLR-15011) /admin/logging handler should be able to configure logs on all nodes

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283486#comment-17283486
 ] 

ASF subversion and git services commented on SOLR-15011:


Commit db6129759061e24bcdb90f3a1310bd25a004076b in lucene-solr's branch 
refs/heads/master from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db61297 ]

SOLR-15011: Remove flawed test


> /admin/logging handler should be able to configure logs on all nodes
> 
>
> Key: SOLR-15011
> URL: https://issues.apache.org/jira/browse/SOLR-15011
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: logging
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The LoggingHandler registered at /admin/logging can configure log levels for 
> the current node.  This is nice but in SolrCloud, what's needed is an ability 
> to change the level for _all_ nodes in the cluster.  I propose that this be a 
> parameter named "distrib", defaulting to SolrCloud mode's status.  An admin UI 
> could have a checkbox for it.  I don't propose that the read operations be 
> changed -- they can continue to just look at the node you are hitting.
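
A sketch of what a client call could look like from SolrJ: the "set" parameter is how the logging handler takes logger:level changes today, while "distrib" is the parameter this ticket proposes and does not exist yet:

{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SetLogLevelClusterWide {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("set", "org.apache.solr.cloud:WARN"); // logger:level change
      params.set("distrib", "true");                   // proposed parameter: apply on all nodes
      GenericSolrRequest req = new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/logging", params);
      client.request(req);
    }
  }
}
{code}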






[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15132:
--
Attachment: SOLR-15132.patch

> Add window parameter to the nodes Streaming Expression
> --
>
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-15132.patch, SOLR-15132.patch, SOLR-15132.patch, 
> SOLR-15132.patch
>
>
> The *nodes* Streaming Expression performs a breadth first graph traversal. 
> This ticket will add a *window* parameter to allow the nodes expression to 
> traverse the graph within a window of time. 
> To take advantage of this feature you must index the content with a String 
> field which is an ISO timestamp truncated at ten seconds. Then the *window* 
> parameter can be applied to walk the graph within a *window prior* to a 
> specific ten second window and perform aggregations. 
> *The main use case for this feature is auto-detecting lagged correlations.* 
> This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: 
> What types of log events occur most frequently in the 30 second window prior 
> to 10 second windows with the most slow queries:
> {code}
> nodes(logs,
>   facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", 
> rows="25"),
>   walk="time_ten_seconds->time_ten_seconds",
>   window="3",
>   gather="type_s",
>   count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp 
> truncations so that increments of one second, one minute, one day etc... can 
> also be traversed using the same query syntax. There will be a follow-on 
> ticket for that after this ticket is completed. This will create a more 
> general purpose time graph.
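
For reference, a rough sketch of running such an expression from SolrJ against the /stream handler; the URL, collection, and field names are assumed, and the window parameter is what this ticket adds:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.common.params.ModifiableSolrParams;

public class NodesWindowExample {
  public static void main(String[] args) throws Exception {
    String expr = "nodes(logs,"
        + " facet(logs, q=\"qtime_s:[5000 TO *]\", buckets=\"time_ten_seconds\", rows=\"25\"),"
        + " walk=\"time_ten_seconds->time_ten_seconds\","
        + " window=\"3\","      // the new parameter added by this ticket
        + " gather=\"type_s\","
        + " count(*))";
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("expr", expr);
    params.set("qt", "/stream");
    SolrStream stream = new SolrStream("http://localhost:8983/solr/logs", params); // assumed URL
    stream.setStreamContext(new StreamContext());
    List<Tuple> tuples = new ArrayList<>();
    try {
      stream.open();
      for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
        tuples.add(t);
      }
    } finally {
      stream.close();
    }
    tuples.forEach(t -> System.out.println(t.getString("type_s") + " " + t.getLong("count(*)")));
  }
}
{code}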






[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter

2021-02-11 Thread GitBox


zacharymorn commented on a change in pull request #2342:
URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574958684



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3518,15 +3518,30 @@ private long prepareCommitInternal() throws IOException 
{
   }
 
   if (pointInTimeMerges != null) {
+MergePolicy.OneMerge nextMerge = null;
+
+if (pendingMerges.size() > 0) {
+  // nocommit getting OneMerge instance here via 
mergeSource.getNextMerge() will

Review comment:
   No problem! From the context I assume you actually mean to pass 
`pointInTimeMerges` into the event listener, since `pendingMerges` is of type 
`Deque<MergePolicy.OneMerge>`? I've pushed a commit to use `pointInTimeMerges` 
for now.

##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java
##
@@ -388,6 +388,69 @@ public void testMergeOnCommit() throws IOException {
 dir.close();
   }
 
+  private class TesIndexWriterEventListener implements 
IndexWriterEventListener {
+private boolean beginMergeCalled = false;
+private boolean endMergeCalled = false;
+
+@Override
+public void beginMergeOnFullFlush(MergePolicy.OneMerge merge) {
+  beginMergeCalled = true;
+}
+
+@Override
+public void endMergeOnFullFlush(MergePolicy.OneMerge merge) {
+  endMergeCalled = true;
+}
+
+public boolean isEventsRecorded() {
+  return beginMergeCalled && endMergeCalled;
+}
+  }
+
+  // Test basic semantics of merge on commit and events recording invocation
+  public void testMergeOnCommitWithEventListener() throws IOException {

Review comment:
   Makes sense. Added.








[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread Naoto Minami (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283450#comment-17283450
 ] 

Naoto Minami commented on SOLR-15114:
-

I'm very glad to contribute to the community!
Thanks [~janhoy], [~dsmiley] and [~tflobbe].

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6, 8.7, 8.8
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.9, 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore on the 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.
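
A simplified illustration of the first cause; this is not Solr's actual MaxScoreCollector, it only shows that the score gathered so far has to be re-applied in setScorer(), which is called again for every new segment:

{code}
import java.io.IOException;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

final class MaxScoreTrackingCollector extends SimpleCollector {
  private Scorable scorer;
  private float maxScore = Float.NEGATIVE_INFINITY;

  @Override
  public void setScorer(Scorable scorer) throws IOException {
    this.scorer = scorer;
    if (maxScore > Float.NEGATIVE_INFINITY) {
      // without this call the new segment starts with minCompetitiveScore == 0
      scorer.setMinCompetitiveScore(maxScore);
    }
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = scorer.score();
    if (score > maxScore) {
      maxScore = score;
      scorer.setMinCompetitiveScore(maxScore);
    }
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.TOP_SCORES;
  }
}
{code}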






[jira] [Assigned] (LUCENE-8711) Move the logic to score across multiple fields to Similarity?

2021-02-11 Thread Julie Tibshirani (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani reassigned LUCENE-8711:


Assignee: Julie Tibshirani

> Move the logic to score across multiple fields to Similarity?
> -
>
> Key: LUCENE-8711
> URL: https://issues.apache.org/jira/browse/LUCENE-8711
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Julie Tibshirani
>Priority: Minor
>
> BlendedTermQuery and BM25FTermQuery both try to merge score contributions of 
> terms across multiple fields. Using them is still very manual. Is it 
> something that we could make similarities responsible for instead?






[jira] [Assigned] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-02-11 Thread Julie Tibshirani (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani reassigned LUCENE-9725:


Assignee: Julie Tibshirani

> Allow BM25FQuery to use other similarities
> --
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Assignee: Julie Tibshirani
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic 
> combined field where all terms have been indexed. It computes new term and 
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field 
> content, and (2) pass these to a similarity function. There is nothing really 
> specific to BM25Similarity in this approach. In step 2, we could use another 
> similarity, for example BooleanSimilarity or those based on language models 
> like LMDirichletSimilarity. The main restriction is that norms have to be 
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could unhardcode BM25Similarity in BM25FQuery and instead use the 
> one configured on IndexSearcher. We could think of this as providing a 
> sensible default approach to cross-field scoring for many similarities. It's 
> an incremental step towards LUCENE-8711, which would give similarities more 
> fine-grained control over how stats/ scores are combined across fields.
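
For context, a rough sketch of how BM25FQuery (in the sandbox module, Lucene 8.x) is used today, with BM25 hardwired as the similarity; the proposal is for step 2 to pick up the similarity configured on the IndexSearcher instead. Field names and the term below are made up:

{code}
import org.apache.lucene.search.BM25FQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.BytesRef;

class CombinedFieldExample {
  Query buildQuery() {
    // One synthetic "combined" field built from title and body, with a 2x weight on title.
    return new BM25FQuery.Builder()
        .addField("title", 2.0f)
        .addField("body")
        .addTerm(new BytesRef("lucene"))
        .build();
  }
}
{code}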






[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283442#comment-17283442
 ] 

Robert Muir commented on LUCENE-9754:
-

I attached a prototype. It should address the TODO so that users can customize 
this "chunking" in case they want it to work differently than sentence 
boundaries; I will look into it.


> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> --
>
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
>Reporter: Trey Jones
>Priority: Major
> Attachments: LUCENE-9754_prototype.patch
>
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by 
> the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 
> 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system 
> before the space is the same as the writing system after the number, then you 
> get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does 
> (trying to split on spaces to create <4k chunks) can create an artificial 
> boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the 
> unexpected split of the second token (_14th_). Because chunking changes can 
> ripple through a long document, editing text or the effects of a character 
> filter can cause changes in tokenization thousands of lines later in a 
> document.
> My guess is that some "previous character set" flag is not reset at the 
> space, and numbers are not in a character set, so _t_ is compared to _ァ_ and 
> they are not the same—causing a token split at the character set change—but 
> I'm not sure.
>  
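
A small sketch that prints the tokens ICUTokenizer produces for the two inputs above, which is enough to observe the reported difference (the expected output noted in the comments is taken from the report, not verified here):

{code}
import java.io.StringReader;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class IcuTokenizeCheck {
  public static void main(String[] args) throws Exception {
    printTokens("x 14th");   // reported: x | 14th
    printTokens("ァ 14th");  // reported: ァ | 14 | th
  }

  static void printTokens(String text) throws Exception {
    try (ICUTokenizer tokenizer = new ICUTokenizer()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      StringBuilder out = new StringBuilder(text).append(" -> ");
      while (tokenizer.incrementToken()) {
        out.append(term.toString()).append(" | ");
      }
      tokenizer.end();
      System.out.println(out);
    }
  }
}
{code}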






[jira] [Updated] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Robert Muir (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-9754:

Attachment: LUCENE-9754_prototype.patch







[jira] [Commented] (LUCENE-9751) Assertion error (int overflow) in ByteSliceReader

2021-02-11 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283419#comment-17283419
 ] 

Michael McCandless commented on LUCENE-9751:


{quote}It's definitely not a single huge document, Mike.
{quote}
OK, hrmph.
{quote}I've tried reproducing last night (on a different machine but with the 
same heap/ threads setup) but no luck - it finished successfully.
{quote}
Also hrmph.
{quote}I guess this means the problem is gone?... :)
{quote}
 
I wish!  Non ignorance is non bliss!

> Assertion error (int overflow) in ByteSliceReader
> -
>
> Key: LUCENE-9751
> URL: https://issues.apache.org/jira/browse/LUCENE-9751
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.7
>Reporter: Dawid Weiss
>Priority: Major
>
> New computers come with insane amounts of RAM and heaps can get pretty big. 
> If you adjust per-thread buffers to larger values, strange things start 
> happening. This happened to us today:
> {code}
> Caused by: java.lang.AssertionError
>   at 
> org.apache.lucene.index.ByteSliceReader.init(ByteSliceReader.java:44) 
> ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.TermsHashPerField.initReader(TermsHashPerField.java:88)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.FreqProxFields$FreqProxPostingsEnum.reset(FreqProxFields.java:430)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.FreqProxFields$FreqProxTermsEnum.postings(FreqProxFields.java:247)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:127)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:264)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:480) 
> ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:394) 
> ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:440)
>  ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) 
> ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - 
> atrisharma - 2020-10-29 19:35:28]
>   ... 7 more
> {code}
> Likely an int overflow in TermsHashPerField:
> {code}
> reader.init(bytePool,
> 
> postingsArray.byteStarts[termID]+stream*ByteBlockPool.FIRST_LEVEL_SIZE,
> streamAddressBuffer[offsetInAddressBuffer+stream]);
> {code}
> Don't know if this can be prevented somehow.
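
For intuition: the sum of a byte-pool start offset and a stream offset is computed in int arithmetic, and with very large per-thread buffers both operands can be big enough that the sum wraps negative, which would presumably trip the assertion in ByteSliceReader.init. A tiny illustration with made-up numbers:

{code}
public class IntOverflowDemo {
  public static void main(String[] args) {
    int byteStart = 2_000_000_000;  // start offset of a term's postings in a huge byte pool
    int streamOffset = 200_000_000; // offset of the stream being read

    int wrapped = byteStart + streamOffset;          // silently wraps to a negative value
    long correct = (long) byteStart + streamOffset;  // what the arithmetic "means"

    System.out.println("int sum:  " + wrapped);      // negative, so an assert on it would fail
    System.out.println("long sum: " + correct);
    // Math.addExact(byteStart, streamOffset) would throw ArithmeticException instead of wrapping.
  }
}
{code}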






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter

2021-02-11 Thread GitBox


mikemccand commented on a change in pull request #2342:
URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574877742



##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java
##
@@ -388,6 +388,69 @@ public void testMergeOnCommit() throws IOException {
 dir.close();
   }
 
+  private class TesIndexWriterEventListener implements 
IndexWriterEventListener {
+private boolean beginMergeCalled = false;
+private boolean endMergeCalled = false;
+
+@Override
+public void beginMergeOnFullFlush(MergePolicy.OneMerge merge) {
+  beginMergeCalled = true;
+}
+
+@Override
+public void endMergeOnFullFlush(MergePolicy.OneMerge merge) {
+  endMergeCalled = true;
+}
+
+public boolean isEventsRecorded() {
+  return beginMergeCalled && endMergeCalled;
+}
+  }
+
+  // Test basic semantics of merge on commit and events recording invocation
+  public void testMergeOnCommitWithEventListener() throws IOException {

Review comment:
   You might also edit `LuceneTestCase.newIndexWriterConfig` to randomly 
swap in a `MockIndexWriterEventListener` just to exercise this listener in any 
tests using that API, which is quite a few.  It could uncover times when we 
accidentally break something when this listener is invoked ...
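
   A sketch of that suggestion; both MockIndexWriterEventListener and the setter name are hypothetical, following the API this PR proposes:

{code}
// Hypothetical fragment for LuceneTestCase.newIndexWriterConfig(Random r, Analyzer analyzer):
// occasionally install a mock listener so the many tests that go through this helper
// also exercise the listener callbacks.
IndexWriterConfig c = new IndexWriterConfig(analyzer);
if (rarely(r)) { // LuceneTestCase's "rarely", driven by the test's Random
  c.setIndexWriterEventListener(new MockIndexWriterEventListener()); // hypothetical setter + mock
}
{code}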








[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter

2021-02-11 Thread GitBox


mikemccand commented on a change in pull request #2342:
URL: https://github.com/apache/lucene-solr/pull/2342#discussion_r574877053



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3518,15 +3518,30 @@ private long prepareCommitInternal() throws IOException 
{
   }
 
   if (pointInTimeMerges != null) {
+MergePolicy.OneMerge nextMerge = null;
+
+if (pendingMerges.size() > 0) {
+  // nocommit getting OneMerge instance here via 
mergeSource.getNextMerge() will

Review comment:
   How about passing the `pendingMerges` to the event listener instead?  
(Sorry if I asked for `OneMerge` on the issue!  `MergeSpecification` is better 
since it can hold multiple merges, allows event listener to log things like how 
many merges were requested during `commit`, etc.).








[jira] [Comment Edited] (SOLR-15129) Use the Solr TGZ artifact as Docker context

2021-02-11 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283393#comment-17283393
 ] 

Houston Putman edited comment on SOLR-15129 at 2/11/21, 9:43 PM:
-

[~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded 
sha reference of the image built within elastic.

{{FROM 
docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}}

We could certainly make that a part of the ReleaseWizard. It would stop us from 
doing incremental updates however for base images. I don't think that's a 
sticking point though.

As per Hoss' comments above about the git repository being hosted on apache 
hardware, and the binary release being hosted on mirrors, couldn't we use 
https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on 
apache hardware. I don't see a large difference in the security provided by the 
git repo vs the security provided by the tgz on apache hardware.

I can summarize our master plan and include the options we are looking at 
(github and binary release).


was (Author: houston):
[~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded 
sha reference of the image built within elastic.

{{FROM 
docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}}

We could certainly make that a part of the ReleaseWizard. It would stop us from 
doing incremental updates however for base images. I don't think that's a 
sticking point though.

As per Hoss' comments above about the git repository being hosted on apache 
hardware, and the binary release being hosted on mirrors, couldn't we use 
https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on 
apache hardware. I don't see a large difference in the security provided by the 
git repo vs the security provided by the tgz on apache hardware.

I can summarize our master plan and have it be independent of which input we 
use (github or binary release), since I doubt that will make a difference in 
whether they accept it or not.

> Use the Solr TGZ artifact as Docker context
> ---
>
> Key: SOLR-15129
> URL: https://issues.apache.org/jira/browse/SOLR-15129
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Houston Putman
>Priority: Major
>
> As discussed in SOLR-15127, there is a need for a unified Dockerfile that 
> allows for release and local builds.
> This ticket is an attempt to achieve this by using the Solr distribution TGZ 
> as the docker context to build from.
> Therefore release images would be completely reproducible by running:
> {{docker build -f solr-9.0.0/Dockerfile 
> https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}}
> The changes to the Solr distribution would include adding a Dockerfile at 
> {{solr-<version>/Dockerfile}}, adding the docker scripts under 
> {{solr-<version>/docker}}, and adding a version file at 
> {{solr-<version>/VERSION.txt}}.






[jira] [Commented] (SOLR-15129) Use the Solr TGZ artifact as Docker context

2021-02-11 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283393#comment-17283393
 ] 

Houston Putman commented on SOLR-15129:
---

[~dsmiley], that's what the _/elasticsearch image looks like. It's a hardcoded 
sha reference of the image built within elastic.

{{FROM 
docker.elastic.co/elasticsearch/elasticsearch:7.10.1@sha256:5d8f1962907ef60746a8cf61c8a7f2b8755510ee36bdee0f65417f90a38a0139}}

We could certainly make that a part of the ReleaseWizard. It would stop us from 
doing incremental updates however for base images. I don't think that's a 
sticking point though.

As per Hoss' comments above about the git repository being hosted on apache 
hardware, and the binary release being hosted on mirrors, couldn't we use 
https://downloads.apache.org/lucene/solr/8.8.0/solr-8.8.0.tgz? That's hosted on 
apache hardware. I don't see a large difference in the security provided by the 
git repo vs the security provided by the tgz on apache hardware.

I can summarize our master plan and have it be independent of which input we 
use (github or binary release), since I doubt that will make a difference in 
whether they accept it or not.
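
Concretely, that would just mean swapping the context URL in the command from the 
issue description for the downloads.apache.org one, something like (version number 
only illustrative):

{code}
docker build -f solr-9.0.0/Dockerfile https://downloads.apache.org/lucene/solr/9.0.0/solr-9.0.0.tgz
{code}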

> Use the Solr TGZ artifact as Docker context
> ---
>
> Key: SOLR-15129
> URL: https://issues.apache.org/jira/browse/SOLR-15129
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0)
>Reporter: Houston Putman
>Priority: Major
>
> As discussed in SOLR-15127, there is a need for a unified Dockerfile that 
> allows for release and local builds.
> This ticket is an attempt to achieve this by using the Solr distribution TGZ 
> as the docker context to build from.
> Therefore release images would be completely reproducible by running:
> {{docker build -f solr-9.0.0/Dockerfile 
> https://www.apache.org/dyn/closer.lua/lucene/solr/9.0.0/solr-9.0.0.tgz}}
> The changes to the Solr distribution would include adding a Dockerfile at 
> {{solr-<version>/Dockerfile}}, adding the docker scripts under 
> {{solr-<version>/docker}}, and adding a version file at 
> {{solr-<version>/VERSION.txt}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15132:
--
Attachment: SOLR-15132.patch

> Add window parameter to the nodes Streaming Expression
> --
>
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-15132.patch, SOLR-15132.patch, SOLR-15132.patch
>
>
> The *nodes* Streaming Expression performs a breadth first graph traversal. 
> This ticket will add a *window* parameter to allow the nodes expression to 
> traverse the graph within a window of time. 
> To take advantage of this feature you must index the content with a String 
> field which is an ISO timestamp truncated at ten seconds. Then the *window* 
> parameter can be applied to walk the graph within a *window prior* to a 
> specific ten second window and perform aggregations. 
> *The main use case for this feature is auto-detecting lagged correlations.* 
> This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: 
> What types of log events occur most frequently in the 30 second window prior 
> to 10 second windows with the most slow queries:
> {code}
> nodes(logs,
>   facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", 
> rows="25"),
>   walk="time_ten_seconds->time_ten_seconds",
>   window="3",
>   gather="type_s",
>   count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp 
> truncations so that increments of one second, one minute, one day etc... can 
> also be traversed using the same query syntax. There will be a follow-on 
> ticket for that after this ticket is completed. This will create a more 
> general purpose time graph.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15142) Allow the cat Streaming Expression to read gzip files

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15142:
--
Attachment: (was: SOLR-15132.patch)

> Allow the cat Streaming Expression to read gzip files
> -
>
> Key: SOLR-15142
> URL: https://issues.apache.org/jira/browse/SOLR-15142
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-15142.patch, SOLR-15142.patch
>
>
> This ticket will allow the *cat* Streaming Expression to read gzip files.
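
For illustration, a hypothetical call once this lands (file name invented; this 
assumes {{cat}}'s existing behaviour of reading paths relative to the userfiles 
directory and its optional {{maxLines}} parameter):

{code}
cat("solr_logs/solr.log.gz", maxLines=1000)
{code}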



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15142) Allow the cat Streaming Expression to read gzip files

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15142:
--
Attachment: SOLR-15132.patch

> Allow the cat Streaming Expression to read gzip files
> -
>
> Key: SOLR-15142
> URL: https://issues.apache.org/jira/browse/SOLR-15142
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-15132.patch, SOLR-15142.patch, SOLR-15142.patch
>
>
> This ticket will allow the *cat* Streaming Expression to read gzip files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283373#comment-17283373
 ] 

ASF subversion and git services commented on SOLR-15145:


Commit 650b03752850c80a0bd3eb33fb2cdafc417b4ecc in lucene-solr's branch 
refs/heads/branch_8x from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=650b037 ]

SOLR-15145: Fix up changes.txt for 8.8.1 release



> Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url 
> for core node props
> --
>
> Key: SOLR-15145
> URL: https://issues.apache.org/jira/browse/SOLR-15145
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.8
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Critical
> Fix For: 8.8.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> From the mailing list:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
>   ... 166 more
> {code}
> see: 
> https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2357: SOLR-15145: Fix up changes.txt in branch_8x for 8.8.1 release

2021-02-11 Thread GitBox


thelabdude merged pull request #2357:
URL: https://github.com/apache/lucene-solr/pull/2357


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude opened a new pull request #2357: SOLR-15145: Fix up changes.txt in branch_8x for 8.8.1 release

2021-02-11 Thread GitBox


thelabdude opened a new pull request #2357:
URL: https://github.com/apache/lucene-solr/pull/2357


   No code changes, just updating release note and changes.txt



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Closed] (SOLR-15143) edismax is ignoring qf if query term is *

2021-02-11 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed SOLR-15143.
---

> edismax is ignoring qf if query term is *
> -
>
> Key: SOLR-15143
> URL: https://issues.apache.org/jira/browse/SOLR-15143
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.4
> Environment: Solr 7.4.0 - cloud mode
> Java runtime: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_271 
> 25.271-b09
> OS: Linux 
>Reporter: Yogendra Kumar Soni
>Priority: Major
>
> We are using Solr 7.4. When we use the edismax query parser and query for 
> * with the qf param,
> {code:java}
> {!edismax qf=field1}*{code}
> we get all documents in the result. 
> I am posting the debugQuery output below:
>  
> {code:java}
> {...
> "rawquerystring":"{!edismax qf=field1}*", "querystring":"{!edismax 
> qf=field1}*", "parsedquery":"+MatchAllDocsQuery(*:*)", 
> "parsedquery_toString":"+*:*", "QParser":"ExtendedDismaxQParser"
> ...}
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15143) edismax is ignoring qf if query term is *

2021-02-11 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15143.
-
Resolution: Not A Problem

This is working as designed.  An asterisk does not imply that there needs to be 
any data in any of the fields.  If you want to require that, then you'll have 
to recognize this special case and submit a different query to Solr, i.e. 
{{q=theField:*}}.  Consider first posting to the solr-user list or Solr Slack ( 
https://apachesolr.slack.com ) before filing an issue.
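
Spelling out the distinction with the field name from the report (the first parse 
is exactly what the debugQuery output above shows):

{code}
q={!edismax qf=field1}*   ->  +MatchAllDocsQuery(*:*), i.e. every document matches
q=field1:*                ->  matches only documents that actually have a value in field1
{code}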

> edismax is ignoring qf if query term is *
> -
>
> Key: SOLR-15143
> URL: https://issues.apache.org/jira/browse/SOLR-15143
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 7.4
> Environment: Solr 7.4.0 - cloud mode
> Java runtime: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_271 
> 25.271-b09
> OS: Linux 
>Reporter: Yogendra Kumar Soni
>Priority: Major
>
> We are using Solr 7.4. When we use the edismax query parser and query for 
> * with the qf param,
> {code:java}
> {!edismax qf=field1}*{code}
> we get all documents in the result. 
> I am posting the debugQuery output below:
>  
> {code:java}
> {...
> "rawquerystring":"{!edismax qf=field1}*", "querystring":"{!edismax 
> qf=field1}*", "parsedquery":"+MatchAllDocsQuery(*:*)", 
> "parsedquery_toString":"+*:*", "QParser":"ExtendedDismaxQParser"
> ...}
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class

2021-02-11 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-9590.
--
Resolution: Fixed

Thanks Lu Xugang!  Sorry for the long delay.

> Add javadoc for  Lucene86PointsFormat class
> ---
>
> Key: LUCENE-9590
> URL: https://issues.apache.org/jira/browse/LUCENE-9590
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/codecs
>Reporter: Lu Xugang
>Priority: Minor
> Fix For: master (9.0)
>
> Attachments: 1.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I would like to add javadoc for the Lucene86PointsFormat class; it is really 
> helpful for source readers to understand the data structure behind point values. 
> Is anyone doing this or planning to?
> The attachment lists part of the data structure (filled with color means it 
> has a sub data structure).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283347#comment-17283347
 ] 

ASF subversion and git services commented on LUCENE-9590:
-

Commit 9837bc4a4da1c63088a101c20374591f62e0be08 in lucene-solr's branch 
refs/heads/master from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9837bc4 ]

LUCENE-9590: Add javadoc for Lucene86PointsFormat class (#2194)

to Lucene's Confluence.
* also corrected some trivial errors in javadocs & comments

> Add javadoc for  Lucene86PointsFormat class
> ---
>
> Key: LUCENE-9590
> URL: https://issues.apache.org/jira/browse/LUCENE-9590
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: core/codecs
>Reporter: Lu Xugang
>Priority: Minor
> Fix For: master (9.0)
>
> Attachments: 1.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I would like to add javadoc for the Lucene86PointsFormat class; it is really 
> helpful for source readers to understand the data structure behind point values. 
> Is anyone doing this or planning to?
> The attachment lists part of the data structure (filled with color means it 
> has a sub data structure).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley merged pull request #2194: LUCENE-9590: Add javadoc for Lucene86PointsFormat class

2021-02-11 Thread GitBox


dsmiley merged pull request #2194:
URL: https://github.com/apache/lucene-solr/pull/2194


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283338#comment-17283338
 ] 

David Smiley commented on SOLR-14920:
-

I'm very much looking forward to this some day, but I very much concur with 
this sentiment from Erick:

bq. We've lived with the formatting anomalies for many years, I don't see the 
driver for pushing this forward before the reference impl is resolved, there 
are better places to spend the effort IMO.

Maybe we could start with just the contribs & docker, leaving aside Solr Core & 
SolrJ?  Just an idea to get some middle ground.

> Format code automatically and enforce it in Solr
> 
>
> Key: SOLR-14920
> URL: https://issues.apache.org/jira/browse/SOLR-14920
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Priority: Major
>  Labels: codestyle, formatting
>
> See the discussion at: LUCENE-9564.
> This is a placeholder for the present, I'm reluctant to do this to the Solr 
> code base until after:
>  * we have some Solr-specific consensus
>  * we have some clue what this means for the reference impl.
> Reconciling the reference impl will be difficult enough without a zillion 
> format changes to add to the confusion.
> So my proposal is
> 1> do this.
> 2> Postpone this until after the reference impl is merged.
> 3> do this in one single commit for reasons like being able to conveniently 
> have this separated out from git blame.
> Assigning to myself so it doesn't get lost, but anyone who wants to take it 
> over please feel free.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283330#comment-17283330
 ] 

ASF subversion and git services commented on SOLR-15145:


Commit 8662121ca527e456ba8f01f81e199a6c01322ac6 in lucene-solr's branch 
refs/heads/master from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8662121 ]

SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default 
to false for 9.x



> Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url 
> for core node props
> --
>
> Key: SOLR-15145
> URL: https://issues.apache.org/jira/browse/SOLR-15145
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.8
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Critical
> Fix For: 8.8.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> From the mailing list:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
>   ... 166 more
> {code}
> see: 
> https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x

2021-02-11 Thread GitBox


thelabdude merged pull request #2355:
URL: https://github.com/apache/lucene-solr/pull/2355


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani edited a comment on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat

2021-02-11 Thread GitBox


jtibshirani edited a comment on pull request #2310:
URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-46181


   Moving these classes under versioned packages like 
`org.apache.lucene.codecs.lucene90` makes sense to me.
   
   I slightly prefer the name `Lucene90BlockTreeTermsReader` because it isn't 
always clear where to stick the version number. But no strong opinion, and I 
see we already have some classes like `Completion84PostingsFormat`. Also maybe 
we'll omit the number in smaller helper classes like `ForDeltaUtil`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283328#comment-17283328
 ] 

David Smiley commented on SOLR-15051:
-

Linking SOLR-15089 for the S3 impl of BackupRepository, which is loosely a 
dependency of BlobDirectory.

Some next steps, un-ordered:
* Testing of what we have via existing tests.  Maybe some sub-set of tests 
using MiniSolrCloudCluster?
* Switch to BackupRepository API for backing storage API
* Add first draft "Listings" component implementation with many limitations 
(trivial in-JVM static global (no ZK), no de-duping)

Then:
* Listings: ZK
* Listings: De-duping
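
To make the Directory-as-cache idea in the description below concrete, here is a 
minimal, hypothetical sketch (not the actual BlobDirectory code; the {{BlobClient}} 
interface and the cache-miss handling are invented for illustration):

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Collection;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class BlobDirectorySketch extends FilterDirectory {

  /** Hypothetical blob-store client; stands in for whatever backing API is chosen. */
  public interface BlobClient {
    void upload(String name, Path localFile) throws IOException;
    void download(String name, Path localFile) throws IOException;
  }

  private final Path localPath;
  private final BlobClient blob;

  public BlobDirectorySketch(Directory localCache, Path localPath, BlobClient blob) {
    super(localCache);          // reads and writes hit the local Directory first
    this.localPath = localPath;
    this.blob = blob;
  }

  @Override
  public void sync(Collection<String> names) throws IOException {
    in.sync(names);             // make the files durable locally
    for (String name : names) { // then mirror them to shared storage
      blob.upload(name, localPath.resolve(name));
    }
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    try {
      return in.openInput(name, context);             // local cache hit
    } catch (NoSuchFileException | FileNotFoundException e) {
      blob.download(name, localPath.resolve(name));   // cache miss: pull from shared storage
      return in.openInput(name, context);
    }
  }
}
{code}

The real implementation would plug a BackupRepository-backed client in where 
{{BlobClient}} sits, per the "Switch to BackupRepository API" step above.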

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani commented on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat

2021-02-11 Thread GitBox


jtibshirani commented on pull request #2310:
URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-46181


   Moving these classes under versioned packages like 
`org.apache.lucene.codecs.lucene90` makes sense to me. I slightly prefer the 
name `Lucene90BlockTreeTermsReader` because it isn't always clear where to 
stick the version number. But no strong opinion, and I see we already have some 
classes like `Completion84PostingsFormat`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore

2021-02-11 Thread Andy Throgmorton (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283321#comment-17283321
 ] 

Andy Throgmorton commented on SOLR-15089:
-

Great idea, [~gerlowskija]! How far along are you in coding this effort? I work 
at Salesforce and we wrote+use an S3 implementation of BackupRepository for a 
production Solr stack. It's not in an upstreamable state right now (e.g., uses 
some internal libraries for grabbing keys/secrets, etc.), but I would be happy 
to look into cleaning it up and submitting it for consideration if you haven't 
started yet. Or if you've already written the code, then feel free to add me on 
your code review.

In regards to testing, we use the Adobe S3Mock 
(https://github.com/adobe/S3Mock) library for writing unit tests. Since this 
code is fairly simple, as you mentioned, the S3 APIs it uses are all mainstream 
and mockable with that framework. For larger, end-to-end integration tests, 
we've also started using Minio (https://min.io/) to emulate an S3 server, but I 
would think that's outside the scope of this ticket.
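
Not the Salesforce code mentioned above, just a minimal sketch of the kind of S3 
calls such a BackupRepository would wrap, pointed at a local emulator (S3Mock or 
Minio) so no real AWS account is needed; the endpoint, bucket name, keys and 
credentials here are made up:

{code:java}
import java.io.File;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

public class S3BackupSketch {
  public static void main(String[] args) {
    // Point the SDK at a local S3 emulator (S3Mock, Minio, ...) instead of real AWS.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration("http://localhost:9090", "us-east-1"))
        .withPathStyleAccessEnabled(true)
        .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("foo", "bar")))
        .build();

    String bucket = "solr-backups";
    s3.createBucket(bucket);

    // The core operations a BackupRepository needs: write, list, read, delete.
    s3.putObject(bucket, "mycollection/backup_0/segments_1", new File("/tmp/segments_1"));
    s3.listObjectsV2(bucket, "mycollection/backup_0/")
        .getObjectSummaries()
        .forEach(o -> System.out.println(o.getKey() + " " + o.getSize()));
    s3.getObject(new GetObjectRequest(bucket, "mycollection/backup_0/segments_1"),
        new File("/tmp/segments_1.restored"));
    s3.deleteObject(bucket, "mycollection/backup_0/segments_1");
  }
}
{code}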

> Allow backup/restoration to Amazon's S3 blobstore 
> --
>
> Key: SOLR-15089
> URL: https://issues.apache.org/jira/browse/SOLR-15089
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Priority: Major
>
> Solr's BackupRepository interface provides an abstraction around the physical 
> location/format that backups are stored in.  This allows plugin writers to 
> create "repositories" for a variety of storage mediums.  It'd be nice if Solr 
> offered more mediums out of the box though, such as some of the "blobstore" 
> offerings provided by various cloud providers.
> This ticket proposes that a "BackupRepository" implementation for Amazon's 
> popular 'S3' blobstore, so that Solr users can use it for backups without 
> needing to write their own code.
> Amazon offers an S3 Java client with acceptable licensing, and the required 
> code is relatively simple.  The biggest challenge in supporting this will 
> likely be procedural - integration testing requires S3 access and S3 access 
> costs money.  We can check with INFRA to see if there is any way to get cloud 
> credits for an integration test to run in nightly Jenkins runs on the ASF 
> Jenkins server.  Alternatively we can try to stub out the blobstore in some 
> reliable way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr

2021-02-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283304#comment-17283304
 ] 

Dawid Weiss commented on SOLR-14920:


Just to follow up on what Erick said (hi Erick, how are the squirrels? :) - it 
indeed takes some work to make sure nothing gets *broken* on that initial 
formatting. We did this for Lucene and we did find some code that formatting 
would have broken (code in comments without pre, manually-adjusted examples).

Arguably you could do the formatting and then recover from history (if somebody 
spots something wrong) - many approaches are possible, I guess.

> Format code automatically and enforce it in Solr
> 
>
> Key: SOLR-14920
> URL: https://issues.apache.org/jira/browse/SOLR-14920
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Priority: Major
>  Labels: codestyle, formatting
>
> See the discussion at: LUCENE-9564.
> This is a placeholder for the present, I'm reluctant to do this to the Solr 
> code base until after:
>  * we have some Solr-specific consensus
>  * we have some clue what this means for the reference impl.
> Reconciling the reference impl will be difficult enough without a zillion 
> format changes to add to the confusion.
> So my proposal is
> 1> do this.
> 2> Postpone this until after the reference impl is merged.
> 3> do this in one single commit for reasons like being able to conveniently 
> have this separated out from git blame.
> Assigning to myself so it doesn't get lost, but anyone who wants to take it 
> over please feel free.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] epugh opened a new pull request #2356: SOLR-15152: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin

2021-02-11 Thread GitBox


epugh opened a new pull request #2356:
URL: https://github.com/apache/lucene-solr/pull/2356


   # Description
   
   Export tool says it uses json, but it's actually a json lines format.   It 
ignores anonymous and nested docs.
   
   # Solution
   
   * Tweaked the writer to properly handle anonymous and regular nested docs 
when exporting data.
   * Renamed the existing `json` format to `jsonl`, and introduced a proper 
`json` format.
   * Introduce explicit DocSinks per format, `json`, `jsonl`, and `javabin`.
   
   Now, with the `json` format you can export and then reimport the Solr docs, 
including with child docs.
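   
   A quick sketch of how the formats would be invoked (flag names here are my 
   assumption of the existing export tool CLI, and the collection name is just an example):
   
   ```bash
   # jsonl: one doc per line (what the old "json" format actually produced)
   bin/solr export -url http://localhost:8983/solr/techproducts -format jsonl -out /tmp/techproducts.jsonl
   # json: a single well-formed JSON document that can be re-imported, child docs included
   bin/solr export -url http://localhost:8983/solr/techproducts -format json -out /tmp/techproducts.json
   ```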
   
   # Tests
   
   I've added a new `TestExportToolWithNestedDocs`, and extended the existing 
`TestExportTool` tests.  The setup for the tests was quite different, so I 
didn't make them all one file.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ X] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ X] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija edited a comment on pull request #2250: SOLR-13608: Incremental backup file format

2021-02-11 Thread GitBox


gerlowskija edited a comment on pull request #2250:
URL: https://github.com/apache/lucene-solr/pull/2250#issuecomment-00507


   Totally agreed @epugh that we don't want the old format to linger.  I tried 
to convey in documentation that it was deprecated and will be going away, 
hopefully those notes are sufficient.
   
   I thought about `deprecated` tags, but wasn't sure where to put them.  The 
file format isn't a Java method API like those tags are typically used on.  But 
if you have a place in mind you think makes sense, I'm happy to add them.
   
   Per the compatibility plan in the SIP, we'll maintain support for restoring 
the old format through the next major release line (9.x).  Though conceivably 
it could be sooner if there's consensus that that's too long.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija edited a comment on pull request #2250: SOLR-13608: Incremental backup file format

2021-02-11 Thread GitBox


gerlowskija edited a comment on pull request #2250:
URL: https://github.com/apache/lucene-solr/pull/2250#issuecomment-00507


   Totally agreed @epugh that we don't want the old format to linger.  I tried 
to convey in documentation that it was deprecated and will be going away, 
hopefully those notes are sufficient.
   
   I thought about `deprecated` tags, but wasn't sure where to put them.  The 
file format isn't a Java method API like those tags are typically used on.  But 
if you have a place in mind you think makes sense, I'm happy to add them.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija commented on pull request #2250: SOLR-13608: Incremental backup file format

2021-02-11 Thread GitBox


gerlowskija commented on pull request #2250:
URL: https://github.com/apache/lucene-solr/pull/2250#issuecomment-00507


   Totally agreed that we don't want the old format to linger.  I tried to 
convey in documentation that it was deprecated and will be going away, 
hopefully those notes are sufficient.
   
   I thought about `deprecated` tags, but wasn't sure where to put them.  It's 
not a Java method API like those tags are typically used on.  But if you have a 
place in mind you think makes sense I'm happy to add them.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #214: extensions/v1beta1 Ingress is deprecated

2021-02-11 Thread GitBox


HoustonPutman commented on issue #214:
URL: 
https://github.com/apache/lucene-solr-operator/issues/214#issuecomment-777699761


   We should go ahead and make sure that all dependent resources are up to date 
with the most recent version.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'

2021-02-11 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283283#comment-17283283
 ] 

Chris M. Hostetter commented on SOLR-15150:
---

David: yeah, we probably intermix "atomic" and "partial" too much, and conflate 
the atomic nature of the partial updates with the atomic nature of optimistic 
concurrency updates – even when the users aren't doing that.

So agreed: better to use "partial" here to clarify what aspect we're dealing 
with.

Ishan: I had originally considered the verb "force" (and I realize now it still 
lingers in a test variable) but it felt really misleading and I'd prefer we 
avoid it... 

Through the lens of a novice user, "forceX" makes me think I'm telling Solr "Hey 
Solr, do X even if you wouldn't by default and even if it might break something" 
... similar to {{"rm -f"}} or {{"git push --force"}} ... but what we want to 
convey is *NOT* that this is a way to _force_ the update to be done in-place 
(because we can't actually promise that) ... what we want to convey is that this 
is a way for the user to say "When I do a partial update, I expect that _either_ 
the update be done in place, or it must fail".

"Require" felt a little better, because it seemed more like a _request_, if that 
makes sense? ... "I require that Solr do X" felt like a good English-language 
equivalent to the sentiment I was going for, because Solr can either satisfy 
the request or say "I'm not capable of doing that" (i.e. fail).

(I had briefly considered using the verb "assert" as in 
{{"assert.inplace.atomic.update"}} or {{"update.partial.assertInPlace"}} but 
that felt too Java/C-ish for general solr users)

Omitting any "verb" (either "force" or "require") and just going with something 
like {{"update.partial.inplace"}} is an interesting idea, but I feel like it's 
too ... "weak" is the closest word I can think of, I guess? ... Compared to how 
this would behave, {{update.partial.inplace=true}} feels more like a way to 
override a default (that might otherwise be heuristically determined) and makes 
me think that {{update.partial.inplace=false}} would be a way to indicate that 
Solr should _never_ do my update in place, which isn't what we want the 
(default) {{"false"}} value to mean.

So I think on balance at this point I like David's suggestion of 
{{"update.partial.requireInPlace"}} the best so far?
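
To make that concrete, a partial update request with that (still hypothetical, 
not-yet-committed) param name might look like the following, assuming "popularity" 
is a field that qualifies for in-place updates (single-valued, non-stored, 
non-indexed, docValues numeric):

{code}
curl 'http://localhost:8983/solr/techproducts/update?update.partial.requireInPlace=true&commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1", "popularity":{"inc":1}}]'
{code}

If the schema made an in-place update impossible, the request would fail instead 
of silently falling back to a full re-index of the document.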

> add request level option to fail an atomic update if it can't be done 
> 'in-place'
> 
>
> Key: SOLR-15150
> URL: https://issues.apache.org/jira/browse/SOLR-15150
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15150.patch
>
>
> When "In-Place" DocValue updates were added to Solr, the choice was made to 
> re-use the existing "Atomic Update" syntax, and use the DocValue updating 
> code if possible based on the index & schema, otherwise fall back to the 
> existing Atomic Update logic (to re-index the entire document). In essence, 
> "In-Place Atomic Updates" are treated as a (possible) optimization to 
> "regular" Atomic Updates
> This works fine, but it leaves open the possibility of a "gotcha" situation 
> where users may (reasonably) assume that an update can be done "In-Place" but 
> some aspect of the schema prevents it, and the performance of the updates 
> doesn't meet expectations (notably in the case of things like deeply nested 
> documents, where the re-indexing cost is multiplicative based on the total 
> size of the document tree)
> I think it would be a good idea to support an optional request param users 
> can specify with the semantics that say "If this update is an Atomic Update, 
> fail to execute it unless it can be done In-Place"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #212: Solr cloud never getes deleted when reclaimPolicy is set to Delete

2021-02-11 Thread GitBox


HoustonPutman commented on issue #212:
URL: 
https://github.com/apache/lucene-solr-operator/issues/212#issuecomment-777699251


   So yes, this is actually expected behavior. If you want to delete your 
SolrCloud after deleting the Solr Operator, you will need to remove the 
finalizer from the SolrCloud. Otherwise the SolrCloud will never be able to be 
deleted.
   
   ```bash
   kubectl patch solrcloud <solrcloud-name> --type='json' -p='[{"op": "remove", "path": 
"/metadata/finalizers"}]'
   ```
   
   That should do the trick. But I would recommend just deleting the SolrCloud 
objects before deleting the Solr Operator. It's the easiest way to go, and the 
PVCs will be handled correctly!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] anshumg commented on a change in pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x

2021-02-11 Thread GitBox


anshumg commented on a change in pull request #2355:
URL: https://github.com/apache/lucene-solr/pull/2355#discussion_r574731370



##
File path: solr/solr-ref-guide/src/solr-upgrade-notes.adoc
##
@@ -100,6 +100,12 @@ The default Prometheus Exporter configuration includes 
metrics like queries-per-
 Plugin developers using `SolrPaths.locateSolrHome()` or 'new 
`SolrResourceLoader`' should check deprecation warnings as some existing 
functionality will be removed in 9.0.
 https://issues.apache.org/jira/browse/SOLR-14934[SOLR-14934] has more 
technical details about this change for those concerned.
 
+*Removing base_url from Stored State*
+
+If you're able to upgrade SolrJ to 8.8.x for all of your client applications, 
then you can set `-Dsolr.storeBaseUrl=false` (introduced in Solr 8.8.1)
+to better align the stored state in Zookeeper with future versions of Solr. 
However, if you are not able to upgrade SolrJ to 8.8.x for all client 
applications, then
+leave the default `-Dsolr.storeBaseUrl=true` so that Solr will continue to 
store the `base_url` in Zookeeper.

Review comment:
   Perhaps add a note about this going away completely in the next major 
release? It's obvious for the folks who know but would be good to highlight for 
newer users or people who've inherited old systems. 

##
File path: solr/CHANGES.txt
##
@@ -238,13 +238,21 @@ Bug Fixes
 -
 * SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to 
differentiate '0' group from null group (hossman)
 
-* SOLR-15114: Fix bug that caused WAND optimization to be disabled in cases 
where the max score is requested (such as

Review comment:
    





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15132:
--
Attachment: SOLR-15132.patch

> Add window parameter to the nodes Streaming Expression
> --
>
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-15132.patch, SOLR-15132.patch
>
>
> The *nodes* Streaming Expression performs a breadth first graph traversal. 
> This ticket will add a *window* parameter to allow the nodes expression to 
> traverse the graph within a window of time. 
> To take advantage of this feature you must index the content with a String 
> field which is an ISO timestamp truncated at ten seconds. Then the *window* 
> parameter can be applied to walk the graph within a *window prior* to a 
> specific ten second window and perform aggregations. 
> *The main use case for this feature is auto-detecting lagged correlations.* 
> This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: 
> What types of log events occur most frequently in the 30 second window prior 
> to 10 second windows with the most slow queries:
> {code}
> nodes(logs,
>   facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", 
> rows="25"),
>   walk="time_ten_seconds->time_ten_seconds",
>   window="3",
>   gather="type_s",
>   count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp 
> truncations so that increments of one second, one minute, one day etc... can 
> also be traversed using the same query syntax. There will be a follow-on 
> ticket for that after this ticket is completed. This will create a more 
> general purpose time graph.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat

2021-02-11 Thread GitBox


jpountz commented on pull request #2310:
URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-777690621


   > Any thoughts on naming or package structure for these classes?
   
   What about putting the current blocktree classes into the 
`org.apache.lucene.backward_codecs.lucene40` (since they were introduced in 
Lucene 4.0) package and renaming the reader/writer classes to include the 
version too, ie. `BlockTree40TermsWriter` and `BlockTree40TermsReader`? And the 
new classes would be called `BlockTree90TermsWriter` and 
`BlockTree90TermsReader` and be under the `org.apache.lucene.codecs.lucene90` 
package?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #2310: LUCENE-9705: Create Lucene90PostingsFormat

2021-02-11 Thread GitBox


jpountz commented on pull request #2310:
URL: https://github.com/apache/lucene-solr/pull/2310#issuecomment-777686222


   > I wonder if we should not version PFUtil classes, instead move them to a 
package under Util and change the visibility of the methods. Those classes 
seem more like a utility to me.
   
   I've become a bit wary of having shared utility classes for codecs given how 
it makes the code harder to evolve (e.g. I have the FST and PackedInts classes 
in mind). I'd rather like to copy this utility class wherever it's needed so 
that every file format that uses bit packing can more easily update the logic 
to fit its own needs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9755) Index Segment without DocValues May Cause Search to Fail

2021-02-11 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283264#comment-17283264
 ] 

Mayya Sharipova commented on LUCENE-9755:
-

{quote}
Consider the following scenario:
* all documents in the index have a field "numfield" indexed as IntPoint
* in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name
{quote}
[~tomhecker], I am working on LUCENE-9334, which will ensure that this never 
happens. That is, if a document has "numfield" indexed as IntPoint, it also 
must have a "numfield" indexed as SortedNumericDocValuesField.  In other words, 
there will be consistency between data structures on a per-field basis across all the 
documents of an index.

But this will only apply from version 9.0.  Your point is still valid for 8.x.

 

 

> Index Segment without DocValues May Cause Search to Fail
> 
>
> Key: LUCENE-9755
> URL: https://issues.apache.org/jira/browse/LUCENE-9755
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.x, 8.3.1, 8.8
>Reporter: Thomas Hecker
>Priority: Minor
>  Labels: docValues, sorting
> Attachments: DocValuesTest.java
>
>
> Not sure if this can be considered a bug, but it is certainly a caveat that 
> may slip through testing due to its nature.
> Consider the following scenario:
>  * all documents in the index have a field "numfield" indexed as IntPoint
>  * in addition, SOME of those documents are also indexed with a 
> SortedNumericDocValuesField using the same "numfield" name
> The documents without the DocValues cannot be matched by any queries that 
> involve sorting, so we save some space by omitting the DocValues for those 
> documents.
> This works perfectly fine, unless
>  * the index contains a segment that only contains documents without the 
> DocValues
> In this case, running a query that sorts by "numfield" will throw the 
> following exception:
> {noformat}
> java.lang.IllegalStateException: unexpected docvalues type NONE for field 
> 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct 
> docvalues type.
>    at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
>    at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
>    at 
> org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
>    at 
> org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155){noformat}
> I have included a minimal example program that demonstrates the issue. This 
> will
>  * create an index with two documents, each having "numfield" indexed
>  * add a DocValuesField "numfield" only for the first document
>  * force the two documents into separate index segments
>  * run a query that matches only the first document and sorts by "numfield"
> This results in the aforementioned exception.
> When removing the following lines from the code:
> {code:java}
> if (i==docCount/2) {
>   iw.commit();
> }
> {code}
> both documents get added to the same segment. When re-running the code 
> creating with a single index segment, the query works fine.
> Tested with Lucene 8.3.1 and 8.8.0  .
> Like I said, this may not be considered a bug. But it has slipped through our 
> testing because the existence of such a DocValues-free segment is such a rare 
> and short-lived event.
> We can avoid this issue in the future by using a different field name for the 
> DocValuesField. But for our production systems we have to patch 
> DocValues.checkField() to suppress the IllegalStateException as reindexing is 
> not an option right now.
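
A condensed, hypothetical version of the scenario above (field name and values 
invented; this is not the attached DocValuesTest.java, just the same shape):

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MixedDocValuesSegments {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig())) {
        Document withDv = new Document();
        withDv.add(new IntPoint("numfield", 1));
        withDv.add(new SortedNumericDocValuesField("numfield", 1));
        iw.addDocument(withDv);
        iw.commit(); // first segment: has docvalues for "numfield"

        Document withoutDv = new Document();
        withoutDv.add(new IntPoint("numfield", 2)); // no docvalues here
        iw.addDocument(withoutDv);
        iw.commit(); // second segment: no docvalues for "numfield" at all
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Sort sort = new Sort(new SortedNumericSortField("numfield", SortField.Type.INT));
        // Matches only the first doc, but the sort comparator still visits the
        // docvalues-less segment and throws "unexpected docvalues type NONE".
        searcher.search(IntPoint.newExactQuery("numfield", 1), 10, sort);
      }
    }
  }
}
{code}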



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude opened a new pull request #2355: SOLR-15145: solr.storeBaseUrl feature flag introduced in 8.8.1 should default to false for 9.x

2021-02-11 Thread GitBox


thelabdude opened a new pull request #2355:
URL: https://github.com/apache/lucene-solr/pull/2355


   Align solr's changes.txt with fixes going into 8x / 8.8 and change the 
default value of the `solr.storeBaseUrl` sys prop for master.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread Tomas Eduardo Fernandez Lobbe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Eduardo Fernandez Lobbe updated SOLR-15114:
-
Affects Version/s: (was: 8.6.3)
   (was: master (9.0))
   8.8
   8.6
   8.7

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6, 8.7, 8.8
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.9, 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or the fl=score 
> parameter specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread Tomas Eduardo Fernandez Lobbe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Eduardo Fernandez Lobbe updated SOLR-15114:
-
Fix Version/s: 8.9
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged. Thanks [~nminami]!

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.9, 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or the fl=score 
> parameter specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283234#comment-17283234
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit 33a1f7f6b2541c096c835aead86dbe3cff111df9 in lucene-solr's branch 
refs/heads/branch_8_8 from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=33a1f7f ]

SOLR-15114: Add CHANGES entry


> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283233#comment-17283233
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit 04f92e613e0725c3dbcb4964b5f5886f84ffd847 in lucene-solr's branch 
refs/heads/branch_8_8 from Naoto MINAMI
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=04f92e6 ]

SOLR-15114: WAND does not work correctly on multiple segments (#2259)

In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index 
segment and remains zero until maxScore is updated.
There are two causes of this problem:
* MaxScoreCollector does not set minCompetitiveScore of 
MinCompetitiveScoreAwareScorable newly generated for another index segment.
* MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
This behavior is correct considering the purpose of MaxScoreCollector.

For details, see the attached pdf 
https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.
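
As a side note for readers following along: the sketch below is not the actual fix, 
only a minimal, hypothetical collector illustrating the underlying idea, namely that 
the best score seen so far has to be re-applied as the minimum competitive score 
whenever collection moves to a new index segment, otherwise the WAND threshold 
restarts from zero.

{code}
// Minimal sketch only, not the SOLR-15114 patch: carry the current best score
// into each new per-segment LeafCollector instead of letting the threshold
// reset to zero at segment boundaries.
import java.io.IOException;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

public class CarryOverMinScoreCollector extends SimpleCollector {
  private Scorable scorer;
  private float maxScore; // best score observed across all segments so far

  @Override
  public void setScorer(Scorable scorer) throws IOException {
    this.scorer = scorer;
    if (maxScore > 0f) {
      // Re-apply the threshold for the new segment; without this the scorer
      // starts the segment with a minimum competitive score of zero.
      scorer.setMinCompetitiveScore(maxScore);
    }
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = scorer.score();
    if (score > maxScore) {
      maxScore = score;
      scorer.setMinCompetitiveScore(maxScore);
    }
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.TOP_SCORES; // signals that WAND/MAXSCORE skipping may be used
  }
}
{code}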

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283229#comment-17283229
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit 6a801e21520dc7d6e0720672d530f32ded25019b in lucene-solr's branch 
refs/heads/branch_8x from Naoto MINAMI
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a801e2 ]

SOLR-15114: WAND does not work correctly on multiple segments (#2259)

In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index 
segment and remains zero until maxScore is updated.
There are two causes of this problem:
* MaxScoreCollector does not set minCompetitiveScore of 
MinCompetitiveScoreAwareScorable newly generated for another index segment.
* MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
This behavior is correct considering the purpose of MaxScoreCollector.

For details, see the attached pdf 
https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283230#comment-17283230
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit 3d6d92feb5bc532e71e6c8ee5f92396b7f65b982 in lucene-solr's branch 
refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3d6d92f ]

SOLR-15114: Add CHANGES entry


> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] donnerpeter opened a new pull request #2354: LUCENE-9765: Hunspell: rename SpellChecker to Hunspell, fix test name…

2021-02-11 Thread GitBox


donnerpeter opened a new pull request #2354:
URL: https://github.com/apache/lucene-solr/pull/2354


   …, update javadoc and CHANGES.txt
   
   
   
   
   # Description
   
   The class names are imperfect and the docs are outdated
   
   # Solution
   
   Fix that
   
   # Tests
   
   No behavior change.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15145) Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url for core node props

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283218#comment-17283218
 ] 

ASF subversion and git services commented on SOLR-15145:


Commit b000e56b8f3b74e211cea450026b2d2c8da078b5 in lucene-solr's branch 
refs/heads/branch_8_8 from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b000e56 ]

SOLR-15145: System property to control whether base_url is stored in state.json 
to enable back-compat with older SolrJ versions



> Older versions of SolrJ (pre-8.8.0) hit an NPE when computing the base_url 
> for core node props
> --
>
> Key: SOLR-15145
> URL: https://issues.apache.org/jira/browse/SOLR-15145
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 8.8
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Critical
> Fix For: 8.8.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From the mailing list:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> deployment.uleaf.ear//org.apache.solr.common.cloud.ZkCoreNodeProps.getCoreUrl(ZkCoreNodeProps.java:53)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.lambda$sendRequest$2(BaseCloudSolrClient.java:1161)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1159)
>   at 
> deployment.uleaf.ear//org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934)
>   ... 166 more
> {code}
> see: 
> https://lists.apache.org/thread.html/r3d131030f0a7026235451f71fabdae6d6b7c2f955822c75dcad4d41f%40%3Csolr-user.lucene.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2346: SOLR-15145: System property to control whether base_url is stored in state.json to enable back-compat with older SolrJ versions

2021-02-11 Thread GitBox


thelabdude merged pull request #2346:
URL: https://github.com/apache/lucene-solr/pull/2346


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9765) Hunspell: rename SpellChecker to Hunspell, fix test name, update javadoc and CHANGES.txt

2021-02-11 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9765:


 Summary: Hunspell: rename SpellChecker to Hunspell, fix test name, 
update javadoc and CHANGES.txt
 Key: LUCENE-9765
 URL: https://issues.apache.org/jira/browse/LUCENE-9765
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Peter Gromov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283204#comment-17283204
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit b6db6c88d7bc4ee1757c450ad7a3df8b72add084 in lucene-solr's branch 
refs/heads/master from Tomas Eduardo Fernandez Lobbe
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b6db6c8 ]

SOLR-15114: Add CHANGES entry


> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9705) Move all codec formats to the o.a.l.codecs.Lucene90 package

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283195#comment-17283195
 ] 

ASF subversion and git services commented on LUCENE-9705:
-

Commit 096f054d562978a768d346f66b50332c686919a0 in lucene-solr's branch 
refs/heads/master from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=096f054 ]

LUCENE-9705: Reset internal version in Lucene90FieldInfosFormat. (#2339)

Since this is a fresh format, we can remove older version logic and reset the
internal version to 0.

> Move all codec formats to the o.a.l.codecs.Lucene90 package
> ---
>
> Key: LUCENE-9705
> URL: https://issues.apache.org/jira/browse/LUCENE-9705
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Current formats are distributed in different packages, prefixed with the 
> Lucene version they were created. With the upcoming release of Lucene 9.0, it 
> would be nice to move all those formats to just the o.a.l.codecs.Lucene90 
> package (and of course moving the current ones to the backwards-codecs).
> This issue would actually facilitate moving the directory API to little 
> endian (LUCENE-9047) as the only codecs that would need to handle backwards 
> compatibility will be the codecs in backwards codecs.
> In addition, it can help formalising the use of internal versions vs format 
> versioning ( LUCENE-9616)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jtibshirani merged pull request #2339: LUCENE-9705: Reset internal version in Lucene90FieldInfosFormat.

2021-02-11 Thread GitBox


jtibshirani merged pull request #2339:
URL: https://github.com/apache/lucene-solr/pull/2339


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283192#comment-17283192
 ] 

ASF subversion and git services commented on SOLR-15114:


Commit 0cbb38ff4a38dea31324265da13412dd713d9a8e in lucene-solr's branch 
refs/heads/master from Naoto MINAMI
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0cbb38f ]

SOLR-15114: WAND does not work correctly on multiple segments (#2259)

In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each index 
segment and remains zero until maxScore is updated.
There are two causes of this problem:
* MaxScoreCollector does not set minCompetitiveScore of 
MinCompetitiveScoreAwareScorable newly generated for another index segment.
* MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
This behavior is correct considering the purpose of MaxScoreCollector.

For details, see the attached pdf 
https://issues.apache.org/jira/secure/attachment/13019548/wand.pdf.

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] tflobbe merged pull request #2259: SOLR-15114: WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread GitBox


tflobbe merged pull request #2259:
URL: https://github.com/apache/lucene-solr/pull/2259


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282783#comment-17282783
 ] 

Tomas Eduardo Fernandez Lobbe edited comment on SOLR-15114 at 2/11/21, 5:08 PM:


I've run a perf test on the change using Gatling:
 * Using a wikipedia snapshot (20M docs, shortened to 1k characters)
 * Using Mike McCandless [query 
set|https://github.com/mikemccand/luceneutil/blob/master/tasks/wikimedium.10M.tasks]
 * 10k queries per type (180k queries total)
 * 2 shards, 1 replica each (on the same node).
 * Each shard has ~30 segments
 * search on the article body
 * 10 parallel users
 * Single Solr instance (iMac Pro 3.2 GHz 8-Core Intel Xeon W with 128 GB RAM).
 * Gatling running in the machine
 * The default example Solr parameters
 * Always used {{rows=2}}, in the case of WAND I also added {{minExactCount=2}}

While there is some noise in the tests (I'd expect master WAND, master no-WAND 
and patch no-WAND to perform similarly), the WAND scenario with the patch 
applied is definitely faster:
||Stat||master WAND||master no-WAND||patch WAND||patch no-WAND||
|QPS|97.72|103.687|153.061|111.732|
|min|1|1|1|1|
|p50|39|39|23|36|
|p75|102|95|57|87|
|p95|387|350|245|322|
|p99|829|809|668|769|
|max|2405|2416|1331|2447|
|mean|95|89|59|82|
|std dev|155|147|110|139|


was (Author: tomasflobbe):
I've run a perf test on the change using Gatling:
 * Using a wikipedia snapshot (20M docs, shortened to 1k characters)
 * Using Mike McCandless [query 
set|https://github.com/mikemccand/luceneutil/blob/master/tasks/wikimedium.10M.tasks]
 * 10k queries per type (18k queries total)
 * 2 shards, 1 replica each (on the same node).
 * Each shard has ~30 segments
 * search on the article body
 * 10 parallel users
 * Single Solr instance (iMac Pro 3.2 GHz 8-Core Intel Xeon W with 128 GB RAM).
 * Gatling running in the machine
 * The default example Solr parameters
 * Always used {{rows=2}}, in the case of WAND I also added {{minExactCount=2}}

While there is some noise in the tests (I'd expect master WAND, master no-WAND 
and patch no-WAND to perform similarly), the WAND scenario with the patch 
applied is definitely faster:
||Stat||master WAND||master no-WAND||patch WAND||patch no-WAND||
|QPS|97.72|103.687|153.061|111.732|
|min|1|1|1|1|
|p50|39|39|23|36|
|p75|102|95|57|87|
|p95|387|350|245|322|
|p99|829|809|668|769|
|max|2405|2416|1331|2447|
|mean|95|89|59|82|
|std dev|155|147|110|139|

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin

2021-02-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283180#comment-17283180
 ] 

David Eric Pugh commented on SOLR-15152:


[~noble] what do you think of adding an explicit {{-compress}} switch that would 
add the .gz to the file, versus the pattern of naming the file with a 
{{-out /myexportdir/myoutput.gz}}?

> Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
> ---
>
> Key: SOLR-15152
> URL: https://issues.apache.org/jira/browse/SOLR-15152
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCLI
>Affects Versions: 8.8
>Reporter: David Eric Pugh
>Assignee: David Eric Pugh
>Priority: Major
>
> ExportTool doesn't properly handle anonymous child docs or nested docs.   It 
> also confuses the JSONL format with the JSON format.  
> I'd like to have the JSON Lines format output as .jsonl, which is the 
> standard, and have the JSON format be a .json whose content is the same as what 
> you would post to Solr as JSON to upload the data. This will let us round-trip 
> the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin

2021-02-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283180#comment-17283180
 ] 

David Eric Pugh edited comment on SOLR-15152 at 2/11/21, 4:59 PM:
--

[~noble] what do you think of adding an explicit {{-compress}} switch that would 
add the .gz to the file, versus the pattern of naming the file with a

 -out /myexportdir/myoutput.gz


was (Author: epugh):
[~noble]what do you think of adding an explicit {{-compress}} switch that would 
add the .gz to the file, versus the pattern of naming the file with a \{{ -out 
/myexportdir/myoutput.gz}}

> Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
> ---
>
> Key: SOLR-15152
> URL: https://issues.apache.org/jira/browse/SOLR-15152
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCLI
>Affects Versions: 8.8
>Reporter: David Eric Pugh
>Assignee: David Eric Pugh
>Priority: Major
>
> ExportTool doesn't properly handle anonymous child docs or nested docs.   It 
> also confuses the JSONL format with the JSON format.  
> I'd like to have the JSON Lines format output as .jsonl, which is the 
> standard, and have the JSON format be a .json whose content is the same as what 
> you would post to Solr as JSON to upload the data. This will let us round-trip 
> the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'

2021-02-11 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283179#comment-17283179
 ] 

Ishan Chattopadhyaya commented on SOLR-15150:
-

+1 Hoss, this issue is very useful.
How about "forceInPlaceUpdate" or "forceInPlace"? Or maybe just "inplace"?

> add request level option to fail an atomic update if it can't be done 
> 'in-place'
> 
>
> Key: SOLR-15150
> URL: https://issues.apache.org/jira/browse/SOLR-15150
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15150.patch
>
>
> When "In-Place" DocValue updates were added to Solr, the choice was made to 
> re-use the existing "Atomic Update" syntax, and use the DocValue updating 
> code if possible based on the index & schema, otherwise fall back to the 
> existing Atomic Update logic (to re-index the entire document). In essence, 
> "In-Place Atomic Updates" are treated as a (possible) optimization to 
> "regular" Atomic Updates
> This works fine, but it leaves open the possibility of a "gotcha" situation 
> where users may (reasonably) assume that an update can be done "In-Place" but 
> some aspect of the schema prevents it, and the performance of the updates 
> doesn't meet expectations (notably in the case of things like deeply nested 
> documents, where the re-indexing cost is multiplicative based on the total 
> size of the document tree)
> I think it would be a good idea to support an optional request param users 
> can specify with the semantics that say "If this update is an Atomic Update, 
> fail to execute it unless it can be done In-Place"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15118) Make /v2/collections APIs annotation-based

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283167#comment-17283167
 ] 

David Smiley commented on SOLR-15118:
-

I'm a huge fan of what you propose in your first comment!

> Make /v2/collections APIs annotation-based
> --
>
> Key: SOLR-15118
> URL: https://issues.apache.org/jira/browse/SOLR-15118
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: v2 API
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The {{ApiBag}} class used to register v2 APIs (and the {{PathTrie}} object 
> underlying it) only holds a single {{Api}} object for a given "method" and 
> "path" combination.  In short this means that API commands with the same 
> method and path must be declared homogenously: they either have to all be in 
> the JSON spec, or all be in annotated Java classes.
> The SIP-12 proposal calls for new "list-backups" and "delete-backups" APIs.  
> For these v2 APIs to be annotation-based, as is preferred going forward, all 
> of the existing /v2/collections APIs must be changed to be annotation-based 
> as well.
> It's worth noting that this will cause the introspection output to lose the 
> "description" text for these APIs and their parameters, as there's no support 
> for this yet for annotation-based v2 APIs.  See SOLR-15117 for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15146) Distribute Collection API command execution

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283160#comment-17283160
 ] 

David Smiley commented on SOLR-15146:
-

Amazing work Ilan!!!  I'm especially looking forward to easier debuggability 
when things go wrong by not having to chase stack traces/context through the 
Overseer.  That will also make things like adding more distributed tracing in 
Solr easier -- no need to inject and then extract spans for the ZK queue.  That 
doesn't happen today, but it's something I (or a colleague) may add in the 
coming weeks.

> Distribute Collection API command execution
> ---
>
> Key: SOLR-15146
> URL: https://issues.apache.org/jira/browse/SOLR-15146
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: master (9.0)
>Reporter: Ilan Ginzburg
>Assignee: Ilan Ginzburg
>Priority: Major
>  Labels: collection-api, overseer
>
> Building on the distributed cluster state update changes (SOLR-14928), this 
> ticket will distribute the Collection API so that commands can execute on any 
> node (i.e. the node handling the request through {{CollectionsHandler}}) 
> without having to go through a Zookeeper queue and the Overseer.
> This is the second step (the first was SOLR-14928), after which the Overseer 
> could be removed (the code keeps the existing execution options, so completing 
> this work does not by itself mean the Overseer is gone, but it could be removed 
> in a future release).
> There is a dependency on the distributed cluster state changes because the 
> Overseer locking protecting same collection (or same shard) Collection API 
> commands from executing concurrently will be replaced by optimistic locking 
> of the collection {{state.json}} znodes (or other znodes that will eventually 
> replace/augment {{state.json}}).
> The goal of this ticket is threefold:
> * Simplify the code (running synchronously and not going through the 
> Zookeeper queues and the Overseer dequeue logic is much simpler),
> * Lead to improved performance for most/all use cases (although this is a 
> secondary goal, as long as performance is not degraded) and
> * Allow a future change (in another future Jira) to the way cluster state is 
> cached on the nodes of the cluster (keep less information, be less dependent 
> on Zookeeper watches, do not care about collections not present on the node). 
> This future work will aim to significantly increase the scale (amount of 
> collections) supported by SolrCloud.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15150) add request level option to fail an atomic update if it can't be done 'in-place'

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283139#comment-17283139
 ] 

David Smiley commented on SOLR-15150:
-

+1 LGTM and great testing as usual

RE "require.inplace.atomic.updates". honestly I cringe seeing flags named like 
an English sentence.  I prefer the dots scope the module and then use camelCase 
for the option, e.g. "update.partial.requireInPlace".  I'm not a fan of Solr's 
overuse of this word "atomic" when really it's the "partial"-ness that is more 
perceivable by the user as what's happening.  I view the "atomic"-ness as an 
implementation detail to the "partial" aspect.  It could be argued the "atomic" 
aspect is more visible when users choose to specify a version constraint... but 
few users even do that, and even then I'd rather say something like 
"conditional update".

> add request level option to fail an atomic update if it can't be done 
> 'in-place'
> 
>
> Key: SOLR-15150
> URL: https://issues.apache.org/jira/browse/SOLR-15150
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15150.patch
>
>
> When "In-Place" DocValue updates were added to Solr, the choice was made to 
> re-use the existing "Atomic Update" syntax, and use the DocValue updating 
> code if possible based on the index & schema, otherwise fall back to the 
> existing Atomic Update logic (to re-index the entire document). In essence, 
> "In-Place Atomic Updates" are treated as a (possible) optimization to 
> "regular" Atomic Updates
> This works fine, but it leaves open the possibility of a "gotcha" situation 
> where users may (reasonably) assume that an update can be done "In-Place" but 
> some aspect of the schema prevents it, and the performance of the updates 
> doesn't meet expectations (notably in the case of things like deeply nested 
> documents, where the re-indexing cost is multiplicative based on the total 
> size of the document tree)
> I think it would be a good idea to support an optional request param users 
> can specify with the semantics that say "If this update is an Atomic Update, 
> fail to execute it unless it can be done In-Place"
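
To make the discussion above concrete, here is a small SolrJ sketch of how such a 
request-level switch might look from the client side. This is purely hypothetical: 
neither the parameter name nor the behavior is final, and 
"update.partial.requireInPlace" is used only as a placeholder taken from the naming 
discussion above.

{code}
// Hypothetical sketch only: the request parameter name below is a placeholder,
// not a decided Solr API. It sends a partial ("atomic") update and asks Solr to
// fail rather than silently fall back to re-indexing the whole document.
import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class RequireInPlaceExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc1");
      // Partial update syntax: only change the "popularity" docValues field.
      doc.addField("popularity", Collections.singletonMap("set", 42));

      UpdateRequest req = new UpdateRequest();
      req.add(doc);
      // Placeholder request parameter (name under discussion in this thread).
      req.setParam("update.partial.requireInPlace", "true");
      req.process(client, "mycollection");
    }
  }
}
{code}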



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr-operator] HoustonPutman closed issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error

2021-02-11 Thread GitBox


HoustonPutman closed issue #215:
URL: https://github.com/apache/lucene-solr-operator/issues/215


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr-operator] HoustonPutman commented on issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error

2021-02-11 Thread GitBox


HoustonPutman commented on issue #215:
URL: 
https://github.com/apache/lucene-solr-operator/issues/215#issuecomment-777622617


   Use `kubectl replace` instead; it's what we use in the Makefile now. For 
CRDs, `replace` is better than `apply`, even without the annotation.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15152) Export Tool should export nested docs cleanly in .json, .jsonl, and javabin

2021-02-11 Thread David Eric Pugh (Jira)
David Eric Pugh created SOLR-15152:
--

 Summary: Export Tool should export nested docs cleanly in .json, 
.jsonl, and javabin
 Key: SOLR-15152
 URL: https://issues.apache.org/jira/browse/SOLR-15152
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCLI
Affects Versions: 8.8
Reporter: David Eric Pugh
Assignee: David Eric Pugh


ExportTool doesn't properly handle anonymous child docs or nested docs.   It 
also confuses the JSONL format with the JSON format.  

I'd like to have the JSON Lines format output as .jsonl, which is the standard, 
and have the JSON format be a .json whose content is the same as what you would 
post to Solr as JSON to upload the data. This will let us round-trip the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15132) Add window parameter to the nodes Streaming Expression

2021-02-11 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283120#comment-17283120
 ] 

Joel Bernstein commented on SOLR-15132:
---

First patch, tests to follow.

> Add window parameter to the nodes Streaming Expression
> --
>
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-15132.patch
>
>
> The *nodes* Streaming Expression performs a breadth first graph traversal. 
> This ticket will add a *window* parameter to allow the nodes expression to 
> traverse the graph within a window of time. 
> To take advantage of this feature you must index the content with a String 
> field which is an ISO timestamp truncated at ten seconds. Then the *window* 
> parameter can be applied to walk the graph within a *window prior* to a 
> specific ten second window and perform aggregations. 
> *The main use case for this feature is auto-detecting lagged correlations.* 
> This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: 
> What types of log events occur most frequently in the 30 second window prior 
> to 10 second windows with the most slow queries:
> {code}
> nodes(logs,
>   facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", 
> rows="25"),
>   walk="time_ten_seconds->time_ten_seconds",
>   window="3",
>   gather="type_s",
>   count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp 
> truncations so that increments of one second, one minute, one day etc... can 
> also be traversed using the same query syntax. There will be a follow-on 
> ticket for that after this ticket is completed. This will create a more 
> general purpose time graph.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15132) Add window parameter to the nodes Streaming Expression

2021-02-11 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15132:
--
Attachment: SOLR-15132.patch

> Add window parameter to the nodes Streaming Expression
> --
>
> Key: SOLR-15132
> URL: https://issues.apache.org/jira/browse/SOLR-15132
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-15132.patch
>
>
> The *nodes* Streaming Expression performs a breadth first graph traversal. 
> This ticket will add a *window* parameter to allow the nodes expression to 
> traverse the graph within a window of time. 
> To take advantage of this feature you must index the content with a String 
> field which is an ISO timestamp truncated at ten seconds. Then the *window* 
> parameter can be applied to walk the graph within a *window prior* to a 
> specific ten second window and perform aggregations. 
> *The main use case for this feature is auto-detecting lagged correlations.* 
> This is useful in many different fields.
> Here is an example using Solr logs to answer the following question: 
> What types of log events occur most frequently in the 30 second window prior 
> to 10 second windows with the most slow queries:
> {code}
> nodes(logs,
>   facet(logs, q="qtime_s:[5000 TO *]", buckets="time_ten_seconds", 
> rows="25"),
>   walk="time_ten_seconds->time_ten_seconds",
>   window="3",
>   gather="type_s",
>   count(*))
> {code}
> This ticket is phase 1. Phase 2 will auto-detect different ISO Timestamp 
> truncations so that increments of one second, one minute, one day etc... can 
> also be traversed using the same query syntax. There will be a follow-on 
> ticket for that after this ticket is completed. This will create a more 
> general purpose time graph.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283116#comment-17283116
 ] 

Robert Muir commented on LUCENE-9754:
-

We just use Unicode properties for that stuff, which is standard. Changing any 
of that here won't address the root problem: the chunking is bad for long 
documents and can cause other issues that look nothing like this, too. So let's 
fix the chunking here to be more sane for long documents. 

As far as allowing such tweaks, maybe there are improvements we can do, but I'd 
prefer to look at that on separate issues. That's not related to chunking but 
rather to splitting on scripts.

> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> --
>
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
>Reporter: Trey Jones
>Priority: Major
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by 
> the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 
> 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system 
> before the space is the same as the writing system after the number, then you 
> get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does 
> (trying to split on spaces to create <4k chunks) can create an artificial 
> boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the 
> unexpected split of the second token (_14th_). Because chunking changes can 
> ripple through a long document, editing text or the effects of a character 
> filter can cause changes in tokenization thousands of lines later in a 
> document.
> My guess is that some "previous character set" flag is not reset at the 
> space, and numbers are not in a character set, so _t_ is compared to _ァ_ and 
> they are not the same—causing a token split at the character set change—but 
> I'm not sure.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2021-02-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283110#comment-17283110
 ] 

ASF subversion and git services commented on LUCENE-9322:
-

Commit 683a9bd78abcf486a668881bc3294847ce5d5d1a in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=683a9bd ]

LUCENE-9322:  Add Vectors format to CodecReader accounting methods (#2353)



> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW (LUCENE-9004) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase merged pull request #2353: LUCENE-9322: Add Vectors format to CodecReader accounting methods

2021-02-11 Thread GitBox


iverase merged pull request #2353:
URL: https://github.com/apache/lucene-solr/pull/2353


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283105#comment-17283105
 ] 

Michael Sokolov commented on LUCENE-9754:
-

Would it make sense to have the ability to treat digits as Latin script? I 
think we ended up doing that in order to be able to apply (maybe 
anglo-euro-centric?) number constructs that nevertheless do appear in 
multilingual texts, like units (1", 1ft, 1m., etc.), ranges (1-10), and 
ordinals (1st, 2nd, etc.).

> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> --
>
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
>Reporter: Trey Jones
>Priority: Major
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by 
> the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 
> 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system 
> before the space is the same as the writing system after the number, then you 
> get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does 
> (trying to split on spaces to create <4k chunks) can create an artificial 
> boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the 
> unexpected split of the second token (_14th_). Because chunking changes can 
> ripple through a long document, editing text or the effects of a character 
> filter can cause changes in tokenization thousands of lines later in a 
> document.
> My guess is that some "previous character set" flag is not reset at the 
> space, and numbers are not in a character set, so _t_ is compared to _ァ_ and 
> they are not the same—causing a token split at the character set change—but 
> I'm not sure.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15114) WAND does not work correctly on multiple segments in Solr 8.6.3

2021-02-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283104#comment-17283104
 ] 

David Smiley commented on SOLR-15114:
-

I should second that sentiment as well; that PDF is amazing to anyone who wants 
a deeper look at BlockMax WAND!  Kudos to Naoto!

> WAND does not work correctly on multiple segments in Solr 8.6.3
> ---
>
> Key: SOLR-15114
> URL: https://issues.apache.org/jira/browse/SOLR-15114
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Naoto Minami
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: master (9.0), 8.8.1
>
> Attachments: wand.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Solr 8.6.3, minCompetitiveScore of WANDScorer resets to zero for each 
> index segment and remains zero until maxScore is updated.
> There are two causes of this problem:
>  - MaxScoreCollector does not set minCompetitiveScore of 
> MinCompetitiveScoreAwareScorable newly generated for another index segment.
>  - MaxScoreCollector updates minCompetitiveScore only if maxScore is updated. 
> This behavior is correct considering the purpose of MaxScoreCollector.
> For details, see the attached pdf.
> *Note*
> This problem occurs in distributed search (SolrCloud) or when the fl=score 
> parameter is specified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI

2021-02-11 Thread Thomas Mortagne (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283103#comment-17283103
 ] 

Thomas Mortagne commented on SOLR-14561:


I created SOLR-15151. I cannot work on a pull request right now, but I can 
definitely look at it in two weeks, just need to remember :)

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users 
> can specify for at least {{instanceDir and dataDir}} params, perhaps restrict 
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15151) It's not possible anymore to indicate a relative path to the core data folder

2021-02-11 Thread Thomas Mortagne (Jira)
Thomas Mortagne created SOLR-15151:
--

 Summary: It's not possible anymore to indicate a relative path to 
the core data folder
 Key: SOLR-15151
 URL: https://issues.apache.org/jira/browse/SOLR-15151
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.6
Reporter: Thomas Mortagne


See 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.0/solr/core/src/java/org/apache/solr/core/SolrPaths.java#L124.

SOLR-14561 introduced a check that forbids using a path starting with "../" for 
the data of a core. This makes it impossible to point a core at a relative data 
folder that is not stored inside the core folder itself. Of course you can set an 
absolute path (provided it is part of the allowed paths), but then it becomes 
impossible to move the entire storage somewhere else, because doing so breaks the 
stored path.

IMO the check for the "../" prefix should be removed entirely, and relative paths 
should instead be resolved and then checked against the allowed paths. At least 
for my use case having to set allowed paths is fine, but some people might want 
to disable the allowed-paths system completely.
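
As a rough illustration of the resolve-then-check idea (plain java.nio, not the 
actual SolrPaths code; class and method names here are made up for the example):

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

public class AllowedPathCheck {
  /** Resolve a possibly relative dataDir against the core dir, then test the allowed roots. */
  public static boolean isAllowed(Path coreInstanceDir, String dataDir, Set<Path> allowedRoots) {
    Path resolved = coreInstanceDir.resolve(dataDir).normalize().toAbsolutePath();
    return allowedRoots.stream()
        .map(root -> root.normalize().toAbsolutePath())
        .anyMatch(resolved::startsWith);
  }

  public static void main(String[] args) {
    Set<Path> allowed = Set.of(Paths.get("/var/solr/data"));
    Path core = Paths.get("/var/solr/data/cores/core1");
    // a relative path containing ".." that still lands under an allowed root is accepted
    System.out.println(isAllowed(core, "../../shared/core1-data", allowed)); // true
    // a relative path escaping the allowed roots is rejected
    System.out.println(isAllowed(core, "../../../../etc", allowed)); // false
  }
}

Something along these lines would keep the protection SOLR-14561 was after while 
still allowing "../" in a relative data dir.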



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283102#comment-17283102
 ] 

Robert Muir commented on LUCENE-9754:
-

Just to be clear, I'm thinking of something like "subclass 
SegmentingTokenizerBase" as the fix here. The crazy chunking code here predates 
SegmentingTokenizerBase and was never retrofitted to use it afterwards. It 
could also open up options for users to better control/customize the chunking.
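
Very roughly, such a subclass could look like the sketch below (8.x package names; 
the per-script word breaking that ICUTokenizer does today is omitted, and it leans 
on the base class's protected buffer/offset fields the way existing subclasses 
like ThaiTokenizer do, so treat the details as assumptions, not a finished design):

import java.text.BreakIterator;
import java.util.Locale;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.SegmentingTokenizerBase;

public final class SentenceChunkedTokenizerSketch extends SegmentingTokenizerBase {
  private final BreakIterator wordBreaker = BreakIterator.getWordInstance(Locale.ROOT);
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int sentenceStart;

  public SentenceChunkedTokenizerSketch() {
    // the base class buffers input and hands it over one sentence at a time,
    // so a long document is never cut at an arbitrary chunk boundary
    super(BreakIterator.getSentenceInstance(Locale.ROOT));
  }

  @Override
  protected void setNextSentence(int sentenceStart, int sentenceEnd) {
    this.sentenceStart = sentenceStart;
    wordBreaker.setText(new String(buffer, sentenceStart, sentenceEnd - sentenceStart));
  }

  @Override
  protected boolean incrementWord() {
    int start = wordBreaker.current();
    int end = wordBreaker.next();
    // skip segments that contain no letters or digits (whitespace, punctuation)
    while (end != BreakIterator.DONE
        && !Character.isLetterOrDigit(Character.codePointAt(buffer, sentenceStart + start))) {
      start = end;
      end = wordBreaker.next();
    }
    if (end == BreakIterator.DONE) {
      return false;
    }
    clearAttributes();
    termAtt.copyBuffer(buffer, sentenceStart + start, end - start);
    offsetAtt.setOffset(correctOffset(offset + sentenceStart + start),
                        correctOffset(offset + sentenceStart + end));
    return true;
  }
}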

> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> --
>
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
>Reporter: Trey Jones
>Priority: Major
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by 
> the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 
> 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system 
> before the space is the same as the writing system after the number, then you 
> get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does 
> (trying to split on spaces to create <4k chunks) can create an artificial 
> boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the 
> unexpected split of the second token (_14th_). Because chunking changes can 
> ripple through a long document, editing text or the effects of a character 
> filter can cause changes in tokenization thousands of lines later in a 
> document.
> My guess is that some "previous character set" flag is not reset at the 
> space, and numbers are not in a character set, so _t_ is compared to _ァ_ and 
> they are not the same—causing a token split at the character set change—but 
> I'm not sure.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently

2021-02-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283075#comment-17283075
 ] 

Robert Muir commented on LUCENE-9754:
-

Yes, the issue is because of the chunking. Put aside long documents for a 
second: given a short document such as {{ァ 14th}}, it will always first be 
split as {{ァ 14|th}}.

That's because this tokenizer first divides on scripts, and lets you use a 
different strategy per script. These numbers have the script code "Common", and 
things like accent marks have the script code "Inherited"; these 
"Common/Inherited" codes are "sticky". So under normal conditions it does not 
break until it hits the 't' ("Latin"). Maybe that is seen as undesirable in this 
example, but it is just the tradeoff the tokenizer makes (splitting on 
scripts). You can find more discussion of that in the Notes section of 
https://unicode.org/reports/tr29/#Word_Boundary_Rules

But if you feed it a super long document, we can't read the whole document into 
RAM at once, so we have to limit it to 4k chunks, and the chunking may split on 
that space before the script analysis runs: {{ァ|14th}}. This leads to the 
inconsistency that you see.

For super long documents, the current behavior of this tokenizer will be 
annoying. The chunking/4k limit was written more as a failsafe than anything 
else: I don't think a little tweak here or there to this tokenizer will help.

One idea: change the tokenizer to first chunk "sentence-at-a-time", based on 
sentence boundaries. It might add a little overhead, but then long documents 
would work consistently. The behavior of this chunking would be easier for users 
to understand: the word segmenter only sees "one sentence" of context at a time.
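
For anyone who wants to poke at the short-input behavior directly, here is a 
minimal sketch (lucene-analysis-icu, 8.x package names; the class name is just 
for the demo). Per the issue description this input yields three tokens, while 
{{x 14th}} yields two:

import java.io.StringReader;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class IcuTokenizeDemo {
  public static void main(String[] args) throws Exception {
    try (ICUTokenizer tok = new ICUTokenizer()) {
      tok.setReader(new StringReader("ァ 14th"));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      tok.reset();
      while (tok.incrementToken()) {
        System.out.println(term); // expected per the report: ァ, then 14, then th
      }
      tok.end();
    }
  }
}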

> ICU Tokenizer: letter-space-number-letter tokenized inconsistently
> --
>
> Key: LUCENE-9754
> URL: https://issues.apache.org/jira/browse/LUCENE-9754
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.5
> Environment: Tested most recently on Elasticsearch 6.5.4.
>Reporter: Trey Jones
>Priority: Major
>
> The tokenization of strings like _14th_ with the ICU tokenizer is affected by 
> the character that comes before the preceding whitespace.
> For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 
> 14 | th.
> In general, in a letter-space-number-letter sequence, if the writing system 
> before the space is the same as the writing system after the number, then you 
> get two tokens. If the writing systems differ, you get three tokens.
> If the conditions are just right, the chunking that the ICU tokenizer does 
> (trying to split on spaces to create <4k chunks) can create an artificial 
> boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the 
> unexpected split of the second token (_14th_). Because chunking changes can 
> ripple through a long document, editing text or the effects of a character 
> filter can cause changes in tokenization thousands of lines later in a 
> document.
> My guess is that some "previous character set" flag is not reset at the 
> space, and numbers are not in a character set, so _t_ is compared to _ァ_ and 
> they are not the same—causing a token split at the character set change—but 
> I'm not sure.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI

2021-02-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283071#comment-17283071
 ] 

Jan Høydahl commented on SOLR-14561:


Ah, I see - you'd like to allow path traversal explicitly. Please file a new 
Jira suggesting that, and include a PR if you can. I suppose we could make it 
so that {{allowPaths=*}} is checked early and means "allow anything", as 
before? We could also add a few explicit keywords such as {{allowPaths=..}} and 
{{allowPaths=_UNC_}}, which would allow parent traversal and UNC paths 
respectively?
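
(Purely to illustrate that idea, and assuming the system property keeps its 
current {{solr.allowPaths}} name, a node opting in to both might then be started 
with something like -Dsolr.allowPaths=/mnt/data,..,_UNC_ - the exact keyword 
syntax would be settled in the new Jira.)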

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users 
> can specify for at least the {{instanceDir}} and {{dataDir}} params, perhaps 
> restricting them to be relative to SOLR_HOME or SOLR_DATA_HOME.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr-operator] zhuopeng opened a new issue #215: kubectl apply -f crds.yaml fails with metadata.annotations: Too long error

2021-02-11 Thread GitBox


zhuopeng opened a new issue #215:
URL: https://github.com/apache/lucene-solr-operator/issues/215


   To reproduce:
   `kubectl apply -f helm/solr-operator/crds/crds.yaml`
   
   The command fails with:
   
`customresourcedefinition.apiextensions.k8s.io/solrbackups.solr.bloomberg.com 
configured
   Warning: kubectl apply should be used on resource created by either kubectl 
create --save-config or kubectl apply
   
customresourcedefinition.apiextensions.k8s.io/solrcollectionaliases.solr.bloomberg.com
 configured
   
customresourcedefinition.apiextensions.k8s.io/solrcollections.solr.bloomberg.com
 configured
   
customresourcedefinition.apiextensions.k8s.io/solrprometheusexporters.solr.bloomberg.com
 configured
   The CustomResourceDefinition "solrclouds.solr.bloomberg.com" is invalid: 
metadata.annotations: Too long: must have at most 262144 bytes`
   
   It happens for any version >0.2.6
   
   It is likely related to 
https://github.com/kubernetes/kubectl/issues/712. When kubectl apply is used to 
install the CRDs, it stores the previous CRD configuration in the 
"last-applied-configuration" annotation, and that annotation exceeds the size 
limit.
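   
   A possible workaround (untested here, but in line with the linked kubectl 
issue) is to avoid the client-side annotation entirely: install the CRDs with 
`kubectl create -f helm/solr-operator/crds/crds.yaml` (or `kubectl replace -f` on 
upgrade), or use server-side apply via `kubectl apply --server-side -f 
helm/solr-operator/crds/crds.yaml`, which does not write 
"last-applied-configuration".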
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2351: LUCENE-9763: Hunspell: fix FORBIDDENWORD support

2021-02-11 Thread GitBox


donnerpeter commented on a change in pull request #2351:
URL: https://github.com/apache/lucene-solr/pull/2351#discussion_r574552151



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -94,10 +93,6 @@ public Stemmer(Dictionary dictionary) {
   word = scratchBuffer;
 }
 
-if (dictionary.isForbiddenWord(word, length)) {

Review comment:
   From my correspondence with Hunspell's author, it seems to be a feature 
that stemming gives some results even for misspelled words.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14920) Format code automatically and enforce it in Solr

2021-02-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283049#comment-17283049
 ] 

David Eric Pugh commented on SOLR-14920:


I've been relearning the code base, and with fresh eyes it's shocking to see 
how many flavours of code style Solr has... I'm often faced with "Which way 
should I write this method?" and then see multiple approaches ;)

So I look forward to when this happens!

> Format code automatically and enforce it in Solr
> 
>
> Key: SOLR-14920
> URL: https://issues.apache.org/jira/browse/SOLR-14920
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Priority: Major
>  Labels: codestyle, formatting
>
> See the discussion at: LUCENE-9564.
> This is a placeholder for the present; I'm reluctant to do this to the Solr 
> code base until after:
>  * we have some Solr-specific consensus
>  * we have some clue what this means for the reference impl.
> Reconciling the reference impl will be difficult enough without a zillion 
> format changes to add to the confusion.
> So my proposal is:
> 1> Do this.
> 2> Postpone it until after the reference impl is merged.
> 3> Do it in one single commit, for reasons like being able to conveniently 
> keep it separated out from git blame.
> Assigning this to myself so it doesn't get lost, but anyone who wants to take 
> it over, please feel free.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9763) Hunspell: fix FORBIDDENWORD support

2021-02-11 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9763.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Hunspell: fix FORBIDDENWORD support
> ---
>
> Key: LUCENE-9763
> URL: https://issues.apache.org/jira/browse/LUCENE-9763
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Peter Gromov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #2318: SOLR-15138: PerReplicaStates does not scale to large collections as well as state.json

2021-02-11 Thread GitBox


murblanc commented on a change in pull request #2318:
URL: https://github.com/apache/lucene-solr/pull/2318#discussion_r574498232



##
File path: 
solr/core/src/java/org/apache/solr/cloud/RefreshCollectionMessage.java
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cloud;
+
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.ZkStateReader;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.data.Stat;
+
+/**Refresh the Cluster State for a given collection
+ *
+ */
+public class RefreshCollectionMessage implements Overseer.Message {
+public final Operation operation;
+public final String collection;
+
+public RefreshCollectionMessage(String collection) {
+this.operation = Operation.REFRESH_COLL;
+this.collection = collection;
+}
+
+ClusterState run(ClusterState clusterState, Overseer overseer) throws 
InterruptedException, KeeperException {

Review comment:
   Can you please use a name other than `run()`? That name is usually associated 
with a `Runnable`, which is not the case here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


