[jira] [Comment Edited] (LUCENE-9616) Improve test coverage for internal format versions
[ https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237767#comment-17237767 ] Julie Tibshirani edited comment on LUCENE-9616 at 11/24/20, 12:45 AM: -- This seems like a nice way to reframe the issue: if internal versions are meant for bug fixes, maybe the real problem is that we even have internal versions with a lot of logic that needs testing? This would avoid the need for a special testing approach. I noticed PointsFormat uses internal versions extensively, in particular {{BKDWriter}} has ~5 internal versions. Moving away from internal versions seems like it’d cause much more code to be duplicated. Maybe that’s okay, or maybe we’d choose to maintain some shared write logic. was (Author: jtibshirani): This seems like a nice way to reframe the issue: if internal versions are meant for bug fixes, maybe the real problem is that we even have internal versions with a lot of logic that needs testing? This would avoid the need for a special testing approach. I noticed PointsFormat uses internal versions extensively, in particular `BKDWriter` has ~5 internal versions. Moving away from internal versions seems like it’d cause much more code to be duplicated. Maybe that’s okay, or maybe we’d choose to maintain some shared write logic. > Improve test coverage for internal format versions > -- > > Key: LUCENE-9616 > URL: https://issues.apache.org/jira/browse/LUCENE-9616 > Project: Lucene - Core > Issue Type: Test >Reporter: Julie Tibshirani >Priority: Minor > > Some formats use an internal versioning system -- for example > {{CompressingStoredFieldsFormat}} maintains older logic for reading an > on-heap fields index. Because we always allow reading segments from the > current + previous major version, some users still rely on the read-side > logic of older internal versions. > Although the older version logic is covered by > {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit > tests. 
Older versions aren't "in rotation" when choosing a random codec for > tests. They also don't have dedicated unit tests as we have for separate > older formats, for example {{TestLucene60PointsFormat}}. > It could be good to improve unit test coverage for the older versions, since > they're in active use. A downside is that it's not straightforward to add > unit tests, since we tend to just change/delete the old write-side logic as > we bump internal versions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
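To make the trade-off concrete, the internal-version pattern under discussion can be sketched as follows; all names here are invented for illustration and are not the actual {{CompressingStoredFieldsFormat}} or {{BKDWriter}} code:

```java
// Hypothetical sketch of internal-version dispatch, loosely modeled on how a
// format might branch on a header version; every name here is invented.
public class VersionedReaderDemo {
  static final int VERSION_ON_HEAP_INDEX = 0;  // older internal version
  static final int VERSION_OFF_HEAP_INDEX = 1; // current internal version

  // Dispatch on the version read from the file header: older segments take
  // the legacy read path, newer ones the current path.
  static String openIndex(int versionFromHeader) {
    if (versionFromHeader < VERSION_OFF_HEAP_INDEX) {
      return "on-heap index (legacy read path)";
    }
    return "off-heap index (current read path)";
  }
}
```

The testing gap described above follows from this shape: once the write path that produced VERSION_ON_HEAP_INDEX files is deleted, unit tests can no longer generate inputs for the legacy branch, and only {{TestBackwardsCompatibility}}'s pre-built indexes exercise it.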
[GitHub] [lucene-solr] TomMD commented on pull request #2092: SOLR-15009 Propogate IOException from DF.exists
TomMD commented on pull request #2092: URL: https://github.com/apache/lucene-solr/pull/2092#issuecomment-732502367 @madrob I see a backend failure which lacks sufficient explanation. We have opened a ticket to investigate further. In the meantime I have re-started analysis of this PR. I can babysit this - if the job does not finish within the next hour then we'll have more to investigate. Thank you for asking about this, it's certainly not a situation that should come up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237734#comment-17237734 ] Mark Robert Miller edited comment on SOLR-14788 at 11/23/20, 10:38 PM: --- It’s been on my mind, so before I forget and before I reveal the final state of phase 1, I’d like to rectify one of my own complaints. [~caomanhdat] played a silent but crucial role in this work. He took some of the http2 work on the starburst branch and ran with it and got it committed. That helped me a lot in this work. He did a fantastic initial Gradle implementation, and Gradle was critical to speeding up my dev iteration times for this work. It’s a fair bit slower for me these days, but I didn’t finish the work and so I won’t complain about trade-offs I don’t know about. He did the initial search side async impl that I’ve stolen - I had taken an early, not-yet-working stab at it, his work was better and more complete. Others may have been involved, but his work on leader initiated recovery is a fantastic and crucial part of this whole system. And likely I am forgetting something, but more importantly, as I found his work built on mine or built my work on his or saw him start issues that were also in my backlog, he gave me the feeling that I am not alone in my thinking of what can and should be done with SolrCloud. Thank you Dat, whatever comes out of this issue, you were a keystone to it. was (Author: markrmiller): It’s been on my mind, so before I forget and before I reveal the final state of phase 1, I’d like rectify one of own complaints. [~caomanhdat] played a silent but but crucial role in this work. He took some of http2 work on the starburst branch and ran with it and got it committer. That helped me a lot in this work. He did a fantastic initial Gradle implementation, and Gradle was critical to speeding up my dev iteration times for this work. 
It’s a fair bit slower for me these days, but I didn’t finish the work and so I won’t complain about trade offs I don’t know about. He did the initial search side async impl that ive stolen - I had taken a early not yet working stab at it, his work was better and more complete. Others may have been involved, but his work on leader initiated recovery is a fantastic and crucial part to this whole system. And likely I am forgetting something, but more importantly, as I found his work built on mine or built my work on his or say him start issues that were also in my backlog, he gave me the feeling that I am not alone in my thinking of can and should be done with SolrCloud. Thank you Dat, whatever comes out of this issue, you were a keystone to it. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. 
When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues. --
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237734#comment-17237734 ] Mark Robert Miller commented on SOLR-14788: --- It’s been on my mind, so before I forget and before I reveal the final state of phase 1, I’d like to rectify one of my own complaints. [~caomanhdat] played a silent but crucial role in this work. He took some of the http2 work on the starburst branch and ran with it and got it committed. That helped me a lot in this work. He did a fantastic initial Gradle implementation, and Gradle was critical to speeding up my dev iteration times for this work. It’s a fair bit slower for me these days, but I didn’t finish the work and so I won’t complain about trade-offs I don’t know about. He did the initial search side async impl that I’ve stolen - I had taken an early, not-yet-working stab at it, his work was better and more complete. Others may have been involved, but his work on leader initiated recovery is a fantastic and crucial part of this whole system. And likely I am forgetting something, but more importantly, as I found his work built on mine or built my work on his or saw him start issues that were also in my backlog, he gave me the feeling that I am not alone in my thinking of what can and should be done with SolrCloud. Thank you Dat, whatever comes out of this issue, you were a keystone to it.
[GitHub] [lucene-solr] dxl360 edited a comment on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
dxl360 edited a comment on pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-732411622 Had an offline discussion with @mikemccand. Maybe we can change the type of `invertState.length` from `int` to `long` and keep the current check on field length/termFreq accumulation but safely cast the length back to `int` when calculating the norms. The `long` counters `totalTermFreq`/`sumTotalTermFreq` are not expected to be broken by `invertState.length`.
[GitHub] [lucene-solr] dxl360 commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
dxl360 commented on pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-732411622 Had an offline discussion with @mikemccand. Maybe we can change the type of `invertState.length` from `int` to `long` and keep the current check on field length/termFreq accumulation but safely cast the length back to `int` when calculating the norms. `totalTermFreq` and `sumTotalTermFreq` are both `long` and are not expected to be broken by `invertState.length`.
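The proposal above can be sketched as follows; this is a simplified, hypothetical illustration, not the actual FieldInvertState or norms code:

```java
// Sketch of the int -> long change discussed in this PR: accumulate field
// length as a long so termFreq accumulation cannot overflow, then narrow
// back to int only where the norms computation needs it. Names are invented.
public class LengthAccumulationDemo {
  // Accumulated as long: adding int term frequencies can no longer overflow.
  static long accumulate(long currentLength, int termFreq) {
    return currentLength + termFreq;
  }

  // "Safely cast" back to int for the norms computation; clamping is one
  // possible policy (Math.toIntExact, which throws instead, is another).
  static int lengthForNorms(long length) {
    return length > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) length;
  }
}
```

Whether to clamp or to fail fast on overflow is a design choice the comment leaves open; the sketch clamps so a pathological field degrades gracefully.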
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2095: LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS returned
mikemccand commented on a change in pull request #2095: URL: https://github.com/apache/lucene-solr/pull/2095#discussion_r528930069 ## File path: lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java ## @@ -82,6 +82,11 @@ public int width() { /** * Advance the iterator to the next interval * + * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is returned by other methods + * if that's the case in some existing code, please consider opening an issue Review comment: This is a new sentence -- maybe add `.` at end of previous one and capitalize `If`? ## File path: lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java ## @@ -82,6 +82,11 @@ public int width() { /** * Advance the iterator to the next interval * + * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is returned by other methods Review comment: Hmm maybe be more specific than `other methods`? E.g. maybe say `returned by the query scorer's nextDoc() method`? ## File path: lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java ## @@ -82,6 +82,11 @@ public int width() { /** * Advance the iterator to the next interval * + * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is returned by other methods + * if that's the case in some existing code, please consider opening an issue + * However, after {@link IntervalIterator#NO_MORE_INTERVALS} is returned by this method, it might be + * called again Review comment: Period at end of this sentence?
[GitHub] [lucene-solr] alessandrobenedetti opened a new pull request #2096: SOLR-15015: added support to parametric Interleaving algorithm
alessandrobenedetti opened a new pull request #2096: URL: https://github.com/apache/lucene-solr/pull/2096 # Description This pull request adds a parameter 'interleavingAlgorithm' in Learning To Rank to specify the Interleaving algorithm to use (only TeamDraft is supported). # Solution Added the parameter and defaults # Tests Added a new test class for the LTR QueryParser, related to the interleaving parameters # Checklist Please review the following and check all that apply: - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [X] I have created a Jira issue and added the issue ID to my pull request title. - [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [X] I have developed this patch against the `master` branch. - [X] I have run `./gradlew check`. - [X] I have added tests for my changes. - [X] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] zhaih opened a new pull request #2095: LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS returned
zhaih opened a new pull request #2095: URL: https://github.com/apache/lucene-solr/pull/2095 # Description * In `ConjunctionIntervalIterator` check whether approximation's returned docId is NO_MORE_DOCS to avoid `nextInterval()` call after NO_MORE_DOCS is returned * Add a test case to verify the problem is addressed # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `./gradlew check`. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
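The guard described in the first bullet can be sketched like this; the names are invented for illustration and this is not the actual ConjunctionIntervalIterator code:

```java
// Hypothetical sketch of the NO_MORE_DOCS guard this PR adds: never call
// nextInterval() once the approximation has returned NO_MORE_DOCS.
public class IntervalGuardDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  interface Approximation { int nextDoc(); }
  interface Intervals { int nextInterval(); }

  // Advance to the next doc; only ask for an interval on a real document.
  static int advance(Approximation approx, Intervals intervals) {
    int doc = approx.nextDoc();
    if (doc == NO_MORE_DOCS) {
      return NO_MORE_DOCS; // exhausted: skip the nextInterval() call entirely
    }
    return intervals.nextInterval();
  }
}
```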
[jira] [Comment Edited] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
[ https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237590#comment-17237590 ] David Smiley edited comment on SOLR-15012 at 11/23/20, 6:31 PM: Moreover... I wonder if all "request logging" (no matter node/admin level vs specific to a SolrCore) ought to be logged by one call site using one "Request" SLF4J logger and with the MarkerFilter of the handler. Something to consider. was (Author: dsmiley): Moreover... I wonder if all "request logging" (no matter node/admin level vs specific to a SolrCore) ought to be logged by one call sight using one "Request" SLF4J logger and with the MarkerFilter of the handler. > Add a logging filter marker for /admin/ping requests to be silenced via > log4j2.xml > -- > > Key: SOLR-15012 > URL: https://issues.apache.org/jira/browse/SOLR-15012 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > While looking at logs, I have observed a lot of noise from /admin/ping > requests which is often called to ping core and all replicas coming from > org.apache.solr.core.SolrCore.Request. I think it makes sense to add a > marker to SolrCore.
[jira] [Commented] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
[ https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237590#comment-17237590 ] David Smiley commented on SOLR-15012: - Moreover... I wonder if all "request logging" (no matter node/admin level vs specific to a SolrCore) ought to be logged by one call site using one "Request" SLF4J logger and with the MarkerFilter of the handler.
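For reference, a log4j2.xml MarkerFilter of the kind this issue proposes might look like the fragment below; the marker name PING is a placeholder assumption here, not the name any eventual patch would use:

```xml
<!-- Hypothetical log4j2.xml fragment: drop log events carrying the PING
     marker on the request logger, keep everything else. -->
<Logger name="org.apache.solr.core.SolrCore.Request" level="info">
  <MarkerFilter marker="PING" onMatch="DENY" onMismatch="NEUTRAL"/>
</Logger>
```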
[jira] [Resolved] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden resolved SOLR-14973. - Resolution: Fixed Marking as fixed in 8.7. Thanks [~schuch] for finding the fix. Thanks [~shuremov] for confirming fixed in 8.7. Thanks [~tallison] for jumping in :D > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Assignee: Tim Allison >Priority: Major > Labels: tika-parsers > Fix For: 8.7 > > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}
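A classpath mismatch like this surfaces only at call time as NoSuchMethodError. One generic way to probe for it up front is reflection; the sketch below uses a JDK class purely for illustration (a real check would instead target org.apache.poi.hssf.record.common.UnicodeString#getExtendedRst):

```java
// Generic sketch: probe for a method with reflection to detect the kind of
// jar-version mismatch behind a NoSuchMethodError before it is thrown.
public class MethodProbeDemo {
  // True if the class is present and declares the zero-arg method, i.e. the
  // shipped jar version is new enough for its caller.
  static boolean hasMethod(String className, String methodName) {
    try {
      Class.forName(className).getMethod(methodName);
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }
}
```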
[jira] [Updated] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated SOLR-14973: Fix Version/s: 8.7
[jira] [Assigned] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden reassigned SOLR-14973: --- Assignee: Tim Allison
[jira] [Commented] (SOLR-15010) Missing jstack warning is alarming, when using bin/solr as client interface to solr
[ https://issues.apache.org/jira/browse/SOLR-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237571#comment-17237571 ] Christine Poerschke commented on SOLR-15010: {quote}... Thoughts on maybe only conducting this check if you are running {{bin/solr start}} or one of the other commands that is actually starting Solr as a process? {quote} Sounds good to me. > Missing jstack warning is alarming, when using bin/solr as client interface > to solr > --- > > Key: SOLR-15010 > URL: https://issues.apache.org/jira/browse/SOLR-15010 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.7 >Reporter: David Eric Pugh >Priority: Minor > > In SOLR-14442 we added a warning if jstack wasn't found. I notice that I > use the bin/solr command a lot as a client, so bin solr zk or bin solr > healthcheck. > For example: > {{docker exec solr1 solr zk cp /security.json zk:security.json -z zoo1:2181}} > All of these emit the message: > The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location > where java was found but jstack was not found. Continuing. > This is somewhat alarming, and then becomes annoying. Thoughts on maybe > only conducting this check if you are running {{bin/solr start}} or one of > the other commands that is actually starting Solr as a process?
[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet
[ https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237533#comment-17237533 ] Radu Gheorghe commented on SOLR-15008: -- Thanks for pointing to the optimization, it's nice to know! But I don't think this was the problem. We initially added a query that did match a lot of docs. But it didn't seem to warm up anything. I'm sure there was a (syntax?) error somewhere, but we didn't see anything in the logs or something like that. Plus, copy-pasting the query parameters worked as expected. We eventually added legacy facets and we stopped there, but I thought I should forward this info, just in case this is a different (known?) issue. I could look into it separately. > Avoid building OrdinalMap for each facet > > > Key: SOLR-15008 > URL: https://issues.apache.org/jira/browse/SOLR-15008 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Affects Versions: 8.7 >Reporter: Radu Gheorghe >Priority: Major > Labels: performance > Attachments: Screenshot 2020-11-19 at 12.01.55.png, writes_commits.png > > > I'm running against the following scenario: > * [JSON] faceting on a high cardinality field > * few matching documents => few unique values > Yet the query almost always takes a long time. Here's an example taking > almost 4s for ~300 documents and unique values (edited a bit): > > {code:java} > "QTime":3869, > "params":{ > "json":"{\"query\": \"*:*\", > \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", > \"unique_id:49866\"] > \"facet\": > {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}", > "rows":"0"}}, > > "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[] > }, > "facets":{ > "count":333, > "keywords":{ > "buckets":[{ > "val":"value1", > "count":124}, > ... 
> {code} > I did some [profiling with our Sematext > Monitoring|https://sematext.com/docs/monitoring/on-demand-profiling/] and it > points me to OrdinalMap building (see attached screenshot). If I read the > code right, an OrdinalMap is built with every facet. And it's expensive since > there are many unique values in the shard (previously, there were more, smaller > shards, making latency better, but this approach doesn't scale for this > particular use-case). > If I'm right up to this point, I see a couple of potential improvements, > [inspired by > Elasticsearch|#search-aggregations-bucket-terms-aggregation-execution-hint]: > # *Keep the OrdinalMap cached until the next softCommit*, so that only the > first query takes the penalty > # *Allow faceting on actual values (a Map) rather than ordinals*, for > situations like the one above where we have few matching documents. We could > potentially auto-detect this scenario (e.g. by configuring a threshold) and > use a Map when there are few documents > I'm curious about what you're thinking: > * would a PR/patch be welcome for any of the two ideas above? > * do you see better options? am I missing something?
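A warming query of the kind discussed in this thread can be registered as a firstSearcher listener in solrconfig.xml. A sketch with placeholder field names (the field and facet shown are assumptions mirroring the example query above, and the warming request must actually match documents to populate the OrdinalMap):

```xml
<!-- Hypothetical solrconfig.xml fragment: warm the json.facet path on
     searcher startup. Field names here are placeholders. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="json.facet">{"keywords":{"type":"terms","field":"keywords","limit":20}}</str>
    </lst>
  </arr>
</listener>
```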
[GitHub] [lucene-solr] madrob commented on pull request #2092: SOLR-15009 Propogate IOException from DF.exists
madrob commented on pull request #2092: URL: https://github.com/apache/lucene-solr/pull/2092#issuecomment-732299521 @TomMD musebot seems to be stuck - any ideas? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet
[ https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237511#comment-17237511 ] Michael Gibney commented on SOLR-15008: --- I think [this optimization|https://github.com/apache/lucene-solr/blob/9bfaca0606968ed970d9d12d871f977e2655765b/solr/core/src/java/org/apache/solr/search/facet/FacetFieldProcessorByArrayDV.java#L94-L98] in {{FacetFieldProcessorByArrayDV.collectDocs()}} is the reason you couldn't do the warming with {{json.facet}} and a query that matches no docs. There's evidently no analogous optimization in "legacy facets", which is why that worked (and I'd guess this optimization won't be added to the legacy facet code anytime soon). In any event, it sounds like a good outcome, and I'm happy to have been able to help (and no worries re: the elephant :)).
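The effect described in the comment above (a zero-match {{json.facet}} warming query warming nothing) can be modeled with a small sketch. This is illustrative code only, not Solr's actual {{FacetFieldProcessorByArrayDV}} source; the names are assumptions:

```java
// Illustrative sketch only -- NOT Solr's actual code. It models the
// early-exit guard described above: when the domain matches no docs,
// collection is skipped entirely, so the expensive per-field ordinal
// structures (and the OrdinalMap) are never built, and nothing is warmed.
public class CollectDocsSketch {
    static boolean ordinalMapBuilt = false;

    static void collectDocs(int numMatchingDocs) {
        if (numMatchingDocs == 0) {
            return; // early exit: nothing to collect, nothing gets warmed
        }
        // Stand-in for building/caching the OrdinalMap on first use.
        ordinalMapBuilt = true;
    }
}
```

This is why a warming query that deliberately matches no docs works with legacy facets (which lack the guard) but not with json.facet.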
[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237501#comment-17237501 ] Gus Heck commented on SOLR-15014: - Actually I got brave and let it run longer, and it seems to stop after 30 replicas have been created, leaving me with 31 replicas of shard 1 (and still 1 of shard 2) > Runaway replica creation with autoscaling example from ref guide > > > Key: SOLR-15014 > URL: https://issues.apache.org/jira/browse/SOLR-15014 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.3 >Reporter: Gus Heck >Priority: Major > Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, > image-2020-11-23-11-37-15-124.png > > > Although the present autoscaling implementation is deprecated, I have a > client intent on using it, and in trying to create rules that ensure all > replicas on all nodes, I wound up getting into a state where one replica was > (apparently) infinitely creating new copies of itself. 
The boiled down steps > to reproduce: > Create a 4 node cluster locally for testing from a checkout of the tagged > version for 8.6.3 > (using solr/cloud-dev/cloud.sh) > {code:java} > ./cloud.sh new -r > {code} > Create a collection > {code:java} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1 > {code} > Add this trigger from the ref guide > [https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]: > {code:java} > { > "set-trigger": { > "name": "node_added_trigger", > "event": "nodeAdded", > "waitFor": "5s", > "preferredOperation": "ADDREPLICA", > "replicaType": "PULL" > } > } > {code} > Reboot the cluster, and when it comes up infinite replica creation ensues > (attaching screen shot of admin UI showing the replicated shard momentarily)
[jira] [Created] (SOLR-15015) Add support for Interleaving Algorithm parameter in Learning To Rank
Alessandro Benedetti created SOLR-15015: --- Summary: Add support for Interleaving Algorithm parameter in Learning To Rank Key: SOLR-15015 URL: https://issues.apache.org/jira/browse/SOLR-15015 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Components: contrib - LTR Reporter: Alessandro Benedetti Interleaving was contributed in SOLR-14560 and currently supports just one algorithm (Team Draft). To facilitate contributions of new algorithms, the scope of this issue is to support a new parameter: 'interleavingAlgorithm' (tentative). The default value will be Team Draft interleaving.
[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237495#comment-17237495 ] Gus Heck commented on SOLR-15014: - Discussion on Slack suggests that, given that this functionality is going away, the primary thing to do here is to remove the example from the ref guide (or, if folks have an idea of how to mitigate it with additional configuration, add that to the ref guide instead, but I haven't found such a mitigation yet).
[jira] [Updated] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-15014: Attachment: Screen Shot 2020-11-23 at 11.40.29 AM.png
[jira] [Created] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
Gus Heck created SOLR-15014: --- Summary: Runaway replica creation with autoscaling example from ref guide Key: SOLR-15014 URL: https://issues.apache.org/jira/browse/SOLR-15014 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 8.6.3 Reporter: Gus Heck Attachments: image-2020-11-23-11-37-15-124.png Although the present autoscaling implementation is deprecated, I have a client intent on using it, and in trying to create rules that ensure all replicas on all nodes, I wound up getting into a state where one replica was (apparently) infinitely creating new copies of itself. The boiled down steps to reproduce: Create a 4 node cluster locally for testing from a checkout of the tagged version for 8.6.3 (using solr/cloud-dev/cloud.sh) {code:java} ./cloud.sh new -r {code} Create a collection {code:java} http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1 {code} Add this trigger from the ref guide [https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]: {code:java} { "set-trigger": { "name": "node_added_trigger", "event": "nodeAdded", "waitFor": "5s", "preferredOperation": "ADDREPLICA", "replicaType": "PULL" } } {code} Reboot the cluster, and when it comes up infinite replica creation ensues (attaching screen shot of admin UI showing the replicated shard momentarily)
[jira] [Resolved] (SOLR-15008) Avoid building OrdinalMap for each facet
[ https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radu Gheorghe resolved SOLR-15008. -- Resolution: Won't Fix
[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet
[ https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237488#comment-17237488 ] Radu Gheorghe commented on SOLR-15008: -- > Yes, I'd be inclined to agree (pending confirmation that warming query fixes the issue). I can confirm: we enabled warming and pretty much all facets are now <100ms (the ones we tested today were quite consistently 5s+ before). CPU didn't increase significantly, nor did heap usage. An interesting thing, though: we couldn't add a json.facet to the warmup query, so we ended up adding a regular facet. By semi-accident it has mincount=0, and the query matches no docs (this part was intentional, so we don't have an expensive warmup). It seems to do the trick on all collections. I'll close this issue for now. If a "facet-by-value" would be interesting, I guess we can reopen it or open a new one. Thanks a lot Michael. And, to address the elephant in the room, I realize this should have been a mailing list thread, sorry. I was too confident in my conclusion that the caching part wasn't implemented :(
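For reference, a static warming entry of the kind described above might look like this in solrconfig.xml. This is an illustrative sketch, not the configuration actually used on that cluster: the field name, the deliberately zero-match query, and the listener placement are assumptions.

```xml
<!-- Illustrative sketch for solrconfig.xml: fire a legacy-facet warming
     query on each newSearcher so the expensive per-field structures are
     built before user queries arrive. The field name and the deliberately
     zero-match query are assumptions, not the reporter's actual config. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">id:does_not_exist</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">keywords</str>
      <str name="facet.mincount">0</str>
    </lst>
  </arr>
</listener>
```

Using a legacy (`facet=true`) warming query matters here because, per the earlier comment, json.facet short-circuits on an empty domain and would warm nothing.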
[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #1972: SOLR-14915: Prometheus-exporter does not depend on Solr-core any longer
HoustonPutman commented on a change in pull request #1972: URL: https://github.com/apache/lucene-solr/pull/1972#discussion_r528841890

## File path: solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/MetricsConfiguration.java ##

@@ -77,22 +79,22 @@ public PrometheusExporterSettings getSettings() { return searchConfiguration; }
-  public static MetricsConfiguration from(String path) throws Exception {
+  public static MetricsConfiguration from(String resource) throws Exception {
     // See solr-core XmlConfigFile
     final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
     try {
       dbf.setXIncludeAware(true);
       dbf.setNamespaceAware(true);
     } catch (UnsupportedOperationException e) {
-      log.warn("{} XML parser doesn't support XInclude option", path);
+      log.warn("{} XML parser doesn't support XInclude option", resource);
     }
     Document document;
-    File file = new File(path);
-    if (file.isFile()) {
-      document = dbf.newDocumentBuilder().parse(file);
+    Path path = Path.of(resource);
+    if (Files.exists(path)) {
+      document = dbf.newDocumentBuilder().parse(path.toAbsolutePath().toString());

Review comment: Good catch. I tested it in multiple configurations, but there's probably some edge case it doesn't work for. I'll use `path.toUri().toASCIIString()` since the `parse(file)` method just gets the URI of the file and passes that to the `parse(uri)` implementation anyway. Might as well skip the middleman method.
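The `path.toUri().toASCIIString()` suggestion above can be demonstrated in isolation. This is a standalone sketch (the class and method names are illustrative): converting a `Path` to a URI string percent-encodes characters such as spaces, producing something safe to hand to `DocumentBuilder.parse(String systemId)`, which expects a URI rather than a raw file path.

```java
import java.nio.file.Path;

public class UriDemo {
    // Turn a filesystem path into a system-id URI string. Path.toUri()
    // absolutizes the path and percent-encodes reserved characters, so
    // e.g. spaces become %20 instead of breaking URI parsing downstream.
    static String toSystemId(Path p) {
        return p.toUri().toASCIIString();
    }

    public static void main(String[] args) {
        // A file name containing a space -- as a raw path this would be
        // an invalid URI, but toUri() encodes it.
        System.out.println(toSystemId(Path.of("conf dir/metrics config.xml")));
    }
}
```

This mirrors why passing the encoded URI directly skips the "middleman" `parse(File)` overload, which internally does the same conversion.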
[jira] [Created] (SOLR-15013) Reproducing test failure TestFieldCacheSort 8x
Erick Erickson created SOLR-15013: - Summary: Reproducing test failure TestFieldCacheSort 8x Key: SOLR-15013 URL: https://issues.apache.org/jira/browse/SOLR-15013 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Erick Erickson ant test -Dtestcase=TestFieldCacheSort -Dtests.method=testEmptyStringVsNullStringSort -Dtests.seed=2E14D932C133811F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=hr-BA -Dtests.timezone=America/Scoresbysund -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] Started J0 PID(96093@localhost). [junit4] Suite: org.apache.solr.uninverting.TestFieldCacheSort [junit4] 2> 1334 INFO (SUITE-TestFieldCacheSort-seed#[2E14D932C133811F]-worker) [ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to test-framework derived value of '/Users/Erick/apache/solr/solrtest8/solr/server/solr/configsets/_default/conf' [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheSort -Dtests.method=testEmptyStringVsNullStringSort -Dtests.seed=2E14D932C133811F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=hr-BA -Dtests.timezone=America/Scoresbysund -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 0.40s | TestFieldCacheSort.testEmptyStringVsNullStringSort <<< [junit4] > Throwable #1: java.lang.AssertionError: expected:<1> but was:<0> [junit4] > at __randomizedtesting.SeedInfo.seed([2E14D932C133811F:4FF7EBE5B95287AF]:0) [junit4] > at org.apache.solr.uninverting.TestFieldCacheSort.testEmptyStringVsNullStringSort(TestFieldCacheSort.java:1610) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4] > at 
java.base/java.lang.reflect.Method.invoke(Method.java:566) [junit4] > at java.base/java.lang.Thread.run(Thread.java:834) [junit4] 2> NOTE: test params are: codec=CheapBastard, sim=Asserting(RandomSimilarity(queryNorm=false): \{t=DFR I(F)BZ(0.3)}), locale=hr-BA, timezone=America/Scoresbysund [junit4] 2> NOTE: Mac OS X 10.16 x86_64/AdoptOpenJDK 11.0.5 (64-bit)/cpus=12,threads=1,free=468375472,total=536870912
[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237423#comment-17237423 ] Bram Van Dam commented on SOLR-14413: - [~slackhappy] [~mdrob] Sounds good to me. When partialResults is set, you know it's possible that some items are missing from the result set. If that's not acceptable, you should retry with a larger timeout, or inform the user that their query is unacceptable. This is definitely a tradeoff I can live with. Thanks for the effort! I hope someone will be kind enough to merge this <3 > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, > SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, > Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 > PM.png, image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, > image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and > timeAllowed parameters were not allowed in combination ("Can not search using > both cursorMark and timeAllowed") > , from [QueryComponent.java|#L359]]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursormarks-style deep paging and attempt to protect Solr nodes > from runaway queries using timeAllowed, in the hopes that most of the time, > the 
query completes in the allotted time, and there is no conflict. > > However if the query takes too long, it may be preferable to end the query > and protect the Solr node and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied in my opinion. > > This was discussed once in the mailing list that I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested cursorMark and timeAllowed combination together, and even when > partial results are returned because the timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable.
[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
iverase commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732187741 We have the same situation in PackedInts and FST where you cannot just pass a wrapped object. There is as well the issue where the serialisation / deserialisation is endian dependent. In DocIdsWriter we serialise using:
```
out.writeShort((short) (docIds[start + i] >>> 8));
out.writeByte((byte) docIds[start + i]);
```
But deserialise using:
```
long l1 = in.readLong();
long l2 = in.readLong();
long l3 = in.readLong();
```
There is a similar situation in `CompressingStoredFieldsWriter`. I have another iteration to see if we can simplify by just wrapping the IndexOutput / IndexInput, now that I have a better understanding of the problem.
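The hazard described above can be shown with a self-contained sketch. This is illustrative code, not Lucene's DocIdsWriter: the 3-byte doc-id encoding merely mirrors the quoted writeShort/writeByte pattern, and the class name is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianMismatchDemo {
    // Write a 24-bit doc id as short+byte (big-endian), read it back the
    // same way: the value round-trips correctly.
    static int roundTripMatched(int docId) {
        ByteBuffer buf = ByteBuffer.allocate(3).order(ByteOrder.BIG_ENDIAN);
        buf.putShort((short) (docId >>> 8)); // high 16 bits
        buf.put((byte) docId);               // low 8 bits
        buf.flip();
        int hi = buf.getShort() & 0xFFFF;
        int lo = buf.get() & 0xFF;
        return (hi << 8) | lo;
    }

    // Same big-endian write, but the reader assumes little-endian shorts:
    // the value comes back mangled. This is why the write and read sides
    // must agree on byte order, or a reversing wrapper must sit in between.
    static int roundTripMismatched(int docId) {
        ByteBuffer buf = ByteBuffer.allocate(3).order(ByteOrder.BIG_ENDIAN);
        buf.putShort((short) (docId >>> 8));
        buf.put((byte) docId);
        buf.flip();
        buf.order(ByteOrder.LITTLE_ENDIAN); // reader disagrees on endianness
        int hi = buf.getShort() & 0xFFFF;
        int lo = buf.get() & 0xFF;
        return (hi << 8) | lo;
    }
}
```

For example, 0x123456 survives the matched round trip but comes back as 0x341256 when the reader flips the short's byte order.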
[GitHub] [lucene-solr] jpountz commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
jpountz commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732177038 If we only have a few cases like that, maybe we could fork the writeHeader/readHeader logic inside BKDWriter/BKDReader so that we can apply different migration rules to these calls than to `CodecUtil#readHeader/writeHeader`?
[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
iverase commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732172442 That was my first approach, but it became too hairy once I started to process headers and footers without wrapping the IndexOutput / IndexInput. One example is in the BKD tree, where we have the following line: https://github.com/apache/lucene-solr/blob/59b17366ff45d958810b1f8e4950eebd93f1b20d/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L993 That means we would need a reference to the unwrapped IndexOutput to call this line. I did not want to change method signatures or move code around on this first pass, so I went with manually reversing endianness where needed, so we could have a good understanding of the places where work is needed.
[GitHub] [lucene-solr] dweiss commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
dweiss commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732161113 I agree with Adrien - I thought (but please correct me if I'm wrong) that a single wrapper would be needed to keep the code compatible with existing indexes, and dropping this wrapper would make everything work without those numerous calls to manual byte-shuffling in the "reverser"... I'm sorry if I fail to see the bigger picture here, but looking at the diff it seems more complicated than it was before.
[jira] [Commented] (LUCENE-9047) Directory APIs should be little endian
[ https://issues.apache.org/jira/browse/LUCENE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237366#comment-17237366 ] Dawid Weiss commented on LUCENE-9047: - Hmm, I don't know. I look at this patch and I can't shake the feeling that it has somehow become more complex and weird than it was before... All those calls to EndiannessReverserUtil kill me. Even the byte-by-byte snippets have been expanded into a multitude of local variables to be assembled again (I prefer the terse version that doesn't use local variables, to be honest). > Directory APIs should be little endian > -- > > Key: LUCENE-9047 > URL: https://issues.apache.org/jira/browse/LUCENE-9047 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We started discussing this on LUCENE-9027. It's a shame that we need to keep > reversing the order of bytes all the time because our APIs are big endian > while the vast majority of architectures are little endian. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
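The two styles being compared can be sketched outside Lucene. This is an illustrative, self-contained example (not code from the patch) of reading a big-endian int from a byte array, once with the verbose local-variable style and once with the terse inline style:

```java
public class ByteOrderStyles {

    // Verbose style: pull each byte into a local variable, then assemble.
    public static int readIntVerbose(byte[] b, int off) {
        int b0 = b[off] & 0xFF;
        int b1 = b[off + 1] & 0xFF;
        int b2 = b[off + 2] & 0xFF;
        int b3 = b[off + 3] & 0xFF;
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    // Terse style: assemble inline, without locals.
    public static int readIntTerse(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }
}
```

Both compute the same value; the disagreement above is purely about which form reads better.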
[GitHub] [lucene-solr] jpountz commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
jpountz commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732153572 Wow, exciting! I'm curious why you didn't generalize usage of `EndiannessReverserIndexInput` to all index formats like you did for SegmentInfos. My expectation is that it would have helped keep the change contained to very few lines in the constructor of the various readers/writers rather than scattered in all places that read or write ints/longs?
[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?
[ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237347#comment-17237347 ] Adrien Grand commented on LUCENE-9619: -- Here is what I have in mind in terms of API. The main downside compared to today is that it makes more assumptions about how points are implemented under the hood, e.g. it suggests a tree structure, which the current API doesn't. But I like that it would give us more control over how matching is performed, as mentioned in the issue description. I still tried not to push too many requirements onto possible implementations, e.g. not enforcing an arity of 2 on inner nodes and not enforcing that the tree is balanced.
{code:java}
import java.io.IOException;
import java.util.function.IntConsumer;

import org.apache.lucene.search.DocIdSetIterator;

public abstract class PointValues {

  /* Global statistics that don't change when moving from one node to another. */

  /** Returns how many dimensions are represented in the values */
  public abstract int getNumDimensions() throws IOException;

  /** Returns how many dimensions are used for the index */
  public abstract int getNumIndexDimensions() throws IOException;

  /** Returns the number of bytes per dimension */
  public abstract int getBytesPerDimension() throws IOException;

  /** Return the total number of documents that have a value for this field, across all nodes. */
  public abstract int getTotalDocCount();

  /* Per-node statistics */

  /** Return the minimum packed value of the current node. */
  public abstract byte[] getMinPackedValue();

  /** Return the maximum packed value of the current node. */
  public abstract byte[] getMaxPackedValue();

  /** Return the total number of points under the current node. On the root node this returns the
   *  total number of points in the field on the current segment. */
  public abstract long size();

  /* API to walk the tree. */

  /** Move to the first child node and return {@code true} upon success. Returns {@code false} for
   *  leaf nodes and {@code true} otherwise. */
  public abstract boolean moveToChild();

  /** Move to the parent node and return {@code true} upon success. Returns {@code false} for the
   *  root node and {@code true} otherwise. */
  public abstract boolean moveToParent();

  /** Move to the next sibling node and return {@code true} upon success. Returns {@code false} if
   *  the current node has no more siblings. */
  public abstract boolean moveToSibling();

  /** A visitor for the content of the tree. */
  @FunctionalInterface
  public interface IntersectVisitor {

    /** Called for all documents in a leaf cell that crosses the query. The consumer
     *  should scrutinize the packedValue to decide whether to accept it. In the 1D case,
     *  values are visited in increasing order, and in the case of ties, in increasing
     *  docID order. */
    void visit(int docID, byte[] packedValue) throws IOException;

    /** Similar to {@link IntersectVisitor#visit(int, byte[])} but in this case the packedValue
     *  can have more than one docID associated to it. The provided iterator should not escape the
     *  scope of this method so that implementations of PointValues are free to reuse it. */
    default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
      int docID;
      while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        visit(docID, packedValue);
      }
    }
  }

  /** Visit all (document,value) pairs under the current node. {@link IntersectVisitor#visit} will
   *  be called {@link #size()} times. */
  public abstract void intersect(IntersectVisitor visitor);

  /** Visit all documents under the current node. {@link IntConsumer#accept} will be called
   *  {@link #size()} times. */
  public abstract void intersectAll(IntConsumer visitor);
}
{code}
Opinions?

> Move Points from a visitor API to a cursor-style API?
> - > > Key: LUCENE-9619 > URL: https://issues.apache.org/jira/browse/LUCENE-9619 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Points' visitor API work well but there are a couple things we could make > better if we moved to a cursor API, e.g. > - Term queries could return a DocIdSetIterator without having to materialize > a BitSet. > - Nearest-neighbor search could work on top of the regular API instead of > casting to BKDReader > https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296 > - We could optimize counting the number of matches of a query by adding the > number of points in a leaf without visiting documents where there are no
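To illustrate the kind of navigation the proposed moveToChild/moveToSibling/moveToParent methods imply, here is a hypothetical, self-contained cursor over an array-backed complete binary tree. It is not part of the proposal, only a sketch of how a caller could visit every leaf using nothing but the three moves:

```java
public class TreeCursor {
    private final long[] nodes; // nodes[1..length-1]; node i has children 2i and 2i+1
    private int current = 1;    // start at the root

    public TreeCursor(long[] nodes) { this.nodes = nodes; }

    /** Move to the first (left) child; returns false on a leaf. */
    public boolean moveToChild() {
        if (2 * current >= nodes.length) return false;
        current = 2 * current;
        return true;
    }

    /** Move to the next sibling; returns false if none remain. */
    public boolean moveToSibling() {
        if (current % 2 != 0 || current + 1 >= nodes.length) return false;
        current++;
        return true;
    }

    /** Move to the parent; returns false on the root. */
    public boolean moveToParent() {
        if (current == 1) return false;
        current /= 2;
        return true;
    }

    /** Per-node statistic, like size() in the proposed API. */
    public long size() { return nodes[current]; }

    /** Depth-first walk over all leaves using only the three moves above. */
    public long sumLeaves() {
        long sum = 0;
        while (moveToChild()) { }          // descend to the leftmost leaf
        while (true) {
            sum += size();                 // current node is a leaf here
            if (moveToSibling()) {
                while (moveToChild()) { }  // descend into the next subtree
            } else {
                boolean moved = false;     // climb until a right sibling exists
                while (moveToParent()) {
                    if (moveToSibling()) { moved = true; break; }
                }
                if (!moved) return sum;    // climbed past the root: done
                while (moveToChild()) { }
            }
        }
    }
}
```

For the perfect tree {root=10, inner 6 and 4, leaves 3, 3, 2, 2}, the walk visits the four leaves in left-to-right order.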
[GitHub] [lucene-solr] iverase opened a new pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
iverase opened a new pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094 The Directory API is now little endian. Note that codecs still work in big endian for backwards compatibility, therefore they reverse the bytes whenever they are writing / reading shorts, ints and longs. CodecUtil for headers and footers has been modified to be little endian. Still, the version and checksum will be written / read with reversed bytes for backwards compatibility. SegmentInfos is read / written in little endian; for previous versions, the IndexInput is wrapped for backwards compatibility.
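The wrapping idea used for SegmentInfos can be sketched in isolation. The names below are illustrative, not Lucene's actual EndiannessReverser classes: a little-endian read path, plus a "reverser" view that reinterprets the same bytes as big endian so legacy data can still be read:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReverserDemo {

    // The new API: interpret bytes as little endian.
    public static int readLittleEndianInt(byte[] data, int off) {
        return ByteBuffer.wrap(data, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // The wrapper: read with the new little-endian API, then reverse bytes
    // to recover the value a legacy big-endian writer originally stored.
    public static int readLegacyBigEndianInt(byte[] data, int off) {
        return Integer.reverseBytes(readLittleEndianInt(data, off));
    }
}
```

With such a wrapper the callers keep a single read path, and only the constructor decides whether to wrap, which is the containment jpountz asks about above.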
[jira] [Comment Edited] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
[ https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237269#comment-17237269 ] Nazerke Seidan edited comment on SOLR-15012 at 11/23/20, 10:19 AM: --- [~gus], you have added a filter marker to a logger (org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the SolrCore logger. was (Author: nazerke): [~gus], you have added a filter marker to a logger (org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the logger of SolrCore. > Add a logging filter marker for /admin/ping requests to be silenced via > log4j2.xml > -- > > Key: SOLR-15012 > URL: https://issues.apache.org/jira/browse/SOLR-15012 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > While looking at logs, I have observed a lot of noise from /admin/ping > requests, which are often issued to ping the core and all replicas, coming from > org.apache.solr.core.SolrCore.Request. I think it makes sense to add a > marker to SolrCore.
[jira] [Commented] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
[ https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237269#comment-17237269 ] Nazerke Seidan commented on SOLR-15012: --- [~gus], you have added a filter marker to a logger (org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the logger of SolrCore. > Add a logging filter marker for /admin/ping requests to be silenced via > log4j2.xml > -- > > Key: SOLR-15012 > URL: https://issues.apache.org/jira/browse/SOLR-15012 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > While looking at logs, I have observed a lot of noise from /admin/ping > requests, which are often issued to ping the core and all replicas, coming from > org.apache.solr.core.SolrCore.Request. I think it makes sense to add a > marker to SolrCore.
[jira] [Updated] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
[ https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nazerke Seidan updated SOLR-15012: -- Description: While looking at logs, I have observed a lot of noise from /admin/ping requests, which are often issued to ping the core and all replicas, coming from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a marker to SolrCore. (was: While looking at logs, I have observed a lot of noise from /admin/ping requests which is often called to ping core and all replicas coming from org.apache.solr.core.SolrCore.Request. [~gus], you have added a filter marker to a logger (org.apache.solr.servlet.HttpSolrCall). I think it makes sense to add a marker to SolrCore. ) > Add a logging filter marker for /admin/ping requests to be silenced via > log4j2.xml > -- > > Key: SOLR-15012 > URL: https://issues.apache.org/jira/browse/SOLR-15012 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Nazerke Seidan >Priority: Minor > > While looking at logs, I have observed a lot of noise from /admin/ping > requests, which are often issued to ping the core and all replicas, coming from > org.apache.solr.core.SolrCore.Request. I think it makes sense to add a > marker to SolrCore.
[jira] [Created] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml
Nazerke Seidan created SOLR-15012: - Summary: Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml Key: SOLR-15012 URL: https://issues.apache.org/jira/browse/SOLR-15012 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Nazerke Seidan While looking at logs, I have observed a lot of noise from /admin/ping requests, which are often issued to ping the core and all replicas, coming from org.apache.solr.core.SolrCore.Request. [~gus], you have added a filter marker to a logger (org.apache.solr.servlet.HttpSolrCall). I think it makes sense to add a marker to SolrCore.
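Assuming such a marker were added, silencing it from log4j2.xml could look roughly like the fragment below. The marker name ADMIN_PING is a hypothetical placeholder, since the issue only proposes that a marker be added; MarkerFilter with onMatch/onMismatch is standard Log4j 2 configuration:

```xml
<!-- Hypothetical fragment: ADMIN_PING is an assumed marker name. -->
<Logger name="org.apache.solr.core.SolrCore.Request" level="info">
  <MarkerFilter marker="ADMIN_PING" onMatch="DENY" onMismatch="NEUTRAL"/>
</Logger>
```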
[jira] [Created] (LUCENE-9620) Add Weight#count(LeafReaderContext)
Adrien Grand created LUCENE-9620: Summary: Add Weight#count(LeafReaderContext) Key: LUCENE-9620 URL: https://issues.apache.org/jira/browse/LUCENE-9620 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand We have IndexSearcher#count today, which tries to optimize counting for TermQuery and MatchAllDocsQuery, and falls back to BulkScorer + TotalHitCountCollector otherwise. I'm considering moving this to Weight instead, which would be a better place to add counting optimizations for other queries, e.g. pure disjunctions over single-valued fields or range queries on points. The default implementation could use a BulkScorer + TotalHitCountCollector like IndexSearcher#count does today.
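The idea can be sketched without Lucene's APIs. The names below are illustrative stand-ins for the proposed Weight#count: a fallback that counts by visiting every match, and an optimized path that answers from precomputed statistics:

```java
import java.util.function.IntSupplier;

public class CountSketch {

    /** Minimal stand-in for a per-segment match iterator; returns -1 when exhausted. */
    public interface MatchIterator { int nextDoc(); }

    /** Fallback path: count by visiting every match, analogous to
     *  BulkScorer + TotalHitCountCollector. */
    public static int countByIteration(MatchIterator it) {
        int count = 0;
        while (it.nextDoc() != -1) {
            count++;
        }
        return count;
    }

    /** Optimized path: a query that already knows its count, e.g. a term's
     *  docFreq on a single-valued field with no deletions. */
    public static int countFromStatistics(IntSupplier docFreq) {
        return docFreq.getAsInt();
    }
}
```

Both paths return the same number; the point of moving count to Weight is that each query type can pick the cheap path when its statistics allow it.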
[GitHub] [lucene-solr] iverase opened a new pull request #2093: LUCENE-9606: Wrap boolean queries generated by shape fields with a Constant score query
iverase opened a new pull request #2093: URL: https://github.com/apache/lucene-solr/pull/2093 When querying a shape field with a Geometry collection and a CONTAINS spatial relationship, the query is rewritten as a boolean query. We should wrap the resulting query with a ConstantScoreQuery.
[jira] [Resolved] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic
[ https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9595. -- Fix Version/s: 8.8 Assignee: Ignacio Vera Resolution: Fixed > Component2D#withinPoint logic is inconsistent with ShapeQuery logic > --- > > Key: LUCENE-9595 > URL: https://issues.apache.org/jira/browse/LUCENE-9595 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.8 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The logic of ShapeQuery for contains assumes that if a branch of the BKD tree > is inside of the shape query, then all documents in that branch are excluded > from the result. On the other hand, the Component2D#withinPoint implementation, > e.g. Polygon2D, ignores points even when the point is inside the query. > That might lead to inconsistencies in edge cases with geometry collections. > The proposal here is to keep the logic of the ShapeQuery, and therefore the > contains logic will only return true if the query shape is inside a geometry > and does not intersect any other geometry belonging to the same > document.
[jira] [Commented] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic
[ https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237222#comment-17237222 ] ASF subversion and git services commented on LUCENE-9595: - Commit 2d7d315f970ae413b27d5a11c10de1cb643b089d in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d7d315 ] LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic (#2059) > Component2D#withinPoint logic is inconsistent with ShapeQuery logic > --- > > Key: LUCENE-9595 > URL: https://issues.apache.org/jira/browse/LUCENE-9595 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The logic of ShapeQuery for contains assumes that if a branch of the BKD tree > is inside of the shape query, then all documents in that branch are excluded > from the result. On the other hand, the Component2D#withinPoint implementation, > e.g. Polygon2D, ignores points even when the point is inside the query. > That might lead to inconsistencies in edge cases with geometry collections. > The proposal here is to keep the logic of the ShapeQuery, and therefore the > contains logic will only return true if the query shape is inside a geometry > and does not intersect any other geometry belonging to the same > document.
[jira] [Commented] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic
[ https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237220#comment-17237220 ] ASF subversion and git services commented on LUCENE-9595: - Commit 44be9f903dbad601d3b46108802a951555a6d7ba in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=44be9f9 ] LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic (#2059) > Component2D#withinPoint logic is inconsistent with ShapeQuery logic > --- > > Key: LUCENE-9595 > URL: https://issues.apache.org/jira/browse/LUCENE-9595 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The logic of ShapeQuery for contains assumes that if a branch of the BKD tree > is inside of the shape query, then all documents in that branch are excluded > from the result. On the other hand, the Component2D#withinPoint implementation, > e.g. Polygon2D, ignores points even when the point is inside the query. > That might lead to inconsistencies in edge cases with geometry collections. > The proposal here is to keep the logic of the ShapeQuery, and therefore the > contains logic will only return true if the query shape is inside a geometry > and does not intersect any other geometry belonging to the same > document.
[GitHub] [lucene-solr] iverase merged pull request #2059: LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic
iverase merged pull request #2059: URL: https://github.com/apache/lucene-solr/pull/2059
[jira] [Commented] (LUCENE-9614) Implement KNN Query
[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237219#comment-17237219 ] Adrien Grand commented on LUCENE-9614: -- I wonder if the Query could be just a map from N doc IDs to scores, and the KNN search would actually be run to construct the Query, not as part of running the Query. This way we could still blend scores via BooleanQuery or FeatureField, and even things like block-max WAND would still work. > Implement KNN Query > --- > > Key: LUCENE-9614 > URL: https://issues.apache.org/jira/browse/LUCENE-9614 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael Sokolov >Priority: Major > > Now we have a vector index format and one vector indexing/KNN search > implementation, but the interface is low-level: you can search across a > single segment only. We would like to expose a Query implementation. > Initially, we want to support a usage where the KnnVectorQuery selects the > k-nearest neighbors without regard to any other constraints, and these can > then be filtered as part of an enclosing Boolean or other query. > Later we will want to explore some kind of filtering *while* performing > vector search, or a re-entrant search process that can yield further results. > Because of the nature of knn search (all documents having any vector value > match), it is more like a ranking than a filtering operation, and it doesn't > really make sense to provide an iterator interface that can be merged in the > usual way, in docid order, skipping ahead. It's not yet clear how to satisfy > a query that is "k nearest neighbors satisfying some arbitrary Query", at > least not without realizing a complete bitset for the Query. But this is for > a later issue; *this* issue is just about performing the knn search in > isolation, computing a set of (some given) K nearest neighbors, and providing > an iterator over those.
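A standalone sketch of the "query as a precomputed map" idea above, with hypothetical names (not an actual Lucene Query implementation): the KNN search runs up front, and what remains at query time is a docID-to-score lookup that can iterate matches in doc-ID order, which is what makes it mergeable with other clauses:

```java
import java.util.Map;
import java.util.TreeMap;

public class PrecomputedKnn {
    // Sorted by docID so matches can be walked in doc-ID order.
    private final TreeMap<Integer, Float> docToScore;

    public PrecomputedKnn(Map<Integer, Float> neighbors) {
        // neighbors = result of the KNN search, run while building the query
        this.docToScore = new TreeMap<>(neighbors);
    }

    public boolean matches(int docID) {
        return docToScore.containsKey(docID);
    }

    public float score(int docID) {
        return docToScore.getOrDefault(docID, 0f);
    }

    // Doc-order iteration over the k matches.
    public int[] docIdsInOrder() {
        return docToScore.keySet().stream().mapToInt(Integer::intValue).toArray();
    }
}
```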
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237211#comment-17237211 ] Samir Huremovic commented on SOLR-14973: Confirmed fixed in 8.7. > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Priority: Major > Labels: tika-parsers > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}
[jira] [Resolved] (LUCENE-9581) Clarify discardCompoundToken behavior in the JapaneseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi resolved LUCENE-9581. -- Fix Version/s: 8.8 master (9.0) Resolution: Fixed > Clarify discardCompoundToken behavior in the JapaneseTokenizer > -- > > Key: LUCENE-9581 > URL: https://issues.apache.org/jira/browse/LUCENE-9581 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi >Priority: Minor > Fix For: master (9.0), 8.8 > > Attachments: LUCENE-9581.patch, LUCENE-9581.patch, LUCENE-9581.patch > > > At first sight, the discardCompoundToken option added in LUCENE-9123 seems > redundant with the NORMAL mode of the Japanese tokenizer. When set to true, > the current behavior is to disable the decomposition for compounds, which is > exactly what the NORMAL mode does. > So I wonder if the right semantic of the option would be to keep only the > decomposition of the compound, or if it's really needed. If the goal is to > make the output compatible with a graph token filter, the current workaround > to set the mode to NORMAL should be enough. > That's consistent with the mode that should be used to preserve positions in > the index since we don't handle position length on the indexing side. > Am I missing something regarding the new option? Is there a compelling case > where it differs from the NORMAL mode?
[jira] [Updated] (LUCENE-9023) GlobalOrdinalsWithScore should not compute occurrences when the provided min is 1
[ https://issues.apache.org/jira/browse/LUCENE-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-9023: - Fix Version/s: master (9.0) > GlobalOrdinalsWithScore should not compute occurrences when the provided min > is 1 > - > > Key: LUCENE-9023 > URL: https://issues.apache.org/jira/browse/LUCENE-9023 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jim Ferenczi >Priority: Minor > Fix For: master (9.0), 8.8 > > Time Spent: 40m > Remaining Estimate: 0h > > This is a continuation of https://issues.apache.org/jira/browse/LUCENE-9022 > Today the GlobalOrdinalsWithScore collector and query check the number of > matching docs per parent if the provided min is greater than 0. However, we > should also not compute the occurrences of children when min is equal to 1, > since this is the minimum requirement for a document to match.