[jira] [Comment Edited] (LUCENE-9616) Improve test coverage for internal format versions

2020-11-23 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237767#comment-17237767
 ] 

Julie Tibshirani edited comment on LUCENE-9616 at 11/24/20, 12:45 AM:
--

This seems like a nice way to reframe the issue: if internal versions are meant 
for bug fixes, maybe the real problem is that we even have internal versions 
with a lot of logic that needs testing? This would avoid the need for a special 
testing approach.

I noticed PointsFormat uses internal versions extensively; in particular, 
{{BKDWriter}} has ~5 internal versions. Moving away from internal versions 
seems like it’d cause much more code to be duplicated. Maybe that’s okay, or 
maybe we’d choose to maintain some shared write logic.
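
To make that concrete, here is a minimal sketch (illustrative constant names, not the actual {{BKDWriter}} code) of what read-side internal versioning in a Lucene format typically looks like:

{code:java}
// The index header records which internal version wrote this segment.
int version = CodecUtil.checkIndexHeader(in, CODEC_NAME,
    VERSION_START, VERSION_CURRENT, segmentID, segmentSuffix);
if (version < VERSION_OFF_HEAP_INDEX) {
  // legacy read path, still exercised by older segments
} else {
  // current read path
}
{code}

Every such branch is read logic that real segments still rely on, but that unit tests stop exercising once the matching write path is deleted.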


was (Author: jtibshirani):
This seems like a nice way to reframe the issue: if internal versions are meant 
for bug fixes, maybe the real problem is that we even have internal versions 
with a lot of logic that needs testing? This would avoid the need for a special 
testing approach.

I noticed PointsFormat uses internal versions extensively; in particular, 
`BKDWriter` has ~5 internal versions. Moving away from internal versions seems 
like it’d cause much more code to be duplicated. Maybe that’s okay, or maybe 
we’d choose to maintain some shared write logic.

> Improve test coverage for internal format versions
> --
>
> Key: LUCENE-9616
> URL: https://issues.apache.org/jira/browse/LUCENE-9616
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Julie Tibshirani
>Priority: Minor
>
> Some formats use an internal versioning system -- for example 
> {{CompressingStoredFieldsFormat}} maintains older logic for reading an 
> on-heap fields index. Because we always allow reading segments from the 
> current + previous major version, some users still rely on the read-side 
> logic of older internal versions.
> Although the older version logic is covered by 
> {{TestBackwardsCompatibility}}, it looks like it's not exercised in unit 
> tests. Older versions aren't "in rotation" when choosing a random codec for 
> tests. They also don't have dedicated unit tests as we have for separate 
> older formats, for example {{TestLucene60PointsFormat}}.
> It could be good to improve unit test coverage for the older versions, since 
> they're in active use. A downside is that it's not straightforward to add 
> unit tests, since we tend to just change/delete the old write-side logic as 
> we bump internal versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] TomMD commented on pull request #2092: SOLR-15009 Propogate IOException from DF.exists

2020-11-23 Thread GitBox


TomMD commented on pull request #2092:
URL: https://github.com/apache/lucene-solr/pull/2092#issuecomment-732502367


   @madrob I see a backend failure which lacks sufficient explanation. We have 
opened a ticket to investigate further. In the meantime I have restarted 
analysis of this PR. I can babysit this - if the job does not finish within 
the next hour then we'll have more to investigate. Thank you for asking about 
this; it's certainly not a situation that should come up.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14788) Solr: The Next Big Thing

2020-11-23 Thread Mark Robert Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237734#comment-17237734
 ] 

Mark Robert Miller edited comment on SOLR-14788 at 11/23/20, 10:38 PM:
---

It’s been on my mind, so before I forget and before I reveal the final state of 
phase 1, I’d like to rectify one of my own complaints. [~caomanhdat] played a 
silent but crucial role in this work.

He took some of the http2 work on the starburst branch, ran with it, and got it 
committed. That helped me a lot in this work.

He did a fantastic initial Gradle implementation, and Gradle was critical to 
speeding up my dev iteration times for this work. It’s a fair bit slower for me 
these days, but I didn’t finish the work and so I won’t complain about trade 
offs I don’t know about.

He did the initial search-side async impl that I’ve stolen - I had taken an 
early, not-yet-working stab at it; his work was better and more complete.

Others may have been involved, but his work on leader-initiated recovery is a 
fantastic and crucial part of this whole system.

And likely I am forgetting something, but more importantly, as I found his work 
built on mine or built my work on his or saw him start issues that were also in 
my backlog, he gave me the feeling that I am not alone in my thinking of what 
can and should be done with SolrCloud.

Thank you Dat, whatever comes out of this issue, you were a keystone to it. 


was (Author: markrmiller):
It’s been on my mind, so before I forget and before I reveal the final state of 
phase 1, I’d like to rectify one of my own complaints. [~caomanhdat] played a 
silent but crucial role in this work.

He took some of the http2 work on the starburst branch, ran with it, and got it 
committed. That helped me a lot in this work.

He did a fantastic initial Gradle implementation, and Gradle was critical to 
speeding up my dev iteration times for this work. It’s a fair bit slower for me 
these days, but I didn’t finish the work and so I won’t complain about trade 
offs I don’t know about.

He did the initial search-side async impl that I’ve stolen - I had taken an 
early, not-yet-working stab at it; his work was better and more complete.

Others may have been involved, but his work on leader-initiated recovery is a 
fantastic and crucial part of this whole system.

And likely I am forgetting something, but more importantly, as I found his work 
built on mine or built my work on his, or saw him start issues that were also 
in my backlog, he gave me the feeling that I am not alone in my thinking of 
what can and should be done with SolrCloud.

Thank you Dat, whatever comes out of this issue, you were a keystone to it. 

> Solr: The Next Big Thing
> 
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Robert Miller
>Assignee: Mark Robert Miller
>Priority: Critical
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is on duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking, "Sounds like a bottleneck Mark." And indeed it will be to some 
> extent. Which is why once stage one is completed, I will flip The Policeman 
> to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} 
> *down for some vigilante justice, but I won't be walking the beat, all that 
> stuff about sit back and relax goes out the window.*{color}_
> {quote}
>  
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to 
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid 
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
> Noble have already contributed support and testing and have plans for 
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up 
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.



--

[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing

2020-11-23 Thread Mark Robert Miller (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237734#comment-17237734
 ] 

Mark Robert Miller commented on SOLR-14788:
---

It’s been on my mind, so before I forget and before I reveal the final state of 
phase 1, I’d like to rectify one of my own complaints. [~caomanhdat] played a 
silent but crucial role in this work.

He took some of the http2 work on the starburst branch, ran with it, and got it 
committed. That helped me a lot in this work.

He did a fantastic initial Gradle implementation, and Gradle was critical to 
speeding up my dev iteration times for this work. It’s a fair bit slower for me 
these days, but I didn’t finish the work and so I won’t complain about trade 
offs I don’t know about.

He did the initial search-side async impl that I’ve stolen - I had taken an 
early, not-yet-working stab at it; his work was better and more complete.

Others may have been involved, but his work on leader-initiated recovery is a 
fantastic and crucial part of this whole system.

And likely I am forgetting something, but more importantly, as I found his work 
built on mine or built my work on his, or saw him start issues that were also 
in my backlog, he gave me the feeling that I am not alone in my thinking of 
what can and should be done with SolrCloud.

Thank you Dat, whatever comes out of this issue, you were a keystone to it. 

> Solr: The Next Big Thing
> 
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Robert Miller
>Assignee: Mark Robert Miller
>Priority: Critical
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is on duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking, "Sounds like a bottleneck Mark." And indeed it will be to some 
> extent. Which is why once stage one is completed, I will flip The Policeman 
> to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} 
> *down for some vigilante justice, but I won't be walking the beat, all that 
> stuff about sit back and relax goes out the window.*{color}_
> {quote}
>  
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to 
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid 
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
> Noble have already contributed support and testing and have plans for 
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up 
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dxl360 edited a comment on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-23 Thread GitBox


dxl360 edited a comment on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-732411622


   Had an offline discussion with @mikemccand. Maybe we can change the type of 
`invertState.length` from `int` to `long`, keep the current check on field 
length/termFreq accumulation, and safely cast the length back to `int` when 
calculating the norms. Since `totalTermFreq` and `sumTotalTermFreq` are already 
`long`, they are not expected to be broken by `invertState.length`.
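
   A minimal sketch of the idea (hedged: names are illustrative, this is not the actual diff):

```java
// assumes invertState.getLength() has been widened to return a long,
// so repeated additions cannot overflow
long length = invertState.getLength();

// narrow back to int only where the norm is computed;
// Math.toIntExact throws on overflow instead of silently wrapping
int normLength = Math.toIntExact(length);
```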



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dxl360 commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-11-23 Thread GitBox


dxl360 commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-732411622


   Had an offline discussion with @mikemccand. Maybe we can change the type of 
`invertState.length` from `int` to `long`, keep the current check on field 
length/termFreq accumulation, and safely cast the length back to `int` when 
calculating the norms. `totalTermFreq` and `sumTotalTermFreq` are both `long` 
and are not expected to be broken by `invertState.length`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2095: LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS returned

2020-11-23 Thread GitBox


mikemccand commented on a change in pull request #2095:
URL: https://github.com/apache/lucene-solr/pull/2095#discussion_r528930069



##
File path: 
lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java
##
@@ -82,6 +82,11 @@ public int width() {
   /**
* Advance the iterator to the next interval
*
+   * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is 
returned by other methods
+   * if that's the case in some existing code, please consider opening an issue

Review comment:
   This is a new sentence -- maybe add `.` at the end of the previous one and 
capitalize `If`?

##
File path: 
lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java
##
@@ -82,6 +82,11 @@ public int width() {
   /**
* Advance the iterator to the next interval
*
+   * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is 
returned by other methods

Review comment:
   Hmm maybe be more specific than `other methods`?  E.g. maybe say 
`returned by the query scorer's nextDoc() method`?

##
File path: 
lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalIterator.java
##
@@ -82,6 +82,11 @@ public int width() {
   /**
* Advance the iterator to the next interval
*
+   * Should not be called after {@link DocIdSetIterator#NO_MORE_DOCS} is 
returned by other methods
+   * if that's the case in some existing code, please consider opening an issue
+   * However, after {@link IntervalIterator#NO_MORE_INTERVALS} is returned by 
this method, it might be
+   * called again

Review comment:
   Period at end of this sentence?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] alessandrobenedetti opened a new pull request #2096: SOLR-15015: added support to parametric Interleaving algorithm

2020-11-23 Thread GitBox


alessandrobenedetti opened a new pull request #2096:
URL: https://github.com/apache/lucene-solr/pull/2096


   # Description
   
   This pull request adds a parameter 'interleavingAlgorithm' to Learning To 
Rank, specifying the interleaving algorithm to use (only TeamDraft is supported 
so far).
   # Solution
   
   Added the parameter and defaults
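
   For illustration, the parameter would be passed to the LTR query parser roughly like this (hedged: example model names; only TeamDraft is accepted so far):

   ```
   rq={!ltr model=modelA model=modelB reRankDocs=100 interleavingAlgorithm=TeamDraft}
   ```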
   
   # Tests
   
   Added a new test class for the LTR QueryParser, covering the interleaving 
parameters
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `master` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   - [X] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zhaih opened a new pull request #2095: LUCENE-9618: Do not call IntervalIterator.nextInterval after NO_MORE_DOCS returned

2020-11-23 Thread GitBox


zhaih opened a new pull request #2095:
URL: https://github.com/apache/lucene-solr/pull/2095


   
   
   
   
   # Description
   
* In `ConjunctionIntervalIterator`, check whether the approximation's returned 
docId is NO_MORE_DOCS, to avoid a `nextInterval()` call after NO_MORE_DOCS has 
been returned (see the sketch below)
* Add a test case to verify the problem is addressed
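
   A rough sketch of the guard (hedged: simplified, not the exact patch):

```java
// inside ConjunctionIntervalIterator's advance logic:
int doc = approximation.nextDoc();
if (doc == DocIdSetIterator.NO_MORE_DOCS) {
  return doc; // exhausted: nextInterval() must never be called again
}
// only a real doc may position the sub-iterators
nextInterval();
```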
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237590#comment-17237590
 ] 

David Smiley edited comment on SOLR-15012 at 11/23/20, 6:31 PM:


Moreover... I wonder if all "request logging" (no matter node/admin level vs 
specific to a SolrCore) ought to be logged by one call site using one "Request" 
SLF4J logger and with the MarkerFilter of the handler.  Something to consider.
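
For illustration only (the marker name "PING" here is an assumption, not necessarily what a patch would use), a marker-based filter in log4j2.xml could look like:

{code:xml}
<!-- silence /admin/ping request logging, keep everything else -->
<Logger name="org.apache.solr.core.SolrCore.Request" level="info">
  <MarkerFilter marker="PING" onMatch="DENY" onMismatch="NEUTRAL"/>
</Logger>
{code}

with the call site attaching the marker via SLF4J, e.g. {{log.info(MarkerFactory.getMarker("PING"), "...")}}.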


was (Author: dsmiley):
Moreover... I wonder if all "request logging" (no matter node/admin level vs 
specific to a SolrCore) ought to be logged by one call site using one 
"Request" SLF4J logger and with the MarkerFilter of the handler.

> Add a logging filter marker for /admin/ping requests to be silenced via 
> log4j2.xml
> --
>
> Key: SOLR-15012
> URL: https://issues.apache.org/jira/browse/SOLR-15012
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> While looking at logs, I have observed a lot of noise from /admin/ping 
> requests, which are often issued to ping the core and all replicas, coming 
> from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a 
> marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237590#comment-17237590
 ] 

David Smiley commented on SOLR-15012:
-

Moreover... I wonder if all "request logging" (no matter node/admin level vs 
specific to a SolrCore) ought to be logged by one call site using one 
"Request" SLF4J logger and with the MarkerFilter of the handler.

> Add a logging filter marker for /admin/ping requests to be silenced via 
> log4j2.xml
> --
>
> Key: SOLR-15012
> URL: https://issues.apache.org/jira/browse/SOLR-15012
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> While looking at logs, I have observed a lot of noise from /admin/ping 
> requests, which are often issued to ping the core and all replicas, coming 
> from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a 
> marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14973.
-
Resolution: Fixed

Marking as fixed in 8.7. Thanks [~schuch] for finding the fix. Thanks 
[~shuremov] for confirming fixed in 8.7. Thanks [~tallison] for jumping in :D

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Assignee: Tim Allison
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6, the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047).
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown when trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14973:

Fix Version/s: 8.7

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6, the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047).
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown when trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14973:
---

Assignee: Tim Allison

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Assignee: Tim Allison
>Priority: Major
>  Labels: tika-parsers
> Fix For: 8.7
>
>
> Hi,
> since Solr 8.6, the version of {{tika-parsers}} was updated to {{1.24}}. This 
> version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} 
> (see https://issues.apache.org/jira/browse/TIKA-3047).
> Solr has version {{4.1.1}} of poi included.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown when trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15010) Missing jstack warning is alarming, when using bin/solr as client interface to solr

2020-11-23 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237571#comment-17237571
 ] 

Christine Poerschke commented on SOLR-15010:


{quote}... Thoughts on maybe only conducting this check if you are running 
{{bin/solr start}} or one of the other commands that is actually starting Solr 
as a process?
{quote}
Sounds good to me.
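
A possible shape for that guard in bin/solr (a sketch only - variable names like SCRIPT_CMD are assumptions about the script's internals):

{code:bash}
# only verify jstack for commands that actually launch a Solr process
case "$SCRIPT_CMD" in
  start|restart)
    if [ ! -x "$JAVA_HOME/bin/jstack" ]; then
      echo "jstack was not found under JAVA_HOME ($JAVA_HOME). Continuing."
    fi
    ;;
esac
{code}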

> Missing jstack warning is alarming, when using bin/solr as client interface 
> to solr
> ---
>
> Key: SOLR-15010
> URL: https://issues.apache.org/jira/browse/SOLR-15010
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.7
>Reporter: David Eric Pugh
>Priority: Minor
>
> In SOLR-14442 we added a warning if jstack wasn't found. I notice that I 
> use the bin/solr command a lot as a client, e.g. bin/solr zk or bin/solr 
> healthcheck. 
> For example:
> {{docker exec solr1 solr zk cp /security.json zk:security.json -z zoo1:2181}}
> All of these emit the message:
> The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location
> where java was found but jstack was not found. Continuing.
> This is somewhat alarming, and then becomes annoying.   Thoughts on maybe 
> only conducting this check if you are running {{bin/solr start}} or one of 
> the other commands that is actually starting Solr as a process?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet

2020-11-23 Thread Radu Gheorghe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237533#comment-17237533
 ] 

Radu Gheorghe commented on SOLR-15008:
--

Thanks for pointing to the optimization, it's nice to know! But I don't think 
this was the problem. We initially added a query that did match a lot of docs, 
but it didn't seem to warm up anything. I'm sure there was a (syntax?) error 
somewhere, but we didn't see anything in the logs. Plus, copy-pasting the query 
parameters worked as expected.

We eventually added legacy facets and we stopped there, but I thought I should 
forward this info, just in case this is a different (known?) issue. I could 
look into it separately.

> Avoid building OrdinalMap for each facet
> 
>
> Key: SOLR-15008
> URL: https://issues.apache.org/jira/browse/SOLR-15008
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Affects Versions: 8.7
>Reporter: Radu Gheorghe
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2020-11-19 at 12.01.55.png, writes_commits.png
>
>
> I'm running against the following scenario:
>  * [JSON] faceting on a high cardinality field
>  * few matching documents => few unique values
> Yet the query almost always takes a long time. Here's an example taking 
> almost 4s for ~300 documents and unique values (edited a bit):
>  
> {code:java}
> "QTime":3869,
> "params":{
>   "json":"{\"query\": \"*:*\",
>   \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", 
> \"unique_id:49866\"]
>   \"facet\": 
> {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}",
>   "rows":"0"}},
>   
> "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
> "count":333,
> "keywords":{
>   "buckets":[{
>   "val":"value1",
>   "count":124},
>   ...
> {code}
> I did some [profiling with our Sematext 
> Monitoring|https://sematext.com/docs/monitoring/on-demand-profiling/] and it 
> points me to OrdinalMap building (see attached screenshot). If I read the 
> code right, an OrdinalMap is built with every facet. And it's expensive since 
> there are many unique values in the shard (previously, there were more, 
> smaller shards, making latency better, but this approach doesn't scale for 
> this particular use-case).
> If I'm right up to this point, I see a couple of potential improvements, 
> [inspired from 
> Elasticsearch|#search-aggregations-bucket-terms-aggregation-execution-hint]:
>  # *Keep the OrdinalMap cached until the next softCommit*, so that only the 
> first query takes the penalty
>  # *Allow faceting on actual values (a Map) rather than ordinals*, for 
> situations like the one above where we have few matching documents. We could 
> potentially auto-detect this scenario (e.g. by configuring a threshold) and 
> use a Map when there are few documents
> I'm curious about what you're thinking:
>  * would a PR/patch be welcome for any of the two ideas above?
>  * do you see better options? am I missing something?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #2092: SOLR-15009 Propogate IOException from DF.exists

2020-11-23 Thread GitBox


madrob commented on pull request #2092:
URL: https://github.com/apache/lucene-solr/pull/2092#issuecomment-732299521


   @TomMD musebot seems to be stuck - any ideas?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet

2020-11-23 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237511#comment-17237511
 ] 

Michael Gibney commented on SOLR-15008:
---

I think [this 
optimization|https://github.com/apache/lucene-solr/blob/9bfaca0606968ed970d9d12d871f977e2655765b/solr/core/src/java/org/apache/solr/search/facet/FacetFieldProcessorByArrayDV.java#L94-L98]
 in {{FacetFieldProcessorByArrayDV.collectDocs()}} is the reason you couldn't 
do the warming with {{json.facet}} and a query that matches no docs. There's 
evidently not an analogous optimization in "legacy facets", which is why that 
worked (and I'd guess (?) that this optimization won't be added to the legacy 
facet code anytime soon).
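
Paraphrased sketch of that early exit (not the exact source) - when the base domain is empty, collection returns before the DocValues are touched, so the OrdinalMap is never built and a zero-match warming query warms nothing:

{code:java}
// in FacetFieldProcessorByArrayDV.collectDocs(), roughly:
if (docs.size() == 0) {
  return; // nothing matches: skip DocValues (and OrdinalMap) entirely
}
{code}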

In any event, sounds like a good outcome, and I'm happy to have been able to 
help (and no worries re: the elephant :)).

> Avoid building OrdinalMap for each facet
> 
>
> Key: SOLR-15008
> URL: https://issues.apache.org/jira/browse/SOLR-15008
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Affects Versions: 8.7
>Reporter: Radu Gheorghe
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2020-11-19 at 12.01.55.png, writes_commits.png
>
>
> I'm running against the following scenario:
>  * [JSON] faceting on a high cardinality field
>  * few matching documents => few unique values
> Yet the query almost always takes a long time. Here's an example taking 
> almost 4s for ~300 documents and unique values (edited a bit):
>  
> {code:java}
> "QTime":3869,
> "params":{
>   "json":"{\"query\": \"*:*\",
>   \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", 
> \"unique_id:49866\"]
>   \"facet\": 
> {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}",
>   "rows":"0"}},
>   
> "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
> "count":333,
> "keywords":{
>   "buckets":[{
>   "val":"value1",
>   "count":124},
>   ...
> {code}
> I did some [profiling with our Sematext 
> Monitoring|https://sematext.com/docs/monitoring/on-demand-profiling/] and it 
> points me to OrdinalMap building (see attached screenshot). If I read the 
> code right, an OrdinalMap is built with every facet. And it's expensive since 
> there are many unique values in the shard (previously, there were more, 
> smaller shards, making latency better, but this approach doesn't scale for 
> this particular use-case).
> If I'm right up to this point, I see a couple of potential improvements, 
> [inspired from 
> Elasticsearch|#search-aggregations-bucket-terms-aggregation-execution-hint]:
>  # *Keep the OrdinalMap cached until the next softCommit*, so that only the 
> first query takes the penalty
>  # *Allow faceting on actual values (a Map) rather than ordinals*, for 
> situations like the one above where we have few matching documents. We could 
> potentially auto-detect this scenario (e.g. by configuring a threshold) and 
> use a Map when there are few documents
> I'm curious about what you're thinking:
>  * would a PR/patch be welcome for any of the two ideas above?
>  * do you see better options? am I missing something?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237501#comment-17237501
 ] 

Gus Heck commented on SOLR-15014:
-

Actually I got brave and let it run longer, and it seems to stop after 30 
replicas have been created, leaving me with 31 replicas of shard 1 (and still 1 
of shard 2).

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]):
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15015) Add support for Interleaving Algorithm parameter in Learning To Rank

2020-11-23 Thread Alessandro Benedetti (Jira)
Alessandro Benedetti created SOLR-15015:
---

 Summary: Add support for Interleaving Algorithm parameter in 
Learning To Rank
 Key: SOLR-15015
 URL: https://issues.apache.org/jira/browse/SOLR-15015
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - LTR
Reporter: Alessandro Benedetti


Interleaving was contributed in SOLR-14560, and it currently supports just one 
algorithm (Team Draft).

To facilitate contributions of new algorithms, the scope of this issue is to 
support a new parameter: 'interleavingAlgorithm' (tentative).

The default value will be Team Draft interleaving.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237495#comment-17237495
 ] 

Gus Heck commented on SOLR-15014:
-

Discussion on Slack suggests that, given that this functionality is going away, 
the primary thing here will be to remove the example from the ref guide (or, if 
folks have an idea how to mitigate it with additional configuration, to add 
that to the ref guide instead - but I haven't found such a mitigation yet).

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]):
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-15014:

Attachment: Screen Shot 2020-11-23 at 11.40.29 AM.png

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]):
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)
Gus Heck created SOLR-15014:
---

 Summary: Runaway replica creation with autoscaling example from 
ref guide
 Key: SOLR-15014
 URL: https://issues.apache.org/jira/browse/SOLR-15014
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Affects Versions: 8.6.3
Reporter: Gus Heck
 Attachments: image-2020-11-23-11-37-15-124.png

Although the present autoscaling implementation is deprecated, I have a client 
intent on using it, and in trying to create rules that ensure all replicas on 
all nodes, I wound up getting into a state where one replica was (apparently) 
infinitely creating new copies of itself. The boiled down steps to reproduce:

Create a 4 node cluster locally for testing from a checkout of the tagged 
version for 8.6.3

(Using solr/cloud-dev/cloud.sh)
{code:java}
./cloud.sh  new -r   
{code}
Create a collection
{code:java}
http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
{code}
Add this trigger from the ref guide 
([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]):
{code:java}
{
  "set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA",
"replicaType": "PULL"
  }
}
{code}
Reboot the cluster, and when it comes up infinite replica creation ensues 
(attaching screen shot of admin UI showing replicated shard momentarily)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15008) Avoid building OrdinalMap for each facet

2020-11-23 Thread Radu Gheorghe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radu Gheorghe resolved SOLR-15008.
--
Resolution: Won't Fix

> Avoid building OrdinalMap for each facet
> 
>
> Key: SOLR-15008
> URL: https://issues.apache.org/jira/browse/SOLR-15008
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Affects Versions: 8.7
>Reporter: Radu Gheorghe
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2020-11-19 at 12.01.55.png, writes_commits.png
>
>
> I'm running against the following scenario:
>  * [JSON] faceting on a high cardinality field
>  * few matching documents => few unique values
> Yet the query almost always takes a long time. Here's an example taking 
> almost 4s for ~300 documents and unique values (edited a bit):
>  
> {code:java}
> "QTime":3869,
> "params":{
>   "json":"{\"query\": \"*:*\",
>   \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", 
> \"unique_id:49866\"]
>   \"facet\": 
> {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}",
>   "rows":"0"}},
>   
> "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
> "count":333,
> "keywords":{
>   "buckets":[{
>   "val":"value1",
>   "count":124},
>   ...
> {code}
> I did some [profiling with our Sematext 
> Monitoring|https://sematext.com/docs/monitoring/on-demand-profiling/] and it 
> points me to OrdinalMap building (see attached screenshot). If I read the 
> code right, an OrdinalMap is built with every facet. And it's expensive since 
> there are many unique values in the shard (previously, there were more, 
> smaller shards, making latency better, but this approach doesn't scale for 
> this particular use-case).
> If I'm right up to this point, I see a couple of potential improvements, 
> [inspired from 
> Elasticsearch|#search-aggregations-bucket-terms-aggregation-execution-hint]:
>  # *Keep the OrdinalMap cached until the next softCommit*, so that only the 
> first query takes the penalty
>  # *Allow faceting on actual values (a Map) rather than ordinals*, for 
> situations like the one above where we have few matching documents. We could 
> potentially auto-detect this scenario (e.g. by configuring a threshold) and 
> use a Map when there are few documents
> I'm curious about what you're thinking:
>  * would a PR/patch be welcome for any of the two ideas above?
>  * do you see better options? am I missing something?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15008) Avoid building OrdinalMap for each facet

2020-11-23 Thread Radu Gheorghe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237488#comment-17237488
 ] 

Radu Gheorghe commented on SOLR-15008:
--

> Yes, I'd be inclined to agree (pending confirmation that warming query fixes 
> the issue).

I can confirm: we enabled warming and pretty much all facets are now <100ms 
(the ones we tested today were quite consistently 5s+ before). CPU didn't 
increase significantly, nor did heap usage.

An interesting thing, though: we couldn't add a json.facet to the warmup query. 
We ended up adding a regular facet. By semi-accident, it has mincount=0 and the 
query matches no docs (this part was intentional, so we don't have an expensive 
warmup). It seems to do the trick on all collections.
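
A sketch of that kind of warming entry in solrconfig.xml (illustrative field and query values, not the exact config we used):

{code:xml}
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">id:no_such_doc</str> <!-- matches no docs: cheap -->
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">keywords</str>
      <str name="facet.mincount">0</str> <!-- enumerate terms even with zero hits -->
    </lst>
  </arr>
</listener>
{code}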

I'll close this issue for now. If using a "facet-by-value" would be 
interesting, I guess we can reopen it or open a new one.

Thanks a lot Michael. And, to address the elephant in the room, I realize this 
should have been a mailing list thread, sorry. I was too sure of my research 
that the caching part wasn't implemented :(

> Avoid building OrdinalMap for each facet
> 
>
> Key: SOLR-15008
> URL: https://issues.apache.org/jira/browse/SOLR-15008
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Affects Versions: 8.7
>Reporter: Radu Gheorghe
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2020-11-19 at 12.01.55.png, writes_commits.png
>
>
> I'm running against the following scenario:
>  * [JSON] faceting on a high cardinality field
>  * few matching documents => few unique values
> Yet the query almost always takes a long time. Here's an example taking 
> almost 4s for ~300 documents and unique values (edited a bit):
>  
> {code:java}
> "QTime":3869,
> "params":{
>   "json":"{\"query\": \"*:*\",
>   \"filter\": [\"type:test_type\", \"date:[1603670360 TO 1604361599]\", 
> \"unique_id:49866\"]
>   \"facet\": 
> {\"keywords\":{\"type\":\"terms\",\"field\":\"keywords\",\"limit\":20,\"mincount\":20}}}",
>   "rows":"0"}},
>   
> "response":{"numFound":333,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
> "count":333,
> "keywords":{
>   "buckets":[{
>   "val":"value1",
>   "count":124},
>   ...
> {code}
> I did some [profiling with our Sematext 
> Monitoring|https://sematext.com/docs/monitoring/on-demand-profiling/] and it 
> points me to OrdinalMap building (see attached screenshot). If I read the 
> code right, an OrdinalMap is built with every facet. And it's expensive since 
> there are many unique values in the shard (previously, there were more, 
> smaller shards, making latency better, but this approach doesn't scale for 
> this particular use-case).
> If I'm right up to this point, I see a couple of potential improvements, 
> [inspired from 
> Elasticsearch|#search-aggregations-bucket-terms-aggregation-execution-hint]:
>  # *Keep the OrdinalMap cached until the next softCommit*, so that only the 
> first query takes the penalty
>  # *Allow faceting on actual values (a Map) rather than ordinals*, for 
> situations like the one above where we have few matching documents. We could 
> potentially auto-detect this scenario (e.g. by configuring a threshold) and 
> use a Map when there are few documents
> I'm curious about what you're thinking:
>  * would a PR/patch be welcome for any of the two ideas above?
>  * do you see better options? am I missing something?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #1972: SOLR-14915: Prometheus-exporter does not depend on Solr-core any longer

2020-11-23 Thread GitBox


HoustonPutman commented on a change in pull request #1972:
URL: https://github.com/apache/lucene-solr/pull/1972#discussion_r528841890



##
File path: 
solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/exporter/MetricsConfiguration.java
##
@@ -77,22 +79,22 @@ public PrometheusExporterSettings getSettings() {
 return searchConfiguration;
   }
 
-  public static MetricsConfiguration from(String path) throws Exception {
+  public static MetricsConfiguration from(String resource) throws Exception {
 // See solr-core XmlConfigFile
 final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
 try {
   dbf.setXIncludeAware(true);
   dbf.setNamespaceAware(true);
 } catch (UnsupportedOperationException e) {
-  log.warn("{} XML parser doesn't support XInclude option", path);
+  log.warn("{} XML parser doesn't support XInclude option", resource);
 }
 
 Document document;
-File file = new File(path);
-if (file.isFile()) {
-  document = dbf.newDocumentBuilder().parse(file);
+Path path = Path.of(resource);
+if (Files.exists(path)) {
+  document = 
dbf.newDocumentBuilder().parse(path.toAbsolutePath().toString());

Review comment:
   Good catch. I tested it in multiple configurations, but there's probably 
some edge case it doesn't work for. I'll use `path.toUri().toASCIIString()` 
since the `parse(file)` method just gets the URI of the file and passes that to 
the `parse(uri)` implementation anyway. Might as well skip the middleman 
method.
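
   Sketch of the intended call (assuming `dbf` and `path` as in the diff above):

```java
// hand DocumentBuilder the file's URI directly, skipping the
// File -> URI conversion that parse(File) does internally
Document document = dbf.newDocumentBuilder()
    .parse(path.toUri().toASCIIString());
```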





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15013) Reproducing test failure TestFieldCacheSort 8x

2020-11-23 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-15013:
-

 Summary: Reproducing test failure TestFieldCacheSort 8x
 Key: SOLR-15013
 URL: https://issues.apache.org/jira/browse/SOLR-15013
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Erick Erickson


ant test -Dtestcase=TestFieldCacheSort 
-Dtests.method=testEmptyStringVsNullStringSort -Dtests.seed=2E14D932C133811F 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=hr-BA 
-Dtests.timezone=America/Scoresbysund -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1

 

 

[junit4] Started J0 PID(96093@localhost).
 [junit4] Suite: org.apache.solr.uninverting.TestFieldCacheSort
 [junit4] 2> 1334 INFO 
(SUITE-TestFieldCacheSort-seed#[2E14D932C133811F]-worker) [ ] 
o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/Users/Erick/apache/solr/solrtest8/solr/server/solr/configsets/_default/conf'
 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheSort 
-Dtests.method=testEmptyStringVsNullStringSort -Dtests.seed=2E14D932C133811F 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=hr-BA -Dtests.timezone=America/Scoresbysund -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
 [junit4] FAILURE 0.40s | TestFieldCacheSort.testEmptyStringVsNullStringSort <<<
 [junit4] > Throwable #1: java.lang.AssertionError: expected:<1> but was:<0>
 [junit4] > at 
__randomizedtesting.SeedInfo.seed([2E14D932C133811F:4FF7EBE5B95287AF]:0)
 [junit4] > at 
org.apache.solr.uninverting.TestFieldCacheSort.testEmptyStringVsNullStringSort(TestFieldCacheSort.java:1610)
 [junit4] > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 [junit4] > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 [junit4] > at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 [junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 [junit4] > at java.base/java.lang.Thread.run(Thread.java:834)
 [junit4] 2> NOTE: test params are: codec=CheapBastard, 
sim=Asserting(RandomSimilarity(queryNorm=false): \{t=DFR I(F)BZ(0.3)}), 
locale=hr-BA, timezone=America/Scoresbysund
 [junit4] 2> NOTE: Mac OS X 10.16 x86_64/AdoptOpenJDK 11.0.5 
(64-bit)/cpus=12,threads=1,free=468375472,total=536870912



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters

2020-11-23 Thread Bram Van Dam (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237423#comment-17237423
 ] 

Bram Van Dam commented on SOLR-14413:
-

[~slackhappy] [~mdrob] Sounds good to me. When partialResults is set, you know 
it's possible that some items are missing from the result set. If that's not 
acceptable, you should retry with a larger timeout, or inform the user that 
their query is unacceptable. This is definitely a tradeoff I can live with. 
Thanks for the effort!

I hope someone will be kind enough to merge this <3
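
For illustration, the client-side pattern this enables might look like the 
following (a SolrJ sketch; the collection name is hypothetical and the 
partialResults header is checked by its literal key):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.CursorMarkParams;

public class DeepPagingWithBudget {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(100);
      q.setSort(SolrQuery.SortClause.asc("id")); // cursorMark needs a stable, unique sort
      q.set(CommonParams.TIME_ALLOWED, 5000);    // per-request time budget in ms
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query("techproducts", q);
        if (Boolean.TRUE.equals(rsp.getHeader().get("partialResults"))) {
          // time budget exceeded: this page may be missing documents; retry or warn
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break; // cursor did not advance: no more results
        }
        cursor = next;
      }
    }
  }
}
{code}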

> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Priority: Minor
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen Shot 2020-10-23 at 10.09.11 
> PM.png, image-2020-08-18-16-56-41-736.png, image-2020-08-18-16-56-59-178.png, 
> image-2020-08-21-14-18-36-229.png, timeallowed_cursormarks_results.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, the cursorMark and 
> timeAllowed parameters have not been allowed in combination ("Can not search 
> using both cursorMark and timeAllowed"), from [QueryComponent.java|#L359]:
>  
> {code:java}
>  
>  if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not 
> search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + 
> CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


iverase commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732187741


   We have the same situation in PackedInts and FST where you cannot just pass 
a wrapped object.  
   There is also the issue that the serialisation / deserialisation itself is 
endian dependent. In DocIdsWriter we serialise using:
   
   ```
 out.writeShort((short) (docIds[start + i] >>> 8));
 out.writeByte((byte) docIds[start + i]);
   ```
   
   But deserialise using:
   
   ```
 long l1 = in.readLong();
 long l2 = in.readLong();
 long l3 = in.readLong();
   ```
   
   There is a similar situation in `CompressingStoredFieldsWriter`. 
   
   I will do another iteration to see if we can simplify by just wrapping the 
IndexOutput / IndexInput, now that I have a better understanding of the problem.
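   
   To see why the byte order matters here, a self-contained demo (plain java.io 
stand-ins for IndexOutput / IndexInput; the decode shifts mirror DocIdsWriter's 
24-bit decode loop):
   
   ```
   import java.io.*;
   
   public class Bpv24Demo {
     public static void main(String[] args) throws IOException {
       int[] docIds = {1, 2, 3, 4, 5, 6, 7, 8}; // 8 values * 3 bytes = 3 longs
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       // DataOutputStream is big endian, like the current on-disk format
       DataOutputStream out = new DataOutputStream(bytes);
       for (int docId : docIds) {
         out.writeShort((short) (docId >>> 8)); // high 16 bits of the 24-bit value
         out.writeByte((byte) docId);           // low 8 bits
       }
       DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
       long l1 = in.readLong(), l2 = in.readLong(), l3 = in.readLong();
       // these shift offsets only line up because both sides agree on byte order;
       // making readLong little endian without touching the write side scrambles them
       int[] decoded = {
         (int) (l1 >>> 40),                          (int) (l1 >>> 16) & 0xffffff,
         (int) (((l1 & 0xffff) << 8) | (l2 >>> 56)), (int) (l2 >>> 32) & 0xffffff,
         (int) (l2 >>> 8) & 0xffffff,                (int) (((l2 & 0xff) << 16) | (l3 >>> 48)),
         (int) (l3 >>> 24) & 0xffffff,               (int) l3 & 0xffffff,
       };
       System.out.println(java.util.Arrays.toString(decoded)); // [1, 2, 3, 4, 5, 6, 7, 8]
     }
   }
   ```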



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


jpountz commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732177038


   If we only have a few cases like that, maybe we could fork the 
writeHeader/readHeader logic inside BKDWriter/BKDReader so that we can apply 
different migration rules to these calls than to 
`CodecUtil#readHeader/writeHeader`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


iverase commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732172442


   That was my first approach, but it became too hairy once I started to process 
headers and footers without wrapping the IndexOutput / IndexInput. One example: 
in the BKD tree we have the following line:
   

https://github.com/apache/lucene-solr/blob/59b17366ff45d958810b1f8e4950eebd93f1b20d/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L993
   
   That means we would need a reference to the unwrapped IndexOutput to call 
this line. I did not want to change method signatures or move code around on 
this first pass, so I opted to manually reverse endianness where needed so we 
could get a good understanding of the places where work is needed.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


dweiss commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732161113


   I agree with Adrien - I thought (but please correct me if I'm wrong) that a 
single wrapper would be needed to keep the code compatible with existing 
indexes and dropping this wrapper would make everything work without those 
numerous calls to manual byte-shuffling in the "reverser"... I'm sorry if I 
fail to see the bigger picture here but looking at the diff it seems more 
complicated than it was before.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9047) Directory APIs should be little endian

2020-11-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237366#comment-17237366
 ] 

Dawid Weiss commented on LUCENE-9047:
-

Hmm I don't know. I look at this patch and I can't dodge the feeling that 
somehow it's become more complex and weird than it was before... All those 
calls to EndiannessReverserUtil kill me. Even the byte-by-byte snippets have 
become expanded into a multitude of local variables to be assembled again (I 
prefer the terse version that doesn't use local variables, to be honest).



> Directory APIs should be little endian
> --
>
> Key: LUCENE-9047
> URL: https://issues.apache.org/jira/browse/LUCENE-9047
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We started discussing this on LUCENE-9027. It's a shame that we need to keep 
> reversing the order of bytes all the time because our APIs are big endian 
> while the vast majority of architectures are little endian.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


jpountz commented on pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-732153572


   Wow, exciting! I'm curious why you didn't generalize usage of 
`EndiannessReverserIndexInput` to all index formats like you did for 
SegmentInfos. My expectation is that it would have helped keep the change 
contained to very few lines in the constructor of the various readers/writers 
rather than scattered in all places that read or write ints/longs?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?

2020-11-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237347#comment-17237347
 ] 

Adrien Grand commented on LUCENE-9619:
--

Here is what I have in mind in terms of API. The main downside compared to 
today is that it makes more assumptions about how points are implemented under 
the hood, e.g. it suggests a tree structure, which the current API doesn't. But 
I like that it would give us more control over how matching is performed, as 
mentioned in the issue description. I still tried not to push too many 
requirements onto possible implementations, e.g. by not enforcing an arity of 2 
on inner nodes or that the tree is balanced.

{code:java}
import java.io.IOException;
import java.util.function.IntConsumer;

import org.apache.lucene.search.DocIdSetIterator;

public abstract class PointValues {

  /* Global statistics that don't change when moving from one node to another. */

  /** Returns how many dimensions are represented in the values */
  public abstract int getNumDimensions() throws IOException;

  /** Returns how many dimensions are used for the index */
  public abstract int getNumIndexDimensions() throws IOException;

  /** Returns the number of bytes per dimension */
  public abstract int getBytesPerDimension() throws IOException;

  /** Return the total number of documents that have a value for this field,
   *  across all nodes. */
  public abstract int getTotalDocCount();

  /* Per-node statistics */

  /** Return the minimum packed value of the current node. */
  public abstract byte[] getMinPackedValue();

  /** Return the maximum packed value of the current node. */
  public abstract byte[] getMaxPackedValue();

  /** Return the total number of points under the current node. On the root node
   *  this returns the total number of points in the field on the current segment. */
  public abstract long size();

  /* API to walk the tree. */

  /** Move to the first child node and return {@code true} upon success.
   *  Returns {@code false} for leaf nodes and {@code true} otherwise. */
  public abstract boolean moveToChild();

  /** Move to the parent node and return {@code true} upon success.
   *  Returns {@code false} for the root node and {@code true} otherwise. */
  public abstract boolean moveToParent();

  /** Move to the next sibling node and return {@code true} upon success.
   *  Returns {@code false} if the current node has no more siblings. */
  public abstract boolean moveToSibling();

  /** A visitor for the content of the tree. */
  @FunctionalInterface
  public interface IntersectVisitor {

    /** Called for all documents in a leaf cell that crosses the query. The consumer
     *  should scrutinize the packedValue to decide whether to accept it. In the 1D
     *  case, values are visited in increasing order, and in the case of ties, in
     *  increasing docID order. */
    void visit(int docID, byte[] packedValue) throws IOException;

    /** Similar to {@link IntersectVisitor#visit(int, byte[])} but in this case the
     *  packedValue can have more than one docID associated with it. The provided
     *  iterator should not escape the scope of this method so that implementations
     *  of PointValues are free to reuse it. */
    default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
      int docID;
      while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        visit(docID, packedValue);
      }
    }
  }

  /** Visit all (document, value) pairs under the current node.
   *  {@link IntersectVisitor#visit} will be called {@link #size()} times. */
  public abstract void intersect(IntersectVisitor visitor);

  /** Visit all documents under the current node. {@link IntConsumer#accept}
   *  will be called {@link #size()} times. */
  public abstract void intersectAll(IntConsumer visitor);
}

{code}
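
For illustration, a query might consume this cursor API roughly as follows 
(hedged: {{relate}} and {{matches}} stand in for the query's geometric tests, 
and {{Relation}} is the existing PointValues.Relation enum):

{code:java}
// hypothetical depth-first traversal that prunes whole subtrees via per-node bounds
void walk(PointValues cursor, IntConsumer docConsumer) {
  Relation r = relate(cursor.getMinPackedValue(), cursor.getMaxPackedValue());
  if (r == Relation.CELL_OUTSIDE_QUERY) {
    return; // prune: skips cursor.size() points without visiting any of them
  }
  if (r == Relation.CELL_INSIDE_QUERY) {
    cursor.intersectAll(docConsumer); // fully contained: no per-value checks needed
  } else if (cursor.moveToChild()) {  // crossing inner node: recurse into each child
    do {
      walk(cursor, docConsumer);
    } while (cursor.moveToSibling());
    cursor.moveToParent(); // restore the cursor position for the caller
  } else {                            // crossing leaf: check every (doc, value) pair
    cursor.intersect((docID, packedValue) -> {
      if (matches(packedValue)) {
        docConsumer.accept(docID);
      }
    });
  }
}
{code}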

Opinions?

> Move Points from a visitor API to a cursor-style API?
> -
>
> Key: LUCENE-9619
> URL: https://issues.apache.org/jira/browse/LUCENE-9619
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Points' visitor API works well, but there are a couple of things we could 
> make better if we moved to a cursor API, e.g.
>  - Term queries could return a DocIdSetIterator without having to materialize 
> a BitSet.
>  - Nearest-neighbor search could work on top of the regular API instead of 
> casting to BKDReader 
> https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296
>  - We could optimize counting the number of matches of a query by adding the 
> number of points in a leaf without visiting documents where there are no 

[GitHub] [lucene-solr] iverase opened a new pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian

2020-11-23 Thread GitBox


iverase opened a new pull request #2094:
URL: https://github.com/apache/lucene-solr/pull/2094


   The Directory API is now little endian. Note that codecs still work in big 
endian for backwards compatibility, therefore they reverse the bytes whenever 
they write / read shorts, ints and longs.
   
   CodecUtil for headers and footers has been modified to be little endian. 
Still, the version and checksum are written / read with reversed bytes for 
backwards compatibility.
   
   SegmentInfos is read / written in little endian; for previous versions, the 
IndexInput is wrapped for backwards compatibility.
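   
   The idea behind that back-compat wrapper looks roughly like this (a sketch 
with a made-up class name, not the exact code in this change):
   
   ```
   import java.io.IOException;
   import org.apache.lucene.store.IndexInput;
   
   // delegates everything and reverses byte order only for multi-byte reads, so
   // files written big endian read correctly through a little-endian API
   final class ReversedEndianIndexInput extends IndexInput {
     private final IndexInput in;
   
     ReversedEndianIndexInput(IndexInput in) {
       super("ReversedEndianIndexInput(" + in + ")");
       this.in = in;
     }
   
     @Override public byte readByte() throws IOException { return in.readByte(); }
     @Override public void readBytes(byte[] b, int off, int len) throws IOException {
       in.readBytes(b, off, len);
     }
     // the only behavioral change: flip the byte order of multi-byte primitives
     @Override public short readShort() throws IOException { return Short.reverseBytes(in.readShort()); }
     @Override public int readInt() throws IOException { return Integer.reverseBytes(in.readInt()); }
     @Override public long readLong() throws IOException { return Long.reverseBytes(in.readLong()); }
   
     @Override public void close() throws IOException { in.close(); }
     @Override public long getFilePointer() { return in.getFilePointer(); }
     @Override public void seek(long pos) throws IOException { in.seek(pos); }
     @Override public long length() { return in.length(); }
     @Override public IndexInput slice(String desc, long off, long len) throws IOException {
       return new ReversedEndianIndexInput(in.slice(desc, off, len));
     }
     @Override public IndexInput clone() { return new ReversedEndianIndexInput(in.clone()); }
   }
   ```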



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread Nazerke Seidan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237269#comment-17237269
 ] 

Nazerke Seidan edited comment on SOLR-15012 at 11/23/20, 10:19 AM:
---

[~gus], you have added a filter marker to a logger 
(org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the 
SolrCore logger. 


was (Author: nazerke):
[~gus], you have added a filter marker to a logger 
(org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the 
logger of SolrCore. 

> Add a logging filter marker for /admin/ping requests to be silenced via 
> log4j2.xml
> --
>
> Key: SOLR-15012
> URL: https://issues.apache.org/jira/browse/SOLR-15012
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> While looking at logs, I have observed a lot of noise from /admin/ping 
> requests, which are often issued to ping a core and all its replicas, coming 
> from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a 
> marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread Nazerke Seidan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237269#comment-17237269
 ] 

Nazerke Seidan commented on SOLR-15012:
---

[~gus], you have added a filter marker to a logger 
(org.apache.solr.servlet.HttpSolrCall). Similarly, we could add a marker to the 
logger of SolrCore. 

> Add a logging filter marker for /admin/ping requests to be silenced via 
> log4j2.xml
> --
>
> Key: SOLR-15012
> URL: https://issues.apache.org/jira/browse/SOLR-15012
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> While looking at logs, I have observed a lot of noise from /admin/ping 
> requests, which are often issued to ping a core and all its replicas, coming 
> from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a 
> marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread Nazerke Seidan (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazerke Seidan updated SOLR-15012:
--
Description: While looking at logs, I have observed a lot of noise from 
/admin/ping requests, which are often issued to ping a core and all its 
replicas, coming from org.apache.solr.core.SolrCore.Request. I think it makes 
sense to add a marker to SolrCore.  (was: While looking at logs, I have 
observed a lot of noise from /admin/ping requests, which are often issued to 
ping a core and all its replicas, coming from 
org.apache.solr.core.SolrCore.Request. [~gus], you have added a filter marker 
to a logger (org.apache.solr.servlet.HttpSolrCall). I think it makes sense to 
add a marker to SolrCore. )

> Add a logging filter marker for /admin/ping requests to be silenced via 
> log4j2.xml
> --
>
> Key: SOLR-15012
> URL: https://issues.apache.org/jira/browse/SOLR-15012
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Nazerke Seidan
>Priority: Minor
>
> While looking at logs, I have observed a lot of noise from /admin/ping 
> requests, which are often issued to ping a core and all its replicas, coming 
> from org.apache.solr.core.SolrCore.Request. I think it makes sense to add a 
> marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15012) Add a logging filter marker for /admin/ping requests to be silenced via log4j2.xml

2020-11-23 Thread Nazerke Seidan (Jira)
Nazerke Seidan created SOLR-15012:
-

 Summary: Add a logging filter marker for /admin/ping requests to 
be silenced via log4j2.xml
 Key: SOLR-15012
 URL: https://issues.apache.org/jira/browse/SOLR-15012
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Nazerke Seidan


While looking at logs, I have observed a lot of noise from /admin/ping 
requests, which are often issued to ping a core and all its replicas, coming 
from org.apache.solr.core.SolrCore.Request. [~gus], you have added a filter 
marker to a logger (org.apache.solr.servlet.HttpSolrCall). I think it makes 
sense to add a marker to SolrCore. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9620) Add Weight#count(LeafReaderContext)

2020-11-23 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9620:


 Summary: Add Weight#count(LeafReaderContext)
 Key: LUCENE-9620
 URL: https://issues.apache.org/jira/browse/LUCENE-9620
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


We have IndexSearcher#count today, which tries to optimize counting for 
TermQuery and MatchAllDocsQuery, and falls back to BulkScorer + 
TotalHitCountCollector otherwise.

I'm considering moving this to Weight instead, which would be a better place to 
add counting optimizations for other queries, e.g. pure disjunctions over 
single-valued fields or range queries on points. The default implementation 
could use a BulkScorer+TotalHitCountCollector like IndexSearcher#count does 
today.
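
The default could look roughly like this (a sketch of the proposed method, not 
committed code; {{bulkScorer}} is the existing Weight#bulkScorer):

{code:java}
// a rough sketch of a possible Weight#count default
public int count(LeafReaderContext context) throws IOException {
  TotalHitCountCollector collector = new TotalHitCountCollector();
  BulkScorer scorer = bulkScorer(context); // existing Weight#bulkScorer
  if (scorer == null) {
    return 0; // no matching documents in this segment
  }
  scorer.score(collector.getLeafCollector(context), context.reader().getLiveDocs());
  return collector.getTotalHits();
}
{code}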



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase opened a new pull request #2093: LUCENE-9606: Wrap boolean queries generated by shape fields with a Constant score query

2020-11-23 Thread GitBox


iverase opened a new pull request #2093:
URL: https://github.com/apache/lucene-solr/pull/2093


   When querying a shape field with a Geometry collection and a CONTAINS 
spatial relationship, the query is rewritten as a boolean query. We should wrap 
the resulting query with a ConstantScoreQuery.
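   
   A sketch of the wrapping (illustrative only; the per-geometry queries stand 
in for the real rewrite inside the shape query implementation):
   
   ```
   static Query rewriteContains(List<Query> perGeometryQueries) {
     BooleanQuery.Builder builder = new BooleanQuery.Builder();
     for (Query q : perGeometryQueries) {
       builder.add(q, BooleanClause.Occur.MUST); // CONTAINS must hold for every geometry
     }
     // constant score instead of the BooleanQuery's summed clause scores
     return new ConstantScoreQuery(builder.build());
   }
   ```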



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic

2020-11-23 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-9595.
--
Fix Version/s: 8.8
 Assignee: Ignacio Vera
   Resolution: Fixed

> Component2D#withinPoint logic is inconsistent with ShapeQuery logic
> ---
>
> Key: LUCENE-9595
> URL: https://issues.apache.org/jira/browse/LUCENE-9595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The logic of ShapeQuery for contains assumes that if a branch of the BKD tree 
> is inside of the shape query, then all documents in that branch are excluded 
> from the result. On the other hand, Component2D#withinPoint implementations, 
> e.g. Polygon2D, ignore points even when the point is inside the query.
> That might lead to inconsistencies in edge cases with geometry collections. 
> The proposal here is to keep the logic of the ShapeQuery; therefore, contains 
> logic will only return true if the query shape is inside a geometry and does 
> not intersect any other geometry belonging to the same document. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic

2020-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237222#comment-17237222
 ] 

ASF subversion and git services commented on LUCENE-9595:
-

Commit 2d7d315f970ae413b27d5a11c10de1cb643b089d in lucene-solr's branch 
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d7d315 ]

LUCENE-9595: Make Component2D#withinPoint implementations consistent with 
ShapeQuery logic (#2059)



> Component2D#withinPoint logic is inconsistent with ShapeQuery logic
> ---
>
> Key: LUCENE-9595
> URL: https://issues.apache.org/jira/browse/LUCENE-9595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The logic of ShapeQuery for contains assumes that if a branch of the BKD tree 
> is inside of the shape query, then all documents in that branch are excluded 
> from the result. On the other hand, Component2D#withinPoint implementations, 
> e.g. Polygon2D, ignore points even when the point is inside the query.
> That might lead to inconsistencies in edge cases with geometry collections. 
> The proposal here is to keep the logic of the ShapeQuery; therefore, contains 
> logic will only return true if the query shape is inside a geometry and does 
> not intersect any other geometry belonging to the same document. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9595) Component2D#withinPoint logic is inconsistent with ShapeQuery logic

2020-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237220#comment-17237220
 ] 

ASF subversion and git services commented on LUCENE-9595:
-

Commit 44be9f903dbad601d3b46108802a951555a6d7ba in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=44be9f9 ]

LUCENE-9595: Make Component2D#withinPoint implementations consistent with 
ShapeQuery logic (#2059)



> Component2D#withinPoint logic is inconsistent with ShapeQuery logic
> ---
>
> Key: LUCENE-9595
> URL: https://issues.apache.org/jira/browse/LUCENE-9595
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The logic of ShapeQuery for contains assumes that if a branch of the BKD tree 
> is inside of the shape query, then all documents in that branch are excluded 
> from the result. On the other hand, Component2D#withinPoint implementations, 
> e.g. Polygon2D, ignore points even when the point is inside the query.
> That might lead to inconsistencies in edge cases with geometry collections. 
> The proposal here is to keep the logic of the ShapeQuery; therefore, contains 
> logic will only return true if the query shape is inside a geometry and does 
> not intersect any other geometry belonging to the same document. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase merged pull request #2059: LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic

2020-11-23 Thread GitBox


iverase merged pull request #2059:
URL: https://github.com/apache/lucene-solr/pull/2059


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9614) Implement KNN Query

2020-11-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237219#comment-17237219
 ] 

Adrien Grand commented on LUCENE-9614:
--

I wonder if the Query could be just a map from N doc IDs to scores, and the KNN 
search would actually be run to construct the Query, not as part of running the 
Query. This way we could still blend scores via BooleanQuery or FeatureField, 
and even things like block-max WAND would still work.
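
To make that concrete, here is a rough sketch of what such a query could look 
like (hedged: written against the 8.x Query/Weight API, and every name below is 
made up; the KNN search would run up front and its top-k hits, as ascending 
global doc IDs plus scores, would be frozen into the query):

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.Set;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

final class PrecomputedScoreQuery extends Query {
  private final int[] docs;     // global doc IDs, sorted ascending
  private final float[] scores; // parallel scores from the KNN search

  PrecomputedScoreQuery(int[] docs, float[] scores) {
    this.docs = docs;
    this.scores = scores;
  }

  private int firstAtOrAfter(int globalDoc) { // insertion point in the doc ID array
    int idx = Arrays.binarySearch(docs, globalDoc);
    return idx >= 0 ? idx : -idx - 1;
  }

  @Override
  public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) {
    return new Weight(this) {
      @Override public void extractTerms(Set<Term> terms) {}
      @Override public boolean isCacheable(LeafReaderContext ctx) { return true; }

      @Override public Explanation explain(LeafReaderContext context, int doc) {
        int idx = Arrays.binarySearch(docs, context.docBase + doc);
        return idx >= 0
            ? Explanation.match(scores[idx] * boost, "precomputed knn score")
            : Explanation.noMatch("not in the precomputed top-k");
      }

      @Override public Scorer scorer(LeafReaderContext context) {
        int docBase = context.docBase;
        int lo = firstAtOrAfter(docBase); // slice of the arrays for this segment
        int hi = firstAtOrAfter(docBase + context.reader().maxDoc());
        int[] pos = {lo - 1}; // cursor shared between scorer and iterator
        return new Scorer(this) {
          @Override public int docID() {
            if (pos[0] < lo) return -1;
            return pos[0] >= hi ? DocIdSetIterator.NO_MORE_DOCS : docs[pos[0]] - docBase;
          }
          @Override public float score() { return scores[pos[0]] * boost; }
          @Override public float getMaxScore(int upTo) { return Float.POSITIVE_INFINITY; }
          @Override public DocIdSetIterator iterator() {
            Scorer scorer = this;
            return new DocIdSetIterator() {
              @Override public int docID() { return scorer.docID(); }
              @Override public int nextDoc() { pos[0]++; return scorer.docID(); }
              @Override public int advance(int target) {
                do { pos[0]++; } while (pos[0] < hi && docs[pos[0]] - docBase < target);
                return scorer.docID();
              }
              @Override public long cost() { return hi - lo; }
            };
          }
        };
      }
    };
  }

  @Override public String toString(String field) { return "PrecomputedScoreQuery"; }
  @Override public boolean equals(Object other) {
    return sameClassAs(other) && Arrays.equals(docs, ((PrecomputedScoreQuery) other).docs);
  }
  @Override public int hashCode() { return 31 * classHash() + Arrays.hashCode(docs); }
}
{code}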

> Implement KNN Query
> ---
>
> Key: LUCENE-9614
> URL: https://issues.apache.org/jira/browse/LUCENE-9614
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search 
> implementation, but the interface is low-level: you can search across a 
> single segment only. We would like to expose a Query implementation. 
> Initially, we want to support a usage where the KnnVectorQuery selects the 
> k-nearest neighbors without regard to any other constraints, and these can 
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing 
> vector search, or a re-entrant search process that can yield further results. 
> Because of the nature of knn search (all documents having any vector value 
> match), it is more like a ranking than a filtering operation, and it doesn't 
> really make sense to provide an iterator interface that can be merged in the 
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy 
> a query that is "k nearest neighbors satisfying some arbitrary Query", at 
> least not without realizing a complete bitset for the Query. But this is for 
> a later issue; *this* issue is just about performing the knn search in 
> isolation, computing a set of (some given) K nearest neighbors, and providing 
> an iterator over those.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other

2020-11-23 Thread Samir Huremovic (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237211#comment-17237211
 ] 

Samir Huremovic commented on SOLR-14973:


Confirmed fixed in 8.7.

> Solr 8.6 is shipping libraries that are incompatible with each other
> 
>
> Key: SOLR-14973
> URL: https://issues.apache.org/jira/browse/SOLR-14973
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 8.6
>Reporter: Samir Huremovic
>Priority: Major
>  Labels: tika-parsers
>
> Hi,
> Since Solr 8.6, the version of {{tika-parsers}} has been updated to {{1.24}}. 
> This version of {{tika-parsers}} needs the {{poi}} library in version 
> {{4.1.2}} (see https://issues.apache.org/jira/browse/TIKA-3047).
> Solr ships version {{4.1.1}} of {{poi}}.
> This creates (at least) a problem for parsing {{.xls}} files. The following 
> exception gets thrown by trying to post an {{.xls}} file in the techproducts 
> example:
> {{java.lang.NoSuchMethodError: 
> org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9581) Clarify discardCompoundToken behavior in the JapaneseTokenizer

2020-11-23 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi resolved LUCENE-9581.
--
Fix Version/s: 8.8
   master (9.0)
   Resolution: Fixed

> Clarify discardCompoundToken behavior in the JapaneseTokenizer
> --
>
> Key: LUCENE-9581
> URL: https://issues.apache.org/jira/browse/LUCENE-9581
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
>Priority: Minor
> Fix For: master (9.0), 8.8
>
> Attachments: LUCENE-9581.patch, LUCENE-9581.patch, LUCENE-9581.patch
>
>
> At first sight, the discardCompoundToken option added in LUCENE-9123 seems 
> redundant with the NORMAL mode of the Japanese tokenizer. When set to true, 
> the current behavior is to disable the decomposition of compounds, which is 
> exactly what the NORMAL mode does.
> So I wonder whether the right semantics for the option would be to keep only 
> the decomposition of the compound, or whether the option is really needed. If 
> the goal is to make the output compatible with a graph token filter, the 
> current workaround of setting the mode to NORMAL should be enough.
> That's consistent with the mode that should be used to preserve positions in 
> the index, since we don't handle position length on the indexing side. 
> Am I missing something regarding the new option? Is there a compelling case 
> where it differs from the NORMAL mode?
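
For reference, the NORMAL-mode workaround reads like this (a sketch; no user 
dictionary, punctuation discarded):

{code:java}
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;

public class NormalModeExample {
  public static JapaneseTokenizer create() {
    // NORMAL mode emits compounds as single tokens (no decomposition), which is
    // the same output shape discardCompoundToken=true was aiming for
    return new JapaneseTokenizer(null, true, Mode.NORMAL);
  }
}
{code}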



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9023) GlobalOrdinalsWithScore should not compute occurrences when the provided min is 1

2020-11-23 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9023:
-
Fix Version/s: master (9.0)

> GlobalOrdinalsWithScore should not compute occurrences when the provided min 
> is 1
> -
>
> Key: LUCENE-9023
> URL: https://issues.apache.org/jira/browse/LUCENE-9023
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Fix For: master (9.0), 8.8
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a continuation of https://issues.apache.org/jira/browse/LUCENE-9022
> Today the GlobalOrdinalsWithScore collector and query check the number of 
> matching docs per parent if the provided min is greater than 0. However, we 
> should also not compute the occurrences of children when min equals 1, 
> since that is the minimum requirement for a document to match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org