[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2020-12-15 Thread Elbek Kamoliddinov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250097#comment-17250097
 ] 

Elbek Kamoliddinov commented on LUCENE-9640:


I have a naive implementation where {{TrackingQuery}} creates a sparse bitset 
per segment and sets a bit for a matching doc as query runs. I will put a PR 
later this week. I wanted to start a discussion and opinion from the community.

Thanks, 
 Elbek.

> Add TrackingQuery to track matching documents
> -
>
> Key: LUCENE-9640
> URL: https://issues.apache.org/jira/browse/LUCENE-9640
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Elbek Kamoliddinov
>Priority: Major
>  Labels: query
>
> Some users benefit having {{TrackingQuery}} functionality. This query would 
> wrap another query and should be able to provide the matched DocIds for the 
> wrapped query after search is run.  For example a user running a boolean 
> query {{A or B}} could wrap query {{A}} into {{TrackingQuery}} and run the 
> boolean query and check if documents that matched the boolean query matches 
> the query {{A}}.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9640) Add TrackingQuery to track matching documents

2020-12-15 Thread Elbek Kamoliddinov (Jira)
Elbek Kamoliddinov created LUCENE-9640:
--

 Summary: Add TrackingQuery to track matching documents
 Key: LUCENE-9640
 URL: https://issues.apache.org/jira/browse/LUCENE-9640
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Reporter: Elbek Kamoliddinov


Some users benefit having {{TrackingQuery}} functionality. This query would 
wrap another query and should be able to provide the matched DocIds for the 
wrapped query after search is run.  For example a user running a boolean query 
{{A or B}} could wrap query {{A}} into {{TrackingQuery}} and run the boolean 
query and check if documents that matched the boolean query matches the query 
{{A}}.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250026#comment-17250026
 ] 

Thomas Wöckinger edited comment on SOLR-14923 at 12/16/20, 12:29 AM:
-

That's good news, i can support you, if you want, just leave a comment.


was (Author: thomas.woeckinger):
That's good news, i can support you, if you want, just leave comment.

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the 
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive, 
> and executed in a synchronized block of the UpdateLog instance, therefore all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a 
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest, 
> so it does not make any difference if 'waitFlush', 'waitSearcher' or 
> 'softCommit'  is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the 
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250026#comment-17250026
 ] 

Thomas Wöckinger commented on SOLR-14923:
-

That's good news, i can support you, if you want, just leave comment.

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the 
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive, 
> and executed in a synchronized block of the UpdateLog instance, therefore all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a 
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest, 
> so it does not make any difference if 'waitFlush', 'waitSearcher' or 
> 'softCommit'  is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the 
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Wöckinger updated SOLR-14923:

Affects Version/s: 8.7

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the 
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive, 
> and executed in a synchronized block of the UpdateLog instance, therefore all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a 
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest, 
> so it does not make any difference if 'waitFlush', 'waitSearcher' or 
> 'softCommit'  is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the 
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource

2020-12-15 Thread GitBox


madrob commented on pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-745607522


   Thanks for adding the test! it passes for me when run alone, but not as part 
of the full class, can you verify in your environment as well?
   
   `./gradlew :solr:core:test --tests TestFunctionQuery`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource

2020-12-15 Thread GitBox


madrob commented on pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-745608863


   Also, would you mind adding an entry to `solr/CHANGES.txt` with how you 
would like proper credit/attribution? I'll make sure that it gets to the right 
section, so don't stress about that if you are unsure.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates

2020-12-15 Thread GitBox


madrob commented on pull request #2121:
URL: https://github.com/apache/lucene-solr/pull/2121#issuecomment-745598327


   @munendrasn all of your comments make sense, thanks. let's update the error 
message as you describe and then this looks good.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-12-15 Thread Ankur (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249980#comment-17249980
 ] 

Ankur commented on LUCENE-9444:
---

[~mikemccand], Sorry for the late response. Yes we can resolve this one now.

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0)
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.**
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document

2020-12-15 Thread Ankur (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur resolved LUCENE-9444.
---
Resolution: Fixed

> Need an API to easily fetch facet labels for a field in a document
> --
>
> Key: LUCENE-9444
> URL: https://issues.apache.org/jira/browse/LUCENE-9444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6
>Reporter: Ankur
>Priority: Major
>  Labels: facet
> Fix For: master (9.0)
>
> Attachments: LUCENE-9444.patch, LUCENE-9444.patch, 
> LUCENE-9444.v2.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be 
> returned for each hit.
> In order to get the facet labels for each hit we need to
>  # Create an instance of _DocValuesOrdinalsReader_ and invoke 
> _getReader(LeafReaderContext context)_ method to obtain an instance of 
> _OrdinalsSegmentReader()_
>  # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then 
> used to fetch and decode the binary payload in the document's BinaryDocValues 
> field. This provides the ordinals that refer to facet labels in the 
> taxonomy.**
>  # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be 
> returned.
>  
> Ideally there should be a simple API - *String[] getLabels(docId)* that hides 
> all the above details and gives us the string labels. This can be part of 
> *TaxonomyFacets* but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249976#comment-17249976
 ] 

ASF subversion and git services commented on SOLR-15029:


Commit bf7b438f12d65904b461e595594fc9a64cfcc899 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bf7b438 ]

SOLR-15029 Trigger leader election on index writer tragedy

SOLR-13027 Use TestInjection so that we always have a Tragic Event

When we encounter a tragic error in the index writer, we can trigger a
leader election instead of queing up a delete and re-add of the node in
question. This should result in a more graceful transition, and the
previous leader will eventually be put into recovery by a new leader.

closes #2120


> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we have (via SOLR-12412) that when a leader sees an index writing 
> error during an update it will give up leadership by deleting the replica and 
> adding a new replica. One stated benefit of this was that because we are 
> using the overseer and a known code path, that this is done asynchronous and 
> very efficiently.
> I would argue that this approach is too heavy handed.
> In the case of a corrupt index exception, it makes some sense to completely 
> delete the index dir and attempt to sync from a good peer. Even in this case, 
> however, it might be better to allow fingerprinting and other index delta 
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file 
> system (possible with shared file systems, i.e. S3, HDFS, some k8s systems) 
> and the required solution is some kind of reconnect, then this approach has 
> several shortcomings - the core delete and creations are going to fail 
> leaving dangling replicas. Further, the data is still present so there is no 
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing 
> shard terms language. I believe we would be able to set all replicas 
> currently equal to leader term T to T+1, and then trigger a new leader 
> election. The current leader would know it is ineligible, while the other 
> replicas that were current before the failed update would be eligible. This 
> improvement would entail adding an additional possible operation to terms 
> state machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249977#comment-17249977
 ] 

ASF subversion and git services commented on SOLR-13027:


Commit bf7b438f12d65904b461e595594fc9a64cfcc899 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bf7b438 ]

SOLR-15029 Trigger leader election on index writer tragedy

SOLR-13027 Use TestInjection so that we always have a Tragic Event

When we encounter a tragic error in the index writer, we can trigger a
leader election instead of queing up a delete and re-add of the node in
question. This should result in a more graceful transition, and the
previous leader will eventually be put into recovery by a new leader.

closes #2120


> Harden LeaderTragicEventTest.
> -
>
> Key: SOLR-13027
> URL: https://issues.apache.org/jira/browse/SOLR-13027
> Project: Solr
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249978#comment-17249978
 ] 

ASF subversion and git services commented on SOLR-15029:


Commit b090971259f57973941d70d13612e22985a09a8d in lucene-solr's branch 
refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b090971 ]

SOLR-15029 Trigger leader election on index writer tragedy

SOLR-13027 Use TestInjection so that we always have a Tragic Event

When we encounter a tragic error in the index writer, we can trigger a
leader election instead of queing up a delete and re-add of the node in
question. This should result in a more graceful transition, and the
previous leader will eventually be put into recovery by a new leader.

Backport removes additional logging from ShardTerms.save because we do
not have StackWalker in Java 8.


> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we have (via SOLR-12412) that when a leader sees an index writing 
> error during an update it will give up leadership by deleting the replica and 
> adding a new replica. One stated benefit of this was that because we are 
> using the overseer and a known code path, that this is done asynchronous and 
> very efficiently.
> I would argue that this approach is too heavy handed.
> In the case of a corrupt index exception, it makes some sense to completely 
> delete the index dir and attempt to sync from a good peer. Even in this case, 
> however, it might be better to allow fingerprinting and other index delta 
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file 
> system (possible with shared file systems, i.e. S3, HDFS, some k8s systems) 
> and the required solution is some kind of reconnect, then this approach has 
> several shortcomings - the core delete and creations are going to fail 
> leaving dangling replicas. Further, the data is still present so there is no 
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing 
> shard terms language. I believe we would be able to set all replicas 
> currently equal to leader term T to T+1, and then trigger a new leader 
> election. The current leader would know it is ineligible, while the other 
> replicas that were current before the failed update would be eligible. This 
> improvement would entail adding an additional possible operation to terms 
> state machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249979#comment-17249979
 ] 

ASF subversion and git services commented on SOLR-13027:


Commit b090971259f57973941d70d13612e22985a09a8d in lucene-solr's branch 
refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b090971 ]

SOLR-15029 Trigger leader election on index writer tragedy

SOLR-13027 Use TestInjection so that we always have a Tragic Event

When we encounter a tragic error in the index writer, we can trigger a
leader election instead of queing up a delete and re-add of the node in
question. This should result in a more graceful transition, and the
previous leader will eventually be put into recovery by a new leader.

Backport removes additional logging from ShardTerms.save because we do
not have StackWalker in Java 8.


> Harden LeaderTragicEventTest.
> -
>
> Key: SOLR-13027
> URL: https://issues.apache.org/jira/browse/SOLR-13027
> Project: Solr
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-15 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-15029.
--
Resolution: Fixed

> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we have (via SOLR-12412) that when a leader sees an index writing 
> error during an update it will give up leadership by deleting the replica and 
> adding a new replica. One stated benefit of this was that because we are 
> using the overseer and a known code path, that this is done asynchronous and 
> very efficiently.
> I would argue that this approach is too heavy handed.
> In the case of a corrupt index exception, it makes some sense to completely 
> delete the index dir and attempt to sync from a good peer. Even in this case, 
> however, it might be better to allow fingerprinting and other index delta 
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file 
> system (possible with shared file systems, i.e. S3, HDFS, some k8s systems) 
> and the required solution is some kind of reconnect, then this approach has 
> several shortcomings - the core delete and creations are going to fail 
> leaving dangling replicas. Further, the data is still present so there is no 
> need to do so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing 
> shard terms language. I believe we would be able to set all replicas 
> currently equal to leader term T to T+1, and then trigger a new leader 
> election. The current leader would know it is ineligible, while the other 
> replicas that were current before the failed update would be eligible. This 
> improvement would entail adding an additional possible operation to terms 
> state machine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob closed pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-15 Thread GitBox


madrob closed pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13102) Shared storage Directory implementation

2020-12-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249965#comment-17249965
 ] 

David Smiley commented on SOLR-13102:
-

I forgot about this proposal.  Still; [~ysee...@gmail.com] please take a look 
at my proposal SOLR-15051.  The issue here centers around the use of a 
SolrCloud shard "term" to keep multiple readers and one writer with leader 
hand-off happy using the same space for a shard.  My primary concern with this 
plan is how this may conceptually leak concerns between the low level Directory 
and high level SolrCloud.  Perhaps it can work in some way nicely – I dunno 
just by looking at the issue description.  Also it's unclear if the replica 
types would know/care about the use of this Directory; hopefully not.

Might you re-title this to somehow include "via shard leader term prefix" or 
some-such differentiator?  Solr *already* has a shared storage implementation 
using HdfsDirectory.

> Shared storage Directory implementation
> ---
>
> Key: SOLR-13102
> URL: https://issues.apache.org/jira/browse/SOLR-13102
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Priority: Major
>
> We need a general strategy (and probably a general base class) that can work 
> with shared storage and not corrupt indexes from multiple writers.
> One strategy that is used on local disk is to use locks.  This doesn't extend 
> well to remote / shared filesystems when the locking is not tied into the 
> object store itself since a process can lose the lock (a long GC or whatever) 
> and then immediately try to write a file and there is no way to stop it.
> An alternate strategy ditches the use of locks and simply avoids overwriting 
> files by some algorithmic mechanism.
> One of my colleagues outlined one way to do this: 
> https://www.youtube.com/watch?v=UeTFpNeJ1Fo
> That strategy uses random looking filenames and then writes a "core.metadata" 
> file that maps between the random names and the original names.  The problem 
> is then reduced to overwriting "core.metadata" when you lose the lock.  One 
> way to fix this is to version "core.metadata".  Since the new leader election 
> code was implemented, each shard as a monotonically increasing "leader term", 
> and we can use that as part of the filename.  When a reader goes to open an 
> index, it can use the latest file from the directory listing, or even use the 
> term obtained from ZK if we can't trust the directory listing to be up to 
> date.  Additionally, we don't need random filenames to avoid collisions... a 
> simple unique prefix or suffix would work fine (such as the leader term again)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249961#comment-17249961
 ] 

David Smiley commented on SOLR-15051:
-

BTW I started with a SIP but then I doubted this because BlobDirectory fits an 
existing abstraction well, without requiring changes elsewhere.  Regardless, 
IMO the point of a SIP is largely to draw attention to important things.  At 
the conclusion of an internal "hack day" I'm doing now with [~broustant] and 
[~nazerke] to make this thing real, I'll share this more (e.g. the dev list).

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-15 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-15051:

Description: 
This proposal is a way to accomplish shared storage in SolrCloud with a few key 
characteristics: (A) using a Directory implementation, (B) delegates to a 
backing local file Directory as a kind of read/write cache (C) replicas have 
their own "space", (D) , de-duplication across replicas via reference counting, 
(E) uses ZK but separately from SolrCloud stuff.

The Directory abstraction is a good one, and helps isolate shared storage from 
the rest of SolrCloud that doesn't care.  Using a backing normal file Directory 
is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache.  
Replicas having their own space solves the problem of multiple writers (e.g. of 
the same shard) trying to own and write to the same space, and it implies that 
any of Solr's replica types can be used along with what goes along with them 
like peer-to-peer replication (sometimes faster/cheaper than pulling from 
shared storage).  A de-duplication feature solves needless duplication of files 
across replicas and from parent shards (i.e. from shard splitting).  The 
de-duplication feature requires a place to cache directory listings so that 
they can be shared across replicas and atomically updated; this is handled via 
ZooKeeper.  Finally, some sort of Solr daemon / auto-scaling code should be 
added to implement "autoAddReplicas", especially to provide for a scenario 
where the leader is gone and can't be replicated from directly but we can 
access shared storage.

For more about shared storage concepts, consider looking at the description in 
SOLR-13101 and the linked Google Doc.

*[PROPOSAL 
DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*

  was:
This proposal is a way to accomplish shared storage in SolrCloud with a few key 
characteristics: (A) using a Directory implementation, (B) delegates to a 
backing local file Directory as a kind of read/write cache (C) replicas have 
their own "space", (D) , de-duplication across replicas via reference counting, 
(E) uses ZK but separately from SolrCloud stuff.

The Directory abstraction is a good one, and helps isolate shared storage from 
the rest of SolrCloud that doesn't care.  Using a backing normal file Directory 
is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache.  
Replicas having their own space solves the problem of multiple writers (e.g. of 
the same shard) trying to own and write to the same space, and it implies that 
any of Solr's replica types can be used along with what goes along with them 
like peer-to-peer replication (sometimes faster/cheaper than pulling from 
shared storage).  A de-duplication feature solves needless duplication of files 
across replicas and from parent shards (i.e. from shard splitting).  The 
de-duplication feature requires a place to cache directory listings so that 
they can be shared across replicas and atomically updated; this is handled via 
ZooKeeper.  Finally, some sort of Solr daemon / auto-scaling code should be 
added to implement "autoAddReplicas", especially to provide for a scenario 
where the leader is gone and can't be replicated from directly but we can 
access shared storage.

For more about shared storage concepts, consider looking at the description in 
SOLR-13101.


> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and 

[jira] [Created] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-15 Thread David Smiley (Jira)
David Smiley created SOLR-15051:
---

 Summary: Shared storage -- BlobDirectory (de-duping)
 Key: SOLR-15051
 URL: https://issues.apache.org/jira/browse/SOLR-15051
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley
Assignee: David Smiley


This proposal is a way to accomplish shared storage in SolrCloud with a few key 
characteristics: (A) using a Directory implementation, (B) delegates to a 
backing local file Directory as a kind of read/write cache (C) replicas have 
their own "space", (D) , de-duplication across replicas via reference counting, 
(E) uses ZK but separately from SolrCloud stuff.

The Directory abstraction is a good one, and helps isolate shared storage from 
the rest of SolrCloud that doesn't care.  Using a backing normal file Directory 
is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache.  
Replicas having their own space solves the problem of multiple writers (e.g. of 
the same shard) trying to own and write to the same space, and it implies that 
any of Solr's replica types can be used along with what goes along with them 
like peer-to-peer replication (sometimes faster/cheaper than pulling from 
shared storage).  A de-duplication feature solves needless duplication of files 
across replicas and from parent shards (i.e. from shard splitting).  The 
de-duplication feature requires a place to cache directory listings so that 
they can be shared across replicas and atomically updated; this is handled via 
ZooKeeper.  Finally, some sort of Solr daemon / auto-scaling code should be 
added to implement "autoAddReplicas", especially to provide for a scenario 
where the leader is gone and can't be replicated from directly but we can 
access shared storage.

For more about shared storage concepts, consider looking at the description in 
SOLR-13101.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL

2020-12-15 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249891#comment-17249891
 ] 

Mike Drob commented on SOLR-15026:
--

The relevant bit of why the change is these system properties - 
https://github.com/apache/lucene-solr/blob/master/solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java#L291

If we need to break that inheritance, we should be able to duplicate some 
minimal setup bits.

> MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
> --
>
> Key: SOLR-15026
> URL: https://issues.apache.org/jira/browse/SOLR-15026
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> MiniSolrCloudCluster makes multiple assumptions related to "SSL" that can be 
> confusing/missleading when attempting to write a test that uses 
> MiniSolrCloudCluster.  This can lead to some aspects of  MiniSolrCloudCluster 
> assuming that SSL should be used -- inspite of what JettyConfig is specified 
> -- based on system properties; or conversly: to not correctly using SSL 
> realted options in all code paths even when the JettyConfig indicates SSL is 
> needed.
> Current workarounds:
> * Directly instantiating a MiniSolrCloudCluster in a subclass of 
> {{SolrTestCaseJ4}} should be avoided unless you explicitly use the 
> {{SuppressSSL}} anotation.
> * If you wish to use a MiniSolrCloudCluster w/SSL use {{SolrCloudTestCase}} 
> (or {{SolrTestCaseJ4}} directly) along with the {{RandomizeSSL}} annotation 
> instead of attempting to directly instantiate a MiniSolrCloudCluster.
> ** There is currently no _easy_ way to directly instantiate a 
> MiniSolrCloudCluster _and_ use SSL with setting a few system properties and 
> calling some static methods from your test case ({{SolrTestCaseJ4}} / 
> {{SolrCloudTestCase}} handles this for you when the {{RandomizeSSL}} 
> annotation is used)
> {panel:title=original issue report}
> A new test added in SOLR-14934 caused the following reproducible failure to 
> pop up on jenkins...
> {noformat}
> hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ 
> test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders 
> -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> Starting a Gradle Daemon (subsequent builds will be faster)
> > Task :randomizationInfo
> Running tests with randomization seed: tests.seed=806A85748BD81F48
> > Task :solr:test-framework:test
> org.apache.solr.cloud.MiniSolrCloudClusterTest > 
> testSolrHomeAndResourceLoaders FAILED
> org.apache.solr.client.solrj.SolrServerException: IOException occurred 
> when talking to server at: https://127.0.0.1:38681/solr
> at 
> __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> at 
> org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
> at 
> org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
> at 
> org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
> ...
> Caused by:
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
> at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
> {noformat}
> The problem sems to be that even though the MiniSolrCloudCluster being 
> instantiated isn't _intentionally_ using any SSL randomization (it just uses 
> {{JettyConfig.builder().build()}} the CloudSolrClient returned by 
> {{cluster.getSolrClient()}} is evidnetly picking up the ranodmized SSL and 
> trying to use it to talk to the cluster.
> {panel}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (SOLR-14886) Suppress stack trace in Query response.

2020-12-15 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854
 ] 

Isabelle Giguere edited comment on SOLR-14886 at 12/15/20, 6:45 PM:


[~gerlowskija]
The full stack trace in the error response can be a vulnerability.

As explained by our application security assessment team:
{quote}
Detailed technical error messages can allow an attacker to gain information 
about the application and database that could be used to conduct an attack. 
This information could include the names of database tables and columns, the 
structure of database queries, method names, configuration details, etc.
{quote}

So, OK, no database here.  But the basic idea is that the stack trace contains 
too much information for a response to the outside world.  Stack traces are for 
logs, for developers.

It falls into item #6 in the OWASP top 10
https://owasp.org/www-project-top-ten/
"verbose error messages containing sensitive information"
So, either each and every error message needs to be cleaned-up individually, 
which is error-prone, or, we don't display any details to the outside world.

Because the stack trace lists all classes and methods, a hacker can determine 
which vulnerable library is included on the classpath.  So in this sense, even 
information about the classpath is sensitive information.



was (Author: igiguere):
[~gerlowskija]
The full stack trace in the error response can be a vulnerability.

As explained by our application security assessment team:
{quote}
Detailed technical error messages can allow an attacker to gain information 
about the application and database that could be used to conduct an attack. 
This information could include the names of database tables and columns, the 
structure of database queries, method names, configuration details, etc.
{quote}

So, OK, no database here.  But the basic idea is that the stack trace contains 
too much information for a response to the outside world.  Stack traces are for 
logs, for developers.

It falls into item #6 in the OWASP top 10
https://owasp.org/www-project-top-ten/
"verbose error messages containing sensitive information"
So, either each an every error message needs to be cleaned-up individually, 
which is error-prone, or, we don't display any details to the outside world.

Because the stack trace lists all classes and methods, a hacker can determine 
which vulnerable library is included on the classpath.  So in this sense, even 
information about the classpath is sensitive information.


> Suppress stack trace in Query response.
> ---
>
> Key: SOLR-14886
> URL: https://issues.apache.org/jira/browse/SOLR-14886
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.6.2
>Reporter: Vrinda Davda
>Priority: Minor
>
> Currently there is no way to suppress the stack trace in solr response when 
> it throws an exception, like when a client sends a badly formed query string, 
> or exception with status 500 It sends full stack trace in the response. 
> I would propose a configuration for error messages so that the stack trace is 
> not visible to avoid any sensitive information in the stack trace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14886) Suppress stack trace in Query response.

2020-12-15 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854
 ] 

Isabelle Giguere edited comment on SOLR-14886 at 12/15/20, 6:44 PM:


[~gerlowskija]
The full stack trace in the error response can be a vulnerability.

As explained by our application security assessment team:
{quote}
Detailed technical error messages can allow an attacker to gain information 
about the application and database that could be used to conduct an attack. 
This information could include the names of database tables and columns, the 
structure of database queries, method names, configuration details, etc.
{quote}

So, OK, no database here.  But the basic idea is that the stack trace contains 
too much information for a response to the outside world.  Stack traces are for 
logs, for developers.

It falls into item #6 in the OWASP top 10
https://owasp.org/www-project-top-ten/
"verbose error messages containing sensitive information"
So, either each an every error message needs to be cleaned-up individually, 
which is error-prone, or, we don't display any details to the outside world.

Because the stack trace lists all classes and methods, a hacker can determine 
which vulnerable library is included on the classpath.  So in this sense, even 
information about the classpath is sensitive information.



was (Author: igiguere):
[~gerlowskija]
The full stack trace in the error response can be a vulnerability.

As explained by our application security assessment team:
{quote}
Detailed technical error messages can allow an attacker to gain information 
about the application and database that could be used to conduct an attack. 
This information could include the names of database tables and columns, the 
structure of database queries, method names, configuration details, etc.
{quote}

So, OK, no database here.  But the basic idea is that the stack trace contains 
too much information for a response.


> Suppress stack trace in Query response.
> ---
>
> Key: SOLR-14886
> URL: https://issues.apache.org/jira/browse/SOLR-14886
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.6.2
>Reporter: Vrinda Davda
>Priority: Minor
>
> Currently there is no way to suppress the stack trace in solr response when 
> it throws an exception, like when a client sends a badly formed query string, 
> or exception with status 500 It sends full stack trace in the response. 
> I would propose a configuration for error messages so that the stack trace is 
> not visible to avoid any sensitive information in the stack trace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14886) Suppress stack trace in Query response.

2020-12-15 Thread Isabelle Giguere (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854
 ] 

Isabelle Giguere commented on SOLR-14886:
-

[~gerlowskija]
The full stack trace in the error response can be a vulnerability.

As explained by our application security assessment team:
{quote}
Detailed technical error messages can allow an attacker to gain information 
about the application and database that could be used to conduct an attack. 
This information could include the names of database tables and columns, the 
structure of database queries, method names, configuration details, etc.
{quote}

So, OK, no database here.  But the basic idea is that the stack trace contains 
too much information for a response.


> Suppress stack trace in Query response.
> ---
>
> Key: SOLR-14886
> URL: https://issues.apache.org/jira/browse/SOLR-14886
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.6.2
>Reporter: Vrinda Davda
>Priority: Minor
>
> Currently there is no way to suppress the stack trace in solr response when 
> it throws an exception, like when a client sends a badly formed query string, 
> or exception with status 500 It sends full stack trace in the response. 
> I would propose a configuration for error messages so that the stack trace is 
> not visible to avoid any sensitive information in the stack trace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14034) remove deprecated min_rf references

2020-12-15 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249836#comment-17249836
 ] 

Christine Poerschke edited comment on SOLR-14034 at 12/15/20, 6:19 PM:
---

Hello everyone.

I'm not very familiar with the {{min_rf}} parameter and what exactly its 
removal here will entail but yes based on 
[https://github.com/apache/lucene-solr/search?q=min_rf] and 
[https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results 
this JIRA item remains available to be worked on further – thanks to everyone 
who already started analysing the code as per above.

Perhaps a _draft_ and _initially partially scoped_ pull request could provide a 
way to continue to move forward here i.e. if the SOLR-14034 ticket number is 
included in its title then it will get automatically linked here i.e. we can 
all easily find it then and perhaps then incrementally the possibilities for 
the above questions would become clearer?

Hope that helps.


was (Author: cpoerschke):
Hello everyone.

I'm not very familiar with the {{min_rf}} parameter and what exactly its 
removal here will entail but yes based on 
[https://github.com/apache/lucene-solr/search?q=min_rf] and 
[https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results 
this JIRA item remains available to be worked on.

Perhaps a _draft_ and _initially partially scoped_ pull request provide a way 
forward here i.e. if the SOLR-14034 ticket number is included in its title then 
it will get automatically linked here i.e. we can all easily find it then and 
perhaps then incrementally the possibilities for the above questions would 
become clearer?

Hope that helps.

> remove deprecated min_rf references
> ---
>
> Key: SOLR-14034
> URL: https://issues.apache.org/jira/browse/SOLR-14034
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Priority: Blocker
>  Labels: newdev
> Fix For: master (9.0)
>
>
> * {{min_rf}} support was added under SOLR-5468 in version 4.9 
> (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.9.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L50)
>  and deprecated under SOLR-12767 in version 7.6 
> (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.6.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L57-L61)
> * http://lucene.apache.org/solr/7_6_0/changes/Changes.html and 
> https://lucene.apache.org/solr/guide/8_0/major-changes-in-solr-8.html#solr-7-6
>  both clearly mention the deprecation
> This ticket is to fully remove {{min_rf}} references in code, tests and 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke closed pull request #1705: factor out static LTRQParserPlugin.newLTRScoringQuery(...) method

2020-12-15 Thread GitBox


cpoerschke closed pull request #1705:
URL: https://github.com/apache/lucene-solr/pull/1705


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14034) remove deprecated min_rf references

2020-12-15 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249836#comment-17249836
 ] 

Christine Poerschke commented on SOLR-14034:


Hello everyone.

I'm not very familiar with the {{min_rf}} parameter and what exactly its 
removal here will entail but yes based on 
[https://github.com/apache/lucene-solr/search?q=min_rf] and 
[https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results 
this JIRA item remains available to be worked on.

Perhaps a _draft_ and _initially partially scoped_ pull request provide a way 
forward here i.e. if the SOLR-14034 ticket number is included in its title then 
it will get automatically linked here i.e. we can all easily find it then and 
perhaps then incrementally the possibilities for the above questions would 
become clearer?

Hope that helps.

> remove deprecated min_rf references
> ---
>
> Key: SOLR-14034
> URL: https://issues.apache.org/jira/browse/SOLR-14034
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Priority: Blocker
>  Labels: newdev
> Fix For: master (9.0)
>
>
> * {{min_rf}} support was added under SOLR-5468 in version 4.9 
> (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.9.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L50)
>  and deprecated under SOLR-12767 in version 7.6 
> (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.6.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L57-L61)
> * http://lucene.apache.org/solr/7_6_0/changes/Changes.html and 
> https://lucene.apache.org/solr/guide/8_0/major-changes-in-solr-8.html#solr-7-6
>  both clearly mention the deprecation
> This ticket is to fully remove {{min_rf}} references in code, tests and 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r543515585



##
File path: 
solr/solrj/src/java/org/apache/solr/client/solrj/cloud/ShardTerms.java
##
@@ -102,16 +101,16 @@ public ShardTerms increaseTerms(String leader, 
Set replicasNeedingRecove
   if (replicasNeedingRecovery.contains(key)) foundReplicasInLowerTerms = 
true;
   if (Objects.equals(entry.getValue(), leaderTerm)) {
 if(skipIncreaseTermOf(key, replicasNeedingRecovery)) {

Review comment:
   ```suggestion
   if (skipIncreaseTermOf(key, replicasNeedingRecovery)) {
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2121:
URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r543503718



##
File path: 
solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java
##
@@ -143,6 +147,15 @@ public SolrInputDocument merge(final SolrInputDocument 
fromDoc, SolrInputDocumen
 return toDoc;
   }
 
+  private static String getID(SolrInputDocument doc, IndexSchema schema) {
+String id = "";
+SchemaField sf = schema.getUniqueKeyField();
+if( sf != null ) {

Review comment:
   ```suggestion
   if ( sf != null ) {
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2122: SOLR-14950: Fix copyfield regeneration with explicit src/dest matching dyn rule

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2122:
URL: https://github.com/apache/lucene-solr/pull/2122#discussion_r543500974



##
File path: solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java
##
@@ -773,6 +773,108 @@ public void testCopyFieldRules() throws Exception {
 assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty());
   }
 
+  @SuppressWarnings({"rawtypes"})
+  public void testCopyFieldWithReplace() throws Exception {
+RestTestHarness harness = restTestHarness;
+String newFieldName = "test_solr_14950";
+
+// add-field-type
+String addFieldTypeAnalyzer = "{\n" +
+"'add-field-type' : {" +
+"'name' : 'myNewTextField',\n" +
+"'class':'solr.TextField',\n" +

Review comment:
   ```suggestion
   "'class' : 'solr.TextField',\n" +
   ```

##
File path: solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java
##
@@ -773,6 +773,108 @@ public void testCopyFieldRules() throws Exception {
 assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty());
   }
 
+  @SuppressWarnings({"rawtypes"})
+  public void testCopyFieldWithReplace() throws Exception {
+RestTestHarness harness = restTestHarness;
+String newFieldName = "test_solr_14950";
+
+// add-field-type
+String addFieldTypeAnalyzer = "{\n" +
+"'add-field-type' : {" +
+"'name' : 'myNewTextField',\n" +
+"'class':'solr.TextField',\n" +
+"'analyzer' : {\n" +
+"'charFilters' : [{\n" +
+"'name':'patternReplace',\n" +
+"'replacement':'$1$1',\n" +
+"'pattern':'([a-zA-Z])1+'\n" +
+"}],\n" +
+"'tokenizer' : { 'name':'whitespace' },\n" +
+"'filters' : [{ 'name':'asciiFolding' }]\n" +
+"}\n"+
+"}}";
+
+String response = restTestHarness.post("/schema", 
json(addFieldTypeAnalyzer));
+Map map = (Map) fromJSONString(response);
+assertNull(response, map.get("error"));
+map = getObj(harness, "myNewTextField", "fieldTypes");
+assertNotNull("'myNewTextField' field type does not exist in the schema", 
map);
+
+// add-field
+String payload = "{\n" +
+"'add-field' : {\n" +
+" 'name':'" + newFieldName + "',\n" +
+" 'type':'myNewTextField',\n" +
+" 'stored':true,\n" +
+" 'indexed':true\n" +
+" }\n" +
+"}";
+
+response = harness.post("/schema", json(payload));
+
+map = (Map) fromJSONString(response);
+assertNull(response, map.get("error"));
+
+Map m = getObj(harness, newFieldName, "fields");
+assertNotNull("'"+ newFieldName + "' field does not exist in the schema", 
m);
+
+// add copy-field with explicit source and destination
+List l = getSourceCopyFields(harness, "bleh_s");
+assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty());
+
+payload = "{\n" +
+"  'add-copy-field' : {\n" +
+"   'source' :'bleh_s',\n" +
+"   'dest':'"+ newFieldName + "'\n" +
+"   }\n" +
+"  }\n";
+response = harness.post("/schema", json(payload));
+
+map = (Map) fromJSONString(response);
+assertNull(response, map.get("error"));
+
+l = getSourceCopyFields(harness, "bleh_s");
+assertFalse("'bleh_s' copyField rule doesn't exist", l.isEmpty());
+assertEquals("bleh_s", ((Map)l.get(0)).get("source"));
+assertEquals(newFieldName, ((Map)l.get(0)).get("dest"));
+
+// replace-field-type
+String replaceFieldTypeAnalyzer = "{\n" +
+"'replace-field-type' : {" +
+"'name' : 'myNewTextField',\n" +
+"'class':'solr.TextField',\n" +

Review comment:
   ```suggestion
   "'class' : 'solr.TextField',\n" +
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 4:32 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. The fl for drill specifies the 
exported fields.

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list. 
This is quite clean and ties together all the fields needed for export with the 
expression wrapping the input function.


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. The fl for drill specifies the 
exported fields.

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list. 
This is quit clean and ties together all the fields needed for export with the 
expression wrapping the input function.

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One 

[jira] [Deleted] (SOLR-15050) Проводка Москвич 2140

2020-12-15 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev deleted SOLR-15050:



> Проводка Москвич 2140
> -
>
> Key: SOLR-15050
> URL: https://issues.apache.org/jira/browse/SOLR-15050
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: ||Основные||
> |Марка|Москвич|
> |Модель|2140|
> |Тип|Автомобильные провода|
> |Производитель  |Wire& aвтопроводка|
> |Страна производитель|Украина|
> |Тип запчасти|Оригинал|
> |Тип техники|Легковой автомобиль|
> |Код запчасти|1082|
> |Состояние|Новое|
> https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html
>Reporter: Vladimir
>Priority: Major
>  Labels: Проводка
>
> !2365641414_w640_h640_2365641414.jpg|width=179,height=179!
> Проводка на Москвич 2140
> 1.  Жгут проводов основной.
> 2.  Жгут проводов задняя часть.
> [https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html]
> Проводка от Каменец-Подольского производителя, это всегда гарантия качества, 
> доступные цены и быстрая доставка.
> Будем рады сотрудничать, как с частными лицами, предпринимателями так и с 
> предприятиями.
>  ** 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Timothy Potter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249765#comment-17249765
 ] 

Timothy Potter commented on SOLR-15036:
---

Confirmed, it works nicely now! Thanks for your help Joel

{code}
{count(*)=6, a_i=0, max(max(a_d))=2.2515625018914305, 
min(min(a_d))=-0.5859583807765252, sum(sum(a_d))=5.894460990302006, 
wsum(avg(a_d), count(*))=0.9824101650503342}
{count(*)=4, a_i=1, max(max(a_d))=3.338305310115201, 
min(min(a_d))=0.03050220236482515, sum(sum(a_d))=12.517492417715335, 
wsum(avg(a_d), count(*))=2.086248736285889}
{count(*)=4, a_i=2, max(max(a_d))=4.832815828279073, 
min(min(a_d))=3.16905458918893, sum(sum(a_d))=24.076139429000165, 
wsum(avg(a_d), count(*))=4.012689904833361}
{count(*)=4, a_i=3, max(max(a_d))=5.66831997419713, 
min(min(a_d))=2.902262184046103, sum(sum(a_d))=22.58303980377591, 
wsum(avg(a_d), count(*))=3.763839967295984}
{count(*)=4, a_i=4, max(max(a_d))=6.531585917691583, 
min(min(a_d))=2.6395698661907963, sum(sum(a_d))=28.243748570490624, 
wsum(avg(a_d), count(*))=4.707291428415103}
{count(*)=5, a_i=5, max(max(a_d))=7.555382540979672, 
min(min(a_d))=4.808772939476107, sum(sum(a_d))=37.88196903407075, 
wsum(avg(a_d), count(*))=6.313661505678459}
{count(*)=5, a_i=6, max(max(a_d))=8.416136012729918, 
min(min(a_d))=5.422492404700898, sum(sum(a_d))=39.25679972070782, 
wsum(avg(a_d), count(*))=6.542799953451303}
{count(*)=5, a_i=7, max(max(a_d))=8.667999236934058, 
min(min(a_d))=6.934577412906803, sum(sum(a_d))=46.7622185952807, wsum(avg(a_d), 
count(*))=7.793703099213451}
{count(*)=5, a_i=8, max(max(a_d))=9.566181963643201, 
min(min(a_d))=7.4397380388592556, sum(sum(a_d))=53.296172957938325, 
wsum(avg(a_d), count(*))=8.88269549298972}
{count(*)=4, a_i=9, max(max(a_d))=12.251349466753346, 
min(min(a_d))=9.232427215193514, sum(sum(a_d))=63.46244550204135, 
wsum(avg(a_d), count(*))=10.577074250340223}
{code}

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", 

[jira] [Updated] (SOLR-15050) Проводка Москвич 2140

2020-12-15 Thread Vladimir (Jira)

 [ 
https://issues.apache.org/jira/browse/SOLR-15050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir updated SOLR-15050:

Description: 
!2365641414_w640_h640_2365641414.jpg|width=179,height=179!

Проводка на Москвич 2140

1.  Жгут проводов основной.

2.  Жгут проводов задняя часть.

[https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html]

Проводка от Каменец-Подольского производителя, это всегда гарантия качества, 
доступные цены и быстрая доставка.

Будем рады сотрудничать, как с частными лицами, предпринимателями так и с 
предприятиями.

 ** 

  was:
[привязать 
заголовок|https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.htmlПроводка
 на Москвич 2140

1.  Жгут проводов основной.

2.  Жгут проводов задняя часть.

Проводка от Каменец-Подольского производителя, это всегда гарантия качества, 
доступные цены и быстрая доставка.

Будем рады сотрудничать, как с частными лицами, предпринимателями так и с 
предприятиями.

 ** 


> Проводка Москвич 2140
> -
>
> Key: SOLR-15050
> URL: https://issues.apache.org/jira/browse/SOLR-15050
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.7
> Environment: ||Основные||
> |Марка|Москвич|
> |Модель|2140|
> |Тип|Автомобильные провода|
> |Производитель  |Wire& aвтопроводка|
> |Страна производитель|Украина|
> |Тип запчасти|Оригинал|
> |Тип техники|Легковой автомобиль|
> |Код запчасти|1082|
> |Состояние|Новое|
> https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html
>Reporter: Vladimir
>Priority: Major
>  Labels: Проводка
> Attachments: 2365641414_w640_h640_2365641414.jpg
>
>
> !2365641414_w640_h640_2365641414.jpg|width=179,height=179!
> Проводка на Москвич 2140
> 1.  Жгут проводов основной.
> 2.  Жгут проводов задняя часть.
> [https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html]
> Проводка от Каменец-Подольского производителя, это всегда гарантия качества, 
> доступные цены и быстрая доставка.
> Будем рады сотрудничать, как с частными лицами, предпринимателями так и с 
> предприятиями.
>  ** 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled

2020-12-15 Thread GitBox


mikemccand commented on pull request #2080:
URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-745375768


   > > > Hmm, but I think sumTotalTermFreq, which is per field sum of all 
totalTermFreq across all terms in that field, could overflow long even today, 
in and adversarial case. And it would not be detected by Lucene...
   > 
   > I don't think so. I like to think of this as "number of tokens" in the 
corpus. Because each doc is limited to Integer.MAX_VALUE and there can only be 
Integer.MAX_VALUE docs, sumTotalTermFreq can't overflow. and totalTermFreq is 
<= sumTotalTermFreq (it would be equal, in a degraded case where all your 
documents only have a single word repeated many times).
   
   Ahh you're right ... no more than `Integer.MAX_VALUE` tokens in one 
document, OK.
   
   > > How about decoupling these two problems? First, let's fix the 
aggregation of totalTermFreq and sumTotalTermFreq to explicitly catch any 
overflow instead of just doing the dangerous += today: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/PushPostingsWriterBase.java#L142
 and 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java#L915?
 I.e. switch these accumluations to Math.addExact. This will explicitly catch 
long overflow for either of these stats.
   > 
   > I don't think this is correct. You wouldn't trip this until after merge, 
far after you've already overflowed the values and caused broken search results 
(assuming you have more than one segment).
   
   Hrmph, also correct, boo.
   
   Alright I guess there is nothing we can fix here ... applications simply 
must not create > `Integer.MAX_VALUE` term frequencies in one doc/field.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15050) Проводка Москвич 2140

2020-12-15 Thread Vladimir (Jira)
Vladimir created SOLR-15050:
---

 Summary: Проводка Москвич 2140
 Key: SOLR-15050
 URL: https://issues.apache.org/jira/browse/SOLR-15050
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Affects Versions: 8.7
 Environment: ||Основные||
|Марка|Москвич|
|Модель|2140|
|Тип|Автомобильные провода|
|Производитель  |Wire& aвтопроводка|
|Страна производитель|Украина|
|Тип запчасти|Оригинал|
|Тип техники|Легковой автомобиль|
|Код запчасти|1082|
|Состояние|Новое|

https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html
Reporter: Vladimir
 Attachments: 2365641414_w640_h640_2365641414.jpg

[привязать 
заголовок|https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.htmlПроводка
 на Москвич 2140

1.  Жгут проводов основной.

2.  Жгут проводов задняя часть.

Проводка от Каменец-Подольского производителя, это всегда гарантия качества, 
доступные цены и быстрая доставка.

Будем рады сотрудничать, как с частными лицами, предпринимателями так и с 
предприятиями.

 ** 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2132: SOLR-15036: auto- select / rollup / sort / plist over facet expression when using a collection alias with multiple collecti

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2132:
URL: https://github.com/apache/lucene-solr/pull/2132#discussion_r543443487



##
File path: 
solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java
##
@@ -351,25 +361,26 @@ public String getCollection() {
 
 FieldComparator[] comps = new FieldComparator[sorts.length];
 for(int i=0; i 1) {
-  return new MultipleFieldComparator(bucketSorts);
+return (bucketSorts.length > 1) ? new MultipleFieldComparator(bucketSorts) 
: bucketSorts[0];
+  }
+
+  @Override
+  public TupleStream[] parallelize(List partitions) throws IOException 
{
+TupleStream[] parallelStreams = new TupleStream[partitions.size()];
+
+// prefer a different node for each collection if possible as we don't 
want the same remote node
+// being the coordinator if possible, otherwise, our plist isn't 
distributing the load as well
+final Set preferredNodes = new HashSet<>(Math.max((int) 
(parallelStreams.length/.75f) + 1, 16));
+
+for (int c=0; c < parallelStreams.length; c++) {

Review comment:
   ```suggestion
   for (int c = 0; c < parallelStreams.length; c++) {
   ```

##
File path: 
solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/metrics/MinMetric.java
##
@@ -87,7 +87,7 @@ public void update(Tuple tuple) {
   if(l < longMin) {
 longMin = l;
   }
-} else {
+} else if(o instanceof Long) {

Review comment:
   ```suggestion
   } else if (o instanceof Long) {
   ```

##
File path: 
solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java
##
@@ -351,25 +361,26 @@ public String getCollection() {
 
 FieldComparator[] comps = new FieldComparator[sorts.length];
 for(int i=0; ihttp://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.client.solrj.io.stream;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import org.apache.solr.client.solrj.io.comp.StreamComparator;
+import org.apache.solr.client.solrj.io.stream.metrics.CountMetric;
+import org.apache.solr.client.solrj.io.stream.metrics.MaxMetric;
+import org.apache.solr.client.solrj.io.stream.metrics.MeanMetric;
+import org.apache.solr.client.solrj.io.stream.metrics.Metric;
+import org.apache.solr.client.solrj.io.stream.metrics.MinMetric;
+import org.apache.solr.client.solrj.io.stream.metrics.SumMetric;
+import org.apache.solr.client.solrj.io.stream.metrics.WeightedSumMetric;
+
+/**
+ * Indicates the underlying stream source supports parallelizing metrics 
computation across collections
+ * using a rollup of metrics from each collection.
+ */
+public interface ParallelMetricsRollup {
+  TupleStream[] parallelize(List partitions) throws IOException;
+  StreamComparator getParallelListSortOrder() throws IOException;
+  RollupStream getRollupStream(SortStream sortStream, Metric[] rollupMetrics) 
throws IOException;
+  Map getRollupSelectFields(Metric[] rollupMetrics);
+
+  default Optional openParallelStream(StreamContext context, 
List partitions, Metric[] metrics) throws IOException {
+Optional maybeRollupMetrics = getRollupMetrics(metrics);
+if (maybeRollupMetrics.isEmpty())
+  return Optional.empty(); // some metric is incompatible with doing a 
rollup over the plist results
+
+TupleStream[] parallelStreams = parallelize(partitions);
+
+// the tuples from each plist need to be sorted using the same order to do 
a rollup
+Metric[] rollupMetrics = maybeRollupMetrics.get();
+StreamComparator comparator = getParallelListSortOrder();
+SortStream sortStream = new SortStream(new 
ParallelListStream(parallelStreams), comparator);
+RollupStream rollup = getRollupStream(sortStream, rollupMetrics);
+SelectStream select = new SelectStream(rollup, 
getRollupSelectFields(rollupMetrics));
+select.setStreamContext(context);
+select.open();
+
+return Optional.of(select);
+  }
+
+  default Optional getRollupMetrics(Metric[] metrics) {
+Metric[] rollup = new Metric[metrics.length];
+CountMetric count = null;
+for (int m=0; m < rollup.length; m++) {

Review comment:
   ```suggestion
   for (int m = 0; m < rollup.length; m++) {
   ```

##
File path: 
solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java
##
@@ -351,25 +361,26 @@ public String getCollection() {
 
 FieldComparator[] comps = new FieldComparator[sorts.length];
 for(int i=0; ihttp://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed 

[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Timothy Potter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249747#comment-17249747
 ] 

Timothy Potter commented on SOLR-15036:
---

Oh darn, I should have spotted that! Sorry for the noise there [~jbernste] ... 
seems like we could improve the error handling in the drill code to barf if the 
user is trying to compute metrics for fields not in the fl? That could help 
with silly mistakes like this ...

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility of auto-wrapping 
> the facet expression with a rollup 

[GitHub] [lucene-solr] jbampton commented on a change in pull request #2134: SOLR-15038: Add elevateDocsWithoutMatchingQ and onlyElevatedReprese…

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2134:
URL: https://github.com/apache/lucene-solr/pull/2134#discussion_r543432124



##
File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##
@@ -689,15 +693,18 @@ public void finish() throws IOException {
 
   //Handle the boosted docs.
   if(this.boostOrds != null) {
-int s = boostOrds.size();
-for(int i=0; i -1) {
-//Remove any group heads that are in the same groups as boosted 
documents.
-ords.remove(ord);
+// representative is already part of the collapsedset.
+if(!onlyElevatedRepresentativeVisible) {

Review comment:
   ```suggestion
   if (!onlyElevatedRepresentativeVisible) {
   ```

##
File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
##
@@ -504,25 +506,27 @@ private void setQuery(ResponseBuilder rb, Elevation 
elevation) {
 
 // Change the query to insert forced documents
 SolrParams params = rb.req.getParams();
-if (params.getBool(QueryElevationParams.EXCLUSIVE, false)) {
-  // We only want these elevated results
-  rb.setQuery(new BoostQuery(elevation.includeQuery, 0f));
-} else {
-  BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
-  queryBuilder.add(rb.getQuery(), BooleanClause.Occur.SHOULD);
-  queryBuilder.add(new BoostQuery(elevation.includeQuery, 0f), 
BooleanClause.Occur.SHOULD);
-  if (elevation.excludeQueries != null) {
-if (params.getBool(QueryElevationParams.MARK_EXCLUDES, false)) {
-  // We are only going to mark items as excluded, not actually exclude 
them.
-  // This works with the EditorialMarkerFactory.
-  rb.req.getContext().put(EXCLUDED, elevation.excludedIds);
-} else {
-  for (TermQuery tq : elevation.excludeQueries) {
-queryBuilder.add(tq, BooleanClause.Occur.MUST_NOT);
+if(params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) {

Review comment:
   ```suggestion
   if (params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) {
   ```

##
File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##
@@ -1030,25 +1045,25 @@ public OrdFieldValueCollector(int maxDoc,
   this.needsScores4Collapsing = needsScores4Collapsing;
   this.needsScores = needsScores;
   if (null != sortSpec) {
-this.collapseStrategy = new OrdSortSpecStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, 
boostDocs, sortSpec, searcher, collapseValues);
+this.collapseStrategy = new OrdSortSpecStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, 
boostDocs, sortSpec, searcher, onlyElevatedRepresentativeVisible);
   } else if (funcQuery != null) {
-this.collapseStrategy =  new OrdValueSourceStrategy(maxDoc, 
nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, 
this.needsScores, boostDocs, funcQuery, searcher, collapseValues);
+this.collapseStrategy =  new OrdValueSourceStrategy(maxDoc, 
nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, 
this.needsScores, boostDocs, funcQuery, searcher, 
onlyElevatedRepresentativeVisible);
   } else {
 NumberType numType = fieldType.getNumberType();
 if (null == numType) {
   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, 
"min/max must be either Int/Long/Float based field types");
 }
 switch (numType) {
   case INTEGER: {
-this.collapseStrategy = new OrdIntStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues);
+this.collapseStrategy = new OrdIntStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, 
onlyElevatedRepresentativeVisible);
 break;
   }
   case FLOAT: {
-this.collapseStrategy = new OrdFloatStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues);
+this.collapseStrategy = new OrdFloatStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, 
onlyElevatedRepresentativeVisible);
 break;
   }
   case LONG: {
-this.collapseStrategy =  new OrdLongStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues);
+this.collapseStrategy =  new OrdLongStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, 
onlyElevatedRepresentativeVisible);

Review comment:
   ```suggestion
   this.collapseStrategy = new OrdLongStrategy(maxDoc, nullPolicy, 
valueCount, groupHeadSelector, this.needsScores, boostDocs, 
onlyElevatedRepresentativeVisible);
   

[jira] [Commented] (LUCENE-9021) QueryParser should avoid creating an LookaheadSuccess(Error) object with every instance

2020-12-15 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249741#comment-17249741
 ] 

Mikhail Khludnev commented on LUCENE-9021:
--

Thanks, [~pbruski_]. It seems we've done here. 

I'm wondering why javacc can't optimize it itself.

> QueryParser should avoid creating an LookaheadSuccess(Error) object with 
> every instance
> ---
>
> Key: LUCENE-9021
> URL: https://issues.apache.org/jira/browse/LUCENE-9021
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Przemek Bruski
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.8
>
> Attachments: LUCENE-9021.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is basically the same as 
> https://issues.apache.org/jira/browse/SOLR-11242 , but for Lucene QueryParser



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2135: SOLR-15038: Add elevateDocsWithoutMatchingQ and onlyElevatedReprese…

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2135:
URL: https://github.com/apache/lucene-solr/pull/2135#discussion_r543423038



##
File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
##
@@ -504,25 +506,27 @@ private void setQuery(ResponseBuilder rb, Elevation 
elevation) {
 
 // Change the query to insert forced documents
 SolrParams params = rb.req.getParams();
-if (params.getBool(QueryElevationParams.EXCLUSIVE, false)) {
-  // We only want these elevated results
-  rb.setQuery(new BoostQuery(elevation.includeQuery, 0f));
-} else {
-  BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
-  queryBuilder.add(rb.getQuery(), BooleanClause.Occur.SHOULD);
-  queryBuilder.add(new BoostQuery(elevation.includeQuery, 0f), 
BooleanClause.Occur.SHOULD);
-  if (elevation.excludeQueries != null) {
-if (params.getBool(QueryElevationParams.MARK_EXCLUDES, false)) {
-  // We are only going to mark items as excluded, not actually exclude 
them.
-  // This works with the EditorialMarkerFactory.
-  rb.req.getContext().put(EXCLUDED, elevation.excludedIds);
-} else {
-  for (TermQuery tq : elevation.excludeQueries) {
-queryBuilder.add(tq, BooleanClause.Occur.MUST_NOT);
+if(params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) {

Review comment:
   ```suggestion
   if (params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) {
   ```

##
File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##
@@ -569,15 +570,18 @@ public int docID() {
 private IntArrayList boostDocs;
 private MergeBoost mergeBoost;
 private boolean boosts;
+private boolean onlyElevatedRepresentativeVisible;
 
 public OrdScoreCollector(int maxDoc,
  int segments,
  DocValuesProducer collapseValuesProducer,
  int nullPolicy,
  IntIntHashMap boostDocsMap,
- IndexSearcher searcher) throws IOException {
+ IndexSearcher searcher,
+ boolean onlyElevatedRepresentativeVisible) throws 
IOException {
   this.maxDoc = maxDoc;
   this.contexts = new LeafReaderContext[segments];
+  this.onlyElevatedRepresentativeVisible = 
onlyElevatedRepresentativeVisible;
   List con = searcher.getTopReaderContext().leaves();
   for(int i=0; i con = searcher.getTopReaderContext().leaves();
   for(int i=0; i

[jira] [Updated] (LUCENE-9021) QueryParser should avoid creating an LookaheadSuccess(Error) object with every instance

2020-12-15 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9021:
-
Fix Version/s: 8.8
 Assignee: Mikhail Khludnev
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> QueryParser should avoid creating an LookaheadSuccess(Error) object with 
> every instance
> ---
>
> Key: LUCENE-9021
> URL: https://issues.apache.org/jira/browse/LUCENE-9021
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Przemek Bruski
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.8
>
> Attachments: LUCENE-9021.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is basically the same as 
> https://issues.apache.org/jira/browse/SOLR-11242 , but for Lucene QueryParser



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for availble disk space before splitting shards. Useful with shar

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#discussion_r543419505



##
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##
@@ -129,10 +130,16 @@ public boolean split(ClusterState clusterState, 
ZkNodeProps message, NamedList

[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:48 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. The fl for drill specifies the 
exported fields.

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list. 
This is quit clean and ties together all the fields needed for export with the 
expression wrapping the input function.


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. The fl for drill specifies the 
exported fields.

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list.

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a 

[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:46 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. The fl for drill specifies the 
exported fields.

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list.


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. 

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list.

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some 

[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:33 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields. 

Maybe we should change the syntax of drill so that the input() function takes 
field names as parameters and drill selects the export fl from this field list.


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields.

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection 

[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:31 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields.


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility 

[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein commented on SOLR-15036:
---

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported field.s

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility of auto-wrapping 
> the facet expression with a rollup / sort / plist when the collection 
> argument is an alias with multiple collections; other stream sources will be 
> considered after facet is proven 

[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2020-12-15 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692
 ] 

Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:30 PM:
--

The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported fields


was (Author: joel.bernstein):
The fl for drill needs to include the *a_d* field. Basically you're rolling up 
and aggregating from the exported field.s

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility 

[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249686#comment-17249686
 ] 

ASF subversion and git services commented on LUCENE-9638:
-

Commit 3c9d355315434ff17a10cd073a0a04fa6a25c202 in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3c9d355 ]

LUCENE-9638: fix simple text vector format fields list terminator


> TestVectorValues.testIndexMultipleVectorFields reproducing test failure
> ---
>
> Key: LUCENE-9638
> URL: https://issues.apache.org/jira/browse/LUCENE-9638
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but 
> then hit this failure, likely not related to that PR:
> {noformat}
> org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields 
> FAILED
>     org.apache.lucene.index.CorruptIndexException: SimpleText failure: 
> expected checksum line but got field-number 2 
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) 
> offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, 
> buffers=2,097 b\
> ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri])))
>         at 
> __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43)
>         at 
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144)
>         at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213)
>         at 
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572)
>         at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474)
>         at 
> org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> 

[jira] [Resolved] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure

2020-12-15 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-9638.
-
Resolution: Fixed

> TestVectorValues.testIndexMultipleVectorFields reproducing test failure
> ---
>
> Key: LUCENE-9638
> URL: https://issues.apache.org/jira/browse/LUCENE-9638
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but 
> then hit this failure, likely not related to that PR:
> {noformat}
> org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields 
> FAILED
>     org.apache.lucene.index.CorruptIndexException: SimpleText failure: 
> expected checksum line but got field-number 2 
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) 
> offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, 
> buffers=2,097 b\
> ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri])))
>         at 
> __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43)
>         at 
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144)
>         at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213)
>         at 
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572)
>         at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474)
>         at 
> org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> 

[jira] [Created] (LUCENE-9639) Add unit tests for SImpleTextVector format

2020-12-15 Thread Michael Sokolov (Jira)
Michael Sokolov created LUCENE-9639:
---

 Summary: Add unit tests for SImpleTextVector format
 Key: LUCENE-9639
 URL: https://issues.apache.org/jira/browse/LUCENE-9639
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Michael Sokolov


The other simple text field formats have unit tests; we should add tests of the 
simple text vector format as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure

2020-12-15 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249684#comment-17249684
 ] 

Michael Sokolov commented on LUCENE-9638:
-

Hmm well we do not have any unit tests for SImpleTextVector format; probably we 
should have. I'll open a separate issue

> TestVectorValues.testIndexMultipleVectorFields reproducing test failure
> ---
>
> Key: LUCENE-9638
> URL: https://issues.apache.org/jira/browse/LUCENE-9638
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but 
> then hit this failure, likely not related to that PR:
> {noformat}
> org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields 
> FAILED
>     org.apache.lucene.index.CorruptIndexException: SimpleText failure: 
> expected checksum line but got field-number 2 
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) 
> offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, 
> buffers=2,097 b\
> ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri])))
>         at 
> __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43)
>         at 
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144)
>         at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213)
>         at 
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572)
>         at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474)
>         at 
> org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> 

[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure

2020-12-15 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249683#comment-17249683
 ] 

Michael Sokolov commented on LUCENE-9638:
-

Thanks, yes it's writing an end marker after each field instead of after all 
the fields. I don't really see how this didn't fail earlier. Maybe SimpleText 
codec is not tested very often?

> TestVectorValues.testIndexMultipleVectorFields reproducing test failure
> ---
>
> Key: LUCENE-9638
> URL: https://issues.apache.org/jira/browse/LUCENE-9638
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>
> I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but 
> then hit this failure, likely not related to that PR:
> {noformat}
> org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields 
> FAILED
>     org.apache.lucene.index.CorruptIndexException: SimpleText failure: 
> expected checksum line but got field-number 2 
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) 
> offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, 
> buffers=2,097 b\
> ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri])))
>         at 
> __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81)
>         at 
> org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43)
>         at 
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144)
>         at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171)
>         at 
> org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213)
>         at 
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572)
>         at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630)
>         at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474)
>         at 
> org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>      

[jira] [Resolved] (SOLR-14728) Add self join optimization to the TopLevelJoinQuery

2020-12-15 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-14728.

Resolution: Duplicate

> Add self join optimization to the TopLevelJoinQuery
> ---
>
> Key: SOLR-14728
> URL: https://issues.apache.org/jira/browse/SOLR-14728
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Priority: Major
>
> A simple optimization can be put in place to massively improve join 
> performance when the TopLevelJoinQuery is performing a self join (same core) 
> and the *to* and *from* fields are the same field. In this scenario the top 
> level doc values ordinals can be used directly as a filter avoiding the most 
> expensive part of the join which is the bytes ref reconciliation between the 
> *to* and *from* fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14728) Add self join optimization to the TopLevelJoinQuery

2020-12-15 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249668#comment-17249668
 ] 

Jason Gerlowski commented on SOLR-14728:


Hey [~jbernste] I'm going to close this out as a duplicate of SOLR-15049.  I 
wouldn't've created that ticket if I'd realized this one existed, but the newer 
ticket already has an associated PR, so this should be the one we close.

> Add self join optimization to the TopLevelJoinQuery
> ---
>
> Key: SOLR-14728
> URL: https://issues.apache.org/jira/browse/SOLR-14728
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Priority: Major
>
> A simple optimization can be put in place to massively improve join 
> performance when the TopLevelJoinQuery is performing a self join (same core) 
> and the *to* and *from* fields are the same field. In this scenario the top 
> level doc values ordinals can be used directly as a filter avoiding the most 
> expensive part of the join which is the bytes ref reconciliation between the 
> *to* and *from* fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure

2020-12-15 Thread Michael McCandless (Jira)
Michael McCandless created LUCENE-9638:
--

 Summary: TestVectorValues.testIndexMultipleVectorFields 
reproducing test failure
 Key: LUCENE-9638
 URL: https://issues.apache.org/jira/browse/LUCENE-9638
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless


I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but 
then hit this failure, likely not related to that PR:
{noformat}
org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields FAILED
    org.apache.lucene.index.CorruptIndexException: SimpleText failure: expected 
checksum line but got field-number 2 
(resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) offset=96, 
length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, buffers=2,097 b\
ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri])))
        at 
__randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0)
        at 
org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89)
        at 
org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81)
        at 
org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43)
        at 
org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144)
        at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
        at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171)
        at 
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213)
        at 
org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572)
        at 
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474)
        at 
org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
        at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  

[GitHub] [lucene-solr] jbampton commented on a change in pull request #2136: SOLR-15037: fix prevent config change listener to reload core while s…

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2136:
URL: https://github.com/apache/lucene-solr/pull/2136#discussion_r543243184



##
File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java
##
@@ -103,8 +104,9 @@ private List doOperations(List 
operations) throws InterruptedE
 String errorMsg = "Unable to persist managed schema. ";
 List errors = Collections.emptyList();
 int latestVersion = -1;
-
-synchronized (req.getSchema().getSchemaUpdateLock()) {
+   Lock schemaChangeLock =  req.getSchema().getSchemaUpdateLock();

Review comment:
   ```suggestion
  Lock schemaChangeLock = req.getSchema().getSchemaUpdateLock();
   ```

##
File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java
##
@@ -454,8 +459,8 @@ private ManagedIndexSchema getFreshManagedSchema(SolrCore 
core) throws IOExcepti
   if (in instanceof ZkSolrResourceLoader.ZkByteArrayInputStream) {
 int version = ((ZkSolrResourceLoader.ZkByteArrayInputStream) 
in).getStat().getVersion();
 log.info("managed schema loaded . version : {} ", version);
-return new ManagedIndexSchema(core.getSolrConfig(), name, new 
InputSource(in), true, name, version,
-core.getLatestSchema().getSchemaUpdateLock());
+Lock schemaLock =  (Lock) core.getLatestSchema().getSchemaUpdateLock();

Review comment:
   ```suggestion
   Lock schemaLock = (Lock) 
core.getLatestSchema().getSchemaUpdateLock();
   ```

##
File path: 
solr/core/src/java/org/apache/solr/schema/ManagedIndexSchemaFactory.java
##
@@ -178,9 +180,14 @@ public ManagedIndexSchema create(String resourceName, 
SolrConfig config) {
 managedSchemaResourceName, 
schemaZkVersion, getSchemaUpdateLock());
 if (shouldUpgrade) {
   // Persist the managed schema if it doesn't already exist
-  synchronized (schema.getSchemaUpdateLock()) {
+  Lock schemaUpdateLock =schema.getSchemaUpdateLock();

Review comment:
   ```suggestion
 Lock schemaUpdateLock = schema.getSchemaUpdateLock();
   ```

##
File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java
##
@@ -454,8 +459,8 @@ private ManagedIndexSchema getFreshManagedSchema(SolrCore 
core) throws IOExcepti
   if (in instanceof ZkSolrResourceLoader.ZkByteArrayInputStream) {
 int version = ((ZkSolrResourceLoader.ZkByteArrayInputStream) 
in).getStat().getVersion();
 log.info("managed schema loaded . version : {} ", version);
-return new ManagedIndexSchema(core.getSolrConfig(), name, new 
InputSource(in), true, name, version,
-core.getLatestSchema().getSchemaUpdateLock());
+Lock schemaLock =  (Lock) core.getLatestSchema().getSchemaUpdateLock();
+return new ManagedIndexSchema(core.getSolrConfig(), name, new 
InputSource(in), true, name, version,schemaLock );

Review comment:
   ```suggestion
   return new ManagedIndexSchema(core.getSolrConfig(), name, new 
InputSource(in), true, name, version, schemaLock );
   ```

##
File path: solr/core/src/java/org/apache/solr/core/SolrCore.java
##
@@ -3145,14 +3145,17 @@ public static Runnable getConfListener(SolrCore core, 
ZkSolrResourceLoader zkSol
   checkStale(zkClient, managedSchmaResourcePath, 
managedSchemaVersion)) {
 log.info("core reload {}", coreName);
 SolrConfigHandler configHandler = ((SolrConfigHandler) 
core.getRequestHandler("/config"));
-if (configHandler.getReloadLock().tryLock()) {
-
+ if ((!core.schema.isMutable() || 
core.schema.getSchemaUpdateLock().tryLock())
+  && configHandler.getReloadLock().tryLock()) {
   try {
 cc.reload(coreName, coreId);
   } catch (SolrCoreState.CoreIsClosedException e) {
 /*no problem this core is already closed*/
   } finally {
 configHandler.getReloadLock().unlock();
+if(core.schema.isMutable()){

Review comment:
   ```suggestion
   if (core.schema.isMutable()) {
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141#discussion_r543238073



##
File path: 
lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
##
@@ -195,10 +201,13 @@ private Scorer opt(Collection optional, 
int minShouldMatch,
   for (ScorerSupplier scorer : optional) {
 optionalScorers.add(scorer.get(leadCost));
   }
-  if (minShouldMatch > 1) {
+
+  if (scoreMode == ScoreMode.TOP_SCORES) {
+return new WANDScorer(weight, optionalScorers, minShouldMatch);
+  } else if (minShouldMatch > 1) {
+// nocommit minShouldMath > 1 && scoreMode != ScoreMode.TOP_SCORES 
still requires MinShouldMatchSumScorer.
+// Do we want to depcate this entirely now ?

Review comment:
   ```suggestion
   // Do we want to deprecate this entirely now ?
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2142: SOLR-14923: Reload RealtimeSearcher on next getInputDocument if forced

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2142:
URL: https://github.com/apache/lucene-solr/pull/2142#discussion_r543234918



##
File path: 
solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java
##
@@ -618,6 +626,16 @@ public static SolrInputDocument getInputDocument(SolrCore 
core, BytesRef idBytes
 return getInputDocument (core, idBytes, null, null, lookupStrategy);
   }
   
+  /**
+   * Marks the {@link RealTimeGetComponent} of the corresponding {@link 
SolrCore} to reload it's realtime searcher before the next access.

Review comment:
   ```suggestion
  * Marks the {@link RealTimeGetComponent} of the corresponding {@link 
SolrCore} to reload its realtime searcher before the next access.
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2146: SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins'

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2146:
URL: https://github.com/apache/lucene-solr/pull/2146#discussion_r543233296



##
File path: solr/core/src/java/org/apache/solr/search/TopLevelJoinQuery.java
##
@@ -218,4 +218,28 @@ public BitsetBounds(long lower, long upper) {
   this.upper = upper;
 }
   }
+
+  /**
+   * A {@link TopLevelJoinQuery} implementation optimized for when 'from' and 
'to' cores and fields match and no ordinal-
+   * conversion is necessary.
+   */
+  static class SelfJoin extends TopLevelJoinQuery {
+public SelfJoin(String joinField, Query subQuery) {
+  super(joinField, joinField, null, subQuery);
+}
+
+protected BitsetBounds convertFromOrdinalsIntoToField(LongBitSet 
fromOrdBitSet, SortedSetDocValues fromDocValues,
+  LongBitSet 
toOrdBitSet, SortedSetDocValues toDocValues) throws IOException {
+
+  // 'from' and 'to' ordinals are identical for self-joins.
+  toOrdBitSet.or(fromOrdBitSet);
+
+  // Calculate boundary ords used for other optimizations
+  final long firstToOrd = toOrdBitSet.nextSetBit(0);
+  final long lastToOrd = toOrdBitSet.prevSetBit(toOrdBitSet.length() - 1);
+  return new BitsetBounds(firstToOrd, lastToOrd);
+}
+  }
 }
+

Review comment:
   ```suggestion
   ```

##
File path: solr/core/src/java/org/apache/solr/search/TopLevelJoinQuery.java
##
@@ -218,4 +218,28 @@ public BitsetBounds(long lower, long upper) {
   this.upper = upper;
 }
   }
+
+  /**
+   * A {@link TopLevelJoinQuery} implementation optimized for when 'from' and 
'to' cores and fields match and no ordinal-
+   * conversion is necessary.
+   */
+  static class SelfJoin extends TopLevelJoinQuery {
+public SelfJoin(String joinField, Query subQuery) {
+  super(joinField, joinField, null, subQuery);
+}
+
+protected BitsetBounds convertFromOrdinalsIntoToField(LongBitSet 
fromOrdBitSet, SortedSetDocValues fromDocValues,
+  LongBitSet 
toOrdBitSet, SortedSetDocValues toDocValues) throws IOException {
+
+  // 'from' and 'to' ordinals are identical for self-joins.
+  toOrdBitSet.or(fromOrdBitSet);
+
+  // Calculate boundary ords used for other optimizations
+  final long firstToOrd = toOrdBitSet.nextSetBit(0);
+  final long lastToOrd = toOrdBitSet.prevSetBit(toOrdBitSet.length() - 1);
+  return new BitsetBounds(firstToOrd, lastToOrd);
+}
+  }
 }
+
+

Review comment:
   ```suggestion
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jbampton commented on a change in pull request #2147: LUCENE-9637: Clean up ShapeField/ShapeQuery random test

2020-12-15 Thread GitBox


jbampton commented on a change in pull request #2147:
URL: https://github.com/apache/lucene-solr/pull/2147#discussion_r543223927



##
File path: 
lucene/core/src/test/org/apache/lucene/document/BaseXYShapeTestCase.java
##
@@ -126,24 +132,12 @@ protected boolean rectCrossesDateline(Object rect) {
 return false;
   }
 
-  /** use {@link ShapeTestUtil#nextPolygon()} to create a random line; TODO: 
move to GeoTestUtil */
+  /** use {@link ShapeTestUtil#nextPolygon()} to create a random line */
   @Override
   public XYLine nextLine() {
-return getNextLine();
-  }
-
-  public static XYLine getNextLine() {
-XYPolygon poly = ShapeTestUtil.nextPolygon();
-float[] x = new float[poly.numPoints() - 1];
-float[] y = new float[x.length];
-for (int i = 0; i < x.length; ++i) {
-  x[i] = poly.getPolyX(i);
-  y[i] = poly.getPolyY(i);
-}
-
-return new XYLine(x, y);
+return ShapeTestUtil.nextLine();
   }
-
+  

Review comment:
   ```suggestion
   
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase opened a new pull request #2147: LUCENE-9637: Clean up ShapeField/ShapeQuery random test

2020-12-15 Thread GitBox


iverase opened a new pull request #2147:
URL: https://github.com/apache/lucene-solr/pull/2147


   Removes some unused code and replaces the Point implementation on the test 
with the Point implementation in the geo package.
   
   cc: @nknize 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9637) Clean up ShapeField /ShapeQuery random test

2020-12-15 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-9637:


 Summary: Clean up ShapeField /ShapeQuery random test
 Key: LUCENE-9637
 URL: https://issues.apache.org/jira/browse/LUCENE-9637
 Project: Lucene - Core
  Issue Type: Test
Reporter: Ignacio Vera


There it seems to be a few unused code on those tests and in addition we can 
replace the Point implementation on those test with the Point implementation in 
the geo package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics

2020-12-15 Thread GitBox


sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543196623



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/CollectionMetricsBuilder.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.placement.CollectionMetrics;
+import org.apache.solr.cluster.placement.ReplicaMetrics;
+import org.apache.solr.cluster.placement.ShardMetrics;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ *
+ */
+public class CollectionMetricsBuilder {
+
+  final Map shardMetricsBuilders = new 
HashMap<>();
+
+
+  public void addShardMetrics(String shardName, ShardMetricsBuilder 
shardMetricsBuilder) {
+shardMetricsBuilders.put(shardName, shardMetricsBuilder);
+  }
+
+  public CollectionMetrics build() {
+final Map metricsMap = new HashMap<>();
+shardMetricsBuilders.forEach((shard, builder) -> metricsMap.put(shard, 
builder.build()));
+return shardName -> Optional.ofNullable(metricsMap.get(shardName));
+  }
+
+  public static class ShardMetricsBuilder {
+final Map replicaMetricsBuilders = new 
HashMap<>();
+
+public ShardMetricsBuilder addReplicaMetrics(String replicaName, 
ReplicaMetricsBuilder replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(replicaName, replicaMetricsBuilder);
+  return this;
+}
+
+public ShardMetricsBuilder setLeaderMetrics(ReplicaMetricsBuilder 
replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(LEADER, replicaMetricsBuilder);
+  return this;
+}
+
+public static final String LEADER = "__leader__";
+
+public ShardMetrics build() {
+  final Map metricsMap = new HashMap<>();
+  replicaMetricsBuilders.forEach((name, replicaBuilder) -> {
+ReplicaMetrics metrics = replicaBuilder.build();
+metricsMap.put(name, metrics);
+if (replicaBuilder.leader) {
+  metricsMap.put(LEADER, metrics);
+}
+  });
+  return new ShardMetrics() {
+@Override
+public Optional getLeaderMetrics() {
+  return Optional.ofNullable(metricsMap.get(LEADER));
+}
+
+@Override
+public Optional getReplicaMetrics(String replicaName) {
+  return Optional.ofNullable(metricsMap.get(replicaName));
+}
+  };
+}
+  }
+
+  public static class ReplicaMetricsBuilder {
+final Map metrics = new HashMap<>();
+int sizeGB = 0;

Review comment:
   Maybe use `Integer` here? Replicas always have size, even new ones, but 
if we can return null then we can signal that the size is unknown (eg. couldn't 
be retrieved from the node).
   Also, should we use Integer or Long? The unit is GB, I don't think we need 
Long here (and in other places that deal with disk size).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics

2020-12-15 Thread GitBox


sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543196623



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/CollectionMetricsBuilder.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.placement.CollectionMetrics;
+import org.apache.solr.cluster.placement.ReplicaMetrics;
+import org.apache.solr.cluster.placement.ShardMetrics;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ *
+ */
+public class CollectionMetricsBuilder {
+
+  final Map shardMetricsBuilders = new 
HashMap<>();
+
+
+  public void addShardMetrics(String shardName, ShardMetricsBuilder 
shardMetricsBuilder) {
+shardMetricsBuilders.put(shardName, shardMetricsBuilder);
+  }
+
+  public CollectionMetrics build() {
+final Map metricsMap = new HashMap<>();
+shardMetricsBuilders.forEach((shard, builder) -> metricsMap.put(shard, 
builder.build()));
+return shardName -> Optional.ofNullable(metricsMap.get(shardName));
+  }
+
+  public static class ShardMetricsBuilder {
+final Map replicaMetricsBuilders = new 
HashMap<>();
+
+public ShardMetricsBuilder addReplicaMetrics(String replicaName, 
ReplicaMetricsBuilder replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(replicaName, replicaMetricsBuilder);
+  return this;
+}
+
+public ShardMetricsBuilder setLeaderMetrics(ReplicaMetricsBuilder 
replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(LEADER, replicaMetricsBuilder);
+  return this;
+}
+
+public static final String LEADER = "__leader__";
+
+public ShardMetrics build() {
+  final Map metricsMap = new HashMap<>();
+  replicaMetricsBuilders.forEach((name, replicaBuilder) -> {
+ReplicaMetrics metrics = replicaBuilder.build();
+metricsMap.put(name, metrics);
+if (replicaBuilder.leader) {
+  metricsMap.put(LEADER, metrics);
+}
+  });
+  return new ShardMetrics() {
+@Override
+public Optional getLeaderMetrics() {
+  return Optional.ofNullable(metricsMap.get(LEADER));
+}
+
+@Override
+public Optional getReplicaMetrics(String replicaName) {
+  return Optional.ofNullable(metricsMap.get(replicaName));
+}
+  };
+}
+  }
+
+  public static class ReplicaMetricsBuilder {
+final Map metrics = new HashMap<>();
+int sizeGB = 0;

Review comment:
   Maybe use `Integer` here? Replicas always have size, even new ones, but 
if we can return null then we can signal that the size is unknown (eg. couldn't 
be retrieved from the node).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics

2020-12-15 Thread GitBox


sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543195639



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/CollectionMetricsBuilder.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.placement.CollectionMetrics;
+import org.apache.solr.cluster.placement.ReplicaMetrics;
+import org.apache.solr.cluster.placement.ShardMetrics;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ *
+ */
+public class CollectionMetricsBuilder {
+
+  final Map shardMetricsBuilders = new 
HashMap<>();
+
+
+  public void addShardMetrics(String shardName, ShardMetricsBuilder 
shardMetricsBuilder) {
+shardMetricsBuilders.put(shardName, shardMetricsBuilder);
+  }
+
+  public CollectionMetrics build() {
+final Map metricsMap = new HashMap<>();
+shardMetricsBuilders.forEach((shard, builder) -> metricsMap.put(shard, 
builder.build()));
+return shardName -> Optional.ofNullable(metricsMap.get(shardName));
+  }
+
+  public static class ShardMetricsBuilder {
+final Map replicaMetricsBuilders = new 
HashMap<>();
+
+public ShardMetricsBuilder addReplicaMetrics(String replicaName, 
ReplicaMetricsBuilder replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(replicaName, replicaMetricsBuilder);
+  return this;
+}
+
+public ShardMetricsBuilder setLeaderMetrics(ReplicaMetricsBuilder 
replicaMetricsBuilder) {
+  replicaMetricsBuilders.put(LEADER, replicaMetricsBuilder);
+  return this;
+}
+
+public static final String LEADER = "__leader__";

Review comment:
   Make this `@VisibleForTesting` or package-private.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics

2020-12-15 Thread GitBox


sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543193912



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/ShardMetrics.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement;
+
+import java.util.Optional;
+
+/**
+ *
+ */
+public interface ShardMetrics {
+  Optional getLeaderMetrics();
+  Optional getReplicaMetrics(String replicaName);

Review comment:
   Perhaps we should add `iterator()` here too, so that the consumers are 
not required to know the replica name in advance.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9627) Small refactor of codec classes

2020-12-15 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-9627.
--
Fix Version/s: master (9.0)
 Assignee: Ignacio Vera
   Resolution: Fixed

> Small refactor of codec classes
> ---
>
> Key: LUCENE-9627
> URL: https://issues.apache.org/jira/browse/LUCENE-9627
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While working on LUCENE-9047, I had to refactor some classes in order to 
> separate code that opens a file and reads the header/ footer from the code 
> that reads the actual content of the file. Regardless of that issue, I think 
> the refactor is a good thing.
>  
> In addition it seems Lucene50FieldInfosFormat is not used anywhere in the 
> code so I propose to remove it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9627) Small refactor of codec classes

2020-12-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249578#comment-17249578
 ] 

ASF subversion and git services commented on LUCENE-9627:
-

Commit 4b3e8d7ce8feba658a9730e65da70c04a7e9c52f in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4b3e8d7 ]

LUCENE-9627: Remove unused Lucene50FieldInfosFormat codec and small refactor 
some codecs  to separate reading header/footer from reading content of the file



> Small refactor of codec classes
> ---
>
> Key: LUCENE-9627
> URL: https://issues.apache.org/jira/browse/LUCENE-9627
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While working on LUCENE-9047, I had to refactor some classes in order to 
> separate code that opens a file and reads the header/ footer from the code 
> that reads the actual content of the file. Regardless of that issue, I think 
> the refactor is a good thing.
>  
> In addition it seems Lucene50FieldInfosFormat is not used anywhere in the 
> code so I propose to remove it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase merged pull request #2109: LUCENE-9627: Small refactor of codec classes

2020-12-15 Thread GitBox


iverase merged pull request #2109:
URL: https://github.com/apache/lucene-solr/pull/2109


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gf2121 commented on pull request #2113: LUCENE-9629: Use computed masks

2020-12-15 Thread GitBox


gf2121 commented on pull request #2113:
URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-745134305


   > @dweiss Thank you very much for the guiding of how to do a benchmark on 
Lucene, the lueneutil tool is really nice!
   > I repeatedly execute the wikimedium1m 20 times, the following is the 
result of the last iter. I guess it indicates that the reading performance is 
stable overall :)
   > 
   > ```
   >  TaskQPSbaseline   StdDevQPS   my_mod_ver StdDev   
   Pct diff p-value
   >HighTermMonthSort  405.40 (11.6%)  390.20 (11.1%)   
-3.7% ( -23% -   21%) 0.295
   > PKLookup  131.35  (4.5%)  128.26  (6.5%)   
-2.4% ( -12% -9%) 0.183
   >  Respell   88.96  (8.0%)   86.91  (8.7%)   
-2.3% ( -17% -   15%) 0.383
   >   AndHighLow  611.88  (5.6%)  603.60  (9.2%)   
-1.4% ( -15% -   14%) 0.575
   > BrowseDayOfYearTaxoFacets   21.70  (7.7%)   21.43  (9.1%)  
 -1.3% ( -16% -   16%) 0.630
   > HighSloppyPhrase   74.97  (4.8%)   74.10  (8.6%)   
-1.2% ( -13% -   12%) 0.601
   >BrowseMonthSSDVFacets   91.65  (4.8%)   90.66  (5.8%)   
-1.1% ( -11% -9%) 0.519
   >MedPhrase  110.37  (8.8%)  109.19  (7.3%)   
-1.1% ( -15% -   16%) 0.674
   >OrHighMed  167.04  (8.9%)  165.47  (7.6%)   
-0.9% ( -16% -   17%) 0.718
   >   OrHighHigh   86.33  (9.1%)   85.61  (7.9%)   
-0.8% ( -16% -   17%) 0.755
   >  LowSpanNear  156.83  (7.1%)  155.74  (6.5%)   
-0.7% ( -13% -   13%) 0.746
   > Wildcard  139.21  (7.0%)  138.29  (9.8%)   
-0.7% ( -16% -   17%) 0.805
   > BrowseDayOfYearSSDVFacets   78.95  (5.0%)   78.52  (5.8%)  
 -0.5% ( -10% -   10%) 0.753
   >   Fuzzy1   76.09  (9.7%)   75.84  (7.7%)   
-0.3% ( -16% -   18%) 0.905
   >  MedSloppyPhrase   48.37  (5.9%)   48.22  (5.1%)   
-0.3% ( -10% -   11%) 0.859
   >   IntNRQ  310.73 (10.5%)  310.11 (12.2%)   
-0.2% ( -20% -   25%) 0.955
   > HighTerm  759.07  (7.3%)  757.84 (12.1%)   
-0.2% ( -18% -   20%) 0.959
   >BrowseMonthTaxoFacets   24.17  (9.5%)   24.19 (10.0%)   
 0.1% ( -17% -   21%) 0.984
   > HighIntervalsOrdered  121.98  (5.8%)  122.10  (7.9%)   
 0.1% ( -12% -   14%) 0.964
   >  LowSloppyPhrase  188.90  (7.8%)  189.42  (5.7%)   
 0.3% ( -12% -   14%) 0.898
   >   AndHighMed  418.22  (9.0%)  420.49  (9.7%)   
 0.5% ( -16% -   21%) 0.855
   >  MedTerm  874.56  (7.6%)  880.02 (10.9%)   
 0.6% ( -16% -   20%) 0.833
   >  MedSpanNear  378.96  (7.2%)  381.70  (9.2%)   
 0.7% ( -14% -   18%) 0.781
   >   Fuzzy2   25.74  (9.8%)   25.97 (10.9%)   
 0.9% ( -17% -   23%) 0.777
   >  AndHighHigh  126.51  (6.1%)  127.75  (8.5%)   
 1.0% ( -12% -   16%) 0.676
   >   HighPhrase  165.71  (8.2%)  167.64 (11.3%)   
 1.2% ( -16% -   22%) 0.708
   > BrowseDateTaxoFacets   21.40 (10.4%)   21.69 (10.0%)   
 1.3% ( -17% -   24%) 0.682
   >  LowTerm 1032.43  (8.0%) 1049.57 (10.1%)   
 1.7% ( -15% -   21%) 0.566
   >OrHighLow  228.54  (5.0%)  232.36  (8.5%)   
 1.7% ( -11% -   15%) 0.446
   > HighSpanNear   93.72  (7.4%)   95.46  (7.3%)   
 1.9% ( -12% -   17%) 0.427
   >HighTermDayOfYearSort  380.08 (12.2%)  387.47  (9.4%)   
 1.9% ( -17% -   26%) 0.573
   >LowPhrase  137.13  (8.2%)  140.30  (6.2%)   
 2.3% ( -11% -   18%) 0.314
   >  Prefix3  290.55 (11.0%)  299.98 (12.9%)   
 3.2% ( -18% -   30%) 0.390
   > ```
   
   Though this result looks fine, i'm still a bit worried about the cost of 
reading array in some cases, for example, in a different java version. So i 
updated this PR again to make sure that we only read arrays when the index is a 
variable. In other case, we still read final longs. There are only 14 lines 
more than the origin code, which i think can be acceptable :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional