[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250097#comment-17250097 ]

Elbek Kamoliddinov commented on LUCENE-9640:
--------------------------------------------

I have a naive implementation where {{TrackingQuery}} creates a sparse bitset per segment and sets a bit for each matching doc as the query runs. I will put up a PR later this week. I wanted to start a discussion and get opinions from the community. Thanks, Elbek.

> Add TrackingQuery to track matching documents
> ---------------------------------------------
>
>                 Key: LUCENE-9640
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9640
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Elbek Kamoliddinov
>            Priority: Major
>              Labels: query
>
> Some users would benefit from having {{TrackingQuery}} functionality. This query would
> wrap another query and should be able to provide the matched DocIds for the
> wrapped query after the search is run. For example, a user running a boolean
> query {{A or B}} could wrap query {{A}} in a {{TrackingQuery}}, run the
> boolean query, and check which documents that matched the boolean query also
> match query {{A}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9640) Add TrackingQuery to track matching documents
Elbek Kamoliddinov created LUCENE-9640:
---------------------------------------

             Summary: Add TrackingQuery to track matching documents
                 Key: LUCENE-9640
                 URL: https://issues.apache.org/jira/browse/LUCENE-9640
             Project: Lucene - Core
          Issue Type: New Feature
          Components: core/search
            Reporter: Elbek Kamoliddinov

Some users would benefit from having {{TrackingQuery}} functionality. This query would wrap another query and should be able to provide the matched DocIds for the wrapped query after the search is run. For example, a user running a boolean query {{A or B}} could wrap query {{A}} in a {{TrackingQuery}}, run the boolean query, and check which documents that matched the boolean query also match query {{A}}.
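The mechanism described above (a per-segment bitset that records matching docs as the query runs) can be sketched in plain Java. This is a conceptual model only, not Lucene's Query/Weight/Scorer API; the class name and the predicate standing in for the wrapped query are illustrative assumptions.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Conceptual model of the proposed TrackingQuery: a wrapper that records
// which doc IDs matched the wrapped query while matching runs.
// java.util.BitSet stands in for Lucene's per-segment sparse bitset.
class TrackingQuerySketch {
    private final BitSet matched = new BitSet();

    // In real Lucene the check would happen inside the wrapped query's
    // scorer as it advances; here we model the wrapped query as a predicate.
    boolean matches(int docId, IntPredicate wrapped) {
        boolean hit = wrapped.test(docId);
        if (hit) {
            matched.set(docId); // record the match as the query runs
        }
        return hit;
    }

    // After the search, callers can ask which docs matched the wrapped query.
    BitSet matchedDocs() {
        return matched;
    }
}
```

For the {{A or B}} example in the issue, one would wrap {{A}}, run the boolean query, and intersect the collected hits with `matchedDocs()` to see which hits satisfied {{A}}.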
[jira] [Comment Edited] (SOLR-14923) Indexing performance is unacceptable when child documents are involved
[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250026#comment-17250026 ]

Thomas Wöckinger edited comment on SOLR-14923 at 12/16/20, 12:29 AM:
---------------------------------------------------------------------

That's good news. I can support you if you want; just leave a comment.

was (Author: thomas.woeckinger):
That's good news, i can support you, if you want, just leave comment.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks, at the
> end of the method doVersionAdd, whether the Ulog caches should be refreshed.
> This check returns true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive
> and is executed in a synchronized block of the UpdateLog instance, so all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest,
> so it makes no difference whether 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
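The contention pattern described in the issue (an expensive call inside a synchronized block serializing every other synchronized method on the same instance) can be modeled in plain Java. This is an illustrative sketch, not Solr's actual UpdateLog; the method names mirror those mentioned in the issue, but the bodies are stand-ins.

```java
// Conceptual model of the bottleneck: UpdateLog's methods (add, delete, ...)
// synchronize on the same monitor, so one expensive call such as
// openRealtimeSearcher() blocks every concurrent indexing thread.
class UpdateLogSketch {
    private int adds = 0;

    synchronized void add(String doc) {
        adds++; // cheap, but must still wait for any in-flight synchronized call
    }

    synchronized void openRealtimeSearcher() {
        // Stand-in for the expensive call: while this holds the monitor,
        // add()/delete() on other threads block, so multi-threaded indexing
        // collapses to single-threaded throughput.
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized int addCount() {
        return adds;
    }
}
```

Because all three methods share the instance's intrinsic lock, any thread calling `add()` during `openRealtimeSearcher()` simply parks until the monitor is released, which is the serialization the issue complains about.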
[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved
[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250026#comment-17250026 ]

Thomas Wöckinger commented on SOLR-14923:
-----------------------------------------

That's good news. I can support you if you want; just leave a comment.
[jira] [Updated] (SOLR-14923) Indexing performance is unacceptable when child documents are involved
[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Wöckinger updated SOLR-14923:
------------------------------------
    Affects Version/s: 8.7
[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource
madrob commented on pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-745607522

Thanks for adding the test! It passes for me when run alone, but not as part of the full class; can you verify in your environment as well?

`./gradlew :solr:core:test --tests TestFunctionQuery`

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource
madrob commented on pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-745608863

Also, would you mind adding an entry to `solr/CHANGES.txt` with how you would like proper credit/attribution? I'll make sure that it gets to the right section, so don't stress about that if you are unsure.
[GitHub] [lucene-solr] madrob commented on pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
madrob commented on pull request #2121:
URL: https://github.com/apache/lucene-solr/pull/2121#issuecomment-745598327

@munendrasn all of your comments make sense, thanks. Let's update the error message as you describe and then this looks good.
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249980#comment-17249980 ]

Ankur commented on LUCENE-9444:
-------------------------------

[~mikemccand], sorry for the late response. Yes, we can resolve this one now.

> Need an API to easily fetch facet labels for a field in a document
> ------------------------------------------------------------------
>
>                 Key: LUCENE-9444
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9444
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 8.6
>            Reporter: Ankur
>            Priority: Major
>              Labels: facet
>             Fix For: master (9.0)
>
>         Attachments: LUCENE-9444.patch, LUCENE-9444.patch, LUCENE-9444.v2.patch
>
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> A facet field may be included in the list of fields whose values are to be
> returned for each hit.
> In order to get the facet labels for each hit we need to:
> # Create an instance of _DocValuesOrdinalsReader_ and invoke its
> _getReader(LeafReaderContext context)_ method to obtain an instance of
> _OrdinalsSegmentReader_.
> # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then
> used to fetch and decode the binary payload in the document's BinaryDocValues
> field. This provides the ordinals that refer to facet labels in the taxonomy.
> # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be
> returned.
>
> Ideally there should be a simple API - *String[] getLabels(docId)* - that hides
> all the above details and gives us the string labels. This could be part of
> *TaxonomyFacets*, but that's just one idea.
> I am opening this issue to get community feedback and suggestions.
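The three steps in the issue amount to: resolve a document's facet ordinals, then map each ordinal to its taxonomy path. A self-contained model of that flow is below. It is not Lucene's actual facet code (the real path goes through DocValuesOrdinalsReader and TaxonomyReader.getPath); the maps here are hypothetical stand-ins for the decoded BinaryDocValues payload and the taxonomy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Conceptual model of the proposed getLabels(docId) convenience API.
// ordinalsByDoc stands in for the decoded per-doc ordinal payload (step 2);
// pathByOrdinal stands in for TaxonomyReader.getPath(ord) (step 3).
class FacetLabelsSketch {
    private final Map<Integer, int[]> ordinalsByDoc;
    private final Map<Integer, String> pathByOrdinal;

    FacetLabelsSketch(Map<Integer, int[]> ordinalsByDoc,
                      Map<Integer, String> pathByOrdinal) {
        this.ordinalsByDoc = ordinalsByDoc;
        this.pathByOrdinal = pathByOrdinal;
    }

    // The simple API the issue asks for: hide the ordinal plumbing and
    // return the string labels for one hit.
    List<String> getLabels(int docId) {
        List<String> labels = new ArrayList<>();
        for (int ord : ordinalsByDoc.getOrDefault(docId, new int[0])) {
            labels.add(pathByOrdinal.get(ord));
        }
        return labels;
    }
}
```

The point of the proposal is exactly this shape: callers see `getLabels(docId)` and never touch ordinals or binary payloads.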
[jira] [Resolved] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur resolved LUCENE-9444.
---------------------------
    Resolution: Fixed
[jira] [Commented] (SOLR-15029) More gracefully allow Shard Leader to give up leadership
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249976#comment-17249976 ]

ASF subversion and git services commented on SOLR-15029:
--------------------------------------------------------

Commit bf7b438f12d65904b461e595594fc9a64cfcc899 in lucene-solr's branch refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bf7b438 ]

SOLR-15029 Trigger leader election on index writer tragedy
SOLR-13027 Use TestInjection so that we always have a Tragic Event

When we encounter a tragic error in the index writer, we can trigger a
leader election instead of queuing up a delete and re-add of the node in
question. This should result in a more graceful transition, and the
previous leader will eventually be put into recovery by a new leader.

closes #2120

> More gracefully allow Shard Leader to give up leadership
> --------------------------------------------------------
>
>                 Key: SOLR-15029
>                 URL: https://issues.apache.org/jira/browse/SOLR-15029
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>             Fix For: 8.8, master (9.0)
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently we have (via SOLR-12412) a mechanism whereby, when a leader sees an
> index writing error during an update, it gives up leadership by deleting the
> replica and adding a new replica. One stated benefit of this was that, because
> we are using the overseer and a known code path, this is done asynchronously
> and very efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely
> delete the index dir and attempt to sync from a good peer. Even in this case,
> however, it might be better to let fingerprinting and other index delta
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case, where the index error arises due to a disconnected file
> system (possible with shared file systems, i.e. S3, HDFS, some k8s systems)
> and the required solution is some kind of reconnect, this approach has
> several shortcomings - the core delete and creations are going to fail,
> leaving dangling replicas. Further, the data is still present, so there is no
> need to make so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the existing
> shard terms language. I believe we would be able to set all replicas
> currently equal to leader term T to T+1, and then trigger a new leader
> election. The current leader would know it is ineligible, while the other
> replicas that were current before the failed update would be eligible. This
> improvement would entail adding an additional possible operation to the terms
> state machine.
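The proposed term-bump operation (set every replica currently at leader term T to T+1, leaving the failing leader behind and hence ineligible) can be sketched as a small state machine in plain Java. This models the idea in the issue description only; it is not Solr's actual ShardTerms class, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of the proposed "give up leadership" operation on shard
// terms: every replica that is current (term == leader's term T) moves to
// T + 1, while the old leader stays at T and so can no longer win election.
class ShardTermsSketch {
    private final Map<String, Long> terms = new HashMap<>();

    void setTerm(String replica, long term) {
        terms.put(replica, term);
    }

    // Bump everyone who was current except the old leader itself.
    void giveUpLeadership(String oldLeader) {
        long leaderTerm = terms.get(oldLeader);
        for (Map.Entry<String, Long> e : terms.entrySet()) {
            if (!e.getKey().equals(oldLeader) && e.getValue() == leaderTerm) {
                e.setValue(leaderTerm + 1);
            }
        }
    }

    // A replica may become leader only if it holds the highest term.
    boolean eligible(String replica) {
        long max = terms.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        return terms.get(replica) == max;
    }
}
```

Note that a replica that was already lagging (term < T) stays ineligible after the bump, which matches the issue's intent: only replicas that were current before the failed update may take over.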
[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.
[ https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249977#comment-17249977 ]

ASF subversion and git services commented on SOLR-13027:
--------------------------------------------------------

Commit bf7b438f12d65904b461e595594fc9a64cfcc899 in lucene-solr's branch refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bf7b438 ]

SOLR-15029 Trigger leader election on index writer tragedy
SOLR-13027 Use TestInjection so that we always have a Tragic Event

> Harden LeaderTragicEventTest.
> -----------------------------
>
>                 Key: SOLR-13027
>                 URL: https://issues.apache.org/jira/browse/SOLR-13027
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Tests
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Major
[jira] [Commented] (SOLR-15029) More gracefully allow Shard Leader to give up leadership
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249978#comment-17249978 ]

ASF subversion and git services commented on SOLR-15029:
--------------------------------------------------------

Commit b090971259f57973941d70d13612e22985a09a8d in lucene-solr's branch refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b090971 ]

SOLR-15029 Trigger leader election on index writer tragedy
SOLR-13027 Use TestInjection so that we always have a Tragic Event

Backport removes additional logging from ShardTerms.save because we do
not have StackWalker in Java 8.
[jira] [Commented] (SOLR-13027) Harden LeaderTragicEventTest.
[ https://issues.apache.org/jira/browse/SOLR-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249979#comment-17249979 ]

ASF subversion and git services commented on SOLR-13027:
--------------------------------------------------------

Commit b090971259f57973941d70d13612e22985a09a8d in lucene-solr's branch refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b090971 ]

SOLR-15029 Trigger leader election on index writer tragedy
SOLR-13027 Use TestInjection so that we always have a Tragic Event
[jira] [Resolved] (SOLR-15029) More gracefully allow Shard Leader to give up leadership
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob resolved SOLR-15029.
------------------------------
    Resolution: Fixed
[GitHub] [lucene-solr] madrob closed pull request #2120: SOLR-15029 More gracefully give up shard leadership
madrob closed pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120
[jira] [Commented] (SOLR-13102) Shared storage Directory implementation
[ https://issues.apache.org/jira/browse/SOLR-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249965#comment-17249965 ]

David Smiley commented on SOLR-13102:
-------------------------------------

I forgot about this proposal. Still, [~ysee...@gmail.com], please take a look at my proposal SOLR-15051. The issue here centers around the use of a SolrCloud shard "term" to keep multiple readers and one writer with leader hand-off happy using the same space for a shard. My primary concern with this plan is how it may conceptually leak concerns between the low-level Directory and the high-level SolrCloud. Perhaps it can work nicely in some way - I dunno just by looking at the issue description. It's also unclear whether the replica types would know/care about the use of this Directory; hopefully not. Might you re-title this to somehow include "via shard leader term prefix" or some such differentiator? Solr *already* has a shared storage implementation using HdfsDirectory.

> Shared storage Directory implementation
> ---------------------------------------
>
>                 Key: SOLR-13102
>                 URL: https://issues.apache.org/jira/browse/SOLR-13102
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Priority: Major
>
> We need a general strategy (and probably a general base class) that can work
> with shared storage and not corrupt indexes from multiple writers.
> One strategy that is used on local disk is to use locks. This doesn't extend
> well to remote/shared filesystems when the locking is not tied into the
> object store itself, since a process can lose the lock (due to a long GC or
> whatever) and then immediately try to write a file, and there is no way to
> stop it.
> An alternate strategy ditches the use of locks and simply avoids overwriting
> files by some algorithmic mechanism.
> One of my colleagues outlined one way to do this:
> https://www.youtube.com/watch?v=UeTFpNeJ1Fo
> That strategy uses random-looking filenames and then writes a "core.metadata"
> file that maps between the random names and the original names. The problem
> is then reduced to overwriting "core.metadata" when you lose the lock. One
> way to fix this is to version "core.metadata". Since the new leader election
> code was implemented, each shard has a monotonically increasing "leader term",
> and we can use that as part of the filename. When a reader goes to open an
> index, it can use the latest file from the directory listing, or even use the
> term obtained from ZK if we can't trust the directory listing to be up to
> date. Additionally, we don't need random filenames to avoid collisions... a
> simple unique prefix or suffix would work fine (such as the leader term again).
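The versioning scheme sketched in the issue (write "core.metadata" under a leader-term-suffixed name instead of overwriting, and have readers pick the highest term from the listing) is small enough to model directly. The filename pattern and helper names below are illustrative assumptions, not Solr code.

```java
import java.util.Comparator;
import java.util.List;

// Conceptual model of versioned core.metadata files: each writer emits
// "core.metadata.<leaderTerm>", so a writer that has lost the lock can
// never clobber a newer leader's metadata. A reader resolves the current
// metadata by taking the highest term present in the directory listing
// (or the term obtained from ZK if the listing can't be trusted).
class MetadataVersioningSketch {
    static String fileForTerm(long leaderTerm) {
        return "core.metadata." + leaderTerm;
    }

    // Pick the latest metadata file from a directory listing.
    static String latest(List<String> listing) {
        return listing.stream()
                .filter(f -> f.startsWith("core.metadata."))
                .max(Comparator.comparingLong(
                        (String f) -> Long.parseLong(f.substring("core.metadata.".length()))))
                .orElseThrow();
    }
}
```

Because terms only increase across leader hand-offs, "highest term wins" gives readers a consistent view without any filesystem-level locking.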
[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249961#comment-17249961 ]

David Smiley commented on SOLR-15051:
-------------------------------------

BTW I started with a SIP but then doubted this choice because BlobDirectory fits an existing abstraction well, without requiring changes elsewhere. Regardless, IMO the point of a SIP is largely to draw attention to important things. At the conclusion of an internal "hack day" I'm doing now with [~broustant] and [~nazerke] to make this thing real, I'll share this more widely (e.g. on the dev list).

> Shared storage -- BlobDirectory (de-duping)
> -------------------------------------------
>
>                 Key: SOLR-15051
>                 URL: https://issues.apache.org/jira/browse/SOLR-15051
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few
> key characteristics: (A) it uses a Directory implementation, (B) it delegates
> to a backing local file Directory as a kind of read/write cache, (C) replicas
> have their own "space", (D) de-duplication across replicas happens via
> reference counting, and (E) it uses ZK, but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage
> from the rest of SolrCloud, which doesn't care. Using a backing normal file
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's
> BlockCache. Replicas having their own space solves the problem of multiple
> writers (e.g. of the same shard) trying to own and write to the same space,
> and it implies that any of Solr's replica types can be used, along with what
> goes along with them, like peer-to-peer replication (sometimes faster/cheaper
> than pulling from shared storage). A de-duplication feature solves needless
> duplication of files across replicas and from parent shards (i.e. from shard
> splitting). The de-duplication feature requires a place to cache directory
> listings so that they can be shared across replicas and atomically updated;
> this is handled via ZooKeeper. Finally, some sort of Solr daemon /
> auto-scaling code should be added to implement "autoAddReplicas", especially
> to provide for a scenario where the leader is gone and can't be replicated
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*
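The reference-counting de-duplication described in the proposal can be illustrated with a minimal bookkeeping sketch: a replica that wants a blob already held by another replica bumps a count instead of re-uploading, and the blob is physically deleted only when the last reference is released. This is an assumption-laden model of the idea, not the BlobDirectory design itself; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of de-dup via reference counting across replicas
// (e.g. identical segment files shared by a parent and its split shards).
class RefCountSketch {
    private final Map<String, Integer> refs = new HashMap<>();

    // Returns true if the file must actually be uploaded (first reference);
    // subsequent replicas just add a reference.
    boolean acquire(String blobName) {
        return refs.merge(blobName, 1, Integer::sum) == 1;
    }

    // Returns true if the blob can now be physically deleted (last reference
    // released); otherwise other replicas still depend on it.
    boolean release(String blobName) {
        Integer n = refs.computeIfPresent(blobName, (k, v) -> v - 1);
        if (n != null && n == 0) {
            refs.remove(blobName);
            return true;
        }
        return false;
    }
}
```

In the real design this map would live in the ZooKeeper-cached directory listings mentioned above, so that increments and decrements are atomic across replicas.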
[jira] [Updated] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-15051: Description: This proposal is a way to accomplish shared storage in SolrCloud with a few key characteristics: (A) using a Directory implementation, (B) delegates to a backing local file Directory as a kind of read/write cache, (C) replicas have their own "space", (D) de-duplication across replicas via reference counting, (E) uses ZK but separately from SolrCloud stuff. The Directory abstraction is a good one, and helps isolate shared storage from the rest of SolrCloud that doesn't care. Using a backing normal file Directory is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache. Replicas having their own space solves the problem of multiple writers (e.g. of the same shard) trying to own and write to the same space, and it implies that any of Solr's replica types can be used along with what goes along with them like peer-to-peer replication (sometimes faster/cheaper than pulling from shared storage). A de-duplication feature solves needless duplication of files across replicas and from parent shards (i.e. from shard splitting). The de-duplication feature requires a place to cache directory listings so that they can be shared across replicas and atomically updated; this is handled via ZooKeeper. Finally, some sort of Solr daemon / auto-scaling code should be added to implement "autoAddReplicas", especially to provide for a scenario where the leader is gone and can't be replicated from directly but we can access shared storage. For more about shared storage concepts, consider looking at the description in SOLR-13101 and the linked Google Doc. 
*[PROPOSAL DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]* was: This proposal is a way to accomplish shared storage in SolrCloud with a few key characteristics: (A) using a Directory implementation, (B) delegates to a backing local file Directory as a kind of read/write cache (C) replicas have their own "space", (D) , de-duplication across replicas via reference counting, (E) uses ZK but separately from SolrCloud stuff. The Directory abstraction is a good one, and helps isolate shared storage from the rest of SolrCloud that doesn't care. Using a backing normal file Directory is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache. Replicas having their own space solves the problem of multiple writers (e.g. of the same shard) trying to own and write to the same space, and it implies that any of Solr's replica types can be used along with what goes along with them like peer-to-peer replication (sometimes faster/cheaper than pulling from shared storage). A de-duplication feature solves needless duplication of files across replicas and from parent shards (i.e. from shard splitting). The de-duplication feature requires a place to cache directory listings so that they can be shared across replicas and atomically updated; this is handled via ZooKeeper. Finally, some sort of Solr daemon / auto-scaling code should be added to implement "autoAddReplicas", especially to provide for a scenario where the leader is gone and can't be replicated from directly but we can access shared storage. For more about shared storage concepts, consider looking at the description in SOLR-13101. > Shared storage -- BlobDirectory (de-duping) > --- > > Key: SOLR-15051 > URL: https://issues.apache.org/jira/browse/SOLR-15051 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. 
Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > > This proposal is a way to accomplish shared storage in SolrCloud with a few > key characteristics: (A) using a Directory implementation, (B) delegates to a > backing local file Directory as a kind of read/write cache (C) replicas have > their own "space", (D) , de-duplication across replicas via reference > counting, (E) uses ZK but separately from SolrCloud stuff. > The Directory abstraction is a good one, and helps isolate shared storage > from the rest of SolrCloud that doesn't care. Using a backing normal file > Directory is faster for reads and is simpler than Solr's HDFSDirectory's > BlockCache. Replicas having their own space solves the problem of multiple > writers (e.g. of the same shard) trying to own and write to the same space, > and it implies that any of Solr's replica types can be used along with what > goes along with them like peer-to-peer replication (sometimes faster/cheaper > than pulling from shared storage). A de-duplication feature solves needless > duplication of files across replicas and
[jira] [Created] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
David Smiley created SOLR-15051: --- Summary: Shared storage -- BlobDirectory (de-duping) Key: SOLR-15051 URL: https://issues.apache.org/jira/browse/SOLR-15051 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: David Smiley Assignee: David Smiley This proposal is a way to accomplish shared storage in SolrCloud with a few key characteristics: (A) using a Directory implementation, (B) delegates to a backing local file Directory as a kind of read/write cache, (C) replicas have their own "space", (D) de-duplication across replicas via reference counting, (E) uses ZK but separately from SolrCloud stuff. The Directory abstraction is a good one, and helps isolate shared storage from the rest of SolrCloud that doesn't care. Using a backing normal file Directory is faster for reads and is simpler than Solr's HDFSDirectory's BlockCache. Replicas having their own space solves the problem of multiple writers (e.g. of the same shard) trying to own and write to the same space, and it implies that any of Solr's replica types can be used along with what goes along with them like peer-to-peer replication (sometimes faster/cheaper than pulling from shared storage). A de-duplication feature solves needless duplication of files across replicas and from parent shards (i.e. from shard splitting). The de-duplication feature requires a place to cache directory listings so that they can be shared across replicas and atomically updated; this is handled via ZooKeeper. Finally, some sort of Solr daemon / auto-scaling code should be added to implement "autoAddReplicas", especially to provide for a scenario where the leader is gone and can't be replicated from directly but we can access shared storage. For more about shared storage concepts, consider looking at the description in SOLR-13101. 
[jira] [Commented] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
[ https://issues.apache.org/jira/browse/SOLR-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249891#comment-17249891 ] Mike Drob commented on SOLR-15026: -- The relevant bit of why the change is these system properties - https://github.com/apache/lucene-solr/blob/master/solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java#L291 If we need to break that inheritance, we should be able to duplicate some minimal setup bits. > MiniSolrCloudCluster can inconsistently get confused about when it's using SSL > -- > > Key: SOLR-15026 > URL: https://issues.apache.org/jira/browse/SOLR-15026 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > > MiniSolrCloudCluster makes multiple assumptions related to "SSL" that can be > confusing/misleading when attempting to write a test that uses > MiniSolrCloudCluster. This can lead to some aspects of MiniSolrCloudCluster > assuming that SSL should be used -- in spite of what JettyConfig is specified > -- based on system properties; or conversely: to not correctly using SSL > related options in all code paths even when the JettyConfig indicates SSL is > needed. > Current workarounds: > * Directly instantiating a MiniSolrCloudCluster in a subclass of > {{SolrTestCaseJ4}} should be avoided unless you explicitly use the > {{SuppressSSL}} annotation. > * If you wish to use a MiniSolrCloudCluster w/SSL use {{SolrCloudTestCase}} > (or {{SolrTestCaseJ4}} directly) along with the {{RandomizeSSL}} annotation > instead of attempting to directly instantiate a MiniSolrCloudCluster. 
> ** There is currently no _easy_ way to directly instantiate a > MiniSolrCloudCluster _and_ use SSL without setting a few system properties and > calling some static methods from your test case ({{SolrTestCaseJ4}} / > {{SolrCloudTestCase}} handles this for you when the {{RandomizeSSL}} > annotation is used) > {panel:title=original issue report} > A new test added in SOLR-14934 caused the following reproducible failure to > pop up on jenkins... > {noformat} > hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ > test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders > -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true > -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > Starting a Gradle Daemon (subsequent builds will be faster) > > Task :randomizationInfo > Running tests with randomization seed: tests.seed=806A85748BD81F48 > > Task :solr:test-framework:test > org.apache.solr.cloud.MiniSolrCloudClusterTest > > testSolrHomeAndResourceLoaders FAILED > org.apache.solr.client.solrj.SolrServerException: IOException occurred > when talking to server at: https://127.0.0.1:38681/solr > at > __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931) > at > 
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246) > at > org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125) > ... > Caused by: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439) > {noformat} > The problem seems to be that even though the MiniSolrCloudCluster being > instantiated isn't _intentionally_ using any SSL randomization (it just uses > {{JettyConfig.builder().build()}}) the CloudSolrClient returned by > {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and > trying to use it to talk to the cluster. > {panel}
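The failure mode discussed in SOLR-15026 — randomized SSL system properties leaking into a cluster that was explicitly configured without SSL — boils down to precedence between an ambient JVM-wide property and an explicit configuration object. The sketch below is a hypothetical illustration of that precedence bug (the property name and method names are invented; this is not Solr's actual API):

```java
// Hypothetical illustration: an explicit config silently losing to a
// JVM-wide system property. Names are invented for this sketch.
class SslConfigResolver {
    // Buggy variant: the ambient property wins even when the caller
    // explicitly asked for no SSL (analogous to the MiniSolrCloudCluster
    // confusion described in the issue).
    static boolean useSslBuggy(Boolean explicitUseSsl) {
        return Boolean.getBoolean("tests.example.ssl")
                || (explicitUseSsl != null && explicitUseSsl);
    }

    // Fixed variant: an explicit choice always overrides the ambient
    // property; the property is only a fallback when nothing was specified.
    static boolean useSslFixed(Boolean explicitUseSsl) {
        if (explicitUseSsl != null) {
            return explicitUseSsl;
        }
        return Boolean.getBoolean("tests.example.ssl");
    }
}
```

This is why the workarounds above center on either suppressing SSL randomization entirely ({{SuppressSSL}}) or letting the test framework own both the properties and the config ({{RandomizeSSL}}), so the two sources never disagree.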
[jira] [Comment Edited] (SOLR-14886) Suppress stack trace in Query response.
[ https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854 ] Isabelle Giguere edited comment on SOLR-14886 at 12/15/20, 6:45 PM: [~gerlowskija] The full stack trace in the error response can be a vulnerability. As explained by our application security assessment team: {quote} Detailed technical error messages can allow an attacker to gain information about the application and database that could be used to conduct an attack. This information could include the names of database tables and columns, the structure of database queries, method names, configuration details, etc. {quote} So, OK, no database here. But the basic idea is that the stack trace contains too much information for a response to the outside world. Stack traces are for logs, for developers. It falls into item #6 in the OWASP top 10 https://owasp.org/www-project-top-ten/ "verbose error messages containing sensitive information" So, either each and every error message needs to be cleaned-up individually, which is error-prone, or, we don't display any details to the outside world. Because the stack trace lists all classes and methods, a hacker can determine which vulnerable library is included on the classpath. So in this sense, even information about the classpath is sensitive information. was (Author: igiguere): [~gerlowskija] The full stack trace in the error response can be a vulnerability. As explained by our application security assessment team: {quote} Detailed technical error messages can allow an attacker to gain information about the application and database that could be used to conduct an attack. This information could include the names of database tables and columns, the structure of database queries, method names, configuration details, etc. {quote} So, OK, no database here. But the basic idea is that the stack trace contains too much information for a response to the outside world. 
Stack traces are for logs, for developers. It falls into item #6 in the OWASP top 10 https://owasp.org/www-project-top-ten/ "verbose error messages containing sensitive information" So, either each an every error message needs to be cleaned-up individually, which is error-prone, or, we don't display any details to the outside world. Because the stack trace lists all classes and methods, a hacker can determine which vulnerable library is included on the classpath. So in this sense, even information about the classpath is sensitive information. > Suppress stack trace in Query response. > --- > > Key: SOLR-14886 > URL: https://issues.apache.org/jira/browse/SOLR-14886 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6.2 >Reporter: Vrinda Davda >Priority: Minor > > Currently there is no way to suppress the stack trace in solr response when > it throws an exception, like when a client sends a badly formed query string, > or exception with status 500 It sends full stack trace in the response. > I would propose a configuration for error messages so that the stack trace is > not visible to avoid any sensitive information in the stack trace.
[jira] [Comment Edited] (SOLR-14886) Suppress stack trace in Query response.
[ https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854 ] Isabelle Giguere edited comment on SOLR-14886 at 12/15/20, 6:44 PM: [~gerlowskija] The full stack trace in the error response can be a vulnerability. As explained by our application security assessment team: {quote} Detailed technical error messages can allow an attacker to gain information about the application and database that could be used to conduct an attack. This information could include the names of database tables and columns, the structure of database queries, method names, configuration details, etc. {quote} So, OK, no database here. But the basic idea is that the stack trace contains too much information for a response to the outside world. Stack traces are for logs, for developers. It falls into item #6 in the OWASP top 10 https://owasp.org/www-project-top-ten/ "verbose error messages containing sensitive information" So, either each an every error message needs to be cleaned-up individually, which is error-prone, or, we don't display any details to the outside world. Because the stack trace lists all classes and methods, a hacker can determine which vulnerable library is included on the classpath. So in this sense, even information about the classpath is sensitive information. was (Author: igiguere): [~gerlowskija] The full stack trace in the error response can be a vulnerability. As explained by our application security assessment team: {quote} Detailed technical error messages can allow an attacker to gain information about the application and database that could be used to conduct an attack. This information could include the names of database tables and columns, the structure of database queries, method names, configuration details, etc. {quote} So, OK, no database here. But the basic idea is that the stack trace contains too much information for a response. > Suppress stack trace in Query response. 
> --- > > Key: SOLR-14886 > URL: https://issues.apache.org/jira/browse/SOLR-14886 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6.2 >Reporter: Vrinda Davda >Priority: Minor > > Currently there is no way to suppress the stack trace in solr response when > it throws an exception, like when a client sends a badly formed query string, > or exception with status 500 It sends full stack trace in the response. > I would propose a configuration for error messages so that the stack trace is > not visible to avoid any sensitive information in the stack trace.
[jira] [Commented] (SOLR-14886) Suppress stack trace in Query response.
[ https://issues.apache.org/jira/browse/SOLR-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249854#comment-17249854 ] Isabelle Giguere commented on SOLR-14886: - [~gerlowskija] The full stack trace in the error response can be a vulnerability. As explained by our application security assessment team: {quote} Detailed technical error messages can allow an attacker to gain information about the application and database that could be used to conduct an attack. This information could include the names of database tables and columns, the structure of database queries, method names, configuration details, etc. {quote} So, OK, no database here. But the basic idea is that the stack trace contains too much information for a response. > Suppress stack trace in Query response. > --- > > Key: SOLR-14886 > URL: https://issues.apache.org/jira/browse/SOLR-14886 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.6.2 >Reporter: Vrinda Davda >Priority: Minor > > Currently there is no way to suppress the stack trace in solr response when > it throws an exception, like when a client sends a badly formed query string, > or exception with status 500 It sends full stack trace in the response. > I would propose a configuration for error messages so that the stack trace is > not visible to avoid any sensitive information in the stack trace.
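The suppression proposed in SOLR-14886 is commonly implemented by logging the full stack trace server-side and returning only a generic message plus a correlation id to the client, so no class names, method names, or library versions leak in the response. Below is a hypothetical sketch of that pattern (not actual Solr code; names are invented):

```java
import java.util.UUID;

// Hypothetical sketch: sanitize an error before it reaches the client.
// The full throwable stays on the server side (logs only).
class SanitizedError {
    final String clientMessage;
    final String errorId;

    private SanitizedError(String clientMessage, String errorId) {
        this.clientMessage = clientMessage;
        this.errorId = errorId;
    }

    static SanitizedError from(Throwable t) {
        String id = UUID.randomUUID().toString();
        // In a real handler the full trace would be written to the server
        // log here, keyed by the id, e.g.:
        //   log.error("request failed [{}]", id, t);
        // Only the opaque id goes back to the caller.
        return new SanitizedError("Internal error. Reference: " + id, id);
    }
}
```

The correlation id lets an operator find the full trace in the logs without exposing any of the classpath information the comment above warns about.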
[jira] [Comment Edited] (SOLR-14034) remove deprecated min_rf references
[ https://issues.apache.org/jira/browse/SOLR-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249836#comment-17249836 ] Christine Poerschke edited comment on SOLR-14034 at 12/15/20, 6:19 PM: --- Hello everyone. I'm not very familiar with the {{min_rf}} parameter and what exactly its removal here will entail but yes based on [https://github.com/apache/lucene-solr/search?q=min_rf] and [https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results this JIRA item remains available to be worked on further – thanks to everyone who already started analysing the code as per above. Perhaps a _draft_ and _initially partially scoped_ pull request could provide a way to continue to move forward here i.e. if the SOLR-14034 ticket number is included in its title then it will get automatically linked here i.e. we can all easily find it then and perhaps then incrementally the possibilities for the above questions would become clearer? Hope that helps. was (Author: cpoerschke): Hello everyone. I'm not very familiar with the {{min_rf}} parameter and what exactly its removal here will entail but yes based on [https://github.com/apache/lucene-solr/search?q=min_rf] and [https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results this JIRA item remains available to be worked on. Perhaps a _draft_ and _initially partially scoped_ pull request provide a way forward here i.e. if the SOLR-14034 ticket number is included in its title then it will get automatically linked here i.e. we can all easily find it then and perhaps then incrementally the possibilities for the above questions would become clearer? Hope that helps. 
> remove deprecated min_rf references > --- > > Key: SOLR-14034 > URL: https://issues.apache.org/jira/browse/SOLR-14034 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Priority: Blocker > Labels: newdev > Fix For: master (9.0) > > > * {{min_rf}} support was added under SOLR-5468 in version 4.9 > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.9.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L50) > and deprecated under SOLR-12767 in version 7.6 > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.6.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L57-L61) > * http://lucene.apache.org/solr/7_6_0/changes/Changes.html and > https://lucene.apache.org/solr/guide/8_0/major-changes-in-solr-8.html#solr-7-6 > both clearly mention the deprecation > This ticket is to fully remove {{min_rf}} references in code, tests and > documentation.
[GitHub] [lucene-solr] cpoerschke closed pull request #1705: factor out static LTRQParserPlugin.newLTRScoringQuery(...) method
cpoerschke closed pull request #1705: URL: https://github.com/apache/lucene-solr/pull/1705 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (SOLR-14034) remove deprecated min_rf references
[ https://issues.apache.org/jira/browse/SOLR-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249836#comment-17249836 ] Christine Poerschke commented on SOLR-14034: Hello everyone. I'm not very familiar with the {{min_rf}} parameter and what exactly its removal here will entail but yes based on [https://github.com/apache/lucene-solr/search?q=min_rf] and [https://github.com/apache/lucene-solr/search?q=MIN_REPFACT] search results this JIRA item remains available to be worked on. Perhaps a _draft_ and _initially partially scoped_ pull request provide a way forward here i.e. if the SOLR-14034 ticket number is included in its title then it will get automatically linked here i.e. we can all easily find it then and perhaps then incrementally the possibilities for the above questions would become clearer? Hope that helps. > remove deprecated min_rf references > --- > > Key: SOLR-14034 > URL: https://issues.apache.org/jira/browse/SOLR-14034 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Priority: Blocker > Labels: newdev > Fix For: master (9.0) > > > * {{min_rf}} support was added under SOLR-5468 in version 4.9 > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.9.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L50) > and deprecated under SOLR-12767 in version 7.6 > (https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.6.0/solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequest.java#L57-L61) > * http://lucene.apache.org/solr/7_6_0/changes/Changes.html and > https://lucene.apache.org/solr/guide/8_0/major-changes-in-solr-8.html#solr-7-6 > both clearly mention the deprecation > This ticket is to fully remove {{min_rf}} references in code, tests and > documentation.
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership
jbampton commented on a change in pull request #2120: URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r543515585 ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/cloud/ShardTerms.java ## @@ -102,16 +101,16 @@ public ShardTerms increaseTerms(String leader, Set replicasNeedingRecove if (replicasNeedingRecovery.contains(key)) foundReplicasInLowerTerms = true; if (Objects.equals(entry.getValue(), leaderTerm)) { if(skipIncreaseTermOf(key, replicasNeedingRecovery)) { Review comment: ```suggestion if (skipIncreaseTermOf(key, replicasNeedingRecovery)) { ```
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
jbampton commented on a change in pull request #2121: URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r543503718 ## File path: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java ## @@ -143,6 +147,15 @@ public SolrInputDocument merge(final SolrInputDocument fromDoc, SolrInputDocumen return toDoc; } + private static String getID(SolrInputDocument doc, IndexSchema schema) { +String id = ""; +SchemaField sf = schema.getUniqueKeyField(); +if( sf != null ) { Review comment: ```suggestion if ( sf != null ) { ```
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2122: SOLR-14950: Fix copyfield regeneration with explicit src/dest matching dyn rule
jbampton commented on a change in pull request #2122: URL: https://github.com/apache/lucene-solr/pull/2122#discussion_r543500974 ## File path: solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java ## @@ -773,6 +773,108 @@ public void testCopyFieldRules() throws Exception { assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty()); } + @SuppressWarnings({"rawtypes"}) + public void testCopyFieldWithReplace() throws Exception { +RestTestHarness harness = restTestHarness; +String newFieldName = "test_solr_14950"; + +// add-field-type +String addFieldTypeAnalyzer = "{\n" + +"'add-field-type' : {" + +"'name' : 'myNewTextField',\n" + +"'class':'solr.TextField',\n" + Review comment: ```suggestion "'class' : 'solr.TextField',\n" + ``` ## File path: solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java ## @@ -773,6 +773,108 @@ public void testCopyFieldRules() throws Exception { assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty()); } + @SuppressWarnings({"rawtypes"}) + public void testCopyFieldWithReplace() throws Exception { +RestTestHarness harness = restTestHarness; +String newFieldName = "test_solr_14950"; + +// add-field-type +String addFieldTypeAnalyzer = "{\n" + +"'add-field-type' : {" + +"'name' : 'myNewTextField',\n" + +"'class':'solr.TextField',\n" + +"'analyzer' : {\n" + +"'charFilters' : [{\n" + +"'name':'patternReplace',\n" + +"'replacement':'$1$1',\n" + +"'pattern':'([a-zA-Z])1+'\n" + +"}],\n" + +"'tokenizer' : { 'name':'whitespace' },\n" + +"'filters' : [{ 'name':'asciiFolding' }]\n" + +"}\n"+ +"}}"; + +String response = restTestHarness.post("/schema", json(addFieldTypeAnalyzer)); +Map map = (Map) fromJSONString(response); +assertNull(response, map.get("error")); +map = getObj(harness, "myNewTextField", "fieldTypes"); +assertNotNull("'myNewTextField' field type does not exist in the schema", map); + +// add-field +String payload = "{\n" + +"'add-field' : {\n" + +" 'name':'" + newFieldName + "',\n" 
+ +" 'type':'myNewTextField',\n" + +" 'stored':true,\n" + +" 'indexed':true\n" + +" }\n" + +"}"; + +response = harness.post("/schema", json(payload)); + +map = (Map) fromJSONString(response); +assertNull(response, map.get("error")); + +Map m = getObj(harness, newFieldName, "fields"); +assertNotNull("'"+ newFieldName + "' field does not exist in the schema", m); + +// add copy-field with explicit source and destination +List l = getSourceCopyFields(harness, "bleh_s"); +assertTrue("'bleh_s' copyField rule exists in the schema", l.isEmpty()); + +payload = "{\n" + +" 'add-copy-field' : {\n" + +" 'source' :'bleh_s',\n" + +" 'dest':'"+ newFieldName + "'\n" + +" }\n" + +" }\n"; +response = harness.post("/schema", json(payload)); + +map = (Map) fromJSONString(response); +assertNull(response, map.get("error")); + +l = getSourceCopyFields(harness, "bleh_s"); +assertFalse("'bleh_s' copyField rule doesn't exist", l.isEmpty()); +assertEquals("bleh_s", ((Map)l.get(0)).get("source")); +assertEquals(newFieldName, ((Map)l.get(0)).get("dest")); + +// replace-field-type +String replaceFieldTypeAnalyzer = "{\n" + +"'replace-field-type' : {" + +"'name' : 'myNewTextField',\n" + +"'class':'solr.TextField',\n" + Review comment: ```suggestion "'class' : 'solr.TextField',\n" + ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 4:32 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. The fl for drill specifies the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list. This is quite clean and ties together all the fields needed for export with the expression wrapping the input function. was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. The fl for drill specifies the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list. This is quit clean and ties together all the fields needed for export with the expression wrapping the input function. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One
[jira] [Deleted] (SOLR-15050) Wiring for Moskvich 2140
[ https://issues.apache.org/jira/browse/SOLR-15050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev deleted SOLR-15050: > Wiring for Moskvich 2140 > - > > Key: SOLR-15050 > URL: https://issues.apache.org/jira/browse/SOLR-15050 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Environment: ||Basics|| > |Make|Moskvich| > |Model|2140| > |Type|Automotive wiring| > |Manufacturer |Wire& aвтопроводка| > |Country of manufacture|Ukraine| > |Part type|Original| > |Vehicle type|Passenger car| > |Part code|1082| > |Condition|New| > https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html >Reporter: Vladimir >Priority: Major > Labels: Wiring > > !2365641414_w640_h640_2365641414.jpg|width=179,height=179! > Wiring for the Moskvich 2140 > 1. Main wiring harness. > 2. Rear wiring harness. > [https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html] > Wiring from the Kamianets-Podilskyi manufacturer is always a guarantee of quality, > affordable prices, and fast delivery. > We are happy to work with private individuals and entrepreneurs as well as with > businesses. > ** -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249765#comment-17249765 ] Timothy Potter commented on SOLR-15036: --- Confirmed, it works nicely now! Thanks for your help Joel {code} {count(*)=6, a_i=0, max(max(a_d))=2.2515625018914305, min(min(a_d))=-0.5859583807765252, sum(sum(a_d))=5.894460990302006, wsum(avg(a_d), count(*))=0.9824101650503342} {count(*)=4, a_i=1, max(max(a_d))=3.338305310115201, min(min(a_d))=0.03050220236482515, sum(sum(a_d))=12.517492417715335, wsum(avg(a_d), count(*))=2.086248736285889} {count(*)=4, a_i=2, max(max(a_d))=4.832815828279073, min(min(a_d))=3.16905458918893, sum(sum(a_d))=24.076139429000165, wsum(avg(a_d), count(*))=4.012689904833361} {count(*)=4, a_i=3, max(max(a_d))=5.66831997419713, min(min(a_d))=2.902262184046103, sum(sum(a_d))=22.58303980377591, wsum(avg(a_d), count(*))=3.763839967295984} {count(*)=4, a_i=4, max(max(a_d))=6.531585917691583, min(min(a_d))=2.6395698661907963, sum(sum(a_d))=28.243748570490624, wsum(avg(a_d), count(*))=4.707291428415103} {count(*)=5, a_i=5, max(max(a_d))=7.555382540979672, min(min(a_d))=4.808772939476107, sum(sum(a_d))=37.88196903407075, wsum(avg(a_d), count(*))=6.313661505678459} {count(*)=5, a_i=6, max(max(a_d))=8.416136012729918, min(min(a_d))=5.422492404700898, sum(sum(a_d))=39.25679972070782, wsum(avg(a_d), count(*))=6.542799953451303} {count(*)=5, a_i=7, max(max(a_d))=8.667999236934058, min(min(a_d))=6.934577412906803, sum(sum(a_d))=46.7622185952807, wsum(avg(a_d), count(*))=7.793703099213451} {count(*)=5, a_i=8, max(max(a_d))=9.566181963643201, min(min(a_d))=7.4397380388592556, sum(sum(a_d))=53.296172957938325, wsum(avg(a_d), count(*))=8.88269549298972} {count(*)=4, a_i=9, max(max(a_d))=12.251349466753346, min(min(a_d))=9.232427215193514, sum(sum(a_d))=63.46244550204135, wsum(avg(a_d), count(*))=10.577074250340223} {code} > Use plist automatically for executing a facet expression against a collection > alias backed 
by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. 
Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc",
[jira] [Updated] (SOLR-15050) Wiring for Moskvich 2140
[ https://issues.apache.org/jira/browse/SOLR-15050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir updated SOLR-15050: Description: !2365641414_w640_h640_2365641414.jpg|width=179,height=179! Wiring for the Moskvich 2140 1. Main wiring harness. 2. Rear wiring harness. [https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html] Wiring from the Kamianets-Podilskyi manufacturer is always a guarantee of quality, affordable prices, and fast delivery. We are happy to work with private individuals and entrepreneurs as well as with businesses. ** was: [link the title|https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.htmlWiring for the Moskvich 2140 1. Main wiring harness. 2. Rear wiring harness. Wiring from the Kamianets-Podilskyi manufacturer is always a guarantee of quality, affordable prices, and fast delivery. We are happy to work with private individuals and entrepreneurs as well as with businesses. ** > Wiring for Moskvich 2140 > - > > Key: SOLR-15050 > URL: https://issues.apache.org/jira/browse/SOLR-15050 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.7 > Environment: ||Basics|| > |Make|Moskvich| > |Model|2140| > |Type|Automotive wiring| > |Manufacturer |Wire& aвтопроводка| > |Country of manufacture|Ukraine| > |Part type|Original| > |Vehicle type|Passenger car| > |Part code|1082| > |Condition|New| > https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html >Reporter: Vladimir >Priority: Major > Labels: Wiring > Attachments: 2365641414_w640_h640_2365641414.jpg > > > !2365641414_w640_h640_2365641414.jpg|width=179,height=179! > Wiring for the Moskvich 2140 > 1. Main wiring harness. > 2. Rear wiring harness. > [https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html] > Wiring from the Kamianets-Podilskyi manufacturer is always a guarantee of quality, > affordable prices, and fast delivery. > We are happy to work with private individuals and entrepreneurs as well as with > businesses. > **
[GitHub] [lucene-solr] mikemccand commented on pull request #2080: LUCENE-8947: Skip field length accumulation when norms are disabled
mikemccand commented on pull request #2080: URL: https://github.com/apache/lucene-solr/pull/2080#issuecomment-745375768 > > > Hmm, but I think sumTotalTermFreq, which is per field sum of all totalTermFreq across all terms in that field, could overflow long even today, in an adversarial case. And it would not be detected by Lucene... > > I don't think so. I like to think of this as "number of tokens" in the corpus. Because each doc is limited to Integer.MAX_VALUE and there can only be Integer.MAX_VALUE docs, sumTotalTermFreq can't overflow. And totalTermFreq is <= sumTotalTermFreq (it would be equal, in a degraded case where all your documents only have a single word repeated many times). Ahh you're right ... no more than `Integer.MAX_VALUE` tokens in one document, OK. > > How about decoupling these two problems? First, let's fix the aggregation of totalTermFreq and sumTotalTermFreq to explicitly catch any overflow instead of just doing the dangerous += today: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/PushPostingsWriterBase.java#L142 and https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java#L915? I.e. switch these accumulations to Math.addExact. This will explicitly catch long overflow for either of these stats. > > I don't think this is correct. You wouldn't trip this until after merge, far after you've already overflowed the values and caused broken search results (assuming you have more than one segment). Hrmph, also correct, boo. Alright I guess there is nothing we can fix here ... applications simply must not create > `Integer.MAX_VALUE` term frequencies in one doc/field. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
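The thread above weighs replacing the raw `+=` accumulation of `totalTermFreq`/`sumTotalTermFreq` with `Math.addExact`, which throws instead of silently wrapping on long overflow. A minimal standalone sketch of the difference between the two accumulation styles (method names here are illustrative, not Lucene's actual code):

```java
class StatAccumulator {
    // Unchecked accumulation, like the raw += in the postings writers:
    // on overflow the running sum silently wraps to a negative value.
    static long addUnchecked(long sum, long termFreq) {
        return sum + termFreq;
    }

    // Checked accumulation: Math.addExact throws ArithmeticException on
    // long overflow, making the corruption visible where it happens.
    static long addChecked(long sum, long termFreq) {
        return Math.addExact(sum, termFreq);
    }

    // Convenience: report whether adding termFreq to sum would overflow.
    static boolean overflows(long sum, long termFreq) {
        try {
            Math.addExact(sum, termFreq);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

As the discussion concludes, a check placed only in the merge path would still fire too late: per-segment stats can have wrapped (and broken search results) long before segments are merged.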
[jira] [Created] (SOLR-15050) Проводка Москвич 2140
Vladimir created SOLR-15050: --- Summary: Проводка Москвич 2140 Key: SOLR-15050 URL: https://issues.apache.org/jira/browse/SOLR-15050 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 8.7 Environment: ||Основные|| |Марка|Москвич| |Модель|2140| |Тип|Автомобильные провода| |Производитель |Wire& aвтопроводка| |Страна производитель|Украина| |Тип запчасти|Оригинал| |Тип техники|Легковой автомобиль| |Код запчасти|1082| |Состояние|Новое| https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.html Reporter: Vladimir Attachments: 2365641414_w640_h640_2365641414.jpg [привязать заголовок|https://avto-pro.com.ua/p1136095013-provodka-moskvich-2140.htmlПроводка на Москвич 2140 1. Жгут проводов основной. 2. Жгут проводов задняя часть. Проводка от Каменец-Подольского производителя, это всегда гарантия качества, доступные цены и быстрая доставка. Будем рады сотрудничать, как с частными лицами, предпринимателями так и с предприятиями. ** -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2132: SOLR-15036: auto-select / rollup / sort / plist over facet expression when using a collection alias with multiple collecti
jbampton commented on a change in pull request #2132: URL: https://github.com/apache/lucene-solr/pull/2132#discussion_r543443487 ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java ## @@ -351,25 +361,26 @@ public String getCollection() { FieldComparator[] comps = new FieldComparator[sorts.length]; for(int i=0; i 1) { - return new MultipleFieldComparator(bucketSorts); +return (bucketSorts.length > 1) ? new MultipleFieldComparator(bucketSorts) : bucketSorts[0]; + } + + @Override + public TupleStream[] parallelize(List partitions) throws IOException { +TupleStream[] parallelStreams = new TupleStream[partitions.size()]; + +// prefer a different node for each collection if possible as we don't want the same remote node +// being the coordinator if possible, otherwise, our plist isn't distributing the load as well +final Set preferredNodes = new HashSet<>(Math.max((int) (parallelStreams.length/.75f) + 1, 16)); + +for (int c=0; c < parallelStreams.length; c++) { Review comment: ```suggestion for (int c = 0; c < parallelStreams.length; c++) { ``` ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/metrics/MinMetric.java ## @@ -87,7 +87,7 @@ public void update(Tuple tuple) { if(l < longMin) { longMin = l; } -} else { +} else if(o instanceof Long) { Review comment: ```suggestion } else if (o instanceof Long) { ``` ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java ## @@ -351,25 +361,26 @@ public String getCollection() { FieldComparator[] comps = new FieldComparator[sorts.length]; for(int i=0; ihttp://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.client.solrj.io.stream; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.Optional; + +import org.apache.solr.client.solrj.io.comp.StreamComparator; +import org.apache.solr.client.solrj.io.stream.metrics.CountMetric; +import org.apache.solr.client.solrj.io.stream.metrics.MaxMetric; +import org.apache.solr.client.solrj.io.stream.metrics.MeanMetric; +import org.apache.solr.client.solrj.io.stream.metrics.Metric; +import org.apache.solr.client.solrj.io.stream.metrics.MinMetric; +import org.apache.solr.client.solrj.io.stream.metrics.SumMetric; +import org.apache.solr.client.solrj.io.stream.metrics.WeightedSumMetric; + +/** + * Indicates the underlying stream source supports parallelizing metrics computation across collections + * using a rollup of metrics from each collection. + */ +public interface ParallelMetricsRollup { + TupleStream[] parallelize(List partitions) throws IOException; + StreamComparator getParallelListSortOrder() throws IOException; + RollupStream getRollupStream(SortStream sortStream, Metric[] rollupMetrics) throws IOException; + Map getRollupSelectFields(Metric[] rollupMetrics); + + default Optional openParallelStream(StreamContext context, List partitions, Metric[] metrics) throws IOException { +Optional maybeRollupMetrics = getRollupMetrics(metrics); +if (maybeRollupMetrics.isEmpty()) + return Optional.empty(); // some metric is incompatible with doing a rollup over the plist results + +TupleStream[] parallelStreams = parallelize(partitions); + +// the tuples from each plist need to be sorted using the same order to do a rollup +Metric[] rollupMetrics = maybeRollupMetrics.get(); +StreamComparator comparator = getParallelListSortOrder(); +SortStream sortStream = new SortStream(new ParallelListStream(parallelStreams), comparator); +RollupStream rollup = getRollupStream(sortStream, rollupMetrics); +SelectStream select = new SelectStream(rollup, 
getRollupSelectFields(rollupMetrics)); +select.setStreamContext(context); +select.open(); + +return Optional.of(select); + } + + default Optional getRollupMetrics(Metric[] metrics) { +Metric[] rollup = new Metric[metrics.length]; +CountMetric count = null; +for (int m=0; m < rollup.length; m++) { Review comment: ```suggestion for (int m = 0; m < rollup.length; m++) { ``` ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/FacetStream.java ## @@ -351,25 +361,26 @@ public String getCollection() { FieldComparator[] comps = new FieldComparator[sorts.length]; for(int i=0; ihttp://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed
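The diff above pre-sizes the `preferredNodes` set with `Math.max((int) (parallelStreams.length/.75f) + 1, 16)`. That expression sizes a `HashSet` for its default 0.75 load factor so it never rehashes while up to N entries are added, with 16 (the JDK's default initial capacity) as a floor. A small sketch of just that capacity calculation:

```java
class SetCapacity {
    // Capacity must exceed expectedSize / loadFactor (0.75 by default)
    // so a HashSet/HashMap never resizes while being filled with
    // expectedSize entries; 16 is the JDK default capacity, used as a floor.
    static int capacityFor(int expectedSize) {
        return Math.max((int) (expectedSize / .75f) + 1, 16);
    }
}
```

For example, holding 100 collections without a rehash needs a capacity of 134, while small counts just fall back to the default of 16.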
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249747#comment-17249747 ] Timothy Potter commented on SOLR-15036: --- Oh darn, I should have spotted that! Sorry for the noise there [~jbernste] ... seems like we could improve the error handling in the drill code to barf if the user is trying to compute metrics for fields not in the fl? That could help with silly mistakes like this ... > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > Time Spent: 20m > Remaining Estimate: 0h > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. 
For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. 
the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. > While this plist approach is doable, it’s a pain for users to have to create > the rollup / sort over plist expression for collection aliases. After all, > aliases are supposed to hide these types of complexities from client > applications! > The point of this ticket is to investigate the feasibility of auto-wrapping > the facet expression with a rollup
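The description stresses that per-collection averages cannot simply be averaged again; the rollup must weight each collection's average by its tuple count, which is what a weighted-average metric over `(avg, count)` pairs computes. A hedged sketch of that rollup arithmetic, independent of Solr's actual `WeightedSumMetric` implementation:

```java
class WeightedAvg {
    // Roll up per-collection averages into one global average by weighting
    // each collection's average with its row count. Reconstructing each
    // collection's sum (avg * count) and dividing by the total count gives
    // the exact global average.
    static double rollup(double[] avgs, long[] counts) {
        double weightedSum = 0;
        long total = 0;
        for (int i = 0; i < avgs.length; i++) {
            weightedSum += avgs[i] * counts[i];
            total += counts[i];
        }
        return weightedSum / total;
    }
}
```

For instance, averages 1.0 (over 1 row) and 3.0 (over 3 rows) roll up to 2.5, whereas naively averaging the two averages would wrongly give 2.0.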
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2134: SOLR-15038: Add elevateDocsWithoutMatchingQ and onlyElevatedReprese…
jbampton commented on a change in pull request #2134: URL: https://github.com/apache/lucene-solr/pull/2134#discussion_r543432124 ## File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java ## @@ -689,15 +693,18 @@ public void finish() throws IOException { //Handle the boosted docs. if(this.boostOrds != null) { -int s = boostOrds.size(); -for(int i=0; i -1) { -//Remove any group heads that are in the same groups as boosted documents. -ords.remove(ord); +// representative is already part of the collapsedset. +if(!onlyElevatedRepresentativeVisible) { Review comment: ```suggestion if (!onlyElevatedRepresentativeVisible) { ``` ## File path: solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java ## @@ -504,25 +506,27 @@ private void setQuery(ResponseBuilder rb, Elevation elevation) { // Change the query to insert forced documents SolrParams params = rb.req.getParams(); -if (params.getBool(QueryElevationParams.EXCLUSIVE, false)) { - // We only want these elevated results - rb.setQuery(new BoostQuery(elevation.includeQuery, 0f)); -} else { - BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder(); - queryBuilder.add(rb.getQuery(), BooleanClause.Occur.SHOULD); - queryBuilder.add(new BoostQuery(elevation.includeQuery, 0f), BooleanClause.Occur.SHOULD); - if (elevation.excludeQueries != null) { -if (params.getBool(QueryElevationParams.MARK_EXCLUDES, false)) { - // We are only going to mark items as excluded, not actually exclude them. - // This works with the EditorialMarkerFactory. 
- rb.req.getContext().put(EXCLUDED, elevation.excludedIds); -} else { - for (TermQuery tq : elevation.excludeQueries) { -queryBuilder.add(tq, BooleanClause.Occur.MUST_NOT); +if(params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) { Review comment: ```suggestion if (params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) { ``` ## File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java ## @@ -1030,25 +1045,25 @@ public OrdFieldValueCollector(int maxDoc, this.needsScores4Collapsing = needsScores4Collapsing; this.needsScores = needsScores; if (null != sortSpec) { -this.collapseStrategy = new OrdSortSpecStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, boostDocs, sortSpec, searcher, collapseValues); +this.collapseStrategy = new OrdSortSpecStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, boostDocs, sortSpec, searcher, onlyElevatedRepresentativeVisible); } else if (funcQuery != null) { -this.collapseStrategy = new OrdValueSourceStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, boostDocs, funcQuery, searcher, collapseValues); +this.collapseStrategy = new OrdValueSourceStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores4Collapsing, this.needsScores, boostDocs, funcQuery, searcher, onlyElevatedRepresentativeVisible); } else { NumberType numType = fieldType.getNumberType(); if (null == numType) { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "min/max must be either Int/Long/Float based field types"); } switch (numType) { case INTEGER: { -this.collapseStrategy = new OrdIntStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues); +this.collapseStrategy = new OrdIntStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, onlyElevatedRepresentativeVisible); break; } 
case FLOAT: { -this.collapseStrategy = new OrdFloatStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues); +this.collapseStrategy = new OrdFloatStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, onlyElevatedRepresentativeVisible); break; } case LONG: { -this.collapseStrategy = new OrdLongStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, collapseValues); +this.collapseStrategy = new OrdLongStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, onlyElevatedRepresentativeVisible); Review comment: ```suggestion this.collapseStrategy = new OrdLongStrategy(maxDoc, nullPolicy, valueCount, groupHeadSelector, this.needsScores, boostDocs, onlyElevatedRepresentativeVisible);
[jira] [Commented] (LUCENE-9021) QueryParser should avoid creating a LookaheadSuccess(Error) object with every instance
[ https://issues.apache.org/jira/browse/LUCENE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249741#comment-17249741 ] Mikhail Khludnev commented on LUCENE-9021: -- Thanks, [~pbruski_]. It seems we're done here. I'm wondering why javacc can't optimize it itself. > QueryParser should avoid creating a LookaheadSuccess(Error) object with > every instance > --- > > Key: LUCENE-9021 > URL: https://issues.apache.org/jira/browse/LUCENE-9021 > Project: Lucene - Core > Issue Type: Bug >Reporter: Przemek Bruski >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.8 > > Attachments: LUCENE-9021.patch > > Time Spent: 1h > Remaining Estimate: 0h > > This is basically the same as > https://issues.apache.org/jira/browse/SOLR-11242 , but for Lucene QueryParser
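The fix referenced here (mirroring SOLR-11242) stops the javacc-generated parser from allocating a fresh `LookaheadSuccess` error, together with its captured stack trace, for every parser instance. Since the exception is purely a control-flow signal carrying no state, one shared instance suffices. A hedged sketch of the pattern, not the actual generated parser code:

```java
class LookaheadDemo {
    // Control-flow signal thrown to unwind a successful lookahead.
    // It carries no information, so all parsers can share one instance.
    static final class LookaheadSuccess extends Error {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip the expensive stack-trace capture entirely
        }
    }

    // One shared, stackless instance instead of an allocation per parser.
    static final LookaheadSuccess JJ_LS = new LookaheadSuccess();

    static Error signal() {
        return JJ_LS;
    }
}
```

Suppressing `fillInStackTrace` matters as much as the sharing: capturing a stack trace dominates the cost of constructing a throwable, and the trace is never read when the error is only used as a signal.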
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2135: SOLR-15038: Add elevateDocsWithoutMatchingQ and onlyElevatedReprese…
jbampton commented on a change in pull request #2135: URL: https://github.com/apache/lucene-solr/pull/2135#discussion_r543423038 ## File path: solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java ## @@ -504,25 +506,27 @@ private void setQuery(ResponseBuilder rb, Elevation elevation) { // Change the query to insert forced documents SolrParams params = rb.req.getParams(); -if (params.getBool(QueryElevationParams.EXCLUSIVE, false)) { - // We only want these elevated results - rb.setQuery(new BoostQuery(elevation.includeQuery, 0f)); -} else { - BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder(); - queryBuilder.add(rb.getQuery(), BooleanClause.Occur.SHOULD); - queryBuilder.add(new BoostQuery(elevation.includeQuery, 0f), BooleanClause.Occur.SHOULD); - if (elevation.excludeQueries != null) { -if (params.getBool(QueryElevationParams.MARK_EXCLUDES, false)) { - // We are only going to mark items as excluded, not actually exclude them. - // This works with the EditorialMarkerFactory. 
- rb.req.getContext().put(EXCLUDED, elevation.excludedIds); -} else { - for (TermQuery tq : elevation.excludeQueries) { -queryBuilder.add(tq, BooleanClause.Occur.MUST_NOT); +if(params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) { Review comment: ```suggestion if (params.getBool(ELEVATE_DOCS_WITHOUT_MATCHING_Q, true)) { ``` ## File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java ## @@ -569,15 +570,18 @@ public int docID() { private IntArrayList boostDocs; private MergeBoost mergeBoost; private boolean boosts; +private boolean onlyElevatedRepresentativeVisible; public OrdScoreCollector(int maxDoc, int segments, DocValuesProducer collapseValuesProducer, int nullPolicy, IntIntHashMap boostDocsMap, - IndexSearcher searcher) throws IOException { + IndexSearcher searcher, + boolean onlyElevatedRepresentativeVisible) throws IOException { this.maxDoc = maxDoc; this.contexts = new LeafReaderContext[segments]; + this.onlyElevatedRepresentativeVisible = onlyElevatedRepresentativeVisible; List con = searcher.getTopReaderContext().leaves(); for(int i=0; i con = searcher.getTopReaderContext().leaves(); for(int i=0; i
[jira] [Updated] (LUCENE-9021) QueryParser should avoid creating a LookaheadSuccess(Error) object with every instance
[ https://issues.apache.org/jira/browse/LUCENE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated LUCENE-9021: - Fix Version/s: 8.8 Assignee: Mikhail Khludnev Resolution: Fixed Status: Resolved (was: Patch Available) > QueryParser should avoid creating a LookaheadSuccess(Error) object with > every instance > --- > > Key: LUCENE-9021 > URL: https://issues.apache.org/jira/browse/LUCENE-9021 > Project: Lucene - Core > Issue Type: Bug >Reporter: Przemek Bruski >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.8 > > Attachments: LUCENE-9021.patch > > Time Spent: 1h > Remaining Estimate: 0h > > This is basically the same as > https://issues.apache.org/jira/browse/SOLR-11242 , but for Lucene QueryParser
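[Editorial note: a minimal sketch of the allocation-avoidance pattern discussed in LUCENE-9021. The javacc-generated parser allocates a fresh {{LookaheadSuccess}} Error per parser instance; the usual fix is one shared, stack-trace-free sentinel. Class and method names here are hypothetical, not the actual generated QueryParser code.]

```java
// Sketch: a control-flow sentinel Error that is created once and reused,
// instead of being allocated with every parser instance.
final class LookaheadDemo {

  // Disabling suppression and stack-trace capture makes a single shared
  // instance both cheap and safe to rethrow from any thread.
  private static final class LookaheadSuccess extends Error {
    LookaheadSuccess() {
      // message=null, cause=null, enableSuppression=false, writableStackTrace=false
      super(null, null, false, false);
    }
  }

  private static final LookaheadSuccess LOOKAHEAD_SUCCESS = new LookaheadSuccess();

  // Hypothetical lookahead check: throws the shared sentinel on a match,
  // mirroring how the generated jj_3* methods signal success.
  static boolean lookahead(boolean matches) {
    try {
      if (matches) {
        throw LOOKAHEAD_SUCCESS; // reuse, never allocate
      }
      return false;
    } catch (LookaheadSuccess ls) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(lookahead(true));
    System.out.println(lookahead(false));
  }
}
```
The key detail is the four-argument {{Throwable}} constructor (Java 7+): with {{writableStackTrace=false}} the JVM skips the expensive stack walk, so the singleton costs nothing after class initialization.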
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shar
jbampton commented on a change in pull request #2137: URL: https://github.com/apache/lucene-solr/pull/2137#discussion_r543419505 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java ## @@ -129,10 +130,16 @@ public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:48 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. The fl for drill specifies the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list. This is quite clean and ties together all the fields needed for export with the expression wrapping the input function. was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. The fl for drill specifies the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list.
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:46 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. The fl for drill specifies the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list. was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list.
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:33 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. Maybe we should change the syntax of drill so that the input() function takes field names as parameters and drill selects the export fl from this field list. was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields.
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:31 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields. was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein commented on SOLR-15036: --- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported field.s > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > Time Spent: 20m > Remaining Estimate: 0h > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. 
For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. 
However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. > While this plist approach is doable, it’s a pain for users to have to create > the rollup / sort over plist expression for collection aliases. After all, > aliases are supposed to hide these types of complexities from client > applications! > The point of this ticket is to investigate the feasibility of auto-wrapping > the facet expression with a rollup / sort / plist when the collection > argument is an alias with multiple collections; other stream sources will be > considered after facet is proven
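[Editorial note: the weighted-average caveat above can be illustrated with plain Java. The class and record names are hypothetical, not Solr or streaming-expression APIs; the point is only the arithmetic: given each collection's (avg, count) pair, reconstruct per-collection sums before dividing, rather than averaging the averages.]

```java
import java.util.List;

// Sketch: rolling up per-collection facet averages. naiveAvg() shows the
// wrong mean-of-means; weightedAvg() weights each average by its doc count.
final class WeightedAvgRollup {

  // One (avg, count) pair per collection for the same facet bucket.
  record Partial(double avg, long count) {}

  // Wrong: treats every collection's average as equally significant.
  static double naiveAvg(List<Partial> parts) {
    return parts.stream().mapToDouble(Partial::avg).average().orElse(Double.NaN);
  }

  // Right: avg * count recovers each collection's sum, so the global
  // average is totalSum / totalCount.
  static double weightedAvg(List<Partial> parts) {
    double sum = 0;
    long total = 0;
    for (Partial p : parts) {
      sum += p.avg() * p.count();
      total += p.count();
    }
    return sum / total;
  }

  public static void main(String[] args) {
    // coll1: avg 10.0 over 100 docs; coll2: avg 20.0 over 300 docs
    List<Partial> parts = List.of(new Partial(10.0, 100), new Partial(20.0, 300));
    System.out.println(naiveAvg(parts));    // 15.0 (wrong)
    System.out.println(weightedAvg(parts)); // 17.5 (7000 / 400)
  }
}
```
This is exactly why the rollup expression needs the per-collection count(*) alongside avg(a_d): the counts are the weights.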
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249692#comment-17249692 ] Joel Bernstein edited comment on SOLR-15036 at 12/15/20, 1:30 PM: -- The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported fields was (Author: joel.bernstein): The fl for drill needs to include the *a_d* field. Basically you're rolling up and aggregating from the exported field.s
[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure
[ https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249686#comment-17249686 ] ASF subversion and git services commented on LUCENE-9638: - Commit 3c9d355315434ff17a10cd073a0a04fa6a25c202 in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3c9d355 ] LUCENE-9638: fix simple text vector format fields list terminator > TestVectorValues.testIndexMultipleVectorFields reproducing test failure > --- > > Key: LUCENE-9638 > URL: https://issues.apache.org/jira/browse/LUCENE-9638 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Priority: Major > > I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but > then hit this failure, likely not related to that PR: > {noformat} > org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields > FAILED > org.apache.lucene.index.CorruptIndexException: SimpleText failure: > expected checksum line but got field-number 2 > (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) > offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, > buffers=2,097 bytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri]))) > at > __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0) > at > org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89) > at > org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.&lt;init&gt;(SimpleTextVectorReader.java:81) > at > org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43) > at > org.apache.lucene.index.SegmentCoreReaders.&lt;init&gt;(SegmentCoreReaders.java:144) > at org.apache.lucene.index.SegmentReader.&lt;init&gt;(SegmentReader.java:84) > at > org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171) > at > org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213) > at
> org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572) > at > org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105) > at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630) > at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474) > at > org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) > at > 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887) > at >
[jira] [Resolved] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure
[ https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-9638. - Resolution: Fixed > TestVectorValues.testIndexMultipleVectorFields reproducing test failure > --- > > Key: LUCENE-9638 > URL: https://issues.apache.org/jira/browse/LUCENE-9638 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Priority: Major
[jira] [Created] (LUCENE-9639) Add unit tests for SimpleTextVector format
Michael Sokolov created LUCENE-9639: --- Summary: Add unit tests for SimpleTextVector format Key: LUCENE-9639 URL: https://issues.apache.org/jira/browse/LUCENE-9639 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Michael Sokolov The other simple text field formats have unit tests; we should add tests for the simple text vector format as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure
[ https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249684#comment-17249684 ] Michael Sokolov commented on LUCENE-9638: - Hmm, well, we do not have any unit tests for the SimpleTextVector format; we probably should. I'll open a separate issue. > TestVectorValues.testIndexMultipleVectorFields reproducing test failure > --- > > Key: LUCENE-9638 > URL: https://issues.apache.org/jira/browse/LUCENE-9638 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Priority: Major
[jira] [Commented] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure
[ https://issues.apache.org/jira/browse/LUCENE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249683#comment-17249683 ] Michael Sokolov commented on LUCENE-9638: - Thanks, yes, it's writing an end marker after each field instead of after all the fields. I don't really see how this didn't fail earlier. Maybe the SimpleText codec is not tested very often? > TestVectorValues.testIndexMultipleVectorFields reproducing test failure > --- > > Key: LUCENE-9638 > URL: https://issues.apache.org/jira/browse/LUCENE-9638 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Priority: Major
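The bug diagnosed in the comment above — a per-field terminator written where a single trailing terminator was expected — is easy to model in miniature. The sketch below is a hypothetical, simplified stand-in for a SimpleText-style writer (the class, method, and marker names are invented for illustration, not Lucene's actual API): the reader expects exactly one END marker after all fields, so emitting it inside the per-field loop corrupts any segment with more than one vector field, which is why a single-field test never tripped it.

```java
import java.util.List;

// Hypothetical miniature of the SimpleText fields-list terminator bug:
// the reader expects exactly one END line after ALL fields.
public class FieldsListTerminator {

    // Buggy variant: writes END after each field, so a second field
    // appears where the reader expects the end of the file.
    static String writeBuggy(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (String f : fields) {
            out.append("field ").append(f).append('\n');
            out.append("END\n"); // bug: terminator inside the loop
        }
        return out.toString();
    }

    // Fixed variant: one END marker after the whole fields list.
    static String writeFixed(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (String f : fields) {
            out.append("field ").append(f).append('\n');
        }
        out.append("END\n"); // terminator once, after all fields
        return out.toString();
    }

    // A reader that fails like the test did: any content after the
    // first END line is unexpected.
    static boolean readable(String data) {
        int end = data.indexOf("END\n");
        return end >= 0 && end + 4 == data.length();
    }

    public static void main(String[] args) {
        List<String> two = List.of("v1", "v2");
        System.out.println(readable(writeFixed(two)));  // true
        System.out.println(readable(writeBuggy(two)));  // false: a field follows END
    }
}
```

Note that with a single field the buggy and fixed outputs are identical, which matches the observation that the failure only reproduced with multiple vector fields.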
[jira] [Resolved] (SOLR-14728) Add self join optimization to the TopLevelJoinQuery
[ https://issues.apache.org/jira/browse/SOLR-14728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-14728. Resolution: Duplicate > Add self join optimization to the TopLevelJoinQuery > --- > > Key: SOLR-14728 > URL: https://issues.apache.org/jira/browse/SOLR-14728 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Priority: Major > > A simple optimization can massively improve join performance when the TopLevelJoinQuery is performing a self join (same core) and the *to* and *from* fields are the same field. In this scenario the top-level doc values ordinals can be used directly as a filter, avoiding the most expensive part of the join, which is the BytesRef reconciliation between the *to* and *from* fields.
[jira] [Commented] (SOLR-14728) Add self join optimization to the TopLevelJoinQuery
[ https://issues.apache.org/jira/browse/SOLR-14728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249668#comment-17249668 ] Jason Gerlowski commented on SOLR-14728: Hey [~jbernste], I'm going to close this out as a duplicate of SOLR-15049. I wouldn't have created that ticket if I'd realized this one existed, but the newer ticket already has an associated PR, so this should be the one we close. > Add self join optimization to the TopLevelJoinQuery > --- > > Key: SOLR-14728 > URL: https://issues.apache.org/jira/browse/SOLR-14728 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Priority: Major
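The optimization SOLR-14728 describes can be sketched with plain bitsets (this is an illustrative model using invented names, not Solr's actual LongBitSet/SortedSetDocValues API): when the 'from' and 'to' sides share one field, their ordinal spaces are identical, so the 'from' ordinals can be OR-ed straight into the 'to' filter instead of round-tripping each ordinal through its term bytes.

```java
import java.util.BitSet;

// Hypothetical model of the self-join shortcut: a shared ordinal space
// makes the per-term BytesRef reconciliation unnecessary.
public class SelfJoinSketch {

    // General path (modeled): each 'from' ordinal is resolved to its term,
    // then looked up in the 'to' dictionary. This reconciliation is the
    // expensive step the optimization removes.
    static BitSet generalJoin(BitSet fromOrds, String[] fromTerms, String[] toTerms) {
        BitSet toOrds = new BitSet();
        for (int ord = fromOrds.nextSetBit(0); ord >= 0; ord = fromOrds.nextSetBit(ord + 1)) {
            String term = fromTerms[ord];           // ordinal -> term
            for (int t = 0; t < toTerms.length; t++) {
                if (toTerms[t].equals(term)) {      // dictionary lookup
                    toOrds.set(t);
                    break;
                }
            }
        }
        return toOrds;
    }

    // Self-join path: identical ordinal spaces collapse the whole
    // reconciliation into a single bitset OR.
    static BitSet selfJoin(BitSet fromOrds) {
        BitSet toOrds = new BitSet();
        toOrds.or(fromOrds);
        return toOrds;
    }

    // Check that both paths agree when both sides use the same field.
    public static boolean pathsAgree(int... matchedOrds) {
        String[] terms = {"a", "b", "c", "d"};
        BitSet from = new BitSet();
        for (int ord : matchedOrds) {
            from.set(ord);
        }
        return generalJoin(from, terms, terms).equals(selfJoin(from));
    }

    public static void main(String[] args) {
        System.out.println(pathsAgree(1, 3)); // true
    }
}
```

The actual Solr implementation of this idea appears later in this thread as the `TopLevelJoinQuery.SelfJoin` class in PR #2146.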
[jira] [Created] (LUCENE-9638) TestVectorValues.testIndexMultipleVectorFields reproducing test failure
Michael McCandless created LUCENE-9638: -- Summary: TestVectorValues.testIndexMultipleVectorFields reproducing test failure Key: LUCENE-9638 URL: https://issues.apache.org/jira/browse/LUCENE-9638 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless I was beasting [this PR|https://github.com/apache/lucene-solr/pull/2088] but then hit this failure, likely not related to that PR: {noformat} org.apache.lucene.index.TestVectorValues > testIndexMultipleVectorFields FAILED org.apache.lucene.index.CorruptIndexException: SimpleText failure: expected checksum line but got field-number 2 (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((sliced) offset=96, length=478 (clone of) ByteBuffersIndexInput (file=_0.scf, buffers=2,097 b\ ytes, block size: 1,024, blocks: 3, position: 0) [slice=_0.gri]))) at __randomizedtesting.SeedInfo.seed([9963345FEF3254D:173EA6E1008A23F8]:0) at org.apache.lucene.codecs.simpletext.SimpleTextUtil.checkFooter(SimpleTextUtil.java:89) at org.apache.lucene.codecs.simpletext.SimpleTextVectorReader.(SimpleTextVectorReader.java:81) at org.apache.lucene.codecs.simpletext.SimpleTextVectorFormat.fieldsReader(SimpleTextVectorFormat.java:43) at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:144) at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84) at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171) at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213) at org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:572) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:105) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:630) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:474) at org.apache.lucene.index.TestVectorValues.testIndexMultipleVectorFields(TestVectorValues.java:619) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2136: SOLR-15037: fix prevent config change listener to reload core while s…
jbampton commented on a change in pull request #2136: URL: https://github.com/apache/lucene-solr/pull/2136#discussion_r543243184 ## File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java ## @@ -103,8 +104,9 @@ private List doOperations(List operations) throws InterruptedE String errorMsg = "Unable to persist managed schema. "; List errors = Collections.emptyList(); int latestVersion = -1; - -synchronized (req.getSchema().getSchemaUpdateLock()) { + Lock schemaChangeLock = req.getSchema().getSchemaUpdateLock(); Review comment: ```suggestion Lock schemaChangeLock = req.getSchema().getSchemaUpdateLock(); ``` ## File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java ## @@ -454,8 +459,8 @@ private ManagedIndexSchema getFreshManagedSchema(SolrCore core) throws IOExcepti if (in instanceof ZkSolrResourceLoader.ZkByteArrayInputStream) { int version = ((ZkSolrResourceLoader.ZkByteArrayInputStream) in).getStat().getVersion(); log.info("managed schema loaded . version : {} ", version); -return new ManagedIndexSchema(core.getSolrConfig(), name, new InputSource(in), true, name, version, -core.getLatestSchema().getSchemaUpdateLock()); +Lock schemaLock = (Lock) core.getLatestSchema().getSchemaUpdateLock(); Review comment: ```suggestion Lock schemaLock = (Lock) core.getLatestSchema().getSchemaUpdateLock(); ``` ## File path: solr/core/src/java/org/apache/solr/schema/ManagedIndexSchemaFactory.java ## @@ -178,9 +180,14 @@ public ManagedIndexSchema create(String resourceName, SolrConfig config) { managedSchemaResourceName, schemaZkVersion, getSchemaUpdateLock()); if (shouldUpgrade) { // Persist the managed schema if it doesn't already exist - synchronized (schema.getSchemaUpdateLock()) { + Lock schemaUpdateLock =schema.getSchemaUpdateLock(); Review comment: ```suggestion Lock schemaUpdateLock = schema.getSchemaUpdateLock(); ``` ## File path: solr/core/src/java/org/apache/solr/schema/SchemaManager.java ## @@ -454,8 +459,8 @@ private ManagedIndexSchema 
getFreshManagedSchema(SolrCore core) throws IOExcepti if (in instanceof ZkSolrResourceLoader.ZkByteArrayInputStream) { int version = ((ZkSolrResourceLoader.ZkByteArrayInputStream) in).getStat().getVersion(); log.info("managed schema loaded . version : {} ", version); -return new ManagedIndexSchema(core.getSolrConfig(), name, new InputSource(in), true, name, version, -core.getLatestSchema().getSchemaUpdateLock()); +Lock schemaLock = (Lock) core.getLatestSchema().getSchemaUpdateLock(); +return new ManagedIndexSchema(core.getSolrConfig(), name, new InputSource(in), true, name, version,schemaLock ); Review comment: ```suggestion return new ManagedIndexSchema(core.getSolrConfig(), name, new InputSource(in), true, name, version, schemaLock ); ``` ## File path: solr/core/src/java/org/apache/solr/core/SolrCore.java ## @@ -3145,14 +3145,17 @@ public static Runnable getConfListener(SolrCore core, ZkSolrResourceLoader zkSol checkStale(zkClient, managedSchmaResourcePath, managedSchemaVersion)) { log.info("core reload {}", coreName); SolrConfigHandler configHandler = ((SolrConfigHandler) core.getRequestHandler("/config")); -if (configHandler.getReloadLock().tryLock()) { - + if ((!core.schema.isMutable() || core.schema.getSchemaUpdateLock().tryLock()) + && configHandler.getReloadLock().tryLock()) { try { cc.reload(coreName, coreId); } catch (SolrCoreState.CoreIsClosedException e) { /*no problem this core is already closed*/ } finally { configHandler.getReloadLock().unlock(); +if(core.schema.isMutable()){ Review comment: ```suggestion if (core.schema.isMutable()) { ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
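The pattern under review in PR #2136 — replacing a `synchronized` block on the schema-update monitor with an explicit `java.util.concurrent.locks.Lock` — can be sketched generically (class and method names below are invented for illustration, not Solr's API): an explicit Lock supports `tryLock()`, which is what lets the reload path skip work instead of blocking, but it also demands the lock/try/finally/unlock discipline that `synchronized` gives for free.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of migrating from synchronized(monitor)
// to an explicit Lock, in the spirit of the SchemaManager change above.
public class LockMigration {
    private final Lock schemaUpdateLock = new ReentrantLock();
    private int version = 0;

    // Old style: synchronized (monitor) { version++; }
    // With an explicit Lock, unlock() MUST go in a finally block.
    public void persistSchema() {
        schemaUpdateLock.lock();
        try {
            version++; // stands in for persisting the managed schema
        } finally {
            schemaUpdateLock.unlock();
        }
    }

    // The payoff: tryLock() lets a core reload skip instead of blocking
    // when a schema update is already in flight.
    public boolean tryReload() {
        if (!schemaUpdateLock.tryLock()) {
            return false; // a schema update holds the lock; skip the reload
        }
        try {
            return true; // the reload would happen here
        } finally {
            schemaUpdateLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockMigration m = new LockMigration();
        m.persistSchema();
        System.out.println(m.tryReload()); // true: the lock is free again
    }
}
```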
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer
jbampton commented on a change in pull request #2141: URL: https://github.com/apache/lucene-solr/pull/2141#discussion_r543238073 ## File path: lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java ## @@ -195,10 +201,13 @@ private Scorer opt(Collection optional, int minShouldMatch, for (ScorerSupplier scorer : optional) { optionalScorers.add(scorer.get(leadCost)); } - if (minShouldMatch > 1) { + + if (scoreMode == ScoreMode.TOP_SCORES) { +return new WANDScorer(weight, optionalScorers, minShouldMatch); + } else if (minShouldMatch > 1) { +// nocommit minShouldMath > 1 && scoreMode != ScoreMode.TOP_SCORES still requires MinShouldMatchSumScorer. +// Do we want to depcate this entirely now ? Review comment: ```suggestion // Do we want to deprecate this entirely now ? ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
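The logic in the diff above — preferring WANDScorer whenever top scores are requested, and keeping MinShouldMatchSumScorer only for minShouldMatch > 1 in other score modes — is a small two-key dispatch. The sketch below models that branching with plain strings; WANDScorer and MinShouldMatchSumScorer come from the diff, while the boolean flag and the name of the fallback branch are my stand-ins, not Lucene's actual Boolean2ScorerSupplier API.

```java
// Hypothetical model of the scorer selection in the LUCENE-9346 patch.
public class ScorerDispatch {

    // topScores stands in for scoreMode == ScoreMode.TOP_SCORES.
    public static String choose(boolean topScores, int minShouldMatch) {
        if (topScores) {
            // After the patch, WANDScorer also enforces minimumNumberShouldMatch.
            return "WANDScorer";
        } else if (minShouldMatch > 1) {
            // Other score modes with a constraint still need the old scorer
            // (the open question in the review is whether to deprecate it).
            return "MinShouldMatchSumScorer";
        } else {
            // Plain disjunction over the optional clauses (name assumed).
            return "DisjunctionScorer";
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 3));  // WANDScorer
        System.out.println(choose(false, 2)); // MinShouldMatchSumScorer
        System.out.println(choose(false, 1)); // DisjunctionScorer
    }
}
```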
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2142: SOLR-14923: Reload RealtimeSearcher on next getInputDocument if forced
jbampton commented on a change in pull request #2142: URL: https://github.com/apache/lucene-solr/pull/2142#discussion_r543234918 ## File path: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java ## @@ -618,6 +626,16 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes return getInputDocument (core, idBytes, null, null, lookupStrategy); } + /** + * Marks the {@link RealTimeGetComponent} of the corresponding {@link SolrCore} to reload it's realtime searcher before the next access. Review comment: ```suggestion * Marks the {@link RealTimeGetComponent} of the corresponding {@link SolrCore} to reload its realtime searcher before the next access. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
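The behavior described in that Javadoc — mark the component now, reload its realtime searcher lazily on the next access — is a common flag-and-check pattern. Below is a minimal generic sketch with invented names (this is not Solr's RealTimeGetComponent API): marking is cheap and thread-safe, and the reload cost is paid at most once, by the next reader.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of "reload the searcher on next access".
public class LazyReload {
    private final AtomicBoolean mustReload = new AtomicBoolean(false);
    private int searcherGeneration = 0;

    // Cheap to call from any thread; no reload happens here.
    public void markStale() {
        mustReload.set(true);
    }

    // On access, swap in a fresh "searcher" only if a reload was requested.
    // compareAndSet ensures exactly one caller performs the reload.
    public int getSearcherGeneration() {
        if (mustReload.compareAndSet(true, false)) {
            searcherGeneration++; // stands in for opening a new realtime searcher
        }
        return searcherGeneration;
    }

    public static void main(String[] args) {
        LazyReload c = new LazyReload();
        System.out.println(c.getSearcherGeneration()); // 0: nothing stale
        c.markStale();
        System.out.println(c.getSearcherGeneration()); // 1: reloaded once
        System.out.println(c.getSearcherGeneration()); // 1: still fresh
    }
}
```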
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2146: SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins'
jbampton commented on a change in pull request #2146: URL: https://github.com/apache/lucene-solr/pull/2146#discussion_r543233296 ## File path: solr/core/src/java/org/apache/solr/search/TopLevelJoinQuery.java ## @@ -218,4 +218,28 @@ public BitsetBounds(long lower, long upper) { this.upper = upper; } } + + /** + * A {@link TopLevelJoinQuery} implementation optimized for when 'from' and 'to' cores and fields match and no ordinal- + * conversion is necessary. + */ + static class SelfJoin extends TopLevelJoinQuery { +public SelfJoin(String joinField, Query subQuery) { + super(joinField, joinField, null, subQuery); +} + +protected BitsetBounds convertFromOrdinalsIntoToField(LongBitSet fromOrdBitSet, SortedSetDocValues fromDocValues, + LongBitSet toOrdBitSet, SortedSetDocValues toDocValues) throws IOException { + + // 'from' and 'to' ordinals are identical for self-joins. + toOrdBitSet.or(fromOrdBitSet); + + // Calculate boundary ords used for other optimizations + final long firstToOrd = toOrdBitSet.nextSetBit(0); + final long lastToOrd = toOrdBitSet.prevSetBit(toOrdBitSet.length() - 1); + return new BitsetBounds(firstToOrd, lastToOrd); +} + } } + Review comment: ```suggestion ``` ## File path: solr/core/src/java/org/apache/solr/search/TopLevelJoinQuery.java ## @@ -218,4 +218,28 @@ public BitsetBounds(long lower, long upper) { this.upper = upper; } } + + /** + * A {@link TopLevelJoinQuery} implementation optimized for when 'from' and 'to' cores and fields match and no ordinal- + * conversion is necessary. + */ + static class SelfJoin extends TopLevelJoinQuery { +public SelfJoin(String joinField, Query subQuery) { + super(joinField, joinField, null, subQuery); +} + +protected BitsetBounds convertFromOrdinalsIntoToField(LongBitSet fromOrdBitSet, SortedSetDocValues fromDocValues, + LongBitSet toOrdBitSet, SortedSetDocValues toDocValues) throws IOException { + + // 'from' and 'to' ordinals are identical for self-joins. 
+ toOrdBitSet.or(fromOrdBitSet); + + // Calculate boundary ords used for other optimizations + final long firstToOrd = toOrdBitSet.nextSetBit(0); + final long lastToOrd = toOrdBitSet.prevSetBit(toOrdBitSet.length() - 1); + return new BitsetBounds(firstToOrd, lastToOrd); +} + } } + + Review comment: ```suggestion ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jbampton commented on a change in pull request #2147: LUCENE-9637: Clean up ShapeField/ShapeQuery random test
jbampton commented on a change in pull request #2147: URL: https://github.com/apache/lucene-solr/pull/2147#discussion_r543223927 ## File path: lucene/core/src/test/org/apache/lucene/document/BaseXYShapeTestCase.java ## @@ -126,24 +132,12 @@ protected boolean rectCrossesDateline(Object rect) { return false; } - /** use {@link ShapeTestUtil#nextPolygon()} to create a random line; TODO: move to GeoTestUtil */ + /** use {@link ShapeTestUtil#nextPolygon()} to create a random line */ @Override public XYLine nextLine() { -return getNextLine(); - } - - public static XYLine getNextLine() { -XYPolygon poly = ShapeTestUtil.nextPolygon(); -float[] x = new float[poly.numPoints() - 1]; -float[] y = new float[x.length]; -for (int i = 0; i < x.length; ++i) { - x[i] = poly.getPolyX(i); - y[i] = poly.getPolyY(i); -} - -return new XYLine(x, y); +return ShapeTestUtil.nextLine(); } - + Review comment: ```suggestion ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
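The helper removed in this diff built a random line from a random polygon by reusing the polygon's vertices minus the duplicated closing point; that logic now lives in ShapeTestUtil.nextLine(). Here is the same conversion in standalone form, with plain float arrays standing in for XYPolygon/XYLine:

```java
import java.util.Arrays;

// Standalone version of the removed polygon-to-line conversion: a closed
// polygon repeats its first vertex at the end, so a line through the same
// vertices uses numPoints - 1 of them.
public class PolygonToLine {

    // Returns {x[], y[]} for a line over the polygon's distinct vertices.
    public static float[][] toLine(float[] polyX, float[] polyY) {
        int n = polyX.length - 1; // drop the duplicated closing vertex
        float[] x = Arrays.copyOf(polyX, n);
        float[] y = Arrays.copyOf(polyY, n);
        return new float[][] {x, y};
    }

    public static void main(String[] args) {
        // A closed triangle: the last vertex repeats the first.
        float[] px = {0f, 4f, 2f, 0f};
        float[] py = {0f, 0f, 3f, 0f};
        float[][] line = toLine(px, py);
        System.out.println(Arrays.toString(line[0])); // [0.0, 4.0, 2.0]
        System.out.println(Arrays.toString(line[1])); // [0.0, 0.0, 3.0]
    }
}
```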
[GitHub] [lucene-solr] iverase opened a new pull request #2147: LUCENE-9637: Clean up ShapeField/ShapeQuery random test
iverase opened a new pull request #2147:
URL: https://github.com/apache/lucene-solr/pull/2147

Removes some unused code and replaces the Point implementation in the tests with the Point implementation in the geo package.

cc: @nknize
[jira] [Created] (LUCENE-9637) Clean up ShapeField/ShapeQuery random test
Ignacio Vera created LUCENE-9637:
------------------------------------

             Summary: Clean up ShapeField/ShapeQuery random test
                 Key: LUCENE-9637
                 URL: https://issues.apache.org/jira/browse/LUCENE-9637
             Project: Lucene - Core
          Issue Type: Test
            Reporter: Ignacio Vera

There seems to be some unused code in those tests, and in addition we can replace the Point implementation in those tests with the Point implementation in the geo package.

--
This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics
sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543196623

## File path: solr/core/src/java/org/apache/solr/cluster/placement/impl/CollectionMetricsBuilder.java

## @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.placement.CollectionMetrics;
+import org.apache.solr.cluster.placement.ReplicaMetrics;
+import org.apache.solr.cluster.placement.ShardMetrics;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ *
+ */
+public class CollectionMetricsBuilder {
+
+  final Map<String, ShardMetricsBuilder> shardMetricsBuilders = new HashMap<>();
+
+  public void addShardMetrics(String shardName, ShardMetricsBuilder shardMetricsBuilder) {
+    shardMetricsBuilders.put(shardName, shardMetricsBuilder);
+  }
+
+  public CollectionMetrics build() {
+    final Map<String, ShardMetrics> metricsMap = new HashMap<>();
+    shardMetricsBuilders.forEach((shard, builder) -> metricsMap.put(shard, builder.build()));
+    return shardName -> Optional.ofNullable(metricsMap.get(shardName));
+  }
+
+  public static class ShardMetricsBuilder {
+    final Map<String, ReplicaMetricsBuilder> replicaMetricsBuilders = new HashMap<>();
+
+    public ShardMetricsBuilder addReplicaMetrics(String replicaName, ReplicaMetricsBuilder replicaMetricsBuilder) {
+      replicaMetricsBuilders.put(replicaName, replicaMetricsBuilder);
+      return this;
+    }
+
+    public ShardMetricsBuilder setLeaderMetrics(ReplicaMetricsBuilder replicaMetricsBuilder) {
+      replicaMetricsBuilders.put(LEADER, replicaMetricsBuilder);
+      return this;
+    }
+
+    public static final String LEADER = "__leader__";
+
+    public ShardMetrics build() {
+      final Map<String, ReplicaMetrics> metricsMap = new HashMap<>();
+      replicaMetricsBuilders.forEach((name, replicaBuilder) -> {
+        ReplicaMetrics metrics = replicaBuilder.build();
+        metricsMap.put(name, metrics);
+        if (replicaBuilder.leader) {
+          metricsMap.put(LEADER, metrics);
+        }
+      });
+      return new ShardMetrics() {
+        @Override
+        public Optional<ReplicaMetrics> getLeaderMetrics() {
+          return Optional.ofNullable(metricsMap.get(LEADER));
+        }
+
+        @Override
+        public Optional<ReplicaMetrics> getReplicaMetrics(String replicaName) {
+          return Optional.ofNullable(metricsMap.get(replicaName));
+        }
+      };
+    }
+  }
+
+  public static class ReplicaMetricsBuilder {
+    final Map metrics = new HashMap<>();
+    int sizeGB = 0;

Review comment: Maybe use `Integer` here? Replicas always have size, even new ones, but if we can return null then we can signal that the size is unknown (e.g. couldn't be retrieved from the node). Also, should we use Integer or Long? The unit is GB, I don't think we need Long here (and in other places that deal with disk size).
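The Integer-vs-int point in the review comment above can be shown with a minimal standalone sketch. The class and method names here are hypothetical and not part of the Solr API: a boxed `Integer` can carry `null` to mean "size unknown", which a primitive `int` cannot express without a sentinel value.

```java
// Sketch of the nullable-size idea from the review comment above.
// SizeMetricDemo and describeSize are illustrative names, not Solr's API.
public class SizeMetricDemo {

    // Boxed Integer: null signals "unknown", e.g. the size could not be
    // retrieved from the node. A primitive int would force a fake 0.
    static String describeSize(Integer sizeGB) {
        if (sizeGB == null) {
            return "size unknown";
        }
        return sizeGB + " GB";
    }

    public static void main(String[] args) {
        System.out.println(describeSize(12));   // 12 GB
        System.out.println(describeSize(null)); // size unknown
    }
}
```

With a primitive `int sizeGB = 0;`, a genuinely empty replica and a replica whose size lookup failed are indistinguishable; the boxed type separates the two cases.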
[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics
sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543195639

## File path: solr/core/src/java/org/apache/solr/cluster/placement/impl/CollectionMetricsBuilder.java

## @@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.placement.CollectionMetrics;
+import org.apache.solr.cluster.placement.ReplicaMetrics;
+import org.apache.solr.cluster.placement.ShardMetrics;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ *
+ */
+public class CollectionMetricsBuilder {
+
+  final Map<String, ShardMetricsBuilder> shardMetricsBuilders = new HashMap<>();
+
+  public void addShardMetrics(String shardName, ShardMetricsBuilder shardMetricsBuilder) {
+    shardMetricsBuilders.put(shardName, shardMetricsBuilder);
+  }
+
+  public CollectionMetrics build() {
+    final Map<String, ShardMetrics> metricsMap = new HashMap<>();
+    shardMetricsBuilders.forEach((shard, builder) -> metricsMap.put(shard, builder.build()));
+    return shardName -> Optional.ofNullable(metricsMap.get(shardName));
+  }
+
+  public static class ShardMetricsBuilder {
+    final Map<String, ReplicaMetricsBuilder> replicaMetricsBuilders = new HashMap<>();
+
+    public ShardMetricsBuilder addReplicaMetrics(String replicaName, ReplicaMetricsBuilder replicaMetricsBuilder) {
+      replicaMetricsBuilders.put(replicaName, replicaMetricsBuilder);
+      return this;
+    }
+
+    public ShardMetricsBuilder setLeaderMetrics(ReplicaMetricsBuilder replicaMetricsBuilder) {
+      replicaMetricsBuilders.put(LEADER, replicaMetricsBuilder);
+      return this;
+    }
+
+    public static final String LEADER = "__leader__";

Review comment: Make this `@VisibleForTesting` or package-private.
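A design note on the `build()` methods quoted above: `CollectionMetrics` is evidently a single-method interface, since `build()` returns a lambda over the collected map. Here is a minimal standalone sketch of that builder-to-lambda pattern; all names (`BuilderLambdaDemo`, `Metrics`, `MetricsBuilder`) are hypothetical, not the Solr classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal sketch of the pattern in CollectionMetricsBuilder above:
// collect entries into a map, then publish an immutable functional view.
// All names here are illustrative only.
public class BuilderLambdaDemo {

    // Single-method interface, so build() can return a lambda.
    interface Metrics {
        Optional<Integer> get(String name);
    }

    static class MetricsBuilder {
        private final Map<String, Integer> values = new HashMap<>();

        MetricsBuilder add(String name, int value) {
            values.put(name, value);
            return this;
        }

        // The built view captures a private snapshot, so later builder
        // mutations do not leak into the published object.
        Metrics build() {
            final Map<String, Integer> snapshot = new HashMap<>(values);
            return name -> Optional.ofNullable(snapshot.get(name));
        }
    }

    public static void main(String[] args) {
        Metrics m = new MetricsBuilder().add("shard1", 42).build();
        System.out.println(m.get("shard1").orElse(-1));   // 42
        System.out.println(m.get("missing").isPresent()); // false
    }
}
```

The returned lambda keeps the map private, so callers get lookups without any way to mutate the underlying state.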
[GitHub] [lucene-solr] sigram commented on a change in pull request #2133: SOLR-15019: Replica placement API needs a way to fetch existing replica metrics
sigram commented on a change in pull request #2133:
URL: https://github.com/apache/lucene-solr/pull/2133#discussion_r543193912

## File path: solr/core/src/java/org/apache/solr/cluster/placement/ShardMetrics.java

## @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.placement;
+
+import java.util.Optional;
+
+/**
+ *
+ */
+public interface ShardMetrics {
+  Optional<ReplicaMetrics> getLeaderMetrics();
+  Optional<ReplicaMetrics> getReplicaMetrics(String replicaName);

Review comment: Perhaps we should add `iterator()` here too, so that the consumers are not required to know the replica name in advance.
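The reviewer's `iterator()` suggestion can be sketched with a small self-contained example: keep the name-based `Optional` lookup, and additionally extend `Iterable` so callers can enumerate replicas without knowing their names up front. The types here (`IterableMetricsDemo`, `ShardView`) are hypothetical stand-ins, not the Solr interfaces.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of combining Optional-returning lookups with an iterator,
// per the review comment above. Names are illustrative only.
public class IterableMetricsDemo {

    interface ShardView extends Iterable<String> {
        Optional<Long> getReplicaMetric(String replicaName);
    }

    static ShardView shardView(Map<String, Long> metrics) {
        final Map<String, Long> snapshot = new LinkedHashMap<>(metrics);
        return new ShardView() {
            @Override
            public Optional<Long> getReplicaMetric(String name) {
                return Optional.ofNullable(snapshot.get(name));
            }

            @Override
            public Iterator<String> iterator() {
                // Lets consumers discover replica names instead of
                // having to know them in advance.
                return snapshot.keySet().iterator();
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("replica1", 10L);
        m.put("replica2", 20L);
        ShardView view = shardView(m);
        for (String name : view) { // no names needed up front
            System.out.println(name + " -> " + view.getReplicaMetric(name).orElse(0L));
        }
    }
}
```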
[jira] [Resolved] (LUCENE-9627) Small refactor of codec classes
[ https://issues.apache.org/jira/browse/LUCENE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-9627. -- Fix Version/s: master (9.0) Assignee: Ignacio Vera Resolution: Fixed > Small refactor of codec classes > --- > > Key: LUCENE-9627 > URL: https://issues.apache.org/jira/browse/LUCENE-9627 > Project: Lucene - Core > Issue Type: Wish >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > While working on LUCENE-9047, I had to refactor some classes in order to > separate code that opens a file and reads the header/ footer from the code > that reads the actual content of the file. Regardless of that issue, I think > the refactor is a good thing. > > In addition it seems Lucene50FieldInfosFormat is not used anywhere in the > code so I propose to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9627) Small refactor of codec classes
[ https://issues.apache.org/jira/browse/LUCENE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249578#comment-17249578 ] ASF subversion and git services commented on LUCENE-9627: - Commit 4b3e8d7ce8feba658a9730e65da70c04a7e9c52f in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4b3e8d7 ] LUCENE-9627: Remove unused Lucene50FieldInfosFormat codec and small refactor some codecs to separate reading header/footer from reading content of the file > Small refactor of codec classes > --- > > Key: LUCENE-9627 > URL: https://issues.apache.org/jira/browse/LUCENE-9627 > Project: Lucene - Core > Issue Type: Wish >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > While working on LUCENE-9047, I had to refactor some classes in order to > separate code that opens a file and reads the header/ footer from the code > that reads the actual content of the file. Regardless of that issue, I think > the refactor is a good thing. > > In addition it seems Lucene50FieldInfosFormat is not used anywhere in the > code so I propose to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase merged pull request #2109: LUCENE-9627: Small refactor of codec classes
iverase merged pull request #2109: URL: https://github.com/apache/lucene-solr/pull/2109 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gf2121 commented on pull request #2113: LUCENE-9629: Use computed masks
gf2121 commented on pull request #2113:
URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-745134305

> @dweiss Thank you very much for the guidance on how to do a benchmark on Lucene; the luceneutil tool is really nice!
> I repeatedly executed the wikimedium1m tasks 20 times; the following is the result of the last iter. I guess it indicates that the reading performance is stable overall :)
>
> ```
> Task                          QPS baseline      QPS my_mod_ver    Pct diff              p-value
> HighTermMonthSort             405.40 (11.6%)    390.20 (11.1%)    -3.7% ( -23% - 21%)   0.295
> PKLookup                      131.35  (4.5%)    128.26  (6.5%)    -2.4% ( -12% -  9%)   0.183
> Respell                        88.96  (8.0%)     86.91  (8.7%)    -2.3% ( -17% - 15%)   0.383
> AndHighLow                    611.88  (5.6%)    603.60  (9.2%)    -1.4% ( -15% - 14%)   0.575
> BrowseDayOfYearTaxoFacets      21.70  (7.7%)     21.43  (9.1%)    -1.3% ( -16% - 16%)   0.630
> HighSloppyPhrase               74.97  (4.8%)     74.10  (8.6%)    -1.2% ( -13% - 12%)   0.601
> BrowseMonthSSDVFacets          91.65  (4.8%)     90.66  (5.8%)    -1.1% ( -11% -  9%)   0.519
> MedPhrase                     110.37  (8.8%)    109.19  (7.3%)    -1.1% ( -15% - 16%)   0.674
> OrHighMed                     167.04  (8.9%)    165.47  (7.6%)    -0.9% ( -16% - 17%)   0.718
> OrHighHigh                     86.33  (9.1%)     85.61  (7.9%)    -0.8% ( -16% - 17%)   0.755
> LowSpanNear                   156.83  (7.1%)    155.74  (6.5%)    -0.7% ( -13% - 13%)   0.746
> Wildcard                      139.21  (7.0%)    138.29  (9.8%)    -0.7% ( -16% - 17%)   0.805
> BrowseDayOfYearSSDVFacets      78.95  (5.0%)     78.52  (5.8%)    -0.5% ( -10% - 10%)   0.753
> Fuzzy1                         76.09  (9.7%)     75.84  (7.7%)    -0.3% ( -16% - 18%)   0.905
> MedSloppyPhrase                48.37  (5.9%)     48.22  (5.1%)    -0.3% ( -10% - 11%)   0.859
> IntNRQ                        310.73 (10.5%)    310.11 (12.2%)    -0.2% ( -20% - 25%)   0.955
> HighTerm                      759.07  (7.3%)    757.84 (12.1%)    -0.2% ( -18% - 20%)   0.959
> BrowseMonthTaxoFacets          24.17  (9.5%)     24.19 (10.0%)     0.1% ( -17% - 21%)   0.984
> HighIntervalsOrdered          121.98  (5.8%)    122.10  (7.9%)     0.1% ( -12% - 14%)   0.964
> LowSloppyPhrase               188.90  (7.8%)    189.42  (5.7%)     0.3% ( -12% - 14%)   0.898
> AndHighMed                    418.22  (9.0%)    420.49  (9.7%)     0.5% ( -16% - 21%)   0.855
> MedTerm                       874.56  (7.6%)    880.02 (10.9%)     0.6% ( -16% - 20%)   0.833
> MedSpanNear                   378.96  (7.2%)    381.70  (9.2%)     0.7% ( -14% - 18%)   0.781
> Fuzzy2                         25.74  (9.8%)     25.97 (10.9%)     0.9% ( -17% - 23%)   0.777
> AndHighHigh                   126.51  (6.1%)    127.75  (8.5%)     1.0% ( -12% - 16%)   0.676
> HighPhrase                    165.71  (8.2%)    167.64 (11.3%)     1.2% ( -16% - 22%)   0.708
> BrowseDateTaxoFacets           21.40 (10.4%)     21.69 (10.0%)     1.3% ( -17% - 24%)   0.682
> LowTerm                      1032.43  (8.0%)   1049.57 (10.1%)     1.7% ( -15% - 21%)   0.566
> OrHighLow                     228.54  (5.0%)    232.36  (8.5%)     1.7% ( -11% - 15%)   0.446
> HighSpanNear                   93.72  (7.4%)     95.46  (7.3%)     1.9% ( -12% - 17%)   0.427
> HighTermDayOfYearSort         380.08 (12.2%)    387.47  (9.4%)     1.9% ( -17% - 26%)   0.573
> LowPhrase                     137.13  (8.2%)    140.30  (6.2%)     2.3% ( -11% - 18%)   0.314
> Prefix3                       290.55 (11.0%)    299.98 (12.9%)     3.2% ( -18% - 30%)   0.390
> ```

Though this result looks fine, I'm still a bit worried about the cost of reading arrays in some cases, for example on a different Java version. So I updated this PR again to make sure that we only read arrays when the index is a variable. In other cases, we still read final longs. There are only 14 lines more than the original code, which I think is acceptable :)
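The "arrays when the index is a variable, final longs otherwise" trade-off discussed in this thread can be illustrated with a minimal sketch. This is not the actual Lucene code from the PR; `MaskDemo` and its fields are hypothetical names.

```java
// Illustration of the trade-off above: a bit mask can be a static final
// constant, which the JIT can fold directly into compiled code, or it can
// be looked up from a precomputed table when the bit count is only known
// at runtime (one array read per lookup). Names are illustrative only.
public class MaskDemo {

    // Final constant: usable when the bit count is fixed at compile time.
    static final long MASK_5 = (1L << 5) - 1;

    // Precomputed table: works for any bit count chosen at runtime.
    static final long[] MASKS = new long[64];
    static {
        for (int bits = 0; bits < 64; bits++) {
            // Low `bits` bits set: (1 << bits) - 1.
            MASKS[bits] = (1L << bits) - 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(MASK_5)); // 11111
        System.out.println(MASKS[5] == MASK_5);          // true
    }
}
```

Keeping the constant path for compile-time-known indices, as the comment above describes, avoids paying the array read (and bounds check) where the JIT could otherwise embed the mask directly.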