[jira] [Commented] (LUCENE-9454) Upgrade hamcrest to version 2.2
[ https://issues.apache.org/jira/browse/LUCENE-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563422#comment-17563422 ] Gus Heck commented on LUCENE-9454: -- The second commit listed here appears to be attributed to the wrong issue number, I was hoping to understand the motivation/upgrade path for this, but it's not discussed here. [~romseygeek] ? > Upgrade hamcrest to version 2.2 > --- > > Key: LUCENE-9454 > URL: https://issues.apache.org/jira/browse/LUCENE-9454 > Project: Lucene - Core > Issue Type: Task >Affects Versions: 9.0 >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved LUCENE-9575.
------------------------------
Fix Version/s: 8.9
Resolution: Implemented

> Add PatternTypingFilter
> -----------------------
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.9
>
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the
> Advanced Query Parser was to be able to recognize arbitrary patterns that
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they
> wanted 401k and 401(k) to match documents with either style reference, and
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not
> documents about the HTTP status code). And of course we wanted to give up as
> little of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572, LUCENE-9574 and
> one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an
> arbitrary analyzer defined for a type in the Solr schema, achieves this.
> This filter has the job of spotting the patterns, and adding the intended
> synonym as a type to the token (from which minimal punctuation has been
> removed). It also sets flags on the token which are retained through the
> analysis chain, and at the very end the type is converted to a synonym and
> the original token(s) for that type are dropped, avoiding the match on 401
> (for example).
> The pattern matching is specified in a file that looks like:
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k,
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first
> line above would create synonyms such as:
> {code}
> 401k --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
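The rule lines in the LUCENE-9575 description (an integer flags value, a pattern, `:::`, then a replacement with `$1`-style group references) can be illustrated with a small Python sketch. This is not the Lucene implementation: the function names `parse_rules` and `type_token` are hypothetical, and matching the whole token with the first rule that fits is an assumption made for the example.

```python
import re

# The three rule lines quoted in the issue description.
RULES_TEXT = r"""
2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
2 C\+\+ ::: c_plus_plus
"""


def parse_rules(text):
    """Split each line into (flags, compiled pattern, replacement template)."""
    rules = []
    for line in text.strip().splitlines():
        left, replacement = line.split(" ::: ", 1)
        flags, pattern = left.split(" ", 1)
        rules.append((int(flags), re.compile(pattern), replacement.strip()))
    return rules


def type_token(token, rules):
    """Return (type, flags) for the first rule matching the whole token, else None.

    Assumption: whole-token matching and first-match-wins; the real filter
    may differ.
    """
    for flags, pattern, replacement in rules:
        m = pattern.fullmatch(token)
        if m:
            # Substitute $1, $2, ... with the corresponding captured group.
            typed = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)
            return typed, flags
    return None


rules = parse_rules(RULES_TEXT)
for token in ["401k", "401(k)", "501(c)3", "C++", "401"]:
    print(token, "->", type_token(token, rules))
```

Under these assumptions both `401k` and `401(k)` receive the same type, while an isolated `401` matches no rule, which is the behaviour the description asks for.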
[jira] [Resolved] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
[ https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved LUCENE-9572.
------------------------------
Fix Version/s: 8.9
Resolution: Implemented

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---------------------------------------------------------------------------
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis, modules/test-framework
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.9
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to synonyms. In some cases the
> original token may already have flags set on it, and it may be useful to
> propagate some or all of those flags to the synonym we are generating. This
> ticket provides that ability and allows the user to supply a bitmask that
> specifies which flags are retained.
> Additionally there may be some set of types that should not be converted to
> synonyms, and this change allows the user to specify a comma-separated list
> of types to ignore (I suspect the most common case will be to ignore a common
> default type of 'word').
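The two behaviours described in LUCENE-9572 (a bitmask selecting which flags are carried over to the synonym, and a comma-separated list of types to skip) can be sketched as follows. This is an illustrative Python model, not the Lucene code; `synonym_for` and its parameter names are hypothetical.

```python
def synonym_for(token_type, token_flags, keep_mask, ignored_types):
    """Return (synonym_text, synonym_flags), or None when the type is ignored.

    Only the flag bits that are also set in keep_mask survive on the synonym.
    """
    if token_type in ignored_types:
        return None
    return token_type, token_flags & keep_mask


# The ignore list arrives as a comma-separated string, as in the description.
ignored = set("word,shingle".split(","))

print(synonym_for("legal2_401_k", 0b0110, 0b0010, ignored))  # only bit 1 is kept
print(synonym_for("word", 0b0110, 0b0010, ignored))          # ignored type
```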
[jira] [Resolved] (LUCENE-9943) DOC: Fix spelling(camelCase it like GitHub )
[ https://issues.apache.org/jira/browse/LUCENE-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved LUCENE-9943.
------------------------------
Fix Version/s: 9.0
Resolution: Fixed

Thanks :)

> DOC: Fix spelling(camelCase it like GitHub )
> --------------------------------------------
>
> Key: LUCENE-9943
> URL: https://issues.apache.org/jira/browse/LUCENE-9943
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 8.8.1
> Reporter: AYUSHMAN SINGH CHAUHAN
> Priority: Minor
> Labels: documentation
> Fix For: 9.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> docs update => spelling: github
[jira] [Updated] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck updated LUCENE-9574:
-----------------------------
Fix Version/s: 8.9

> Add a token filter to drop tokens based on flags.
> -------------------------------------------------
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.9
>
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have
> all specified flags.
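The drop rule in LUCENE-9574 ("drops tokens that have all specified flags") amounts to a bitmask test. A minimal Python sketch follows, assuming tokens are modelled as `(text, flags)` pairs; `drop_flagged` is a hypothetical name, not part of Lucene.

```python
def drop_flagged(tokens, drop_mask):
    """Keep a token unless every bit of drop_mask is set in its flags."""
    return [(text, flags) for text, flags in tokens
            if (flags & drop_mask) != drop_mask]


tokens = [("401", 2), ("legal2_401_k", 0), ("k", 3)]
print(drop_flagged(tokens, 2))  # tokens carrying bit 1 are removed
print(drop_flagged(tokens, 3))  # only tokens carrying both bits are removed
```

Note the "all specified flags" semantics: with a mask of 3, a token whose flags are 2 is kept because only one of the two requested bits is set.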
[jira] [Resolved] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved LUCENE-9574.
------------------------------
Resolution: Implemented

> Add a token filter to drop tokens based on flags.
> -------------------------------------------------
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Time Spent: 8.5h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291072#comment-17291072 ]

Gus Heck commented on SOLR-14787:
---------------------------------

[~jbernste] This operates at a token level, not a document level. Fields and joins would filter at a document level. In the simple equals case the payload might be a "noun" or "verb" string and you could search for documents where the word "set" was used as a "NOUN". One could also perhaps score tokens for "offensiveness" (or something else) and then encode that as a payload and match (or avoid matches) only if the tokens were more offensive than X... or vice-versa (that analysis could be context sensitive NLP based stuff). These sorts of things likely slow down and inflate the index but enable detailed token by token functionality not otherwise available.

> Inequality support in Payload Check query parser
> ------------------------------------------------
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: master (9.0)
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching
> and filtering based on term payloads. This patch extends the
> PayloadCheckQueryParser to add a new local param for "op".
> The value of "op" can be one of the following:
> * gt - greater than
> * gte - greater than or equal
> * lt - less than
> * lte - less than or equal
> If "op" is not specified, the default is the current behavior of equals.
> In addition to the operation, you can specify a threshold local parameter.
> This will provide the ability to search for the term "cat" so long as the
> payload has a value of greater than 0.75.
> One use case is to classify a document into various categories with an
> associated confidence or probability that the classification is correct.
> That can be indexed into a delimited payload field. The searches can find
> and match documents that were tagged with the "cat" category with a
> confidence of greater than 0.5.
> Example Document
> {code:java}
> {
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' threshold='0.5'}cat
> {code}
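The matching rule described in SOLR-14787 (compare a term's numeric payload against a threshold using `gt`, `gte`, `lt` or `lte`, with equality as the default) can be modelled in a few lines of Python. This sketch works on the raw `term|payload` string from the example document rather than on a Lucene index; the function names and the `"eq"` key for the default are assumptions made for illustration.

```python
import operator

# Operators named in the ticket; "eq" stands in for the unspecified default.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def parse_payload_field(value):
    """Turn 'cat|0.75 dog|2.0' into {'cat': 0.75, 'dog': 2.0}."""
    pairs = {}
    for item in value.split():
        term, payload = item.split("|", 1)
        pairs[term] = float(payload)
    return pairs


def payload_matches(field_value, term, threshold, op="eq"):
    """True when the term is present and its payload satisfies op(payload, threshold)."""
    payloads = parse_payload_field(field_value)
    return term in payloads and OPS[op](payloads[term], threshold)


doc_field = "cat|0.75 dog|2.0"
print(payload_matches(doc_field, "cat", 0.5, "gt"))   # 0.75 > 0.5
print(payload_matches(doc_field, "cat", 0.75, "gt"))  # 0.75 > 0.75 fails
```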
[jira] [Commented] (SOLR-13696) DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest failures due commitWithin/openSearcher delays
[ https://issues.apache.org/jira/browse/SOLR-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288112#comment-17288112 ]

Gus Heck commented on SOLR-13696:
---------------------------------

Finally coming back to this. In retrospect I think this test was overzealous. "Commit within" is a feature that is really orthogonal to routed aliases and there's no good reason to believe that it would succeed or fail differently than a regular commit. Removing this aspect of the test simplifies the code, makes the test faster and probably costs us little or nothing in terms of safety.

> DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest
> failures due commitWithin/openSearcher delays
> --------------------------------------------------------------------------
>
> Key: SOLR-13696
> URL: https://issues.apache.org/jira/browse/SOLR-13696
> Project: Solr
> Issue Type: Test
> Reporter: Chris M. Hostetter
> Assignee: Gus Heck
> Priority: Major
> Attachments: thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Recent jenkins failure...
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-MacOSX/272/
> Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC
> {noformat}
> Stack Trace:
> java.lang.AssertionError: expected:<16> but was:<15>
>     at __randomizedtesting.SeedInfo.seed([DB6DC28D5560B1D2:E295833E1541FDB9]:0)
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:834)
>     at org.junit.Assert.assertEquals(Assert.java:645)
>     at org.junit.Assert.assertEquals(Assert.java:631)
>     at org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:677)
>     at org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:282)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Digging into the logs, the problem appears to be in the way the test
> verifies/assumes docs have been committed.
[jira] [Resolved] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved SOLR-14787.
-----------------------------
Fix Version/s: master (9.0)
Resolution: Implemented

> Inequality support in Payload Check query parser
> ------------------------------------------------
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
> Issue Type: New Feature
> Reporter: Kevin Watters
> Assignee: Gus Heck
> Priority: Major
> Fix For: master (9.0)
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gus Heck resolved SOLR-14704.
-----------------------------
Fix Version/s: 8.9
Resolution: Fixed

> Add download option to solr/cloud-dev/cloud.sh
> ----------------------------------------------
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
> Issue Type: New Feature
> Components: scripts and tools
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Fix For: 8.9
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to
> cloud.sh which will curl a tarball down from the web instead of building it
> locally.
[jira] [Created] (SOLR-15160) Update cloud-dev/cloud.sh to work with gradle
Gus Heck created SOLR-15160:
-------------------------------

Summary: Update cloud-dev/cloud.sh to work with gradle
Key: SOLR-15160
URL: https://issues.apache.org/jira/browse/SOLR-15160
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: scripts and tools
Reporter: Gus Heck

Now that the gradle build is a bit more mature, we can update this tool to smooth the creation of testing clusters on the local machine for master.
[jira] [Commented] (SOLR-15125) Link to docs is brroken
[ https://issues.apache.org/jira/browse/SOLR-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276537#comment-17276537 ]

Gus Heck commented on SOLR-15125:
---------------------------------

There has been some difficulty with deploying the docs for the recent release, and several of the latest versions are presently not available on the web. This is being worked on urgently by several folks.

> Link to docs is brroken
> -----------------------
>
> Key: SOLR-15125
> URL: https://issues.apache.org/jira/browse/SOLR-15125
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: website
> Reporter: Thomas Güttler
> Priority: Minor
>
> [On this page:
> https://lucene.apache.org/solr/guide/|https://lucene.apache.org/solr/guide/]
> the link to [https://lucene.apache.org/solr/guide/8_8/]
> is broken.
[jira] [Commented] (SOLR-7642) Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist?
[ https://issues.apache.org/jira/browse/SOLR-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275457#comment-17275457 ]

Gus Heck commented on SOLR-7642:
--------------------------------

ch does indeed mean change, but it's a reference to the Unix chroot operation (https://en.wikipedia.org/wiki/Chroot). I think it should be createZkChRoot for consistency both with other documentation and with similar concepts at the OS level. For those familiar with chroot elsewhere it's read as "create zk chroot", meaning isolating the zk stuff to its own sub-tree and preventing upward access. One could argue for not capitalizing the R, but I think we do capitalize elsewhere so best to be consistent.

> Should launching Solr in cloud mode using a ZooKeeper chroot create the
> chroot znode if it doesn't exist?
> -----------------------------------------------------------------------
>
> Key: SOLR-7642
> URL: https://issues.apache.org/jira/browse/SOLR-7642
> Project: Solr
> Issue Type: Improvement
> Reporter: Timothy Potter
> Priority: Minor
> Attachments: SOLR-7642.patch, SOLR-7642.patch, SOLR-7642.patch,
> SOLR-7642.patch, SOLR-7642_tag_7.5.0.patch,
> SOLR-7642_tag_7.5.0_proposition.patch
>
>
> Launching Solr for the first time in cloud mode using a ZooKeeper
> connection string that includes a chroot leads to the following
> initialization error:
> {code}
> ERROR - 2015-06-05 17:15:50.410; [ ] org.apache.solr.common.SolrException;
> null:org.apache.solr.common.cloud.ZooKeeperException: A chroot was specified
> in ZkHost but the znode doesn't exist. localhost:2181/lan
>     at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:113)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:339)
>     at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:140)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110)
>     at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:138)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:852)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298)
>     at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349)
>     at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342)
>     at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
>     at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505)
> {code}
> The work-around for this is to use the scripts/cloud-scripts/zkcli.sh script
> to create the chroot znode (bootstrap action does this).
> I'm wondering if we shouldn't just create the znode if it doesn't exist? Or
> is that some violation of using a chroot?
[jira] [Created] (LUCENE-9696) RegExp with group references
Gus Heck created LUCENE-9696:
--------------------------------

Summary: RegExp with group references
Key: LUCENE-9696
URL: https://issues.apache.org/jira/browse/LUCENE-9696
Project: Lucene - Core
Issue Type: Wish
Reporter: Gus Heck

PatternTypingFilter presently relies on java.util.regex patterns, but LUCENE-7465 found performance benefits using our own RegExp class instead. Unfortunately RegExp does not currently report matching subgroups, which is key to PatternTypingFilter's use (and probably useful in other endeavors as well). What's needed is reporting of sub-groups such that

new RegExp("foo(.+)") --> converted to run automaton etc --> match found for "foobar" --> somehow reports getGroup(1) as "bar"

and getGroup() can be called on some object reasonably accessible to the code using RegExp in the first place.

Clearly there's a lot to be worked out there since the normal usage pattern converts things to a DFA / run automaton etc, and subgroups are not a natural concept for those classes. But if this could be achieved without losing the performance benefits, that would be interesting :).

Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I won't be able to work on it any time soon, so I encourage anyone else interested to pick it up or to drop links or ideas in here.
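The gap described in LUCENE-9696 can be illustrated with a toy comparison in Python: a hand-built DFA for `foo.+` only answers accept or reject, whereas a backtracking regex engine also reports the captured group. This is a sketch for intuition only; it does not use or model Lucene's RegExp or automaton classes, and both function names are hypothetical.

```python
import re


def dfa_accepts(text):
    """Hand-built DFA for foo.+ (states 0-3 consume 'foo', state 4 accepts).

    It carries no memory of where the group started, so it can only
    answer yes or no.
    """
    state = 0
    for ch in text:
        if state < 3:
            state = state + 1 if ch == "foo"[state] else -1
        else:
            state = 4  # any character after "foo" keeps us accepting
        if state == -1:
            return False
    return state == 4


def captured_group(text):
    """Backtracking-engine counterpart: also reports what group 1 matched."""
    m = re.fullmatch(r"foo(.+)", text)
    return m.group(1) if m else None


print(dfa_accepts("foobar"))     # accept/reject only
print(captured_group("foobar"))  # the sub-group the ticket wants reported
```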
[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271184#comment-17271184 ]

Gus Heck commented on LUCENE-9575:
----------------------------------

ah thanks, though I was waiting on tests in github for [https://github.com/apache/lucene-solr/pull/2240]

> Add PatternTypingFilter
> -----------------------
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270947#comment-17270947 ]

Gus Heck commented on LUCENE-9575:
----------------------------------

Thanks for fixing that. Yeah, a separate ticket for groups in RegExp would be cool, though when I'd find time for it is a question. I had googled around and I recall looking at some paper as well; I wonder if it's the same :). However, I couldn't say the customer at the time really needed that so I had to set it aside.

I'm interested in backporting this, and all of the related AQP stuff, but want to make sure a full set gets in master before I spend time on that. This also gets complicated by a strong desire by many to get 9x out the door and issues with Lucene 9.0 needing to support 8.9. Based on that perhaps I should revise my plan and get the Lucene bits backported asap.

> Add PatternTypingFilter
> -----------------------
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Time Spent: 3h 10m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler
[ https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270740#comment-17270740 ]

Gus Heck commented on SOLR-14608:
---------------------------------

Having just gone through some cost minimization, the particular case may be undersized, and it wasn't a clean test, so I'm not looking to troubleshoot that in a Jira ticket :). Just trying to understand the shape of the change in this ticket. Would it be possible to quantify the memory cost here? I often find that one of the things making Solr implementations difficult for several customers I've seen is the cost of fielding machines with enough memory. I have a client that has implemented very complex arrangements with spot machines to keep costs under control, for example. If there's a way to trade memory vs speed, that's a great feature to have, but if the memory difference is large maybe it needs to be something the user can select? You mention options to tune this implementation, but I'm not seeing any documentation updates... Particularly important would be documentation of settings that offer similar memory usage to the previous implementation (even if they are not the default).

> Faster sorting for the /export handler
> --------------------------------------
>
> Key: SOLR-14608
> URL: https://issues.apache.org/jira/browse/SOLR-14608
> Project: Solr
> Issue Type: New Feature
> Affects Versions: master (9.0)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Fix For: master (9.0)
>
>
> The largest cost of the export handler is the sorting. This ticket will
> implement an improved algorithm for sorting that should greatly increase
> overall throughput for the export handler.
> *The current algorithm is as follows:*
> Collect a bitset of matching docs. Iterate over that bitset and materialize
> the top level ordinals for the sort fields in the document and add them to
> a priority queue of size 30,000. Then export the top 30,000 docs, turn off
> the bits in the bit set and iterate again until all docs are sorted and sent.
> There are two performance bottlenecks with this approach:
> 1) Materializing the top level ordinals adds a huge amount of overhead to the
> sorting process.
> 2) The size of the priority queue, 30,000, adds significant overhead to
> sorting operations.
> *The new algorithm:*
> Has a top level *merge sort iterator* that wraps segment level iterators that
> perform segment level priority queue sorts.
> *Segment level:*
> The segment level docset will be iterated and the segment level ordinals for
> the sort fields will be materialized and added to a segment level priority
> queue. As the segment level iterator pops docs from the priority queue the
> top level ordinals for the sort fields are materialized. Because the top
> level ordinals are materialized AFTER the sort, they only need to be looked
> up when the segment level ordinal changes. This takes advantage of the sort
> to limit the lookups into the top level ordinal structures. This also
> eliminates redundant lookups of top level ordinals that occur during the
> multiple passes over the matching docset.
> The segment level priority queues can be kept smaller than 30,000 to improve
> performance of the sorting operations because the overall batch size will
> still be 30,000 or greater when all the segment priority queue sizes are
> added up. This allows for batch sizes much larger than 30,000 without using a
> single large priority queue. The increased batch size means fewer iterations
> over the matching docset and the decreased priority queue size means faster
> sorting operations.
> *Top level:*
> A top level iterator does a merge sort over the segment level iterators by
> comparing the top level ordinals materialized when the segment level docs are
> popped from the segment level priority queues. This requires no extra memory
> and will be very performant.
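The two-level structure described in SOLR-14608 (bounded priority queues per segment, merged by a top level iterator) can be sketched in Python. This is a simplified model under stated assumptions, not the /export handler's code: segments are plain lists of sort values, raw values stand in for ordinals, and `segment_iterator`, `tagged` and `export_sorted` are hypothetical names.

```python
import heapq


def segment_iterator(sort_values, queue_size):
    """Yield (value, doc) pairs of one segment in ascending order.

    Each pass picks the next `queue_size` smallest remaining docs with a
    bounded priority queue, standing in for repeated passes over the
    segment's matching docset.
    """
    remaining = set(range(len(sort_values)))
    while remaining:
        batch = heapq.nsmallest(queue_size, ((sort_values[d], d) for d in remaining))
        for value, doc in batch:
            remaining.discard(doc)
            yield value, doc


def tagged(seg_id, sort_values, queue_size):
    """Attach the segment id so merged results can be traced to their segment."""
    for value, doc in segment_iterator(sort_values, queue_size):
        yield value, seg_id, doc


def export_sorted(segments, queue_size=2):
    """Top level: merge the already-sorted segment streams lazily."""
    streams = [tagged(i, values, queue_size) for i, values in enumerate(segments)]
    return list(heapq.merge(*streams))


segments = [[5, 1, 9], [4, 8], [7, 2, 3]]
for value, seg_id, doc in export_sorted(segments):
    print(value, "segment", seg_id, "doc", doc)
```

The merge step holds only one pending entry per segment, which mirrors the description's point that the top level iterator needs no large queue of its own.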
[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler
[ https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270300#comment-17270300 ] Gus Heck commented on SOLR-14608: - Came back to re-read this to fuel a better understanding of sort memory requirements after an OOM on a relatively simple query that should yield ~38k docs out of an 11 Billion doc corpus (but other stuff including data ingestion was going on, so it's not a clean case, just a bit of a surprise since I assumed that the sort memory would relate to the 38k docs, which seemed like it ought to be trivial, only a few fields were requested all numeric or short strings, probably ~0.25k/doc so maybe 8 Mb?). Did you ever investigate my prior question regarding queue size? And I'm also wondering if your algorithm is dependent on having a lot of segments, what if there's been a force-merge? Above in your description of the current algorithm you say "turn off the bits in the bit set" I'm assuming this means just the bits for the docs that were "sent"? and when you say "sent" you mean sent to the coordinating node? > Faster sorting for the /export handler > -- > > Key: SOLR-14608 > URL: https://issues.apache.org/jira/browse/SOLR-14608 > Project: Solr > Issue Type: New Feature >Affects Versions: master (9.0) >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Fix For: master (9.0) > > > The largest cost of the export handler is the sorting. This ticket will > implement an improved algorithm for sorting that should greatly increase > overall throughput for the export handler. > *The current algorithm is as follows:* > Collect a bitset of matching docs. Iterate over that bitset and materialize > the top level oridinals for the sort fields in the document and add them to > priority queue of size 3. Then export the top 3 docs, turn off the > bits in the bit set and iterate again until all docs are sorted and sent. 
> There are two performance bottlenecks with this approach: > 1) Materializing the top level ordinals adds a huge amount of overhead to the > sorting process. > 2) The size of the priority queue, 30,000, adds significant overhead to sorting > operations. > *The new algorithm:* > Has a top level *merge sort iterator* that wraps segment level iterators that > perform segment level priority queue sorts. > *Segment level:* > The segment level docset will be iterated and the segment level ordinals for > the sort fields will be materialized and added to a segment level priority > queue. As the segment level iterator pops docs from the priority queue the > top level ordinals for the sort fields are materialized. Because the top > level ordinals are materialized AFTER the sort, they only need to be looked > up when the segment level ordinal changes. This takes advantage of the sort > to limit the lookups into the top level ordinal structures. This also > eliminates redundant lookups of top level ordinals that occur during the > multiple passes over the matching docset. > The segment level priority queues can be kept smaller than 30,000 to improve > performance of the sorting operations because the overall batch size will > still be 30,000 or greater when all the segment priority queue sizes are > added up. This allows for batch sizes much larger than 30,000 without using a > single large priority queue. The increased batch size means fewer iterations > over the matching docset and the decreased priority queue size means faster > sorting operations. > *Top level:* > A top level iterator does a merge sort over the segment level iterators by > comparing the top level ordinals materialized when the segment level docs are > popped from the segment level priority queues. This requires no extra memory > and will be very performant. 
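The top level merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the ticket: `SegmentMergeSketch`, `Head` and `mergeSegments` are hypothetical names, plain sorted lists of `long` values stand in for the segment level iterators and their top level ordinals, and the small heap over the iterator heads is a choice made for this sketch (the ticket only says the top level compares ordinals across segment iterators).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SegmentMergeSketch {

  /** Current head of one (hypothetical) segment level iterator. */
  private static final class Head {
    final long ordinal;         // stand-in for the top level sort ordinal
    final Iterator<Long> rest;  // remaining, already sorted, segment output

    Head(long ordinal, Iterator<Long> rest) {
      this.ordinal = ordinal;
      this.rest = rest;
    }
  }

  /** Merges per-segment sorted streams into one globally sorted list. */
  static List<Long> mergeSegments(List<List<Long>> sortedSegments) {
    // The heap holds at most one entry per segment, never one per document.
    PriorityQueue<Head> heads =
        new PriorityQueue<>((a, b) -> Long.compare(a.ordinal, b.ordinal));
    for (List<Long> segment : sortedSegments) {
      Iterator<Long> it = segment.iterator();
      if (it.hasNext()) {
        heads.add(new Head(it.next(), it));
      }
    }
    List<Long> merged = new ArrayList<>();
    while (!heads.isEmpty()) {
      Head smallest = heads.poll();
      merged.add(smallest.ordinal);
      if (smallest.rest.hasNext()) {
        // Advance only the segment that just produced a value.
        heads.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return merged;
  }
}
```

The point of the sketch is that the merge step keeps one pending entry per segment, so its cost depends on the number of segments rather than on the batch size.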
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9659) Support inequality operations in payload check queries
[ https://issues.apache.org/jira/browse/LUCENE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reassigned LUCENE-9659: Assignee: Gus Heck > Support inequality operations in payload check queries > -- > > Key: LUCENE-9659 > URL: https://issues.apache.org/jira/browse/LUCENE-9659 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > This is a ticket broken out from > https://issues.apache.org/jira/browse/SOLR-14787 > The patch will extend the SpanPayloadCheck query to support inequality checks > to see if the term and payload should match. Currently, this query operator > only supports equals as the payload check. This ticket introduces > gt,gte,lt,lte and eq operations to support testing if a payload is greater > than/less than a specified reference payload value. One such use case is to > have a label on a document with a confidence level stored as a payload. This > patch will support searching for the term where a confidence level is above a > given threshold. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
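As a rough illustration of the inequality checks described in the ticket, the comparison step could look like the sketch below. It does not use or reproduce the Lucene API: `PayloadCheckSketch`, `Op`, `encodeFloat`, `decodeFloat` and `matches` are hypothetical names, and the payload is assumed to be a 4-byte big-endian IEEE 754 float (for example a confidence level).

```java
import java.nio.ByteBuffer;

final class PayloadCheckSketch {

  /** Hypothetical operator set mirroring gt, gte, lt, lte and eq from the ticket. */
  enum Op { EQ, GT, GTE, LT, LTE }

  /** Assumption: payloads are 4-byte big-endian floats. */
  static byte[] encodeFloat(float value) {
    return ByteBuffer.allocate(4).putFloat(value).array();
  }

  static float decodeFloat(byte[] payload) {
    return ByteBuffer.wrap(payload).getFloat();
  }

  /** True if the decoded payload stands in the given relation to the reference value. */
  static boolean matches(Op op, byte[] payload, float reference) {
    int cmp = Float.compare(decodeFloat(payload), reference);
    switch (op) {
      case EQ:  return cmp == 0;
      case GT:  return cmp > 0;
      case GTE: return cmp >= 0;
      case LT:  return cmp < 0;
      case LTE: return cmp <= 0;
      default:  throw new IllegalArgumentException("unknown op " + op);
    }
  }
}
```

With a label stored at confidence 0.9, `matches(Op.GT, payload, 0.75f)` would accept the term, which is the "confidence above a threshold" use case from the description.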
[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237501#comment-17237501 ] Gus Heck commented on SOLR-15014: - Actually I got brave and let it run longer, and it seems to stop after 30 replicas have been created, leaving me with 31 replicas of shard 1 (and still 1 of shard 2) > Runaway replica creation with autoscaling example from ref guide > > > Key: SOLR-15014 > URL: https://issues.apache.org/jira/browse/SOLR-15014 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.3 >Reporter: Gus Heck >Priority: Major > Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, > image-2020-11-23-11-37-15-124.png > > > Although the present autoscaling implementation is deprecated, I have a > client intent on using it, and in trying to create rules that ensure all > replicas on all nodes, I wound up getting into a state where one replica was > (apparently) infinitely creating new copies of itself. 
The boiled down steps > to reproduce: > Create a 4 node cluster locally for testing from a checkout of the tagged > version for 8.6.3 > (Using solr/cloud-dev/cloud.sh) > {code:java} > ./cloud.sh new -r > {code} > Create a collection > {code:java} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1 > {code} > Add this trigger from the ref guide > ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):] > {code:java} > { > "set-trigger": { > "name": "node_added_trigger", > "event": "nodeAdded", > "waitFor": "5s", > "preferredOperation": "ADDREPLICA", > "replicaType": "PULL" > } > } > {code} > Reboot the cluster, and when it comes up infinite replica creation ensues > (attaching screen shot of admin UI showing replicated shard momentarily) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237495#comment-17237495 ] Gus Heck commented on SOLR-15014: - Discussion on slack suggests that given the fact that this functionality is going away, the primary thing here will be to remove the example from the ref guide. (or if folks have an idea how to mitigate it with additional configuration, add that to the ref guide, but I haven't found such yet) > Runaway replica creation with autoscaling example from ref guide > > > Key: SOLR-15014 > URL: https://issues.apache.org/jira/browse/SOLR-15014 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.3 >Reporter: Gus Heck >Priority: Major > Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, > image-2020-11-23-11-37-15-124.png > > > Although the present autoscaling implementation is deprecated, I have a > client intent on using it, and in trying to create rules that ensure all > replicas on all nodes, I wound up getting into a state where one replica was > (apparently) infinitely creating new copies of itself. 
The boiled down steps > to reproduce: > Create a 4 node cluster locally for testing from a checkout of the tagged > version for 8.6.3 > (Using solr/cloud-dev/cloud.sh) > {code:java} > ./cloud.sh new -r > {code} > Create a collection > {code:java} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1 > {code} > Add this trigger from the ref guide > ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):] > {code:java} > { > "set-trigger": { > "name": "node_added_trigger", > "event": "nodeAdded", > "waitFor": "5s", > "preferredOperation": "ADDREPLICA", > "replicaType": "PULL" > } > } > {code} > Reboot the cluster, and when it comes up infinite replica creation ensues > (attaching screen shot of admin UI showing replicated shard momentarily) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
[ https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-15014: Attachment: Screen Shot 2020-11-23 at 11.40.29 AM.png > Runaway replica creation with autoscaling example from ref guide > > > Key: SOLR-15014 > URL: https://issues.apache.org/jira/browse/SOLR-15014 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.3 >Reporter: Gus Heck >Priority: Major > Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, > image-2020-11-23-11-37-15-124.png > > > Although the present autoscaling implementation is deprecated, I have a > client intent on using it, and in trying to create rules that ensure all > replicas on all nodes, I wound up getting into a state where one replica was > (apparently) infinitely creating new copies of itself. The boiled down steps > to reproduce: > Create a 4 node cluster locally for testing from a checkout of the tagged > version for 8.6.3 > (Using solr/cloud-dev/cloud.sh) > {code:java} > ./cloud.sh new -r > {code} > Create a collection > {code:java} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1 > {code} > Add this trigger from the ref guide > ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):] > {code:java} > { > "set-trigger": { > "name": "node_added_trigger", > "event": "nodeAdded", > "waitFor": "5s", > "preferredOperation": "ADDREPLICA", > "replicaType": "PULL" > } > } > {code} > Reboot the cluster, and when it comes up infinite replica creation ensues > (attaching screen shot of admin UI showing replicated shard momentarily) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide
Gus Heck created SOLR-15014:
---
Summary: Runaway replica creation with autoscaling example from ref guide
Key: SOLR-15014
URL: https://issues.apache.org/jira/browse/SOLR-15014
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: AutoScaling
Affects Versions: 8.6.3
Reporter: Gus Heck
Attachments: image-2020-11-23-11-37-15-124.png

Although the present autoscaling implementation is deprecated, I have a client intent on using it, and in trying to create rules that ensure all replicas on all nodes, I wound up getting into a state where one replica was (apparently) infinitely creating new copies of itself. The boiled down steps to reproduce:

Create a 4 node cluster locally for testing from a checkout of the tagged version for 8.6.3 (Using solr/cloud-dev/cloud.sh)
{code:java}
./cloud.sh new -r
{code}
Create a collection
{code:java}
http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1
{code}
Add this trigger from the ref guide ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger]):
{code:java}
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "preferredOperation": "ADDREPLICA",
    "replicaType": "PULL"
  }
}
{code}
Reboot the cluster, and when it comes up infinite replica creation ensues (attaching screen shot of admin UI showing replicated shard momentarily)
[jira] [Commented] (SOLR-14986) Restrict the properties possible to define with "property.name=value" when creating a collection
[ https://issues.apache.org/jira/browse/SOLR-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229266#comment-17229266 ] Gus Heck commented on SOLR-14986: - Yeah, It seems to me that any property specified in the create command that would conflict with the actual properties of the create command should just fail with a message about overlapping properties. > Restrict the properties possible to define with "property.name=value" when > creating a collection > > > Key: SOLR-14986 > URL: https://issues.apache.org/jira/browse/SOLR-14986 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > This came to light when I was looking at two user-list questions where people > try to manually define core.properties to define _replicas_ in SolrCloud. > There are two related issues: > 1> You can do things like "action=CREATE=eoe=blivet" > which results in an opaque error about "could not create replica." I > propose we return a better error here like "property.collection should not be > specified when creating a collection". What do people think about the rest of > the auto-created properties on collection creation? > coreNodeName > collection.configName > name > numShards > shard > collection > replicaType > "name" seems to be OK to change, although i don't see anyplace anyone can > actually see it afterwards > 2> Change the ref guide to steer people away from attempting to manually > create a core.properties file to define cores/replicas in SolrCloud. There's > no warning on the "defining-core-properties.adoc" for instance. Additionally > there should be some kind of message on the collections API documentation > about not trying to set the properties in <1> on the CREATE command. > <2> used to actually work (apparently) with legacyCloud... 
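The "fail with a message about overlapping properties" idea could be sketched as below. This is not Solr code: `CreatePropertyCheckSketch` and `validate` are hypothetical names, the reserved set is simply the list of auto-created properties quoted in the description (which itself notes that "name" may be fine to change), and the request is assumed to arrive as a plain parameter map.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CreatePropertyCheckSketch {

  static final String PREFIX = "property.";

  // Assumption: taken verbatim from the list in the ticket description.
  static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
      "coreNodeName", "collection.configName", "name", "numShards",
      "shard", "collection", "replicaType"));

  /** Throws if any "property.<x>" parameter names an auto-created property. */
  static void validate(Map<String, String> params) {
    Set<String> overlapping = new TreeSet<>();
    for (String key : params.keySet()) {
      if (key.startsWith(PREFIX) && RESERVED.contains(key.substring(PREFIX.length()))) {
        overlapping.add(key);
      }
    }
    if (!overlapping.isEmpty()) {
      throw new IllegalArgumentException(
          "These properties should not be specified when creating a collection: " + overlapping);
    }
  }
}
```

Failing early with the offending names listed would replace the opaque "could not create replica" error with something a user can act on.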
[jira] [Comment Edited] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215434#comment-17215434 ]

Gus Heck edited comment on LUCENE-9575 at 10/16/20, 3:15 PM:
-
Yeah, I looked at our FST based regex class, but as you say, no group tracking, which was critical. I had somewhat hoped that the performance of a non-FST list of regexes would force me to learn all the nitty gritty of FSTs and do something really nifty to add group support, but the ingest for the customer (involving ~25 regexps) didn't seem to be limited by the analysis, so there was no justifying that work... optimize later. Also, no, not across multiple tokens; again more than the customer needed, but a valid enhancement.

was (Author: gus_heck):
Yeah, I looked at our FST based regex class, but as you say, no group tracking, which was critical. I had somewhat hoped that the performance of a non-FST list of regexes would force me to learn all the nitty gritty of FSTs and do something really nifty to add group support, but the ingest for the customer (involving ~25 regexps) didn't seem to be limited by the analysis, so there was no justifying that work... optimize later.

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the
> Advanced Query Parser was to be able to recognize arbitrary patterns that
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they
> wanted 401k and 401(k) to match documents with either style reference, and
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not
> documents about the HTTP status code). And of course we wanted to give up as
> little of the text analysis features they were already using.
> This filter in conjunction with the filters from LUCENE-9572, LUCENE-9574 and
> one Solr specific filter in SOLR-14597 that re-analyzes tokens with an
> arbitrary analyzer defined for a type in the Solr schema, combine to achieve
> this.
> This filter has the job of spotting the patterns, and adding the intended
> synonym as a type to the token (from which minimal punctuation has been
> removed). It also sets flags on the token which are retained through the
> analysis chain, and at the very end the type is converted to a synonym and
> the original token(s) for that type are dropped, avoiding the match on 401
> (for example).
> The pattern matching is specified in a file that looks like:
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k,
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first
> line above would create synonyms such as:
> {code}
> 401k --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}
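To make the rule file semantics concrete, here is a small sketch of how such rules could be parsed and applied with `java.util.regex`. It is not the filter's implementation: `PatternTypingSketch`, `Rule`, `parse` and `typeFor` are hypothetical names, and it assumes that the leading number is the flags value, that a rule must match the whole token, and that the first matching rule wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternTypingSketch {

  /** One parsed rule line: leading number, pattern, ":::", replacement. */
  static final class Rule {
    final int flags;           // assumption: the leading number is the flags value
    final Pattern pattern;
    final String replacement;

    Rule(int flags, Pattern pattern, String replacement) {
      this.flags = flags;
      this.pattern = pattern;
      this.replacement = replacement;
    }
  }

  static List<Rule> parse(List<String> lines) {
    List<Rule> rules = new ArrayList<>();
    for (String line : lines) {
      String[] halves = line.split(" ::: ", 2);            // left side, replacement
      String[] left = halves[0].trim().split("\\s+", 2);   // number, pattern
      rules.add(new Rule(Integer.parseInt(left[0]), Pattern.compile(left[1]), halves[1].trim()));
    }
    return rules;
  }

  /** Returns the type produced by the first rule matching the whole token, or null. */
  static String typeFor(List<Rule> rules, String token) {
    for (Rule rule : rules) {
      Matcher m = rule.pattern.matcher(token);
      if (m.matches()) {
        String type = rule.replacement;
        // Substitute $n from the highest group down so "$10" is not read as "$1".
        for (int g = m.groupCount(); g >= 1; g--) {
          String value = m.group(g) == null ? "" : m.group(g);
          type = type.replace("$" + g, value);
        }
        return type;
      }
    }
    return null;
  }
}
```

Under those assumptions, the three rules from the description map both `401k` and `401(k)` to `legal2_401_k`, `501(c)3` to `legal3_501_c_3` and `C++` to `c_plus_plus`, while a bare `401` matches no rule.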
[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215434#comment-17215434 ]

Gus Heck commented on LUCENE-9575:
--
Yeah, I looked at our FST based regex class, but as you say, no group tracking, which was critical. I had somewhat hoped that the performance of a non-FST list of regexes would force me to learn all the nitty gritty of FSTs and do something really nifty to add group support, but the ingest for the customer (involving ~25 regexps) didn't seem to be limited by the analysis, so there was no justifying that work... optimize later.

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Gus Heck
> Assignee: Gus Heck
> Priority: Major
>
> One of the key asks when the Library of Congress was asking me to develop the
> Advanced Query Parser was to be able to recognize arbitrary patterns that
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they
> wanted 401k and 401(k) to match documents with either style reference, and
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not
> documents about the HTTP status code). And of course we wanted to give up as
> little of the text analysis features they were already using.
> This filter in conjunction with the filters from LUCENE-9572, LUCENE-9574 and
> one Solr specific filter in SOLR-14597 that re-analyzes tokens with an
> arbitrary analyzer defined for a type in the Solr schema, combine to achieve
> this.
> This filter has the job of spotting the patterns, and adding the intended
> synonym as a type to the token (from which minimal punctuation has been
> removed). It also sets flags on the token which are retained through the
> analysis chain, and at the very end the type is converted to a synonym and
> the original token(s) for that type are dropped, avoiding the match on 401
> (for example).
> The pattern matching is specified in a file that looks like:
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k,
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first
> line above would create synonyms such as:
> {code}
> 401k --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}
[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214201#comment-17214201 ] Gus Heck commented on LUCENE-9574: -- Actually, I had expected when I started this, that 8.7 branch might have been cut already by the time I committed, and certainly the rest of the AQP changes won't make 8.7. Do we want to include it in 8.7 even so? > Add a token filter to drop tokens based on flags. > - > > Key: LUCENE-9574 > URL: https://issues.apache.org/jira/browse/LUCENE-9574 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > (Breaking this off of SOLR-14597 for independent review) > A filter that tests flags on tokens vs a bitmask and drops tokens that have > all specified flags. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
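The "drops tokens that have all specified flags" test comes down to a bitmask comparison. The sketch below shows only that check on a toy token type; `FlagDropSketch`, `Token`, `hasAllFlags` and `keep` are hypothetical names and nothing here is the actual filter code.

```java
import java.util.ArrayList;
import java.util.List;

final class FlagDropSketch {

  /** Toy token: text plus an int of flag bits. */
  static final class Token {
    final String text;
    final int flags;

    Token(String text, int flags) {
      this.text = text;
      this.flags = flags;
    }
  }

  /** True only if every bit set in dropMask is also set on the token. */
  static boolean hasAllFlags(int tokenFlags, int dropMask) {
    return (tokenFlags & dropMask) == dropMask;
  }

  /** Returns the texts of the tokens that survive the drop test. */
  static List<String> keep(List<Token> tokens, int dropMask) {
    List<String> kept = new ArrayList<>();
    for (Token token : tokens) {
      if (!hasAllFlags(token.flags, dropMask)) {
        kept.add(token.text);
      }
    }
    return kept;
  }
}
```

Note that in this sketch a mask of 0 would match every token, so a real filter would presumably want to reject or special-case an empty mask.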
[jira] [Commented] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete
[ https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213282#comment-17213282 ]

Gus Heck commented on SOLR-14861:
-
Sorry, didn't mean to sound accusatory. I guess I'm not understanding this: how is reload "complete" if it's time-sliced out? That sounds like it's not "complete" to me.

Looking at the test specifically, it appears to create 5 threads, start them and then thread.join all of them. If reload is time-sliced out, the thread in question shouldn't be finished (unless the reload call is happening async, which would be the problem I'm talking about) and the join should continue to block, preventing the test harness from shutting down (because the test method isn't finished). Alternately, maybe I'm confused about who is calling shutdown?

Looking into the sub-methods I find another example of what I'm talking about, even though it shouldn't actually cause failure here unless perhaps this heuristic can pass before reload completes...
{code:java}
RestTestHarness publisher = randomRestTestHarness(r);
String response = publisher.post("/schema", SolrTestCaseJ4.json(payload));
{code}
should be blocking until the core is reloaded and changes are safe for use by the caller (IMHO). The subsequent loop should not be needed:
{code:java}
try {
  long startTime = System.nanoTime();
  long maxTimeoutMillis = 10;
  while (TimeUnit.MILLISECONDS.convert(System.nanoTime() - startTime, TimeUnit.NANOSECONDS) < maxTimeoutMillis) {
    errmessages.clear();
    Map m = getObj(harness, aField, "fields");
    if (m != null) errmessages.add(StrUtils.formatString("field {0} still exists", aField));
    m = getObj(harness, dynamicFldName, "dynamicFields");
    if (m != null) errmessages.add(StrUtils.formatString("dynamic field {0} still exists", dynamicFldName));
    List l = getSourceCopyFields(harness, aField);
    if (checkCopyField(l, aField, dynamicCopyFldDest))
      errmessages.add(StrUtils.formatString("CopyField source={0},dest={1} still exists", aField, dynamicCopyFldDest));
    m = getObj(harness, newFieldTypeName, "fieldTypes");
    if (m != null) errmessages.add(StrUtils.formatString("new type {0} still exists", newFieldTypeName));
    if (errmessages.isEmpty()) break;
    Thread.sleep(10);
  }
{code}
As for code after shutdown, it looks like people may have read isShutDown two different ways, perhaps? Maybe we need two flags with clearer names... isShutDownComplete and isShutDownInProgress?

> CoreContainer shutdown needs to be aware of other ongoing operations and wait
> until they're complete
>
>
> Key: SOLR-14861
> URL: https://issues.apache.org/jira/browse/SOLR-14861
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-14861.patch
>
>
> Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent
> failures and found what looks like a glaring gap in how
> CoreContainer.shutdown operates. I don't know the impact on production since
> we're shutting down anyway, but I think this is responsible for the errors in
> TestBulkSchemaConcurrent and likely behind others, especially any other test
> that fails intermittently that involves core reloads, including and
> especially any tests that exercise managed schema.
> We have clear evidence of this sequence: > 1> some CoreContainer.reloads come in and get _partway_ through, in > particular past the test at the top where CoreContainer.reload() throws an > AlreadyClosed exception if (isShutdown). > 2> Some CoreContainer.shutdown() threads get some processing time before the > reloads in <1> are finished. > 3> the threads in <1> pick back up and go wonky. I suspect that there are a > number of different things that could be going wrong here depending on how > far through CoreContainer.shutdown() gets that pop out in different ways. > Since it's my shift (Noble has to sleep sometime), I put some crude locking > in just to test the idea; incrementing an AtomicInteger on entry to > CoreContainer.reload then decrementing it at the end, and spinning in > CoreContainer.shutdown() until the AtomicInteger was back to zero. With that > in place, 100 runs and no errors whereas before I could never get even 10 > runs to finish without an error. This is not a proper fix at all, and the way > it's currently running there are still possible race conditions, just much > smaller windows. And I suspect it risks spinning forever. But it's enough to > make me believe I finally understand what's happening. > I also suspect that reload is more sensitive than most operations on a core > due
[jira] [Comment Edited] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete
[ https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213118#comment-17213118 ] Gus Heck edited comment on SOLR-14861 at 10/13/20, 1:33 PM: Why aren't we viewing the problem as reload (etc) returns before it has ACTUALLY completed? How can the test be proceeding to shutdown if we aren't lying to it about the completion of reload? (assuming the test isn't making it's own threads and failing to track when they complete) A call to any admin level operation (from Java, SolrJ or the admin API) really should not complete until the command is complete, and the definition of complete should be the target resource is 100% ready to use (see also: create collection) Tests should never need any waiting strategies unless they themselves have started their own threads. (Credit: Above, I'm parroting a rehashed form of something Mark Miller said ages ago, at least as I recall it) If we *need* to track what's in-flight on shutdown, we've failed in the event of a power loss, so we shouldn't be doing that (Where need defined as "otherwise persisted state will be corrupted", anything else is "want"). If we want a graceful "drain existing requests" process we should build that explicitly by tracking all requests at a high level (we do this with SolrRequestInfo partly already, plus need to account for async)... Of course that only works if we don't lie about request completion in the first place. Once we can perform a "start rejecting and drain" (that doesn't lie about when it completes) we can paste request draining on the front of shutdown and reload fairly trivially as an option. was (Author: gus_heck): Why aren't we viewing the problem as reload (etc) returns before it has ACTUALLY completed? How can the test be proceeding to shutdown if we aren't lying to it about the completion of reload? 
(assuming the test isn't making it's own threads and failing to track when they complete) A call to any admin level operation (from Java, SolrJ or the admin API) really should not complete until the command is complete, and the definition of complete should be the target resource is 100% ready to use (see also: create collection) Tests should never need any waiting strategies unless they themselves have started their own threads. (Credit: Above, I'm parroting a rehashed form of something Mark Miller said ages ago, at least as I recall it) If we *need* to track what's in-flight on shutdown, we've failed in the event of a power loss, so we shouldn't be doing that (Where need defined as "otherwise persisted state will be corrupted", anything else is "want"). If we want a graceful "drain existing requests" process we should build that explicitly by tracking all requests at a high level (we do this with SolrRequestInfo partly already, plus need to account for async)... Of course that only works if we don't lie about request completion in the first place. Once we can perform a "start rejecting and drain" (that doesn't lie about when it completes) we can paste request draining on the front of shutdown and reload fairly trivially. > CoreContainer shutdown needs to be aware of other ongoing operations and wait > until they're complete > > > Key: SOLR-14861 > URL: https://issues.apache.org/jira/browse/SOLR-14861 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-14861.patch > > > Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent > failures and found what looks like a glaring gap in how > CoreContainer.shutdown operates. 
I don't know the impact on production since > we're shutting down anyway, but I think this is responsible for the errors in > TestBulkSchemaConcurrent and likely behind others, especially any other test > that fails intermittently that involves core reloads, including and > especially any tests that exercise managed schema. > We have clear evidence of this sequence: > 1> some CoreContainer.reloads come in and get _partway_ through, in > particular past the test at the top where CoreContainer.reload() throws an > AlreadyClosed exception if (isShutdown). > 2> Some CoreContainer.shutdown() threads get some processing time before the > reloads in <1> are finished. > 3> the threads in <1> pick back up and go wonky. I suspect that there are a > number of different things that could be going wrong here depending on how > far through CoreContainer.shutdown() gets that pop out in different ways. > Since it's my shift (Noble has to sleep sometime), I put some crude locking > in just to test the idea; incrementing an AtomicInteger on entry to >
[jira] [Commented] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete
[ https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213118#comment-17213118 ] Gus Heck commented on SOLR-14861: - Why aren't we viewing the problem as reload (etc.) returning before it has ACTUALLY completed? How can the test be proceeding to shutdown if we aren't lying to it about the completion of reload (assuming the test isn't making its own threads and failing to track when they complete)? A call to any admin level operation (from Java, SolrJ or the admin API) really should not complete until the command is complete, and the definition of complete should be that the target resource is 100% ready to use (see also: create collection). Tests should never need any waiting strategies unless they themselves have started their own threads. (Credit: above, I'm parroting a rehashed form of something Mark Miller said ages ago, at least as I recall it.) If we *need* to track what's in-flight on shutdown, we've failed in the event of a power loss, so we shouldn't be doing that (where "need" is defined as "otherwise persisted state will be corrupted"; anything else is "want"). If we want a graceful "drain existing requests" process we should build that explicitly by tracking all requests at a high level (we do this with SolrRequestInfo partly already, plus need to account for async)... Of course that only works if we don't lie about request completion in the first place. Once we can perform a "start rejecting and drain" (that doesn't lie about when it completes) we can paste request draining on the front of shutdown and reload fairly trivially. 
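A "start rejecting and drain" step of the kind mentioned above might be sketched as follows. This is a generic illustration, not CoreContainer code and not the AtomicInteger experiment from the issue description: `InFlightTracker`, `tryBegin`, `end` and `shutdownAndDrain` are hypothetical names, and a monitor with a bounded wait is used here in place of spinning so that draining cannot wait forever.

```java
import java.util.concurrent.TimeUnit;

final class InFlightTracker {
  private int inFlight = 0;
  private boolean shuttingDown = false;

  /** Registers an operation; returns false once shutdown has begun. */
  synchronized boolean tryBegin() {
    if (shuttingDown) {
      return false;
    }
    inFlight++;
    return true;
  }

  /** Must be called exactly once for every successful tryBegin(). */
  synchronized void end() {
    inFlight--;
    if (inFlight == 0) {
      notifyAll();  // wake a waiting shutdownAndDrain()
    }
  }

  /**
   * Rejects new operations, then waits up to timeoutMillis for running ones.
   * Returns true if everything drained, false on timeout or interruption.
   */
  synchronized boolean shutdownAndDrain(long timeoutMillis) {
    shuttingDown = true;
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    try {
      while (inFlight > 0) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          return false;
        }
        TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Because the shutdown flag and the counter are checked under the same lock, an operation either registers before shutdown starts (and is waited for) or is rejected; there is no window where it passes the check and is then missed by the drain. A caller would wrap each reload in `if (tracker.tryBegin()) { try { ... } finally { tracker.end(); } }`.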
> CoreContainer shutdown needs to be aware of other ongoing operations and wait > until they're complete > > > Key: SOLR-14861 > URL: https://issues.apache.org/jira/browse/SOLR-14861 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: SOLR-14861.patch > > > Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent > failures and found what looks like a glaring gap in how > CoreContainer.shutdown operates. I don't know the impact on production since > we're shutting down anyway, but I think this is responsible for the errors in > TestBulkSchemaConcurrent and likely behind others, especially any other test > that fails intermittently that involves core reloads, including and > especially any tests that exercise managed schema. > We have clear evidence of this sequence: > 1> some CoreContainer.reloads come in and get _partway_ through, in > particular past the test at the top where CoreContainer.reload() throws an > AlreadyClosed exception if (isShutdown). > 2> Some CoreContainer.shutdown() threads get some processing time before the > reloads in <1> are finished. > 3> the threads in <1> pick back up and go wonky. I suspect that there are a > number of different things that could be going wrong here depending on how > far through CoreContainer.shutdown() gets that pop out in different ways. > Since it's my shift (Noble has to sleep sometime), I put some crude locking > in just to test the idea; incrementing an AtomicInteger on entry to > CoreContainer.reload then decrementing it at the end, and spinning in > CoreContainer.shutdown() until the AtomicInteger was back to zero. With that > in place, 100 runs and no errors whereas before I could never get even 10 > runs to finish without an error. This is not a proper fix at all, and the way > it's currently running there are still possible race conditions, just much > smaller windows. And I suspect it risks spinning forever. 
But it's enough to > make me believe I finally understand what's happening. > I also suspect that reload is more sensitive than most operations on a core > due to the fact that it runs for a long time, but I assume other operations > have the same potential. Shouldn't CoreContainer.shutDown() wait until no > other operations are in flight? > On a quick scan of CoreContainer, there are actually few places where we even > check for isShutdown, I suspect the places we do are ad-hoc that we've found > by trial-and-error when tests fail. We need a design rather than hit-or-miss > hacking. > I think that isShutdown should be replaced with something more robust. What > that is IDK quite yet because I've been hammering at this long enough and I > need a break. > This is consistent with another observation about this particular test. If > there's sleep at the end, it wouldn't fail; all the reloads get a chance to > finish before anything was shut down. > An open question how much this matters to production systems. In the testing > case, bunches of these reloads are issued then we immediately
[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
[ https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210350#comment-17210350 ] Gus Heck commented on LUCENE-9572: -- The test framework changes in this ticket are also required by LUCENE-9575 > Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types > --- > > Key: LUCENE-9572 > URL: https://issues.apache.org/jira/browse/LUCENE-9572 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis, modules/test-framework >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > (Breaking this off of SOLR-14597 for independent review) > TypeAsSynonymFilter converts types attributes to a synonym. In some cases the > original token may have already had flags set on it and it may be useful to > propagate some or all of those flags to the synonym we are generating. This > ticket provides that ability and allows the user to specify a bitmask to > specify which flags are retained. > Additionally there may be some set of types that should not be converted to > synonyms, and this change allows the user to specify a comma separated list > of types to ignore (most common case will be to ignore a common default type > of 'word' I suspect) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9575) Add PatternTypingFilter
Gus Heck created LUCENE-9575: Summary: Add PatternTypingFilter Key: LUCENE-9575 URL: https://issues.apache.org/jira/browse/LUCENE-9575 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Gus Heck Assignee: Gus Heck One of the key asks when the Library of Congress was asking me to develop the Advanced Query Parser was to be able to recognize arbitrary patterns that included punctuation, such as POW/MIA or 401(k) or C++ etc. Additionally they wanted 401k and 401(k) to match documents with either style of reference, and NOT match documents that happen to have isolated 401 or k tokens (i.e. not documents about the HTTP status code). And of course we wanted to give up as little as possible of the text analysis features they were already using. This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an arbitrary analyzer defined for a type in the Solr schema, combines to achieve this. This filter has the job of spotting the patterns and adding the intended synonym as a type to the token (from which minimal punctuation has been removed). It also sets flags on the token which are retained through the analysis chain, and at the very end the type is converted to a synonym and the original token(s) for that type are dropped, avoiding the match on 401 (for example). The pattern matching is specified in a file that looks like:
{code}
2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
2 C\+\+ ::: c_plus_plus
{code}
That file would match legal reference patterns such as 401(k), 401k, 501(c)3 and C++. The format is: ::: and groups in the pattern are substituted into the replacement, so the first line above would create synonyms such as:
{code}
401k --> legal2_401_k
401(k) --> legal2_401_k
503(c) --> legal2_503_c
{code}
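The substitution described above can be tried out with plain java.util.regex. What follows is a rough sketch, not the filter from the patch: the class PatternTypingSketch and its methods are hypothetical, the leading integer on each rule line is assumed to be the flags value, and a rule is assumed to apply only when the pattern matches the whole token.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that parses rule lines and computes the synonym type for a token. */
final class PatternTypingSketch {

    /** One parsed rule: leading integer (assumed to be flags), regex, replacement template. */
    static final class Rule {
        final int flags;
        final Pattern pattern;
        final String replacement;

        Rule(int flags, Pattern pattern, String replacement) {
            this.flags = flags;
            this.pattern = pattern;
            this.replacement = replacement;
        }
    }

    /** Parses a line such as: 2 C\+\+ ::: c_plus_plus */
    static Rule parseRule(String line) {
        int sep = line.indexOf(" ::: ");
        if (sep < 0) {
            throw new IllegalArgumentException("Missing ' ::: ' separator: " + line);
        }
        String left = line.substring(0, sep);
        int firstSpace = left.indexOf(' ');
        if (firstSpace < 0) {
            throw new IllegalArgumentException("Expected an integer, a space, then a pattern: " + line);
        }
        int flags = Integer.parseInt(left.substring(0, firstSpace));
        Pattern pattern = Pattern.compile(left.substring(firstSpace + 1).trim());
        String replacement = line.substring(sep + " ::: ".length()).trim();
        return new Rule(flags, pattern, replacement);
    }

    /** Returns the type produced by the first rule matching the whole token, or null if none match. */
    static String typeFor(String token, List<Rule> rules) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(token);
            if (m.matches()) {
                // appendReplacement expands $1, $2, ... using the groups of the match just made.
                StringBuffer out = new StringBuffer();
                m.appendReplacement(out, rule.replacement);
                return out.toString();
            }
        }
        return null;
    }
}
```

With the three rules from the file above, typeFor("401(k)", rules) and typeFor("401k", rules) both yield legal2_401_k, and 501(c)3 falls through the first rule to the second. Rule order matters in this sketch because the first full match wins.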
[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210335#comment-17210335 ] Gus Heck commented on LUCENE-9574: -- Since this is blocking SIP-9 and SOLR-14597 I'll be presuming silent consensus if there are no comments by Monday > Add a token filter to drop tokens based on flags. > - > > Key: LUCENE-9574 > URL: https://issues.apache.org/jira/browse/LUCENE-9574 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > (Breaking this off of SOLR-14597 for independent review) > A filter that tests flags on tokens vs a bitmask and drops tokens that have > all specified flags.
[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210333#comment-17210333 ] Gus Heck commented on LUCENE-9574: -- One interesting corner case came up when the first token in the stream matched the flags, but had already had a synonym added. The synonym of course had position increment 0, and so dropping the token caused complaints about the first token not having a position increment > 0. I could think of no way to reach forward in the stream and adjust the synonym token to account for the dropping of its parent. So the workaround I came up with was to create a random token that will effectively never match anything (and thus be invisible) and use it to replace, rather than drop, the first token in the stream when that token is being dropped. Not crazy about it and would like to ask why the restriction on position increment is there... it feels like for some reason downstream code expects token positions to be starting with 1 instead of zero or something? Open to suggestions for a better solution too. > Add a token filter to drop tokens based on flags. > - > > Key: LUCENE-9574 > URL: https://issues.apache.org/jira/browse/LUCENE-9574 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > > (Breaking this off of SOLR-14597 for independent review) > A filter that tests flags on tokens vs a bitmask and drops tokens that have > all specified flags.
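To make the corner case above concrete, here is a small model of a token list in plain Java. It is only a sketch of one possible alternative to the placeholder token, namely carrying the dropped token's position increment forward onto the next surviving token; it is not what the patch does, and the Tok class and dropFlagged method are hypothetical stand-ins for Lucene's attribute-based token stream.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: drop flagged tokens while keeping the first emitted position increment above zero. */
final class DropWithPositionsSketch {

    /** Minimal stand-in for a token: term text, position increment, and flags. */
    static final class Tok {
        final String term;
        final int posInc;
        final int flags;

        Tok(String term, int posInc, int flags) {
            this.term = term;
            this.posInc = posInc;
            this.flags = flags;
        }
    }

    /** Drops tokens carrying every bit of dropMask and adds their increments to the next kept token. */
    static List<Tok> dropFlagged(List<Tok> in, int dropMask) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept token
        for (Tok t : in) {
            boolean drop = dropMask != 0 && (t.flags & dropMask) == dropMask;
            if (drop) {
                skipped += t.posInc;
                continue;
            }
            // A synonym with increment 0 inherits the position of its dropped parent.
            out.add(new Tok(t.term, t.posInc + skipped, t.flags));
            skipped = 0;
        }
        return out;
    }
}
```

For the stream 401 (increment 1, flagged), legal2_401_k (increment 0), plan (increment 1), this yields legal2_401_k with increment 1 followed by plan with increment 1, so the first emitted token no longer has an increment of 0. Whether this fits inside a real TokenFilter depends on details the comment does not cover.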
[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
[ https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210332#comment-17210332 ] Gus Heck commented on LUCENE-9572: -- Since this is blocking SIP-9 and SOLR-14597 I'll be presuming silent consensus if there are no comments by Monday > Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types > --- > > Key: LUCENE-9572 > URL: https://issues.apache.org/jira/browse/LUCENE-9572 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis, modules/test-framework >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > (Breaking this off of SOLR-14597 for independent review) > TypeAsSynonymFilter converts types attributes to a synonym. In some cases the > original token may have already had flags set on it and it may be useful to > propagate some or all of those flags to the synonym we are generating. This > ticket provides that ability and allows the user to specify a bitmask to > specify which flags are retained. > Additionally there may be some set of types that should not be converted to > synonyms, and this change allows the user to specify a comma separated list > of types to ignore (most common case will be to ignore a common default type > of 'word' I suspect) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9574) Add a token filter to drop tokens based on flags.
[ https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated LUCENE-9574: - Description: (Breaking this off of SOLR-14597 for independent review) A filter that tests flags on tokens vs a bitmask and drops tokens that have all specified flags. was:A filter that tests flags on tokens vs a bitmask and drops tokens that have all specified flags. > Add a token filter to drop tokens based on flags. > - > > Key: LUCENE-9574 > URL: https://issues.apache.org/jira/browse/LUCENE-9574 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > > (Breaking this off of SOLR-14597 for independent review) > A filter that tests flags on tokens vs a bitmask and drops tokens that have > all specified flags.
[jira] [Created] (LUCENE-9574) Add a token filter to drop tokens based on flags.
Gus Heck created LUCENE-9574: Summary: Add a token filter to drop tokens based on flags. Key: LUCENE-9574 URL: https://issues.apache.org/jira/browse/LUCENE-9574 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Gus Heck Assignee: Gus Heck A filter that tests flags on tokens vs a bitmask and drops tokens that have all specified flags.
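The "has all specified flags" test in the description above comes down to a single bitwise comparison. A minimal sketch, with a hypothetical class name, and with the treatment of an empty mask as "drop nothing" being an assumption of this example rather than something the ticket states:

```java
/** Hypothetical predicate for dropping tokens by flags. */
final class FlagDropSketch {

    /**
     * Returns true when tokenFlags contains every bit set in dropMask.
     * Assumption: a mask of 0 drops nothing; without this guard the
     * comparison would be true for every token.
     */
    static boolean shouldDrop(int tokenFlags, int dropMask) {
        return dropMask != 0 && (tokenFlags & dropMask) == dropMask;
    }
}
```

For example, with a mask of 6 (bits 2 and 4), a token with flags 7 is dropped, while a token with flags 2 is kept because it carries only one of the two required bits.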
[jira] [Created] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
Gus Heck created LUCENE-9572: Summary: Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types Key: LUCENE-9572 URL: https://issues.apache.org/jira/browse/LUCENE-9572 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis, modules/test-framework Reporter: Gus Heck Assignee: Gus Heck (Breaking this off of SOLR-14597 for independent review) TypeAsSynonymFilter converts type attributes to a synonym. In some cases the original token may already have had flags set on it, and it may be useful to propagate some or all of those flags to the synonym we are generating. This ticket provides that ability and allows the user to supply a bitmask specifying which flags are retained. Additionally, there may be some set of types that should not be converted to synonyms, and this change allows the user to specify a comma-separated list of types to ignore (the most common case will be to ignore a common default type of 'word', I suspect).
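The two options described above, a retain mask for flags and a list of ignored types, can be pictured in a few lines of plain Java. This is a sketch only: the class and method names are hypothetical and do not reflect the filter's real parameters.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helpers mirroring the two options described in the ticket. */
final class TypeAsSynonymSketch {

    /** Keeps only those flags of the original token that are selected by retainMask. */
    static int synonymFlags(int originalFlags, int retainMask) {
        return originalFlags & retainMask;
    }

    /** Parses a comma-separated list of type names, trimming whitespace and skipping blanks. */
    static Set<String> parseIgnoredTypes(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> ignored = new HashSet<>();
        for (String part : csv.split(",")) {
            String type = part.trim();
            if (!type.isEmpty()) {
                ignored.add(type);
            }
        }
        return ignored;
    }

    /** A synonym is generated only for types that are not in the ignore set. */
    static boolean emitsSynonym(String type, Set<String> ignored) {
        return !ignored.contains(type);
    }
}
```

With a retain mask of 2, a token carrying flags 3 would pass flags 2 on to its synonym, and with an ignore list of "word" a token of the default type would produce no synonym at all.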
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210252#comment-17210252 ] Gus Heck commented on SOLR-14787: - New Syntax with latest change (one less parameter, can check multiple tokens): for payloads such as {code:java} "one|1.0 two|2.0 three|3.0" {code} This does not match {code:java} {!payload_check f=vals_dpf payloads='0.75 3' op='gt'}one two {code} but this does match {code:java} {!payload_check f=vals_dpf payloads='0.75 1.5' op='gt'}one two {code} > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. 
> Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
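The two example queries in the comment above can be checked by hand with a small model. The sketch below is an assumption-laden illustration, not the query parser's code: it assumes the i-th query term's indexed payload is compared against the i-th value of the payloads parameter, that a single operator applies to every pair, and that all pairs must pass for the document to match. The class and method names are hypothetical.

```java
/** Hypothetical model of an inequality payload check over aligned term/payload pairs. */
final class PayloadCheckSketch {

    /** Operators named in the ticket, plus the default equality. */
    enum Op { EQ, GT, GTE, LT, LTE }

    /** Compares one indexed payload against the reference value given in the query. */
    static boolean compare(float indexed, float reference, Op op) {
        switch (op) {
            case GT:  return indexed > reference;
            case GTE: return indexed >= reference;
            case LT:  return indexed < reference;
            case LTE: return indexed <= reference;
            default:  return indexed == reference; // EQ
        }
    }

    /** True only if every indexed payload passes against its positional reference value. */
    static boolean allMatch(float[] indexed, float[] reference, Op op) {
        if (indexed.length != reference.length) {
            return false;
        }
        for (int i = 0; i < indexed.length; i++) {
            if (!compare(indexed[i], reference[i], op)) {
                return false;
            }
        }
        return true;
    }
}
```

For the terms "one two" with indexed payloads 1.0 and 2.0, the reference values '0.75 3' fail under gt because 2.0 is not greater than 3, while '0.75 1.5' pass because both comparisons hold. This agrees with the two results reported in the comment.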
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205764#comment-17205764 ] Gus Heck commented on SOLR-14787: - So after spending some more time with this I have the following thoughts: # The threshold parameter is redundant with the payloads parameter. This should all be choosing operators in the same manner in the code, with "equals" being the default operator, rather than having two distinct code paths. I think {{"\{!payload_check f=vals_dpf payloads='0.75' op='gt'}one"}} makes more sense. This also opens up the possibility of testing vs multiple payload values just like the equals case. Accepting a different operator per payload value can be a future enhancement, however, if anyone wants it. # There is a Lucene class change here, so there definitely should be Lucene-level tests and we should have a Lucene ticket too. # As you mentioned in a separate channel, this doesn't work with integers (i.e. {{"\{!payload_check f=vals_dpi payloads='1' op='gt' threshold='0.75'}A"}} won't work). This is because the integer payload (from the index, not the query) gets decoded as a float and winds up being some very very small value (saw it in debug, forgot to copy it down, but something ten to the minus 14 IIRC), so this deceptively gives wrong answers and does not throw errors, which is bad. I think this needs to be addressed by communicating the payload type to the query at the Lucene layer (where folks are responsible for knowing the type info of their own fields) and deriving it from the schema at the Solr level, where folks expect stuff to just work because they declared a schema. Additionally, by analogy with range queries, strings should probably work via lexical order, but possibly that could be a future enhancement, since users are less likely to expect strings to work in the same fashion as floats.
# I'm still trying to explain why I get different results in IDE vs build here, but the build and the running applications is the important thing. # Needs docs of course > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. > Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
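The integer problem in the list above, where an integer payload read back as a float silently becomes a tiny number, is easy to reproduce in isolation. The sketch below assumes a 4-byte big-endian encoding for both types, which is an assumption of this example; the exact magnitude seen in practice depends on the real encoding, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo of decoding integer payload bytes with the wrong type. */
final class PayloadDecodeSketch {

    /** Encodes an int as 4 bytes (ByteBuffer defaults to big-endian). */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Correct decoding: reads the same 4 bytes back as an int. */
    static int decodeAsInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Wrong decoding: reinterprets the int's bit pattern as an IEEE 754 float. */
    static float decodeAsFloat(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }
}
```

Under this assumed encoding, the bytes for the integer 1 decode as the smallest positive float, so a check such as "payload greater than 0.75" quietly evaluates to false instead of raising an error. That is the deceptive behaviour the comment describes, and it illustrates why the query needs to know the payload type.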
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205577#comment-17205577 ] Gus Heck commented on SOLR-14787: - Hmm now I suspect my IDE had somehow got confused WRT 9.0 match version or perhaps I was reading a wrong window, but trying fresh today I just reproduced the NOUN VERB failure again with a freshly started IDE but this time failing with the appropriate 8.7 match version messages... That said, but the build still passes when I do this {code:java} gus@ns-l1:~/projects/apache/lucene-solr/fork/lucene-solr8$ ant test -Dtests.class=org.apache.solr.search.TestPayloadCheckQParserPlugin > build.out.txt gus@ns-l1:~/projects/apache/lucene-solr/fork/lucene-solr8$ grep NOUN build.out.txt [junit4] 2> 5995 INFO (TEST-TestPayloadCheckQParserPlugin.test-seed#[39C6574AF7C1723D]) [ ] o.a.s.c.S.Request [collection1] webapp=null path=null params={q={!payload_check+f%3Dvals_dps+payloads%3D'NOUN+VERB'}cat+jumped=*,score=xml} hits=1 status=0 QTime=4 [junit4] 2> 6004 INFO (TEST-TestPayloadCheckQParserPlugin.test-seed#[39C6574AF7C1723D]) [ ] o.a.s.c.S.Request [collection1] webapp=null path=null params={q={!payload_check+f%3Dvals_dps+payloads%3D'VERB+NOUN'}cat+jumped=*,score=xml} hits=0 status=0 QTime=0 {code} Note the hits=1 above vs hits=0 I get in the ide running of the same test {code:java} 3618 INFO (TEST-TestPayloadCheckQParserPlugin.test-seed#[C26FC0AC309214A9]) [ ] o.a.s.c.S.Request [collection1] webapp=null path=null params={q={!payload_check+f%3Dvals_dps+payloads%3D'NOUN+VERB'}cat+jumped=*,score=xml} hits=0 status=0 QTime=2 {code} > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. 
Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. > Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200848#comment-17200848 ] Gus Heck commented on SOLR-8281: This seems related to something I wanted to do for a client... I had reduce with group() and I wanted to then feed the groups to an arbitrary streaming expression for further processing, and have the result show up in the groups (result would have been a matrix). The problem I stopped on was how to express the stream to process the group without having a source (the source is the group). > Add RollupMergeStream to Streaming API > -- > > Key: SOLR-8281 > URL: https://issues.apache.org/jira/browse/SOLR-8281 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > > The RollupMergeStream merges the aggregate results emitted by the > RollupStream on *worker* nodes. > This is designed to be used in conjunction with the HashJoinStream to perform > rollup Aggregations on the joined Tuples. The HashJoinStream will require the > tuples to be partitioned on the Join keys. To avoid needing to repartition on > the *group by* fields for the RollupStream, we can perform a merge of the > rolled up Tuples coming from the workers. > The construct would look like this: > {code} > mergeRollup (... > parallel (... > rollup (... > hashJoin ( > search(...), > search(...), > on="fieldA" > ) > ) > ) >) > {code} > The pseudo code above would push the *hashJoin* and *rollup* to the *worker* > nodes. The emitted rolled up tuples would be merged by the mergeRollup. >
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200828#comment-17200828 ] Gus Heck commented on SOLR-14787: - I have found something interesting WRT the failing case you mention... it only fails when I run the test in my IDE. If I use the ant build it passes. I notice some interesting differences in startup for these two scenarios... build: {code:java} [junit4] Suite: org.apache.solr.search.TestPayloadCheckQParserPlugin [junit4] 2> 1454 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to test-framework derived value of '/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/server/solr/configsets/_default/conf' [junit4] 2> 1475 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Created dataDir: /home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/build/solr-core/test/J0/temp/solr.search.TestPayloadCheckQParserPlugin_AB5E0FC0380BB866-001/data-dir-1-001 [junit4] 2> 1551 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Using TrieFields (NUMERIC_POINTS_SYSPROP=false) w/NUMERIC_DOCVALUES_SYSPROP=true [junit4] 2> 1592 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.e.j.u.log Logging initialized @1620ms to org.eclipse.jetty.util.log.Slf4jLog [junit4] 2> 1597 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) via: @org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, clientAuth=NaN) [junit4] 2> 1621 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: test.solr.allowed.securerandom=null & java.security.egd=file:/dev/./urandom [junit4] 2> 1626 INFO 
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 initCore [junit4] 2> 1757 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrConfig Using Lucene MatchVersion: 8.7.0 [junit4] 2> 1901 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.s.IndexSchema Schema name=example [junit4] 2> 1931 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieIntField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1936 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieFloatField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1940 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieLongField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1944 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieDoubleField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1966 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieDateField]. Please consult documentation how to replace it accordingly. [junit4] 2> 2202 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.GeoHashField]. Please consult documentation how to replace it accordingly. 
[junit4] 2> 2208 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.LatLonType]. Please consult documentation how to replace it accordingly. [junit4] 2> 2217 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.EnumField]. Please consult documentation how to replace it accordingly. {code} IDE (Intellij) {code:java} 1172 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[5A2517E33080AEE6]-worker) [ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to test-framework derived value of '/home/gus/projects/apache/lucene-solr/fork/lucene-solr/solr/server/solr/configsets/_default/conf' 1190 INFO
[jira] [Commented] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200304#comment-17200304 ] Gus Heck commented on SOLR-14597: - Right, agreed, Lucene stuff also should be broken out to Lucene tickets. All initially here to keep the donation process simple. > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Components: query parsers >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > Attachments: aqp_patch.patch > > > This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser] > Briefly, this parser provides a comprehensive syntax for users that use > search on a daily basis. It also reserves a smaller set of punctuators than > other parsers. This facilitates easier handling of acronyms and punctuated > patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some > advanced features while also preventing access to arbitrary features via > local parameters. This parser will be safe for accepting user queries > directly with minimal pre-parsing, but for use cases beyond it's established > features alternate query paths (using other parsers) will need to be supplied. > The code drop is being prepared and will be supplied as soon as we receive > guidance from the PMC regarding the proper process. 
Given that the Library > already has a signed CCLA we need to understand which of these (or other > processes) apply: > [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] > and > [https://www.apache.org/licenses/contributor-agreements.html#grants] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199726#comment-17199726 ] Gus Heck commented on SOLR-14597: - Looks like LUCENE-9531 has caused a conflict with the patch, and there have been some changes to the gradle files running javacc, so I'm working on updating the patch to work with that and I'll publish the fix as a pull request for easier review. Now that there is code to look at, some responses: [~arafalov]: Point noted about TypeTokenFilter; there is similarity, though this filters on flags instead of types. It would be attractive to also inherit from FilteringTokenFilter, but it looks like one edge case I ran into isn't handled by the superclass (which makes me wonder if there's a lurking issue with other FilteringTokenFilter subclasses). The case I ran into is this: the first token in the stream gets assigned a synonym, then in a subsequent step the first token is dropped (this is quite intentional in some use cases we had, where the intent was to entirely prevent matches on the original token but still match on the synonym). When this happens it causes {{java.lang.IllegalArgumentException: first position increment must be > 0 (got 0)}} despite the fact that this scenario is not actually an error in terms of which tokens we want. Unfortunately there's no good way to know what's going to happen to the next token (which may not have the flags in question), so I came up with a workaround that I'm not very pleased with: dropping in a placeholder token that is unlikely to match anything. Open to suggestions for better options there, and interested in whether other filters that drop tokens can hit the same issue, or if they've handled it in some graceful way I'm not appreciating. Also, now that the code is available, let me know if you still see similarity between PatternTypingFilterFactory and KeywordMarkerFilterFactory... I think they are quite different.
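To make the edge case above concrete, here is a minimal sketch in plain Java (deliberately not the Lucene TokenStream API) of one possible alternative to a placeholder token: carrying the position increment of each dropped token forward onto the next token that is kept. The names {{Tok}} and {{dropAndCarry}} and the sample token texts are hypothetical, and this is not what the patch does; whether it would suit the flag-based case described here is untested.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a token stream; an illustration only, not Lucene code. */
final class PositionCarrySketch {

    /** Hypothetical token: text, position increment, and a "drop this token" marker. */
    static final class Tok {
        final String text;
        final int posInc;
        final boolean drop;

        Tok(String text, int posInc, boolean drop) {
            this.text = text;
            this.posInc = posInc;
            this.drop = drop;
        }
    }

    /**
     * Removes tokens marked for dropping. The increments of dropped tokens are
     * accumulated and added to the next kept token, so a synonym with
     * increment 0 that follows a dropped first token ends up with increment 1.
     */
    static List<Tok> dropAndCarry(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int carried = 0;
        for (Tok t : in) {
            if (t.drop) {
                carried += t.posInc; // remember the position we are skipping
                continue;
            }
            out.add(new Tok(t.text, t.posInc + carried, false));
            carried = 0;
        }
        return out;
    }
}
```

With an input of an original token at increment 1 (marked for dropping) followed by its synonym at increment 0, the synonym comes out with increment 1, so the first emitted token no longer has a zero increment.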
[~ichattopadhyaya], [~dsmiley] While some of this could potentially be broken out into a package, there are also some changes to core and some Lucene-level classes that probably wouldn't want to be in a package, so feel free to put some eyes on it and suggest what the dividing line is (more eyes == better). I'm not against the idea of a 1st-party package, but the question is: will this be popular enough to merit default inclusion? Another breaking-new-ground sort of question is "Is it easier to pull it in later or push it out to a package later if we change our minds?" Maybe neither is harder... Changes to note to classes outside the new org.apache.solr.aqp package (where the meat of the new parser and its .jj file lives): # TypeAsSynonymFilter is gaining the ability to manage what flags are transmitted from the original token to the synonym when it is created. # BaseTokenStreamTestCase is gaining the ability to verify the flags on the tokens produced. # Access to org.apache.solr.cloud.AbstractDistribZkTestBase#copyConfigUp is opened up so that it can be used in a wider array of tests. # Solr gains TokenAnalyzerFilter, which applies the Analyzer from a specified field type to the individual tokens of the current stream (see javadoc for more detail). # Operator and SynonymQueryStyle are extracted from the standard parser's base class so they can be re-used. Reuse is necessary because TextField references SynonymQueryStyle directly. # The above change forces a compile-time API change in TextField, which might force this to not be available till 9.x (though the desire to make AQP available in 8.x is there). # The change to TextField then broke TestPackages, which failed with a ClassNotFound when it went looking for the old SynonymQueryStyle inner class that had been promoted to a separate class. This forced me to decompile and provide classes and build/rebuild support for the binary jars checked in for TestPackages (as *.jar.bin). 
(the .java files for the classes loaded by this test had not been checked in). This is the genesis of the o.a.smy.pkg package namespace. Some of the above (especially #7) might want to be broken into related or sub-tickets. > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Components: query parsers >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > Attachments: aqp_patch.patch > > > This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser] > Briefly, this parser provides a comprehensive syntax for users that use >
[jira] [Updated] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14597: Affects Version/s: (was: 8.7) > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Components: query parsers >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > > This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser] > Briefly, this parser provides a comprehensive syntax for users that use > search on a daily basis. It also reserves a smaller set of punctuators than > other parsers. This facilitates easier handling of acronyms and punctuated > patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some > advanced features while also preventing access to arbitrary features via > local parameters. This parser will be safe for accepting user queries > directly with minimal pre-parsing, but for use cases beyond it's established > features alternate query paths (using other parsers) will need to be supplied. > The code drop is being prepared and will be supplied as soon as we receive > guidance from the PMC regarding the proper process. Given that the Library > already has a signed CCLA we need to understand which of these (or other > processes) apply: > [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] > and > [https://www.apache.org/licenses/contributor-agreements.html#grants] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14597: Affects Version/s: (was: 8.6) 8.7 > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Components: query parsers >Affects Versions: 8.7 >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > > This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser] > Briefly, this parser provides a comprehensive syntax for users that use > search on a daily basis. It also reserves a smaller set of punctuators than > other parsers. This facilitates easier handling of acronyms and punctuated > patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some > advanced features while also preventing access to arbitrary features via > local parameters. This parser will be safe for accepting user queries > directly with minimal pre-parsing, but for use cases beyond it's established > features alternate query paths (using other parsers) will need to be supplied. > The code drop is being prepared and will be supplied as soon as we receive > guidance from the PMC regarding the proper process. Given that the Library > already has a signed CCLA we need to understand which of these (or other > processes) apply: > [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] > and > [https://www.apache.org/licenses/contributor-agreements.html#grants] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reassigned SOLR-14787: --- Assignee: Gus Heck > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param, "op". > The value of "op" can be one of the following: > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > If "op" is not specified, the default is the current behavior of > equals. > In addition to the operation, you can specify a threshold local parameter. > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. > Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code} > > > >
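The comparison semantics described in the ticket above can be sketched in plain Java. This is only an illustration of how the listed "op" values relate a payload to the threshold, under the assumption that the payload field is a whitespace-separated list of term|value pairs as in the example document. It is not the parser's implementation; the class name, the method names and the "eq" label for the default equals behavior are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration of the op/threshold semantics; not Solr code. */
final class PayloadOpSketch {

    /** Parses a delimited payload value such as "cat|0.75 dog|2.0" into term -> payload. */
    static Map<String, Float> parse(String fieldValue) {
        Map<String, Float> out = new LinkedHashMap<>();
        for (String tok : fieldValue.trim().split("\\s+")) {
            int bar = tok.lastIndexOf('|');
            if (bar < 0) {
                throw new IllegalArgumentException("no payload delimiter in: " + tok);
            }
            out.put(tok.substring(0, bar), Float.parseFloat(tok.substring(bar + 1)));
        }
        return out;
    }

    /** Applies one of the operations listed in the ticket; "eq" stands in for the default. */
    static boolean matches(String op, float payload, float threshold) {
        switch (op) {
            case "gt":  return payload > threshold;
            case "gte": return payload >= threshold;
            case "lt":  return payload < threshold;
            case "lte": return payload <= threshold;
            case "eq":  return payload == threshold;
            default:    throw new IllegalArgumentException("unknown op: " + op);
        }
    }
}
```

For the example document, the payload parsed for "cat" is 0.75, so an op of gt with a threshold of 0.5 matches, while lt with the same threshold does not.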
[jira] [Commented] (SOLR-14726) Streamline getting started experience
[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185873#comment-17185873 ] Gus Heck commented on SOLR-14726: - Oh and there's that elephant just outside the doorway (i.e. not in scope for this ticket)... the lack of user friendly documentation for lucene itself :) > Streamline getting started experience > - > > Key: SOLR-14726 > URL: https://issues.apache.org/jira/browse/SOLR-14726 > Project: Solr > Issue Type: Task >Reporter: Ishan Chattopadhyaya >Assignee: Alexandre Rafalovitch >Priority: Major > Labels: newdev > Attachments: yasa-http.png > > > The reference guide Solr tutorial is here: > https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html > It needs to be simplified and easy to follow. Also, it should reflect our > best practices, that should also be followed in production. I have following > suggestions: > # Make it less verbose. It is too long. On my laptop, it required 35 page > downs button presses to get to the bottom of the page! > # First step of the tutorial should be to enable security (basic auth should > suffice). > # {{./bin/solr start -e cloud}} <-- All references of -e should be removed. > # All references of {{bin/solr post}} to be replaced with {{curl}} > # Convert all {{bin/solr create}} references to curl of collection creation > commands > # Add docker based startup instructions. > # Create a Jupyter Notebook version of the entire tutorial, make it so that > it can be easily executed from Google Colaboratory. Here's an example: > https://twitter.com/TheSearchStack/status/1289703715981496320 > # Provide downloadable Postman and Insomnia files so that the same tutorial > can be executed from those tools. Except for starting Solr, all other steps > should be possible to be carried out from those tools. 
> # Use V2 APIs everywhere in the tutorial > # Remove all example modes, sample data (films, tech products etc.), > configsets from Solr's distribution (instead let the examples refer to them > from github) > # Remove the post tool from Solr, curl should suffice.
[jira] [Comment Edited] (SOLR-14726) Streamline getting started experience
[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185855#comment-17185855 ] Gus Heck edited comment on SOLR-14726 at 8/27/20, 2:00 PM: --- Can we make it a goal that the user be **completely** unaware of what mode (cloud or not) they are using in the initial contact? That's deployment stuff and nothing they should even think about on first contact. I think they should run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in their web browser to see it worked. Cloud or non-cloud can be used behind the scenes as current or future maintainers see fit. An adapted version of my comments on slack: There are various things to learn about solr... I might order them thus for what I (IMHO) consider optimal pedagogy: # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws data in for them, and lets the user query it either in the UI or via curl as suits them (different people have different styles){color} # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a query syntax, sort vs relevancy scoring.{color} # {color:#0747a6}How to get data in (because without data whatever), and the need to be able to re-index{color} # How to deploy solr in a basically competent fashion for light-duty use in low-security environments # Features such as facets, highlighting, analysis options etc. This section should be an a la carte menu into the ref guide, as by this point they are becoming more advanced. # Hardening and scaling solr, and otherwise making it production ready For the first 3 you really don't want the user to see any of #4, and it really doesn't matter if it's cloud or not so long as the person trying to learn doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and we basically don't do a good job of teaching #3 (in the ref guide). 
When you get to #4 I can't imagine which cases you would want to have them start with non-cloud solr, though that section should have a closing section on non-cloud and the trade-offs of using it. #5 should be a la carte anyway, and we do have a fairly coherent section for #6 was (Author: gus_heck): Can we make it a goal that the user be **completely** unaware of what mode (cloud or not) they are using in the initial contact. That's deployment stuff and nothing they should even think about on first contact. I think they should run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in their web browser to see it worked. Cloud or non-cloud can be used behind the scenes as current or future maintainers see fit. An adapted version of my comments on slack: There are various things to learn about solr... I might order them thus for what I (IMHO) consider optimal pedagogy: # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws data in for them, and let's the user query it either in the UI or via curl as suits them (different people have different styles){color} # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a query syntax, sort vs relevancy scoring.{color} # {color:#0747a6}How to get data in (because without data whatever), and the need to be able to re-index{color} # How to deploy solr in a basically competent fashion for light duty use in low security environments # Features such as facets, highlighting, analysis options etc, this section should be an a la carte menu into the ref guide, as by this point they are becoming more advanced. # Hardening and Scaling solr, and otherwise making it production ready For the first 3 you really don't want the user to see any of #4 and it really doesn't matter if it's cloud or not so long as the person trying to learn doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and we basically don't do a good job of teaching #3 (in the ref guide). 
When you get to #4 I can't imagine which cases you would want to have them start with non-cloud solr, and have a closing section on non-cloud and the trade-offs of using it. #5 should be a la carte anyway, and we do have a fairly coherent section for #6 > Streamline getting started experience > - > > Key: SOLR-14726 > URL: https://issues.apache.org/jira/browse/SOLR-14726 > Project: Solr > Issue Type: Task >Reporter: Ishan Chattopadhyaya >Assignee: Alexandre Rafalovitch >Priority: Major > Labels: newdev > Attachments: yasa-http.png > > > The reference guide Solr tutorial is here: > https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html > It needs to be simplified and easy to follow. Also, it should reflect our > best practices, that should also be followed in production. I have following > suggestions: > #
[jira] [Commented] (SOLR-14726) Streamline getting started experience
[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185857#comment-17185857 ] Gus Heck commented on SOLR-14726: - One caveat to what I just said is that cloud vs non-cloud does somewhat matter for "getting data in" WRT which SolrJ classes one might use. > Streamline getting started experience > - > > Key: SOLR-14726 > URL: https://issues.apache.org/jira/browse/SOLR-14726 > Project: Solr > Issue Type: Task >Reporter: Ishan Chattopadhyaya >Assignee: Alexandre Rafalovitch >Priority: Major > Labels: newdev > Attachments: yasa-http.png > > > The reference guide Solr tutorial is here: > https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html > It needs to be simplified and easy to follow. Also, it should reflect our > best practices, that should also be followed in production. I have following > suggestions: > # Make it less verbose. It is too long. On my laptop, it required 35 page > downs button presses to get to the bottom of the page! > # First step of the tutorial should be to enable security (basic auth should > suffice). > # {{./bin/solr start -e cloud}} <-- All references of -e should be removed. > # All references of {{bin/solr post}} to be replaced with {{curl}} > # Convert all {{bin/solr create}} references to curl of collection creation > commands > # Add docker based startup instructions. > # Create a Jupyter Notebook version of the entire tutorial, make it so that > it can be easily executed from Google Colaboratory. Here's an example: > https://twitter.com/TheSearchStack/status/1289703715981496320 > # Provide downloadable Postman and Insomnia files so that the same tutorial > can be executed from those tools. Except for starting Solr, all other steps > should be possible to be carried out from those tools. 
> # Use V2 APIs everywhere in the tutorial > # Remove all example modes, sample data (films, tech products etc.), > configsets from Solr's distribution (instead let the examples refer to them > from github) > # Remove the post tool from Solr, curl should suffice.
[jira] [Commented] (SOLR-14726) Streamline getting started experience
[ https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185855#comment-17185855 ] Gus Heck commented on SOLR-14726: - Can we make it a goal that the user be **completely** unaware of what mode (cloud or not) they are using in the initial contact? That's deployment stuff and nothing they should even think about on first contact. I think they should run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in their web browser to see it worked. Cloud or non-cloud can be used behind the scenes as current or future maintainers see fit. An adapted version of my comments on slack: There are various things to learn about solr... I might order them thus for what I (IMHO) consider optimal pedagogy: # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws data in for them, and lets the user query it either in the UI or via curl as suits them (different people have different styles){color} # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a query syntax, sort vs relevancy scoring.{color} # {color:#0747a6}How to get data in (because without data whatever), and the need to be able to re-index{color} # How to deploy solr in a basically competent fashion for light-duty use in low-security environments # Features such as facets, highlighting, analysis options etc. This section should be an a la carte menu into the ref guide, as by this point they are becoming more advanced. # Hardening and scaling solr, and otherwise making it production ready For the first 3 you really don't want the user to see any of #4, and it really doesn't matter if it's cloud or not so long as the person trying to learn doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and we basically don't do a good job of teaching #3 (in the ref guide). 
When you get to #4 I can't imagine which cases you would want to have them start with non-cloud solr, and have a closing section on non-cloud and the trade-offs of using it. #5 should be a la carte anyway, and we do have a fairly coherent section for #6 > Streamline getting started experience > - > > Key: SOLR-14726 > URL: https://issues.apache.org/jira/browse/SOLR-14726 > Project: Solr > Issue Type: Task >Reporter: Ishan Chattopadhyaya >Assignee: Alexandre Rafalovitch >Priority: Major > Labels: newdev > Attachments: yasa-http.png > > > The reference guide Solr tutorial is here: > https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html > It needs to be simplified and easy to follow. Also, it should reflect our > best practices, that should also be followed in production. I have following > suggestions: > # Make it less verbose. It is too long. On my laptop, it required 35 page > downs button presses to get to the bottom of the page! > # First step of the tutorial should be to enable security (basic auth should > suffice). > # {{./bin/solr start -e cloud}} <-- All references of -e should be removed. > # All references of {{bin/solr post}} to be replaced with {{curl}} > # Convert all {{bin/solr create}} references to curl of collection creation > commands > # Add docker based startup instructions. > # Create a Jupyter Notebook version of the entire tutorial, make it so that > it can be easily executed from Google Colaboratory. Here's an example: > https://twitter.com/TheSearchStack/status/1289703715981496320 > # Provide downloadable Postman and Insomnia files so that the same tutorial > can be executed from those tools. Except for starting Solr, all other steps > should be possible to be carried out from those tools. 
> # Use V2 APIs everywhere in the tutorial > # Remove all example modes, sample data (films, tech products etc.), > configsets from Solr's distribution (instead let the examples refer to them > from github) > # Remove the post tool from Solr, curl should suffice.
[jira] [Commented] (SOLR-13260) Add support for 128 bit integer point fields
[ https://issues.apache.org/jira/browse/SOLR-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179290#comment-17179290 ] Gus Heck commented on SOLR-13260: - I'm interested in SOLR-6741 which you've set as requiring this. Can you elaborate on how this PR will be used in 6741? Are you planning on extending the "ByteStringPointField"? Do your plans account for or use InetAddressPoint (lucene class)? Also a couple comments on the PR.. > Add support for 128 bit integer point fields > > > Key: SOLR-13260 > URL: https://issues.apache.org/jira/browse/SOLR-13260 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: Dale Richardson >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Since support for ipv6 requires dealing with 128 bit Point fields, I'm > splitting out support for 128 bit integer point fields into a separate commit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
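As background for the question above about 128-bit values and IP addresses, here is a minimal sketch using only the JDK (no Lucene or Solr classes) of how an address can be treated as a 128-bit unsigned integer: an IPv6 address is already 16 bytes, and an IPv4 address can be widened to the IPv4-mapped IPv6 form. The class and method names are hypothetical, and this says nothing about how the PR or "ByteStringPointField" actually encodes values.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

/** Illustration of addresses as 128-bit unsigned values; not Solr or Lucene code. */
final class Ip128Sketch {

    /**
     * Returns the address as 16 big-endian bytes. IPv4 input is widened to the
     * IPv4-mapped IPv6 form (::ffff:a.b.c.d). Pass IP literals only; a host
     * name would trigger a DNS lookup.
     */
    static byte[] to128(String literal) {
        byte[] raw;
        try {
            raw = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a usable address: " + literal, e);
        }
        if (raw.length == 16) {
            return raw;
        }
        byte[] out = new byte[16];
        out[10] = (byte) 0xff;
        out[11] = (byte) 0xff;
        System.arraycopy(raw, 0, out, 12, 4);
        return out;
    }

    /** Orders two 16-byte values as unsigned 128-bit integers (requires Java 9+). */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Because the bytes are big-endian, comparing them lexicographically as unsigned values gives the same order as comparing the 128-bit integers, which is the property a range query over such a field would rely on.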
[jira] [Reopened] (SOLR-14582) Expose IWC.setMaxCommitMergeWaitMillis as an expert feature in Solr's index config
[ https://issues.apache.org/jira/browse/SOLR-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reopened SOLR-14582: - Broken test, AwaitsFix > Expose IWC.setMaxCommitMergeWaitMillis as an expert feature in Solr's index > config > -- > > Key: SOLR-14582 > URL: https://issues.apache.org/jira/browse/SOLR-14582 > Project: Solr > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Trivial > Fix For: master (9.0), 8.7 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > LUCENE-8962 added the ability to merge segments synchronously on commit. This > isn't done by default and the default {{MergePolicy}} won't do it, but custom > merge policies can take advantage of this. Solr allows plugging in custom > merge policies, so if someone wants to make use of this feature they could; > however, they need to set {{IndexWriterConfig.maxCommitMergeWaitMillis}} to > something greater than 0. > Since this is an expert feature, I plan to document it only in javadoc and > not the ref guide.
[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171256#comment-17171256 ] Gus Heck commented on SOLR-14706: - I tested the PR and I found A) Collection creation no longer fails B) The upgrade recommendation to remove the policy works as far as it goes C) The autoscaling.json still has cluster preferences after upgrade which are not present with a fresh install. I think we also want to recommend {code:java} {"set-cluster-preferences" : []} {code} To ensure exact parity with a fresh install. > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.7, 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 with more than one node >Reporter: Gus Heck >Assignee: Houston Putman >Priority: Blocker > Fix For: 8.6.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > "strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122) > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # Download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # Replace the 8.6.0 install with 8.6.1 > # Start the server > # Via the admin UI, create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly, here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # ./cloud.sh new -r upgrademe > # Create collection named test860 via admin UI with _default > # ./cloud.sh stop > # cd
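The {{set-cluster-preferences}} recommendation in the comment on this issue could be applied from Java roughly as follows. This is a minimal sketch that only builds the request using the JDK's HTTP client (Java 11+). The {{/admin/autoscaling}} path under the Solr base URL is an assumption about the 8.x autoscaling write API and should be checked against the ref guide for the version in use; the class name and the base URL in the usage note are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds (but does not send) a request that clears cluster preferences. */
final class ClearClusterPrefsSketch {

    /** The command body quoted in the comment on this issue. */
    static final String BODY = "{\"set-cluster-preferences\" : []}";

    /**
     * solrBaseUrl is expected to look like "http://host:port/solr".
     * The "/admin/autoscaling" suffix is an assumed endpoint, not taken from this thread.
     */
    static HttpRequest build(String solrBaseUrl) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/admin/autoscaling"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY, StandardCharsets.UTF_8))
                .build();
    }
}
```

To actually send it, something like {{HttpClient.newHttpClient().send(ClearClusterPrefsSketch.build("http://localhost:8981/solr"), HttpResponse.BodyHandlers.ofString())}} should work against a running node, after which autoscaling.json can be compared with that of a fresh install.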
[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171212#comment-17171212 ] Gus Heck commented on SOLR-14704: - This is being added because the script currently only unpacks the tarball if it is given {{-r}} (recompile), or if you are running the {{new}} command. The {{new}} command will fail before extraction if the directory already exists (intentional, for safety). If {{-r}} is used, it overwrites whatever you placed in the directory with whatever is in your working copy after compilation/packaging and then immediately starts the server with that instead. This could also have been done as {{-t }} (and that could still be added), or {{-u}} to trigger an archive/re-extract, but I thought it was slightly nicer to do the download without requiring separate steps. Things that may still need to be added to this PR (which is just a start) include support for {{-d}} (and/or {{-t}}) in start/restart for upgrade testing; pushing a new solr.xml to zk could also be necessary in some cases but is not yet accounted for. > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171205#comment-17171205 ] Gus Heck edited comment on SOLR-14704 at 8/5/20, 1:10 AM: -- I went for simply downloading from a specified URL, for flexibility; that way it can be used for RCs, actual releases, internal releases on internal repositories, etc. Also, given the hard-to-predict hash sequence in RC artifact URLs, I think it would be a lot more work to derive that URL, and a lot more fragile. was (Author: gus_heck): I had gone for simply downloading from a specified url, for flexibility... thus it could be used for RC, actual releases, internal releases on internal repositories, etc etc. > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally.
[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171205#comment-17171205 ] Gus Heck commented on SOLR-14704: - I had gone for simply downloading from a specified url, for flexibility... thus it could be used for RC, actual releases, internal releases on internal repositories, etc etc. > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally.
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Fix Version/s: 8.6.1 > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 >Reporter: Gus Heck >Priority: Blocker > Fix For: 8.6.1 > > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > "strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) > at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # replace the 8.6.0 with 8.6.1 > # Start the server > # via the admin UI create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # /cloud.sh new -r upgrademe > # Create collection named test860 via admin ui with _default > # ./cloud.sh stop > # cd upgrademe/ > # cp ../8_6_1_RC1/solr-8.6.1.tgz . > # mv solr-8.6.0-SNAPSHOT old > # tar xzvf solr-8.6.1.tgz > # cd .. > # ./cloud.sh start > # Try to create collection test861 with _default config > For those not familiar with it the first command there with cloud.sh builds > the tarball in the working directory and then makes a directory named > "upgrademe" copies it to "upgrademe" unpacks it, sets up a chroot based on > the path in (already running separate) zookeeper, and by default starts 4 > local nodes on ports 8981
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Environment: 8.6.1 upgraded from 8.6.0 with more than one node (was: 8.6.1 upgraded from 8.6.0) > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 with more than one node >Reporter: Gus Heck >Priority: Blocker > Fix For: 8.6.1 > > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > 
"strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # replace the 8.6.0 with 8.6.1 > # Start the server > # via the admin UI create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # /cloud.sh new -r upgrademe > # Create collection named test860 via admin ui with _default > # ./cloud.sh stop > # cd upgrademe/ > # cp ../8_6_1_RC1/solr-8.6.1.tgz . > # mv solr-8.6.0-SNAPSHOT old > # tar xzvf solr-8.6.1.tgz > # cd .. > # ./cloud.sh start > # Try to create collection test861 with _default config > For those not familiar with it the first command there with cloud.sh builds > the tarball in the working directory and then makes a directory named > "upgrademe" copies it to "upgrademe" unpacks it, sets up a chroot based on >
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Issue Type: Bug (was: New Feature) > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 >Reporter: Gus Heck >Priority: Blocker > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > "strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) > at > 
org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # replace the 8.6.0 with 8.6.1 > # Start the server > # via the admin UI create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # /cloud.sh new -r upgrademe > # Create collection named test860 via admin ui with _default > # ./cloud.sh stop > # cd upgrademe/ > # cp ../8_6_1_RC1/solr-8.6.1.tgz . > # mv solr-8.6.0-SNAPSHOT old > # tar xzvf solr-8.6.1.tgz > # cd .. > # ./cloud.sh start > # Try to create collection test861 with _default config > For those not familiar with it the first command there with cloud.sh builds > the tarball in the working directory and then makes a directory named > "upgrademe" copies it to "upgrademe" unpacks it, sets up a chroot based on > the path in (already running separate) zookeeper, and by default starts 4 > local nodes on ports 8981 to 8984 all
[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170384#comment-17170384 ] Gus Heck commented on SOLR-14706: - You can use solr/cloud-dev/cloud.sh to fire up multiple nodes quickly. The top of cloud.sh has extensive comments on its use. Setting the cluster up independently would be interesting too, in case the script is actually misconfiguring something. The script also uploads a solr.xml to zk, which would not have been done again when I upgraded, but I haven't yet thought of how that could be involved, since that file didn't change. > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 >Reporter: Gus Heck >Priority: Blocker > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > "strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # replace the 8.6.0 with 8.6.1 > # Start the server > # via the admin UI create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # /cloud.sh new -r upgrademe > # Create collection named test860 via admin ui with _default > # ./cloud.sh stop > # cd upgrademe/ > # cp ../8_6_1_RC1/solr-8.6.1.tgz . > # mv solr-8.6.0-SNAPSHOT old > # tar xzvf solr-8.6.1.tgz > # cd
[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170025#comment-17170025 ] Gus Heck commented on SOLR-14706: - This has to do with the following bit of code from Clause: {code} if (globalTagName.isPresent()) { globalTag = parse(globalTagName.get(), m); if (m.size() > 2) { throw new RuntimeException("Only one extra tag supported for the tag " + globalTagName.get() + " in " + toJSONString(m)); } {code} The size check was recently changed from > 3 to > 2 by [~houstonputman]. I am quite unfamiliar with this area of the code. Houston, can you take a look? > Upgrading 8.6.0 to 8.6.1 causes collection creation to fail > --- > > Key: SOLR-14706 > URL: https://issues.apache.org/jira/browse/SOLR-14706 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.6.1 > Environment: 8.6.1 upgraded from 8.6.0 >Reporter: Gus Heck >Priority: Blocker > > The following steps will reproduce a situation in which collection creation > fails with this stack trace: > {code:java} > 2020-08-03 12:17:58.617 INFO > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 > 2020-08-03 12:17:58.751 ERROR > (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ > ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: > create failed:org.apache.solr.common.SolrException > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) > at > org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) > at > org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Only one extra tag supported for the > tag cores in { > "cores":"#EQUAL", > "node":"#ANY", > "strict":"false"} > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) > at > org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) > at > org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) > at > org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) > at > org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) > at > org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) > ... 
6 more > {code} > Generalized steps: > # Deploy 8.6.0 with separate data directories, create a collection to prove > it's working > # download > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz > # Stop the server on all nodes > # replace the 8.6.0 with 8.6.1 > # Start the server > # via the admin UI create a collection > # Observe failure warning box (with no text), check logs, find above trace > Or more exactly here are my actual commands with a checkout of the 8.6.0 tag > in the working dir to which cloud.sh was configured: > # /cloud.sh new -r upgrademe > # Create collection named test860 via admin ui with _default > # ./cloud.sh stop > # cd upgrademe/ > # cp ../8_6_1_RC1/solr-8.6.1.tgz . > # mv
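To illustrate why the clause printed in the trace trips the check quoted in the comment above, here is a minimal stand-alone sketch. It is not the real Clause class from solrj: the class name TagLimitSketch and the helper exceedsLimit are hypothetical, and the only things taken from the issue are the three keys of the clause and the two limits (> 3 before the change, > 2 after). The clause map has three entries, so it passes the old limit and is rejected by the new one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; NOT the real Clause class from solrj.
public class TagLimitSketch {

  // Same shape as the quoted check: the clause is rejected when the map
  // holds more entries than the given limit.
  static boolean exceedsLimit(Map<String, Object> clause, int limit) {
    return clause.size() > limit;
  }

  public static void main(String[] args) {
    // The clause shown in the stack trace: three keys in total.
    Map<String, Object> clause = new LinkedHashMap<>();
    clause.put("cores", "#EQUAL");
    clause.put("node", "#ANY");
    clause.put("strict", "false");

    System.out.println("size = " + clause.size());
    // Old limit reported in the comment (> 3): the clause is accepted.
    System.out.println("rejected with > 3: " + exceedsLimit(clause, 3));
    // New limit reported in the comment (> 2): the clause is rejected.
    System.out.println("rejected with > 2: " + exceedsLimit(clause, 2));
  }
}
```

This only mirrors the arithmetic of the size check; whether the "strict" key should count toward the limit is a question for the actual Clause code, which this sketch does not attempt to answer.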
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Description: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in { "cores":"#EQUAL", "node":"#ANY", "strict":"false"} at org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122) at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) 
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144) at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) ... 6 more {code} Generalized steps: # Deploy 8.6.0 with separate data directories, create a collection to prove it's working # Download https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz # Stop the server on all nodes # Replace the 8.6.0 with 8.6.1 # Start the server # Via the admin UI create a collection # Observe failure warning box (with no text), check logs, find above trace Or more exactly here are my actual commands with a checkout of the 8.6.0 tag in the working dir to which cloud.sh was configured: # ./cloud.sh new -r upgrademe # Create collection named test860 via admin ui with _default # ./cloud.sh stop # cd upgrademe/ # cp ../8_6_1_RC1/solr-8.6.1.tgz . # mv solr-8.6.0-SNAPSHOT old # tar xzvf solr-8.6.1.tgz # cd .. 
# ./cloud.sh start # Try to create collection test861 with _default config For those not familiar with it, the first cloud.sh command there builds the tarball in the working directory, makes a directory named "upgrademe", copies the tarball into "upgrademe", unpacks it, sets up a chroot based on the path in the (already running, separate) zookeeper, and by default starts 4 local nodes on ports 8981 to 8984, all with separate data directories hosted under the "upgrademe" directory. was: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Description: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in { "cores":"#EQUAL", "node":"#ANY", "strict":"false"} at org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) 
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) ... 6 more {code} Generalized steps: # Deploy 8.6.0 with separate data directories, create a collection to prove it's working** # download https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz # Stop the server on all nodes # replace the 8.6.0 with 8.6.1 # Start the server # via the admin UI create a collection # Observe failure warning box (with no text), check logs, find above trace Or more exactly here are my actual commands with a checkout of the 8.6.0 tag in the working dir to which cloud.sh was configured: # /cloud.sh new -r upgrademe # Create collection named test860 via admin ui with _default # ./cloud.sh stop # cd upgrademe/ # cp ../8_6_1_RC1/solr-8.6.1.tgz . # mv solr-8.6.0-SNAPSHOT old # tar xzvf solr-8.6.1.tgz # cd .. 
# ./cloud.sh start # Try to create collection test861 with _default config For those not familiar with it, the first command there with cloud.sh builds the tarball in the working directory, makes a directory named "upgrademe", copies the tarball into "upgrademe", unpacks it, sets up a chroot based on the path in the (already running, separate) ZooKeeper, and by default starts 4 local nodes on ports 8981 to 8984, all with separate data directories hosted under the "upgrademe" directory. was: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at
[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
[ https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-14706: Description: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in { "cores":"#EQUAL", "node":"#ANY", "strict":"false"} at org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) 
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) ... 6 more {code} Generalized steps: # Deploy 8.6.0 with separate data directories, create a collection to prove it's working** # download https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz # Stop the server on all nodes # replace the 8.6.0 with 8.6.1 # Start the server # via the admin UI create a collection # Observe failure warning box (with no text), check logs, find above trace Or more exactly here are my actual commands with a checkout of the 8.6.0 tag in the working dir to which cloud.sh was configured: # /cloud.sh new -r upgrademe # Create collection named test860 via admin ui with _default # ./cloud.sh stop # cd upgrademe/ # cp ../8_6_1_RC1/solr-8.6.1.tgz . # mv solr-8.6.0-SNAPSHOT old # tar xzvf solr-8.6.1.tgz # cd .. 
# ./cloud.sh start For those not familiar with it, the first command there with cloud.sh builds the tarball in the working directory, makes a directory named "upgrademe", copies the tarball into "upgrademe", unpacks it, sets up a chroot based on the path in the (already running, separate) ZooKeeper, and by default starts 4 local nodes on ports 8981 to 8984, all with separate data directories hosted under the "upgrademe" directory. was: The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) at
[jira] [Created] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
Gus Heck created SOLR-14706: --- Summary: Upgrading 8.6.0 to 8.6.1 causes collection creation to fail Key: SOLR-14706 URL: https://issues.apache.org/jira/browse/SOLR-14706 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 8.6.1 Environment: 8.6.1 upgraded from 8.6.0 Reporter: Gus Heck The following steps will reproduce a situation in which collection creation fails with this stack trace: {code:java} 2020-08-03 12:17:58.617 INFO (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861 2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347) at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264) at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in { "cores":"#EQUAL", "node":"#ANY", "strict":"false"} at org.apache.solr.client.solrj.cloud.autoscaling.Clause.(Clause.java:122) at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144) at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300) at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277) at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415) at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192) ... 6 more {code} Generalized steps: # Deploy 8.6.0 with separate data directories, create a collection to prove it's working** # download https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz # Stop the server on all nodes # replace the 8.6.0 with 8.6.1 # Start the server # via the admin UI create a collection # Observe failure warning box (with no text), check logs, find above trace Or more exactly here are my actual commands with a checkout of the 8.6.0 tag in the working dir to which cloud.sh was configured: # /cloud.sh new -r upgrademe # Create collection named test860 via admin ui with _default # ./cloud.sh stop # cd upgrademe/ # ../8_6_1_RC1/solr-8.6.1.tgz . # mv solr-8.6.0-SNAPSHOT old # tar xzvf solr-8.6.1.tgz # cd .. 
# ./cloud.sh start For those not familiar with it, the first command there with cloud.sh builds the tarball in the working directory, makes a directory named "upgrademe", copies the tarball into "upgrademe", unpacks it, sets up a chroot based on the path in the (already running, separate) ZooKeeper, and by default starts 4 local nodes on ports 8981 to 8984, all with separate data directories hosted under the "upgrademe" directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169700#comment-17169700 ] Gus Heck edited comment on SOLR-14704 at 8/3/20, 3:57 AM: -- For example: {code:java} ./cloud.sh new -d https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz 8_6_1_RC1 {code} was (Author: gus_heck): For example: {code:java} ./cloud.sh new -t https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz 8_6_1_RC1 {code} > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169700#comment-17169700 ] Gus Heck commented on SOLR-14704: - For example: {code:java} ./cloud.sh new -t https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz 8_6_1_RC1 {code} > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
Gus Heck created SOLR-14704: --- Summary: Add download option to solr/cloud-dev/cloud.sh Key: SOLR-14704 URL: https://issues.apache.org/jira/browse/SOLR-14704 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: scripts and tools Reporter: Gus Heck Assignee: Gus Heck For easier testing of things like RC artifacts I'm adding an option to cloud.sh which will curl a tarball down from the web instead of building it locally.
[jira] [Resolved] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck resolved SOLR-13169. - Fix Version/s: 8.6 Resolution: Fixed > Move Replica Docs need improvement (V1 and V2 introspect) > - > > Key: SOLR-13169 > URL: https://issues.apache.org/jira/browse/SOLR-13169 > Project: Solr > Issue Type: Improvement > Components: v2 API >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt > > > At a minimum required parameters should be noted equally in both places. > Conversation with [~ab] indicates that there are also some discrepancies in > what is and is not actually required in docs vs code. ("in MoveReplicaCmd if > you specify “replica” then “shard” is completely ignored") > Also in v2 it seems shard might be inferred from the URL and in that case > it's not clear if the URL or the json takes precedence. > From introspect: > {code:java} > "move-replica": { > "type": "object", > "documentation": > "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;, > "description": "This command moves a replica from one > node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` > may be reused.", > "properties": { > "replica": { > "type": "string", > "description": "The name of the replica" > }, > "shard": { > "type": "string", > "description": "The name of the shard" > }, > "sourceNode": { > "type": "string", > "description": "The name of the node that > contains the replica." > }, > "targetNode": { > "type": "string", > "description": "The name of the destination node. > This parameter is required." > }, > "waitForFinalState": { > "type": "boolean", > "default": "false", > "description": "Wait for the moved replica to > become active." > }, > "timeout": { > "type": "integer", > "default": 600, > "description": "Timeout to wait for replica to > become active. 
For very large replicas this may need to be increased." > }, > "inPlaceMove": { > "type": "boolean", > "default": "true", > "description": "For replicas that use shared > filesystems allow 'in-place' move that reuses shared data." > } > {code} > From ref guide for V1: > MOVEREPLICA Parameters > collection > The name of the collection. This parameter is required. > shard > The name of the shard that the replica belongs to. This parameter is required. > replica > The name of the replica. This parameter is required. > sourceNode > The name of the node that contains the replica. This parameter is required. > targetNode > The name of the destination node. This parameter is required. > async > Request ID to track this action which will be processed asynchronously. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
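As a rough illustration of the V1 parameters listed above, here is a minimal Java sketch that only builds (and does not send) a MOVEREPLICA request URL. The base URL, collection name, replica name and node name are hypothetical placeholders, and the choice to pass only `replica` and `targetNode` follows the remark quoted above that `shard` is ignored when `replica` is given; which parameters are truly required is exactly what this issue questions, so treat the parameter set as an assumption rather than a statement of the API contract.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class MoveReplicaUrlSketch {

    /** URL-encodes a single parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a V1 Collections API URL for MOVEREPLICA.
     * All argument values are supplied by the caller; nothing is sent over the network.
     */
    static String buildUrl(String solrBaseUrl, String collection, String replica, String targetNode) {
        // LinkedHashMap keeps the parameters in insertion order for readable output.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "MOVEREPLICA");
        params.put("collection", collection);
        params.put("replica", replica);
        params.put("targetNode", targetNode);

        StringBuilder url = new StringBuilder(solrBaseUrl).append("/admin/collections");
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator).append(param.getKey()).append('=').append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only for illustration.
        System.out.println(buildUrl("http://localhost:8983/solr", "test861", "core_node3", "192.168.2.106:8982_solr"));
    }
}
```

Optional parameters from the list above, such as `async`, could be added to the map in the same way.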
[jira] [Comment Edited] (SOLR-14608) Faster sorting for the /export handler
[ https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165712#comment-17165712 ] Gus Heck edited comment on SOLR-14608 at 7/27/20, 1:41 PM: --- A question from a customer caused me to re-read this and think a bit more deeply. I'm wondering about the fact that the priority queue has a limit on its size. This would seem to place a (hard to define) limit on the size of the segment, and perhaps fail by returning out of order docs silently? (The client case in question is a collection that is approaching half a trillion documents...) was (Author: gus_heck): A question from a customer caused me to re-read this and think a bit more deeply. I'm wondering about the fact that the priority queue has a limit on its size. This would seem to place a (hard to define) limit on the size of the segment, and perhaps fail by returning out of order docs silently? (The client case in question is a cluster that is approaching half a trillion documents...) > Faster sorting for the /export handler > -- > > Key: SOLR-14608 > URL: https://issues.apache.org/jira/browse/SOLR-14608 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Andrzej Bialecki >Priority: Major > > The largest cost of the export handler is the sorting. This ticket will > implement an improved algorithm for sorting that should greatly increase > overall throughput for the export handler. > *The current algorithm is as follows:* > Collect a bitset of matching docs. Iterate over that bitset and materialize > the top level ordinals for the sort fields in the document and add them to > a priority queue of size 30,000. Then export the top 30,000 docs, turn off the > bits in the bit set and iterate again until all docs are sorted and sent. > There are two performance bottlenecks with this approach: > 1) Materializing the top level ordinals adds a huge amount of overhead to the > sorting process. 
> 2) The size of the priority queue, 30,000, adds significant overhead to sorting > operations. > *The new algorithm:* > Has a top level *merge sort iterator* that wraps segment level iterators that > perform segment level priority queue sorts. > *Segment level:* > The segment level docset will be iterated and the segment level ordinals for > the sort fields will be materialized and added to a segment level priority > queue. As the segment level iterator pops docs from the priority queue the > top level ordinals for the sort fields are materialized. Because the top > level ordinals are materialized AFTER the sort, they only need to be looked > up when the segment level ordinal changes. This takes advantage of the sort > to limit the lookups into the top level ordinal structures. This also > eliminates redundant lookups of top level ordinals that occur during the > multiple passes over the matching docset. > The segment level priority queues can be kept smaller than 30,000 to improve > performance of the sorting operations because the overall batch size will > still be 30,000 or greater when all the segment priority queue sizes are > added up. This allows for batch sizes much larger than 30,000 without using a > single large priority queue. The increased batch size means fewer iterations > over the matching docset and the decreased priority queue size means faster > sorting operations. > *Top level:* > A top level iterator does a merge sort over the segment level iterators by > comparing the top level ordinals materialized when the segment level docs are > popped from the segment level priority queues. This requires no extra memory > and will be very performant.
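To make the two-level structure in the description above easier to picture, here is a simplified Java sketch, not the actual /export handler code. It assumes plain integers stand in for sort ordinals, lets each segment queue hold the whole segment (the real design bounds the queue size and makes multiple passes), and uses a small heap of cursors for the top-level merge. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ExportMergeSketch {

    /** Cursor over one segment's values after the segment-level sort. */
    private static final class SegmentCursor {
        final List<Integer> sorted;
        int pos = 0;

        SegmentCursor(List<Integer> sorted) {
            this.sorted = sorted;
        }

        int current() {
            return sorted.get(pos);
        }

        /** Moves to the next value; returns false when the segment is exhausted. */
        boolean advance() {
            return ++pos < sorted.size();
        }
    }

    /** Segment level: order one segment's keys using its own priority queue. */
    static List<Integer> sortSegment(List<Integer> segmentKeys) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(segmentKeys);
        List<Integer> sorted = new ArrayList<>(segmentKeys.size());
        while (!queue.isEmpty()) {
            sorted.add(queue.poll());
        }
        return sorted;
    }

    /** Top level: merge the per-segment sorted streams by always taking the smallest head. */
    static List<Integer> mergeSegments(List<List<Integer>> segments) {
        Comparator<SegmentCursor> byHead = Comparator.comparingInt(SegmentCursor::current);
        PriorityQueue<SegmentCursor> heads = new PriorityQueue<>(byHead);
        for (List<Integer> segment : segments) {
            if (!segment.isEmpty()) {
                heads.add(new SegmentCursor(sortSegment(segment)));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            SegmentCursor head = heads.poll();
            merged.add(head.current());
            if (head.advance()) {
                heads.add(head); // re-insert with its new head value
            }
        }
        return merged;
    }
}
```

The point of the sketch is only the shape of the computation: each segment is sorted independently with a small queue, and the top level never re-sorts, it just compares the current heads of the segment streams.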
[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler
[ https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165712#comment-17165712 ] Gus Heck commented on SOLR-14608: - A question from a customer caused me to re-read this and think a bit more deeply. I'm wondering about the fact that the priority queue has a limit on its size. This would seem to place a (hard to define) limit on the size of the segment, and perhaps fail by returning out of order docs silently? (The client case in question is a cluster that is approaching half a trillion documents...) > Faster sorting for the /export handler > -- > > Key: SOLR-14608 > URL: https://issues.apache.org/jira/browse/SOLR-14608 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Andrzej Bialecki >Priority: Major > > The largest cost of the export handler is the sorting. This ticket will > implement an improved algorithm for sorting that should greatly increase > overall throughput for the export handler. > *The current algorithm is as follows:* > Collect a bitset of matching docs. Iterate over that bitset and materialize > the top level ordinals for the sort fields in the document and add them to > a priority queue of size 30,000. Then export the top 30,000 docs, turn off the > bits in the bit set and iterate again until all docs are sorted and sent. > There are two performance bottlenecks with this approach: > 1) Materializing the top level ordinals adds a huge amount of overhead to the > sorting process. > 2) The size of the priority queue, 30,000, adds significant overhead to sorting > operations. > *The new algorithm:* > Has a top level *merge sort iterator* that wraps segment level iterators that > perform segment level priority queue sorts. > *Segment level:* > The segment level docset will be iterated and the segment level ordinals for > the sort fields will be materialized and added to a segment level priority > queue. 
As the segment level iterator pops docs from the priority queue the > top level ordinals for the sort fields are materialized. Because the top > level ordinals are materialized AFTER the sort, they only need to be looked > up when the segment level ordinal changes. This takes advantage of the sort > to limit the lookups into the top level ordinal structures. This also > eliminates redundant lookups of top level ordinals that occur during the > multiple passes over the matching docset. > The segment level priority queues can be kept smaller than 30,000 to improve > performance of the sorting operations because the overall batch size will > still be 30,000 or greater when all the segment priority queue sizes are > added up. This allows for batch sizes much larger than 30,000 without using a > single large priority queue. The increased batch size means fewer iterations > over the matching docset and the decreased priority queue size means faster > sorting operations. > *Top level:* > A top level iterator does a merge sort over the segment level iterators by > comparing the top level ordinals materialized when the segment level docs are > popped from the segment level priority queues. This requires no extra memory > and will be very performant.
[jira] [Assigned] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reassigned SOLR-13169: --- Assignee: Gus Heck > Move Replica Docs need improvement (V1 and V2 introspect) > - > > Key: SOLR-13169 > URL: https://issues.apache.org/jira/browse/SOLR-13169 > Project: Solr > Issue Type: Improvement > Components: v2 API >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt > > > At a minimum required parameters should be noted equally in both places. > Conversation with [~ab] indicates that there are also some discrepancies in > what is and is not actually required in docs vs code. ("in MoveReplicaCmd if > you specify “replica” then “shard” is completely ignored") > Also in v2 it seems shard might be inferred from the URL and in that case > it's not clear if the URL or the json takes precedence. > From introspect: > {code:java} > "move-replica": { > "type": "object", > "documentation": > "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;, > "description": "This command moves a replica from one > node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` > may be reused.", > "properties": { > "replica": { > "type": "string", > "description": "The name of the replica" > }, > "shard": { > "type": "string", > "description": "The name of the shard" > }, > "sourceNode": { > "type": "string", > "description": "The name of the node that > contains the replica." > }, > "targetNode": { > "type": "string", > "description": "The name of the destination node. > This parameter is required." > }, > "waitForFinalState": { > "type": "boolean", > "default": "false", > "description": "Wait for the moved replica to > become active." > }, > "timeout": { > "type": "integer", > "default": 600, > "description": "Timeout to wait for replica to > become active. For very large replicas this may need to be increased." 
> }, > "inPlaceMove": { > "type": "boolean", > "default": "true", > "description": "For replicas that use shared > filesystems allow 'in-place' move that reuses shared data." > } > {code} > From ref guide for V1: > MOVEREPLICA Parameters > collection > The name of the collection. This parameter is required. > shard > The name of the shard that the replica belongs to. This parameter is required. > replica > The name of the replica. This parameter is required. > sourceNode > The name of the node that contains the replica. This parameter is required. > targetNode > The name of the destination node. This parameter is required. > async > Request ID to track this action which will be processed asynchronously. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12847) Cut over implementation of maxShardsPerNode to a collection policy
[ https://issues.apache.org/jira/browse/SOLR-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161571#comment-17161571 ] Gus Heck edited comment on SOLR-12847 at 7/20/20, 10:05 PM: FWIW when I was investigating and fixing the MOVEREPLICA docs I found that maxShardsPerNode is advisory only once the collection is created and is not a hard limit. If a destination node is specified in ADDREPLICA, it will force placement above the limit, and MOVEREPLICA always issues its add with a specified value for the node. Thus, MOVEREPLICA entirely ignores maxShardsPerNode (see SOLR-13169). was (Author: gus_heck): FWIW when I was investigating and fixing the MOVEREPLICA docs I found that maxShardsPerNode is advisory only once the collection is created and is not a hard limit. If a destination node is specified in ADDREPLICA, it will force placement above the limit, and MOVEREPLICA always issues its add with a specified value for the node. Thus, MOVEREPLICA entirely ignores maxShardsPerNode. > Cut over implementation of maxShardsPerNode to a collection policy > -- > > Key: SOLR-12847 > URL: https://issues.apache.org/jira/browse/SOLR-12847 > Project: Solr > Issue Type: Bug > Components: AutoScaling, SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > We've gone back and forth over handling maxShardsPerNode with autoscaling policies > (see SOLR-11005 for history). Now that we've reimplemented support for > creating collections with maxShardsPerNode when autoscaling policy is > enabled, we should re-look at how it is implemented. > I propose that we fold maxShardsPerNode (if specified) to a collection level > policy that overrides the corresponding default in cluster policy (see > SOLR-12845). We'll need to ensure that if maxShardsPerNode is specified then > the user sees neither violations nor corresponding suggestions because of the > default cluster policy.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12847) Cut over implementation of maxShardsPerNode to a collection policy
[ https://issues.apache.org/jira/browse/SOLR-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161571#comment-17161571 ] Gus Heck commented on SOLR-12847: - FWIW when I was investigating and fixing the MOVEREPLICA docs I found that maxShardsPerNode is advisory only once the collection is created and is not a hard limit. If a destination node is specified in ADDREPLICA it will force placement above the limit, and MOVEREPLICA always issues its add with a specified value for the node. Thus, MOVEREPLICA entirely ignores maxShardsPerNode. > Cut over implementation of maxShardsPerNode to a collection policy > -- > > Key: SOLR-12847 > URL: https://issues.apache.org/jira/browse/SOLR-12847 > Project: Solr > Issue Type: Bug > Components: AutoScaling, SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > We've gone back and forth over handling maxShardsPerNode with autoscaling policies > (see SOLR-11005 for history). Now that we've reimplemented support for > creating collections with maxShardsPerNode when autoscaling policy is > enabled, we should re-look at how it is implemented. > I propose that we fold maxShardsPerNode (if specified) into a collection-level > policy that overrides the corresponding default in cluster policy (see > SOLR-12845). We'll need to ensure that if maxShardsPerNode is specified then > the user sees neither violations nor corresponding suggestions because of the > default cluster policy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
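The behaviour described in this comment can be illustrated with a toy model. This is only a sketch and not Solr code: the class `AdvisoryLimitSketch` and its methods are hypothetical names, and the model assumes (as the comment reports) that automatic placement respects the limit while a placement with an explicit node does not check it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy model (not Solr code) of a per-node replica limit that is only advisory. */
class AdvisoryLimitSketch {
    private final int maxShardsPerNode;
    // TreeMap keeps node iteration order deterministic for the example.
    private final Map<String, Integer> replicasPerNode = new TreeMap<>();

    AdvisoryLimitSketch(int maxShardsPerNode, String... nodes) {
        this.maxShardsPerNode = maxShardsPerNode;
        for (String node : nodes) {
            replicasPerNode.put(node, 0);
        }
    }

    /** Automatic placement: only nodes still below the limit are candidates. */
    Optional<String> addReplica() {
        for (Map.Entry<String, Integer> e : replicasPerNode.entrySet()) {
            if (e.getValue() < maxShardsPerNode) {
                e.setValue(e.getValue() + 1);
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty(); // every node is at the limit
    }

    /** Explicit placement: the caller named the node, so the limit is not consulted. */
    void addReplica(String node) {
        replicasPerNode.merge(node, 1, Integer::sum);
    }

    /** A move always re-adds with an explicit node, so it bypasses the limit as well. */
    void moveReplica(String sourceNode, String targetNode) {
        replicasPerNode.merge(sourceNode, -1, Integer::sum);
        addReplica(targetNode);
    }

    int count(String node) {
        return replicasPerNode.getOrDefault(node, 0);
    }
}
```

In this model a cluster created with a limit of 1 can still end up with two replicas on one node after a move, which is the effect the comment describes for MOVEREPLICA.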
[jira] [Commented] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155587#comment-17155587 ] Gus Heck commented on SOLR-14597: - After some work came up with this which omits files that don't have "java" in their name, but should give a decent idea: {code:java} NS2-MacBook-Pro:lucene-solr-cdg3 gus$ git diff HEAD..master_head | grep 'diff ..git' | grep java |sed 's#b/#@#' | rev | cut -d'@' -f 1 | rev gradle/generation/javacc.gradle lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilter.java lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilterFactory.java lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/PatternTypingFilter.java lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/PatternTypingFilterFactory.java lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilter.java lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilterFactory.java lucene/analysis/common/src/test/org/apache/lucene/analysis/minhash/MinHashFilterTest.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestConcatenatingTokenStream.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestDropIfFlaggedFilter.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestDropIfFlaggedFilterFactory.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestPatternTypingFilter.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestPatternTypingFilterFactory.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestTypeAsSynonymFilter.java lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestTypeAsSynonymFilterFactory.java lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java 
lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java solr/core/src/java/org/apache/solr/analysis/TokenAnalyzerFilter.java solr/core/src/java/org/apache/solr/analysis/TokenAnalyzerFilterFactory.java solr/core/src/java/org/apache/solr/aqp/AdvToken.java solr/core/src/java/org/apache/solr/aqp/AdvancedQueryParserBase.java solr/core/src/java/org/apache/solr/aqp/ParseException.java solr/core/src/java/org/apache/solr/aqp/QueryParser.java solr/core/src/java/org/apache/solr/aqp/QueryParser.jj solr/core/src/java/org/apache/solr/aqp/QueryParserConstants.java solr/core/src/java/org/apache/solr/aqp/QueryParserTokenManager.java solr/core/src/java/org/apache/solr/aqp/SpanContext.java solr/core/src/java/org/apache/solr/aqp/Token.java solr/core/src/java/org/apache/solr/aqp/TokenMgrError.java solr/core/src/java/org/apache/solr/aqp/package-info.java solr/core/src/java/org/apache/solr/parser/Operator.java solr/core/src/java/org/apache/solr/parser/QueryParser.java solr/core/src/java/org/apache/solr/parser/QueryParser.jj solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java solr/core/src/java/org/apache/solr/parser/SynonymQueryStyle.java solr/core/src/java/org/apache/solr/schema/IndexSchema.java solr/core/src/java/org/apache/solr/schema/TextField.java solr/core/src/java/org/apache/solr/search/AdvancedQParser.java solr/core/src/java/org/apache/solr/search/AdvancedQParserPlugin.java solr/core/src/java/org/apache/solr/search/AdvancedQueryParser.java solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java solr/core/src/java/org/apache/solr/search/DisMaxQParser.java solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java solr/core/src/java/org/apache/solr/search/QParserPlugin.java solr/core/src/java/org/apache/solr/search/QueryParsing.java solr/core/src/java/org/apache/solr/search/SimpleQParserPlugin.java solr/core/src/java/org/apache/solr/util/SolrPluginUtils.java 
solr/core/src/test/org/apache/solr/analysis/PatternTypingFilterFactoryTest.java solr/core/src/test/org/apache/solr/analysis/TokenAnalyzerFilterFactoryTest.java solr/core/src/test/org/apache/solr/aqp/AbstractAqpTestCase.java solr/core/src/test/org/apache/solr/aqp/CharacterRangeTest.java solr/core/src/test/org/apache/solr/aqp/FieldedSearchTest.java solr/core/src/test/org/apache/solr/aqp/LiteralPhraseTest.java solr/core/src/test/org/apache/solr/aqp/MustNotTest.java solr/core/src/test/org/apache/solr/aqp/MustTest.java solr/core/src/test/org/apache/solr/aqp/NumericSearchTest.java solr/core/src/test/org/apache/solr/aqp/OrderedDistanceGroupTest.java solr/core/src/test/org/apache/solr/aqp/PhraseTest.java solr/core/src/test/org/apache/solr/aqp/ShouldTest.java solr/core/src/test/org/apache/solr/aqp/SimpleGroupTest.java solr/core/src/test/org/apache/solr/aqp/SimpleQueryTest.java solr/core/src/test/org/apache/solr/aqp/TemporalFieldedSearchTest.java
[jira] [Commented] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150438#comment-17150438 ] Gus Heck commented on SOLR-14597: - This thought has occurred to me, but the coordination of several parts across both Lucene and Solr layers seems awkward for a package/plugin (Solr parser, a couple of new Lucene filters, etc.), and I do hope that it is a generally useful parser as you mention. When we sort out the legalities and get a patch up this will become more clear, but generally it adds another javacc based parser that was based on, and is able to reuse some bits of, the standard parser (a few of which needed to be extracted or made accessible). There are also a few small tweaks to core classes (which seem justified to me, but of course review and commentary are welcome). So even if a package/plugin is part of the final result we will likely have some changes to Solr & Lucene directly as well. > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 8.6 >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > > This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser] > Briefly, this parser provides a comprehensive syntax for users that use > search on a daily basis. It also reserves a smaller set of punctuators than > other parsers. This facilitates easier handling of acronyms and punctuated > patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some > advanced features while also preventing access to arbitrary features via > local parameters. 
This parser will be safe for accepting user queries > directly with minimal pre-parsing, but for use cases beyond its established > features, alternate query paths (using other parsers) will need to be supplied. > The code drop is being prepared and will be supplied as soon as we receive > guidance from the PMC regarding the proper process. Given that the Library > already has a signed CCLA we need to understand which of these (or other > processes) apply: > [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] > and > [https://www.apache.org/licenses/contributor-agreements.html#grants] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14022) Deprecate CDCR from Solr in 8.x
[ https://issues.apache.org/jira/browse/SOLR-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149600#comment-17149600 ] Gus Heck commented on SOLR-14022: - Do we want to think about whether it's a good idea to have yet another set of technologies/servers to deploy to make Solr work (fully)? Pulsar uses BookKeeper, and it looks like Pulsar gets deployed as its own cluster. So this would lead to a minimum of 4 Solrs, 3 ZooKeepers and 3 Pulsars for HA across regions. I've met clients who had trouble with the idea of separate ZooKeeper servers... Another thing I'd like to say is that a weakness of the existing CDCR is the use of collection names in config, which made it incompatible with routed aliases (where collections are added dynamically). A solution relying on external tools seems even less likely to be able to account for that. > Deprecate CDCR from Solr in 8.x > --- > > Key: SOLR-14022 > URL: https://issues.apache.org/jira/browse/SOLR-14022 > Project: Solr > Issue Type: Improvement > Components: CDCR >Reporter: Joel Bernstein >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.6 > > > This ticket will deprecate CDCR in Solr 8x. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13286) Move Metrics handler and any other noisy admin logging to debug
[ https://issues.apache.org/jira/browse/SOLR-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck resolved SOLR-13286. - Fix Version/s: 8.6 Resolution: Fixed > Move Metrics handler and any other noisy admin logging to debug > --- > > Key: SOLR-13286 > URL: https://issues.apache.org/jira/browse/SOLR-13286 > Project: Solr > Issue Type: Improvement > Components: logging >Affects Versions: master (9.0) >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Minor > Fix For: 8.6 > > Attachments: SOLR-13286.patch, SOLR-13286.patch > > > Lately when looking at log files I always find myself straining and squinting > to see things among a vast sea of metrics related logging. The problem > appears to be that the metrics system regularly issues /admin/ commands that > get logged at info by HttpSolrCall, so turning this down also means you can't > see any other admin commands, which is often what you're looking for in the > first place (ok what I'm often looking for at least :) ). I also recall > seeing a complaint about this on one of the lists at some point. > Attaching patch to log at an alternate level these based on the value of the > handler field in HttpSolrCall. Patch is untested and meant as fodder for > commentary and for suggestions of other handlers that might want to go on the > "noisy" list. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
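The idea in the description, picking a log level from the handler path, might look roughly like the sketch below. It is not the attached patch: the class name, the `Level` enum and the contents of the "noisy" set are assumptions made for illustration (only `/admin/metrics` is included, since metrics logging is the case the ticket complains about).

```java
import java.util.Set;

/** Sketch (not the attached patch) of choosing a log level per request handler path. */
class NoisyAdminLoggingSketch {
    enum Level { DEBUG, INFO }

    // Hypothetical "noisy" list; the ticket invites suggestions for further entries.
    private static final Set<String> NOISY_HANDLERS = Set.of("/admin/metrics");

    /** Noisy handlers are demoted to DEBUG, all other requests stay at INFO. */
    static Level levelFor(String handlerPath) {
        boolean noisy = handlerPath != null && NOISY_HANDLERS.contains(handlerPath);
        return noisy ? Level.DEBUG : Level.INFO;
    }
}
```

With this shape, other admin commands remain visible at INFO while the periodic metrics calls drop out of the default log output.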
[jira] [Assigned] (SOLR-14597) Advanced Query Parser
[ https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reassigned SOLR-14597: --- Assignee: Gus Heck > Advanced Query Parser > - > > Key: SOLR-14597 > URL: https://issues.apache.org/jira/browse/SOLR-14597 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: 8.6 >Reporter: Mike Nibeck >Assignee: Gus Heck >Priority: Major > > This JIRA ticket tracks the progress of SIP-, the Advanced Query Parser that > is being donated by the Library of Congress. Full description of the feature > can be found on the SIP Page. > > Briefly, this parser provides a comprehensive syntax for users that use > search on a daily basis. It also reserves a smaller set of punctuators than > other parsers. This facilitates easier handling of acronyms and punctuated > patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some > advanced features while also preventing access to arbitrary features via > local parameters. This parser will be safe for accepting user queries > directly with minimal pre-parsing, but for use cases beyond its established > features, alternate query paths (using other parsers) will need to be supplied. > The code drop is being prepared and will be supplied as soon as we receive > guidance from the PMC regarding the proper process. Given that the Library > already has a signed CCLA we need to understand which of these (or other > processes) apply: > [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] > and > [https://www.apache.org/licenses/contributor-agreements.html#grants] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14588) Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker
[ https://issues.apache.org/jira/browse/SOLR-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145873#comment-17145873 ] Gus Heck commented on SOLR-14588: - [http://fucit.org/solr-jenkins-reports/failure-report.html] - failing 100% since this commit I think > Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker > -- > > Key: SOLR-14588 > URL: https://issues.apache.org/jira/browse/SOLR-14588 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Atri Sharma >Assignee: Atri Sharma >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > This Jira tracks addition of circuit breakers in the search path and > implements JVM based circuit breaker which rejects incoming search requests > if the JVM heap usage exceeds a defined percentage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
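The heap-based check in the issue description can be sketched with the standard JMX memory bean. This is an illustration only, not the committed Solr implementation: the class and method names are hypothetical, and the threshold is assumed to be a percentage of the maximum heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch (not the Solr implementation) of a circuit breaker keyed on JVM heap usage. */
class HeapCircuitBreakerSketch {
    private final double thresholdPercent;

    HeapCircuitBreakerSketch(double thresholdPercent) {
        if (thresholdPercent <= 0 || thresholdPercent > 100) {
            throw new IllegalArgumentException("threshold must be in (0, 100]");
        }
        this.thresholdPercent = thresholdPercent;
    }

    /** Pure check, separated out so it can be exercised without a real heap. */
    static boolean exceeds(long usedBytes, long maxBytes, double thresholdPercent) {
        if (maxBytes <= 0) {
            return false; // MemoryUsage.getMax() is -1 when the maximum is undefined
        }
        return usedBytes * 100.0 / maxBytes > thresholdPercent;
    }

    /** True when current heap usage is above the threshold; a caller would reject the request. */
    boolean isTripped() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return exceeds(heap.getUsed(), heap.getMax(), thresholdPercent);
    }
}
```

A search path using this sketch would call `isTripped()` before executing a query and return an error response instead of doing the work when it reports true.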
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142108#comment-17142108 ] Gus Heck commented on LUCENE-9411: -- There seem to be a number of Apache projects that have found (or ignored) the answer to this dilemma. https://issues.apache.org/jira/issues/?jql=text%20~%20%22spotbugs%22 Zookeeper is Apache, we should probably ask them about it. My guess is that since it is LGPL not GPL this applies... https://www.apache.org/legal/resolved.html#build-tools This and the above search implies there is (or should be) an exception for this tool already approved. > Fail complation on warnings > --- > > Key: LUCENE-9411 > URL: https://issues.apache.org/jira/browse/LUCENE-9411 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Labels: build > Attachments: LUCENE-9411.patch, LUCENE-9411.patch, LUCENE-9411.patch, > annotations-warnings.patch > > > Moving this over here from SOLR-11973 since it's part of the build system and > affects Lucene as well as Solr. You might want to see the discussion there. > We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, > try, etc. warnings. There are some peculiar warnings (things like > SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's > assume those are not a problem. Now I'd like to start failing the compilation > if people write new code that generates warnings. > From what I can tell, just adding the flag is easy in both the Gradle and Ant > builds. I still have to prove out that adding -Werrors does what I expect, > i.e. succeeds now and fails when I introduce warnings. > But let's assume that works. Are there objections to this idea generally? I > hope to have some data by next Monday. > FWIW, the Lucene code base had far fewer issues than Solr, but > common-build.xml is in Lucene. 
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141571#comment-17141571 ] Gus Heck commented on LUCENE-9411: -- Do we have other category X ([https://www.apache.org/legal/resolved.html#category-x]) libraries that we have used at compile time only? It seems (to me, IANAL, etc.) like licensing is only an issue for what we distribute, but distributing a build that automatically downloads it might be a grey area, especially if it's not associated with an optional feature, which is the case clearly outlined in the license/legal page. > Fail complation on warnings > --- > > Key: LUCENE-9411 > URL: https://issues.apache.org/jira/browse/LUCENE-9411 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Labels: build > Attachments: LUCENE-9411.patch, LUCENE-9411.patch, LUCENE-9411.patch, > annotations-warnings.patch > > > Moving this over here from SOLR-11973 since it's part of the build system and > affects Lucene as well as Solr. You might want to see the discussion there. > We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, > try, etc. warnings. There are some peculiar warnings (things like > SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's > assume those are not a problem. Now I'd like to start failing the compilation > if people write new code that generates warnings. > From what I can tell, just adding the flag is easy in both the Gradle and Ant > builds. I still have to prove out that adding -Werrors does what I expect, > i.e. succeeds now and fails when I introduce warnings. > But let's assume that works. Are there objections to this idea generally? I > hope to have some data by next Monday. > FWIW, the Lucene code base had far fewer issues than Solr, but > common-build.xml is in Lucene. 
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136761#comment-17136761 ] Gus Heck commented on SOLR-13749: - 8.6 is now [being scheduled|https://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/browser], so it's probably important to get any last documentation or touch up for this so it can be merged and included in the release. > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Blocker > Fix For: 8.6 > > Attachments: 2020-03 Smiley with ASF hat.jpeg > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. 
> > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. 
If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The XCJF query will not be aware of changes to the remote collection, so > if the remote collection is updated, cached XCJF queries may give inaccurate > results. > After the ttl period has expired, the XCJF query will re-execute the join > against the remote collection.| > |_All others_| |Any normal Solr parameter can also be specified as a local > param.| > > Example Solr Config.xml changes: > > {{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}} > {{ }}{{class}}{{=}}{{"solr.LRUCache"}} > {{ }}{{size}}{{=}}{{"128"}} > {{ }}{{initialSize}}{{=}}{{"0"}} > {{ }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}} > >
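The hash range filtering described for the `routed` parameter can be sketched as follows. This is a simplified illustration, not the XCJF or Hash Range query parser code: the class name is hypothetical, and `String.hashCode()` is only a stand-in for whatever hash function the collection's router actually uses.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Sketch (not the Solr parsers) of keeping only join keys whose hash falls in a shard's range. */
class HashRangeSketch {
    /** Stand-in hash; a real router would use its own hash function, not String.hashCode(). */
    static int hashOf(String joinKey) {
        return joinKey.hashCode();
    }

    /** Inclusive range test, matching the "falls within a specified range" description. */
    static boolean inRange(int hash, int lower, int upper) {
        return lower <= hash && hash <= upper;
    }

    /** Join keys from the remote collection that could match documents in one local shard. */
    static List<String> keysForShard(Collection<String> joinKeys, int lower, int upper) {
        List<String> kept = new ArrayList<>();
        for (String key : joinKeys) {
            if (inRange(hashOf(key), lower, upper)) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

Each local shard would apply its own range, so it fetches only the subset of remote join keys that can match its documents rather than the full set.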
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134939#comment-17134939 ] Gus Heck commented on SOLR-13169: - Corrections from another read through, and documentation for other parameters. Choosing not to document `waitForFinalState` at this time because it's unclear what value it has. This command already waits for the completion of the add command, and causing the add command to wait/block on its own doesn't seem useful (alternately, my understanding of that parameter is flawed and I shouldn't write it into the docs). Opened SOLR-14568 which may change the docs for timeout slightly. This turned into a lot more than originally anticipated, so attaching a patch summarizing changes to the ref guide in case that helps folks look over what I've done. Given no objections I'll port whatever applies to 8.x down to 8.x next weekend (and fix any objections). > Move Replica Docs need improvement (V1 and V2 introspect) > - > > Key: SOLR-13169 > URL: https://issues.apache.org/jira/browse/SOLR-13169 > Project: Solr > Issue Type: Improvement > Components: v2 API >Reporter: Gus Heck >Priority: Major > Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt > > > At a minimum required parameters should be noted equally in both places. > Conversation with [~ab] indicates that there are also some discrepancies in > what is and is not actually required in docs vs code. ("in MoveReplicaCmd if > you specify “replica” then “shard” is completely ignored") > Also in v2 it seems shard might be inferred from the URL and in that case > it's not clear if the URL or the json takes precedence. > From introspect: > {code:java} > "move-replica": { > "type": "object", > "documentation": > "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;, > "description": "This command moves a replica from one > node to a new node. 
In case of shared filesystems the `dataDir` and `ulogDir` > may be reused.", > "properties": { > "replica": { > "type": "string", > "description": "The name of the replica" > }, > "shard": { > "type": "string", > "description": "The name of the shard" > }, > "sourceNode": { > "type": "string", > "description": "The name of the node that > contains the replica." > }, > "targetNode": { > "type": "string", > "description": "The name of the destination node. > This parameter is required." > }, > "waitForFinalState": { > "type": "boolean", > "default": "false", > "description": "Wait for the moved replica to > become active." > }, > "timeout": { > "type": "integer", > "default": 600, > "description": "Timeout to wait for replica to > become active. For very large replicas this may need to be increased." > }, > "inPlaceMove": { > "type": "boolean", > "default": "true", > "description": "For replicas that use shared > filesystems allow 'in-place' move that reuses shared data." > } > {code} > From ref guide for V1: > MOVEREPLICA Parameters > collection > The name of the collection. This parameter is required. > shard > The name of the shard that the replica belongs to. This parameter is required. > replica > The name of the replica. This parameter is required. > sourceNode > The name of the node that contains the replica. This parameter is required. > targetNode > The name of the destination node. This parameter is required. > async > Request ID to track this action which will be processed asynchronously. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
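For readers comparing the two parameter lists above, a V1 request can be assembled as in the sketch below. It is an illustration only: the class name, the example base URL and the sample values are assumptions, and it sends `replica` without `shard` because the thread reports that `shard` is ignored when `replica` is given. Parameter names are the ones listed in the ticket.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of building a V1 Collections API MOVEREPLICA request URL. */
class MoveReplicaRequestSketch {
    /**
     * @param baseUrl        Solr base URL, e.g. http://localhost:8983/solr (example value)
     * @param timeoutSeconds optional timeout; omitted from the request when null
     */
    static String buildUrl(String baseUrl, String collection, String replica,
                           String targetNode, Integer timeoutSeconds) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("action", "MOVEREPLICA");
        params.put("collection", collection);
        params.put("replica", replica);
        params.put("targetNode", targetNode);
        if (timeoutSeconds != null) {
            params.put("timeout", String.valueOf(timeoutSeconds));
        }
        String query = params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
        return baseUrl + "/admin/collections?" + query;
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Node names contain a colon (host:port_context), so encoding the values matters; the sketch leaves sending the request to whichever HTTP client is already in use.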
[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-13169: Attachment: SOLR-13169.patch > Move Replica Docs need improvement (V1 and V2 introspect) > - > > Key: SOLR-13169 > URL: https://issues.apache.org/jira/browse/SOLR-13169 > Project: Solr > Issue Type: Improvement > Components: v2 API >Reporter: Gus Heck >Priority: Major > Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt > > > At a minimum required parameters should be noted equally in both places. > Conversation with [~ab] indicates that there are also some discrepancies in > what is and is not actually required in docs vs code. ("in MoveReplicaCmd if > you specify “replica” then “shard” is completely ignored") > Also in v2 it seems shard might be inferred from the URL and in that case > it's not clear if the URL or the json takes precedence. > From introspect: > {code:java} > "move-replica": { > "type": "object", > "documentation": > "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;, > "description": "This command moves a replica from one > node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` > may be reused.", > "properties": { > "replica": { > "type": "string", > "description": "The name of the replica" > }, > "shard": { > "type": "string", > "description": "The name of the shard" > }, > "sourceNode": { > "type": "string", > "description": "The name of the node that > contains the replica." > }, > "targetNode": { > "type": "string", > "description": "The name of the destination node. > This parameter is required." > }, > "waitForFinalState": { > "type": "boolean", > "default": "false", > "description": "Wait for the moved replica to > become active." > }, > "timeout": { > "type": "integer", > "default": 600, > "description": "Timeout to wait for replica to > become active. For very large replicas this may need to be increased." 
> }, > "inPlaceMove": { > "type": "boolean", > "default": "true", > "description": "For replicas that use shared > filesystems allow 'in-place' move that reuses shared data." > } > {code} > From ref guide for V1: > MOVEREPLICA Parameters > collection > The name of the collection. This parameter is required. > shard > The name of the shard that the replica belongs to. This parameter is required. > replica > The name of the replica. This parameter is required. > sourceNode > The name of the node that contains the replica. This parameter is required. > targetNode > The name of the destination node. This parameter is required. > async > Request ID to track this action which will be processed asynchronously. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14568) org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout
Gus Heck created SOLR-14568: --- Summary: org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout Key: SOLR-14568 URL: https://issues.apache.org/jira/browse/SOLR-14568 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: master (9.0) Reporter: Gus Heck org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica gained a hardcoded timeout in SOLR-11045 but there is no clear reason discussed in that ticket and no comment in the code to indicate why it is ignoring the value of the timeout parameter already passed into that method. This should be clarified in code and documented ([~caomanhdat]?) or the timeout parameter should be supported. It sure seems like we should support the API parameter, but from the pattern of commits this looks potentially intentional and has survived several revisions, so I hesitate to just change it without input/confirmation. If this can be clarified soon, I'll document the result in SOLR-13169; otherwise I'll just document the state as it is, and the docs can be updated if there are changes resulting from this ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
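What "supporting the timeout parameter" means can be shown with a generic wait loop. This is a sketch, not the code in MoveReplicaCmd: the class, method and parameter names are hypothetical, and the only point is that the deadline is derived from the caller's value rather than from a constant inside the method.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Sketch (not MoveReplicaCmd) of a wait loop that honours a caller-supplied timeout. */
class TimeoutSketch {
    /**
     * Polls the condition until it holds or the caller's timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        // The deadline comes from the parameter, not from a hard-coded constant.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```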
[jira] [Reopened] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice
[ https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck reopened SOLR-14417: - > Gradle build sometimes fails RE BlockPoolSlice > -- > > Key: SOLR-14417 > URL: https://issues.apache.org/jira/browse/SOLR-14417 > Project: Solr > Issue Type: Task > Components: Build >Reporter: David Smiley >Priority: Minor > > There seem to be some package visibility hacks around our Hdfs integration: > {{/Users/dsmiley/SearchDev/lucene-solr/solr/core/src/test/org/apache/solr/cloud/hdfs/HdfsTestUtil.java:125: > error: BlockPoolSlice is not public in > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl; cannot be accessed > from outside package}} > {{List> modifiedHadoopClasses = Arrays.asList(BlockPoolSlice.class, > DiskChecker.class,}} > This happens on my Gradle build when running {{gradlew testClasses}} (i.e. to > compile tests) but Ant proceeded without issue. The work-around is to run > {{gradlew clean}} first but really I want our build to be smarter here. > CC [~krisden]
[jira] [Commented] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice
[ https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134845#comment-17134845 ] Gus Heck commented on SOLR-14417: - I just hit this when running a test via IntelliJ (which is using Gradle). My IDE tells me we have our own version of this class that is public, but when I search classes in IntelliJ, it finds both our version and a version of the class in hadoop-hdfs-3.2.0.jar, the latter of which is not public. This appears to be a classpath ordering inconsistency...
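One way to check a suspected classpath ordering problem like this is to ask the class loader for every location that provides the class. The sketch below uses only the standard `ClassLoader.getResources` call; the helper class `ClasspathCopies` and its method names are made up for illustration and are not part of the Solr build.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper: lists every classpath location that supplies a given class. */
final class ClasspathCopies {
    private ClasspathCopies() {}

    /** Converts a fully qualified class name into the resource name a class loader looks up. */
    public static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns all URLs at which the loader can see the class, in the order the loader reports them. */
    public static List<URL> find(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(toResourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ClasspathCopies.find("org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice", Thread.currentThread().getContextClassLoader())` from a test would be expected to return two entries if both the patched copy and the one in the Hadoop jar are visible. If the jar entry is reported first, that would be consistent with the "not public" compile error described in the issue.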
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127368#comment-17127368 ] Gus Heck commented on SOLR-13169: - Hmm, maybe we should document this tidbit: {code:java} if (createNodeList != null) { // Overrides petty considerations about maxShardsPerNode {code} It does indeed seem to be the case that MOVEREPLICA can be used to violate maxShardsPerNode. This happens in the add replica code that move replica invokes, so while this was intentional there, it's not clear whether it's a bug with respect to move replica... I created a collection with 1 per node, and moved a replica to one of the other nodes successfully: !screenshot-1.png! > Move Replica Docs need improvement (V1 and V2 introspect) > - > > Key: SOLR-13169 > URL: https://issues.apache.org/jira/browse/SOLR-13169 > Project: Solr > Issue Type: Improvement > Components: v2 API >Reporter: Gus Heck >Priority: Major > Attachments: screenshot-1.png, testing.txt > > > At a minimum required parameters should be noted equally in both places. > Conversation with [~ab] indicates that there are also some discrepancies in > what is and is not actually required in docs vs code. ("in MoveReplicaCmd if > you specify “replica” then “shard” is completely ignored") > Also in v2 it seems shard might be inferred from the URL and in that case > it's not clear if the URL or the json takes precedence. > From introspect: > {code:java} > "move-replica": { > "type": "object", > "documentation": > "https://lucene.apache.org/solr/guide/collections-api.html#movereplica", > "description": "This command moves a replica from one > node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` > may be reused.", > "properties": { > "replica": { > "type": "string", > "description": "The name of the replica" > }, > "shard": { > "type": "string", > "description": "The name of the shard" > }, > "sourceNode": { > "type": "string", > "description": "The name of the node that > contains the replica." > }, > "targetNode": { > "type": "string", > "description": "The name of the destination node. > This parameter is required." > }, > "waitForFinalState": { > "type": "boolean", > "default": "false", > "description": "Wait for the moved replica to > become active." > }, > "timeout": { > "type": "integer", > "default": 600, > "description": "Timeout to wait for replica to > become active. For very large replicas this may need to be increased." > }, > "inPlaceMove": { > "type": "boolean", > "default": "true", > "description": "For replicas that use shared > filesystems allow 'in-place' move that reuses shared data." > } > {code} > From ref guide for V1: > MOVEREPLICA Parameters > collection > The name of the collection. This parameter is required. > shard > The name of the shard that the replica belongs to. This parameter is required. > replica > The name of the replica. This parameter is required. > sourceNode > The name of the node that contains the replica. This parameter is required. > targetNode > The name of the destination node. This parameter is required. > async > Request ID to track this action which will be processed asynchronously.
[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-13169: Attachment: screenshot-1.png
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127217#comment-17127217 ] Gus Heck commented on SOLR-13169: - Looks like sourceNode is also ignored if replica is supplied, and when no replica is supplied the one to move is chosen by this code: {code:java} Collections.shuffle(sliceReplicas, OverseerCollectionMessageHandler.RANDOM); replica = sliceReplicas.iterator().next(); {code} Neither {{CollectionOperation.MOVEREPLICA_OP}} nor {{ModifyCollectionCommand#moveReplica}} appears to have code consulting auto-scaling, but I'm still trying to sort out whether the eventual sub call to {{AddReplicaCmd#addReplica}} can then be influenced by auto-scaling in some way. If so, I think we'd have a bug with the current design, though in that case it would also seem that the destination node could have been optional (with that usage meaning "find the optimal place for this replica and make it so").
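The shuffle-then-take-first idiom quoted in this comment means the replica that gets moved is effectively arbitrary among the candidates. The sketch below reproduces just that idiom in isolation, assuming replica names are plain strings and passing the `Random` in explicitly; the class `RandomReplicaPick` is hypothetical, and how Solr's own `RANDOM` field is seeded is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of picking one candidate via shuffle, as in the quoted snippet. */
final class RandomReplicaPick {
    private RandomReplicaPick() {}

    /** Returns an arbitrary element of candidates; the choice depends only on the Random's state. */
    public static String pick(List<String> candidates, Random random) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate replicas");
        }
        List<String> copy = new ArrayList<>(candidates); // leave the caller's list order untouched
        Collections.shuffle(copy, random);
        return copy.iterator().next();
    }
}
```

Two calls with identically seeded `Random` instances return the same name, which is useful in tests, but a caller has no way to predict the result from the request parameters alone. That matches the earlier observation that the selection criteria were "not yet clear".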
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127112#comment-17127112 ] Gus Heck commented on SOLR-13169: - Still to do: # document additional parameters # validate the extent to which auto-scaling is involved... with targetNode being required, I am skeptical that auto-scaling is involved
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127069#comment-17127069 ] Gus Heck commented on SOLR-13169: - Test log showing that only one of replica or shard is required and that replica has priority. In cases where replica is not supplied and shard is ambiguous (more than one replica for the shard on the node), the command chooses one, but the criteria for that choice are not yet clear.
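The precedence described in this comment can be written down as a small decision function. This is a sketch of the observed behavior only, not the MoveReplicaCmd source: the class `MoveReplicaParams`, the `Selector` enum, and the error messages are hypothetical, and the requirement that a shard-based request also names a sourceNode is an assumption drawn from the "replicas for the shard on the node" wording.

```java
/** Hypothetical model of how MOVEREPLICA appears to choose between 'replica' and 'shard'. */
final class MoveReplicaParams {
    private MoveReplicaParams() {}

    /** Which parameter ends up identifying the replica to move. */
    public enum Selector { BY_REPLICA, BY_SHARD }

    public static Selector resolve(String replica, String shard, String sourceNode, String targetNode) {
        if (isBlank(targetNode)) {
            // Both the ref guide and the introspect output mark targetNode as required.
            throw new IllegalArgumentException("targetNode is required");
        }
        if (!isBlank(replica)) {
            // Observed: when replica is given, shard (and apparently sourceNode) are not consulted.
            return Selector.BY_REPLICA;
        }
        if (isBlank(shard) || isBlank(sourceNode)) {
            // Assumption: without a replica name, a shard/sourceNode pair is needed to find candidates.
            throw new IllegalArgumentException("either replica, or shard plus sourceNode, must be given");
        }
        return Selector.BY_SHARD;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Writing it this way makes the documentation gap visible: neither `replica` nor `shard` is unconditionally required, so marking both as "required" in the V1 ref guide overstates the contract.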
[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck updated SOLR-13169: Attachment: testing.txt
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127026#comment-17127026 ] Gus Heck commented on SOLR-13169: - And I can move it back again without the shard param like so: {code} http://localhost:8983/solr/admin/collections?action=MOVEREPLICA&collection=test&targetNode=192.168.2.171:8983_solr&sourceNode=192.168.2.171:8982_solr&replica=core_node6 { "responseHeader": { "status": 0, "QTime": 3668 }, "success": "MOVEREPLICA action completed successfully, moved replica=test_shard1_replica_n5 at node=192.168.2.171:8982_solr to replica=test_shard1_replica_n7 at node=192.168.2.171:8983_solr" } {code}
[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)
[ https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127023#comment-17127023 ] Gus Heck commented on SOLR-13169: - Bumping into this again... This (without the replica param) succeeded, so replica is not always required: {code} http://localhost:8983/solr/admin/collections?action=MOVEREPLICA&collection=test&targetNode=192.168.2.171:8982_solr&sourceNode=192.168.2.171:8983_solr&shard=shard1 { "responseHeader": { "status": 0, "QTime": 5060 }, "success": "MOVEREPLICA action completed successfully, moved replica=test_shard1_replica_n1 at node=192.168.2.171:8983_solr to replica=test_shard1_replica_n5 at node=192.168.2.171:8982_solr" } {code}
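For anyone repeating these experiments, a request like the ones in this thread can be assembled programmatically so that each parameter is named and encoded explicitly. The sketch below uses only the JDK (`URLEncoder` with a `Charset` needs Java 10 or later). The class `MoveReplicaUrl` is hypothetical, the parameter names are the ones listed in the ref guide excerpt, and the host and node names are the example values from the responses quoted in the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a V1 Collections API MOVEREPLICA request URL. */
final class MoveReplicaUrl {
    private MoveReplicaUrl() {}

    /** baseUrl is the Solr root, e.g. http://localhost:8983/solr; params are added in iteration order. */
    public static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=MOVEREPLICA");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/admin/collections?" + query;
    }

    private static String encode(String s) {
        // Form-style encoding: ':' in node names becomes %3A; '_' and '.' are left as-is.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Passing a `LinkedHashMap` with `collection`, `targetNode`, `sourceNode` and `shard` yields a shard-based move; swapping `shard` for `replica` yields a replica-based one. The colon in each node name is percent-encoded, which Solr should decode like any other query parameter.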
[jira] [Commented] (SOLR-13458) Make Jetty timeouts configurable system wide
[ https://issues.apache.org/jira/browse/SOLR-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125908#comment-17125908 ] Gus Heck commented on SOLR-13458: - Not at the time. That particular work was abandoned by the customer soon after I wrote this and I didn't dig further, but I'm once again bumping into timeouts (in a cluster with many billions of docs), so I may soon. > Make Jetty timeouts configurable system wide > > > Key: SOLR-13458 > URL: https://issues.apache.org/jira/browse/SOLR-13458 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Affects Versions: master (9.0) >Reporter: Gus Heck >Priority: Major > > Our Jetty container has several timeouts associated with it, and at least one > of these is regularly getting in my way (the idle timeout after 120 sec). I > tried setting a system property, with no effect, and I've tried altering a > jetty.xml found at solr-install/solr/server/etc/jetty.xml on all (50) > machines and rebooting all servers, only to have an exception with the old 120 > sec timeout still show up. This ticket proposes that these values are by > nature "Global System Timeouts" and should be made configurable in solr.xml > (which may be difficult because they will be needed early in the boot > sequence).
[jira] [Comment Edited] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113517#comment-17113517 ] Gus Heck edited comment on SOLR-13749 at 5/21/20, 8:26 PM: --- Let me clarify the above... some of it is forward looking in the event that the NPE I mentioned above gets changed, or some aspect of when we do/don't encode/decode URL's gets changed, etc... or in the event that there are parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too ubiquitous, and it initiates the connection with a path string of arbitrary size... the ZK protocol is only relevant to ZK servers and there is no way (that I know of) to make the initial zk connection send a lot of data. was (Author: gus_heck): Let me clarify the above... some of it is forward looking in the even that the NPE I mentioned above gets changed, or some aspect of when we do/don't encode/decode URL's gets changed, etc... or in the event that there are parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too ubiquitous, and it initiates the connection with a path string of arbitrary size... the ZK protocol is only relevant to ZK servers and there is no way (that I know of) to make the initial zk connection send a lot of data. > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Blocker > Fix For: 8.6 > > Attachments: 2020-03 Smiley with ASF hat.jpeg > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. 
> The second one is the Hash Range query parser: you specify a field name and a
> hash range, and only the documents that would have hashed to that range are
> returned.
> This query parser will do an intersection based on join keys between 2
> collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you
> want to use as a filter.
> Each shard participating in the distributed request will execute a query
> against the remote collection. If the local collection is set up with the
> compositeId router to be routed on the join key field, a hash range query is
> applied to the remote collection query to only match the documents that
> contain a potential match for the documents that are in the local shard/core.
>
>
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|
>
>
> ||Param||Required||Description||
> |collection|Required|The name of the external Solr collection to be queried to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper.
> zkHost and solrUrl are both optional parameters, and at most one of them should be specified.
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional )|
> |from|Required|The join key field name in the external collection ( required )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to retrieve the set of join key values.
> Note: The original query can be passed at the end of the string or as the "v" parameter.
> It's recommended to use query parameter substitution with the "v" parameter to ensure no issues arise with the default query parsers.|
> |routed| |true / false. If true, the XCJF query will use each shard's hash range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but it depends on the local collection being routed by the toField. If this >