[jira] [Commented] (LUCENE-9454) Upgrade hamcrest to version 2.2

2022-07-06 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563422#comment-17563422
 ] 

Gus Heck commented on LUCENE-9454:
--

The second commit listed here appears to be attributed to the wrong issue 
number. I was hoping to understand the motivation/upgrade path for this, but 
it's not discussed here. [~romseygeek]?

> Upgrade hamcrest to version 2.2
> ---
>
> Key: LUCENE-9454
> URL: https://issues.apache.org/jira/browse/LUCENE-9454
> Project: Lucene - Core
>  Issue Type: Task
>Affects Versions: 9.0
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>







[jira] [Resolved] (LUCENE-9575) Add PatternTypingFilter

2021-05-12 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved LUCENE-9575.
--
Fix Version/s: 8.9
   Resolution: Implemented

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
> and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this.
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain; at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}
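> 
> As a minimal sketch of the substitution mechanics (plain java.util.regex, 
> not the filter's internals; the class and variable names here are 
> illustrative only):
> {code:java}
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> 
> public class PatternTypingDemo {
>   public static void main(String[] args) {
>     // First rule from the file above: pattern plus a replacement template.
>     Pattern p = Pattern.compile("(\\d+)\\(?([a-z])\\)?");
>     String template = "legal2_$1_$2";
>     for (String token : new String[] {"401k", "401(k)", "503(c)"}) {
>       Matcher m = p.matcher(token);
>       if (m.matches()) {
>         // replaceAll substitutes $1/$2 with the captured groups
>         System.out.println(token + " --> " + m.replaceAll(template));
>       }
>     }
>   }
> }
> {code}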






[jira] [Resolved] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2021-04-29 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved LUCENE-9572.
--
Fix Version/s: 8.9
   Resolution: Implemented

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to synonyms. In some cases the 
> original token may have already had flags set on it, and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability, allowing the user to supply a bitmask that 
> selects which flags are retained.
> Additionally, there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma-separated list 
> of types to ignore (the most common case will be to ignore a common default 
> type of 'word', I suspect).
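> 
> A minimal sketch of the masking idea; it assumes nothing about the filter's 
> actual internals, and the names and bit values are purely illustrative:
> {code:java}
> // Illustrative only: how a user-supplied bitmask selects which flags on the
> // original token carry over to the generated synonym.
> int tokenFlags   = 0b0110; // flags already set on the original token
> int synFlagsMask = 0b0010; // mask of flags the user wants propagated
> int synonymFlags = tokenFlags & synFlagsMask; // synonym ends up with 0b0010
> {code}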






[jira] [Resolved] (LUCENE-9943) DOC: Fix spelling (camelCase it like GitHub)

2021-04-28 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved LUCENE-9943.
--
Fix Version/s: 9.0
   Resolution: Fixed

Thanks :)

> DOC: Fix spelling (camelCase it like GitHub)
> -
>
> Key: LUCENE-9943
> URL: https://issues.apache.org/jira/browse/LUCENE-9943
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 8.8.1
>Reporter: AYUSHMAN SINGH CHAUHAN
>Priority: Minor
>  Labels: documentation
> Fix For: 9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> docs update => spelling: github






[jira] [Updated] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2021-04-27 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated LUCENE-9574:
-
Fix Version/s: 8.9

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.
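> 
> A rough sketch of that behavior as a FilteringTokenFilter subclass (the 
> class name here is made up; this is not the committed code):
> {code:java}
> import java.io.IOException;
> import org.apache.lucene.analysis.FilteringTokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
> 
> // Drops a token only when ALL bits of dropFlags are set on it.
> final class DropAllFlaggedFilter extends FilteringTokenFilter {
>   private final int dropFlags;
>   private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);
> 
>   DropAllFlaggedFilter(TokenStream in, int dropFlags) {
>     super(in);
>     this.dropFlags = dropFlags;
>   }
> 
>   @Override
>   protected boolean accept() throws IOException {
>     // keep the token unless every requested flag bit is present
>     return (flagsAtt.getFlags() & dropFlags) != dropFlags;
>   }
> }
> {code}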






[jira] [Resolved] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2021-04-27 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved LUCENE-9574.
--
Resolution: Implemented

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.






[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2021-02-25 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291072#comment-17291072
 ] 

Gus Heck commented on SOLR-14787:
-

[~jbernste] This operates at a token level, not a document level. Fields and 
joins would filter at a document level. In the simple equals case the payload 
might be a "noun" or "verb" string, and you could search for documents where 
the word "set" was used as a "NOUN". One could also perhaps score tokens for 
"offensiveness" (or something else), encode that as a payload, and match (or 
avoid matches) only if the tokens were more offensive than X... or vice versa 
(that analysis could be context-sensitive NLP-based stuff). These sorts of 
things likely slow down indexing and inflate the index, but they enable 
detailed token-by-token functionality not otherwise available.
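
For instance, assuming a field whose tokens carry string payloads (e.g. via 
DelimitedPayloadTokenFilter with the identity encoder; the field name here is 
made up), the simple equals case might look like:
{code}
{!payload_check f=text_payloads payloads='NOUN'}set
{code}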

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads. This patch extends the 
> PayloadCheckQueryParser to add a new local param, "op".
> The value of "op" can be one of the following:
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> The default value for "op", if not specified, preserves the current behavior 
> of equals.
> In addition to the operation, you can specify a "threshold" local parameter.
> This provides the ability to search for the term "cat" so long as the 
> payload has a value greater than 0.75.
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  






[jira] [Commented] (SOLR-13696) DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest failures due to commitWithin/openSearcher delays

2021-02-21 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288112#comment-17288112
 ] 

Gus Heck commented on SOLR-13696:
-

Finally coming back to this. In retrospect I think this test was overzealous. 
"Commit within" is a feature that is really orthogonal to routed aliases, and 
there's no good reason to believe that it would succeed or fail differently 
than a regular commit. Removing this aspect of the test simplifies the code, 
makes the test faster, and probably costs us little or nothing in terms of 
safety. 

> DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest 
> failures due to commitWithin/openSearcher delays
> 
>
> Key: SOLR-13696
> URL: https://issues.apache.org/jira/browse/SOLR-13696
> Project: Solr
>  Issue Type: Test
>Reporter: Chris M. Hostetter
>Assignee: Gus Heck
>Priority: Major
> Attachments: thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Recent jenkins failure...
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-MacOSX/272/
> Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC
> {noformat}
> Stack Trace:
> java.lang.AssertionError: expected:<16> but was:<15>
> at 
> __randomizedtesting.SeedInfo.seed([DB6DC28D5560B1D2:E295833E1541FDB9]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:677
> )
> at 
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:282)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Digging into the logs, the problem appears to be in the way the test 
> verifies/assumes docs have been committed.






[jira] [Resolved] (SOLR-14787) Inequality support in Payload Check query parser

2021-02-21 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved SOLR-14787.
-
Fix Version/s: master (9.0)
   Resolution: Implemented

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads. This patch extends the 
> PayloadCheckQueryParser to add a new local param, "op".
> The value of "op" can be one of the following:
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> The default value for "op", if not specified, preserves the current behavior 
> of equals.
> In addition to the operation, you can specify a "threshold" local parameter.
> This provides the ability to search for the term "cat" so long as the 
> payload has a value greater than 0.75.
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  






[jira] [Resolved] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2021-02-17 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved SOLR-14704.
-
Fix Version/s: 8.9
   Resolution: Fixed

> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.






[jira] [Created] (SOLR-15160) Update cloud-dev/cloud.sh to work with gradle

2021-02-17 Thread Gus Heck (Jira)
Gus Heck created SOLR-15160:
---

 Summary: Update cloud-dev/cloud.sh to work with gradle
 Key: SOLR-15160
 URL: https://issues.apache.org/jira/browse/SOLR-15160
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: scripts and tools
Reporter: Gus Heck


Now that the gradle build is a bit more mature, we can update this tool to 
smooth the creation of testing clusters on the local machine for master.






[jira] [Commented] (SOLR-15125) Link to docs is broken

2021-02-01 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276537#comment-17276537
 ] 

Gus Heck commented on SOLR-15125:
-

There has been some difficulty with deploying the docs for the recent release; 
several of the latest versions are presently not available on the web. This is 
being worked on urgently by several folks. 

> Link to docs is broken
> ---
>
> Key: SOLR-15125
> URL: https://issues.apache.org/jira/browse/SOLR-15125
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: website
>Reporter: Thomas Güttler
>Priority: Minor
>
> [On this page: 
> https://lucene.apache.org/solr/guide/|https://lucene.apache.org/solr/guide/]
> the link to [https://lucene.apache.org/solr/guide/8_8/]
> is broken.






[jira] [Commented] (SOLR-7642) Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist?

2021-01-29 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275457#comment-17275457
 ] 

Gus Heck commented on SOLR-7642:


ch does indeed mean change, but it's a reference to the Unix chroot operation 
(https://en.wikipedia.org/wiki/Chroot). I think it should be createZkChRoot for 
consistency both with other documentation and with similar concepts at the OS 
level. 

For those familiar with chroot elsewhere, it's read as "create zk chroot", 
meaning isolating the zk stuff to its own sub-tree and preventing upward 
access. One could argue for not capitalizing the R, but I think we do 
capitalize elsewhere, so best to be consistent.

> Should launching Solr in cloud mode using a ZooKeeper chroot create the 
> chroot znode if it doesn't exist?
> -
>
> Key: SOLR-7642
> URL: https://issues.apache.org/jira/browse/SOLR-7642
> Project: Solr
>  Issue Type: Improvement
>Reporter: Timothy Potter
>Priority: Minor
> Attachments: SOLR-7642.patch, SOLR-7642.patch, SOLR-7642.patch, 
> SOLR-7642.patch, SOLR-7642_tag_7.5.0.patch, 
> SOLR-7642_tag_7.5.0_proposition.patch
>
>
> If you launch Solr for the first time in cloud mode using a ZooKeeper 
> connection string that includes a chroot leads to the following 
> initialization error:
> {code}
> ERROR - 2015-06-05 17:15:50.410; [   ] org.apache.solr.common.SolrException; 
> null:org.apache.solr.common.cloud.ZooKeeperException: A chroot was specified 
> in ZkHost but the znode doesn't exist. localhost:2181/lan
> at 
> org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:113)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:339)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:140)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:138)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:852)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505)
> {code}
> The work-around for this is to use the scripts/cloud-scripts/zkcli.sh script 
> to create the chroot znode (bootstrap action does this).
> I'm wondering if we shouldn't just create the znode if it doesn't exist? Or 
> is that some violation of using a chroot?






[jira] [Created] (LUCENE-9696) RegExp with group references

2021-01-25 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9696:


 Summary: RegExp with group references
 Key: LUCENE-9696
 URL: https://issues.apache.org/jira/browse/LUCENE-9696
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Gus Heck


PatternTypingFilter presently relies on java.util regexes, but LUCENE-7465 
found performance benefits using our own RegExp class instead. Unfortunately 
RegExp does not currently report matching subgroups, which is key to 
PatternTypingFilter's use (and probably useful in other endeavors as well). 
What's needed is reporting of sub-groups such that 

new RegExp("foo(.+)") --> converted to a run automaton etc. --> match found 
for "foobar" --> somehow reports getGroup(1) as "bar"

and getGroup() can be called on some object reasonably accessible to the code 
using RegExp in the first place.
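
For comparison, a tiny self-contained example of what java.util.regex already 
provides:
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
  public static void main(String[] args) {
    // java.util.regex reports subgroups after a match; the wish is for
    // RegExp's automaton machinery to offer something equivalent.
    Matcher m = Pattern.compile("foo(.+)").matcher("foobar");
    if (m.matches()) {
      System.out.println(m.group(1)); // prints "bar"
    }
  }
}
{code}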

Clearly there's a lot to be worked out there, since the normal usage pattern 
converts things to a DFA / run automaton etc., and subgroups are not a natural 
concept for those classes. But if this could be achieved without losing the 
performance benefits, that would be interesting :).

Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I 
won't be able to work on it any time soon, so I encourage anyone else 
interested to pick it up or to drop links or ideas in here. 






[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2021-01-25 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271184#comment-17271184
 ] 

Gus Heck commented on LUCENE-9575:
--

Ah thanks, though I was waiting on tests in GitHub for 
[https://github.com/apache/lucene-solr/pull/2240]

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
> and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this.
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain; at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2021-01-24 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270947#comment-17270947
 ] 

Gus Heck commented on LUCENE-9575:
--

Thanks for fixing that. Yeah, a separate ticket for groups in RegExp would be 
cool, though when I'd find time for it is a question. I had googled around and 
I recall looking at some paper as well; I wonder if it's the same :). However, 
I can't say the customer at the time really needed that, so I had to set it 
aside. I'm interested in backporting this, and all of the related AQP stuff, 
but want to make sure a full set gets into master before I spend time on that. 
This also gets complicated by a strong desire by many to get 9x out the door 
and issues with Lucene 9.0 needing to support 8.9. Based on that, perhaps I 
should revise my plan and get the Lucene bits backported ASAP.

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
> and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this.
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain; at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler

2021-01-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270740#comment-17270740
 ] 

Gus Heck commented on SOLR-14608:
-

Having just gone through some cost minimization: the particular case may be 
undersized, and it wasn't a clean test, so I'm not looking to troubleshoot that 
in a Jira ticket :). Just trying to understand the shape of the change in this 
ticket.

Would it be possible to quantify the memory cost here? I often find that one of 
the things making Solr implementations difficult for several customers I've 
seen is the cost of fielding machines with enough memory. I have a client that 
has implemented very complex arrangements with spot machines to keep costs 
under control, for example. 

If there's a way to trade memory vs speed, that's a great feature to have, but 
if the memory difference is large maybe it needs to be something the user can 
select? You mention options to tune this implementation, but I'm not seeing any 
documentation updates... Particularly important would be documentation of 
settings that offer similar memory usage to the previous implementation (even 
if they are not the default). 

> Faster sorting for the /export handler
> --
>
> Key: SOLR-14608
> URL: https://issues.apache.org/jira/browse/SOLR-14608
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: master (9.0)
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: master (9.0)
>
>
> The largest cost of the export handler is the sorting. This ticket will 
> implement an improved algorithm for sorting that should greatly increase 
> overall throughput for the export handler.
> *The current algorithm is as follows:*
> Collect a bitset of matching docs. Iterate over that bitset, materialize the 
> top level ordinals for the sort fields in the document, and add them to a 
> priority queue of size 30,000. Then export the top 30,000 docs, turn off 
> their bits in the bit set, and iterate again until all docs are sorted and 
> sent. 
> There are two performance bottlenecks with this approach:
> 1) Materializing the top level ordinals adds a huge amount of overhead to the 
> sorting process.
> 2) The size of priority queue, 30,000, adds significant overhead to sorting 
> operations.
> *The new algorithm:*
> Has a top level *merge sort iterator* that wraps segment level iterators that 
> perform segment level priority queue sorts.
> *Segment level:*
> The segment level docset will be iterated and the segment level ordinals for 
> the sort fields will be materialized and added to a segment level priority 
> queue. As the segment level iterator pops docs from the priority queue the 
> top level ordinals for the sort fields are materialized. Because the top 
> level ordinals are materialized AFTER the sort, they only need to be looked 
> up when the segment level ordinal changes. This takes advantage of the sort 
> to limit the lookups into the top level ordinal structures. This also 
> eliminates redundant lookups of top level ordinals that occur during the 
> multiple passes over the matching docset.
> The segment level priority queues can be kept smaller than 30,000 to improve 
> performance of the sorting operations because the overall batch size will 
> still be 30,000 or greater when all the segment priority queue sizes are 
> added up. This allows for batch sizes much larger than 30,000 without using a 
> single large priority queue. The increased batch size means fewer iterations 
> over the matching docset and the decreased priority queue size means faster 
> sorting operations.
> *Top level:*
> A top level iterator does a merge sort over the segment level iterators by 
> comparing the top level ordinals materialized when the segment level docs are 
> popped from the segment level priority queues. This requires no extra memory 
> and will be very performant.
>  
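> As a toy sketch of the two-level idea (plain Java collections; each sorted 
> list stands in for the output of one segment-level priority queue, and ints 
> stand in for ordinals):
> {code:java}
> import java.util.Comparator;
> import java.util.Iterator;
> import java.util.List;
> import java.util.PriorityQueue;
> 
> public class ExportMergeDemo {
>   // One cursor per segment: current head value plus the rest of that batch.
>   record Cursor(int value, Iterator<Integer> rest) {}
> 
>   public static void main(String[] args) {
>     List<List<Integer>> segments =
>         List.of(List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7));
> 
>     // Top-level merge: ordered by each segment's current head value.
>     PriorityQueue<Cursor> pq =
>         new PriorityQueue<>(Comparator.comparingInt(Cursor::value));
>     for (List<Integer> seg : segments) {
>       Iterator<Integer> it = seg.iterator();
>       if (it.hasNext()) pq.add(new Cursor(it.next(), it));
>     }
> 
>     // Pop the smallest head, then refill from the same segment.
>     while (!pq.isEmpty()) {
>       Cursor c = pq.poll();
>       System.out.print(c.value() + " "); // emits 1 2 3 4 5 6 7 8 9
>       if (c.rest().hasNext()) pq.add(new Cursor(c.rest().next(), c.rest()));
>     }
>   }
> }
> {code}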






[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler

2021-01-22 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270300#comment-17270300
 ] 

Gus Heck commented on SOLR-14608:
-

Came back to re-read this to fuel a better understanding of sort memory 
requirements after an OOM on a relatively simple query that should yield ~38k 
docs out of an 11 billion doc corpus (but other stuff, including data 
ingestion, was going on, so it's not a clean case; just a bit of a surprise, 
since I assumed that the sort memory would relate to the 38k docs, which 
seemed like it ought to be trivial: only a few fields were requested, all 
numeric or short strings, probably ~0.25 KB/doc, so maybe 8 MB?). 

Did you ever investigate my prior question regarding queue size? I'm also 
wondering whether your algorithm is dependent on having a lot of segments; 
what if there's been a force-merge?

Above, in your description of the current algorithm, you say "turn off the 
bits in the bit set". I'm assuming this means just the bits for the docs that 
were "sent"? And when you say "sent", do you mean sent to the coordinating 
node? 



> Faster sorting for the /export handler
> --
>
> Key: SOLR-14608
> URL: https://issues.apache.org/jira/browse/SOLR-14608
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: master (9.0)
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: master (9.0)
>
>
> The largest cost of the export handler is the sorting. This ticket will 
> implement an improved algorithm for sorting that should greatly increase 
> overall throughput for the export handler.
> *The current algorithm is as follows:*
> Collect a bitset of matching docs. Iterate over that bitset, materialize the 
> top level ordinals for the sort fields in the document, and add them to a 
> priority queue of size 30,000. Then export the top 30,000 docs, turn off 
> their bits in the bit set, and iterate again until all docs are sorted and 
> sent. 
> There are two performance bottlenecks with this approach:
> 1) Materializing the top level ordinals adds a huge amount of overhead to the 
> sorting process.
> 2) The size of priority queue, 30,000, adds significant overhead to sorting 
> operations.
> *The new algorithm:*
> Has a top level *merge sort iterator* that wraps segment level iterators that 
> perform segment level priority queue sorts.
> *Segment level:*
> The segment level docset will be iterated and the segment level ordinals for 
> the sort fields will be materialized and added to a segment level priority 
> queue. As the segment level iterator pops docs from the priority queue the 
> top level ordinals for the sort fields are materialized. Because the top 
> level ordinals are materialized AFTER the sort, they only need to be looked 
> up when the segment level ordinal changes. This takes advantage of the sort 
> to limit the lookups into the top level ordinal structures. This also 
> eliminates redundant lookups of top level ordinals that occur during the 
> multiple passes over the matching docset.
> The segment level priority queues can be kept smaller than 30,000 to improve 
> performance of the sorting operations because the overall batch size will 
> still be 30,000 or greater when all the segment priority queue sizes are 
> added up. This allows for batch sizes much larger than 30,000 without using a 
> single large priority queue. The increased batch size means fewer iterations 
> over the matching docset and the decreased priority queue size means faster 
> sorting operations.
> *Top level:*
> A top level iterator does a merge sort over the segment level iterators by 
> comparing the top level ordinals materialized when the segment level docs are 
> popped from the segment level priority queues. This requires no extra memory 
> and will be very performant.
>  






[jira] [Assigned] (LUCENE-9659) Support inequality operations in payload check queries

2021-01-09 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reassigned LUCENE-9659:


Assignee: Gus Heck

> Support inequality operations in payload check queries
> --
>
> Key: LUCENE-9659
> URL: https://issues.apache.org/jira/browse/LUCENE-9659
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a ticket broken out from 
> https://issues.apache.org/jira/browse/SOLR-14787
> The patch will extend the SpanPayloadCheck query to support inequality checks 
> to see if the term and payload should match.  Currently, this query operator 
> only supports equals as the payload check.  This ticket introduces 
> gt,gte,lt,lte and eq operations to support testing if a payload is greater 
> than/less than a specified reference payload value.  One such use case is to 
> have a label on a document with a confidence level stored as a payload.  This 
> patch will support searching for the term where a confidence level is above a 
> given threshold.
>  






[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237501#comment-17237501
 ] 

Gus Heck commented on SOLR-15014:
-

Actually, I got brave and let it run longer, and it seems to stop after 30 
replicas have been created, leaving me with 31 replicas of shard 1 (and still 1 
of shard 2).

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):]
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)






[jira] [Commented] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237495#comment-17237495
 ] 

Gus Heck commented on SOLR-15014:
-

Discussion on Slack suggests that, given that this functionality is going 
away, the primary thing here will be to remove the example from the ref guide 
(or, if folks have an idea how to mitigate it with additional configuration, 
add that to the ref guide instead, but I haven't found such yet).

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):]
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)






[jira] [Updated] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-15014:

Attachment: Screen Shot 2020-11-23 at 11.40.29 AM.png

> Runaway replica creation with autoscaling example from ref guide
> 
>
> Key: SOLR-15014
> URL: https://issues.apache.org/jira/browse/SOLR-15014
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Gus Heck
>Priority: Major
> Attachments: Screen Shot 2020-11-23 at 11.40.29 AM.png, 
> image-2020-11-23-11-37-15-124.png
>
>
> Although the present autoscaling implementation is deprecated, I have a 
> client intent on using it, and in trying to create rules that ensure all 
> replicas on all nodes, I wound up getting into a state where one replica was 
> (apparently) infinitely creating new copies of itself. The boiled down steps 
> to reproduce:
> Create a 4 node cluster locally for testing from a checkout of the tagged 
> version for 8.6.3
> (Using solr/cloud-dev/cloud.sh)
> {code:java}
> ./cloud.sh  new -r   
> {code}
> Create a collection
> {code:java}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
> {code}
> Add this trigger from the ref guide 
> ([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):]
> {code:java}
> {
>   "set-trigger": {
> "name": "node_added_trigger",
> "event": "nodeAdded",
> "waitFor": "5s",
> "preferredOperation": "ADDREPLICA",
> "replicaType": "PULL"
>   }
> }
> {code}
> Reboot the cluster, and when it comes up infinite replica creation ensues 
> (attaching screen shot of admin UI showing replicated shard momentarily)






[jira] [Created] (SOLR-15014) Runaway replica creation with autoscaling example from ref guide

2020-11-23 Thread Gus Heck (Jira)
Gus Heck created SOLR-15014:
---

 Summary: Runaway replica creation with autoscaling example from 
ref guide
 Key: SOLR-15014
 URL: https://issues.apache.org/jira/browse/SOLR-15014
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Affects Versions: 8.6.3
Reporter: Gus Heck
 Attachments: image-2020-11-23-11-37-15-124.png

Although the present autoscaling implementation is deprecated, I have a client 
intent on using it, and in trying to create rules that ensure all replicas on 
all nodes, I wound up getting into a state where one replica was (apparently) 
infinitely creating new copies of itself. The boiled down steps to reproduce:

Create a 4 node cluster locally for testing from a checkout of the tagged 
version for 8.6.3

(Using solr/cloud-dev/cloud.sh)
{code:java}
./cloud.sh  new -r   
{code}
Create a collection
{code:java}
http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1
{code}
Add this trigger from the ref guide 
([https://lucene.apache.org/solr/guide/8_6/solrcloud-autoscaling-triggers.html#node-added-trigger):]
{code:java}
{
  "set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA",
"replicaType": "PULL"
  }
}
{code}
Reboot the cluster, and when it comes up infinite replica creation ensues 
(attaching screen shot of admin UI showing replicated shard momentarily)






[jira] [Commented] (SOLR-14986) Restrict the properties possible to define with "property.name=value" when creating a collection

2020-11-10 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229266#comment-17229266
 ] 

Gus Heck commented on SOLR-14986:
-

Yeah, it seems to me that any property specified in the create command that 
would conflict with the actual properties of the create command should just 
fail with a message about overlapping properties.
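
A sketch of that validation, under assumed names (this is not Solr's actual 
code), using the auto-created property list from the description below:
{code:java}
import java.util.Map;
import java.util.Set;

public class CreateParamCheck {
  // Auto-managed properties from the issue description; a user-supplied
  // property.X naming any of these gets rejected up front.
  private static final Set<String> RESERVED = Set.of(
      "coreNodeName", "collection.configName", "name",
      "numShards", "shard", "collection", "replicaType");

  static void validate(Map<String, String> params) {
    for (String key : params.keySet()) {
      if (key.startsWith("property.")
          && RESERVED.contains(key.substring("property.".length()))) {
        throw new IllegalArgumentException(
            key + " should not be specified when creating a collection");
      }
    }
  }
}
{code}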

> Restrict the properties possible to define with "property.name=value" when 
> creating a collection
> 
>
> Key: SOLR-14986
> URL: https://issues.apache.org/jira/browse/SOLR-14986
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> This came to light when I was looking at two user-list questions where people 
> try to manually define core.properties to define _replicas_ in SolrCloud. 
> There are two related issues:
> 1> You can do things like "action=CREATE=eoe=blivet" 
> which results in an opaque error about "could not create replica." I 
> propose we return a better error here like "property.collection should not be 
> specified when creating a collection". What do people think about the rest of 
> the auto-created properties on collection creation? 
> coreNodeName
> collection.configName
> name
> numShards
> shard
> collection
> replicaType
> "name" seems to be OK to change, although i don't see anyplace anyone can 
> actually see it afterwards
> 2> Change the ref guide to steer people away from attempting to manually 
> create a core.properties file to define cores/replicas in SolrCloud. There's 
> no warning on the "defining-core-properties.adoc" for instance. Additionally 
> there should be some kind of message on the collections API documentation 
> about not trying to set the properties in <1> on the CREATE command.
> <2> used to actually work (apparently) with legacyCloud...






[jira] [Comment Edited] (LUCENE-9575) Add PatternTypingFilter

2020-10-16 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215434#comment-17215434
 ] 

Gus Heck edited comment on LUCENE-9575 at 10/16/20, 3:15 PM:
-

Yeah, I looked at our FST based regex class, but as you say, no group tracking, 
which was critical. I had somewhat hoped that the performance of a non-FST list 
of regexes would force me to learn all the nitty gritty of FSTs and do 
something really nifty to add group support, but the ingest for the customer 
(involving ~25 regexps) didn't seem to be limited by the analysis, so there was 
no justifying that work... optimize later. 

Also, no, not across multiple tokens; again, more than the customer needed, but 
a valid enhancement. 


was (Author: gus_heck):
Yeah, I looked at our FST based regex class, but as you say, no group tracking, 
which was critical. I had somewhat hoped that the performance of a non-FST list 
of regexes would force me to learn all the nitty gritty of FSTs and do 
something really nifty to add group support, but the ingest for the customer 
(involving ~25 regexps) didn't seem to be limited by the analysis, so there was 
no justifying that work... optimize later. 

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
> and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this.
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain; at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2020-10-16 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215434#comment-17215434
 ] 

Gus Heck commented on LUCENE-9575:
--

Yeah, I looked at our FST based regex class, but as you say, no group tracking, 
which was critical. I had somewhat hoped that the performance of a non-FST list 
of regexes would force me to learn all the nitty gritty of FSTs and do 
something really nifty to add group support, but the ingest for the customer 
(involving ~25 regexps) didn't seem to be limited by the analysis, so there was 
no justifying that work... optimize later. 

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572 and LUCENE-9574 
> and one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this.
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain; at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-14 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214201#comment-17214201
 ] 

Gus Heck commented on LUCENE-9574:
--

Actually, I had expected when I started this that the 8.7 branch might have 
been cut already by the time I committed, and certainly the rest of the AQP 
changes won't make 8.7. Do we want to include it in 8.7 even so?

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.






[jira] [Commented] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete

2020-10-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213282#comment-17213282
 ] 

Gus Heck commented on SOLR-14861:
-

Sorry, I didn't mean to sound accusatory. I guess I'm not understanding this: how 
is reload "complete" if it's time-sliced out? That sounds like it's not 
"complete" to me. 

Looking at the test specifically, it appears to create 5 threads, start them, and 
then Thread.join() all of them. If reload is time-sliced out, the thread in 
question shouldn't be finished (unless the reload call is happening async, which 
would be the problem I'm talking about), and the join should continue to block, 
preventing the test harness from shutting down (because the test method isn't 
finished). Alternately, maybe I'm confused about who is calling shutdown? 

Looking into the sub-methods, I find another example of what I'm talking about, 
even though it shouldn't actually cause failure here unless perhaps this 
heuristic can pass before reload completes...
{code:java}
RestTestHarness publisher = randomRestTestHarness(r);
String response = publisher.post("/schema", SolrTestCaseJ4.json(payload));
{code}
should be blocking until the core is reloaded and changes are safe for use by 
the caller (IMHO). The subsequent loop should not be needed:
{code:java}
try {
  long startTime = System.nanoTime();
  long maxTimeoutMillis = 10;
  while (TimeUnit.MILLISECONDS.convert(System.nanoTime() - startTime, TimeUnit.NANOSECONDS) < maxTimeoutMillis) {
    errmessages.clear();
    Map m = getObj(harness, aField, "fields");
    if (m != null) errmessages.add(StrUtils.formatString("field {0} still exists", aField));
    m = getObj(harness, dynamicFldName, "dynamicFields");
    if (m != null) errmessages.add(StrUtils.formatString("dynamic field {0} still exists", dynamicFldName));
    List l = getSourceCopyFields(harness, aField);
    if (checkCopyField(l, aField, dynamicCopyFldDest))
      errmessages.add(StrUtils.formatString("CopyField source={0},dest={1} still exists", aField, dynamicCopyFldDest));
    m = getObj(harness, newFieldTypeName, "fieldTypes");
    if (m != null) errmessages.add(StrUtils.formatString("new type {0} still exists", newFieldTypeName));

    if (errmessages.isEmpty()) break;

    Thread.sleep(10);
  }
{code}
As for code after shutdown, it looks like people may have read isShutDown two 
different ways. Maybe we need two flags with clearer names... 
isShutDownComplete and isShutDownInProgress?
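
For illustration, a minimal sketch of that two-flag idea combined with the 
AtomicInteger experiment described in this issue (hypothetical code only, not 
the actual CoreContainer internals):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not the real CoreContainer: the crude AtomicInteger
 * guard described in the issue, plus the two clearer flags suggested above.
 * There is still a window between the isShutDownInProgress check and the
 * increment, so this illustrates the idea rather than closing the race.
 */
class ReloadGuard {
  private final AtomicInteger inFlightReloads = new AtomicInteger();
  private volatile boolean isShutDownInProgress = false;
  private volatile boolean isShutDownComplete = false;

  void beforeReload() { // call at the top of reload()
    if (isShutDownInProgress) throw new IllegalStateException("Already closed");
    inFlightReloads.incrementAndGet();
  }

  void afterReload() { // call from a finally block at the end of reload()
    inFlightReloads.decrementAndGet();
  }

  void shutdown() throws InterruptedException {
    isShutDownInProgress = true; // reject new reloads from here on
    while (inFlightReloads.get() > 0) {
      Thread.sleep(10); // spin until in-flight reloads drain
    }
    isShutDownComplete = true;
  }
}
{code}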

> CoreContainer shutdown needs to be aware of other ongoing operations and wait 
> until they're complete
> 
>
> Key: SOLR-14861
> URL: https://issues.apache.org/jira/browse/SOLR-14861
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-14861.patch
>
>
> Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent 
> failures and found what looks like a glaring gap in how 
> CoreContainer.shutdown operates. I don't know the impact on production since 
> we're shutting down anyway, but I think this is responsible for the errors in 
> TestBulkSchemaConcurrent and likely behind others, especially any other test 
> that fails intermittently that involves core reloads, including and 
> especially any tests that exercise managed schema.
> We have clear evidence of this sequence:
> 1> some CoreContainer.reloads come in and get _partway_ through, in 
> particular past the test at the top where CoreContainer.reload() throws an 
> AlreadyClosed exception if (isShutdown).
> 2> Some CoreContainer.shutdown() threads get some processing time before the 
> reloads in <1> are finished.
> 3> the threads in <1> pick back up and go wonky. I suspect that there are a 
> number of different things that could be going wrong here depending on how 
> far through CoreContainer.shutdown() gets that pop out in different ways.
> Since it's my shift (Noble has to sleep sometime), I put some crude locking 
> in just to test the idea; incrementing an AtomicInteger on entry to 
> CoreContainer.reload then decrementing it at the end, and spinning in 
> CoreContainer.shutdown() until the AtomicInteger was back to zero. With that 
> in place, 100 runs and no errors whereas before I could never get even 10 
> runs to finish without an error. This is not a proper fix at all, and the way 
> it's currently running there are still possible race conditions, just much 
> smaller windows. And I suspect it risks spinning forever. But it's enough to 
> make me believe I finally understand what's happening.
> I also suspect that reload is more sensitive than most operations on a core 
> due 

[jira] [Comment Edited] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete

2020-10-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213118#comment-17213118
 ] 

Gus Heck edited comment on SOLR-14861 at 10/13/20, 1:33 PM:


Why aren't we viewing the problem as: reload (etc.) returns before it has 
ACTUALLY completed? How can the test be proceeding to shutdown if we aren't 
lying to it about the completion of reload? (Assuming the test isn't making 
its own threads and failing to track when they complete.)

A call to any admin-level operation (from Java, SolrJ, or the admin API) really 
should not complete until the command is complete, and the definition of 
complete should be that the target resource is 100% ready to use (see also: 
create collection).

Tests should never need any waiting strategies unless they themselves have 
started their own threads. (Credit: above, I'm parroting a rehashed form of 
something Mark Miller said ages ago, at least as I recall it.)

If we *need* to track what's in-flight on shutdown, we've failed in the event 
of a power loss, so we shouldn't be doing that (where "need" is defined as 
"otherwise persisted state will be corrupted"; anything else is "want").

If we want a graceful "drain existing requests" process we should build that 
explicitly by tracking all requests at a high level (we do this with 
SolrRequestInfo partly already, plus we need to account for async)... Of course 
that only works if we don't lie about request completion in the first place. 
Once we can perform a "start rejecting and drain" (that doesn't lie about when 
it completes) we can paste request draining on the front of shutdown and reload 
fairly trivially as an option.


was (Author: gus_heck):
Why aren't we viewing the problem as: reload (etc.) returns before it has 
ACTUALLY completed? How can the test be proceeding to shutdown if we aren't 
lying to it about the completion of reload? (Assuming the test isn't making 
its own threads and failing to track when they complete.)

A call to any admin-level operation (from Java, SolrJ, or the admin API) really 
should not complete until the command is complete, and the definition of 
complete should be that the target resource is 100% ready to use (see also: 
create collection).

Tests should never need any waiting strategies unless they themselves have 
started their own threads. (Credit: above, I'm parroting a rehashed form of 
something Mark Miller said ages ago, at least as I recall it.)

If we *need* to track what's in-flight on shutdown, we've failed in the event 
of a power loss, so we shouldn't be doing that (where "need" is defined as 
"otherwise persisted state will be corrupted"; anything else is "want").

If we want a graceful "drain existing requests" process we should build that 
explicitly by tracking all requests at a high level (we do this with 
SolrRequestInfo partly already, plus we need to account for async)... Of course 
that only works if we don't lie about request completion in the first place. 
Once we can perform a "start rejecting and drain" (that doesn't lie about when 
it completes) we can paste request draining on the front of shutdown and reload 
fairly trivially.

> CoreContainer shutdown needs to be aware of other ongoing operations and wait 
> until they're complete
> 
>
> Key: SOLR-14861
> URL: https://issues.apache.org/jira/browse/SOLR-14861
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-14861.patch
>
>
> Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent 
> failures and found what looks like a glaring gap in how 
> CoreContainer.shutdown operates. I don't know the impact on production since 
> we're shutting down anyway, but I think this is responsible for the errors in 
> TestBulkSchemaConcurrent and likely behind others, especially any other test 
> that fails intermittently that involves core reloads, including and 
> especially any tests that exercise managed schema.
> We have clear evidence of this sequence:
> 1> some CoreContainer.reloads come in and get _partway_ through, in 
> particular past the test at the top where CoreContainer.reload() throws an 
> AlreadyClosed exception if (isShutdown).
> 2> Some CoreContainer.shutdown() threads get some processing time before the 
> reloads in <1> are finished.
> 3> the threads in <1> pick back up and go wonky. I suspect that there are a 
> number of different things that could be going wrong here depending on how 
> far through CoreContainer.shutdown() gets that pop out in different ways.
> Since it's my shift (Noble has to sleep sometime), I put some crude locking 
> in just to test the idea; incrementing an AtomicInteger on entry to 
> 

[jira] [Commented] (SOLR-14861) CoreContainer shutdown needs to be aware of other ongoing operations and wait until they're complete

2020-10-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213118#comment-17213118
 ] 

Gus Heck commented on SOLR-14861:
-

Why aren't we viewing the problem as: reload (etc.) returns before it has 
ACTUALLY completed? How can the test be proceeding to shutdown if we aren't 
lying to it about the completion of reload? (Assuming the test isn't making 
its own threads and failing to track when they complete.)

A call to any admin-level operation (from Java, SolrJ, or the admin API) really 
should not complete until the command is complete, and the definition of 
complete should be that the target resource is 100% ready to use (see also: 
create collection).

Tests should never need any waiting strategies unless they themselves have 
started their own threads. (Credit: above, I'm parroting a rehashed form of 
something Mark Miller said ages ago, at least as I recall it.)

If we *need* to track what's in-flight on shutdown, we've failed in the event 
of a power loss, so we shouldn't be doing that (where "need" is defined as 
"otherwise persisted state will be corrupted"; anything else is "want").

If we want a graceful "drain existing requests" process we should build that 
explicitly by tracking all requests at a high level (we do this with 
SolrRequestInfo partly already, plus we need to account for async)... Of course 
that only works if we don't lie about request completion in the first place. 
Once we can perform a "start rejecting and drain" (that doesn't lie about when 
it completes) we can paste request draining on the front of shutdown and reload 
fairly trivially.

> CoreContainer shutdown needs to be aware of other ongoing operations and wait 
> until they're complete
> 
>
> Key: SOLR-14861
> URL: https://issues.apache.org/jira/browse/SOLR-14861
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-14861.patch
>
>
> Noble and I are trying to get to the bottom of the TestBulkSchemaConcurrent 
> failures and found what looks like a glaring gap in how 
> CoreContainer.shutdown operates. I don't know the impact on production since 
> we're shutting down anyway, but I think this is responsible for the errors in 
> TestBulkSchemaConcurrent and likely behind others, especially any other test 
> that fails intermittently that involves core reloads, including and 
> especially any tests that exercise managed schema.
> We have clear evidence of this sequence:
> 1> some CoreContainer.reloads come in and get _partway_ through, in 
> particular past the test at the top where CoreContainer.reload() throws an 
> AlreadyClosed exception if (isShutdown).
> 2> Some CoreContainer.shutdown() threads get some processing time before the 
> reloads in <1> are finished.
> 3> the threads in <1> pick back up and go wonky. I suspect that there are a 
> number of different things that could be going wrong here depending on how 
> far through CoreContainer.shutdown() gets that pop out in different ways.
> Since it's my shift (Noble has to sleep sometime), I put some crude locking 
> in just to test the idea; incrementing an AtomicInteger on entry to 
> CoreContainer.reload then decrementing it at the end, and spinning in 
> CoreContainer.shutdown() until the AtomicInteger was back to zero. With that 
> in place, 100 runs and no errors whereas before I could never get even 10 
> runs to finish without an error. This is not a proper fix at all, and the way 
> it's currently running there are still possible race conditions, just much 
> smaller windows. And I suspect it risks spinning forever. But it's enough to 
> make me believe I finally understand what's happening.
> I also suspect that reload is more sensitive than most operations on a core 
> due to the fact that it runs for a long time, but I assume other operations 
> have the same potential. Shouldn't CoreContainer.shutDown() wait until no 
> other operations are in flight?
> On a quick scan of CoreContainer, there are actually few places where we even 
> check for isShutdown, I suspect the places we do are ad-hoc that we've found 
> by trial-and-error when tests fail. We need a design rather than hit-or-miss 
> hacking.
> I think that isShutdown should be replaced with something more robust. What 
> that is IDK quite yet because I've been hammering at this long enough and I 
> need a break.
> This is consistent with another observation about this particular test. If 
> there's sleep at the end, it wouldn't fail; all the reloads get a chance to 
> finish before anything was shut down.
> An open question how much this matters to production systems. In the testing 
> case, bunches of these reloads are issued then we immediately 

[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210350#comment-17210350
 ] 

Gus Heck commented on LUCENE-9572:
--

The test framework changes in this ticket are also required by LUCENE-9575

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to a synonym. In some cases the 
> original token may have already had flags set on it and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability and allows the user to specify a bitmask to 
> specify which flags are retained.
> Additionally there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma separated list 
> of types to ignore (most common case will be to ignore a common default type 
> of 'word' I suspect)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9575) Add PatternTypingFilter

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9575:


 Summary: Add PatternTypingFilter
 Key: LUCENE-9575
 URL: https://issues.apache.org/jira/browse/LUCENE-9575
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Gus Heck
Assignee: Gus Heck


One of the key asks when the Library of Congress was asking me to develop the 
Advanced Query Parser was to be able to recognize arbitrary patterns that 
included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
wanted 401k and 401(k) to match documents with either style reference, and NOT 
match documents that happen to have isolated 401 or k tokens (i.e. not 
documents about the http status code) And of course we wanted to give up as 
little of the text analysis features they were already using.

This filter in conjunction with the filters from LUCENE-9572, LUCENE-9574 and 
one solr specific filter in SOLR-14597 that re-analyzes tokens with an 
arbitrary analyzer defined for a type in the solr schema, combine to achieve 
this. 

This filter has the job of spotting the patterns, and adding the intended 
synonym as a type to the token (from which minimal punctuation has been 
removed). It also sets flags on the token which are retained through the 
analysis chain, and at the very end the type is converted to a synonym and the 
original token(s) for that type are dropped avoiding the match on 401 (for 
example) 

The pattern matching is specified in a file that looks like: 
{code}
2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
2 C\+\+ ::: c_plus_plus
{code}

That file would match legal reference patterns such as 401(k), 401k, 
501(c)3 and C++. The format is:

<flags> <pattern> ::: <replacement>

and groups in the pattern are substituted into the replacement so the first 
line above would create synonyms such as:

{code}
401k   --> legal2_401_k
401(k) --> legal2_401_k
503(c) --> legal2_503_c
{code}
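
For illustration, a tiny standalone sketch (not the filter itself, whose names 
and internals may differ) of how one such rule line can be applied with 
java.util.regex; Matcher.replaceAll substitutes the captured groups into the 
replacement template:
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: applying the rule "2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2"
// to a few tokens to derive the typed synonym and the flag bits to set.
public class PatternRuleDemo {
  public static void main(String[] args) {
    int flags = 2; // flag bits from the first column of the rule
    Pattern p = Pattern.compile("(\\d+)\\(?([a-z])\\)?");
    String template = "legal2_$1_$2";

    for (String token : new String[] {"401k", "401(k)", "503(c)"}) {
      Matcher m = p.matcher(token);
      if (m.matches()) {
        // replaceAll substitutes $1/$2 in the template with the captured groups.
        String synonym = m.replaceAll(template);
        System.out.println(token + " --> " + synonym + " (flags=" + flags + ")");
      }
    }
  }
}
{code}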




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210335#comment-17210335
 ] 

Gus Heck commented on LUCENE-9574:
--

Since this is blocking SIP-9 and SOLR-14597, I'll be presuming silent consensus 
if there are no comments by Monday.

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210333#comment-17210333
 ] 

Gus Heck commented on LUCENE-9574:
--

One interesting corner case came up when the first token in the stream matched 
the flags but had already had a synonym added. The synonym of course had 
position increment 0, so dropping the token caused complaints about the first 
token not having a position increment > 0. I could think of no way to reach 
forward in the stream and adjust the synonym token to account for the dropping 
of its parent. So the workaround I came up with was, when the first token in 
the stream is being dropped, to replace it instead of dropping it, with a 
random token that will effectively never match anything and thus be invisible. 
Not crazy about it, and would like to ask why the restriction on position 
increment is there... it feels like for some reason downstream code expects 
token positions to start with 1 instead of zero or something? Open to 
suggestions for a better solution too.
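
For context, a minimal sketch of the basic dropping logic under discussion 
(assuming the FilteringTokenFilter approach, which as noted above does not 
handle the first-token-with-synonym corner case):
{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.FilteringTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;

/**
 * Sketch only, not the committed filter: drop any token whose flags
 * contain all of the bits in dropFlags.
 */
public final class DropIfFlaggedFilterSketch extends FilteringTokenFilter {
  private final int dropFlags;
  private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);

  public DropIfFlaggedFilterSketch(TokenStream in, int dropFlags) {
    super(in);
    this.dropFlags = dropFlags;
  }

  @Override
  protected boolean accept() throws IOException {
    // Keep the token unless every bit of the mask is set on it.
    return (flagsAtt.getFlags() & dropFlags) != dropFlags;
  }
}
{code}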

> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210332#comment-17210332
 ] 

Gus Heck commented on LUCENE-9572:
--

Since this is blocking SIP-9 and SOLR-14597, I'll be presuming silent consensus 
if there are no comments by Monday.

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to a synonym. In some cases the 
> original token may have already had flags set on it and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability and allows the user to specify a bitmask to 
> specify which flags are retained.
> Additionally there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma separated list 
> of types to ignore (most common case will be to ignore a common default type 
> of 'word' I suspect)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated LUCENE-9574:
-
Description: 
(Breaking this off of SOLR-14597 for independent review)

A filter that tests flags on tokens vs a bitmask and drops tokens that have all 
specified flags.

  was:A filter that tests flags on tokens vs a bitmask and drops tokens that 
have all specified flags.


> Add a token filter to drop tokens based on flags.
> -
>
> Key: LUCENE-9574
> URL: https://issues.apache.org/jira/browse/LUCENE-9574
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>
> (Breaking this off of SOLR-14597 for independent review)
> A filter that tests flags on tokens vs a bitmask and drops tokens that have 
> all specified flags.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9574) Add a token filter to drop tokens based on flags.

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9574:


 Summary: Add a token filter to drop tokens based on flags.
 Key: LUCENE-9574
 URL: https://issues.apache.org/jira/browse/LUCENE-9574
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Gus Heck
Assignee: Gus Heck


A filter that tests flags on tokens vs a bitmask and drops tokens that have all 
specified flags.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2020-10-08 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9572:


 Summary: Allow TypeAsSynonymFilter to propagate selected flags and 
Ignore some types
 Key: LUCENE-9572
 URL: https://issues.apache.org/jira/browse/LUCENE-9572
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis, modules/test-framework
Reporter: Gus Heck
Assignee: Gus Heck


(Breaking this off of SOLR-14597 for independent review)

TypeAsSynonymFilter converts type attributes to synonyms. In some cases the 
original token may already have had flags set on it, and it may be useful to 
propagate some or all of those flags to the synonym we are generating. This 
ticket provides that ability and allows the user to specify a bitmask 
selecting which flags are retained.

Additionally, there may be some set of types that should not be converted to 
synonyms, and this change allows the user to specify a comma-separated list of 
types to ignore (the most common case will be to ignore the common default type 
of 'word', I suspect).
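
A rough sketch of how those two knobs could look in incrementToken() 
(illustrative only, not the committed code; offset and type handling on the 
generated synonym is simplified):
{code:java}
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/** Sketch of the proposed flag mask and ignored-types set. */
public final class TypeAsSynonymSketch extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);

  private final int flagMask;        // which flags survive onto the synonym
  private final Set<String> ignored; // types that never become synonyms
  private State pending;             // saved state for the synonym to emit

  public TypeAsSynonymSketch(TokenStream in, int flagMask, Set<String> ignored) {
    super(in);
    this.flagMask = flagMask;
    this.ignored = ignored;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      restoreState(pending);                     // re-emit the original token's state...
      pending = null;
      termAtt.setEmpty().append(typeAtt.type()); // ...with the type as the term
      posIncAtt.setPositionIncrement(0);         // stacked like a synonym
      flagsAtt.setFlags(flagsAtt.getFlags() & flagMask); // propagate selected flags
      return true;
    }
    if (!input.incrementToken()) return false;
    if (!ignored.contains(typeAtt.type())) {
      pending = captureState(); // emit the synonym on the next call
    }
    return true;
  }
}
{code}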



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-10-08 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210252#comment-17210252
 ] 

Gus Heck commented on SOLR-14787:
-

New syntax with the latest change (one less parameter; it can check multiple 
tokens): 

For payloads such as
{code:java}
"one|1.0 two|2.0 three|3.0"
{code}
This does not match
{code:java}
{!payload_check f=vals_dpf payloads='0.75 3' op='gt'}one two
{code}
but this does match
{code:java}
{!payload_check f=vals_dpf payloads='0.75 1.5' op='gt'}one two
{code}
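
To make the semantics concrete, a small standalone sketch of the per-position 
comparison implied by the examples above (illustration only, not the parser 
internals; PayloadHelper is Lucene's real payload encode/decode helper):
{code:java}
import org.apache.lucene.analysis.payloads.PayloadHelper;

// For "one|1.0 two|2.0": op='gt' with thresholds {0.75, 3} fails at the
// second position, while thresholds {0.75, 1.5} succeed at both.
public class PayloadOpDemo {
  public static void main(String[] args) {
    float[] indexed = {1.0f, 2.0f};     // payloads of "one two"
    float[] thresholds = {0.75f, 1.5f}; // from payloads='0.75 1.5'

    boolean matches = true;
    for (int i = 0; i < indexed.length; i++) {
      // Payloads are stored as bytes; decodeFloat reverses encodeFloat.
      float decoded = PayloadHelper.decodeFloat(PayloadHelper.encodeFloat(indexed[i]));
      matches &= decoded > thresholds[i]; // op='gt'
    }
    System.out.println(matches); // true
  }
}
{code}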

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads.  This patch extends the 
> PayloadCheckQueryParser to add a new local param for "op"
> The value of OP could be one of the following
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> default value for "op" if not specified is to be the current behavior of 
> equals.
> Additionally to the operation you can specify a threshold local parameter
> This will provide the ability to search for the term "cat" so long as the 
> payload has a value of greater than 0.75.  
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-10-01 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205764#comment-17205764
 ] 

Gus Heck commented on SOLR-14787:
-

So after spending some more time with this I have the following thoughts:
 # The threshold parameter is redundant with the payloads parameter. This 
should all be choosing operators in the same manner in the code, with "equals" 
being the default operator rather than having two distinct code paths. I think 
{{"\{!payload_check f=vals_dpf payloads='0.75' op='gt'}one"}} makes more sense. 
This also opens up the possibility of testing vs multiple payload values, just 
like the equals case. Accepting a different operator per payload value can be a 
future enhancement, however, if anyone wants it.
 # There is a Lucene class change here, so there definitely should be 
Lucene-level tests, and we should have a Lucene ticket too.
 # As you mentioned in a separate channel, this doesn't work with integers 
(i.e. {{"\{!payload_check f=vals_dpi payloads='1' op='gt' threshold='0.75'}A"}} 
won't work)... this is because the integer payload (from the index, not the 
query) gets decoded as a float and winds up being some very, very small value 
(saw it in debug, forgot to copy it down, but something like ten to the minus 
14 IIRC), so it deceptively gives wrong answers and does not throw errors, 
which is bad; see the sketch after this list. I think this needs to be 
addressed by communicating the payload type to the query at the Lucene layer 
(where folks are responsible for knowing the type info of their own fields) and 
deriving it from the schema at the Solr level, where folks expect stuff to just 
work because they declared a schema. Additionally, by analogy with range 
queries, strings should probably work via lexical order, but possibly that 
could be a future enhancement, since users are less likely to expect strings 
to work in the same fashion as floats.
 # I'm still trying to explain why I get different results in the IDE vs the 
build here, but the build and the running application are the important thing.
 # Needs docs, of course.
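
A tiny demonstration of the decoding mismatch from point 3 (illustration only; 
the exact value I saw in debug may have differed, but decoding an int payload 
with the float decoder yields a denormal near zero):
{code:java}
import org.apache.lucene.analysis.payloads.PayloadHelper;

// An integer payload read back with the float decoder: the raw int bits 1
// reinterpreted as a float are ~1.4E-45, so gt/lt comparisons silently go wrong.
public class PayloadTypeMismatch {
  public static void main(String[] args) {
    byte[] asInt = PayloadHelper.encodeInt(1);      // payload written as an int
    float wrong = PayloadHelper.decodeFloat(asInt); // read back as a float
    System.out.println(wrong);                      // ~1.4E-45, not 1
  }
}
{code}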

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads.  This patch extends the 
> PayloadCheckQueryParser to add a new local param for "op"
> The value of OP could be one of the following
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> default value for "op" if not specified is to be the current behavior of 
> equals.
> Additionally to the operation you can specify a threshold local parameter
> This will provide the ability to search for the term "cat" so long as the 
> payload has a value of greater than 0.75.  
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-10-01 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205577#comment-17205577
 ] 

Gus Heck commented on SOLR-14787:
-

Hmm, now I suspect my IDE had somehow gotten confused WRT the 9.0 match version, 
or perhaps I was reading the wrong window, but trying fresh today I just 
reproduced the NOUN VERB failure again with a freshly started IDE, this time 
failing with the appropriate 8.7 match version messages... 

That said, the build still passes when I do this:
{code:java}
gus@ns-l1:~/projects/apache/lucene-solr/fork/lucene-solr8$ ant test 
-Dtests.class=org.apache.solr.search.TestPayloadCheckQParserPlugin > 
build.out.txt
gus@ns-l1:~/projects/apache/lucene-solr/fork/lucene-solr8$ grep NOUN 
build.out.txt 
   [junit4]   2> 5995 INFO  
(TEST-TestPayloadCheckQParserPlugin.test-seed#[39C6574AF7C1723D]) [ ] 
o.a.s.c.S.Request [collection1]  webapp=null path=null 
params={q={!payload_check+f%3Dvals_dps+payloads%3D'NOUN+VERB'}cat+jumped=*,score=xml}
 hits=1 status=0 QTime=4
   [junit4]   2> 6004 INFO  
(TEST-TestPayloadCheckQParserPlugin.test-seed#[39C6574AF7C1723D]) [ ] 
o.a.s.c.S.Request [collection1]  webapp=null path=null 
params={q={!payload_check+f%3Dvals_dps+payloads%3D'VERB+NOUN'}cat+jumped=*,score=xml}
 hits=0 status=0 QTime=0
{code}
Note the hits=1 above vs. the hits=0 I get in the IDE run of the same test:
{code:java}
3618 INFO  (TEST-TestPayloadCheckQParserPlugin.test-seed#[C26FC0AC309214A9]) [  
   ] o.a.s.c.S.Request [collection1]  webapp=null path=null 
params={q={!payload_check+f%3Dvals_dps+payloads%3D'NOUN+VERB'}cat+jumped=*,score=xml}
 hits=0 status=0 QTime=2
{code}

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads.  This patch extends the 
> PayloadCheckQueryParser to add a new local param for "op"
> The value of OP could be one of the following
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> default value for "op" if not specified is to be the current behavior of 
> equals.
> Additionally to the operation you can specify a threshold local parameter
> This will provide the ability to search for the term "cat" so long as the 
> payload has a value of greater than 0.75.  
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API

2020-09-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200848#comment-17200848
 ] 

Gus Heck commented on SOLR-8281:


This seems related to something I wanted to do for a client... I had a reduce 
with group() and I wanted to then feed the groups to an arbitrary streaming 
expression for further processing, and have the result show up in the groups 
(the result would have been a matrix). The problem I stopped on was how to 
express the stream to process the group without having a source (the source is 
the group).

> Add RollupMergeStream to Streaming API
> --
>
> Key: SOLR-8281
> URL: https://issues.apache.org/jira/browse/SOLR-8281
> Project: Solr
>  Issue Type: Bug
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
>
> The RollupMergeStream merges the aggregate results emitted by the 
> RollupStream on *worker* nodes.
> This is designed to be used in conjunction with the HashJoinStream to perform 
> rollup Aggregations on the joined Tuples. The HashJoinStream will require the 
> tuples to be partitioned on the Join keys. To avoid needing to repartition on 
> the *group by* fields for the RollupStream, we can perform a merge of the 
> rolled up Tuples coming from the workers.
> The construct would like this:
> {code}
> mergeRollup (...
>   parallel (...
> rollup (...
> hashJoin (
>   search(...),
>   search(...),
>   on="fieldA" 
> )
>  )
>  )
>)
> {code}
> The pseudo code above would push the *hashJoin* and *rollup* to the *worker* 
> nodes. The emitted rolled up tuples would be merged by the mergeRollup.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-09-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200828#comment-17200828
 ] 

Gus Heck commented on SOLR-14787:
-

I have found something interesting WRT the failing case you mention... it only 
fails when I run the test in my IDE. If I use the ant build it passes. I notice 
some interesting differences in startup for these two scenarios... 

build:

{code:java}
   [junit4] Suite: org.apache.solr.search.TestPayloadCheckQParserPlugin
   [junit4]   2> 1454 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/server/solr/configsets/_default/conf'
   [junit4]   2> 1475 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Created dataDir: 
/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/build/solr-core/test/J0/temp/solr.search.TestPayloadCheckQParserPlugin_AB5E0FC0380BB866-001/data-dir-1-001
   [junit4]   2> 1551 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Using TrieFields (NUMERIC_POINTS_SYSPROP=false) 
w/NUMERIC_DOCVALUES_SYSPROP=true
   [junit4]   2> 1592 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.e.j.u.log Logging initialized @1620ms to org.eclipse.jetty.util.log.Slf4jLog
   [junit4]   2> 1597 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) via: 
@org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, clientAuth=NaN)
   [junit4]   2> 1621 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: 
test.solr.allowed.securerandom=null & java.security.egd=file:/dev/./urandom
   [junit4]   2> 1626 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 initCore
   [junit4]   2> 1757 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrConfig Using Lucene MatchVersion: 8.7.0
   [junit4]   2> 1901 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.s.IndexSchema Schema name=example
   [junit4]   2> 1931 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieIntField]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 1936 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieFloatField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1940 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieLongField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1944 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieDoubleField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1966 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieDateField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 2202 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.GeoHashField]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 2208 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.LatLonType]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 2217 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.EnumField]. Please consult documentation how to replace it accordingly.


{code}
IDE (IntelliJ):

{code:java}
1172 INFO  (SUITE-TestPayloadCheckQParserPlugin-seed#[5A2517E33080AEE6]-worker) 
[ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/home/gus/projects/apache/lucene-solr/fork/lucene-solr/solr/server/solr/configsets/_default/conf'
1190 INFO  

[jira] [Commented] (SOLR-14597) Advanced Query Parser

2020-09-22 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200304#comment-17200304
 ] 

Gus Heck commented on SOLR-14597:
-

Right, agreed, Lucene stuff also should be broken out to Lucene tickets. All 
initially here to keep the donation process simple. 

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
> Attachments: aqp_patch.patch
>
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
> [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser]
> Briefly, this parser provides a comprehensive syntax for users that use 
> search on a daily basis. It also reserves a smaller set of punctuators than 
> other parsers. This facilitates easier handling of acronyms and punctuated 
> patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some 
> advanced features while also preventing access to arbitrary features via 
> local parameters. This parser will be safe for accepting user queries 
> directly with minimal pre-parsing, but for use cases beyond its established 
> features alternate query paths (using other parsers) will need to be supplied.
> The code drop is being prepared and will be supplied as soon as we receive 
> guidance from the PMC regarding the proper process. Given that the Library 
> already has a signed CCLA we need to understand which of these (or other 
> processes) apply:
> [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]
> and 
> [https://www.apache.org/licenses/contributor-agreements.html#grants]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14597) Advanced Query Parser

2020-09-21 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199726#comment-17199726
 ] 

Gus Heck commented on SOLR-14597:
-

Looks like LUCENE-9531 has caused a conflict with the patch, and there have 
been some changes to the gradle files running javacc, so I'm working on 
updating to work with that; I'll publish the fix as a pull request for easier 
review. Now that there is code to look at, some responses:

[~arafalov]: Point noted about TypeTokenFilter; there is similarity, though 
filtering on flags instead of types. It would be attractive to also inherit 
from FilteringTokenFilter, but it looks like one edge case I ran into isn't 
handled by the superclass (and it makes me wonder if there's a lurking issue 
with other FilteringTokenFilter subclasses). The case I ran into is thus: the 
first token in the stream gets assigned a synonym, then in a subsequent step 
the first token is dropped (this is quite intentional in some use cases we had, 
where the intent was to entirely prevent matches on the original token but 
still match on the synonym). When this happens it causes 
{{java.lang.IllegalArgumentException: first position increment must be > 0 (got 
0)}} despite the fact that this scenario is not actually an error in terms of 
which tokens we want. Unfortunately there's no good way to know what's going to 
happen to the next token (which may not have the flags in question), so I came 
up with a workaround that I'm not very pleased with: dropping in a placeholder 
token that is unlikely to match anything. Open to suggestions for better 
options there, and interested in whether or not other filters that drop tokens 
can hit the same issue, or if they've handled it in some graceful way I'm not 
appreciating.

Also, now that the code is available, let me know if you still see similarity 
between PatternTypingFilterFactory and KeywordMarkerFilterFactory... I think 
they are quite different.

[~ichattopadhyaya], [~dsmiley] While some of this could potentially be broken 
out into a package, there are also some changes to core and some Lucene-level 
classes that probably wouldn't want to be in a package, so feel free to put 
some eyes on it and suggest what the dividing line is (more eyes == better). 
I'm not against the idea of a 1st-party package, but the question is: will this 
be popular enough to merit default inclusion? Another breaking-new-ground sort 
of question is "Is it easier to pull it in later or push it out to a package 
later if we change our minds?" Maybe neither is harder...

Changes to note to classes outside the new org.apache.solr.aqp package (where 
the meat of the new parser and its .jj file lives):
 # TypeAsSynonymFilter is gaining the ability to manage which flags are 
transmitted from the original token to the synonym when it is created.
 # BaseTokenStreamTestCase is gaining the ability to verify the flags on the 
tokens produced.
 # Access to org.apache.solr.cloud.AbstractDistribZkTestBase#copyConfigUp is 
opened up so that it can be used in a wider array of tests.
 # Solr gains TokenAnalyzerFilter, which applies the Analyzer from a specified 
field type to the individual tokens of the current stream (see javadoc for more 
detail; a sketch follows this comment).
 # Operator and SynonymQueryStyle are extracted from the standard parser's base 
class so they can be re-used. Reuse is necessary because TextField references 
SynonymQueryStyle directly.
 # The above change forces a compile-time API change in TextField, which might 
force this to not be available till 9.x (though the desire to make AQP 
available in 8.x is there). 
 # The change to TextField then broke TestPackages, which failed with a 
ClassNotFound when it went looking for the old SynonymQueryStyle inner class 
that had been promoted to a separate class. This forced me to decompile and 
provide classes and build/rebuild support for the binary jars checked in for 
TestPackages (as *.jar.bin); the .java files for the classes loaded by this 
test had not been checked in. This is the genesis of the o.a.smy.pkg package 
namespace.

Some of the above (especially #7) might want to be broken into related or 
sub-tickets.
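
For readers without the patch in front of them, a very rough sketch of the 
TokenAnalyzerFilter idea from point 4 (illustrative only; the real class 
handles positions, offsets, and stream cleanup properly):
{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Sketch: re-analyze each incoming token with a wrapped Analyzer. */
public final class TokenAnalyzerFilterSketch extends TokenFilter {
  private final Analyzer wrapped;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private TokenStream inner;            // stream over the current token's text
  private CharTermAttribute innerTerm;  // term attribute of that stream

  public TokenAnalyzerFilterSketch(TokenStream in, Analyzer wrapped) {
    super(in);
    this.wrapped = wrapped;
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (true) {
      if (inner != null) {
        if (inner.incrementToken()) {
          // Copy the re-analyzed term back onto this stream's term attribute
          // (position/offset attributes are not adjusted in this sketch).
          termAtt.setEmpty().append(innerTerm);
          return true;
        }
        inner.end();
        inner.close();
        inner = null;
      }
      if (!input.incrementToken()) return false;
      inner = wrapped.tokenStream("", termAtt.toString());
      innerTerm = inner.addAttribute(CharTermAttribute.class);
      inner.reset();
    }
  }
}
{code}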

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
> Attachments: aqp_patch.patch
>
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
> [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser]
> Briefly, this parser provides a comprehensive syntax for users that use 
> 

[jira] [Updated] (SOLR-14597) Advanced Query Parser

2020-09-15 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14597:

Affects Version/s: (was: 8.7)

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
> [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser]
> Briefly, this parser provides a comprehensive syntax for users that use 
> search on a daily basis. It also reserves a smaller set of punctuators than 
> other parsers. This facilitates easier handling of acronyms and punctuated 
> patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some 
> advanced features while also preventing access to arbitrary features via 
> local parameters. This parser will be safe for accepting user queries 
> directly with minimal pre-parsing, but for use cases beyond its established 
> features alternate query paths (using other parsers) will need to be supplied.
> The code drop is being prepared and will be supplied as soon as we receive 
> guidance from the PMC regarding the proper process. Given that the Library 
> already has a signed CCLA we need to understand which of these (or other 
> processes) apply:
> [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]
> and 
> [https://www.apache.org/licenses/contributor-agreements.html#grants]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14597) Advanced Query Parser

2020-09-15 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14597:

Affects Version/s: (was: 8.6)
   8.7

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Affects Versions: 8.7
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
> [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser]
> Briefly, this parser provides a comprehensive syntax for users that use 
> search on a daily basis. It also reserves a smaller set of punctuators than 
> other parsers. This facilitates easier handling of acronyms and punctuated 
> patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some 
> advanced features while also preventing access to arbitrary features via 
> local parameters. This parser will be safe for accepting user queries 
> directly with minimal pre-parsing, but for use cases beyond its established 
> features alternate query paths (using other parsers) will need to be supplied.
> The code drop is being prepared and will be supplied as soon as we receive 
> guidance from the PMC regarding the proper process. Given that the Library 
> already has a signed CCLA we need to understand which of these (or other 
> processes) apply:
> [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]
> and 
> [https://www.apache.org/licenses/contributor-agreements.html#grants]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14787) Inequality support in Payload Check query parser

2020-09-02 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reassigned SOLR-14787:
---

Assignee: Gus Heck

> Inequality support in Payload Check query parser
> 
>
> Key: SOLR-14787
> URL: https://issues.apache.org/jira/browse/SOLR-14787
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching 
> and filtering based on term payloads.  This patch extends the 
> PayloadCheckQueryParser to add a new local param for "op"
> The value of OP could be one of the following
>  * gt - greater than
>  * gte - greater than or equal
>  * lt - less than
>  * lte - less than or equal
> default value for "op" if not specified is to be the current behavior of 
> equals.
> Additionally to the operation you can specify a threshold local parameter
> This will provide the ability to search for the term "cat" so long as the 
> payload has a value of greater than 0.75.  
> One use case is to classify a document into various categories with an 
> associated confidence or probability that the classification is correct.  
> That can be indexed into a delimited payload field.  The searches can find 
> and match documents that were tagged with the "cat" category with a 
> confidence of greater than 0.5.
> Example Document
> {code:java}
> { 
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' 
> threshold='0.5'}cat  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14726) Streamline getting started experience

2020-08-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185873#comment-17185873
 ] 

Gus Heck commented on SOLR-14726:
-

Oh and there's that elephant just outside the doorway (i.e. not in scope for 
this ticket)... the lack of user friendly documentation for lucene itself :)

> Streamline getting started experience
> -
>
> Key: SOLR-14726
> URL: https://issues.apache.org/jira/browse/SOLR-14726
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Alexandre Rafalovitch
>Priority: Major
>  Labels: newdev
> Attachments: yasa-http.png
>
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our 
> best practices, which should also be followed in production. I have the 
> following suggestions:
> # Make it less verbose. It is too long. On my laptop, it required 35 Page 
> Down key presses to get to the bottom of the page!
> # First step of the tutorial should be to enable security (basic auth should 
> suffice).
> # {{./bin/solr start -e cloud}} <-- All references of -e should be removed.
> # All references of {{bin/solr post}} to be replaced with {{curl}}
> # Convert all {{bin/solr create}} references to curl of collection creation 
> commands
> # Add docker based startup instructions.
> # Create a Jupyter Notebook version of the entire tutorial, make it so that 
> it can be easily executed from Google Colaboratory. Here's an example: 
> https://twitter.com/TheSearchStack/status/1289703715981496320
> # Provide downloadable Postman and Insomnia files so that the same tutorial 
> can be executed from those tools. Except for starting Solr, all other steps 
> should be possible to be carried out from those tools.
> # Use V2 APIs everywhere in the tutorial (see the sketch after this list)
> # Remove all example modes, sample data (films, tech products etc.), 
> configsets from Solr's distribution (instead let the examples refer to them 
> from github)
> # Remove the post tool from Solr, curl should suffice.
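>
> As a sketch of what the V2-based collection creation in points 5 and 9 might 
> look like via SolrJ (equivalent to a curl POST of the same JSON body to 
> /api/collections; the payload shape is assumed from the v2 API, and the 
> collection and config names are made up):
> {code:java}
> import org.apache.solr.client.solrj.SolrRequest;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.request.V2Request;
> 
> public class CreateCollectionV2 {
>   public static void main(String[] args) throws Exception {
>     try (HttpSolrClient client =
>         new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
>       // POST {"create":{...}} to /api/collections, the v2 counterpart of
>       // the v1 admin/collections?action=CREATE call.
>       new V2Request.Builder("/collections")
>           .withMethod(SolrRequest.METHOD.POST)
>           .withPayload("{\"create\":{\"name\":\"films\",\"config\":\"_default\",\"numShards\":1}}")
>           .build()
>           .process(client);
>     }
>   }
> }
> {code}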



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14726) Streamline getting started experience

2020-08-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185855#comment-17185855
 ] 

Gus Heck edited comment on SOLR-14726 at 8/27/20, 2:00 PM:
---

Can we make it a goal that the user be **completely** unaware of what mode 
(cloud or not) they are using in the initial contact? That's deployment stuff 
and nothing they should even think about on first contact. I think they should 
run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in 
their web browser to see it worked. Cloud or non-cloud can be used behind the 
scenes as current or future maintainers see fit. An adapted version of my 
comments on slack:

There are various things to learn about solr... I might order them thus for 
what I (IMHO) consider optimal pedagogy:
 # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws 
data in for them, and lets the user query it either in the UI or via curl as 
suits them (different people have different styles){color}
 # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a 
query syntax, sort vs relevancy scoring.{color}
 # {color:#0747a6}How to get data in (because without data whatever), and the 
need to be able to re-index{color}
 # How to deploy solr in a basically competent fashion for light duty use in 
low security environments
 # Features such as facets, highlighting, analysis options etc, this section 
should be an a la carte menu into the ref guide, as by this point they are 
becoming more advanced.
 # Hardening and Scaling solr, and otherwise making it production ready

For the first 3 you really don't want the user to see any of #4, and it really 
doesn't matter if it's cloud or not so long as the person trying to learn 
doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and 
we basically don't do a good job of teaching #3 (in the ref guide). When you 
get to #4 I can't imagine a case in which you would want to have them start with 
non-cloud solr, though that section should have a closing section on non-cloud 
and the trade-offs of using it. #5 should be a la carte anyway, and we do have 
a fairly coherent section for #6.


was (Author: gus_heck):
Can we make it a goal that the user be **completely** unaware of what mode 
(cloud or not) they are using in the initial contact? That's deployment stuff 
and nothing they should even think about on first contact. I think they should 
run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in 
their web browser to see it worked. Cloud or non-cloud can be used behind the 
scenes as current or future maintainers see fit. An adapted version of my 
comments on slack:

There are various things to learn about solr... I might order them thus for 
what I (IMHO) consider optimal pedagogy:
 # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws 
data in for them, and lets the user query it either in the UI or via curl as 
suits them (different people have different styles){color}
 # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a 
query syntax, sort vs relevancy scoring.{color}
 # {color:#0747a6}How to get data in (because without data whatever), and the 
need to be able to re-index{color}
 # How to deploy solr in a basically competent fashion for light duty use in 
low security environments
 # Features such as facets, highlighting, analysis options etc, this section 
should be an a la carte menu into the ref guide, as by this point they are 
becoming more advanced.
 # Hardening and Scaling solr, and otherwise making it production ready

For the first 3 you really don't want the user to see any of #4, and it really 
doesn't matter if it's cloud or not so long as the person trying to learn 
doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and 
we basically don't do a good job of teaching #3 (in the ref guide). When you 
get to #4 I can't imagine a case in which you would want to have them start with 
non-cloud solr, and have a closing section on non-cloud and the trade-offs of 
using it. #5 should be a la carte anyway, and we do have a fairly coherent 
section for #6.

> Streamline getting started experience
> -
>
> Key: SOLR-14726
> URL: https://issues.apache.org/jira/browse/SOLR-14726
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Alexandre Rafalovitch
>Priority: Major
>  Labels: newdev
> Attachments: yasa-http.png
>
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our 
> best practices, which should also be followed in production. I have the 
> following suggestions:
> # 

[jira] [Commented] (SOLR-14726) Streamline getting started experience

2020-08-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185857#comment-17185857
 ] 

Gus Heck commented on SOLR-14726:
-

One caveat to what I just said is that cloud vs non-cloud does somewhat matter 
for "getting data in" WRT which SolrJ classes one might use.

> Streamline getting started experience
> -
>
> Key: SOLR-14726
> URL: https://issues.apache.org/jira/browse/SOLR-14726
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Alexandre Rafalovitch
>Priority: Major
>  Labels: newdev
> Attachments: yasa-http.png
>
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our 
> best practices, which should also be followed in production. I have the 
> following suggestions:
> # Make it less verbose. It is too long. On my laptop, it required 35 Page 
> Down key presses to get to the bottom of the page!
> # First step of the tutorial should be to enable security (basic auth should 
> suffice).
> # {{./bin/solr start -e cloud}} <-- All references of -e should be removed.
> # All references of {{bin/solr post}} to be replaced with {{curl}}
> # Convert all {{bin/solr create}} references to curl of collection creation 
> commands
> # Add docker based startup instructions.
> # Create a Jupyter Notebook version of the entire tutorial, make it so that 
> it can be easily executed from Google Colaboratory. Here's an example: 
> https://twitter.com/TheSearchStack/status/1289703715981496320
> # Provide downloadable Postman and Insomnia files so that the same tutorial 
> can be executed from those tools. Except for starting Solr, all other steps 
> should be possible to be carried out from those tools.
> # Use V2 APIs everywhere in the tutorial
> # Remove all example modes, sample data (films, tech products etc.), 
> configsets from Solr's distribution (instead let the examples refer to them 
> from github)
> # Remove the post tool from Solr, curl should suffice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14726) Streamline getting started experience

2020-08-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185855#comment-17185855
 ] 

Gus Heck commented on SOLR-14726:
-

Can we make it a goal that the user be **completely** unaware of what mode 
(cloud or not) they are using in the initial contact? That's deployment stuff 
and nothing they should even think about on first contact. I think they should 
run "tutorial1.sh" or {{bin/solr -e tutorial1}} and then pull up a page in 
their web browser to see it worked. Cloud or non-cloud can be used behind the 
scenes as current or future maintainers see fit. An adapted version of my 
comments on slack:

There are various things to learn about solr... I might order them thus for 
what I (IMHO) consider optimal pedagogy:
 # {color:#0747a6}First Contact: A cushy easy intro that stands up solr, throws 
data in for them, and lets the user query it either in the UI or via curl as 
suits them (different people have different styles){color}
 # {color:#0747a6}Basic search concepts: inverted indexes, tokenization, a 
query syntax, sort vs relevancy scoring.{color}
 # {color:#0747a6}How to get data in (because without data whatever), and the 
need to be able to re-index{color}
 # How to deploy solr in a basically competent fashion for light duty use in 
low security environments
 # Features such as facets, highlighting, analysis options etc, this section 
should be an a la carte menu into the ref guide, as by this point they are 
becoming more advanced.
 # Hardening and Scaling solr, and otherwise making it production ready

For the first 3 you really don't want the user to see any of #4, and it really 
doesn't matter if it's cloud or not so long as the person trying to learn 
doesn't see whichever it is. I think bin/solr -e accomplishes that with #1, and 
we basically don't do a good job of teaching #3 (in the ref guide). When you 
get to #4 I can't imagine a case in which you would want to have them start with 
non-cloud solr, and have a closing section on non-cloud and the trade-offs of 
using it. #5 should be a la carte anyway, and we do have a fairly coherent 
section for #6.

> Streamline getting started experience
> -
>
> Key: SOLR-14726
> URL: https://issues.apache.org/jira/browse/SOLR-14726
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Alexandre Rafalovitch
>Priority: Major
>  Labels: newdev
> Attachments: yasa-http.png
>
>
> The reference guide Solr tutorial is here:
> https://lucene.apache.org/solr/guide/8_6/solr-tutorial.html
> It needs to be simplified and easy to follow. Also, it should reflect our 
> best practices, which should also be followed in production. I have the 
> following suggestions:
> # Make it less verbose. It is too long. On my laptop, it required 35 Page 
> Down key presses to get to the bottom of the page!
> # First step of the tutorial should be to enable security (basic auth should 
> suffice).
> # {{./bin/solr start -e cloud}} <-- All references of -e should be removed.
> # All references of {{bin/solr post}} to be replaced with {{curl}}
> # Convert all {{bin/solr create}} references to curl of collection creation 
> commands
> # Add docker based startup instructions.
> # Create a Jupyter Notebook version of the entire tutorial, make it so that 
> it can be easily executed from Google Colaboratory. Here's an example: 
> https://twitter.com/TheSearchStack/status/1289703715981496320
> # Provide downloadable Postman and Insomnia files so that the same tutorial 
> can be executed from those tools. Except for starting Solr, all other steps 
> should be possible to be carried out from those tools.
> # Use V2 APIs everywhere in the tutorial
> # Remove all example modes, sample data (films, tech products etc.), 
> configsets from Solr's distribution (instead let the examples refer to them 
> from github)
> # Remove the post tool from Solr, curl should suffice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13260) Add support for 128 bit integer point fields

2020-08-17 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179290#comment-17179290
 ] 

Gus Heck commented on SOLR-13260:
-

I'm interested in SOLR-6741, which you've set as requiring this. Can you 
elaborate on how this PR will be used in 6741? Are you planning on extending 
the "ByteStringPointField"? Do your plans account for or use InetAddressPoint 
(a Lucene class)? Also, a couple of comments on the PR..
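
For reference, this is roughly the surface InetAddressPoint already exposes on 
the Lucene side (a minimal sketch; the field name and addresses are made up):

{code:java}
import java.net.InetAddress;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.InetAddressPoint;
import org.apache.lucene.search.Query;

public class IpPointSketch {
  public static void main(String[] args) throws Exception {
    // IPv4 and IPv6 addresses are both encoded as 128-bit (16-byte) points.
    Document doc = new Document();
    doc.add(new InetAddressPoint("ip", InetAddress.getByName("2001:db8::1")));

    // Query helpers for exact, CIDR-prefix, and range matching.
    Query exact = InetAddressPoint.newExactQuery(
        "ip", InetAddress.getByName("2001:db8::1"));
    Query cidr = InetAddressPoint.newPrefixQuery(
        "ip", InetAddress.getByName("2001:db8::"), 64);
    Query range = InetAddressPoint.newRangeQuery(
        "ip", InetAddress.getByName("2001:db8::"),
        InetAddress.getByName("2001:db8::ffff"));
  }
}
{code}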

> Add support for 128 bit integer point fields
> 
>
> Key: SOLR-13260
> URL: https://issues.apache.org/jira/browse/SOLR-13260
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Dale Richardson
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since support for ipv6 requires dealing with 128 bit Point fields, I'm 
> splitting out support for 128 bit integer point fields into a separate commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Reopened] (SOLR-14582) Expose IWC.setMaxCommitMergeWaitMillis as an expert feature in Solr's index config

2020-08-07 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reopened SOLR-14582:
-

Broken test, AwaitsFix 

> Expose IWC.setMaxCommitMergeWaitMillis as an expert feature in Solr's index 
> config
> --
>
> Key: SOLR-14582
> URL: https://issues.apache.org/jira/browse/SOLR-14582
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Trivial
> Fix For: master (9.0), 8.7
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> LUCENE-8962 added the ability to merge segments synchronously on commit. This 
> isn't done by default and the default {{MergePolicy}} won't do it, but custom 
> merge policies can take advantage of this. Solr allows plugging in custom 
> merge policies, so if someone wants to make use of this feature they could; 
> however, they need to set {{IndexWriterConfig.maxCommitMergeWaitSeconds}} to 
> something greater than 0.
> Since this is an expert feature, I plan to document it only in javadoc and 
> not the ref guide.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-04 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171256#comment-17171256
 ] 

Gus Heck commented on SOLR-14706:
-

I tested the PR and found:

A) Collection creation no longer fails.
B) The upgrade recommendation to remove the policy works as far as it goes.
C) The autoscaling.json still has cluster preferences after upgrade, which are 
not present with a fresh install. I think we also want to recommend

{code:java}
{"set-cluster-preferences" : []} 
{code}

to ensure exact parity with a fresh install.
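
For reference, a minimal SolrJ sketch of issuing that command (the node URL is 
made up; assumes the v2 autoscaling write API at /api/cluster/autoscaling):

{code:java}
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.V2Request;

public class ClearClusterPreferences {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8981/solr").build()) {
      // POST {"set-cluster-preferences":[]} to /api/cluster/autoscaling,
      // clearing any preferences left over from the upgrade.
      new V2Request.Builder("/cluster/autoscaling")
          .withMethod(SolrRequest.METHOD.POST)
          .withPayload("{\"set-cluster-preferences\":[]}")
          .build()
          .process(client);
    }
  }
}
{code}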


> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.7, 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0 with more than one node
>Reporter: Gus Heck
>Assignee: Houston Putman
>Priority: Blocker
> Fix For: 8.6.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd 

[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-04 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171212#comment-17171212
 ] 

Gus Heck commented on SOLR-14704:
-

This is being added because the current script only unpacks the tarball if it 
has {{-r}} (recompile), or if you are running the {{new}} command. The new 
command will fail before extraction if the directory already exists 
(intentional, for safety). If {{-r}} is used it would overwrite whatever you 
placed in the directory with whatever is in your working copy after 
compilation/packaging, and then immediately start the server with that instead.

This could also have been done as {{-t }} (and that could still 
be added), or {{-u}} to trigger an archive/re-extract, but I thought it was 
slightly nicer to do the download without requiring separate steps. 

Among the things that may want to be added to this PR (which is just a start) 
is support for {{-d}} (and/or {{-t}}) in start/restart for upgrade testing. 
Also, pushing a new solr.xml to zk could be necessary in some cases but is not 
yet accounted for.

> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-04 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171205#comment-17171205
 ] 

Gus Heck edited comment on SOLR-14704 at 8/5/20, 1:10 AM:
--

I had gone for simply downloading from a specified url, for flexibility... thus 
it could be used for RC, actual releases, internal releases on internal 
repositories, etc etc. Also, given the hard-to-predict hash sequence in RC 
artifact urls, I think it would be a lot more work to derive that URL, and a lot 
more fragile.


was (Author: gus_heck):
I had gone for simply downloading from a specified url, for flexibility... thus 
it could be used for RC, actual releases, internal releases on internal 
repositories, etc etc. 

> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-04 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171205#comment-17171205
 ] 

Gus Heck commented on SOLR-14704:
-

I had gone for simply downloading from a specified url, for flexibility... thus 
it could be used for RC, actual releases, internal releases on internal 
repositories, etc etc. 

> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Fix Version/s: 8.6.1

> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0
>Reporter: Gus Heck
>Priority: Blocker
> Fix For: 8.6.1
>
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd upgrademe/
> # cp ../8_6_1_RC1/solr-8.6.1.tgz .
> # mv solr-8.6.0-SNAPSHOT old
> # tar xzvf solr-8.6.1.tgz
> # cd ..
> # ./cloud.sh start
> # Try to create collection test861 with _default config
> For those not familiar with it, the first command there with cloud.sh builds 
> the tarball in the working directory, then makes a directory named 
> "upgrademe", copies the tarball into it, unpacks it, sets up a chroot based on 
> the path in (already running, separate) zookeeper, and by default starts 4 
> local nodes on ports 8981 

[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Environment: 8.6.1 upgraded from 8.6.0 with more than one node  (was: 8.6.1 
upgraded from 8.6.0)

> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0 with more than one node
>Reporter: Gus Heck
>Priority: Blocker
> Fix For: 8.6.1
>
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd upgrademe/
> # cp ../8_6_1_RC1/solr-8.6.1.tgz .
> # mv solr-8.6.0-SNAPSHOT old
> # tar xzvf solr-8.6.1.tgz
> # cd ..
> # ./cloud.sh start
> # Try to create collection test861 with _default config
> For those not familiar with it, the first command there with cloud.sh builds 
> the tarball in the working directory, then makes a directory named 
> "upgrademe", copies the tarball into it, unpacks it, sets up a chroot based on 
> 

[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Issue Type: Bug  (was: New Feature)

> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0
>Reporter: Gus Heck
>Priority: Blocker
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd upgrademe/
> # cp ../8_6_1_RC1/solr-8.6.1.tgz .
> # mv solr-8.6.0-SNAPSHOT old
> # tar xzvf solr-8.6.1.tgz
> # cd ..
> # ./cloud.sh start
> # Try to create collection test861 with _default config
> For those not familiar with it, the first command there with cloud.sh builds 
> the tarball in the working directory, then makes a directory named 
> "upgrademe", copies the tarball into it, unpacks it, sets up a chroot based on 
> the path in (already running, separate) zookeeper, and by default starts 4 
> local nodes on ports 8981 to 8984 all 

[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170384#comment-17170384
 ] 

Gus Heck commented on SOLR-14706:
-

You can use solr/cloud-dev/cloud.sh to fire up multiple nodes quickly. The top 
of cloud.sh has extensive comments on its use. But setting it up independently 
would be interesting too, in case the script is actually misconfiguring 
something. The script does upload a solr.xml to zk too, which wouldn't have been 
done again when I upgraded, but I haven't thought of how that could be involved 
yet, since that didn't change.

> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0
>Reporter: Gus Heck
>Priority: Blocker
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd upgrademe/
> # cp ../8_6_1_RC1/solr-8.6.1.tgz .
> # mv solr-8.6.0-SNAPSHOT old
> # tar xzvf solr-8.6.1.tgz
> # cd 

[jira] [Commented] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170025#comment-17170025
 ] 

Gus Heck commented on SOLR-14706:
-

This has to do with the following bit of code from Clause:
{code}
if (globalTagName.isPresent()) {
  globalTag = parse(globalTagName.get(), m);
  // The legacy default clause {"cores":"#EQUAL","node":"#ANY","strict":"false"}
  // has three entries, so m.size() > 2 now rejects it where > 3 did not.
  if (m.size() > 2) {
    throw new RuntimeException("Only one extra tag supported for the tag "
        + globalTagName.get() + " in " + toJSONString(m));
  }
}
{code}

which was recently changed from > 3 to > 2 by [~houstonputman]. I am quite 
unfamiliar with this area of the code. Houston, can you take a look?

> Upgrading 8.6.0 to 8.6.1 causes collection creation to fail
> ---
>
> Key: SOLR-14706
> URL: https://issues.apache.org/jira/browse/SOLR-14706
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.1
> Environment: 8.6.1 upgraded from 8.6.0
>Reporter: Gus Heck
>Priority: Blocker
>
> The following steps will reproduce a situation in which collection creation 
> fails with this stack trace:
> {code:java}
> 2020-08-03 12:17:58.617 INFO  
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
> 2020-08-03 12:17:58.751 ERROR 
> (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   
> ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
> create failed:org.apache.solr.common.SolrException
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
>   at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Only one extra tag supported for the 
> tag cores in {
>   "cores":"#EQUAL",
>   "node":"#ANY",
>   "strict":"false"}
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
>   at 
> org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
>   at 
> org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
>   at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
>   at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
>   ... 6 more
> {code}
> Generalized steps:
> # Deploy 8.6.0 with separate data directories, create a collection to prove 
> it's working
> # download 
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
> # Stop the server on all nodes
> # replace the 8.6.0 with 8.6.1 
> # Start the server
> # via the admin UI create a collection
> # Observe failure warning box (with no text), check logs, find above trace
> Or more exactly here are my actual commands with a checkout of the 8.6.0 tag 
> in the working dir to which cloud.sh was configured:
> # /cloud.sh new -r upgrademe 
> # Create collection named test860 via admin ui with _default
> # ./cloud.sh stop 
> # cd upgrademe/
> # cp ../8_6_1_RC1/solr-8.6.1.tgz .
> # mv 

[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Description: 
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR 
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
create failed:org.apache.solr.common.SolrException
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
at 
org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
at 
org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag 
cores in {
  "cores":"#EQUAL",
  "node":"#ANY",
  "strict":"false"}
at 
org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
at 
org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
at 
org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
at 
org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
at 
org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
... 6 more

{code}

Generalized steps:
# Deploy 8.6.0 with separate data directories, create a collection to prove 
it's working
# download 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
# Stop the server on all nodes
# replace the 8.6.0 with 8.6.1 
# Start the server
# via the admin UI create a collection
# Observe failure warning box (with no text), check logs, find above trace

Or more exactly here are my actual commands with a checkout of the 8.6.0 tag in 
the working dir to which cloud.sh was configured:

# /cloud.sh new -r upgrademe 
# Create collection named test860 via admin ui with _default
# ./cloud.sh stop 
# cd upgrademe/
# cp ../8_6_1_RC1/solr-8.6.1.tgz .
# mv solr-8.6.0-SNAPSHOT old
# tar xzvf solr-8.6.1.tgz
# cd ..
# ./cloud.sh start
# Try to create collection test861 with _default config

For those not familiar with it, the first command there with cloud.sh builds the 
tarball in the working directory, then makes a directory named "upgrademe", 
copies the tarball into it, unpacks it, sets up a chroot based on the path in 
(already running, separate) zookeeper, and by default starts 4 local nodes on 
ports 8981 to 8984, all with separate data directories hosted under the 
"upgrademe" directory. 

  was:
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR 
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
create failed:org.apache.solr.common.SolrException
at 

[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Description: 
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR 
(OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] 
o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: 
create failed:org.apache.solr.common.SolrException
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
at 
org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
at 
org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag 
cores in {
  "cores":"#EQUAL",
  "node":"#ANY",
  "strict":"false"}
at 
org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.solr.client.solrj.cloud.autoscaling.Policy.(Policy.java:144)
at 
org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
at 
org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
at 
org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
at 
org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
at 
org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
... 6 more

{code}

Generalized steps:
# Deploy 8.6.0 with separate data directories, create a collection to prove 
it's working
# Download 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
# Stop the server on all nodes
# Replace the 8.6.0 install with 8.6.1 
# Start the server
# Via the admin UI, create a collection
# Observe the failure warning box (with no text), check the logs, find the above trace

Or, more exactly, here are my actual commands, run with a checkout of the 8.6.0 
tag in the working dir to which cloud.sh was configured:

# ./cloud.sh new -r upgrademe 
# Create collection named test860 via admin ui with _default
# ./cloud.sh stop 
# cd upgrademe/
# cp ../8_6_1_RC1/solr-8.6.1.tgz .
# mv solr-8.6.0-SNAPSHOT old
# tar xzvf solr-8.6.1.tgz
# cd ..
# ./cloud.sh start
# Try to create collection test861 with _default config

For those not familiar with it, the first command there with cloud.sh builds the 
tarball in the working directory, then makes a directory named "upgrademe", 
copies the tarball there and unpacks it, sets up a chroot based on the path in 
the (already running, separate) zookeeper, and by default starts 4 local nodes 
on ports 8981 to 8984, all with separate data directories hosted under the 
"upgrademe" directory. 

  was:
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException
    at

[jira] [Updated] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-14706:

Description: 
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in {
  "cores":"#EQUAL",
  "node":"#ANY",
  "strict":"false"}
    at org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
    at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
    at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
    at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
    at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
    at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
    ... 6 more

{code}

Generalized steps:
# Deploy 8.6.0 with separate data directories, create a collection to prove 
it's working
# Download 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
# Stop the server on all nodes
# Replace the 8.6.0 install with 8.6.1 
# Start the server
# Via the admin UI, create a collection
# Observe the failure warning box (with no text), check the logs, find the above trace

Or, more exactly, here are my actual commands, run with a checkout of the 8.6.0 
tag in the working dir to which cloud.sh was configured:

# ./cloud.sh new -r upgrademe 
# Create collection named test860 via admin ui with _default
# ./cloud.sh stop 
# cd upgrademe/
# cp ../8_6_1_RC1/solr-8.6.1.tgz .
# mv solr-8.6.0-SNAPSHOT old
# tar xzvf solr-8.6.1.tgz
# cd ..
# ./cloud.sh start

For those not familiar with it, the first command there with cloud.sh builds the 
tarball in the working directory, then makes a directory named "upgrademe", 
copies the tarball there and unpacks it, sets up a chroot based on the path in 
the (already running, separate) zookeeper, and by default starts 4 local nodes 
on ports 8981 to 8984, all with separate data directories hosted under the 
"upgrademe" directory. 

  was:
The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
    at

[jira] [Created] (SOLR-14706) Upgrading 8.6.0 to 8.6.1 causes collection creation to fail

2020-08-03 Thread Gus Heck (Jira)
Gus Heck created SOLR-14706:
---

 Summary: Upgrading 8.6.0 to 8.6.1 causes collection creation to 
fail
 Key: SOLR-14706
 URL: https://issues.apache.org/jira/browse/SOLR-14706
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Affects Versions: 8.6.1
 Environment: 8.6.1 upgraded from 8.6.0
Reporter: Gus Heck


The following steps will reproduce a situation in which collection creation 
fails with this stack trace:

{code:java}
2020-08-03 12:17:58.617 INFO  (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.CreateCollectionCmd Create collection test861
2020-08-03 12:17:58.751 ERROR (OverseerThreadFactory-22-thread-1-processing-n:192.168.2.106:8981_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test861 operation: create failed:org.apache.solr.common.SolrException
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:347)
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:517)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Only one extra tag supported for the tag cores in {
  "cores":"#EQUAL",
  "node":"#ANY",
  "strict":"false"}
    at org.apache.solr.client.solrj.cloud.autoscaling.Clause.<init>(Clause.java:122)
    at org.apache.solr.client.solrj.cloud.autoscaling.Clause.create(Clause.java:235)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.solr.client.solrj.cloud.autoscaling.Policy.<init>(Policy.java:144)
    at org.apache.solr.client.solrj.cloud.autoscaling.AutoScalingConfig.getPolicy(AutoScalingConfig.java:372)
    at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:300)
    at org.apache.solr.cloud.api.collections.Assign.usePolicyFramework(Assign.java:277)
    at org.apache.solr.cloud.api.collections.Assign$AssignStrategyFactory.create(Assign.java:661)
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:415)
    at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:192)
    ... 6 more

{code}

Generalized steps:
# Deploy 8.6.0 with separate data directories, create a collection to prove 
it's working
# Download 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
# Stop the server on all nodes
# Replace the 8.6.0 install with 8.6.1 
# Start the server
# Via the admin UI, create a collection
# Observe the failure warning box (with no text), check the logs, find the above trace

Or, more exactly, here are my actual commands, run with a checkout of the 8.6.0 
tag in the working dir to which cloud.sh was configured:

# ./cloud.sh new -r upgrademe 
# Create collection named test860 via admin ui with _default
# ./cloud.sh stop 
# cd upgrademe/
# cp ../8_6_1_RC1/solr-8.6.1.tgz .
# mv solr-8.6.0-SNAPSHOT old
# tar xzvf solr-8.6.1.tgz
# cd ..
# ./cloud.sh start

For those not familiar with it, the first command there with cloud.sh builds the 
tarball in the working directory, then makes a directory named "upgrademe", 
copies the tarball there and unpacks it, sets up a chroot based on the path in 
the (already running, separate) zookeeper, and by default starts 4 local nodes 
on ports 8981 to 8984, all with separate data directories hosted under the 
"upgrademe" directory. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-02 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169700#comment-17169700
 ] 

Gus Heck edited comment on SOLR-14704 at 8/3/20, 3:57 AM:
--

For example: 

{code:java}
./cloud.sh new -d 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
 8_6_1_RC1
{code}



was (Author: gus_heck):
For example: 

{code:java}
./cloud.sh new -t 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
 8_6_1_RC1
{code}


> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-02 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169700#comment-17169700
 ] 

Gus Heck commented on SOLR-14704:
-

For example: 

{code:java}
./cloud.sh new -t 
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC1-reva32a3ac4e43f629df71e5ae30a3330be94b095f2/solr/solr-8.6.1.tgz
 8_6_1_RC1
{code}


> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-08-02 Thread Gus Heck (Jira)
Gus Heck created SOLR-14704:
---

 Summary: Add download option to solr/cloud-dev/cloud.sh
 Key: SOLR-14704
 URL: https://issues.apache.org/jira/browse/SOLR-14704
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: scripts and tools
Reporter: Gus Heck
Assignee: Gus Heck


For easier testing of things like RC artifacts I'm adding an option to cloud.sh 
which will curl a tarball down from the web instead of building it locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-07-27 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved SOLR-13169.
-
Fix Version/s: 8.6
   Resolution: Fixed

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
>
> At a minimum, required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also, in v2 it seems shard might be inferred from the URL, and in that case 
> it's not clear whether the URL or the JSON takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.
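
Since this issue is about making the docs match the code, here is a hedged 
SolrJ sketch exercising just the parameters the V1 guide marks as required 
(collection, replica, targetNode). The helper method and the 
collection/replica/node names below are assumptions for illustration, not 
something specified in this issue:

{code:java}
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class MoveReplicaSketch {
  public static void main(String[] args) throws Exception {
    // Connect via ZooKeeper; host, collection, replica and node names are made up.
    try (CloudSolrClient client =
        new CloudSolrClient.Builder(List.of("localhost:2181"), Optional.empty()).build()) {
      // collection, replica and targetNode are the documented required parameters.
      CollectionAdminRequest.MoveReplica move =
          CollectionAdminRequest.moveReplica("test", "core_node6", "localhost:8984_solr");
      move.process(client); // issues the MOVEREPLICA collections API call
    }
  }
}
{code}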



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14608) Faster sorting for the /export handler

2020-07-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165712#comment-17165712
 ] 

Gus Heck edited comment on SOLR-14608 at 7/27/20, 1:41 PM:
---

A question from a customer caused me to re-read this and think a bit more 
deeply. I'm wondering about the fact that the priority queue has a limit on 
its size. This would seem to place a (hard to define) limit on the size of the 
segment, and perhaps fail silently by returning out-of-order docs? (The client 
case in question is a collection that is approaching half a trillion 
documents...)
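
To make the bounded-queue concern concrete, here is a minimal, self-contained 
sketch of the two-level sort described in this ticket (hypothetical types and 
plain integer sort values, not the actual /export handler code). Note how 
values 8 and 9 never appear in the merged output because the deliberately tiny 
per-segment queues drop them:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TwoLevelExportSortSketch {

  // Segment level: keep only the queueSize smallest values (a bounded priority
  // queue standing in for the segment-level sort over segment ordinals), then
  // hand them out in ascending order.
  static Iterator<Integer> segmentIterator(int[] values, int queueSize) {
    PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
    for (int v : values) {
      pq.add(v);
      if (pq.size() > queueSize) {
        pq.poll(); // evict the largest; it cannot make this batch
      }
    }
    List<Integer> batch = new ArrayList<>(pq);
    batch.sort(Comparator.naturalOrder());
    return batch.iterator();
  }

  // Top level: merge-sort the segment iterators by always emitting the
  // smallest head value (standing in for comparing top-level ordinals).
  static List<Integer> merge(List<Iterator<Integer>> segments) {
    PriorityQueue<int[]> heads = new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
    for (int i = 0; i < segments.size(); i++) {
      if (segments.get(i).hasNext()) {
        heads.add(new int[] {segments.get(i).next(), i});
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heads.isEmpty()) {
      int[] top = heads.poll();
      out.add(top[0]);
      Iterator<Integer> it = segments.get(top[1]);
      if (it.hasNext()) {
        heads.add(new int[] {it.next(), top[1]});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Iterator<Integer>> segments = new ArrayList<>();
    segments.add(segmentIterator(new int[] {9, 1, 5, 7}, 3));
    segments.add(segmentIterator(new int[] {2, 8, 4, 6}, 3));
    System.out.println(merge(segments)); // prints [1, 2, 4, 5, 6, 7]
  }
}
{code}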


was (Author: gus_heck):
A question from a customer caused me to re-read this and think a bit more 
deeply. I'm wondering about the fact that the priority queue has a limit on 
its size. This would seem to place a (hard to define) limit on the size of the 
segment, and perhaps fail silently by returning out-of-order docs? (The client 
case in question is a cluster that is approaching half a trillion documents...)

> Faster sorting for the /export handler
> --
>
> Key: SOLR-14608
> URL: https://issues.apache.org/jira/browse/SOLR-14608
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Andrzej Bialecki
>Priority: Major
>
> The largest cost of the export handler is the sorting. This ticket will 
> implement an improved algorithm for sorting that should greatly increase 
> overall throughput for the export handler.
> *The current algorithm is as follows:*
> Collect a bitset of matching docs. Iterate over that bitset and materialize 
> the top level ordinals for the sort fields in the document and add them to a 
> priority queue of size 30,000. Then export the top 30,000 docs, turn off the 
> bits in the bit set and iterate again until all docs are sorted and sent. 
> There are two performance bottlenecks with this approach:
> 1) Materializing the top level ordinals adds a huge amount of overhead to the 
> sorting process.
> 2) The size of the priority queue, 30,000, adds significant overhead to sorting 
> operations.
> *The new algorithm:*
> Has a top level *merge sort iterator* that wraps segment level iterators that 
> perform segment level priority queue sorts.
> *Segment level:*
> The segment level docset will be iterated and the segment level ordinals for 
> the sort fields will be materialized and added to a segment level priority 
> queue. As the segment level iterator pops docs from the priority queue the 
> top level ordinals for the sort fields are materialized. Because the top 
> level ordinals are materialized AFTER the sort, they only need to be looked 
> up when the segment level ordinal changes. This takes advantage of the sort 
> to limit the lookups into the top level ordinal structures. This also 
> eliminates redundant lookups of top level ordinals that occur during the 
> multiple passes over the matching docset.
> The segment level priority queues can be kept smaller than 30,000 to improve 
> performance of the sorting operations because the overall batch size will 
> still be 30,000 or greater when all the segment priority queue sizes are 
> added up. This allows for batch sizes much larger than 30,000 without using a 
> single large priority queue. The increased batch size means fewer iterations 
> over the matching docset and the decreased priority queue size means faster 
> sorting operations.
> *Top level:*
> A top level iterator does a merge sort over the segment level iterators by 
> comparing the top level ordinals materialized when the segment level docs are 
> popped from the segment level priority queues. This requires no extra memory 
> and will be very performant.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14608) Faster sorting for the /export handler

2020-07-27 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165712#comment-17165712
 ] 

Gus Heck commented on SOLR-14608:
-

A question from a customer caused me to re-read this and think a bit more 
deeply. I'm wondering about the fact that the priority queue has a limit on 
its size. This would seem to place a (hard to define) limit on the size of the 
segment, and perhaps fail silently by returning out-of-order docs? (The client 
case in question is a cluster that is approaching half a trillion documents...)

> Faster sorting for the /export handler
> --
>
> Key: SOLR-14608
> URL: https://issues.apache.org/jira/browse/SOLR-14608
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Andrzej Bialecki
>Priority: Major
>
> The largest cost of the export handler is the sorting. This ticket will 
> implement an improved algorithm for sorting that should greatly increase 
> overall throughput for the export handler.
> *The current algorithm is as follows:*
> Collect a bitset of matching docs. Iterate over that bitset and materialize 
> the top level ordinals for the sort fields in the document and add them to a 
> priority queue of size 30,000. Then export the top 30,000 docs, turn off the 
> bits in the bit set and iterate again until all docs are sorted and sent. 
> There are two performance bottlenecks with this approach:
> 1) Materializing the top level ordinals adds a huge amount of overhead to the 
> sorting process.
> 2) The size of the priority queue, 30,000, adds significant overhead to sorting 
> operations.
> *The new algorithm:*
> Has a top level *merge sort iterator* that wraps segment level iterators that 
> perform segment level priority queue sorts.
> *Segment level:*
> The segment level docset will be iterated and the segment level ordinals for 
> the sort fields will be materialized and added to a segment level priority 
> queue. As the segment level iterator pops docs from the priority queue the 
> top level ordinals for the sort fields are materialized. Because the top 
> level ordinals are materialized AFTER the sort, they only need to be looked 
> up when the segment level ordinal changes. This takes advantage of the sort 
> to limit the lookups into the top level ordinal structures. This also 
> eliminates redundant lookups of top level ordinals that occur during the 
> multiple passes over the matching docset.
> The segment level priority queues can be kept smaller than 30,000 to improve 
> performance of the sorting operations because the overall batch size will 
> still be 30,000 or greater when all the segment priority queue sizes are 
> added up. This allows for batch sizes much larger than 30,000 without using a 
> single large priority queue. The increased batch size means fewer iterations 
> over the matching docset and the decreased priority queue size means faster 
> sorting operations.
> *Top level:*
> A top level iterator does a merge sort over the segment level iterators by 
> comparing the top level ordinals materialized when the segment level docs are 
> popped from the segment level priority queues. This requires no extra memory 
> and will be very performant.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-07-26 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reassigned SOLR-13169:
---

Assignee: Gus Heck

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
>
> At a minimum, required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also, in v2 it seems shard might be inferred from the URL, and in that case 
> it's not clear whether the URL or the JSON takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12847) Cut over implementation of maxShardsPerNode to a collection policy

2020-07-20 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161571#comment-17161571
 ] 

Gus Heck edited comment on SOLR-12847 at 7/20/20, 10:05 PM:


FWIW, when I was investigating and fixing the MOVEREPLICA docs I found that 
maxShardsPerNode is advisory only once the collection is created and is not a 
hard limit. If a destination node is specified in ADDREPLICA it will force 
placement above the limit, and MOVEREPLICA always issues its add with a 
specified value for the node. Thus, MOVEREPLICA entirely ignores 
maxShardsPerNode. See SOLR-13169


was (Author: gus_heck):
FWIW, when I was investigating and fixing the MOVEREPLICA docs I found that 
maxShardsPerNode is advisory only once the collection is created and is not a 
hard limit. If a destination node is specified in ADDREPLICA it will force 
placement above the limit, and MOVEREPLICA always issues its add with a 
specified value for the node. Thus, MOVEREPLICA entirely ignores 
maxShardsPerNode.

> Cut over implementation of maxShardsPerNode to a collection policy
> --
>
> Key: SOLR-12847
> URL: https://issues.apache.org/jira/browse/SOLR-12847
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling, SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We've gone back and forth over handling maxShardsPerNode with autoscaling policies 
> (see SOLR-11005 for history). Now that we've reimplemented support for 
> creating collections with maxShardsPerNode when autoscaling policy is 
> enabled, we should re-look at how it is implemented.
> I propose that we fold maxShardsPerNode (if specified) into a collection level 
> policy that overrides the corresponding default in cluster policy (see 
> SOLR-12845). We'll need to ensure that if maxShardsPerNode is specified then 
> the user sees neither violations nor corresponding suggestions because of the 
> default cluster policy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-12847) Cut over implementation of maxShardsPerNode to a collection policy

2020-07-20 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161571#comment-17161571
 ] 

Gus Heck commented on SOLR-12847:
-

FWIW, when I was investigating and fixing the MOVEREPLICA docs I found that 
maxShardsPerNode is advisory only once the collection is created and is not a 
hard limit. If a destination node is specified in ADDREPLICA it will force 
placement above the limit, and MOVEREPLICA always issues its add with a 
specified value for the node. Thus, MOVEREPLICA entirely ignores 
maxShardsPerNode.

> Cut over implementation of maxShardsPerNode to a collection policy
> --
>
> Key: SOLR-12847
> URL: https://issues.apache.org/jira/browse/SOLR-12847
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling, SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We've gone back and forth over handling maxShardsPerNode with autoscaling policies 
> (see SOLR-11005 for history). Now that we've reimplemented support for 
> creating collections with maxShardsPerNode when autoscaling policy is 
> enabled, we should re-look at how it is implemented.
> I propose that we fold maxShardsPerNode (if specified) into a collection level 
> policy that overrides the corresponding default in cluster policy (see 
> SOLR-12845). We'll need to ensure that if maxShardsPerNode is specified then 
> the user sees neither violations nor corresponding suggestions because of the 
> default cluster policy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14597) Advanced Query Parser

2020-07-10 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155587#comment-17155587
 ] 

Gus Heck commented on SOLR-14597:
-

After some work I came up with this, which omits files that don't have "java" 
in their name, but should give a decent idea:
{code:java}
NS2-MacBook-Pro:lucene-solr-cdg3 gus$ git diff HEAD..master_head | grep 'diff ..git' | grep java | sed 's#b/#@#' | rev | cut -d'@' -f 1 | rev
gradle/generation/javacc.gradle
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilter.java
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DropIfFlaggedFilterFactory.java
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/PatternTypingFilter.java
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/PatternTypingFilterFactory.java
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilter.java
lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilterFactory.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/minhash/MinHashFilterTest.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestConcatenatingTokenStream.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestDropIfFlaggedFilter.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestDropIfFlaggedFilterFactory.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestPatternTypingFilter.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestPatternTypingFilterFactory.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestTypeAsSynonymFilter.java
lucene/analysis/common/src/test/org/apache/lucene/analysis/miscellaneous/TestTypeAsSynonymFilterFactory.java
lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java
lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
solr/core/src/java/org/apache/solr/analysis/TokenAnalyzerFilter.java
solr/core/src/java/org/apache/solr/analysis/TokenAnalyzerFilterFactory.java
solr/core/src/java/org/apache/solr/aqp/AdvToken.java
solr/core/src/java/org/apache/solr/aqp/AdvancedQueryParserBase.java
solr/core/src/java/org/apache/solr/aqp/ParseException.java
solr/core/src/java/org/apache/solr/aqp/QueryParser.java
solr/core/src/java/org/apache/solr/aqp/QueryParser.jj
solr/core/src/java/org/apache/solr/aqp/QueryParserConstants.java
solr/core/src/java/org/apache/solr/aqp/QueryParserTokenManager.java
solr/core/src/java/org/apache/solr/aqp/SpanContext.java
solr/core/src/java/org/apache/solr/aqp/Token.java
solr/core/src/java/org/apache/solr/aqp/TokenMgrError.java
solr/core/src/java/org/apache/solr/aqp/package-info.java
solr/core/src/java/org/apache/solr/parser/Operator.java
solr/core/src/java/org/apache/solr/parser/QueryParser.java
solr/core/src/java/org/apache/solr/parser/QueryParser.jj
solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java
solr/core/src/java/org/apache/solr/parser/SynonymQueryStyle.java
solr/core/src/java/org/apache/solr/schema/IndexSchema.java
solr/core/src/java/org/apache/solr/schema/TextField.java
solr/core/src/java/org/apache/solr/search/AdvancedQParser.java
solr/core/src/java/org/apache/solr/search/AdvancedQParserPlugin.java
solr/core/src/java/org/apache/solr/search/AdvancedQueryParser.java
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java
solr/core/src/java/org/apache/solr/search/DisMaxQParser.java
solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
solr/core/src/java/org/apache/solr/search/QParserPlugin.java
solr/core/src/java/org/apache/solr/search/QueryParsing.java
solr/core/src/java/org/apache/solr/search/SimpleQParserPlugin.java
solr/core/src/java/org/apache/solr/util/SolrPluginUtils.java
solr/core/src/test/org/apache/solr/analysis/PatternTypingFilterFactoryTest.java
solr/core/src/test/org/apache/solr/analysis/TokenAnalyzerFilterFactoryTest.java
solr/core/src/test/org/apache/solr/aqp/AbstractAqpTestCase.java
solr/core/src/test/org/apache/solr/aqp/CharacterRangeTest.java
solr/core/src/test/org/apache/solr/aqp/FieldedSearchTest.java
solr/core/src/test/org/apache/solr/aqp/LiteralPhraseTest.java
solr/core/src/test/org/apache/solr/aqp/MustNotTest.java
solr/core/src/test/org/apache/solr/aqp/MustTest.java
solr/core/src/test/org/apache/solr/aqp/NumericSearchTest.java
solr/core/src/test/org/apache/solr/aqp/OrderedDistanceGroupTest.java
solr/core/src/test/org/apache/solr/aqp/PhraseTest.java
solr/core/src/test/org/apache/solr/aqp/ShouldTest.java
solr/core/src/test/org/apache/solr/aqp/SimpleGroupTest.java
solr/core/src/test/org/apache/solr/aqp/SimpleQueryTest.java
solr/core/src/test/org/apache/solr/aqp/TemporalFieldedSearchTest.java

[jira] [Commented] (SOLR-14597) Advanced Query Parser

2020-07-02 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150438#comment-17150438
 ] 

Gus Heck commented on SOLR-14597:
-

This thought has occurred to me, but the coordination of several parts across 
both Lucene and Solr layers seems awkward for a package/plugin (a Solr parser, a 
couple of new Lucene filters, etc.), and I do hope that it is a generally useful 
parser as you mention. When we sort out the legalities and get a patch up this 
will become more clear, but generally it adds another javacc-based parser that 
was based on, and is able to reuse some bits of, the standard parser (a few of 
which needed to be extracted or made accessible). There are also a few small 
tweaks to core classes (which seem justified to me, but of course review and 
commentary is welcome). So even if a package/plugin is part of the final result 
we will likely have some changes to Solr & Lucene directly as well. 

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.6
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
> [https://cwiki.apache.org/confluence/display/SOLR/SIP-9+Advanced+Query+Parser]
> Briefly, this parser provides a comprehensive syntax for users that use 
> search on a daily basis. It also reserves a smaller set of punctuators than 
> other parsers. This facilitates easier handling of acronyms and punctuated 
> patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some 
> advanced features while also preventing access to arbitrary features via 
> local parameters. This parser will be safe for accepting user queries 
> directly with minimal pre-parsing, but for use cases beyond its established 
> features alternate query paths (using other parsers) will need to be supplied.
> The code drop is being prepared and will be supplied as soon as we receive 
> guidance from the PMC regarding the proper process. Given that the Library 
> already has a signed CCLA we need to understand which of these (or other 
> processes) apply:
> [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]
> and 
> [https://www.apache.org/licenses/contributor-agreements.html#grants]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14022) Deprecate CDCR from Solr in 8.x

2020-07-01 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149600#comment-17149600
 ] 

Gus Heck commented on SOLR-14022:
-

Do we want to think about whether it's a good idea to have yet another set of 
technologies/servers to deploy to make Solr work (fully)? Pulsar uses 
BookKeeper, and it looks like Pulsar gets deployed as its own cluster. So this 
would lead to a minimum of 4 Solrs, 3 ZooKeepers and 3 Pulsars for HA across 
regions. I've met clients who had trouble with the idea of separate ZooKeeper 
servers...

Another thing I'd like to note: a weakness of the existing CDCR is the use of 
collection names in config, which made it incompatible with routed aliases 
(where collections are added dynamically). A solution relying on external tools 
seems even less likely to be able to account for that. 

> Deprecate CDCR from Solr in 8.x
> ---
>
> Key: SOLR-14022
> URL: https://issues.apache.org/jira/browse/SOLR-14022
> Project: Solr
>  Issue Type: Improvement
>  Components: CDCR
>Reporter: Joel Bernstein
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.6
>
>
> This ticket will deprecate CDCR in Solr 8x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-13286) Move Metrics handler and any other noisy admin logging to debug

2020-06-26 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved SOLR-13286.
-
Fix Version/s: 8.6
   Resolution: Fixed

> Move Metrics handler and any other noisy admin logging to debug
> ---
>
> Key: SOLR-13286
> URL: https://issues.apache.org/jira/browse/SOLR-13286
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Minor
> Fix For: 8.6
>
> Attachments: SOLR-13286.patch, SOLR-13286.patch
>
>
> Lately when looking at log files I always find myself straining and squinting 
> to see things among a vast sea of metrics related logging. The problem 
> appears to be that the metrics system regularly issues /admin/ commands that 
> get logged at info by HttpSolrCall, so turning this down also means you can't 
> see any other admin commands, which is often what you're looking for in the 
> first place (ok what I'm often looking for at least :) ). I also recall 
> seeing a complaint about this on one of the lists at some point. 
> Attaching a patch to log these at an alternate level based on the value of the 
> handler field in HttpSolrCall. The patch is untested and meant as fodder for 
> commentary and for suggestions of other handlers that might want to go on the 
> "noisy" list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14597) Advanced Query Parser

2020-06-26 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reassigned SOLR-14597:
---

Assignee: Gus Heck

> Advanced Query Parser
> -
>
> Key: SOLR-14597
> URL: https://issues.apache.org/jira/browse/SOLR-14597
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.6
>Reporter: Mike Nibeck
>Assignee: Gus Heck
>Priority: Major
>
> This JIRA ticket tracks the progress of SIP-9, the Advanced Query Parser that 
> is being donated by the Library of Congress. Full description of the feature 
> can be found on the SIP Page.
>  
> Briefly, this parser provides a comprehensive syntax for users that use 
> search on a daily basis. It also reserves a smaller set of punctuators than 
> other parsers. This facilitates easier handling of acronyms and punctuated 
> patterns with meaning ( such as C++ or 401(k) ). The new syntax opens up some 
> advanced features while also preventing access to arbitrary features via 
> local parameters. This parser will be safe for accepting user queries 
> directly with minimal pre-parsing, but for use cases beyond its established 
> features alternate query paths (using other parsers) will need to be supplied.
> The code drop is being prepared and will be supplied as soon as we receive 
> guidance from the PMC regarding the proper process. Given that the Library 
> already has a signed CCLA we need to understand which of these (or other 
> processes) apply:
> [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]
> and 
> [https://www.apache.org/licenses/contributor-agreements.html#grants]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14588) Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker

2020-06-25 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145873#comment-17145873
 ] 

Gus Heck commented on SOLR-14588:
-

[http://fucit.org/solr-jenkins-reports/failure-report.html] - failing 100% 
since this commit, I think.

> Circuit Breakers Infrastructure and Real JVM Based Circuit Breaker
> --
>
> Key: SOLR-14588
> URL: https://issues.apache.org/jira/browse/SOLR-14588
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public) 
>Reporter: Atri Sharma
>Assignee: Atri Sharma
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This Jira tracks the addition of circuit breakers in the search path and 
> implements a JVM based circuit breaker which rejects incoming search requests 
> if the JVM heap usage exceeds a defined percentage.
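
A minimal sketch of the heap-based rejection idea described above (a 
hypothetical class, not the actual Solr CircuitBreaker API): compute current 
heap usage as a percentage of the maximum and trip above a configured 
threshold:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCircuitBreakerSketch {
  private final double thresholdPct; // e.g. 95.0 means trip above 95% heap usage
  private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

  public HeapCircuitBreakerSketch(double thresholdPct) {
    this.thresholdPct = thresholdPct;
  }

  // Called on the search path: true means reject the incoming request.
  public boolean shouldReject() {
    MemoryUsage heap = memory.getHeapMemoryUsage();
    long max = heap.getMax(); // can be -1 if the max is undefined
    if (max <= 0) {
      return false; // no defined ceiling, nothing to trip against
    }
    double usedPct = 100.0 * heap.getUsed() / max;
    return usedPct > thresholdPct;
  }

  public static void main(String[] args) {
    HeapCircuitBreakerSketch breaker = new HeapCircuitBreakerSketch(95.0);
    System.out.println("reject? " + breaker.shouldReject());
  }
}
{code}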



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9411) Fail compilation on warnings

2020-06-22 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142108#comment-17142108
 ] 

Gus Heck commented on LUCENE-9411:
--

There seem to be a number of Apache projects that have found (or ignored) the 
answer to this dilemma.

https://issues.apache.org/jira/issues/?jql=text%20~%20%22spotbugs%22

 Zookeeper is Apache; we should probably ask them about it. My guess is that 
since it is LGPL not GPL this applies... 

https://www.apache.org/legal/resolved.html#build-tools 

This and the above search imply there is (or should be) an exception for this 
tool already approved.

> Fail compilation on warnings
> ---
>
> Key: LUCENE-9411
> URL: https://issues.apache.org/jira/browse/LUCENE-9411
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Labels: build
> Attachments: LUCENE-9411.patch, LUCENE-9411.patch, LUCENE-9411.patch, 
> annotations-warnings.patch
>
>
> Moving this over here from SOLR-11973 since it's part of the build system and 
> affects Lucene as well as Solr. You might want to see the discussion there.
> We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, 
> try, etc. warnings. There are some peculiar warnings (things like 
> SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's 
> assume those are not a problem. Now I'd like to start failing the compilation 
> if people write new code that generates warnings.
> From what I can tell, just adding the flag is easy in both the Gradle and Ant 
> builds. I still have to prove out that adding -Werror does what I expect, 
> i.e. succeeds now and fails when I introduce warnings.
> But let's assume that works. Are there objections to this idea generally? I 
> hope to have some data by next Monday.
> FWIW, the Lucene code base had far fewer issues than Solr, but 
> common-build.xml is in Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9411) Fail compilation on warnings

2020-06-21 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141571#comment-17141571
 ] 

Gus Heck commented on LUCENE-9411:
--

Do we have other category X 
([https://www.apache.org/legal/resolved.html#category-x]) libraries that we 
have used at compile time only? It seems (to me, IANAL, etc.) like licensing is 
only an issue for what we distribute, but distributing a build that 
automatically downloads it might be a grey area, especially if it's not 
associated with an optional feature, which is the case clearly outlined in the 
license/legal page.

 

> Fail compilation on warnings
> ---
>
> Key: LUCENE-9411
> URL: https://issues.apache.org/jira/browse/LUCENE-9411
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Labels: build
> Attachments: LUCENE-9411.patch, LUCENE-9411.patch, LUCENE-9411.patch, 
> annotations-warnings.patch
>
>
> Moving this over here from SOLR-11973 since it's part of the build system and 
> affects Lucene as well as Solr. You might want to see the discussion there.
> We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, 
> try, etc. warnings. There are some peculiar warnings (things like 
> SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's 
> assume those are not a problem. Now I'd like to start failing the compilation 
> if people write new code that generates warnings.
> From what I can tell, just adding the flag is easy in both the Gradle and Ant 
> builds. I still have to prove out that adding -Werror does what I expect, 
> i.e. succeeds now and fails when I introduce warnings.
> But let's assume that works. Are there objections to this idea generally? I 
> hope to have some data by next Monday.
> FWIW, the Lucene code base had far fewer issues than Solr, but 
> common-build.xml is in Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-06-16 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136761#comment-17136761
 ] 

Gus Heck commented on SOLR-13749:
-

8.6 is now [being 
scheduled|https://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/browser],
 so it's probably important to get any last documentation or touch-up for this 
so it can be merged and included in the release. 

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It 
> can do a call out to a remote collection to get a set of join keys to be used 
> as a filter against the local collection.
> The second one is the Hash Range query parser, where you can specify a field 
> name and a hash range; the result is that only the documents that would have 
> hashed to that range will be returned.
> This query parser will do an intersection based on join keys between 2 
> collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is setup with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values ( required )|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster 
> will be used. ( optional )|
> |solrUrl|Optional|The URL of the external Solr node to be queried ( optional 
> )|
> |from|Required|The join key field name in the external collection ( required 
> )|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered 
> valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate 
> results.  
> After the ttl period has expired, the XCJF query will re-execute the join 
> against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local 
> param.|
>  
> Example solrconfig.xml changes:
>  
> {code:xml}
> <cache name="hash_vin"
>        class="solr.LRUCache"
>        size="128"
>        initialSize="0"
>        regenerator="solr.NoOpRegenerator"/>
> {code}
>   
>  
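> As a usage sketch (hypothetical: the parser names {{xcjf}} and {{hash_range}}, 
> and the collection/field names below, are assumptions for illustration rather 
> than a shipped configuration; URL-encode the local-params when issuing these):
> {code}
> # Filter the local "products" collection by join keys found in the remote
> # "parts" collection; the remote query is supplied via the v parameter
> # using substitution, as recommended above.
> http://localhost:8983/solr/products/select?q=*:*&fq={!xcjf collection=parts from=part_id to=part_id v=$jq}&jq=status:available
> 
> # Match only documents whose hash of part_id falls within a given range.
> http://localhost:8983/solr/products/select?q={!hash_range f=part_id l=0 u=32767}
> {code}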

[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134939#comment-17134939
 ] 

Gus Heck commented on SOLR-13169:
-

Corrections from another read-through, and documentation for other parameters. 
Choosing not to document `waitForFinalState` at this time because it's unclear 
what value it has. This command already waits for the completion of the add 
command, and causing the add command to wait/block on its own doesn't seem 
useful (alternately, my understanding of that parameter is flawed and I 
shouldn't write it into the docs). Opened SOLR-14568, which may change the docs 
for timeout slightly. This turned into a lot more than originally anticipated, 
so I'm attaching a patch summarizing changes to the ref guide in case that 
helps folks look over what I've done. Given no objections, I'll port whatever 
applies down to 8.x next weekend (and fix any objections that do come up). 

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
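> For comparison, a V2 request built from the introspect output above might look 
> like the following (a sketch only; the collection and node names are invented, 
> and whether the shard can instead be inferred from the URL is exactly the open 
> question here):
> {code}
> curl -X POST http://localhost:8983/api/collections/test \
>   -H 'Content-Type: application/json' \
>   -d '{"move-replica": {"replica": "core_node6", "targetNode": "192.168.2.171:8983_solr"}}'
> {code}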
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-13 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-13169:

Attachment: SOLR-13169.patch

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: SOLR-13169.patch, screenshot-1.png, testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14568) org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses hard coded timeout

2020-06-13 Thread Gus Heck (Jira)
Gus Heck created SOLR-14568:
---

 Summary: org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica uses 
hard coded timeout
 Key: SOLR-14568
 URL: https://issues.apache.org/jira/browse/SOLR-14568
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: master (9.0)
Reporter: Gus Heck


org.apache.solr.cloud.MoveReplicaCmd#moveHdfsReplica gained a hardcoded timeout 
in SOLR-11045, but there is no clear reason discussed in that ticket and no 
comment in the code to indicate why it ignores the value of the timeout 
parameter already passed into that method. 

This should be clarified in code and documented ([~caomanhdat]?), or the 
timeout parameter should be supported. It sure seems like we should support the 
API parameter, but from the pattern of commits this looks potentially 
intentional and has survived several revisions, so I hesitate to just change it 
without input/confirmation. If this can be clarified soon, I'll document the 
result in SOLR-13169; otherwise I'll just document the state as it is, and the 
docs can be updated if there are changes resulting from this ticket.
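
For illustration, supporting the parameter could be as small as threading it 
through to the wait (a simplified sketch, not the actual MoveReplicaCmd code; 
the class and method names here are invented):
{code:java}
// Simplified illustration only -- not the actual Solr implementation.
class MoveReplicaSketch {

  // Stand-in for polling until the moved replica reports active.
  void waitUntilActive(int seconds) throws InterruptedException {
    Thread.sleep(seconds * 1000L); // placeholder for a real state poll
  }

  void moveHdfsReplica(int timeoutSeconds) throws InterruptedException {
    // Before (observed): a hardcoded value that ignores the parameter.
    // waitUntilActive(600);

    // Proposed: honor the timeout already passed into this method.
    waitUntilActive(timeoutSeconds);
  }
}
{code}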



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Reopened] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice

2020-06-13 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck reopened SOLR-14417:
-

> Gradle build sometimes fails RE BlockPoolSlice
> --
>
> Key: SOLR-14417
> URL: https://issues.apache.org/jira/browse/SOLR-14417
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: David Smiley
>Priority: Minor
>
> There seems to be some package visibility hacks around our Hdfs integration:
> {{/Users/dsmiley/SearchDev/lucene-solr/solr/core/src/test/org/apache/solr/cloud/hdfs/HdfsTestUtil.java:125:
>  error: BlockPoolSlice is not public in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl; cannot be accessed 
> from outside package}}
> {{List<Class<?>> modifiedHadoopClasses = Arrays.asList(BlockPoolSlice.class, 
> DiskChecker.class,}}
> This happens on my Gradle build when running {{gradlew testClasses}} (i.e. to 
> compile tests) but Ant proceeded without issue.  The work-around is to run 
> {{gradlew clean}} first but really I want our build to be smarter here.
> CC [~krisden]
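> A minimal reproduction of the failure and the workaround described above:
> {code}
> # compiling tests sometimes fails with the BlockPoolSlice visibility error...
> gradlew testClasses
> # ...but succeeds after a clean
> gradlew clean testClasses
> {code}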



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14417) Gradle build sometimes fails RE BlockPoolSlice

2020-06-13 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134845#comment-17134845
 ] 

Gus Heck commented on SOLR-14417:
-

I just hit this when running a test via IntelliJ (which is using Gradle). My 
IDE tells me we have our own version of this class that is public, but when I 
search classes in IntelliJ, it shows me that it can find both our version and a 
version of the class in hadoop-hdfs-3.2.0.jar, the latter of which is not 
public. This appears to be a classpath ordering inconsistency...

> Gradle build sometimes fails RE BlockPoolSlice
> --
>
> Key: SOLR-14417
> URL: https://issues.apache.org/jira/browse/SOLR-14417
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: David Smiley
>Priority: Minor
>
> There seems to be some package visibility hacks around our Hdfs integration:
> {{/Users/dsmiley/SearchDev/lucene-solr/solr/core/src/test/org/apache/solr/cloud/hdfs/HdfsTestUtil.java:125:
>  error: BlockPoolSlice is not public in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl; cannot be accessed 
> from outside package}}
> {{List<Class<?>> modifiedHadoopClasses = Arrays.asList(BlockPoolSlice.class, 
> DiskChecker.class,}}
> This happens on my Gradle build when running {{gradlew testClasses}} (i.e. to 
> compile tests) but Ant proceeded without issue.  The work-around is to run 
> {{gradlew clean}} first but really I want our build to be smarter here.
> CC [~krisden]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-06 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127368#comment-17127368
 ] 

Gus Heck commented on SOLR-13169:
-

Hmm, maybe we should document this tidbit:
{code:java}
if (createNodeList != null) { // Overrides petty considerations about maxShardsPerNode
{code}
It does indeed seem to be the case that MOVEREPLICA can be used to violate 
maxShardsPerNode. This happens in the add replica code that move replica 
invokes, so while this was intentional there, it's not clear whether it's a bug 
with respect to move replica...

I created a collection with maxShardsPerNode=1, and moved a replica to one of 
the other nodes successfully:


 !screenshot-1.png!
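
A reproduction sketch of that sequence (node addresses and the collection name 
are placeholders):
{code}
# create a collection limited to one shard per node
http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1

# move a replica onto a node that already hosts one -- observed to succeed anyway
http://localhost:8983/solr/admin/collections?action=MOVEREPLICA&collection=test&shard=shard1&targetNode=192.168.2.171:8983_solr
{code}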

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: screenshot-1.png, testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-06 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-13169:

Attachment: screenshot-1.png

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: screenshot-1.png, testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-06 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127217#comment-17127217
 ] 

Gus Heck commented on SOLR-13169:
-

Looks like sourceNode is also ignored if replica is supplied; the replica to 
move is chosen by this code:
{code:java}
Collections.shuffle(sliceReplicas, OverseerCollectionMessageHandler.RANDOM);
replica = sliceReplicas.iterator().next();
{code}
Neither {{CollectionOperation.MOVEREPLICA_OP}} nor 
{{ModifyCollectionCommand#moveReplica}} appears to have code consulting 
auto-scaling, but I'm still trying to sort out whether or not the eventual sub 
call to {{AddReplicaCmd#addReplica}} can be influenced by auto-scaling in some 
way. If so, I think we'd have a bug in the current design, though it would also 
seem that the destination node could have been optional (with that usage 
meaning "find the optimal place for this replica and make it so").

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-05 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127112#comment-17127112
 ] 

Gus Heck commented on SOLR-13169:
-

Still to do: 
# document additional parameters
# validate the extent to which auto-scaling is involved... with targetNode 
being required, I am skeptical that auto-scaling is involved

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-05 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127069#comment-17127069
 ] 

Gus Heck commented on SOLR-13169:
-

Test log showing that only one of replica or shard is required, and that 
replica has priority. In cases where only shard is supplied and the replica is 
ambiguous (more than one replica of that shard on the node), the command 
chooses one, but the criteria of that choice are not yet clear.

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-05 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck updated SOLR-13169:

Attachment: testing.txt

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
> Attachments: testing.txt
>
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-05 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127026#comment-17127026
 ] 

Gus Heck commented on SOLR-13169:
-

And I can move it back again without the shard param like so:
{code}
http://localhost:8983/solr/admin/collections?action=MOVEREPLICA&collection=test&targetNode=192.168.2.171:8983_solr&sourceNode=192.168.2.171:8982_solr&replica=core_node6

{
"responseHeader": {
"status": 0,
"QTime": 3668
},
"success": "MOVEREPLICA action completed successfully, moved 
replica=test_shard1_replica_n5 at node=192.168.2.171:8982_solr to 
replica=test_shard1_replica_n7 at node=192.168.2.171:8983_solr"
}
{code}

> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13169) Move Replica Docs need improvement (V1 and V2 introspect)

2020-06-05 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127023#comment-17127023
 ] 

Gus Heck commented on SOLR-13169:
-

Bumping into this again... This (without replica param) succeeded, so replica 
is not always required:

{code}http://localhost:8983/solr/admin/collections?action=MOVEREPLICA&collection=test&targetNode=192.168.2.171:8982_solr&sourceNode=192.168.2.171:8983_solr&shard=shard1

{
"responseHeader": {
"status": 0,
"QTime": 5060
},
"success": "MOVEREPLICA action completed successfully, moved 
replica=test_shard1_replica_n1 at node=192.168.2.171:8983_solr to 
replica=test_shard1_replica_n5 at node=192.168.2.171:8982_solr"
}
{code}


> Move Replica Docs need improvement (V1 and V2 introspect)
> -
>
> Key: SOLR-13169
> URL: https://issues.apache.org/jira/browse/SOLR-13169
> Project: Solr
>  Issue Type: Improvement
>  Components: v2 API
>Reporter: Gus Heck
>Priority: Major
>
> At a minimum required parameters should be noted equally in both places. 
> Conversation with [~ab] indicates that there are also some discrepancies in 
> what is and is not actually required in docs vs code. ("in MoveReplicaCmd if 
> you specify “replica” then “shard” is completely ignored")
> Also in v2 it seems shard might be inferred from the URL and in that case 
> it's not clear if the URL or the json takes precedence.
> From introspect:
> {code:java}
> "move-replica": {
> "type": "object",
> "documentation": 
> "https://lucene.apache.org/solr/guide/collections-api.html#movereplica;,
> "description": "This command moves a replica from one 
> node to a new node. In case of shared filesystems the `dataDir` and `ulogDir` 
> may be reused.",
> "properties": {
> "replica": {
> "type": "string",
> "description": "The name of the replica"
> },
> "shard": {
> "type": "string",
> "description": "The name of the shard"
> },
> "sourceNode": {
> "type": "string",
> "description": "The name of the node that 
> contains the replica."
> },
> "targetNode": {
> "type": "string",
> "description": "The name of the destination node. 
> This parameter is required."
> },
> "waitForFinalState": {
> "type": "boolean",
> "default": "false",
> "description": "Wait for the moved replica to 
> become active."
> },
> "timeout": {
> "type": "integer",
> "default": 600,
> "description": "Timeout to wait for replica to 
> become active. For very large replicas this may need to be increased."
> },
> "inPlaceMove": {
> "type": "boolean",
> "default": "true",
> "description": "For replicas that use shared 
> filesystems allow 'in-place' move that reuses shared data."
> }
> {code}
> From ref guide for V1:
> MOVEREPLICA Parameters
> collection
> The name of the collection. This parameter is required.
> shard
> The name of the shard that the replica belongs to. This parameter is required.
> replica
> The name of the replica. This parameter is required.
> sourceNode
> The name of the node that contains the replica. This parameter is required.
> targetNode
> The name of the destination node. This parameter is required.
> async
> Request ID to track this action which will be processed asynchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13458) Make Jetty timeouts configurable system wide

2020-06-04 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125908#comment-17125908
 ] 

Gus Heck commented on SOLR-13458:
-

Not at the time; that particular work was abandoned by the customer soon after 
I wrote this, and I didn't dig further. But I'm once again bumping into 
timeouts (in a cluster with many billions of docs), so I may soon.

> Make Jetty timeouts configurable system wide
> 
>
> Key: SOLR-13458
> URL: https://issues.apache.org/jira/browse/SOLR-13458
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Priority: Major
>
> Our jetty container has several timeouts associated with it, and at least one 
> of these is regularly getting in my way (the idle timeout after 120 sec). I 
> tried setting a system property with no effect, and I've tried altering a 
> jetty.xml found at solr-install/solr/server/etc/jetty.xml on all (50) 
> machines and rebooting all servers only to have an exception with the old 120 
> sec timeout still show up. This ticket proposes that these values are by 
> nature "Global System Timeouts" and should be made configurable in solr.xml 
> (which may be difficult because they will be needed early in the boot 
> sequence). 
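> For reference, the 120 sec idle timeout corresponds to a Jetty connector 
> setting; a sketch of the kind of jetty.xml fragment involved (the element and 
> property names here are assumptions about the shipped config, not verified):
> {code:xml}
> <!-- connector idle timeout, overridable via a system property -->
> <New class="org.eclipse.jetty.server.ServerConnector">
>   <Arg name="server"><Ref refid="Server"/></Arg>
>   <Set name="idleTimeout">
>     <Property name="solr.jetty.http.idleTimeout" default="120000"/>
>   </Set>
> </New>
> {code}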



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2020-05-21 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113517#comment-17113517
 ] 

Gus Heck edited comment on SOLR-13749 at 5/21/20, 8:26 PM:
---

Let me clarify the above... some of it is forward looking in the event that the 
NPE I mentioned above gets changed, or some aspect of when we do/don't 
encode/decode URL's gets changed, etc... or in the event that there are 
parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too 
ubiquitous, and it initiates the connection with a path string of arbitrary 
size... the ZK protocol is only relevant to ZK servers and there is no way 
(that I know of) to make the initial zk connection send a lot of data.


was (Author: gus_heck):
Let me clarify the above... some of it is forward looking in the even that the 
NPE I mentioned above gets changed, or some aspect of when we do/don't 
encode/decode URL's gets changed, etc... or in the event that there are 
parameter hacking/hiding/encoding tricks I didn't think of... HTTP is just too 
ubiquitous, and it initiates the connection with a path string of arbitrary 
size... the ZK protocol is only relevant to ZK servers and there is no way 
(that I know of) to make the initial zk connection send a lot of data.

> Implement support for joining across collections with multiple shards ( XCJF )
> --
>
> Key: SOLR-13749
> URL: https://issues.apache.org/jira/browse/SOLR-13749
> Project: Solr
>  Issue Type: New Feature
>Reporter: Kevin Watters
>Assignee: Gus Heck
>Priority: Blocker
> Fix For: 8.6
>
> Attachments: 2020-03 Smiley with ASF hat.jpeg
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This ticket includes 2 query parsers.
> The first one is the "Cross collection join filter"  (XCJF) parser. This is 
> the "Cross-collection join filter" query parser. It can do a call out to a 
> remote collection to get a set of join keys to be used as a filter against 
> the local collection.
> The second one is the Hash Range query parser: you specify a field name and a 
> hash range, and only the documents that would have hashed to that range are 
> returned.
> The XCJF query parser computes an intersection based on join keys between two 
> collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is set up with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
>  
>  
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The Lucene query that executes a search to get back a set of join 
> keys from a remote collection.|
> |HashRangeQuery|The Lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
>  
>  
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to Zookeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither zkHost nor solrUrl is specified, the local Zookeeper cluster 
> will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> 
