[jira] [Commented] (LUCENE-8183) HyphenationCompoundWordTokenFilter creates overlapping tokens with onlyLongestMatch enabled

2021-01-13 Thread Martin Demberger (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264648#comment-17264648
 ] 

Martin Demberger commented on LUCENE-8183:
--

Is there any update on this? I have updated the PR so it conforms to the new 
checks.

> HyphenationCompoundWordTokenFilter creates overlapping tokens with 
> onlyLongestMatch enabled
> ---
>
> Key: LUCENE-8183
> URL: https://issues.apache.org/jira/browse/LUCENE-8183
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.6
> Environment: Configuration of the analyzer:
> 
> <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
>          hyphenator="lang/hyph_de_DR.xml" encoding="iso-8859-1"
>          dictionary="lang/wordlist_de.txt"
>         onlyLongestMatch="true"/>
>  
>Reporter: Rupert Westenthaler
>Assignee: Uwe Schindler
>Priority: Major
> Attachments: LUCENE-8183_20180223_rwesten.diff, 
> LUCENE-8183_20180227_rwesten.diff, lucene-8183.zip
>
>
> The HyphenationCompoundWordTokenFilter creates overlapping tokens even if 
> onlyLongestMatch is enabled. 
> Example:
> Dictionary: {{gesellschaft}}, {{schaft}}
>  Hyphenator: {{de_DR.xml}} // from Apache OFFO
>  onlyLongestMatch: true
>  
> |text|gesellschaft|gesellschaft|schaft|
> |raw_bytes|[67 65 73 65 6c 6c 73 63 68 61 66 74]|[67 65 73 65 6c 6c 73 63 68 61 66 74]|[73 63 68 61 66 74]|
> |start|0|0|0|
> |end|12|12|12|
> |positionLength|1|1|1|
> |type|word|word|word|
> |position|1|1|1|
> IMHO this includes 2 unexpected tokens:
>  # the 2nd 'gesellschaft', as it duplicates the original token
>  # 'schaft', as it is a sub-token of 'gesellschaft' that is present in the 
> dictionary
>  
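
To make the reported expectation concrete, here is a small plain-Python sketch of the de-duplication the reporter describes. This is not Lucene's implementation; the helper `expected_tokens` and its signature are invented for illustration of the rule "drop duplicates of the original token, and drop sub-tokens subsumed by a longer dictionary match":

```python
def expected_tokens(token, dictionary, candidates):
    """Return the sub-tokens the reporter expects onlyLongestMatch to keep.

    Drops candidates that duplicate the original token, and candidates that
    are contained in a longer retained word (including the token itself when
    the token is a dictionary entry).
    """
    kept = sorted({c for c in candidates if c != token}, key=len, reverse=True)
    covering = [token] if token in dictionary else []
    retained = []
    for cand in kept:
        if any(cand in longer for longer in covering + retained):
            continue  # subsumed by a longer match -> suppress
        retained.append(cand)
    return retained

# The report's example: dictionary {gesellschaft, schaft}; the filter emits
# "gesellschaft" twice plus "schaft", but only the original token should remain.
print(expected_tokens("gesellschaft", {"gesellschaft", "schaft"},
                      ["gesellschaft", "gesellschaft", "schaft"]))  # -> []
```

Under this rule, the two tokens the reporter flags as unexpected are both suppressed, while genuine compound parts (e.g. "fuss" and "ball" inside "fussball") survive.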



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zacharymorn commented on pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2021-01-13 Thread GitBox


zacharymorn commented on pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141#issuecomment-759967208


   > @zacharymorn I merged your PR because it was good progress already, but 
I'm also +1 on your idea of replacing MinShouldMatchSumScorer with WANDScorer 
since they share very similar logic. Let's give it a try in a follow-up PR?
   
   hi @jpountz, sorry for the delay and thanks for merging this PR! I've 
created a new jira issue https://issues.apache.org/jira/browse/LUCENE-9668 to 
follow up on this, and will work on it and create a new PR in the next few days.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9668) Deprecate MinShouldMatchSumScorer with WANDScorer

2021-01-13 Thread Zach Chen (Jira)
Zach Chen created LUCENE-9668:
-

 Summary: Deprecate MinShouldMatchSumScorer with WANDScorer
 Key: LUCENE-9668
 URL: https://issues.apache.org/jira/browse/LUCENE-9668
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/query/scoring
Reporter: Zach Chen


This is a follow-up issue of https://issues.apache.org/jira/browse/LUCENE-9346, 
where support for minShouldMatch was added to WANDScorer. Given how similar the 
two scorers are, we would like to see whether MinShouldMatchSumScorer can be 
deprecated entirely in favor of WANDScorer.

For context, some initial discussion of this during the previous work is 
available at 
https://github.com/apache/lucene-solr/pull/2141#discussion_r550806711
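
For readers unfamiliar with the two scorers, here is a toy sketch of what they compute; it is not Lucene code, and `min_should_match_scores` is an invented helper. It sums the scores of the matching optional clauses per document and keeps only documents where enough clauses matched; WANDScorer produces the same matches while additionally pruning by per-clause score upper bounds:

```python
from collections import defaultdict

def min_should_match_scores(postings, min_should_match):
    """postings: clause name -> {doc_id: clause score}."""
    hits = defaultdict(int)
    totals = defaultdict(float)
    for scores in postings.values():
        for doc, score in scores.items():
            hits[doc] += 1
            totals[doc] += score
    # Keep only docs where at least min_should_match optional clauses matched.
    return {doc: totals[doc] for doc, n in hits.items() if n >= min_should_match}

postings = {"lg": {1: 1.0, 2: 0.5}, "cx": {1: 2.0}, "oled": {2: 1.0, 3: 1.0}}
print(min_should_match_scores(postings, 2))  # doc 3 matched only one clause
```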



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-13 Thread Florin Babes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264619#comment-17264619
 ] 

Florin Babes edited comment on SOLR-15071 at 1/14/21, 6:10 AM:
---

Hello [~cpoerschke]. I've created a test to reproduce the issue here 
[https://github.com/apache/lucene-solr/pull/2201] .


was (Author: florin.babes):
Hello [~cpoerschke]. I've created a test to reproduce the issue here 
[https://github.com/apache/lucene-solr/pull/2201.] 

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production with a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern in the queries that fail.
> Example:
> We have the following raw query parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_field_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at
> 

[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-13 Thread Florin Babes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264619#comment-17264619
 ] 

Florin Babes commented on SOLR-15071:
-

Hello [~cpoerschke]. I've created a test to reproduce the issue here 
[https://github.com/apache/lucene-solr/pull/2201.] 

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production with a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern in the queries that fail.
> Example:
> We have the following raw query parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_field_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at
> 
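
The mm=2<75% parameter in the report controls how many of the query's optional clauses must match. As a hedged illustration (this is not Solr's parser; `effective_mm` is an invented helper covering only the single-condition form used here), the spec reads "if there are more than 2 clauses, require 75% of them, rounded down":

```python
def effective_mm(spec, num_clauses):
    """Toy interpretation of a single-condition mm spec like "2<75%".

    Note: Solr's real parser also supports multiple space-separated
    conditions and negative values; this sketch covers only this form.
    """
    threshold, value = spec.split("<")
    if num_clauses <= int(threshold):
        return num_clauses  # at or below the threshold: all clauses required
    if value.endswith("%"):
        return num_clauses * int(value[:-1]) // 100  # percentage, rounded down
    return int(value)

# "lg cx 4k oled 120 hz" has 6 clauses; 6 > 2, so 75% of 6 -> 4 must match.
print(effective_mm("2<75%", 6))  # -> 4
```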

[GitHub] [lucene-solr] holysleeper opened a new pull request #2201: add test case for SOLR-15071

2021-01-13 Thread GitBox


holysleeper opened a new pull request #2201:
URL: https://github.com/apache/lucene-solr/pull/2201


   
   
   
   # Description
   
   Add test for reproducing the issue SOLR-15071
   # Tests
   
   Add test for reproducing the issue SOLR-15071
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14155) Load all other SolrCore plugins from packages

2021-01-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264617#comment-17264617
 ] 

ASF subversion and git services commented on SOLR-14155:


Commit cdf4e31d86d0f5f70aa728d0e5b41ccfd786ca00 in lucene-solr's branch 
refs/heads/branch_8x from noblepaul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cdf4e31 ]

SOLR-14155: Load all other SolrCore plugins from packages


> Load all other SolrCore plugins from packages
> -
>
> Key: SOLR-14155
> URL: https://issues.apache.org/jira/browse/SOLR-14155
> Project: Solr
>  Issue Type: Sub-task
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> A few plugins configurable in {{solrconfig.xml}} still cannot be loaded from 
> packages:
>  # SolrEventListener (improperly implemented)
>  # DirectoryFactory
>  # UpdateLog
>  # Cache
>  # RecoveryStrategy
>  # IndexReaderFactory
>  # CodecFactory
>  # StatsCache
> #1 can be hot reloaded; the others should result in reloading the core. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14765) optimize DocList creation by skipping sort for sort-irrelevant cases

2021-01-13 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264604#comment-17264604
 ] 

David Smiley commented on SOLR-14765:
-

Cool optimization!  I'd be happy to review code in & around SolrIndexSearcher; 
it relates to my crusade to eliminate Filter, so I'm rather familiar with 
these matters.

> optimize DocList creation by skipping sort for sort-irrelevant cases
> 
>
> Key: SOLR-14765
> URL: https://issues.apache.org/jira/browse/SOLR-14765
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Michael Gibney
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When {{rows=0}}, and for {{MatchAllDocsQuery}} and {{ConstantScoreQuery}} 
> (and possibly others?), it is possible for 
> {{SolrIndexSearcher.getDocListC(QueryResult, QueryCommand)}} to create a 
> DocList directly from {{filterCache}} DocSets -- similar to 
> {{useFilterForSortedQuery}}, but without actually sorting. 
> This results in significant benefits for high-recall domains, including the 
> common (and commonly-cached) use-case of {{q=\*:*}} and {{fq}}, facets, etc.
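
As a rough illustration of the proposed shortcut (plain Python, not Solr's actual classes; `doc_list_from_docset` is an invented helper): when the sort order cannot be observed, a DocList-style result can be cut directly from a cached DocSet, whose doc ids are already in index order, skipping the scoring and sorting pass entirely:

```python
def doc_list_from_docset(cached_docset, rows):
    """Build a DocList-like result straight from a cached DocSet.

    cached_docset holds doc ids in index order (as a filterCache DocSet does).
    For rows=0 -- or any case where the sort cannot change what is returned --
    no scoring or sorting pass is needed.
    """
    return {"numFound": len(cached_docset), "docs": list(cached_docset)[:rows]}

# rows=0 only asks for the hit count (e.g. q=*:* with facets): skip the sort.
print(doc_list_from_docset([3, 7, 9, 12], 0))  # -> {'numFound': 4, 'docs': []}
```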



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


HoustonPutman commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556931132



##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,203 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", "SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", "SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", "SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", "SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", "SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+    canBeResolved = true
+  }
+  dockerImage {
+    canBeResolved = true
+  }
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be created first.
-  dependsOn(dockerPackage.tasks.docker)
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+    builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+    from file('scripts')
+    fileMode 755
+  }
+  into('releases') {
+    from configurations.packaging
+    include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
+}
+
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes in the docker context
+  inputs.properties([
+      baseDockerImage: baseDockerImage,
+      githubUrlOrMirror: githubUrlOrMirror,
+      version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  // Docker build must run after the docker context has been created & tarred
+  mustRunAfter tasks.dockerTar

Review comment:
   It's not necessary right now. I mainly put it in because I'm 
contemplating removing the dependency at some point (I actually did add it, but 
then removed it without removing the `mustRunAfter` lines).
   
   The issue now is that if you run:
   
   ```
   gradlew dockerBuild -Psolr.docker.baseImage=something-else:latest
   gradlew dockerTag -P solr.docker.imageName=custom-name:latest
   ```
   
   In the above example, the first command builds the docker image with the 
custom baseImage. However, the second command doesn't specify `baseImage`, so 
its dependent `dockerBuild` task rebuilds the docker image with the default 
baseImage, overwriting the image created by the first command.
   
   ```
   gradlew dockerBuild dockerTag -Psolr.docker.baseImage=something-else:latest 
-P solr.docker.imageName=custom-name:latest
   ```
   
   This is the command necessary to build & tag with a custom baseImage.
   
   I think it might be nice to make `testDocker` and `dockerTag` not dependent 
on `dockerBuild`, but instead merely check that `dockerBuild` has already been 
run. (The resulting image-id should be stored in a file in the build 
directory.) That way the two commands above would work exactly the same.
   
   I'm not convinced either way yet, just something I've been thinking about.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


madrob commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556919719



##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,203 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", "SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", "SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", "SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", "SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", "SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+    canBeResolved = true
+  }
+  dockerImage {
+    canBeResolved = true
+  }
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be created first.
-  dependsOn(dockerPackage.tasks.docker)
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+    builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+    from file('scripts')
+    fileMode 755
+  }
+  into('releases') {
+    from configurations.packaging
+    include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
+}
+
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes in the docker context
+  inputs.properties([
+      baseDockerImage: baseDockerImage,
+      githubUrlOrMirror: githubUrlOrMirror,
+      version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  // Docker build must run after the docker context has been created & tarred
+  mustRunAfter tasks.dockerTar

Review comment:
   is this redundant with the dependsOn declaration above? I'm surely 
missing some subtlety. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #2166: SOLR-15060: Introduce DelegatingDirectoryFactory.

2021-01-13 Thread GitBox


dsmiley commented on pull request #2166:
URL: https://github.com/apache/lucene-solr/pull/2166#issuecomment-759775708


   An in-between is to leave this issue open until the BlobDirectory is much 
more ready.  It may change in the meantime; who knows.  I do like the notion 
of committing separate pieces instead of one big feature!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


dsmiley commented on pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#issuecomment-759773315


   I'm glad you paid attention to task inputs/outputs.  I had shelved changes 
to this gradle build to improve this, because changes to the Dockerfile weren't 
seen as changing the output (it's sort of the most important input).  I can 
now remove my WIP because you've overhauled the whole thing, which I'm glad to 
see :-)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2196: SOLR-15071: Fix ArrayIndexOutOfBoundsException in contrib/ltr SolrFeatureScorer

2021-01-13 Thread GitBox


dsmiley commented on a change in pull request #2196:
URL: https://github.com/apache/lucene-solr/pull/2196#discussion_r556883061



##
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
##
@@ -365,11 +364,6 @@ public DocIdSetIterator iterator() {
 return in.iterator();
   }
 
-  @Override

Review comment:
   @cpoerschke  Let's add a comment saying that we intentionally don't 
delegate twoPhaseIterator because it doesn't work, and that we don't know why, 
with a reference to the JIRA issue?
   Other than that, let's commit this to protect users as-is.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9661:
-
Attachment: intellij inspection results.html

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: 8.x, master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch, intellij 
> inspection results.html
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstack}} outputs in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-13 Thread Walter Underwood (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264374#comment-17264374
 ] 

Walter Underwood edited comment on SOLR-15056 at 1/13/21, 7:42 PM:
---

For a load average threshold of 3.0, should the config value be 300 or 3.0?

I'm going with 3.0 for now.


was (Author: wunder):
For a load average threshold of 3.0, should the config value be 300 or 3.0?

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  
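[Editorial note: the difference between the two MXBean metrics discussed above can be sketched in a few lines of Java. This is a minimal illustration, not code from the patch; it assumes a HotSpot-style JVM whose platform bean implements `com.sun.management.OperatingSystemMXBean` — on other JVMs the `instanceof` check fails, and per the Javadoc `getSystemCpuLoad()` may return a negative "unavailable" value.]

```java
import java.lang.management.ManagementFactory;

public class CpuMetricsDemo {
  public static void main(String[] args) {
    java.lang.management.OperatingSystemMXBean platformBean =
        ManagementFactory.getOperatingSystemMXBean();

    // Run-queue length averaged over the last minute: effectively unbounded,
    // and only meaningful relative to the number of CPUs. May be negative on
    // platforms (e.g. Windows) where it is not available.
    double loadAverage = platformBean.getSystemLoadAverage();
    int cpus = Runtime.getRuntime().availableProcessors();

    // The com.sun.management subinterface exposes a bounded utilization value.
    if (platformBean instanceof com.sun.management.OperatingSystemMXBean) {
      com.sun.management.OperatingSystemMXBean sunBean =
          (com.sun.management.OperatingSystemMXBean) platformBean;
      // In [0.0, 1.0], or negative if unavailable; directly comparable to a
      // configured percentage threshold regardless of CPU count.
      double cpuLoad = sunBean.getSystemCpuLoad();
      System.out.println("cpuLoad=" + cpuLoad
          + " loadAverage=" + loadAverage + " cpus=" + cpus);
    }
  }
}
```

Note that on JDK 14+ `getSystemCpuLoad()` is deprecated in favor of `getCpuLoad()`; the bounded [0.0, 1.0] semantics are the same.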






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-13 Thread Walter Underwood (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264374#comment-17264374
 ] 

Walter Underwood commented on SOLR-15056:
-

For a load average threshold of 3.0, should the config value be 300 or 3.0?

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264343#comment-17264343
 ] 

Uwe Schindler commented on LUCENE-9661:
---

There's also a bug in older IntelliJ versions that prevents it from detecting 
the problem here: https://youtrack.jetbrains.com/issue/IDEA-200775

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: 8.x, master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstack}} dumps in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.






[jira] [Commented] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264339#comment-17264339
 ] 

Uwe Schindler commented on LUCENE-9661:
---

Hi,
I checked the ECJ compiler settings; there is no check that detects this.
If such a check could be enabled in ECJ, our existing ECJ-based validation 
would catch this problem.

I will dig further. Unfortunately, IntelliJ's compiler and code analyzer 
cannot be executed from the command line or from Gradle, the way ECJ can.

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: 8.x, master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstack}} dumps in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.






[GitHub] [lucene-solr] danmuzi commented on a change in pull request #2200: LUCENE-9661: Fix deadlock in TermsEnum.EMPTY

2021-01-13 Thread GitBox


danmuzi commented on a change in pull request #2200:
URL: https://github.com/apache/lucene-solr/pull/2200#discussion_r556740515



##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestTermsEnumDeadlock.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import org.apache.lucene.util.BytesRef;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.lang.management.RuntimeMXBean;
+import java.nio.file.Paths;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+
+public class TestTermsEnumDeadlock extends Assert {
+  private static final int MAX_TIME_SECONDS = 15;
+
+  @Test
+  public void testDeadlock() throws Exception {
+for (int i = 0; i < 20; i++) {
+  // Fork a separate JVM to reinitialize classes.

Review comment:
   Thank you for your review @uschindler :D
   
   The loop was added to make sure the deadlock is actually exercised.
   
   But I agree with your comment about test iteration, so I'll delete the 
for-loop and change the test to run only once.
   
   As for the necessity of the deadlock test, I'll add a TODO comment. Later, 
if someone adds a static checker for this, we can delete the test at that 
time.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Updated] (LUCENE-9667) Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules

2021-01-13 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9667:
-
Status: Patch Available  (was: Open)

> Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules
> -
>
> Key: LUCENE-9667
> URL: https://issues.apache.org/jira/browse/LUCENE-9667
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Attachments: LUCENE-9667.patch
>
>
> Test data taken from hunspell C++, the new code is based on 
> https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.cxx#L675






[jira] [Updated] (LUCENE-9667) Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules

2021-01-13 Thread Peter Gromov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Gromov updated LUCENE-9667:
-
Attachment: LUCENE-9667.patch

> Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules
> -
>
> Key: LUCENE-9667
> URL: https://issues.apache.org/jira/browse/LUCENE-9667
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Peter Gromov
>Priority: Major
> Attachments: LUCENE-9667.patch
>
>
> Test data taken from hunspell C++, the new code is based on 
> https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.cxx#L675






[jira] [Created] (LUCENE-9667) Hunspell: add a spellchecker, support BREAK and FORBIDDENWORD affix rules

2021-01-13 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9667:


 Summary: Hunspell: add a spellchecker, support BREAK and 
FORBIDDENWORD affix rules
 Key: LUCENE-9667
 URL: https://issues.apache.org/jira/browse/LUCENE-9667
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Peter Gromov


Test data taken from hunspell C++, the new code is based on 
https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.cxx#L675






[GitHub] [lucene-solr] uschindler commented on a change in pull request #2200: LUCENE-9661: Fix deadlock in TermsEnum.EMPTY

2021-01-13 Thread GitBox


uschindler commented on a change in pull request #2200:
URL: https://github.com/apache/lucene-solr/pull/2200#discussion_r556698348



##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestTermsEnumDeadlock.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import org.apache.lucene.util.BytesRef;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.lang.management.RuntimeMXBean;
+import java.nio.file.Paths;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+
+public class TestTermsEnumDeadlock extends Assert {
+  private static final int MAX_TIME_SECONDS = 15;
+
+  @Test
+  public void testDeadlock() throws Exception {
+for (int i = 0; i < 20; i++) {
+  // Fork a separate JVM to reinitialize classes.

Review comment:
   Classloading won't help, because we still need a separate JVM. When the JVM 
loads a class, it tries the parent classloader first; if the class is already 
loaded there, it won't be loaded again. So we need at least one separate 
process.
   
   I don't like running many iterations, each with a separate JVM. Maybe only 
try once (like in the other test with codecs). It may not fail every time, but 
the test will fail sometimes.
   
   I am also not sure we really need a test for this at all. If we can get a 
static checker that finds classes that initialize their own subclasses in 
their static initializers, we can prevent similar cases in the future.
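[Editorial note: the pattern such a static checker would flag can be sketched as follows. Class names are hypothetical; this is a minimal illustration of a parent class constructing an instance of its own subclass in its static initializer, the shape behind the TermsEnum.EMPTY deadlock this PR fixes.]

```java
// A parent class whose static initializer constructs its own subclass.
class Parent {
  // Parent.<clinit> triggers Child.<clinit>, which in turn needs Parent
  // to be initialized first: a cyclic initialization dependency.
  static final Parent EMPTY = new Child();
}

class Child extends Parent {
}

class InitCycleDemo {
  public static void main(String[] args) {
    // Single-threaded, recursive initialization is permitted by the JVM spec,
    // so this completes fine. The deadlock only appears when two threads
    // initialize Parent and Child concurrently, each holding one class-init
    // lock while waiting for the other -- which is why reproducing it in a
    // test requires a freshly forked JVM.
    System.out.println(Parent.EMPTY != null); // prints "true"
  }
}
```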








[GitHub] [lucene-solr] danmuzi commented on a change in pull request #2200: LUCENE-9661: Fix deadlock in TermsEnum.EMPTY

2021-01-13 Thread GitBox


danmuzi commented on a change in pull request #2200:
URL: https://github.com/apache/lucene-solr/pull/2200#discussion_r556689874



##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestTermsEnumDeadlock.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import org.apache.lucene.util.BytesRef;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.lang.management.RuntimeMXBean;
+import java.nio.file.Paths;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+
+public class TestTermsEnumDeadlock extends Assert {
+  private static final int MAX_TIME_SECONDS = 15;
+
+  @Test
+  public void testDeadlock() throws Exception {
+for (int i = 0; i < 20; i++) {
+  // Fork a separate JVM to reinitialize classes.

Review comment:
   Thank you for your review @madrob :D
   
   Hmm... Is there any way to do static initialization multiple times in the 
same JVM?
   
   As far as I know, static initialization is only executed once per JVM and 
the concepts of loading and initialization are different.
   https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-5.html
   Class loading : 
https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-5.html#jvms-5.3
   Initialization : 
https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-5.html#jvms-5.5
   
   And this test case is meaningful only when it is run for the first time or 
on its own.
   
   Please let me know if I'm wrong!
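[Editorial note: the once-per-JVM initialization behavior referenced above can be seen with a tiny sketch (hypothetical class names; not code from the PR).]

```java
class InitOnce {
  static int initCount = 0;
  // The static initializer (<clinit>) runs exactly once per class per JVM,
  // on first active use of the class.
  static { initCount++; }
}

class InitOnceDemo {
  public static void main(String[] args) {
    int first = InitOnce.initCount;  // first touch triggers <clinit> once
    int second = InitOnce.initCount; // later accesses do not re-initialize
    System.out.println(first + " " + second); // prints "1 1"
  }
}
```

This is why the test forks a fresh JVM: within one process there is no way to make a class's static initializer run a second time.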
   








[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2198: SOLR-15081: Metrics for core: isLeader, status

2021-01-13 Thread GitBox


HoustonPutman commented on a change in pull request #2198:
URL: https://github.com/apache/lucene-solr/pull/2198#discussion_r556646373



##
File path: solr/core/src/java/org/apache/solr/core/SolrCore.java
##
@@ -1202,26 +1203,26 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
getIndexDir(), true, "indexDir", Category.CORE.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullNumber() : 
getIndexSize(), true, "sizeInBytes", Category.INDEX.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
NumberUtils.readableSize(getIndexSize()), true, "size", 
Category.INDEX.toString());
-if (coreContainer != null) {
-  final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
-  if (cd != null) {
-parentContext.gauge(() -> {
-  if (cd.getCollectionName() != null) {
-return cd.getCollectionName();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "collection", Category.CORE.toString());
 
-parentContext.gauge(() -> {
-  if (cd.getShardId() != null) {
-return cd.getShardId();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "shard", Category.CORE.toString());
-  }
+final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
+if (cd != null) {
+  // TODO
+  parentContext.gauge(cd::getCollectionName, true, "collection", 
Category.CORE.toString());
+  parentContext.gauge(() -> Objects.requireNonNullElse(cd.getShardId(), 
parentContext.nullString()), true, "shard", Category.CORE.toString());
+  //TODO should this instead be in a core status, or a metric?  When do we 
use which?

Review comment:
   I don't think it's inherently wrong to have overlap between status and 
metrics, as long as the overlapping information is coming from the same source.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


dweiss commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556640697



##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,187 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", 
"SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", 
"SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", 
"SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", 
"SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", 
"SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 
'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': 
githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+canBeResolved = true
+  }
+  dockerImage {
+canBeResolved = true
+  }
+}
+
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+from file('scripts')
+fileMode 755
+  }
+  into('releases') {
+from configurations.packaging
+include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be 
created first.
-  dependsOn(dockerPackage.tasks.docker)
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes 
in the docker context
+  inputs.properties([
+  baseDockerImage: baseDockerImage,
+  githubUrlOrMirror: githubUrlOrMirror,
+  version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  doLast {
+exec {
+  standardInput = 
tasks.dockerTar.outputs.files.singleFile.newDataInputStream()
+  commandLine "docker", "build",
+  "--iidfile", imageIdFile,
+  "--build-arg", "BASE_IMAGE=${inputs.properties.baseDockerImage}",
+  "--build-arg", "SOLR_VERSION=${version}",
+  "--build-arg", 
"GITHUB_URL=${inputs.properties.githubUrlOrMirror}",
+  "-"
+}
+  }
 
   // Print information on the image after it has been created
   doLast {
+def dockerImageId = file(imageIdFile).text
 project.logger.lifecycle("Solr Docker Image Created")
-project.logger.lifecycle("\tName: $dockerImageName")
-project.logger.lifecycle("\tBase Image: $baseDockerImage")
+project.logger.lifecycle("\tID: \t$dockerImageId")
+project.logger.lifecycle("\tBase Image: \t$baseDockerImage")
+project.logger.lifecycle("\tSolr Version: \t$version")
   }
+
+  outputs.files(imageIdFile)
 }
 
-abstract class DockerTestSuite extends DefaultTask {
-  private String solrImageName = null;
-  private List tests = new ArrayList<>();
-  private List ignore = new ArrayList<>();
+task dockerTag(dependsOn: tasks.dockerBuild) {
+  group = 'Docker'
+  description = 'Tag Solr docker image'
 
-  @OutputDirectory
-  abstract DirectoryProperty getOutputDir()
+  // Ensure that the docker image is re-tagged if the image ID or desired tag 
changes
+  inputs.properties([
+  dockerImageName: dockerImageName,
+  ])
+  inputs.file(imageIdFile)
 
-  public void setSolrImageName(String solrImageName) {
-this.solrImageName = solrImageName
+  doLast {
+exec {
+  commandLine "docker", "tag", 
tasks.dockerBuild.outputs.files.singleFile.text, 
inputs.properties.dockerImageName
+}
   }
 
-  public String getSolrImageName() {
-Preconditions.checkArgument(!Strings.isNullOrEmpty(solrImageName), 
"solrImageName is a required dockerTests configuration item.")
-return solrImageName
+  // Print information on the image after it has been created
+  doLast {
+def dockerImageId = tasks.dockerBuild.outputs.files.singleFile.text
+ 

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


dweiss commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556635582



##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,187 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", 
"SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", 
"SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", 
"SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", 
"SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", 
"SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 
'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': 
githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+canBeResolved = true
+  }
+  dockerImage {
+canBeResolved = true
+  }
+}
+
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+from file('scripts')
+fileMode 755
+  }
+  into('releases') {
+from configurations.packaging
+include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be 
created first.
-  dependsOn(dockerPackage.tasks.docker)
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes 
in the docker context
+  inputs.properties([
+  baseDockerImage: baseDockerImage,
+  githubUrlOrMirror: githubUrlOrMirror,
+  version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  doLast {
+exec {
+  standardInput = 
tasks.dockerTar.outputs.files.singleFile.newDataInputStream()
+  commandLine "docker", "build",
+  "--iidfile", imageIdFile,
+  "--build-arg", "BASE_IMAGE=${inputs.properties.baseDockerImage}",
+  "--build-arg", "SOLR_VERSION=${version}",
+  "--build-arg", 
"GITHUB_URL=${inputs.properties.githubUrlOrMirror}",
+  "-"
+}
+  }
 
   // Print information on the image after it has been created
   doLast {
+def dockerImageId = file(imageIdFile).text
 project.logger.lifecycle("Solr Docker Image Created")
-project.logger.lifecycle("\tName: $dockerImageName")
-project.logger.lifecycle("\tBase Image: $baseDockerImage")
+project.logger.lifecycle("\tID: \t$dockerImageId")
+project.logger.lifecycle("\tBase Image: \t$baseDockerImage")
+project.logger.lifecycle("\tSolr Version: \t$version")
   }
+
+  outputs.files(imageIdFile)
 }
 
-abstract class DockerTestSuite extends DefaultTask {
-  private String solrImageName = null;
-  private List tests = new ArrayList<>();
-  private List ignore = new ArrayList<>();
+task dockerTag(dependsOn: tasks.dockerBuild) {
+  group = 'Docker'
+  description = 'Tag Solr docker image'
 
-  @OutputDirectory
-  abstract DirectoryProperty getOutputDir()
+  // Ensure that the docker image is re-tagged if the image ID or desired tag 
changes
+  inputs.properties([
+  dockerImageName: dockerImageName,
+  ])
+  inputs.file(imageIdFile)
 
-  public void setSolrImageName(String solrImageName) {
-this.solrImageName = solrImageName
+  doLast {
+exec {
+  commandLine "docker", "tag", 
tasks.dockerBuild.outputs.files.singleFile.text, 
inputs.properties.dockerImageName
+}
   }
 
-  public String getSolrImageName() {
-Preconditions.checkArgument(!Strings.isNullOrEmpty(solrImageName), 
"solrImageName is a required dockerTests configuration item.")
-return solrImageName
+  // Print information on the image after it has been created
+  doLast {
+def dockerImageId = tasks.dockerBuild.outputs.files.singleFile.text
+ 

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


dweiss commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556633807



##
File path: gradle/help.gradle
##
@@ -30,7 +30,7 @@ configure(rootProject) {
   ["Git", "help/git.txt", "Git assistance and guides."],
   ["ValidateLogCalls", "help/validateLogCalls.txt", "How to use logging 
calls efficiently."],
   ["IDEs", "help/IDEs.txt", "IDE support."],
-  ["Docker", "help/docker.txt", "Building Solr Docker images."],

Review comment:
   Ok, fine. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-13 Thread Florin Babes (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264219#comment-17264219
 ] 

Florin Babes commented on SOLR-15071:
-

Hello [~cpoerschke], I will try to create a test for this issue and aim to 
finish it next week, because I am going on vacation for a few days starting tomorrow.

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern in the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is :
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> 

[GitHub] [lucene-solr] madrob commented on a change in pull request #2200: LUCENE-9661: Fix deadlock in TermsEnum.EMPTY

2021-01-13 Thread GitBox


madrob commented on a change in pull request #2200:
URL: https://github.com/apache/lucene-solr/pull/2200#discussion_r556614634



##
File path: 
lucene/core/src/test/org/apache/lucene/index/TestTermsEnumDeadlock.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import org.apache.lucene.util.BytesRef;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.lang.management.RuntimeMXBean;
+import java.nio.file.Paths;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+
+public class TestTermsEnumDeadlock extends Assert {
+  private static final int MAX_TIME_SECONDS = 15;
+
+  @Test
+  public void testDeadlock() throws Exception {
+for (int i = 0; i < 20; i++) {
+  // Fork a separate JVM to reinitialize classes.

Review comment:
   Could we do this by creating a new class loader instead of a whole new 
process?











[jira] [Updated] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Namgyu Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namgyu Kim updated LUCENE-9661:
---
Fix Version/s: 8.x

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: 8.x, master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstacks}} in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] bruno-roustant commented on pull request #2166: SOLR-15060: Introduce DelegatingDirectoryFactory.

2021-01-13 Thread GitBox


bruno-roustant commented on pull request #2166:
URL: https://github.com/apache/lucene-solr/pull/2166#issuecomment-759526669


   Actually I'll need that to simplify the code in BlobDirectory. So based on 
your remark, David, I should instead integrate this code into BlobDirectory 
and close this Jira issue.









[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2021-01-13 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264214#comment-17264214
 ] 

Jason Gerlowski commented on SOLR-15051:


Some overall thoughts on the proposal:

# I really like the conceptual lines that this draws.  Relying strictly on the 
Directory/DirectoryFactory interface is promising in terms of keeping storage 
concerns and SolrCloud concerns separate.  We've tried Directory-based 
abstractions before of course (with HdfsDirectory), but this proposal improves 
on that in concrete ways: index file deduplication/ref-counting, removal of 
"BlockCache" concept, etc.  (This isn't a knock on HdfsDirectory - TLOG/PULL 
replica types weren't around when HdfsDirectory was introduced, which is really 
what makes the BlobDirectory design feasible afaict.)
# At the risk of counting unhatched chickens - I also think it's promising that 
this design can piggy-back on some of the SIP-12 work: especially the concrete 
BackupRepository implementations that SIP-12 proposes for common blob stores.
# One specific worry I have is that BlobDirectory methods might be insufficient 
for accurately refcounting files.  e.g. If a replica is deleted while the 
hosting Solr node is down, what will delete the corresponding "space" for that 
replica in the blob store, decrement the refcounts of shared files, etc?  The 
proposal describes this being done by BlobDirectory - but when the hosting node 
is down seemingly the relevant BlobDirectory won't be instantiated anywhere to 
perform those actions.  That said I don't know enough about how Directory 
objects are instantiated and used to say whether this is actually a real 
concern or whether existing SolrCloud logic will handle these cases 
appropriately.

Overall I'm in favor of the proposal here as a more flexible alternative 
(replacement?) for HdfsDirectory.  So, +1.
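The refcounting worry in point 3 above comes down to a simple question: who decrements a shared file's reference count, and when does the last decrement trigger the physical delete? A minimal, hypothetical sketch of that bookkeeping (the RefCountedStore class and its method names are invented for illustration and are not BlobDirectory's actual API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-file reference counting for de-duplicated shared
// storage: a file is physically deleted only once no replica references it.
// Not BlobDirectory's actual API; names are invented for illustration.
public class RefCountedStore {
  final Map<String, Integer> refs = new HashMap<>();  // file -> replica count
  final List<String> deleted = new ArrayList<>();     // files actually removed

  void addRef(String name) {
    refs.merge(name, 1, Integer::sum);
  }

  // Assumes release() is only called for files previously addRef()'d.
  void release(String name) {
    int remaining = refs.merge(name, -1, Integer::sum);
    if (remaining == 0) {   // last reference gone: safe to delete
      refs.remove(name);
      deleted.add(name);    // stand-in for the real blob-store delete call
    }
  }

  public static void main(String[] args) {
    RefCountedStore store = new RefCountedStore();
    store.addRef("_0.cfs");       // replica 1
    store.addRef("_0.cfs");       // replica 2 (shard-split copy, de-duplicated)
    store.release("_0.cfs");      // replica 1 deleted: file kept
    store.release("_0.cfs");      // replica 2 deleted: file physically removed
    System.out.println(store.deleted);  // prints [_0.cfs]
  }
}
```

The concern above maps onto this sketch directly: if the node hosting the replica dies before release() runs, the count held in shared state never reaches zero, so some surviving actor has to perform the decrements on its behalf.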

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*







[GitHub] [lucene-solr] danmuzi opened a new pull request #2200: LUCENE-9661: Fix deadlock in TermsEnum.EMPTY

2021-01-13 Thread GitBox


danmuzi opened a new pull request #2200:
URL: https://github.com/apache/lucene-solr/pull/2200


   Deadlocks can occur if the constructor of BaseTermsEnum is executed while 
TermsEnum is being initialized.
   It can be reproduced by reverting the changes to the TermsEnum class and 
running TestTermsEnumDeadlock#testDeadlock.
   
   This PR will be cherry-picked to the master, 8.x, and 8.8 branches.
   
   Issue link: https://issues.apache.org/jira/browse/LUCENE-9661
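The underlying hazard is a superclass whose static initializer references one of its own subclasses; two threads that first touch the two classes in opposite order can then deadlock on class initialization. A minimal sketch of the pattern (class names here are invented; TermsEnum.EMPTY had this shape before the fix):

```java
// Hypothetical sketch of the hazardous pattern behind this fix: a base class
// whose static initializer references one of its own subclasses. Initializing
// Base triggers loading/initialization of Sub, which itself requires Base to
// be initialized first; two threads entering from opposite ends can deadlock.
public class StaticInitDemo {
  abstract static class Base {
    static final Base EMPTY = new Sub();  // Base's <clinit> touches Sub
    abstract String name();
  }

  static class Sub extends Base {
    @Override
    String name() { return "empty"; }
  }

  public static void main(String[] args) {
    // Single-threaded access is safe; the deadlock needs concurrent first
    // use of Base (via Base.EMPTY) and Sub from different threads.
    System.out.println(Base.EMPTY.name());  // prints empty
  }
}
```

A common remedy is to break the cycle so that neither class's static initialization depends on the other, which is the shape of fix this PR pursues for TermsEnum.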









[GitHub] [lucene-solr] HoustonPutman commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


HoustonPutman commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556607070



##
File path: gradle/help.gradle
##
@@ -30,7 +30,7 @@ configure(rootProject) {
   ["Git", "help/git.txt", "Git assistance and guides."],
   ["ValidateLogCalls", "help/validateLogCalls.txt", "How to use logging 
calls efficiently."],
   ["IDEs", "help/IDEs.txt", "IDE support."],
-  ["Docker", "help/docker.txt", "Building Solr Docker images."],

Review comment:
   I was moving it based on a comment from @dsmiley. I could go either 
way. I do think it makes sense that the documentation for the docker module 
resides in the docker module. The other help files in that directory seem to be 
applicable project-wide, not for a specific module.

##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,187 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", 
"SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", 
"SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", 
"SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", 
"SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", 
"SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 
'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': 
githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+canBeResolved = true
+  }
+  dockerImage {
+canBeResolved = true
+  }
+}
+
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+from file('scripts')
+fileMode 755
+  }
+  into('releases') {
+from configurations.packaging
+include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be 
created first.
-  dependsOn(dockerPackage.tasks.docker)
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes 
in the docker context
+  inputs.properties([
+  baseDockerImage: baseDockerImage,
+  githubUrlOrMirror: githubUrlOrMirror,
+  version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  doLast {
+exec {
+  standardInput = 
tasks.dockerTar.outputs.files.singleFile.newDataInputStream()
+  commandLine "docker", "build",
+  "--iidfile", imageIdFile,
+  "--build-arg", "BASE_IMAGE=${inputs.properties.baseDockerImage}",
+  "--build-arg", "SOLR_VERSION=${version}",
+  "--build-arg", 
"GITHUB_URL=${inputs.properties.githubUrlOrMirror}",
+  "-"
+}
+  }
 
   // Print information on the image after it has been created
   doLast {
+def dockerImageId = file(imageIdFile).text
 project.logger.lifecycle("Solr Docker Image Created")
-project.logger.lifecycle("\tName: $dockerImageName")
-project.logger.lifecycle("\tBase Image: $baseDockerImage")
+project.logger.lifecycle("\tID: \t$dockerImageId")
+project.logger.lifecycle("\tBase Image: \t$baseDockerImage")
+project.logger.lifecycle("\tSolr Version: \t$version")
   }
+
+  outputs.files(imageIdFile)
 }
 
-abstract class DockerTestSuite extends DefaultTask {
-  private String solrImageName = null;
-  private List tests = new ArrayList<>();
-  private List ignore = new ArrayList<>();
+task dockerTag(dependsOn: tasks.dockerBuild) {
+  group = 'Docker'
+  description = 'Tag Solr docker image'
 
-  @OutputDirectory
-  abstract DirectoryProperty getOutputDir()
+  // Ensure that the docker image is re-tagged if the image ID or desired tag 
changes
+  

[jira] [Comment Edited] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264209#comment-17264209
 ] 

Namgyu Kim edited comment on LUCENE-9661 at 1/13/21, 3:18 PM:
--

Hi [~dsmiley] and [~mdrob],

*To David*
{quote}Couldn't this sort of thing be caught via static analysis? Classes that 
refer to their subclasses? 
 Perhaps not that broad but it's a start.
 I don't see an IntelliJ "Intention" for anything like this; perhaps there is 
something for other tools?
{quote}
I'm not sure if we can catch that with static analysis.
 IntelliJ provides an inspection for it.
 You can configure it as follows:
 (File -> Settings -> Editor -> Inspections -> Static initializer references 
subclass)
 Here is a screenshot.
 !deadlock inspections.jpg|width=819,height=588!
 If we use only IntelliJ as our IDE, I think setting the severity option to 
Error would be a good idea.
 But I haven't checked whether NetBeans and Eclipse provide it.

*To Mike*
{quote}There's an IntelliJ inspection that I've seen for classes referring to 
subclasses before.
{quote}
Yeah. Is it the same option as the above?


was (Author: danmuzi):
Hi [~dsmiley] and [~mdrob],
{quote}Couldn't this sort of thing be caught via static analysis? Classes that 
refer to their subclasses? 
 Perhaps not that broad but it's a start.
 I don't see an IntelliJ "Intention" for anything like this; perhaps there is 
something for other tools?
{quote}
I'm not sure if we can catch that with static analysis.
 IntelliJ provides an inspection for it.
 You can configure it as follows:
 (File -> Settings -> Editor -> Inspections -> Static initializer references 
subclass)
 Here is a screenshot.
 !deadlock inspections.jpg|width=819,height=588!
 If we use only IntelliJ as our IDE, I think setting the severity option to 
Error would be a good idea.
 But I haven't checked whether NetBeans and Eclipse provide it.
{quote}There's an IntelliJ inspection that I've seen for classes referring to 
subclasses before.
{quote}
Yeah. Is it the same option as the above?

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstacks}} in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.







[jira] [Commented] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264209#comment-17264209
 ] 

Namgyu Kim commented on LUCENE-9661:


Hi [~dsmiley] and [~mdrob],
{quote}Couldn't this sort of thing be caught via static analysis? Classes that 
refer to their subclasses? 
 Perhaps not that broad but it's a start.
 I don't see an IntelliJ "Intention" for anything like this; perhaps there is 
something for other tools?
{quote}
I'm not sure if we can catch that with static analysis.
 IntelliJ provides an inspection for it.
 You can configure it as follows:
 (File -> Settings -> Editor -> Inspections -> Static initializer references 
subclass)
 Here is a screenshot.
 !deadlock inspections.jpg|width=819,height=588!
 If we use only IntelliJ as our IDE, I think setting the severity option to 
Error would be a good idea.
 But I haven't checked whether NetBeans and Eclipse provide it.
{quote}There's an IntelliJ inspection that I've seen for classes referring to 
subclasses before.
{quote}
Yeah. Is it the same option as the above?

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstacks}} in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.







[jira] [Updated] (LUCENE-9661) Another classloader deadlock?

2021-01-13 Thread Namgyu Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namgyu Kim updated LUCENE-9661:
---
Attachment: deadlock inspections.jpg

> Another classloader deadlock?
> -
>
> Key: LUCENE-9661
> URL: https://issues.apache.org/jira/browse/LUCENE-9661
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.0, master (9.0)
>Reporter: Michael McCandless
>Priority: Blocker
> Fix For: master (9.0), 8.8
>
> Attachments: deadlock inspections.jpg, deadlock_test.patch
>
>
> The {{java}} processes spawned by our Lucene nightly benchmarks sometimes 
> randomly hang, apparently while loading classes across threads, under 
> contention.
> I've opened [this {{luceneutil}} issue with some 
> details|https://github.com/mikemccand/luceneutil/issues/89], but 
> [~uschindler] suggested I open an issue here too since he has been seeing 
> this in CI builds too.
> It is rare, maybe once a week in the nightly benchmarks (which spawn many 
> {{java}} processes with many threads across 128 CPU cores).  It is clearly a 
> deadlock – when it strikes, the process hangs forever until I notice and 
> {{kill -9}} it.  I posted a couple of {{jstacks}} in the issue above.
> [~rcmuir] suggested using {{classcycle}} to maybe statically dig into 
> possible deadlocks ... I have not tried that yet.







[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2166: SOLR-15060: Introduce DelegatingDirectoryFactory.

2021-01-13 Thread GitBox


dsmiley commented on a change in pull request #2166:
URL: https://github.com/apache/lucene-solr/pull/2166#discussion_r556581069



##
File path: solr/core/src/java/org/apache/solr/core/CachingDirectoryFactory.java
##
@@ -405,11 +398,6 @@ public void incRef(Directory directory) {
 
   @Override
   public void init(@SuppressWarnings("rawtypes") NamedList args) {
-maxWriteMBPerSecFlush = (Double) args.get("maxWriteMBPerSecFlush");
-maxWriteMBPerSecMerge = (Double) args.get("maxWriteMBPerSecMerge");
-maxWriteMBPerSecRead = (Double) args.get("maxWriteMBPerSecRead");
-maxWriteMBPerSecDefault = (Double) args.get("maxWriteMBPerSecDefault");
-

Review comment:
   I looked at these a week ago.  It's lingering left-over stuff from a 
refactoring McCandless did years ago that changed where some thresholds are configured.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (LUCENE-9644) HNSW diverse neighbor selection heuristic

2021-01-13 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264190#comment-17264190
 ] 

Michael McCandless commented on LUCENE-9644:


This change seems to have caused a surprising jump in Lucene's nightly 
{{VectorSearch}} performance: 
[https://home.apache.org/~mikemccand/lucenebench/VectorSearch.html] (annotation 
DJ).

> HNSW diverse neighbor selection heuristic
> -
>
> Key: LUCENE-9644
> URL: https://issues.apache.org/jira/browse/LUCENE-9644
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will replace the simple nearest neighbor selection with a criterion that 
> takes into account the distance of the neighbors from each other. It is seen 
> to provide dramatically improved recall on at least two datasets, and is what 
> is being used by our reference implementation, hnswlib. The basic idea is 
> that when selecting  the nearest neighbors to associate with a new node added 
> to the graph, we filter using a diversity criterion. If a candidate neighbor 
> is closer to an already-added (closer to the new node) neighbor than it is to 
> the new node, then we pass over it, moving on to more-distant, but presumably 
> more diverse neighbors. The same criterion is also (re-) applied to the 
> neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.643|0.77|  10| 50| 32| 64| 1742|   22830|
> |0.671|0.95|  10| 100|32| 64| 2141|   0|
> |0.704|1.32|  10| 200|32| 64| 2923|   0|
> |0.739|2.04|  10| 400|32| 64| 4382|   0|
> |0.470|0.91|  100|50| 32| 64| 2068|   337081|
> |0.496|1.21|  100|100|32| 64| 2548|   0|
> |0.533|1.77|  100|200|32| 64| 3479|   0|
> |0.573|2.58|  100|400|32| 64| 5257|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.801|0.57|  10| 50| 32| 64| 593|17985|
> |0.840|0.67|  10| 100|32| 64| 738|0|
> |0.883|0.97|  10| 200|32| 64| 1018|   0|
> |0.921|1.36|  10| 400|32| 64| 1502|   0|
> |0.723|0.71|  100|50| 32| 64| 860|298383|
> |0.761|0.77|  100|100|32| 64| 1058|   0|
> |0.806|1.06|  100|200|32| 64| 1442|   0|
> |0.854|1.67|  100|400|32| 64| 2159|   0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.933|1.41|  10| 50| 32| 64| 1496|   35462|
> |0.948|1.39|  10| 100|32| 64| 1872|   0|
> |0.961|2.10|  10| 200|32| 64| 2591|   0|
> |0.972|3.04|  10| 400|32| 64| 3939|   0|
> |0.827|1.34|  100|50| 32| 64| 1676|   535802|
> |0.854|1.76|  100|100|32| 64| 2056|   0|
> |0.887|2.47|  100|200|32| 64| 2761|   0|
> |0.907|3.75|  100|400|32| 64| 4129|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||  index 
> ms||
> |0.966|1.18|  10| 50| 32| 64| 1480|   37656|
> |0.977|1.46|  10| 100|32| 64| 1832|   0|
> |0.988|2.00|  10| 200|32| 64| 2472|   0|
> |0.995|3.14|  10| 400|32| 64| 3629|   0|
> |0.944|1.34|  100|50| 32| 64| 1780|   526834|
> |0.959|1.71|  100|100|32| 64| |   0|
> |0.975|2.30|  100|200|32| 64| 3041|   0|
> |0.986|3.56|  100|400|32| 64| 4543|   0|
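The diversity criterion described in the issue above can be sketched roughly as follows. This is an illustrative sketch only, not Lucene's actual HNSW code: `DiverseNeighbors`, `selectDiverse`, and `dist` are invented names, and candidates are assumed to arrive sorted by increasing distance to the new node.

```java
import java.util.ArrayList;
import java.util.List;

public class DiverseNeighbors {

  // Plain Euclidean distance; the real implementation uses pluggable
  // vector similarity functions.
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** Candidates must be sorted by increasing distance to {@code node}. */
  static List<double[]> selectDiverse(double[] node, List<double[]> candidates, int maxConn) {
    List<double[]> selected = new ArrayList<>();
    for (double[] cand : candidates) {
      if (selected.size() >= maxConn) break;
      double dNode = dist(node, cand);
      boolean diverse = true;
      for (double[] s : selected) {
        // Pass over a candidate that is closer to an already-selected
        // (closer to the new node) neighbor than to the new node itself.
        if (dist(cand, s) < dNode) {
          diverse = false;
          break;
        }
      }
      if (diverse) selected.add(cand);
    }
    return selected;
  }
}
```

The effect is visible in the tables above: far fewer nodes visited per query for markedly better recall.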






[GitHub] [lucene-solr] dsmiley commented on pull request #2166: SOLR-15060: Introduce DelegatingDirectoryFactory.

2021-01-13 Thread GitBox


dsmiley commented on pull request #2166:
URL: https://github.com/apache/lucene-solr/pull/2166#issuecomment-759490172


   The PR is about introducing a low level utility that we expect will be 
useful for custom DirectoryFactory implementations _that we don't even have 
yet_.  It's the kind of thing that many of us wouldn't have even created a 
separate PR for -- it'd show up in a larger PR that introduces the first user 
of the utility.  It's not even worth a CHANGES.txt entry IMO; there's always 
the commit message.  As long as there is no user of this code yet, I'd prefer 
that we don't merge this yet.  Perhaps we may find that the abstraction here 
isn't quite right.
   
   RE master (9) vs 8x... If we wait till BlobDirectory, it'd almost certainly 
be 9. If we don't, ehh... I'd just do 9.






[jira] [Commented] (LUCENE-9663) Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-13 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264171#comment-17264171
 ] 

Adrien Grand commented on LUCENE-9663:
--

+1 to add lightweight compression to doc-value terms dictionaries. I've seen 
users store things like unique URLs in sorted doc-value fields where 
compressing suffixes would have helped.

I agree with Jaison that the query impact should be negligible since faceting 
typically bottlenecks on reading ordinals, not terms dictionaries, though we 
should double check. :) Also +1 to test how much slower building an OrdinalMap gets 
with this change.

bq. replacing prefix-compression with LZ4

My intuition is that it would actually be better to do LZ4 in addition to 
prefix compression, like we do for the terms dictionary of the inverted index.
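For context, the prefix compression being discussed stores each term of a sorted block as a shared-prefix length plus a suffix; LZ4 could then additionally be applied over the suffix bytes. A minimal sketch of the idea, with invented names (`PrefixCompress`, `encode`) rather than the actual codec code:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PrefixCompress {

  // Length of the byte prefix shared between two terms.
  static int sharedPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length), i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
  }

  /** Encodes sorted terms as (prefixLen, suffix) pairs relative to the previous term. */
  static List<Object[]> encode(List<String> sortedTerms) {
    List<Object[]> out = new ArrayList<>();
    byte[] prev = new byte[0];
    for (String t : sortedTerms) {
      byte[] cur = t.getBytes(StandardCharsets.UTF_8);
      int p = sharedPrefix(prev, cur);
      out.add(new Object[]{p, new String(cur, p, cur.length - p, StandardCharsets.UTF_8)});
      prev = cur;
    }
    return out;
  }
}
```

On data like unique URLs, most of each term is shared with its neighbor, which is why suffix-level compression pays off.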

> Adding compression to terms dict from SortedSet/Sorted DocValues
> 
>
> Key: LUCENE-9663
> URL: https://issues.apache.org/jira/browse/LUCENE-9663
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Jaison.Bi
>Priority: Trivial
>
> Elasticsearch keyword field uses SortedSet DocValues. In our applications, 
> “keyword” is the most frequently used field type.
>  LUCENE-7081 has done prefix-compression for docvalues terms dict. We can do 
> better by replacing prefix-compression with LZ4. In one of our application, 
> the dvd files were ~41% smaller with this change(from 1.95 GB to 1.15 GB).
>  I've done simple tests based on the real application data, comparing the 
> write/merge time cost, and the on-disk *.dvd file size(after merge into 1 
> segment).
> || ||Before||After||
> |Write time cost(ms)|591972|618200|
> |Merge time cost(ms)|270661|294663|
> |*.dvd file size(GB)|1.95|1.15|
> This feature is only for the high-cardinality fields. 
>  I'm doing the benchmark test based on luceneutil. Will attach the report and 
> patch after the test.






[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-13 Thread GitBox


muse-dev[bot] commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r556520136



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/impl/ModificationRequestImpl.java
##
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement.impl;
+
+import org.apache.solr.cluster.Replica;
+import org.apache.solr.cluster.Shard;
+import org.apache.solr.cluster.SolrCollection;
+import org.apache.solr.cluster.placement.DeleteReplicasRequest;
+import org.apache.solr.cluster.placement.DeleteShardsRequest;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.Slice;
+
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ *
+ */
+public class ModificationRequestImpl {
+
+  public static DeleteReplicasRequest deleteReplicasRequest(SolrCollection 
collection, Set<Replica> replicas) {
+return new DeleteReplicasRequest() {
+  @Override
+  public Set<Replica> getReplicas() {
+return replicas;
+  }
+
+  @Override
+  public SolrCollection getCollection() {
+return collection;
+  }
+};
+  }
+
+  public static DeleteReplicasRequest deleteReplicasRequest(DocCollection 
docCollection, String shardName, Set<String> replicaNames) {
+SolrCollection solrCollection = 
SimpleClusterAbstractionsImpl.SolrCollectionImpl.fromDocCollection(docCollection);
+Shard shard = solrCollection.getShard(shardName);

Review comment:
   *NULL_DEREFERENCE:*  object `solrCollection` last assigned on line 51 
could be null and is dereferenced at line 52.
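One common way to address a finding like this is an explicit null check that fails fast with a descriptive message. A hypothetical sketch, where `NullGuard` and `shardOf` are invented stand-ins for the real lookup code:

```java
import java.util.Objects;

public class NullGuard {
  static String shardOf(Object solrCollection, String shardName) {
    // Fail fast with a clear message instead of a bare NullPointerException
    // at the later dereference the analyzer is warning about.
    Objects.requireNonNull(solrCollection,
        "fromDocCollection() returned null for this collection");
    return shardName; // stand-in for solrCollection.getShard(shardName)
  }
}
```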








[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #1666: SOLR-14155: Load all other SolrCore plugins from packages

2021-01-13 Thread GitBox


muse-dev[bot] commented on a change in pull request #1666:
URL: https://github.com/apache/lucene-solr/pull/1666#discussion_r556493633



##
File path: solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java
##
@@ -645,15 +648,29 @@ static void clearCache() {
 }
   }
 
+  void initCore(SolrCore core) {
+this.coreName = core.getName();
+this.config = core.getSolrConfig();
+this.coreId = core.uniqueId;
+this.coreContainer = core.getCoreContainer();
+SolrCore.Provider coreProvider = core.coreProvider;
+
+this.coreReloadingClassLoader = new 
PackageListeningClassLoader(core.getCoreContainer(),
+this, s -> config.maxPackageVersion(s), null){
+  @Override
+  protected void doReloadAction(Ctx ctx) {
+log.info("Core reloading classloader issued reload for: {}/{} ", 
coreName, coreId);
+coreProvider.reload();
+  }
+};
+core.getPackageListeners().addListener(coreReloadingClassLoader, true);
+
+  }
 
   /**
* Tell all {@link SolrCoreAware} instances about the SolrCore
*/
   public void inform(SolrCore core) {
-this.coreName = core.getName();
-this.config = core.getSolrConfig();
-this.coreId = core.uniqueId;
-this.coreContainer = core.getCoreContainer();
 if(getSchemaLoader() != null) 
core.getPackageListeners().addListener(schemaLoader);

Review comment:
   *THREAD_SAFETY_VIOLATION:*  Read/Write race. Non-private method 
`SolrResourceLoader.inform(...)` indirectly reads without synchronization from 
`this.coreContainer`. Potentially races with write in method 
`SolrResourceLoader.initCore(...)`.
Reporting because this access may occur on a background thread.
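A minimal sketch of the kind of fix such a report usually calls for: route every read and write of the shared field through one lock (or make the field `volatile` if plain visibility suffices). `SafeHolder` and its members are invented stand-ins, not the actual `SolrResourceLoader` fields:

```java
public class SafeHolder {
  private final Object lock = new Object();
  private String coreContainer; // stand-in for the shared field

  void initCore(String cc) {       // writer side
    synchronized (lock) { coreContainer = cc; }
  }

  String inform() {                // reader side
    synchronized (lock) { return coreContainer; }
  }
}
```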








[jira] [Commented] (LUCENE-9666) http://s.apache.org/luceneversions doesn't work

2021-01-13 Thread Peter Gromov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264098#comment-17264098
 ] 

Peter Gromov commented on LUCENE-9666:
--

Sorry, it only occurred to me now that such questions might be better discussed 
in the mailing list. If so, please forgive me.

> http://s.apache.org/luceneversions doesn't work
> ---
>
> Key: LUCENE-9666
> URL: https://issues.apache.org/jira/browse/LUCENE-9666
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/website
>Reporter: Peter Gromov
>Priority: Major
>
> http://s.apache.org/luceneversions redirects to JIRA for me and doesn't give 
> any "information on past and future Lucene versions" as promised in 
> CHANGES.txt and at 
> https://lucene.apache.org/core/8_7_0/changes/Changes.html#v8.7.0.optimizations
>  :(






[GitHub] [lucene-solr] sigram opened a new pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-13 Thread GitBox


sigram opened a new pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199


   See Jira for details.
   
   This is a minimal approach that helps to set up the initial co-location and 
then may veto collection layout changes (additions / removals of replicas) if 
it would violate the `withCollection` constraint.
   
   This PR also refactors the placement API to add support for other types of 
"collection modification" requests, and extends the placement plugin API to add 
a method for vetoing such changes. `DeleteReplicaCmd` is also modified to make 
use of this functionality.






[jira] [Commented] (SOLR-14155) Load all other SolrCore plugins from packages

2021-01-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264085#comment-17264085
 ] 

ASF subversion and git services commented on SOLR-14155:


Commit 9466af576a4a9d3cd750438123063928329fbb46 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9466af5 ]

SOLR-14155: Load all other SolrCore plugins from packages (#1666)



> Load all other SolrCore plugins from packages
> -
>
> Key: SOLR-14155
> URL: https://issues.apache.org/jira/browse/SOLR-14155
> Project: Solr
>  Issue Type: Sub-task
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> A few plugins configurable in {{solrconfig.xml}} still cannot be loaded from 
> packages 
>  # SolrEventListener (improperly implemented)
>  # DirectoryFactory
>  # UpdateLog
>  # Cache
>  # RecoveryStrategy
>  # IndexReaderFactory
>  # CodecFactory
>  # StatsCache
> #1 can do hot reload.  Others should result in reloading the core. 






[GitHub] [lucene-solr] sigram commented on a change in pull request #2198: SOLR-15081: Metrics for core: isLeader, status

2021-01-13 Thread GitBox


sigram commented on a change in pull request #2198:
URL: https://github.com/apache/lucene-solr/pull/2198#discussion_r556449504



##
File path: solr/core/src/java/org/apache/solr/core/SolrCore.java
##
@@ -1202,26 +1203,26 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
getIndexDir(), true, "indexDir", Category.CORE.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullNumber() : 
getIndexSize(), true, "sizeInBytes", Category.INDEX.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
NumberUtils.readableSize(getIndexSize()), true, "size", 
Category.INDEX.toString());
-if (coreContainer != null) {
-  final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
-  if (cd != null) {
-parentContext.gauge(() -> {
-  if (cd.getCollectionName() != null) {
-return cd.getCollectionName();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "collection", Category.CORE.toString());
 
-parentContext.gauge(() -> {
-  if (cd.getShardId() != null) {
-return cd.getShardId();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "shard", Category.CORE.toString());
-  }
+final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
+if (cd != null) {
+  // TODO
+  parentContext.gauge(cd::getCollectionName, true, "collection", 
Category.CORE.toString());
+  parentContext.gauge(() -> Objects.requireNonNullElse(cd.getShardId(), 
parentContext.nullString()), true, "shard", Category.CORE.toString());
+  //TODO should this instead be in a core status, or a metric?  When do we 
use which?

Review comment:
   Yeah, metrics today overlap a lot with "status" requests... something to 
clean up in 9x.
   
   Initially the metrics API wasn't able to properly report complex values 
(esp. when reported via JMX) but this has been fixed around 7.0 or so - support 
for non-numeric values had to be added specifically to report things like 
paths, non-numeric state, etc. and for complex properties like eg. system 
properties, caches, etc. Now it can report basically anything you want.








[GitHub] [lucene-solr] sigram commented on a change in pull request #2198: SOLR-15081: Metrics for core: isLeader, status

2021-01-13 Thread GitBox


sigram commented on a change in pull request #2198:
URL: https://github.com/apache/lucene-solr/pull/2198#discussion_r556446332



##
File path: solr/core/src/java/org/apache/solr/core/SolrCore.java
##
@@ -1202,26 +1203,26 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
getIndexDir(), true, "indexDir", Category.CORE.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullNumber() : 
getIndexSize(), true, "sizeInBytes", Category.INDEX.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
NumberUtils.readableSize(getIndexSize()), true, "size", 
Category.INDEX.toString());
-if (coreContainer != null) {
-  final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
-  if (cd != null) {
-parentContext.gauge(() -> {
-  if (cd.getCollectionName() != null) {
-return cd.getCollectionName();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "collection", Category.CORE.toString());
 
-parentContext.gauge(() -> {
-  if (cd.getShardId() != null) {
-return cd.getShardId();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "shard", Category.CORE.toString());
-  }
+final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
+if (cd != null) {
+  // TODO
+  parentContext.gauge(cd::getCollectionName, true, "collection", 
Category.CORE.toString());
+  parentContext.gauge(() -> Objects.requireNonNullElse(cd.getShardId(), 
parentContext.nullString()), true, "shard", Category.CORE.toString());
+  //TODO should this instead be in a core status, or a metric?  When do we 
use which?
+  //   SEE org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus
+  parentContext.gauge(() -> {
+DocCollection docColl = 
coreContainer.getZkController().getZkStateReader().getClusterState().getCollectionOrNull(cd.getCollectionName(),
 true);
+Replica leaderReplica = docColl.getLeader(cd.getShardId());
+return leaderReplica.getName().equals(cd.getCoreNodeName());
+  }, true, "isLeader", Category.CORE.toString());
+  parentContext.gauge(() -> {
+DocCollection docColl = 
coreContainer.getZkController().getZkStateReader().getClusterState().getCollectionOrNull(cd.getCollectionName(),
 true);
+final Replica myReplica = docColl.getReplica(cd.getCoreNodeName());
+return Objects.requireNonNullElse(myReplica.getState().toString(), 
parentContext.nullString());

Review comment:
   Similarly `cloudDescriptor.getLastPublished()`.








[GitHub] [lucene-solr] sigram commented on a change in pull request #2198: SOLR-15081: Metrics for core: isLeader, status

2021-01-13 Thread GitBox


sigram commented on a change in pull request #2198:
URL: https://github.com/apache/lucene-solr/pull/2198#discussion_r556445628



##
File path: solr/core/src/java/org/apache/solr/core/SolrCore.java
##
@@ -1202,26 +1203,26 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
getIndexDir(), true, "indexDir", Category.CORE.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullNumber() : 
getIndexSize(), true, "sizeInBytes", Category.INDEX.toString());
 parentContext.gauge(() -> isClosed() ? parentContext.nullString() : 
NumberUtils.readableSize(getIndexSize()), true, "size", 
Category.INDEX.toString());
-if (coreContainer != null) {
-  final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
-  if (cd != null) {
-parentContext.gauge(() -> {
-  if (cd.getCollectionName() != null) {
-return cd.getCollectionName();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "collection", Category.CORE.toString());
 
-parentContext.gauge(() -> {
-  if (cd.getShardId() != null) {
-return cd.getShardId();
-  } else {
-return parentContext.nullString();
-  }
-}, true, "shard", Category.CORE.toString());
-  }
+final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor();
+if (cd != null) {
+  // TODO
+  parentContext.gauge(cd::getCollectionName, true, "collection", 
Category.CORE.toString());
+  parentContext.gauge(() -> Objects.requireNonNullElse(cd.getShardId(), 
parentContext.nullString()), true, "shard", Category.CORE.toString());
+  //TODO should this instead be in a core status, or a metric?  When do we 
use which?
+  //   SEE org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus
+  parentContext.gauge(() -> {
+DocCollection docColl = 
coreContainer.getZkController().getZkStateReader().getClusterState().getCollectionOrNull(cd.getCollectionName(),
 true);

Review comment:
   Maybe use `cloudDescriptor.isLeader()`, it's simpler...








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2197: SOLR-15075: Solr docker gradle improvements

2021-01-13 Thread GitBox


dweiss commented on a change in pull request #2197:
URL: https://github.com/apache/lucene-solr/pull/2197#discussion_r556429043



##
File path: solr/docker/build.gradle
##
@@ -18,106 +18,187 @@
 import com.google.common.base.Preconditions
 import com.google.common.base.Strings
 
-apply plugin: 'base'
-apply plugin: 'com.palantir.docker'
-
-subprojects {
-  apply plugin: 'base'
-  apply plugin: 'com.palantir.docker'
-}
-
 description = 'Solr Docker image'
 
-def dockerPackage = project(':solr:docker:package')
-
-dependencies {
-  docker dockerPackage
-}
+apply plugin: 'base'
 
+// Solr Docker inputs
 def dockerImageRepo = propertyOrEnvOrDefault("solr.docker.imageRepo", 
"SOLR_DOCKER_IMAGE_REPO", "apache/solr")
 def dockerImageTag = propertyOrEnvOrDefault("solr.docker.imageTag", 
"SOLR_DOCKER_IMAGE_TAG", "${version}")
 def dockerImageName = propertyOrEnvOrDefault("solr.docker.imageName", 
"SOLR_DOCKER_IMAGE_NAME", "${dockerImageRepo}:${dockerImageTag}")
 def baseDockerImage = propertyOrEnvOrDefault("solr.docker.baseImage", 
"SOLR_DOCKER_BASE_IMAGE", 'openjdk:11-jre-slim')
 def githubUrlOrMirror = propertyOrEnvOrDefault("solr.docker.githubUrl", 
"SOLR_DOCKER_GITHUB_URL", 'github.com')
 
-docker {
-  name = dockerImageName
-  files file('include')
-  buildArgs(['BASE_IMAGE' : baseDockerImage, 'SOLR_PACKAGE_IMAGE' : 
'apache/solr-build:local-package', 'SOLR_VERSION': "${version}", 'GITHUB_URL': 
githubUrlOrMirror])
+// Build directory locations
+def dockerBuildDistribution = "$buildDir/distributions"
+def imageIdFile = "$buildDir/image-id"
+
+configurations {
+  packaging {
+canBeResolved = true
+  }
+  dockerImage {
+canBeResolved = true
+  }
+}
+
+dependencies {
+  packaging project(path: ":solr:packaging", configuration: 'archives')
+
+  dockerImage files(imageIdFile) {
+builtBy 'dockerBuild'
+  }
+}
+
+task dockerTar(type: Tar) {
+  group = 'Docker'
+  description = 'Package docker context to prepare for docker build'
+
+  dependsOn configurations.packaging
+  into('scripts') {
+from file('scripts')
+fileMode 755
+  }
+  into('releases') {
+from configurations.packaging
+include '*.tgz'
+  }
+  from file('Dockerfile')
+  destinationDirectory = file(dockerBuildDistribution)
+  extension 'tgz'
+  compression = Compression.GZIP
 }
 
-tasks.docker {
-  // In order to create the solr docker image, the solr package image must be 
created first.
-  dependsOn(dockerPackage.tasks.docker)
+task dockerBuild(dependsOn: tasks.dockerTar) {
+  group = 'Docker'
+  description = 'Build Solr docker image'
+
+  // Ensure that the docker image is rebuilt on build-arg changes or changes 
in the docker context
+  inputs.properties([
+  baseDockerImage: baseDockerImage,
+  githubUrlOrMirror: githubUrlOrMirror,
+  version: version
+  ])
+  inputs.dir(dockerBuildDistribution)
+
+  doLast {
+exec {
+  standardInput = 
tasks.dockerTar.outputs.files.singleFile.newDataInputStream()
+  commandLine "docker", "build",
+  "--iidfile", imageIdFile,
+  "--build-arg", "BASE_IMAGE=${inputs.properties.baseDockerImage}",
+  "--build-arg", "SOLR_VERSION=${version}",
+  "--build-arg", 
"GITHUB_URL=${inputs.properties.githubUrlOrMirror}",
+  "-"
+}
+  }
 
   // Print information on the image after it has been created
   doLast {
+def dockerImageId = file(imageIdFile).text
 project.logger.lifecycle("Solr Docker Image Created")
-project.logger.lifecycle("\tName: $dockerImageName")
-project.logger.lifecycle("\tBase Image: $baseDockerImage")
+project.logger.lifecycle("\tID: \t$dockerImageId")
+project.logger.lifecycle("\tBase Image: \t$baseDockerImage")
+project.logger.lifecycle("\tSolr Version: \t$version")
   }
+
+  outputs.files(imageIdFile)
 }
 
-abstract class DockerTestSuite extends DefaultTask {
-  private String solrImageName = null;
-  private List<String> tests = new ArrayList<>();
-  private List<String> ignore = new ArrayList<>();
+task dockerTag(dependsOn: tasks.dockerBuild) {
+  group = 'Docker'
+  description = 'Tag Solr docker image'
 
-  @OutputDirectory
-  abstract DirectoryProperty getOutputDir()
+  // Ensure that the docker image is re-tagged if the image ID or desired tag 
changes
+  inputs.properties([
+  dockerImageName: dockerImageName,
+  ])
+  inputs.file(imageIdFile)
 
-  public void setSolrImageName(String solrImageName) {
-this.solrImageName = solrImageName
+  doLast {
+exec {
+  commandLine "docker", "tag", 
tasks.dockerBuild.outputs.files.singleFile.text, 
inputs.properties.dockerImageName
+}
   }
 
-  public String getSolrImageName() {
-Preconditions.checkArgument(!Strings.isNullOrEmpty(solrImageName), 
"solrImageName is a required dockerTests configuration item.")
-return solrImageName
+  // Print information on the image after it has been created
+  doLast {
+def dockerImageId = tasks.dockerBuild.outputs.files.singleFile.text
+ 

[jira] [Assigned] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor

2021-01-13 Thread Bruno Roustant (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Roustant reassigned LUCENE-9646:
--

Assignee: Bruno Roustant

> Set BM25Similarity discountOverlaps via the constructor
> ---
>
> Key: LUCENE-9646
> URL: https://issues.apache.org/jira/browse/LUCENE-9646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Patrick Marty
>Assignee: Bruno Roustant
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> BM25Similarity discountOverlaps parameter is true by default.
> It can be set with 
> {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} 
> method.
> But this method makes BM25Similarity mutable.
>  
> discountOverlaps should be set via the constructor and 
> {{setDiscountOverlaps}} method should be removed to make BM25Similarity 
> immutable.
>  
> PR https://github.com/apache/lucene-solr/pull/2161
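The immutability pattern proposed here can be sketched as follows. `ImmutableSimilarity` is an invented stand-in, not the actual `BM25Similarity` source: the flag becomes `final` and is set only through constructors, with an overload preserving the old default of `true`.

```java
public class ImmutableSimilarity {
  private final float k1;
  private final float b;
  private final boolean discountOverlaps;

  public ImmutableSimilarity(float k1, float b, boolean discountOverlaps) {
    this.k1 = k1;
    this.b = b;
    this.discountOverlaps = discountOverlaps;
  }

  // Preserves the previous default behavior (discountOverlaps = true),
  // so existing two-argument call sites are unaffected.
  public ImmutableSimilarity(float k1, float b) {
    this(k1, b, true);
  }

  public boolean getDiscountOverlaps() {
    return discountOverlaps;
  }
}
```

With no setter, instances can be shared safely across threads without extra synchronization.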






[jira] [Commented] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor

2021-01-13 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264033#comment-17264033
 ] 

Bruno Roustant commented on LUCENE-9646:


I'm going to merge it tomorrow, on master only.

> Set BM25Similarity discountOverlaps via the constructor
> ---
>
> Key: LUCENE-9646
> URL: https://issues.apache.org/jira/browse/LUCENE-9646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Patrick Marty
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The BM25Similarity discountOverlaps parameter is true by default.
> It can be set with the
> {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}}
> method, but that method makes BM25Similarity mutable.
>  
> discountOverlaps should be set via the constructor, and the
> {{setDiscountOverlaps}} method should be removed to make BM25Similarity
> immutable.
>  
> PR https://github.com/apache/lucene-solr/pull/2161



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264023#comment-17264023
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

From the Slack discussions on #solr-dev it looks like the least intrusive 
option for now is to provide a way for the placement plugins to veto 
collection layout changes (adding / removing replicas and shards) if they 
would violate the constraint, and to delegate responsibility for meeting the 
constraint to the operator (by manually adding the necessary number of 
secondary replicas, or by manually removing them first from the nodes where 
the primary replicas are to be deleted).

Other options would introduce a lot of complexity to the existing collection 
admin commands.
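
The veto idea amounts to a check at placement time that rejects a proposed change rather than trying to repair it. A minimal sketch of that shape, assuming an illustrative method and exception (this is not the real Solr placement-plugin API):

```java
// Hypothetical sketch of the veto mechanism: a placement plugin inspects a
// proposed replica placement and throws if it would violate a
// collection-level constraint (here, a maxShardsPerNode-style limit).
public class PlacementVetoSketch {

    // Veto adding one more replica of a shard to a node that already holds
    // the maximum allowed number of that shard's replicas.
    static void vetoIfViolated(int replicasOfShardOnNode, int maxShardsPerNode) {
        if (replicasOfShardOnNode + 1 > maxShardsPerNode) {
            throw new IllegalStateException(
                "placement would violate maxShardsPerNode=" + maxShardsPerNode);
        }
    }

    public static void main(String[] args) {
        vetoIfViolated(0, 1); // allowed: node holds no replica of this shard yet
        try {
            vetoIfViolated(1, 1); // rejected: node already holds one replica
        } catch (IllegalStateException e) {
            System.out.println("vetoed: " + e.getMessage());
        }
    }
}
```

The operator then resolves the conflict manually, as described above, instead of the admin command attempting an automatic fix.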

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (i.e. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264021#comment-17264021
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

{quote}Under 100% CPU, load average doesn't tell us much, but CPU usage is very 
useful. Over 100% CPU, CPU utilization doesn't tell us much, but load average 
tells us a lot. It tells us how much work is waiting to run.
{quote}
Well said! Indeed these are two very different metrics, and having two separate 
breaker implementations is not a bad idea (we could use one implementation plus 
a switch, but that could be confusing and easy to get wrong).
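
For reference, both metrics can be read side by side from the JMX beans. A minimal sketch (the 0.95 thresholds are illustrative, not Solr's defaults; {{getSystemCpuLoad()}} is deprecated in recent JDKs in favor of {{getCpuLoad()}} but still present):

```java
import java.lang.management.ManagementFactory;

// Contrast the two metrics discussed here: getSystemLoadAverage() is an
// unbounded run-queue average (and returns -1 on platforms without it, such
// as Windows), while com.sun.management.OperatingSystemMXBean's CPU load is
// a utilization fraction in [0.0, 1.0].
public class CpuMetrics {
    public static void main(String[] args) {
        java.lang.management.OperatingSystemMXBean os =
            ManagementFactory.getOperatingSystemMXBean();

        double loadAvg = os.getSystemLoadAverage(); // unbounded; -1 if unavailable
        int cpus = os.getAvailableProcessors();

        // A load-average threshold is only meaningful relative to CPU count.
        boolean trippedByLoadAvg = loadAvg >= 0 && loadAvg / cpus > 0.95;
        System.out.println("loadAvg=" + loadAvg + " cpus=" + cpus
            + " tripped=" + trippedByLoadAvg);

        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean sunOs =
                (com.sun.management.OperatingSystemMXBean) os;
            double cpuLoad = sunOs.getSystemCpuLoad(); // 0.0..1.0; negative if unavailable
            boolean trippedByCpu = cpuLoad >= 0 && cpuLoad > 0.95;
            System.out.println("cpuLoad=" + cpuLoad + " tripped=" + trippedByCpu);
        }
    }
}
```

A utilization-based breaker trips as the fraction approaches 1.0; a load-average breaker only becomes informative past saturation and must be normalized by the CPU count.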

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bounded by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org