[GitHub] [lucene] msokolov commented on a change in pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


msokolov commented on a change in pull request #369:
URL: https://github.com/apache/lucene/pull/369#discussion_r725426021



##
File path: lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * An object with this interface is a wrapper around another object (e.g., a filter with a
+ * delegate). The method {@link #unwrap()} can be called to get the wrapped object
+ *
+ * @lucene.internal
+ */
+public interface Unwrappable<T> {
+
+  /** Unwraps this instance */
+  T unwrap();
+
+  /** Unwraps all {@code Unwrapable}s around the given object. */

Review comment:
   nit "Unwra**p**able" needs a double-p




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] janhoy commented on a change in pull request #355: LUCENE-9997 Revisit smoketester for 9.0 build

2021-10-08 Thread GitBox


janhoy commented on a change in pull request #355:
URL: https://github.com/apache/lucene/pull/355#discussion_r725401031



##
File path: dev-tools/scripts/smokeTestRelease.py
##
@@ -48,6 +48,7 @@
 cygwin = platform.system().lower().startswith('cygwin')
 cygwinWindowsRoot = os.popen('cygpath -w /').read().strip().replace('\\','/') if cygwin else ''
 
+

Review comment:
   Done







[GitHub] [lucene] janhoy commented on a change in pull request #355: LUCENE-9997 Revisit smoketester for 9.0 build

2021-10-08 Thread GitBox


janhoy commented on a change in pull request #355:
URL: https://github.com/apache/lucene/pull/355#discussion_r725396122



##
File path: dev-tools/scripts/smokeTestRelease.py
##
@@ -900,42 +701,19 @@ def testDemo(run_java, isSrc, version, jdk):
   if removeTrailingZeros(actualVersion) != removeTrailingZeros(version):
     raise RuntimeError('wrong version from CheckIndex: got "%s" but expected "%s"' % (actualVersion, version))
 
+
 def removeTrailingZeros(version):
   return re.sub(r'(\.0)*$', '', version)
 
-def checkMaven(solrSrcUnpackPath, baseURL, tmpDir, gitRevision, version, isSigned, keysFile):
-  POMtemplates = defaultdict()
-  getPOMtemplates(solrSrcUnpackPath, POMtemplates, tmpDir)
-  print('download artifacts')
-  artifacts = {'lucene': [], 'solr': []}
-  for project in ('lucene', 'solr'):
-    artifactsURL = '%s/%s/maven/org/apache/%s/' % (baseURL, project, project)
-    targetDir = '%s/maven/org/apache/%s' % (tmpDir, project)
-    if not os.path.exists(targetDir):
-      os.makedirs(targetDir)
-    crawl(artifacts[project], artifactsURL, targetDir)
-  print()
-  verifyPOMperBinaryArtifact(artifacts, version)
-  verifyArtifactPerPOMtemplate(POMtemplates, artifacts, tmpDir, version)
-  verifyMavenDigests(artifacts)
-  checkJavadocAndSourceArtifacts(artifacts, version)
-  verifyDeployedPOMsCoordinates(artifacts, version)
-  if isSigned:
-    verifyMavenSigs(baseURL, tmpDir, artifacts, keysFile)
-
-  distFiles = getBinaryDistFilesForMavenChecks(tmpDir, version, baseURL)
-  checkIdenticalMavenArtifacts(distFiles, artifacts, version)
-
-  checkAllJARs('%s/maven/org/apache/lucene' % tmpDir, 'lucene', gitRevision, version, tmpDir, baseURL)
-  checkAllJARs('%s/maven/org/apache/solr' % tmpDir, 'solr', gitRevision, version, tmpDir, baseURL)
 
 def getBinaryDistFilesForMavenChecks(tmpDir, version, baseURL):
   # TODO: refactor distribution unpacking so that it only happens once per distribution per smoker run
   distFiles = defaultdict()
-  for project in ('lucene', 'solr'):
+  for project in ('lucene'):

Review comment:
   The function was dead code, but thanks to the careful eye of @cpoerschke the validation of Maven artifacts is back, now targeting Lucene only.
   
   In this first iteration I just get rid of Solr, but I agree that we can collapse some loops and also get rid of some `project` function arguments.







[GitHub] [lucene] markrmiller edited a comment on pull request #365: GameGenie:1990JMH

2021-10-08 Thread GitBox


markrmiller edited a comment on pull request #365:
URL: https://github.com/apache/lucene/pull/365#issuecomment-939167876


   >   git clone -b JMH --single-branch https://github.com/markrmiller/lucene.git
   >   cd lucene/lucene/jmh
   >   ./jmh.sh FuzzyQuery
   >
   >   // async-profiler flame graphs to lucene/lucene/jmh/work/*, 1 warm up iteration, 1 iteration, 1 second each, measure throughput
   >   https://github.com/markrmiller/lucene/blob/JMH/lucene/jmh/README.md#using-jmh-with-async-profiler
   >   ./jmh.sh FuzzyQuery -w 1 -wi 1 -r 1 -i 1 -prof async:dir=work\;output=flamegraph





[GitHub] [lucene] markrmiller edited a comment on pull request #365: GameGenie:1990JMH

2021-10-08 Thread GitBox


markrmiller edited a comment on pull request #365:
URL: https://github.com/apache/lucene/pull/365#issuecomment-937960859


   The **index**, **line doc file**, and **tasks file** can be generated via the _lucene-util_ benchmark. The only current task is a **SearchPerf** task that roughly emulates a standard _lucene-util_ benchmark and consumes a compatible **index**, **line doc file**, and **tasks file**.
   
   You can of course create any kind of Lucene JMH benchmark, since **jmh** is a fully integrated module. But the only thing here to play with is **SearchPerf**.
   
   By default, it looks for those three files in:
   
   > "lucene/jmh/work/index"
   > "lucene/jmh/work/lines.txt"
   > "lucene/jmh/work/tasks.txt"
   
   You can override these with:
   
   >   lucene/lucene/jmh  ./jmh.sh SearchPerf.orHighHigh -jvmArgs -Dindex=/mnt/s1/wikimedium5m/index -jvmArgs -Dldfile=/mnt/s1/lucene-bench/data/enwiki-20120502-lines-1k.txt -jvmArgs -DtasksFile=/mnt/s1/wikimedium5m/wikimedium500.tasks
   
   You can adjust or vary all the same parameters as lucene-util (via necessarily shorter parameter names) through **jmh's** -p parameter control:
   
   `lucene/lucene/jmh  ./jmh.sh SearchPerf -p nrt=true,false -p analyzer=StandardAnalyzer,WhitespaceAnalyzer -p postfmt=Lucene90`





[GitHub] [lucene] markrmiller edited a comment on pull request #365: GameGenie:1990JMH

2021-10-08 Thread GitBox


markrmiller edited a comment on pull request #365:
URL: https://github.com/apache/lucene/pull/365#issuecomment-939167876


   >   git clone -b JMH --single-branch https://github.com/markrmiller/lucene.git
   >   cd lucene/lucene/jmh
   >   ./jmh.sh FuzzyQuery





[GitHub] [lucene] markrmiller commented on pull request #365: GameGenie:1990JMH

2021-10-08 Thread GitBox


markrmiller commented on pull request #365:
URL: https://github.com/apache/lucene/pull/365#issuecomment-939167876


   `  
 git clone -b JMH --single-branch https://github.com/markrmiller/lucene.git
  cd lucene/lucene/jmh
 ./jmh.sh FuzzyQuery
   `





[GitHub] [lucene] janhoy commented on a change in pull request #355: LUCENE-9997 Revisit smoketester for 9.0 build

2021-10-08 Thread GitBox


janhoy commented on a change in pull request #355:
URL: https://github.com/apache/lucene/pull/355#discussion_r725370169



##
File path: dev-tools/scripts/smokeTestRelease.py
##
@@ -900,42 +701,19 @@ def testDemo(run_java, isSrc, version, jdk):
   if removeTrailingZeros(actualVersion) != removeTrailingZeros(version):
     raise RuntimeError('wrong version from CheckIndex: got "%s" but expected "%s"' % (actualVersion, version))
 
+
 def removeTrailingZeros(version):
   return re.sub(r'(\.0)*$', '', version)
 
-def checkMaven(solrSrcUnpackPath, baseURL, tmpDir, gitRevision, version, isSigned, keysFile):

Review comment:
   Oops. I was too quick to remove this because it looked Solr-specific, but `solrSrcUnpackPath` was only used to get hold of the Maven POM templates, which we no longer use since we don't maintain a separate Maven build.
   
   I'm bringing the test(s) back, for Lucene only. I have not tested this yet, but hope to do that too...







[GitHub] [lucene] janhoy commented on a change in pull request #355: LUCENE-9997 Revisit smoketester for 9.0 build

2021-10-08 Thread GitBox


janhoy commented on a change in pull request #355:
URL: https://github.com/apache/lucene/pull/355#discussion_r725350490



##
File path: dev-tools/scripts/smokeTestRelease.py
##
@@ -658,97 +616,41 @@ def verifyUnpacked(java, project, artifact, unpackPath, gitRevision, version, te
     print('run "%s"' % validateCmd)
     java.run_java11(validateCmd, '%s/validate.log' % unpackPath)
 
-    if project == 'lucene':
-      print("run tests w/ Java 11 and testArgs='%s'..." % testArgs)
-      java.run_java11('ant clean test %s' % testArgs, '%s/test.log' % unpackPath)
-      java.run_java11('ant jar', '%s/compile.log' % unpackPath)
-      testDemo(java.run_java11, isSrc, version, '11')
-
-      print('generate javadocs w/ Java 11...')
-      java.run_java11('ant javadocs', '%s/javadocs.log' % unpackPath)
-      checkBrokenLinks('%s/build/docs' % unpackPath)
-
-      if java.run_java12:
-        print("run tests w/ Java 12 and testArgs='%s'..." % testArgs)
-        java.run_java12('ant clean test %s' % testArgs, '%s/test.log' % unpackPath)
-        java.run_java12('ant jar', '%s/compile.log' % unpackPath)
-        testDemo(java.run_java12, isSrc, version, '12')
-
-        #print('generate javadocs w/ Java 12...')
-        #java.run_java12('ant javadocs', '%s/javadocs.log' % unpackPath)
-        #checkBrokenLinks('%s/build/docs' % unpackPath)
-
-    else:
-      os.chdir('solr')
-
-      print("run tests w/ Java 11 and testArgs='%s'..." % testArgs)
-      java.run_java11('ant clean test -Dtests.slow=false %s' % testArgs, '%s/test.log' % unpackPath)
-
-      # test javadocs
-      print('generate javadocs w/ Java 11...')
-      java.run_java11('ant clean javadocs', '%s/javadocs.log' % unpackPath)
-      checkBrokenLinks('%s/solr/build/docs')
-
-      print('test solr example w/ Java 11...')
-      java.run_java11('ant clean server', '%s/antexample.log' % unpackPath)
-      testSolrExample(unpackPath, java.java11_home, True)
+    print("run tests w/ Java 11 and testArgs='%s'..." % testArgs)
+    java.run_java11('ant clean test %s' % testArgs, '%s/test.log' % unpackPath)
+    java.run_java11('ant jar', '%s/compile.log' % unpackPath)
+    testDemo(java.run_java11, isSrc, version, '11')
 
-      if java.run_java12:
-        print("run tests w/ Java 12 and testArgs='%s'..." % testArgs)
-        java.run_java12('ant clean test -Dtests.slow=false %s' % testArgs, '%s/test.log' % unpackPath)
+    print('generate javadocs w/ Java 11...')
+    java.run_java11('ant javadocs', '%s/javadocs.log' % unpackPath)
+    checkBrokenLinks('%s/build/docs' % unpackPath)
 
-        #print('generate javadocs w/ Java 12...')
-        #java.run_java12('ant clean javadocs', '%s/javadocs.log' % unpackPath)
-        #checkBrokenLinks('%s/solr/build/docs' % unpackPath)
+    if java.run_java12:
+      print("run tests w/ Java 12 and testArgs='%s'..." % testArgs)
+      java.run_java12('ant clean test %s' % testArgs, '%s/test.log' % unpackPath)
+      java.run_java12('ant jar', '%s/compile.log' % unpackPath)
+      testDemo(java.run_java12, isSrc, version, '12')
 
-        print('test solr example w/ Java 12...')
-        java.run_java12('ant clean server', '%s/antexample.log' % unpackPath)
-        testSolrExample(unpackPath, java.java12_home, True)
-
-      os.chdir('..')
-      print('check NOTICE')
-      testNotice(unpackPath)
+      #print('generate javadocs w/ Java 12...')
+      #java.run_java12('ant javadocs', '%s/javadocs.log' % unpackPath)
+      #checkBrokenLinks('%s/build/docs' % unpackPath)
 
   else:
 
     checkAllJARs(os.getcwd(), project, gitRevision, version, tmpDir, baseURL)
 
-    if project == 'lucene':
-      testDemo(java.run_java11, isSrc, version, '11')
-      if java.run_java12:
-        testDemo(java.run_java12, isSrc, version, '12')
-
-    else:
-      print('copying unpacked distribution for Java 11 ...')
-      java11UnpackPath = '%s-java11' % unpackPath
-      if os.path.exists(java11UnpackPath):
-        shutil.rmtree(java11UnpackPath)
-      shutil.copytree(unpackPath, java11UnpackPath)
-      os.chdir(java11UnpackPath)
-      print('test solr example w/ Java 11...')
-      testSolrExample(java11UnpackPath, java.java11_home, False)
-
-      if java.run_java12:
-        print('copying unpacked distribution for Java 12 ...')
-        java12UnpackPath = '%s-java12' % unpackPath
-        if os.path.exists(java12UnpackPath):
-          shutil.rmtree(java12UnpackPath)
-        shutil.copytree(unpackPath, java12UnpackPath)
-        os.chdir(java12UnpackPath)
-        print('test solr example w/ Java 12...')
-        testSolrExample(java12UnpackPath, java.java12_home, False)
-
-      os.chdir(unpackPath)
+    testDemo(java.run_java11, isSrc, version, '11')
+    if java.run_java12:
+      testDemo(java.run_java12, isSrc, version, '12')
 
   testChangesText('.', version, project)
 
-  if project == 'lucene' and isSrc:
+  if 

[jira] [Commented] (LUCENE-8637) WeightedSpanTermExtractor unnecessarily enforces rewrite for some SpanQueries

2021-10-08 Thread Marcus Eagan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426406#comment-17426406
 ] 

Marcus Eagan commented on LUCENE-8637:
--

[~gol...@detego-software.de] Can you check if it breaks unit tests? Do you know 
how to run the whole suite?

> WeightedSpanTermExtractor unnecessarily enforces rewrite for some SpanQueries
> --
>
> Key: LUCENE-8637
> URL: https://issues.apache.org/jira/browse/LUCENE-8637
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Affects Versions: 7.4, 7.3.1, 7.5, 7.6
>Reporter: Christoph Goller
>Priority: Minor
>  Labels: patch
> Attachments: WeightedSpanTermExtractor.java
>
>
> Method mustRewriteQuery(SpanQuery) returns true for SpanPositionCheckQuery, 
> SpanContainingQuery, SpanWithinQuery, and SpanBoostQuery, however, these 
> queries do not require rewriting. One effect of this is e.g. that 
> UnifiedHighlighter does not work with OffsetSource Postings and switches to 
> Analysis which of course has consequences for performance.
> I attach a patch for lucene version 7.6.0. I have not checked whether it 
> breaks existing unit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene] uschindler commented on a change in pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


uschindler commented on a change in pull request #369:
URL: https://github.com/apache/lucene/pull/369#discussion_r725287741



##
File path: lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * An object with this interface is a wrapper around another object (e.g., a filter with a
+ * delegate). The method {@link #unwrap()} can be called to get the wrapped object
+ *
+ * @lucene.internal
+ */
+public interface Unwrappable<T> {

Review comment:
   It must be static here, because you don't know whether the object you pass as a parameter supports unwrapping at all. For example, you just have a java.nio.file.Path and you don't know if it was wrapped. So you pass the instance of unknown implementation to the static method and it returns (thanks to generics) an instance of the same type, possibly unwrapped. But you don't need to know!







[GitHub] [lucene] uschindler commented on a change in pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


uschindler commented on a change in pull request #369:
URL: https://github.com/apache/lucene/pull/369#discussion_r725287741



##
File path: lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * An object with this interface is a wrapper around another object (e.g., a filter with a
+ * delegate). The method {@link #unwrap()} can be called to get the wrapped object
+ *
+ * @lucene.internal
+ */
+public interface Unwrappable<T> {

Review comment:
   It must be static here, because you don't know whether the class you pass on supports unwrapping at all. You just have a java.nio.file.Path and you don't know if it was wrapped. So you pass the instance of unknown class to the static method and it returns (thanks to generics) an instance of the same type, possibly unwrapped. But you don't need to know!







[GitHub] [lucene] uschindler commented on a change in pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


uschindler commented on a change in pull request #369:
URL: https://github.com/apache/lucene/pull/369#discussion_r725286409



##
File path: lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * An object with this interface is a wrapper around another object (e.g., a filter with a
+ * delegate). The method {@link #unwrap()} can be called to get the wrapped object
+ *
+ * @lucene.internal
+ */
+public interface Unwrappable<T> {

Review comment:
   We had the discussion in the Mmap PR: https://github.com/apache/lucene/pull/173#pullrequestreview-677347736
   
   FilterPath exported the unwrapping method anyway, so we don't change public APIs. Let's keep it simple!







[GitHub] [lucene] dweiss commented on a change in pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


dweiss commented on a change in pull request #369:
URL: https://github.com/apache/lucene/pull/369#discussion_r725220341



##
File path: lucene/core/src/java/org/apache/lucene/util/Unwrappable.java
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * An object with this interface is a wrapper around another object (e.g., a filter with a
+ * delegate). The method {@link #unwrap()} can be called to get the wrapped object
+ *
+ * @lucene.internal
+ */
+public interface Unwrappable<T> {

Review comment:
   I feel like we already had this discussion at some point in the past 
(about terminology and unwrapAll being static vs. a default method)?... Looks 
good to me. I recall Eclipse used a different pattern - they had type adapters 
which you could register as a service and then the service took care of 
adapting one type to another (here it'd unwrap delegates). This has the 
advantage of not exposing any additional methods on the class - the service 
facility could require a Function upon registration.
   
   Maybe it's overengineering though - this method seems fine to me.
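
For comparison, the adapter-service idea mentioned above could look roughly like the following. This is only a sketch of the general pattern, not Eclipse's actual adapter API and not anything proposed for Lucene; `UnwrapRegistrySketch`, `register`, `unwrapAll` and `Wrapper` are hypothetical names. The wrapper class implements no extra interface; the unwrapping logic is supplied as a `Function` at registration time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class UnwrapRegistrySketch {
  /** Hypothetical wrapper type; note it implements no unwrapping interface. */
  public static final class Wrapper {
    public final Object delegate;

    public Wrapper(Object delegate) {
      this.delegate = delegate;
    }
  }

  // Maps a wrapper type to the function that peels one layer off it.
  private final Map<Class<?>, Function<Object, Object>> unwrappers = new LinkedHashMap<>();

  /** Registers how instances of wrapperType are unwrapped by one level. */
  public <W> void register(Class<W> wrapperType, Function<? super W, ?> unwrapper) {
    unwrappers.put(wrapperType, o -> unwrapper.apply(wrapperType.cast(o)));
  }

  /** Applies registered unwrappers until no registered type matches. */
  public Object unwrapAll(Object o) {
    while (true) {
      Function<Object, Object> match = null;
      for (Map.Entry<Class<?>, Function<Object, Object>> e : unwrappers.entrySet()) {
        if (e.getKey().isInstance(o)) {
          match = e.getValue();
          break;
        }
      }
      if (match == null) {
        return o; // nothing registered for this type: treat it as unwrapped
      }
      Object next = match.apply(o);
      if (next == o) {
        return o; // guard against an unwrapper that returns its argument
      }
      o = next;
    }
  }

  public static void main(String[] args) {
    UnwrapRegistrySketch registry = new UnwrapRegistrySketch();
    registry.register(Wrapper.class, w -> w.delegate);
    Object core = "core";
    System.out.println(registry.unwrapAll(new Wrapper(new Wrapper(core))) == core);
  }
}
```

The trade-off is visible even at this size: the registry keeps the wrapped classes free of extra methods, but it needs its own lookup and registration machinery, which is the overengineering concern raised above.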







[jira] [Resolved] (LUCENE-10136) Lift the restriction on using 'var' variables

2021-10-08 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10136.
--
Fix Version/s: main (9.0)
   Resolution: Fixed

> Lift the restriction on using 'var' variables
> -
>
> Key: LUCENE-10136
> URL: https://issues.apache.org/jira/browse/LUCENE-10136
> Project: Lucene - Core
>  Issue Type: Wish
>Affects Versions: main (9.0)
>Reporter: Dawid Weiss
>Priority: Trivial
> Fix For: main (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Can we lift the restriction on using 'var' on the main branch? I know it's a 
> double-edged sword and sometimes it leads to unreadable code, especially when 
> you invoke a method, for example:
> {code}
> var foo = myMethodThatDoesSomething();
> {code}
> but in many, many, *many* cases the var keyword shortens the code and the 
> type is obvious from the context. This happens in loops, try-with-resources 
> and local variables. 
> {code}
> for (var it = array.iterator(); it.hasNext();) { ... }
> try (var foo = new MyFoo()) { ... }
> {code}
> I'd say - let's allow vars (and other language features) and use common 
> sense. If something is not clear in the context, type the variable. If 
> something is obvious, use shortcuts.
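
As an illustration of the guideline in the description, the sketch below contrasts cases where `var` keeps the type obvious with one where an explicit type reads better. `VarSketch`, `MyFoo`, `sumLengths` and `openAndClose` are hypothetical names, not Lucene code, and the choice of which declarations to spell out is a judgment call rather than a project rule.

```java
import java.util.ArrayList;
import java.util.List;

public class VarSketch {
  /** Hypothetical resource, mirroring the MyFoo in the description. */
  public static final class MyFoo implements AutoCloseable {
    public boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  public static int sumLengths(List<String> words) {
    int total = 0;
    // Loop variable: the iterator type is clear from the call on 'words'.
    for (var it = words.iterator(); it.hasNext(); ) {
      total += it.next().length();
    }
    return total;
  }

  public static MyFoo openAndClose() {
    // try-with-resources: the constructor call names the type already.
    try (var foo = new MyFoo()) {
      return foo; // close() runs before the method returns
    }
  }

  public static void main(String[] args) {
    // Local variable: the right-hand side spells out the type.
    var words = new ArrayList<String>();
    words.add("lucene");
    words.add("var");

    // Method call result: an explicit type documents what comes back.
    int total = sumLengths(words);
    System.out.println(total);
    System.out.println(openAndClose().closed);
  }
}
```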






[jira] [Commented] (LUCENE-10136) Lift the restriction on using 'var' variables

2021-10-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426318#comment-17426318
 ] 

ASF subversion and git services commented on LUCENE-10136:
--

Commit a613021ca4823c5eb8c9cf6947095bc5098ac500 in lucene's branch 
refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a613021 ]

LUCENE-10136: allow 'var' declarations in source code (be reasonable though). 
(#368)



> Lift the restriction on using 'var' variables
> -
>
> Key: LUCENE-10136
> URL: https://issues.apache.org/jira/browse/LUCENE-10136
> Project: Lucene - Core
>  Issue Type: Wish
>Affects Versions: main (9.0)
>Reporter: Dawid Weiss
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Can we lift the restriction on using 'var' on the main branch? I know it's a 
> double-edged sword and sometimes it leads to unreadable code, especially when 
> you invoke a method, for example:
> {code}
> var foo = myMethodThatDoesSomething();
> {code}
> but in many, many, *many* cases the var keyword shortens the code and the 
> type is obvious from the context. This happens in loops, try-with-resources 
> and local variables. 
> {code}
> for (var it = array.iterator(); it.hasNext();) { ... }
> try (var foo = new MyFoo()) { ... }
> {code}
> I'd say - let's allow vars (and other language features) and use common 
> sense. If something is not clear in the context, type the variable. If 
> something is obvious, use shortcuts.






[GitHub] [lucene] dweiss merged pull request #368: LUCENE-10136: allow 'var' declarations in source code

2021-10-08 Thread GitBox


dweiss merged pull request #368:
URL: https://github.com/apache/lucene/pull/368


   





[GitHub] [lucene] msokolov commented on pull request #350: LUCENE-10182: No longer check dvGen.

2021-10-08 Thread GitBox


msokolov commented on pull request #350:
URL: https://github.com/apache/lucene/pull/350#issuecomment-938823886


   Oh, sorry I am just catching up here. I like this solution. I'm also kind
   of annoyed that this slipped through before when I added the error message
   specialization (I think that must be when the autoboxing kicked in?).
   Somehow my perf measurements did not catch the regression.
   
   On Wed, Oct 6, 2021 at 5:46 AM Adrien Grand ***@***.***>
   wrote:
   
   > Actually there is an even simpler fix: the only long is dvGen which we
   > don't need to check since it's always -1. I'll update the PR description
   > accordingly.
   





[GitHub] [lucene] msokolov commented on a change in pull request #350: LUCENE-10182: No longer check dvGen.

2021-10-08 Thread GitBox


msokolov commented on a change in pull request #350:
URL: https://github.com/apache/lucene/pull/350#discussion_r725158069



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexingChain.java
##
@@ -1348,7 +1348,13 @@ private void assertSame(String label, int expected, int given) {
   }
 }
 
-private void assertSame(String label, Object expected, Object given) {
+private void assertSame(String label, long expected, long given) {

Review comment:
   Ah good catch! I didn't observe the slowdown when introducing this 
Object method, glad you found it.
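
A small sketch of why the `Object` overload in the diff above can introduce autoboxing, under the assumption that the only other overload takes `int` arguments (as the hunk header suggests). The class and method names here (`BoxingSketch`, `Before`, `After`, `which`) are hypothetical and only mirror the shape of the two signatures.

```java
final class BoxingSketch {

  /** Only int and Object overloads exist: a long argument cannot narrow to int. */
  static final class Before {
    static String which(int expected, int given) {
      return "int";
    }

    static String which(Object expected, Object given) {
      // Reached for long arguments only after boxing each one into a Long.
      return "Object";
    }
  }

  /** With a long overload, long arguments bind to it without any boxing. */
  static final class After {
    static String which(int expected, int given) {
      return "int";
    }

    static String which(long expected, long given) {
      return "long";
    }
  }
}
```

Calling `BoxingSketch.Before.which(-1L, -1L)` resolves to the `Object` overload, which allocates or looks up two `Long` instances per call, while `BoxingSketch.After.which(-1L, -1L)` stays on primitives.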







[GitHub] [lucene] balmukundblr commented on pull request #358: Add JAVA_OPTS to download the gradle-wrapper jar behind proxy.

2021-10-08 Thread GitBox


balmukundblr commented on pull request #358:
URL: https://github.com/apache/lucene/pull/358#issuecomment-938795581


   > Thanks for this patch! Can you please also update gradlew.bat?
   
   Made the required changes in the gradlew.bat file.





[GitHub] [lucene] balmukundblr commented on pull request #358: Add JAVA_OPTS to download the gradle-wrapper jar behind proxy.

2021-10-08 Thread GitBox


balmukundblr commented on pull request #358:
URL: https://github.com/apache/lucene/pull/358#issuecomment-938790568


   > Dealing with this in scripts is nightmarish... but if you do want it then you should:
   > 
   > * provide the defaults (empty string) if the variable is not defined,
   > * modify all scripts (Windows, Linux) so that they work in the same way.
   
   Fixed. Made the required changes.
   





[GitHub] [lucene] madrob commented on a change in pull request #355: LUCENE-9997 Revisit smoketester for 9.0 build

2021-10-08 Thread GitBox


madrob commented on a change in pull request #355:
URL: https://github.com/apache/lucene/pull/355#discussion_r725033367



##
File path: dev-tools/scripts/smokeTestRelease.py
##
@@ -900,42 +701,19 @@ def testDemo(run_java, isSrc, version, jdk):
   if removeTrailingZeros(actualVersion) != removeTrailingZeros(version):
     raise RuntimeError('wrong version from CheckIndex: got "%s" but expected "%s"' % (actualVersion, version))
 
+
 def removeTrailingZeros(version):
   return re.sub(r'(\.0)*$', '', version)
 
-def checkMaven(solrSrcUnpackPath, baseURL, tmpDir, gitRevision, version, isSigned, keysFile):
-  POMtemplates = defaultdict()
-  getPOMtemplates(solrSrcUnpackPath, POMtemplates, tmpDir)
-  print('download artifacts')
-  artifacts = {'lucene': [], 'solr': []}
-  for project in ('lucene', 'solr'):
-    artifactsURL = '%s/%s/maven/org/apache/%s/' % (baseURL, project, project)
-    targetDir = '%s/maven/org/apache/%s' % (tmpDir, project)
-    if not os.path.exists(targetDir):
-      os.makedirs(targetDir)
-    crawl(artifacts[project], artifactsURL, targetDir)
-  print()
-  verifyPOMperBinaryArtifact(artifacts, version)
-  verifyArtifactPerPOMtemplate(POMtemplates, artifacts, tmpDir, version)
-  verifyMavenDigests(artifacts)
-  checkJavadocAndSourceArtifacts(artifacts, version)
-  verifyDeployedPOMsCoordinates(artifacts, version)
-  if isSigned:
-    verifyMavenSigs(baseURL, tmpDir, artifacts, keysFile)
-
-  distFiles = getBinaryDistFilesForMavenChecks(tmpDir, version, baseURL)
-  checkIdenticalMavenArtifacts(distFiles, artifacts, version)
-
-  checkAllJARs('%s/maven/org/apache/lucene' % tmpDir, 'lucene', gitRevision, version, tmpDir, baseURL)
-  checkAllJARs('%s/maven/org/apache/solr' % tmpDir, 'solr', gitRevision, version, tmpDir, baseURL)
 
 def getBinaryDistFilesForMavenChecks(tmpDir, version, baseURL):
   # TODO: refactor distribution unpacking so that it only happens once per distribution per smoker run
   distFiles = defaultdict()
-  for project in ('lucene', 'solr'):
+  for project in ('lucene'):

Review comment:
   for all of these, should we consider removing the for loop entirely?







[GitHub] [lucene] mikemccand commented on a change in pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly

2021-10-08 Thread GitBox


mikemccand commented on a change in pull request #225:
URL: https://github.com/apache/lucene/pull/225#discussion_r725023003



##
File path: lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java
##
@@ -96,12 +110,36 @@ public AutomatonQuery(final Term term, Automaton 
automaton, int determinizeWorkL
*/
   public AutomatonQuery(
   final Term term, Automaton automaton, int determinizeWorkLimit, boolean isBinary) {
+this(term, automaton, determinizeWorkLimit, isBinary, ByteRunnable.TYPE.DFA);
+  }
+
+  /**
+   * Create a new AutomatonQuery from an {@link Automaton}.
+   *
+   * @param term Term containing field and possibly some pattern structure. The term text is
+   * ignored.
+   * @param automaton Automaton to run, terms that are accepted are considered a match.
+   * @param determinizeWorkLimit maximum effort to spend determinizing the automaton. If the
+   * automaton will need more than this much effort, TooComplexToDeterminizeException is thrown.
+   * Higher numbers require more space but can process more complex automata.
+   * @param isBinary if true, this automaton is already binary and will not go through the
+   * UTF32ToUTF8 conversion
+   * @param runnableType NFA or DFA. See {@link org.apache.lucene.util.automaton.ByteRunnable.TYPE}
+   * for difference between NFA and DFA. Also note * that NFA has uncertain performance impact

Review comment:
   Remove that errant `*` between `note` and `that`?

##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunnable.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.automaton;
+
+/** A runnable automaton accepting byte array as input */
+public interface ByteRunnable {
+
+  /** NFA or DFA */
+  enum TYPE {
+/**
+ * Determinize the automaton lazily on-demand as terms are intersected. This option saves the
+ * up-front determinize cost, and can handle some RegExps that DFA cannot, but intersection will
+ * be a bit slower

Review comment:
   Missing period at the end of the sentence?
   
   Maybe link to Russ Cox's famous page 
(https://swtch.com/~rsc/regexp/regexp1.html) and point out that this is similar 
to the Thompson NFA approach described there?
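
To illustrate the idea the javadoc describes (determinizing lazily while the input is consumed), here is a minimal sketch of a Thompson-style NFA simulation with a cache of discovered state sets. It is an assumption-laden toy, not Lucene's `NFARunAutomaton`: the class `LazyNfaSketch`, its hard-coded NFA for `(a|b)*abb`, and the use of `char` input instead of bytes are all chosen for illustration only.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class LazyNfaSketch {
  private static final int NUM_STATES = 4;
  private static final int ACCEPT_STATE = 3;

  // Cache: set of NFA states -> (symbol -> next set). Each entry is one lazily built DFA edge.
  private final Map<BitSet, Map<Character, BitSet>> cache = new HashMap<>();

  /** Hard-coded NFA for (a|b)*abb: targets reachable from one state on one symbol. */
  private static int[] step(int state, char c) {
    switch (state) {
      case 0:
        return c == 'a' ? new int[] {0, 1} : c == 'b' ? new int[] {0} : new int[0];
      case 1:
        return c == 'b' ? new int[] {2} : new int[0];
      case 2:
        return c == 'b' ? new int[] {3} : new int[0];
      default:
        return new int[0];
    }
  }

  /** Computes the next state set, or reuses it if this (set, symbol) pair was seen before. */
  private BitSet next(BitSet current, char c) {
    return cache
        .computeIfAbsent(current, k -> new HashMap<>())
        .computeIfAbsent(
            c,
            k -> {
              BitSet out = new BitSet(NUM_STATES);
              for (int s = current.nextSetBit(0); s >= 0; s = current.nextSetBit(s + 1)) {
                for (int t : step(s, c)) {
                  out.set(t);
                }
              }
              return out;
            });
  }

  /** Runs the NFA on the input by tracking all states it could be in at once. */
  boolean run(String input) {
    BitSet states = new BitSet(NUM_STATES);
    states.set(0);
    for (int i = 0; i < input.length(); i++) {
      states = next(states, input.charAt(i));
      if (states.isEmpty()) {
        return false; // no live state left, the input cannot match
      }
    }
    return states.get(ACCEPT_STATE);
  }
}
```

No determinization cost is paid up front; each distinct state set is built only when some input reaches it, which is the trade-off the javadoc names: cheaper setup, somewhat slower matching.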

##
File path: lucene/codecs/src/java/org/apache/lucene/codecs/memory/DirectPostingsFormat.java
##
@@ -962,15 +964,22 @@ public ImpactsEnum impacts(int flags) throws IOException {
   private int stateUpto;
 
   public DirectIntersectTermsEnum(CompiledAutomaton compiled, BytesRef startTerm) {
-runAutomaton = compiled.runAutomaton;
-compiledAutomaton = compiled;
+if (compiled.nfaRunAutomaton != null) {
+  this.runAutomaton = compiled.nfaRunAutomaton;

Review comment:
   OK we can wait on this.  Maybe just add a comment explaining why we need 
this odd `if` still, here and in the other places where we did this.

##
File path: lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java
##
@@ -65,7 +66,20 @@
* @param automaton Automaton to run, terms that are accepted are considered a match.
*/

Review comment:
   Could you update these javadocs to state that the `runnableType` is 
`DFA` by default, and point to the `ByteRunnable.TYPE` javadocs?

##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunnable.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the 

[jira] [Updated] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10158:
---
Description: 
While creating the new MMapDirectory using Project Panama in recent OpenJDK
versions (not yet released, incubation only), I stumbled on our testing
framework, which wraps many objects with AssertingXY. The problem is that
mmap in Project Panama only works when the java.nio.file.Path is owned by the
default file system provider. During testing we often wrap it with custom
implementations that emulate Windows or track open file handles.
If you pass such a wrapped Path to the NIO2 Panama APIs, it will fail with an
exception, because it can't refer to file channel internal methods from it. In
the final version of Panama this may go away and we can provide our own wrapper
for memory mapping, but this is problematic with current testing.

My plan is to release versions of MMapDirectory version 2 with different
implementations of the Panama APIs for easy plugging into Lucene, Solr,
Elasticsearch by just adding a JAR file that fits your JDK version. To run
tests, unfortunately the MMapDir impl must "unwrap" the Path wrappers added by
the test system.

To help with that and to make it more general, in my pull requests (e.g.
https://github.com/apache/lucene/pull/177), I added a new interface
{{org.apache.lucene.util.Unwrappable}} that allows external code to unwrap a
wrapper and get the "original" Path implementation. The same interface could be
applied to many other Lucene/Test classes that sometimes need unwrapping (e.g.
around Directory or Queries), but for now it is only implemented for the test
framework's FilterPath. The interface needs to be part of Lucene core and is
used by production code to unwrap any test-framework FilterPath (or similar)
wrappers. MMapDirectory version 2 uses it to get the original Path to be passed
to MemorySegment.mapFile().

I'd like to get this into Lucene 9.0. It does not hurt, it is just an
interface, which is implemented by test classes, ready for extension to other
classes. It also provides the unwrapper method, which is generic.

  was:
While creating the new MMapDirectory using Project Panama in recent OpenJDK
versions (not yet released, incubation only), I stumbled on our testing
framework, which wraps many objects with AssertingXY. The problem is that
mmap in Project Panama only works when the java.nio.file.Path is owned by the
default file system provider. During testing we often wrap it with custom
implementations that emulate Windows or track open file handles.
If you pass such a wrapped Path to the NIO2 Panama APIs, it will fail with an
exception, because it can't refer to file channel internal methods from it. In
the final version of Panama this may go away and we can provide our own wrapper
for memory mapping, but this is problematic with current testing.

My plan is to release versions of MMapDirectory version 2 with different
implementations of the Panama APIs for easy plugging into Lucene, Solr,
Elasticsearch by just adding a JAR file that fits your JDK version. To run
tests, unfortunately the MMapDir impl must "unwrap" the Path wrappers added by
the test system.

To help with that and to make it more general, in my pull requests (e.g.
https://github.com/apache/lucene/pull/177), I added a new interface
{{org.apache.lucene.util.Unwrappable}} that allows external code to unwrap a
wrapper and get the "original" Path implementation. The same interface could be
applied to many other Lucene/Test classes that sometimes need unwrapping (e.g.
around Directory or Queries), but for now it is only implemented for the test
framework's FilterPath. The interface is part of Lucene core and just used by
the test to supply unwrapping.

MMapDirectory version 2 uses it to get the original Path to be passed to
MemorySegment.mapFile().

I'd like to get this into Lucene 9.0. It does not hurt, it is just an
interface, which is implemented by test classes, ready for extension to other
classes. It also provides the unwrapper method, which is generic.
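
The interface described above could look roughly like the following sketch. It is reconstructed from the description (an `unwrap()` method plus a generic helper that removes all wrapper layers) and is not copied from the pull request; the `Resource`, `PlainResource` and `FilterResource` types are hypothetical stand-ins for `Path` and the test framework's filter wrappers.

```java
final class UnwrapSketch {

  /** Assumed shape of the interface: a wrapper that can hand back its delegate. */
  interface Unwrappable<T> {
    T unwrap();

    /** Strips every Unwrappable layer around the given object. */
    @SuppressWarnings("unchecked")
    static <T> T unwrapAll(T o) {
      while (o instanceof Unwrappable) {
        o = ((Unwrappable<T>) o).unwrap();
      }
      return o;
    }
  }

  /** Hypothetical stand-in for java.nio.file.Path. */
  interface Resource {
    String name();
  }

  /** Hypothetical "real" implementation, playing the role of a default-provider Path. */
  static final class PlainResource implements Resource {
    @Override
    public String name() {
      return "plain";
    }
  }

  /** Hypothetical test wrapper, playing the role of a filter around a Path. */
  static final class FilterResource implements Resource, Unwrappable<Resource> {
    private final Resource delegate;

    FilterResource(Resource delegate) {
      this.delegate = delegate;
    }

    @Override
    public String name() {
      return "filter(" + delegate.name() + ")";
    }

    @Override
    public Resource unwrap() {
      return delegate;
    }
  }

  /** Wraps a resource twice and checks that unwrapAll returns the original instance. */
  static boolean unwrapsToOriginal() {
    Resource plain = new PlainResource();
    Resource wrapped = new FilterResource(new FilterResource(plain));
    return Unwrappable.unwrapAll(wrapped) == plain;
  }
}
```

Production code only needs to depend on the interface and the static helper; it never has to know which test wrappers exist, which matches the motivation given for placing the interface in Lucene core.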


> Add a new interface Unwrappable to the utils package to ease migration to new 
> MMAPDirectory and its testing
> ---
>
> Key: LUCENE-10158
> URL: https://issues.apache.org/jira/browse/LUCENE-10158
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other, general/test
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While creating the new MMapDirectory using Project Panama in the recent 
> OpenJDK versions (not yet released, incubation only), I stumbled on our 

[jira] [Commented] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426191#comment-17426191
 ] 

Uwe Schindler commented on LUCENE-10158:


For discussion about this interface see: 
https://github.com/apache/lucene/pull/173#pullrequestreview-677347736

> Add a new interface Unwrappable to the utils package to ease migration to new 
> MMAPDirectory and its testing
> ---
>
> Key: LUCENE-10158
> URL: https://issues.apache.org/jira/browse/LUCENE-10158
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other, general/test
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h



[GitHub] [lucene] mikemccand commented on a change in pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly

2021-10-08 Thread GitBox


mikemccand commented on a change in pull request #225:
URL: https://github.com/apache/lucene/pull/225#discussion_r725016724



##
File path: lucene/core/src/test/org/apache/lucene/util/automaton/TestNFARunAutomaton.java
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.IntsRef;
+import org.apache.lucene.util.LuceneTestCase;
+
+public class TestNFARunAutomaton extends LuceneTestCase {
+
+  public void testRandom() {
+for (int i = 0; i < 100; i++) {
+  RegExp regExp = null;
+  while (regExp == null) {
+try {
+  regExp = new RegExp(AutomatonTestUtil.randomRegexp(random()));
+} catch (IllegalArgumentException e) {
+  ignoreException(e);

Review comment:
   Yeah +1 to pursue that separately/later.







[GitHub] [lucene] mikemccand commented on a change in pull request #225: LUCENE-10010 Introduce NFARunAutomaton to run NFA directly

2021-10-08 Thread GitBox


mikemccand commented on a change in pull request #225:
URL: https://github.com/apache/lucene/pull/225#discussion_r725015736



##
File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java
##
@@ -551,12 +551,22 @@ static RegExp newLeafNode(
 return new RegExp(flags, kind, null, null, s, c, min, max, digits, from, to);
   }
 
+  /**
+   * Return an Automaton from this RegExp that will skip the determinize
+   * and minimize step
+   *
+   * @return {@link Automaton} most likely non-deterministic
+   */
+  public Automaton toNFA() {

Review comment:
   Yes, I think that's right!







[jira] [Updated] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10158:
---
Fix Version/s: main (9.0)

> Add a new interface Unwrappable to the utils package to ease migration to new 
> MMAPDirectory and its testing
> ---
>
> Key: LUCENE-10158
> URL: https://issues.apache.org/jira/browse/LUCENE-10158
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other, general/test
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h



[jira] [Updated] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10158:
---
Summary: Add a new interface Unwrappable to the utils package to ease 
migration to new MMAPDirectory and its testing  (was: Add a new interface 
Unwrappable to the utils package to ease migration to new MMAPDirectory and 
testing)

> Add a new interface Unwrappable to the utils package to ease migration to new 
> MMAPDirectory and its testing
> ---
>
> Key: LUCENE-10158
> URL: https://issues.apache.org/jira/browse/LUCENE-10158
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other, general/test
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h



[jira] [Commented] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426189#comment-17426189
 ] 

Uwe Schindler commented on LUCENE-10158:


Here is the pull request: https://github.com/apache/lucene/pull/369

> Add a new interface Unwrappable to the utils package to ease migration to new 
> MMAPDirectory and its testing
> ---
>
> Key: LUCENE-10158
> URL: https://issues.apache.org/jira/browse/LUCENE-10158
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other, general/test
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While creating the new MMapDirectory using Project Panama in the recent 
> OpenJDK versions (not yet released, incubation only), I stumbled on our 
> testing framework, which wraps many objects with AssertingXY. The problem 
> with that is, mmap in project panama only works when the java.nio.files.Path 
> is owned by the default file system provider. During testing we wrap it often 
> with custom implementations emulating Windows or track open file handles.
> If you pass such a wrapped Path to the NIO2 Panama APIs, it will fail with 
> exception, because it can't refer to file channel internal methods from it. 
> In the final version of Panama this may go away and we can provide our own 
> wrapper for memory mapping, but this is problematic with current testing.
> My plan is to release versions of MMapDirectory version 2 with different 
> implementations of the Panama APIs for easy pluggin into Lucene, Solr, 
> Elasticsearch by just adding a JAR file that fits your JDK version. To run 
> tests, unfortunately the MMapDir impl must "unrwap" the Path wrappers added 
> by the test system.
> To help with that and to make it more general, in my pull requests (e.g. 
> https://github.com/apache/lucene/pull/177), I added a new interface 
> {{org.apache.lucene.util.Unwrappable}} that allows external code to unwrap a 
> wrapper and get the "original" Path implementation. The same interface could be 
> applied to many other Lucene/Test classes that sometimes need unwrapping (e.g. 
> around Directory or Queries), but for now it is only implemented for Test's 
> FilterPath. The interface is part of Lucene core and just used by the test to 
> supply unwrapping.
> MMapDirectory version 2 uses it to get the original Path to be passed to 
> MemorySegment.mapFile().
> I'd like to get this into Lucene 9.0. It does not hurt, it is just an 
> interface, which is implemented by test classes, ready for extension to other 
> classes. It also provides the unwrapper method, which is generic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new pull request #369: LUCENE-10158: Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and its testing

2021-10-08 Thread GitBox


uschindler opened a new pull request #369:
URL: https://github.com/apache/lucene/pull/369


   See https://issues.apache.org/jira/browse/LUCENE-10158





[jira] [Created] (LUCENE-10158) Add a new interface Unwrappable to the utils package to ease migration to new MMAPDirectory and testing

2021-10-08 Thread Uwe Schindler (Jira)
Uwe Schindler created LUCENE-10158:
--

 Summary: Add a new interface Unwrappable to the utils package to 
ease migration to new MMAPDirectory and testing
 Key: LUCENE-10158
 URL: https://issues.apache.org/jira/browse/LUCENE-10158
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other, general/test
Affects Versions: main (9.0)
Reporter: Uwe Schindler
Assignee: Uwe Schindler


While creating the new MMapDirectory using Project Panama in the recent OpenJDK 
versions (not yet released, incubation only), I stumbled on our testing 
framework, which wraps many objects with AssertingXY. The problem with that is 
that mmap in Project Panama only works when the java.nio.file.Path is owned by the 
default file system provider. During testing we often wrap it with custom 
implementations that emulate Windows or track open file handles.
If you pass such a wrapped Path to the NIO2 Panama APIs, it will fail with an 
exception, because it can't refer to file channel internal methods from it. In 
the final version of Panama this may go away and we can provide our own wrapper 
for memory mapping, but this is problematic with current testing.
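
To make "owned by the default file system provider" concrete, here is a minimal sketch. The names `DefaultFsCheck` and `isOwnedByDefaultProvider` are invented for this illustration, and the description does not say how the Panama API performs its own check; the sketch only compares providers through the public NIO API.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or of the Panama API.
class DefaultFsCheck {
  /** Returns true if the path's file system comes from the JVM's default provider. */
  static boolean isOwnedByDefaultProvider(Path path) {
    return path.getFileSystem().provider() == FileSystems.getDefault().provider();
  }

  public static void main(String[] args) {
    // Paths.get() always resolves against the default file system.
    Path plain = Paths.get("index");
    System.out.println(isOwnedByDefaultProvider(plain));
  }
}
```

A Path wrapper of the kind the test framework installs would presumably report its own file system and provider here, which is why the wrapper has to be removed before the Path reaches the mapping call.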

My plan is to release versions of MMapDirectory version 2 with different 
implementations of the Panama APIs for easy plugging into Lucene, Solr, 
Elasticsearch by just adding a JAR file that fits your JDK version. To run 
tests, unfortunately the MMapDir impl must "unwrap" the Path wrappers added by 
the test system.

To help with that and to make it more general, in my pull requests (e.g. 
https://github.com/apache/lucene/pull/177), I added a new interface 
{{org.apache.lucene.util.Unwrappable}} that allows external code to unwrap a 
wrapper and get the "original" Path implementation. The same interface could be 
applied to many other Lucene/Test classes that sometimes need unwrapping (e.g. around 
Directory or Queries), but for now it is only implemented for Test's 
FilterPath. The interface is part of Lucene core and just used by the test to 
supply unwrapping.

MMapDirectory version 2 uses it to get the original Path to be passed to 
MemorySegment.mapFile().

I'd like to get this into Lucene 9.0. It does not hurt, it is just an 
interface, which is implemented by test classes, ready for extension to other 
classes. It also provides the unwrapper method, which is generic.
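
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the pull request: `Resource`, `PlainResource` and `FilterResource` are hypothetical stand-ins for Path and the test framework's FilterPath, and the static helper name `unwrapAll` is a guess at the "generic unwrapper method" mentioned above.

```java
// Demo first, so the file can be launched directly as a single source file.
class UnwrapDemo {
  public static void main(String[] args) {
    Resource original = new PlainResource("segments_1");
    Resource wrapped = new FilterResource(new FilterResource(original));
    // Both filter layers are peeled off, leaving the original instance.
    System.out.println(Unwrappable.unwrapAll(wrapped) == original);
  }
}

// Sketch of the interface shape described in the issue (not the committed code).
interface Unwrappable<T> {
  /** Returns the directly wrapped object. */
  T unwrap();

  /** Peels off every Unwrappable layer around the given object. */
  @SuppressWarnings("unchecked")
  static <T> T unwrapAll(T o) {
    while (o instanceof Unwrappable) {
      o = ((Unwrappable<T>) o).unwrap();
    }
    return o;
  }
}

// Hypothetical stand-in for java.nio.file.Path.
interface Resource {
  String name();
}

// Hypothetical stand-in for a Path from the default file system.
class PlainResource implements Resource {
  private final String name;

  PlainResource(String name) {
    this.name = name;
  }

  @Override
  public String name() {
    return name;
  }
}

// Hypothetical stand-in for the test framework's FilterPath.
class FilterResource implements Resource, Unwrappable<Resource> {
  private final Resource delegate;

  FilterResource(Resource delegate) {
    this.delegate = delegate;
  }

  @Override
  public String name() {
    return delegate.name();
  }

  @Override
  public Resource unwrap() {
    return delegate;
  }
}
```

Because the helper only tests for the interface, production code never needs to know which test wrappers exist; an object that is not wrapped is returned unchanged.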






[GitHub] [lucene] mikemccand commented on a change in pull request #368: LUCENE-10136: allow 'var' declarations in source code

2021-10-08 Thread GitBox


mikemccand commented on a change in pull request #368:
URL: https://github.com/apache/lucene/pull/368#discussion_r724999261



##
File path: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java
##
@@ -1965,7 +1965,7 @@ public void testCommitWithUserDataOnly() throws Exception {
 Map<String, String> data = new HashMap<>();
 Iterable<Map.Entry<String, String>> iter = writer.getLiveCommitData();
 if (iter != null) {
-  for (Map.Entry<String, String> ent : iter) {
+  for (var ent : iter) {

Review comment:
   Woot!







[GitHub] [lucene] dweiss commented on pull request #368: LUCENE-10136: allow 'var' declarations in source code

2021-10-08 Thread GitBox


dweiss commented on pull request #368:
URL: https://github.com/apache/lucene/pull/368#issuecomment-938547464


   Lifted one restriction (var use), left this one in though.
   {code}
   (~$/\n\s*var\s+.*=.*<>.*/$) : 'Diamond operators should not be used with var'
   {code}
   
   Also, added a single var to the Map.Entry for loop to make sure everything works.
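
For readers unfamiliar with the rule, the sketch below shows what the pattern catches and one plausible reason for keeping it (the comment itself does not state the motivation). The pattern is a Java translation of the Groovy regex quoted above; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Illustrative sketch; VarDiamondSketch and its methods are not part of the build.
class VarDiamondSketch {
  // Java string form of the quoted pattern: a line starting with 'var' that also uses '<>'.
  static final Pattern VAR_WITH_DIAMOND = Pattern.compile("\\n\\s*var\\s+.*=.*<>.*");

  static boolean violates(String source) {
    return VAR_WITH_DIAMOND.matcher(source).find();
  }

  static String describe(Map<String, String> data) {
    StringBuilder sb = new StringBuilder();
    // 'ent' is inferred as Map.Entry<String, String> from the iterable's element type.
    for (var ent : data.entrySet()) {
      sb.append(ent.getKey()).append('=').append(ent.getValue()).append(';');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // With 'var' there is no target type, so the diamond infers HashMap<Object, Object>.
    var loose = new HashMap<>();
    loose.put(1, "one");
    loose.put("two", 2); // compiles, which is rarely what was intended

    Map<String, String> data = new TreeMap<>();
    data.put("a", "1");
    data.put("b", "2");
    System.out.println(describe(data));
    System.out.println(violates("\n    var m = new HashMap<>();"));
  }
}
```

In the for loop the element type is fully determined by the iterable, so `var` loses no information there; combined with the diamond, both sides of the declaration leave the type arguments open.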





[jira] [Comment Edited] (LUCENE-10129) Add RamUsageEstimator shallowSizeOf(long[]) overload that just calls sizeOf(long[])?

2021-10-08 Thread Stefan Vodita (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426103#comment-17426103
 ] 

Stefan Vodita edited comment on LUCENE-10129 at 10/8/21, 10:33 AM:
---

Hello! I am new to Lucene, and this looked like an easy task to start with, so 
I published [a PR|https://github.com/apache/lucene/pull/367] for it.


I am wondering though if _TestRamUsageEstimator_ is missing an _import static 
org.apache.lucene.util.RamUsageEstimator.sizeOf;_, so that in lines like 
_assertEquals(sizeOf(array), sizeOf((Object) array));_ the first _sizeOf()_ 
calls _RamUsageEstimator.sizeOf_, and the second calls _RamUsageTester.sizeOf_.
Apologies if I misunderstood the purpose of the test.


was (Author: stefanvodita):
Hello! I am new to Lucene, and this looked like an easy task to start with, so 
I published [a PR|http://example.com] for it.


I am wondering though if _TestRamUsageEstimator_ is missing an _import static 
org.apache.lucene.util.RamUsageEstimator.sizeOf;_, so that in lines like 
_assertEquals(sizeOf(array), sizeOf((Object) array));_ the first _sizeOf()_ 
calls _RamUsageEstimator.sizeOf_, and the second calls _RamUsageTester.sizeOf_.
Apologies if I misunderstood the purpose of the test.

> Add RamUsageEstimator shallowSizeOf(long[]) overload that just calls 
> sizeOf(long[])?
> 
>
> Key: LUCENE-10129
> URL: https://issues.apache.org/jira/browse/LUCENE-10129
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See LUCENE-10128 for an example. The problem is there is only a 
> {{sizeOf(long[])}}, so if the programmer uses {{shallowSizeOf}} instead of 
> {{sizeOf}} then it falls back to {{shallowSizeOf(Object)}} which does a bunch 
> of reflection.
> This is pretty crazy because it can create performance traps. Should we just 
> add a {{shallowSizeOf(long[])}} that calls {{sizeOf(long[])}}, so that things 
> are fast? (same for other primitive arrays). It would solve the problem 
> easily I think.






[jira] [Commented] (LUCENE-10129) Add RamUsageEstimator shallowSizeOf(long[]) overload that just calls sizeOf(long[])?

2021-10-08 Thread Stefan Vodita (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426103#comment-17426103
 ] 

Stefan Vodita commented on LUCENE-10129:


Hello! I am new to Lucene, and this looked like an easy task to start with, so 
I published [a PR|http://example.com] for it.


I am wondering though if _TestRamUsageEstimator_ is missing an _import static 
org.apache.lucene.util.RamUsageEstimator.sizeOf;_, so that in lines like 
_assertEquals(sizeOf(array), sizeOf((Object) array));_ the first _sizeOf()_ 
calls _RamUsageEstimator.sizeOf_, and the second calls _RamUsageTester.sizeOf_.
Apologies if I misunderstood the purpose of the test.

> Add RamUsageEstimator shallowSizeOf(long[]) overload that just calls 
> sizeOf(long[])?
> 
>
> Key: LUCENE-10129
> URL: https://issues.apache.org/jira/browse/LUCENE-10129
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See LUCENE-10128 for an example. The problem is there is only a 
> {{sizeOf(long[])}}, so if the programmer uses {{shallowSizeOf}} instead of 
> {{sizeOf}} then it falls back to {{shallowSizeOf(Object)}} which does a bunch 
> of reflection.
> This is pretty crazy because it can create performance traps. Should we just 
> add a {{shallowSizeOf(long[])}} that calls {{sizeOf(long[])}}, so that things 
> are fast? (same for other primitive arrays). It would solve the problem 
> easily I think.






[GitHub] [lucene] stefanvodita opened a new pull request #367: LUCENE-10129: Add RamUsageEstimator.shallowSizeOf() for primitive arrays

2021-10-08 Thread GitBox


stefanvodita opened a new pull request #367:
URL: https://github.com/apache/lucene/pull/367


   
   
   
   # Description
   
   `shallowSizeOf(long[])` would call `shallowSizeOf(Object)`, which is slower, 
but leads to the same result as `sizeOf(long[])`. The same applies to all 
primitive types.
   
   # Solution
   
   Overload call to `shallowSizeOf()` for primitive arrays so it would not 
default to `shallowSizeOf(Object)`.
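
A simplified sketch of the overload idea, for illustration only. `ShallowSizeSketch` is a stand-in for `RamUsageEstimator`, the header size and alignment are assumed values rather than the ones Lucene computes, and the call counter exists only to show which overload the compiler picks.

```java
import java.lang.reflect.Array;

// Simplified stand-in for RamUsageEstimator; constants below are assumptions.
class ShallowSizeSketch {
  static final long ASSUMED_ARRAY_HEADER_BYTES = 16;
  static final long ASSUMED_ALIGNMENT_BYTES = 8;
  static int genericPathCalls = 0;

  /** Rounds a size up to the next multiple of the assumed object alignment. */
  static long alignUp(long size) {
    long a = ASSUMED_ALIGNMENT_BYTES;
    return ((size + a - 1) / a) * a;
  }

  /** Fast path: the size follows directly from the array length. */
  static long sizeOf(long[] arr) {
    return alignUp(ASSUMED_ARRAY_HEADER_BYTES + (long) Long.BYTES * arr.length);
  }

  /** Generic fallback: inspects the object at runtime (the real method does more work). */
  static long shallowSizeOf(Object obj) {
    genericPathCalls++;
    if (obj.getClass() == long[].class) {
      return alignUp(ASSUMED_ARRAY_HEADER_BYTES + (long) Long.BYTES * Array.getLength(obj));
    }
    throw new UnsupportedOperationException("sketch only handles long[]");
  }

  /** The added overload: selected at compile time for long[] arguments. */
  static long shallowSizeOf(long[] arr) {
    return sizeOf(arr);
  }

  public static void main(String[] args) {
    long[] values = new long[3];
    System.out.println(shallowSizeOf(values));          // typed overload
    System.out.println(shallowSizeOf((Object) values)); // generic fallback
    System.out.println(genericPathCalls);
  }
}
```

Both calls return the same number; the difference is only which path computes it, which is the performance trap the issue describes.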
   
   # Tests
   
   Unit test calls `shallowSizeOf()` for primitive arrays and checks that 
result is the same as calling `sizeOf()`.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-5572) JapaneseTokenizer is sensitive to interrupts

2021-10-08 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426035#comment-17426035
 ] 

Dawid Weiss commented on LUCENE-5572:
-

bq. A GUI application running outside our context is interrupting the thread in 
order to cancel a long-running operation. This is still, to my knowledge at 
least, the only remaining way to do this in Java. 

My experience is that anything involving Thread.interrupt() will cause you 
headaches either due to bugs in other code, like resources not released 
properly (for example thread pools, open file handles), or due to infrequent 
corner cases like this one. Whether you call it a design issue or an 
unfortunate series of events is really secondary to the fact that I don't think 
there is a reliable way to ensure everything works correctly then. Please read 
on.

bq. There's a general expectation in Java that code will behave correctly when 
an interrupt occurs.

Maybe there is such an expectation. My experience says it's not the case. If 
you interrupt threads at unexpected places, things will go wild. We also use 
interrupts - to try to harness deadlocked tests, after a timeout passes. 
Typically this leads to a situation when the main test thread returns but there 
are tons of forked threads that just happily hang there - thread pools are the 
typical offenders. If you do this repeatedly, you will run out of resources 
eventually.

bq. Our library is shielding the GUI application from needing to know that 
they're using Lucene. What is Lucene doing to shield its users from this quirky 
interrupt behaviour of Java?

The code in this Lucene class initializes its (required) resources in a static 
initializer and uses its own class loader to do so. I do think it is a 
reasonable assumption that classpath resources are always available for classes 
- an I/O exception there to me is unrecoverable (for whatever reason). If we 
wanted to "fix" this then an alternative to a static initializer is lazy 
initialization but this entails implementing some form of singleton creation - 
either racy static variable initialization or a lock somewhere. Neither is 
pretty and neither is really required in 99.99% of cases (your use case 
accounts for the rest). 

What I'm saying is that I still don't think it's a bug - I understand your use 
case and frustration but I don't think it requires fixing on the Lucene side. If 
you know your library is used in circumstances you describe above, shield your 
users by preloading those classes that have I/O in static initializers - this 
is a very easy thing to do via Class.forName and will ensure everything is in 
place before those GUI threads even have a chance to interrupt anything.
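
A minimal sketch of that preloading approach, assuming nothing about Lucene's actual classes: `Dictionary` below is a hypothetical stand-in for a class whose static initializer loads resources, and the counter only makes the initialization visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; all names here are hypothetical.
class PreloadSketch {
  static final AtomicInteger INIT_COUNT = new AtomicInteger();

  // Stand-in for a class that does I/O in its static initializer.
  static class Dictionary {
    static final String DATA;

    static {
      INIT_COUNT.incrementAndGet();
      DATA = "loaded";
    }
  }

  /** Forces class initialization up front, on a thread that nobody will interrupt. */
  static void preload(String... classNames) {
    for (String name : classNames) {
      try {
        Class.forName(name); // the one-argument form also runs static initializers
      } catch (ClassNotFoundException e) {
        throw new IllegalStateException("Cannot preload " + name, e);
      }
    }
  }

  public static void main(String[] args) {
    // A class literal alone does not trigger initialization; Class.forName does.
    preload(Dictionary.class.getName());
    System.out.println(INIT_COUNT.get());
    System.out.println(Dictionary.DATA);
  }
}
```

In the scenario discussed here, the names passed to `preload` would presumably be the fully qualified names of the three singleton classes listed in the issue, called once during library startup.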

Finally, I don't mean to preach since I know you're a seasoned engineer... but 
in reality thread interrupts won't really do much if your blocked code is 
purely computational - not touching the I/O or monitors. Whenever I had to 
implement interrupting "long running operations" I resorted to delegating jobs 
to a background thread and returning the calling thread to the application 
immediately when the user canceled the operation, leaving the background job to 
run its course (and hopefully release the resources!). If this wasn't feasible 
or was too costly, we broke up the job and checked some form of cancellation flag 
manually - this can be done even for third-party libraries with tools like 
bytecode injectors (aspectj or the like) but then you know where you insert 
cancellation checks and have some form of control over what's happening.
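
The manual cancellation flag could look roughly like this. It is a sketch with invented names (`CancellableJobSketch`, `sumInChunks`), using a trivial summation as a placeholder for the real long-running work.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch; the job and its names are hypothetical.
class CancellableJobSketch {
  /**
   * Processes the data chunk by chunk and checks the flag between chunks,
   * so cancellation only happens at points the code has chosen itself.
   */
  static long sumInChunks(long[] data, int chunkSize, AtomicBoolean cancelled) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    long sum = 0;
    for (int start = 0; start < data.length; start += chunkSize) {
      if (cancelled.get()) {
        throw new CancellationException("cancelled before offset " + start);
      }
      int end = Math.min(data.length, start + chunkSize);
      for (int i = start; i < end; i++) {
        sum += data[i];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    AtomicBoolean cancelled = new AtomicBoolean(false);
    long[] data = {1, 2, 3, 4, 5};
    System.out.println(sumInChunks(data, 2, cancelled));
  }
}
```

Unlike Thread.interrupt(), the flag is never observed in the middle of a class initializer or an I/O call, only at the check between chunks.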


> JapaneseTokenizer is sensitive to interrupts
> 
>
> Key: LUCENE-5572
> URL: https://issues.apache.org/jira/browse/LUCENE-5572
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.6.2
>Reporter: Anthony Rasmussen
>Priority: Minor
>
> The constructor for JapaneseTokenizer gets the following singleton instances: 
> TokenInfoDictionary, UnknownDictionary, and ConnectionCosts. I am finding 
> that the associated getInstance methods are particularly sensitive to 
> IOExceptions.
> Perhaps, in the static initializers of these 3 singletons, there could be 
> some sort of retry effort before throwing a RuntimeException?


