[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-12 Thread Alex Klibisz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134629#comment-17134629
 ] 

Alex Klibisz commented on LUCENE-9378:
--

Hi, just as another datapoint:

I'm using BinaryDocValues to store vectors for this Elasticsearch plugin: 
[https://github.com/alexklibisz/elastiknn]. The use case is very similar 
to what [~sokolov] described. I saw a large regression after switching from 
Elasticsearch 7.6.x to 7.7.x, which introduces Lucene 8.5.0. 

For instance, here are two screenshots from VisualVM running the same benchmark 
on 7.6.x and then 7.7.x.

7.7.x spends a lot more time in the `decompress` method, which now overtakes 
the `sortedIntersectionCount` method that was previously the most expensive. 

!image-2020-06-12-22-18-48-919.png|width=732,height=50!

!image-2020-06-12-22-18-24-527.png!

Note that this comparison also changes the JDK: Oracle JDK 13 (7.6.x) versus 
Oracle JDK 14 (7.7.x). As a sanity check, I benchmarked 
`sortedIntersectionCount` independently and it did get faster after the JDK 
switch.

I can provide more detailed info if necessary.

> Configurable compression for BinaryDocValues
> 
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Viral Gandhi
>Priority: Minor
> Attachments: image-2020-06-12-22-17-30-339.png, 
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, 
> image-2020-06-12-22-18-48-919.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, since it can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
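
To make the proposed *mode* switch concrete, here is a minimal sketch of the idea, assuming nothing about how Lucene would actually implement it. `BinaryValueStore` and its `Mode` enum are hypothetical names, not Lucene API, and the sketch uses the JDK's DEFLATE classes purely as a stand-in for whatever compression the real format applies. The point is only that the write path and the read path both branch on the mode, so UNCOMPRESSED never pays a decompression cost at query time.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for a doc values format with a mode switch; not Lucene API. */
final class BinaryValueStore {
  enum Mode { COMPRESSED, UNCOMPRESSED }

  private final Mode mode;

  BinaryValueStore(Mode mode) {
    this.mode = mode;
  }

  /** Encodes one value: a plain copy in UNCOMPRESSED mode, DEFLATE output otherwise. */
  byte[] write(byte[] value) {
    if (mode == Mode.UNCOMPRESSED) {
      return value.clone();
    }
    Deflater deflater = new Deflater();
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[256];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Decodes one value; only COMPRESSED mode does decompression work on read. */
  byte[] read(byte[] stored, int originalLength) {
    if (mode == Mode.UNCOMPRESSED) {
      return stored.clone();
    }
    Inflater inflater = new Inflater();
    inflater.setInput(stored);
    byte[] result = new byte[originalLength];
    try {
      int offset = 0;
      while (offset < originalLength && !inflater.finished()) {
        int n = inflater.inflate(result, offset, originalLength - offset);
        if (n == 0 && inflater.needsInput()) {
          break; // truncated input: stop instead of spinning
        }
        offset += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("stored bytes are not valid DEFLATE data", e);
    } finally {
      inflater.end();
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] vector = new byte[1024]; // all zeros, so highly compressible
    for (Mode mode : Mode.values()) {
      BinaryValueStore store = new BinaryValueStore(mode);
      byte[] stored = store.write(vector);
      byte[] restored = store.read(stored, vector.length);
      System.out.println(mode + ": stored " + stored.length + " bytes, round trip ok = "
          + Arrays.equals(vector, restored));
    }
  }
}
```

In this toy version the caller picks the mode once, at construction time, which is roughly the shape of the custom-Codec approach described in the issue: the choice is made where the format is created, not per query.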



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-12 Thread Alex Klibisz (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Klibisz updated LUCENE-9378:
-
Attachment: image-2020-06-12-22-18-48-919.png




[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-12 Thread Alex Klibisz (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Klibisz updated LUCENE-9378:
-
Attachment: image-2020-06-12-22-18-24-527.png




[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-12 Thread Alex Klibisz (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Klibisz updated LUCENE-9378:
-
Attachment: image-2020-06-12-22-17-30-339.png




[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues

2020-06-12 Thread Alex Klibisz (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Klibisz updated LUCENE-9378:
-
Attachment: image-2020-06-12-22-17-53-961.png




[GitHub] [lucene-solr] tflobbe commented on a change in pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-12 Thread GitBox


tflobbe commented on a change in pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#discussion_r439686403



##
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##
@@ -1259,6 +1277,20 @@ public SolrCore create(String coreName, Path instancePath, Map p
     }
   }
 
+  /**
+   * Checks that the given path is relative to SOLR_HOME, SOLR_DATA_HOME, coreRootDirectory or one of the paths
+   * specified in solr.xml's allowPaths element.
+   * @param path path to check
+   * @throws SolrException if path is outside allowed paths
+   */
+  public void assertPathAllowed(Path path) throws SolrException {
+    if (path.normalize().equals(path) && !path.isAbsolute()) return;

Review comment:
   This doesn't cover the case of a `../foo` path, right? Is that covered 
somewhere else?
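
A quick check with plain `java.nio.file` suggests the concern is real for the line as quoted: `Path.normalize()` keeps a leading `..`, so `../foo` is equal to its normalized form, is not absolute, and therefore takes the early return. The sketch below reproduces just that condition outside of Solr. `passesEarlyReturn` mirrors the expression in the diff; `isSafeRelative` is a hypothetical stricter variant added for illustration and is not part of the PR. Whether the PR handles this case elsewhere is not visible in this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Standalone check of the early-return condition; not Solr code. */
final class RelativePathCheck {
  /** Same expression as the early return in the diff under review. */
  static boolean passesEarlyReturn(Path path) {
    return path.normalize().equals(path) && !path.isAbsolute();
  }

  /** Hypothetical stricter variant: also rejects paths that climb out via a leading "..". */
  static boolean isSafeRelative(Path path) {
    Path normalized = path.normalize();
    return !path.isAbsolute() && normalized.equals(path) && !normalized.startsWith("..");
  }

  public static void main(String[] args) {
    for (String s : new String[] {"foo", "../foo", "foo/../bar"}) {
      Path p = Paths.get(s);
      System.out.println(s + ": earlyReturn=" + passesEarlyReturn(p)
          + ", safeRelative=" + isSafeRelative(p));
    }
  }
}
```

`Path.startsWith` compares whole name elements, so a directory literally named `..foo` is not mistaken for a parent reference.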





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Resolved] (SOLR-14565) Fix or suppress warnings in solrj/impl and solrj/io/graph

2020-06-12 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14565.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Fix or suppress warnings in solrj/impl and solrj/io/graph
> -
>
> Key: SOLR-14565
> URL: https://issues.apache.org/jira/browse/SOLR-14565
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>
> The overhead of individual directories is kind of a pain when there aren't 
> very many warnings each, so I'll do these two together.






[jira] [Commented] (SOLR-14565) Fix or suppress warnings in solrj/impl and solrj/io/graph

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134601#comment-17134601
 ] 

ASF subversion and git services commented on SOLR-14565:


Commit a23dd8f4feb8d1fc80c681def7c5580eefe2026d in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a23dd8f ]

SOLR-14565: Fix or suppress warnings in solrj/impl and solrj/io/graph





[jira] [Commented] (SOLR-14565) Fix or suppress warnings in solrj/impl and solrj/io/graph

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134599#comment-17134599
 ] 

ASF subversion and git services commented on SOLR-14565:


Commit 6801d4c13982b42007ec6d1ea1f443902e2fb438 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6801d4c ]

SOLR-14565: Fix or suppress warnings in solrj/impl and solrj/io/graph





[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 9.0

2020-06-12 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134591#comment-17134591
 ] 

Ilan Ginzburg commented on SOLR-12823:
--

I have learned quite a bit about how SolrCloud tests are done by integrating 
SolrCloud into a new environment that didn't have the test resources infra.
Can't say it was fun, but at least it's done :) (it's not very pretty, though)

[https://github.com/apache/lucene-solr/pull/1575]

> remove clusterstate.json in Lucene/Solr 9.0
> ---
>
> Key: SOLR-12823
> URL: https://issues.apache.org/jira/browse/SOLR-12823
> Project: Solr
>  Issue Type: Task
>Reporter: Varun Thacker
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> clusterstate.json is an artifact of a pre-5.0 Solr release. We should remove 
> it in 9.0.
> It stays empty unless you explicitly ask to create the collection with the 
> old "stateFormat", and there is no reason to create a collection with 
> the old stateFormat.
> We should also remove the "stateFormat" argument in create collection.
> We should also remove MIGRATESTATEVERSION.
>  
>  






[GitHub] [lucene-solr] murblanc opened a new pull request #1575: SOLR-12823: fix TestZKPropertiesWriter

2020-06-12 Thread GitBox


murblanc opened a new pull request #1575:
URL: https://github.com/apache/lucene-solr/pull/1575


   
   # Description
   
   Fix TestZKPropertiesWriter, which relied on legacy SolrCloud cluster 
features that have since been removed.
   
   # Solution
   
   Start a MiniSolrCloudCluster (implies config set and other test resources 
config) and have the test use the core of a created collection.
   
   # Tests
   
   Test fix.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `master` branch.
   - [X] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] msokolov commented on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


msokolov commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643498821


   Re those test failures: I was able to fix them by checking for an empty 
merge and not submitting it.






[GitHub] [lucene-solr] msokolov commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


msokolov commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439657240



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##
@@ -109,6 +110,9 @@
   
   /** Default value for whether calls to {@link IndexWriter#close()} include a commit. */
   public final static boolean DEFAULT_COMMIT_ON_CLOSE = true;
+
+  /** Default value for time to wait for merges on commit (when using a {@link MergePolicy} that implements findFullFlushMerges). */
+  public static final double DEFAULT_MAX_COMMIT_MERGE_WAIT_SECONDS = 30.0;

Review comment:
   The `success=true` added above was needed in order to fix a test failure 
caught by @dnhatn 's new unit test (testRandomOperations), so they belong 
together.








[GitHub] [lucene-solr] mikemccand commented on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


mikemccand commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643496306


   > @mikemccand thanks for replying to all these comments. I do understand 
that this change has an impact and I agree we should add this functionality. I 
just disagree with how it's done and how much code is used. I will go and 
reply to some of your comments directly; in the meantime I went ahead and 
prototyped some ideas for how this can be less intrusive and reuse code. I pushed 
one commit here 
[s1monw@3864b6c](https://github.com/s1monw/lucene-solr/commit/3864b6c2b631879fa1e995d47ed2b84aae054747)
 to showcase what I mean. I even think we can get away without a new method on 
MergePolicy but that's too much for the prototype. I'd be ok with adding a 
setting to IWC if we can't agree on a different way.
   
   Thanks @s1monw!  I would love it if we could find a simple way to implement 
this feature, as long as it keeps the "no wasted work" property (a merge either 
finishes in time and is reflected in the commit point, or does not, but still 
runs to completion and is reflected later).  I will review your prototype soon; 
I'm mostly offline this weekend but will try to look.
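
The "no wasted work" behavior can be modeled in a few lines of plain `java.util.concurrent`, as a rough sketch under stated assumptions rather than a picture of what IndexWriter does. Everything here is hypothetical: `CommitMergeWaitSketch`, `Outcome`, the `release` latch that stands in for the merge's running time, and the `awaitRef` reference that the committing thread clears when it gives up waiting. The simulated merge checks that reference under a lock when it finishes, so it either updates the commit point (finished in time) or leaves it alone (abandoned), but it always runs to completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of "wait for a merge on commit, but never waste its work". */
final class CommitMergeWaitSketch {
  /** What happened to the simulated merge. */
  static final class Outcome {
    final boolean inCommitPoint;   // finished in time and was folded into the commit
    final boolean ranToCompletion; // finished eventually, in time or not

    Outcome(boolean inCommitPoint, boolean ranToCompletion) {
      this.inCommitPoint = inCommitPoint;
      this.ranToCompletion = ranToCompletion;
    }
  }

  static Outcome commit(boolean mergeFinishesInTime, long maxWaitMillis) {
    Object lock = new Object();
    CountDownLatch release = new CountDownLatch(1); // stands in for the merge's duration
    CountDownLatch mergeDone = new CountDownLatch(1);
    AtomicReference<CountDownLatch> awaitRef = new AtomicReference<>(mergeDone);
    AtomicBoolean inCommitPoint = new AtomicBoolean();
    AtomicBoolean ranToCompletion = new AtomicBoolean();
    ExecutorService scheduler = Executors.newSingleThreadExecutor();
    try {
      scheduler.execute(() -> {
        try {
          release.await(); // the "merge work"
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        synchronized (lock) {
          ranToCompletion.set(true);
          CountDownLatch latch = awaitRef.get();
          if (latch != null) { // the commit is still waiting: update the commit point
            inCommitPoint.set(true);
            latch.countDown();
          }
        }
      });
      if (mergeFinishesInTime) {
        release.countDown();
      }
      if (!mergeDone.await(maxWaitMillis, TimeUnit.MILLISECONDS)) {
        synchronized (lock) {
          awaitRef.set(null); // abandon: a later finish must not touch this commit
        }
        release.countDown(); // the merge still runs to completion afterwards
      }
      scheduler.shutdown();
      scheduler.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      scheduler.shutdownNow();
    }
    return new Outcome(inCommitPoint.get(), ranToCompletion.get());
  }

  public static void main(String[] args) {
    Outcome fast = commit(true, 5_000);
    Outcome slow = commit(false, 50);
    System.out.println("fast merge: inCommitPoint=" + fast.inCommitPoint
        + ", ranToCompletion=" + fast.ranToCompletion);
    System.out.println("slow merge: inCommitPoint=" + slow.inCommitPoint
        + ", ranToCompletion=" + slow.ranToCompletion);
  }
}
```

Clearing the reference and checking it both happen under the same lock, so a merge that finishes right at the deadline is seen as either in the commit or abandoned, never half of each.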






[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


s1monw commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439647267



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -4483,6 +4593,7 @@ public int length() {
     // Merge would produce a 0-doc segment, so we do nothing except commit the merge to remove all the 0-doc segments that we "merged":
     assert merge.info.info.maxDoc() == 0;
     commitMerge(merge, mergeState);
+    success = true;

Review comment:
   Can we fix it in a dedicated PR with a dedicated test? That would also 
help when we look at the history of the bugfix.








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


s1monw commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439647024



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException {
   // sneak into the commit point:
   toCommit = segmentInfos.clone();
 
+  if (anyChanges) {
+    // Find any merges that can execute on commit (per MergePolicy).
+    MergePolicy.MergeSpecification mergeSpec =

Review comment:
   I tried to showcase this here 
https://github.com/s1monw/lucene-solr/commit/3864b6c2b631879fa1e995d47ed2b84aae054747








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


s1monw commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439646861



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3257,6 +3320,52 @@ private long prepareCommitInternal() throws IOException {
   } finally {
     maybeCloseOnTragicEvent();
   }
+
+  if (mergeAwaitLatchRef != null) {
+    CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get();
+    // If we found and registered any merges above, within the flushLock, then we want to ensure that they
+    // complete execution. Note that since we released the lock, other merges may have been scheduled. We will
+    // block until the merges that we registered complete. As they complete, they will update toCommit to
+    // replace merged segments with the result of each merge.
+    config.getIndexWriterEvents().beginMergeOnCommit();
+    mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);
+    long mergeWaitStart = System.nanoTime();
+    int abandonedCount = 0;
+    long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0);
+    try {
+      if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) {
+        synchronized (this) {
+          // Need to do this in a synchronized block, to make sure none of our commit merges are currently
+          // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock).
+          // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as
+          // usual, but when they finish, they won't attempt to update toCommit or modify segment reference
+          // counts.
+          mergeAwaitLatchRef.set(null);
+          for (MergePolicy.OneMerge commitMerge : commitMerges) {
+            if (runningMerges.contains(commitMerge) || pendingMerges.contains(commitMerge)) {
+              abandonedCount++;
+            }
+          }
+        }
+      }
+    } catch (InterruptedException ie) {
+      throw new ThreadInterruptedException(ie);
+    } finally {
+      if (infoStream.isEnabled("IW")) {
+        infoStream.message("IW", String.format(Locale.ROOT, "Waited %.1f ms for commit merges",
+            (System.nanoTime() - mergeWaitStart)/1_000_000.0));
+        infoStream.message("IW", "After executing commit merges, had " + toCommit.size() + " segments");
+        if (abandonedCount > 0) {
+          infoStream.message("IW", "Abandoned " + abandonedCount + " commit merges after " + waitTimeMillis + " ms");
+        }
+      }
+      if (abandonedCount > 0) {
+        config.getIndexWriterEvents().abandonedMergesOnCommit(abandonedCount);

Review comment:
   I think we should detach the discussion of whether we need metrics on IW 
from this PR. It distracts from its actual core change IMO, and if we add 
metrics then we'd need some more, or could consolidate them too. I'd rather 
have a stats object than a callback here, to be honest, but again that's a 
different discussion.








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


s1monw commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439646326



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##
@@ -109,6 +110,9 @@
   
   /** Default value for whether calls to {@link IndexWriter#close()} include a commit. */
   public final static boolean DEFAULT_COMMIT_ON_CLOSE = true;
+
+  /** Default value for time to wait for merges on commit (when using a {@link MergePolicy} that implements findFullFlushMerges). */
+  public static final double DEFAULT_MAX_COMMIT_MERGE_WAIT_SECONDS = 30.0;

Review comment:
   Maybe 0 as a default, and if somebody wants to wait they can set it?
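
For reference, a zero wait has well-defined behavior with `CountDownLatch`: `await(0, unit)` does not block at all and simply reports whether the count has already reached zero. The sketch below shows that in isolation, assuming the seconds-to-milliseconds conversion used in the PR diff. `ZeroWaitSketch`, `toMillis` and `waitForMerges` are hypothetical names, not part of the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Standalone illustration of a zero-second wait; not IndexWriter code. */
final class ZeroWaitSketch {
  /** Converts a wait in seconds to whole milliseconds, truncating any fraction. */
  static long toMillis(double seconds) {
    return (long) (seconds * 1000.0);
  }

  /** Returns true if the latch reached zero within the configured wait. */
  static boolean waitForMerges(CountDownLatch latch, double maxWaitSeconds) {
    try {
      return latch.await(toMillis(maxWaitSeconds), TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    CountDownLatch pending = new CountDownLatch(1);  // one merge still running
    CountDownLatch finished = new CountDownLatch(0); // nothing left to wait for
    System.out.println("pending, 0s wait: " + waitForMerges(pending, 0.0));
    System.out.println("finished, 0s wait: " + waitForMerges(finished, 0.0));
  }
}
```

So with a default of 0, a commit would pick up only merges that had already finished by the time it checked, and anything still running would be treated as abandoned for that commit.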








[GitHub] [lucene-solr] s1monw commented on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


s1monw commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643486962


   @mikemccand thanks for replying to all these comments. I do understand that 
this change has an impact and I agree we should add this functionality. I just 
disagree with how it's done and how much code is used. I will go and reply 
to some of your comments directly; in the meantime I went ahead and prototyped 
some ideas for how this can be less intrusive and reuse code. I pushed one 
commit here 
https://github.com/s1monw/lucene-solr/commit/3864b6c2b631879fa1e995d47ed2b84aae054747
 to showcase what I mean. I even think we can get away without a new method on 
MergePolicy but that's too much for the prototype. I'd be ok with adding a 
setting to IWC if we can't agree on a different way. 






[GitHub] [lucene-solr] msokolov commented on pull request #1552: LUCENE-8962: merge small segments on commit

2020-06-12 Thread GitBox


msokolov commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-643476200


   Test case #1 above at least does reproduce (and #4 looks like a similar 
stack trace; I did not try it):
   
   gradlew :lucene:core:test --tests "org.apache.lucene.index.TestIndexFileDeleter" -Ptests.seed=DC21EB3B9D4052A4
   
   My experience with #3 (testRandomOperations) is that it doesn't tend to 
reproduce with a given seed. It is indeed quite random. And these did not 
reproduce for me.
   
   #5 reproduces:
   
   gradlew :lucene:core:test -Dtestcase=TestIndexWriterExceptions2 -Dtests.method=testBasics -Dtests.seed=AC9C0966B9BC03C8
   
   






[jira] [Commented] (LUCENE-9389) Enhance gradle logging calls validation: eliminate getMessage()

2020-06-12 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134505#comment-17134505
 ] 

Lucene/Solr QA commented on LUCENE-9389:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  8s{color} | {color:green} luke in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9389 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13005553/LUCENE-9389.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 8cbfb192ab1 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/284/testReport/ |
| modules | C: lucene/luke U: lucene/luke |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/284/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Enhance gradle logging calls validation: eliminate getMessage()
> ---
>
> Key: LUCENE-9389
> URL: https://issues.apache.org/jira/browse/LUCENE-9389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andras Salamon
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-9389.patch
>
>
> SOLR-14280 fixed a logging problem in SolrConfig by removing a few 
> getMessage() calls. We could enhance this solution by modifying gradle's 
> logging calls validation and forbid getMessage() calls during logging. We 
> should check the existing code and eliminate such calls.
> It is possible to suppress the warning using {{//logok}}.
> [~erickerickson] [~gerlowskija]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SOLR-11973) Selectively fail on precommit WARN messages

2020-06-12 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134455#comment-17134455
 ] 

Erick Erickson commented on SOLR-11973:
---

Thanks Tomás and Jason.

Tomás:

Hmmm, yes, perhaps a Lucene Jira or, perhaps, one in Solr and one in Lucene and 
we can turn this on separately.

Jason:

Yeah, a lot depends on exactly how annoying it will be. I have to point out that 
IntelliJ (and, I assume, Eclipse and NetBeans) highlights all of these as you 
edit rather than leave them as something to clean up later. And the 
"build>recompile blah.java" will bring up the errors in a convenient window you 
can click on. And the "assemble" or "classes" task in the gradle window also 
points out all of these from within the IDE. I suppose that if people turn off 
the compile and then don't clean up after themselves, the immediate compile 
failures on Jenkins will shame them into taking more care so turning them off 
easily isn't a problem ;). I'll see if I can.

And there are 21 classes that have a warning about classes that define equals 
but not hashCode(), which can be a source of subtle bugs.
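
To make the equals-without-hashCode risk concrete, here is a minimal sketch. The class names `BrokenKey` and `FixedKey` are hypothetical and are not taken from the Solr code base; the point is only that a hash-based collection can fail to find an element that `equals()` says is present.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeSketch {

  // Hypothetical class: overrides equals() but inherits Object.hashCode().
  static final class BrokenKey {
    final String name;
    BrokenKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
      return o instanceof BrokenKey && ((BrokenKey) o).name.equals(name);
    }
  }

  // Same class with a hashCode() that is consistent with equals().
  static final class FixedKey {
    final String name;
    FixedKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
      return o instanceof FixedKey && ((FixedKey) o).name.equals(name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
  }

  // Always true: equal keys hash to the same bucket.
  static boolean fixedKeyFound() {
    Set<FixedKey> set = new HashSet<>();
    set.add(new FixedKey("a"));
    return set.contains(new FixedKey("a"));
  }

  // Usually false: the two equal keys almost always get different identity hashes.
  static boolean brokenKeyFound() {
    Set<BrokenKey> set = new HashSet<>();
    set.add(new BrokenKey("a"));
    return set.contains(new BrokenKey("a"));
  }

  public static void main(String[] args) {
    System.out.println("fixed key found:  " + fixedKeyFound());
    System.out.println("broken key found: " + brokenKeyFound());
  }
}
```

The broken variant compiles and passes any test that only calls `equals()`, which is why this kind of warning tends to surface as a subtle bug rather than an obvious failure.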

Other than that, though, it's somewhat of a leap of faith. This pass probably 
won't fix anything, I claim that by and large, the bugs that would have been 
flagged were fixed already...painfully. If we tightened up our typing, I 
suspect we wouldn't have had as many "cannot cast" exceptions as we've had but 
I've no hard evidence that that's true. And this pass isn't doing anything 
about that anyway.

 

> Selectively fail on precommit WARN messages
> ---
>
> Key: SOLR-11973
> URL: https://issues.apache.org/jira/browse/SOLR-11973
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
>
> Not quite sure whether this qualifies as something for Solr or Lucene
> I'm working gradually on getting precommit lint warnings out of the code 
> base. I'd like to selectively fail a subtree once it's clean. I played around 
> a bit with Robert's suggestions on the dev list but couldn't quite get it to 
> work, then decided I needed to focus on one thing at a time.
> See SOLR-10809 for the first clean directory Real Soon Now.
> Bonus points would be working out how to fail on deprecation warnings when 
> building Solr too, although that's farther off in the future.
> Assigning to myself, but anyone who knows the build ins and outs _please_ 
> feel free to take it!






[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

2020-06-12 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134436#comment-17134436
 ] 

Michael Gibney commented on SOLR-13132:
---

Ah, no worries! Though in light of this discussion I re-thought the "Shim" and 
just pushed a commit (e7bb94922fbf10e501a006e672cec483a0f21b56) that:

# removes the Shim class entirely (it was only ever a convenience that saved a 
handful of lines of code in non-hot parts of the {{collect}} methods -- but it 
came at the expense of even _more_ lines of a "Shim" class overriding all 
methods to ensure that it's not capable of doing any of the things its parent 
class does ... not nice, to be sure).
# preserves the ability (e.g., of subclasses) to set {{countAccs}} that don't 
support sweep collection
# allows {{DEV_NULL_COUNT_ACC}} to remain a direct subclass of 
{{CountSlotAcc}}, allowing it to be used without inadvertently  implying that 
sweeping should be supported.

I think this most recent iteration (commit 
e7bb94922fbf10e501a006e672cec483a0f21b56) would be my preference, as long as it 
addresses your concerns. But if you _prefer_ to require that custom 
{{countAcc}} must support sweeping, that would be fine too -- I wasn't able to 
think of an actual use case or other argument for _specifically_ supporting 
non-sweep {{countAcc}} in {{FacetFieldProcessorByArray*}} ... other than "don't 
remove support for something without a compelling reason", and "make 
{{DEV_NULL_COUNT_ACC}} potentially useful in unanticipated contexts, without 
forcing it to imply sweep collection support".

It sounds like we're on the same page wrt the "which slot is allBuckets when 
sweeping?" question (and its possible solutions); and I agree it makes sense to 
punt on settling that question, for the moment. Accordingly, I'll take a 
pass at addressing some of the other outstanding nocommits ...
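
For readers new to the thread, the single-pass "sweep" idea from the issue description can be sketched roughly as follows. This is an illustration under simplifying assumptions (a single-valued field, doc-to-term-ordinal lookup as a plain array, docSets as `BitSet`s); the names are hypothetical and this is not the code in the patch.

```java
import java.util.BitSet;

class SweepCountSketch {

  /**
   * Walks the documents once and, for each term ordinal, counts how many docs
   * in the base, foreground, and background sets carry that term.
   * Row 0 = base counts, row 1 = foreground counts, row 2 = background counts.
   * A term ordinal of -1 means the doc has no value for the field.
   */
  static int[][] sweep(int[] termOrdOfDoc, int numTerms,
                       BitSet base, BitSet foreground, BitSet background) {
    int[][] counts = new int[3][numTerms];
    for (int doc = 0; doc < termOrdOfDoc.length; doc++) {
      int ord = termOrdOfDoc[doc];
      if (ord < 0) {
        continue;
      }
      if (base.get(doc)) counts[0][ord]++;
      if (foreground.get(doc)) counts[1][ord]++;
      if (background.get(doc)) counts[2][ord]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] termOrdOfDoc = {0, 1, 0, 1, -1};
    BitSet base = new BitSet();
    base.set(0, 3);
    BitSet fg = new BitSet();
    fg.set(0);
    fg.set(2);
    fg.set(3);
    BitSet bg = new BitSet();
    bg.set(0, 5);
    int[][] counts = sweep(termOrdOfDoc, 2, base, fg, bg);
    System.out.println(java.util.Arrays.deepToString(counts));
  }
}
```

The contrast with the two-pass approach is that no per-term docSet is ever materialized or intersected; all three counts per term fall out of the same walk over the documents.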

> Improve JSON "terms" facet performance when sorted by relatedness 
> --
>
> Key: SOLR-13132
> URL: https://issues.apache.org/jira/browse/SOLR-13132
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module
>Affects Versions: 7.4, master (9.0)
>Reporter: Michael Gibney
>Priority: Major
> Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTime for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids 
> the per-term docSet creation and set intersection overhead.






[jira] [Comment Edited] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134428#comment-17134428
 ] 

Jun Ohtani edited comment on LUCENE-9390 at 6/12/20, 5:53 PM:
--

I also checked *UniDic* for punctuation characters, because I was working on 
[https://github.com/apache/lucene-solr/pull/935] .
 # words that start with a punctuation character: 606 words, 222 of which have length > 1
 # words that consist entirely of punctuation characters: 111 words
 # words that contain punctuation somewhere other than the 1st char: 1780 words

Here is the word list.

[https://gist.github.com/johtani/3769639bc24ebeab17ddcb1be039ba94]


was (Author: jun_o):
I also checked *UniDic* for punctuation characters, because I was working on 
[https://github.com/apache/lucene-solr/pull/935] .
 # words that start with a punctuation character: 606 words, 222 of which have length > 1
 # words that consist entirely of punctuation characters: 111 words
 # words that contain punctuation somewhere other than the 1st char: 1780 words

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuations and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuations are automatically removed by 
> default (discardPunctuation  is true). I think the code was written this way 
> because we expect punctuations to be separated from normal tokens but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuations ?
>  
>  






[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134428#comment-17134428
 ] 

Jun Ohtani commented on LUCENE-9390:


I also checked *UniDic* for punctuation characters, because I was working on 
[https://github.com/apache/lucene-solr/pull/935] .
 # words that start with a punctuation character: 606 words, 222 of which have length > 1
 # words that consist entirely of punctuation characters: 111 words
 # words that contain punctuation somewhere other than the 1st char: 1780 words
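
The difference between checking only the first character and checking the entire token, as the issue description suggests, can be sketched like this. The helper names are hypothetical and the `isPunctuation` test below is a simplified assumption based on Unicode general categories, not the tokenizer's actual rule.

```java
class PunctuationCheckSketch {

  // Simplified assumption: treat only the Unicode punctuation categories as punctuation.
  static boolean isPunctuation(char ch) {
    switch (Character.getType(ch)) {
      case Character.CONNECTOR_PUNCTUATION:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
        return true;
      default:
        return false;
    }
  }

  // First-character check: flags any token that merely begins with punctuation.
  static boolean startsWithPunctuation(String token) {
    return !token.isEmpty() && isPunctuation(token.charAt(0));
  }

  // Whole-token check: flags a token only if every character is punctuation.
  static boolean isAllPunctuation(String token) {
    if (token.isEmpty()) {
      return false;
    }
    for (int i = 0; i < token.length(); i++) {
      if (!isPunctuation(token.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String kabu = "(\u682A)"; // the dictionary entry quoted in the issue
    System.out.println(startsWithPunctuation(kabu));
    System.out.println(isAllPunctuation(kabu));
  }
}
```

Under the first-character check the quoted dictionary entry would be discarded; under the whole-token check it would be kept, while tokens made up purely of punctuation would still be dropped.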

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuations and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuations are automatically removed by 
> default (discardPunctuation  is true). I think the code was written this way 
> because we expect punctuations to be separated from normal tokens but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuations ?
>  
>  






[jira] [Commented] (SOLR-11973) Selectively fail on precommit WARN messages

2020-06-12 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134397#comment-17134397
 ] 

Jason Gerlowski commented on SOLR-11973:


It sounds like a Good Thing to me.  Though I reserve the right to change my 
mind once it's turned on and we see how annoying it might be :P

One caveat: when you set this up in ant/gradle, can there be an easy way to 
disable it?  It'd suck if I had to fix warnings in my code just to {{ant 
server}} as the first step in running a manual test.

One question: I think there's this common-sense idea that eliminating warnings 
improves code quality.  And that makes sense, but it's nebulous.  I'm just 
curious - have you found any concrete examples that make the benefit here more 
tangible?  Any bugs you've found in the course of resolving these warnings?  
Any resource leaks?  etc.  Just curious what your experience has been. 

> Selectively fail on precommit WARN messages
> ---
>
> Key: SOLR-11973
> URL: https://issues.apache.org/jira/browse/SOLR-11973
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
>
> Not quite sure whether this qualifies as something for Solr or Lucene
> I'm working gradually on getting precommit lint warnings out of the code 
> base. I'd like to selectively fail a subtree once it's clean. I played around 
> a bit with Robert's suggestions on the dev list but couldn't quite get it to 
> work, then decided I needed to focus on one thing at a time.
> See SOLR-10809 for the first clean directory Real Soon Now.
> Bonus points would be working out how to fail on deprecation warnings when 
> building Solr too, although that's farther off in the future.
> Assigning to myself, but anyone who knows the build ins and outs _please_ 
> feel free to take it!






[jira] [Commented] (SOLR-11973) Selectively fail on precommit WARN messages

2020-06-12 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134377#comment-17134377
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-11973:
--

+1 to the idea. Thanks for taking on this huge task. My thoughts:
bq. Should this be master-only or include 8x?
I think master only should be fine. 8.x would be a bonus and would make the 
code similar which makes backporting easier, but I would be fine if it's not 
done there. I don't know if there are differences in warnings between java 11 
and 8 (which 8.x needs to support) that could complicate things.
bq. Gradle-only or both Gradle and Ant? I propose both if it's easy.
Agree, both if it's easy. Gradle is more important IMO, Ant is optional if we 
don't do 8.x
bq. 4> Solr and Lucene both? I propose both.
+1. Maybe this issue should be turned into a LUCENE one? although the email to 
the dev list should alert everyone anyways.

> Selectively fail on precommit WARN messages
> ---
>
> Key: SOLR-11973
> URL: https://issues.apache.org/jira/browse/SOLR-11973
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
>
> Not quite sure whether this qualifies as something for Solr or Lucene
> I'm working gradually on getting precommit lint warnings out of the code 
> base. I'd like to selectively fail a subtree once it's clean. I played around 
> a bit with Robert's suggestions on the dev list but couldn't quite get it to 
> work, then decided I needed to focus on one thing at a time.
> See SOLR-10809 for the first clean directory Real Soon Now.
> Bonus points would be working out how to fail on deprecation warnings when 
> building Solr too, although that's farther off in the future.
> Assigning to myself, but anyone who knows the build ins and outs _please_ 
> feel free to take it!






[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages

2020-06-12 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134368#comment-17134368
 ] 

Jason Gerlowski commented on SOLR-14566:


I created a PR that takes the "NOW" approach described above.  If anyone has 
any opinions on approach, very curious to hear what you think.

> Record "NOW" on "coordinator" log messages
> --
>
> Key: SOLR-14566
> URL: https://issues.apache.org/jira/browse/SOLR-14566
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, in SolrCore.java we log each search request that comes through 
> each core as it is finishing.  This includes the path, query-params, QTime, 
> and status.  In the case of a distributed search both the "coordinator" node 
> and each of the per-shard requests produce a log message.
> When Solr is fielding many identical queries, such as those created by a 
> healthcheck or dashboard, it can be hard when examining logs to link the 
> per-shard requests with the "cooordinator" request that came in upstream.
> One thing that would make this easier is if the {{NOW}} param added to 
> per-shard requests is also included in the log message from the 
> "coordinator".  While {{NOW}} isn't unique strictly speaking, it often is in 
> practice, and along with the query-params would allow debuggers to associate 
> shard requests with coordinator requests a large majority of the time.
> An alternative approach would be to create a {{qid}} or {{query-uuid}} when 
> the coordinator starts its work that can be logged everywhere.  This provides 
> a stronger expectation around uniqueness, but would require UUID generation 
> on the coordinator, which may be non-negligible work at high QPS (maybe? I 
> have no idea).  It also loses the neatness of reusing data already present on 
> the shard requests.  






[GitHub] [lucene-solr] gerlowskija opened a new pull request #1574: SOLR-14566: Log NOW value on coordinator node for dist search requests

2020-06-12 Thread GitBox


gerlowskija opened a new pull request #1574:
URL: https://github.com/apache/lucene-solr/pull/1574


   # Description
   
   When reading through logs, associating the log message from the coordinator 
node with the messages for per-shard requests downstream can be difficult.  
Nothing uniquely (or semi-uniquely) identifies shard requests as being part of an 
upstream request. 
   
   # Solution
   
   This PR makes use of the NOW param that is already added to all downstream 
shard requests, and adds it to the log message from the coordinator node as 
well.  While NOW isn't strictly unique (it's a millisecond-granularity 
timestamp), the combination of NOW and the request params should allow 
debugging administrators to trace per-shard requests to a coordinator request  
(and vice versa) with confidence.
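
   As a rough illustration of how a debugger might use the logged value, the sketch below groups already-parsed log entries by NOW plus a query string. The `LogEntry` class and its fields are hypothetical stand-ins for whatever a log parser produces; they are not Solr classes, and real per-shard requests carry extra params that would need to be ignored when comparing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NowCorrelationSketch {

  // Hypothetical, simplified view of one parsed log line.
  static final class LogEntry {
    final String node;   // which node wrote the line
    final long now;      // value of the NOW param (epoch millis)
    final String query;  // the subset of request params chosen for comparison

    LogEntry(String node, long now, String query) {
      this.node = node;
      this.now = now;
      this.query = query;
    }
  }

  // Groups entries that share both NOW and the query string.
  static Map<String, List<LogEntry>> groupByNowAndQuery(List<LogEntry> entries) {
    Map<String, List<LogEntry>> groups = new LinkedHashMap<>();
    for (LogEntry e : entries) {
      String key = e.now + "|" + e.query;
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(e);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<LogEntry> entries = Arrays.asList(
        new LogEntry("coordinator", 1591970000000L, "q=*:*"),
        new LogEntry("shard1", 1591970000000L, "q=*:*"),
        new LogEntry("shard2", 1591970000000L, "q=*:*"),
        new LogEntry("coordinator", 1591970000001L, "q=*:*"));
    groupByNowAndQuery(entries).forEach(
        (key, group) -> System.out.println(key + " -> " + group.size() + " entries"));
  }
}
```

   Two identical queries arriving in the same millisecond would land in the same group, which is the "not strictly unique" caveat above.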
   
   # Tests
   
   Manual testing to ensure the log field is added in appropriate situations.  
No added automated tests, though if anyone knows a dependable way to do so, I'm 
open to it.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Created] (SOLR-14566) Record "NOW" on "coordinator" log messages

2020-06-12 Thread Jason Gerlowski (Jira)
Jason Gerlowski created SOLR-14566:
--

 Summary: Record "NOW" on "coordinator" log messages
 Key: SOLR-14566
 URL: https://issues.apache.org/jira/browse/SOLR-14566
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jason Gerlowski
Assignee: Jason Gerlowski


Currently, in SolrCore.java we log each search request that comes through each 
core as it is finishing.  This includes the path, query-params, QTime, and 
status.  In the case of a distributed search both the "coordinator" node and 
each of the per-shard requests produce a log message.

When Solr is fielding many identical queries, such as those created by a 
healthcheck or dashboard, it can be hard when examining logs to link the 
per-shard requests with the "coordinator" request that came in upstream.

One thing that would make this easier is if the {{NOW}} param added to 
per-shard requests is also included in the log message from the "coordinator".  
While {{NOW}} isn't unique strictly speaking, it often is in practice, and 
along with the query-params would allow debuggers to associate shard requests 
with coordinator requests a large majority of the time.

An alternative approach would be to create a {{qid}} or {{query-uuid}} when the 
coordinator starts its work that can be logged everywhere.  This provides a 
stronger expectation around uniqueness, but would require UUID generation on 
the coordinator, which may be non-negligible work at high QPS (maybe? I have no 
idea).  It also loses the neatness of reusing data already present on the shard 
requests.  






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1552: LUCENE-8962

2020-06-12 Thread GitBox


mikemccand commented on a change in pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439492017



##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -4483,6 +4593,7 @@ public int length() {
 // Merge would produce a 0-doc segment, so we do nothing except commit the merge to remove all the 0-doc segments that we "merged":
 assert merge.info.info.maxDoc() == 0;
 commitMerge(merge, mergeState);
+success = true;

Review comment:
   I think this was a small pre-existing bug.
   
   I.e. the merge has in fact succeeded on this path.  Before this change we 
are calling `closeMergeReaders` twice (once in the line above this, then again 
on line 4720 below).  Maybe that is harmless, but code-wise I think this path 
did succeed.
   
   If necessary, we could pull this out into its own PR?  But I think it's a 
good, if subtle, catch.  The merge did succeed in this path.
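
   The pattern being discussed can be reduced to a small sketch. This is a simplified illustration, not the `IndexWriter` code: `closeReaders` is a hypothetical stand-in for the cleanup, and the early path is assumed to have already performed it before the `finally` block runs.

```java
class SuccessFlagSketch {
  private int closeCalls = 0;

  // Hypothetical stand-in for the cleanup that must run exactly once.
  private void closeReaders() {
    closeCalls++;
  }

  /**
   * Simulates the early path: the work on this path already cleans up,
   * and the finally block cleans up again unless success was recorded.
   */
  void runEarlyPath(boolean markSuccess) {
    boolean success = false;
    try {
      closeReaders(); // cleanup performed as part of the early path itself
      if (markSuccess) {
        success = true; // the one-line fix: record that this path succeeded
      }
    } finally {
      if (!success) {
        closeReaders(); // failure handling; redundant if the path succeeded
      }
    }
  }

  int closeCalls() {
    return closeCalls;
  }

  public static void main(String[] args) {
    SuccessFlagSketch before = new SuccessFlagSketch();
    before.runEarlyPath(false);
    SuccessFlagSketch after = new SuccessFlagSketch();
    after.runEarlyPath(true);
    System.out.println("without flag: " + before.closeCalls() + " close calls");
    System.out.println("with flag:    " + after.closeCalls() + " close calls");
  }
}
```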

##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException {
   // sneak into the commit point:
   toCommit = segmentInfos.clone();
 
+  if (anyChanges) {
+// Find any merges that can execute on commit (per MergePolicy).
+MergePolicy.MergeSpecification mergeSpec =

Review comment:
   I think what makes this tricky is that this is a combination of 
`MergePolicy` (to pick the small merges) and `MergeScheduler` (to run them and 
await their completion, subject to a time limit) purposes.
   
   I do not think you can achieve this by just wrapping in `MergePolicy`, but I 
agree it would be better if we could find a simpler way to achieve it.

##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java
##
@@ -459,6 +463,31 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) {
 return this;
   }
 
+  /**
+   * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...).
+   * If this time is reached, we proceed with the commit based on segments merged up to that point.
+   * The merges are not cancelled, and may still run to completion independent of the commit.
+   */
+  public IndexWriterConfig setMaxCommitMergeWaitSeconds(double maxCommitMergeWaitSeconds) {

Review comment:
   I think @msfroh had considered a separate `IndexWriter` method before 
but something went wrong with that approach?
   
   I don't think this should be a separate method, actually.
   
   We have a `MergePolicy` that governs which merges should happen upon which 
events/triggers and what this change is adding is a new trigger (on commit) at 
which merging could conceivably occur.  If we added this method, the 
implication to fresh eyes would be that the existing `prepareCommit` will also 
wait for merges with some default parameter, while this new method lets you 
change the default.
   
   Anyway, let's hear from @msfroh if there was some wrinkle on making a 
dedicated method for this, but I still think that's a messy API.  We should 
rather use our existing `MergePolicy` API correctly.
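
   For reference, the bounded wait that the setter's javadoc describes comes down to a `CountDownLatch.await` with a timeout. The sketch below uses only JDK classes; the class and method names are hypothetical, and only the seconds-to-milliseconds conversion mirrors the expression in the diff.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CommitMergeWaitSketch {

  /**
   * Waits up to maxWaitSeconds for the latch to reach zero.
   * Returns true if all awaited work finished in time, false if the wait
   * timed out (or was interrupted) and the caller should proceed without it.
   */
  static boolean awaitMerges(CountDownLatch latch, double maxWaitSeconds) {
    long waitTimeMillis = (long) (maxWaitSeconds * 1000.0);
    try {
      return latch.await(waitTimeMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  public static void main(String[] args) {
    CountDownLatch finished = new CountDownLatch(1);
    finished.countDown(); // the awaited work completes
    System.out.println("completed in time: " + awaitMerges(finished, 1.0));

    CountDownLatch stillRunning = new CountDownLatch(1);
    System.out.println("completed in time: " + awaitMerges(stillRunning, 0.01));
  }
}
```

   A timed-out wait does not cancel anything; whatever would have counted the latch down keeps running, which matches the "merges are not cancelled" wording in the javadoc.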

##
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##
@@ -3257,6 +3320,52 @@ private long prepareCommitInternal() throws IOException {
   } finally {
 maybeCloseOnTragicEvent();
   }
+
+  if (mergeAwaitLatchRef != null) {
+CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get();
+// If we found and registered any merges above, within the flushLock, then we want to ensure that they
+// complete execution. Note that since we released the lock, other merges may have been scheduled. We will
+// block until  the merges that we registered complete. As they complete, they will update toCommit to
+// replace merged segments with the result of each merge.
+config.getIndexWriterEvents().beginMergeOnCommit();
+mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);
+long mergeWaitStart = System.nanoTime();
+int abandonedCount = 0;
+long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0);
+try {
+  if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) {
+synchronized (this) {
+  // Need to do this in a synchronized block, to make sure none of our commit merges are currently
+  // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock).
+  // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as
+  // usual, but when they finish, they won't attempt to update toCommit or modify segment reference
+  // counts.
+  

[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value

2020-06-12 Thread Haoyu Zhai (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134323#comment-17134323
 ] 

Haoyu Zhai commented on LUCENE-8574:


Sorry, a commit message pointed here by mistake; the correct issue is 
LUCENE-9391.

BTW, was this patch ever merged?
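
For anyone skimming, the per-hit cache that this issue asks to restore can be sketched with the expressions from the description (x = c + d, c = b + 2, d = b * 2). This is an illustration only: the class is hypothetical, `b` is given an arbitrary value of 3, and nothing here is taken from `ExpressionFunctionValues`.

```java
import java.util.HashMap;
import java.util.Map;

class PerHitCacheSketch {
  private final boolean useCache;
  private final Map<String, Double> cache = new HashMap<>(); // valid for one hit only
  int bEvaluations = 0;

  PerHitCacheSketch(boolean useCache) {
    this.useCache = useCache;
  }

  // Call when moving to the next hit so stale values are not reused.
  void newHit() {
    cache.clear();
  }

  // Stands in for an expensive variable lookup; counts how often it runs.
  private double computeB() {
    bEvaluations++;
    return 3.0;
  }

  private double b() {
    if (!useCache) {
      return computeB();
    }
    return cache.computeIfAbsent("b", k -> computeB());
  }

  double c() { return b() + 2; }

  double d() { return b() * 2; }

  double x() { return c() + d(); }

  public static void main(String[] args) {
    PerHitCacheSketch uncached = new PerHitCacheSketch(false);
    PerHitCacheSketch cached = new PerHitCacheSketch(true);
    System.out.println("x = " + uncached.x() + ", b evaluated " + uncached.bEvaluations + " times");
    System.out.println("x = " + cached.x() + ", b evaluated " + cached.bEvaluations + " times");
  }
}
```

Both variants produce the same x; only the number of evaluations of `b` differs, and that gap grows with every additional level of nesting.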

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.5, 8.0
>Reporter: Michael McCandless
>Assignee: Robert Muir
>Priority: Major
> Attachments: LUCENE-8574.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit 
> cache, so that nested expressions that reference the same common variable 
> would compute the value for that variable the first time it was referenced 
> and then use that cached value for all subsequent invocations, within one 
> hit.  I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference 
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2 
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a 
> given hit), but today it's computed twice.  The problem is combinatoric if b 
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
>  






[GitHub] [lucene-solr] s1monw opened a new pull request #1573: Cleanup TermsHashPerField

2020-06-12 Thread GitBox


s1monw opened a new pull request #1573:
URL: https://github.com/apache/lucene-solr/pull/1573


   Several classes within the IndexWriter indexing chain haven't been touched 
for several years. Most of these classes expose their internals through public 
members and are difficult to construct in tests since they depend on many other 
classes. This change tries to clean up TermsHashPerField and adds a dedicated 
standalone test for it to make it more accessible for other developers since 
it's simpler to understand. There are also attempts to make documentation 
better as a result of this refactoring.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Resolved] (SOLR-14563) Fix or suppress warnings in solr/contrib

2020-06-12 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14563.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Fix or suppress warnings in solr/contrib
> 
>
> Key: SOLR-14563
> URL: https://issues.apache.org/jira/browse/SOLR-14563
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>
> There aren't very many in any of these individual directories, so I'll do 
> them all at once.






[jira] [Commented] (SOLR-14563) Fix or suppress warnings in solr/contrib

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134292#comment-17134292
 ] 

ASF subversion and git services commented on SOLR-14563:


Commit 613abf2d1c085727ec87ddc39b9938fd4d059dd1 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=613abf2 ]

SOLR-14563: Fix or suppress warnings in solr/contrib


> Fix or suppress warnings in solr/contrib
> 
>
> Key: SOLR-14563
> URL: https://issues.apache.org/jira/browse/SOLR-14563
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There aren't very many in any of these individual directories, so I'll do 
> them all at once.






[jira] [Commented] (SOLR-14563) Fix or suppress warnings in solr/contrib

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134291#comment-17134291
 ] 

ASF subversion and git services commented on SOLR-14563:


Commit 8cbfb192ab151312efe7d0de42478329604cba90 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8cbfb19 ]

SOLR-14563: Fix or suppress warnings in solr/contrib


> Fix or suppress warnings in solr/contrib
> 
>
> Key: SOLR-14563
> URL: https://issues.apache.org/jira/browse/SOLR-14563
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There aren't very many in any of these individual directories, so I'll do 
> them all at once.






[GitHub] [lucene-solr] rmuir commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


rmuir commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643316427


   Yeah, you are right, it's not that bad. But I would still prefer to just 
deliver the clear exception without adding a less-clear suppressed one.






[jira] [Commented] (SOLR-14384) Stack SolrRequestInfo

2020-06-12 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134281#comment-17134281
 ] 

David Smiley commented on SOLR-14384:
-

I'm inclined to merge the PR as-is for master & 8x Monday.

> Stack SolrRequestInfo
> -
>
> Key: SOLR-14384
> URL: https://issues.apache.org/jira/browse/SOLR-14384
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Sometimes SolrRequestInfo needs to be suspended/overridden with a new one that 
> is used temporarily. Examples are in the {{[subquery]}} transformer, in 
> cache warming, and in QuerySenderListener (another type of warming), maybe 
> others.  This can be annoying to do correctly, and in at least one place it 
> isn't done correctly.  SolrRequestInfoSuspender shows some complexity.  In 
> this issue, [~dsmiley] proposes using a stack internal to SolrRequestInfo 
> that is pushed and popped.  It's not the only way to solve this but it's one 
> way.
>  See linked issues for the context and discussion.
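The stack idea in the description can be sketched with a thread-local deque. This is only an illustration under assumptions: `RequestInfoStackSketch` and its `push`/`pop`/`current` methods are hypothetical, the request info is reduced to a plain `String`, and nothing here reflects the actual SolrRequestInfo code or the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Solr code: a per-thread stack of "request info" values.
final class RequestInfoStackSketch {
  private static final ThreadLocal<Deque<String>> STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  // Temporarily override the current request info for this thread.
  static void push(String info) {
    STACK.get().push(info);
  }

  // Restore the previous request info; throws NoSuchElementException if nothing was pushed.
  static String pop() {
    return STACK.get().pop();
  }

  // The request info currently in effect, or null if none is set.
  static String current() {
    return STACK.get().peek();
  }

  public static void main(String[] args) {
    push("outer request");
    push("temporary subquery request");
    try {
      System.out.println(current());
    } finally {
      pop(); // always restore, even if the nested work fails
    }
    System.out.println(current());
    pop();
  }
}
```

Pairing every `push` with a `pop` in a `finally` block is what makes the temporary override hard to get wrong, which seems to be the motivation stated in the issue.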






[jira] [Comment Edited] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134249#comment-17134249
 ] 

Jun Ohtani edited comment on LUCENE-9390 at 6/12/20, 2:45 PM:
--

I counted three types of words in the ipadic CSV files:
 # words that start with a punctuation character: 101 words, of which only 4 
have length > 1
 # words that consist entirely of punctuation characters: 3 words
 # words that contain punctuation somewhere other than the first character: 723 words

I counted no. 3 just out of curiosity.

Reference word list:

 [https://gist.github.com/johtani/50aa2776a385c5c8dfa3a0d1e4e268cd]

The 4 words that start with punctuation and are longer than one character:
(社)
 (財)
 (有)
 (株)

The all-punctuation words are:

——
 −−
 ──
  


was (Author: jun_o):
I counted three types of words in the ipadic CSV files:
 # words that start with a punctuation character: 104 words, of which only 7 
have length > 1
 # words that consist entirely of punctuation characters: 0 words
 # words that contain punctuation somewhere other than the first character: 723 words

Word list:
 [https://gist.github.com/johtani/50aa2776a385c5c8dfa3a0d1e4e268cd]

The 7 words are listed below:
——
−−
──
(社)
(財)
(有)
(株)
 

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuations and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  
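The difference between checking only the first character and checking the entire token can be sketched as follows. This is a self-contained illustration, not the Kuromoji tokenizer code: `PunctuationCheckSketch` and its methods are hypothetical, and treating the listed `Character.getType` categories as "punctuation" is an assumption made for the example.

```java
// Hypothetical sketch, not Lucene code: first-character check vs. whole-token check.
final class PunctuationCheckSketch {
  // Assumption: these Unicode general categories count as punctuation.
  static boolean isPunctuation(int codePoint) {
    switch (Character.getType(codePoint)) {
      case Character.CONNECTOR_PUNCTUATION:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
        return true;
      default:
        return false;
    }
  }

  // Discard rule that looks only at the first character of the token.
  static boolean startsWithPunctuation(String token) {
    return !token.isEmpty() && isPunctuation(token.codePointAt(0));
  }

  // Discard rule that requires every character of the token to be punctuation.
  static boolean isAllPunctuation(String token) {
    return !token.isEmpty() && token.codePoints().allMatch(PunctuationCheckSketch::isPunctuation);
  }

  public static void main(String[] args) {
    String mixed = "(\u682A)"; // the dictionary entry from the issue, with U+682A written as an escape
    System.out.println(startsWithPunctuation(mixed));
    System.out.println(isAllPunctuation(mixed));
  }
}
```

Under the first rule the mixed entry would be discarded because it begins with a parenthesis; under the whole-token rule it would be kept, since the middle character is a letter.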






[GitHub] [lucene-solr] jpountz commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


jpountz commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643297559


   I think it's doing the opposite actually:
   
   ```
   CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in)
     |--- suppressed: CorruptIndexException("misplaced codec footer (file truncated?): remaining=" + remaining + ", expected=" + expected + ", fp=" + in.getFilePointer(), in)
   ```






[GitHub] [lucene-solr] rmuir commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


rmuir commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643294631


   How would the length in the meta file be incorrect? It comes from a 
checksum-verified file, right?
   
   I think with the current PR, you'll get a somewhat confusing exception for a 
truncated file, probably something like this:
   ```
   CorruptIndexException("misplaced codec footer (file truncated?): remaining=" + remaining + ", expected=" + expected + ", fp=" + in.getFilePointer(), in)
     |--- suppressed: new CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in)
   ```
   
   Since we actually know the length up-front, we can take advantage of that to 
just deliver the better exception: `truncated file` directly to the user?






[GitHub] [lucene-solr] jpountz commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


jpountz commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643292082


   I liked doing the suppressed dance to still get information about the shape 
of the footer, e.g. for the case when the footer would be correct but the 
length retrieved from the meta file would be incorrect, similarly to how 
`checkFooter` tells us that the checksum passed when reading a file failed even 
though the checksum is correct. No strong feelings, I don't mind removing it if 
you don't like it.
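For readers unfamiliar with the "suppressed dance", the mechanism is the standard `Throwable.addSuppressed` method: one exception is reported as primary and a second one rides along for extra context. The sketch below uses plain `java.io.IOException` as a stand-in for Lucene's exception type; `SuppressedExceptionSketch` and `truncated` are hypothetical names, and the messages are shortened from the ones quoted in this thread.

```java
import java.io.IOException;

// Hypothetical sketch: a primary exception carrying a secondary one as suppressed.
final class SuppressedExceptionSketch {
  // Builds the clearer "truncated file" exception and attaches the footer problem to it.
  static IOException truncated(long actualLength, long expectedLength, IOException footerProblem) {
    IOException primary = new IOException(
        "truncated file: length=" + actualLength + " but expectedLength==" + expectedLength);
    primary.addSuppressed(footerProblem);
    return primary;
  }

  public static void main(String[] args) {
    IOException e = truncated(10, 16, new IOException("misplaced codec footer (file truncated?)"));
    System.out.println(e.getMessage());
    for (Throwable suppressed : e.getSuppressed()) {
      System.out.println("  suppressed: " + suppressed.getMessage());
    }
  }
}
```

Which exception ends up primary and which suppressed is exactly the point debated in the neighbouring comments.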






[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134249#comment-17134249
 ] 

Jun Ohtani commented on LUCENE-9390:


I counted three types of words in the ipadic CSV files:
 # words that start with a punctuation character: 104 words, of which only 7 
have length > 1
 # words that consist entirely of punctuation characters: 0 words
 # words that contain punctuation somewhere other than the first character: 723 words

Word list:
 [https://gist.github.com/johtani/50aa2776a385c5c8dfa3a0d1e4e268cd]

The 7 words are listed below:
——
−−
──
(社)
(財)
(有)
(株)
 

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuations and other characters. For instance the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  






[jira] [Commented] (SOLR-11973) Selectively fail on precommit WARN messages

2020-06-12 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134248#comment-17134248
 ] 

Erick Erickson commented on SOLR-11973:
---

We're getting perilously close to having warning-free compilations on master. 
Once that's accomplished, I intend to see if we can fail compilation on 
warnings. We'll still be able to get around that by adding SuppressWarnings. 
I'll warn you in advance that I've enhanced the BadApple report to flag files 
that have increases in that annotation week-to-week and I'll nag people about 
adding them rather than addressing the cause.

Deprecations are _not_ included in this, that's another topic entirely. This is 
all the annoying "rawtypes", "unchecked", "try", blah blah blah.

So before I go there, I'd like some consensus about whether this is A Good Idea 
or not.

My take is that yes it is. We've ignored this kind of thing for years. The 
result is that there were over 5,000 such warnings in the code when I started. 
Some of this is legacy in the sense that some of the code was written on old 
versions of Java. Some of it is by example (hey, I see over here that something 
like "List = new ArrayList();" is used, so I guess that's OK). Some of 
it is because nobody (including me) has actually paid any attention to warnings 
for years.

It's far too vast a job to correct these all at once. And it's always 
questionable whether or not to rewrite lots of code for this kind of thing 
unless you're fixing something functional. The idea here is to get better going 
forward; "progress not perfection".

So, if we start failing compilations for new/changed code that generates 
warnings, I believe the code quality will improve. Plus it'll make people learn 
about generics ;). We need all the help from the compiler we can get.

So my questions:

1> do people strenuously object or not? If you object, how do you propose 
getting better? What we've been doing clearly hasn't worked.

2> Should this be master-only or include 8x? If the latter, we'll need to run a 
separate pass on 8x to pick up any differences. So far, we've only been 
ensuring that master is warning-free, and backporting to 8x when it was safe, 
without checking whether warnings were still reported there. Backporting when 
it's safe has been largely to keep the code lines in sync so it wouldn't 
interfere with other work unnecessarily. I propose failing only on master.

3> Gradle-only or both Gradle and Ant? I propose both if it's easy.

4> Solr and Lucene both? I propose both.

Timeframe: Master should be warning-free sometime next week. I'd like to 
immediately implement any decisions here when that's true.
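As an illustration of the kind of warning discussed above, the sketch below contrasts a raw-typed list with its generic equivalent. `RawTypesSketch` and both methods are hypothetical examples, not code from Solr or Lucene. With plain `javac`, the options `-Xlint:rawtypes,unchecked` report such code and `-Werror` turns the warnings into build failures; how the project's Gradle and Ant builds would be wired up is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: raw types versus generics.
final class RawTypesSketch {
  // Without this annotation, javac -Xlint reports "rawtypes" and "unchecked" warnings here.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static List legacyNames() {
    List names = new ArrayList(); // raw type: element type is unknown to the compiler
    names.add("solr");
    return names;
  }

  // The generic version compiles cleanly and lets the compiler check element types.
  static List<String> typedNames() {
    List<String> names = new ArrayList<>();
    names.add("solr");
    return names;
  }

  public static void main(String[] args) {
    String first = typedNames().get(0); // no cast needed
    Object legacyFirst = legacyNames().get(0); // raw list only yields Object
    System.out.println(first + " " + legacyFirst);
  }
}
```

Suppressing the warning keeps the build green but leaves the type hole in place, which is why the comment above distinguishes adding annotations from addressing the cause.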

> Selectively fail on precommit WARN messages
> ---
>
> Key: SOLR-11973
> URL: https://issues.apache.org/jira/browse/SOLR-11973
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
>
> Not quite sure whether this qualifies as something for Solr or Lucene
> I'm working gradually on getting precommit lint warnings out of the code 
> base. I'd like to selectively fail a subtree once it's clean. I played around 
> a bit with Robert's suggestions on the dev list but couldn't quite get it to 
> work, then decided I needed to focus on one thing at a time.
> See SOLR-10809 for the first clean directory Real Soon Now.
> Bonus points would be working out how to fail on deprecation warnings when 
> building Solr too, although that's farther off in the future.
> Assigning to myself, but anyone who knows the build ins and outs _please_ 
> feel free to take it!






[GitHub] [lucene-solr] rmuir commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


rmuir commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643285331


   I'm proposing something like this for the new method:
   ```
   if (expectedLength < footerLength()) {
     throw new IllegalArgumentException("expectedLength cannot be less than the footer length");
   }
   if (in.length() < expectedLength) {
     throw new CorruptIndexException("truncated file: length=" + in.length() + " but expectedLength==" + expectedLength, in);
   } else if (in.length() > expectedLength) {
     throw new CorruptIndexException("file too long: length=" + in.length() + " but expectedLength==" + expectedLength, in);
   }
   return retrieveChecksum(in);
   ```
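The snippet above depends on Lucene types, so here is a runnable stand-in that exercises the same control flow. Everything in it is an assumption made for illustration: `LengthCheckSketch` and `checkLength` are hypothetical names, `java.io.IOException` replaces `CorruptIndexException`, the footer length is a hard-coded constant, and the final checksum retrieval is left out.

```java
import java.io.IOException;

// Hypothetical sketch: validate the file length up front, before reading any footer.
final class LengthCheckSketch {
  static final int FOOTER_LENGTH = 16; // assumed footer size for this sketch

  static void checkLength(long actualLength, long expectedLength) throws IOException {
    if (expectedLength < FOOTER_LENGTH) {
      // Caller error: the expected length cannot even hold a footer.
      throw new IllegalArgumentException("expectedLength cannot be less than the footer length");
    }
    if (actualLength < expectedLength) {
      throw new IOException(
          "truncated file: length=" + actualLength + " but expectedLength==" + expectedLength);
    } else if (actualLength > expectedLength) {
      throw new IOException(
          "file too long: length=" + actualLength + " but expectedLength==" + expectedLength);
    }
    // In the proposal, the checksum would be retrieved at this point.
  }

  public static void main(String[] args) {
    try {
      checkLength(50, 100);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because the length is checked first, a truncated file produces a single clear message and no suppressed exception is needed.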






[GitHub] [lucene-solr] rmuir commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-12 Thread GitBox


rmuir commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-643283120


   I like the new organization with an optional parameter to `retrieve`. 
But I think the exception handling can be simpler here. It would be better to 
check the file's length, throw an exception if it's wrong, and only then call 
retrieveChecksum. Since we know the length beforehand, I don't see the point 
of doing suppressed dances here.






[jira] [Commented] (SOLR-14534) Investigate cleaning up any remaining warnings in 8x

2020-06-12 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134222#comment-17134222
 ] 

Erick Erickson commented on SOLR-14534:
---

Thanks for the info. So far I've been OK backporting things, but that's just 
because I'm just adding annotations rather than changing anything substantive.

I put "investigate" in the title specifically because I don't want to get into 
much surgery on 8x, I'll be content if we clean up 9.0 and get better from 
there.

Thanks again for dealing with Lucene. As of now there are _no_ warnings in the 
Lucene compilation, though note that deprecation warnings are disabled; 
that's another topic...

> Investigate cleaning up any remaining warnings in 8x
> 
>
> Key: SOLR-14534
> URL: https://issues.apache.org/jira/browse/SOLR-14534
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There will be some divergence between master and 8x. The current pattern is
> 1> clean up warnings in master
> 2> backport to 8x and ensure all tests etc. run.
> Conspicuously missing is compiling under 8x and ensuring that there are no 
> warnings in the cleaned code.
> I'm not sure I really will do this if it turns out there are a lot of them. 
> It's good enough that master is (and stays) clean IMO. OTOH, it may be worth 
> doing if it only takes a short time. Won't be able to tell until we get the code clean.






[jira] [Commented] (SOLR-14541) Ensure classes that implement equals implement hashCode or suppress warnings

2020-06-12 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134216#comment-17134216
 ] 

Erick Erickson commented on SOLR-14541:
---

These classes have SuppressWarnings that should be removed:

./master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java
./master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/DeepRandomStream.java
./master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionValue.java
./master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpression.java
./master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionNamedParameter.java

> Ensure classes that implement equals implement hashCode or suppress warnings
> 
>
> Key: SOLR-14541
> URL: https://issues.apache.org/jira/browse/SOLR-14541
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: 0001-SOLR-14541-add-hashCode-for-some-classes.patch, 
> 0002-SOLR-14541-add-hashCode-for-some-classes-in-autoscal.patch, 
> 0003-SOLR-14541-add-hashCode-or-remove-equals-for-some-cl.patch
>
>
> While looking at warnings, I found that the following classes generate this 
> warning:
> *overrides equals, but neither it nor any superclass overrides hashCode 
> method*
> I can suppress the warning, but this has been a source of errors in the past 
> so I'm reluctant to just do that blindly.
> NOTE: The Lucene one should probably be its own Jira if it's going to have 
> hashCode implemented, but it's listed here for triage.
> What I need for each method is for someone who has a clue about that 
> particular code to render an opinion that we can safely suppress the warning 
> or to provide a hashCode method.
> Some of these have been here for a very long time and were implemented by 
> people no longer active...
> lucene/suggest/src/java/org/apache/lucene/search/spell/LuceneLevenshteinDistance.java:39
> solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java:34
>  solr/solrj/src/java/org/apache/solr/common/cloud/Replica.java:26
>  solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:49
> solr/core/src/java/org/apache/solr/cloud/rule/Rule.java:277
>  solr/core/src/java/org/apache/solr/pkg/PackageAPI.java:177
>  solr/core/src/java/org/apache/solr/packagemanager/SolrPackageInstance.java:31
>  
> Noble Paul says it's OK to suppress warnings for these:
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/VersionedData.java:31
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:61
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:150
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:252
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:45
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java:73
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Preference.java:32
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaInfo.java:39
>  
> Joel Bernstein says it's OK to suppress warnings for these:
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaCount.java:27
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpression.java:25
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionNamedParameter.java:23
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java:467
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/DeepRandomStream.java:417
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionValue.java:22
>  
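For context on why this warning matters, the sketch below shows a class that overrides both `equals` and `hashCode` consistently, so that equal objects are found in hash-based collections. `EqualsHashCodeSketch` and `NamedThing` are hypothetical examples and are not taken from any of the classes listed above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: equals and hashCode overridden together.
final class EqualsHashCodeSketch {
  static final class NamedThing {
    private final String name;

    NamedThing(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof NamedThing)) {
        return false;
      }
      return Objects.equals(name, ((NamedThing) other).name);
    }

    // Must agree with equals: equal objects return equal hash codes.
    @Override
    public int hashCode() {
      return Objects.hashCode(name);
    }
  }

  public static void main(String[] args) {
    Set<NamedThing> things = new HashSet<>();
    things.add(new NamedThing("replica1"));
    // The lookup succeeds because both objects hash to the same bucket and are equal.
    System.out.println(things.contains(new NamedThing("replica1")));
  }
}
```

If `hashCode` were left out, the inherited identity-based hash would usually send the second object to a different bucket and the lookup would typically fail, even though `equals` says the objects match. Classes that are never used as hash keys or set members are the cases where suppressing the warning can be acceptable.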






[jira] [Commented] (SOLR-14534) Investigate cleaning up any remaining warnings in 8x

2020-06-12 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134213#comment-17134213
 ] 

Michael Sokolov commented on SOLR-14534:


I decided not to backport LUCENE-9394, cleaning up warnings in lucene/. I was 
concerned mostly about the change of raw {{Map}} to {{Map}} in 
{{ValueSource}} being not backwards-compatible. Maybe it is not such a big 
change, but consumers would potentially have to add casts.

> Investigate cleaning up any remaining warnings in 8x
> 
>
> Key: SOLR-14534
> URL: https://issues.apache.org/jira/browse/SOLR-14534
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There will be some divergence between master and 8x. The current pattern is
> 1> clean up warnings in master
> 2> backport to 8x and insure all tests etc run.
> Conspicuously missing is compiling under 8x and ensuring that there are no 
> warnings in the cleaned code.
> I'm not sure I really will do this if it turns out there are a lot of them. 
> It's good enough that master is (and stays) clean IMO. OTOH, it may be worth 
> doing if it only takes a short time. Won't be able to tell until we get the code clean.






[GitHub] [lucene-solr] mikemccand commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.

2020-06-12 Thread GitBox


mikemccand commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-643263358


   > > @gandhi-viral That would work for me but I'd like to make sure we're talking about the same thing:
   > > 
   > > * Lucene86DocValuesConsumer gets a ctor argument to configure the threshold.
   > > * Lucene86DocValuesFormat keeps 32 as a default value.
   > > * You would create your own DocValuesFormat that would reuse Lucene86DocValuesProducer and create a Lucene86DocValuesConsumer with a high threshold for compression of binary values.
   > > * You would enable this format by overriding getDocValuesFormatForField in Lucene86Codec.
   > > * This would mean that your indices would no longer have backward compatibility guarantees of the default codec (N-1) but maybe you don't care since you're re-building your indices from scratch on a regular basis?
   > 
   > Yes, that's what I had in mind too. Currently, we are doing a similar thing after the `8.5.1` upgrade to keep using forked BDVs from `8.4`.
   > 
   > You are right about backward compatibility guarantees not being an issue for our use-case since we do re-build our indices on each software deployment.
   
   Hmm, could we add the parameter also to `Lucene86DocValuesFormat`, which 
would forward that to `Lucene86DocValuesConsumer`?  This would allow users to 
keep back-compat (same SPI-named DocValuesFormat).
   
   It is true that for us (Amazon product search) in particular it would be OK 
to forego backwards compatibility, but I think we shouldn't push that on others 
who might want to customize this / make their own Codec?
   
   At read time (`Lucene86DocValuesProducer`) the constant isn't used, right?  
It was built into the index at segment-write time.
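The suggestion above, a format that accepts the threshold and forwards it to the consumer it creates, can be sketched with stand-in classes. None of this is Lucene API: `CompressionThresholdSketch`, `Format`, `Consumer`, and `shouldCompress` are hypothetical, and the per-value length test is a simplification assumed from the PR title ("length is less than 32").

```java
// Hypothetical sketch, not Lucene code: a format forwarding a threshold to its consumer.
final class CompressionThresholdSketch {
  static final int DEFAULT_THRESHOLD = 32;

  // Stand-in for the write-side consumer: decides whether a value gets compressed.
  static final class Consumer {
    private final int threshold;

    Consumer(int threshold) {
      this.threshold = threshold;
    }

    // Assumption for this sketch: values shorter than the threshold stay uncompressed.
    boolean shouldCompress(byte[] value) {
      return value.length >= threshold;
    }
  }

  // Stand-in for the format: keeps 32 as the default but lets callers override it.
  static final class Format {
    private final int threshold;

    Format() {
      this(DEFAULT_THRESHOLD);
    }

    Format(int threshold) {
      if (threshold < 0) {
        throw new IllegalArgumentException("threshold must be non-negative");
      }
      this.threshold = threshold;
    }

    Consumer newConsumer() {
      return new Consumer(threshold);
    }
  }

  public static void main(String[] args) {
    Consumer defaults = new Format().newConsumer();
    Consumer neverCompress = new Format(Integer.MAX_VALUE).newConsumer();
    System.out.println(defaults.shouldCompress(new byte[64]));
    System.out.println(neverCompress.shouldCompress(new byte[64]));
  }
}
```

The decision is made only on the write side in this sketch, which matches the question in the comment about whether the constant is needed at read time.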






[jira] [Created] (SOLR-14565) Fix or suppress warnings in solrj/impl and solrj/io/graph

2020-06-12 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14565:
-

 Summary: Fix or suppress warnings in solrj/impl and solrj/io/graph
 Key: SOLR-14565
 URL: https://issues.apache.org/jira/browse/SOLR-14565
 Project: Solr
  Issue Type: Sub-task
Reporter: Erick Erickson
Assignee: Erick Erickson


The overhead of individual directories is kind of a pain when there aren't very 
many warnings each, so I'll do these two together.






[jira] [Created] (SOLR-14564) Fix or suppress remaining warnings in solr/core

2020-06-12 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14564:
-

 Summary: Fix or suppress remaining warnings in solr/core
 Key: SOLR-14564
 URL: https://issues.apache.org/jira/browse/SOLR-14564
 Project: Solr
  Issue Type: Sub-task
 Environment: It's getting to the point where the overhead of cleaning 
up individual directories is getting to be a pain. So this will be 2-3 commits 
of fixes in whatever order I find them when compiling in IntelliJ.
Reporter: Erick Erickson
Assignee: Erick Erickson









[jira] [Assigned] (SOLR-14534) Investigate cleaning up any remaining warnings in 8x

2020-06-12 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-14534:
-

Assignee: Erick Erickson

> Investigate cleaning up any remaining warnings in 8x
> 
>
> Key: SOLR-14534
> URL: https://issues.apache.org/jira/browse/SOLR-14534
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There will be some divergence between master and 8x. The current pattern is
> 1> clean up warnings in master
> 2> backport to 8x and ensure all tests etc. run.
> Conspicuously missing is compiling under 8x and ensuring that there are no 
> warnings in the cleaned code.
> I'm not sure I will really do this if it turns out there are a lot of them. 
> It's good enough that master is (and stays) clean IMO. OTOH, it may be worth 
> doing if it only takes a short time. Won't be able to tell until we get the 
> code clean.






[jira] [Commented] (LUCENE-9389) Enhance gradle logging calls validation: eliminate getMessage()

2020-06-12 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134165#comment-17134165
 ] 

Erick Erickson commented on LUCENE-9389:


This looks good, anyone object to me pushing it?

> Enhance gradle logging calls validation: eliminate getMessage()
> ---
>
> Key: LUCENE-9389
> URL: https://issues.apache.org/jira/browse/LUCENE-9389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andras Salamon
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-9389.patch
>
>
> SOLR-14280 fixed a logging problem in SolrConfig by removing a few 
> getMessage() calls. We could extend this solution by modifying Gradle's 
> logging-call validation to forbid getMessage() calls during logging. We 
> should also check the existing code and eliminate such calls.
> It is possible to suppress the warning using {{//logok}}.
> [~erickerickson] [~gerlowskija]
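
To make the proposed rule concrete, here is a minimal sketch of a line-based check that flags logging calls containing getMessage() unless the line carries the //logok marker. It is not the actual Gradle validation task: the class name, the regular expression, and the assumption that loggers are always named `log` are all hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogCallCheckSketch {
  // Hypothetical pattern: assumes the logger variable is called "log".
  private static final Pattern LOG_CALL =
      Pattern.compile("\\blog\\.(trace|debug|info|warn|error)\\s*\\(");

  /** Returns the 1-based numbers of lines that log getMessage() without a //logok marker. */
  static List<Integer> findViolations(List<String> lines) {
    List<Integer> violations = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      boolean isLogCall = LOG_CALL.matcher(line).find();
      boolean usesGetMessage = line.contains("getMessage()");
      boolean suppressed = line.contains("//logok");
      if (isLogCall && usesGetMessage && !suppressed) {
        violations.add(i + 1);
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    List<String> source = Arrays.asList(
        "log.error(e.getMessage(), e);",
        "log.error(\"Failed to load config\", e);",
        "log.warn(e.getMessage()); //logok",
        "String m = e.getMessage();");
    System.out.println("Violations on lines: " + findViolations(source));
  }
}
```

A real check would need to handle calls that span several lines; this sketch only looks at one line at a time.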






[jira] [Resolved] (SOLR-10810) Examine precommit lint WARNINGs in non-test code

2020-06-12 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-10810.
---
Resolution: Duplicate

This is all being handled in other JIRAs, particularly SOLR-10778; this one's 
obsolete.

> Examine precommit lint WARNINGs in non-test code
> 
>
> Key: SOLR-10810
> URL: https://issues.apache.org/jira/browse/SOLR-10810
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See SOLR-10778. We're examining the benefits/risks of eliminating lint 
> WARNINGs in the code base. Once the test cases are cleaned up we'll have a 
> manageable level of items to look for in the base code.






[jira] [Created] (SOLR-14563) Fix or suppress warnings in solr/contrib

2020-06-12 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14563:
-

 Summary: Fix or suppress warnings in solr/contrib
 Key: SOLR-14563
 URL: https://issues.apache.org/jira/browse/SOLR-14563
 Project: Solr
  Issue Type: Sub-task
Reporter: Erick Erickson
Assignee: Erick Erickson


There aren't very many in any of these individual directories, so I'll do them 
all at once.






[jira] [Updated] (LUCENE-9394) Fix or suppress compile-time warnings

2020-06-12 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov updated LUCENE-9394:

Fix Version/s: master (9.0)

> Fix or suppress compile-time warnings
> -
>
> Key: LUCENE-9394
> URL: https://issues.apache.org/jira/browse/LUCENE-9394
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a spinoff from [~erickerickson]'s efforts over in SOLR-10778. 
> The goal is a warning-free compilation, followed by enforcement of build 
> failure on warnings, with the idea of suppressing innocuous warnings to the 
> extent that any remaining warnings can be treated as build failures.






[jira] [Resolved] (LUCENE-9394) Fix or suppress compile-time warnings

2020-06-12 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-9394.
-
  Assignee: Michael Sokolov
Resolution: Fixed

> Fix or suppress compile-time warnings
> -
>
> Key: LUCENE-9394
> URL: https://issues.apache.org/jira/browse/LUCENE-9394
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a spinoff from [~erickerickson]'s efforts over in SOLR-10778. 
> The goal is a warning-free compilation, followed by enforcement of build 
> failure on warnings, with the idea of suppressing innocuous warnings to the 
> extent that any remaining warnings can be treated as build failures.






[jira] [Commented] (LUCENE-9394) Fix or suppress compile-time warnings

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134137#comment-17134137
 ] 

ASF subversion and git services commented on LUCENE-9394:
-

Commit 26075fc1dc06766a9d2af8bd5dd14243c0463a6b in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=26075fc ]

LUCENE-9394: fix and suppress warnings (#1563)

* LUCENE-9394: fix and suppress warnings in lucene/*
* Change type of ValuesSource context from raw Map to Map
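
For readers unfamiliar with the two kinds of change this commit describes, the sketch below shows a raw Map replaced by a parameterized one, and a narrowly scoped suppression where an unchecked cast cannot be avoided. The type parameters are missing from the commit message as it appears here, so Map<Object, Object> is an assumption, and the class and method names are hypothetical rather than taken from the Lucene code.

```java
import java.util.HashMap;
import java.util.Map;

class WarningsSketch {
  // Before (shown as a comment so this file compiles cleanly): a raw Map makes
  // javac -Xlint report [rawtypes] on the declaration and [unchecked] on put().
  //   static Map newContextRaw() { Map m = new HashMap(); m.put("k", "v"); return m; }

  /** After: a parameterized map. The <Object, Object> parameters are an assumption. */
  static Map<Object, Object> newContext() {
    Map<Object, Object> context = new HashMap<>();
    context.put("searcher", "placeholder-searcher");
    return context;
  }

  /** When a cast cannot be checked, suppress on the narrowest scope possible. */
  static <T> T lookup(Map<Object, Object> context, Object key) {
    @SuppressWarnings("unchecked") // the caller is responsible for the stored type
    T value = (T) context.get(key);
    return value;
  }

  public static void main(String[] args) {
    String searcher = lookup(newContext(), "searcher");
    System.out.println(searcher);
  }
}
```

Putting the annotation on the single local variable, rather than on the whole method or class, keeps other unchecked operations visible to the compiler.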

> Fix or suppress compile-time warnings
> -
>
> Key: LUCENE-9394
> URL: https://issues.apache.org/jira/browse/LUCENE-9394
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a spinoff from [~erickerickson]'s efforts over in SOLR-10778. 
> The goal is a warning-free compilation, followed by enforcement of build 
> failure on warnings, with the idea of suppressing innocuous warnings to the 
> extent that any remaining warnings can be treated as build failures.









[GitHub] [lucene-solr] msokolov merged pull request #1563: LUCENE-9394: fix and suppress warnings

2020-06-12 Thread GitBox


msokolov merged pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (LUCENE-9389) Enhance gradle logging calls validation: eliminate getMessage()

2020-06-12 Thread Andras Salamon (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134132#comment-17134132
 ] 

Andras Salamon commented on LUCENE-9389:


All of the listed Lucene (Luke) log calls had the following format:
{noformat}
log.error(e.getMessage(), e);
{noformat}
so I don't think that we really lose information here. I still created a patch 
that replaces the getMessage() calls with more meaningful messages.
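
As an illustration of why the exception message is redundant as the log message, here is a minimal sketch that logs the same exception both ways and captures the records in memory. It uses java.util.logging only so that it runs without extra dependencies; the calls in the patch use a different logging API, and the class name, the exception text, and the replacement message below are made up for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogMessageSketch {
  /** Collects log records in memory so the two styles can be compared. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();
    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() {}
    @Override public void close() {}
  }

  /** Logs one exception twice: once with getMessage(), once with a descriptive message. */
  static List<LogRecord> logBothStyles() {
    Logger log = Logger.getLogger("LogMessageSketch");
    log.setUseParentHandlers(false); // keep the console quiet for this demo
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);

    Exception e = new IOException("No such file: segments_1");
    // Style being removed: the message merely repeats what the throwable already carries.
    log.log(Level.SEVERE, e.getMessage(), e);
    // Replacement style: the message says what the code was doing when it failed.
    log.log(Level.SEVERE, "Failed to open index directory", e);

    log.removeHandler(handler);
    return handler.records;
  }

  public static void main(String[] args) {
    for (LogRecord r : logBothStyles()) {
      System.out.println(r.getMessage() + " | cause: " + r.getThrown().getMessage());
    }
  }
}
```

In both records the throwable is attached, so its message is available either way; only the second record adds context of its own.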

> Enhance gradle logging calls validation: eliminate getMessage()
> ---
>
> Key: LUCENE-9389
> URL: https://issues.apache.org/jira/browse/LUCENE-9389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andras Salamon
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-9389.patch
>
>
> SOLR-14280 fixed a logging problem in SolrConfig by removing a few 
> getMessage() calls. We could extend this solution by modifying Gradle's 
> logging-call validation to forbid getMessage() calls during logging. We 
> should also check the existing code and eliminate such calls.
> It is possible to suppress the warning using {{//logok}}.
> [~erickerickson] [~gerlowskija]






[jira] [Updated] (LUCENE-9389) Enhance gradle logging calls validation: eliminate getMessage()

2020-06-12 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated LUCENE-9389:
---
Status: Patch Available  (was: Open)

> Enhance gradle logging calls validation: eliminate getMessage()
> ---
>
> Key: LUCENE-9389
> URL: https://issues.apache.org/jira/browse/LUCENE-9389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andras Salamon
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-9389.patch
>
>
> SOLR-14280 fixed a logging problem in SolrConfig by removing a few 
> getMessage() calls. We could extend this solution by modifying Gradle's 
> logging-call validation to forbid getMessage() calls during logging. We 
> should also check the existing code and eliminate such calls.
> It is possible to suppress the warning using {{//logok}}.
> [~erickerickson] [~gerlowskija]






[jira] [Updated] (LUCENE-9389) Enhance gradle logging calls validation: eliminate getMessage()

2020-06-12 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated LUCENE-9389:
---
Attachment: LUCENE-9389.patch

> Enhance gradle logging calls validation: eliminate getMessage()
> ---
>
> Key: LUCENE-9389
> URL: https://issues.apache.org/jira/browse/LUCENE-9389
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andras Salamon
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-9389.patch
>
>
> SOLR-14280 fixed a logging problem in SolrConfig by removing a few 
> getMessage() calls. We could extend this solution by modifying Gradle's 
> logging-call validation to forbid getMessage() calls during logging. We 
> should also check the existing code and eliminate such calls.
> It is possible to suppress the warning using {{//logok}}.
> [~erickerickson] [~gerlowskija]






[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134048#comment-17134048
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit 21d08e4b725a4a28196eb36ed1e402f8e19d1be2 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=21d08e4 ]

LUCENE-9356: Disable test, some corruptions are still not detected as 
corruptions.


> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that file truncation and modification of the index 
> headers are caught correctly. I'd like to add another test that flipping a 
> byte in a way that modifies the checksum of the file is always caught 
> gracefully by Lucene.






[jira] [Reopened] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-12 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand reopened LUCENE-9356:
--

I reverted the test; there are still some cases that wouldn't cause a 
CorruptIndexException and that do not seem straightforward to fix, such as 
checking the suffix in the index header, which might throw an EOF exception 
instead. I'll think more about what can be done.
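
For context on the kind of test being discussed, here is a minimal, generic sketch of checksum-based detection of flipped bits using java.util.zip.CRC32. It is not the reverted Lucene test and does not use Lucene's file formats or exception types; the class and method names are hypothetical. It only covers the straightforward case where a flip in the data changes the checksum, not the harder cases mentioned above where a different exception is thrown before the checksum is verified.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ByteFlipSketch {
  /** Computes a CRC32 checksum over the whole array. */
  static long checksum(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Returns a copy of data with a single bit flipped at the given byte offset. */
  static byte[] flipBit(byte[] data, int offset, int bit) {
    byte[] copy = data.clone();
    copy[offset] ^= (byte) (1 << bit);
    return copy;
  }

  /** True if every possible single-bit flip produces a different checksum. */
  static boolean allFlipsDetected(byte[] data) {
    long expected = checksum(data);
    for (int offset = 0; offset < data.length; offset++) {
      for (int bit = 0; bit < 8; bit++) {
        if (checksum(flipBit(data, offset, bit)) == expected) {
          return false; // this corruption would go unnoticed
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] data = "example segment payload".getBytes(StandardCharsets.UTF_8);
    System.out.println("All single-bit flips detected: " + allFlipsDetected(data));
  }
}
```

A CRC detects every single-bit error, so the loop above cannot miss a flip in the payload; the difficulty described in the comment lies in the code paths that read headers before any checksum comparison happens.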

> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that file truncation and modification of the index 
> headers are caught correctly. I'd like to add another test that flipping a 
> byte in a way that modifies the checksum of the file is always caught 
> gracefully by Lucene.






[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134007#comment-17134007
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit 38adf09ca2ba9620940fc279cc12760cc355a361 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=38adf09 ]

LUCENE-9356: Make FST throw the correct exception upon incorrect input type.


> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that file truncation and modification of the index 
> headers are caught correctly. I'd like to add another test that flipping a 
> byte in a way that modifies the checksum of the file is always caught 
> gracefully by Lucene.






[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134008#comment-17134008
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit cf8f83cef95c03767f83602ff99345979dd0808b in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cf8f83c ]

LUCENE-9356: Disable test, some corruptions are still not detected as 
corruptions.


> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that file truncation and modification of the index 
> headers are caught correctly. I'd like to add another test that flipping a 
> byte in a way that modifies the checksum of the file is always caught 
> gracefully by Lucene.


