[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133940#comment-17133940
 ] 

ASF subversion and git services commented on LUCENE-8574:
-

Commit 2991acf8fffe9dbeda20c24479b108bfb8ea9257 in lucene-solr's branch 
refs/heads/master from Patrick Zhai
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2991acf ]

LUCENE-9391: Upgrade HPPC to 0.8.2 (#1560)

* LUCENE-8574: Upgrade HPPC to 0.8.2 (Co-authored-by: Haoyu Zhai)

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.5, 8.0
>Reporter: Michael McCandless
>Assignee: Robert Muir
>Priority: Major
> Attachments: LUCENE-8574.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit 
> cache, so that nested expressions that reference the same common variable 
> would compute the value for that variable the first time it was referenced 
> and then use that cached value for all subsequent invocations, within one 
> hit.  I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference 
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2 
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a 
> given hit), but today it's computed twice.  The problem is combinatoric if b 
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
>  
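As a sketch of the per-hit cache the issue describes (a minimal sketch, assuming a 
plain {{DoubleValues}} chain; the actual fix lives inside 
{{ExpressionFunctionValues}}, so the class name here is illustrative only), the 
cache is just a wrapper that remembers the last doc it advanced to:

{code}
import java.io.IOException;
import org.apache.lucene.search.DoubleValues;

/** Illustrative sketch only: cache the per-hit value of a shared sub-expression. */
final class PerHitCachingValues extends DoubleValues {
  private final DoubleValues in;
  private int currentDoc = -1;
  private boolean hasValue;
  private double value;

  PerHitCachingValues(DoubleValues in) {
    this.in = in;
  }

  @Override
  public boolean advanceExact(int doc) throws IOException {
    if (doc != currentDoc) {          // first reference to this variable for this hit?
      currentDoc = doc;
      hasValue = in.advanceExact(doc);
      if (hasValue) {
        value = in.doubleValue();     // compute b once per hit
      }
    }
    return hasValue;
  }

  @Override
  public double doubleValue() {
    return value;                     // reused by both c = b + 2 and d = b * 2
  }
}
{code}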






[jira] [Commented] (LUCENE-9391) Upgrade to HPPC 0.8.2

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133939#comment-17133939
 ] 

ASF subversion and git services commented on LUCENE-9391:
-

Commit 2991acf8fffe9dbeda20c24479b108bfb8ea9257 in lucene-solr's branch 
refs/heads/master from Patrick Zhai
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2991acf ]

LUCENE-9391: Upgrade HPPC to 0.8.2 (#1560)

* LUCENE-8574: Upgrade HPPC to 0.8.2 (Co-authored-by: Haoyu Zhai)

> Upgrade to HPPC 0.8.2
> -
>
> Key: LUCENE-9391
> URL: https://issues.apache.org/jira/browse/LUCENE-9391
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Haoyu Zhai
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HPPC 0.8.2 is out and exposes an Accountable-like interface that can be used 
> to estimate memory usage.
> [https://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=13522&styleName=Text]
> We should upgrade to it if any of the components using HPPC need better 
> memory estimates.
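To illustrate the kind of estimation this enables (hedged: the accessor names 
{{ramBytesAllocated()}}/{{ramBytesUsed()}} are taken from the 0.8.2 release notes 
and should be verified against the actual API):

{code}
import com.carrotsearch.hppc.IntIntHashMap;

// Rough sketch, assuming HPPC 0.8.2's Accountable-like accessors.
public class HppcMemoryDemo {
  public static void main(String[] args) {
    IntIntHashMap counts = new IntIntHashMap();
    for (int i = 0; i < 1000; i++) {
      counts.put(i, i * 2);
    }
    System.out.println("allocated: " + counts.ramBytesAllocated() + " bytes");
    System.out.println("used:      " + counts.ramBytesUsed() + " bytes");
  }
}
{code}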






[GitHub] [lucene-solr] dweiss merged pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2

2020-06-11 Thread GitBox


dweiss merged pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560


   






[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

2020-06-11 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133801#comment-17133801
 ] 

Chris M. Hostetter commented on SOLR-13132:
---

Hey Michael, I still haven't had a chance to dig into the recent commits, 
hopefully i can do that tomorrow, but in response to some of your comments 
here...

bq. ... To support such a case, a shim is required because the code paths that 
do the actual count accumulation (in ByArrayUIF and ByArrayDV) used to directly 
increment processor.countAcc, and have now been switched to register counts via 
the SweepDocIterator and SweepDISI abstractions, ...

Right right right ... I'm really sorry, I keep forgetting: the changes in this 
issue to "support" sweeping as a concept affect the low level impls of 
ByArrayUIF & ByArrayDV such that now they _only_ work by "sweeping" over the 
set defined by the SweepingCountSlotAcc – the only question (at run time) is 
whether that set comes from *just* the "base set" or if there are any other 
sets (provided by other CountSlotAccs, in turn provided by other 
SweepableSlotAcc) that they sweep over at the same time

So, with that in mind: please ignore/retract my earlier comments about being 
concerned about subclasses that don't want to sweep
 * If it simplifies the code, we can certainly assume/assert that any/all 
future hypothetical ByArrayUIF & ByArrayDV subclasses *must* support sweeping & 
use a SweepingCountSlotAcc
 ** provided we make sure to spell that out in the javadocs.
 * It would be _nice_ to keep the changes to FacetFieldProcessorByArray to a 
minimum, and say "FacetFieldProcessorByArray will/should not assume all 
subclasses can sweep" – but there's no reason we _have_ to
 ** *_If_* the code would be a lot simpler to say "all current 
FacetFieldProcessorByArray subclasses use sweeping, so we're going to document 
that from now on any additional future subclasses of FacetFieldProcessorByArray 
use sweeping and assert/assume that in the common FacetFieldProcessorByArray 
code" then we can certainly do that
 *** ie: would that allow us to remove the Shim?
 *** ie: would it allow us to refactor/merge the Shim impl into the 
DEV_NULL_COUNT_ACC impl? (IIRC the refinement code path that uses the DEV_NULL 
countAcc is in FacetFieldProcessorByArray ... correct?)

So my question to you is: What do you think? Do you think there are 
simplification gains to be had if we add assertions & assumptions about these 
classes always using SweepingCountSlotAcc? 

 

{quote}I've thought a bit more about the question of how to detect the 
allBuckets slot for disabling allBuckets relatedness: I don't really have any 
good answers, but a handful of thoughts: ...
{quote}
# I don't think adding SlotContext to the setValues(...) API would work in 
general because in practice there's no guarantee Processors will have a valid 
SlotContext at that point in time (I'm thinking of per-segment DVs that use the 
slotNum as an ord lookup, or a TermEnum that just returns the "current" term)
# I do think the "papa-bear" approach would work well in the long run (both in 
terms of being a clean/consistent API and being useful for this particular 
problem), but I'm still not convinced it's worth the hassle at this point since 
we really only have this one usage where it matters
# Considering how much of an "edge case on an edge case" we're talking about, 
your current "hack" is growing on me, provided we add some more conditional 
logic to protect against the possibility of a ClassCastEx if anyone ever adds 
"sweep" support to some other processor (ie: FacetQueryProcessor)

...either way: I'd suggest we punt for now and worry about all the other 
nocommits before worrying about the "which slot is allBuckets when 
sweeping?" nocommit.

> Improve JSON "terms" facet performance when sorted by relatedness 
> --
>
> Key: SOLR-13132
> URL: https://issues.apache.org/jira/browse/SOLR-13132
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module
>Affects Versions: 7.4, master (9.0)
>Reporter: Michael Gibney
>Priority: Major
> Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating ...

[GitHub] [lucene-solr] janhoy commented on pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-11 Thread GitBox


janhoy commented on pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#issuecomment-642990676


   I think the only thing I'm lacking is a real integration test. I validated 
manually that core creation fails in `/tmp`, and that setting 
`-Dsolr.allowPaths=/tmp` allows it:
   https://user-images.githubusercontent.com/409128/84450542-b9a23d00-ac50-11ea-9ff1-253f9139685e.png
   






[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI

2020-06-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133782#comment-17133782
 ] 

Jan Høydahl commented on SOLR-14561:


First PR ready, see [https://github.com/apache/lucene-solr/pull/1572] 

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users 
> can specify for at least {{instanceDir and dataDir}} params, perhaps restrict 
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.






[jira] [Updated] (SOLR-14561) Validate parameters to CoreAdminAPI

2020-06-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-14561:
---
Description: CoreAdminAPI does not validate parameter input. We should 
limit what users can specify for at least {{instanceDir and dataDir}} params, 
perhaps restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.  (was: 
CoreAdminAPI does not validate parameter input. We should limit what users can 
specify for at least {{instanceDir }}and {{dataDir}} params, perhaps restrict 
them to be relative to SOLR_HOME or SOLR_DATA_HOME.)

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users 
> can specify for at least {{instanceDir and dataDir}} params, perhaps restrict 
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.






[jira] [Assigned] (SOLR-14561) Validate parameters to CoreAdminAPI

2020-06-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-14561:
--

Assignee: Jan Høydahl

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users 
> can specify for at least {{instanceDir }}and {{dataDir}} params, perhaps 
> restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.






[GitHub] [lucene-solr] janhoy opened a new pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated

2020-06-11 Thread GitBox


janhoy opened a new pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572


   See https://issues.apache.org/jira/browse/SOLR-14561
   
   The `instanceDir` and `dataDir` params must now be relative to either 
`SOLR_HOME`, `SOLR_DATA_HOME` or `coreRootDir`.
   
   Added a new solr.xml config 'allowPaths', controlled by the system property 
'solr.allowPaths', which lets you allow additional paths when needed.
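   A minimal sketch of the kind of check this adds (hypothetical names, not the 
actual patch; the real change wires this into core creation):

```java
import java.nio.file.Path;
import java.util.Set;

// Hedged sketch: reject an instanceDir/dataDir that is not under an allowed root.
final class PathAllowList {
  private final Set<Path> allowedRoots; // SOLR_HOME, SOLR_DATA_HOME, coreRootDir, plus solr.allowPaths

  PathAllowList(Set<Path> allowedRoots) {
    this.allowedRoots = allowedRoots;
  }

  void assertAllowed(Path path) {
    Path abs = path.normalize().toAbsolutePath();
    if (allowedRoots.stream().noneMatch(abs::startsWith)) {
      throw new IllegalArgumentException(
          "Path " + abs + " is outside the allowed paths; add it to solr.allowPaths if intended");
    }
  }
}
```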






[jira] [Resolved] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

2020-06-11 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14559.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, 
> response, cloud, security, schema, api
> ---
>
> Key: SOLR-14559
> URL: https://issues.apache.org/jira/browse/SOLR-14559
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Fix For: 8.6
>
>
> There's considerable overhead in testing and precommit, so fixing up one 
> directory at a time is getting tedious as there are fewer and fewer warnings 
> in particular directories. This set will fix about half the remaining 
> warnings outside of solrj, 300 or so. Then one more Jira will fix the 
> remaining warnings in Solr (exclusive of SolrJ).
>  






[jira] [Commented] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133732#comment-17133732
 ] 

ASF subversion and git services commented on SOLR-14559:


Commit 01f6cd3a84ef6a002f0f7ae1129bd74cdc2f5c01 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=01f6cd3 ]

SOLR-14559: Fix or suppress warnings in 
solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api


> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, 
> response, cloud, security, schema, api
> ---
>
> Key: SOLR-14559
> URL: https://issues.apache.org/jira/browse/SOLR-14559
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There's considerable overhead in testing and precommit, so fixing up one 
> directory at a time is getting tedious as there are fewer and fewer warnings 
> in particular directories. This set will fix about half the remaining 
> warnings outside of solrj, 300 or so. Then one more Jira will fix the 
> remaining warnings in Solr (exclusive of SolrJ).
>  






[jira] [Commented] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133731#comment-17133731
 ] 

ASF subversion and git services commented on SOLR-14559:


Commit ff391448d1648c4027133c58248bf7f1aabe5d96 in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ff39144 ]

SOLR-14559: Fix or suppress warnings in 
solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api


> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, 
> response, cloud, security, schema, api
> ---
>
> Key: SOLR-14559
> URL: https://issues.apache.org/jira/browse/SOLR-14559
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> There's considerable overhead in testing and precommit, so fixing up one 
> directory at a time is getting tedious as there are fewer and fewer warnings 
> in particular directories. This set will fix about half the remaining 
> warnings outside of solrj, 300 or so. Then one more Jira will fix the 
> remaining warnings in Solr (exclusive of SolrJ).
>  






[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks

2020-06-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133721#comment-17133721
 ] 

David Smiley commented on SOLR-8392:


Thanks Mike!  I like the assertions.

I noticed a special case on the empty string that has me scratching my head 
(and yours too I see with the appropriate addition of the comment).  Git blame 
points to [~noble.paul]  see 
https://github.com/apache/lucene-solr/blob/fb98f30a61f929326105718d2d284d761ac1b6e3/solr/core/src/java/org/apache/solr/core/RequestParams.java#L91
   What is that about?  When we copy the array of values for a non-empty key, 
shouldn't we do the same when the key is empty?

BTW as a small optimization, we might not copy the values array if the size is 
zero.  I'm not sure if that would happen in practice though.

> SolrParam.get(String) returns String and shouldn't be used in other 
> instanceof checks
> -
>
> Key: SOLR-8392
> URL: https://issues.apache.org/jira/browse/SOLR-8392
> Project: Solr
>  Issue Type: Bug
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-8392.patch, SOLR-8392.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There's a couple of places where we declare the return type of 
> solrParams.get() as an Object and then do instanceof checks for other types. 
> Since we know it will be a String, we can simplify this logic in several 
> places.






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1546: SOLR: Use absolute paths for server paths.

2020-06-11 Thread GitBox


dsmiley commented on a change in pull request #1546:
URL: https://github.com/apache/lucene-solr/pull/1546#discussion_r439083930



##
File path: solr/core/src/java/org/apache/solr/core/CoreDescriptor.java
##
@@ -182,7 +182,7 @@ public CoreDescriptor(String coreName, CoreDescriptor 
other) {
*/
   public CoreDescriptor(String name, Path instanceDir, Map<String, String> coreProps,
 Properties containerProperties, ZkController zkController) {
-this.instanceDir = instanceDir;
+this.instanceDir = instanceDir.toAbsolutePath();

Review comment:
   That makes sense; thanks.  I see that all callers send absolute paths 
already.  So I think I should change this to throw an exception if it isn't 
absolute to protect us from mistakes.
   
   I'll push a new commit here for review.  I've never done that for an already 
merged PR; we'll see how it goes.
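   The guard being proposed, as a rough sketch (not the actual commit):

```java
import java.nio.file.Path;

// Hedged sketch: fail fast on a relative instanceDir instead of resolving it
// against whatever CWD the JVM happened to start in.
final class InstanceDirGuard {
  static Path requireAbsolute(Path instanceDir) {
    if (!instanceDir.isAbsolute()) {
      throw new IllegalArgumentException("instanceDir must be absolute: " + instanceDir);
    }
    return instanceDir;
  }
}
```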








[GitHub] [lucene-solr] gandhi-viral commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.

2020-06-11 Thread GitBox


gandhi-viral commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-642931662


   > @gandhi-viral That would work for me but I'd like to make sure we're 
talking about the same thing:
   > 
   > * Lucene86DocValuesConsumer gets a ctor argument to configure the 
threshold.
   > * Lucene86DocValuesFormat keeps 32 as a default value.
   > * You would create your own DocValuesFormat that would reuse 
Lucene86DocValuesProducer and create a Lucene86DocValuesConsumer with a high 
threshold for compression of binary values.
   > * You would enable this format by overriding getDocValueFormatForField in 
Lucene86Codec.
   > * This would mean that your indices would no longer have backward 
compatibility guarantees of the default codec (N-1) but maybe you don't care 
since you're re-building your indices from scratch on a regular basis?
   
   Yes, that's what I had in mind too. Currently, we are doing a similar thing 
after the `8.5.1` upgrade to keep using forked BDVs from `8.4`. 
   
   You are right about backward compatibility guarantees not being an issue for 
our use-case since we do re-build our indices on each software deployment.






[GitHub] [lucene-solr] janhoy commented on a change in pull request #1546: SOLR: Use absolute paths for server paths.

2020-06-11 Thread GitBox


janhoy commented on a change in pull request #1546:
URL: https://github.com/apache/lucene-solr/pull/1546#discussion_r439060787



##
File path: solr/core/src/java/org/apache/solr/core/CoreDescriptor.java
##
@@ -182,7 +182,7 @@ public CoreDescriptor(String coreName, CoreDescriptor 
other) {
*/
   public CoreDescriptor(String name, Path instanceDir, Map<String, String> coreProps,
 Properties containerProperties, ZkController zkController) {
-this.instanceDir = instanceDir;
+this.instanceDir = instanceDir.toAbsolutePath();

Review comment:
   A bit late, but I don't think this is necessary, as all callers will 
send absolute paths. And if you ever get a relative path, resolving it with 
`toAbsolutePath()` leads to it being relative to whatever CWD the app is 
started with, while the typical resolving of relative `instanceDir` is to 
resolve it relative to CoreContainer#coreRootDirectory.








[GitHub] [lucene-solr] ErickErickson commented on pull request #1563: LUCENE-9394: fix and suppress warnings

2020-06-11 Thread GitBox


ErickErickson commented on pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563#issuecomment-642915566


   speaking from experience, when dealing with this many changes in a big 
wodge, it’s _very easy_ to have some things slip through ;)
   
   > On Jun 11, 2020, at 3:34 PM, Michael Sokolov  
wrote:
   > 
   > 
   > Thanks for the comments, @madrob, I posted a new PR addressing them. I'm 
not sure how I missed all that unused code in RandomizedShapeTestCase - it's 
pretty bare now!
   > 
   > —
   > You are receiving this because you are subscribed to this thread.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   > 
   
   






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.

2020-06-11 Thread GitBox


jpountz commented on a change in pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#discussion_r439048284



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -404,32 +406,51 @@ private void flushData() throws IOException {
 // Write offset to this block to temporary offsets file
 totalChunks++;
 long thisBlockStartPointer = data.getFilePointer();
-
-// Optimisation - check if all lengths are same

Review comment:
   If all docs are the same length, then `numBytes` would be 0 below and we 
only encode the average length, so this case is still optimized.
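   To make the encoding concrete (a simplified, standalone illustration of the 
scheme, not the actual Lucene code): per-doc offsets are reconstructed as 
`docInBlock * avgLength + delta`, so a block of equal-length docs needs no 
per-doc delta bytes at all.

```java
// Hedged illustration: offsets from an average length plus small deltas.
public class AvgDeltaOffsetsDemo {
  static long startOffset(int docInBlock, int avgLength, int delta) {
    return (long) docInBlock * avgLength + delta;
  }

  public static void main(String[] args) {
    int avgLength = 90;                  // uncompressedBlockLength >>> shift
    int[] deltas = {0, 3, -2, 5};        // per-boundary corrections; all zero if lengths are equal
    for (int doc = 0; doc < 3; doc++) {
      long start = startOffset(doc, avgLength, deltas[doc]);
      long end = startOffset(doc + 1, avgLength, deltas[doc + 1]);
      System.out.println("doc " + doc + ": [" + start + ", " + end + ")");
    }
  }
}
```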

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -762,6 +764,97 @@ public BytesRef binaryValue() throws IOException {
   // Decompresses blocks of binary values to retrieve content
   class BinaryDecoder {
 
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private final ByteBuffer deltas;
+private int numBytes;
+private int uncompressedBlockLength;  
+private int avgLength;
+private final byte[] uncompressedBlock;
+private final BytesRef uncompressedBytesRef;
+private final int docsPerChunk;
+private final int docsPerChunkShift;
+
+public BinaryDecoder(LongValues addresses, IndexInput compressedData, int 
biggestUncompressedBlockSize, int docsPerChunkShift) {
+  super();
+  this.addresses = addresses;
+  this.compressedData = compressedData;
+  // pre-allocate a byte array large enough for the biggest uncompressed 
block needed.
+  this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+  uncompressedBytesRef = new BytesRef(uncompressedBlock);
+  this.docsPerChunk = 1 << docsPerChunkShift;
+  this.docsPerChunkShift = docsPerChunkShift;
+  deltas = ByteBuffer.allocate((docsPerChunk + 1) * Integer.BYTES);
+  deltas.order(ByteOrder.LITTLE_ENDIAN);
+}
+
+private void decodeBlock(int blockId) throws IOException {
+  long blockStartOffset = addresses.get(blockId);
+  compressedData.seek(blockStartOffset);
+
+  final long token = compressedData.readVLong();
+  uncompressedBlockLength = (int) (token >>> 4);
+  avgLength = uncompressedBlockLength >>> docsPerChunkShift;
+  numBytes = (int) (token & 0x0f);
+  switch (numBytes) {
+case Integer.BYTES:
+  deltas.putInt(0, (int) 0);
+  compressedData.readBytes(deltas.array(), Integer.BYTES, docsPerChunk 
* Integer.BYTES);
+  break;
+case Byte.BYTES:
+  compressedData.readBytes(deltas.array(), Byte.BYTES, docsPerChunk * 
Byte.BYTES);
+  break;
+case 0:
+  break;
+default:
+  throw new CorruptIndexException("Invalid number of bytes: " + 
numBytes, compressedData);
+  }
+
+  if (uncompressedBlockLength == 0) {
+uncompressedBytesRef.offset = 0;
+uncompressedBytesRef.length = 0;
+  } else {
+assert uncompressedBlockLength <= uncompressedBlock.length;
+LZ4.decompress(compressedData, uncompressedBlockLength, 
uncompressedBlock);
+  }
+}
+
+BytesRef decode(int docNumber) throws IOException {
+  int blockId = docNumber >> docsPerChunkShift; 
+  int docInBlockId = docNumber % docsPerChunk;
+  assert docInBlockId < docsPerChunk;
+  
+  
+  // already read and uncompressed?
+  if (blockId != lastBlockId) {
+decodeBlock(blockId);
+lastBlockId = blockId;
+  }
+
+  int startDelta = 0, endDelta = 0;
+  switch (numBytes) {
+case Integer.BYTES:
+  startDelta = deltas.getInt(docInBlockId * Integer.BYTES);
+  endDelta = deltas.getInt((docInBlockId + 1) * Integer.BYTES);

Review comment:
   The trick I'm using is that I'm reading 32 values starting at offset 1. 
This helps avoid a condition for the first value of the block, but we're still 
writing/reading only 32 values.

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -404,32 +406,51 @@ private void flushData() throws IOException {
 // Write offset to this block to temporary offsets file
 totalChunks++;
 long thisBlockStartPointer = data.getFilePointer();
-
-// Optimisation - check if all lengths are same
-boolean allLengthsSame = true;
-for (int i = 1; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) {
-  if (docLengths[i] != docLengths[i-1]) {
-allLengthsSame = false;
+
+final int avgLength = uncompressedBlockLength >>> 
Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT;
+int offset = 0;
+// Turn docLengths into deltas from expected ...

[GitHub] [lucene-solr] gandhi-viral commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.

2020-06-11 Thread GitBox


gandhi-viral commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-642889243


   Red-line QPS (throughput) based on our internal benchmarking is still 
unfortunately suffering (-49%) with the latest PR.
   
   We were able to isolate one particular field, a metadata field averaging ~90 
bytes, which is causing most of our regression. After disabling compression on 
that particular field, we are at -8% red-line QPS compared to using Lucene 8.4 
BDVs. Looking further into the access pattern for that field, we see that 
(num_access / num_blocks_decompressed = 1.51), so we are decompressing a whole 
block per every ~1.5 hits.
   
   By temporarily using `BINARY_LENGTH_COMPRESSION_THRESHOLD = 1` to 
effectively disable the LZ4 compression, we are at -2% red-line QPS, which we 
could live with. Could we maybe add an option to the 
`Lucene80DocValuesConsumer` constructor to disable compression for 
BinaryDocValues, or to control the 32 byte threshold?  We could enable this 
compression by default, since it’s clearly helpful in many cases from the 
`luceneutil` benchmarks, but let expert users create their custom Codec to 
control it.
   
   Thank you @jpountz for your help. 






[GitHub] [lucene-solr] msokolov commented on pull request #1563: LUCENE-9394: fix and suppress warnings

2020-06-11 Thread GitBox


msokolov commented on pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563#issuecomment-642886971


   Thanks for the comments, @madrob, I posted a new PR addressing them. I'm not 
sure how I missed all that unused code in RandomizedShapeTestCase - it's pretty 
bare now!






[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks

2020-06-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133616#comment-17133616
 ] 

Mike Drob commented on SOLR-8392:
-

Hopefully that fixes this, but if it doesn't then we should at least get a good 
idea of the failures that we can see.

> SolrParam.get(String) returns String and shouldn't be used in other 
> instanceof checks
> -
>
> Key: SOLR-8392
> URL: https://issues.apache.org/jira/browse/SOLR-8392
> Project: Solr
>  Issue Type: Bug
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-8392.patch, SOLR-8392.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There's a couple of places where we declare the return type of 
> solrParams.get() as an Object and then do instanceof checks for other types. 
> Since we know it will be a String, we can simplify this logic in several 
> places.






[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133615#comment-17133615
 ] 

ASF subversion and git services commented on SOLR-8392:
---

Commit fb98f30a61f929326105718d2d284d761ac1b6e3 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb98f30 ]

SOLR-8392 type safety on SolrParam (#1556)



> SolrParam.get(String) returns String and shouldn't be used in other 
> instanceof checks
> -
>
> Key: SOLR-8392
> URL: https://issues.apache.org/jira/browse/SOLR-8392
> Project: Solr
>  Issue Type: Bug
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 7.0
>
> Attachments: SOLR-8392.patch, SOLR-8392.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There's a couple of places where we declare the return type of 
> solrParams.get() as an Object and then do instanceof checks for other types. 
> Since we know it will be a String, we can simplify this logic in several 
> places.






[jira] [Resolved] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks

2020-06-11 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-8392.
-
Fix Version/s: (was: 7.0)
   master (9.0)
   Resolution: Fixed

> SolrParam.get(String) returns String and shouldn't be used in other 
> instanceof checks
> -
>
> Key: SOLR-8392
> URL: https://issues.apache.org/jira/browse/SOLR-8392
> Project: Solr
>  Issue Type: Bug
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-8392.patch, SOLR-8392.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There's a couple of places where we declare the return type of 
> solrParams.get() as an Object and then do instanceof checks for other types. 
> Since we know it will be a String, we can simplify this logic in several 
> places.






[GitHub] [lucene-solr] madrob merged pull request #1556: SOLR-8392 type safety on SolrParam

2020-06-11 Thread GitBox


madrob merged pull request #1556:
URL: https://github.com/apache/lucene-solr/pull/1556


   






[jira] [Created] (SOLR-14561) Validate parameters to CoreAdminAPI

2020-06-11 Thread Jira
Jan Høydahl created SOLR-14561:
--

 Summary: Validate parameters to CoreAdminAPI
 Key: SOLR-14561
 URL: https://issues.apache.org/jira/browse/SOLR-14561
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jan Høydahl


CoreAdminAPI does not validate parameter input. We should limit what users can 
specify for at least {{instanceDir }}and {{dataDir}} params, perhaps restrict 
them to be relative to SOLR_HOME or SOLR_DATA_HOME.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


madrob commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439010217



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -253,20 +277,22 @@ public void run() {
 continue;
   }
 
-  blockedTasks.clear(); // clear it now; may get refilled below.
+  // clear the blocked tasks, may get refilled below. Given 
blockedTasks can only get entries from heads and heads
+  // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never 
exceed MAX_BLOCKED_TASKS entries.
+  // Note blockedTasks can't be cleared too early as it is used in the 
excludedTasks Predicate above.
+  blockedTasks.clear();
+
+  // Trigger the creation of a new Session used for locking when/if a 
lock is later acquired on the OverseerCollectionMessageHandler
+  batchSessionId++;
 
-  taskBatch.batchId++;
   boolean tooManyTasks = false;
   for (QueueEvent head : heads) {
 if (!tooManyTasks) {
-  synchronized (runningTasks) {
 tooManyTasks = runningTasksSize() >= MAX_PARALLEL_TASKS;
-  }
 }
 if (tooManyTasks) {
   // Too many tasks are running, just shove the rest into the 
"blocked" queue.
-  if(blockedTasks.size() < MAX_BLOCKED_TASKS)
-blockedTasks.put(head.getId(), head);
+  blockedTasks.put(head.getId(), head);

Review comment:
   Ah, ok, I saw that but then missed the connection by the time I got to 
this method.








[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439009885



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set<String> runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map<String, QueueEvent> blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());

Review comment:
   We need a map for the predicate to check presence of an id (map keys 
also used for logs, but if it was the only use we could work around).
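   In other words (a simplified sketch with generic types, not the Solr code): the 
exclusion predicate needs O(1) membership by task id, which a queue such as 
ConcurrentLinkedQueue could only answer with an O(n) scan:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hedged sketch of why blockedTasks is a map keyed by task id.
final class ExclusionCheck {
  final Set<String> runningTasks;
  final Map<String, Object> blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>());

  ExclusionCheck(Set<String> runningTasks) {
    this.runningTasks = runningTasks;
  }

  Predicate<String> excludedTaskIds() {
    // O(1) lookups in both structures; insertion order of blockedTasks is preserved
    return id -> runningTasks.contains(id) || blockedTasks.containsKey(id);
  }
}
```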








[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133600#comment-17133600
 ] 

David Smiley commented on SOLR-14557:
-

Thanks for clarifying.  Then it seems there is a bug in edismax or the 
underlying query parser syntax rules that we use javacc for.  I know very 
little of that part so you'll have to dig.  I don't think SOLR-11501 is the 
true cause; the former behavior short circuited the query parser altogether to 
switch it at a higher level.  That basically masked whatever deficiencies 
edismax had and still has in parsing a Lucene query.

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but 
> it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}






[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


madrob commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439007892



##
File path: 
solr/solrj/src/java/org/apache/solr/common/params/CollectionParams.java
##
@@ -42,31 +42,30 @@
 
 
   enum LockLevel {
-CLUSTER(0),
-COLLECTION(1),
-SHARD(2),
-REPLICA(3),
-NONE(10);
-
-public final int level;
-
-LockLevel(int i) {
-  this.level = i;
+NONE(10, null),

Review comment:
   Didn't consider that; yea, that's a good reason for reordering.








[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


madrob commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439007539



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set<String> runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map<String, QueueEvent> blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());

Review comment:
   Would a ConcurrentLinkedQueue work?








[jira] [Commented] (SOLR-14560) Learning To Rank Interleaving

2020-06-11 Thread Alessandro Benedetti (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133592#comment-17133592
 ] 

Alessandro Benedetti commented on SOLR-14560:
-

The draft is attached: 

[https://github.com/apache/lucene-solr/pull/1571]

Any comments on the architectural changes and the places I touched so far are 
more than welcome.

Bear in mind the task is still work in progress and changes/tests will happen, 
so in case you are curious and willing to leave a comment, take this into 
account.

Once it is ready for code review I will add a comment here and finalise the Pull 
Request from draft.
I will proceed to the merge with at least one other committer's approval.

I tag all the people that worked on Learning To Rank, in no particular order:

[~cpoerschke] [~diegoceccarelli] [~mnilsson] [~jpantony] [~jdorando] 
[~nsanthapuri] [~dave1g]


> Learning To Rank Interleaving
> -
>
> Key: SOLR-14560
> URL: https://issues.apache.org/jira/browse/SOLR-14560
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.5.2
>Reporter: Alessandro Benedetti
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Interleaving is an approach to Online Search Quality evaluation that can be 
> very useful for Learning To Rank models:
> [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
> The scope of this issue is to introduce the ability for the LTR query parser 
> to accept multiple models (2 to start with).
> If one model is passed, normal reranking happens.
> If two models are passed, reranking happens for both models and the final 
> reranked list is the interleaved sequence of results coming from the two 
> models lists.
> As a first step it is going to be implemented through:
> TeamDraft Interleaving with two models in input.
> In the future, we can expand the functionality adding the interleaving 
> algorithm as a parameter.






[GitHub] [lucene-solr] alessandrobenedetti opened a new pull request #1571: SOLR-14560: Interleaving for Learning To Rank

2020-06-11 Thread GitBox


alessandrobenedetti opened a new pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571


   
   # Description
   
   Interleaving is an approach to Online Search Quality evaluation that can be 
very useful for Learning To Rank models:
   
https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html
   The scope of this issue is to introduce the ability for the LTR query parser 
to accept multiple models (2 to start with).
   If one model is passed, normal reranking happens.
   If two models are passed, reranking happens for both models and the final 
reranked list is the interleaved sequence of results coming from the two models 
lists.
   As a first step it is going to be implemented through:
   TeamDraft Interleaving with two models in input.
   In the future, we can expand the functionality by adding the interleaving 
algorithm as a parameter. A generic sketch of TeamDraft follows below.
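   For reference, a generic TeamDraft sketch (hedged: this is the standard 
algorithm over two ranked lists, not this PR's implementation):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hedged sketch of TeamDraft interleaving over two ranked lists.
public final class TeamDraft {
  static <T> List<T> interleave(List<T> a, List<T> b, Random rnd) {
    List<T> out = new ArrayList<>();
    Set<T> seen = new HashSet<>();
    int ia = 0, ib = 0, teamA = 0, teamB = 0;
    while (ia < a.size() || ib < b.size()) {
      boolean aTurn;
      if (ia >= a.size()) {
        aTurn = false;                       // A exhausted, B drafts
      } else if (ib >= b.size()) {
        aTurn = true;                        // B exhausted, A drafts
      } else {
        // smaller team drafts next; coin flip on ties
        aTurn = teamA < teamB || (teamA == teamB && rnd.nextBoolean());
      }
      T pick = aTurn ? a.get(ia++) : b.get(ib++);
      if (seen.add(pick)) {                  // skip results the other team already took
        out.add(pick);
        if (aTurn) teamA++; else teamB++;
      }
    }
    return out;
  }
}
```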
   
   # Solution
   
   Changes to core LTR classes and the addition of a new rescorer
   
   # Tests
   
   WIP
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [X ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Created] (SOLR-14560) Learning To Rank Interleaving

2020-06-11 Thread Alessandro Benedetti (Jira)
Alessandro Benedetti created SOLR-14560:
---

 Summary: Learning To Rank Interleaving
 Key: SOLR-14560
 URL: https://issues.apache.org/jira/browse/SOLR-14560
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - LTR
Affects Versions: 8.5.2
Reporter: Alessandro Benedetti


Interleaving is an approach to Online Search Quality evaluation that can be 
very useful for Learning To Rank models:
[https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html]
The scope of this issue is to introduce the ability for the LTR query parser to 
accept multiple models (2 to start with).

If one model is passed, normal reranking happens.
If two models are passed, reranking happens for both models and the final 
reranked list is the interleaved sequence of results coming from the two models 
lists.

As a first step it is going to be implemented through:
TeamDraft Interleaving with two models in input.

In the future, we can expand the functionality adding the interleaving 
algorithm as a parameter.








[jira] [Assigned] (SOLR-14558) SolrLogPostTool should record all lines

2020-06-11 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski reassigned SOLR-14558:
--

Assignee: Jason Gerlowski

> SolrLogPostTool should record all lines
> ---
>
> Key: SOLR-14558
> URL: https://issues.apache.org/jira/browse/SOLR-14558
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, SolrLogPostTool recognizes a predefined set of "types" of log 
> messages: queries, errors, commits, etc.  This makes it easy to find and 
> explore the traffic your cluster is seeing.
> But it would also be cool if we indexed all records, even if many of them are 
> just assigned a catch-all "other" type_s value.  We won't be able to parse out 
> detailed values from the log messages the way we would for type_s=query, for 
> example, but we can still store the line and timestamp.  This gives much 
> better search over the logs than dropping down to "grep" for anything that's 
> not one of the predefined types.
>  






[GitHub] [lucene-solr] madrob commented on a change in pull request #1563: LUCENE-9394: fix and suppress warnings

2020-06-11 Thread GitBox


madrob commented on a change in pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563#discussion_r438975730



##
File path: lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java
##
@@ -61,15 +61,17 @@ public void testNonZeroOffset() {
   public void testObjectContains() {
 CharArraySet set = new CharArraySet(10, true);
 Integer val = Integer.valueOf(1);
+@SuppressWarnings("deprecation")
+Integer val1 = new Integer(1);

Review comment:
   Add a comment that we're explicitly avoiding the Integer cache, and an 
`assertNotSame(val, val1)`?
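   
   Something like this sketch of the suggested change (variable names taken 
from the diff above):
   ```java
   Integer val = Integer.valueOf(1);   // may be served from the Integer cache
   @SuppressWarnings("deprecation")
   Integer val1 = new Integer(1);      // explicitly bypasses the Integer cache
   assertNotSame(val, val1);           // distinct objects that compare equal
   ```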

##
File path: 
lucene/spatial-extras/src/test/org/apache/lucene/spatial/prefix/HeatmapFacetCounterTest.java
##
@@ -33,11 +33,7 @@
 import org.locationtech.spatial4j.context.SpatialContext;
 import org.locationtech.spatial4j.context.SpatialContextFactory;
 import org.locationtech.spatial4j.distance.DistanceUtils;
-import org.locationtech.spatial4j.shape.Circle;
-import org.locationtech.spatial4j.shape.Point;
-import org.locationtech.spatial4j.shape.Rectangle;
-import org.locationtech.spatial4j.shape.Shape;
-import org.locationtech.spatial4j.shape.SpatialRelation;
+import org.locationtech.spatial4j.shape.*;

Review comment:
   wildcard import

##
File path: 
lucene/spatial-extras/src/test/org/apache/lucene/spatial/spatial4j/RandomizedShapeTestCase.java
##
@@ -183,106 +179,4 @@ private void _assertIntersect(String msg, SpatialRelation 
expected, Shape a, Sha
 }
   }
 
-  protected void assertEqualsRatio(String msg, double expected, double actual) 
{

Review comment:
   There appear to be more unused methods in this class; why did we keep 
them but not these?






This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.

2020-06-11 Thread GitBox


mikemccand commented on a change in pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#discussion_r438928868



##
File path: lucene/CHANGES.txt
##
@@ -218,6 +218,10 @@ Optimizations
 * LUCENE-9087: Build always trees with full leaves and lower the default value 
for maxPointsPerLeafNode to 512.
   (Ignacio Vera)
 
+* LUCENE-9378: Disabled compression on short binary values, as compression

Review comment:
   Maybe say `Disable doc values compression on short binary values, ...`?
(To make it clear we are talking about doc values, not stored fields.)

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -404,32 +406,51 @@ private void flushData() throws IOException {
 // Write offset to this block to temporary offsets file
 totalChunks++;
 long thisBlockStartPointer = data.getFilePointer();
-
-// Optimisation - check if all lengths are same
-boolean allLengthsSame = true;
-for (int i = 1; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) {
-  if (docLengths[i] != docLengths[i-1]) {
-allLengthsSame = false;
+
+final int avgLength = uncompressedBlockLength >>> 
Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT;
+int offset = 0;
+// Turn docLengths into deltas from expected values from the average 
length
+for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) {
+  offset += docLengths[i];
+  docLengths[i] = offset - avgLength * (i + 1);
+}
+int numBytes = 0;
+for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) {
+  if (docLengths[i] < Byte.MIN_VALUE || docLengths[i] > 
Byte.MAX_VALUE) {
+numBytes = Integer.BYTES;
 break;
+  } else if (docLengths[i] != 0) {
+numBytes = Math.max(numBytes, Byte.BYTES);
   }
 }
-if (allLengthsSame) {
-// Only write one value shifted. Steal a bit to indicate all other 
lengths are the same
-int onlyOneLength = (docLengths[0] <<1) | 1;
-data.writeVInt(onlyOneLength);
-} else {
-  for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) {
-if (i == 0) {
-  // Write first value shifted and steal a bit to indicate other 
lengths are to follow
-  int multipleLengths = (docLengths[0] <<1);
-  data.writeVInt(multipleLengths);  
-} else {
-  data.writeVInt(docLengths[i]);
-}
+data.writeVLong((((long) uncompressedBlockLength) << 4) | numBytes);
+
+if (numBytes == Integer.BYTES) {
+  // encode deltas as ints
+  for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) {
+data.writeInt(Integer.reverseBytes(docLengths[i]));
+  }
+} else if (numBytes == 1) {
+  for (int i = 0; i < 
Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) {
+data.writeByte((byte) docLengths[i]);
   }
+} else if (numBytes != 0) {
+  throw new AssertionError();
 }
+
 maxUncompressedBlockLength = Math.max(maxUncompressedBlockLength, 
uncompressedBlockLength);
-LZ4.compress(block, 0, uncompressedBlockLength, data, ht);
+
+// Compression proved to hurt latency in some cases, so we're only
+// enabling it on long inputs for now. Can we reduce the compression
+// overhead and enable compression again, e.g. by building shared
+// dictionaries that allow decompressing one value at once instead of
+// forcing 32 values to be decompressed even when you only need one?
+if (uncompressedBlockLength >= BINARY_LENGTH_COMPRESSION_THRESHOLD * 
numDocsInCurrentBlock) {
+  LZ4.compress(block, 0, uncompressedBlockLength, data, highCompHt);
+} else {
+  LZ4.compress(block, 0, uncompressedBlockLength, data, noCompHt);

Review comment:
   Hmm, do we know that our new `LZ4.NoCompressionHashTable` is actually 
really close to doing nothing? I don't understand `LZ4` well enough to know 
that e.g. `return -1` from the `int get(int offset)` method is really a no-op 
overall...
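   
   A conceptual sketch of why an always-miss table amounts to a no-op (the 
interface shape is assumed; Lucene's actual `LZ4.HashTable` may differ):
   ```java
   // In LZ4-style compression the hash table proposes the offset of a
   // previous potential match for the current position, or -1 for "none".
   interface MatchTable {
     int get(int offset);    // previous offset that might match, or -1
     void set(int offset);   // remember the current offset for later lookups
   }

   // If get() always answers -1, the compressor never finds a match and emits
   // the whole block as one literal run: the cost is essentially the array
   // copy plus a tiny framing overhead.
   final class NoMatchTable implements MatchTable {
     @Override public int get(int offset) { return -1; }
     @Override public void set(int offset) { /* remember nothing */ }
   }
   ```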

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -762,6 +764,97 @@ public BytesRef binaryValue() throws IOException {
   // Decompresses blocks of binary values to retrieve content
   class BinaryDecoder {
 
+private final LongValues addresses;
+private final IndexInput compressedData;
+// Cache of last uncompressed block 
+private long lastBlockId = -1;
+private final ByteBuffer deltas;
+private int numBytes;
+private int uncompressedBlockLength;  

[GitHub] [lucene-solr] tflobbe commented on pull request #1567: LUCENE-9402: Let MultiCollector handle minCompetitiveScore

2020-06-11 Thread GitBox


tflobbe commented on pull request #1567:
URL: https://github.com/apache/lucene-solr/pull/1567#issuecomment-642845936


   Ah! Good catch, I missed that completely. I'll fix.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438976552



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());
-  final private Predicate excludedTasks = new Predicate() {
+
+  /**
+   * Predicate used to filter out tasks from the Zookeeper queue that should 
not be returned for processing.
+   */
+  final private Predicate excludedTasks = new Predicate<>() {
 @Override
 public boolean test(String s) {
   return runningTasks.contains(s) || blockedTasks.containsKey(s);

Review comment:
   Yes, it is. We can likely change this one into a 
`ConcurrentHashMap.newKeySet()` as well.
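   
   For instance (the element type is assumed from the PR context):
   ```java
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   class TasksHolder { // illustrative wrapper class
     // Thread-safe set view backed by a ConcurrentHashMap: membership checks
     // and mutations no longer need external synchronized blocks.
     private final Set<String> runningZKTasks = ConcurrentHashMap.newKeySet();
   }
   ```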





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438974719



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;

Review comment:
   Yes, will change that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] danmuzi commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2

2020-06-11 Thread GitBox


danmuzi commented on pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642844611


   Please use the **"Squash and merge"** option below.
   Your sub-commits will be combined automatically.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438970686



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());

Review comment:
   We'd need a concurrent linked hash map because we need iteration order 
== insert order...
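   
   A sketch of the constraint (`QueueEvent` stands in for the PR's type): the 
JDK has no concurrent insertion-ordered map, and iteration over a 
`Collections.synchronizedMap` view must still hold the wrapper's monitor, per 
its javadoc:
   ```java
   import java.util.*;

   class BlockedTasksSketch {
     static class QueueEvent {} // stand-in for the Solr queue event type

     final Map<String, QueueEvent> blockedTasks =
         Collections.synchronizedMap(new LinkedHashMap<>());

     void dump() {
       // Individual put/get/remove calls are thread-safe, but iteration is
       // only safe while synchronized on the wrapper itself. LinkedHashMap
       // preserves insertion order during the iteration.
       synchronized (blockedTasks) {
         for (Map.Entry<String, QueueEvent> e : blockedTasks.entrySet()) {
           // visits entries in insertion order
         }
       }
     }
   }
   ```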





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9397) UniformSplit supports encodable fields metadata

2020-06-11 Thread Bruno Roustant (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Roustant resolved LUCENE-9397.

Fix Version/s: 8.6
   Resolution: Fixed

Thanks [~dsmiley] for the review.

> UniformSplit supports encodable fields metadata
> ---
>
> Key: LUCENE-9397
> URL: https://issues.apache.org/jira/browse/LUCENE-9397
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> UniformSplit already supports custom encoding for term blocks. This is an 
> extension to also support encodable fields metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133535#comment-17133535
 ] 

ASF subversion and git services commented on LUCENE-9397:
-

Commit ac7bb4a53effcd4e37174e74c89f61187f04fcc0 in lucene-solr's branch 
refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac7bb4a ]

LUCENE-9397: UniformSplit supports encodable fields metadata.


> UniformSplit supports encodable fields metadata
> ---
>
> Key: LUCENE-9397
> URL: https://issues.apache.org/jira/browse/LUCENE-9397
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> UniformSplit already supports custom encoding for term blocks. This is an 
> extension to also support encodable fields metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438967387



##
File path: 
solr/solrj/src/java/org/apache/solr/common/params/CollectionParams.java
##
@@ -42,31 +42,30 @@
 
 
   enum LockLevel {
-CLUSTER(0),
-COLLECTION(1),
-SHARD(2),
-REPLICA(3),
-NONE(10);
-
-public final int level;
-
-LockLevel(int i) {
-  this.level = i;
+NONE(10, null),

Review comment:
   The compiler complained about an illegal forward reference when I didn't.
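   
   A minimal illustration of the constraint (the second constructor argument 
is hypothetical; the PR's actual field semantics may differ): an enum 
constant's constructor arguments may only name constants declared above it.
   ```java
   enum LockLevel {
     // Declaring CLUSTER(0, COLLECTION) first would fail to compile with
     // "illegal forward reference", so the chain is declared bottom-up:
     NONE(10, null),
     REPLICA(3, NONE),
     SHARD(2, REPLICA),
     COLLECTION(1, SHARD),
     CLUSTER(0, COLLECTION);

     public final int level;
     public final LockLevel child; // hypothetical: the next level down

     LockLevel(int level, LockLevel child) {
       this.level = level;
       this.child = child;
     }
   }
   ```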





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438966974



##
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java
##
@@ -867,26 +866,25 @@ public String getTaskKey(ZkNodeProps message) {
   }
 
 
+  // -1 is not a possible batchSessionId so -1 will force initialization of 
lockSession
   private long sessionId = -1;
   private LockTree.Session lockSession;
 
   @Override
-  public Lock lockTask(ZkNodeProps message, OverseerTaskProcessor.TaskBatch 
taskBatch) {
-if (lockSession == null || sessionId != taskBatch.getId()) {
+  public Lock lockTask(ZkNodeProps message, long batchSessionId) {
+if (sessionId != batchSessionId) {
   //this is always called in the same thread.
   //Each batch is supposed to have a new taskBatch
   //So if taskBatch changes we must create a new Session
-  // also check if the running tasks are empty. If yes, clear lockTree
-  // this will ensure that locks are not 'leaked'
-  if(taskBatch.getRunningTasks() == 0) lockTree.clear();

Review comment:
   I hope (and think) it is... A lock can leak if an executor thread dies in 
a place where it shouldn't be dying (just before the try block whose finally 
releases the lock).
   Clearing all locks is not a solution IMO. If we do end up with lock leaks we 
should address those in a more elegant way (fix the leak).
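   
   The pattern under discussion, sketched (the shapes of the calls are 
assumed, not the PR's exact code):
   ```java
   Lock lock = messageHandler.lockTask(message, batchSessionId); // acquire
   // <-- if the executor thread dies right here, before entering the try
   //     block, the lock is never released and "leaks"
   try {
     messageHandler.processMessage(message, operation); // hypothetical call
   } finally {
     lock.unlock(); // the release path the finally block guarantees
   }
   ```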





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


murblanc commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438965089



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -253,20 +277,22 @@ public void run() {
 continue;
   }
 
-  blockedTasks.clear(); // clear it now; may get refilled below.
+  // clear the blocked tasks, may get refilled below. Given 
blockedTasks can only get entries from heads and heads
+  // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never 
exceed MAX_BLOCKED_TASKS entries.
+  // Note blockedTasks can't be cleared too early as it is used in the 
excludedTasks Predicate above.
+  blockedTasks.clear();
+
+  // Trigger the creation of a new Session used for locking when/if a 
lock is later acquired on the OverseerCollectionMessageHandler
+  batchSessionId++;
 
-  taskBatch.batchId++;
   boolean tooManyTasks = false;
   for (QueueEvent head : heads) {
 if (!tooManyTasks) {
-  synchronized (runningTasks) {
 tooManyTasks = runningTasksSize() >= MAX_PARALLEL_TASKS;
-  }
 }
 if (tooManyTasks) {
   // Too many tasks are running, just shove the rest into the 
"blocked" queue.
-  if(blockedTasks.size() < MAX_BLOCKED_TASKS)
-blockedTasks.put(head.getId(), head);
+  blockedTasks.put(head.getId(), head);

Review comment:
   Commented line 280 above.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zhaih commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2

2020-06-11 Thread GitBox


zhaih commented on pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642835331


   Do I need to squash the commits? It seems commits in Lucene are all 
squashed. Or will it be done automatically when merging somehow?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zhaih commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2

2020-06-11 Thread GitBox


zhaih commented on pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642830815


   > Hi Patrick,
   > Thanks for your contribution 👍
   > I found that the JIRA issue number of this PR is wrong.
   > It should be changed from [LUCENE-8574] to [LUCENE-9391].
   > https://issues.apache.org/jira/browse/LUCENE-9391
   > Please check it.
   
   Oh, thank you for figuring that out! Yeah, I picked up the wrong one from my 
backlog... Thank you very much!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

2020-06-11 Thread GitBox


madrob commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438907365



##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerMessageHandler.java
##
@@ -50,7 +50,7 @@
   /**Try to provide an exclusive lock for this particular task
* return null if locking is not possible. If locking is not necessary

Review comment:
   This javadoc includes a sentence fragment; can we complete the thought 
while we're improving documentation in this area?

##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;

Review comment:
   Since there is so much synchronized access to this, should it be a 
`ConcurrentHashMap.newKeySet()`?

##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());
-  final private Predicate excludedTasks = new Predicate() {
+
+  /**
+   * Predicate used to filter out tasks from the Zookeeper queue that should 
not be returned for processing.
+   */
+  final private Predicate excludedTasks = new Predicate<>() {
 @Override
 public boolean test(String s) {
   return runningTasks.contains(s) || blockedTasks.containsKey(s);

Review comment:
   This reference to runningTasks isn't synchronized. Is that an issue?

##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up 
from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered 
into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map blockedTasks = 
Collections.synchronizedMap(new LinkedHashMap<>());

Review comment:
   Similarly here, can this be a ConcurrentHashMap instead of a synchronized 
map?

##
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##
@@ -253,20 +277,22 @@ public void run() {
 continue;
   }
 

[GitHub] [lucene-solr] danmuzi commented on pull request #1560: LUCENE-8574: Upgrade HPPC to 0.8.2

2020-06-11 Thread GitBox


danmuzi commented on pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642827027


   Hi Patrick,
   Thanks for your contribution 👍
   I found that the JIRA issue number of this PR is wrong.
   It should be changed from [LUCENE-8574] to [LUCENE-9391].
   https://issues.apache.org/jira/browse/LUCENE-9391
   Please check it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133403#comment-17133403
 ] 

Adrien Grand commented on LUCENE-9356:
--

I had beasted many iterations, but the Elastic CI found a failing seed right 
after I pushed. It is due to the FST constructor, which throws an 
IllegalStateException when an unexpected byte is read for the input type, so I 
changed it to a CorruptIndexException.
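
The gist of the change, sketched (not the actual FST constructor code):
{code}
// Before: an unexpected input-type byte threw IllegalStateException.
// After: it is reported as index corruption instead, roughly:
final byte t = in.readByte();
switch (t) {
  case 0: inputType = INPUT_TYPE.BYTE1; break;
  case 1: inputType = INPUT_TYPE.BYTE2; break;
  case 2: inputType = INPUT_TYPE.BYTE4; break;
  default:
    throw new CorruptIndexException("invalid input type " + t, in);
}
{code}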

> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133376#comment-17133376
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit 8d95a2ee582da04edf419e6b39756fdde55503fc in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d95a2e ]

LUCENE-9356: Make FST throw the correct exception upon incorrect input type.


> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133374#comment-17133374
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit 8d95a2ee582da04edf419e6b39756fdde55503fc in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d95a2e ]

LUCENE-9356: Make FST throw the correct exception upon incorrect input type.


> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant closed pull request #1564: LUCENE-9397: UniformSplit supports encodable fields metadata.

2020-06-11 Thread GitBox


bruno-roustant closed pull request #1564:
URL: https://github.com/apache/lucene-solr/pull/1564


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133372#comment-17133372
 ] 

ASF subversion and git services commented on LUCENE-9397:
-

Commit 75d25ad6779dec194a2e0ef2a3263ce0fb872cf6 in lucene-solr's branch 
refs/heads/master from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=75d25ad ]

LUCENE-9397: UniformSplit supports encodable fields metadata.


> UniformSplit supports encodable fields metadata
> ---
>
> Key: LUCENE-9397
> URL: https://issues.apache.org/jira/browse/LUCENE-9397
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UniformSplit already supports custom encoding for term blocks. This is an 
> extension to also support encodable fields metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9356.
--
Fix Version/s: 8.6
   Resolution: Fixed

> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.

2020-06-11 Thread GitBox


jpountz commented on pull request #1557:
URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-642781876


   @rmuir I combined them in an overloaded `retrieveChecksum(IndexInput, long)` 
variant; what do you think?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133367#comment-17133367
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit d3c74a305ff95f087a6e88953d1ef34e7d71f06f in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d3c74a3 ]

LUCENE-9356: Add a test that verifies that Lucene catches bit flips. (#1569)



> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133366#comment-17133366
 ] 

ASF subversion and git services commented on LUCENE-9356:
-

Commit 36109ec36216141cb0fbf9fb09e9d74721a78bda in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=36109ec ]

LUCENE-9356: Add a test that verifies that Lucene catches bit flips. (#1569)



> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We already have tests that verify that file truncation and modification of 
> the index headers are caught correctly. I'd like to add another test that 
> flipping a byte in a way that modifies the checksum of the file is always 
> caught gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz merged pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


jpountz merged pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 9.0

2020-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133360#comment-17133360
 ] 

ASF subversion and git services commented on SOLR-12823:


Commit b4dcbfa3de7c512baab642942320d48fb6f180c4 in lucene-solr's branch 
refs/heads/master from murblanc
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4dcbfa ]

SOLR-12823: fix failures in CloudHttp2SolrClientTest CloudSolrClientTest 
TestCloudSolrClientConnections (#1565)

Co-authored-by: Ilan Ginzburg 

> remove clusterstate.json in Lucene/Solr 9.0
> ---
>
> Key: SOLR-12823
> URL: https://issues.apache.org/jira/browse/SOLR-12823
> Project: Solr
>  Issue Type: Task
>Reporter: Varun Thacker
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> clusterstate.json is an artifact of a pre-5.0 Solr release. We should remove 
> it in 9.0.
> It stays empty unless you explicitly ask to create the collection with the 
> old "stateFormat", and there is no reason to create a collection with the 
> old stateFormat.
> We should also remove the "stateFormat" argument of create collection.
> We should also remove MIGRATESTATEVERSION as well.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob merged pull request #1565: SOLR-12823: fix test failures

2020-06-11 Thread GitBox


madrob merged pull request #1565:
URL: https://github.com/apache/lucene-solr/pull/1565


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133301#comment-17133301
 ] 

Mikhail Khludnev edited comment on SOLR-14557 at 6/11/20, 2:38 PM:
---

Thanks for the response, [~dsmiley].
# it seems like a bug in syntax parsing
# I trust users
# they have many old queries in curly braces where they switch different parsers 
(mostly \{!join}) arbitrarily, so defType isn't an option
# it seems I achieved what uf does via luceneMatchVersion = 4.5 in config; 
that's how I got the SOLR-11501 notes. So, uf doesn't bring any value to me. Or 
should it?
# So everything seems to work (switching \{!parser} inside of an edismax query) 
until users add {{(}} braces {{)}}.

So, old queries don't work for them. It seems like a bug outside of SOLR-11501, 
or loosely related to it.


was (Author: mkhludnev):
Thanks for the response, [~dsmiley].
# it seems like a bug in syntax parsing
# I trust users
# they have many old queries in curly braces where they switch different parsers 
(mostly \{!join}) arbitrarily, so defType isn't an option
# it seems I achieved what uf does via luceneMatchVersion = 4.5 in config; 
that's how I got the SOLR-11501 notes. So, uf doesn't bring any value to me. Or 
should it?
# So everything seems to work (switching \{!parser} inside of an edismax query) 
until users add {{(}}braces{{)}}.
So, old queries don't work for them. It seems like a bug outside of SOLR-11501, 
or loosely related to it.
or loosely related to it.

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but 
> it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133301#comment-17133301
 ] 

Mikhail Khludnev commented on SOLR-14557:
-

Thanks for the response, [~dsmiley].
# it seems like a bug in syntax parsing
# I trust users
# they have many old queries in curly braces where they switch different parsers 
(mostly \{!join}) arbitrarily, so defType isn't an option
# it seems I achieved what uf does via luceneMatchVersion = 4.5 in config; 
that's how I got the SOLR-11501 notes. So, uf doesn't bring any value to me. Or 
should it?
# So everything seems to work (switching \{!parser} inside of an edismax query) 
until users add {{(}}braces{{)}}.
So, old queries don't work for them. It seems like a bug outside of SOLR-11501, 
or loosely related to it.

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but 
> it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133301#comment-17133301
 ] 

Mikhail Khludnev edited comment on SOLR-14557 at 6/11/20, 2:38 PM:
---

Thanks for the response, [~dsmiley].
# it seems like a bug in syntax parsing
# I trust users
# they have many old queries in curly braces where they switch different parsers 
(mostly \{!join}) arbitrarily, so defType isn't an option
# it seems I achieved what uf does via luceneMatchVersion = 4.5 in config; 
that's how I got the SOLR-11501 notes. So, uf doesn't bring any value to me. Or 
should it?
# So everything seems to work (switching \{!parser} inside of an edismax query) 
until users add {{(}}braces{{)}}.

So, old queries don't work for them. It seems like a bug outside of SOLR-11501, 
or loosely related to it.


was (Author: mkhludnev):
Thanks for the response, [~dsmiley].
# it seems like a bug in syntax parsing
# I trust users
# they have many old queries in curly braces where they switch different parsers 
(mostly \{!join}) arbitrarily, so defType isn't an option
# it seems I achieved what uf does via luceneMatchVersion = 4.5 in config; 
that's how I got the SOLR-11501 notes. So, uf doesn't bring any value to me. Or 
should it?
# So everything seems to work (switching \{!parser} inside of an edismax query) 
until users add {{(}}braces{{)}}.
So, old queries don't work for them. It seems like a bug outside of SOLR-11501, 
or loosely related to it.

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5, following SOLR-11501. I know it's a grey zone, 
> but it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


jpountz commented on a change in pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438831834



##
File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features

Review comment:
   This is actually copy-pasted from `TestAllFilesDetectTruncation` :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133288#comment-17133288
 ] 

David Smiley commented on SOLR-14557:
-

The issue description is a bit unclear to me in terms of what you are saying is 
the bug (you filed this as a bug, after all).  Yes, there was a change in 
SOLR-11501 that will affect what you are trying to do.  But what is the bug or 
problem?  For the overall use-case of wanting to parse that lucene query, pass 
{{defType=lucene}} instead of edismax.  You could instead set 
{{uf=\*,\_query\_}} if you want _users_ to be able to make this choice, if you 
trust them to :-).  This is in the upgrade notes written for SOLR-11501 in 
CHANGES.txt.
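
To make the two options concrete, here is a hypothetical pair of requests 
(the query values are made up, and the uf syntax just mirrors the note above):
{code}
# Option 1: parse the whole query string with the lucene parser directly
/select?defType=lucene&q=foo&debugQuery=true

# Option 2: keep edismax, but whitelist the _query_ magic field so users
# can switch parsers themselves (per the SOLR-11501 upgrade notes)
/select?defType=edismax&uf=*,_query_&q={!lucene}foo&debugQuery=true
{code}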

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5, following SOLR-11501. I know it's a grey zone, 
> but it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


mikemccand commented on a change in pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438774512



##
File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features

Review comment:
   Woohoo!

##
File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features

[GitHub] [lucene-solr] gerlowskija opened a new pull request #1570: SOLR-14558: Record all log lines in SolrLogPostTool

2020-06-11 Thread GitBox


gerlowskija opened a new pull request #1570:
URL: https://github.com/apache/lucene-solr/pull/1570


   # Description
   
   Previously, SolrLogPostTool ignored all log lines that didn't fall into a 
narrow handful of known "types".  This change adds a new "other" type to hold 
all previously-ignored log records.  This allows all log records to be 
searched, not just the whitelisted cluster-traffic event types (queries, 
commits, etc.). 
   
   # Solution
   
   Straightforward implementation.
   
   # Tests
   
   New test to SolrLogPostToolTest.  Manual testing.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


jpountz commented on pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569#issuecomment-642621848


   > So we now check checksums on every file when opening
   
   We only verify checksums of the metadata files when opening (those we read 
entirely anyway). Checksums of the other files only get verified when 
`LeafReader#checkIntegrity` is called.
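
   A minimal sketch of the distinction (not part of this PR; assumes a plain 
FSDirectory index whose path is passed as the first argument):
   ```java
   import java.nio.file.Paths;
   import org.apache.lucene.index.DirectoryReader;
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.FSDirectory;

   public class CheckIntegrityDemo {
     public static void main(String[] args) throws Exception {
       try (Directory dir = FSDirectory.open(Paths.get(args[0]));
            DirectoryReader reader = DirectoryReader.open(dir)) { // meta files verified here
         for (LeafReaderContext ctx : reader.leaves()) {
           ctx.reader().checkIntegrity(); // checksums of the remaining files verified here
         }
       }
     }
   }
   ```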



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


jpountz commented on a change in pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438754619



##
File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features
+    // tested, e.g. IntPoint was recently added to it:
+    LineFileDocs docs = new LineFileDocs(random());
+    for (int i = 0; i < 100; i++) {
+      riw.addDocument(docs.nextDoc());
+      if (random().nextInt(7) == 0) {
+        riw.commit();
+      }
+      if (random().nextInt(20) == 0) {
+        riw.deleteDocuments(new Term("docid", Integer.toString(i)));
+      }
+      if (random().nextInt(15) == 0) {
+        riw.updateNumericDocValue(new Term("docid", Integer.toString(i)), "docid_intDV", Long.valueOf(i));
+      }
+    }
+    if (TEST_NIGHTLY == false) {
+      riw.forceMerge(1);
+    }
+    riw.close();
+    checkBitFlips(dir);
+    dir.close();
+  }
+
+  private void checkBitFlips(Directory dir) throws IOException {
+    for(String name : dir.listAll()) {
+      if (name.equals(IndexWriter.WRITE_LOCK_NAME) == false) {
+        corruptFile(dir, name);
+      }
+    }
+  }
+
+  private void corruptFile(Directory dir, String victim) throws IOException {
+    try (BaseDirectoryWrapper dirCopy = newDirectory()) {
+      dirCopy.setCheckIndexOnClose(false);
+
+      long victimLength = dir.fileLength(victim);
+      long flipOffset = TestUtil.nextLong(random(), 0, victimLength - 1);
+
+      if (VERBOSE) {
+        System.out.println("TEST: now corrupt file " + victim + " by changing byte at offset " + flipOffset + " (length= " + victimLength + ")");
+      }
+
+      for(String name : dir.listAll()) {
+        if (name.equals(victim) == false) {
+          dirCopy.copyFrom(dir, name, name, IOContext.DEFAULT);
+        } else {
+          try (IndexOutput out = dirCopy.createOutput(name, IOContext.DEFAULT);
+              IndexInput in = dir.openInput(name, IOContext.DEFAULT)) {
+            out.copyBytes(in, flipOffset);
+            out.writeByte((byte) (in.readByte() + TestUtil.nextInt(random(), 0x01, 0xFF)));
+            out.copyBytes(in, victimLength - flipOffset - 1);
+          }
+          try (IndexInput in = dirCopy.openInput(name, IOContext.DEFAULT)) {
+            try {
+              CodecUtil.checksumEntireFile(in);
+              System.out.println("TEST: changing a byte in " + victim + " did not update the checksum)");

Review comment:
   I haven't seen a single occurrence of it (fortunately! :) )





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [lucene-solr] msokolov commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


msokolov commented on a change in pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438751988



##
File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features
+    // tested, e.g. IntPoint was recently added to it:
+    LineFileDocs docs = new LineFileDocs(random());
+    for (int i = 0; i < 100; i++) {
+      riw.addDocument(docs.nextDoc());
+      if (random().nextInt(7) == 0) {
+        riw.commit();
+      }
+      if (random().nextInt(20) == 0) {
+        riw.deleteDocuments(new Term("docid", Integer.toString(i)));
+      }
+      if (random().nextInt(15) == 0) {
+        riw.updateNumericDocValue(new Term("docid", Integer.toString(i)), "docid_intDV", Long.valueOf(i));
+      }
+    }
+    if (TEST_NIGHTLY == false) {
+      riw.forceMerge(1);
+    }
+    riw.close();
+    checkBitFlips(dir);
+    dir.close();
+  }
+
+  private void checkBitFlips(Directory dir) throws IOException {
+    for(String name : dir.listAll()) {
+      if (name.equals(IndexWriter.WRITE_LOCK_NAME) == false) {
+        corruptFile(dir, name);
+      }
+    }
+  }
+
+  private void corruptFile(Directory dir, String victim) throws IOException {
+    try (BaseDirectoryWrapper dirCopy = newDirectory()) {
+      dirCopy.setCheckIndexOnClose(false);
+
+      long victimLength = dir.fileLength(victim);
+      long flipOffset = TestUtil.nextLong(random(), 0, victimLength - 1);
+
+      if (VERBOSE) {
+        System.out.println("TEST: now corrupt file " + victim + " by changing byte at offset " + flipOffset + " (length= " + victimLength + ")");
+      }
+
+      for(String name : dir.listAll()) {
+        if (name.equals(victim) == false) {
+          dirCopy.copyFrom(dir, name, name, IOContext.DEFAULT);
+        } else {
+          try (IndexOutput out = dirCopy.createOutput(name, IOContext.DEFAULT);
+              IndexInput in = dir.openInput(name, IOContext.DEFAULT)) {
+            out.copyBytes(in, flipOffset);
+            out.writeByte((byte) (in.readByte() + TestUtil.nextInt(random(), 0x01, 0xFF)));
+            out.copyBytes(in, victimLength - flipOffset - 1);
+          }
+          try (IndexInput in = dirCopy.openInput(name, IOContext.DEFAULT)) {
+            try {
+              CodecUtil.checksumEntireFile(in);
+              System.out.println("TEST: changing a byte in " + victim + " did not update the checksum)");

Review comment:
   curious if you saw this much?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[jira] [Created] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

2020-06-11 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14559:
-

 Summary: Fix or suppress warnings in 
solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api
 Key: SOLR-14559
 URL: https://issues.apache.org/jira/browse/SOLR-14559
 Project: Solr
  Issue Type: Sub-task
Reporter: Erick Erickson
Assignee: Erick Erickson


There's considerable overhead in testing and precommit, so fixing up one 
directory at a time is getting tedious as there are fewer and fewer warnings in 
particular directories. This set will fix about half the remaining warnings 
outside of solrj, 300 or so. Then one more Jira will fix the remaining warnings 
in Solr (exclusive of SolrJ).

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-06-11 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133199#comment-17133199
 ] 

Adrien Grand commented on LUCENE-9356:
--

Thanks to LUCENE-7822 and LUCENE-9359, Lucene now always throws a 
CorruptIndexException or an IndexFormatToo(Old|New)Exception when opening a 
corrupted index and then calling checkIntegrity on it. The attached PR adds a 
test.

> Add tests for corruptions caused by byte flips
> --
>
> Key: LUCENE-9356
> URL: https://issues.apache.org/jira/browse/LUCENE-9356
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We already have tests that file truncation and modification of the index 
> headers are caught correctly. I'd like to add another test that flipping a 
> byte in a way that modifies the checksum of the file is always caught 
> gracefully by Lucene.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz opened a new pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.

2020-06-11 Thread GitBox


jpountz opened a new pull request #1569:
URL: https://github.com/apache/lucene-solr/pull/1569


   Opening a reader on a corrupted index and then calling checkIntegrity must 
throw a `CorruptIndexException` or an `IndexFormatToo(Old|New)Exception`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14541) Ensure classes that implement equals implement hashCode or suppress warnings

2020-06-11 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133187#comment-17133187
 ] 

Erick Erickson commented on SOLR-14541:
---

[~murblanc] Thanks. As you can tell, I barely looked for causes; I was just 
excited that it surfaced after implementing your suggestion (I have to go back 
and re-do the ones in solrj/io that I just suppressed).

I'll wait for [~ab] to weigh in on what the necessary implementation would be. 
If this were ever used to compute a key, things would be messy, since a 
leadership change would produce a duplicate entry one way or another for what 
is conceptually the same node.

So returning zero might make sense.
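
For illustration, a constant hashCode is legal under the {{Object}} contract 
(equal objects must have equal hash codes); it only degrades hash-based 
collections, which is harmless if instances are never used as keys. A minimal 
sketch, not a committed patch:
{code}
@Override
public int hashCode() {
  // Constant hash: satisfies the equals/hashCode contract, but every
  // instance lands in the same bucket if ever used as a HashMap key.
  return 0;
}
{code}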

> Ensure classes that implement equals implement hashCode or suppress warnings
> 
>
> Key: SOLR-14541
> URL: https://issues.apache.org/jira/browse/SOLR-14541
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: 0001-SOLR-14541-add-hashCode-for-some-classes.patch, 
> 0002-SOLR-14541-add-hashCode-for-some-classes-in-autoscal.patch, 
> 0003-SOLR-14541-add-hashCode-or-remove-equals-for-some-cl.patch
>
>
> While looking at warnings, I found that the following classes generate this 
> warning:
> *overrides equals, but neither it nor any superclass overrides hashCode 
> method*
> I can suppress the warning, but this has been a source of errors in the past 
> so I'm reluctant to just do that blindly.
> NOTE: The Lucene one should probably be its own Jira if it's going to have 
> hashCode implemented, but here for triage.
> What I need for each method is for someone who has a clue about that 
> particular code to render an opinion that we can safely suppress the warning 
> or to provide a hashCode method.
> Some of these have been here for a very long time and were implemented by 
> people no longer active...
> lucene/suggest/src/java/org/apache/lucene/search/spell/LuceneLevenshteinDistance.java:39
> solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java:34
>  solr/solrj/src/java/org/apache/solr/common/cloud/Replica.java:26
>  solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:49
> solr/core/src/java/org/apache/solr/cloud/rule/Rule.java:277
>  solr/core/src/java/org/apache/solr/pkg/PackageAPI.java:177
>  solr/core/src/java/org/apache/solr/packagemanager/SolrPackageInstance.java:31
>  
> Noble Paul says it's OK to suppress warnings for these:
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/VersionedData.java:31
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:61
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:150
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:252
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:45
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java:73
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Preference.java:32
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaInfo.java:39
>  
> Joel Bernstein says it's OK to suppress warnings for these:
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaCount.java:27
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpression.java:25
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionNamedParameter.java:23
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java:467
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/DeepRandomStream.java:417
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionValue.java:22
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14442) bin/solr to attempt jstack before killing hung Solr instance

2020-06-11 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved SOLR-14442.
-
Fix Version/s: 8.6
   Resolution: Fixed

> bin/solr to attempt jstack before killing hung Solr instance
> 
>
> Key: SOLR-14442
> URL: https://issues.apache.org/jira/browse/SOLR-14442
> Project: Solr
>  Issue Type: Improvement
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: 8.6
>
> Attachments: SOLR-14442.patch, SOLR-14442.patch, SOLR-14442.patch, 
> screenshot-1.png
>
>
> If a Solr instance does not respond to the 'stop' command in a timely manner, 
> the {{bin/solr}} script will attempt to forcefully kill it: 
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/solr/bin/solr#L859]
> Gathering information (e.g. a jstack of the java process) before the kill 
> command may be helpful in determining why the instance did not stop as 
> expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14541) Ensure classes that implement equals implement hashCode or suppress warnings

2020-06-11 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133111#comment-17133111
 ] 

Ilan Ginzburg commented on SOLR-14541:
--

The stack trace you posted, [~erickerickson], is because {{ReplicaInfo}} is 
stored as a value (not a key!) in a {{HashMap}}, and we happen to take the 
hashCode of that whole map (the {{properties}} map in 
{{TriggerEvent.hashCode()}}); this iterates over all entries of the map and 
computes the hash of each value as well as of each key.

{{SearchRateEvent}} in {{SearchRateTrigger}}, for example, is a 
{{TriggerEvent}}. It adds lists of {{ReplicaInfo}} into the {{properties}} map.
The issue is not limited to test code.
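
A hypothetical illustration (not Solr code) of why values, and not just keys, 
need a working {{hashCode()}}: {{Map.hashCode()}} is the sum of the entry hash 
codes, and each entry's hash code is {{key.hashCode() ^ value.hashCode()}}.
{code}
import java.util.HashMap;
import java.util.Map;

class ValueOnly {
  @Override
  public int hashCode() {
    throw new UnsupportedOperationException("reached through the value!");
  }
}

public class MapValueHashDemo {
  public static void main(String[] args) {
    Map<String, ValueOnly> properties = new HashMap<>();
    properties.put("replica", new ValueOnly());
    properties.hashCode(); // throws: hashing the map hashes each value too
  }
}
{code}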


> Ensure classes that implement equals implement hashCode or suppress warnings
> 
>
> Key: SOLR-14541
> URL: https://issues.apache.org/jira/browse/SOLR-14541
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: 0001-SOLR-14541-add-hashCode-for-some-classes.patch, 
> 0002-SOLR-14541-add-hashCode-for-some-classes-in-autoscal.patch, 
> 0003-SOLR-14541-add-hashCode-or-remove-equals-for-some-cl.patch
>
>
> While looking at warnings, I found that the following classes generate this 
> warning:
> *overrides equals, but neither it nor any superclass overrides hashCode 
> method*
> I can suppress the warning, but this has been a source of errors in the past 
> so I'm reluctant to just do that blindly.
> NOTE: The Lucene one should probably be its own Jira if it's going to have 
> hashCode implemented, but here for triage.
> What I need for each method is for someone who has a clue about that 
> particular code to render an opinion that we can safely suppress the warning 
> or to provide a hashCode method.
> Some of these have been here for a very long time and were implemented by 
> people no longer active...
> lucene/suggest/src/java/org/apache/lucene/search/spell/LuceneLevenshteinDistance.java:39
> solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java:34
>  solr/solrj/src/java/org/apache/solr/common/cloud/Replica.java:26
>  solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:49
> solr/core/src/java/org/apache/solr/cloud/rule/Rule.java:277
>  solr/core/src/java/org/apache/solr/pkg/PackageAPI.java:177
>  solr/core/src/java/org/apache/solr/packagemanager/SolrPackageInstance.java:31
>  
> Noble Paul says it's OK to suppress warnings for these:
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/VersionedData.java:31
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:61
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:150
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:252
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:45
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java:73
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Preference.java:32
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaInfo.java:39
>  
> Joel Bernstein says it's OK to suppress warnings for these:
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaCount.java:27
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpression.java:25
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionNamedParameter.java:23
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java:467
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/DeepRandomStream.java:417
>  
> solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionValue.java:22
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] sigram opened a new pull request #1568: SOLR-14537 Improve performance of ExportWriter

2020-06-11 Thread GitBox


sigram opened a new pull request #1568:
URL: https://github.com/apache/lucene-solr/pull/1568


   Details in Jira.
   
   Initial changes here implement the "double buffering" approach to increase 
throughput: an additional thread is created to fill one buffer while the main 
thread writes out the documents from the other buffer.
   
   Lucene TermEnum-s and DocValues are not thread-safe, which is why this 
change requires the documents to be fully materialized in the buffer before it 
is handed over to the other thread for writing. I think this is an acceptable 
tradeoff between a reasonable amount of buffer memory and speed.
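
   A rough self-contained sketch of the double-buffering handoff described 
above (illustrative names, not the actual ExportWriter code):
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;

   public class DoubleBufferDemo {
     static final List<String> END = new ArrayList<>(); // end-of-stream marker

     public static void main(String[] args) throws Exception {
       // Capacity 1: the filler stays at most one buffer ahead of the writer.
       BlockingQueue<List<String>> handoff = new ArrayBlockingQueue<>(1);

       Thread filler = new Thread(() -> {
         try {
           for (int batch = 0; batch < 3; batch++) {
             List<String> buffer = new ArrayList<>();
             for (int i = 0; i < 4; i++) {
               buffer.add("doc-" + (batch * 4 + i)); // stands in for materializing a doc
             }
             handoff.put(buffer); // hand the fully filled buffer to the writer
           }
           handoff.put(END);
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
       });
       filler.start();

       List<String> buffer;
       while ((buffer = handoff.take()) != END) {
         buffer.forEach(System.out::println); // stands in for writing docs out
       }
       filler.join();
     }
   }
   ```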



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14557) eDisMax parser switch + braces regression

2020-06-11 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-14557:

Summary: eDisMax parser switch + braces regression  (was: eDisMax 
(regression))

> eDisMax parser switch + braces regression
> -
>
> Key: SOLR-14557
> URL: https://issues.apache.org/jira/browse/SOLR-14557
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Mikhail Khludnev
>Priority: Major
>  Labels: painful
>
> h2. Solr 4.5
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
>  
>  goes like
>  {code}
>  \{!lucene}(foo)
>  content:foo
>  LuceneQParser
> {code}
> fine
> h2. Solr 8.2 
> with luceneMatchVersion=4.5, following SOLR-11501. I know it's a grey zone, 
> but it's a question of migrating existing queries. 
> {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} 
> goes like 
> {code}
> "querystring":"\{!lucene}(foo)",
>  "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene 
> Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) 
>  "QParser":"ExtendedDismaxQParser",
> {code}
> blah... 
> but removing braces in 8.2 works perfectly fine 
> {code}
> "querystring":"\{!lucene}foo",
>  "parsedquery":"+content:foo",
>  "parsedquery_toString":"+content:foo",
>  "QParser":"ExtendedDismaxQParser",
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata

2020-06-11 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133028#comment-17133028
 ] 

Bruno Roustant commented on LUCENE-9397:


Currently we use the encoder interface to encrypt term blocks, the FST, and 
the fields metadata. We don't attach more data.
However, I'm going to work on LUCENE-9379 for a directory-based approach to 
encryption that would not be tied to a postings format. Eventually we would 
like to move to that solution.

> UniformSplit supports encodable fields metadata
> ---
>
> Key: LUCENE-9397
> URL: https://issues.apache.org/jira/browse/LUCENE-9397
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UniformSplit already supports custom encoding for term blocks. This is an 
> extension to also support encodable fields metadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org