[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value
[ https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133940#comment-17133940 ]

ASF subversion and git services commented on LUCENE-8574:
---------------------------------------------------------

Commit 2991acf8fffe9dbeda20c24479b108bfb8ea9257 in lucene-solr's branch refs/heads/master from Patrick Zhai
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2991acf ]

LUCENE-9391: Upgrade HPPC to 0.8.2 (#1560)

* LUCENE-8574: Upgrade HPPC to 0.8.2

(Co-authored-by: Haoyu Zhai )

> ExpressionFunctionValues should cache per-hit value
> ---------------------------------------------------
>
>                Key: LUCENE-8574
>                URL: https://issues.apache.org/jira/browse/LUCENE-8574
>            Project: Lucene - Core
>         Issue Type: Bug
>   Affects Versions: 7.5, 8.0
>           Reporter: Michael McCandless
>           Assignee: Robert Muir
>           Priority: Major
>        Attachments: LUCENE-8574.patch
>
>         Time Spent: 1h
> Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit
> cache, so that nested expressions that reference the same common variable
> would compute the value for that variable the first time it was referenced
> and then use that cached value for all subsequent invocations, within one
> hit. I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a
> given hit), but today it's computed twice. The problem is combinatoric if b
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
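The per-hit memoization the issue describes can be sketched abstractly. The real fix lives in Java's `ExpressionFunctionValues`; this Python sketch (names and structure are illustrative, not Lucene's API) shows how caching each variable per hit evaluates a shared dependency exactly once, using the x/c/d/b example from the issue:

```python
class Expr:
    """A named expression over other expressions, or a leaf value source."""
    def __init__(self, name, deps=(), fn=None):
        self.name, self.deps, self.fn = name, deps, fn
        self.evaluations = 0  # instrumentation: how many times fn actually ran

    def value(self, doc, cache):
        # Per-hit cache: each expression is computed at most once per doc.
        if self.name not in cache:
            self.evaluations += 1
            cache[self.name] = self.fn(*(d.value(doc, cache) for d in self.deps))
        return cache[self.name]

# x = c + d, c = b + 2, d = b * 2  (the example from the issue)
b = Expr("b", fn=lambda: 42.0)  # stands in for a per-doc value source
c = Expr("c", deps=(b,), fn=lambda bv: bv + 2)
d = Expr("d", deps=(b,), fn=lambda bv: bv * 2)
x = Expr("x", deps=(c, d), fn=lambda cv, dv: cv + dv)

result = x.value(doc=0, cache={})  # a fresh cache per hit
```

Without the cache, `b` would be evaluated once via `c` and once via `d`; with it, `b.evaluations` stays at one per hit even though two expressions reference it.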
[jira] [Commented] (LUCENE-9391) Upgrade to HPPC 0.8.2
[ https://issues.apache.org/jira/browse/LUCENE-9391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133939#comment-17133939 ]

ASF subversion and git services commented on LUCENE-9391:
---------------------------------------------------------

Commit 2991acf8fffe9dbeda20c24479b108bfb8ea9257 in lucene-solr's branch refs/heads/master from Patrick Zhai
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2991acf ]

LUCENE-9391: Upgrade HPPC to 0.8.2 (#1560)

* LUCENE-8574: Upgrade HPPC to 0.8.2

(Co-authored-by: Haoyu Zhai )

> Upgrade to HPPC 0.8.2
> ---------------------
>
>         Key: LUCENE-9391
>         URL: https://issues.apache.org/jira/browse/LUCENE-9391
>     Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Haoyu Zhai
>    Priority: Minor
>  Time Spent: 40m
> Remaining Estimate: 0h
>
> HPPC 0.8.2 is out and exposes an Accountable-like interface used to estimate
> memory usage.
> [https://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=13522&styleName=Text]
> We should upgrade to it if any of the components using HPPC need to estimate
> memory better.
[GitHub] [lucene-solr] dweiss merged pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2
dweiss merged pull request #1560:
URL: https://github.com/apache/lucene-solr/pull/1560

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness
[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133801#comment-17133801 ]

Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

Hey Michael, I still haven't had a chance to dig into the recent commits, hopefully I can do that tomorrow, but in response to some of your comments here...

bq. ... To support such a case, a shim is required because the code paths that do the actual count accumulation (in ByArrayUIF and ByArrayDV) used to directly increment processor.countAcc, and have now been switched to register counts via the SweepDocIterator and SweepDISI abstractions, ...

Right right right ... I'm really sorry, I keep forgetting: the changes in this issue to "support" sweeping as a concept affect the low-level impls of ByArrayUIF & ByArrayDV such that now they _only_ work by "sweeping" over the set defined by the SweepingCountSlotAcc – the only question (at run time) is whether that set comes from *just* the "base set" or if there are any other sets (provided by other CountSlotAccs, in turn provided by other SweepableSlotAccs) that they sweep over at the same time.

So, with that in mind: please ignore/retract my earlier comments about being concerned about subclasses that don't want to sweep.
* If it simplifies the code, we can certainly assume/assert that any/all future hypothetical ByArrayUIF & ByArrayDV subclasses *must* support sweeping & use a SweepingCountSlotAcc
** provided we make sure to spell that out in the javadocs.
* It would be _nice_ to keep the changes to FacetFieldProcessorByArray to a minimum, and say "FacetFieldProcessorByArray will/should not assume all subclasses can sweep" – but there's no reason we _have_ to
** *_If_* the code would be a lot simpler to say "all current FacetFieldProcessorByArray subclasses use sweeping, so we're going to document that from now on any additional future subclasses of FacetFieldProcessorByArray use sweeping and assert/assume that in the common FacetFieldProcessorByArray code" then we can certainly do that
*** ie: would that allow us to remove the Shim?
*** ie: would it allow us to refactor/merge the Shim impl into the DEV_NULL_COUNT_ACC impl? (IIRC the refinement code path that uses the DEV_NULL countAcc is in FacetFieldProcessorByArray ... correct?)

So my question to you is: What do you think? Do you think there are simplification gains to be had if we add assertions & assumptions about these classes always using SweepingCountSlotAcc?

{quote}I've thought a bit more about the question of how to detect the allBuckets slot for disabling allBuckets relatedness: I don't really have any good answers, but a handful of thoughts: ...
{quote}

# I don't think adding SlotContext to the setValues(...) API would work in general because in practice there's no guarantee Processors will have a valid SlotContext at that point in time (I'm thinking of per-segment DVs that use the slotNum as an ord lookup, or a TermsEnum that just returns the "current" term)
# I do think the "papa-bear" approach would work well in the long run (both in terms of being a clean/consistent API and being useful for this particular problem), but I'm still not convinced it's worth the hassle at this point since we really only have this one usage where it matters
# Considering how much of an "edge case on an edge case" we're talking about, your current "hack" is growing on me, provided we add some more conditional logic to protect against the possibility of a ClassCastEx if anyone ever adds "sweep" support to some other processor (ie: FacetQueryProcessor)

...either way: I'd suggest we punt for now and worry about all the other nocommits before worrying about the "which slot is allBuckets when sweeping?" nocommit.

> Improve JSON "terms" facet performance when sorted by relatedness
> -----------------------------------------------------------------
>
>                Key: SOLR-13132
>                URL: https://issues.apache.org/jira/browse/SOLR-13132
>            Project: Solr
>         Issue Type: Improvement
>         Components: Facet Module
>   Affects Versions: 7.4, master (9.0)
>           Reporter: Michael Gibney
>           Priority: Major
>        Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>         Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
> {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain
> base docSet, and then uses that initial pass as a pre-filter for a
> second-pass, inverted approach of fetching docSets for each relevant term
> (i.e., {{count > minCount}}?) and calcul
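For readers outside this patch, the "sweep" idea under discussion is that a single pass over the base domain feeds several count accumulators at once, instead of re-iterating the domain once per accumulator (base counts plus the foreground/background sets that relatedness needs). A schematic Python sketch — the names are illustrative, not Solr's actual classes:

```python
def sweep_count(docs_terms, domains):
    """One sweep over (doc -> term ordinals) updates per-term counts for
    every domain at once; domains maps name -> set of doc ids."""
    counts = {name: {} for name in domains}
    for doc, ords in docs_terms.items():        # single pass over the docs
        for name, doc_set in domains.items():   # all accumulators fed at once
            if doc in doc_set:
                for o in ords:
                    c = counts[name]
                    c[o] = c.get(o, 0) + 1
    return counts

# Doc 2 is the "foreground" set; base and background cover all docs.
docs = {1: ["a"], 2: ["a", "b"], 3: ["b"]}
result = sweep_count(docs, {"base": {1, 2, 3}, "fg": {2}, "bg": {1, 2, 3}})
```

The point of the shim discussed above is that code which used to increment a single `countAcc` directly now has to route through this kind of multi-accumulator sweep even when only the base set is present.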
[GitHub] [lucene-solr] janhoy commented on pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy commented on pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572#issuecomment-642990676

I think the only thing I'm lacking is a real integration test. I validated manually that core creation fails in `/tmp`, and that setting `-Dsolr.allowPaths=/tmp` allows it:

https://user-images.githubusercontent.com/409128/84450542-b9a23d00-ac50-11ea-9ff1-253f9139685e.png
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133782#comment-17133782 ]

Jan Høydahl commented on SOLR-14561:
------------------------------------

First PR ready, see [https://github.com/apache/lucene-solr/pull/1572]

> Validate parameters to CoreAdminAPI
> -----------------------------------
>
>            Key: SOLR-14561
>            URL: https://issues.apache.org/jira/browse/SOLR-14561
>        Project: Solr
>     Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
>       Reporter: Jan Høydahl
>       Assignee: Jan Høydahl
>       Priority: Major
>     Time Spent: 10m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users
> can specify for at least {{instanceDir and dataDir}} params, perhaps restrict
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[jira] [Updated] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-14561:
-------------------------------
    Description:
CoreAdminAPI does not validate parameter input. We should limit what users can specify for at least {{instanceDir and dataDir}} params, perhaps restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.

  was:
CoreAdminAPI does not validate parameter input. We should limit what users can specify for at least {{instanceDir }}and {{dataDir}} params, perhaps restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.

> Validate parameters to CoreAdminAPI
> -----------------------------------
>
>            Key: SOLR-14561
>            URL: https://issues.apache.org/jira/browse/SOLR-14561
>        Project: Solr
>     Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
>       Reporter: Jan Høydahl
>       Assignee: Jan Høydahl
>       Priority: Major
>     Time Spent: 10m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users
> can specify for at least {{instanceDir and dataDir}} params, perhaps restrict
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[jira] [Assigned] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reassigned SOLR-14561:
----------------------------------
    Assignee: Jan Høydahl

> Validate parameters to CoreAdminAPI
> -----------------------------------
>
>            Key: SOLR-14561
>            URL: https://issues.apache.org/jira/browse/SOLR-14561
>        Project: Solr
>     Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
>       Reporter: Jan Høydahl
>       Assignee: Jan Høydahl
>       Priority: Major
>     Time Spent: 10m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users
> can specify for at least {{instanceDir }}and {{dataDir}} params, perhaps
> restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[GitHub] [lucene-solr] janhoy opened a new pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy opened a new pull request #1572:
URL: https://github.com/apache/lucene-solr/pull/1572

See https://issues.apache.org/jira/browse/SOLR-14561

The `instanceDir` and `dataDir` params must now be relative to either `SOLR_HOME`, `SOLR_DATA_HOME` or `coreRootDir`. Added a new solr.xml config `allowPaths`, controlled by the system property `solr.allowPaths`, that lets you add other allowed paths when needed.
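For illustration, a hypothetical solr.xml fragment showing how such an `allowPaths` setting could be wired to the `solr.allowPaths` system property. The element name and placement here are assumptions based on the PR description above, not the merged syntax — consult the PR diff for the real config:

```xml
<solr>
  <!-- Hypothetical: extra paths (outside SOLR_HOME / SOLR_DATA_HOME /
       coreRootDir) that instanceDir and dataDir may point to.
       Populated from -Dsolr.allowPaths at startup; empty by default. -->
  <str name="allowPaths">${solr.allowPaths:}</str>
</solr>
```

With such wiring, `bin/solr start -Dsolr.allowPaths=/mnt/extra` would permit core directories under `/mnt/extra` while everything else stays locked down.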
[jira] [Resolved] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api
[ https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-14559.
-----------------------------------
    Fix Version/s: 8.6
       Resolution: Fixed

> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util,
> response, cloud, security, schema, api
> --------------------------------------------------------------------
>
>         Key: SOLR-14559
>         URL: https://issues.apache.org/jira/browse/SOLR-14559
>     Project: Solr
>  Issue Type: Sub-task
>    Reporter: Erick Erickson
>    Assignee: Erick Erickson
>    Priority: Major
>     Fix For: 8.6
>
> There's considerable overhead in testing and precommit, so fixing up one
> directory at a time is getting tedious as there are fewer and fewer warnings
> in particular directories. This set will fix about half the remaining
> warnings outside of solrj, 300 or so. Then one more Jira will fix the
> remaining warnings in Solr (exclusive of SolrJ).
[jira] [Commented] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api
[ https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133732#comment-17133732 ]

ASF subversion and git services commented on SOLR-14559:
--------------------------------------------------------

Commit 01f6cd3a84ef6a002f0f7ae1129bd74cdc2f5c01 in lucene-solr's branch refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=01f6cd3 ]

SOLR-14559: Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util,
> response, cloud, security, schema, api
> --------------------------------------------------------------------
>
>         Key: SOLR-14559
>         URL: https://issues.apache.org/jira/browse/SOLR-14559
>     Project: Solr
>  Issue Type: Sub-task
>    Reporter: Erick Erickson
>    Assignee: Erick Erickson
>    Priority: Major
>
> There's considerable overhead in testing and precommit, so fixing up one
> directory at a time is getting tedious as there are fewer and fewer warnings
> in particular directories. This set will fix about half the remaining
> warnings outside of solrj, 300 or so. Then one more Jira will fix the
> remaining warnings in Solr (exclusive of SolrJ).
[jira] [Commented] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api
[ https://issues.apache.org/jira/browse/SOLR-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133731#comment-17133731 ]

ASF subversion and git services commented on SOLR-14559:
--------------------------------------------------------

Commit ff391448d1648c4027133c58248bf7f1aabe5d96 in lucene-solr's branch refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ff39144 ]

SOLR-14559: Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api

> Fix or suppress warnings in solr/core/src/java/org/apache/solr/util,
> response, cloud, security, schema, api
> --------------------------------------------------------------------
>
>         Key: SOLR-14559
>         URL: https://issues.apache.org/jira/browse/SOLR-14559
>     Project: Solr
>  Issue Type: Sub-task
>    Reporter: Erick Erickson
>    Assignee: Erick Erickson
>    Priority: Major
>
> There's considerable overhead in testing and precommit, so fixing up one
> directory at a time is getting tedious as there are fewer and fewer warnings
> in particular directories. This set will fix about half the remaining
> warnings outside of solrj, 300 or so. Then one more Jira will fix the
> remaining warnings in Solr (exclusive of SolrJ).
[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks
[ https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133721#comment-17133721 ]

David Smiley commented on SOLR-8392:
------------------------------------

Thanks Mike! I like the assertions. I noticed a special case on the empty string that has me scratching my head (and yours too, I see, given the appropriate addition of the comment). Git blame points to [~noble.paul]; see https://github.com/apache/lucene-solr/blob/fb98f30a61f929326105718d2d284d761ac1b6e3/solr/core/src/java/org/apache/solr/core/RequestParams.java#L91

What is that about? When we copy the array for values when the key is non-empty, shouldn't we do the same when the key is empty?

BTW, as a small optimization, we might not copy the values array if the size is zero. I'm not sure if that would happen in practice, though.

> SolrParam.get(String) returns String and shouldn't be used in other
> instanceof checks
> --------------------------------------------------------------------
>
>         Key: SOLR-8392
>         URL: https://issues.apache.org/jira/browse/SOLR-8392
>     Project: Solr
>  Issue Type: Bug
>    Reporter: Mike Drob
>    Assignee: Mike Drob
>    Priority: Major
>     Fix For: master (9.0)
>
>  Attachments: SOLR-8392.patch, SOLR-8392.patch
>
>  Time Spent: 40m
> Remaining Estimate: 0h
>
> There's a couple of places where we declare the return type of
> solrParams.get() as an Object and then do instanceof checks for other types.
> Since we know it will be a String, we can simplify this logic in several
> places.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1546: SOLR: Use absolute paths for server paths.
dsmiley commented on a change in pull request #1546:
URL: https://github.com/apache/lucene-solr/pull/1546#discussion_r439083930

File path: solr/core/src/java/org/apache/solr/core/CoreDescriptor.java

    @@ -182,7 +182,7 @@ public CoreDescriptor(String coreName, CoreDescriptor other) {
        */
       public CoreDescriptor(String name, Path instanceDir, Map coreProps, Properties containerProperties, ZkController zkController) {
    -    this.instanceDir = instanceDir;
    +    this.instanceDir = instanceDir.toAbsolutePath();

Review comment: That makes sense; thanks. I see that all callers send absolute paths already. So I think I should change this to throw an exception if it isn't absolute to protect us from mistakes. I'll push a new commit here for review. I've never done that for an already merged PR; we'll see how it goes.
[GitHub] [lucene-solr] gandhi-viral commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
gandhi-viral commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-642931662

> @gandhi-viral That would work for me but I'd like to make sure we're talking about the same thing:
>
> * Lucene86DocValuesConsumer gets a ctor argument to configure the threshold.
> * Lucene86DocValuesFormat keeps 32 as a default value.
> * You would create your own DocValuesFormat that would reuse Lucene86DocValuesProducer and create a Lucene86DocValuesConsumer with a high threshold for compression of binary values.
> * You would enable this format by overriding getDocValueFormatForField in Lucene86Codec.
> * This would mean that your indices would no longer have backward compatibility guarantees of the default codec (N-1) but maybe you don't care since you're re-building your indices from scratch on a regular basis?

Yes, that's what I had in mind too. Currently, we are doing a similar thing after the `8.5.1` upgrade to keep using forked BDVs from `8.4`. You are right about backward compatibility guarantees not being an issue for our use-case, since we do re-build our indices on each software deployment.
[GitHub] [lucene-solr] janhoy commented on a change in pull request #1546: SOLR: Use absolute paths for server paths.
janhoy commented on a change in pull request #1546:
URL: https://github.com/apache/lucene-solr/pull/1546#discussion_r439060787

File path: solr/core/src/java/org/apache/solr/core/CoreDescriptor.java

    @@ -182,7 +182,7 @@ public CoreDescriptor(String coreName, CoreDescriptor other) {
        */
       public CoreDescriptor(String name, Path instanceDir, Map coreProps, Properties containerProperties, ZkController zkController) {
    -    this.instanceDir = instanceDir;
    +    this.instanceDir = instanceDir.toAbsolutePath();

Review comment: A bit late, but I don't think this is necessary, as all callers will send absolute paths. And if you ever get a relative path, resolving it with `toAbsolutePath()` leads to it being relative to whatever CWD the app is started with, while the typical resolving of a relative `instanceDir` is to resolve it relative to CoreContainer#coreRootDirectory.
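The pitfall janhoy describes is language-independent: `toAbsolutePath()`-style resolution anchors a relative path at the process working directory, while Solr's convention anchors it at the core root. A small Python sketch (an illustration of the two resolution schemes, not Solr code) makes the difference concrete:

```python
from pathlib import Path

def resolve_against_cwd(p: str) -> Path:
    # Analogous to Path#toAbsolutePath in Java: a relative path is
    # prepended with the current working directory of the process.
    return (Path.cwd() / p).resolve()

def resolve_against_root(core_root: Path, p: str) -> Path:
    # Analogous to the coreRootDirectory convention: relative paths are
    # anchored at the configured root; absolute paths pass through.
    q = Path(p)
    return q if q.is_absolute() else (core_root / q).resolve()

core_root = Path("/var/solr/cores")
# A relative instanceDir resolves differently under the two schemes
# unless the process happens to be started from core_root itself.
cwd_result = resolve_against_cwd("mycore")
root_result = resolve_against_root(core_root, "mycore")
```

This is why silently calling `toAbsolutePath()` on a relative `instanceDir` would change behavior, and why throwing on non-absolute input (as dsmiley proposes above) is the safer contract.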
[GitHub] [lucene-solr] ErickErickson commented on pull request #1563: LUCENE-9394: fix and suppress warnings
ErickErickson commented on pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563#issuecomment-642915566

speaking from experience, when dealing with this many changes in a big wodge, it’s _very easy_ to have some things slip through ;)

> On Jun 11, 2020, at 3:34 PM, Michael Sokolov wrote:
>
> Thanks for the comments, @madrob, I posted a new PR addressing them. I'm not sure how I missed all that unused code in RandomizedShapeTestCase - it's pretty bare now!
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
jpountz commented on a change in pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#discussion_r439048284

File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java

    @@ -404,32 +406,51 @@ private void flushData() throws IOException {
           // Write offset to this block to temporary offsets file
           totalChunks++;
           long thisBlockStartPointer = data.getFilePointer();
    -
    -      // Optimisation - check if all lengths are same

Review comment: If all docs are the same length, then `numBytes` would be 0 below and we only encode the average length, so this case is still optimized.

File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java

    @@ -762,6 +764,97 @@ public BytesRef binaryValue() throws IOException {
         // Decompresses blocks of binary values to retrieve content
         class BinaryDecoder {
    +      private final LongValues addresses;
    +      private final IndexInput compressedData;
    +      // Cache of last uncompressed block
    +      private long lastBlockId = -1;
    +      private final ByteBuffer deltas;
    +      private int numBytes;
    +      private int uncompressedBlockLength;
    +      private int avgLength;
    +      private final byte[] uncompressedBlock;
    +      private final BytesRef uncompressedBytesRef;
    +      private final int docsPerChunk;
    +      private final int docsPerChunkShift;
    +
    +      public BinaryDecoder(LongValues addresses, IndexInput compressedData, int biggestUncompressedBlockSize, int docsPerChunkShift) {
    +        super();
    +        this.addresses = addresses;
    +        this.compressedData = compressedData;
    +        // pre-allocate a byte array large enough for the biggest uncompressed block needed.
    +        this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
    +        uncompressedBytesRef = new BytesRef(uncompressedBlock);
    +        this.docsPerChunk = 1 << docsPerChunkShift;
    +        this.docsPerChunkShift = docsPerChunkShift;
    +        deltas = ByteBuffer.allocate((docsPerChunk + 1) * Integer.BYTES);
    +        deltas.order(ByteOrder.LITTLE_ENDIAN);
    +      }
    +
    +      private void decodeBlock(int blockId) throws IOException {
    +        long blockStartOffset = addresses.get(blockId);
    +        compressedData.seek(blockStartOffset);
    +
    +        final long token = compressedData.readVLong();
    +        uncompressedBlockLength = (int) (token >>> 4);
    +        avgLength = uncompressedBlockLength >>> docsPerChunkShift;
    +        numBytes = (int) (token & 0x0f);
    +        switch (numBytes) {
    +          case Integer.BYTES:
    +            deltas.putInt(0, (int) 0);
    +            compressedData.readBytes(deltas.array(), Integer.BYTES, docsPerChunk * Integer.BYTES);
    +            break;
    +          case Byte.BYTES:
    +            compressedData.readBytes(deltas.array(), Byte.BYTES, docsPerChunk * Byte.BYTES);
    +            break;
    +          case 0:
    +            break;
    +          default:
    +            throw new CorruptIndexException("Invalid number of bytes: " + numBytes, compressedData);
    +        }
    +
    +        if (uncompressedBlockLength == 0) {
    +          uncompressedBytesRef.offset = 0;
    +          uncompressedBytesRef.length = 0;
    +        } else {
    +          assert uncompressedBlockLength <= uncompressedBlock.length;
    +          LZ4.decompress(compressedData, uncompressedBlockLength, uncompressedBlock);
    +        }
    +      }
    +
    +      BytesRef decode(int docNumber) throws IOException {
    +        int blockId = docNumber >> docsPerChunkShift;
    +        int docInBlockId = docNumber % docsPerChunk;
    +        assert docInBlockId < docsPerChunk;
    +
    +        // already read and uncompressed?
    +        if (blockId != lastBlockId) {
    +          decodeBlock(blockId);
    +          lastBlockId = blockId;
    +        }
    +
    +        int startDelta = 0, endDelta = 0;
    +        switch (numBytes) {
    +          case Integer.BYTES:
    +            startDelta = deltas.getInt(docInBlockId * Integer.BYTES);
    +            endDelta = deltas.getInt((docInBlockId + 1) * Integer.BYTES);

Review comment: The trick I'm using is that I'm reading 32 values starting at offset 1. This helps avoid a condition for the first value of the block, but we're still writing/reading only 32 values.

File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java

    @@ -404,32 +406,51 @@ private void flushData() throws IOException {
           // Write offset to this block to temporary offsets file
           totalChunks++;
           long thisBlockStartPointer = data.getFilePointer();
    -
    -      // Optimisation - check if all lengths are same
    -      boolean allLengthsSame = true;
    -      for (int i = 1; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) {
    -        if (docLengths[i] != docLengths[i-1]) {
    -          allLengthsSame = false;
    +
    +      final int avgLength = uncompressedBlockLength >>> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT;
    +      int offset = 0;
    +      // Turn docLengths into deltas from expe
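The offset arithmetic under review stores, per block, an average document length plus small per-document deltas from the expected cumulative offset, so blocks of near-equal-length values need at most one byte per delta (or none when all lengths match and every delta is zero). A hedged Python sketch of that arithmetic only — not the LZ4 coding or the on-disk token layout:

```python
def encode_offsets(doc_lengths):
    """Per-block offsets encoded as an average length plus deltas
    from the expected offset i * avg (avg rounded down, as a stand-in
    for the power-of-two block shift in the real code)."""
    n = len(doc_lengths)
    avg = sum(doc_lengths) // n
    offsets = [0]
    for length in doc_lengths:
        offsets.append(offsets[-1] + length)
    # delta[i] = actual start of doc i minus expected start i * avg
    deltas = [offsets[i] - i * avg for i in range(n + 1)]
    return avg, deltas

def doc_slice(avg, deltas, i):
    """Recover (start, end) of doc i inside the block from avg + deltas."""
    start = i * avg + deltas[i]
    end = (i + 1) * avg + deltas[i + 1]
    return start, end

# Equal lengths: every delta is zero, so only avg needs to be stored,
# matching the "numBytes == 0" fast path discussed in the review.
avg, deltas = encode_offsets([10, 10, 10, 10])
```

Note the n+1 deltas with `deltas[0] == 0`; the real decoder exploits this by writing a literal zero at slot 0 and reading the remaining values, which is the "reading 32 values starting at offset 1" trick mentioned above.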
[GitHub] [lucene-solr] gandhi-viral commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
gandhi-viral commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-642889243

Red-line QPS (throughput) based on our internal benchmarking is still unfortunately suffering (-49%) with the latest PR. We were able to isolate one particular field, a ~90 byte on average metadata field, which is causing most of our regression. After disabling compression on that particular field, we are at -8% red-line QPS compared to using Lucene 8.4 BDVs.

Looking further into the access pattern for that field, we see that (num_access / num_blocks_decompressed = 1.51), so we are decompressing a whole block per every ~1.5 hits. By temporarily using `BINARY_LENGTH_COMPRESSION_THRESHOLD = 1` to effectively disable the LZ4 compression, we are at -2% red-line QPS, which we could live with.

Could we maybe add an option to the `Lucene80DocValuesConsumer` constructor to disable compression for BinaryDocValues, or to control the 32 byte threshold? We could enable this compression by default, since it’s clearly helpful in many cases from the `luceneutil` benchmarks, but let expert users create their custom Codec to control it.

Thank you @jpountz for your help.
[GitHub] [lucene-solr] msokolov commented on pull request #1563: LUCENE-9394: fix and suppress warnings
msokolov commented on pull request #1563:
URL: https://github.com/apache/lucene-solr/pull/1563#issuecomment-642886971

Thanks for the comments, @madrob, I posted a new PR addressing them. I'm not sure how I missed all that unused code in RandomizedShapeTestCase - it's pretty bare now!
[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks
[ https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133616#comment-17133616 ] Mike Drob commented on SOLR-8392: - Hopefully that fixes this, but if it doesn't then we should at least get a good idea of the failures that we can see. > SolrParam.get(String) returns String and shouldn't be used in other > instanceof checks > - > > Key: SOLR-8392 > URL: https://issues.apache.org/jira/browse/SOLR-8392 > Project: Solr > Issue Type: Bug >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-8392.patch, SOLR-8392.patch > > Time Spent: 40m > Remaining Estimate: 0h > > There's a couple of places where we declare the return type of > solrParams.get() as an Object and then do instanceof checks for other types. > Since we know it will be a String, we can simplify this logic in several > places. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
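For readers without the patch handy, the pattern SOLR-8392 removes looks roughly like this (illustrative names only, not the actual Solr call sites; `SolrParams` is stood in for by a plain map):

```java
import java.util.Map;

// Illustrative only: the pattern SOLR-8392 cleans up. SolrParams.get(String)
// is declared to return String, so widening the result to Object and probing
// it with instanceof is dead logic.
public class ParamCheck {
    // Before: needless widening plus an instanceof check that is always
    // true whenever the value is non-null.
    static String beforeFix(Map<String, String> params, String name) {
        Object val = params.get(name);
        if (val instanceof String) {
            return (String) val;
        }
        return null;
    }

    // After: the declared return type already guarantees a String.
    static String afterFix(Map<String, String> params, String name) {
        return params.get(name);
    }
}
```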
[jira] [Commented] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks
[ https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133615#comment-17133615 ] ASF subversion and git services commented on SOLR-8392: --- Commit fb98f30a61f929326105718d2d284d761ac1b6e3 in lucene-solr's branch refs/heads/master from Mike Drob [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb98f30 ] SOLR-8392 type safety on SolrParam (#1556) > SolrParam.get(String) returns String and shouldn't be used in other > instanceof checks > - > > Key: SOLR-8392 > URL: https://issues.apache.org/jira/browse/SOLR-8392 > Project: Solr > Issue Type: Bug >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: 7.0 > > Attachments: SOLR-8392.patch, SOLR-8392.patch > > Time Spent: 40m > Remaining Estimate: 0h > > There's a couple of places where we declare the return type of > solrParams.get() as an Object and then do instanceof checks for other types. > Since we know it will be a String, we can simplify this logic in several > places.
[jira] [Resolved] (SOLR-8392) SolrParam.get(String) returns String and shouldn't be used in other instanceof checks
[ https://issues.apache.org/jira/browse/SOLR-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-8392. - Fix Version/s: (was: 7.0) master (9.0) Resolution: Fixed > SolrParam.get(String) returns String and shouldn't be used in other > instanceof checks > - > > Key: SOLR-8392 > URL: https://issues.apache.org/jira/browse/SOLR-8392 > Project: Solr > Issue Type: Bug >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-8392.patch, SOLR-8392.patch > > Time Spent: 40m > Remaining Estimate: 0h > > There's a couple of places where we declare the return type of > solrParams.get() as an Object and then do instanceof checks for other types. > Since we know it will be a String, we can simplify this logic in several > places.
[GitHub] [lucene-solr] madrob merged pull request #1556: SOLR-8392 type safety on SolrParam
madrob merged pull request #1556: URL: https://github.com/apache/lucene-solr/pull/1556
[jira] [Created] (SOLR-14561) Validate parameters to CoreAdminAPI
Jan Høydahl created SOLR-14561: -- Summary: Validate parameters to CoreAdminAPI Key: SOLR-14561 URL: https://issues.apache.org/jira/browse/SOLR-14561 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Jan Høydahl CoreAdminAPI does not validate parameter input. We should limit what users can specify for at least {{instanceDir}} and {{dataDir}} params, perhaps restrict them to be relative to SOLR_HOME or SOLR_DATA_HOME.
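One minimal way to express such a restriction (a sketch only; the class and method names here are hypothetical, not the eventual Solr API) is to resolve the user-supplied path against the allowed root and reject anything that escapes it:

```java
import java.nio.file.Path;

// Sketch of the check SOLR-14561 proposes: reject any instanceDir/dataDir
// that resolves outside a configured root such as SOLR_HOME. normalize()
// collapses ".." segments before the prefix test, so traversal tricks like
// "../../etc" are caught.
public class PathValidator {
    static boolean isUnder(Path root, String userSupplied) {
        Path base = root.normalize().toAbsolutePath();
        Path resolved = base.resolve(userSupplied).normalize().toAbsolutePath();
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path solrHome = Path.of("/var/solr");
        System.out.println(isUnder(solrHome, "cores/core1"));      // inside: accepted
        System.out.println(isUnder(solrHome, "../../etc/passwd")); // escapes: rejected
    }
}
```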
[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
madrob commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439010217 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -253,20 +277,22 @@ public void run() { continue; } - blockedTasks.clear(); // clear it now; may get refilled below. + // clear the blocked tasks, may get refilled below. Given blockedTasks can only get entries from heads and heads + // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS entries. + // Note blockedTasks can't be cleared too early as it is used in the excludedTasks Predicate above. + blockedTasks.clear(); + + // Trigger the creation of a new Session used for locking when/if a lock is later acquired on the OverseerCollectionMessageHandler + batchSessionId++; - taskBatch.batchId++; boolean tooManyTasks = false; for (QueueEvent head : heads) { if (!tooManyTasks) { - synchronized (runningTasks) { tooManyTasks = runningTasksSize() >= MAX_PARALLEL_TASKS; - } } if (tooManyTasks) { // Too many tasks are running, just shove the rest into the "blocked" queue. - if(blockedTasks.size() < MAX_BLOCKED_TASKS) -blockedTasks.put(head.getId(), head); + blockedTasks.put(head.getId(), head); Review comment: Ah, ok, I saw that but then missed the connection by the time I got to this method.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439009885 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); Review comment: We need a map for the predicate to check presence of an id (map keys also used for logs, but if it was the only use we could work around).
[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression
[ https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133600#comment-17133600 ] David Smiley commented on SOLR-14557: - Thanks for clarifying. Then it seems there is a bug in edismax or the underlying query parser syntax rules that we use javacc for. I know very little of that part so you'll have to dig. I don't think SOLR-11501 is the true cause; the former behavior short circuited the query parser altogether to switch it at a higher level. That basically masked whatever deficiencies edismax had and still has in parsing a Lucene query. > eDisMax parser switch + braces regression > - > > Key: SOLR-14557 > URL: https://issues.apache.org/jira/browse/SOLR-14557 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Major > Labels: painful > > h2. Solr 4.5 > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > > goes like > {code} > \{!lucene}(foo) > content:foo > LuceneQParser > {code} > fine > h2. Solr 8.2 > with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but > it's a question of migrating existing queries. > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > goes like > {code} > "querystring":"\{!lucene}(foo)", > "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene > Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) > "QParser":"ExtendedDismaxQParser", > {code} > blah... > but removing braces in 8.2 works perfectly fine > {code} > "querystring":"\{!lucene}foo", > "parsedquery":"+content:foo", > "parsedquery_toString":"+content:foo", > "QParser":"ExtendedDismaxQParser", > {code}
[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
madrob commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439007892 ## File path: solr/solrj/src/java/org/apache/solr/common/params/CollectionParams.java ## @@ -42,31 +42,30 @@ enum LockLevel { -CLUSTER(0), -COLLECTION(1), -SHARD(2), -REPLICA(3), -NONE(10); - -public final int level; - -LockLevel(int i) { - this.level = i; +NONE(10, null), Review comment: Didn't consider that; yea, that's a good reason for reordering.
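A plausible reading of the reordering being discussed (a reconstruction from the diff fragment above, not the exact committed code): if each level stores a reference to its child level, Java only allows an enum constant's constructor arguments to reference constants declared earlier, so the hierarchy has to be listed leaf-first:

```java
// Hypothetical reconstruction of the reordered CollectionParams.LockLevel.
// NONE must come first because REPLICA references it, REPLICA before SHARD,
// and so on: Java forbids forward references between enum constants.
enum LockLevel {
    NONE(10, null),
    REPLICA(3, NONE),
    SHARD(2, REPLICA),
    COLLECTION(1, SHARD),
    CLUSTER(0, COLLECTION);

    final int level;
    final LockLevel child; // next finer-grained lock level, null at the leaf

    LockLevel(int level, LockLevel child) {
        this.level = level;
        this.child = child;
    }
}
```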
[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
madrob commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r439007539 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); Review comment: Would a ConcurrentLinkedQueue work?
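The data-structure trade-off in this exchange can be summarized in a few lines: `blockedTasks` needs both insertion order (so blocked tasks are retried in arrival order) and O(1) lookup by id (for the exclusion predicate), which a queue gives only half of. A minimal sketch (illustrative names, not the Solr class itself):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Why a synchronized LinkedHashMap rather than a ConcurrentLinkedQueue:
// iteration order follows insertion (so blocked tasks are retried in the
// order they arrived), while containsKey gives the exclusion predicate a
// cheap membership test by task id. A queue preserves order but makes the
// membership test a linear scan.
public class BlockedTasks {
    private final Map<String, Object> blocked =
        Collections.synchronizedMap(new LinkedHashMap<>());

    void block(String taskId, Object task) { blocked.put(taskId, task); }

    // Used by the exclusion predicate: O(1) presence check by id.
    boolean isBlocked(String taskId) { return blocked.containsKey(taskId); }

    void clear() { blocked.clear(); }
}
```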
[jira] [Commented] (SOLR-14560) Learning To Rank Interleaving
[ https://issues.apache.org/jira/browse/SOLR-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133592#comment-17133592 ] Alessandro Benedetti commented on SOLR-14560: - The draft is attached: [https://github.com/apache/lucene-solr/pull/1571|https://github.com/apache/lucene-solr/pull/1571] Any comments on the architectural changes and the places I touched so far are more than welcome. Bear in mind the task is still work in progress and changes/tests will happen, so in case you are curious and willing to leave a comment, take this into account. Once ready for code review I will add a comment here and finalise the Pull Request from draft. I will proceed with the merge once at least one other committer approves. I tag all the people that worked on Learning To Rank, in no particular order: [~cpoerschke] [~diegoceccarelli] [~mnilsson] [~jpantony] [~jdorando] [~nsanthapuri] [~dave1g] > Learning To Rank Interleaving > - > > Key: SOLR-14560 > URL: https://issues.apache.org/jira/browse/SOLR-14560 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LTR >Affects Versions: 8.5.2 >Reporter: Alessandro Benedetti >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Interleaving is an approach to Online Search Quality evaluation that can be > very useful for Learning To Rank models: > [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html] > Scope of this issue is to introduce the ability to the LTR query parser of > accepting multiple models (2 to start with). > If one model is passed, normal reranking happens. > If two models are passed, reranking happens for both models and the final > reranked list is the interleaved sequence of results coming from the two > models lists.
> As a first step it is going to be implemented through: > TeamDraft Interleaving with two models in input. > In the future, we can expand the functionality adding the interleaving > algorithm as a parameter.
[GitHub] [lucene-solr] alessandrobenedetti opened a new pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti opened a new pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571 # Description Interleaving is an approach to Online Search Quality evaluation that can be very useful for Learning To Rank models: https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html Scope of this issue is to introduce the ability to the LTR query parser of accepting multiple models (2 to start with). If one model is passed, normal reranking happens. If two models are passed, reranking happens for both models and the final reranked list is the interleaved sequence of results coming from the two models lists. As a first step it is going to be implemented through: TeamDraft Interleaving with two models in input. In the future, we can expand the functionality adding the interleaving algorithm as a parameter. # Solution Change of core LTR classes and addition of a new rescorer # Tests WIP # Checklist Please review the following and check all that apply: - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [X] I have created a Jira issue and added the issue ID to my pull request title. - [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [X] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Created] (SOLR-14560) Learning To Rank Interleaving
Alessandro Benedetti created SOLR-14560: --- Summary: Learning To Rank Interleaving Key: SOLR-14560 URL: https://issues.apache.org/jira/browse/SOLR-14560 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: contrib - LTR Affects Versions: 8.5.2 Reporter: Alessandro Benedetti Interleaving is an approach to Online Search Quality evaluation that can be very useful for Learning To Rank models: [https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html|https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html] Scope of this issue is to introduce the ability to the LTR query parser of accepting multiple models (2 to start with). If one model is passed, normal reranking happens. If two models are passed, reranking happens for both models and the final reranked list is the interleaved sequence of results coming from the two models lists. As a first step it is going to be implemented through: TeamDraft Interleaving with two models in input. In the future, we can expand the functionality adding the interleaving algorithm as a parameter.
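For reference, the Team-Draft procedure the issue names can be sketched compactly (this is the textbook algorithm, not the eventual Solr implementation): at each step the team with fewer picks so far (coin flip on ties) contributes its highest-ranked result not yet in the interleaved list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

// Generic Team-Draft interleaving of two ranked lists. Duplicates already
// placed by the other team are skipped without counting as a pick.
public class TeamDraft {
    static <T> List<T> interleave(List<T> a, List<T> b, Random rnd) {
        LinkedHashSet<T> out = new LinkedHashSet<>();
        int pickedA = 0, pickedB = 0, ia = 0, ib = 0;
        while (ia < a.size() || ib < b.size()) {
            boolean aTurn = pickedA < pickedB
                || (pickedA == pickedB && rnd.nextBoolean());
            if (aTurn && ia < a.size()) {
                if (out.add(a.get(ia++))) pickedA++;
            } else if (ib < b.size()) {
                if (out.add(b.get(ib++))) pickedB++;
            } else {
                if (out.add(a.get(ia++))) pickedA++;
            }
        }
        return new ArrayList<>(out);
    }
}
```

The attribution of each result to its originating model (needed to score the online comparison) is omitted here for brevity; the real implementation has to record it.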
[jira] [Assigned] (SOLR-14558) SolrLogPostTool should record all lines
[ https://issues.apache.org/jira/browse/SOLR-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski reassigned SOLR-14558: -- Assignee: Jason Gerlowski > SolrLogPostTool should record all lines > --- > > Key: SOLR-14558 > URL: https://issues.apache.org/jira/browse/SOLR-14558 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Currently, SolrLogPostTool recognizes a predefined set of "types" of log > messages: queries, errors, commits, etc. This makes it easy to find and > explore the traffic your cluster is seeing. > But it would also be cool if we also indexed all records, even if many of > them are just assigned a catch-all "other" type_s value. We won't be able to > parse out detailed values from the log messages the way we would for > type_s=query for example, but we can still store the line and timestamp. > Gives much better search over the logs than dropping down to "grep" for > anything that's not one of the predefined types.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1563: LUCENE-9394: fix and suppress warnings
madrob commented on a change in pull request #1563: URL: https://github.com/apache/lucene-solr/pull/1563#discussion_r438975730 ## File path: lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java ## @@ -61,15 +61,17 @@ public void testNonZeroOffset() { public void testObjectContains() { CharArraySet set = new CharArraySet(10, true); Integer val = Integer.valueOf(1); +@SuppressWarnings("deprecation") +Integer val1 = new Integer(1); Review comment: Add a comment that we're explicitly avoiding the Integer cache, and an `assertNotSame(val, val1)`? ## File path: lucene/spatial-extras/src/test/org/apache/lucene/spatial/prefix/HeatmapFacetCounterTest.java ## @@ -33,11 +33,7 @@ import org.locationtech.spatial4j.context.SpatialContext; import org.locationtech.spatial4j.context.SpatialContextFactory; import org.locationtech.spatial4j.distance.DistanceUtils; -import org.locationtech.spatial4j.shape.Circle; -import org.locationtech.spatial4j.shape.Point; -import org.locationtech.spatial4j.shape.Rectangle; -import org.locationtech.spatial4j.shape.Shape; -import org.locationtech.spatial4j.shape.SpatialRelation; +import org.locationtech.spatial4j.shape.*; Review comment: wildcard import ## File path: lucene/spatial-extras/src/test/org/apache/lucene/spatial/spatial4j/RandomizedShapeTestCase.java ## @@ -183,106 +179,4 @@ private void _assertIntersect(String msg, SpatialRelation expected, Shape a, Sha } } - protected void assertEqualsRatio(String msg, double expected, double actual) { Review comment: There appear to be more unused methods in this class, why did we keep them but not these? 
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
mikemccand commented on a change in pull request #1543: URL: https://github.com/apache/lucene-solr/pull/1543#discussion_r438928868 ## File path: lucene/CHANGES.txt ## @@ -218,6 +218,10 @@ Optimizations * LUCENE-9087: Build always trees with full leaves and lower the default value for maxPointsPerLeafNode to 512. (Ignacio Vera) +* LUCENE-9378: Disabled compression on short binary values, as compression Review comment: Maybe say `Disable doc values compression on short binary values, ...`? (To make it clear we are talking about doc values and not maybe stored fields). ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java ## @@ -404,32 +406,51 @@ private void flushData() throws IOException { // Write offset to this block to temporary offsets file totalChunks++; long thisBlockStartPointer = data.getFilePointer(); - -// Optimisation - check if all lengths are same -boolean allLengthsSame = true; -for (int i = 1; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) { - if (docLengths[i] != docLengths[i-1]) { -allLengthsSame = false; + +final int avgLength = uncompressedBlockLength >>> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; +int offset = 0; +// Turn docLengths into deltas from expected values from the average length +for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { + offset += docLengths[i]; + docLengths[i] = offset - avgLength * (i + 1); +} +int numBytes = 0; +for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { + if (docLengths[i] < Byte.MIN_VALUE || docLengths[i] > Byte.MAX_VALUE) { +numBytes = Integer.BYTES; break; + } else if (docLengths[i] != 0) { +numBytes = Math.max(numBytes, Byte.BYTES); } } -if (allLengthsSame) { -// Only write one value shifted. 
Steal a bit to indicate all other lengths are the same -int onlyOneLength = (docLengths[0] <<1) | 1; -data.writeVInt(onlyOneLength); -} else { - for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) { -if (i == 0) { - // Write first value shifted and steal a bit to indicate other lengths are to follow - int multipleLengths = (docLengths[0] <<1); - data.writeVInt(multipleLengths); -} else { - data.writeVInt(docLengths[i]); -} +data.writeVLong((((long) uncompressedBlockLength) << 4) | numBytes); + +if (numBytes == Integer.BYTES) { + // encode deltas as ints + for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { +data.writeInt(Integer.reverseBytes(docLengths[i])); + } +} else if (numBytes == 1) { + for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { +data.writeByte((byte) docLengths[i]); } +} else if (numBytes != 0) { + throw new AssertionError(); } + maxUncompressedBlockLength = Math.max(maxUncompressedBlockLength, uncompressedBlockLength); -LZ4.compress(block, 0, uncompressedBlockLength, data, ht); + +// Compression proved to hurt latency in some cases, so we're only +// enabling it on long inputs for now. Can we reduce the compression +// overhead and enable compression again, e.g. by building shared +// dictionaries that allow decompressing one value at once instead of +// forcing 32 values to be decompressed even when you only need one? +if (uncompressedBlockLength >= BINARY_LENGTH_COMPRESSION_THRESHOLD * numDocsInCurrentBlock) { + LZ4.compress(block, 0, uncompressedBlockLength, data, highCompHt); +} else { + LZ4.compress(block, 0, uncompressedBlockLength, data, noCompHt); Review comment: Hmm do we know that our new `LZ4.NoCompressionHashTable` is actually really close to doing nothing? I don't understand `LZ4` well enough to know that e.g. `return -1` from `int get (int offset)` method is really a no-op overall...
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -762,6 +764,97 @@ public BytesRef binaryValue() throws IOException { // Decompresses blocks of binary values to retrieve content class BinaryDecoder { +private final LongValues addresses; +private final IndexInput compressedData; +// Cache of last uncompressed block +private long lastBlockId = -1; +private final ByteBuffer deltas; +private int numBytes; +private int uncompressedBlockLength;
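The length encoding in the consumer diff above can be summarized: per block, each value's end offset is stored as a delta from the offset a perfectly uniform (average-length) layout would predict, so uniform lengths need zero per-document bytes, small skews one byte each, and anything larger four bytes each. A sketch of just that bookkeeping (this is a standalone illustration, not the codec code itself; the real code derives the average with a shift by `BINARY_BLOCK_SHIFT`):

```java
// Delta-from-average length encoding, as in the Lucene80DocValuesConsumer
// diff: deltas are end-offset differences from a uniform layout.
public class LengthDeltas {
    static int[] deltasFromAverage(int[] docLengths) {
        int total = 0;
        for (int len : docLengths) total += len;
        int avg = total / docLengths.length;
        int[] deltas = new int[docLengths.length];
        int offset = 0;
        for (int i = 0; i < docLengths.length; i++) {
            offset += docLengths[i];
            deltas[i] = offset - avg * (i + 1); // deviation from uniform layout
        }
        return deltas;
    }

    // 0, 1, or 4 bytes per delta, mirroring the numBytes logic in the diff.
    static int numBytes(int[] deltas) {
        int n = 0;
        for (int d : deltas) {
            if (d < Byte.MIN_VALUE || d > Byte.MAX_VALUE) return Integer.BYTES;
            if (d != 0) n = Byte.BYTES;
        }
        return n;
    }
}
```

With all lengths equal every delta is zero, which is why the new scheme subsumes the old "all lengths same" special case without the stolen bit.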
[GitHub] [lucene-solr] madrob commented on a change in pull request #1563: LUCENE-9394: fix and suppress warnings
madrob commented on a change in pull request #1563: URL: https://github.com/apache/lucene-solr/pull/1563#discussion_r438975730 ## File path: lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java ## @@ -61,15 +61,17 @@ public void testNonZeroOffset() { public void testObjectContains() { CharArraySet set = new CharArraySet(10, true); Integer val = Integer.valueOf(1); +@SuppressWarnings("deprecation") +Integer val1 = new Integer(1); Review comment: Add a comment that we're explicitly avoiding the Integer cache, and an `assertNotSame(val, val1)`? ## File path: lucene/spatial-extras/src/test/org/apache/lucene/spatial/prefix/HeatmapFacetCounterTest.java ## @@ -33,11 +33,7 @@ import org.locationtech.spatial4j.context.SpatialContext; import org.locationtech.spatial4j.context.SpatialContextFactory; import org.locationtech.spatial4j.distance.DistanceUtils; -import org.locationtech.spatial4j.shape.Circle; -import org.locationtech.spatial4j.shape.Point; -import org.locationtech.spatial4j.shape.Rectangle; -import org.locationtech.spatial4j.shape.Shape; -import org.locationtech.spatial4j.shape.SpatialRelation; +import org.locationtech.spatial4j.shape.*; Review comment: wildcard import ## File path: lucene/spatial-extras/src/test/org/apache/lucene/spatial/spatial4j/RandomizedShapeTestCase.java ## @@ -183,106 +179,4 @@ private void _assertIntersect(String msg, SpatialRelation expected, Shape a, Sha } } - protected void assertEqualsRatio(String msg, double expected, double actual) { Review comment: There appear to be more unused methods in this class, why did we keep them but not these? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
mikemccand commented on a change in pull request #1543: URL: https://github.com/apache/lucene-solr/pull/1543#discussion_r438928868 ## File path: lucene/CHANGES.txt ## @@ -218,6 +218,10 @@ Optimizations * LUCENE-9087: Build always trees with full leaves and lower the default value for maxPointsPerLeafNode to 512. (Ignacio Vera) +* LUCENE-9378: Disabled compression on short binary values, as compression Review comment: Maybe say `Disable doc values compression on short binary values, ...`? (To make it clear we are talking about doc values and not maybe stored fields). ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java ## @@ -404,32 +406,51 @@ private void flushData() throws IOException { // Write offset to this block to temporary offsets file totalChunks++; long thisBlockStartPointer = data.getFilePointer(); - -// Optimisation - check if all lengths are same -boolean allLengthsSame = true; -for (int i = 1; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) { - if (docLengths[i] != docLengths[i-1]) { -allLengthsSame = false; + +final int avgLength = uncompressedBlockLength >>> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; +int offset = 0; +// Turn docLengths into deltas from expected values from the average length +for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { + offset += docLengths[i]; + docLengths[i] = offset - avgLength * (i + 1); +} +int numBytes = 0; +for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { + if (docLengths[i] < Byte.MIN_VALUE || docLengths[i] > Byte.MAX_VALUE) { +numBytes = Integer.BYTES; break; + } else if (docLengths[i] != 0) { +numBytes = Math.max(numBytes, Byte.BYTES); } } -if (allLengthsSame) { -// Only write one value shifted. 
Steal a bit to indicate all other lengths are the same -int onlyOneLength = (docLengths[0] <<1) | 1; -data.writeVInt(onlyOneLength); -} else { - for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; i++) { -if (i == 0) { - // Write first value shifted and steal a bit to indicate other lengths are to follow - int multipleLengths = (docLengths[0] <<1); - data.writeVInt(multipleLengths); -} else { - data.writeVInt(docLengths[i]); -} +data.writeVLong((((long) uncompressedBlockLength) << 4) | numBytes); + +if (numBytes == Integer.BYTES) { + // encode deltas as ints + for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { +data.writeInt(Integer.reverseBytes(docLengths[i])); + } +} else if (numBytes == 1) { + for (int i = 0; i < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK; ++i) { +data.writeByte((byte) docLengths[i]); } +} else if (numBytes != 0) { + throw new AssertionError(); } + maxUncompressedBlockLength = Math.max(maxUncompressedBlockLength, uncompressedBlockLength); -LZ4.compress(block, 0, uncompressedBlockLength, data, ht); + +// Compression proved to hurt latency in some cases, so we're only +// enabling it on long inputs for now. Can we reduce the compression +// overhead and enable compression again, e.g. by building shared +// dictionaries that allow decompressing one value at once instead of +// forcing 32 values to be decompressed even when you only need one? +if (uncompressedBlockLength >= BINARY_LENGTH_COMPRESSION_THRESHOLD * numDocsInCurrentBlock) { + LZ4.compress(block, 0, uncompressedBlockLength, data, highCompHt); +} else { + LZ4.compress(block, 0, uncompressedBlockLength, data, noCompHt); Review comment: Hmm do we know that our new `LZ4.NoCompressionHashTable` is actually really close to doing nothing? I don't understand `LZ4` well enough to know that e.g. `return -1` from `int get (int offset)` method is really a no-op overall...
## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -762,6 +764,97 @@ public BytesRef binaryValue() throws IOException { // Decompresses blocks of binary values to retrieve content class BinaryDecoder { +private final LongValues addresses; +private final IndexInput compressedData; +// Cache of last uncompressed block +private long lastBlockId = -1; +private final ByteBuffer deltas; +private int numBytes; +private int uncompressedBlockLength;
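The core of the new length encoding in the diff above: per-doc lengths are rewritten as deltas between the running offset and the offset expected from the average length, so blocks with near-uniform lengths need at most one byte per entry (or none at all). A standalone sketch of that transform, with toy lengths and plain division in place of the shift the patch uses:

```java
import java.util.Arrays;

// Toy version of the length encoding: store each doc's length as the delta
// between the running offset and the offset expected from the average length.
public class LengthDeltaDemo {
    public static void main(String[] args) {
        int[] docLengths = {5, 5, 6, 4};            // hypothetical per-doc value lengths in one block
        int total = 0;
        for (int len : docLengths) {
            total += len;
        }
        int avgLength = total / docLengths.length;  // the patch uses a shift; plain division here
        int offset = 0;
        for (int i = 0; i < docLengths.length; ++i) {
            offset += docLengths[i];
            // delta of the actual end offset of doc i from the expected end offset
            docLengths[i] = offset - avgLength * (i + 1);
        }
        // Near-uniform lengths yield tiny deltas, so a byte (or nothing) per entry suffices.
        System.out.println(Arrays.toString(docLengths));  // [0, 0, 1, 0]
    }
}
```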
[GitHub] [lucene-solr] tflobbe commented on pull request #1567: LUCENE-9402: Let MultiCollector handle minCompetitiveScore
tflobbe commented on pull request #1567: URL: https://github.com/apache/lucene-solr/pull/1567#issuecomment-642845936 Ah! Good catch, I missed that completely. I'll fix.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438976552 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); - final private Predicate excludedTasks = new Predicate() { + + /** + * Predicate used to filter out tasks from the Zookeeper queue that should not be returned for processing. + */ + final private Predicate excludedTasks = new Predicate<>() { @Override public boolean test(String s) { return runningTasks.contains(s) || blockedTasks.containsKey(s); Review comment: Yes it is. Can likely change this one into a `ConcurrentHashMap.newKeySet()` as well. This is an automated message from the Apache Git Service. 
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438974719 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; Review comment: Yes, will change that.
[GitHub] [lucene-solr] danmuzi commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2
danmuzi commented on pull request #1560: URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642844611 Please do the **"Squash and merge"** below. Your sub-commits will be combined automatically.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438970686 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); Review comment: We'd need a concurrent linked hash map because we need iteration order == insert order... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
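The constraint described above — thread safety plus iteration order equal to insertion order — is why `blockedTasks` wraps a `LinkedHashMap` in `Collections.synchronizedMap` rather than using `ConcurrentHashMap`, which makes no ordering guarantee. A minimal sketch (the map contents and task names are illustrative):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// A thread-safe map whose iteration order is its insertion order,
// which ConcurrentHashMap does not guarantee.
public class InsertionOrderDemo {
    public static void main(String[] args) {
        Map<String, String> blocked = Collections.synchronizedMap(new LinkedHashMap<>());
        blocked.put("task-3", "c");
        blocked.put("task-1", "a");
        blocked.put("task-2", "b");
        // Iteration over a synchronized map must itself be synchronized on the map.
        synchronized (blocked) {
            System.out.println(String.join(",", blocked.keySet()));  // task-3,task-1,task-2
        }
    }
}
```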
[jira] [Resolved] (LUCENE-9397) UniformSplit supports encodable fields metadata
[ https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant resolved LUCENE-9397. Fix Version/s: 8.6 Resolution: Fixed Thanks [~dsmiley] for the review. > UniformSplit supports encodable fields metadata > --- > > Key: LUCENE-9397 > URL: https://issues.apache.org/jira/browse/LUCENE-9397 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.6 > > Time Spent: 20m > Remaining Estimate: 0h > > UniformSplit already supports custom encoding for term blocks. This is an > extension to also support encodable fields metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata
[ https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133535#comment-17133535 ] ASF subversion and git services commented on LUCENE-9397: - Commit ac7bb4a53effcd4e37174e74c89f61187f04fcc0 in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac7bb4a ] LUCENE-9397: UniformSplit supports encodable fields metadata. > UniformSplit supports encodable fields metadata > --- > > Key: LUCENE-9397 > URL: https://issues.apache.org/jira/browse/LUCENE-9397 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > UniformSplit already supports custom encoding for term blocks. This is an > extension to also support encodable fields metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438967387 ## File path: solr/solrj/src/java/org/apache/solr/common/params/CollectionParams.java ## @@ -42,31 +42,30 @@ enum LockLevel { -CLUSTER(0), -COLLECTION(1), -SHARD(2), -REPLICA(3), -NONE(10); - -public final int level; - -LockLevel(int i) { - this.level = i; +NONE(10, null), Review comment: Compiler complained of forward reference when I didn't.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438966974 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java ## @@ -867,26 +866,25 @@ public String getTaskKey(ZkNodeProps message) { } + // -1 is not a possible batchSessionId so -1 will force initialization of lockSession private long sessionId = -1; private LockTree.Session lockSession; @Override - public Lock lockTask(ZkNodeProps message, OverseerTaskProcessor.TaskBatch taskBatch) { -if (lockSession == null || sessionId != taskBatch.getId()) { + public Lock lockTask(ZkNodeProps message, long batchSessionId) { +if (sessionId != batchSessionId) { //this is always called in the same thread. //Each batch is supposed to have a new taskBatch //So if taskBatch changes we must create a new Session - // also check if the running tasks are empty. If yes, clear lockTree - // this will ensure that locks are not 'leaked' - if(taskBatch.getRunningTasks() == 0) lockTree.clear(); Review comment: I hope (and think) it is... A lock can leak if an executor thread dies in a place where it shouldn't be dying (just before the try with the lock released in the finally). Clearing all locks is not a solution IMO. If we do end up with lock leaks we should address those in a more elegant way (fix the leak).
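The leak scenario mentioned above is the standard lock-in-`finally` idiom: once the `try` block is entered the lock is always released, but a thread that dies between acquiring the lock and entering the `try` leaves it held forever. A simplified sketch using `ReentrantLock` (the Solr code uses its own `LockTree` locks, not this class):

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// The acquire/try/finally idiom: release is guaranteed once the try is entered,
// but a thread dying between lock() and try would leak the lock.
public class LockReleaseDemo {
    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        lock.lock();                        // a crash right here, before the try, leaks the lock
        try {
            System.out.println("task running under lock");
        } finally {
            lock.unlock();                  // guaranteed release once the try is entered
        }
        System.out.println(lock.tryLock()); // true: the lock was released above
    }
}
```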
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
murblanc commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438965089 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -253,20 +277,22 @@ public void run() { continue; } - blockedTasks.clear(); // clear it now; may get refilled below. + // clear the blocked tasks, may get refilled below. Given blockedTasks can only get entries from heads and heads + // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS entries. + // Note blockedTasks can't be cleared too early as it is used in the excludedTasks Predicate above. + blockedTasks.clear(); + + // Trigger the creation of a new Session used for locking when/if a lock is later acquired on the OverseerCollectionMessageHandler + batchSessionId++; - taskBatch.batchId++; boolean tooManyTasks = false; for (QueueEvent head : heads) { if (!tooManyTasks) { - synchronized (runningTasks) { tooManyTasks = runningTasksSize() >= MAX_PARALLEL_TASKS; - } } if (tooManyTasks) { // Too many tasks are running, just shove the rest into the "blocked" queue. - if(blockedTasks.size() < MAX_BLOCKED_TASKS) -blockedTasks.put(head.getId(), head); + blockedTasks.put(head.getId(), head); Review comment: Commented line 280 above.
[GitHub] [lucene-solr] zhaih commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2
zhaih commented on pull request #1560: URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642835331 Do I need to squash the commits? Seems commits in Lucene are all squashed? Or it will be done automatically when merging somehow?
[GitHub] [lucene-solr] zhaih commented on pull request #1560: LUCENE-9391: Upgrade HPPC to 0.8.2
zhaih commented on pull request #1560: URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642830815 > Hi Patrick, > Thanks for your contribution 👍 > I found the JIRA-issue number of this PR is wrong. > It should be changed from [LUCENE-8574] to [LUCENE-9391]. > https://issues.apache.org/jira/browse/LUCENE-9391 > Please check it. Oh, thank you for figuring that out! Yeah I picked up a wrong one from my backlog... Thank you very much!
[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order
madrob commented on a change in pull request #1561: URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438907365 ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerMessageHandler.java ## @@ -50,7 +50,7 @@ /**Try to provide an exclusive lock for this particular task * return null if locking is not possible. If locking is not necessary Review comment: This javadoc includes a sentence fragment, can we complete the thought while we're improving documentation in this area? ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; Review comment: Since there is so much synchronized access to this, should it be a `ConcurrentHashMap.newKeySet();` ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. 
+ * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. + */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); - final private Predicate excludedTasks = new Predicate() { + + /** + * Predicate used to filter out tasks from the Zookeeper queue that should not be returned for processing. + */ + final private Predicate excludedTasks = new Predicate<>() { @Override public boolean test(String s) { return runningTasks.contains(s) || blockedTasks.containsKey(s); Review comment: This reference to runningTasks isn't synchronized. Is that an issue? ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -95,16 +95,25 @@ private volatile Stats stats; - // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. - // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not - // deleted from the work-queue as that is a batched operation. + /** + * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue. + * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not + * deleted from the work-queue as that is a batched operation. 
+ */ final private Set runningZKTasks; - // This map may contain tasks which are read from work queue but could not - // be executed because they are blocked or the execution queue is full - // This is an optimization to ensure that we do not read the same tasks - // again and again from ZK. + + /** + * This map may contain tasks which are read from work queue but could not + * be executed because they are blocked or the execution queue is full + * This is an optimization to ensure that we do not read the same tasks + * again and again from ZK. + */ final private Map blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>()); Review comment: Similar here, can this be a ConcurrentHashMap instead of a synchronized map? ## File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ## @@ -253,20 +277,22 @@ public void run() { continue; }
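`ConcurrentHashMap.newKeySet()`, as suggested above for `runningZKTasks`, returns a concurrent `Set` view so that reads like the `contains` call in the `excludedTasks` predicate need no external synchronization. A minimal sketch (the task names are illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A concurrent Set backed by ConcurrentHashMap: safe for unsynchronized
// contains() calls such as the one in the excludedTasks predicate.
public class ConcurrentSetDemo {
    public static void main(String[] args) {
        Set<String> runningZKTasks = ConcurrentHashMap.newKeySet();
        runningZKTasks.add("task-1");
        System.out.println(runningZKTasks.contains("task-1"));  // true
        System.out.println(runningZKTasks.contains("task-2"));  // false
    }
}
```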
[GitHub] [lucene-solr] danmuzi commented on pull request #1560: LUCENE-8574: Upgrade HPPC to 0.8.2
danmuzi commented on pull request #1560: URL: https://github.com/apache/lucene-solr/pull/1560#issuecomment-642827027 Hi Patrick, Thanks for your contribution 👍 I found the JIRA-issue number of this PR is wrong. It should be changed from [LUCENE-8574] to [LUCENE-9391]. https://issues.apache.org/jira/browse/LUCENE-9391 Please check it.
[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133403#comment-17133403 ] Adrien Grand commented on LUCENE-9356: -- I had beasted many iterations but the Elastic CI found a failing seed right after I pushed that is due to the FST constructor, which throws an IllegalStateException when an unexpected byte is read for the input type, so I changed it for a CorruptIndexException. > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test >Reporter: Adrien Grand >Priority: Minor > Fix For: 8.6 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We already have tests that file truncation and modification of the index > headers are caught correctly. I'd like to add another test that flipping a > byte in a way that modifies the checksum of the file is always caught > gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133376#comment-17133376 ] ASF subversion and git services commented on LUCENE-9356: - Commit 8d95a2ee582da04edf419e6b39756fdde55503fc in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d95a2e ] LUCENE-9356: Make FST throw the correct exception upon incorrect input type. > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test >Reporter: Adrien Grand >Priority: Minor > Fix For: 8.6 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We already have tests that file truncation and modification of the index > headers are caught correctly. I'd like to add another test that flipping a > byte in a way that modifies the checksum of the file is always caught > gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant closed pull request #1564: LUCENE-9397: UniformSplit supports encodable fields metadata.
bruno-roustant closed pull request #1564: URL: https://github.com/apache/lucene-solr/pull/1564
[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata
[ https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133372#comment-17133372 ] ASF subversion and git services commented on LUCENE-9397: - Commit 75d25ad6779dec194a2e0ef2a3263ce0fb872cf6 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=75d25ad ] LUCENE-9397: UniformSplit supports encodable fields metadata. > UniformSplit supports encodable fields metadata > --- > > Key: LUCENE-9397 > URL: https://issues.apache.org/jira/browse/LUCENE-9397 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Bruno Roustant >Assignee: Bruno Roustant >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > UniformSplit already supports custom encoding for term blocks. This is an > extension to also support encodable fields metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9356. -- Fix Version/s: 8.6 Resolution: Fixed > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test >Reporter: Adrien Grand >Priority: Minor > Fix For: 8.6 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We already have tests that file truncation and modification of the index > headers are caught correctly. I'd like to add another test that flipping a > byte in a way that modifies the checksum of the file is always caught > gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on pull request #1557: LUCENE-9396: Improve truncation detection for points.
jpountz commented on pull request #1557: URL: https://github.com/apache/lucene-solr/pull/1557#issuecomment-642781876 @rmuir I combined them in an overloaded `retrieveChecksum(IndexInput, long)` variant, what do you think?
[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133367#comment-17133367 ] ASF subversion and git services commented on LUCENE-9356: - Commit d3c74a305ff95f087a6e88953d1ef34e7d71f06f in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d3c74a3 ] LUCENE-9356: Add a test that verifies that Lucene catches bit flips. (#1569) > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > We already have tests that file truncation and modification of the index > headers are caught correctly. I'd like to add another test that flipping a > byte in a way that modifies the checksum of the file is always caught > gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133366#comment-17133366 ] ASF subversion and git services commented on LUCENE-9356: - Commit 36109ec36216141cb0fbf9fb09e9d74721a78bda in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=36109ec ] LUCENE-9356: Add a test that verifies that Lucene catches bit flips. (#1569) > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > We already have tests that file truncation and modification of the index > headers are caught correctly. I'd like to add another test that flipping a > byte in a way that modifies the checksum of the file is always caught > gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz merged pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
jpountz merged pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569
[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 9.0
[ https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133360#comment-17133360 ] ASF subversion and git services commented on SOLR-12823: Commit b4dcbfa3de7c512baab642942320d48fb6f180c4 in lucene-solr's branch refs/heads/master from murblanc [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4dcbfa ] SOLR-12823: fix failures in CloudHttp2SolrClientTest CloudSolrClientTest TestCloudSolrClientConnections (#1565) Co-authored-by: Ilan Ginzburg > remove clusterstate.json in Lucene/Solr 9.0 > --- > > Key: SOLR-12823 > URL: https://issues.apache.org/jira/browse/SOLR-12823 > Project: Solr > Issue Type: Task >Reporter: Varun Thacker >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Time Spent: 4h 40m > Remaining Estimate: 0h > > clusterstate.json is an artifact of a pre 5.0 Solr release. We should remove > that in 9.0 > It stays empty unless you explicitly ask to create the collection with the > old "stateFormat" and there is no reason for one to create a collection with > the old stateFormat. > We should also remove the "stateFormat" argument in create collection > We should also remove MIGRATESTATEVERSION as well > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob merged pull request #1565: SOLR-12823: fix test failures
madrob merged pull request #1565: URL: https://github.com/apache/lucene-solr/pull/1565
[jira] [Comment Edited] (SOLR-14557) eDisMax parser switch + braces regression
[ https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133301#comment-17133301 ] Mikhail Khludnev edited comment on SOLR-14557 at 6/11/20, 2:38 PM: --- Thanks for response , [~dsmiley]. # it seems like bug in syntax parsing # I trust users # they have many old queries in curly braces where they switch different parses (mostly \{!join}) arbitrarily, so defType isn't an option # it seems I achieved what uf does via luceneMatchVersion = 4.5 in config that's I'v got SOLR-11501 notes. So, uf doesn't bring any value to me. Or it should? # So everything seems working (switching \{!parser} inside of edismax query) until users add {{(}} braces {{)}}. So, old query doesn't work for them. It seems like a bug outside of SOLR-11501 or loosely related to it. was (Author: mkhludnev): Thanks for response , [~dsmiley]. # it seems like bug in syntax parsing # I trust users # they have many old queries in curly braces where they switch different parses (mostly \{!join}) arbitrarily, so defType isn't an option # it seems I achieved what uf does via luceneMatchVersion = 4.5 in config that's I'v got SOLR-11501 notes. So, uf doesn't bring any value to me. Or it should? # So everything seems working (switching \{!parser} inside of edismax query) until users add {{(}}braces{{)}}. So, old query doesn't work for them. It seems like a bug outside of SOLR-11501 or loosely related to it. > eDisMax parser switch + braces regression > - > > Key: SOLR-14557 > URL: https://issues.apache.org/jira/browse/SOLR-14557 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Major > Labels: painful > > h2. Solr 4.5 > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > > goes like > {code} > \{!lucene}(foo) > content:foo > LuceneQParser > {code} > fine > h2. 
Solr 8.2 > with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but > it's a question of migrating existing queries. > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > goes like > {code} > "querystring":"\{!lucene}(foo)", > "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene > Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) > "QParser":"ExtendedDismaxQParser", > {code} > blah... > but removing braces in 8.2 works perfectly fine > {code} > "querystring":"\{!lucene}foo", > "parsedquery":"+content:foo", > "parsedquery_toString":"+content:foo", > "QParser":"ExtendedDismaxQParser", > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression
[ https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133301#comment-17133301 ] Mikhail Khludnev commented on SOLR-14557: - Thanks for response , [~dsmiley]. # it seems like bug in syntax parsing # I trust users # they have many old queries in curly braces where they switch different parses (mostly \{!join}) arbitrarily, so defType isn't an option # it seems I achieved what uf does via luceneMatchVersion = 4.5 in config that's I'v got SOLR-11501 notes. So, uf doesn't bring any value to me. Or it should? # So everything seems working (switching \{!parser} inside of edismax query) until users add {{(}}braces{{)}}. So, old query doesn't work for them. It seems like a bug outside of SOLR-11501 or loosely related to it. > eDisMax parser switch + braces regression > - > > Key: SOLR-14557 > URL: https://issues.apache.org/jira/browse/SOLR-14557 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Major > Labels: painful > > h2. Solr 4.5 > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > > goes like > {code} > \{!lucene}(foo) > content:foo > LuceneQParser > {code} > fine > h2. Solr 8.2 > with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but > it's a question of migrating existing queries. > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > goes like > {code} > "querystring":"\{!lucene}(foo)", > "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene > Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) > "QParser":"ExtendedDismaxQParser", > {code} > blah... 
> but removing braces in 8.2 works perfectly fine > {code} > "querystring":"\{!lucene}foo", > "parsedquery":"+content:foo", > "parsedquery_toString":"+content:foo", > "QParser":"ExtendedDismaxQParser", > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
jpountz commented on a change in pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438831834 ## File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.index; + + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; + +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.util.LineFileDocs; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems; +import org.apache.lucene.util.TestUtil; + +/** + * Test that the default codec detects bit flips at open or checkIntegrity time. 
+ */ +@SuppressFileSystems("ExtrasFS") +public class TestAllFilesDetectBitFlips extends LuceneTestCase { + + public void test() throws Exception { +doTest(false); + } + + public void testCFS() throws Exception { +doTest(true); + } + + public void doTest(boolean cfs) throws Exception { +Directory dir = newDirectory(); + +IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random())); +conf.setCodec(TestUtil.getDefaultCodec()); + +if (cfs == false) { + conf.setUseCompoundFile(false); + conf.getMergePolicy().setNoCFSRatio(0.0); +} + +RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf); +// Use LineFileDocs so we (hopefully) get most Lucene features Review comment: This is actually copy-pasted from `TestAllFilesDetectTruncation` :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14557) eDisMax parser switch + braces regression
[ https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133288#comment-17133288 ] David Smiley commented on SOLR-14557: - The issue description is a bit unclear to me in terms of what you are saying is the bug (you filed this as a bug after all). Yes there was a change in SOLR-11501 that will affect what you are trying to do. But what is the bug or problem? For the overall use-case of wanting to parse that lucene query, then pass {{defType=lucene}} instead of edismax. You could instead set {{uf=\*,\_query\_}} if you want _users_ to be able to make this choice if you trust them to :-). This is in the upgrade notes written for SOLR-11501 in CHANGES.txt. > eDisMax parser switch + braces regression > - > > Key: SOLR-14557 > URL: https://issues.apache.org/jira/browse/SOLR-14557 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Major > Labels: painful > > h2. Solr 4.5 > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > > goes like > {code} > \{!lucene}(foo) > content:foo > LuceneQParser > {code} > fine > h2. Solr 8.2 > with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but > it's a question of migrating existing queries. > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > goes like > {code} > "querystring":"\{!lucene}(foo)", > "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene > Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) > "QParser":"ExtendedDismaxQParser", > {code} > blah... 
> but removing braces in 8.2 works perfectly fine > {code} > "querystring":"\{!lucene}foo", > "parsedquery":"+content:foo", > "parsedquery_toString":"+content:foo", > "QParser":"ExtendedDismaxQParser", > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
mikemccand commented on a change in pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438774512 ## File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.index; + + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; + +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.util.LineFileDocs; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems; +import org.apache.lucene.util.TestUtil; + +/** + * Test that the default codec detects bit flips at open or checkIntegrity time. 
+ */ +@SuppressFileSystems("ExtrasFS") +public class TestAllFilesDetectBitFlips extends LuceneTestCase { + + public void test() throws Exception { +doTest(false); + } + + public void testCFS() throws Exception { +doTest(true); + } + + public void doTest(boolean cfs) throws Exception { +Directory dir = newDirectory(); + +IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random())); +conf.setCodec(TestUtil.getDefaultCodec()); + +if (cfs == false) { + conf.setUseCompoundFile(false); + conf.getMergePolicy().setNoCFSRatio(0.0); +} + +RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf); +// Use LineFileDocs so we (hopefully) get most Lucene features Review comment: Woohoo! ## File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.index; + + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; + +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.util.LineFileDocs; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems; +import org.apache.lucene.util.TestUtil; + +/** + * Test that the default codec detects bit flips at open or checkIntegrity time. + */ +@SuppressFileSystems("ExtrasFS") +public class TestAllFilesDetectBitFlips extends LuceneTestCase { + + public void test() throws Exception { +doTest(false); + } + + public void testCFS() throws Exception { +doTest(true); + } + + public void doTest(boolean cfs) throws Exception { +Directory dir = newDirectory(); + +IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random())); +conf.setCodec(TestUtil.getDefaultCodec()); + +if (cfs == false) { + conf.setUseCompoundFile(false); + conf.getMergePolicy().setNoCFSRatio(0.0); +} + +RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf); +// Use LineFileDocs so we (hopefully) get most Lucene feature
[GitHub] [lucene-solr] gerlowskija opened a new pull request #1570: SOLR-14558: Record all log lines in SolrLogPostTool
gerlowskija opened a new pull request #1570: URL: https://github.com/apache/lucene-solr/pull/1570

# Description
Previously, SolrLogPostTool ignored all log lines that didn't fall into a narrow handful of known "types". This change adds a new "other" type to hold all previously-ignored log records. This allows all log records to be searched, not just the whitelisted cluster-traffic event types (queries, commits, etc.).

# Solution
Straightforward implementation.

# Tests
New test added to SolrLogPostToolTest. Manual testing.

# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `ant precommit` and the appropriate test suite.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] jpountz commented on pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
jpountz commented on pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569#issuecomment-642621848 > So we now check checksums on every file when opening We only verify checksums of meta files when opening (those that we read entirely anyway). Checksums only get verified on other files when `LeafReader#checkIntegrity` is called.
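The footer checksums under discussion are plain CRC32 values, which `LeafReader#checkIntegrity` recomputes over each file. As a minimal, self-contained sketch of why the test's corruption should always be catchable — the class name and sample bytes below are hypothetical, not Lucene code — CRC32 detects every error burst shorter than its 32-bit polynomial, so a single modified byte cannot leave the computed checksum unchanged:

```java
import java.util.zip.CRC32;

public class BitFlipDemo {
    // Checksum a byte array with CRC32, the same algorithm Lucene's
    // codec footer uses for whole-file integrity.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Corrupt one byte the way the test under review does: add a
    // non-zero delta so the victim byte is guaranteed to change.
    static byte[] corrupt(byte[] data, int offset, int delta) {
        byte[] copy = data.clone();
        copy[offset] = (byte) (copy[offset] + delta);
        return copy;
    }

    public static void main(String[] args) {
        byte[] original = "segment contents go here".getBytes();
        long expected = checksum(original);
        // Corrupt every offset in turn; the checksum must differ each time,
        // because a one-byte change is an 8-bit error burst, well under
        // CRC32's 32-bit detection guarantee.
        for (int offset = 0; offset < original.length; offset++) {
            long actual = checksum(corrupt(original, offset, 0x5A));
            if (actual == expected) {
                throw new AssertionError("corruption at offset " + offset + " went undetected");
            }
        }
        System.out.println("all single-byte corruptions detected");
    }
}
```

The test's random delta in `[0x01, 0xFF]` plays the same role as the fixed `0x5A` here: it guarantees the victim byte actually changes, which is why the "did not update the checksum" branch in the test should be unreachable for single-byte corruptions of the checksummed region.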
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
jpountz commented on a change in pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438754619 ## File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.index; + + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; + +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.store.BaseDirectoryWrapper; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IOContext; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.util.LineFileDocs; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems; +import org.apache.lucene.util.TestUtil; + +/** + * Test that the default codec detects bit flips at open or checkIntegrity time. 
+ */ +@SuppressFileSystems("ExtrasFS") +public class TestAllFilesDetectBitFlips extends LuceneTestCase { + + public void test() throws Exception { +doTest(false); + } + + public void testCFS() throws Exception { +doTest(true); + } + + public void doTest(boolean cfs) throws Exception { +Directory dir = newDirectory(); + +IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random())); +conf.setCodec(TestUtil.getDefaultCodec()); + +if (cfs == false) { + conf.setUseCompoundFile(false); + conf.getMergePolicy().setNoCFSRatio(0.0); +} + +RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf); +// Use LineFileDocs so we (hopefully) get most Lucene features +// tested, e.g. IntPoint was recently added to it: +LineFileDocs docs = new LineFileDocs(random()); +for (int i = 0; i < 100; i++) { + riw.addDocument(docs.nextDoc()); + if (random().nextInt(7) == 0) { +riw.commit(); + } + if (random().nextInt(20) == 0) { +riw.deleteDocuments(new Term("docid", Integer.toString(i))); + } + if (random().nextInt(15) == 0) { +riw.updateNumericDocValue(new Term("docid", Integer.toString(i)), "docid_intDV", Long.valueOf(i)); + } +} +if (TEST_NIGHTLY == false) { + riw.forceMerge(1); +} +riw.close(); +checkBitFlips(dir); +dir.close(); + } + + private void checkBitFlips(Directory dir) throws IOException { +for(String name : dir.listAll()) { + if (name.equals(IndexWriter.WRITE_LOCK_NAME) == false) { +corruptFile(dir, name); + } +} + } + + private void corruptFile(Directory dir, String victim) throws IOException { +try (BaseDirectoryWrapper dirCopy = newDirectory()) { + dirCopy.setCheckIndexOnClose(false); + + long victimLength = dir.fileLength(victim); + long flipOffset = TestUtil.nextLong(random(), 0, victimLength - 1); + + if (VERBOSE) { +System.out.println("TEST: now corrupt file " + victim + " by changing byte at offset " + flipOffset + " (length= " + victimLength + ")"); + } + + for(String name : dir.listAll()) { +if (name.equals(victim) == false) { + 
dirCopy.copyFrom(dir, name, name, IOContext.DEFAULT); +} else { + try (IndexOutput out = dirCopy.createOutput(name, IOContext.DEFAULT); + IndexInput in = dir.openInput(name, IOContext.DEFAULT)) { + out.copyBytes(in, flipOffset); + out.writeByte((byte) (in.readByte() + TestUtil.nextInt(random(), 0x01, 0xFF))); + out.copyBytes(in, victimLength - flipOffset - 1); + } + try (IndexInput in = dirCopy.openInput(name, IOContext.DEFAULT)) { +try { + CodecUtil.checksumEntireFile(in); + System.out.println("TEST: changing a byte in " + victim + " did not update the checksum)"); Review comment: I haven't seen a single occurrence of it (fortunately! :) ) This is an automated message from the Apache Git Service. To respond to the message, please log on t
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
msokolov commented on a change in pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569#discussion_r438751988

## File path: lucene/core/src/test/org/apache/lucene/index/TestAllFilesDetectBitFlips.java
## @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.store.BaseDirectoryWrapper;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LineFileDocs;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;
+import org.apache.lucene.util.TestUtil;
+
+/**
+ * Test that the default codec detects bit flips at open or checkIntegrity time.
+ */
+@SuppressFileSystems("ExtrasFS")
+public class TestAllFilesDetectBitFlips extends LuceneTestCase {
+
+  public void test() throws Exception {
+    doTest(false);
+  }
+
+  public void testCFS() throws Exception {
+    doTest(true);
+  }
+
+  public void doTest(boolean cfs) throws Exception {
+    Directory dir = newDirectory();
+
+    IndexWriterConfig conf = newIndexWriterConfig(new MockAnalyzer(random()));
+    conf.setCodec(TestUtil.getDefaultCodec());
+
+    if (cfs == false) {
+      conf.setUseCompoundFile(false);
+      conf.getMergePolicy().setNoCFSRatio(0.0);
+    }
+
+    RandomIndexWriter riw = new RandomIndexWriter(random(), dir, conf);
+    // Use LineFileDocs so we (hopefully) get most Lucene features
+    // tested, e.g. IntPoint was recently added to it:
+    LineFileDocs docs = new LineFileDocs(random());
+    for (int i = 0; i < 100; i++) {
+      riw.addDocument(docs.nextDoc());
+      if (random().nextInt(7) == 0) {
+        riw.commit();
+      }
+      if (random().nextInt(20) == 0) {
+        riw.deleteDocuments(new Term("docid", Integer.toString(i)));
+      }
+      if (random().nextInt(15) == 0) {
+        riw.updateNumericDocValue(new Term("docid", Integer.toString(i)), "docid_intDV", Long.valueOf(i));
+      }
+    }
+    if (TEST_NIGHTLY == false) {
+      riw.forceMerge(1);
+    }
+    riw.close();
+    checkBitFlips(dir);
+    dir.close();
+  }
+
+  private void checkBitFlips(Directory dir) throws IOException {
+    for (String name : dir.listAll()) {
+      if (name.equals(IndexWriter.WRITE_LOCK_NAME) == false) {
+        corruptFile(dir, name);
+      }
+    }
+  }
+
+  private void corruptFile(Directory dir, String victim) throws IOException {
+    try (BaseDirectoryWrapper dirCopy = newDirectory()) {
+      dirCopy.setCheckIndexOnClose(false);
+
+      long victimLength = dir.fileLength(victim);
+      long flipOffset = TestUtil.nextLong(random(), 0, victimLength - 1);
+
+      if (VERBOSE) {
+        System.out.println("TEST: now corrupt file " + victim + " by changing byte at offset " + flipOffset + " (length= " + victimLength + ")");
+      }
+
+      for (String name : dir.listAll()) {
+        if (name.equals(victim) == false) {
+          dirCopy.copyFrom(dir, name, name, IOContext.DEFAULT);
+        } else {
+          try (IndexOutput out = dirCopy.createOutput(name, IOContext.DEFAULT);
+               IndexInput in = dir.openInput(name, IOContext.DEFAULT)) {
+            out.copyBytes(in, flipOffset);
+            out.writeByte((byte) (in.readByte() + TestUtil.nextInt(random(), 0x01, 0xFF)));
+            out.copyBytes(in, victimLength - flipOffset - 1);
+          }
+          try (IndexInput in = dirCopy.openInput(name, IOContext.DEFAULT)) {
+            try {
+              CodecUtil.checksumEntireFile(in);
+              System.out.println("TEST: changing a byte in " + victim + " did not update the checksum)");

Review comment: curious if you saw this much?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
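The corruption test above hinges on one property: any single-byte change to a file must change its checksum. As a standalone, hedged illustration of that property (using plain `java.util.zip.CRC32` rather than Lucene's `CodecUtil`; the `BitFlipDemo` class name is made up for this sketch):

```java
import java.util.zip.CRC32;

public class BitFlipDemo {

    // CRC32 checksum over a byte array, analogous in spirit to the
    // per-file checksum that CodecUtil.checksumEntireFile verifies.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = "some index file contents".getBytes();
        long expected = checksum(file);

        // Corrupt a single byte, as the test does: adding a non-zero
        // delta guarantees the byte actually changes (mod 256).
        file[7] += 0x01;

        // CRC32 is linear, so two same-length inputs that differ in a
        // single byte can never share a checksum.
        System.out.println(checksum(file) != expected); // prints "true"
    }
}
```

Because CRC is a linear code, a single-byte difference between two same-length inputs always yields a nonzero syndrome, which is why the test can flip any one byte at a random offset and still expect `checksumEntireFile` to fail.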
[jira] [Created] (SOLR-14559) Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api
Erick Erickson created SOLR-14559: - Summary: Fix or suppress warnings in solr/core/src/java/org/apache/solr/util, response, cloud, security, schema, api Key: SOLR-14559 URL: https://issues.apache.org/jira/browse/SOLR-14559 Project: Solr Issue Type: Sub-task Reporter: Erick Erickson Assignee: Erick Erickson There's considerable overhead in testing and precommit, so fixing up one directory at a time is getting tedious as there are fewer and fewer warnings in particular directories. This set will fix about half the remaining warnings outside of solrj, 300 or so. Then one more Jira will fix the remaining warnings in Solr (exclusive of SolrJ). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9356) Add tests for corruptions caused by byte flips
[ https://issues.apache.org/jira/browse/LUCENE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133199#comment-17133199 ] Adrien Grand commented on LUCENE-9356: -- Thanks to LUCENE-7822 and LUCENE-9359, Lucene now always throws a CorruptIndexException or an IndexFormatToo(Old|New)Exception when opening and then calling checkIntegrity on an index. The attached PR adds a test. > Add tests for corruptions caused by byte flips > -- > > Key: LUCENE-9356 > URL: https://issues.apache.org/jira/browse/LUCENE-9356 > Project: Lucene - Core > Issue Type: Test > Reporter: Adrien Grand > Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > We already have tests verifying that file truncation and modification of the index headers are caught correctly. I'd like to add another test verifying that flipping a byte in a way that modifies the checksum of the file is always caught gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz opened a new pull request #1569: LUCENE-9356: Add a test that verifies that Lucene catches bit flips.
jpountz opened a new pull request #1569: URL: https://github.com/apache/lucene-solr/pull/1569 Opening a reader and then calling checkIntegrity must throw a `CorruptIndexException` or an `IndexFormatToo(Old|New)Exception`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14541) Ensure classes that implement equals implement hashCode or suppress warnings
[ https://issues.apache.org/jira/browse/SOLR-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133187#comment-17133187 ] Erick Erickson commented on SOLR-14541: --- [~murblanc] Thanks. As you can tell, I barely looked for causes; I was just excited that it surfaced after implementing your suggestion (I have to go back and re-do the ones in solrj/io that I just suppressed). I'll wait for [~ab] to weigh in on what the necessary implementation would be. If this were ever used to compute a key, things would get messy, since a leadership change would produce a duplicate entry one way or another for what is conceptually the same node. So returning zero might make sense. > Ensure classes that implement equals implement hashCode or suppress warnings > > > Key: SOLR-14541 > URL: https://issues.apache.org/jira/browse/SOLR-14541 > Project: Solr > Issue Type: Sub-task > Reporter: Erick Erickson > Assignee: Erick Erickson > Priority: Major > Attachments: 0001-SOLR-14541-add-hashCode-for-some-classes.patch, > 0002-SOLR-14541-add-hashCode-for-some-classes-in-autoscal.patch, > 0003-SOLR-14541-add-hashCode-or-remove-equals-for-some-cl.patch > > > While looking at warnings, I found that the following classes generate this warning: > *overrides equals, but neither it nor any superclass overrides hashCode method* > I can suppress the warning, but this has been a source of errors in the past, so I'm reluctant to just do that blindly. > NOTE: The Lucene one should probably be its own Jira if it's going to have hashCode implemented, but it is here for triage. > What I need for each method is for someone who has a clue about that particular code to render an opinion that we can safely suppress the warning or to provide a hashCode method. > Some of these have been here for a very long time and were implemented by people no longer active... 
> lucene/suggest/src/java/org/apache/lucene/search/spell/LuceneLevenshteinDistance.java:39 > solr/solrj/src/java/org/apache/solr/common/cloud/ZkNodeProps.java:34 > solr/solrj/src/java/org/apache/solr/common/cloud/Replica.java:26 > solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:49 > solr/core/src/java/org/apache/solr/cloud/rule/Rule.java:277 > solr/core/src/java/org/apache/solr/pkg/PackageAPI.java:177 > solr/core/src/java/org/apache/solr/packagemanager/SolrPackageInstance.java:31 > > Noble Paul says it's OK to suppress warnings for these: > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/VersionedData.java:31 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:61 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:150 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:252 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/AutoScalingConfig.java:45 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Policy.java:73 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Preference.java:32 > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaInfo.java:39 > > Joel Bernstein says it's OK to suppress warnings for these: > > solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/ReplicaCount.java:27 > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpression.java:25 > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionNamedParameter.java:23 > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java:467 > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/DeepRandomStream.java:417 > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionValue.java:22 > -- This message was sent by Atlassian Jira 
(v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14442) bin/solr to attempt jstack before killing hung Solr instance
[ https://issues.apache.org/jira/browse/SOLR-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev resolved SOLR-14442. - Fix Version/s: 8.6 Resolution: Fixed > bin/solr to attempt jstack before killing hung Solr instance > > > Key: SOLR-14442 > URL: https://issues.apache.org/jira/browse/SOLR-14442 > Project: Solr > Issue Type: Improvement >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Fix For: 8.6 > > Attachments: SOLR-14442.patch, SOLR-14442.patch, SOLR-14442.patch, > screenshot-1.png > > > If a Solr instance did not respond to the 'stop' command in a timely manner > then the {{bin/solr}} script will attempt to forcefully kill it: > [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/solr/bin/solr#L859] > Gathering of information (e.g. a jstack of the java process) before the kill > command may be helpful in determining why the instance did not stop as > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14541) Ensure classes that implement equals implement hashCode or suppress warnings
[ https://issues.apache.org/jira/browse/SOLR-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133111#comment-17133111 ] Ilan Ginzburg commented on SOLR-14541: -- The stack trace you posted, [~erickerickson], is because {{ReplicaInfo}} is stored as a value (not a key!) in a {{HashMap}}, and we happen to take the hashCode of that whole map (the {{properties}} map in {{TriggerEvent.hashCode()}}), which iterates over all entries of the map and computes the hash of each key as well as each value. {{SearchRateEvent}} in {{SearchRateTrigger}}, for example, is a {{TriggerEvent}}; it adds lists of {{ReplicaInfo}} into the {{properties}} map. The issue is not limited to test code. > Ensure classes that implement equals implement hashCode or suppress warnings > > > Key: SOLR-14541 > URL: https://issues.apache.org/jira/browse/SOLR-14541 > Project: Solr > Issue Type: Sub-task > Reporter: Erick Erickson > Assignee: Erick Erickson > Priority: Major > Attachments: 0001-SOLR-14541-add-hashCode-for-some-classes.patch, > 0002-SOLR-14541-add-hashCode-for-some-classes-in-autoscal.patch, > 0003-SOLR-14541-add-hashCode-or-remove-equals-for-some-cl.patch 
-- This message was sent by Atlassian Jira 
(v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
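The warning being chased in this issue has real teeth, as the stack trace discussed above shows: hash containers consult `hashCode()` wherever `equals()` matters. A minimal sketch of what goes wrong when a class overrides `equals` but inherits the identity-based `Object.hashCode` (the `NodeId` class is hypothetical, not Solr code):

```java
import java.util.HashSet;
import java.util.Set;

public class EqualsHashCodeDemo {

    // Overrides equals but NOT hashCode -- exactly the situation the
    // compiler warning flags. hashCode() stays identity-based.
    static final class NodeId {
        final String name;
        NodeId(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof NodeId && ((NodeId) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Set<NodeId> set = new HashSet<>();
        set.add(new NodeId("replica1"));

        NodeId probe = new NodeId("replica1");
        // equals() says the two instances are the same...
        System.out.println(probe.equals(new NodeId("replica1"))); // prints "true"
        // ...but HashSet consults hashCode() first to pick a bucket,
        // and identity hash codes of distinct objects (almost always)
        // differ, so the lookup misses.
        System.out.println(set.contains(probe));
    }
}
```

The same mismatch bites when such objects sit inside a structure whose own `hashCode` walks its contents, as `TriggerEvent.hashCode()` does over its `properties` map: two logically equal events then hash differently.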
[GitHub] [lucene-solr] sigram opened a new pull request #1568: SOLR-14537 Improve performance of ExportWriter
sigram opened a new pull request #1568: URL: https://github.com/apache/lucene-solr/pull/1568 Details in Jira. The initial changes here implement the "double buffering" approach to increase throughput: an additional thread fills one buffer while the main thread writes out the documents from the other buffer. Lucene TermsEnum-s and DocValues are not thread-safe, which is why this change requires the documents to be fully materialized in a buffer before it is handed over to the other thread for writing. I think this is an acceptable tradeoff between a reasonable amount of memory in the buffer and speed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
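The double-buffering scheme the PR describes can be sketched independently of the actual ExportWriter code (all names below are hypothetical): a filler thread fully materializes a batch into a buffer and hands ownership to the writer via a queue, while drained buffers flow back for reuse, so only one thread at a time ever touches a given buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DoubleBufferDemo {

    // Moves `batches` buffers of `bufSize` "documents" from a filler
    // thread to the calling (writer) thread and returns how many
    // documents were written. Two queues ping-pong buffer ownership.
    static int transfer(int batches, int bufSize) throws InterruptedException {
        BlockingQueue<int[]> empty = new ArrayBlockingQueue<>(2);
        BlockingQueue<int[]> full = new ArrayBlockingQueue<>(2);
        empty.put(new int[bufSize]);
        empty.put(new int[bufSize]);

        Thread filler = new Thread(() -> {
            try {
                for (int b = 0; b < batches; b++) {
                    int[] buf = empty.take();        // reuse a drained buffer
                    for (int i = 0; i < buf.length; i++) {
                        buf[i] = i;                  // "materialize" documents
                    }
                    full.put(buf);                   // hand over to the writer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        filler.start();

        int written = 0;
        for (int b = 0; b < batches; b++) {
            int[] buf = full.take();                 // wait for a full buffer
            written += buf.length;                   // "write out" the documents
            empty.put(buf);                          // recycle for refilling
        }
        filler.join();
        return written;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer(3, 4)); // prints 12
    }
}
```

Handing over only fully materialized buffers is what makes this safe when the underlying readers are not thread-safe: the filler alone touches them, and the writer sees only plain arrays.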
[jira] [Updated] (SOLR-14557) eDisMax parser switch + braces regression
[ https://issues.apache.org/jira/browse/SOLR-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-14557: Summary: eDisMax parser switch + braces regression (was: eDisMax (regression)) > eDisMax parser switch + braces regression > - > > Key: SOLR-14557 > URL: https://issues.apache.org/jira/browse/SOLR-14557 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Major > Labels: painful > > h2. Solr 4.5 > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > > goes like > {code} > \{!lucene}(foo) > content:foo > LuceneQParser > {code} > fine > h2. Solr 8.2 > with luceneMatchVersion=4.5 following SOLR-11501 I know it's a grey zone but > it's a question of migrating existing queries. > {{/select?defType=edismax&q=\{!lucene}(foo)&debugQuery=true}} > goes like > {code} > "querystring":"\{!lucene}(foo)", > "parsedquery":"+DisjunctionMaxQuery(((Project.Address:lucene > Project.Address:foo) | (Project.OwnerType:lucene Project.OwnerType:foo) > "QParser":"ExtendedDismaxQParser", > {code} > blah... > but removing braces in 8.2 works perfectly fine > {code} > "querystring":"\{!lucene}foo", > "parsedquery":"+content:foo", > "parsedquery_toString":"+content:foo", > "QParser":"ExtendedDismaxQParser", > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9397) UniformSplit supports encodable fields metadata
[ https://issues.apache.org/jira/browse/LUCENE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133028#comment-17133028 ] Bruno Roustant commented on LUCENE-9397: Currently we use the encoder interface to encrypt term blocks, the FST, and fields metadata. We don't attach more data. However, I'm going to work on LUCENE-9379, a directory-based approach to encryption that would not be tied to a postings format. Eventually we would like to move to that solution. > UniformSplit supports encodable fields metadata > --- > > Key: LUCENE-9397 > URL: https://issues.apache.org/jira/browse/LUCENE-9397 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Bruno Roustant > Assignee: Bruno Roustant > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > UniformSplit already supports custom encoding for term blocks. This is an extension to also support encodable fields metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
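For readers unfamiliar with the pattern under discussion, a bytes-in/bytes-out encoder hook can be sketched as follows. This is not the actual UniformSplit encoder interface: the `BlockEncoder` interface, the class names, and the fixed IV are all illustrative assumptions (a real design derives a fresh IV per block and manages keys properly).

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class BlockEncoderDemo {

    // Hypothetical hook in the spirit of a pluggable block encoder:
    // the format hands it raw term-block (or fields-metadata) bytes
    // and stores whatever comes back.
    interface BlockEncoder {
        byte[] encode(byte[] block) throws Exception;
        byte[] decode(byte[] block) throws Exception;
    }

    // AES/CTR keeps the output the same length as the input, which is
    // convenient for a format that records block lengths up front.
    static final class AesCtrEncoder implements BlockEncoder {
        private final SecretKeySpec key;
        private final IvParameterSpec iv;

        AesCtrEncoder(byte[] key16, byte[] iv16) {
            this.key = new SecretKeySpec(key16, "AES");
            this.iv = new IvParameterSpec(iv16); // fixed IV: demo only!
        }

        private byte[] run(int mode, byte[] block) throws Exception {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, key, iv);
            return cipher.doFinal(block);
        }

        @Override public byte[] encode(byte[] block) throws Exception {
            return run(Cipher.ENCRYPT_MODE, block);
        }

        @Override public byte[] decode(byte[] block) throws Exception {
            return run(Cipher.DECRYPT_MODE, block);
        }
    }

    public static void main(String[] args) throws Exception {
        BlockEncoder enc = new AesCtrEncoder(new byte[16], new byte[16]);
        byte[] metadata = "fields metadata bytes".getBytes();
        byte[] stored = enc.encode(metadata);
        // Round trip restores the original bytes.
        System.out.println(Arrays.equals(metadata, enc.decode(stored))); // prints "true"
    }
}
```

A directory-level approach, as mentioned for LUCENE-9379, would instead wrap the `Directory`'s inputs and outputs so every file is encrypted uniformly, with no per-format hook needed.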