[jira] [Created] (LUCENE-9824) Hunspell suggestions: speed up ngram score calculation for each dictionary entry
Peter Gromov created LUCENE-9824: Summary: Hunspell suggestions: speed up ngram score calculation for each dictionary entry Key: LUCENE-9824 URL: https://issues.apache.org/jira/browse/LUCENE-9824 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-15185. - Fix Version/s: master (9.0) Resolution: Fixed > Improve "hash" QParser > -- > > Key: SOLR-15185 > URL: https://issues.apache.org/jira/browse/SOLR-15185 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h > Remaining Estimate: 0h > > * Don't use Filter (to be removed) > * Do use TwoPhaseIterator, not PostFilter > * Don't pre-compute matching docs (wasteful) > * Support more fields, and more field types > * Faster hash on Strings (avoid Char conversion) > * Stronger hash when using multiple fields
[jira] [Commented] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295761#comment-17295761 ] ASF subversion and git services commented on SOLR-15185: Commit ddbd3b88ec8a9c3acc55e351f94f370a11f514b5 in lucene-solr's branch refs/heads/master from David Smiley [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ddbd3b8 ] SOLR-15185: Optimize Hash QParser (#1524) used in parallel() streaming expression. Hash algorithm is different. * Simpler * Don't use Filter (to be removed) * Do use TwoPhaseIterator, not PostFilter * Don't pre-compute matching docs (wasteful) > * Support more fields, and more field types * Faster hash on Strings (avoid Char conversion) * Stronger hash when using multiple fields > Improve "hash" QParser > -- > > Key: SOLR-15185 > URL: https://issues.apache.org/jira/browse/SOLR-15185 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > * Don't use Filter (to be removed) > * Do use TwoPhaseIterator, not PostFilter > * Don't pre-compute matching docs (wasteful) > * Support more fields, and more field types > * Faster hash on Strings (avoid Char conversion) > * Stronger hash when using multiple fields
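One bullet above, "stronger hash when using multiple fields", can be illustrated with a small sketch. The class and constants below are illustrative assumptions, not Solr's actual HashQParser code: the point is that a polynomial accumulator makes the combined hash sensitive to which field produced which value, whereas a plain XOR of per-field hashes would collide when two fields swap values.

```java
import java.util.List;

// Hypothetical sketch of combining per-field hashes into one stronger hash.
public class MultiFieldHash {
    // Mix each field's hash into an accumulator so that swapping values
    // between fields changes the combined hash (unlike a plain XOR).
    static int combine(List<Integer> fieldHashes) {
        int h = 17;
        for (int fh : fieldHashes) {
            h = 31 * h + fh;
        }
        return h;
    }

    public static void main(String[] args) {
        int ab = combine(List.of("a".hashCode(), "b".hashCode()));
        int ba = combine(List.of("b".hashCode(), "a".hashCode()));
        System.out.println(ab != ba); // order matters, so these differ
    }
}
```

A symmetric combiner such as `hashA ^ hashB` would map the documents `{f1:"a", f2:"b"}` and `{f1:"b", f2:"a"}` to the same shard, which is the kind of weakness "stronger hash when using multiple fields" presumably addresses.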
[GitHub] [lucene-solr] dsmiley merged pull request #1524: SOLR-15185: Rewrite Hash query
dsmiley merged pull request #1524: URL: https://github.com/apache/lucene-solr/pull/1524 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14660) Migrating HDFS into a package
[ https://issues.apache.org/jira/browse/SOLR-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295758#comment-17295758 ] David Smiley commented on SOLR-14660: - When the build.gradle is created for this contrib, please try to undo the dependency flattening that's in most modules only because it was ported from Ant. Basically SOLR-14929 but just scoped to this new contrib. This means removing all/most of the {{transitive=false}} and runtime dependencies, then figuring out which deps are added unnecessarily so they can be excluded. So for example, I'm seeing that Netty will be transitively included, and thus won't need a mention in build.gradle. BTW in SOLR-15215 I'm removing Netty from SolrJ, and in so doing I had to explicitly reference Netty in solr-core's build because it will no longer come in automatically via SolrJ. But ideally, hadoop deps would not say transitive=false, so I wouldn't have had to do this. Woodstox is a dependency of hadoop that solr-core will continue to provide for a while. Jackson -- same. There's a help/dependencies.txt file that is helpful. > Migrating HDFS into a package > - > > Key: SOLR-14660 > URL: https://issues.apache.org/jira/browse/SOLR-14660 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Priority: Major > Labels: package, packagemanager > > Following up on the deprecation of HDFS (SOLR-14021), we need to work on > isolating it away from Solr core and making a package for this. This issue is > to track the efforts for that.
[GitHub] [lucene-solr] zacharymorn commented on pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
zacharymorn commented on pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#issuecomment-791146478 No problem, and thanks for the review feedback as well Michael!
[jira] [Commented] (SOLR-15217) rename shardsWhitelist and use it more broadly
[ https://issues.apache.org/jira/browse/SOLR-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295720#comment-17295720 ] David Smiley commented on SOLR-15217: - Maybe use it as an alternative for "allowSolrUrls" in CrossCollectionJoinQParser, maybe in ReplicationHandler. I suppose its current location is not bad, but the top level (outside of {{}}) would be better. I'm doubtful it's worth moving it though. I don't love that "shards" is in its name... I'd even prefer using the name chosen by CrossCollectionJoinQParser: allowSolrUrls. > rename shardsWhitelist and use it more broadly > -- > > Key: SOLR-15217 > URL: https://issues.apache.org/jira/browse/SOLR-15217 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Priority: Major > > The {{shardsWhitelist}} is defined on the shardHandlerFactory element in > solr.xml. We should rename it to something like "shardsAllowList". And we > could use it in more places. > https://solr.apache.org/guide/8_7/distributed-requests.html#configuring-the-shardhandlerfactory
[jira] [Created] (SOLR-15217) rename shardsWhitelist and use it more broadly
David Smiley created SOLR-15217: --- Summary: rename shardsWhitelist and use it more broadly Key: SOLR-15217 URL: https://issues.apache.org/jira/browse/SOLR-15217 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: David Smiley The {{shardsWhitelist}} is defined on the shardHandlerFactory element in solr.xml. We should rename it to something like "shardsAllowList". And we could use it in more places. https://solr.apache.org/guide/8_7/distributed-requests.html#configuring-the-shardhandlerfactory
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295703#comment-17295703 ] Mark Robert Miller commented on SOLR-14788: --- Most of the issues with my desire to do this revolve around the time available to me, both for the work itself and for the future, and around being able to force myself to do it again, or that often. On top of that, having lost major work on it and feeling time pressure from multiple angles, mostly fearing I wouldn't be able to keep concentrating like that, I got set up to be in a vulnerable state. That was fine in general for what I was prepared for, but then the unexpected hit me harder than it should have, and so that cost me a ton and made it all even more uncertain. And I just had to keep doubling down. A forced concentration march. I say all that only to try to explain why I'll now say: ignore most of what I said. I was trying to get here, to a point like this, really the whole time. And if I'd gotten there, I'd have now said: I've got this branch state. It's got great characteristics, lots of performance and efficiency, tests that can be fast and solid, lots of stuff. I wish I could have shared this with someone else, or just the knowledge to get there together, or time planning together. But I'm as good at that as I've always been, so I only have, and had, the same thing to offer there. But one other thing I can do is this. And so what can be done with it is an interesting, open, and hard question. A few of us are going to keep pushing on it, not in terms of more changes, but toward its final state. In the meantime, I can imagine future brainstorming about what we might do with it if people end up having an interest in some of the results. 
> Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} > duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* *occasionally*{color} *down for some > vigilante justice, but I won't be walking the beat, all that stuff about sit > back and relax goes out the window.*_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. 
> Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues.
[jira] [Commented] (LUCENE-9823) SynonymQuery rewrite can change field boost calculation
[ https://issues.apache.org/jira/browse/LUCENE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295653#comment-17295653 ] Robert Muir commented on LUCENE-9823: - +1 sounds like this rewrite is unsafe. > SynonymQuery rewrite can change field boost calculation > --- > > Key: LUCENE-9823 > URL: https://issues.apache.org/jira/browse/LUCENE-9823 > Project: Lucene - Core > Issue Type: Bug >Reporter: Julie Tibshirani >Priority: Minor > > SynonymQuery accepts a boost per term, which acts as a multiplier on the term > frequency in the document. When rewriting a SynonymQuery with a single term, > we create a BoostQuery wrapping a TermQuery. This changes the meaning of the > boost: it now multiplies the final TermQuery score instead of multiplying the > term frequency before it's passed to the score calculation. > This is a small point, but maybe it's worth avoiding rewriting a single-term > SynonymQuery unless the boost is 1.0. > The same consideration affects CombinedFieldQuery in sandbox.
[jira] [Resolved] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-9822. - Fix Version/s: master (9.0) Resolution: Fixed Thanks [~gsmiller]! > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version.
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295647#comment-17295647 ] ASF subversion and git services commented on LUCENE-9822: - Commit 8e337ab63fac9aeeaf76e91c698cabad0ccbe769 in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8e337ab ] LUCENE-9822: Assert that ForUtil.BLOCK_SIZE can be PFOR-encoded in a single byte For/PFor code has BLOCK_SIZE=128 as a static final constant, with a lot of assumptions and optimizations for that case. For example it will encode 3 exceptions at most and optimizes the exception encoding with a single byte. This would not work at all if you changed the constant in the code to something like 512, but an assertion at an early stage helps make experimentation less painful, and better "documents" the assumption of how the exception encoding currently works. > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version.
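The commit above describes guarding the single-byte "patch offset" encoding with an early assertion. The following is a minimal sketch of that idea with invented names (PForUtil's real code differs): since a patch offset addresses a position inside the block using one byte, the block size must stay within what one byte can address, and the check should fail loudly rather than let incorrect positions be encoded silently.

```java
// Hypothetical sketch, not the actual PForUtil source: document the
// single-byte assumption with an early, loud check so that changing
// BLOCK_SIZE during experimentation fails fast instead of corrupting data.
public class PForAssumption {
    static final int BLOCK_SIZE = 128; // the default codec's block size

    static {
        // A patch offset is a position within the block encoded in one byte,
        // so the block size must not exceed 256.
        if (BLOCK_SIZE > 256) {
            throw new AssertionError("BLOCK_SIZE must be encodable in a single byte");
        }
    }

    public static void main(String[] args) {
        System.out.println("BLOCK_SIZE=" + BLOCK_SIZE + " fits in one byte");
    }
}
```

A static-initializer check like this runs once at class load, which matches the "assert this assumption early" intent of the issue.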
[jira] [Created] (LUCENE-9823) SynonymQuery rewrite can change field boost calculation
Julie Tibshirani created LUCENE-9823: Summary: SynonymQuery rewrite can change field boost calculation Key: LUCENE-9823 URL: https://issues.apache.org/jira/browse/LUCENE-9823 Project: Lucene - Core Issue Type: Bug Reporter: Julie Tibshirani SynonymQuery accepts a boost per term, which acts as a multiplier on the term frequency in the document. When rewriting a SynonymQuery with a single term, we create a BoostQuery wrapping a TermQuery. This changes the meaning of the boost: it now multiplies the final TermQuery score instead of multiplying the term frequency before it's passed to the score calculation. This is a small point, but maybe it's worth avoiding rewriting a single-term SynonymQuery unless the boost is 1.0. The same consideration affects CombinedFieldQuery in sandbox.
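The difference the issue describes can be made concrete with a toy saturating score function. This is a hedged illustration, not Lucene's actual similarity code: `saturate` stands in for any tf-saturating scoring function (BM25-like), and the two expressions show that applying the boost to the term frequency differs from applying it to the final score whenever the boost is not 1.0.

```java
// Toy model of tf saturation: score = tf / (tf + k). Illustrative only.
public class BoostSemantics {
    static double saturate(double tf, double k) {
        return tf / (tf + k);
    }

    public static void main(String[] args) {
        double tf = 3.0, k = 1.2, boost = 0.5;
        // SynonymQuery semantics: the boost scales tf before saturation.
        double tfScaled = saturate(tf * boost, k);
        // BoostQuery(TermQuery) semantics: the boost scales the final score.
        double scoreScaled = boost * saturate(tf, k);
        System.out.println(tfScaled != scoreScaled); // the two disagree
        // They coincide only when boost == 1.0, which is why the issue
        // suggests skipping the rewrite unless the boost is exactly 1.0.
    }
}
```

Because saturation is nonlinear, scaling its input is not the same as scaling its output, so the rewrite silently changes ranking for boosts other than 1.0.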
[GitHub] [lucene-solr] tflobbe commented on pull request #2456: SOLR-15216 Fix for Invalid Reference to data.followers in Admin UI
tflobbe commented on pull request #2456: URL: https://github.com/apache/lucene-solr/pull/2456#issuecomment-791027598 @deanpearce, do you want to add an entry to CHANGES.txt in the 8.9 section?
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295633#comment-17295633 ] Robert Muir commented on LUCENE-9754: - This tokenizer splits on scripts because it lets you customize the tokenization per-script by design. The reason is some writing systems need different approaches... or even choices. See LUCENE-7393 for a great example. So it is just like the "notes" section says, and I will quote: {quote} Normally word breaking does not require breaking between different scripts. However, adding that capability may be useful in combination with other extensions of word segmentation. {quote} And that is what we do, for that exact reason. I am guessing it confuses you because it seems to break all kinds of "rules" (e.g. don't break between letters). If you want a simple state-machine based on those rules without fancy stuff, again I recommend using StandardTokenizer instead. This tokenizer is quite different, it will do entirely different algorithms depending on the writing system. And you can customize that with rules and options (e.g. break Myanmar with ICU word dictionary or with syllables) > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. 
If the writing systems differ, you get three tokens. > -If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document.- _(This inconsistency was included as a side issue that I thought > might add more weight to the main problem I am concerned with, but it seems > to be more of a distraction. Chunking issues should perhaps be addressed in a > different ticket, so I'm striking it out.)_ > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure.
[jira] [Updated] (SOLR-15216) Invalid JS Object Key data.followers.currentData
[ https://issues.apache.org/jira/browse/SOLR-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dean Pearce updated SOLR-15216: --- Affects Version/s: 8.8 8.8.1 > Invalid JS Object Key data.followers.currentData > > > Key: SOLR-15216 > URL: https://issues.apache.org/jira/browse/SOLR-15216 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.7, 8.8, 8.8.1 >Reporter: Dean Pearce >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Minor bug in the Admin UI Angular code, a line was changed to ` > settings.currentTime = parseDateToEpoch(data.follower.currentDate);` but the > underlying API still refers to `data.slave`. I believe this is fixed in the > master stream as the migration to the new leader/follower naming was > complete, but is broken in 8.x (8.7 and onwards).
[jira] [Commented] (SOLR-15216) Invalid JS Object Key data.followers.currentData
[ https://issues.apache.org/jira/browse/SOLR-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295628#comment-17295628 ] Dean Pearce commented on SOLR-15216: Opened a PR on GitHub with the minimum viable patch for the 8.x stream: https://github.com/apache/lucene-solr/pull/2456 > Invalid JS Object Key data.followers.currentData > > > Key: SOLR-15216 > URL: https://issues.apache.org/jira/browse/SOLR-15216 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 8.7 >Reporter: Dean Pearce >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Minor bug in the Admin UI Angular code, a line was changed to ` > settings.currentTime = parseDateToEpoch(data.follower.currentDate);` but the > underlying API still refers to `data.slave`. I believe this is fixed in the > master stream as the migration to the new leader/follower naming was > complete, but is broken in 8.x (8.7 and onwards).
[GitHub] [lucene-solr] deanpearce opened a new pull request #2456: SOLR-15216 Fix for Invalid Reference to data.followers in Admin UI
deanpearce opened a new pull request #2456: URL: https://github.com/apache/lucene-solr/pull/2456 # Description Minor bug in the Admin UI Angular code introduced when switching to followers terminology; the underlying API for 8.x still refers to them as slave. In the master 9.x branch this is resolved, as the full migration to the new terminology is complete, but any future 8.x builds would have this issue. This bug prevents the Replication UI for Legacy Replication from loading if polling is enabled and there has been a successful run. # Solution Changed to use the correct JavaScript attribute. # Tests Compiled and ran the UI against my development instance, verified that the UI loads correctly. # Checklist Please review the following and check all that apply: - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [X] I have created a Jira issue and added the issue ID to my pull request title. - [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295622#comment-17295622 ] Trey Jones commented on LUCENE-9754: I appreciate that this is frustrating, and I’m sorry that we seem to be frustrating each other. You seem to feel that I am not listening to what you have to say, which is no surprise, since I feel that you are not listening to what I have to say. Can we try again to meet somewhere in the middle? {quote}That's because this tokenizer first divides on scripts {quote} I’m trying my best to hear what you are saying here. The current behavior is the result of the tokenizer splitting on scripts before splitting on spaces. This does in fact completely explain the output we see in the _p 3a π 3a_ example. However, what the tokenizer _does_ and what the tokenizer is _supposed to do_ are not necessarily the same thing. I read your comments as offering the Word Boundary Rules and related Notes from Annex 29 as justification for the tokenizer’s behavior. I read over them, and I don’t see a justification there. Rather, I see a specific concrete example of what _*not*_ to do—splitting _3a_—yet the tokenizer seems to do exactly that. So, I do actually like your answer, but I don’t like the question that goes with it, which seems to be, “Why does the tokenizer do that?” *The question I’m trying to ask is, “Is this what the tokenizer _should_ do?”* My opinion is obviously that this is not what it should do—but opinions can differ. My reading of the documentation you suggested is _also_ that this is not what the tokenizer should do. I’m willing to accept the possibility that I have read UAX29 and WB10 and the example given there incorrectly, but I’m going to need a little help seeing it. 
Your previous comments have not provided the elucidation that I seek: {quote}That's because this tokenizer first divides on scripts {quote} This explains why it behaves as it does, not why that is the desired behavior. {quote}You can find more discussions on that in Notes section of [https://unicode.org/reports/tr29/#Word_Boundary_Rules] {quote} These rules and notes seem to contradict the behavior of the tokenizer. {quote}I think this tokenizer works behind-the-scenes differently than you imagine {quote} I believe that I understand what it does—as you said, it divides on scripts—but that doesn’t explain why that is the right thing to do. {quote}the rules you see don't mean what you might infer {quote} I infer that _3a,_ the example given in the rules, should not be split. If that is the wrong inference, please make some small attempt to explain _why,_ rather than implying that I don’t understand, or telling me _what_ the tokenizer does to get this behavior, which seems no less incorrect for being explainable. I hope we can give this one more go and find a productive consensus on whether the current tokenizer behavior is correct, and if so, some insight into why. Thanks for the time you've put into this discussion. > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. 
> In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > -If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document.- _(This inconsistency was included as a side issue that I thought > might add more weight to the main problem I am concerned with, but it seems > to be more of a distraction. Chunking issues should perhaps be addressed in a > different ticket, so I'm striking it out.)_ > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_
[jira] [Created] (SOLR-15216) Invalid JS Object Key data.followers.currentData
Dean Pearce created SOLR-15216: -- Summary: Invalid JS Object Key data.followers.currentData Key: SOLR-15216 URL: https://issues.apache.org/jira/browse/SOLR-15216 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Affects Versions: 8.7 Reporter: Dean Pearce Minor bug in the Admin UI Angular code, a line was changed to ` settings.currentTime = parseDateToEpoch(data.follower.currentDate);` but the underlying API still refers to `data.slave`. I believe this is fixed in the master stream as the migration to the new leader/follower naming was complete, but is broken in 8.x (8.7 and onwards).
[jira] [Commented] (SOLR-2852) SolrJ doesn't need woodstox jar
[ https://issues.apache.org/jira/browse/SOLR-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295620#comment-17295620 ] David Smiley commented on SOLR-2852: Time to do this for 9.0 in at least SolrJ? > SolrJ doesn't need woodstox jar > --- > > Key: SOLR-2852 > URL: https://issues.apache.org/jira/browse/SOLR-2852 > Project: Solr > Issue Type: Improvement > Components: clients - java >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > > The /dist/solrj-lib/ directory contains wstx-asl-3.2.7.jar (Woodstox StAX > API). SolrJ doesn't actually have any type of dependency on this library. > The maven build doesn't have it as a dependency and the tests pass. Perhaps > Woodstox is faster than the JDK's StAX, I don't know, but I find that point > quite moot since SolrJ can use the efficient binary format. Woodstox is not > a small library either, weighing in at 524KB, and of course if someone > actually wants to use it, they can. > I propose woodstox be removed as a SolrJ dependency. I am *not* proposing it > be removed as a Solr WAR dependency since it is actually required there due > to an obscure XSLT issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
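The point that SolrJ callers can fall back to the JDK's bundled StAX implementation can be sketched with only `javax.xml.stream` on the classpath; the element counting below is an illustrative exercise, not SolrJ code:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    // Count <str> elements using the JDK's built-in StAX parser.
    // No Woodstox jar is required for this to run.
    public static int countStrElements(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "str".equals(reader.getLocalName())) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><str name=\"status\">OK</str><str name=\"qt\">standard</str></response>";
        System.out.println(countStrElements(xml)); // prints 2
    }
}
```

Since Java 6 the JDK ships a StAX implementation, so dropping the Woodstox jar only changes which provider `XMLInputFactory.newInstance()` discovers.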
[jira] [Commented] (LUCENE-9687) Hunspell support improvements
[ https://issues.apache.org/jira/browse/LUCENE-9687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295619#comment-17295619 ] ASF subversion and git services commented on LUCENE-9687: - Commit 231e3afe0691e55403d297f99778736798726acb in lucene-solr's branch refs/heads/master from Peter Gromov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=231e3af ] LUCENE-9687: Hunspell suggestions: reduce work in the findSimilarDictionaryEntries loop (#2451) The loop is called a lot of times, and some allocations and method calls can be spared > Hunspell support improvements > - > > Key: LUCENE-9687 > URL: https://issues.apache.org/jira/browse/LUCENE-9687 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Peter Gromov >Priority: Major > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > I'd like Lucene's Hunspell support to be on a par with the native C++ > Hunspell for spellchecking and suggestions, at least for some languages. So I > propose to: > * support the affix rules necessary for English, German, French, Spanish and > Russian dictionaries, possibly more languages later > * mirror Hunspell's suggestion algorithm in Lucene > * provide public APIs for spellchecking, suggestion, stemming, > morphological data > * check corpora for specific languages to find and fix > spellchecking/suggestion discrepancies between Lucene's implementation and > Hunspell/C++ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] rmuir merged pull request #2451: LUCENE-9687: Hunspell suggestions: reduce work in the findSimilarDictionaryEntries loop
rmuir merged pull request #2451: URL: https://github.com/apache/lucene-solr/pull/2451 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15215) SolrJ: Remove needless Netty dependency
David Smiley created SOLR-15215: --- Summary: SolrJ: Remove needless Netty dependency Key: SOLR-15215 URL: https://issues.apache.org/jira/browse/SOLR-15215 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Reporter: David Smiley Assignee: David Smiley SolrJ depends on Netty transitively via ZooKeeper. But ZooKeeper's Netty dependency should be considered optional -- you have to opt-in. BTW it's only needed in Solr-core because of Hadoop/HDFS which ought to move to a contrib and take this dependency with it over there. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
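As a sketch of what "opt-in" looks like for downstream builds even today (the coordinates and version below are illustrative, not prescriptive), a Gradle consumer can already drop the Netty that arrives transitively via ZooKeeper:

```groovy
dependencies {
    // Pull in SolrJ but exclude the Netty that arrives transitively through
    // ZooKeeper; ZooKeeper treats Netty as optional, so plain SolrJ usage
    // does not need it on the classpath.
    implementation('org.apache.solr:solr-solrj:8.8.1') {
        exclude group: 'io.netty'
    }
}
```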
[jira] [Updated] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trey Jones updated LUCENE-9754: --- Description: The tokenization of strings like _14th_ with the ICU tokenizer is affected by the character that comes before preceding whitespace. For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 14 | th. In general, in a letter-space-number-letter sequence, if the writing system before the space is the same as the writing system after the number, then you get two tokens. If the writing systems differ, you get three tokens. -If the conditions are just right, the chunking that the ICU tokenizer does (trying to split on spaces to create <4k chunks) can create an artificial boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the unexpected split of the second token (_14th_). Because chunking changes can ripple through a long document, editing text or the effects of a character filter can cause changes in tokenization thousands of lines later in a document.- _(This inconsistency was included as a side issue that I thought might add more weight to the main problem I am concerned with, but it seems to be more of a distraction. Chunking issues should perhaps be addressed in a different ticket, so I'm striking it out.)_ My guess is that some "previous character set" flag is not reset at the space, and numbers are not in a character set, so _t_ is compared to _ァ_ and they are not the same—causing a token split at the character set change—but I'm not sure. was: The tokenization of strings like _14th_ with the ICU tokenizer is affected by the character that comes before preceding whitespace. For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | 14 | th. In general, in a letter-space-number-letter sequence, if the writing system before the space is the same as the writing system after the number, then you get two tokens. 
If the writing systems differ, you get three tokens. If the conditions are just right, the chunking that the ICU tokenizer does (trying to split on spaces to create <4k chunks) can create an artificial boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the unexpected split of the second token (_14th_). Because chunking changes can ripple through a long document, editing text or the effects of a character filter can cause changes in tokenization thousands of lines later in a document. My guess is that some "previous character set" flag is not reset at the space, and numbers are not in a character set, so _t_ is compared to _ァ_ and they are not the same—causing a token split at the character set change—but I'm not sure. > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before > preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > -If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). 
Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document.- _(This inconsistency was included as a side issue that I thought > might add more weight to the main problem I am concerned with, but it seems > to be more of a distraction. Chunking issues should perhaps be addressed in a > different ticket, so I'm striking it out.)_ > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
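The "first divides on scripts" behavior discussed on this ticket can be made concrete with the JDK's `Character.UnicodeScript`, which is only a rough stand-in for the tokenizer's ICU script data, not the actual ICU code: Latin letters share one script, _ァ_ is Katakana, and digits resolve to the Common script, so which run a digit sequence joins depends on its neighbors:

```java
public class ScriptDemo {
    // Report the Unicode script property of a character. Digits belong to
    // COMMON, so a script-run segmenter must attach them to whatever script
    // context it is carrying, which is where the ambiguity arises.
    public static Character.UnicodeScript scriptOf(char c) {
        return Character.UnicodeScript.of(c);
    }

    public static void main(String[] args) {
        System.out.println(scriptOf('x'));  // LATIN
        System.out.println(scriptOf('ァ')); // KATAKANA
        System.out.println(scriptOf('1'));  // COMMON
        System.out.println(scriptOf('t'));  // LATIN
    }
}
```

In "_x 14th_" every non-Common character is Latin, so there is a single script run; in "_ァ 14th_" the Common digits sit between a Katakana character and Latin letters, so a script boundary can fall inside _14th_.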
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295595#comment-17295595 ] Mark Robert Miller commented on SOLR-14788: --- Ok, the "Mark conquers severe ADD in the face of a myriad of ongoing setbacks, lost work, team changes, relationship effects and trials" phase is complete. Whew, now I know why I don't attempt these things. That's the end of my phases. The milestone in December gave me some peace of mind, this gives me some conclusion, what is to come is not on my shoulders in anything but a normal and standard way, maybe step away a bit first and let the overworked mind settle. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} > duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. 
When off duty, I'm always* *occasionally*{color} *down for some > vigilante justice, but I won't be walking the beat, all that stuff about sit > back and relax goes out the window.*_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295591#comment-17295591 ] Thomas Wöckinger commented on SOLR-15213: - So, you want to save the RTG query by id of the child document? More information can be found [SOLR-15064|https://issues.apache.org/jira/browse/SOLR-15064] > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already by merging if it is > present and inserting if it isn't. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. 
> I've got some changes (see patch) that add a new option, "merge", which > checks based on the id and merges the new document with the old, with a fall > back to add if there is no id match. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
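A minimal sketch of the proposed "merge" semantics, with plain maps standing in for child documents (this is an illustration of the idea, not the attached patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {
    // Sketch of "merge": if an existing child shares the incoming child's id,
    // overlay the incoming fields onto it; otherwise fall back to "add".
    public static List<Map<String, Object>> merge(List<Map<String, Object>> existing,
                                                  Map<String, Object> incoming) {
        List<Map<String, Object>> result = new ArrayList<>();
        boolean matched = false;
        for (Map<String, Object> child : existing) {
            if (child.get("id").equals(incoming.get("id"))) {
                Map<String, Object> merged = new HashMap<>(child);
                merged.putAll(incoming); // incoming field values win
                result.add(merged);
                matched = true;
            } else {
                result.add(child); // untouched siblings are kept as-is
            }
        }
        if (!matched) {
            result.add(incoming); // no id match: behave like "add"
        }
        return result;
    }
}
```

Under this scheme, re-sending "fish1" with a new `name_s` rewrites only that child, while a child with an unseen id is appended, which is exactly the gap the existing "add"/"set"/"add-distinct" operations leave open.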
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295585#comment-17295585 ] Robert Muir commented on LUCENE-9754: - I explained here what happens in the first comment, but you didn't like my answer. Please re-read my answer, especially "That's because this tokenizer first divides on scripts". > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document. > My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley opened a new pull request #2455: Make SolrInputField name optional
dsmiley opened a new pull request #2455: URL: https://github.com/apache/lucene-solr/pull/2455 Prevents other bugs by failing fast. Very minor change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds
[ https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295568#comment-17295568 ] Dawid Weiss commented on SOLR-14759: Made documentation task work for Solr. It still emits invalid links (and link checkers fail) but at least it passes and generates something. I think the follow-up will have to happen after the split when changes can be made to templates (removing direct links to Lucene changes, etc.). > Separate the Lucene and Solr builds > --- > > Key: SOLR-14759 > URL: https://issues.apache.org/jira/browse/SOLR-14759 > Project: Solr > Issue Type: Sub-task > Components: Build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > While still in same git repo, separate the builds, so Lucene and Solr can be > built independently. > The preparation step includes optional building of just Lucene from current > master (prior to any code removal): > Current status of joint and separate builds: > * (/) joint build > {code} > gradlew assemble check > {code} > * (/) Lucene-only > {code} > gradlew -Dskip.solr=true assemble check > {code} > * (/) Solr-only (with documentation exclusions) > {code} > gradlew -Dskip.lucene=true assemble check -x test -x checkBrokenLinks -x > checkLocalJavadocLinksSite > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14759) Separate the Lucene and Solr builds
[ https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-14759: --- Description: While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal): Current status of joint and separate builds: * (/) joint build {code} gradlew assemble check {code} * (/) Lucene-only {code} gradlew -Dskip.solr=true assemble check {code} * (/) Solr-only (with documentation exclusions) {code} gradlew -Dskip.lucene=true assemble check -x test -x checkBrokenLinks -x checkLocalJavadocLinksSite {code} was: While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal): Current status of joint and separate builds: * (/) joint build {code} gradlew assemble check {code} * (/) Lucene-only {code} gradlew -Dskip.solr=true assemble check {code} * (/) Solr-only (with documentation exclusions) {code} gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite {code} > Separate the Lucene and Solr builds > --- > > Key: SOLR-14759 > URL: https://issues.apache.org/jira/browse/SOLR-14759 > Project: Solr > Issue Type: Sub-task > Components: Build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > While still in same git repo, separate the builds, so Lucene and Solr can be > built independently. 
> The preparation step includes optional building of just Lucene from current > master (prior to any code removal): > Current status of joint and separate builds: > * (/) joint build > {code} > gradlew assemble check > {code} > * (/) Lucene-only > {code} > gradlew -Dskip.solr=true assemble check > {code} > * (/) Solr-only (with documentation exclusions) > {code} > gradlew -Dskip.lucene=true assemble check -x test -x checkBrokenLinks -x > checkLocalJavadocLinksSite > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295564#comment-17295564 ] Joel Bernstein commented on SOLR-15185: --- The *parallel* Streaming Expression is what is using this currently. > Improve "hash" QParser > -- > > Key: SOLR-15185 > URL: https://issues.apache.org/jira/browse/SOLR-15185 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > * Don't use Filter (to be removed) > * Do use TwoPhaseIterator, not PostFilter > * Don't pre-compute matching docs (wasteful) > * Support more fields, and more field types > * Faster hash on Strings (avoid Char conversion) > * Stronger hash when using multiple fields -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
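As a sketch of why the *parallel* expression needs such a hash, the routine below shows the invariant a partitioning hash must provide: combine the configured field values and map every document deterministically to one of N workers. This is an illustration only; Solr's actual hash function differs:

```java
public class HashPartitionSketch {
    // Simplified sketch of hash partitioning: route a document to one of
    // `workers` partitions by hashing its routing-field values. Equal values
    // must always land on the same worker so each worker sees a disjoint,
    // complete slice of the result set.
    public static int partition(int workers, String... fieldValues) {
        int h = 0;
        for (String v : fieldValues) {
            h = 31 * h + v.hashCode(); // fold multiple fields into one hash
        }
        return Math.floorMod(h, workers); // non-negative partition index
    }

    public static void main(String[] args) {
        // Stable routing: the same key always maps to the same partition.
        System.out.println(partition(4, "ocean1") == partition(4, "ocean1")); // prints true
    }
}
```

The ticket's "stronger hash when using multiple fields" point corresponds to the folding step: a weak combiner makes distinct field-value tuples collide onto the same worker more often than necessary.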
[GitHub] [lucene-solr] magibney commented on pull request #2449: SOLR-15045: Execute local leader commit in parallel with distributed commits in DistributedZkUpdateProcessor
magibney commented on pull request #2449: URL: https://github.com/apache/lucene-solr/pull/2449#issuecomment-790865983 fwiw, I think the gradle precommit is failing on a nocommit comment marking a question about why TOLEADER distrib commit errors aren't propagated back to the client ... not really an issue with this PR _per se_. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15214) Provision an optional client configurable ZK port
[ https://issues.apache.org/jira/browse/SOLR-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohan Ganpatye updated SOLR-15214: -- Description: Expose an optional parameter that defines the port to start the ZK server on. This might be particularly helpful with testing infrastructure by having the ability to specify what port ZK server should use to initialize instead of the current default assignment or have to manually edit via zoo.cfg. Note: The new optional ZK port parameter is applicable with `zkRun` when used to run embedded ZooKeeper with Solr. was: Expose an optional parameter that defines the port to start the ZK server on. This might be particularly helpful with testing infrastructure by having the ability to specify what port ZK server should use to initialize instead of the current default assignment or have to manually edit via zoo.cfg. > Provision an optional client configurable ZK port > - > > Key: SOLR-15214 > URL: https://issues.apache.org/jira/browse/SOLR-15214 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.8.1 >Reporter: Rohan Ganpatye >Priority: Minor > Labels: ZooKeeper > > Expose an optional parameter that defines the port to start the ZK server on. > This might be particularly helpful with testing infrastructure by having the > ability to specify what port ZK server should use to initialize instead of > the current default assignment or have to manually edit via zoo.cfg. > Note: The new optional ZK port parameter is applicable with `zkRun` when used > to run embedded ZooKeeper with Solr. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15214) Provision an optional client configurable ZK port
Rohan Ganpatye created SOLR-15214: - Summary: Provision an optional client configurable ZK port Key: SOLR-15214 URL: https://issues.apache.org/jira/browse/SOLR-15214 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 8.8.1 Reporter: Rohan Ganpatye Expose an optional parameter that defines the port to start the ZK server on. This might be particularly helpful with testing infrastructure by having the ability to specify what port ZK server should use to initialize instead of the current default assignment or have to manually edit via zoo.cfg. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman opened a new pull request #2454: Test docker change, just to see if the docker github action works
HoustonPutman opened a new pull request #2454: URL: https://github.com/apache/lucene-solr/pull/2454 Ignore this. Just testing the github action. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] balmukundblr commented on a change in pull request #2345: Benchmark custom
balmukundblr commented on a change in pull request #2345: URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r587723043 ## File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java ## @@ -102,19 +104,43 @@ public void close() throws IOException { public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException { Path f = null; String name = null; -synchronized (this) { - if (nextFile >= inputFiles.size()) { -// exhausted files, start a new round, unless forever set to false. -if (!forever) { - throw new NoMoreDataException(); -} -nextFile = 0; -iteration++; - } - f = inputFiles.get(nextFile++); - name = f.toRealPath() + "_" + iteration; +int inputFilesSize = inputFiles.size(); + +/* + * synchronized (this) { + * if (nextFile >= inputFiles.size()) { // exhausted files, start a new round, unless forever set to false. + * if (!forever) { + *throw new NoMoreDataException(); + * } + * nextFile = 0; + * iteration++; + * } + * f = inputFiles.get(nextFile++); + * name = f.toRealPath() + "_" +iteration; + * } + */ +if (!threadIndexCreated) { + createThreadIndex(); +} + +int index = (int) Thread.currentThread().getId() % threadIndex.length; +int fIndex = index + threadIndex[index] * threadIndex.length; +threadIndex[index]++; Review comment: Although getId() is controlled by the JVM, in our case all threadIndex entries are initialized at once. Hence, there is a high chance of getting a guaranteed sequence of thread ids, as we also observed. However, we understand your concern and tweaked our code in such a way that it is guaranteed to reach every possible int from 0 .. threadIndex.length. We achieved this by setting a unique thread name and parsing it to calculate the index value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
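The fix described in the review reply (derive the per-thread index from an explicitly assigned thread name rather than from `Thread.getId()`) can be sketched as follows; the `worker-<k>` naming convention here is a hypothetical stand-in for whatever names the benchmark assigns:

```java
public class ThreadIndexSketch {
    // Parse a worker index out of a thread name of the form "worker-<k>".
    // Unlike Thread.getId() % n, indexes assigned this way are guaranteed
    // to cover every value in 0..n-1 exactly once.
    public static int indexFromName(String threadName) {
        return Integer.parseInt(threadName.substring(threadName.lastIndexOf('-') + 1));
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        boolean[] seen = new boolean[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            // Assign the index-bearing name at creation time.
            threads[i] = new Thread(
                () -> seen[indexFromName(Thread.currentThread().getName())] = true,
                "worker-" + i);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        for (boolean b : seen) System.out.println(b); // every index was reached
    }
}
```

The design point is that `Thread.getId()` values are a JVM implementation detail: taken modulo `n` they may collide or skip slots, whereas names chosen by the pool give a total, collision-free mapping.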
[jira] [Commented] (LUCENE-9754) ICU Tokenizer: letter-space-number-letter tokenized inconsistently
[ https://issues.apache.org/jira/browse/LUCENE-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295504#comment-17295504 ] Trey Jones commented on LUCENE-9754: Any chance there is an update/reply coming? I know everyone is very busy, but I very much appreciate the conversation so far, and I'd like to understand why the way _3a_ is tokenized following _p_ vs _π_ is the expected behavior, so—as I said—I can understand it and explain it to other people on my end. Thanks! > ICU Tokenizer: letter-space-number-letter tokenized inconsistently > -- > > Key: LUCENE-9754 > URL: https://issues.apache.org/jira/browse/LUCENE-9754 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.5 > Environment: Tested most recently on Elasticsearch 6.5.4. >Reporter: Trey Jones >Priority: Major > Attachments: LUCENE-9754_prototype.patch > > > The tokenization of strings like _14th_ with the ICU tokenizer is affected by > the character that comes before preceding whitespace. > For example, _x 14th_ is tokenized as x | 14th; _ァ 14th_ is tokenized as ァ | > 14 | th. > In general, in a letter-space-number-letter sequence, if the writing system > before the space is the same as the writing system after the number, then you > get two tokens. If the writing systems differ, you get three tokens. > If the conditions are just right, the chunking that the ICU tokenizer does > (trying to split on spaces to create <4k chunks) can create an artificial > boundary between the tokens (e.g., between _ァ_ and _14th_) and prevent the > unexpected split of the second token (_14th_). Because chunking changes can > ripple through a long document, editing text or the effects of a character > filter can cause changes in tokenization thousands of lines later in a > document. 
> My guess is that some "previous character set" flag is not reset at the > space, and numbers are not in a character set, so _t_ is compared to _ァ_ and > they are not the same—causing a token split at the character set change—but > I'm not sure. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] balmukundblr commented on a change in pull request #2345: Benchmark custom
balmukundblr commented on a change in pull request #2345: URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r587707919 ## File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java ## @@ -146,4 +172,11 @@ public synchronized void resetInputs() throws IOException { nextFile = 0; iteration = 0; } + + private synchronized void createThreadIndex() { +if (!threadIndexCreated) { Review comment: Sure, will do the required changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] balmukundblr commented on a change in pull request #2345: Benchmark custom
balmukundblr commented on a change in pull request #2345: URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r587707735

## File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
## @@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {

Review comment: Sure, will delete the commented codes.
[jira] [Resolved] (SOLR-15191) Faceting on EnumFieldType does not work if allBuckets, numBuckets or missing is set
[ https://issues.apache.org/jira/browse/SOLR-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-15191. - Resolution: Fixed > Faceting on EnumFieldType does not work if allBuckets, numBuckets or missing > is set > --- > > Key: SOLR-15191 > URL: https://issues.apache.org/jira/browse/SOLR-15191 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, FacetComponent, faceting, search, > streaming expressions >Affects Versions: 8.7, 8.8, 8.8.1 >Reporter: Thomas Wöckinger >Assignee: David Smiley >Priority: Major > Labels: easy-fix, pull-request-available > Fix For: 8.9 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Due to SOLR-14514, FacetFieldProcessorByEnumTermsStream is not used if the > allBuckets, numBuckets or missing param is true. > As fallback, FacetFieldProcessorByHashDV is used, which calls > FacetRangeProcessor.getNumericCalc(sf) on the field. EnumFieldType is not > handled currently, so a SolrException is thrown with BAD_REQUEST and > 'Expected numeric field type'
[GitHub] [lucene-solr] balmukundblr commented on a change in pull request #2345: Benchmark custom
balmukundblr commented on a change in pull request #2345: URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r587706698

## File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
## @@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     *   if (nextFile >= inputFiles.size()) { // exhausted files, start a new round, unless forever set to false.
+     *     if (!forever) {
+     *       throw new NoMoreDataException();
+     *     }
+     *     nextFile = 0;
+     *     iteration++;
+     *   }
+     *   f = inputFiles.get(nextFile++);
+     *   name = f.toRealPath() + "_" + iteration;
+     * }
+     */
+    if (!threadIndexCreated) {

Review comment: Sure, will do the required changes.
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295492#comment-17295492 ] Greg Miller commented on LUCENE-9822: - Yeah, interesting. Looking at the code, we're packing the number of bits used per entry along with the number of patches in a single byte. Because we max out at 32 bits/entry, we can encode the number of bits/entry in 5 bits, leaving 3 more for the number of patches. Seems like an interesting experiment to bring in one more byte for encoding the number of patches, significantly raising the ceiling on how many entries we can patch in. Just a quick thought from looking at the code, but I'll see if I can dig into the literature a little. > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version.
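The header byte Greg describes (5 bits for bits-per-entry, 3 bits for the patch count) can be sketched as a tiny standalone class. This is an illustrative reconstruction, not Lucene's actual PForUtil code; the class name and the minus-one bias (used so the value 32 fits in 5 bits) are assumptions for the sketch:

```java
// Illustrative sketch of packing bits-per-entry (1..32) and a patch count (0..7)
// into one header byte. Not Lucene's actual layout.
public class PatchHeader {
  static byte pack(int bitsPerValue, int numPatches) {
    if (bitsPerValue < 1 || bitsPerValue > 32) throw new IllegalArgumentException();
    if (numPatches < 0 || numPatches > 7) throw new IllegalArgumentException();
    // Store bitsPerValue - 1 so that the full range 1..32 fits in 5 bits.
    return (byte) (((bitsPerValue - 1) << 3) | numPatches);
  }

  static int bitsPerValue(byte header) { return ((header >> 3) & 0x1F) + 1; }

  static int numPatches(byte header) { return header & 0x7; }

  public static void main(String[] args) {
    byte h = pack(32, 3);
    System.out.println(bitsPerValue(h) + " bits/entry, " + numPatches(h) + " patches");
  }
}
```

With only 3 bits, the patch count tops out at 7, which is why spending an extra header byte, as suggested above, would significantly raise that ceiling.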
[GitHub] [lucene-solr] dsmiley commented on pull request #2438: SOLR-14928: add exponential backoff for distributed cluster state updates
dsmiley commented on pull request #2438: URL: https://github.com/apache/lucene-solr/pull/2438#issuecomment-790820910 I review what I commit locally in my IDE tooling. I think IntelliJ does a nice job of this.
[jira] [Commented] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295490#comment-17295490 ] David Smiley commented on SOLR-15185: - Okay. Given that most users of this are indirect users via streaming expressions (I presume), can you recommend how I might say that... i.e. _what_ part/expression is affected here? Such users would not even know this optimization affects them otherwise. > Improve "hash" QParser > -- > > Key: SOLR-15185 > URL: https://issues.apache.org/jira/browse/SOLR-15185 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > * Don't use Filter (to be removed) > * Do use TwoPhaseIterator, not PostFilter > * Don't pre-compute matching docs (wasteful) > * Support more fields, and more field types > * Faster hash on Strings (avoid Char conversion) > * Stronger hash when using multiple fields
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295480#comment-17295480 ] Adrien Grand commented on LUCENE-9822: -- I think that the number 3 came from me looking at query throughput vs. size of the .doc/.pos files for our Wikipedia dataset and figuring out the best trade-off.
[GitHub] [lucene-solr] sigram opened a new pull request #2453: SOLR-15210: ParallelStream should execute hashing & filtering directly in ExportWriter
sigram opened a new pull request #2453: URL: https://github.com/apache/lucene-solr/pull/2453 See Jira for details.
[jira] [Commented] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295448#comment-17295448 ] Joel Bernstein commented on SOLR-15185: --- I don't think we need to bother unless you feel like including it under optimizations.
[jira] [Commented] (SOLR-15185) Improve "hash" QParser
[ https://issues.apache.org/jira/browse/SOLR-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295407#comment-17295407 ] David Smiley commented on SOLR-15185: - [~jbernste] can you please recommend CHANGES.txt and/or ref guide upgrade notes pertaining to the hash changing? Or maybe don't bother if nobody would care? RE the perf change; I'll just be vague to say it's more efficient.
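The "Stronger hash when using multiple fields" improvement listed for SOLR-15185 hints at a classic pitfall worth illustrating: combining per-field hashes with plain XOR is both order-independent and self-cancelling (two fields holding equal values hash to 0). The sketch below is illustrative only, not Solr's actual implementation; both helper names are invented:

```java
// Why XOR-combining per-field hashes is weak, and why an order-dependent
// polynomial mix avoids the problem. Illustrative only.
public class FieldHashCombine {
  static int xorCombine(int... hashes) {
    int h = 0;
    for (int x : hashes) h ^= x; // order-independent; equal pairs cancel to 0
    return h;
  }

  static int mixCombine(int... hashes) {
    int h = 17;
    for (int x : hashes) h = 31 * h + x; // order-dependent, no cancellation
    return h;
  }

  public static void main(String[] args) {
    int a = "foo".hashCode();
    // Two fields with the same value: XOR collapses every such doc to hash 0.
    System.out.println(xorCombine(a, a)); // always 0
    System.out.println(mixCombine(a, a)); // depends on the value, as desired
  }
}
```

With XOR, any document whose two hashed fields carry equal values lands in the same partition regardless of the value, which skews parallel() partitioning; an order-dependent mix distributes those documents normally.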
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295404#comment-17295404 ] Michael McCandless commented on LUCENE-9822: +1, no unit test needed for one-line {{assert}} addition. Thanks [~gsmiller]! {quote}But if you are trying to do something like blocksize=512, seems like you would need to allow for more exceptions (e.g. 12 or something) for the patching to be effective for general purposes. Maybe worth checking literature as I don't know off the top of my head where these numbers (128, 3) etc came from. {quote} +1 – seems (naively) like the number of exceptions should probably grow linearly? We could probably make some crazy offline tool that gathers all the ints we are encoding into a given index and then measures what compression we could achieve with different numbers of patched exceptions.
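The linear-growth intuition above can be stated as a tiny hypothetical helper. Nothing like this exists in the codec; it is only arithmetic on the numbers quoted in this thread (3 exceptions per 128-value block):

```java
// Hypothetical: scale the patch-exception budget linearly with block size,
// anchored at the current 128 -> 3 ratio discussed in this thread.
public class PatchBudget {
  static int maxExceptions(int blockSize) {
    return Math.max(1, (3 * blockSize) / 128); // 128 -> 3, 256 -> 6, 512 -> 12
  }

  public static void main(String[] args) {
    for (int bs : new int[] {128, 256, 512}) {
      System.out.println(bs + " -> " + maxExceptions(bs));
    }
  }
}
```

Note that a budget of 12 at blockSize=512 no longer fits a 3-bit patch-count field (max 7), which ties back to the idea of spending an extra header byte on the count.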
[jira] [Commented] (LUCENE-3320) Explore Proximity Scoring
[ https://issues.apache.org/jira/browse/LUCENE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295393#comment-17295393 ] Tomoko Uchida commented on LUCENE-3320: --- Thanks [~mikemccand] for the pointer! I think this will bring great improvement especially for long queries or natural language queries. I'd need proximity scoring for a project I'm currently working on... will give it a try. > Explore Proximity Scoring > -- > > Key: LUCENE-3320 > URL: https://issues.apache.org/jira/browse/LUCENE-3320 > Project: Lucene - Core > Issue Type: Sub-task > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Priority: Major > Fix For: Positions Branch > > > Positions will be first class citizens sooner rather than later. We should > explore proximity scoring possibilities as well as collection / scoring > algorithms like proposed on LUCENE-2878 (2 phase collection) > This paper might provide some basis for actual scoring implementation: > http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
[jira] [Updated] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Ashbourne updated SOLR-15213: --- Description: Solr has "add", "set", "add-distinct" which work but all have their limitations. Namely, there's currently no way to atomically update a document where that document may or may not be present already by merging if it is present and inserting if it isn't. i.e. in the scenario where we have a document with two nested children: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "Doe", "_isParent":"false"}, { "id": "fish2", "type_s": "fish", "name_s": "Hans", "_isParent":"false"}] }{noformat} If we later want to update that child doc e.g.: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "James", // new name "_isParent":"false"}, ] }{noformat} Existing operations: - "add" - will add another nested doc with the same id leaving us with two children with the same id. - "set" - replaces the whole list of child docs with the single doc, we could use this but would first have to fetch all the existing children. - "add-distinct" - will reject the update based on the doc already being present. I've got some changes (see patch) that add a new option "merge" which checks based on the id and merges the new document with the old, with a fallback to add if there is no id match. was: Solr has "add", "set", "add-distinct" which work but all have their limitations. Namely, there's currently no way to atomically update a document where that document may or may not be present already and merge if it is present. i.e. in the scenario where we have a document with two nested children: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "Doe", "_isParent":"false"}, { "id": "fish2", "type_s": "fish", "name_s": "Hans", "_isParent":"false"}] }{noformat} If we later want to update that child doc e.g.: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "James", // new name "_isParent":"false"}, ] }{noformat} Existing operations: - "add" - will add another nested doc with the same id leaving us with two children with the same id. - "set" - replaces the whole list of child docs with the single doc, we could use this but would first have to fetch all the existing children. - "add-distinct" - will reject the update based on the doc already being present. I've got some changes (see patch) that add a new option "merge" which checks based on the id and merges the new document with the old, with a fallback to add if there is no id match. > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already by merging if it is > present and inserting if it isn't. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that add a new option "merge" which checks > based on the id and merges the new document with the old, with a fallback to > add if there is no id match.
[jira] [Comment Edited] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295388#comment-17295388 ] Endika Posadas edited comment on SOLR-15213 at 3/4/21, 4:24 PM: The problem with directly updating the child document using '_route_' is that there's no "upsert" mechanism. If the parent doesn't contain a child with the same id then it will fail the request. e.g.: {noformat} "msg":"Did not find child ID fish1 in parent ocean1" {noformat} . By allowing a merge mechanism a child can either be inserted or updated with a single request. was (Author: enpos): The problem with directly updating the child document using '_route_' is that there's no "upsert" mechanism. If the parent doesn't contain a child with the same id then it will fail the request. e.g.: {noformat} "msg":"Did not find child ID fish1 in parent ocean1" {noformat} . By allowing a merge mechanism a child can either be inserted or updated.
[jira] [Comment Edited] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295388#comment-17295388 ] Endika Posadas edited comment on SOLR-15213 at 3/4/21, 4:23 PM: The problem with directly updating the child document using '_route_' is that there's no "upsert" mechanism. If the parent doesn't contain a child with the same id then it will fail the request. e.g.: {noformat} "msg":"Did not find child ID fish1 in parent ocean1" {noformat} . By allowing a merge mechanism a child can either be inserted or updated. was (Author: enpos): The problem with directly updating the child document using '_route_' is that there's no "upsert" mechanism. If the parent doesn't contain a child with the same id then it will fail the request. e.g.: `"msg":"Did not find child ID fish1 in parent ocean1"`. By allowing a merge mechanism a child can either be inserted or updated.
[jira] [Commented] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295388#comment-17295388 ] Endika Posadas commented on SOLR-15213: --- The problem with directly updating the child document using '_route_' is that there's no "upsert" mechanism. If the parent doesn't contain a child with the same id then it will fail the request. e.g.: `"msg":"Did not find child ID fish1 in parent ocean1"`. By allowing a merge mechanism a child can either be inserted or updated.
[jira] [Commented] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295383#comment-17295383 ] James Ashbourne commented on SOLR-15213: [~thomas.woeckinger] you're right '_root_' works for updating if you know that child exists already but in some cases you don't know if you have already added that child. "merge" would be a update if present or insert if not.
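The proposed "merge" semantics described in this thread (update the matching child by id, otherwise fall back to adding it) can be sketched in plain Java. This is only an illustration of the behavior under discussion, not the patch's actual code; all names and structure here are invented:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch of "merge" upsert semantics for a parent's child-document list.
public class MergeChildren {
  static List<Map<String, Object>> merge(List<Map<String, Object>> existing,
                                         Map<String, Object> incoming) {
    List<Map<String, Object>> out = new ArrayList<>();
    boolean matched = false;
    for (Map<String, Object> child : existing) {
      if (Objects.equals(child.get("id"), incoming.get("id"))) {
        Map<String, Object> merged = new LinkedHashMap<>(child);
        merged.putAll(incoming); // incoming fields win on conflict
        out.add(merged);
        matched = true;
      } else {
        out.add(child); // untouched children are preserved
      }
    }
    if (!matched) out.add(incoming); // no id match: fall back to add (upsert)
    return out;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> fish = new ArrayList<>();
    fish.add(new LinkedHashMap<>(Map.of("id", "fish1", "name_s", "Doe")));
    fish.add(new LinkedHashMap<>(Map.of("id", "fish2", "name_s", "Hans")));
    System.out.println(merge(fish, Map.of("id", "fish1", "name_s", "James")));
  }
}
```

Unlike "add" (duplicates the id) or "set" (requires re-sending all children), this keeps unmatched children intact while updating or inserting the one addressed by id.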
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295367#comment-17295367 ] Robert Muir commented on LUCENE-9822: - Looks good. The single byte assumption reminds me though, with such huge block-sizes, the patching may not even work very well without changing how the class works completely. Currently it allows 3 exceptions for blocks of 128 so that 3 large values don't blow compression up for the whole block. But if you are trying to do something like blocksize=512, seems like you would need to allow for more exceptions (e.g. 12 or something) for the patching to be effective for general purposes. Maybe worth checking literature as I don't know off the top of my head where these numbers (128, 3) etc came from.
[jira] [Comment Edited] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295297#comment-17295297 ] Greg Miller edited comment on LUCENE-9822 at 3/4/21, 3:40 PM: -- I think this is just a one-liner in the PForUtil ctor. Patch uploaded. I verified this works on a local branch I have set up for 512 block sizes. Can't think of a good way to add unit testing around this, though, since the BLOCK_SIZE definition is static/final. was (Author: gsmiller): I think this is just a one-liner in the PForUtil ctor. Patch uploaded. > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295297#comment-17295297 ] Greg Miller edited comment on LUCENE-9822 at 3/4/21, 3:39 PM: -- I think this is just a one-liner in the PForUtil ctor. Patch uploaded. was (Author: gsmiller): I think this is just a one-liner in the PForUtil ctor. I'll attach a patch shortly. > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
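The one-liner Greg describes presumably just asserts that every "patch offset" — a position in the range 0..BLOCK_SIZE-1 — fits in a single unsigned byte. A Python sketch of the same invariant (names and exact bound are illustrative, not Lucene's actual code):

```python
def assert_block_size_fits_one_byte(block_size):
    # Patch offsets are positions within the block, so the largest possible
    # offset (block_size - 1) must fit in one unsigned byte (0..255).
    # Failing loudly here is the point of the patch: otherwise PForUtil
    # would silently encode truncated, incorrect positions.
    if block_size - 1 > 0xFF:
        raise AssertionError(
            f"BLOCK_SIZE={block_size}: patch offsets no longer fit in one byte")

assert_block_size_fits_one_byte(128)  # fine for the current block size
```

With a hypothetical 512-value block the check raises immediately, which is the early, visible failure the issue asks for.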
[jira] [Commented] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
[ https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295358#comment-17295358 ] Bruno Roustant commented on SOLR-15038: --- I reverted this specific line in both master and branch_8x. > Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to > elevation functionality > > > Key: SOLR-15038 > URL: https://issues.apache.org/jira/browse/SOLR-15038 > Project: Solr > Issue Type: Improvement > Components: query >Reporter: Tobias Kässmann >Priority: Minor > Fix For: 8.9 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We've worked a lot with Query Elevation component in the last time and we > were missing two features: > * Elevate only documents that are part of the search result > * In combination with collapsing: Only show the representative if the > elevated documents does have the same collapse field value. > Because of this, we've added these two feature toggles > _elevateDocsWithoutMatchingQ_ and _onlyElevatedRepresentative._ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-9822: Attachment: LUCENE-9822.patch > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > Attachments: LUCENE-9822.patch > > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
[ https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295354#comment-17295354 ] ASF subversion and git services commented on SOLR-15038: Commit e791fb00a9452081d417d43fb7713d95ca73663b in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e791fb0 ] SOLR-15038: Restore read-only permission in security.policy > Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to > elevation functionality > > > Key: SOLR-15038 > URL: https://issues.apache.org/jira/browse/SOLR-15038 > Project: Solr > Issue Type: Improvement > Components: query >Reporter: Tobias Kässmann >Priority: Minor > Fix For: 8.9 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We've worked a lot with Query Elevation component in the last time and we > were missing two features: > * Elevate only documents that are part of the search result > * In combination with collapsing: Only show the representative if the > elevated documents does have the same collapse field value. > Because of this, we've added these two feature toggles > _elevateDocsWithoutMatchingQ_ and _onlyElevatedRepresentative._ > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295323#comment-17295323 ] Thomas Wöckinger edited comment on SOLR-15213 at 3/4/21, 3:04 PM: -- You simply update only the child document, and you should use '\_root\_' field to get the right shard, your scenario is already working. was (Author: thomas.woeckinger): You simply update only the child document, and you should use '_root_' field to get the right shard, your scenario is already working. > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already and merge if it is > present. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. 
> - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that add a new option "merge" which checks > based on the id and merges the new document with the old, with a fallback to > add if there is no id match. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295323#comment-17295323 ] Thomas Wöckinger commented on SOLR-15213: - You simply update only the child document, and you should use '_root_' field to get the right shard, your scenario is already working. > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already and merge if it is > present. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. 
> I've got some changes (see patch) that add a new option "merge" which checks > based on the id and merges the new document with the old, with a fallback to > add if there is no id match. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
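Thomas's suggestion in this thread — update only the child document and route via `_root_` — might look like the payload sketched below. This is an illustration, not code from the issue: the field names follow the example docs above, `_root_` carries the parent id ("ocean1") so the request reaches the shard holding the block, and "set" is the standard atomic-update operation.

```python
import json

# Hypothetical atomic update targeting only the child document "fish1".
payload = [{
    "id": "fish1",
    "_root_": "ocean1",          # parent id, used for shard routing
    "name_s": {"set": "James"},  # atomic "set" on the child's field
}]

print(json.dumps(payload))
```

Whether this fully covers the merge-or-add case James wants (it requires knowing the child already exists) is exactly the point under discussion.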
[jira] [Commented] (SOLR-13071) Add JWT Auth support in bin/solr
[ https://issues.apache.org/jira/browse/SOLR-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295322#comment-17295322 ] David Eric Pugh commented on SOLR-13071: I've heard of some CLIs that actually pop open a browser window to do the authentication, and then, I think by running a local webserver, capture the redirect, which then lets you get the authorization code and use that to get the access_token. Having said that, I haven't found an example written in Java of a CLI doing this, and I'm not sure that I could grok how to do that from scratch. So what you are suggesting is we just document how to get the access_token, and then have someone put that in a file that is read? That seems easier, and/or a good first step. > Add JWT Auth support in bin/solr > > > Key: SOLR-13071 > URL: https://issues.apache.org/jira/browse/SOLR-13071 > Project: Solr > Issue Type: Improvement > Components: scripts and tools >Reporter: Jan Høydahl >Priority: Major > > Once SOLR-12121 gets in, we should add support to {{bin/solr}} start scripts > so they can authenticate with Solr using a JWT token. A preferred way would > perhaps be through {{solr.in.sh}} and add new > {noformat} > SOLR_AUTH_TYPE=token > SOLR_AUTHENTICATION_OPTS=-DjwtToken= > {noformat} > A disadvantage with this method is that the user needs to know how to obtain > the token, and the token needs to be long-lived. A more sophisticated way > would be a {{bin/solr auth login}} command that opens a browser window with > the IDP login screen and saves the short-lived access token and optionally > refresh token, in the file system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
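The browser-based flow David describes boils down to catching one redirect on a loopback address and pulling the `code` query parameter out of it, then exchanging that code for the access_token. A minimal sketch of just the parsing step (the callback URL and port are made up for illustration):

```python
from urllib.parse import urlparse, parse_qs

def extract_auth_code(redirect_url):
    # The local callback server would receive something like
    #   http://localhost:8765/callback?code=abc123&state=xyz
    # and only needs the "code" value to exchange for an access_token.
    params = parse_qs(urlparse(redirect_url).query)
    codes = params.get("code", [])
    return codes[0] if codes else None

print(extract_auth_code("http://localhost:8765/callback?code=abc123&state=xyz"))  # → abc123
```

The document-the-token-in-a-file approach is indeed the simpler first step; this sketch only shows why the fancier flow is not much code beyond standing up the throwaway local server.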
[jira] [Updated] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Ashbourne updated SOLR-15213: --- Attachment: SOLR-15213.patch > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already and merge if it is > present. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that a new option "merge" which checks > based on the id and merges the new document with the old with a fall back to > add if there is no id match. 
> > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Ashbourne updated SOLR-15213: --- Attachment: (was: solr-merge.patch) > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: SOLR-15213.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already and merge if it is > present. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that a new option "merge" which checks > based on the id and merges the new document with the old with a fall back to > add if there is no id match. 
> > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295297#comment-17295297 ] Greg Miller commented on LUCENE-9822: - I think this is just a one-liner in the PForUtil ctor. I'll attach a patch shortly. > Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil > -- > > Key: LUCENE-9822 > URL: https://issues.apache.org/jira/browse/LUCENE-9822 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: master (9.0) >Reporter: Greg Miller >Priority: Trivial > > PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when > generating "patch offsets". If this assumption doesn't hold, PForUtil will > silently encode incorrect positions. While the BLOCK_SIZE isn't particularly > configurable, it would be nice to assert this assumption early in PForUtil in > the event that the BLOCK_SIZE changes in some future codec version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Ashbourne updated SOLR-15213: --- Description: Solr has "add", "set", "add-distinct" which work but all have their limitations. Namely, there's currently no way to atomically update a document where that document may or may not be present already and merge if it is present. i.e. in the scenario where we have a document with two nested children: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "Doe", "_isParent":"false"}, { "id": "fish2", "type_s": "fish", "name_s": "Hans", "_isParent":"false"}] }{noformat} If we later want to update that child doc e.g.: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "James", // new name "_isParent":"false"}, ] }{noformat} Existing operations: - "add" - will add another nested doc with the same id leaving us with two children with the same id. - "set" - replaces the whole list of child docs with the single doc, we could use this but would first have to fetch all the existing children. - "add-distinct" - will reject the update based on the doc already being present. I've got some changes (see patch) that a new option "merge" which checks based on the id and merges the new document with the old with a fall back to add if there is no id match. was: Solr has "add", "set", "add-distinct" which work but all have their limitations. Namely, there's currently no way to atomically update a document where that document may or may not be present already with merging if it is present. i.e. 
in the scenario where we have a document with two nested children: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "Doe", "_isParent":"false"}, { "id": "fish2", "type_s": "fish", "name_s": "Hans", "_isParent":"false"}] }{noformat} If we later want to update that child doc e.g.: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "James", // new name "_isParent":"false"}, ] }{noformat} Existing operations: - "add" - will add another nested doc with the same id leaving us with two children with the same id. - "set" - replaces the whole list of child docs with the single doc, we could use this but would first have to fetch all the existing children. - "add-distinct" - will reject the update based on the doc already being present. I've got some changes (see patch) that a new option "merge" which checks based on the id and merges the new document with the old with a fall back to add if there is no id match. > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: solr-merge.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already and merge if it is > present. > i.e. 
in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that a new option "merge" which checks > based on the id and merges the new document with the old with a fall back to > add if there is no id match. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (LUCENE-9822) Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
Greg Miller created LUCENE-9822: --- Summary: Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil Key: LUCENE-9822 URL: https://issues.apache.org/jira/browse/LUCENE-9822 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: master (9.0) Reporter: Greg Miller PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when generating "patch offsets". If this assumption doesn't hold, PForUtil will silently encode incorrect positions. While the BLOCK_SIZE isn't particularly configurable, it would be nice to assert this assumption early in PForUtil in the event that the BLOCK_SIZE changes in some future codec version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15213) Add support for "merge" atomic update operation for child documents
[ https://issues.apache.org/jira/browse/SOLR-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Ashbourne updated SOLR-15213: --- Attachment: solr-merge.patch > Add support for "merge" atomic update operation for child documents > --- > > Key: SOLR-15213 > URL: https://issues.apache.org/jira/browse/SOLR-15213 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: James Ashbourne >Priority: Major > Attachments: solr-merge.patch > > > Solr has "add", "set", "add-distinct" which work but all have their > limitations. Namely, there's currently no way to atomically update a document > where that document may or may not be present already with merging if it is > present. > i.e. in the scenario where we have a document with two nested children: > > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "Doe", > "_isParent":"false"}, > { > "id": "fish2", > "type_s": "fish", > "name_s": "Hans", > "_isParent":"false"}] > }{noformat} > > If we later want to update that child doc e.g.: > {noformat} > {"id": "ocean1", > "_isParent":"true", > "fish": [ > { > "id": "fish1", > "type_s": "fish", > "name_s": "James", // new name > "_isParent":"false"}, > ] > }{noformat} > > Existing operations: > - "add" - will add another nested doc with the same id leaving us with two > children with the same id. > - "set" - replaces the whole list of child docs with the single doc, we > could use this but would first have to fetch all the existing children. > - "add-distinct" - will reject the update based on the doc already being > present. > I've got some changes (see patch) that a new option "merge" which checks > based on the id and merges the new document with the old with a fall back to > add if there is no id match. 
> > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15213) Add support for "merge" atomic update operation for child documents
James Ashbourne created SOLR-15213: -- Summary: Add support for "merge" atomic update operation for child documents Key: SOLR-15213 URL: https://issues.apache.org/jira/browse/SOLR-15213 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Reporter: James Ashbourne Solr has "add", "set", "add-distinct" which work but all have their limitations. Namely, there's currently no way to atomically update a document where that document may or may not be present already, merging if it is present. i.e. in the scenario where we have a document with two nested children: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "Doe", "_isParent":"false"}, { "id": "fish2", "type_s": "fish", "name_s": "Hans", "_isParent":"false"}] }{noformat} If we later want to update that child doc e.g.: {noformat} {"id": "ocean1", "_isParent":"true", "fish": [ { "id": "fish1", "type_s": "fish", "name_s": "James", // new name "_isParent":"false"}, ] }{noformat} Existing operations: - "add" - will add another nested doc with the same id, leaving us with two children with the same id. - "set" - replaces the whole list of child docs with the single doc; we could use this but would first have to fetch all the existing children. - "add-distinct" - will reject the update based on the doc already being present. I've got some changes (see patch) that add a new option "merge" which checks based on the id and merges the new document with the old, with a fallback to add if there is no id match. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
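The proposed "merge" semantics — match children by id, merge fields on a hit, fall back to add on a miss — can be sketched like this, with plain dictionaries standing in for Solr child documents (this is an illustration of the described behavior, not the patch's actual code):

```python
def merge_children(existing, updates):
    # Merge each update into the child with the same id; if no child
    # matches, fall back to appending it (the "add" behavior).
    by_id = {child["id"]: child for child in existing}
    for update in updates:
        if update["id"] in by_id:
            by_id[update["id"]].update(update)  # id match: merge fields in place
        else:
            existing.append(update)             # no match: fall back to add
            by_id[update["id"]] = update
    return existing

fish = [
    {"id": "fish1", "type_s": "fish", "name_s": "Doe"},
    {"id": "fish2", "type_s": "fish", "name_s": "Hans"},
]
merge_children(fish, [{"id": "fish1", "name_s": "James"}])
print([f["name_s"] for f in fish])  # → ['James', 'Hans']
```

Unlike "set" it touches only the matched child, and unlike "add" it never produces two children with the same id.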
[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events
[ https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295278#comment-17295278 ] Michael McCandless commented on LUCENE-9406: Should we maybe backport this to 8.x? It is only adding a new experimental API, so it would not be an API break? > Make it simpler to track IndexWriter's events > - > > Key: LUCENE-9406 > URL: https://issues.apache.org/jira/browse/LUCENE-9406 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: master (9.0) > > Time Spent: 2h 20m > Remaining Estimate: 0h > > This is the second spinoff from a [controversial PR to add a new index-time > feature to Lucene to merge small segments during > commit|https://github.com/apache/lucene-solr/pull/1552]. That change can > substantially reduce the number of small index segments to search. > In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving > the application a chance to track when {{IndexWriter}} kicked off merges > during commit, how many, how long it waited, how often it gave up waiting, > etc. > Such telemetry from production usage is really helpful when tuning settings > like which merges (e.g. a size threshold) to attempt on commit, and how long > to wait during commit, etc. > I am splitting out this issue to explore possible approaches to do this. > E.g. [~simonw] proposed using a statistics class instead, but if I understood > that correctly, I think that would put the role of aggregation inside > {{IndexWriter}}, which is not ideal. > Many interesting events, e.g. how many merges are being requested, how large > are they, how long did they take to complete or fail, etc., can be gleaned by > wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}. > But for those events that cannot (e.g. 
{{IndexWriter}} stopped waiting for > merges during commit), it would be very helpful to have some simple way to > track so applications can better tune. > It is also possible to subclass {{IndexWriter}} and override key methods, but > I think that is inherently risky as {{IndexWriter}}'s protected methods are > not considered to be a stable API, and the synchronization used by > {{IndexWriter}} is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on pull request #2342: LUCENE-9406: Add IndexWriterEventListener to track events in IndexWriter
mikemccand commented on pull request #2342: URL: https://github.com/apache/lucene-solr/pull/2342#issuecomment-790609431 Woops, sorry for the belated response, and thank you @zacharymorn for creating this and @dweiss for merging -- it looks great! We can now add other events to track incrementally over time ... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] pawel-bugalski-dynatrace commented on a change in pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
pawel-bugalski-dynatrace commented on a change in pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#discussion_r587450225

## File path: lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java
## @@ -31,18 +31,21 @@
  * to the id is encapsulated inside {@link BytesRefHash} and is guaranteed to be increased for each
  * added {@link BytesRef}.
  *
+ * Note that this implementation is not synchronized. If multiple threads access
+ * a {@link BytesRefHash} instance concurrently, and at least one of the threads modifies it
+ * structurally, it must be synchronized externally. (A structural modification is any
+ * operation on the map except operations explicitly listed in the {@link UnmodifiableBytesRefHash}
+ * interface.)
+ *
  * Note: The maximum capacity {@link BytesRef} instance passed to {@link #add(BytesRef)} must not
  * be longer than {@link ByteBlockPool#BYTE_BLOCK_SIZE}-2. The internal storage is limited to 2GB
  * total byte storage.
  *
  * @lucene.internal
  */
-public final class BytesRefHash implements Accountable {
+public final class BytesRefHash implements Accountable, UnmodifiableBytesRefHash {

Review comment: Based on comments I'm going to remove UnmodifiableBytesRefHash altogether.
[GitHub] [lucene-solr] uschindler edited a comment on pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
uschindler edited a comment on pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#issuecomment-790589517 Hi, I agree with Mike. I like the equals() method to be thread safe. That was my original proposal. Generally: BytesRefHash is my favourite class if you need a `Set`. Although it's marked internal, I prefer to use it. Especially if you need a set of millions of strings, this is fast and does not produce millions of Strings. I personally used it only single threaded, but in all cases a method called equals should never ever change state. Sorry! +1 for the fix -1 to add the unmodifiable interface. That's over-engineered. Uwe
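The "set of millions of strings" use case works because BytesRefHash maps byte sequences to dense integer ids instead of materializing a String per entry. Below is a self-contained, HashMap-based stand-in for that contract (plain JDK, not the real BytesRefHash, which packs bytes into a shared pool and is far more memory-efficient):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;

// Simplified stand-in for the BytesRefHash contract: map byte sequences to
// dense int ids, deduplicating repeated terms without keeping a String per
// entry. Loosely modeled on add()/find(); the real return-value conventions
// differ (add() returns a negative value for already-seen bytes).
public class BytesSetSketch {
  private final HashMap<Key, Integer> ids = new HashMap<>();

  private static final class Key {
    final byte[] bytes;
    Key(byte[] bytes) { this.bytes = bytes; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  /** Returns the id for the term's UTF-8 bytes, assigning a new id for unseen bytes. */
  public int addOrGet(String term) {
    byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
    return ids.computeIfAbsent(new Key(utf8), k -> ids.size());
  }

  public int size() { return ids.size(); }
}
```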
[GitHub] [lucene-solr] mikemccand commented on pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
mikemccand commented on pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#issuecomment-790583058 +1 for changing `equals` to not require allocation, enabling us to remove the thread-unsafe shared `BytesRef scratch1`! This makes `find` thread-safe (as long as no other threads are making structural changes), and would suffice to fix `Luwak`'s usage, right? This is a nice improvement by itself! I'm also not a fan of adding the `UnmodifiableBytesRefHash` wrapper -- this is indeed an `@lucene.internal` API, not a generic JDK Collections class.
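A minimal sketch of the idea behind the fix (not the actual Lucene code): instead of copying the stored term into a shared scratch `BytesRef` and comparing that, compare the probe directly against the slice in the byte pool, so `find` needs no shared mutable state:

```java
// Sketch: allocation-free comparison of a probe against a stored slice.
// A shared scratch buffer (the old approach) is what made concurrent
// find() calls unsafe; a direct comparison touches no shared mutable state.
public class SliceEquals {
  static boolean equalsSlice(byte[] pool, int offset, int length, byte[] probe) {
    if (length != probe.length) {
      return false;
    }
    for (int i = 0; i < length; i++) {
      if (pool[offset + i] != probe[i]) {
        return false;
      }
    }
    return true;
  }
}
```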
[GitHub] [lucene-solr] iverase commented on pull request #2452: LUCENE-9580: Don't introduce collinear edges when splitting polygon
iverase commented on pull request #2452: URL: https://github.com/apache/lucene-solr/pull/2452#issuecomment-790571887 @nknize would you mind having a look?
[GitHub] [lucene-solr] iverase opened a new pull request #2452: LUCENE-9580: Don't introduce collinear edges when splitting polygon
iverase opened a new pull request #2452: URL: https://github.com/apache/lucene-solr/pull/2452 I had a look into this failing polygon and it seems the issue comes from the logic that splits the polygon for further processing. It can happen that the newly introduced edge is collinear with existing edges of the polygon. Such collinear edges are not eligible for filtering, which makes the logic fail. This change makes sure we don't introduce collinear edges when splitting polygons.
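The underlying check can be sketched with a cross product: three points are collinear when the two edge vectors they form have zero cross product. This is only a concept illustration, not Lucene's tessellator code, which has to use robust, epsilon-aware orientation predicates rather than exact floating-point equality:

```java
// Concept sketch: points a, b, c are collinear iff the cross product of
// vectors (b - a) and (c - a) is zero. Real geometry code must use a
// robust orientation predicate instead of exact floating-point equality.
public class CollinearSketch {
  static boolean collinear(double ax, double ay, double bx, double by, double cx, double cy) {
    double cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    return cross == 0d;
  }
}
```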
[GitHub] [lucene-solr] donnerpeter commented on pull request #2451: LUCENE-9687: Hunspell suggestions: reduce work in the findSimilarDictionaryEntries loop
donnerpeter commented on pull request #2451: URL: https://github.com/apache/lucene-solr/pull/2451#issuecomment-790570709 Sorry, I couldn't create a separate JIRA issue for this change due to some "XSRF Security Token Missing" error in JIRA
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2451: LUCENE-9687: Hunspell suggestions: reduce work in the findSimilarDictionaryEntries loop
donnerpeter opened a new pull request #2451: URL: https://github.com/apache/lucene-solr/pull/2451

# Description

The loop is executed many times, and some of the allocations and method calls inside it can be avoided.

# Solution

Extract some code outside the loop.

# Tests

No new tests; ~5% speedup in `TestPerformance.en_suggest`.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
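The technique here is generic loop-invariant hoisting: anything that depends only on the misspelled word, not on the current dictionary entry, can be computed once before the loop. A self-contained illustration of the pattern (not the actual Hunspell code):

```java
// Illustration of the optimization pattern, not the actual Hunspell code:
// work that is invariant across loop iterations is hoisted out of the loop.
public class HoistSketch {

  // Before: lowercases and re-allocates the char buffer on every iteration.
  static int bestScoreBefore(String word, String[] dictionary) {
    int best = 0;
    for (String entry : dictionary) {
      char[] lower = word.toLowerCase().toCharArray(); // invariant work inside the loop
      best = Math.max(best, commonPrefixLen(lower, entry));
    }
    return best;
  }

  // After: the buffer depends only on `word`, so it is computed once.
  static int bestScoreAfter(String word, String[] dictionary) {
    char[] lower = word.toLowerCase().toCharArray();
    int best = 0;
    for (String entry : dictionary) {
      best = Math.max(best, commonPrefixLen(lower, entry));
    }
    return best;
  }

  private static int commonPrefixLen(char[] a, String b) {
    int n = Math.min(a.length, b.length());
    int i = 0;
    while (i < n && a[i] == b.charAt(i)) {
      i++;
    }
    return i;
  }
}
```

Both variants return the same scores; only the per-iteration cost differs, which matters when the loop runs over every dictionary entry.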
[GitHub] [lucene-solr] rmuir commented on pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
rmuir commented on pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#issuecomment-790554086 Just to emphasize it even more, this class is marked `@lucene.internal`. The class shouldn't even be exposed to the outside in the public API to start with, so let's please not increase the exposure.
[GitHub] [lucene-solr] rmuir commented on pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
rmuir commented on pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#issuecomment-790551382 I still don't like the unmodifiable interface. Sorry, I disagree with officially exposing thread-safe methods in the API of a class that should only be used by one thread, just because one user of the class used it in a wrong way. It was my understanding that the problem is being solved this way because it's "too hard" to fix lucene-monitor to instead do things correctly: I'll accept that we should do a "quick fix" to work around its bugginess, but we should ultimately file a JIRA issue to fix it (it should not use such a class with multiple threads). We shouldn't expose what we have done in public APIs; it is just a temporary solution. If someone wants such a general-purpose hashtable they can use `HashMap` from their JDK; we aren't a hashtable library.
[jira] [Commented] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
[ https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295207#comment-17295207 ] Dawid Weiss commented on SOLR-15038: Hi Bruno. No, I've no idea... I think the goal was to disallow solr from writing into source locations - I'm not sure if 8.x allows that or if it's something introduced on master (sorry!). I only noticed this while changing some bits before repo splitting.

> Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
>
> Key: SOLR-15038
> URL: https://issues.apache.org/jira/browse/SOLR-15038
> Project: Solr
> Issue Type: Improvement
> Components: query
> Reporter: Tobias Kässmann
> Priority: Minor
> Fix For: 8.9
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We've worked a lot with the Query Elevation component lately and we were missing two features:
> * Elevate only documents that are part of the search result
> * In combination with collapsing: only show the representative if the elevated documents have the same collapse field value.
> Because of this, we've added these two feature toggles: _elevateDocsWithoutMatchingQ_ and _onlyElevatedRepresentative_.
[jira] [Comment Edited] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
[ https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295203#comment-17295203 ] Bruno Roustant edited comment on SOLR-15038 at 3/4/21, 10:37 AM: Ouch, yes I'll revert that. I played with this permission but didn't intend to commit it. When running the tests I noticed many Solr tests warn about being unable to create some test resources.

{code:java}
java.security.AccessControlException: access denied ("java.io.FilePermission" "lucene-solr/solr/core/build/resources/test/solr/userfiles" "write")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:?]
    at java.security.AccessController.checkPermission(AccessController.java:897) ~[?:?]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:322) ~[?:?]
    at java.lang.SecurityManager.checkWrite(SecurityManager.java:752) ~[?:?]
    at sun.nio.fs.UnixPath.checkWrite(UnixPath.java:824) ~[?:?]
    at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:377) ~[?:?]
    at java.nio.file.Files.createDirectory(Files.java:689) ~[?:?]
    at java.nio.file.Files.createAndCheckIsDirectory(Files.java:796) ~[?:?]
    at java.nio.file.Files.createDirectories(Files.java:742) ~[?:?]
    at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:383) [main/:?]
    at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:344) [main/:?]
{code}

I noticed they disappeared when I changed the permission for write access in solr-tests.policy. [~dweiss] do you know how to get rid of these (many) warnings?
[GitHub] [lucene-solr] pawel-bugalski-dynatrace commented on pull request #2429: LUCENE-9791 Allow calling BytesRefHash#find concurrently
pawel-bugalski-dynatrace commented on pull request #2429: URL: https://github.com/apache/lucene-solr/pull/2429#issuecomment-790467163 @rmuir @madrob what do you think about the current state of this PR? Any more comments? What else needs to be done to merge it?
[jira] [Commented] (SOLR-15038) Add elevateDocsWithoutMatchingQ and onlyElevatedRepresentative parameters to elevation functionality
[ https://issues.apache.org/jira/browse/SOLR-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295083#comment-17295083 ] Dawid Weiss commented on SOLR-15038: This change introduced write access to sources:

{code}
- permission java.io.FilePermission "${common.dir}${/}..${/}solr${/}-", "read";
+ permission java.io.FilePermission "${common.dir}${/}..${/}solr${/}-", "read,write";
{code}

I think this bit should be reverted?
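One way to silence the tests' write warnings without granting write access to the whole source tree would be to scope the write grant to build output directories only. This is a hedged sketch of a policy fragment; the exact build path is an assumption, not the actual solr-tests.policy contents:

```
// Hypothetical solr-tests.policy fragment: keep sources read-only, and
// grant write access only under the Gradle build output (path assumed).
permission java.io.FilePermission "${common.dir}${/}..${/}solr${/}-", "read";
permission java.io.FilePermission "${common.dir}${/}..${/}solr${/}core${/}build${/}-", "read,write";
```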
[jira] [Updated] (SOLR-14759) Separate the Lucene and Solr builds
[ https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-14759: Description:

While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal). Current status of joint and separate builds:

* (/) joint build
{code}
gradlew assemble check
{code}
* (/) Lucene-only
{code}
gradlew -Dskip.solr=true assemble check
{code}
* (/) Solr-only (with documentation exclusions)
{code}
gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite
{code}

was:

While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal). Current status of joint and separate builds:

* (/) joint build
{code:java}
gradlew assemble check
{code}
{code:java}
# Current build (no tests, Lucene+Solr)
gradlew assemble check

# Lucene-only build.
gradlew -Dskip.solr=true assemble check

# Solr-only build
gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite
{code}

> Separate the Lucene and Solr builds
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
> Issue Type: Sub-task
> Components: Build
> Reporter: Jan Høydahl
> Assignee: Dawid Weiss
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> While still in same git repo, separate the builds, so Lucene and Solr can be built independently.
> The preparation step includes optional building of just Lucene from current master (prior to any code removal):
> Current status of joint and separate builds:
> * (/) joint build
> {code}
> gradlew assemble check
> {code}
> * (/) Lucene-only
> {code}
> gradlew -Dskip.solr=true assemble check
> {code}
> * (/) Solr-only (with documentation exclusions)
> {code}
> gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite
> {code}
[jira] [Updated] (SOLR-14759) Separate the Lucene and Solr builds
[ https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-14759: Description:

While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal). Current status of joint and separate builds:

* (/) joint build
{code:java}
gradlew assemble check
{code}
{code:java}
# Current build (no tests, Lucene+Solr)
gradlew assemble check

# Lucene-only build.
gradlew -Dskip.solr=true assemble check

# Solr-only build
gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite
{code}

was:

While still in same git repo, separate the builds, so Lucene and Solr can be built independently. The preparation step includes optional building of just Lucene from current master (prior to any code removal):
{code:java}
gradlew -Dskip.solr=true check -x checkUnusedConstraints -x verifyLocks
{code}

> Separate the Lucene and Solr builds
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
> Issue Type: Sub-task
> Components: Build
> Reporter: Jan Høydahl
> Assignee: Dawid Weiss
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> While still in same git repo, separate the builds, so Lucene and Solr can be built independently.
> The preparation step includes optional building of just Lucene from current master (prior to any code removal):
> Current status of joint and separate builds:
> * (/) joint build
> {code:java}
> gradlew assemble check
> {code}
> {code:java}
> # Current build (no tests, Lucene+Solr)
> gradlew assemble check
>
> # Lucene-only build.
> gradlew -Dskip.solr=true assemble check
>
> # Solr-only build
> gradlew -Dskip.lucene=true assemble check -x test -x documentation -x checkBrokenLinks -x checkLocalJavadocLinksSite
> {code}
[jira] [Commented] (SOLR-14759) Separate the Lucene and Solr builds
[ https://issues.apache.org/jira/browse/SOLR-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295076#comment-17295076 ] Dawid Weiss commented on SOLR-14759: Current status will be tracked at the main issue description level.

> Separate the Lucene and Solr builds
>
> Key: SOLR-14759
> URL: https://issues.apache.org/jira/browse/SOLR-14759
> Project: Solr
> Issue Type: Sub-task
> Components: Build
> Reporter: Jan Høydahl
> Assignee: Dawid Weiss
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> While still in same git repo, separate the builds, so Lucene and Solr can be built independently.
> The preparation step includes optional building of just Lucene from current master (prior to any code removal):
> {code:java}
> gradlew -Dskip.solr=true check -x checkUnusedConstraints -x verifyLocks
> {code}
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2448: SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place.
dweiss commented on a change in pull request #2448: URL: https://github.com/apache/lucene-solr/pull/2448#discussion_r587241698

## File path: gradle/documentation/documentation.gradle
## @@ -20,34 +20,41 @@
 configure(rootProject) {
   def refguideUrlVersion = project.baseVersion.replaceFirst(/^(\d+)\.(\d+).*$/, '$1_$2')
   ext {
-    luceneDocUrl = project.propertyOrDefault('lucene.javadoc.url', {
-      if (project.version != project.baseVersion) {
-        // non-release build
-        new File(project('lucene:documentation').buildDir, 'site').toURI().toASCIIString().minus(~'/$')
-      } else {
-        // release build
-        "https://lucene.apache.org/core/${urlVersion}"
-      }
-    }())
-
-    solrDocUrl = project.propertyOrDefault('solr.javadoc.url', {
-      if (project.version != project.baseVersion) {
-        // non-release build
-        new File(project('solr:documentation').buildDir, 'site').toURI().toASCIIString().minus(~'/$')
-      } else {
-        // release build
-        "https://lucene.apache.org/solr/${urlVersion}"
-      }
-    }())
+    if (!skipLucene) {
+      luceneDocUrl = project.propertyOrDefault('lucene.javadoc.url', {
+        if (project.version != project.baseVersion) {
+          // non-release build
+          new File(project('lucene:documentation').buildDir, 'site').toURI().toASCIIString().minus(~'/$')
+        } else {
+          // release build
+          "https://lucene.apache.org/core/${urlVersion}"
+        }
+      }())
+    }

-    solrRefguideUrl = project.propertyOrDefault('solr.refguide.url', "https://lucene.apache.org/solr/guide/${refguideUrlVersion}")
+    // SOLR ONLY
+    if (!skipSolr) {
+      solrDocUrl = project.propertyOrDefault('solr.javadoc.url', {
+        if (project.version != project.baseVersion) {
+          // non-release build
+          new File(project('solr:documentation').buildDir, 'site').toURI().toASCIIString().minus(~'/$')
+        } else {
+          // release build
+          "https://lucene.apache.org/solr/${urlVersion}"
+        }
+      }())
+
+      solrRefguideUrl = project.propertyOrDefault('solr.refguide.url', "https://lucene.apache.org/solr/guide/${refguideUrlVersion}")
+    }
   }

   task documentation() {
     group = 'documentation'
     description = 'Generate all documentation'
-    dependsOn ':lucene:documentation:assemble'
+    if (!skipLucene) {
+      dependsOn ':lucene:documentation:assemble'
+    }
     dependsOn ':solr:documentation:assemble'

Review comment: top-level documentation wasn't part of assemble, that's why it worked.