[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599797#comment-16599797 ]

ASF subversion and git services commented on LUCENE-8267:

Commit d93c46ea94dec612aa53e37d119fe34b5e8a828e in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d93c46e ]

LUCENE-8267: adjust CHANGES.txt advise

> Remove memory codecs from the codebase
> --
>
> Key: LUCENE-8267
> URL: https://issues.apache.org/jira/browse/LUCENE-8267
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
> Fix For: master (8.0)
> Attachments: LUCENE-8267.patch
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random selection of codecs for tests and cause occasional OOMs when a test with huge data is selected. We don't use those memory codecs anywhere outside of tests, it has been suggested to just remove them to avoid maintenance costs and OOMs in tests. [1]
>
> [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
++

> On 31. Aug 2018, at 17:55, David Smiley (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598935#comment-16598935 ]
>
> David Smiley commented on LUCENE-8267:
>
> "... consider removing to use the default or experiment with one of the others."
>
> Okay Simon? They will visit this issue and/or dig to see what others exist to make the decision for themselves.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598935#comment-16598935 ]

David Smiley commented on LUCENE-8267:

"... consider removing to use the default or experiment with one of the others."

Okay Simon? They will visit this issue and/or dig to see what others exist to make the decision for themselves.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598335#comment-16598335 ]

Dawid Weiss commented on LUCENE-8267:

It'd be ideal if Solr had a migration.txt file, not just changes.txt. It'd be a better fit there. If you insist on having the FST50 mentioned, I'd suggest something like:

{code}
* LUCENE-8267: Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues). If you used postingsFormat="Memory" or docValuesFormat="Memory", consider using the defaults. For in-memory postings, you can try the "FST50" format as an alternative to "Memory". (Dawid Weiss)
{code}
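To act on the CHANGES.txt advice, a user first has to find where a Solr schema still asks for the removed format. The sketch below is one hedged way to do that with the JDK's XML parser only. The class name MemoryFormatCheck, the method findMemoryFormats, and the sample schema are hypothetical and not part of Lucene or Solr; the only assumption taken from the thread is that the format is selected through the postingsFormat and docValuesFormat attributes on a fieldType.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Lucene or Solr.
public class MemoryFormatCheck {

    /** Returns "typeName: attribute" for every fieldType that still requests the "Memory" format. */
    public static List<String> findMemoryFormats(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        List<String> hits = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("fieldType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            for (String attr : new String[] {"postingsFormat", "docValuesFormat"}) {
                // getAttribute returns "" when the attribute is absent, so defaults never match.
                if ("Memory".equals(type.getAttribute(attr))) {
                    hits.add(type.getAttribute("name") + ": " + attr);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Made-up schema fragment for illustration only.
        String schema = "<schema>"
                + "<fieldType name='tag' class='solr.TextField' postingsFormat='Memory'/>"
                + "<fieldType name='dv' class='solr.StrField' docValuesFormat='Memory'/>"
                + "<fieldType name='plain' class='solr.StrField'/>"
                + "</schema>";
        System.out.println(findMemoryFormats(schema)); // prints [tag: postingsFormat, dv: docValuesFormat]
    }
}
```

Each reported attribute can then be removed to fall back to the default, or, for postings only, switched to "FST50" as an experiment, which is the choice the thread leaves to the user.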
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597436#comment-16597436 ]

Simon Willnauer commented on LUCENE-8267:

I personally don't think we should put FST50 into this message. The message links to this issue which has all the discussion.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597386#comment-16597386 ]

David Smiley commented on LUCENE-8267:

Can someone propose new wording here, or is my proposal fine? Note my proposal mentions the default twice, both for postingsFormat and docValuesFormat. We agree that the default codec is an excellent codec. Remember that someone who chooses another one has done so explicitly and is thus aware of the default codec already and yet chose something else as a better fit for them. I want the wording to mention FST50 as an option to try; this postingsFormat seems to fly under the radar of people's awareness. Ultimately the user is going to have to do their own experiments to make the choice for themselves.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597151#comment-16597151 ]

Simon Willnauer commented on LUCENE-8267:

+1 to use defaults as well.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597132#comment-16597132 ]

Adrien Grand commented on LUCENE-8267:

+1 to recommend defaults
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597114#comment-16597114 ]

Dawid Weiss commented on LUCENE-8267:

I don't have an opinion on this, really. Hardcoding FST50 seems like binding to a concrete version? You're probably right that Direct is not the best choice though. Perhaps suggest leaving it at the default value like docValues?
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596879#comment-16596879 ]

David Smiley commented on LUCENE-8267:

[~dweiss] I noticed the solr/CHANGES.txt entry you added recommended users switch to "Direct" instead. I'm surprised we would recommend that (especially given the demise of "Memory"). Wouldn't FST50 be better? I'd like to reword the CHANGES.txt to the following:

{noformat}
* LUCENE-8267: Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues). If you used postingsFormat="Memory" switch to "FST50" as the next best alternative, or use the default. If you used docValuesFormat="Memory" then remove it to get the default. (Dawid Weiss)
{noformat}
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467071#comment-16467071 ]

ASF subversion and git services commented on LUCENE-8267:

Commit 85c00e77efdf53f30da6eaffd38c2b016a7805bc in lucene-solr's branch refs/heads/master from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85c00e7 ]

LUCENE-8267: removed references to memory codecs.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467064#comment-16467064 ]

Dawid Weiss commented on LUCENE-8267:

I ran nightly tests three times, but I can't get past Solr tests failing -- different tests each time, don't seem to be related to the change (cloud, distributed). I'm committing it in, regardless of those failures.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465910#comment-16465910 ]

Dawid Weiss commented on LUCENE-8267:

Removed references to memory postings and memory docvalues. An aggregate of changes is here, precommit passes, running tests now.

https://github.com/apache/lucene-solr/pull/372
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463572#comment-16463572 ]

Dawid Weiss commented on LUCENE-8267:

I was on short holidays, I'll take care of it soon.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461588#comment-16461588 ]

David Smiley commented on LUCENE-8267:

With the help of others using the SolrTextTagger, we've concluded that the speed difference is negligible. I'm glad we've then reached consensus that the MemoryPostingsFormat will not be missed! :D

+1 to remove MemoryPostingsFormat & DirectPostingsFormat

{quote}I think filing a JIRA issue is kind of soliciting feedback, don't you think?{quote}

No! At least not beyond our insular world.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448788#comment-16448788 ]

Robert Muir commented on LUCENE-8267:

{quote}Thanks for the suggestion to use MMapDirectory.preload, I didn't know about it, but that appears to only help warmup, not sustained performance; right?{quote}

Loading stuff into heap memory gives no higher guarantee than doing it this way under pressure; it still depends on VM parameters.

{quote}I get the maintenance aspect but we need community input on such decisions to ascertain real-world use.{quote}

That is not how it works: this is open source. These memory/direct formats cause excessive maintenance hassle with the tests. I saw Alan and Dawid fighting with them and it seemed clear to me it's not worth the trouble. We should remove them: the cost is too high. Someone can always pull in the source code themselves for their esoteric use-case, but unless we have *maintainers* coming up then they need to go: this doesn't come down to a vote by users.

If you want to make it hard for us to clean up tech debt like this, by -1s and so on, that's your choice. But it is also my choice to make it hard to add things. Trust me, I will make it equally hard to add code as it is to remove code. It is the only way to make things sustainable.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448771#comment-16448771 ]

Dawid Weiss commented on LUCENE-8267:

bq. I get the maintenance aspect but we need community input on such decisions to ascertain real-world use.

I think filing a JIRA issue is kind of soliciting feedback, don't you think?

I agree with Simon and Robert that there are classes that, while useful, are not at the forefront of what a broad "Lucene API" is... We should have the liberty to adjust or remove such things. I scanned the code of both Lucene and Solr and there were no references (other than in tests) to those classes, so it's not just "Lucene land".

Also, given the size and diversity of the Lucene/Solr user community I'm fairly confident there will always be somebody who finds something very useful, no matter what you'd like to change or remove. Hell, I use a lot of internal Lucene infrastructure in my own projects and sometimes I miss things that go away myself... (and frequently I just grab the latest source of something and copy it over to maintain in my own source tree, that's part of the beauty of open source).
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448748#comment-16448748 ]

David Smiley commented on LUCENE-8267:

Ah; I incorrectly assumed the proposal included the FST postings formats but apparently not. It's too bad FSTPulsingFormat is long gone though since in the text-tagging use-case it'd effectively be a substitute for MemoryPostingsFormat. The FSTTermsReader accepts a PostingsReaderBase; maybe it's possible to write an in-memory version of PostingsReaderBase, at least for the "pulsed" (single posting) case. Nonetheless let's see how the text tagger performs with these codec options.

Thanks for the suggestion to use MMapDirectory.preload, I didn't know about it, but that appears to only help warmup, not sustained performance; right? And I believe even with FileSwitchDirectory, on shutdown files with certain extensions would vanish; right?

{quote}So I perceive your veto as an aggressive step. To me it's a last resort after we can't find a solution that is good for all of us. The conversation already has a tone that is not appropriate and could have been prevented by formulating objections as questions. like I am using this postings format in X and it's serving well, what are the alternatives. - I am sure you would have got an awesome answer.{quote}

The "sorry" word immediately after my veto was intended to prevent misperceptions about tone; I don't mean to be aggressive – sorry! I agree I could have asked for alternatives up-front; I'll try and remember that next time. I was thinking my early vote could prevent work that someone does in vain to remove these pieces. In retrospect I didn't need to vote yet to accomplish that (e.g. convey disagreement with others). In this way I was trying to offer improved communication where from others I've seen no veto but a confusing cloud of doubt as to whether there would be a veto or not (which in my mind is worse). I respect you may feel differently though; just please understand my intended tone is not aggressive.

{quote}if you can't remove stuff without others jumping in vetoing the reaction will be to prevent additions in the same way due to _fear_ created by the veto. This is a terrible place to be in, we have seen this in the past we should prevent it.{quote}

Do you mean if we add some new thingamajig, we might feel that we *have* to support it indefinitely? (I wouldn't use the word "fear" for this; maybe I've got your intent wrong still) Hmmm; I think it's very situationally dependent. For example with queryNorm & coords, LUCENE-7347, I had concerns but ultimately understood that maintaining these things was making things awkward for us. But the PostingsFormats seem different to me. They conform to our APIs; they don't get in the way or tie our hands. Yes there is maintenance though. I think what I objected to most in the description of this issue was the notion that, because Lucene-core doesn't use something and because there is maintenance to that something, then we should delete that something. I get the maintenance aspect but we need community input on such decisions to ascertain real-world use.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448287#comment-16448287 ]

Simon Willnauer commented on LUCENE-8267:

+1 to what [~rcmuir] said so many more efficient options

{quote}Do you mean to say I should have said all I said without voting first? Lets have a conversation! (we _are_ having a conversation){quote}

So I perceive your veto as an aggressive step. To me it's a last resort after we can't find a solution that is good for all of us. The conversation already has a tone that is not appropriate and could have been prevented by formulating objections as questions. like _I am using this postings format in X and it's serving well, what are the alternatives._ - I am sure you would have got an awesome answer.

{quote}I don't understand this point of view; can you please elaborate? Fear of what?{quote}

if you can't remove stuff without others jumping in vetoing the reaction will be to prevent additions in the same way due to _fear_ created by the veto. This is a terrible place to be in, we have seen this in the past we should prevent it.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448272#comment-16448272 ]

Robert Muir commented on LUCENE-8267:

There are a lot of other alternatives to putting data in heap memory directly in the postings format.

The best (IMO) is for the user to use MMapDirectory.preload with the standard index format. This way it doesn't impact their Java heap and they use a supported index format.

Users can also use RAMDirectory/FileSwitchDirectory to load specified files into heap.

Finally, users can use FSTPostingsFormat which will load the *term dictionary only* into a heap FST. This is way different than Memory/Direct, which load not only terms but also postings lists and positions and stuff all into heap RAM.

So I don't really see any technical merit for your objection: there are many other ways to have a RAM-resident terms dictionary, many of them better than the inefficient Memory/Direct formats.
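The first alternative above, preloading a memory-mapped index, keeps the data outside the Java heap. The following is a minimal sketch of that underlying mechanism using only java.nio; it is not Lucene's API. The class PreloadSketch, its methods, and the temporary file are hypothetical names chosen for illustration. In Lucene 7.x/8.x the corresponding switch should be MMapDirectory.setPreload(true), but treat that as an assumption to check against the Javadoc of the version in use.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not part of Lucene.
public class PreloadSketch {

    /** Maps a file read-only and asks the OS to page its contents into physical memory. */
    public static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Best-effort hint: the OS may still evict these pages under memory pressure.
            buffer.load();
            // The mapping stays valid after the channel is closed.
            return buffer;
        }
    }

    /** Sums all bytes, reading straight from the mapped (off-heap) region. */
    public static long sum(MappedByteBuffer buffer) {
        long total = 0;
        for (int i = 0; i < buffer.limit(); i++) {
            total += buffer.get(i);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("preload-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        MappedByteBuffer buffer = mapAndPreload(tmp);
        System.out.println("direct=" + buffer.isDirect() + " sum=" + sum(buffer)); // prints direct=true sum=10
    }
}
```

Because the mapped region lives in the OS page cache rather than on the heap, it does not count against -Xmx, which is the property the comment above highlights; the trade-off, raised elsewhere in the thread, is that residency is a hint rather than a guarantee.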
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448249#comment-16448249 ]

David Smiley commented on LUCENE-8267:

bq. given that you know that you are using your veto here we are already in a terrible position to have any conversation

Do you mean to say I should have said all I said without voting first? Let's have a conversation! (we _are_ having a conversation)

bq. we will have a super hard time adding stuff. It creates fear driven decisions.

I don't understand this point of view; can you please elaborate? Fear of what?

bq. Can you quantify the "it's nice"?

Yes, I shall do that. My preferred route is to find an existing user of the "Solr Text Tagger" who can experiment with the postingsFormat setting and compare it against the default format. Failing that, I'll create a benchmark using that project.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448208#comment-16448208 ]

Simon Willnauer commented on LUCENE-8267:

{quote}If we are going to make it harder to remove stuff, I have no problem being the one to make it equally harder to add stuff.{quote}

I agree this is one of these issues that we have to face. If we put the bar very high to remove stuff that is not mainstream, then we will have a super hard time adding stuff. It creates fear driven decisions. It sucks. I agree with [~rcmuir] 100% here.

{quote}-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where there are intense lookups against the terms dictionary. It's highly beneficial to have the terms dictionary be entirely memory resident, albeit in a compact FST. The issue description mentions "We don't use those memory codecs anywhere outside of tests" -- this should be no surprise as it's not the default codec. I'm sure it may be hard to gauge the level of use of something outside of core-Lucene. When we ponder removing something that Lucene doesn't even _need_, I propose we raise the issue more openly to the community. Perhaps the question could be proposed in CHANGES.txt and/or release announcements to solicit community input?{quote}

Given that you know that you are using your veto here, we are already in a terrible position to have any conversation. Can you quantify the "it's nice"? Since there are alternatives (the standard codec), can you go and provide some numbers? We should not use vetoes based on non-quantifiable arguments IMO. We can go and ask the community, but I don't expect much useful outcome; most of the folks don't know what they are using here and there. Nevertheless, I am happy to send a mail to dev to get this information.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448143#comment-16448143 ]

Robert Muir commented on LUCENE-8267:

If we are going to make it harder to remove stuff, I have no problem being the one to make it equally harder to add stuff.
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448123#comment-16448123 ]

David Smiley commented on LUCENE-8267:

-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where there are intense lookups against the terms dictionary. It's highly beneficial to have the terms dictionary be entirely memory resident, albeit in a compact FST. The issue description mentions "We don't use those memory codecs anywhere outside of tests" -- this should be no surprise as it's not the default codec. I'm sure it may be hard to gauge the level of use of something outside of core-Lucene. When we ponder removing something that Lucene doesn't even _need_, I propose we raise the issue more openly to the community. Perhaps the question could be proposed in CHANGES.txt and/or release announcements to solicit community input?

Perhaps BaseRangeFieldQueryTestCase.verify should ascertain whether the postings format is a known "memory" postings format (of which there are several, including "Direct"), and if so use JUnit's Assume to bail out? If this is hard to do, we ought to add a convenience method to make it easier.

Speaking of memory postings formats, I'm in favor of the Direct postings format going away, since it ought to be re-imagined as some sort of read-time FilterCodecReader that does not require an index format. Credit to Alan for that idea years ago. Though that's more of a re-orientation of something that exists rather than saying it should go away entirely.
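The convenience method suggested above might look roughly like the sketch below. This is only an illustration under stated assumptions: the class `MemoryFormatCheck`, the method `isHeapResident`, and the hard-coded name set are all hypothetical, and the set contains only the two formats named in this thread ("Memory" and "Direct"), not a verified list of every heap-resident format. The JUnit call appears only in a comment so that the block runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not an existing Lucene test-framework class.
final class MemoryFormatCheck {

    // Assumption: only the formats named in this thread. A real helper
    // would need a list maintained next to the codecs themselves.
    private static final Set<String> HEAP_RESIDENT_FORMATS =
        new HashSet<>(Arrays.asList("Memory", "Direct"));

    /** Returns true if the postings format name is a known heap-resident one. */
    static boolean isHeapResident(String postingsFormatName) {
        return postingsFormatName != null
            && HEAP_RESIDENT_FORMATS.contains(postingsFormatName);
    }

    public static void main(String[] args) {
        // "Lucene50" stands in here for a non-memory format name.
        for (String name : new String[] {"Memory", "Direct", "Lucene50"}) {
            System.out.println(name + " heap-resident: " + isHeapResident(name));
        }
        // In a JUnit 4 test with huge data, the check could gate execution:
        //   Assume.assumeFalse("skipping heap-resident postings format",
        //                      MemoryFormatCheck.isHeapResident(name));
    }
}
```

A failed assumption marks the test as skipped rather than failed, which is why it fits the case of a randomly selected codec that is simply unsuitable for a large-data test.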
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448060#comment-16448060 ]

Robert Muir commented on LUCENE-8267:

+1
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447787#comment-16447787 ]

Simon Willnauer commented on LUCENE-8267:

+1
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447751#comment-16447751 ]

Adrien Grand commented on LUCENE-8267:

+1