[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053361#comment-17053361 ]

Dawid Weiss commented on LUCENE-9241:

I wasn't really that much concerned; just pointing out the (sad) fact of how it's implemented for Windows.

> fix most memory-hungry tests
>
> Key: LUCENE-9241
> URL: https://issues.apache.org/jira/browse/LUCENE-9241
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-9241.patch
>
> Currently each test jvm has Xmx of 512M. With a modern macbook pro this is 4GB which is pretty crazy.
> On the other hand, if we fix a few edge cases, tests can work with lower heaps such as 128M. This can save many gigabytes (also it finds interesting memory waste/issues).

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052697#comment-17052697 ]

ASF subversion and git services commented on LUCENE-9241:

Commit 9cfdf17b2895866877668002d443277a46cd04e8 in lucene-solr's branch refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9cfdf17 ]

LUCENE-9241: fix tests to pass with -Xmx128m
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052668#comment-17052668 ]

Robert Muir commented on LUCENE-9241:

[~dweiss] I saw a recent URLClassLoader Windows leak thread on the jdk list and it reminded me of this issue. I'll remove the use of getResource (*please keep in mind there are many of these elsewhere in the codebase if you are actually concerned about this*).

Instead, if the user screws up here in their test, they'll get a NullPointerException and they can follow the stack trace. Soon the default NPE from the JDK will actually be more helpful than custom messages like this anyway.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043928#comment-17043928 ]

Dawid Weiss commented on LUCENE-9241:

I have reviewed it as well. :) Except for the things I mentioned I didn't think anything else was worth mentioning. Direct memory allocation may be misleading in that it is still allocation but escapes the heap... but I don't have an opinion on that (whether it's a good thing or not) so I'll just leave it up to you.
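The remark about direct memory can be made concrete with a minimal sketch. This is not the kuromoji/nori code: the class `DirectCostMatrix` and its methods are hypothetical names chosen for illustration. It stores a 2-D matrix of shorts in a single direct buffer, so the data lives outside the garbage-collected heap and is not counted against -Xmx (it is still real memory, and HotSpot limits it separately via -XX:MaxDirectMemorySize).

```java
import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

/** Hypothetical illustration: a rows x cols matrix of shorts backed by off-heap memory. */
final class DirectCostMatrix {
  private final ShortBuffer costs;
  private final int cols;

  DirectCostMatrix(int rows, int cols) {
    this.cols = cols;
    // multiplyExact guards against int overflow for very large matrices.
    int bytes = Math.multiplyExact(Math.multiplyExact(rows, cols), Short.BYTES);
    // allocateDirect reserves native memory; only a small wrapper object lives on the heap.
    ByteBuffer raw = ByteBuffer.allocateDirect(bytes);
    this.costs = raw.asShortBuffer();
  }

  /** Row-major addressing replaces the short[][] double indirection. */
  void set(int row, int col, short cost) {
    costs.put(row * cols + col, cost);
  }

  short get(int row, int col) {
    return costs.get(row * cols + col);
  }

  boolean isDirect() {
    return costs.isDirect();
  }

  public static void main(String[] args) {
    DirectCostMatrix m = new DirectCostMatrix(4, 3);
    m.set(2, 1, (short) -42);
    System.out.println(m.get(2, 1) + " direct=" + m.isDirect());
  }
}
```

A heap `short[][]` additionally pays one array header per row, which the flat layout avoids; whether moving the data off-heap is desirable is exactly the open question in the comment above.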
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043486#comment-17043486 ]

Bruno Roustant commented on LUCENE-9241:

As expected I saw no noticeable impact in the luceneutil benchmarks.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042960#comment-17042960 ]

Robert Muir commented on LUCENE-9241:

Dawid, if you think Class.getResource has some crazy behavior like this on Windows, then I think we should really open a bug with the JDK. If it is such a problem, shouldn't existing usages be removed, and the method added to forbidden APIs, until the bug is fixed?

https://github.com/apache/lucene-solr/search?q=getResource%28_q=getResource%28

I merely tried to simplify the tests... that is really all this patch is about.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042949#comment-17042949 ]

Dawid Weiss commented on LUCENE-9241:

I'm really indifferent about it - I was just pointing out the fact that this pattern (opening a URL, not the stream) was (and is) a problem sometimes. Which classloader is going to load tests is often beyond our control; Windows is typically the evil here -- it has a limited command line length for subprocess arguments, so gradle may (and I think this is coming in the next version) try to avoid the problem by forking a launcher which loads JARs in a separate classloader (arguments from file) rather than using the system classpath option.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042896#comment-17042896 ]

Robert Muir commented on LUCENE-9241:

This isn't a URLClassLoader here. The standard one is not URLClassLoader anymore.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042893#comment-17042893 ]

Dawid Weiss commented on LUCENE-9241:

I remember now. When you're using a dynamic class loader (URLClassLoader or its subclass) then resources opened on the URL directly will lock the jar. When you use getResourceAsStream it registers the jar as closeable (as in the code above) and closing the class loader releases the lock on the file as well.
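The pattern described here can be sketched as follows. This is only an illustration under stated assumptions: `JarResourceDemo` and its helper methods are hypothetical, the jar is a throwaway temp file, and the sketch does not demonstrate the Windows locking itself. It only shows reading a resource through `URLClassLoader.getResourceAsStream` and closing the loader before the jar is deleted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class JarResourceDemo {
  /** Writes a one-entry jar to a temp file (hypothetical helper for this demo). */
  static Path writeJar(String entryName, String content) throws IOException {
    Path jar = Files.createTempFile("demo", ".jar");
    try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
      out.putNextEntry(new JarEntry(entryName));
      out.write(content.getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    }
    return jar;
  }

  /** Reads a resource through the loader's stream API, then closes the loader. */
  static String readAndClose(Path jar, String name) throws IOException {
    URL[] urls = {jar.toUri().toURL()};
    try (URLClassLoader loader = new URLClassLoader(urls, null);
        InputStream in = loader.getResourceAsStream(name)) {
      if (in == null) {
        throw new IOException("Cannot find " + name);
      }
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } // the stream is closed first, then the loader
  }

  /** Full cycle: create the jar, read from it, delete it once the loader is closed. */
  static String roundTrip(String content) {
    try {
      Path jar = writeJar("userdict.txt", content);
      try {
        return readAndClose(jar, "userdict.txt");
      } finally {
        Files.delete(jar);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello"));
  }
}
```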
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042892#comment-17042892 ]

Dawid Weiss commented on LUCENE-9241:

It is actually (and sadly) still true. You're looking at the parent class, but getResourceAsStream is overridden in URLClassLoader; the behavioral difference is still in there, here:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/URLClassLoader.java#L291-L305
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042888#comment-17042888 ]

Dawid Weiss commented on LUCENE-9241:

Not true... anymore. Because I definitely struggled with this at some point in time (Java 8?) and there used to be a difference. Thanks for clarifying though.
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042885#comment-17042885 ]

Robert Muir commented on LUCENE-9241:

{quote}
the URL based version causes jar to be locked on Windows (if I recall right). I don't see the benefit of switching to URL here?
{quote}

Not true. The existing getResourceAsStream is simply getResource + openStream. This way the exception handling is simpler. Look here: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L1723-L1731

(Sorry, I have to call such things out, lest we have shitty code based on rumors or other wrong reasons)
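A small sketch of the two lookup styles being compared, for readers who want to try them side by side. The class `ResourceLookupDemo` and its method names are hypothetical. It reads the class-file header of `java.lang.Object` once through `getResourceAsStream` and once through `getResource` plus `openStream`, and shows only that both routes yield the same bytes; it says nothing about file locking, which depends on the class loader in use.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

final class ResourceLookupDemo {
  /** Stream-based lookup: a missing resource shows up as a null stream. */
  static int magicViaStream(Class<?> c, String name) {
    try (InputStream in = c.getResourceAsStream(name)) {
      if (in == null) {
        throw new IllegalStateException("Cannot find " + name);
      }
      return readInt(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** URL-based lookup: a missing resource shows up as a null URL before anything is opened. */
  static int magicViaUrl(Class<?> c, String name) {
    URL url = c.getResource(name);
    if (url == null) {
      throw new IllegalStateException("Cannot find " + name);
    }
    try (InputStream in = url.openStream()) {
      return readInt(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the first four bytes as a big-endian int. */
  private static int readInt(InputStream in) throws IOException {
    return new DataInputStream(in).readInt();
  }

  public static void main(String[] args) {
    int a = magicViaStream(Object.class, "Object.class");
    int b = magicViaUrl(Object.class, "Object.class");
    System.out.println(Integer.toHexString(a) + " " + Integer.toHexString(b));
  }
}
```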
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042871#comment-17042871 ]

Dawid Weiss commented on LUCENE-9241:

Nightlies would still require a larger heap (because of increased iteration counts)?

There is a runtime difference to this:
{code}
-InputStream is = TestJapaneseTokenizer.class.getResourceAsStream("userdict.txt");
-if (is == null) {
+URL resource = TestJapaneseTokenizer.class.getResource("userdict.txt");
+if (resource == null) {
   throw new RuntimeException("Cannot find userdict.txt in test classpath!");
 }
{code}
the URL based version causes jar to be locked on Windows (if I recall right). I don't see the benefit of switching to URL here?

If there are tests that really require a large amount of RAM we could create a group for these and then create a separate test run for these... Or assume the nightlies have a bumped heap amount?
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042785#comment-17042785 ]

Robert Muir commented on LUCENE-9241:

There are a few "real" code changes here to review:
* {{kuromoji}} adopts {{nori}}'s representation of the in-memory connection cost matrix. Instead of a 2-D heap {{short[][]}}, it uses a direct buffer. I think this is a better representation.
* {{RunAutomaton}} uses a {{FixedBitSet}} instead of a {{boolean[]}} for the accept states. This is only checked once per "word" by subclasses (see e.g. https://github.com/apache/lucene-solr/blob/bed694ec8811c67b8ba4b4c8943e60eda281850a/lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunAutomaton.java#L44 ), and it just adds some shift/mask there. Probably helps to not be so wasteful. On the other hand this isn't the heaviest part of this data structure when tableized, but it's at least a little less stupid?
* "write-time" data structures of {{kuromoji}}/{{nori}} are a little more efficient on the connection costs and per-term metadata. This only impacts tests or "regenerate" type tasks, but we shouldn't be so wasteful anyway: these classes have been recently moved into the public API.
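The "shift/mask" cost mentioned for the accept states can be sketched like this. It is a hand-rolled stand-in, not Lucene's {{FixedBitSet}} and not the {{RunAutomaton}} code; the class `AcceptStates` and its methods are hypothetical. One bit per state is packed into a `long[]`, where a `boolean[]` typically spends a full byte per state.

```java
/** Hypothetical stand-in for a fixed-size bit set holding automaton accept states. */
final class AcceptStates {
  private final long[] bits;

  AcceptStates(int numStates) {
    // One long holds 64 states; round up so the last partial word is covered.
    this.bits = new long[(numStates + 63) >>> 6];
  }

  void setAccept(int state) {
    // state >> 6 picks the word; the shift count of 1L << state wraps at 64 for long operands.
    bits[state >> 6] |= 1L << state;
  }

  boolean isAccept(int state) {
    return (bits[state >> 6] & (1L << state)) != 0;
  }

  /** Payload size in bytes, excluding the array header. */
  long bytesUsed() {
    return (long) bits.length * Long.BYTES;
  }

  public static void main(String[] args) {
    AcceptStates accept = new AcceptStates(1000);
    accept.setAccept(3);
    accept.setAccept(999);
    System.out.println(accept.isAccept(3) + " " + accept.isAccept(4) + " " + accept.bytesUsed());
  }
}
```

For 1000 states the packed form needs 16 longs (128 bytes of payload), against roughly 1000 bytes for a `boolean[]` on common JVMs; the lookup gains one shift and one mask.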
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042782#comment-17042782 ]

Robert Muir commented on LUCENE-9241:

Attached patch. [~dawid.weiss] I didn't yet modify the gradle build; I figured let's just clean up the memory-hungry tests first.

It is almost possible to run with a 64MB heap with the patch, but we'd need to use OfflineSorter for the kuromoji/nori dictionary builds, which is more involved.