[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-03-06 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053361#comment-17053361
 ] 

Dawid Weiss commented on LUCENE-9241:
-

I wasn't really that much concerned; just pointing out the (sad) fact of how 
it's implemented for Windows.

> fix most memory-hungry tests
> 
>
> Key: LUCENE-9241
> URL: https://issues.apache.org/jira/browse/LUCENE-9241
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9241.patch
>
>
> Currently each test jvm has Xmx of 512M. With a modern macbook pro this is 
> 4GB which is pretty crazy.
> On the other hand, if we fix a few edge cases, tests can work with lower 
> heaps such as 128M. This can save many gigabytes (also it finds interesting 
> memory waste/issues).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-03-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052697#comment-17052697
 ] 

ASF subversion and git services commented on LUCENE-9241:
-

Commit 9cfdf17b2895866877668002d443277a46cd04e8 in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9cfdf17 ]

LUCENE-9241: fix tests to pass with -Xmx128m




[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-03-05 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052668#comment-17052668
 ] 

Robert Muir commented on LUCENE-9241:
-

[~dweiss] I saw a recent thread on the JDK list about a URLClassLoader leak on 
Windows, and it reminded me of this issue.

I'll remove the use of getResource (*please keep in mind there are many of 
these elsewhere in the codebase if you are actually concerned about this*).

Instead, if the user screws up here in their test, they'll get a 
NullPointerException and they can follow the stack trace. Soon the default NPE 
message from the JDK will actually be more helpful than custom messages like 
this one anyway.
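
A minimal sketch of that failure mode, assuming a hypothetical class 
MissingResourceDemo and a resource name that does not exist (neither is taken 
from the patch). Class.getResource returns null for a missing resource, so the 
unchecked call below fails with a NullPointerException whose stack trace points 
at the offending line:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical demo class, not part of the Lucene code base.
final class MissingResourceDemo {
  /** Opens a test resource without a custom null check (sketch only). */
  static InputStream open(String name) throws IOException {
    URL resource = MissingResourceDemo.class.getResource(name);
    // No explicit check: if the resource is missing, 'resource' is null
    // and the next line throws NullPointerException.
    return resource.openStream();
  }

  public static void main(String[] args) throws IOException {
    try (InputStream is = open("no-such-userdict.txt")) {
      System.out.println("unexpectedly found the resource");
    } catch (NullPointerException e) {
      e.printStackTrace(); // the trace leads straight to open()
    }
  }
}
```

On JDKs that ship helpful NullPointerException messages (JEP 358), the message 
also names the expression that was null.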



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-24 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043928#comment-17043928
 ] 

Dawid Weiss commented on LUCENE-9241:
-

I have reviewed it as well. :) Except for the things I mentioned I didn't think 
anything else was worth mentioning. Direct memory allocation may be misleading 
in that it is still allocation but escapes the heap... but I don't have an 
opinion on that (whether it's a good thing or not) so I'll just leave it up to 
you.



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-24 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043486#comment-17043486
 ] 

Bruno Roustant commented on LUCENE-9241:


As expected I saw no noticeable impact in the luceneutil benchmarks.



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042960#comment-17042960
 ] 

Robert Muir commented on LUCENE-9241:
-

Dawid, if you think Class.getResource has some crazy behavior like this on 
Windows, then I think we should really open a bug with the JDK. If it is such a 
problem, shouldn't existing usages be removed, and the method added to 
forbidden APIs, until the bug is fixed?

https://github.com/apache/lucene-solr/search?q=getResource%28_q=getResource%28

I merely tried to simplify the tests... that is really all this patch is about.



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042949#comment-17042949
 ] 

Dawid Weiss commented on LUCENE-9241:
-

I'm really indifferent about it; I was just pointing out the fact that this 
pattern (opening a URL, not the stream) was (and is) a problem sometimes.

Which classloader is going to load tests is often beyond our control. Windows 
is typically the evil here: it limits the length of a subprocess command line, 
so Gradle may (and I think this is coming in the next version) try to avoid the 
problem by forking a launcher which loads JARs in a separate classloader 
(arguments from a file) rather than using the system classpath option.



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042896#comment-17042896
 ] 

Robert Muir commented on LUCENE-9241:
-

This isn't a URLClassLoader here. The standard class loader is not a 
URLClassLoader anymore.
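
A quick way to check this on a given JVM, as a minimal sketch (the class name 
SystemLoaderCheck is made up for illustration). Since Java 9 the system class 
loader is an internal JDK class that does not extend URLClassLoader, so the 
check below is expected to report false there, while Java 8 reports true:

```java
import java.net.URLClassLoader;

// Hypothetical helper, only for illustrating the point above.
final class SystemLoaderCheck {
  /** Returns true if the system (application) class loader is a URLClassLoader. */
  static boolean systemLoaderIsUrlClassLoader() {
    return ClassLoader.getSystemClassLoader() instanceof URLClassLoader;
  }

  public static void main(String[] args) {
    // Print the concrete loader class and the result of the instanceof check.
    System.out.println(ClassLoader.getSystemClassLoader().getClass().getName());
    System.out.println(systemLoaderIsUrlClassLoader());
  }
}
```

Test runners may still load test classes through their own loaders, so this 
only describes the default application loader.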



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042893#comment-17042893
 ] 

Dawid Weiss commented on LUCENE-9241:
-

I remember now. When you're using a dynamic class loader (URLClassLoader or its 
subclass), resources opened on the URL directly will lock the jar. When you 
use getResourceAsStream, it registers the jar as closeable (as in the code 
above), and closing the class loader releases the lock on the file as well.
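
The two access patterns can be sketched as follows. This is only an 
illustration under stated assumptions: the class JarResourceDemo, the temporary 
jar and the entry name are all made up, and the URL-based variant additionally 
calls URLConnection.setUseCaches(false) so that the jar is not kept open by the 
connection cache, which is one possible way to avoid the lock and not something 
proposed in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; names and paths are illustrative only.
final class JarResourceDemo {
  /** Builds a throwaway jar holding a single text resource. */
  static Path createJar(String entryName, String content) throws IOException {
    Path jar = Files.createTempFile("demo", ".jar");
    try (OutputStream out = Files.newOutputStream(jar);
         JarOutputStream jos = new JarOutputStream(out)) {
      jos.putNextEntry(new JarEntry(entryName));
      jos.write(content.getBytes(StandardCharsets.UTF_8));
      jos.closeEntry();
    }
    return jar;
  }

  /** Stream-based read: URLClassLoader tracks the jar and closes it with the loader. */
  static String readViaStream(URLClassLoader loader, String name) throws IOException {
    try (InputStream is = loader.getResourceAsStream(name)) {
      return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  /** URL-based read: caching is disabled so the jar is released with the stream. */
  static String readViaUrl(URLClassLoader loader, String name) throws IOException {
    URL url = loader.getResource(name);
    URLConnection conn = url.openConnection();
    conn.setUseCaches(false);
    try (InputStream is = conn.getInputStream()) {
      return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    Path jar = createJar("userdict.txt", "hello");
    try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()})) {
      System.out.println(readViaStream(loader, "userdict.txt"));
      System.out.println(readViaUrl(loader, "userdict.txt"));
    }
    // On Windows this delete fails if anything still holds the jar open.
    Files.delete(jar);
  }
}
```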



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042892#comment-17042892
 ] 

Dawid Weiss commented on LUCENE-9241:
-

It is actually (and sadly) still true. You're looking at the parent class, but 
getResourceAsStream is overridden in URLClassLoader; the behavioral difference 
is still in there, here:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/URLClassLoader.java#L291-L305



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042888#comment-17042888
 ] 

Dawid Weiss commented on LUCENE-9241:
-

Not true... anymore. I definitely struggled with this at some point in time 
(Java 8?) and there used to be a difference. Thanks for clarifying though.



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042885#comment-17042885
 ] 

Robert Muir commented on LUCENE-9241:
-

{quote}
the URL based version causes jar to be locked on Windows (if I recall right). I 
don't see the benefit of switching to URL here?
{quote}

Not true.

The existing getResourceAsStream is simply getResource + openStream. This way 
the exception handling is simpler.

Look here: 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L1723-L1731

(Sorry, I have to call such things out, lest we have shitty code based on 
rumors or other wrong reasons)
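
For reference, the equivalence claimed above can be sketched like this. The 
class ResourceStreamSketch and its helper names are made up for illustration; 
viaUrl follows the same steps as the base ClassLoader implementation linked 
above (look up the URL, open it, return null on failure), and subclasses such 
as URLClassLoader may override that behaviour.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Arrays;

// Hypothetical sketch class, not part of the JDK or Lucene.
final class ResourceStreamSketch {
  /** Look up the URL, then open it; returns null if missing or unreadable. */
  static InputStream viaUrl(ClassLoader loader, String name) {
    URL url = loader.getResource(name);
    try {
      return url != null ? url.openStream() : null;
    } catch (IOException e) {
      return null;
    }
  }

  /** Reads the stream fully and closes it. */
  static byte[] readAll(InputStream in) throws IOException {
    try (InputStream is = in) {
      return is.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    ClassLoader loader = ClassLoader.getSystemClassLoader();
    // Assumption: the system loader can resolve this core class file.
    String name = "java/lang/Object.class";
    byte[] a = readAll(viaUrl(loader, name));
    byte[] b = readAll(loader.getResourceAsStream(name));
    System.out.println(Arrays.equals(a, b));
  }
}
```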



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-23 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042871#comment-17042871
 ] 

Dawid Weiss commented on LUCENE-9241:
-

Nightlies would still require a larger heap (because of increased iteration 
counts)?

There is a runtime difference with this change:
{code}
-InputStream is = TestJapaneseTokenizer.class.getResourceAsStream("userdict.txt");
-if (is == null) {
+URL resource = TestJapaneseTokenizer.class.getResource("userdict.txt");
+if (resource == null) {
   throw new RuntimeException("Cannot find userdict.txt in test classpath!");
 }
{code}
The URL-based version causes the jar to be locked on Windows (if I recall 
right). I don't see the benefit of switching to URL here?

If there are tests that really require a large amount of RAM, we could create a 
group for them and then create a separate test run for that group... Or assume 
the nightlies have a bumped heap amount?



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-22 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042785#comment-17042785
 ] 

Robert Muir commented on LUCENE-9241:
-

There are a few "real" code changes here to review:
* {{kuromoji}} adopts {{nori}}'s in-memory representation of the connection 
cost matrix. Instead of a 2-D heap {{short[][]}}, it uses a direct buffer. I 
think this is a better representation.
* {{RunAutomaton}} uses a {{FixedBitSet}} instead of a {{boolean[]}} for the 
accept states. This is only checked once per "word" by subclasses (see e.g. 
https://github.com/apache/lucene-solr/blob/bed694ec8811c67b8ba4b4c8943e60eda281850a/lucene/core/src/java/org/apache/lucene/util/automaton/ByteRunAutomaton.java#L44
 ), and it just adds some shift/mask there. It probably helps to not be so 
wasteful. On the other hand, this isn't the heaviest part of this data 
structure when tableized, but it's at least a little less stupid?
* "write-time" data structures of {{kuromoji}}/{{nori}} are a little more 
efficient on the connection costs and per-term metadata. This only impacts 
tests or "regenerate" type tasks, but we shouldn't be so wasteful anyway: these 
classes have been recently moved into the public API.
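
The first two changes can be illustrated with a small, self-contained sketch. 
It is not the code from the patch: the class names, the row-major layout and 
the hand-rolled long[] bit set (standing in for Lucene's FixedBitSet) are 
assumptions made so the example runs without Lucene on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

// Hypothetical illustration; names and layout are not taken from the patch.
final class CompactStructures {

  /** Connection costs in one flat direct buffer instead of a short[][]. */
  static final class ConnectionCosts {
    private final ShortBuffer costs;
    private final int backwardSize;

    ConnectionCosts(int forwardSize, int backwardSize) {
      this.backwardSize = backwardSize;
      // Two bytes per cost, allocated outside the Java heap.
      this.costs = ByteBuffer
          .allocateDirect(forwardSize * backwardSize * Short.BYTES)
          .asShortBuffer();
    }

    void set(int forwardId, int backwardId, short cost) {
      costs.put(forwardId * backwardSize + backwardId, cost);
    }

    short get(int forwardId, int backwardId) {
      return costs.get(forwardId * backwardSize + backwardId);
    }
  }

  /** Accept states packed one bit per state instead of one boolean per state. */
  static final class AcceptStates {
    private final long[] bits;

    AcceptStates(int numStates) {
      bits = new long[(numStates + 63) >>> 6]; // 64 states per long
    }

    void set(int state) {
      bits[state >>> 6] |= 1L << (state & 63);
    }

    boolean get(int state) {
      // The extra shift/mask mentioned above.
      return (bits[state >>> 6] & (1L << (state & 63))) != 0;
    }
  }
}
```

Note that the direct buffer still consumes memory; it just does not count 
against the -Xmx heap limit.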



[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-22 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042782#comment-17042782
 ] 

Robert Muir commented on LUCENE-9241:
-

Attached patch. [~dawid.weiss] I didn't yet modify the gradle build; I figured 
let's just clean up the memory-hungry tests first. It is almost possible to run 
with a 64MB heap with the patch, but we'd need to use OfflineSorter for the 
kuromoji/nori dictionary builds, which is more involved.