Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

Dawid Weiss and I are both involved in the Apache Lucene project and we know 
the problems with MappedByteBuffer and unmapping. Dawid already responded with 
a source code link to our impl (which needs to use the hacky cleaner() 
approach; also look at the heavy documentation in this class): 
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java

So we would be very happy to get this issue resolved! The cleaner() hack is 
enabled by default in Lucene if the JVM supports it (so we won't break if 
JIGSAW prevents this, but our *large* users would heavily complain).
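For readers who have not seen it, the hack boils down to reflection along these lines (a minimal sketch, not Lucene's actual MMapDirectory code; the class and method names are invented, and on Java 9+ the call goes through sun.misc.Unsafe.invokeCleaner, while older JVMs need the DirectByteBuffer.cleaner() route instead):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Sketch of the "cleaner() hack" on Java 9+, where sun.misc.Unsafe
// exposes invokeCleaner(ByteBuffer). Names are invented for
// illustration; Lucene's real code is more involved and also
// supports pre-9 JVMs via DirectByteBuffer.cleaner().
final class BufferCleaner {

    /** Tries to unmap/free the given direct buffer immediately.
     *  Returns false if the buffer is not direct or the JVM does not
     *  support the hack. The buffer must never be used afterwards! */
    static boolean tryUnmap(ByteBuffer buffer) {
        if (!buffer.isDirect()) {
            return false; // heap buffers have nothing to unmap
        }
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (Throwable t) {
            return false; // hack unsupported; fall back to waiting for GC
        }
    }
}
```

This is exactly the kind of code that breaks once module encapsulation (Jigsaw) tightens, which is why a supported unmap API is wanted.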

>> This is fundamentally about *integrity* of the runtime. It follows there
>> are security implications, but it’s still fundamentally an integrity issue
>> and guarding an unsafe operation with a Security Manager is
>> unfortunately an insufficient solution.
>
> Right, and just to add that there have been many attempts over the years
> to find solutions to this issue. I think the closest was atomically
> remapping, but that wasn't feasible on all platforms and also didn't free
> up the address space in a timely manner.

So we should really find a solution here. I have talked with several people at 
various conferences (Rory O'Donnell, Mark Reinhold) and we had some ideas about 
how to solve this. My own idea is explained below (I am not a JVM internals or 
Hotspot person, so excuse some obviously "wrong" assumptions):

Actually there are two issues, not just one. The first issue is, as mentioned 
before: you cannot unmap via the API. Many apps, including Apache Lucene, need 
this for a reason that really stems from "another" bug, which is my issue #2 
(see below).

First, unmapping is very important for Lucene at the moment, because we operate 
on Lucene indexes purely via mmap (see [1]), and those can easily be several 
hundred gigabytes. On highly dynamic systems, Lucene often maps new files (also 
very large ones) and relies on older, deleted files being unmapped in time 
(this does not need to be ASAP, just "in time"). So we have these two "bugs", 
which force us to unmap:

(1) Disk space issues: delete after last close (POSIX) vs. no delete at all 
(Windows)

- Disk space: we have seen customers run out of disk space with Lucene, 
because unmapping wasn't done in time, so POSIX's delete-on-last-close 
semantics could not free the disk space, although the file was already 
deleted. The problem you see on Windows (that you cannot delete the file at 
all) is therefore worse on Linux, because it is hidden from the user - you 
cannot free the disk space of the deleted file! Lucene creates and deletes 
files all the time while indexing realtime data (e.g. think of GitHub's very 
dynamic code search index, which is backed by Lucene/Elasticsearch).
- Virtual memory: if you map huge files (several hundred gigabytes) and they 
are not unmapped in time, you may run out of virtual address space. This 
especially affects Windows, because it does not use the full 46 bits (or 
thereabouts) of the address space, so effectively you can only map about 4 
terabytes on Windows. Fragmentation of the address space makes this worse (in 
Lucene we map in chunks of 1 GiB because of ByteBuffer's signed 32-bit integer 
limit, so fragmentation is not our biggest issue).
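The 1-GiB chunking mentioned above can be sketched roughly like this (an illustration only, not Lucene's actual MMapDirectory code; the class and method names are invented):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Maps a file of arbitrary size as a series of <= 1 GiB chunks,
// working around ByteBuffer's signed 32-bit index limit.
final class ChunkedMmap {
    static final long CHUNK_SIZE = 1L << 30; // 1 GiB, safely below Integer.MAX_VALUE

    /** Number of chunks needed to cover fileSize bytes. */
    static int chunkCount(long fileSize) {
        return (int) ((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE);
    }

    static MappedByteBuffer[] mapAll(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount(size)];
            for (int i = 0; i < chunks.length; i++) {
                long offset = (long) i * CHUNK_SIZE;
                long len = Math.min(CHUNK_SIZE, size - offset);
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            }
            // mappings stay valid after the channel is closed
            return chunks;
        }
    }
}
```

A 300 GB index thus becomes ~300 mapped regions, each individually retained by one small MappedByteBuffer object, which is exactly the GC-visibility problem described below.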

(2) It takes a very long time until the unmapping actually occurs!

This is the real bug! If the garbage collector cleaned up the buffers ASAP, we 
would not need to unmap from user code. In Lucene we just delay the file 
delete on Windows, so we are not really affected by the inability to delete 
files (though it would be nice if that could be fixed).

If you look at the usage pattern of those huge mapped files, you will see why 
in most cases they are *never ever* unmapped automatically: Lucene maps very 
large files and uses them for a long time, so the MappedByteBuffer object gets 
migrated to the older generations on the heap, where garbage collection 
happens, of course, very rarely. That alone would not be the most problematic 
part, but there is a second issue: a MappedByteBuffer is a very small object 
(in heap-size terms: just an object header and a few pointers), so the garbage 
collector does not see it as heavy! It's an instance of maybe 30 bytes. Why 
should the garbage collector clean it up? And in fact it almost never does! 
The garbage collector cannot see that our 30-byte object instance "sits" on 
something like 300 gigabytes of virtual memory and disk space!

One proposal to fix this would be to add something like an internal OpenJDK 
annotation (or similar) with which you can "mark" heavy objects, so the 
garbage collector would free them preferentially (similar to 
@sun.misc.Contended).

For the Apache Lucene team,
Uwe

[1] http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

-
Uwe Schindler
uschind...@apache.org 
ASF 

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Mike Hearn
Thanks for the contribution, Uwe.

So far I think I like Andrew's suggestion of a guard page the most.
Unmapping the guard page boils down to a kind of thread-local variable
without the actual cost of reading anything (in theory). So by
write-protecting the guard page and then unmapping the file, and letting
the GC clean up the guard page later, the same semantics as today are
preserved and there's no race.

I guess, although it's ugly, a system property could control whether the
NIO implementation returns an ordinary MappedByteBuffer or a new subclass,
UnmappableMappedByteBuffer. HotSpot would then be responsible for
removing the overhead of the virtual calls, as usual. If a customer finds
that the guard-page write is causing performance issues for them, they
could use the system property to get the old behaviour back, and the unmap
call would then throw.

But it sounds like users with extreme VMM needs, like Lucene, would find
this a performance win rather than a loss.

I admit that I'm not a JDK dev. Writing such a patch would be possible for
me but I don't have any kind of performance testing rigs, and this tweak
seems to be mostly dominated by performance concerns. Also I'm kind of busy
with other things right now.

On Wed, Sep 9, 2015 at 12:51 PM, Uwe Schindler 
wrote:

> [...]

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart

Hi Uwe,

As I thought, the problem for some seems to be non-prompt unmapping of 
mapped address space held by otherwise unreachable mapped byte buffers. 
The mapped address space doesn't live in the Java heap and doesn't 
represent heap memory pressure, so GC doesn't kick in automatically 
when one would like. One could help by manually triggering GC with 
System.gc() in such situations. The problem is how to detect such 
situations. Direct byte buffers (ByteBuffer.allocateDirect) maintain a 
count of bytes currently allocated and don't allow allocation of native 
memory beyond a certain configured limit (-XX:MaxDirectMemorySize=). 
Before throwing OutOfMemoryError, the ByteBuffer.allocateDirect() 
request tries its best to free direct memory allocated by otherwise 
unreachable direct ByteBuffers (using System.gc() to trigger GC and 
helping process references).


Would a similar approach - a configured limit for FileChannel.map()-ed 
address space - be of any help to Lucene applications? Is it possible to 
estimate the maximum amount of address space a particular Lucene 
application may need at any one time, so that mapping beyond such a limit 
could be considered an application error?


Regards, Peter

On 09/09/2015 12:51 PM, Uwe Schindler wrote:

> [...]

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart



On 09/09/2015 04:56 PM, Dawid Weiss wrote:

I think it would be best to leave to the application to decide and
implement the tracking and also triggering GC at times when it approaches
the limit.

I disagree. The GC -- when and how it is triggered -- should be
transparent to the application. We don't want to manage GC, we want to
(truly) release the resources we allocated (and we know when they are
no longer needed).

What you suggest is essentially managing GC from application level. I
don't think it's the right approach to solve the problem.

Dawid


Hi Dawid,

By wanting to truly release the resources you allocated, you are 
essentially wanting to manage the resources yourself. If you are willing 
to track the active mapped byte buffers manually yourself, then what 
about the following idea:


- you manually track the number of mapped buffers (or the amount of mapped 
address space) that you "know" is active in the application.
- you track the number of mapped buffers (or the amount of mapped address 
space) that is actually mapped at a particular time (by utilizing an 
after-unmap call-back that would have to be added to the MappedByteBuffer 
API).
- when the difference between those two tracked quantities reaches a certain 
amount or percentage, you give the GC a kick to do its job, as it is 
lagging behind.


I would not call this managing GC, but just hinting the GC at the right 
time. The biggest burden in this approach would be the manual tracking of 
active buffers, but you are willing to do that anyway by wanting to 
manually release the resources. Everything else can be made automatic.
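The bookkeeping described above could look roughly like this (a sketch under the assumption that the proposed after-unmap callback existed; all names here are hypothetical, and the callback itself is the API addition being discussed, not something MappedByteBuffer offers today):

```java
import java.util.concurrent.atomic.AtomicLong;

// Compares the bytes the application believes are live against the
// bytes actually still mapped, and hints GC when the gap (buffers
// released by the app but not yet unmapped) grows too large.
final class MappingTracker {
    private final AtomicLong liveBytes = new AtomicLong();   // app-managed count
    private final AtomicLong mappedBytes = new AtomicLong(); // via after-unmap callback
    private final long slackThreshold;

    MappingTracker(long slackThreshold) {
        this.slackThreshold = slackThreshold;
    }

    void onMap(long bytes) {            // app mapped a new region
        liveBytes.addAndGet(bytes);
        mappedBytes.addAndGet(bytes);
    }

    void onRelease(long bytes) {        // app no longer needs the region
        liveBytes.addAndGet(-bytes);
        maybeHintGc();
    }

    void onUnmapped(long bytes) {       // hypothetical after-unmap callback
        mappedBytes.addAndGet(-bytes);
    }

    /** Bytes released by the app but not yet unmapped by GC. */
    long slack() {
        return mappedBytes.get() - liveBytes.get();
    }

    private void maybeHintGc() {
        if (slack() >= slackThreshold) {
            System.gc(); // only a hint; a no-op with -XX:+DisableExplicitGC
        }
    }
}
```

As Uwe and Dawid point out later in the thread, relying on System.gc() here is exactly the contested part of this scheme.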



Regards, Peter


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

> As I thought, the problem for some seems to be non-prompt unmapping of
> mapped address space held by otherwise unreachable mapped byte buffers.
> The mapped address space doesn't live in the Java heap and doesn't
> represent a heap memory pressure, so GC doesn't kick-in automatically
> when one would like. One could help by manually triggering GC with
> System.gc() in such situations. The problem is how to detect such situations.
> Direct byte buffers (ByteBuffer.allocateDirect) maintain a count of bytes
> currently allocated and don't allow allocation of native memory beyond
> certain configured limit (-XX:MaxDirectMemorySize=).
> Before throwing OutOfMemoryError, the  ByteBuffer.allocateDirect()
> request tries it's best to free direct memory allocated by otherwise
> unreachable direct ByteBuffers (using System.gc() to trigger GC and helping
> process references).

FileChannel#map does the same (it tries to map, catches the OutOfMemoryError, 
waits a second and tries again). But as described in my earlier mail, this 
does not work as expected with newer GC implementations - this is why we see 
issues like a JVM running for a week or longer without any full GC, sitting on 
a terabyte of address space and disk space before becoming unusable. 
System#gc() is ignored in most environments, because it causes more havoc 
(full pauses) - especially if a full GC is otherwise rarely needed. I think 
this crazy try-catch-sleep-retry code had better be removed from 
FileChannel#map once the GC algorithms are fixed to take the heaviness of 
MappedByteBuffer into account (my proposal, the annotation @sun.misc.Heavy...) 
and free it earlier.
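The retry pattern being criticized has roughly this shape (a simplified illustration, not the actual OpenJDK FileChannel#map source; the exact wait time and structure in the JDK differ):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Simplified sketch of map-with-GC-retry: on OutOfMemoryError, hint
// GC, wait briefly for reference processing, and try once more.
final class MapRetry {
    static MappedByteBuffer mapWithRetry(FileChannel ch, long position, long size)
            throws IOException, InterruptedException {
        try {
            return ch.map(FileChannel.MapMode.READ_ONLY, position, size);
        } catch (OutOfMemoryError first) {
            System.gc();        // hope that unreachable buffers get cleaned up...
            Thread.sleep(100);  // ...and give reference processing a moment
            // second attempt; if this also fails, the OutOfMemoryError propagates
            return ch.map(FileChannel.MapMode.READ_ONLY, position, size);
        }
    }
}
```

The objection in this mail is that both legs of this pattern (System.gc() and the sleep) are unreliable when explicit GC is disabled or full GCs are rare.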

In addition, I think Andrew Haley made some good comments about possible ways 
to solve the problem.

Uwe





Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Dawid Weiss
> - you track the number of mapped buffers (or mapped address space) that you
> "know" is active in the application manually.

The problem is that you really can't do it at a global, JVM-wide scale, Peter.
It's enough for the same JVM process to start two isolated class
loaders with Lucene in each, and such accounting is no longer
correct... There are also other valid reasons (the ones Uwe mentioned)
which make an explicit System.gc() a non-viable option.

Dawid




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart



On 09/09/2015 04:21 PM, Peter Levart wrote:

> [...]


Perhaps the number of bytes mapped is not always the correct quantity to 
track. Maybe Lucene needs to track the number of mapped regions, or 
something else? I think it would be best to leave it to the application to 
decide on and implement the tracking, and also to trigger GC when it 
approaches the limit. All that is currently missing from the 
MappedByteBuffer API for that purpose is a notification to the application 
after a buffer has been unmapped.


Regards, Peter




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Mark Miller
It seems less than ideal for a library to count on System.gc to do this,
though.

Now the user has to worry about what effect System.gc has on which JVM with
which garbage collector, and whether or not ExplicitGCInvokesConcurrent was
turned on for the JVM, or...

- Mark

On Wed, Sep 9, 2015 at 11:46 AM Peter Levart  wrote:

> [...]
- Mark
about.me/markrmiller


Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Dawid Weiss
> I think it would be best to leave to the application to decide and
> implement the tracking and also triggering GC at times when it approaches
> the limit.

I disagree. The GC -- when and how it is triggered -- should be
transparent to the application. We don't want to manage GC, we want to
(truly) release the resources we allocated (and we know when they are
no longer needed).

What you suggest is essentially managing GC from application level. I
don't think it's the right approach to solve the problem.

Dawid




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Robert Muir
On Wed, Sep 9, 2015 at 11:46 AM, Peter Levart  wrote:
>
> By wanting to truly release the resources you allocated, you are essentially
> wanting to manage the resources yourself. If you are willing to track the
> active mapped byte buffers manually yourself, then what about the following
> idea:
>

As Uwe mentioned, that is probably not truly necessary. If Lucene
cannot delete a file, it retries periodically until it works.
So if things were unmapped "soonish", things would be fine for the
Lucene case, I think.

I do realize other apps may not have that infrastructure/luxury...
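The retry-delete infrastructure mentioned above can be sketched like this (names are invented for illustration; Lucene's real code is structured differently):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Deletes that fail (e.g. because the file is still mapped on
// Windows) are queued and retried later, so eventual unmapping is
// enough for correctness.
final class PendingDeletes {
    private final Set<Path> pending = new HashSet<>();

    /** Deletes now if possible, otherwise queues for a later retry. */
    void delete(Path p) {
        if (!tryDelete(p)) {
            pending.add(p);
        }
    }

    /** Called periodically; returns how many queued deletes succeeded. */
    int retryPending() {
        int done = 0;
        for (Iterator<Path> it = pending.iterator(); it.hasNext(); ) {
            if (tryDelete(it.next())) {
                it.remove();
                done++;
            }
        }
        return done;
    }

    private static boolean tryDelete(Path p) {
        try {
            Files.deleteIfExists(p);
            return true;
        } catch (IOException e) {
            return false; // still in use (typical on Windows while mapped)
        }
    }
}
```

This only covers the file-deletion half of the problem; it does not help with the address-space and disk-space exhaustion cases Uwe describes.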




RE: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

> As I thought, the problem for some seems to be non-prompt unmapping of
> mapped address space held by otherwise unreachable mapped byte buffers.
> The mapped address space doesn't live in the Java heap and doesn't
> represent a heap memory pressure, so GC doesn't kick-in automatically
> when one would like. One could help by manually triggering GC with
> System.gc() in such situations. The problem is how to detect such situations.

Unfortunately, System#gc() is explicitly disallowed in most environments 
(because it performs a full GC): you should not use explicit GCs, because they 
hurt low-latency applications like search engines. So explicit GC should be 
disabled for such installations, e.g. because external libraries tend to call 
System#gc() for no reason...

> Direct byte buffers (ByteBuffer.allocateDirect) maintain a count of bytes
> currently allocated and don't allow allocation of native memory beyond
> certain configured limit (-XX:MaxDirectMemorySize=).
> Before throwing OutOfMemoryError, the  ByteBuffer.allocateDirect()
> request tries it's best to free direct memory allocated by otherwise
> unreachable direct ByteBuffers (using System.gc() to trigger GC and helping
> process references).

This code breaks if you disallow explicit GC. As Dawid says, I don't think the 
application should have to take care of GC.

> Would similar approach - configured limit for FileChannel.map()ped address
> space be of any help to Lucene applications? Is it possible to estimate the
> max. amount of address space a particular Lucene application may need at
> any one time so that mapping over such limit could be considered an
> application error?

This does not scale with index sizes going into the hundreds of gigabytes. We 
cannot force users to calculate their index size before using it and to set 
corresponding JVM settings.

Uwe

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/



