Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

Dawid Weiss and I are both involved in the Apache Lucene project and we know 
the problems with MappedByteBuffer and unmapping. Dawid already responded with 
a source code link to our impl (which needs to use the hacky cleaner() 
approach; also look at the heavy documentation in this class): 
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java

So we would be very happy to get this issue resolved! The cleaner() hack is 
enabled by default in Lucene if the JVM supports it (so we won't break if 
JIGSAW prevents this, but our *large* users would heavily complain).
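For readers who have not seen it, the hack boils down to reflection along these lines (a minimal sketch, not Lucene's actual MMapDirectory code; the class and method names are invented, and on Java 9+ the call goes through sun.misc.Unsafe.invokeCleaner, while older JVMs need the DirectByteBuffer.cleaner() route instead):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Sketch of the "cleaner() hack" on Java 9+, where sun.misc.Unsafe
// exposes invokeCleaner(ByteBuffer). Names are invented for
// illustration; Lucene's real code is more involved and also
// supports pre-9 JVMs via DirectByteBuffer.cleaner().
final class BufferCleaner {

    /** Tries to unmap/free the given direct buffer immediately.
     *  Returns false if the buffer is not direct or the JVM does not
     *  support the hack. The buffer must never be used afterwards! */
    static boolean tryUnmap(ByteBuffer buffer) {
        if (!buffer.isDirect()) {
            return false; // heap buffers have nothing to unmap
        }
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (Throwable t) {
            return false; // hack unsupported; fall back to waiting for GC
        }
    }
}
```

This is exactly the kind of code that breaks once module encapsulation (Jigsaw) tightens, which is why a supported unmap API is wanted.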

>> This is fundamentally about *integrity* of the runtime. It follows there
>> are security implications, but it’s still fundamentally an integrity issue
>> and guarding an unsafe operation with a Security Manager is
>> unfortunately an insufficient solution.
>
> Right, and just to add that there have been many attempts over the years
> to find solutions to this issue. I think the closest was atomically
> remapping, but that wasn't feasible on all platforms and also didn't free
> up the address space in a timely manner.

So we should really find a solution here. I have talked with several people at 
various conferences (Rory O'Donnell, Mark Reinhold) and we had some ideas about 
how to solve this. My own idea is explained below (I am not a JVM internals or 
Hotspot person, so excuse some obviously "wrong" assumptions):

Actually there are two issues, not just one. The first issue is, as mentioned 
before: you cannot unmap via the API. Many apps, including Apache Lucene, need 
this for a reason that really stems from "another" bug, which is my issue #2 
(see below).

First, unmapping is very important for Lucene at the moment, because we operate 
on Lucene indexes purely via mmap (see [1]), and those can easily be several 
hundred gigabytes. On highly dynamic systems, Lucene often maps new files (also 
very large ones) and relies on older, deleted files being unmapped in time 
(this does not need to be ASAP, just "in time"). So we have these two "bugs", 
which force us to unmap:

(1) Disk space issues: delete after last close (POSIX) vs. no delete at all 
(Windows)

- Disk space: we have seen customers run out of disk space with Lucene, 
because unmapping wasn't done in time, so POSIX's delete-on-last-close 
semantics could not free the disk space, although the file was already 
deleted. The problem you see on Windows (that you cannot delete the file at 
all) is therefore worse on Linux, because it is hidden from the user - you 
cannot free the disk space of the deleted file! Lucene creates and deletes 
files all the time while indexing realtime data (e.g. think of GitHub's very 
dynamic code search index, which is backed by Lucene/Elasticsearch).
- Virtual memory: if you map huge files (several hundred gigabytes) and they 
are not unmapped in time, you may run out of virtual address space. This 
especially affects Windows, because it does not use the full 46 bits (or 
thereabouts) of the address space, so effectively you can only map about 4 
terabytes on Windows. Fragmentation of the address space makes this worse (in 
Lucene we map in chunks of 1 GiB because of ByteBuffer's signed 32-bit integer 
limit, so fragmentation is not our biggest issue).
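The 1-GiB chunking mentioned above can be sketched roughly like this (an illustration only, not Lucene's actual MMapDirectory code; the class and method names are invented):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Maps a file of arbitrary size as a series of <= 1 GiB chunks,
// working around ByteBuffer's signed 32-bit index limit.
final class ChunkedMmap {
    static final long CHUNK_SIZE = 1L << 30; // 1 GiB, safely below Integer.MAX_VALUE

    /** Number of chunks needed to cover fileSize bytes. */
    static int chunkCount(long fileSize) {
        return (int) ((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE);
    }

    static MappedByteBuffer[] mapAll(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount(size)];
            for (int i = 0; i < chunks.length; i++) {
                long offset = (long) i * CHUNK_SIZE;
                long len = Math.min(CHUNK_SIZE, size - offset);
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            }
            // mappings stay valid after the channel is closed
            return chunks;
        }
    }
}
```

A 300 GB index thus becomes ~300 mapped regions, each individually retained by one small MappedByteBuffer object, which is exactly the GC-visibility problem described below.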

(2) It takes a very long time until the unmapping actually occurs!

This is the real bug! If the garbage collector cleaned up the buffers ASAP, we 
would not need to unmap from user code. In Lucene we just delay the file 
delete on Windows, so we are not really affected by the inability to delete 
files (though it would be nice if that could be fixed).

If you look at the usage pattern of those huge mapped files, you will see why 
in most cases they are *never ever* unmapped automatically: Lucene maps very 
large files and uses them for a long time, so the MappedByteBuffer object gets 
migrated to the older generations on the heap, where garbage collection 
happens, of course, very rarely. That alone would not be the most problematic 
part, but there is a second issue: a MappedByteBuffer is a very small object 
(in heap-size terms: just an object header and a few pointers), so the garbage 
collector does not see it as heavy! It's an instance of maybe 30 bytes. Why 
should the garbage collector clean it up? And in fact it almost never does! 
The garbage collector cannot see that our 30-byte object instance "sits" on 
something like 300 gigabytes of virtual memory and disk space!

One proposal to fix this would be to add something like an internal OpenJDK 
annotation (or similar) with which you can "mark" heavy objects, so the 
garbage collector would free them preferentially (similar to 
@sun.misc.Contended).

For the Apache Lucene team,
Uwe

[1] http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

-
Uwe Schindler
uschind...@apache.org 
ASF 

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Mike Hearn
Thanks for the contribution, Uwe.

So far I think I like Andrew's suggestion of a guard page the most.
Unmapping the guard page boils down to a kind of thread-local variable
without the actual cost of reading anything (in theory). So by
write-protecting the guard page and then unmapping the file, and letting
the GC clean up the guard page later, the same semantics as today are
preserved and there's no race.

I guess, although it's ugly, a system property could control whether the
NIO implementation returns an ordinary MappedByteBuffer or a new subclass,
UnmappableMappedByteBuffer. HotSpot would then be responsible for
removing the overhead of the virtual calls, as usual. If a customer finds
that the guard-page write is causing performance issues for them, they
could use the system property to get the old behaviour back, and the unmap
call would then throw.

But it sounds like users with extreme VMM needs, like Lucene, would find
this a performance win rather than a loss.

I admit that I'm not a JDK dev. Writing such a patch would be possible for
me but I don't have any kind of performance testing rigs, and this tweak
seems to be mostly dominated by performance concerns. Also I'm kind of busy
with other things right now.

On Wed, Sep 9, 2015 at 12:51 PM, Uwe Schindler 
wrote:

> [...]

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart

Hi Uwe,

As I thought, the problem for some seems to be non-prompt unmapping of 
mapped address space held by otherwise unreachable mapped byte buffers. 
The mapped address space doesn't live in the Java heap and doesn't 
represent heap memory pressure, so GC doesn't kick in automatically 
when one would like. One could help by manually triggering GC with 
System.gc() in such situations. The problem is how to detect such 
situations. Direct byte buffers (ByteBuffer.allocateDirect) maintain a 
count of bytes currently allocated and don't allow allocation of native 
memory beyond a certain configured limit (-XX:MaxDirectMemorySize=). 
Before throwing OutOfMemoryError, the ByteBuffer.allocateDirect() 
request tries its best to free direct memory allocated by otherwise 
unreachable direct ByteBuffers (using System.gc() to trigger GC and 
helping process references).


Would a similar approach - a configured limit for FileChannel.map()-ed 
address space - be of any help to Lucene applications? Is it possible to 
estimate the maximum amount of address space a particular Lucene 
application may need at any one time, so that mapping beyond such a limit 
could be considered an application error?


Regards, Peter

On 09/09/2015 12:51 PM, Uwe Schindler wrote:

> [...]

Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart



On 09/09/2015 04:56 PM, Dawid Weiss wrote:

I think it would be best to leave to the application to decide and
implement the tracking and also triggering GC at times when it approaches
the limit.

I disagree. The GC -- when and how it is triggered -- should be
transparent to the application. We don't want to manage GC, we want to
(truly) release the resources we allocated (and we know when they are
no longer needed).

What you suggest is essentially managing GC from application level. I
don't think it's the right approach to solve the problem.

Dawid


Hi Dawid,

By wanting to truly release the resources you allocated, you are 
essentially wanting to manage the resources yourself. If you are willing 
to track the active mapped byte buffers manually yourself, then what 
about the following idea:


- you manually track the number of mapped buffers (or the amount of mapped 
address space) that you "know" is active in the application.
- you track the number of mapped buffers (or the amount of mapped address 
space) that is actually mapped at a particular time (by utilizing an 
after-unmap call-back that would have to be added to the MappedByteBuffer 
API).
- when the difference between those two tracked quantities reaches a certain 
amount or percentage, you give the GC a kick to do its job, as it is 
lagging behind.


I would not call this managing GC, but just hinting the GC at the right 
time. The biggest burden in this approach would be the manual tracking of 
active buffers, but you are willing to do that anyway by wanting to 
manually release the resources. Everything else can be made automatic.
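The bookkeeping described above could look roughly like this (a sketch under the assumption that the proposed after-unmap callback existed; all names here are hypothetical, and the callback itself is the API addition being discussed, not something MappedByteBuffer offers today):

```java
import java.util.concurrent.atomic.AtomicLong;

// Compares the bytes the application believes are live against the
// bytes actually still mapped, and hints GC when the gap (buffers
// released by the app but not yet unmapped) grows too large.
final class MappingTracker {
    private final AtomicLong liveBytes = new AtomicLong();   // app-managed count
    private final AtomicLong mappedBytes = new AtomicLong(); // via after-unmap callback
    private final long slackThreshold;

    MappingTracker(long slackThreshold) {
        this.slackThreshold = slackThreshold;
    }

    void onMap(long bytes) {            // app mapped a new region
        liveBytes.addAndGet(bytes);
        mappedBytes.addAndGet(bytes);
    }

    void onRelease(long bytes) {        // app no longer needs the region
        liveBytes.addAndGet(-bytes);
        maybeHintGc();
    }

    void onUnmapped(long bytes) {       // hypothetical after-unmap callback
        mappedBytes.addAndGet(-bytes);
    }

    /** Bytes released by the app but not yet unmapped by GC. */
    long slack() {
        return mappedBytes.get() - liveBytes.get();
    }

    private void maybeHintGc() {
        if (slack() >= slackThreshold) {
            System.gc(); // only a hint; a no-op with -XX:+DisableExplicitGC
        }
    }
}
```

As Uwe and Dawid point out later in the thread, relying on System.gc() here is exactly the contested part of this scheme.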



Regards, Peter


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

> As I thought, the problem for some seems to be non-prompt unmapping of
> mapped address space held by otherwise unreachable mapped byte buffers.
> The mapped address space doesn't live in the Java heap and doesn't
> represent a heap memory pressure, so GC doesn't kick-in automatically
> when one would like. One could help by manually triggering GC with
> System.gc() in such situations. The problem is how to detect such situations.
> Direct byte buffers (ByteBuffer.allocateDirect) maintain a count of bytes
> currently allocated and don't allow allocation of native memory beyond
> certain configured limit (-XX:MaxDirectMemorySize=).
> Before throwing OutOfMemoryError, the  ByteBuffer.allocateDirect()
> request tries it's best to free direct memory allocated by otherwise
> unreachable direct ByteBuffers (using System.gc() to trigger GC and helping
> process references).

FileChannel#map does the same (it tries to map, catches the OutOfMemoryError, 
waits a second and tries again). But as described in my earlier mail, this 
does not work as expected with newer GC implementations - this is why we see 
issues like a JVM running for a week or longer without any full GC, sitting on 
a terabyte of address space and disk space before becoming unusable. 
System#gc() is ignored in most environments, because it causes more havoc 
(full pauses) - especially if a full GC is otherwise rarely needed. I think 
this crazy try-catch-sleep-retry code had better be removed from 
FileChannel#map once the GC algorithms are fixed to take the heaviness of 
MappedByteBuffer into account (my proposal, the annotation @sun.misc.Heavy...) 
and free it earlier.
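The retry pattern being criticized has roughly this shape (a simplified illustration, not the actual OpenJDK FileChannel#map source; the exact wait time and structure in the JDK differ):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Simplified sketch of map-with-GC-retry: on OutOfMemoryError, hint
// GC, wait briefly for reference processing, and try once more.
final class MapRetry {
    static MappedByteBuffer mapWithRetry(FileChannel ch, long position, long size)
            throws IOException, InterruptedException {
        try {
            return ch.map(FileChannel.MapMode.READ_ONLY, position, size);
        } catch (OutOfMemoryError first) {
            System.gc();        // hope that unreachable buffers get cleaned up...
            Thread.sleep(100);  // ...and give reference processing a moment
            // second attempt; if this also fails, the OutOfMemoryError propagates
            return ch.map(FileChannel.MapMode.READ_ONLY, position, size);
        }
    }
}
```

The objection in this mail is that both legs of this pattern (System.gc() and the sleep) are unreliable when explicit GC is disabled or full GCs are rare.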

In addition, I think Andrew Haley made some good comments about possible ways 
to solve the problem.

Uwe





Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Dawid Weiss
> - you track the number of mapped buffers (or mapped address space) that you
> "know" is active in the application manually.

The problem is that you really can't do it at a global, JVM-wide scale, Peter.
It's enough for the same JVM process to start two isolated class
loaders with Lucene in each, and such accounting is no longer
correct... There are also other valid reasons (the ones Uwe mentioned)
which make an explicit System.gc() a non-viable option.

Dawid




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Peter Levart



On 09/09/2015 04:21 PM, Peter Levart wrote:

> [...]


Perhaps the number of bytes mapped is not always the correct quantity to 
track. Maybe Lucene needs to track the number of mapped regions, or 
something else? I think it would be best to leave it to the application to 
decide on and implement the tracking, and also to trigger GC when it 
approaches the limit. All that is currently missing from the 
MappedByteBuffer API for that purpose is a notification to the application 
after a buffer has been unmapped.


Regards, Peter




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Mark Miller
It seems less than ideal for a library to count on System.gc to do this,
though.

Now the user has to worry about what effect System.gc has on which JVM with
which garbage collector, and whether or not ExplicitGCInvokesConcurrent was
turned on for the JVM, or...

- Mark

On Wed, Sep 9, 2015 at 11:46 AM Peter Levart  wrote:

> [...]
- Mark
about.me/markrmiller


Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Dawid Weiss
> I think it would be best to leave to the application to decide and
> implement the tracking and also triggering GC at times when it approaches
> the limit.

I disagree. The GC -- when and how it is triggered -- should be
transparent to the application. We don't want to manage GC, we want to
(truly) release the resources we allocated (and we know when they are
no longer needed).

What you suggest is essentially managing GC from application level. I
don't think it's the right approach to solve the problem.

Dawid




Re: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Robert Muir
On Wed, Sep 9, 2015 at 11:46 AM, Peter Levart  wrote:
>
> By wanting to truly release the resources you allocated, you are essentially
> wanting to manage the resources yourself. If you are willing to track the
> active mapped byte buffers manually yourself, then what about the following
> idea:
>

As Uwe mentioned, that is probably not truly necessary. If Lucene
cannot delete a file, it retries periodically until it works.
So if things were unmapped "soonish", things would be fine for the
Lucene case, I think.

I do realize other apps may not have that infrastructure/luxury...
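The retry-delete infrastructure mentioned above can be sketched like this (names are invented for illustration; Lucene's real code is structured differently):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Deletes that fail (e.g. because the file is still mapped on
// Windows) are queued and retried later, so eventual unmapping is
// enough for correctness.
final class PendingDeletes {
    private final Set<Path> pending = new HashSet<>();

    /** Deletes now if possible, otherwise queues for a later retry. */
    void delete(Path p) {
        if (!tryDelete(p)) {
            pending.add(p);
        }
    }

    /** Called periodically; returns how many queued deletes succeeded. */
    int retryPending() {
        int done = 0;
        for (Iterator<Path> it = pending.iterator(); it.hasNext(); ) {
            if (tryDelete(it.next())) {
                it.remove();
                done++;
            }
        }
        return done;
    }

    private static boolean tryDelete(Path p) {
        try {
            Files.deleteIfExists(p);
            return true;
        } catch (IOException e) {
            return false; // still in use (typical on Windows while mapped)
        }
    }
}
```

This only covers the file-deletion half of the problem; it does not help with the address-space and disk-space exhaustion cases Uwe describes.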




RE: Suggested fix for JDK-4724038 (Add unmap method to MappedByteBuffer)

2015-09-09 Thread Uwe Schindler
Hi,

> As I thought, the problem for some seems to be non-prompt unmapping of
> mapped address space held by otherwise unreachable mapped byte buffers.
> The mapped address space doesn't live in the Java heap and doesn't
> represent a heap memory pressure, so GC doesn't kick-in automatically
> when one would like. One could help by manually triggering GC with
> System.gc() in such situations. The problem is how to detect such situations.

Unfortunately, System#gc() is explicitly disallowed in most environments 
(because it performs a full GC): you should not use explicit GCs, because they 
hurt low-latency applications like search engines. So explicit GC should be 
disabled for such installations, e.g. because external libraries tend to call 
System#gc() for no reason...

> Direct byte buffers (ByteBuffer.allocateDirect) maintain a count of bytes
> currently allocated and don't allow allocation of native memory beyond
> certain configured limit (-XX:MaxDirectMemorySize=).
> Before throwing OutOfMemoryError, the  ByteBuffer.allocateDirect()
> request tries it's best to free direct memory allocated by otherwise
> unreachable direct ByteBuffers (using System.gc() to trigger GC and helping
> process references).

This code breaks if you disallow explicit GC. As Dawid says, I don't think the 
application should have to take care of GC.

> Would similar approach - configured limit for FileChannel.map()ped address
> space be of any help to Lucene applications? Is it possible to estimate the
> max. amount of address space a particular Lucene application may need at
> any one time so that mapping over such limit could be considered an
> application error?

This does not scale with index sizes going into the hundreds of gigabytes. We 
cannot force users to calculate their index size before using it and to set 
corresponding JVM settings.

Uwe

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/



