Re: Jena TDB OOME GC overhead limit exceeded

2016-07-28 Thread Andy Seaborne

On 27/07/16 21:16, Dick Murray wrote:

The MWE in the previous email will work with any even-line text file and
will produce the odd [B values. I can't see anywhere obvious where the
non-Jena code is creating them, just odd that there are so many of them!


Could be - it's just not a size that reminds me of anything.



OK, that knocks the DBB idea on the head!

I'll set the mapped symbol and play with batch sizes. Can the map location
be configured or will it go after the TDB location?


See BlockMgrJournal

(really, look at the code for answers - some of this hasn't been visited 
for a while!)


It uses a file in the tmp directory from java.io.tmpdir.
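A minimal sketch of doing that, assuming the symbol is set in the global
context (via TDB.getContext()) before the dataset is opened; the TDB path is
illustrative, and the journal's temp file can be pointed at another disk with
-Djava.io.tmpdir at JVM startup:

import org.apache.jena.query.ReadWrite;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb.TDB;
import org.apache.jena.tdb.TDBFactory;

public class MappedJournalSketch {
    public static void main(String[] args) {
        // Back the transaction journal blocks with a memory-mapped temp file
        // (created under java.io.tmpdir) instead of heap byte[] blocks.
        TDB.getContext().set(TDB.transactionJournalWriteBlockMode, "mapped");

        DatasetGraph dsg = TDBFactory.createDatasetGraph("/path/to/tdb");
        dsg.begin(ReadWrite.WRITE);
        try {
            // ... bulk add quads here ...
            dsg.commit();
        } finally {
            dsg.end();
        }
        dsg.close();
    }
}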



Is "TDB2" what we discussed some time back? I'm happy to provide some
testing on that as I've ~2000 files to ETL via an automated process each
producing 3-4M quads...


Thanks for the offer.  I need to find time to clean it up but I've used 
it with Fuseki and loaded large files (sized up until I got bored rather 
than some limit I hit). It is Apache licensed - it is not part of the 
Jena project.


Andy



Thanks Dick.

On 27 Jul 2016 20:10, "Andy Seaborne"  wrote:


On 27/07/16 13:19, Dick Murray wrote:


;-) Yes I did. But then I switched to the actual files I need to import

and

they produce ~3.5M triples...

Using normal Jena 3.1 (i.e. no special context symbols set) the commit
after 100k triples works to import the file 10 times with the [B varying
between ~2Mb and ~4Mb. I'm currently testing a 20 instance pass.

A batched commit works for this bulk load because if it fails after a

batch

commit I can remove the graph.

For my understanding... TDB is holding the triples/block/journal in heap
until commit is called? But this doesn't account for the [B not being
cleared after a commit of 3.5M triples. It takes another pass plus ~2M
uncommitted triples before I get an OOME.



And the [B have a strange average size.  A block is 8K.


Digging around and there are some references made to the

DirectByteBuffers

causing issues. IBM


https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/excessive_native_memory_usage_by_directbytebuffers?lang=en

links the problem to;

Essentially the problem boils down to either:

   1. There are too many DBBs being allocated (or they are too large),
   and/or
   2. The DBBs are not being cleared up quickly enough.



TDB does not use DirectByteBuffers unless you ask it to.  They are not [B.

.hasArray is false.
.array() throws UnsupportedOperationException.

(Grep the code for "allocateDirect" and trace back the single use of
BufferAllocatorDirect to the journal in "direct" mode.)


I can believe that, if activated, the GC recycling would be slow. The
code ought to recycle them (because you can't explicitly free them for some
weird reason - they're little more than malloc).


But they are not being used unless you ask for them.

Journal entries are 8K unless they are commit records which are about 20

bytes (I think).






and recommends using -XX:MaxDirectMemorySize=1024m to poke the GC via
System.gc(). Not sure if G1GC helps because of its new heap model...

Would it be possible to get Jena to write its uncommitted triples to

disk

and then commit them to the TDB?



Set TDB.transactionJournalWriteBlockMode to "mapped". That uses a disk

file.




Ok it's slower than RAM but until they are
committed only one thread has visibility anyway? Could direct that at a
different disk as well...

Just before hitting send I'm at pass 13 and the [B maxed at just over 4Gb
before dropping back to 2Gb.



Or use TDB2 :-)

It has no problem loading 100m+ triples in a single transaction (the

space per transaction is fixed at about 80 bytes of transaction - disk
writes happen during the transaction not to a roll-forward journal). And it
should be a bit faster because writes happen once.


Just need to find time to clean it up ...

Andy




Dick.



On 27 July 2016 at 11:47, Andy Seaborne  wrote:


On 27/07/16 11:22, Dick Murray wrote:


Hello.

Something doesn't add up here... I've run repeated tests with the
following
MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.

What I don't understand is the size of [B increases with each pass

until

the OOME is thrown. The exact same process is run 5 times with a new

graph

for each set of triples.

There are ~3.5M triples added within the transaction from a file which

is

a
"simple" text based file (30Mb) which is read in line pairs.



Err - you said 200k quads earlier!

Set

TransactionManager.QueueBatchSize=0 ;

and break the load into small units for now and see if that helps.

One experiment would be to write the output to disk and load from a
program that only does the TDB part.

Andy




I've tested sequential loads of other text files (i.e. file x *5) and
other
text files loaded sequentially (i.e. file x, file y, file ...) and the

same

result is exhibited.

If I reduce -Xmx to 6g it 

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-27 Thread Dick Murray
The MWE in the previous email will work with any even-line text file and
will produce the odd [B values. I can't see anywhere obvious where the
non-Jena code is creating them, just odd that there are so many of them!

OK, that knocks the DBB idea on the head!

I'll set the mapped symbol and play with batch sizes. Can the map location
be configured or will it go after the TDB location?

Is "TDB2" what we discussed some time back? I'm happy to provide some
testing on that as I've ~2000 files to ETL via an automated process each
producing 3-4M quads...

Thanks Dick.

On 27 Jul 2016 20:10, "Andy Seaborne"  wrote:
>
> On 27/07/16 13:19, Dick Murray wrote:
>>
>> ;-) Yes I did. But then I switched to the actual files I need to import
and
>> they produce ~3.5M triples...
>>
>> Using normal Jena 3.1 (i.e. no special context symbols set) the commit
>> after 100k triples works to import the file 10 times with the [B varying
>> between ~2Mb and ~4Mb. I'm currently testing a 20 instance pass.
>>
>> A batched commit works for this bulk load because if it fails after a
batch
>> commit I can remove the graph.
>>
>> For my understanding... TDB is holding the triples/block/journal in heap
>> until commit is called? But this doesn't account for the [B not being
>> cleared after a commit of 3.5M triples. It takes another pass plus ~2M
>> uncommitted triples before I get an OOME.
>
>
> And the [B have a strange average size.  A block is 8K.
>
>> Digging around and there are some references made to the
DirectByteBuffers
>> causing issues. IBM
>>
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/excessive_native_memory_usage_by_directbytebuffers?lang=en
>> links the problem to;
>>
>> Essentially the problem boils down to either:
>>
>>1. There are too many DBBs being allocated (or they are too large),
>>and/or
>>2. The DBBs are not being cleared up quickly enough.
>>
>
> TDB does not use DirectByteBuffers unless you ask it to.  They are not [B.
>
> .hasArray is false.
> .array() throws UnsupportedOperationException.
>
> (Grep the code for "allocateDirect" and trace back the single use of
BufferAllocatorDirect to the journal in "direct" mode.)
>
> I can believe that, if activated, the GC recycling would be slow. The
code ought to recycle them (because you can't explicitly free them for some
weird reason - they're little more than malloc).
>
> But they are not being used unless you ask for them.
>
> Journal entries are 8K unless they are commit records which are about 20
bytes (I think).
>
>
>
>>
>> and recommends using -XX:MaxDirectMemorySize=1024m to poke the GC via
>> System.gc(). Not sure if G1GC helps because of its new heap model...
>>
>> Would it be possible to get Jena to write its uncommitted triples to
disk
>> and then commit them to the TDB?
>
>
> Set TDB.transactionJournalWriteBlockMode to "mapped". That uses a disk
file.
>
>
>> Ok it's slower than RAM but until they are
>> committed only one thread has visibility anyway? Could direct that at a
>> different disk as well...
>>
>> Just before hitting send I'm at pass 13 and the [B maxed at just over 4Gb
>> before dropping back to 2Gb.
>
>
> Or use TDB2 :-)
>
> It has no problem loading 100m+ triples in a single transaction (the
space per transaction is fixed at about 80 bytes of transaction - disk
writes happen during the transaction not to a roll-forward journal). And it
should be a bit faster because writes happen once.
>
> Just need to find time to clean it up ...
>
> Andy
>
>
>>
>> Dick.
>>
>>
>>
>> On 27 July 2016 at 11:47, Andy Seaborne  wrote:
>>
>>> On 27/07/16 11:22, Dick Murray wrote:
>>>
 Hello.

 Something doesn't add up here... I've run repeated tests with the
 following
 MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.

 What I don't understand is the size of [B increases with each pass
until
 the OOME is thrown. The exact same process is run 5 times with a new
graph
 for each set of triples.

 There are ~3.5M triples added within the transaction from a file which
is
 a
 "simple" text based file (30Mb) which is read in line pairs.

>>>
>>> Err - you said 200k quads earlier!
>>>
>>> Set
>>>
>>> TransactionManager.QueueBatchSize=0 ;
>>>
>>> and break the load into small units for now and see if that helps.
>>>
>>> One experiment would be to write the output to disk and load from a
>>> program that only does the TDB part.
>>>
>>> Andy
>>>
>>>
>>>
 I've tested sequential loads of other text files (i.e. file x *5) and
 other
 text files loaded sequentially (i.e. file x, file y, file ...) and the
same
 result is exhibited.

 If I reduce -Xmx to 6g it will fail earlier.

 Changing the GC using -XX:+UseG1GC doesn't alter the outcome.

 I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
 Centos 7 with Java 1.8.

 Any ideas?

 Regards 

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-27 Thread Andy Seaborne

On 27/07/16 13:19, Dick Murray wrote:

;-) Yes I did. But then I switched to the actual files I need to import and
they produce ~3.5M triples...

Using normal Jena 3.1 (i.e. no special context symbols set) the commit
after 100k triples works to import the file 10 times with the [B varying
between ~2Mb and ~4Mb. I'm currently testing a 20 instance pass.

A batched commit works for this bulk load because if it fails after a batch
commit I can remove the graph.

For my understanding... TDB is holding the triples/block/journal in heap
until commit is called? But this doesn't account for the [B not being
cleared after a commit of 3.5M triples. It takes another pass plus ~2M
uncommitted triples before I get an OOME.


And the [B have a strange average size.  A block is 8K.


Digging around and there are some references made to the DirectByteBuffers
causing issues. IBM
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/excessive_native_memory_usage_by_directbytebuffers?lang=en
links the problem to;

Essentially the problem boils down to either:

   1. There are too many DBBs being allocated (or they are too large),
   and/or
   2. The DBBs are not being cleared up quickly enough.



TDB does not use DirectByteBuffers unless you ask it to.  They are not [B.

.hasArray is false.
.array() throws UnsupportedOperationException.
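
A two-line illustration of the difference (plain NIO, nothing TDB-specific):
a heap buffer is backed by a byte[] and so is counted as [B in a histogram,
while a direct buffer is not.

import java.nio.ByteBuffer;

public class DirectVsHeap {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8 * 1024);         // backed by byte[] -> counted as [B
        ByteBuffer direct = ByteBuffer.allocateDirect(8 * 1024); // native memory, no byte[]
        System.out.println(heap.hasArray());    // true
        System.out.println(direct.hasArray());  // false
        // direct.array() would throw UnsupportedOperationException
    }
}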

(Grep the code for "allocateDirect" and trace back the single use of
BufferAllocatorDirect to the journal in "direct" mode.)


I can believe that, if activated, the GC recycling would be slow.
The code ought to recycle them (because you can't explicitly free them
for some weird reason - they're little more than malloc).


But they are not being used unless you ask for them.

Journal entries are 8K unless they are commit records which are about 20 
bytes (I think).





and recommends using -XX:MaxDirectMemorySize=1024m to poke the GC via
System.gc(). Not sure if G1GC helps because of its new heap model...

Would it be possible to get Jena to write its uncommitted triples to disk
and then commit them to the TDB?


Set TDB.transactionJournalWriteBlockMode to "mapped". That uses a disk file.


Ok it's slower than RAM but until they are
committed only one thread has visibility anyway? Could direct that at a
different disk as well...

Just before hitting send I'm at pass 13 and the [B maxed at just over 4Gb
before dropping back to 2Gb.


Or use TDB2 :-)

It has no problem loading 100m+ triples in a single transaction (the 
space per transaction is fixed at about 80 bytes of transaction - disk 
writes happen during the transaction not to a roll-forward journal). 
And it should be a bit faster because writes happen once.


Just need to find time to clean it up ...

Andy



Dick.



On 27 July 2016 at 11:47, Andy Seaborne  wrote:


On 27/07/16 11:22, Dick Murray wrote:


Hello.

Something doesn't add up here... I've run repeated tests with the
following
MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.

What I don't understand is the size of [B increases with each pass until
the OOME is thrown. The exact same process is run 5 times with a new graph
for each set of triples.

There are ~3.5M triples added within the transaction from a file which is
a
"simple" text based file (30Mb) which is read in line pairs.



Err - you said 200k quads earlier!

Set

TransactionManager.QueueBatchSize=0 ;

and break the load into small units for now and see if that helps.

One experiment would be to write the output to disk and load from a
program that only does the TDB part.

Andy




I've tested sequential loads of other text files (i.e. file x *5) and
other
text files loaded sequentially (i.e. file x, file y, file ...) and the same
result is exhibited.

If I reduce -Xmx to 6g it will fail earlier.

Changing the GC using -XX:+UseG1GC doesn't alter the outcome.

I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
Centos 7 with Java 1.8.

Any ideas?

Regards Dick.











Re: Jena TDB OOME GC overhead limit exceeded

2016-07-27 Thread Dick Murray
;-) Yes I did. But then I switched to the actual files I need to import and
they produce ~3.5M triples...

Using normal Jena 3.1 (i.e. no special context symbols set) the commit
after 100k triples works to import the file 10 times with the [B varying
between ~2Mb and ~4Mb. I'm currently testing a 20 instance pass.

A batched commit works for this bulk load because if it fails after a batch
commit I can remove the graph.
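
Roughly what that batched load loop looks like (the batch size and names are
illustrative, not the exact code I'm running):

import java.util.Iterator;

import org.apache.jena.query.ReadWrite;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;

public class BatchedLoad {
    static final int BATCH = 100_000;   // commit every 100k quads

    static void load(DatasetGraph dsg, Iterator<Quad> quads) {
        long n = 0;
        dsg.begin(ReadWrite.WRITE);
        try {
            while (quads.hasNext()) {
                dsg.add(quads.next());
                if (++n % BATCH == 0) {
                    // Close out this batch and start a fresh write transaction,
                    // so the per-transaction journal never grows past one batch.
                    dsg.commit();
                    dsg.end();
                    dsg.begin(ReadWrite.WRITE);
                }
            }
            dsg.commit();               // final partial batch
        } finally {
            dsg.end();
        }
    }
}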

For my understanding... TDB is holding the triples/block/journal in heap
until commit is called? But this doesn't account for the [B not being
cleared after a commit of 3.5M triples. It takes another pass plus ~2M
uncommitted triples before I get an OOME.

Digging around and there are some references made to the DirectByteBuffers
causing issues. IBM
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/excessive_native_memory_usage_by_directbytebuffers?lang=en
links the problem to;

Essentially the problem boils down to either:

   1. There are too many DBBs being allocated (or they are too large),
   and/or
   2. The DBBs are not being cleared up quickly enough.


and recommends using -XX:MaxDirectMemorySize=1024m to poke the GC via
System.gc(). Not sure if G1GC helps because of its new heap model...

Would it be possible to get Jena to write its uncommitted triples to disk
and then commit them to the TDB? Ok it's slower than RAM but until they are
committed only one thread has visibility anyway? Could direct that at a
different disk as well...

Just before hitting send I'm at pass 13 and the [B maxed at just over 4Gb
before dropping back to 2Gb.

Dick.



On 27 July 2016 at 11:47, Andy Seaborne  wrote:

> On 27/07/16 11:22, Dick Murray wrote:
>
>> Hello.
>>
>> Something doesn't add up here... I've run repeated tests with the
>> following
>> MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.
>>
>> What I don't understand is the size of [B increases with each pass until
>> the OOME is thrown. The exact same process is run 5 times with a new graph
>> for each set of triples.
>>
>> There are ~3.5M triples added within the transaction from a file which is
>> a
>> "simple" text based file (30Mb) which is read in line pairs.
>>
>
> Err - you said 200k quads earlier!
>
> Set
>
> TransactionManager.QueueBatchSize=0 ;
>
> and break the load into small units for now and see if that helps.
>
> One experiment would be to write the output to disk and load from a
> program that only does the TDB part.
>
> Andy
>
>
>
>> I've tested sequential loads of other text files (i.e. file x *5) and
>> other
>> text files loaded sequentially (i.e. file x, file y, file ...) and the same
>> result is exhibited.
>>
>> If I reduce -Xmx to 6g it will fail earlier.
>>
>> Changing the GC using -XX:+UseG1GC doesn't alter the outcome.
>>
>> I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
>> Centos 7 with Java 1.8.
>>
>> Any ideas?
>>
>> Regards Dick.
>>
>>
>
>


Re: Jena TDB OOME GC overhead limit exceeded

2016-07-27 Thread Andy Seaborne

On 27/07/16 11:22, Dick Murray wrote:

Hello.

Something doesn't add up here... I've run repeated tests with the following
MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.

What I don't understand is the size of [B increases with each pass until
the OOME is thrown. The exact same process is run 5 times with a new graph
for each set of triples.

There are ~3.5M triples added within the transaction from a file which is a
"simple" text based file (30Mb) which is read in line pairs.


Err - you said 200k quads earlier!

Set

TransactionManager.QueueBatchSize=0 ;

and break the load into small units for now and see if that helps.

One experiment would be to write the output to disk and load from a 
program that only does the TDB part.
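
A rough sketch of that two-step experiment, assuming N-Quads as the
intermediate format (tdbloader on the generated file would do equally well);
class, method and file names are illustrative:

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.Quad;
import org.apache.jena.tdb.TDBFactory;

public class TwoStepLoad {

    // Step 1: convert the source file to RDF on disk - no TDB involved.
    static void writeQuads(Iterable<Quad> quads, String nqFile) throws Exception {
        try (OutputStream out = new FileOutputStream(nqFile)) {
            StreamRDF dest = StreamRDFWriter.getWriterStream(out, Lang.NQUADS);
            dest.start();
            for (Quad q : quads) {
                dest.quad(q);
            }
            dest.finish();
        }
    }

    // Step 2: a program that only does the TDB part - read the pre-built
    // file inside a single write transaction.
    static void loadIntoTDB(String tdbDir, String nqFile) {
        DatasetGraph dsg = TDBFactory.createDatasetGraph(tdbDir);
        dsg.begin(ReadWrite.WRITE);
        try {
            RDFDataMgr.read(dsg, nqFile);
            dsg.commit();
        } finally {
            dsg.end();
        }
        dsg.close();
    }
}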


Andy



I've tested sequential loads of other text files (i.e. file x *5) and other
text files loaded sequentially (i.e. file x, file y, file ...) and the same
result is exhibited.

If I reduce -Xmx to 6g it will fail earlier.

Changing the GC using -XX:+UseG1GC doesn't alter the outcome.

I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
Centos 7 with Java 1.8.

Any ideas?

Regards Dick.






Re: Jena TDB OOME GC overhead limit exceeded

2016-07-27 Thread Dick Murray
Hello.

Something doesn't add up here... I've run repeated tests with the following
MWE on a 16GB machine with -Xms8g -Xmx8g and I always get an OOME.

What I don't understand is the size of [B increases with each pass until
the OOME is thrown. The exact same process is run 5 times with a new graph
for each set of triples.

There are ~3.5M triples added within the transaction from a file which is a
"simple" text based file (30Mb) which is read in line pairs.

I've tested sequential loads of other text files (i.e. file x *5) and other
text files loaded sequentially (i.e. file x, file y, file ...) and the same
result is exhibited.

If I reduce -Xmx to 6g it will fail earlier.

Changing the GC using -XX:+UseG1GC doesn't alter the outcome.

I'm running on Ubuntu 16.04 with Java 1.8 and I can replicate this on
Centos 7 with Java 1.8.

Any ideas?

Regards Dick.


dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Complete code example below which shows this error, followed by output of
said program. This will attempt to exec jps and jmap...

package org.iungo.dxf;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb.TDBFactory;

public class DXFStream {

    static String pid = pid();

    public static void main(final String[] args) {
        Objects.requireNonNull(pid);
        final String dg =
            "/home/dick/eclipse-jee-neon/git/iungo-core/iungo-core/tdb-dataset-node";
        final String dxf =
            "file:///home/dick/eclipse-jee-neon/git/iungo-ssl/iungo-ssl/test/resources/0005_-SSL-XXX-XX-DR-U-0200.dxf";
        try {
            mem();
            log("open");
            final DatasetGraph datasetGraph = TDBFactory.createDatasetGraph(dg);
            for (int i = 1; i <= 5; i++) {
                log("Pass " + String.valueOf(i));
                try {
                    mem();
                    log("begin WRITE");
                    // One write transaction per pass: all quads for the file are
                    // added before commit() is called.
                    datasetGraph.begin(ReadWrite.WRITE);
                    Node g = NodeFactory.createURI(UUID.randomUUID().toString());
                    Node pk = NodeFactory.createURI("k");
                    Node pv = NodeFactory.createURI("v");
                    final BufferedReader bufferedReader = new BufferedReader(
                        new InputStreamReader(new URL(dxf).openStream(), StandardCharsets.UTF_8));
                    String key = null;
                    String value = null;
                    Integer count = 0;
                    // Read the file in line pairs and add two quads per pair to graph g.
                    while ((key = bufferedReader.readLine()) != null) {
                        key = key.trim();
                        if ((value = bufferedReader.readLine()) == null) {
                            throw new UnsupportedOperationException();
                        }
                        count++;
                        datasetGraph.add(g, NodeFactory.createURI(count.toString()), pk, NodeFactory.createLiteral(key));
                        datasetGraph.add(g, NodeFactory.createURI(count.toString()), pv, NodeFactory.createLiteral(value));
                    }
                    bufferedReader.close();
                    mem();
                    log(count * 2 + " commit");
                    datasetGraph.commit();
                } catch (final Throwable throwable) {
                    System.err.println(throwable);
                    mem();
                    log("abort");
                    datasetGraph.abort();
                } finally {
                    mem();
                    log("end");
                    datasetGraph.end();
                }
            }
            mem();
            log("close");
            datasetGraph.close();
        } catch (final Throwable throwable) {
            System.err.println(throwable);
        }
        log("exit");
    }

    public static String pid() {
        log("pid");
        try {
            String pid = null;
            final Process p = Runtime.getRuntime().exec(new String[]{"jps", "-m"});
            final BufferedReader pbr = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line = null;
            while ((line = pbr.readLine()) != null) {
                final String name = line.substring(line.indexOf(" ")).trim();
                if (name.equals("DXFStream")) {
                    pid = line.substring(0, line.indexOf(" "));
                    log(line);
                    break;
                }
            }
            p.waitFor();
            pbr.close();
            log(pid);
            return pid;
        } catch (final Throwable throwable) {
            System.err.println(throwable);
            return null;
        }
    }

    public static void jmap() {
        log("jmap");
        try {
            // JVM doesn't like | so use a list then stream and limit to 13...
            final Process p = Runtime.getRuntime().exec(new String[]{"jmap", "-histo", pid});
            List<String> lines = new LinkedList<>();
            String line = null;
            final BufferedReader pbr = new BufferedReader(new InputStreamReader(p.getInputStream()));
            while ((line = pbr.readLine()) != null) {
                lines.add(line);
            }
            p.waitFor();
            pbr.close();
            lines.stream()
                 .limit(13)
                 .forEach(text -> System.out.println(text));
        } catch (final Throwable throwable) {
            System.err.println(throwable);
        }
    }

    public static void mem() {
        log(Runtime.getRuntime().freeMemory() + " " +
            Runtime.getRuntime().totalMemory() + " " +
            Runtime.getRuntime().maxMemory());
        jmap();
    }

    public static void log(final String text) {
        System.out.println(Instant.now() + " " + text);
    }
}


Output;

2016-07-27T09:47:28.013Z pid
2016-07-27T09:47:28.265Z 24777 DXFStream

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Andy Seaborne

On 26/07/16 16:51, Dick Murray wrote:

Ok, I set that option and I get different OOME from the direct buffer
memory.


You now have:

> java.lang.OutOfMemoryError:
> Direct buffer memory

So that means the direct memory space has run out, not the heap.

You can increase direct memory but that isn't getting to the root cause.

200k quads isn't that many, though the JVM arg "-Xmx4g" is pretty small 
for largish transactions.  But you are doing it 5 times.  I calculated 
maybe 300M-1G of temp space but the calculation is a bit "it depends". 
But that is 25% of the heap allocated already.


Try to switch off the commit amalgamation.  Commits get aggregated so
that many one-triple insert transactions don't cause vast overheads.


TransactionManager.QueueBatchSize=0 ;
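
A minimal sketch, assuming that is the public static field in
org.apache.jena.tdb.transaction.TransactionManager, set before any
transactions run:

import org.apache.jena.tdb.transaction.TransactionManager;

public class DisableCommitAmalgamation {
    public static void main(String[] args) {
        // 0 = write each commit straight through to the main database
        // instead of queueing/aggregating commits in the journal.
        TransactionManager.QueueBatchSize = 0;
        // ... open the dataset and run the bulk load after this ...
    }
}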

In addition, the unusual size (for TDB) of many byte[] and other objects 
suggests that the non-Jena code is using a significant slice of space as
well.


(without setting TDB.transactionJournalWriteBlockMode).

Not setting TDB.transactionJournalWriteBlockMode is better but an 
alternative is "mapped" mode instead of "direct".


Andy



I then changed the GC using -XX:+UseG1GC (which still throws the OOME) and
I don't get why it's throwing the OOME!?

 [Eden: 2372.0M(2372.0M)->0.0B(1036.0M) Survivors: 84.0M->196.0M Heap:
2979.4M(4096.0M)->720.4M(4096.0M)]

Unless I'm mistaken I don't have a G1 Heap problem?


...
[GC pause (G1 Evacuation Pause) (young), 0.1580047 secs]
   [Parallel Time: 149.8 ms, GC Workers: 8]
  [GC Worker Start (ms): Min: 463499.3, Avg: 463499.4, Max: 463499.5,
Diff: 0.1]
  [Ext Root Scanning (ms): Min: 0.2, Avg: 0.2, Max: 0.3, Diff: 0.2,
Sum: 1.8]
  [Update RS (ms): Min: 12.5, Avg: 12.6, Max: 12.7, Diff: 0.2, Sum:
100.9]
 [Processed Buffers: Min: 13, Avg: 15.6, Max: 18, Diff: 5, Sum: 125]
  [Scan RS (ms): Min: 18.7, Avg: 18.8, Max: 18.8, Diff: 0.2, Sum: 150.2]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.1]
  [Object Copy (ms): Min: 116.7, Avg: 116.7, Max: 116.8, Diff: 0.1,
Sum: 933.8]
  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8]
  [GC Worker Other (ms): Min: 0.0, Avg: 1.2, Max: 1.4, Diff: 1.3, Sum:
9.3]
  [GC Worker Total (ms): Min: 148.3, Avg: 149.5, Max: 149.8, Diff: 1.4,
Sum: 1196.1]
  [GC Worker End (ms): Min: 463647.8, Avg: 463648.9, Max: 463649.1,
Diff: 1.3]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.8 ms]
   [Other: 7.3 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 4.7 ms]
  [Ref Enq: 0.0 ms]
  [Redirty Cards: 0.4 ms]
  [Humongous Register: 0.1 ms]
  [Humongous Reclaim: 0.0 ms]
  [Free CSet: 1.8 ms]
   [Eden: 2372.0M(2372.0M)->0.0B(1036.0M) Survivors: 84.0M->196.0M Heap:
2979.4M(4096.0M)->720.4M(4096.0M)]
 [Times: user=1.13 sys=0.01, real=0.16 secs]
WARN 2016-07-26T15:26:10.196Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.tdb.node.TDBDatasetNode#TDB] Level [INFO] Counts
[9/9/0]] Name [bulkload#6] Timer [Started [2016:07:26 16:26:05] Stopped
[2016:07:26 16:26:05] Elapsed [0ms]]]
[Full GC (System.gc())  849M->327M(4096M), 1.4194434 secs]
   [Eden: 130.0M(1036.0M)->0.0B(2456.0M) Survivors: 196.0M->0.0B Heap:
849.4M(4096.0M)->327.7M(4096.0M)], [Metaspace: 18671K->18671K(1067008K)]
 [Times: user=1.80 sys=0.13, real=1.42 secs]
WARN 2016-07-26T15:26:12.001Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.bulkload.DXFDatasetBulkload#2] Level [ALL] Counts
[26/26/0]] Name [bulkload#5] Timer [Started [2016:07:26 16:24:03] Stopped
[2016:07:26 16:25:08] Elapsed [64979ms]]]
WARN 2016-07-26T15:26:12.002Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.bulkload.DXFDatasetBulkload#1] Level [ALL] Counts
[26/26/0]] Name [bulkload#3] Timer [Started [2016:07:26 16:21:45] Stopped
[2016:07:26 16:22:47] Elapsed [62554ms]]]
CALL_CATCH 2016-07-26T15:26:12.516Z
org.iungo.dataset.bulkload.DXFDatasetBulkload#3 bulkload#7
java.lang.OutOfMemoryError:
Direct buffer memory
java.nio.Bits.reserveMemory(Bits.java:693)
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
org.apache.jena.tdb.base.file.BufferAllocatorDirect.allocate(BufferAllocatorDirect.java:31)
org.apache.jena.tdb.transaction.BlockMgrJournal.replicate(BlockMgrJournal.java:381)
org.apache.jena.tdb.transaction.BlockMgrJournal.allocate(BlockMgrJournal.java:149)
org.apache.jena.tdb.base.page.PageBlockMgr.create(PageBlockMgr.java:51)
org.apache.jena.tdb.index.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:79)
org.apache.jena.tdb.index.bplustree.BPTreeRecords.create(BPTreeRecords.java:207)
org.apache.jena.tdb.index.bplustree.BPTreeRecords.split(BPTreeRecords.java:181)

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Dick Murray
Ok, I set that option and I get different OOME from the direct buffer
memory.

I then changed the GC using -XX:+UseG1GC (which still throws the OOME) and
I don't get why it's throwing the OOME!?

 [Eden: 2372.0M(2372.0M)->0.0B(1036.0M) Survivors: 84.0M->196.0M Heap:
2979.4M(4096.0M)->720.4M(4096.0M)]

Unless I'm mistaken I don't have a G1 Heap problem?


...
[GC pause (G1 Evacuation Pause) (young), 0.1580047 secs]
   [Parallel Time: 149.8 ms, GC Workers: 8]
  [GC Worker Start (ms): Min: 463499.3, Avg: 463499.4, Max: 463499.5,
Diff: 0.1]
  [Ext Root Scanning (ms): Min: 0.2, Avg: 0.2, Max: 0.3, Diff: 0.2,
Sum: 1.8]
  [Update RS (ms): Min: 12.5, Avg: 12.6, Max: 12.7, Diff: 0.2, Sum:
100.9]
 [Processed Buffers: Min: 13, Avg: 15.6, Max: 18, Diff: 5, Sum: 125]
  [Scan RS (ms): Min: 18.7, Avg: 18.8, Max: 18.8, Diff: 0.2, Sum: 150.2]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.1]
  [Object Copy (ms): Min: 116.7, Avg: 116.7, Max: 116.8, Diff: 0.1,
Sum: 933.8]
  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8]
  [GC Worker Other (ms): Min: 0.0, Avg: 1.2, Max: 1.4, Diff: 1.3, Sum:
9.3]
  [GC Worker Total (ms): Min: 148.3, Avg: 149.5, Max: 149.8, Diff: 1.4,
Sum: 1196.1]
  [GC Worker End (ms): Min: 463647.8, Avg: 463648.9, Max: 463649.1,
Diff: 1.3]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.8 ms]
   [Other: 7.3 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 4.7 ms]
  [Ref Enq: 0.0 ms]
  [Redirty Cards: 0.4 ms]
  [Humongous Register: 0.1 ms]
  [Humongous Reclaim: 0.0 ms]
  [Free CSet: 1.8 ms]
   [Eden: 2372.0M(2372.0M)->0.0B(1036.0M) Survivors: 84.0M->196.0M Heap:
2979.4M(4096.0M)->720.4M(4096.0M)]
 [Times: user=1.13 sys=0.01, real=0.16 secs]
WARN 2016-07-26T15:26:10.196Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.tdb.node.TDBDatasetNode#TDB] Level [INFO] Counts
[9/9/0]] Name [bulkload#6] Timer [Started [2016:07:26 16:26:05] Stopped
[2016:07:26 16:26:05] Elapsed [0ms]]]
[Full GC (System.gc())  849M->327M(4096M), 1.4194434 secs]
   [Eden: 130.0M(1036.0M)->0.0B(2456.0M) Survivors: 196.0M->0.0B Heap:
849.4M(4096.0M)->327.7M(4096.0M)], [Metaspace: 18671K->18671K(1067008K)]
 [Times: user=1.80 sys=0.13, real=1.42 secs]
WARN 2016-07-26T15:26:12.001Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.bulkload.DXFDatasetBulkload#2] Level [ALL] Counts
[26/26/0]] Name [bulkload#5] Timer [Started [2016:07:26 16:24:03] Stopped
[2016:07:26 16:25:08] Elapsed [64979ms]]]
WARN 2016-07-26T15:26:12.002Z RootLogger Finalizer Return value not called
[Logger [Logger [org.iungo.logger.SystemLogger] Name
[org.iungo.dataset.bulkload.DXFDatasetBulkload#1] Level [ALL] Counts
[26/26/0]] Name [bulkload#3] Timer [Started [2016:07:26 16:21:45] Stopped
[2016:07:26 16:22:47] Elapsed [62554ms]]]
CALL_CATCH 2016-07-26T15:26:12.516Z
org.iungo.dataset.bulkload.DXFDatasetBulkload#3 bulkload#7
java.lang.OutOfMemoryError:
Direct buffer memory
java.nio.Bits.reserveMemory(Bits.java:693)
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
org.apache.jena.tdb.base.file.BufferAllocatorDirect.allocate(BufferAllocatorDirect.java:31)
org.apache.jena.tdb.transaction.BlockMgrJournal.replicate(BlockMgrJournal.java:381)
org.apache.jena.tdb.transaction.BlockMgrJournal.allocate(BlockMgrJournal.java:149)
org.apache.jena.tdb.base.page.PageBlockMgr.create(PageBlockMgr.java:51)
org.apache.jena.tdb.index.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:79)
org.apache.jena.tdb.index.bplustree.BPTreeRecords.create(BPTreeRecords.java:207)
org.apache.jena.tdb.index.bplustree.BPTreeRecords.split(BPTreeRecords.java:181)
org.apache.jena.tdb.index.bplustree.BPTreeNode.split(BPTreeNode.java:518)
org.apache.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:454)
org.apache.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:467)
org.apache.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:467)
org.apache.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:211)
org.apache.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:326)
org.apache.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:318)
org.apache.jena.tdb.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:60)
org.apache.jena.tdb.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:62)
org.apache.jena.tdb.store.tupletable.TupleTable.add(TupleTable.java:95)
org.apache.jena.tdb.store.nodetupletable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:88)
org.apache.jena.tdb.store.QuadTable.add(QuadTable.java:60)
org.apache.jena.tdb.store.DatasetGraphTDB.addToNamedGraph(DatasetGraphTDB.java:98)

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Andy Seaborne

To build clean locally do the following:

at the top level (not in jena-tdb):

mvn clean install -Pbootstrap
mvn install -Pdev

(or "mvn clean install" but that builds and tests a lot more)

Andy


On 26/07/16 14:25, Andy Seaborne wrote:

On 26/07/16 14:11, Dick Murray wrote:

Hi.

I'll set that and run the process again.

As an aside I just pulled master and TDB won't compile because it can't
find Multiset? Are there notes on getting the Jena Git repo into Eclipse? I want
to put a count on the BufferAllocatorMem to see what it's doing. I've put a
breakpoint on but gave up on F8 after counting to 500

Dick.


Make sure you have all the dependencies successfully resolved with
 mvn -o dependency:tree.

The Apache snapshot repo was having a bad day earlier and Multiset is
from org.apache.jena:jena-shaded-guava:jar

Andy



On 26 July 2016 at 11:05, Andy Seaborne  wrote:


On 26/07/16 10:51, Dick Murray wrote:


Hi.

Where do you set "transactionJournalWriteBlockMode" please?



We don't - it's off by default.

It's a symbol you can set in the global context.
TDB.transactionJournalWriteBlockMode

It is the only place that triggers DirectByteBuffers in TDB which I
see in
your jmap output.



Would you expect to see a large number of [B heap entries in a 3.1 TDB?



Probably not, particularly not retained ones.

Looking at the average sizes:


  1:132187  636210296  [B

Average size: 4812.9

  1:   1148534 1420727464  [B

Average size: 1237.0

  1:   1377821 2328657400  [B

Average size: 1690.101544395099218258394958

  1:333123 2285460248  [B

Average size: 6860.7

  1:333123 2285460248  [B

Average size: 6860.7

  1:333123 2285460248  [B

Average size: 6860.7

  1:934984 3083070024  [B

Average size: 3297.5

  1:   1067937 3321067472  [B

Average size: 3109.8

  1:581039 3615795256  [B

Average size: 6223.0

Does "6860.7" suggest anything?

It's not a unit in TDB that I recognize.  Maybe the data has that in
some
way?

Andy



Dick.

On 26 July 2016 at 10:39, Andy Seaborne  wrote:

Dick,


The report is embedded in your application setup with a lot of
"org.iungo.dataset.bulkload"

Just because the OOME occurs in TDB does not mean that the space is
consumed in TDB - there may be a bigger memory hog elsewhere.

Could you produce an RDF example?

Maybe that file, already converted to RDF, and loaded with tdbloader?

If TDB is using DirectByteBuffers, have you set
"transactionJournalWriteBlockMode" to "direct"?

You need to increase the direct memory space, not the heap.

Andy


On 26/07/16 10:14, Dick Murray wrote:

Hi.


I've got a repeatable problem with Jena 3.1 when performing a bulk
load.

The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds. However, if I repeat the process (with the same
file) on the 5th iteration I get an OOME exception. As I'm importing the
same file into different graphs I would expect the DAT file to stay the
same size after the first process and just the index files to grow.

There are no other processes running against the TDB whilst this process
is run.

Using jmap the [B grows with each process until finally I get the
exception.

If I increase the Xmx the OOME occurs later.

Any ideas?

I've included details below, including jmap output which shows the
heap
being used and the JVM output which shows the GC (Allocation Failure)
entries transitioning to Full GC (Ergonomics) entries...

Regards Dick.

JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails

-javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar


dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13
00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build
1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Output from jmap

dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:132187  636210296  [B
   2:500242   31189608  [C
   3:533380   25602240
 org.apache.jena.ext.com
.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:  1654   16374552  [I
   7:445589   16219672  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1148534 1420727464  [B
   2:   5961841  344220520  [C
   3:   1335412   85466368  java.nio.DirectByteBuffer
   4:   3453219   

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Andy Seaborne

On 26/07/16 14:11, Dick Murray wrote:

Hi.

I'll set that and run the process again.

As an aside I just pulled master and TDB won't compile because it can't
find Multiset? Are there notes on getting the Jena Git repo into Eclipse? I want
to put a count on the BufferAllocatorMem to see what it's doing. I've put a
breakpoint on but gave up on F8 after counting to 500

Dick.


Make sure you have all the dependencies successfully resolved with
 mvn -o dependency:tree.

The Apache snapshot repo was having a bad day earlier and Multiset is 
from org.apache.jena:jena-shaded-guava:jar


Andy



On 26 July 2016 at 11:05, Andy Seaborne  wrote:


On 26/07/16 10:51, Dick Murray wrote:


Hi.

Where do you set "transactionJournalWriteBlockMode" please?



We don't - it's off by default.

It's a symbol you can set in the global context.
TDB.transactionJournalWriteBlockMode

It is the only place that triggers DirectByteBuffers in TDB which I see in
your jmap output.



Would you expect to see a large number of [B heap entries in a 3.1 TDB?



Probably not, particularly not retained ones.

Looking at the average sizes:


  1:132187  636210296  [B

Average size: 4812.9

  1:   1148534 1420727464  [B

Average size: 1237.0

  1:   1377821 2328657400  [B

Average size: 1690.101544395099218258394958

  1:333123 2285460248  [B

Average size: 6860.7

  1:333123 2285460248  [B

Average size: 6860.7

  1:333123 2285460248  [B

Average size: 6860.7

  1:934984 3083070024  [B

Average size: 3297.5

  1:   1067937 3321067472  [B

Average size: 3109.8

  1:581039 3615795256  [B

Average size: 6223.0

Does "6860.7" suggest anything?

It's not a unit in TDB that I recognize.  Maybe the data has that in some
way?

Andy



Dick.

On 26 July 2016 at 10:39, Andy Seaborne  wrote:

Dick,


The report is embedded in your application setup with a lot of
"org.iungo.dataset.bulkload"

Just because the OOME occurs in TDB does not mean that the space is
consumed in TDB - there may be a bigger memory hog elsewhere.

Could you produce an RDF example?

Maybe that file, already converted to RDF, and loaded with tdbloader?

If TDB is using DirectByteBuffers, have you set
"transactionJournalWriteBlockMode" to "direct"?

You need to increase the direct memory space, not the heap.

Andy


On 26/07/16 10:14, Dick Murray wrote:

Hi.


I've got a repeatable problem with Jena 3.1 when performing a bulk load.

The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds. However, if I repeat the process (with the same
file) on the 5th iteration I get an OOME exception. As I'm importing the
same file into different graphs I would expect the DAT file to stay the
same size after the first process and just the index files to grow.

There are no other processes running against the TDB whilst this process
is run.

Using jmap the [B grows with each process until finally I get the
exception.

If I increase the Xmx the OOME occurs later.

Any ideas?

I've included details below, including jmap output which shows the heap
being used and the JVM output which shows the GC (Allocation Failure)
entries transitioning to Full GC (Ergonomics) entries...

Regards Dick.

JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails

-javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar

dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build
1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Output from jmap

dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:132187  636210296  [B
   2:500242   31189608  [C
   3:533380   25602240
 org.apache.jena.ext.com
.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:  1654   16374552  [I
   7:445589   16219672  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1148534 1420727464  [B
   2:   5961841  344220520  [C
   3:   1335412   85466368  java.nio.DirectByteBuffer
   4:   3453219   82877256  java.lang.String
   5:573585   65399360  [I
   6:   1261244   60539712
 org.apache.jena.ext.com
.google.common.cache.LocalCache$StrongAccessEntry
   7:945955   36365560  [Ljava.lang.Object;

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Andy Seaborne

On 26/07/16 10:51, Dick Murray wrote:

Hi.

Where do you set "transactionJournalWriteBlockMode" please?


We don't - it's off by default.

It's a symbol you can set in the global context. 
TDB.transactionJournalWriteBlockMode


It is the only place that triggers DirectByteBuffers in TDB which I see 
in your jmap output.




Would you expect to see a large number of [B heap entries in a 3.1 TDB?


Probably not, particularly not retained ones.

Looking at the average sizes:

>>>   1:132187  636210296  [B
Average size: 4812.9
>>>   1:   1148534 1420727464  [B
Average size: 1237.0
>>>   1:   1377821 2328657400  [B
Average size: 1690.101544395099218258394958
>>>   1:333123 2285460248  [B
Average size: 6860.7
>>>   1:333123 2285460248  [B
Average size: 6860.7
>>>   1:333123 2285460248  [B
Average size: 6860.7
>>>   1:934984 3083070024  [B
Average size: 3297.5
>>>   1:   1067937 3321067472  [B
Average size: 3109.8
>>>   1:581039 3615795256  [B
Average size: 6223.0

Does "6860.7" suggest anything?

It's not a unit in TDB that I recognize.  Maybe the data has that in 
some way?


Andy



Dick.

On 26 July 2016 at 10:39, Andy Seaborne  wrote:


Dick,

The report is embedded in your application setup with a lot of
"org.iungo.dataset.bulkload"

Just because the OOME occurs in TDB does not mean that the space is
consumed in TDB - there may be a bigger memory hog elsewhere.

Could you produce an RDF example?

Maybe that file, already converted to RDF, and loaded with tdbloader?

If TDB is using DirectByteBuffers, have you set
"transactionJournalWriteBlockMode" to "direct"?

You need to increase the direct memory space, not the heap.

Andy


On 26/07/16 10:14, Dick Murray wrote:


Hi.

I've got a repeatable problem with Jena 3.1 when performing a bulk load.

The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds. However, if I repeat the process (with the same
file) on the 5th iteration I get an OOME exception. As I'm importing the
same file into different graphs I would expect the DAT file to stay the
same size after the first process and just the index files to grow.

There are no other processes running against the TDB whilst this process is
run.

Using jmap the [B grows with each process until finally I get the
exception.

If I increase the Xmx the OOME occurs later.

Any ideas?

I've included details below, including jmap output which shows the heap
being used and the JVM output which shows the GC (Allocation Failure)
entries transitioning to Full GC (Ergonomics) entries...

Regards Dick.

JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails
-javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar

dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Output from jmap

dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:132187  636210296  [B
   2:500242   31189608  [C
   3:533380   25602240
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:  1654   16374552  [I
   7:445589   16219672  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1148534 1420727464  [B
   2:   5961841  344220520  [C
   3:   1335412   85466368  java.nio.DirectByteBuffer
   4:   3453219   82877256  java.lang.String
   5:573585   65399360  [I
   6:   1261244   60539712
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:945955   36365560  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1377821 2328657400  [B
   2:   7566951  434495472  [C
   3:   1717606  109926784  java.nio.DirectByteBuffer
   4:   4339997  104159928  java.lang.String
   5:749581   75578568  [I
   6:   1485127   71286096
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:   1089303   42230696  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Dick Murray
Hi.

Where do you set "transactionJournalWriteBlockMode" please?

Would you expect to see a large number of [B heap entries in a 3.1 TDB?

Dick.

On 26 July 2016 at 10:39, Andy Seaborne  wrote:

> Dick,
>
> The report is embedded in your application setup with a lot of
> "org.iungo.dataset.bulkload"
>
> Just because the OOME occurs in TDB does not mean that the space is
> consumed in TDB - there may be a bigger memory hog elsewhere.
>
> Could you produce an RDF example?
>
> Maybe that file, already converted to RDF, and loaded with tdbloader?
>
> If TDB is using DirectByteBuffers, have you set
> "transactionJournalWriteBlockMode" to "direct"?
>
> You need to increase the direct memory space, not the heap.
>
> Andy
>
>
> On 26/07/16 10:14, Dick Murray wrote:
>
>> Hi.
>>
>> I've got a repeatable problem with Jena 3.1 when performing a bulk load.
>>
>> The bulk load converts a file into ~200k quads and adds them to a TDB
>> instance within a normal begin write, add quads and commit. Initially this
>> completes in 30-40 seconds. However, if I repeat the process (with the same
>> file) on the 5th iteration I get an OOME exception. As I'm importing the
>> same file into different graphs I would expect the DAT file to stay the
>> same size after the first process and just the index files to grow.
>>
>> There are no other processes running against the TDB whilst this process is
>> run.
>>
>> Using jmap the [B grows with each process until finally I get the
>> exception.
>>
>> If I increase the Xmx the OOME occurs later.
>>
>> Any ideas?
>>
>> I've included details below, including jmap output which shows the heap
>> being used and the JVM output which shows the GC (Allocation Failure)
>> entries transitioning to Full GC (Ergonomics) entries...
>>
>> Regards Dick.
>>
>> JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails
>> -javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar
>>
>> dick@Dick-M3800:~$ uname -a
>> Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> dick@Dick-M3800:~$ java -version
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>> dick@Dick-M3800:~$
>>
>>
>> Output from jmap
>>
>> dick@Dick-M3800:~$ jmap -histo 15031 | head -25
>>
>>  num #instances #bytes  class name
>> --
>>1:132187  636210296  [B
>>2:500242   31189608  [C
>>3:533380   25602240
>>  org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
>>4:468220   18728800  org.kabeja.math.Point3D
>>5:349351   16768848  org.kabeja.entities.Vertex
>>6:  1654   16374552  [I
>>7:445589   16219672  [Ljava.lang.Object;
>> dick@Dick-M3800:~$ jmap -histo 15031 | head -25
>>
>>  num #instances #bytes  class name
>> --
>>1:   1148534 1420727464  [B
>>2:   5961841  344220520  [C
>>3:   1335412   85466368  java.nio.DirectByteBuffer
>>4:   3453219   82877256  java.lang.String
>>5:573585   65399360  [I
>>6:   1261244   60539712
>>  org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
>>7:945955   36365560  [Ljava.lang.Object;
>> dick@Dick-M3800:~$ jmap -histo 15031 | head -25
>>
>>  num #instances #bytes  class name
>> --
>>1:   1377821 2328657400  [B
>>2:   7566951  434495472  [C
>>3:   1717606  109926784  java.nio.DirectByteBuffer
>>4:   4339997  104159928  java.lang.String
>>5:749581   75578568  [I
>>6:   1485127   71286096
>>  org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
>>7:   1089303   42230696  [Ljava.lang.Object;
>> dick@Dick-M3800:~$ jmap -histo 15031 | head -25
>>
>>  num #instances #bytes  class name
>> --
>>1:333123 2285460248  [B
>>2:604102   38062832  [C
>>3:660301   31694448
>>  org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
>>4:468220   18728800  org.kabeja.math.Point3D
>>5:349351   16768848  org.kabeja.entities.Vertex
>>6:445689   16486104  [Ljava.lang.Object;
>>7:590752   14178048  java.lang.String
>>8:278273   13357104  java.nio.HeapByteBuffer
>>9:530557   12733368  org.apache.jena.tdb.store.NodeId
>>   10:   420   11221544  [Ljava.util.HashMap$Node;
>>   11:334514   10704448  java.util.HashMap$Node
>>   12:660301   10564816
>>  org.apache.jena.ext.com
>> 

Re: Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Andy Seaborne

Dick,

The report is embedded in your application setup with a lot of 
"org.iungo.dataset.bulkload"


Just because the OOME occurs in TDB does not mean that the space is 
consumed in TDB - there may be a bigger memory hog elsewhere.


Could you produce an RDF example?

Maybe that file, already converted to RDF, and loaded with tdbloader?

If TDB is using DirectByteBuffers, have you set
"transactionJournalWriteBlockMode" to "direct"?


You need to increase the direct memory space, not the heap.

Andy

On 26/07/16 10:14, Dick Murray wrote:

Hi.

I've got a repeatable problem with Jena 3.1 when performing a bulk load.

The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds. However, if I repeat the process (with the same
file) on the 5th iteration I get an OOME exception. As I'm importing the
same file into different graphs I would expect the DAT file to stay the
same size after the first process and just the index files to grow.

There are no other processes running against the TDB whilst this process is
run.

Using jmap the [B grows with each process until finally I get the exception.

If I increase the Xmx the OOME occurs later.

Any ideas?

I've included details below, including jmap output which shows the heap
being used and the JVM output which shows the GC (Allocation Failure)
entries transitioning to Full GC (Ergonomics) entries...

Regards Dick.

JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails
-javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar

dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Output from jmap

dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:132187  636210296  [B
   2:500242   31189608  [C
   3:533380   25602240
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:  1654   16374552  [I
   7:445589   16219672  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1148534 1420727464  [B
   2:   5961841  344220520  [C
   3:   1335412   85466368  java.nio.DirectByteBuffer
   4:   3453219   82877256  java.lang.String
   5:573585   65399360  [I
   6:   1261244   60539712
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:945955   36365560  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1377821 2328657400  [B
   2:   7566951  434495472  [C
   3:   1717606  109926784  java.nio.DirectByteBuffer
   4:   4339997  104159928  java.lang.String
   5:749581   75578568  [I
   6:   1485127   71286096
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:   1089303   42230696  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:333123 2285460248  [B
   2:604102   38062832  [C
   3:660301   31694448
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:445689   16486104  [Ljava.lang.Object;
   7:590752   14178048  java.lang.String
   8:278273   13357104  java.nio.HeapByteBuffer
   9:530557   12733368  org.apache.jena.tdb.store.NodeId
  10:   420   11221544  [Ljava.util.HashMap$Node;
  11:334514   10704448  java.util.HashMap$Node
  12:660301   10564816
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongValueReference
  13:420443   10090632  org.kabeja.tools.LazyContainer
  14:2782498903968  org.apache.jena.tdb.base.block.Block
  15:2017198068760
 org.apache.jena.graph.impl.LiteralLabelImpl
  16:2271855452440  java.lang.Long
  17:2572474115952  org.apache.jena.graph.BlankNodeId
  18:2572474115952  org.apache.jena.graph.Node_Blank
  19:  16943770384  [I
  20:201719 

Jena TDB OOME GC overhead limit exceeded

2016-07-26 Thread Dick Murray
Hi.

I've got a repeatable problem with Jena 3.1 when performing a bulk load.

The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds. However, if I repeat the process (with the same
file) on the 5th iteration I get an OOME exception. As I'm importing the
same file into different graphs I would expect the DAT file to stay the
same size after the first process and just the index files to grow.

There are no other processes running against the TDB whilst this process is
run.

Using jmap the [B grows with each process until finally I get the exception.

If I increase the Xmx the OOME occurs later.

Any ideas?

I've included details below, including jmap output which shows the heap
being used and the JVM output which shows the GC (Allocation Failure)
entries transitioning to Full GC (Ergonomics) entries...

Regards Dick.

JVM args -Xms2g -Xmx4g -XX:+PrintGCDetails
-javaagent:/home/dick/eclipse-jee-neon/opt/classmexer-0_03/classmexer.jar

dick@Dick-M3800:~$ uname -a
Linux Dick-M3800 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
dick@Dick-M3800:~$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
dick@Dick-M3800:~$


Output from jmap

dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:132187  636210296  [B
   2:500242   31189608  [C
   3:533380   25602240
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:  1654   16374552  [I
   7:445589   16219672  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1148534 1420727464  [B
   2:   5961841  344220520  [C
   3:   1335412   85466368  java.nio.DirectByteBuffer
   4:   3453219   82877256  java.lang.String
   5:573585   65399360  [I
   6:   1261244   60539712
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:945955   36365560  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:   1377821 2328657400  [B
   2:   7566951  434495472  [C
   3:   1717606  109926784  java.nio.DirectByteBuffer
   4:   4339997  104159928  java.lang.String
   5:749581   75578568  [I
   6:   1485127   71286096
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   7:   1089303   42230696  [Ljava.lang.Object;
dick@Dick-M3800:~$ jmap -histo 15031 | head -25

 num #instances #bytes  class name
--
   1:333123 2285460248  [B
   2:604102   38062832  [C
   3:660301   31694448
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800  org.kabeja.math.Point3D
   5:349351   16768848  org.kabeja.entities.Vertex
   6:445689   16486104  [Ljava.lang.Object;
   7:590752   14178048  java.lang.String
   8:278273   13357104  java.nio.HeapByteBuffer
   9:530557   12733368  org.apache.jena.tdb.store.NodeId
  10:   420   11221544  [Ljava.util.HashMap$Node;
  11:334514   10704448  java.util.HashMap$Node
  12:660301   10564816
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongValueReference
  13:420443   10090632  org.kabeja.tools.LazyContainer
  14:2782498903968  org.apache.jena.tdb.base.block.Block
  15:2017198068760
 org.apache.jena.graph.impl.LiteralLabelImpl
  16:2271855452440  java.lang.Long
  17:2572474115952  org.apache.jena.graph.BlankNodeId
  18:2572474115952  org.apache.jena.graph.Node_Blank
  19:  16943770384  [I
  20:2017193227504  org.apache.jena.graph.Node_Literal
  21: 171992476656  org.kabeja.entities.Attrib
  22: 915702197680  java.lang.Double
dick@Dick-M3800:~$ jmap -histo 15031 | head -10

 num #instances #bytes  class name
--
   1:333123 2285460248  [B
   2:604102   38062832  [C
   3:660301   31694448
 org.apache.jena.ext.com.google.common.cache.LocalCache$StrongAccessEntry
   4:468220   18728800