Hi Kieron,

On 28/09/17 10:39, Kieron Taylor wrote:
Hi everyone,

I'm trying to use Fuseki as a temporary memory-only server, i.e. load RDF into 
memory, run queries, dispose of server.

My testing was going really well until I tried to take it from development on 
laptop to a compute farm.

JVM 1.8.0_112-b15
Fuseki version 3.4.0
Redhat enterprise 7 via LSF

Server invocation: java --Xmx24GB -Xms24GB -jar fuseki-server.jar --update 
--port 3355 --mem /test
                         ^^^^^^
It's -Xmx, not --Xmx

I thought that caused an error - but if it doesn't, the max heap isn't set, which would explain why -Xms is needed.
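For comparison, a corrected invocation would look something like this (single dash; I also think the JVM only accepts K/M/G suffixes, so "24GB" would need to be "24G" - the rest is as in your command):

    java -Xmx24G -Xms24G -jar fuseki-server.jar --update --port 3355 --mem /test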

I'd have thought 24G would be plenty for 23 million triples unless there are many very large literals.


I load my data (totalling 23 million triples across tens of files) into two graphs 
using the s-put utility. With time and progress depending on how much heap I have 
allocated (14 GB up to 40 GB), it loads for a while and then explodes. See below 
for a sample of the whole error.
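(For reference, each load is a call roughly like the one below; the host, graph URI 
and file name are placeholders rather than my real ones, and the /data path is the 
default Graph Store Protocol endpoint as I understand it:)

    s-put http://somehost:3355/test/data http://example.org/graph1 chunk-001.ttl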

The perplexing part is that I cannot see any sign of an error to trigger the 
dump or predict when it will die. If I do not set -Xms to the same as the -Xmx 
parameter, it dies within ten seconds of starting to load (where loading should 
take 30 minutes or more). If I give it loads of heap, the crash seems to occur 
around it receiving its first SPARQL query after the data is loaded. The client 
(calling s-post) sees generic_request.rb:206:in `copy_stream': Broken pipe - 
sendfile (Errno::EPIPE), which I infer to mean that the server has gone away 
mid-request.

Data is added transactionally so even if a bad update happens the rest of the data should be safe.


I have tried the following so far:

1. Add heap
2. Change JVM to another Java 8 release
3. Turn up Fuseki logging - No debug messages obviously indicate an error prior 
to the crash
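In case it helps anyone reproduce this: by "turn up logging" I mean raising the 
org.apache.jena.fuseki loggers to DEBUG. With 3.4.0 that should just be a log4j 1.2 
properties file passed via the log4j.configuration system property, roughly:

    java -Xmx24G -Xms24G -Dlog4j.configuration=file:log4j-debug.properties -jar fuseki-server.jar --update --port 3355 --mem /test

(log4j-debug.properties is just an example name for a copy of the default config 
with log4j.logger.org.apache.jena.fuseki=DEBUG added.)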

Can anybody recommend a course of action to diagnose and fix my issue?

I hate to say it but it smells a bit like a hardware fault (or JVM fault?), especially given the unpredictability. Anything in software is usually reasonably predictable.
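If it is the JVM itself dying rather than an application error, there may be traces outside Fuseki's own logging. Something along these lines is worth a look (paths and patterns are only guesses for your setup):

    # A JVM crash normally leaves a fatal error log in the process working directory
    ls hs_err_pid*.log

    # If the Linux OOM killer fired, it will show up in the kernel ring buffer
    dmesg | grep -i -E 'out of memory|killed process'

Running with -XX:+HeapDumpOnOutOfMemoryError (and -XX:HeapDumpPath=/some/dir) would at least confirm whether a genuine OutOfMemoryError is involved.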

    Andy



Regards,

Kieron

----------------- thread dump-----------------------
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode):

"qtp596910004-172" #172 prio=5 os_prio=0 tid=0x00002b9c54002000 nid=0xcc67 
waiting on condition [0x00002b9bd3a7f000]
   java.lang.Thread.State: WAITING (parking)

This is the webserver (Jetty) waiting for something.

        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000004659a2800> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:173)
        at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
        at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
        at java.lang.Thread.run(Thread.java:745)

"qtp596910004-41" #41 prio=5 os_prio=0 tid=0x00002b9c64001000 nid=0xb393 
runnable [0x00002b9bd3c2d000]
   java.lang.Thread.State: RUNNABLE
..... lots more threads

"VM Thread" os_prio=0 tid=0x00002b9b3c3c5800 nid=0xd1da runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00002b9b3c01f000 nid=0xd1c0 
runnable
.... more GC

"VM Periodic Task Thread" os_prio=0 tid=0x00002b9b3c432800 nid=0xd1ff waiting 
on condition

JNI global references: 499

Heap
PSYoungGen      total 3185664K, used 1577280K [0x000000069c580000, 
0x00000007c0000000, 0x00000007c0000000)
  eden space 1592832K, 99% used 
[0x000000069c580000,0x00000006fc9d0308,0x00000006fd900000)
  from space 1592832K, 0% used 
[0x00000006fd900000,0x00000006fd900000,0x000000075ec80000)
  to   space 1592832K, 0% used 
[0x000000075ec80000,0x000000075ec80000,0x00000007c0000000)
ParOldGen       total 9557504K, used 9557170K [0x0000000455000000, 
0x000000069c580000, 0x000000069c580000)
  object space 9557504K, 99% used 
[0x0000000455000000,0x000000069c52c950,0x000000069c580000)
Metaspace       used 27217K, capacity 27644K, committed 28032K, reserved 
1073152K
  class space    used 3508K, capacity 3636K, committed 3712K, reserved 1048576K

I don't see more than 10G being used here.


Kieron Taylor PhD.
Ensembl Developer

EMBL, European Bioinformatics Institute
