We had the problem again today.
Load was higher than average, but again not lunacy - about 3k hits per
minute. There is no immediately obvious bad query, although I hardly
know what to look for in the SPARQL - I just looked for extra-long
statements. Nothing in the fuseki.log at all within an hour of the
event. As you know the logs are verbose, so we have logging set to
"WARN" for just about everything. I'll append the log4j.properties to
the end of this message - if there's something in particular that'd be
useful to turn up, let me know.
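For example, I assume I could raise just one category while leaving the
root logger at WARN, with something like the following in the
properties file (the logger name here is only an illustration):

# hypothetical example - raise one category without touching rootLogger
log4j.logger.org.apache.jena.tdb=INFO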
I upgraded our dev & test instances to 3.7.0 today and am doing
production tonight. I also recreated the database from a backup, and am
looking to verify that all DB changes made since the 3.6 upgrade made
it into Fuseki.
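As a rough first sanity check I'll probably just compare triple counts
against the source system, something along these lines (sketch only -
the port and dataset name are illustrative, not our real config):

curl -s --data-urlencode \
  'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }' \
  http://localhost:3030/ds/sparql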
For background, could you share a directory listing with file sizes?
total 2.8G
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GOSP.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GOSP.idn
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GPOS.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GPOS.idn
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GSPO.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 GSPO.idn
-rw-rw-r--. 1 fuseki fuseki 0 Jun 12 15:27 journal.jrnl
-rw-r--r--. 1 fuseki fuseki 208M Jun 12 15:27 node2id.dat
-rw-r--r--. 1 fuseki fuseki 32M Jun 12 10:55 node2id.idn
-rw-r--r--. 1 fuseki fuseki 545M Jun 12 15:27 nodes.dat
-rw-r--r--. 1 fuseki fuseki 784M Jun 12 15:27 OSP.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 OSPG.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 OSPG.idn
-rw-r--r--. 1 fuseki fuseki 88M Jun 12 11:09 OSP.idn
-rw-r--r--. 1 fuseki fuseki 760M Jun 12 15:27 POS.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 POSG.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 POSG.idn
-rw-r--r--. 1 fuseki fuseki 88M Jun 12 11:09 POS.idn
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 prefix2id.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 prefix2id.idn
-rw-r--r--. 1 fuseki fuseki 0 Feb 18 11:30 prefixes.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 prefixIdx.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 prefixIdx.idn
-rw-r--r--. 1 fuseki fuseki 808M Jun 12 15:27 SPO.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 SPOG.dat
-rw-r--r--. 1 fuseki fuseki 8.0M Feb 18 11:30 SPOG.idn
-rw-r--r--. 1 fuseki fuseki 96M Jun 12 11:09 SPO.idn
-rw-r--r--. 1 fuseki fuseki 20K Feb 18 11:33 stats.opt
-rw-rw-r--. 1 fuseki fuseki 5 Jun 12 12:30 tdb.lock
When you restart, it looks like that 10G is the mapped file space being
dropped. Mapping is done on demand in chunks, so on restart it is very
small and grows over time. It should reach a steady state. It should
not cause swapping or GC.
Yes, I noticed that the server's VSZ is actually larger than the
virtual memory (swap + RAM) it is really using; I figured it was
something along those lines. But when I referred to memory + swap used,
I meant the actual RSS as reported by ps, plus the inferred swap usage
(the difference in swap before and after a Fuseki restart).
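If it's useful I can break that down per mapping, e.g. with pmap, to
separate the mapped TDB files from the anonymous/heap memory (sketch
only - substitute the real PID from ps):

pmap -x <fuseki-pid> | grep '\.dat'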
I was running "ps" and "free" every couple of minutes. As you can see,
between 12:24 and 12:26 Fuseki's memory usage skyrockets. I've lightly
edited the output below, but the numbers are all unmolested.
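The sampling itself was trivial - roughly a cron entry like this
(reconstructed for illustration, not the exact line, and the log path
is made up):

*/2 * * * * (date; ps aux | grep '[f]useki'; free) >> /tmp/fuseki-mem.log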
Tue Jun 12 12:18:01 EDT 2018
USER PID %CPU %MEM VSZ RSS TTY STAT
START TIME COMMAND
fuseki 32175 23.1 65.3 41186832 21496864 ? Sl Jun11
04:41:46 /etc/alternatives/java_sdk_1.8.0/bin/java -Xmx16G
-Dlog4j.configuration=file:/etc/archonnex/fuseki/log4j.properties [ gc
logging options here ] -jar
/usr/local/apache-jena-fuseki-3.6.0/fuseki-server.jar
--config=/etc/archonnex/fuseki/fcrepo.ttl
total used free shared buffers cached
Mem: 32877320 31168460 1708860 416 3184 961276
-/+ buffers/cache: 30204000 2673320
Swap: 27257848 3145708 24112140
[...]
Tue Jun 12 12:22:01 EDT 2018
fuseki 32175 23.1 66.5 41383440 21870824 ? Sl Jun11
04:43:32 /etc/alternatives/java_sdk_1.8.0/bin/java -Xmx16G
Mem: 32877320 31314128 1563192 488 1880 720456
-/+ buffers/cache: 30591792 2285528
Swap: 27257848 3145256 24112592
Tue Jun 12 12:24:01 EDT 2018
fuseki 32175 23.2 64.9 40859152 21352808 ? Sl Jun11
04:44:19 /etc/alternatives/java_sdk_1.8.0/bin/java -Xmx16G -
Mem: 32877320 31276020 1601300 504 2104 1231452
-/+ buffers/cache: 30042464 2834856
Swap: 27257848 3094076 24163772
Tue Jun 12 12:26:02 EDT 2018
fuseki 32175 23.3 82.6 49183252 27179308 ? Sl Jun11
04:46:21 /etc/alternatives/java_sdk_1.8.0/bin/java -Xmx16G
Mem: 32877320 32655256 222064 476 1516 25612
-/+ buffers/cache: 32628128 249192
Swap: 27257848 8361760 18896088
Tue Jun 12 12:28:01 EDT 2018
fuseki 32175 23.5 71.6 48702204 23540952 ? Sl Jun11
04:49:44 /etc/alternatives/java_sdk_1.8.0/bin/java -Xmx16G
Mem: 32877320 30432416 2444904 484 2132 239924
-/+ buffers/cache: 30190360 2686960
Swap: 27257848 10598088 16659760
Java monitoring of the heap size should show the heap in use after a
major GC to be a different, smaller size.
Yesterday I fixed the garbage collection logging. I looked at it with
gceasy.io; there is nothing horribly wrong there. The heap doesn't go
above 7GB, even when things went to hell. Heap usage did increase
significantly at the time of the problems, though - note the repeated
Full GCs.
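For reference, log output in this format generally comes from GC flags
along these lines - illustrative only, not necessarily the exact
options I'm running with:

-Xloggc:/var/log/archonnex/fuseki/gc.log -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution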
2018-06-12T12:22:28.779-0400: 73381.748: [GC (System.gc())
Desired survivor size 134742016 bytes, new threshold 9 (max 15)
[PSYoungGen: 3068955K->7622K(5460480K)] 3087627K->26295K(6654976K),
0.0068808 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2018-06-12T12:22:28.786-0400: 73381.755: [Full GC (System.gc())
[PSYoungGen: 7622K->0K(5460480K)] [ParOldGen: 18672K->24964K(1194496K)]
26295K->24964K(6654976K), [Metaspace: 34037K->34037K(1081344K)],
0.1054190 secs] [Times: user=0.57 sys=0.00, real=0.10 secs]
2018-06-12T12:23:22.592-0400: 73435.562: [GC (System.gc())
Desired survivor size 130547712 bytes, new threshold 8 (max 15)
[PSYoungGen: 2440898K->2816K(5455872K)] 2465863K->27780K(6650368K),
0.0037102 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2018-06-12T12:23:22.596-0400: 73435.566: [Full GC (System.gc())
[PSYoungGen: 2816K->0K(5455872K)] [ParOldGen: 24964K->27048K(1194496K)]
27780K->27048K(6650368K), [Metaspace: 34037K->34037K(1081344K)],
0.1114969 secs] [Times: user=0.61 sys=0.00, real=0.11 secs]
2018-06-12T12:24:02.404-0400: 73475.374: [GC (Allocation Failure)
Desired survivor size 175112192 bytes, new threshold 7 (max 15)
[PSYoungGen: 5324288K->127456K(5377536K)] 5351336K->201416K(6572032K),
0.1020528 secs] [Times: user=0.66 sys=0.00, real=0.10 secs]
2018-06-12T12:24:29.348-0400: 73502.318: [GC (System.gc())
Desired survivor size 193986560 bytes, new threshold 6 (max 15)
[PSYoungGen: 880066K->129888K(5380096K)] 954027K->203848K(6574592K),
0.0642832 secs] [Times: user=0.33 sys=0.00, real=0.06 secs]
2018-06-12T12:24:29.412-0400: 73502.382: [Full GC (System.gc())
[PSYoungGen: 129888K->0K(5380096K)] [ParOldGen:
73960K->196536K(1194496K)] 203848K->196536K(6574592K), [Metaspace:
34037K->34037K(1081344K)], 0.3551479 secs] [Times: user=1.78 sys=0.00,
real=0.35 secs]
2018-06-12T12:27:48.073-0400: 73701.045: [GC (System.gc())
Desired survivor size 186646528 bytes, new threshold 5 (max 15)
[PSYoungGen: 2862549K->16720K(5409792K)] 3059085K->213256K(6604288K),
2.1344761 secs] [Times: user=1.07 sys=0.09, real=2.13 secs]
2018-06-12T12:27:50.210-0400: 73703.179: [Full GC (System.gc())
[PSYoungGen: 16720K->0K(5409792K)] [ParOldGen:
196536K->206591K(1194496K)] 213256K->206591K(6604288K), [Metaspace:
34037K->34037K(1081344K)], 2.9111523 secs] [Times: user=2.51 sys=0.09,
real=2.91 secs]
If that is not how it is, there is something to investigate.
Andy
>
> thanks
> danno
Dan Pritts <[email protected]>
June 11, 2018 at 5:28 PM
Hi all,
We've been having trouble with our production Fuseki instance. A few
specifics:
Fuseki 3.6.0, standalone/Jetty. OpenJDK 1.8.0_171 on RHEL6. On an
m4.2xlarge, shared with two other applications.
We have about 21M triples in the database. We hit Fuseki medium-hard,
on the order of 1000 hits per minute; 99%+ of the hits are queries.
Our code could stand to do some client-side caching, as we get lots of
repetitive queries. That said, Fuseki is normally plenty fast at those;
it's rare that a query takes >10ms.
It looks like I'm getting hit by JENA-1516, so I will schedule an
upgrade to 3.7 ASAP.
The log is full of errors like this:
[2018-06-11 16:15:07] BindingTDB ERROR get1(?s)
org.apache.jena.tdb.base.file.FileException:
ObjectFileStorage.read[nodes](488281706)[filesize=569694455][file.size()=569694455]:
Failed to read the length : got 0 bytes
at
org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:341)
[2018-06-11 16:15:39] BindingTDB ERROR get1(?identifier)
org.apache.jena.tdb.base.file.FileException: In the middle of an
alloc-write
at
org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
at
org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
The problem that got me looking is that Fuseki memory usage goes nuts,
which causes the server to start swapping, etc. Swapping = slow =
pager. Total memory + swap in use by Fuseki when I investigated was
about 32GB; it's configured to use a 16GB heap. Garbage collection
logging was not configured properly, so I can't say whether my
immediate problem was heap exhaustion.
I'm monitoring swap usage hourly - sometime within a <1hr window the
swap usage increased past 2GB (10%) to about 11GB (10GB of which was
cleared after I restarted Fuseki). So the memory ballooned fairly
quickly when it happened.
The TDB errors happen much earlier than the point where memory goes
nuts. Obviously, they could be a delayed effect of this problem, but
I'm wondering:
- whether this rings a bell in some other way - how much memory should
I expect Fuseki to need?
- whether there is any particular debugging I should enable
- whether our traffic level is out of the ordinary
thanks
danno
--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan
# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
# Plain output to stdout
log4j.appender.jena.plainstdout=org.apache.log4j.ConsoleAppender
log4j.appender.jena.plainstdout.target=System.out
log4j.appender.jena.plainstdout.layout=org.apache.log4j.PatternLayout
log4j.appender.jena.plainstdout.layout.ConversionPattern=[%d{yyyy-MM-dd HH:mm:ss}] %-10c{1} %-5p %m%n
## %d{ISO8601} -- includes "ss,sss"
##
log4j.appender.jena.plainstdout.layout.ConversionPattern=[%d{ISO8601}] %-10c{1} %-5p %m%n
# Unadorned, for the NCSA requests log.
log4j.appender.fuseki.plain=org.apache.log4j.ConsoleAppender
log4j.appender.fuseki.plain.target=System.out
log4j.appender.fuseki.plain.layout=org.apache.log4j.PatternLayout
log4j.appender.fuseki.plain.layout.ConversionPattern=%m%n
# http://www.codejava.net/coding/configure-log4j-for-creating-daily-rolling-log-files
# also see https://github.com/epimorphics/sedgemoor-data/blob/master/package/fuseki-config/log4j.properties
log4j.rootLogger=WARN,ArchonnexFusekiFileLog
log4j.appender.ArchonnexFusekiFileLog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.ArchonnexFusekiFileLog.File=/var/log/archonnex/fuseki/fuseki.log
log4j.appender.ArchonnexFusekiFileLog.DatePattern='.'yyyy-MM-dd
log4j.appender.ArchonnexFusekiFileLog.layout=org.apache.log4j.PatternLayout
log4j.appender.ArchonnexFusekiFileLog.layout.ConversionPattern=[%d{yyyy-MM-dd HH:mm:ss}] %-10c{1} %-5p %m%n
#log4j.rootLogger=WARN, jena.plainstdout
log4j.logger.org.apache.jena=WARN
log4j.logger.org.apache.jena.fuseki=WARN
# Others
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.shiro=WARN
# Fuseki System logs.
log4j.logger.org.apache.jena.fuseki.Server=WARN
log4j.logger.org.apache.jena.fuseki.Fuseki=WARN
log4j.logger.org.apache.jena.fuseki.Admin=WARN
log4j.logger.org.apache.jena.fuseki.Validate=WARN
log4j.logger.org.apache.jena.fuseki.Config=WARN
# NCSA Request log.
log4j.additivity.org.apache.jena.fuseki.Request=false
log4j.logger.org.apache.jena.fuseki.Request=OFF, fuseki.plain
# TDB
log4j.logger.org.apache.jena.tdb.loader=WARN
## Parser output
log4j.additivity.org.apache.jena.riot=false
log4j.logger.org.apache.jena.riot=WARN, jena.plainstdout