Hi list,

we bundle our OSGi application with karaf, having our own feature
repository put into etc/org.apache.karaf.features.cfg and also defined our
boot feature with the featuresBoot property.

When starting up on RHEL7 and re-starting on Docker the startup gets
interrupted occasionally when org.apache.karaf.features.internal.osgi.Activator
tries to read etc/org.apache.karaf.features.cfg. We can see in a heap dump
from this state, that the features.core bundle is loaded, but the
properties from the config file are not present.
This leads to no further features/bundles being installed.

When debugging this I can see the timestamps
on etc/org.apache.karaf.features.cfg beeing later then the timestamps on
the karaf pid file:

/opt/instana/agent# stat agent.pid
  File: agent.pid
  Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 41h/65d Inode: 19025079    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-10-02 06:52:09.823302941 +0000
Modify: 2020-10-02 06:57:41.029279007 +0000
Change: 2020-10-02 06:57:41.029279007 +0000
 Birth: -

/opt/instana/agent# stat etc/org.apache.karaf.features.cfg
  File: etc/org.apache.karaf.features.cfg
  Size: 3393            Blocks: 8          IO Block: 4096   regular file
Device: 41h/65d Inode: 39076582    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-10-02 06:57:42.645969271 +0000
Modify: 2020-10-02 06:57:42.645969271 +0000
Change: 2020-10-02 06:57:42.645969271 +0000
 Birth: -



Following is a thread dump from the stuck startup:


/opt/instana/agent# jvm/bin/jcmd 26665 Thread.print
26665:
2020-10-02 07:00:53
Full thread dump OpenJDK 64-Bit Server VM (25.232-b18 mixed
mode):"Attach Listener" #18 daemon prio=9 os_prio=0
tid=0x0000000002a64000 nid=0x6cd4 waiting on condition
[0x0000000000000000]
   java.lang.Thread.State:
RUNNABLE"fileinstall-/opt/instana/agent/deploy" #17 daemon prio=5
os_prio=0 tid=0x0000000002de2000 nid=0x6842 in Object.wait()
[0x00007faa5ef42000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f8f7d1d0> (a
org.apache.felix.fileinstall.internal.DirectoryWatcher)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
        - locked <0x00000000f8f7d1d0> (a
org.apache.felix.fileinstall.internal.DirectoryWatcher)"fileinstall-/opt/instana/agent/etc"
#15 daemon prio=5 os_prio=0 tid=0x0000000002e2a800 nid=0x6841 in
Object.wait() [0x00007faa5efa3000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f8c87360> (a
org.apache.felix.fileinstall.internal.DirectoryWatcher)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
        - locked <0x00000000f8c87360> (a
org.apache.felix.fileinstall.internal.DirectoryWatcher)"Thread-2" #16
daemon prio=5 os_prio=0 tid=0x0000000002ccc800 nid=0x683c runnable
[0x00007faa5f880000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.fs.LinuxWatchService.poll(Native Method)
        at sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47)
        at sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314)
        at java.lang.Thread.run(Thread.java:748)"CM Event Dispatcher"
#14 daemon prio=5 os_prio=0 tid=0x0000000002caa000 nid=0x683b in
Object.wait() [0x00007faa5f8e1000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f8c87860> (a java.util.LinkedList)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
        - locked <0x00000000f8c87860> (a java.util.LinkedList)
        at java.lang.Thread.run(Thread.java:748)"CM Configuration
Updater" #13 daemon prio=5 os_prio=0 tid=0x0000000002ca1800 nid=0x683a
in Object.wait() [0x00007faa5f942000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f8c87c78> (a java.util.LinkedList)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
        - locked <0x00000000f8c87c78> (a java.util.LinkedList)
        at java.lang.Thread.run(Thread.java:748)"Coordination Timer"
#12 daemon prio=5 os_prio=0 tid=0x0000000002c8d000 nid=0x6839 in
Object.wait() [0x00007faa5fa4b000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f8c87e18> (a java.util.TaskQueue)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000000f8c87e18> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)"Karaf Lock
Monitor Thread" #11 prio=5 os_prio=0 tid=0x0000000002863000 nid=0x6838
waiting on condition [0x00007faa5f804000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.karaf.main.Main.doMonitor(Main.java:410)
        at org.apache.karaf.main.Main.access$100(Main.java:74)
        at org.apache.karaf.main.Main$3.run(Main.java:377)"FelixStartLevel"
#10 daemon prio=5 os_prio=0 tid=0x000000000250a000 nid=0x6837 in
Object.wait() [0x00007faa5fb20000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f84d8b00> (a java.util.ArrayList)
        at java.lang.Object.wait(Object.java:502)
        at 
org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:283)
        - locked <0x00000000f84d8b00> (a java.util.ArrayList)
        at java.lang.Thread.run(Thread.java:748)"FelixFrameworkWiring"
#9 daemon prio=5 os_prio=0 tid=0x0000000002508000 nid=0x6836 in
Object.wait() [0x00007faa5fc62000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f84d8cb8> (a java.util.ArrayList)
        at java.lang.Object.wait(Object.java:502)
        at 
org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        - locked <0x00000000f84d8cb8> (a java.util.ArrayList)
        at java.lang.Thread.run(Thread.java:748)"FelixDispatchQueue"
#8 prio=5 os_prio=0 tid=0x0000000002577000 nid=0x6835 in Object.wait()
[0x00007faa5fb81000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f84d8e70> (a java.util.ArrayList)
        at java.lang.Object.wait(Object.java:502)
        at 
org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1122)
        - locked <0x00000000f84d8e70> (a java.util.ArrayList)
        at 
org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
        at 
org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
        at java.lang.Thread.run(Thread.java:748)"Service Thread" #6
daemon prio=9 os_prio=0 tid=0x0000000001f7d000 nid=0x6830 runnable
[0x0000000000000000]
   java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #5 daemon
prio=9 os_prio=0 tid=0x0000000001f73000 nid=0x682f waiting on
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon
prio=9 os_prio=0 tid=0x0000000001f63000 nid=0x682e runnable
[0x0000000000000000]
   java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8
os_prio=0 tid=0x0000000001f3e800 nid=0x682d in Object.wait()
[0x00007faa63b2e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f83a18a8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        - locked <0x00000000f83a18a8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at 
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)"Reference
Handler" #2 daemon prio=10 os_prio=0 tid=0x0000000001f34000 nid=0x682c
in Object.wait() [0x00007faa63b8f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        - locked <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
        at 
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main"
#1 prio=5 os_prio=0 tid=0x0000000001ec4000 nid=0x682a in Object.wait()
[0x00007faa65f49000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f855e590> (a
org.apache.felix.framework.util.ThreadGate)
        at org.apache.felix.framework.util.ThreadGate.await(ThreadGate.java:79)
        - locked <0x00000000f855e590> (a
org.apache.felix.framework.util.ThreadGate)
        at org.apache.felix.framework.Felix.waitForStop(Felix.java:1075)
        at org.apache.karaf.main.Main.awaitShutdown(Main.java:668)
        at org.apache.karaf.main.Main.main(Main.java:188)"VM Thread"
os_prio=0 tid=0x0000000001f2a000 nid=0x682b runnable"VM Periodic Task
Thread" os_prio=0 tid=0x0000000001f80800 nid=0x6831 waiting on
conditionJNI global references: 304


This is our startup.properties:

# Bundles to be started on startup, with startlevel

mvn\:org.apache.karaf.features/org.apache.karaf.features.extension/4.2.9 = 1

mvn\:org.ops4j.pax.url/pax-url-aether/2.6.2 = 5

mvn\:org.ops4j.pax.logging/pax-logging-api/1.11.6 = 8

mvn\:org.ops4j.pax.logging/pax-logging-log4j2/1.11.6 = 8

mvn\:org.fusesource.jansi/jansi/1.18 = 8

mvn\:org.apache.felix/org.apache.felix.coordinator/1.0.2 = 9

mvn\:org.apache.felix/org.apache.felix.configadmin/1.9.16 = 10

mvn\:org.apache.felix/org.apache.felix.fileinstall/3.6.6 = 11

mvn\:org.apache.karaf.features/org.apache.karaf.features.core/4.2.9 = 15

I suspect a race condition between felix fileinstall, felix config admin
and the feature.core bundle to cause this when the file is re-written by
fileinstall(?) and feature.core opening it for reading. Unfortunately I
have no detailed experience in the combined workings of those components.

Happy to hear any suggestion.

ps: seen with karaf 4.2.8 on RHEL7 and 4.2.9 on docker.

Cheers
/henning
-- 
*Henning Treu *| Senior Software Engineer
Mobile: +49 160 1797564

*INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 |
United States

*INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
Geschäftsführer: Mirko Novakovic | Pavlo Baron

Reply via email to