Thanks, I will leave some comments in the PR.

Regards
JB

> Le 6 oct. 2020 à 10:27, Henning Treu <[email protected]> a écrit :
> 
> Hi, I created https://issues.apache.org/jira/browse/KARAF-6866 
> <https://issues.apache.org/jira/browse/KARAF-6866>
> 
> Will try to come up with a PR for a fix.
> 
> Stay tuned.
> /henning
> 
> On Mon, Oct 5, 2020 at 5:00 PM Jean-Baptiste Onofre <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi,
> 
> It would "force" configadmin to be startup in etc/startup.properties before 
> features service (which is the case by default but not mandatory).
> 
> It’s certainly acceptable but I would like to evaluate the impact.
> 
> Can you please create a Jira related to that (or I can do it for you) ?
> 
> Thanks,
> Regards
> JB
> 
>> Le 5 oct. 2020 à 16:53, Henning Treu <[email protected] 
>> <mailto:[email protected]>> a écrit :
>> 
>> Hi JB,
>> 
>> unfortunately there is no practical way to reproduce this. It happens 
>> "occasionally" on our docker images with karaf packaged and in 10% of cases 
>> when our customer installs the karaf distro on RHEL7.
>> 
>> In the feature.core Activator the config is loaded specifically from file, 
>> why not use the config already stored in ConfigAdmin (I guess provided by 
>> felix FileInstall)? Since the timestamp on the file updates it looks like it 
>> gets re-written and races with the Activator to open it for reading. Does 
>> this make sense?
>> 
>> Best
>> /henning
>> 
>> On Fri, Oct 2, 2020 at 2:22 PM Jean-Baptiste Onofre <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi,
>> 
>> Does it really occur with Karaf 4.2.8 ?
>> 
>> I never saw such issue. Any step to reproduce this (maybe changing time) ?
>> 
>> Regards
>> JB
>> 
>>> Le 2 oct. 2020 à 13:31, Henning Treu <[email protected] 
>>> <mailto:[email protected]>> a écrit :
>>> 
>>> Hi list,
>>> 
>>> we bundle our OSGi application with karaf, having our own feature 
>>> repository put into etc/org.apache.karaf.features.cfg and also defined our 
>>> boot feature with the featuresBoot property.
>>> 
>>> When starting up on RHEL7 and re-starting on Docker the startup gets 
>>> interrupted occasionally when 
>>> org.apache.karaf.features.internal.osgi.Activator tries to read 
>>> etc/org.apache.karaf.features.cfg. We can see in a heap dump from this 
>>> state, that the features.core bundle is loaded, but the properties from the 
>>> config file are not present.
>>> This leads to no further features/bundles being installed.
>>> 
>>> When debugging this I can see the timestamps on 
>>> etc/org.apache.karaf.features.cfg beeing later then the timestamps on the 
>>> karaf pid file:
>>> 
>>> /opt/instana/agent# stat agent.pid
>>>   File: agent.pid
>>>   Size: 5               Blocks: 8          IO Block: 4096   regular file
>>> Device: 41h/65d Inode: 19025079    Links: 1
>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>>> Access: 2020-10-02 06:52:09.823302941 +0000
>>> Modify: 2020-10-02 06:57:41.029279007 +0000
>>> Change: 2020-10-02 06:57:41.029279007 +0000
>>>  Birth: -
>>> 
>>> /opt/instana/agent# stat etc/org.apache.karaf.features.cfg
>>>   File: etc/org.apache.karaf.features.cfg
>>>   Size: 3393            Blocks: 8          IO Block: 4096   regular file
>>> Device: 41h/65d Inode: 39076582    Links: 1
>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>>> Access: 2020-10-02 06:57:42.645969271 +0000
>>> Modify: 2020-10-02 06:57:42.645969271 +0000
>>> Change: 2020-10-02 06:57:42.645969271 +0000
>>>  Birth: -
>>> 
>>> 
>>> Following is a thread dump from the stuck startup:
>>> 
>>> /opt/instana/agent# jvm/bin/jcmd 26665 Thread.print
>>> 26665:
>>> 2020-10-02 07:00:53
>>> Full thread dump OpenJDK 64-Bit Server VM (25.232-b18 mixed mode):
>>> "Attach Listener" #18 daemon prio=9 os_prio=0 tid=0x0000000002a64000 
>>> nid=0x6cd4 waiting on condition [0x0000000000000000]
>>>    java.lang.Thread.State: RUNNABLE
>>> "fileinstall-/opt/instana/agent/deploy" #17 daemon prio=5 os_prio=0 
>>> tid=0x0000000002de2000 nid=0x6842 in Object.wait() [0x00007faa5ef42000]
>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f8f7d1d0> (a 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>>>         at 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
>>>         - locked <0x00000000f8f7d1d0> (a 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>>> "fileinstall-/opt/instana/agent/etc" #15 daemon prio=5 os_prio=0 
>>> tid=0x0000000002e2a800 nid=0x6841 in Object.wait() [0x00007faa5efa3000]
>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f8c87360> (a 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>>>         at 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
>>>         - locked <0x00000000f8c87360> (a 
>>> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>>> "Thread-2" #16 daemon prio=5 os_prio=0 tid=0x0000000002ccc800 nid=0x683c 
>>> runnable [0x00007faa5f880000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at sun.nio.fs.LinuxWatchService.poll(Native Method)
>>>         at 
>>> sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47)
>>>         at 
>>> sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "CM Event Dispatcher" #14 daemon prio=5 os_prio=0 tid=0x0000000002caa000 
>>> nid=0x683b in Object.wait() [0x00007faa5f8e1000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f8c87860> (a java.util.LinkedList)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
>>>         - locked <0x00000000f8c87860> (a java.util.LinkedList)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "CM Configuration Updater" #13 daemon prio=5 os_prio=0 
>>> tid=0x0000000002ca1800 nid=0x683a in Object.wait() [0x00007faa5f942000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f8c87c78> (a java.util.LinkedList)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
>>>         - locked <0x00000000f8c87c78> (a java.util.LinkedList)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "Coordination Timer" #12 daemon prio=5 os_prio=0 tid=0x0000000002c8d000 
>>> nid=0x6839 in Object.wait() [0x00007faa5fa4b000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f8c87e18> (a java.util.TaskQueue)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at java.util.TimerThread.mainLoop(Timer.java:526)
>>>         - locked <0x00000000f8c87e18> (a java.util.TaskQueue)
>>>         at java.util.TimerThread.run(Timer.java:505)
>>> "Karaf Lock Monitor Thread" #11 prio=5 os_prio=0 tid=0x0000000002863000 
>>> nid=0x6838 waiting on condition [0x00007faa5f804000]
>>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>         at java.lang.Thread.sleep(Native Method)
>>>         at org.apache.karaf.main.Main.doMonitor(Main.java:410)
>>>         at org.apache.karaf.main.Main.access$100(Main.java:74)
>>>         at org.apache.karaf.main.Main$3.run(Main.java:377)
>>> "FelixStartLevel" #10 daemon prio=5 os_prio=0 tid=0x000000000250a000 
>>> nid=0x6837 in Object.wait() [0x00007faa5fb20000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f84d8b00> (a java.util.ArrayList)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at 
>>> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:283)
>>>         - locked <0x00000000f84d8b00> (a java.util.ArrayList)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "FelixFrameworkWiring" #9 daemon prio=5 os_prio=0 tid=0x0000000002508000 
>>> nid=0x6836 in Object.wait() [0x00007faa5fc62000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f84d8cb8> (a java.util.ArrayList)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at 
>>> org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
>>>         - locked <0x00000000f84d8cb8> (a java.util.ArrayList)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "FelixDispatchQueue" #8 prio=5 os_prio=0 tid=0x0000000002577000 nid=0x6835 
>>> in Object.wait() [0x00007faa5fb81000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f84d8e70> (a java.util.ArrayList)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at 
>>> org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1122)
>>>         - locked <0x00000000f84d8e70> (a java.util.ArrayList)
>>>         at 
>>> org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
>>>         at 
>>> org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
>>>         at java.lang.Thread.run(Thread.java:748)
>>> "Service Thread" #6 daemon prio=9 os_prio=0 tid=0x0000000001f7d000 
>>> nid=0x6830 runnable [0x0000000000000000]
>>>    java.lang.Thread.State: RUNNABLE
>>> "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x0000000001f73000 
>>> nid=0x682f waiting on condition [0x0000000000000000]
>>>    java.lang.Thread.State: RUNNABLE
>>> "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x0000000001f63000 
>>> nid=0x682e runnable [0x0000000000000000]
>>>    java.lang.Thread.State: RUNNABLE
>>> "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x0000000001f3e800 nid=0x682d in 
>>> Object.wait() [0x00007faa63b2e000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f83a18a8> (a 
>>> java.lang.ref.ReferenceQueue$Lock)
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
>>>         - locked <0x00000000f83a18a8> (a java.lang.ref.ReferenceQueue$Lock)
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
>>>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
>>> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x0000000001f34000 
>>> nid=0x682c in Object.wait() [0x00007faa63b8f000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
>>>         at java.lang.Object.wait(Object.java:502)
>>>         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
>>>         - locked <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
>>>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
>>> "main" #1 prio=5 os_prio=0 tid=0x0000000001ec4000 nid=0x682a in 
>>> Object.wait() [0x00007faa65f49000]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0x00000000f855e590> (a 
>>> org.apache.felix.framework.util.ThreadGate)
>>>         at 
>>> org.apache.felix.framework.util.ThreadGate.await(ThreadGate.java:79)
>>>         - locked <0x00000000f855e590> (a 
>>> org.apache.felix.framework.util.ThreadGate)
>>>         at org.apache.felix.framework.Felix.waitForStop(Felix.java:1075)
>>>         at org.apache.karaf.main.Main.awaitShutdown(Main.java:668)
>>>         at org.apache.karaf.main.Main.main(Main.java:188)
>>> "VM Thread" os_prio=0 tid=0x0000000001f2a000 nid=0x682b runnable
>>> "VM Periodic Task Thread" os_prio=0 tid=0x0000000001f80800 nid=0x6831 
>>> waiting on condition
>>> JNI global references: 304
>>> 
>>> This is our startup.properties:
>>> 
>>> # Bundles to be started on startup, with startlevel
>>> mvn\:org.apache.karaf.features/org.apache.karaf.features.extension/4.2.9 = 1
>>> mvn\:org.ops4j.pax.url/pax-url-aether/2.6.2 = 5
>>> mvn\:org.ops4j.pax.logging/pax-logging-api/1.11.6 = 8
>>> mvn\:org.ops4j.pax.logging/pax-logging-log4j2/1.11.6 = 8
>>> mvn\:org.fusesource.jansi/jansi/1.18 = 8
>>> mvn\:org.apache.felix/org.apache.felix.coordinator/1.0.2 = 9
>>> mvn\:org.apache.felix/org.apache.felix.configadmin/1.9.16 = 10
>>> mvn\:org.apache.felix/org.apache.felix.fileinstall/3.6.6 = 11
>>> mvn\:org.apache.karaf.features/org.apache.karaf.features.core/4.2.9 = 15
>>> 
>>> I suspect a race condition between felix fileinstall, felix config admin 
>>> and the feature.core bundle to cause this when the file is re-written by 
>>> fileinstall(?) and feature.core opening it for reading. Unfortunately I 
>>> have no detailed experience in the combined workings of those components.
>>> 
>>> Happy to hear any suggestion. 
>>> 
>>> ps: seen with karaf 4.2.8 on RHEL7 and 4.2.9 on docker.
>>> 
>>> Cheers
>>> /henning
>>> -- 
>>> Henning Treu | Senior Software Engineer
>>> Mobile: +49 160 1797564
>>> 
>>> INSTANA Inc | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | 
>>> United States
>>> 
>>> INSTANA GMBH | Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
>>> Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
>>> Geschäftsführer: Mirko Novakovic | Pavlo Baron
>> 
>> 
>> 
>> -- 
>> Henning Treu | Senior Software Engineer
>> Mobile: +49 160 1797564
>> 
>> INSTANA Inc | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | 
>> United States
>> 
>> INSTANA GMBH | Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
>> Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
>> Geschäftsführer: Mirko Novakovic | Pavlo Baron
> 
> 
> 
> -- 
> Henning Treu | Senior Software Engineer
> Mobile: +49 160 1797564
> 
> INSTANA Inc | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | 
> United States
> 
> INSTANA GMBH | Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
> Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
> Geschäftsführer: Mirko Novakovic | Pavlo Baron

Reply via email to