Thank you, Stefan, this was very helpful. I configured
org.apache.sling.jcr.oak.server.internal.OakSlingRepositoryManager to
slightly increase the queue length, and also to enable limitCommitRate.
No more missed events now.
On 3/1/23 11:29, Stefan Egli wrote:
Hi Sergiu,
Yes, there is. The default in Oak is 10000 [0] and it is set at
repository creation time via withObservationQueueLength [1]. In AEM it
is configured via oak.observation.queue-length on the
SlingRepositoryManager. Note, however, that increasing it directly
increases memory usage, and that a full queue is often a symptom of
inefficient observation handlers. Typically the observation queue should
remain small.
Cheers,
Stefan
--
[0] -
https://github.com/apache/jackrabbit-oak/blob/c6ddcc55bee3de915459af01e91edad32d538f3d/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/spi/commit/BackgroundObserver.java#L57
[1] -
https://github.com/apache/jackrabbit-oak/blob/01a6709f76f65a5118ee18c8f5b194d29d1d60b9/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/Jcr.java#L257
On 01.03.23 16:44, Sergiu Dumitriu wrote:
Indeed, that is the case. Is there any way to increase that limit? Or,
how can I catch that external event?
Thanks,
Sergiu
On 3/1/23 10:41, Stefan Egli wrote:
Hi Sergiu,
Are you using Jackrabbit Oak underneath?
If yes, Oak has a limit on how many observation events it can keep in
memory. Once it goes beyond that limit, it will collapse events into
an external event that loses individual commit granularity.
That is one possible explanation for what you are describing.
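To make the collapsing behavior concrete, here is a minimal toy model in plain Java. It is only a sketch of the idea described above, not Oak's actual implementation: the class BoundedObservationQueue, its Change type, and the simulate helper are all hypothetical names invented for illustration, and the consumer is assumed to be too slow to drain anything while the commits arrive.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a bounded observation queue (hypothetical, not Oak code). */
class BoundedObservationQueue {

    /** One queued change; 'commits' counts how many commits it stands for. */
    static final class Change {
        boolean external;
        int commits;

        Change(boolean external, int commits) {
            this.external = external;
            this.commits = commits;
        }
    }

    private final int maxLength;
    private final Deque<Change> queue = new ArrayDeque<>();

    BoundedObservationQueue(int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be >= 1");
        }
        this.maxLength = maxLength;
    }

    /** Records one commit; once the queue is full, merges it into the last entry. */
    void contentChanged() {
        if (queue.size() < maxLength) {
            queue.addLast(new Change(false, 1));
        } else {
            Change last = queue.peekLast();
            last.external = true; // per-commit granularity is lost from here on
            last.commits++;
        }
    }

    /** Returns the next queued change, or null if the queue is empty. */
    Change poll() {
        return queue.pollFirst();
    }

    /**
     * Pushes 'commits' commits without draining, then drains the queue.
     * Returns {commits delivered individually, commits merged into a collapsed entry}.
     */
    static int[] simulate(int commits, int maxLength) {
        BoundedObservationQueue q = new BoundedObservationQueue(maxLength);
        for (int i = 0; i < commits; i++) {
            q.contentChanged();
        }
        int individual = 0;
        int collapsed = 0;
        for (Change c = q.poll(); c != null; c = q.poll()) {
            if (c.external) {
                collapsed += c.commits;
            } else {
                individual++;
            }
        }
        return new int[] {individual, collapsed};
    }

    public static void main(String[] args) {
        int[] result = simulate(10, 4);
        System.out.println("individual=" + result[0] + ", collapsed=" + result[1]);
    }
}
```

In this model, ten commits against a queue of length four leave three individually delivered changes and one collapsed entry covering the remaining seven, which is the kind of granularity loss a listener expecting one event per commit would see as missed events.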
Cheers,
Stefan
On 24.02.23 16:35, Sergiu Dumitriu wrote:
We noticed that under heavy load a ResourceChangeListener doesn't
receive notifications about all the changes in the repository. Is
this expected behavior?
More details:
We have a batch import scheduled job that pulls in a few thousand
records from an external database and creates a new node for each of
them (a Visit node). For each of the new nodes, we have a
ResourceChangeListener that creates additional nodes as needed (one
or two Surveys associated with the Visit), and changes the Visit
node to mark that it has surveys. All the new Visit nodes correctly
trigger a ResourceChange event which is correctly processed by the
survey assignment ResourceChangeListener. The subsequent change of
the hasSurveys property in the Visit node triggers a new
ResourceChange event, which is then caught by another
ResourceChangeListener that is monitoring the status of the visits.
And this second listener doesn't always receive events.
Some debugging:
We noticed the issue only when our import went from tens of
records in our test environment to thousands of records in
production. The first few hundred records don't seem to be
affected, then more and more events are missed, with a total of
between 5 and 10% event loss. When I added a 500ms delay between
each imported record, no events were lost. With a 100ms delay
between each imported record, 2% of the events were lost.
There's nothing in the logs (at the INFO level) to indicate that the
events were intentionally dropped.
Any ideas?
Thanks,
--
Sergiu Dumitriu
http://purl.org/net/sergiu