On Mon, 2 Jun 2025 09:02:05 GMT, Johannes Bechberger <jbechber...@openjdk.org> 
wrote:

>> I see. With a bounded queue as used in this solution, it can work quite 
>> nicely, provided the thread is actually on CPU in native and not waiting. 
>> If it is waiting (which is most likely), pending requests could take a 
>> long time to reach consumers.
>> 
>> I also better understand the optimization you tried as part of the async 
>> walk in native and frames. Also quite nice: walk from the last 
>> JfrSampleRequest and use equals to "batch" the top JFR sample requests that 
>> are the same (i.e. taken for the ljf). Maybe you can retry that, but then 
>> you need to save the sid AND the tid to be reused for the top equal requests 
>> (you only need stacktrace.record_inner() for one request). It's a nice 
>> optimization.
>
> The problem is when a new JFR chunk is started in between queue processing. 
> This caused problems before.
> 
> I would leave these kinds of optimizations for later.

Then I would recommend that you drain immediately when the thread is in native, 
instead of waiting for the queue to fill up to 2/3. The reason is that the 
solution is based on CPU-time samples, and most threads that are 
_thread_in_native are waiting (i.e. they consume no CPU time, so no new samples 
arrive and their queues will not fill up while they are in native).

I would recommend dropping the second clause about testing the queue size 
altogether.

That way, threads will not sit in native for a long time holding a lot of 
undelivered events.
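To make the suggestion concrete, here is a minimal sketch of the two checks. It assumes the current condition has the shape "in native AND queue at least 2/3 full"; ThreadState, should_drain_old and should_drain_new are hypothetical names, not the code in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of thread states; HotSpot's JavaThreadState has more,
// including _thread_in_native.
enum class ThreadState { InJava, InVM, InNative };

// Assumed shape of the current check: drain only when the thread is in
// native AND its bounded queue is at least 2/3 full (integer-safe form of
// size >= 2/3 * capacity).
bool should_drain_old(ThreadState state, std::size_t size, std::size_t capacity) {
  assert(capacity > 0);
  return state == ThreadState::InNative && size * 3 >= capacity * 2;
}

// Recommended check: the queue-size clause is dropped, so a thread in
// native has its pending requests drained right away.
bool should_drain_new(ThreadState state) {
  return state == ThreadState::InNative;
}
```

With the old check, a thread waiting in native with 3 pending requests in a 100-slot queue is never drained, because no new CPU-time samples arrive to push it to 67; the new check drains it at the first opportunity.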

You can revive it later, when you begin to attack the optimizations.
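For when that happens, a rough sketch of the batching idea quoted above: walk back from the last request, record one stack trace, and reuse its ids for every preceding equal request. SampleRequest, RecordedIds and batch_top_requests are hypothetical stand-ins, not JfrSampleRequest or the JFR stack-trace API; the meaning of "tid" is not stated in the thread, so both ids are modeled as opaque 64-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for JfrSampleRequest: a request is identified here
// only by the sp and pc captured for the last Java frame (ljf).
struct SampleRequest {
  const void* sp;
  const void* pc;
  bool operator==(const SampleRequest& other) const {
    return sp == other.sp && pc == other.pc;
  }
};

// Hypothetical pair of ids ("sid" and "tid" in the review) saved from the
// single real stack-trace recording and reused for equal requests.
struct RecordedIds {
  std::uint64_t sid;
  std::uint64_t tid;
};

// Walks the queue backwards from the last request. Only the last request is
// passed to `record` (standing in for the expensive stack-trace recording);
// every preceding request that compares equal to it reuses the saved ids.
// Returns how many requests were covered; the caller handles the rest.
std::size_t batch_top_requests(
    const std::vector<SampleRequest>& queue,
    const std::function<RecordedIds(const SampleRequest&)>& record,
    std::vector<RecordedIds>& out) {
  if (queue.empty()) {
    return 0;
  }
  const SampleRequest& last = queue.back();
  const RecordedIds ids = record(last);  // the only real recording
  std::size_t batched = 0;
  for (std::size_t i = queue.size(); i-- > 0;) {
    if (!(queue[i] == last)) {
      break;  // the first differing request ends the batch
    }
    out.push_back(ids);
    ++batched;
  }
  assert(batched >= 1);  // the last request always matches itself
  return batched;
}
```

This sketch ignores the chunk-rotation problem mentioned above: ids saved before a new JFR chunk starts would not be valid in the new chunk, so a real version would have to detect that case.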

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120855119
