Hi Emanuel,
I actually enabled loop unrolling for Wasm, in LLVM, for this same reason.
The Turboshaft unroller doesn't appear to be triggering for the one loop I
looked at, as I think it's too big. I have tweaked the TS unroller
heuristics to generally unroll less, as added the compilation time
> This sounds like the easiest option, so the first thing to try..?
Sure, go for it :)
> With the caveat that yesterday I looked at one loop, on one machine, *the*
> loop is still doing a reasonable amount of vector number crunching, perf
> reports the overhead of load and compare as 18% and 10%,
> Do you mean a load from a memory region that we would protect when we
> want an interrupt to happen so that the load traps then and we would rely
> on the trap handler to process the interrupt request? If so: maybe, but
> that's still a load, which isn't great.
Yes, but because the subsequent instr
Hi,
> what is the upper bound for interrupt latency?
There is no clear official upper bound. LoopStackCheckElisionReducer
removes stack checks from loops that have less
than kMaxIterForStackCheckRemoval (= 5000) iterations, but I chose that
number fairly arbitrarily, using the "meh, sounds vaguel
>
> what is the upper bound for interrupt latency? If we have an inner loop,
> without calls, I would assume we could have a reasonable number of
> instructions and iterations before that latency would be adversely affected?
>
Yes, interrupting doesn't need to be immediate. There are situations wh
Hi,
It seems my previous reply to Emanuel didn't send... Basically, I have
enabled wasm unrolling in LLVM to help mitigate this already, and I have
modified the TS unroller to help improve compile time and performance, so
I'd like to avoid increasing code size.
With respect to figuring out wh
Hi Sam,
Just so that everybody is on the same page: these stack checks are really
interrupt checks rather than stack-overflow checks or something like that.
And yea, performing them on every loop iteration is a bit of a waste of
time, but we kinda need them in every loop where we can't prove that
Hi Sam,
In principle, loop unrolling should already reduce the number of stack
checks, but it could be that it's insufficient or that for whatever reason
this optimization does not get applied here. Did you take a look at the
generated code?
Cheers,
Emanuel
On Thu, Jun 12, 2025 at 4:44 PM Sa
Hi!
While running some AI code, the loop header WasmStackCheck was appearing
quite heavily in the profile. Disabling the checks results in ~1.5%
speedup. So, is it necessary to execute these for every iteration? Or could
we wrap inner loops, devoid of a stack check, in a new loop with one so
t
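
For illustration only, here is a minimal Python sketch of the loop-wrapping
idea raised in the message above: an outer loop polls for interrupts once per
chunk, while the inner loop carries no check. This is not V8 or Turboshaft
code. The function names, the `PollCounter` helper and the chunk size of 1024
are all hypothetical, chosen just to make the check counts visible.

```python
def run_with_check_every_iteration(n, poll):
    """Baseline: poll for an interrupt in every loop header."""
    total = 0
    for i in range(n):
        poll()  # stands in for the per-iteration interrupt check
        total += i
    return total


def run_strip_mined(n, poll, chunk=1024):
    """Outer loop polls once per chunk; the inner loop has no check.

    `chunk` is a hypothetical tuning knob: it bounds how many
    iterations can run between two polls.
    """
    total = 0
    for start in range(0, n, chunk):
        poll()  # one check per chunk instead of one per iteration
        for i in range(start, min(start + chunk, n)):
            total += i
    return total


class PollCounter:
    """Hypothetical helper that counts how often the check runs."""

    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
```

With n iterations the baseline polls n times, while the wrapped version polls
ceil(n / chunk) times, so the chunk size trades check overhead against the
worst-case number of iterations before an interrupt is noticed.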