I added a test in:

https://chromium-review.googlesource.com/c/v8/v8/+/6182264

With --no-flush-liftoff-code the external memory stays at 15k. With the
default --flush-liftoff-code the external memory rises until it hits 300k
and the test fails. Hopefully this can help diagnose the issue.


On Wed, Jan 15, 2025 at 5:03 PM Kenton Varda <ken...@cloudflare.com> wrote:

> By bisecting in production we determined that the problem is
> --flush_liftoff_code, which was enabled by default starting in 13.2. In our
> environment, this flag seems to leak memory that lives in the code cache
> and so affects newly-created isolates. I've filed a bug:
>
> https://issues.chromium.org/issues/390075235
>
> -Kenton
>
> On Tue, Jan 14, 2025 at 9:09 AM Kenton Varda <ken...@cloudflare.com>
> wrote:
>
>> On Tue, Jan 14, 2025 at 5:59 AM Jakob Kummerow <jkumme...@chromium.org>
>> wrote:
>>
>>> - from what you describe, perhaps it would be feasible to craft a
>>> reproducer. It'd probably have to be a custom V8 embedder that, in a loop,
>>> creates many fresh isolates and instantiates/runs the same (or several?)
>>> demo Wasm module in them.
>>>
>>
>> I tried exactly that yesterday, and was able to see that "external
>> memory" was indeed correlated across isolates, but after
>> creating/destroying thousands of isolates it seemed to converge on a
>> reasonable number rather than keep growing forever.
>>
>> But in prod, something counted as external memory keeps growing without
>> ever converging.
>>
>>
>>> - it could make sense to verify (with printfs in their destructors) that
>>> both Isolates and NativeModules get destroyed as expected. It's
>>> conceivable that the memory growth you're observing is intentional caching
>>> (of generated code, or something?) because the WasmEngine thinks that
>>> the cached data is still needed/useful.
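>>>
>>> Illustratively (a sketch only; the exact V8 file and destructor body vary
>>> across versions, so treat the patch location as a guess):
>>>
>>> // Embedder side: isolates are destroyed explicitly, so log around that.
>>> fprintf(stderr, "disposing isolate %p\n", static_cast<void*>(isolate));
>>> isolate->Dispose();
>>>
>>> // Local V8 patch, e.g. at the top of the existing destructor in
>>> // src/wasm/wasm-code-manager.cc:
>>> NativeModule::~NativeModule() {
>>>   fprintf(stderr, "~NativeModule %p\n", static_cast<void*>(this));
>>>   // ... existing destructor body unchanged ...
>>> }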
>>>
>>> How/where exactly are you seeing this increased "external memory"?
>>> I.e. what reporting system are you using to get memory consumption numbers?
>>>
>>>
>>> On Tue, Jan 14, 2025 at 1:09 AM Kenton Varda <ken...@cloudflare.com>
>>> wrote:
>>>
>>>> To add context here:
>>>>
>>>> The problem appears to show up only after running in production for an
>>>> hour or two. During that time we will have created thousands of isolates to
>>>> handle millions of requests.
>>>>
>>>> But the problem seems to affect *new* isolates, even when those
>>>> isolates are loaded with applications that had been loaded into previous
>>>> isolates without problems. Startup of an application should be 100%
>>>> deterministic since we disallow any I/O during startup, but we're seeing
>>>> that after the host has been running a while, new isolates are showing much
>>>> higher "external memory" on startup. (E.g. 400MB external memory, but we
>>>> enforce a 128MB limit on the whole isolate.)
>>>>
>>>> We observed that the wasm native module cache causes identical wasm
>>>> modules to be shared across isolates, and that wasm lazy compilation causes
>>>> memory usage of a wasm module -- as accounted by all isolates that have
>>>> loaded it -- to change.
>>>>
>>>> Could it be that there is a memory leak in lazy compilation, such that
>>>> these shared cached modules are gradually growing over time, to the point
>>>> where new isolates that try to load these modules are being hit with
>>>> extremely high "external memory" numbers right off the bat?
>>>>
>>>> -Kenton
>>>>
>>>> On Mon, Jan 13, 2025 at 5:31 PM Erik Corry <erikco...@chromium.org>
>>>> wrote:
>>>>
>>>>> It looks like it's related to shared objects between isolates. Is
>>>>> there a newer document than
>>>>> https://docs.google.com/document/d/18lYuaEsDSudzl2TDu-nc-0sVXW7WTGAs14k64GEhnFg/edit?usp=drivesdk
>>>>> that describes how this works today? In particular cross-isolate GCs?
>>>>>
>>>>> On Mon, 13 Jan 2025, 15:25 Jakob Kummerow, <jkumme...@chromium.org>
>>>>> wrote:
>>>>>
>>>>>> Sounds like a bug, but without more details (or a repro) I don't have
>>>>>> a more specific guess than that.
>>>>>>
>>>>>> If you're desperate, you could try to bisect it (even with a flaky
>>>>>> repro). Or review the ~500 changes between those branches:
>>>>>> https://chromium.googlesource.com/v8/v8/+log/branch-heads/13.1..branch-heads/13.2?n=10000
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 13, 2025 at 2:48 PM 'Dan Lapid' via v8-dev <
>>>>>> v8-dev@googlegroups.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> In V8 13.2 and 13.3 we sometimes see Wasm isolates' external memory
>>>>>>> usage blowing up (to gigabytes). Under V8 13.1 the same code never
>>>>>>> used more than 80-100MB.
>>>>>>> The issue doesn't happen every time for the same Wasm bytecode, and
>>>>>>> it doesn't reproduce locally at all, but in production it happens a
>>>>>>> significant percentage of the time.
>>>>>>> This only started happening in 13.2; what are we missing? Should we
>>>>>>> be enabling or disabling some flags?
>>>>>>> It also seems that 13.3 is significantly worse in terms of error
>>>>>>> rate.
>>>>>>> The problem happens under "--liftoff-only".
>>>>>>> We use pointer compression but not sandbox.
>>>>>>> We've tried enabling --turboshaft-wasm in 13.1 and the problem did
>>>>>>> not reproduce.
>>>>>>> Has anything changed that we need to adapt to?
>>>>>>> Would really appreciate your help!
