Re: Workspace overflow handling

Dridi Boukelmoune Mon, 09 Feb 2015 06:58:20 -0800

Hi Martin,

I have been thinking about this for a while, but couldn't go past "it is
sometimes overkill to crash the whole child process".


Regarding your comment on state unwinding, my main concern is critical
sections and shared/global state. But I'm not familiar enough with the
code base to measure the implications.

On an unrelated note, I have been trying to code in Rust[1] lately, and it
provides a `panic!` macro[2]. It will kill the failing thread/task and somewhat
leave the rest of the program alone[3] which I thought could also apply to
worker threads.

I know this is half off-topic, but on top of not panicking on memory
exhaustion, I thought we could also "panic" a single thread when it's a
worker (regardless of the issue) when we can't serve a 5xx response.
The current assert system is a bit too unforgiving for many cases.

I hope I'm making some sense here.

Regards,
Dridi

[1] http://www.rust-lang.org/
[2] http://doc.rust-lang.org/std/macro.panic!.html
[3] http://doc.rust-lang.org/reference.html#thread

On Mon, Feb 9, 2015 at 3:32 PM, Martin Blix Grydeland
<[email protected]> wrote:
>
> VDD Hamburg talking point:
>
> Varnish asserting on workspace overflow is a problem that we really should 
> address. It is most hurtful when it happens in Varnish core, as there are 
> many code paths relying on workspace being available. If none was available 
> the assertion triggers taking the cache with it. (Examples: Vary processing, 
> delivery processor pushes, delivery IO vectors etc). Creating proper error 
> handling and state unwinding for all these will be a major undertaking, and 
> also error prone as testing all the failure points will be very hard.
>
> Workspace exhaustion also hurt in VCL space. Most VRT functions are written 
> to handle it, but will do so by truncating the result and log the fact 
> (LostHeader). This masks errors, and can potentially be an attack vector for 
> circumventing VCL implemented security barriers. It also poses a DOS attack 
> vector, if you can know there are some serious manipulations happening on 
> some header and send large payloads on them, causing an assert later when 
> Varnish attempts delivery. In my opinion any failed attempt at setting a 
> header from VCL should result in an error response immediately as we could 
> not process the request properly.
>
> One way of dealing with this issue would be to add some guarantees for 
> workspace allocations: Unless the workspace overflow flag is already set, all 
> code is guaranteed to be able to allocate at least the set size of the 
> workspace. This is achieved by allocating twice the amount of needed 
> workspace on allocation. Since this space is normally untouched it will just 
> be virtual memory and not backed by real memory. (We might have to bypass 
> malloc and go for mmap anonymous to be able to do that). All 
> WS_Alloc/WS_Release calls will then update the overflow flag whenever half of 
> the available workspace has been used. Upon recycling of the workspace 
> (request or busyobj), the flag is tested and if an overflow occured an 
> madvise(MADV_DONTNEED/MADV_FREE) is issued on the second half of the mapping 
> to return the pages to the OS. This way the extra pages are returned to the 
> OS, causing the range to be pure virtual again.
>
> Error handling in Varnish core will now be able to just have a handful of 
> check points (mostly after the major VCL functions where we are prepared to 
> error out anyways). If the overflow flag is set, we write out a static 5xx 
> response (unless it's too late), and start processing the next request (or 
> close if that's too late).
>
> In VCL we will teach the VCC compiler to check after each statement if the 
> overflow flag is set, and return immediately when it is (so VCL execution is 
> terminated prematurely). The next check point in Varnish core will then pick 
> up that the overflow has happened and error out from there.
>
> Comments much appreciated.
>
> Regards,
> Martin Blix Grydeland
>
> --
> Martin Blix Grydeland
> Senior Developer | Varnish Software AS
> Mobile: +47 992 74 756
> We Make Websites Fly!
>
> _______________________________________________
> varnish-dev mailing list
> [email protected]
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

_______________________________________________
varnish-dev mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev

Re: Workspace overflow handling

Reply via email to