VCL_STRANDS

2017-12-18 Thread Poul-Henning Kamp
I have gone over the VCC and added a new way of passing
an uncomposed string to functions, as an alternative to STRING_LIST.

STRANDS is basically a STRING_LIST which gets stuffed into an
on-stack struct, so that more than one STRANDS argument can be
passed to a (VMOD-)function, something which is not possible
with STRING_LIST because it uses the var-args mechanism.
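
As a minimal C sketch, assuming a simple count-plus-pointer-array layout
for the on-stack struct (the example_* and vmod_example_* names are purely
illustrative, not part of any shipped API):

    #include <string.h>

    /* Assumed layout: an uncomposed string is a counted array of
     * string fragments, built in an on-stack struct by VCC. */
    struct strands {
            int             n;
            const char      **p;
    };
    typedef const struct strands *VCL_STRANDS;

    /* Illustrative helper: total length of all fragments. */
    static size_t
    example_strands_len(VCL_STRANDS s)
    {
            size_t len = 0;
            int i;

            for (i = 0; i < s->n; i++)
                    if (s->p[i] != NULL)
                            len += strlen(s->p[i]);
            return (len);
    }

    /* The point of the exercise: two uncomposed-string arguments in one
     * signature, which STRING_LIST cannot do because it is varargs:
     *
     *     VCL_BOOL vmod_example_prefix(VRT_CTX, VCL_STRANDS a, VCL_STRANDS b);
     */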

One place where this is now used is in string comparisons in
VCL; this may save significant workspace for some users.

While I was at it, I have also added support for <, <=, >= and >
string comparisons.

Along the way I have done major surgery on the string handling
in VCC, cleaning it up considerably, so I kindly ask everybody
to be on the lookout for things which have changed or now fail.

Feedback from VMOD writers welcome...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: VFP and VDP configurations

2017-12-18 Thread Nils Goroll
Hi,

At first, I found Reza's concept appealing, and there are some aspects which I
think we should take from it:

- take the protocol VFPs (v1f_*, h2_body) out of the game for VCL

- format specifiers:

- have:

  (gzip), (plain) *1), (esi)

- ideas:

  (br), (buffertext) *2)

esi being a format which can contain gzip segments, but that would
be opaque to other VFPs

- the notion of format conversion(s) that a vfp can handle, e.g.

- have:

  esi: (plain)->(esi), (gzip)->(esi)

  gzip: (plain)->(gzip)
  ungzip: (gzip)->(plain)

- ideas:

  br: (plain)->(br)
  unbr: (br)->(plain)

  re: (plain)->(plain)


But reflecting on it, I am not so sure about runtime resolution and these
aspects in particular:

- "algorithm (...) can reorder the candidates if that allows a match."

- "(A VFP) can (...) add new VFPs to the candidate list, remove itself, remove
   other VFPs, or delete itself or other VFPs

I wonder how we would even guarantee that this algorithm ever terminates.

So I think we really need to have VCL compile time checking of all possible
outcomes:

- Either by keeping track of all possible filter chain states at each point
  during VCL compilation

- or by restricting ourselves to setting all of the filter chain at once.

The latter will probably lead to largish decision trees in VCL for advanced
cases, but I think we should start with this simple and safe solution, combined
with the format/conversion check.
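
A toy C sketch of such a format/conversion check over a chain that is set
all at once (the format strings follow the list above; the data structures
and names are invented for illustration only):

    #include <stdio.h>
    #include <string.h>

    /* Each filter declares one input and one output format; validating
     * the whole chain is a single pass, trivially decidable at VCL
     * compile time. */
    struct filter_conv {
            const char *name;
            const char *in;     /* e.g. "(plain)", "(gzip)", "(esi)" */
            const char *out;
    };

    static int
    chain_ok(const struct filter_conv *chain, int n, const char *body_fmt,
        const char *want_fmt)
    {
            const char *fmt = body_fmt;
            int i;

            for (i = 0; i < n; i++) {
                    if (strcmp(chain[i].in, fmt)) {
                            fprintf(stderr, "%s cannot take %s\n",
                                chain[i].name, fmt);
                            return (0);
                    }
                    fmt = chain[i].out;
            }
            return (!strcmp(fmt, want_fmt));
    }

    int
    main(void)
    {
            const struct filter_conv chain[] = {
                    { "ungzip", "(gzip)",  "(plain)" },
                    { "esi",    "(plain)", "(esi)"   },
            };

            /* body is gzip'd in storage, we want (esi) out */
            return (!chain_ok(chain, 2, "(gzip)", "(esi)"));
    }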

Nils

*1) "(text)" in reza's concept

*2) not sure if this is a good idea; maybe multi-segment regexen are the better
idea


Re: VFP and VDP configurations

2017-12-18 Thread Dridi Boukelmoune
> So from VCL, here is how we add VFPs:
>
> VOID add_vfp(VFP init, ENUM position = DEFAULT);
>
> VFP is "struct vfp" and any VMOD can return that, thus registering itself as
> a VFP. This contains all the callback and its input and output requirements.
>
> position is: DEFAULT, FRONT, MIDDLE, LAST, FETCH, STEVEDORE
>
> DEFAULT lets the VMOD recommend a position, otherwise it falls back to LAST.
> FETCH and STEVEDORE are special positions which tells Varnish to put the VFP
> in front or last, regardless of actual FRONT and LAST.

I think the position should be mapped closer to HTTP semantics:

$Enum {
content,
assembly,
encoding,
transfer,
};

The `content` value would map to Accept/Content-Type headers, working
on the original body. The order shouldn't matter (otherwise you are
changing the content type) and you could for example chain operations:

- js-minification
- js-obfuscation

You should expect the same results regardless of the order; of course,
the simplest approach would be to keep the order set in VCL. The `content`
step would feed from storage, where the body is buffered.

The `assembly` value would map to ESI-like features, and would feed
from the content, with built-in support for Varnish's subset of ESI.

The `encoding` value would map to Accept-Encoding/Content-Encoding
headers. With built-in support for gzip and opening support for other
encodings. It would feed from the contents after an optional assembly.

The `transfer` value would map to Transfer-Encoding headers, with
built-in support for chunked encoding. ZeGermans could implement
trailers this way.
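
A toy C sketch of how such a fixed step order could drive chain assembly
(the enum, struct and filter names are made up; the order simply mirrors the
$Enum above):

    #include <stdio.h>
    #include <stdlib.h>

    /* The four proposed steps as an ordered enum, so that registered
     * filters can simply be sorted by step when the chain is built. */
    enum filter_step {
            STEP_CONTENT = 0,
            STEP_ASSEMBLY,
            STEP_ENCODING,
            STEP_TRANSFER,
    };

    struct toy_filter {
            const char          *name;
            enum filter_step    step;
    };

    static int
    by_step(const void *a, const void *b)
    {
            const struct toy_filter *fa = a, *fb = b;

            return ((int)fa->step - (int)fb->step);
    }

    int
    main(void)
    {
            struct toy_filter chain[] = {
                    { "gzip",        STEP_ENCODING },
                    { "esi",         STEP_ASSEMBLY },
                    { "js-minifier", STEP_CONTENT  },
            };
            size_t i, n = sizeof chain / sizeof chain[0];

            qsort(chain, n, sizeof chain[0], by_step);
            for (i = 0; i < n; i++)
                    printf("%s\n", chain[i].name);
            return (0);
    }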

Would this step make sense in h2? If not, should Varnish just ignore them?

Now problems arise if you have an `encoding` step in a VFP (e.g. gzip'd
in storage) and use `content` or `assembly` steps in a VDP for that
same object, or a different encoding altogether. But in your proposal
you don't seem bothered by this prospect. Neither am I, because that's
only a classic memory-vs-CPU trade-off. But it might be hard to implement
the current ESI+gzip optimization if we go this route (or that might be a
good reason to go back to upstream zlib).

Dridi


Re: VFP and VDP configurations

2017-12-18 Thread Dridi Boukelmoune
> - format specifiers:
>
> - have:
>
>   (gzip), (plain) *1), (esi)
>
> - ideas:
>
>   (br), (buffertext) *2)
>
> esi being a format which can contain gzip segments, but that would
> be opaque to other vfps
[...]
> *1) "(text)" in reza's concept

Or "identity" to match HTTP vocabulary.

> *2) not sure if this is a good idea, maybe multi segment regexen are the
> better idea

For lack of a better place to comment: in my previous message I put
`content` before `assembly`. On second thought it should be the other
way around, otherwise ESI tags break the content type.

Dridi


Re: VFP and VDP configurations

2017-12-18 Thread Reza Naghibi
> take the protocol-vpfs v1f_* h2_body out of the game for vcl

Those will be built in on the delivery side. I didn't really dive into
VDPs, but they work similarly to VFPs in that the client expects a certain
kind of response, so it's up to the VDP chain to produce a matching output.
So if the client wants an H2 range response gzipped, then that chain needs
to be put together starting at resp in the stevedore and ending at the
client. So it's different, but the same structure and rules apply.

> I wonder how we would even guarantee that this algorithm ever terminates.

Right, since processors can modify the chain as it's being built and change
things mid-flight, this could definitely happen. So the only thing to do
here is to have a loop counter and break out after a certain number of
attempts at creating the best-fit chain. It's kind of like a graph search
where, as you visit each node, the node can change the graph ahead of you
or optionally move you back a few positions. So in this case, it's very
possible to get stuck in an unavoidable loop.
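
A toy C sketch of that bounded search (all names here are hypothetical;
the point is only that a hard cap on passes turns "may never terminate"
into "gives up after N passes"):

    #include <stdio.h>

    #define CHAIN_BUILD_MAX_PASSES 10

    /* Toy stand-ins, purely for illustration. */
    struct chain { int todo; };

    /* One pass over the candidates; returns the number of changes made,
     * or a negative value if no usable chain can exist. */
    static int
    run_candidates_once(struct chain *c)
    {
            if (c->todo > 0) {
                    c->todo--;
                    return (1);
            }
            return (0);
    }

    /* Bounded search: candidates may keep rewriting the chain, but the
     * hard cap on passes guarantees termination. */
    static int
    build_chain(struct chain *c)
    {
            int pass, changed;

            for (pass = 0; pass < CHAIN_BUILD_MAX_PASSES; pass++) {
                    changed = run_candidates_once(c);
                    if (changed < 0)
                            return (-1);    /* no possible match */
                    if (changed == 0)
                            return (0);     /* chain is stable */
            }
            return (-1);    /* did not converge within the cap */
    }

    int
    main(void)
    {
            struct chain c = { 3 };

            printf("%d\n", build_chain(&c));
            return (0);
    }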

> I think the position should be mapped closer to HTTP semantics

I think this makes too many assumptions? For example, where would security
processors go? Knowing what I know about what's possible with these things,
I think the processor universe might be bigger than the 4 categories you
listed.

I think this brings up an important point, which is that for us to be
successful here, we really need to bring forward some new processors to be
our seeds for building this new framework. This will drive the requirements
we need. I think there will be a lot of uncertainty if we build this
based on theoretical processors. I think it's alright if these new
processors are simple and our new framework starts off simple as well. This
can then evolve as we learn more. I have written a handful of processors
already, so a lot of what I am proposing here comes from past experience.



--
Reza Naghibi
Varnish Software


Re: VFP and VDP configurations

2017-12-18 Thread Dridi Boukelmoune
>> I think the position should be mapped closer to HTTP semantics
>
> I think this makes too many assumptions? For example, where would security
> processors go? Knowing what I know about whats possible with these things, I
> think the processor universe might be bigger than the 4 categories you
> listed out.

I'm a bit perplexed regarding theoretical security processors...

> I think this brings up an important point, which is that for us to be
> successful here, we really need to bring forward some new processors to be
> our seeds for building this new framework. This will drive the requirements
> that we need. I think there will be a lot of uncertainty if we build this
> based on theoretical processors.

...since you explicitly advise against designing for theory.

With the 4 categories I listed I can fit real-life processors in all of them:

- assembly: esi, edgestash, probably other kinds of include-able templates
- content: minification, obfuscation, regsub, exif cleanup, resizing,
watermarking
- encoding: gzip, br
- transfer: identity, chunked, trailers

My examples were VDP-oriented (from storage to proto) but would work
the other way around too (except assembly, which I can't picture in a
VFP). You can map encoding and transfer processors to headers:
imagining that both gzip and brotli processors are registered, core
code could pick one or none based on good old content negotiation.
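
A naive C sketch of that negotiation (the function and names are invented;
real content negotiation would also honour q-values rather than do a plain
substring scan):

    #include <stddef.h>
    #include <string.h>

    /* Pick one registered encoding filter, or none, by scanning the
     * client's Accept-Encoding header. */
    static const char *
    pick_encoding(const char *accept_encoding,
        const char *const *registered, size_t n)
    {
            size_t i;

            if (accept_encoding == NULL)
                    return (NULL);
            for (i = 0; i < n; i++)
                    if (strstr(accept_encoding, registered[i]) != NULL)
                            return (registered[i]);
            return (NULL);
    }

    /* Usage sketch:
     *     const char *avail[] = { "br", "gzip" };
     *     pick_encoding("gzip, deflate, br", avail, 2);    -> "br"
     */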

Now where would I put security processors? The only place where it
would make sense to me is content. But then again, please define
security (I see two cases off the top of my head, both would run on
content).

> I think its alright if these new processors
> are simple and our new framework starts off simple as well. This can then
> evolve as we learn more. For me, I have written a handful of processors
> already, so a lot of what I am proposing here comes from past experience.

Sure, with the ongoing work to clarify VMOD ABIs, this one should
definitely start as "strict" until we get to something stable. However,
on the VCL side it is not that simple, because we don't want to break
"vcl x.y" if we can avoid it.

We could mimic the feature/debug parameters:

set beresp.deliver = "[+-]value(,...)*";

A + would append a processor to the right step (depending on where it
was registered), a - would remove it from the pipeline, and a lack of
prefix would replace the list altogether. That would create an
equivalent for the `do_*` properties; or even better, the `do_*`
properties could be syntactic sugar:

set beresp.do_esi = true;
set beresp.do_br = true;
# same as
set beresp.deliver = "+esi,br";
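
A toy C sketch of one possible reading of that syntax, with a single leading
prefix applying to the whole comma-separated list (the parser and its printf
output are purely illustrative; no such VCL property exists today):

    #include <stdio.h>
    #include <string.h>

    /* '+' appends all listed processors, '-' removes them, and no prefix
     * replaces the list.  The printf calls stand in for real chain
     * manipulation. */
    static void
    apply_deliver(char *spec)
    {
            char op = 0, *tok;

            if (*spec == '+' || *spec == '-')
                    op = *spec++;
            for (tok = strtok(spec, ","); tok != NULL; tok = strtok(NULL, ","))
                    printf("%s %s\n",
                        op == '+' ? "append" : op == '-' ? "remove" : "set",
                        tok);
    }

    int
    main(void)
    {
            char spec[] = "+esi,br";    /* as in the example above */

            apply_deliver(spec);        /* append esi / append br */
            return (0);
    }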

Dridi