Sudheer Vinukonda created TS-3979:
-------------------------------------

             Summary: Deprecate allow empty doc cache setting
                 Key: TS-3979
                 URL: https://issues.apache.org/jira/browse/TS-3979
             Project: Traffic Server
          Issue Type: Bug
          Components: Cache
            Reporter: Sudheer Vinukonda


Refer TS-3978

Following a discussion on the IRC, opening this jira to deprecate the Allow 
empty doc cache setting. The consensus is to cache anything that's cacheable 
and valid (regardless, of if it's empty or non-empty).

{code}
Amaryllis
7:50

sudheerv: do any other reverse proxies refuse to cache empty documents without 
content-length?  it's not something i've noticed before (in varnish, for 
example)
Amaryllis
7:50

the connection could still be broken at any point in the response and result in 
broken content being cached
7:51

but as a compromise, what if there was another option to only cache empty 
responses for 3xx, which is probably 80% of use cases?
7:52

(maybe, in addition, it should accept transfer-encoding: chunked, since that 
can indicate end-of-body)
zwoop_
7:54

Amaryllis  we should talk to amc (he knows everything, we don't need Google 
here) about this
7:54

zwoop_ is now known as woop
7:54

woop is now known as zwoop
7:54

zwoop left the room (quit: Changing host).
7:54

zwoop [~zwoop@apache/committer/zwoop] entered the room.
7:54

mode (+o zwoop) by ChanServ
zwoop
7:55

Amaryllis amc Maybe we could have a plugin API, that basically forces to set 
f.allow_empty_doc ?
amc
7:55

I think you can change the CacheVC structure without problem.
Amaryllis
7:55

i think some sort of fix for this should be in core, at the moment TS won't 
cache any of our redirects
amc
7:56

There's not a global config to allow that?
zwoop
7:56

Amaryllis  does it send a Cache-Control header with the redirects ?
Amaryllis
7:56

amc, zwoop: the origin sends transfer-encoding: chunked instead of 
content-length.
7:56

so even with cache-control it won't be cached, that's what this PR is to change
7:56

TS-3978
ASFBot
7:56

TS-3978: Allow empty document caching to follow normal logic - 
https://issues.apache.org/jira/browse/TS-3978
zwoop
7:56

ah
7:57

yeah, that's expected
7:57

and is why we added that option to allow empty docs to be cached
Amaryllis
7:57

this is with that option enabled
zwoop
7:57

huh
Amaryllis
7:57

it still won't cache any document without content-length
zwoop
7:57

yeah, need CL: too
7:57

that's a requirement
Amaryllis
7:57

so i'll update the PR to also recognise chunked responses as being cacheable 
even if empty
zwoop
7:57

otherwise, it can't distinguish the cases you were worried about (a broken 
connection) vs a truly empty doc
Amaryllis
7:58

zwoop: right, but the PR makes it optional, only if allow_empty_doc=2
zwoop
7:58

heh, your redirect is Chunked ??
Amaryllis
7:58

yes, an empty chunked response 
zwoop
7:58

is that even correct?
Amaryllis
7:58

it's perfectly valid, if somewhat unusual
zwoop
7:58

doesn't the chunking require the final "0" ?
7:58

which means, the doc isn't empty
Amaryllis
7:58

i think ATS is looking at 'empty' after de-chucking, isn't it?
7:59

because it's definitely not cacheing these responses
zwoop
7:59

not sure
7:59

gancho_ left the room (quit: Ping timeout: 272 seconds).
amc
7:59

I think Amaryllis is correct.
zwoop
7:59

Amaryllis  but, I definitely remember that the CL: header was a requirement 
that can not be ignored (safely), so not sure I think having an 
allow_empty_doc=2 is ok
8:00

unless allow_empty_doc=2 also means allow Chunked content to be empty
amc
8:00

I need to check to see what ATS actually caches - the chunked data or the 
payload.
zwoop
8:00

but, you have to have some indicator telling us that the doc really is empty 
before we can cache it
8:00

amc is caches the unchunked data
8:00

it dechunks it, and caches that
8:01

is dechunk even a proper verb? 
amc
8:01

Ah, but it caches the encoding header.
Amaryllis
8:01

zwoop: https://dpaste.de/6kUT
zwoop
8:01

amc Hmmm, really ? That doesn't sound right
amc
8:01

"zwoop drank too much at the summit and dechunked himself in the hallway". Yep, 
a proper verb.
Amaryllis
8:01

that's the actual origin response causing problems for us
zwoop
8:01

amc I'd expect it to change it from chunked to a Content-Length:
amc
8:02

I was wondering about that.
zwoop
8:02

amc lets dechunk big time at the Bar camp
amc
8:02

What happens when the content is served? Is the encoding preserved and obeyed?
zwoop
8:03

for the first client (the cache miss) I think it seems the chunked response
Amaryllis
8:03

yes, TS returns a correct thunked response
8:03

and never caches it, so it's always the same
zwoop
8:03

but subsequent requests (when served out of cache) should have a CL:
amc
8:03

I was thinking of what happens for chunked responses from origin that are 
non-zero lenght.
zwoop
8:03

Amaryllis  can you not convince your origin that they are crazy, and change it 
to not send a Chunked empty response, and instead send CL: 0 ?
sudheerv
8:04

catching up on the messages…but, my concern is the same as zwoop's
Amaryllis
8:04

zwoop: no, it never adds a CL: https://dpaste.de/Goex
zwoop
8:04

Amaryllis  because youa re not caching it
Amaryllis
8:04

yes, exactly, because it's not cacheable 
zwoop
8:04

Amaryllis  amc is asking the general question, to which my answer is
Amaryllis
8:04

ah
zwoop
8:04

amc yes, I'd expect it to send the chunked response to the first client, and 
subsequent responses (out of cache) with a CL: 0
Amaryllis
8:04

perhaps i can make uwsgi do some sort of output buffering and put a CL header in
zwoop
8:05

amc that's at least how it's worked for the last ~10 years, we better not ahve 
changed it
Amaryllis
8:05

zwoop: how can that ever happen?  a chunked empty response will never be cached
zwoop
8:05

not talking empty responses
8:05

general case
amc
8:05

I have many opinions on this. I do think it should be cacheable as an empty doc 
if it's chunked with a zero length payload.
Amaryllis
8:05

okay, but it's not CL: 0 then
zwoop
8:05

your case is deranged
8:05

Amaryllis  right, nothing to do with your case
Amaryllis
8:05

amc: shall i submit a separate PR to fix that?
amc
8:06

Yes.
zwoop
8:06

amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin 
people over the head, cause they are crazy
sudheerv
8:06

amc: yes, but, you have to ensure that you received the final chunk
8:06

if it already is the case, by the time you are checking for allow empty doc 
caching, then that's fine
amc
8:06

Right. That is happening in the clips Amaryllis showed us.
Amaryllis
8:06

zwoop: i think it's because the origin sends the headers before it's generated 
the response body.  so it doesn't know it's going to be empty
sudheerv
8:06

but, i dont know if that's the case already
zwoop
8:07

Amaryllis  it would know it's a redirect no?
8:07

Amaryllis and why would a redirect have a body, ever ?
Amaryllis
8:07

yes, but redirects can still have bodies
sudheerv
8:07

you may get empy responses without content-length when we support outbound 
spdy/h2 as well
zwoop
8:07

true
Amaryllis
8:07

most web servers include bodies in 3xx
8:07

i did have the origin fix it in one specific case but we need something more 
general
zwoop
8:07

yeah, I forgot about that, you're right
8:07

niq left the room (quit: Ping timeout: 250 seconds).
amc
8:08

If it's chunked and ATS recieves the final payload (0 length frame) it can 
reasonably presume it got valid (non-truncated) content.
sudheerv
8:08

Amaryllis: I think the patch should ensure it can cover non-broken empty body 
cases for chunked-encoding/spdy/h2
Amaryllis
8:08

sudheerv: does this even apply to spdy/h2?
Amaryllis
8:09

actually, there is no H2 to origins at all...
sudheerv
8:09

not right now - we don't support outbound spdy/h2 yet
Amaryllis
8:09

right
sudheerv
8:09

but, when we do, it will be a similar problem there
Amaryllis
8:09

yes
sudheerv
8:09

there's no chunked encoding or content-len with spdy/h2
Amaryllis
8:09

but i can't really test code that's yet to be written 
sudheerv
8:09

so, yet another case of empty body 
8:09

no, my point is - the code we write now must not break when we have spdy/h2
8:10

so, instead of "assuming" that the body is valid, it might make sense
8:10

to somehow indicate that it is
Amaryllis
8:10

well, i think it will break anyway, in the sense that empty H2 responses won't 
be cached... since they have no CL?
sudheerv
8:10

well, but you are fixing that 
Amaryllis
8:10

but H2 doesn't do TE either, does it?  i thought it replaces that entire thing
sudheerv
8:10

if it's consistently broken (or not supported), that's alright
8:10

yes, it doesn't
8:10

that's my point
Amaryllis
8:11

all i'm adding (for now) is to make it accept CL _or_ TE
sudheerv
8:11

so, we need to have a generic way of notifying cache layer
8:11

that the response is valid
Amaryllis
8:11

and test that empty TE is properly checked for the empty chunk
amc
8:12

Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)".
sudheerv
8:12

yeah, something like that
amc
8:12

Then an implementation in CacheVC.
sudheerv
8:12

amc: i also want to make this setting overridable
amc
8:12

Then the higher layer can say "I know this was a valid zero length content, 
cache monkey"
sudheerv
8:12

(but, that's for later)
8:12

yes, that makes sense
Amaryllis
8:13

amc: CacheVC already has allow_empty_doc flag, i think that's enough
sudheerv
8:13

Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem very 
straightfwd
amc
8:13

Just bang on that directly?
sudheerv
8:13

but, pls go ahead and do it if you find it easy 
Amaryllis
8:13

amc: that was the plan, yes 
8:14

amc: 
https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477
8:14

amc: ... that check just needs another check for chunked as well.
sudheerv
8:15

Amaryllis: if we did something like what amc suggested above
amc
8:15

No, I meant that it would be set from (for instance) the chunked decoder or 
just above that.
sudheerv
8:15

where the higher layer can notify cache that it can be cached
8:15

then we don't have to check any headers
8:15

that already seems hacky enough
amc
8:15

So there wouldn't be an addiitional value for the config, it would just work as 
epexted.
8:15

expected.
sudheerv
8:15

yes
jpeach
8:15

there's already a function that checks whether the whole document was received
amc
8:16

Hmmm. Interesting.
Amaryllis
8:16

i think some things are getting mixed up here - i'm not going to add an 
additional config value
8:16

that's what the rejected PR did
jpeach
8:16

it is used when figuring what to do when getting an EOF
amc
8:16

What I would want is a way for the chunk decoder or its parent logic to pass on 
to the CacheVC the validity of the document.
8:17

Amaryllis-  I thought your PR added the value '2' to the config variable.
jpeach
8:17

afaict as long as you can know you have the entire doc, it could be cacheable
sudheerv
8:17

jpeach: that makes sense
Amaryllis
8:17

amc: yes, but people disliked that, so instead i suggested fixing it to check 
for chucked encoding
8:17

amc: which doesn't need an additional config setting
sudheerv
8:17

but, i guess the complication is that, at what ppint to do you know that you 
received the whole doc
amc
8:17

OK, I must be reading the wrong PR.
sudheerv
8:17

is it before checking allow empty doc or after
Amaryllis
8:18

amc: there's only one PR, i didn't write the chunked version yet
sudheerv
8:18

if it's just a matter of changing the order to check for whole doc, without 
breaking anything , then that seems like a cleaner solution
amc
8:18

Ah, OK.
8:19

Yes, I agree. I think we're all on the same page - check for getting the whole 
document and if it's zero length and the config value is 1, cache it.
Amaryllis
8:19

(we need something for 6.0 fairly quickly, so we'll be using this patch in 
production, but i'm happy to put more effort into fixing it properly)
sudheerv
8:19

+1
amc
8:19

The brokeness is that complete chunked stuff isn't being checked correctly.
8:19

Amaryllis - coool.
Amaryllis
8:19

yes - that's the only bit that actually needs to be fixed, but it might also 
make sense to refactor how the check is done
amc
8:20

Allright.
Amaryllis
8:20

i'm not sure i know enough about the TS internals to actually implement the 
second bit though
amc
8:20

I have to go get my Macbook fixed - update took out my wireless.
8:21

I'll see if I have some time this weekend to take a look.
sudheerv
8:21

Amaryllis: if it's okay with everyone else, adding the check for TE header just 
to fix the TE part alone for now is fine by me as well
8:21

(we can open a separate jira to improve the code for future)
8:22

(was just saying, that instead of adding case by case, it'd be nicer to have a 
cleaner solution - but, by no means, it's immediately required)
Amaryllis
8:23

sudheerv: sure
jpeach
8:23

found it ... HttpSM::is_http_server_eos_truncation
8:23

dustywusty_ is now known as dustywusty
sudheerv
8:23

jpeach: that might be a bit late?
8:24

it seems to be after tunnel completes
jpeach
8:24

the place it is called from might be too late, but that's the logic for knowing 
if we have the whole response
sudheerv
8:24

whereas, the tunnel is writing to UA and Cache in parallel
zwoop
8:24

sorry, had to reboot.
sudheerv
8:24

jpeach: right, but, the point is
zwoop
8:24

amc sudheerv  I'm thinking, maybe we should just deprecate this setting 
completely ? And always allow caching empty docs when we safely can ?
sudheerv
8:24

you may get an EOS after some body is recieved already
8:25

zwoop: +1 to that
zwoop
8:25

it was added for compatibility reasons eons ago, but we changed the default to 
"1"  in 5.x I think
sudheerv
8:25

we just need to know when is it safe 
jpeach
8:25

sudheerv: I'm just saying that we have logic to know whether we have the whole 
response; not that the logic is already used in all the right places
zwoop
8:25

we can deprecate it now (marking it as "don't change / use this unless you 
really, really have to" and remove it for 7.0.0
sudheerv
8:27

+1
8:27

reveller left the room.
zwoop
8:28

sudheerv  file a Jira on it maybe ? Or two. We should change the docs now 
(marking it deprecated) and a bug for 7.0.0 for removal
sudheerv
8:29

sure..
8:30{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to