greg closed this task as "Resolved".
greg assigned this task to BBlack.
greg added a comment.

No, we never wrote an incident report for this one, and I don't think it would be fair at this time to implicate ORES as a cause. We can't really say that ORES (or any of the other services investigated here) was directly involved at all. Because the cause was so unclear at the time, we stared at lots of recently deployed things and probably uncovered hints of minor issues in various services incidentally, but it's possible that none of them were causative.

All we know for sure is that switching Varnish's default behavior from streaming to store-and-forward for certain applayer responses (which was our normal mode over a year ago) broke things, probably because some services are violating assumptions we hold. Unfortunately, proper investigation of this will stall for quite a while on our end, but we'll probably come back eventually with some analysis and push for fixups in various services so that we can move forward on that path again. At this point, the RB timeouts mentioned earlier seem a more likely candidate than ORES for what we'll eventually uncover.

I think this is as good as we're going to get with this one, sadly.



