Ok, if allowlisting vs blocklisting is the heart of the issue, I can accept
that this is a design requirement.

So, re: parse vs. scan -- I'm not sure this is a sufficient simplification.
In particular, if memory serves, our parse cost is roughly 50% scanner and
50% token interpretation + AST building, so you'll get at best a ~2x
speedup over a full parse (or over a pre-parse? I don't remember the exact
breakdown). Within the scanner, there's a cost to distinguishing keywords
from identifiers, but we could probably drop that and treat keywords as
plain identifiers. Scanning strings and regexps has some cost, but you could
maybe make them cheaper with stronger approximations (race to the closing
quote, that sort of thing). Then, I wouldn't check whether the token
sequence is a definitely valid one, just whether the tokenizer failed at
all, plus some simple token-based heuristics (like brace matching, simple
patterns). Tokenizer failure would most likely catch almost all binary
formats. Non-binary formats are likely too JS-compatible for that (some raw
JSON, a lot of YAML, and I think all CSV, is valid JS), so they would still
need to rely on a more blocklist-like approach with said token heuristics.
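
To make the "tokenizer failed at all + brace matching" idea concrete, here is
a rough sketch in Python. It is not V8's scanner and not anything ORB
specifies; the name smells_like_js and the set of accepted punctuation are
made up for illustration. It ignores keywords, races to the closing quote
for strings, and only rejects input on a scan failure or a bracket mismatch.
It deliberately does not handle regexp literals or nested template
substitutions, so real JS such as /["']/ could be rejected; a real version
would have to fix that, since valid JS must pass.

```python
# Hypothetical approximate scanner: rejects only on scan failure or
# mismatched brackets. Names and character sets are illustrative assumptions.
PUNCTUATION = ".,;:?!~+-*/%&|^=<>#@"
CLOSERS = {")": "(", "]": "[", "}": "{"}


def smells_like_js(source: str) -> bool:
    stack = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'`":
            # Race to the closing quote, skipping escaped characters.
            i += 1
            while i < n and source[i] != ch:
                if source[i] == "\\":
                    i += 1  # skip the escaped character
                elif source[i] == "\n" and ch != "`":
                    return False  # unterminated single-line string
                i += 1
            if i >= n:
                return False  # ran off the end inside a string
            i += 1
        elif source.startswith("//", i):
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                return False  # unterminated block comment
            i = end + 2
        elif ch in "([{":
            stack.append(ch)
            i += 1
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False  # bracket mismatch
            i += 1
        elif ch.isspace() or ch.isalnum() or ch in "_$" or ch in PUNCTUATION:
            # Identifiers, keywords, numbers and operators are not
            # distinguished; any plausible token character is accepted.
            i += 1
        else:
            return False  # e.g. control bytes from a binary format
    return not stack


print(smells_like_js("while (1);"))          # plain JS
print(smells_like_js("a,b,c\n1,2,3\n"))      # CSV, which is also valid JS
print(smells_like_js("%PDF-1.4\n\x00\x01"))  # binary-looking payload
```

As expected from the above, the CSV input passes just like the real JS does,
which is why the text formats would still need extra token heuristics on top.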

Getting a TC39-approved version of this... well, any spec work is hard. +Shu-yu
Guo <[email protected]>.

On Thu, Jun 2, 2022 at 5:36 PM 'Daniel Vogelheim' via v8-dev <
[email protected]> wrote:

> On Thursday, June 2, 2022 at 9:46:15 AM UTC+2 [email protected] wrote:
>
>> Can we not detect these via some magic number sniffing? I'm fundamentally
>> concerned about an allowlist approach for JS over a blocklist approach for
>> non-JS.
>>
>
> This is pretty much the heart of the issue: The entire point of the CORB to
> ORB transition is to go from "blocklist" to "allowlist", based on the
> observation that blocklists ultimately never seem to work. In particular,
> we don't want to pass things by default, where anything we don't know
> automatically passes. That does lead us to an allowlist, in some form.
> Elsewhere, I summarized (my understanding of) the ORB security requirements
> as this: For "no-cors" requests, we want to have some positive evidence
> that the data we're receiving is in a format suitable for the request type.
>
> Being able to drop unknown stuff by default is really the core benefit of
> ORB.
>
> I do think we have quite a bit of leeway to decide what form of "positive
> evidence" we'll accept. The current draft specifies a full JS parse, which
> I think is way over the top. But I do think we need *something* that
> tells us with some probability whether a given byte sequence looks like JS
> or not. The only hard criterion is that actually valid JS should pass,
> because otherwise we'll break websites left and right. (To that end, "while
> (1);" was arguably a terrible example.) (Caveat: Those are my opinions.
> Other browsers might have stronger opinions.)
>
>
> IMHO, checking for "parser breakers", the way CORB does, is a convenient
> temporary solution, because we already know it's web compatible.
>
> IMHO, a full parse (in the network process, or triggered by the network
> process) is crazy, and I'd really like to have something more lightweight.
>
> Which leads me to the proposal to only use the scanner to look for a few
> tokens. And ideally for TC39 to adopt some sort of SmellsLikeJavaScript
> abstract operation that other standards could point to.
>
>
>
>>
>> Note that CSV is sadly valid JS, so that won't be blocked at all.
>>
>> On Wed, Jun 1, 2022 at 6:45 PM 'Łukasz Anforowicz' via v8-dev <
>> [email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Jun 1, 2022 at 8:34 AM Leszek Swirski <[email protected]>
>>> wrote:
>>>
>>>> On Wed, Jun 1, 2022 at 5:17 PM 'Łukasz Anforowicz' via v8-dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Benefit of full JS parse over a list of known non-JS prefixes:
>>>>> Stricter is-it-JS checking = more non-JS things get blocked = improved
>>>>> security.  Still, there is a balance here - some heuristics (like the ones
>>>>> proposed by Daniel) are almost as secure as full JS parse (while being
>>>>> easier to implement and having less of a performance impact).
>>>>>
>>>>
>>>> Makes sense, I'm just asking to make sure that we strike the right
>>>> balance between security improvements and complexity/performance issues;
>>>> even a JS tokenizer without a full parser is quite a complexity investment
>>>> (it needs e.g. a full regexp parser), plus the language grammar is
>>>> sufficiently broad that I expect exhaustively enumerating all possible
>>>> combinations of even just 3-5 tokens to be prohibitively large (setting
>>>> aside maintainability in the face of ever-updating standards).
>>>>
>>>> Do we have a measure of how much non-JS coverage the current heuristics
>>>> give, on real-world examples of JSON files? Or perhaps, a measure of how
>>>> many different prefixes there are that we could blocklist? Do we know at
>>>> what point the improved security has diminishing returns?
>>>>
>>>
>>> Examples of response bodies that we would want to block, but that
>>> wouldn't get blocked without full JS parsing/verification (assume that the
>>> responses below are served as text/html or application/octet-stream):
>>>
>>>    - PDF
>>>    - ProtoBuf
>>>    - Microsoft Word
>>>    - CSV files
>>>
>>>
>>>> - Leszek
>>>>
>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "v8-dev" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/v8-dev/CAGRskv9UUNJ9sjW0FvuHyCN90j%3DfbafSOgGVBG19qRe19_%2BO5w%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/v8-dev/CAGRskv9UUNJ9sjW0FvuHyCN90j%3DfbafSOgGVBG19qRe19_%2BO5w%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Lukasz
>>>

