Re: [v8-dev] Utility to check if a given stream can parse as Javascript (ORB)

'Łukasz Anforowicz' via v8-dev Tue, 31 May 2022 10:34:25 -0700

On Tue, May 31, 2022 at 9:00 AM Leszek Swirski <[email protected]> wrote:


> I want to note one thing here, kind of a side observation really:
> while(1); is valid JS, it's just an infinite loop. Do we also want to
> guard against common patterns like this?
>

FWIW today CORB explicitly detects and blocks `while(1);` (the code here
<https://source.chromium.org/chromium/chromium/src/+/main:services/network/public/cpp/corb/corb_impl.cc;l=471-505;drc=3c60abdfc28ef5be216ebdf4501cf3a24c901007>
has some extra comments and details).  OTOH, 1) I am not sure if detecting
`while(1);` is a hard requirement (maybe detecting JS-parser-breakers
<https://source.chromium.org/chromium/chromium/src/+/main:services/network/public/cpp/corb/corb_impl.cc;l=483-498;drc=3c60abdfc28ef5be216ebdf4501cf3a24c901007>
is sufficient), and 2) I am not sure if/how `while(1);`-related
considerations impact the main points and questions from Daniel.
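
For illustration only, a prefix check of this kind could look roughly like
the Python sketch below. The prefix list contains just the two examples
mentioned in this thread (`)]}'` and `while(1);`); it is an assumption for
the sketch, not a copy of the list in corb_impl.cc, and the function name is
made up.

```python
# Illustrative prefixes only: both are mentioned in this thread.
# This is NOT the list used by corb_impl.cc.
PARSER_BREAKER_PREFIXES = (")]}'", "while(1);")


def starts_with_parser_breaker(body: str) -> bool:
    """Return True if the body, after leading whitespace, starts with a known prefix."""
    # str.startswith accepts a tuple and returns True if any element matches.
    return body.lstrip().startswith(PARSER_BREAKER_PREFIXES)
```

Note that this exact-match sketch would miss spelling variants such as
`while (1);`, so a real check would need to decide how much whitespace
tolerance it wants.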

>
> - Leszek
>
> On Tue, May 31, 2022 at 2:45 PM 'Daniel Vogelheim' via v8-dev <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Apologies for reviving this thread, but this problem is coming up again.
>> I think the answer of parsing in a separate process would work, but I'd
>> really like to find a simpler solution. As far as I can see, the underlying
>> security requirements should be much less strict than the current ORB
>> proposal implies. An approximation should do just fine. For example, for
>> media formats we just look for a "magic number" (e.g. a 3-byte constant for
>> JPEG files); so I don't think we need a full parse of the input.
>>
>> Here is how I'd like to simplify this:
>> - Run only the JS scanner. (Including charset + comment processing.)
>> - Take the first N tokens. I suspect N=3 would be enough.
>> - Check the token list against a set of permissible token sequences.
>>
>> Even for small N a complete list of permissible sequences might be rather
>> large. It might be worth approximating it.
>> In either case, this method easily distinguishes valid JS from pretty
>> much any of the requirements from Lukasz' earlier mail (except "while(1);",
>> which needs N>=5). It does leave some ambiguity towards JSON, but IMHO
>> that's tolerable.
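
A rough Python sketch of the token-prefix idea above, to make it concrete.
Everything in it is an assumption for illustration: the regex-based scanner
is a toy (ASCII identifiers only, no regexp or template literals), the
names are made up, and the PERMITTED set is a tiny hand-written sample, not
a list derived from the JS grammar.

```python
import re
from typing import List, Optional, Tuple

# Toy scanner: coarse token classes only. A real JS scanner is far richer.
TOKEN_RE = re.compile(
    r"""
    (?P<skip>\s+|//[^\n]*|/\*.*?\*/)            # whitespace and comments
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)         # identifiers and keywords, ASCII only
  | (?P<number>\d+(?:\.\d+)?)                   # simple decimal literals
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<punct>[{}()\[\];:,.=+\-*/!~<>&|?%^])    # single-character punctuators
    """,
    re.VERBOSE | re.DOTALL,
)


def first_tokens(source: str, n: int) -> Optional[List[Tuple[str, str]]]:
    """Return up to n (kind, text) tokens, or None if the toy scanner gets stuck."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source) and len(tokens) < n:
        m = TOKEN_RE.match(source, pos)
        if m is None:
            return None  # character the toy scanner cannot tokenize
        pos = m.end()
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens


def shape(tokens: List[Tuple[str, str]]) -> Tuple[str, ...]:
    """Keep punctuators verbatim, reduce everything else to its token kind."""
    return tuple(text if kind == "punct" else kind for kind, text in tokens)


# Hypothetical, deliberately tiny sample of permissible prefixes for N = 3.
PERMITTED = {
    ("ident", "ident", "="),   # var x = ...
    ("ident", "ident", "("),   # function f(...
    ("ident", "(", "ident"),   # foo(bar...
    ("ident", ".", "ident"),   # foo.bar...
    ("ident", "=", "number"),  # x = 1...
    ("{", "ident", ":"),       # block followed by a label
    ("{", "string", ";"),      # block followed by an ignored string
}


def might_be_javascript(source: str, n: int = 3) -> bool:
    """Check the first n tokens against the permitted prefixes."""
    tokens = first_tokens(source, n)
    if tokens is None or len(tokens) < n:
        return False  # sketch limitation: scripts shorter than n tokens are rejected
    return shape(tokens) in PERMITTED
```

With this sample set, `var x = 1;` and `{foo: 1}` are accepted, while
`{"foo": 1}`, `%PDF-1.7` and `<!DOCTYPE html>` are rejected. The hard part,
as noted above, is producing a permitted set that is complete enough not to
break real scripts.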
>>
>> Would this make sense from a V8 perspective?
>>
>> Is it possible to generate a list of possible token sequences from the JS
>> grammar, or would one have to do that manually? (For, say, N=3)
>>
>> The question of standardization has also come up. Could TC39 maybe be
>> convinced to adopt such a JavaScript sniffer, since it's fundamentally an
>> operation on JS syntax? (That would hopefully prevent the sniffer and the
>> actual syntax from getting out of sync as JS evolves.)
>>
>> Any thoughts?
>>
>> Daniel
>>
>> On Wednesday, September 1, 2021 at 5:46:25 PM UTC+2 [email protected]
>> wrote:
>>
>>> Wait, no, we do handle running out of stack in a robust way and the
>>> "does this parse" should just return false then (even though the code might
>>> be valid JS). Please ignore that part of my comment :)
>>>
>>> On Wed, 1 Sep 2021, 16:38 Marja Hölttä, <[email protected]> wrote:
>>>
>>>> A random side note: it's also possible to make V8's recursive descent
>>>> parser run out of stack using valid JS, e.g., let a = [[[[[..[ 0 ]]]]]..]
>>>> or other similar constructs (deep enough). Meaning you prob don't want to
>>>> call into the parser in a process where you don't want this to happen.
>>>>
>>>> Re: encodings, when I worked on script streaming I noticed it's pretty
>>>> common that scripts advertised as UTF-8 are not valid UTF-8 (e.g., have
>>>> invalid chars inside comments), and Chrome is currently pretty lenient
>>>> about those.
>>>>
>>>>
>>>> On Wed, Aug 18, 2021 at 3:18 PM Toon Verwaest <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thinking out loud: One idea could be to have a separate sandboxed
>>>>>>> compiler process in which we compile incoming JS code. That could reject
>>>>>>> the source if it doesn't compile; or compile it to a script that just
>>>>>>> throws with no additional info about the actual source.
>>>>>>>
>>>>>>> That process could implement streaming compilation; so we don't
>>>>>>> block streaming until later, we don't double parse, we still have a
>>>>>>> sandbox (not in the network process). There might even be benefits
>>>>>>> for caching as a compromised renderer cannot look at the compilation
>>>>>>> artefacts until it receives them.
>>>>>>>
>>>>>>> If we fully compile and create a code cache from the compilation
>>>>>>> result we don't need a new API on the V8 side, but do additional
>>>>>>> serialization/deserialization work. That should be faster than reparsing
>>>>>>> though. The upper limit of the cost would essentially be the cost of
>>>>>>> serializing / deserializing a code cache for each script.
>>>>>>>
>>>>>>
>>>>>> This seems like an interesting idea.  I wonder if compilation (no
>>>>>> evaluation / running of scripts) would be considered safe enough to
>>>>>> handle in a single (not origin/site-bound/locked) process.
>>>>>>
>>>>>
>>>>> The parser/compiler aren't tiny, so it's not unlikely there's a bug.
>>>>> It's certainly much less easy to control such bugs than full-blown JS OOB
>>>>> access though. I could imagine a security bug replacing scripts in another
>>>>> site (assuming it's sandboxed so well that it can't do much else), which
>>>>> would be terrible; and it's unclear to me how easy that would be.
>>>>>
>>>>>
>>>>>>
>>>>>> One thing that I don't fully understand (for both full-JS-parsing and
>>>>>> partial/hackish-non-JS-detection approaches) is if the encoding (e.g.
>>>>>> UTF8 vs UTF16-LE vs Win-1250) has to be known and communicated upfront
>>>>>> to the parser/sniffer?  Or maybe the input to the decoder needs to be
>>>>>> already in UTF8?  Or maybe something in //net or //network layers can
>>>>>> already handle this aspect of the problem (e.g. ensuring UTF8 in
>>>>>> URLLoader::DidRead)?
>>>>>>
>>>>>
>>>>> There's some encoding guessing happening before we streaming compile (
>>>>> https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1)
>>>>> and some afterwards; and if we initially compiled with the wrong encoding
>>>>> we discard and redo iirc. Presumably compilation failed anyway if the
>>>>> encoding was wrong; but this presumably also doesn't happen too often.
>>>>>
>>>>>
>>>>>>
>>>>>> Also - when trying to explore the partial/hackish-non-JS-detection
>>>>>> idea, I wondered if the very first character in a script may only come
>>>>>> from a relatively limited set of characters?  Let's assume that the
>>>>>> sniffer can skip whitespace (space, tab, CR, LF, LS, PS) and html/xml
>>>>>> comments (e.g. <!-- ... -->) - AFAICT the very next character has to be
>>>>>> either:
>>>>>>
>>>>>>    - The start of a reserved keyword like "if", "let", etc. (all
>>>>>>    lowercase ASCII)
>>>>>>    - The start of an identifier (any Unicode code point with the
>>>>>>    Unicode property “ID_Start”)
>>>>>>    - The start of a unary expression: + - ~ !
>>>>>>    - The start of a string literal, string template, or a regexp
>>>>>>    literal (or non-HTML comment): " ' ` /
>>>>>>    - The start of a numeric literal: 0-9
>>>>>>    - An opening paren, bracket or brace: ( [ {
>>>>>>    - Not quite sure if a dot or an equal sign can appear as the very
>>>>>>    first character: . =
>>>>>>
>>>>>> This would reject PDFs (starts with %) and HTML/XML (starts with <),
>>>>>> but still would accept ZIP files (first character is a 0x50 - capital P)
>>>>>> and MSOffice files (first character is a 0xD0 which according to Unicode
>>>>>> has ID_Start property set to true).  Rejecting ZIP and MSOffice files
>>>>>> would require going beyond the first character - maybe rejecting control
>>>>>> characters like 0x11 or 0x03 outside of comments (not sure if at this
>>>>>> point the sniffer's heuristics are starting to get too complex).
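
The first-character heuristic above can be sketched in Python roughly as
follows. Assumptions, all labeled in the code as well: the input has already
been decoded to text, `str.isidentifier()` is used as an approximation of
the Unicode ID_Start check (it also accepts `_`), `.` and `=` are left out
because the list above is unsure about them, and an empty input is treated
as plausible. Function and constant names are made up.

```python
import re

# Whitespace the sniffer may skip: space, tab, CR, LF, LS (U+2028), PS (U+2029).
SKIPPABLE = " \t\r\n\u2028\u2029"
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Unary operators, string/template/regexp/comment starters, opening brackets.
# '.' and '=' are deliberately omitted: the list above is unsure about them.
FIRST_PUNCTUATION = set("+-~!\"'`/([{")


def skip_prefix(text: str) -> int:
    """Return the index of the first character after whitespace and <!-- --> comments."""
    pos = 0
    while True:
        while pos < len(text) and text[pos] in SKIPPABLE:
            pos += 1
        m = HTML_COMMENT.match(text, pos)
        if m is None:
            return pos
        pos = m.end()


def first_char_plausible(text: str) -> bool:
    """Return True if the first significant character could start a script."""
    pos = skip_prefix(text)
    if pos == len(text):
        return True  # assumption: nothing left to reject
    ch = text[pos]
    return (
        ch in FIRST_PUNCTUATION
        or ch in "0123456789"
        or ch == "$"
        or ch.isidentifier()  # approximation of ID_Start; also covers keywords and '_'
    )
```

As described above, this rejects `%PDF...` and `<html>...` but still accepts
a ZIP header (`PK...`) and an MSOffice header whose first byte 0xD0 decodes
to the letter Ð under Latin-1.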
>>>>>>
>>>>>
>>>>> That was my initial thought too for e.g., PDF. You'd be blacklisting
>>>>> files you don't want to leak vs whitelisting JS though, which isn't
>>>>> entirely ideal security-wise. It might be better than the alternatives
>>>>> though, if we would otherwise either end up slowing down the web (repeat
>>>>> parsing, interfering with streaming) or potentially have new security
>>>>> issues through a shared compiler process.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits
>>>>>>>>>>> of ORB may be realized without full-fidelity JS sniffing/parsing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> You may call it a security benefit to block "obvious" parser
>>>>>>>>>> breakers like )]}', but in general, any "when in doubt, don't
>>>>>>>>>> block it" strategy won't be much of an obstacle to intentional
>>>>>>>>>> attacks. For instance, once Mr. Bad Guy has learned that the sniffer
>>>>>>>>>> only looks at the first 1024 characters, they can send a response
>>>>>>>>>> whose first 1024 characters lead to a "well, it *might* be valid JS"
>>>>>>>>>> judgement (such as a JS comment, or long string, or whatever). OTOH
>>>>>>>>>> any "when in doubt, block it" strategy runs the risk of breaking
>>>>>>>>>> existing websites in those doubtful cases.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the CORB threat model the attacker does *not* control the
>>>>>>>>> responses - CORB tries to prevent https://attacker.com (with either
>>>>>>>>> Spectre or a compromised renderer) from being able to read no-cors
>>>>>>>>> responses from https://victim.com.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>  (Although the JSON object syntax is exactly Javascript's
>>>>>>>>>>> object-initializer syntax, a Javascript object-initializer
>>>>>>>>>>> expression is not valid as a standalone Javascript statement.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is (at least) one subtlety here: JS is more permissive than
>>>>>>>>>> the official JSON spec. The latter requires quotes around property
>>>>>>>>>> names, the former doesn't. I.e. {"foo": is indeed never valid JS, but
>>>>>>>>>> {foo: is (the brace opens a code block, and foo is a label).
>>>>>>>>>> Also, the colon is essential for rejecting the former snippet,
>>>>>>>>>> because {"foo"; is valid JS (code block plus ignored string à la
>>>>>>>>>> "use strict";), so this is a concrete example where the 1024-char
>>>>>>>>>> prefix issue is relevant.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> When the sniffer sees:
>>>>>>>>>>>      [ 123, 456, "long string taking X bytes",
>>>>>>>>>>> then it should block the response when the Content-Type is a
>>>>>>>>>>> JSON MIME type
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't follow. When the Content-Type is JSON, and the actual
>>>>>>>>>> contents are valid JSON, why should that be blocked?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Correct.  There is no way to read cross-origin JSON via a
>>>>>>>>> "no-cors" fetch.  The only way to read cross-origin JSON is via
>>>>>>>>> CORS-mediated fetch (where the victim has to opt-in by responding with
>>>>>>>>> "Access-Control-Allow-Origin: ...").
>>>>>>>>>
>>>>>>>>
>>>>>>>> Maybe another way to look at it is:
>>>>>>>>
>>>>>>>>    - Only Javascript (and images/audio/video/stylesheets) can be
>>>>>>>>    sent in no-cors mode (i.e. without CORS).  Non-Javascript (and
>>>>>>>>    non-image/video/etc), no-cors, cross-origin responses can be
>>>>>>>>    blocked.
>>>>>>>>    - If the response sniffs as JSON (Content-Type=JSON and
>>>>>>>>    First1024bytes=JSON) then it is *not* Javascript.  Therefore we can
>>>>>>>>    block the response (and prevent disclosing
>>>>>>>>    https://victim.com/secret.json to a no-cors fetch from
>>>>>>>>    https://attacker.com).
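
The second bullet can be sketched in Python as below. This is only an
illustration under stated assumptions: the body prefix is already decoded
text, "sniffs as JSON" is reduced to the `{"key":` pattern discussed earlier
in the thread (arrays and other JSON values are not handled), and the
function names are made up rather than taken from CORB/ORB.

```python
import re

# `{"key":` after optional whitespace. Per the discussion in this thread this
# is never a valid start of a JS script, whereas `{"key";` is.
JSON_OBJECT_PREFIX = re.compile(r'\s*\{\s*"(?:\\.|[^"\\])*"\s*:')


def is_json_mime_type(content_type: str) -> bool:
    """True for application/json, text/json, and any type ending in +json."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in ("application/json", "text/json") or essence.endswith("+json")


def should_block_no_cors_response(content_type: str, body_prefix: str) -> bool:
    """Block only when both the header and the first 1024 characters say JSON."""
    looks_like_json = JSON_OBJECT_PREFIX.match(body_prefix[:1024]) is not None
    return is_json_mime_type(content_type) and looks_like_json
```

For example, a response with `Content-Type: application/json` and a body
starting with `{"secret": 42}` would be blocked, while the same body served
as `text/javascript`, or a JSON-typed body starting with `{"foo";`, would
not.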
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> v8-dev mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>>>>> ---
>>>>>>>>>> You received this message because you are subscribed to a topic
>>>>>>>>>> in the Google Groups "v8-dev" group.
>>>>>>>>>> To unsubscribe from this topic, visit
>>>>>>>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>>>>>>> To unsubscribe from this group and all its topics, send an email
>>>>>>>>>> to [email protected].
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com
>>>>>>>>>> <https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Lukasz
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Lukasz
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>


-- 
Thanks,

Lukasz

