Re: [v8-dev] Utility to check if a given stream can parse as Javascript (ORB)

Toon Verwaest Wed, 18 Aug 2021 06:18:15 -0700

On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <
[email protected]> wrote:


>
>
> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <[email protected]>
> wrote:
>
>> Thinking out loud: One idea could be to have a separate sandboxed
>> compiler process in which we compile incoming JS code. That could reject
>> the source if it doesn't compile; or compile it to a script that just
>> throws with no additional info about the actual source.
>>
>> That process could implement streaming compilation; so we don't block
>> streaming until later, we don't double parse, we still have a sandbox (not
>> in the network process). There might even be benefits for caching as a
>> compromised renderer cannot look at the compilation artefacts until it
>> receives them.
>>
>> If we fully compile and create a code cache from the compilation result
>> we don't need a new API on the V8 side, but do additional
>> serialization/deserialization work. That should be faster than reparsing
>> though. The upper limit of the cost would essentially be the cost of
>> serializing / deserializing a code cache for each script.
>>
>
> This seems like an interesting idea.  I wonder if compilation (no
> evaluation / running of scripts) would be considered safe enough to handle
> in a single (not origin/site-bound/locked) process.
>

The parser and compiler aren't tiny, so it's not unlikely that there's a
bug. Such bugs are certainly much harder for an attacker to control than
full-blown JS OOB access, though. I could imagine a security bug that
replaces scripts in another site (assuming the process is sandboxed so well
that it can't do much else), which would be terrible; it's unclear to me
how easy that would be.


>
> One thing that I don't fully understand (For both full-JS-parsing and
> partial/hackish-non-JS-detection approaches) is if the encoding (e.g. UTF8
> vs UTF16-LE vs Win-1250) has to be known and communicated upfront to the
> parser/sniffer?  Or maybe the input to the decoder needs to be already in
> UTF8?  Or maybe something in //net or //network layers can already handle
> this aspect of the problem (e.g. ensuring UTF8 in URLLoader::DidRead)?
>

There's some encoding guessing happening before we start streaming
compilation (
https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1)
and some afterwards; if we initially compiled with the wrong encoding, we
discard the result and redo it, iirc. Presumably compilation would have
failed anyway if the encoding was wrong, and presumably this doesn't happen
too often.
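
To make the "guess before compiling" step a bit more concrete, here is a
minimal Python sketch. It is an assumption for illustration only and not what
script_streamer.cc does: it checks for a byte-order mark and otherwise falls
back to a declared charset (or UTF-8). The names guess_encoding and
decode_script are hypothetical.

```python
import codecs
from typing import Optional, Tuple

# Byte-order marks checked in order; names are Python codec names.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(prefix: bytes, declared: Optional[str] = None) -> Tuple[str, int]:
    """Return (encoding name, number of BOM bytes to skip).

    A BOM wins over the declared charset; with no BOM and no declared
    charset, UTF-8 is assumed (an assumption of this sketch).
    """
    for bom, name in BOMS:
        if prefix.startswith(bom):
            return name, len(bom)
    return (declared or "utf-8"), 0


def decode_script(data: bytes, declared: Optional[str] = None) -> str:
    """Decode script bytes using the guessed encoding, dropping any BOM."""
    name, skip = guess_encoding(data, declared)
    # Undecodable bytes become U+FFFD instead of raising.
    return data[skip:].decode(name, errors="replace")
```

If a later guess disagrees with the first one, the caller would throw away
whatever was compiled and start over with the new encoding.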


>
> Also - when trying to explore the partial/hackish-non-JS-detection idea, I
> wondered if the very first character in a script may only come from a
> relatively limited set of characters?  Let's assume that the sniffer can
> skip whitespace (space, tab, CR, LF, LS, PS) and html/xml comments (e.g.
> <!-- ... -->) - AFAICT the very next character has to be either:
>
>    - The start of a reserved keyword like "if", "let", etc. (all
>    lowercase ASCII)
>    - The start of an identifier (any Unicode code point with the Unicode
>    property “ID_Start”)
>    - The start of a unary expression: + - ~ !
>    - The start of a string literal, string template, or a regexp literal
>    (or non-HTML comment): " ' ` /
>    - The start of a numeric literal: 0-9
>    - An opening paren, bracket or brace: ( [ {
>    - Not quite sure if a dot or an equal sign can appear as the very
>    first character: . =
>
> This would reject PDFs (starts with %) and HTML/XML (starts with <), but
> still would accept ZIP files (first character is a 0x50 - capital P) and
> MSOffice files (first character is a 0xD0 which according to Unicode has
> ID_Start property set to true).  Rejecting ZIP and MSOffice files would
> require going beyond the first character - maybe rejecting control
> characters like 0x11 or 0x03 outside of comments (not sure if at this point
> the sniffer's heuristics are starting to get too complex).
>

That was my initial thought too for, e.g., PDF. You'd be blacklisting files
you don't want to leak rather than whitelisting JS though, which isn't
entirely ideal security-wise. It might be better than the alternatives
though, if those end up either slowing down the web (repeated parsing,
interference with streaming) or potentially introducing new security issues
through a shared compiler process.
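
For concreteness, the first-character check described above could be sketched
roughly as below (Python, illustration only). Assumptions: the input has
already been decoded to text; Python's str.isidentifier() (XID_Start plus
underscore) stands in for ID_Start; "<!-- ... -->" is skipped as a block, as
in the quoted description; "." is accepted because a numeric literal such as
.5 can begin with it, while "=" is not. The function names are hypothetical.

```python
# Characters the sniffer skips before looking at the first "real" character.
WHITESPACE = frozenset(" \t\r\n\u2028\u2029")
# Unary operators, string/template/regexp/comment starts, open brackets, dot.
FIRST_PUNCTUATORS = frozenset("+-~!\"'`/([{.")
DIGITS = frozenset("0123456789")


def skip_whitespace_and_html_comments(text: str) -> int:
    """Return the index of the first character that is not skipped."""
    i = 0
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text.startswith("<!--", i):
            end = text.find("-->", i + 4)
            if end == -1:
                return len(text)  # comment still open: nothing to judge yet
            i = end + 3
        else:
            break
    return i


def may_start_javascript(text: str) -> bool:
    """True if the first significant character could begin a script."""
    i = skip_whitespace_and_html_comments(text)
    if i == len(text):
        return True  # undecided, so "when in doubt, don't block"
    ch = text[i]
    if ch in FIRST_PUNCTUATORS or ch in DIGITS:
        return True
    # Keywords and identifiers; "$" is also a valid identifier start in JS.
    return ch == "$" or ch.isidentifier()
```

This rejects inputs starting with "%PDF", "<html>" or ")]}'", but still lets
ZIP ("PK") and MSOffice (0xD0, which is "Ð" when decoded as Latin-1) through,
which is exactly the weakness of the blacklist approach.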


>
>
>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <
>> [email protected]> wrote:
>>
>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <[email protected]>
>>>> wrote:
>>>>
>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits of
>>>>>> ORB may be realized without full-fidelity JS sniffing/parsing.
>>>>>>
>>>>>>
>>>>> You may call it a security benefit to block "obvious" parser breakers
>>>>> like )]}', but in general, any "when in doubt, don't block it"
>>>>> strategy won't be much of an obstacle to intentional attacks. For
>>>>> instance, once Mr. Bad Guy has learned that the sniffer only looks at
>>>>> the first 1024 characters, they can send a response whose first 1024
>>>>> characters lead to a "well, it *might* be valid JS" judgement (such as
>>>>> a JS comment, or long string, or whatever). OTOH any "when in doubt,
>>>>> block it" strategy runs the risk of breaking existing websites in
>>>>> those doubtful cases.
>>>>>
>>>>
>>>> In the CORB threat model the attacker does *not* control the responses -
>>>> CORB tries to prevent https://attacker.com (with either Spectre or a
>>>> compromised renderer) from being able to read no-cors responses from
>>>> https://victim.com.
>>>>
>>>>>
>>>>>
>>>>>>  (Although the JSON object syntax is exactly Javascript's
>>>>>> object-initializer syntax, a Javascript object-initializer expression is
>>>>>> not valid as a standalone Javascript statement.)
>>>>>
>>>>>
>>>>> There is (at least) one subtlety here: JS is more permissive than the
>>>>> official JSON spec. The latter requires quotes around property names, the
>>>>> former doesn't. I.e. {"foo": is indeed never valid JS, but {foo: is
>>>>> (the brace opens a code block, and foo is a label). Also, the colon is
>>>>> essential for rejecting the former snippet, because {"foo"; is valid
>>>>> JS (code block plus ignored string à la "use strict";), so this is a
>>>>> concrete example where the 1024-char prefix issue is relevant.
>>>>>
>>>>>
>>>>>> When the sniffer sees:
>>>>>>      [ 123, 456, “long string taking X bytes”,
>>>>>> then it should block the response when the Content-Type is a JSON
>>>>>> MIME type
>>>>>
>>>>>
>>>>> I don't follow. When the Content-Type is JSON, and the actual contents
>>>>> are valid JSON, why should that be blocked?
>>>>>
>>>>
>>>> Correct.  There is no way to read cross-origin JSON via a "no-cors"
>>>> fetch.  The only way to read cross-origin JSON is via CORS-mediated fetch
>>>> (where the victim has to opt-in by responding with
>>>> "Access-Control-Allow-Origin: ...").
>>>>
>>>
>>> Maybe another way to look at it is:
>>>
>>>    - Only Javascript (and images/audio/video/stylesheets) can be sent
>>>    in no-cors mode (i.e. without CORS).  Non-Javascript (and
>>>    non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>>    - If the response sniffs as JSON (Content-Type=JSON and
>>>    First1024bytes=JSON) then it is *not* Javascript.  Therefore we can block
>>>    the response (and prevent disclosing https://victim.com/secret.json
>>>    to a no-cors fetch from https://attacker.com).
>>>
>>>
>>>
>>>>
>>>>> --
>>>>> --
>>>>> v8-dev mailing list
>>>>> [email protected]
>>>>> http://groups.google.com/group/v8-dev
>>>>> ---
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "v8-dev" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Lukasz
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Lukasz
>>>
>>>
>>
>
>
> --
> Thanks,
>
> Lukasz
>
>

