Re: [v8-dev] Utility to check if a given stream can parse as Javascript (ORB)

Marja Hölttä Wed, 01 Sep 2021 07:39:44 -0700

A random side note: it's also possible to make V8's recursive descent
parser run out of stack using valid JS, e.g., let a = [[[[[..[ 0 ]]]]]..]
or other similar constructs (if nested deeply enough). That means you
probably don't want to call into the parser in a process where running out
of stack would be a problem.
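
To illustrate, here is a minimal sketch for Node.js (which embeds V8). The
helper names nestedArraySource and tryParse are made up for this example. The
depth at which parsing starts to fail depends on the V8 version and the
available stack, so the large depth used below is only a guess.

```javascript
const vm = require("vm");

// Build the source text of an array literal nested `depth` levels deep,
// e.g. depth 3 gives "let a = [[[0]]];".
function nestedArraySource(depth) {
  return "let a = " + "[".repeat(depth) + "0" + "]".repeat(depth) + ";";
}

// Parse and compile (but never run) the source and report what happened.
function tryParse(source) {
  try {
    new vm.Script(source);
    return { ok: true, error: null };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// Shallow nesting parses fine.
console.log(tryParse(nestedArraySource(10)));
// Very deep nesting is syntactically valid but is expected to exhaust the
// parser's stack budget; the exact error reported may vary.
console.log(tryParse(nestedArraySource(1000000)));
```

In an embedder that calls the parser directly, the same input would have to be
handled inside whatever process hosts the parser.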


Re: encodings, when I worked on script streaming I noticed it's pretty
common that scripts advertised as UTF-8 are not valid UTF-8 (e.g., have
invalid chars inside comments), and Chrome is currently pretty lenient
about those.
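
A small sketch of the difference between strict and lenient UTF-8 decoding,
using the standard TextDecoder API. This is not how Chrome's script loading is
implemented; the byte sequence and the helper names decodeStrict and
decodeLenient are made up for illustration.

```javascript
// Bytes for a script whose comment contains a lone 0xE9 (Latin-1 "é"),
// which is not a valid UTF-8 sequence.
const bytes = Uint8Array.from([
  0x2f, 0x2f, 0x20, 0x63, 0x61, 0x66, 0xe9, 0x0a, // "// caf", 0xE9, "\n"
  0x31, 0x20, 0x2b, 0x20, 0x31,                   // "1 + 1"
]);

// Strict decoding: any malformed sequence throws.
function decodeStrict(buf) {
  return new TextDecoder("utf-8", { fatal: true }).decode(buf);
}

// Lenient decoding: malformed sequences are replaced with U+FFFD.
function decodeLenient(buf) {
  return new TextDecoder("utf-8").decode(buf);
}

let strictFailed = false;
try {
  decodeStrict(bytes);
} catch (e) {
  strictFailed = true;
}
console.log(strictFailed);
console.log(JSON.stringify(decodeLenient(bytes)));
```

With lenient decoding the replacement character ends up inside the comment, so
the remaining text is still a valid script.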


On Wed, Aug 18, 2021 at 3:18 PM Toon Verwaest <[email protected]> wrote:

>
>
> On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <
> [email protected]> wrote:
>
>>
>>
>> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <[email protected]>
>> wrote:
>>
>>> Thinking out loud: One idea could be to have a separate sandboxed
>>> compiler process in which we compile incoming JS code. That could reject
>>> the source if it doesn't compile; or compile it to a script that just
>>> throws with no additional info about the actual source.
>>>
>>> That process could implement streaming compilation; so we don't block
>>> streaming until later, we don't double parse, we still have a sandbox (not
>>> in the network process). There might even be benefits for caching as a
>>> compromised renderer cannot look at the compilation artefacts until it
>>> receives them.
>>>
>>> If we fully compile and create a code cache from the compilation result
>>> we don't need a new API on the V8 side, but do additional
>>> serialization/deserialization work. That should be faster than reparsing
>>> though. The upper limit of the cost would essentially be the cost of
>>> serializing / deserializing a code cache for each script.
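
The code-cache round trip described above can be tried out from Node.js, which
exposes V8's code cache through the vm module. This is only a sketch of the
serialize/deserialize step within a single process, not of the proposed
sandboxed compiler process; the source string and file name are made up.

```javascript
const vm = require("vm");

const source = "function answer() { return 6 * 7; } answer();";

// "Compiler side": compile once and serialize V8's code cache.
const compiled = new vm.Script(source, { filename: "demo.js" });
const cache = compiled.createCachedData(); // a Buffer

// "Consumer side": hand the same source plus the cache back to V8.
const restored = new vm.Script(source, {
  filename: "demo.js",
  cachedData: cache,
});

console.log(cache.length > 0);
// cachedDataRejected is false when V8 accepted the cache.
console.log(restored.cachedDataRejected);
console.log(restored.runInThisContext());
```

Note that this particular API still takes the source string on the consuming
side.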
>>>
>>
>> This seems like an interesting idea.  I wonder if compilation (no
>> evaluation / running of scripts) would be considered safe enough to handle
>> in a single (not origin/site-bound/locked) process.
>>
>
> The parser/compiler aren't tiny, so it's not unlikely there's a bug. Such
> bugs are certainly much harder to control than full-blown JS OOB access
> though. I could imagine a security bug replacing scripts in another site
> (assuming it's sandboxed so well that it can't do much else), which would
> be terrible; and it's unclear to me how easy that would be.
>
>
>>
>> One thing that I don't fully understand (for both full-JS-parsing and
>> partial/hackish-non-JS-detection approaches) is if the encoding (e.g. UTF8
>> vs UTF16-LE vs Win-1250) has to be known and communicated upfront to the
>> parser/sniffer?  Or maybe the input to the decoder needs to be already in
>> UTF8?  Or maybe something in //net or //network layers can already handle
>> this aspect of the problem (e.g. ensuring UTF8 in URLLoader::DidRead)?
>>
>
> There's some encoding guessing happening before we start streaming compilation (
> https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1)
> and some afterwards; if we initially compiled with the wrong encoding, we
> discard the result and redo the compilation, iirc. Presumably compilation
> would have failed anyway if the encoding was wrong, and presumably this
> doesn't happen too often.
>
>
>>
>> Also - when trying to explore the partial/hackish-non-JS-detection idea,
>> I wondered if the very first character in a script may only come from a
>> relatively limited set of characters?  Let's assume that the sniffer can
>> skip whitespace (space, tab, CR, LF, LS, PS) and html/xml comments (e.g.
>> <!-- ... -->) - AFAICT the very next character has to be either:
>>
>>    - The start of a reserved keyword like "if", "let", etc. (all
>>    lowercase ASCII)
>>    - The start of an identifier (any Unicode code point with the Unicode
>>    property “ID_Start”)
>>    - The start of a unary expression: + - ~ !
>>    - The start of a string literal, string template, or a regexp literal
>>    (or non-HTML comment): " ' ` /
>>    - The start of a numeric literal: 0-9
>>    - An opening paren, bracket or brace: ( [ {
>>    - Not quite sure if a dot or an equal sign can appear as the very
>>    first character: . =
>>
>> This would reject PDFs (which start with %) and HTML/XML (which start
>> with <), but would still accept ZIP files (the first character is 0x50,
>> a capital P) and MSOffice files (the first character is 0xD0, which
>> according to Unicode has the ID_Start property set to true).  Rejecting
>> ZIP and MSOffice files would require going beyond the first character -
>> maybe rejecting control characters like 0x11 or 0x03 outside of comments
>> (not sure if at this point the sniffer's heuristics are starting to get
>> too complex).
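
A rough sketch of the first-character check listed above. It follows the list
as written and does not try to be complete (for instance, a leading `;` is not
in the list). The function names are made up, `$` and `_` are added because
they can also start a JS identifier, and `.` is included because `.5` is a
numeric literal. The input is assumed to be already decoded text.

```javascript
// Characters from the list above that may begin a script, besides
// identifiers, keywords and digits.
const PUNCTUATION_STARTS = new Set("+-~!\"'`/([{.");

// Skip whitespace (space, tab, CR, LF, LS, PS) and <!-- ... --> comments.
// Returns the index of the first character that was not skipped.
function skipIgnorable(text) {
  let i = 0;
  for (;;) {
    while (i < text.length && " \t\r\n\u2028\u2029".includes(text[i])) i++;
    if (!text.startsWith("<!--", i)) return i;
    const end = text.indexOf("-->", i + 4);
    if (end === -1) return text.length; // unterminated comment: nothing left
    i = end + 3;
  }
}

function firstCharMayStartScript(text) {
  const i = skipIgnorable(text);
  if (i >= text.length) return true; // nothing to judge: do not reject
  const ch = String.fromCodePoint(text.codePointAt(i));
  return (
    PUNCTUATION_STARTS.has(ch) ||
    /[0-9]/.test(ch) ||
    /[\p{ID_Start}$_]/u.test(ch) // also covers keywords such as "if", "let"
  );
}

console.log(firstCharMayStartScript("%PDF-1.4"));           // PDF
console.log(firstCharMayStartScript("<html>"));             // HTML
console.log(firstCharMayStartScript("PK\u0003\u0004"));     // ZIP
console.log(firstCharMayStartScript("\u00D0\u00CF\u0011")); // MSOffice
```

As the quoted text says, the ZIP and MSOffice prefixes pass this check because
P and U+00D0 both have the ID_Start property.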
>>
>
> That was my initial thought too for e.g., PDF. You'd be blacklisting files
> you don't want to leak rather than whitelisting JS though, which isn't
> entirely ideal security-wise. It might be better than the alternatives
> though, if we would otherwise end up either slowing down the web (repeat
> parsing, interfering with streaming) or potentially introducing new
> security issues through a shared compiler process.
>
>
>>
>>
>>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <
>>> [email protected]> wrote:
>>>
>>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits of
>>>>>>> ORB may be realized without full-fidelity JS sniffing/parsing.
>>>>>>>
>>>>>>>
>>>>>> You may call it a security benefit to block "obvious" parser breakers
>>>>>> like )]}', but in general, any "when in doubt, don't block it"
>>>>>> strategy won't be much of an obstacle to intentional attacks. For
>>>>>> instance, once Mr. Bad Guy has learned that the sniffer only looks at
>>>>>> the first 1024 characters, they can send a response whose first 1024
>>>>>> characters lead to a "well, it *might* be valid JS" judgement (such as
>>>>>> a JS comment, or long string, or whatever). OTOH any "when in doubt,
>>>>>> block it" strategy runs the risk of breaking existing websites in
>>>>>> those doubtful cases.
>>>>>>
>>>>>
>>>>> In the CORB threat model the attacker does *not* control the responses -
>>>>> CORB tries to prevent https://attacker.com (with either Spectre or a
>>>>> compromised renderer) from being able to read no-cors responses from
>>>>> https://victim.com.
>>>>>
>>>>>>
>>>>>>
>>>>>>>  (Although the JSON object syntax is exactly Javascript's
>>>>>>> object-initializer syntax, a Javascript object-initializer expression is
>>>>>>> not valid as a standalone Javascript statement.)
>>>>>>
>>>>>>
>>>>>> There is (at least) one subtlety here: JS is more permissive than the
>>>>>> official JSON spec. The latter requires quotes around property names, the
>>>>>> former doesn't. I.e. {"foo": is indeed never valid JS, but {foo: is
>>>>>> (the brace opens a code block, and foo is a label). Also, the colon is
>>>>>> essential for rejecting the former snippet, because {"foo"; is valid
>>>>>> JS (code block plus ignored string à la "use strict";), so this is a
>>>>>> concrete example where the 1024-char prefix issue is relevant.
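
The three snippets above can be checked with a small Node.js sketch. The
helper name parsesAsScript is made up, and closing braces are added so that
each snippet is a complete script rather than a prefix.

```javascript
const vm = require("vm");

// True if `source` parses as a classic script. The script is only
// compiled, never run.
function parsesAsScript(source) {
  try {
    new vm.Script(source);
    return true;
  } catch (e) {
    if (e.name === "SyntaxError") return false;
    throw e;
  }
}

console.log(parsesAsScript('{"foo": 1}')); // false: quoted key, then colon
console.log(parsesAsScript("{foo: 1}"));   // true: block containing label foo
console.log(parsesAsScript('{"foo";}'));   // true: block with a string statement
```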
>>>>>>
>>>>>>
>>>>>>> When the sniffer sees:
>>>>>>>      [ 123, 456, "long string taking X bytes",
>>>>>>> then it should block the response when the Content-Type is a JSON
>>>>>>> MIME type
>>>>>>
>>>>>>
>>>>>> I don't follow. When the Content-Type is JSON, and the actual
>>>>>> contents are valid JSON, why should that be blocked?
>>>>>>
>>>>>
>>>>> Correct.  There is no way to read cross-origin JSON via a "no-cors"
>>>>> fetch.  The only way to read cross-origin JSON is via CORS-mediated fetch
>>>>> (where the victim has to opt-in by responding with
>>>>> "Access-Control-Allow-Origin: ...").
>>>>>
>>>>
>>>> Maybe another way to look at it is:
>>>>
>>>>    - Only Javascript (and images/audio/video/stylesheets) can be sent
>>>>    in no-cors mode (i.e. without CORS).  Non-Javascript (and
>>>>    non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>>>    - If the response sniffs as JSON (Content-Type=JSON and
>>>>    First1024bytes=JSON) then it is *not* Javascript.  Therefore we can
>>>>    block the response (and prevent disclosing
>>>>    https://victim.com/secret.json to a no-cors fetch from
>>>>    https://attacker.com).
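
The second bullet can be sketched as a decision function. This is an
illustration only, not the CORB/ORB implementation: the function names, the
MIME type check and the regular expression are assumptions. The prefix check
only recognizes the {"key": form discussed earlier in the thread, and it
slices characters of an already decoded string rather than bytes.

```javascript
// Rough check for a JSON MIME type (parameters such as charset are ignored).
function isJsonMimeType(contentType) {
  const essence = contentType.split(";")[0].trim().toLowerCase();
  return (
    essence === "application/json" ||
    essence === "text/json" ||
    essence.endsWith("+json")
  );
}

// Rough check that a prefix begins like a JSON object: {"key":
function prefixLooksLikeJsonObject(prefix) {
  return /^\s*\{\s*"(?:[^"\\]|\\.)*"\s*:/.test(prefix);
}

// Decide whether a no-cors, cross-origin response can be blocked:
// both the Content-Type and the first 1024 characters must say "JSON".
function shouldBlock(contentType, body) {
  const prefix = body.slice(0, 1024);
  return isJsonMimeType(contentType) && prefixLooksLikeJsonObject(prefix);
}

console.log(shouldBlock("application/json; charset=utf-8", '{"secret": 1}'));
console.log(shouldBlock("text/javascript", '{"secret": 1}'));
console.log(shouldBlock("application/json", "var x = 1;"));
```

Requiring both signals keeps the check conservative: a response is only
blocked when its declared type and its contents agree that it is JSON.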
>>>>
>>>>
>>>>
>>>>>
>>>>>> --
>>>>>> --
>>>>>> v8-dev mailing list
>>>>>> [email protected]
>>>>>> http://groups.google.com/group/v8-dev
>>>>>> ---
>>>>>> You received this message because you are subscribed to a topic in
>>>>>> the Google Groups "v8-dev" group.
>>>>>> To unsubscribe from this topic, visit
>>>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>>> [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com
>>>>>> <https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>>
>>>>> Lukasz
>>>>>
>>>>
>>>>


-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CAED6dUDXGNSVry9WGtUSJup_nvhaq2Sjfh-phrv-FR-9LmQ0uw%40mail.gmail.com.
