Re: [v8-dev] Utility to check if a given stream can parse as Javascript (ORB)

'Łukasz Anforowicz' via v8-dev Tue, 17 Aug 2021 17:29:32 -0700

On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <[email protected]> wrote:


> Thinking out loud: One idea could be to have a separate sandboxed compiler
> process in which we compile incoming JS code. That could reject the source
> if it doesn't compile; or compile it to a script that just throws with no
> additional info about the actual source.
>
> That process could implement streaming compilation; so we don't block
> streaming until later, we don't double parse, we still have a sandbox (not
> in the network process). There might even be benefits for caching as a
> compromised renderer cannot look at the compilation artefacts until it
> receives them.
>
> If we fully compile and create a code cache from the compilation result we
> don't need a new API on the V8 side, but do additional
> serialization/deserialization work. That should be faster than reparsing
> though. The upper limit of the cost would essentially be the cost of
> serializing / deserializing a code cache for each script.
>

This seems like an interesting idea.  I wonder if compilation (no
evaluation / running of scripts) would be considered safe enough to handle
in a single (not origin/site-bound/locked) process.

One thing that I don't fully understand (for both the full-JS-parsing and
the partial/hackish-non-JS-detection approaches) is whether the encoding
(e.g. UTF-8 vs UTF-16LE vs Windows-1250) has to be known and communicated
upfront to the parser/sniffer.  Or maybe the input to the decoder needs to
already be in UTF-8?  Or maybe something in the //net or //network layers
can already handle this aspect of the problem (e.g. ensuring UTF-8 in
URLLoader::DidRead)?

Also, when trying to explore the partial/hackish-non-JS-detection idea, I
wondered whether the very first character in a script can only come from a
relatively limited set of characters.  Let's assume that the sniffer can
skip whitespace (space, tab, CR, LF, LS, PS) and HTML/XML comments (e.g.
<!-- ... -->).  AFAICT the very next character has to be one of:

   - The start of a reserved keyword like "if", "let", etc. (all lowercase
   ASCII)
   - The start of an identifier (any Unicode code point with the Unicode
   property “ID_Start”)
   - The start of a unary expression: + - ~ !
   - The start of a string literal, template literal, or regexp literal
   (or non-HTML comment): " ' ` /
   - The start of a numeric literal: 0-9
   - An opening paren, bracket or brace: ( [ {
   - Not quite sure if a dot or an equal sign can appear as the very first
   character: . =

This would reject PDFs (which start with %) and HTML/XML (which start with
<), but would still accept ZIP files (the first byte is 0x50, a capital P)
and MSOffice files (the first byte is 0xD0, and U+00D0 has the Unicode
ID_Start property).  Rejecting ZIP and MSOffice files would require going
beyond the first character, maybe by rejecting control characters like
0x11 or 0x03 outside of comments (not sure if at this point the sniffer's
heuristics are starting to get too complex).
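
To make the heuristic above concrete, here is a minimal Python sketch of such
a first-character sniffer.  Everything in it is an assumption for
illustration (the function names, the exact character sets, treating
<!-- ... --> as a skippable block); it is not Chromium or V8 code.
str.isidentifier() only approximates ID_Start (Python uses XID_Start plus
"_").  The sketch also accepts $, _, backslash, ; and a leading dot, since
".5" is a valid numeric literal, and it rejects a leading equals sign, which
cannot begin a statement.  The control-character check is deliberately coarse:
it ignores the "outside of comments" caveat, and such characters can also
appear legally inside string literals.

```python
# Hypothetical first-character sniffer. All names and character sets here are
# illustrative assumptions, not Chromium or V8 code.

WHITESPACE = " \t\r\n\u2028\u2029"   # space, tab, CR, LF, LS, PS
PUNCTUATORS = "+-~!\"'`/([{"         # unary operators, quotes, slash, openers
EXTRAS = "$_\\;."                    # assumed additions, see the text above
ALLOWED_CONTROLS = "\t\n\v\f\r"      # JS whitespace and line terminators


def skip_whitespace_and_html_comments(text: str) -> int:
    """Return the index of the first character that is not skippable."""
    i = 0
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text.startswith("<!--", i):
            end = text.find("-->", i + 4)
            if end == -1:
                return len(text)  # unterminated comment: nothing left to inspect
            i = end + 3
        else:
            break
    return i


def may_start_javascript(text: str) -> bool:
    """False only when the first significant character cannot begin a script."""
    i = skip_whitespace_and_html_comments(text)
    if i == len(text):
        return True  # only whitespace/comments seen: no grounds to reject
    ch = text[i]
    return (
        ch in PUNCTUATORS
        or ch in EXTRAS
        or "0" <= ch <= "9"
        or ch.isidentifier()  # approximates ID_Start (XID_Start plus "_")
    )


def has_suspicious_control_char(text: str) -> bool:
    """True if the prefix has a C0 control character that is not whitespace."""
    return any(ord(ch) < 0x20 and ch not in ALLOWED_CONTROLS for ch in text)
```

Under these assumptions, "%PDF-1.7", "<html>" and ")]}'" are rejected by
may_start_javascript.  A ZIP prefix ("PK\x03\x04") and an MSOffice prefix
(bytes D0 CF 11 E0 decoded as Latin-1) pass the first-character check and are
only caught by has_suspicious_control_char.  Note that the MSOffice result
depends on the assumed decoding, which is the encoding question raised earlier.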


> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <
> [email protected]> wrote:
>
>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <[email protected]>
>>> wrote:
>>>
>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits of
>>>>> ORB may be realized without full-fidelity JS sniffing/parsing.
>>>>>
>>>>>
>>>> You may call it a security benefit to block "obvious" parser breakers
>>>> like )]}', but in general, any "when in doubt, don't block it"
>>>> strategy won't be much of an obstacle to intentional attacks. For instance,
>>>> once Mr. Bad Guy has learned that the sniffer only looks at the first 1024
>>>> characters, they can send a response whose first 1024 characters lead to a
>>>> "well, it *might* be valid JS" judgement (such as a JS comment, or
>>>> long string, or whatever). OTOH any "when in doubt, block it" strategy runs
>>>> the risk of breaking existing websites in those doubtful cases.
>>>>
>>>
>>> In the CORB threat model the attacker does *not* control the responses -
>>> CORB tries to prevent https://attacker.com (with either Spectre or a
>>> compromised renderer) from being able to read no-cors responses from
>>> https://victim.com.
>>>
>>>>
>>>>
>>>>>  (Although the JSON object syntax is exactly Javascript's
>>>>> object-initializer syntax, a Javascript object-initializer expression is
>>>>> not valid as a standalone Javascript statement.)
>>>>
>>>>
>>>> There is (at least) one subtlety here: JS is more permissive than the
>>>> official JSON spec. The latter requires quotes around property names, the
>>>> former doesn't. I.e. {"foo": is indeed never valid JS, but {foo: is
>>>> (the brace opens a code block, and foo is a label). Also, the colon is
>>>> essential for rejecting the former snippet, because {"foo"; is valid
>>>> JS (code block plus ignored string à la "use strict";), so this is a
>>>> concrete example where the 1024-char prefix issue is relevant.
>>>>
>>>>
>>>>> When the sniffer sees:
>>>>>      [ 123, 456, “long string taking X bytes”,
>>>>> then it should block the response when the Content-Type is a JSON MIME
>>>>> type
>>>>
>>>>
>>>> I don't follow. When the Content-Type is JSON, and the actual contents
>>>> are valid JSON, why should that be blocked?
>>>>
>>>
>>> Correct.  There is no way to read cross-origin JSON via a "no-cors"
>>> fetch.  The only way to read cross-origin JSON is via CORS-mediated fetch
>>> (where the victim has to opt-in by responding with
>>> "Access-Control-Allow-Origin: ...").
>>>
>>
>> Maybe another way to look at it is:
>>
>>    - Only Javascript (and images/audio/video/stylesheets) can be sent in
>>    no-cors mode (i.e. without CORS).  Non-Javascript (and
>>    non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>    - If the response sniffs as JSON (Content-Type=JSON and
>>    First1024bytes=JSON) then it is *not* Javascript.  Therefore we can block
>>    the response (and prevent disclosing https://victim.com/secret.json
>>    to a no-cors fetch from https://attacker.com).
>>
>>
>>
>>>
>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "v8-dev" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Lukasz
>>>
>>
>>
>> --
>> Thanks,
>>
>> Lukasz
>>
>


-- 
Thanks,

Lukasz

