As I understand it, the intention here is that false positives for "is JS" are acceptable, and that it's up to the victim site to avoid response prefixes that could be mistaken for JS. Given that, what's the benefit of a full JS parse over a list of known non-JS prefixes like the one we already have?
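
For illustration only, a prefix-list check of the kind referred to above could look roughly like the Python sketch below. The prefixes are taken from examples mentioned in this thread (`)]}'`, `while(1);`, PDFs starting with `%`, HTML/XML starting with `<`); the list, the `NON_JS_PREFIXES` name and the `sniffs_as_non_js` helper are assumptions made for this sketch, not the actual CORB implementation.

```python
# Illustrative prefix list built from examples in this thread.
# It is NOT the real CORB list; treat every entry as an assumption.
NON_JS_PREFIXES = (
    b")]}'",       # parser breaker mentioned in the thread
    b"while(1);",  # infinite-loop guard that CORB is said to block
    b"%PDF",       # PDFs start with '%'
    b"<",          # HTML/XML start with '<'
)

WHITESPACE = b" \t\r\n"


def sniffs_as_non_js(body: bytes) -> bool:
    """Return True if the response prefix matches a known non-JS pattern.

    "When in doubt, don't block": anything not on the list is let through.
    """
    stripped = body.lstrip(WHITESPACE)
    # JS tolerates an HTML-style comment opener, so do not block on it.
    if stripped.startswith(b"<!--"):
        return False
    return stripped.startswith(NON_JS_PREFIXES)
```

A check like this only ever answers "definitely not JS" or "might be JS", which matches the premise that false positives for "is JS" are acceptable.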

On Tue, May 31, 2022 at 7:34 PM 'Łukasz Anforowicz' via v8-dev <[email protected]> wrote:

> On Tue, May 31, 2022 at 9:00 AM Leszek Swirski <[email protected]> wrote:
>
>> I want to note one thing here, kind of a side observation really:
>> while(1); is valid JS, it's just an infinite loop. Do we also want to
>> guard against common patterns like this?
>
> FWIW today CORB explicitly detects and blocks `while(1);` (the code here
> <https://source.chromium.org/chromium/chromium/src/+/main:services/network/public/cpp/corb/corb_impl.cc;l=471-505;drc=3c60abdfc28ef5be216ebdf4501cf3a24c901007>
> has some extra comments and details). OTOH, 1) I am not sure if detecting
> `while(1);` is a hard requirements (maybe detecting JS-parser-breakers
> <https://source.chromium.org/chromium/chromium/src/+/main:services/network/public/cpp/corb/corb_impl.cc;l=483-498;drc=3c60abdfc28ef5be216ebdf4501cf3a24c901007>
> is sufficient), and 2) I am not sure if/how `while(1);`-related
> considerations impact the main points and questions from Daniel.
>
>> - Leszek
>>
>> On Tue, May 31, 2022 at 2:45 PM 'Daniel Vogelheim' via v8-dev <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Apologies for reviving this thread, but this problem is coming up again.
>>> I think the answer of parsing in a separate process would work, but I'd
>>> really like to find a simpler solution. For all I can see, the underlying
>>> security requirements should be much less strict than the current ORB
>>> proposal implies. An approximation should do just fine. For example, for
>>> media formats we just look for a "magic number" (e.g. a 3-byte constant for
>>> JPEG files); so I don't think we need a full parse of the input.
>>>
>>> Here is how I'd like to simplify this:
>>> - Run only the JS scanner. (Including charset + comment processing.)
>>> - Take the first N tokens. I suspect N=3 would be enough.
>>> - Check the token list against a set of permissible token sequences.
>>>
>>> Even for small N a complete list of permissible sequences might be
>>> rather large. It might be worth approximating it.
>>> In either case, this method easily distinguishes valid JS from pretty
>>> much any of the requirements from Lukasz' earlier mail (except "while(1);",
>>> which needs N>=5). It does leave some ambiguity towards JSON, but IMHO
>>> that's tolerable.
>>>
>>> Would this make sense from a V8 perspective?
>>>
>>> Is it possible to generate a list of possible token sequences from the
>>> JS grammar, or would one have to do that manually? (For, say, N=3)
>>>
>>> The question of standardization has also come up. Could TC39 maybe be
>>> convinced to adopt such a JavaScript sniffer, since it's fundamentally an
>>> operation on JS syntax? (That would hopefully prevent the sniffer and the
>>> actual syntax from getting out of sync as JS evolves.)
>>>
>>> Any thoughts?
>>>
>>> Daniel
>>>
>>> On Wednesday, September 1, 2021 at 5:46:25 PM UTC+2 [email protected] wrote:
>>>
>>>> Wait, no, we do handle running out of stack in a robust way and the
>>>> "does this parse" should just return false then (even though the code might
>>>> be valid Js). Please ignore that part of my comment :)
>>>>
>>>> On Wed, 1 Sep 2021, 16:38 Marja Hölttä, <[email protected]> wrote:
>>>>
>>>>> A random side note: it's also possible to make V8's recursive descent
>>>>> parser run out of stack using valid JS, e.g., let a = [[[[[..[ 0 ]]]]]..]
>>>>> or other similar constructs (deep enough). Meaning you prob don't want to
>>>>> call into the parser in a process where you don't want this to happen.
>>>>>
>>>>> Re: encodings, when I worked on script streaming I noticed it's pretty
>>>>> common that scripts advertised as UTF-8 are not valid UTF-8 (e.g., have
>>>>> invalid chars inside comments), and Chrome is currently pretty lenient
>>>>> about those.
>>>>>
>>>>> On Wed, Aug 18, 2021 at 3:18 PM Toon Verwaest <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <[email protected]> wrote:
>>>>>>
>>>>>>> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thinking out loud: One idea could be to have a separate sandboxed
>>>>>>>> compiler process in which we compile incoming JS code. That could reject
>>>>>>>> the source if it doesn't compile; or compile it to a script that just
>>>>>>>> throws with no additional info about the actual source.
>>>>>>>>
>>>>>>>> That process could implement streaming compilation; so we don't
>>>>>>>> block streaming until later, we don't double parse, we still have a sandbox
>>>>>>>> (not in the network process). There might even be benefits for caching as a
>>>>>>>> compromised renderer cannot look at the compilation artefacts until it
>>>>>>>> receives them.
>>>>>>>>
>>>>>>>> If we fully compile and create a code cache from the compilation
>>>>>>>> result we don't need a new API on the V8 side, but do additional
>>>>>>>> serialization/deserialization work. That should be faster than reparsing
>>>>>>>> though. The upper limit of the cost would essentially be the cost of
>>>>>>>> serializing / deserializing a code cache for each script.
>>>>>>>
>>>>>>> This seems like an interesting idea. I wonder if compilation (no
>>>>>>> evaluation / running of scripts) would be considered safe enough to handle
>>>>>>> in a single (not origin/site-bound/locked) process.
>>>>>>
>>>>>> The parser/compiler aren't tiny, so it's not unlikely there's a bug.
>>>>>> It's certainly much less easy to control such bugs than full-blown JS OOB
>>>>>> access though.
>>>>>> I could imagine a security bug replacing scripts in another
>>>>>> site (assuming it's sandboxed so well that it can't do much else), which
>>>>>> would be terrible; and it's unclear to me how easy that would be.
>>>>>>
>>>>>>> One thing that I don't fully understand (For both full-JS-parsing
>>>>>>> and partial/hackish-non-JS-detection approaches) is if the encoding (e.g.
>>>>>>> UTF8 vs UTF16-LE vs Win-1250) has to be known and communicated upfront to
>>>>>>> the parser/sniffer? Or maybe the input to the decoder needs to be already
>>>>>>> in UTF8? Or maybe something in //net or //network layers can already
>>>>>>> handle this aspect of the problem (e.g. ensuring UTF8 in
>>>>>>> URLLoader::DidRead)?
>>>>>>
>>>>>> There's some encoding guessing happening before we streaming compile (
>>>>>> https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1)
>>>>>> and some afterwards; and if we initially compiled with the wrong encoding
>>>>>> we discard and redo iirc. Presumably compilation failed anyway if the
>>>>>> encoding was wrong; but this presumably also doesn't happen too often.
>>>>>>
>>>>>>> Also - when trying to explore the partial/hackish-non-JS-detection
>>>>>>> idea, I wondered if the very first character in a script may only come from
>>>>>>> a relatively limited set of characters? Let's assume that the sniffer can
>>>>>>> skip whitespace (space, tab, CR, LF, LS, PS) and html/xml comments (e.g.
>>>>>>> <!-- ... -->) - AFAICT the very next character has to be either:
>>>>>>>
>>>>>>> - The start of a reserved keyword like "if", "let", etc. (all
>>>>>>>   lowercase ASCII)
>>>>>>> - The start of an identifier (any Unicode code point with the
>>>>>>>   Unicode property “ID_Start”)
>>>>>>> - The start of a unary expression: + - ~ !
>>>>>>> - The start of a string literal, string template, or a regexp
>>>>>>>   literal (or non-HTML comment): " ' ` /
>>>>>>> - The start of a numeric literal: 0-9
>>>>>>> - An opening paren, bracket or brace: ( [ {
>>>>>>> - Not quite sure if a dot or an equal sign can appear as the
>>>>>>>   very first character: . =
>>>>>>>
>>>>>>> This would reject PDFs (starts with %) and HTML/XML (starts with <),
>>>>>>> but still would accept ZIP files (first character is a 0x50 - capital P)
>>>>>>> and MSOffice files (first character is a 0xD0 which according to Unicode
>>>>>>> has ID_Start property set to true). Rejecting ZIP and MSOffice files would
>>>>>>> require going beyond the first character - maybe rejecting control
>>>>>>> characters like 0x11 or 0x03 outside of comments (not sure if at this point
>>>>>>> the sniffer's heuristics are starting to get too complex).
>>>>>>
>>>>>> That was my initial thought too for e.g., PDF. You'd be blacklisting
>>>>>> files you don't want to leak vs whitelisting JS though, which isn't
>>>>>> entirely ideal security-wise. It might be better than the alternative
>>>>>> though; if we either end up spending slowing down the web (repeat parsing,
>>>>>> interfere with streaming) or potentially have new security issues through a
>>>>>> shared compiler process.
>>>>>>
>>>>>>>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>> ORB-with-html/json/xml-sniffing shows that some security
>>>>>>>>>>>> benefits of ORB may be realized without full-fidelity JS
>>>>>>>>>>>> sniffing/parsing.
>>>>>>>>>>>
>>>>>>>>>>> You may call it a security benefit to block "obvious" parser
>>>>>>>>>>> breakers like )]}', but in general, any "when in doubt, don't
>>>>>>>>>>> block it" strategy won't be much of an obstacle to intentional attacks. For
>>>>>>>>>>> instance, once Mr. Bad Guy has learned that the sniffer only looks at the
>>>>>>>>>>> first 1024 characters, they can send a response whose first 1024 characters
>>>>>>>>>>> lead to a "well, it *might* be valid JS" judgement (such as a
>>>>>>>>>>> JS comment, or long string, or whatever). OTOH any "when in doubt, block
>>>>>>>>>>> it" strategy runs the risk of breaking existing websites in those doubtful
>>>>>>>>>>> cases.
>>>>>>>>>>
>>>>>>>>>> In CORB threat model the attacker does *not* control the
>>>>>>>>>> responses - CORB tries to prevent https://attacker.com (with
>>>>>>>>>> either Spectre or a compromised renderer) from being able to read no-cors
>>>>>>>>>> responses from https://victim.com.
>>>>>>>>>>
>>>>>>>>>>>> (Although the JSON object syntax is exactly Javascript's
>>>>>>>>>>>> object-initializer syntax, a Javascript object-initializer expression is
>>>>>>>>>>>> not valid as a standalone Javascript statement.)
>>>>>>>>>>>
>>>>>>>>>>> There is (at least) one subtlety here: JS is more permissive
>>>>>>>>>>> than the official JSON spec. The latter requires quotes around property
>>>>>>>>>>> names, the former doesn't. I.e. {"foo": is indeed never valid
>>>>>>>>>>> JS, but {foo: is (the brace opens a code block, and foo is a
>>>>>>>>>>> label). Also, the colon is essential for rejecting the former snippet,
>>>>>>>>>>> because {"foo"; is valid JS (code block plus ignored string á
>>>>>>>>>>> la "use strict";), so this is a concrete example where the
>>>>>>>>>>> 1024-char prefix issue is relevant.
>>>>>>>>>>>
>>>>>>>>>>>> When the sniffer sees:
>>>>>>>>>>>> [ 123, 456, “long string taking X bytes”,
>>>>>>>>>>>> then it should block the response when the Content-Type is a
>>>>>>>>>>>> JSON MIME type
>>>>>>>>>>>
>>>>>>>>>>> I don't follow. When the Content-Type is JSON, and the actual
>>>>>>>>>>> contents are valid JSON, why should that be blocked?
>>>>>>>>>>
>>>>>>>>>> Correct. There is no way to read cross-origin JSON via a
>>>>>>>>>> "no-cors" fetch. The only way to read cross-origin JSON is via
>>>>>>>>>> CORS-mediated fetch (where the victim has to opt-in by responding with
>>>>>>>>>> "Access-Control-Allow-Origin: ...").
>>>>>>>>>
>>>>>>>>> Maybe another way to look at it is:
>>>>>>>>>
>>>>>>>>> - Only Javascript (and images/audio/video/stylesheets) can be
>>>>>>>>>   sent in no-cors mode (e.g. without CORS). Non-Javascript (and
>>>>>>>>>   non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>>>>>>>> - If the response sniffs as JSON (Content-Type=JSON and
>>>>>>>>>   First1024bytes=JSON) then it is *not* Javascript. Therefore we can block
>>>>>>>>>   the response (and prevent disclosing
>>>>>>>>>   https://victim.com/secret.json to a no-cors fetch from
>>>>>>>>>   https://attacker.com).
>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> v8-dev mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>>>>>> ---
>>>>>>>>>>> You received this message because you are subscribed to a topic
>>>>>>>>>>> in the Google Groups "v8-dev" group.
>>>>>>>>>>> To unsubscribe from this topic, visit
>>>>>>>>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>>>>>>>> To unsubscribe from this group and all its topics, send an email
>>>>>>>>>>> to [email protected].

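
As a rough illustration of the first-character idea quoted above (skip whitespace and an HTML-style comment opener, then look at the next character), here is a minimal Python sketch. The `first_char_may_start_js` name is hypothetical, `str.isidentifier()` is used as an approximation of the Unicode ID_Start check (it follows XID_Start plus `_`), and the input is assumed to be already decoded to text. `.` is accepted because `.5` is a valid numeric literal, while `=` is rejected.

```python
# Characters skipped before the first significant character:
# space, tab, CR, LF, LS (U+2028), PS (U+2029).
WHITESPACE = " \t\r\n\u2028\u2029"

# Unary operators, string/template/regexp/comment openers, brackets, and '.'.
PUNCTUATION_STARTS = set("+-~!\"'`/([{.")

# '$', '_' and '\' (Unicode escape) may also begin a JS identifier.
EXTRA_IDENTIFIER_STARTS = set("$_\\")

DIGITS = set("0123456789")


def first_char_may_start_js(text: str) -> bool:
    """Return True if the first significant character could begin a script."""
    stripped = text.lstrip(WHITESPACE)
    if not stripped:
        return True  # an empty script is valid JS
    if stripped.startswith("<!--"):
        return True  # HTML-style comment opener is tolerated by JS
    ch = stripped[0]
    if ch in PUNCTUATION_STARTS or ch in EXTRA_IDENTIFIER_STARTS or ch in DIGITS:
        return True
    # Approximation of ID_Start; also covers keywords, which start with
    # lowercase ASCII letters.
    return ch.isidentifier()
```

As noted in the quoted mail, such a check rejects PDF (`%`) and HTML/XML (`<`) but still accepts ZIP (`P`) and MSOffice (0xD0, `Ð` once decoded as Latin-1) content, so it can only complement a prefix list, not replace it.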