Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
Tony Arcieri on Fri, Nov 04 2016:
> To parse and typecheck TJSON in one pass, it would involve obtaining the
> parse tree for the LHS of parsing a particular nonterminal and pass it to
> the pushdown automaton parsing the RHS as a sort of parametric argument
> along with the remaining unconsumed tokens.

This sounds like monadic bind (>>=) which we have in Hammer (I implemented
it), but it is for obvious reasons the single most general combinator.

___
langsec-discuss mailing list
langsec-discuss@mail.langsec.org
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
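To make the bind idea concrete, here is a minimal sketch in Python. It is not Hammer's API: parsers are plain functions from (text, position) to (value, new position) or None, and `bind` hands the result of the first parser to a function that builds the second. The names `bind`, `fmap`, `tagged_member` and the two-entry tag table are hypothetical and cover only the `s` and `i` tags, purely to show the tag on the left selecting the parser for the right.

```python
import re

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def fmap(parser, fn):
    """Apply `fn` to the value produced by `parser`."""
    def parse(text, pos):
        r = parser(text, pos)
        return None if r is None else (fn(r[0]), r[1])
    return parse

def bind(parser, make_next):
    """Run `parser`, then run the parser that `make_next` builds from its result."""
    def parse(text, pos):
        first = parser(text, pos)
        if first is None:
            return None
        value, pos = first
        return make_next(value)(text, pos)
    return parse

def fail(text, pos):
    """Parser that never succeeds (used for unknown tags)."""
    return None

# Hypothetical value parsers, one per type tag (illustration only).
VALUE_PARSERS = {
    "s": fmap(regex(r'"[^"\\]*"'), lambda s: s[1:-1]),        # string without escapes
    "i": fmap(regex(r'"-?[0-9]+"'), lambda s: int(s[1:-1])),  # integer carried in a string
}

def tagged_member(text, pos=0):
    """Parse '"name:tag": value', choosing the value parser from the tag."""
    key = fmap(regex(r'"[a-z]+:[a-z]+"\s*:\s*'),
               lambda k: k.split('"')[1].split(":"))
    def value_for(name_and_tag):
        name, tag = name_and_tag
        return fmap(VALUE_PARSERS.get(tag, fail), lambda v: (name, v))
    return bind(key, value_for)(text, pos)
```

Calling `tagged_member('"n:i": "42"')` parses the key first and only then knows that the value must be an integer string; the same key with the value `"abc"` is rejected.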
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
On Thu, Nov 3, 2016 at 2:46 AM, Sven M. Hallberg wrote:
> > {"dialpad:A": [["1","2","3"], ["4","5","6"], ["7","8","9"]]}
>
> Now this looks definitely context-sensitive. One nested structure on the
> right of the ':' depending on another to the left. You can no longer get
> away with a grammar but you'll have all the fun of a type system.

The grammar is certainly still context-free:

  ::=
  ::= '"' * ':' '"'
  ::= |
  ::= '<' '>'
  ::= *
  ::= *
  ::= |
  ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M'
    | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
  ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm'
    | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
  ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
  ::= |

But, as you noted, this does add a sort of type system to the language,
such that it's now possible to express documents which don't typecheck. I
agree this makes the format more complicated, but it does make the format
more amenable to mapping to statically typed programming languages. Also,
it's a rather simple type system, and one that can typecheck things in the
same pass as processing them (I believe; I have yet to implement it).

> Wait a minute, why are you stopping at objects with the type
> refinement? Shouldn't you put your entire schema into the type?

Objects are self-describing product types, so no further type information
is necessary.

-- 
Tony Arcieri
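One way to picture checking tags while a document is being decoded is Python's `json.loads` with an `object_pairs_hook`, which is called for each object as soon as that object has been parsed. The sketch below is only an illustration: the tag meanings (`s` string, `i` integer carried in a string, `A` array) are assumptions read off the examples in this thread, not taken from the TJSON specification, and `check_members` and `loads` are hypothetical names.

```python
import json
import re

# Assumed tag meanings, taken only from the examples in this thread.
CHECKS = {
    "s": lambda v: isinstance(v, str),
    "i": lambda v: isinstance(v, str) and re.fullmatch(r"-?[0-9]+", v) is not None,
    "A": lambda v: isinstance(v, list),
}

def check_members(pairs):
    """Called by json.loads for every object; rejects members whose value
    does not match the tag at the end of the member name."""
    obj = {}
    for key, value in pairs:
        name, sep, tag = key.rpartition(":")
        if not sep or tag not in CHECKS:
            raise ValueError(f"missing or unknown tag in member name {key!r}")
        if not CHECKS[tag](value):
            raise ValueError(f"value of {key!r} does not match tag {tag!r}")
        if name in obj:
            raise ValueError(f"duplicate member name {name!r}")
        # Convert integer strings once they have been checked.
        obj[name] = int(value) if tag == "i" else value
    return obj

def loads(text):
    """Decode and check tagged members as each object is completed."""
    return json.loads(text, object_pairs_hook=check_members)
```

With this sketch, `loads('{"foo:i": "1"}')` yields a dictionary with an integer value, while `loads('{"foo:i": "bar"}')` and `loads('{"foo": "bar"}')` both raise `ValueError`.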
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
Tony Arcieri on Wed, Nov 02 2016:
> {"foo:s": "bar"}

Suddenly your grammar for the value depends on a piece of information
inside the key...

> This means the only type allowed for member names is a string (which seems
> fine to me).

This one I would actually suggest for consideration in the original form.

  {"s:x": "s:foo", "s:y": "s:bar", "s:z": "s:baz"}

just seems kind of silly.

> {"dialpad:A": [["1","2","3"], ["4","5","6"], ["7","8","9"]]}

Now this looks definitely context-sensitive. One nested structure on the
right of the ':' depending on another to the left. You can no longer get
away with a grammar but you'll have all the fun of a type system.

Also I'm sure some will want their heterogeneous lists back.

> {"myobjects:A": [{"foo:i":"1"},{"bar:i":"2"},{"baz:i":"3"}]}

Wait a minute, why are you stopping at objects with the type
refinement? Shouldn't you put your entire schema into the type?

Obviously, that's half ironic/rhetorical, but it seems clear that this
scheme is a complication of the original, so not a Pareto-efficient
improvement.

-pesco
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
On Oct 26, 2016, at 11:34 AM, Tony Arcieri wrote:
> On Tue, Oct 25, 2016 at 11:16 PM, Jeffrey Goldberg wrote:
> > If the UTF8 strings aren't normalized, you will get different hashes for
> > visually and semantically identical strings.
>
> Unicode normalization is presently an optional flag in objecthash,

I noticed that only after sending my message.

> but should be on by default (I think?) and supported by all implementations.

Yep.

Again, thanks for getting this started. This has been something I’ve been
concerned about for a while, but not sufficiently concerned about to
actually act on.

Cheers,

-j
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
Some serendipitous timing here, I just saw this article "Parsing JSON is a
Minefield":

http://seriot.ch/parsing_json.html

It shows how dramatically differently various parsers handle various types
of malformed JSON.

One of the things I've been trying to put together in TJSON is a
comprehensive set of test cases that ensure conforming parsers have the
same behavior:

https://github.com/tjson/tjson-spec/blob/master/draft-tjson-examples.txt

-- 
Tony Arcieri
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
On Oct 25, 2016, at 5:15 PM, Tony Arcieri wrote:
> https://www.tjson.org/

Neat!

You describe the generic form of ":..." in BNF, but you can also describe
all your higher-level requirements in the grammar. Are there plans to
produce a fully grammatical specification?

You make a point of the language being a subset of JSON which "can be
understood by existing JSON parsers". A grammar for the subset is needed to
perform proper recognition before processing by a generic JSON parser.

Jeffrey Goldberg on Wed, Oct 26 2016:
> If the UTF8 strings aren't normalized, you will get different hashes
> for visually and semantically identical strings.

Along the same lines, beware of surrogate pairs escape-encoded in the
string. E.g.:

  "s:\u\u"

Here is the relevant piece of ABNF I once wrote for my JSON-like pet
project^1:

  esc-unicode = u (u-basic / u-surro)
  u-surro     = u-surro-hi backslash u u-surro-lo
  u-basic     = (r0C / rEF) hexdig hexdig hexdig  ; not D...
              / dD r07 hexdig hexdig              ; D[0-7]..
  u-surro-hi  = dD r8B hexdig hexdig              ; D[8-B]..
  u-surro-lo  = dD rCF hexdig hexdig              ; D[C-F]..

  ; hex ranges
  r0C = %x30-39 / %x41-43 / %x61-63  ; 0-9 A B C
  rEF = %x45-46 / %x65-66            ; E F
  r07 = %x30-37                      ; 0-7
  r8B = %x38-39 / %x41-42 / %x61-62  ; 8 9 A B
  rCF = %x43-46 / %x63-66            ; C D E F
  u   = %x75                         ; u
  dD  = %x44 / %x64                  ; d D

^1: http://khjk.org/log/2012/jun/datalang.html

-pesco
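The ABNF in this message can be transcribed fairly directly into a regular expression that checks the `\u` escapes in a raw (still escaped) JSON string body. The following is a sketch of such a transcription in Python; the function name `has_wellformed_unicode_escapes` is hypothetical, and all escapes other than `\u` are accepted without further checking.

```python
import re

HEX = "[0-9A-Fa-f]"

# u-basic: first digit is anything but D, or D followed by 0-7.
U_BASIC = f"(?:[0-9A-CEFa-cef]{HEX}{{3}}|[dD][0-7]{HEX}{{2}})"
# u-surro-hi: D[8-B].. and u-surro-lo: D[C-F]..
U_SURRO_HI = f"[dD][89ABab]{HEX}{{2}}"
U_SURRO_LO = f"[dD][C-Fc-f]{HEX}{{2}}"

# esc-unicode, including the leading backslash: either a basic code unit
# or a high surrogate immediately followed by an escaped low surrogate.
ESC_UNICODE = rf"\\u(?:{U_BASIC}|{U_SURRO_HI}\\u{U_SURRO_LO})"

# A string body: ordinary characters, non-\u escapes, or valid \u escapes.
BODY = re.compile(rf"(?:[^\\]|\\[^u]|{ESC_UNICODE})*")

def has_wellformed_unicode_escapes(body):
    """True if every \\u escape in `body` is a non-surrogate code unit
    or part of a correctly ordered surrogate pair."""
    return BODY.fullmatch(body) is not None
```

A lone or misordered surrogate escape, such as a high surrogate followed by an ordinary character, makes the whole body fail to match, so it can be rejected before a generic JSON parser ever sees it.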
Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types
On Oct 25, 2016, at 5:15 PM, Tony Arcieri wrote:
> I wanted to give LANGSEC a sneak peek of a project I've been working on with
> Ben Laurie before circulating it more widely:
>
> https://www.tjson.org/

Thank you! I have wanted something like this to exist.

> If there are any other notable problems you think should be addressed, I'd be
> curious to hear them.

If the UTF8 strings aren't normalized, you will get different hashes for
visually and semantically identical strings.

Cheers,

-j
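The normalization concern can be reproduced in a few lines. This is a small illustration in Python, assuming SHA-256 over the UTF-8 bytes and NFC as the normalization form; neither choice is stated in this thread, and `digest` is a hypothetical helper, not part of objecthash.

```python
import hashlib
import unicodedata

def digest(s, normalize=False):
    """SHA-256 of the UTF-8 encoding, optionally after NFC normalization."""
    if normalize:
        s = unicodedata.normalize("NFC", s)
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

composed = "\u00e9"     # e with acute accent as a single code point
decomposed = "e\u0301"  # 'e' followed by a combining acute accent

# The two strings render identically but hash differently unless normalized.
same_raw = digest(composed) == digest(decomposed)
same_normalized = digest(composed, True) == digest(decomposed, True)
```

Here `same_raw` is false and `same_normalized` is true, which is why a hashing scheme over Unicode strings has to fix a normalization form for all implementations.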