Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-11-07 Thread Sven M. Hallberg
Tony Arcieri  on Fri, Nov 04 2016:
> To parse and typecheck TJSON in one pass, it would involve obtaining the
> parse tree for the LHS of parsing a particular nonterminal and passing it
> to the pushdown automaton parsing the RHS as a sort of parametric argument,
> along with the remaining unconsumed tokens.

This sounds like monadic bind (>>=), which we have in Hammer (I
implemented it), but it is, for obvious reasons, the single most general
combinator.
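
A minimal sketch of bind as a parser combinator, written in Python for
illustration. This is not Hammer's API; the names bind, digit, take and
length_prefixed are invented here. A parser is assumed to be a function from
(input, position) to (value, new position), or None on failure. The point is
that the result of the left-hand parser selects the parser used for the rest
of the input, which is why bind reaches beyond context-free rules.

```python
def bind(p, f):
    """Run p, then pass its result to f, which returns the parser for the rest."""
    def run(s, i):
        r = p(s, i)
        if r is None:
            return None
        value, j = r
        return f(value)(s, j)
    return run


def digit(s, i):
    """Parse one ASCII decimal digit and return it as an int."""
    if i < len(s) and s[i] in "0123456789":
        return int(s[i]), i + 1
    return None


def take(n):
    """Build a parser that consumes exactly n characters."""
    def run(s, i):
        if i + n <= len(s):
            return s[i:i + n], i + n
        return None
    return run


# Length-prefixed string: the digit parsed on the left decides how many
# characters the parser on the right consumes.
length_prefixed = bind(digit, take)
```

Here length_prefixed("3abcdef", 0) consumes the "3" and then exactly three
more characters, while an input whose payload is shorter than the prefix
claims is rejected.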
___
langsec-discuss mailing list
langsec-discuss@mail.langsec.org
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-11-03 Thread Tony Arcieri
On Thu, Nov 3, 2016 at 2:46 AM, Sven M. Hallberg  wrote:

> > {"dialpad:A": [["1","2","3"], ["4","5","6"], ["7","8","9"]]}
>
> Now this looks definitely context-sensitive. One nested structure on the
> right of the ':' depending on another to the left. You can no longer get
> away with a grammar but you'll have all the fun of a type system.
>

The grammar is certainly still context-free:

 ::=   
 ::= '"' * ':'  '"'
 ::=  | 
 ::=  '<'  '>'
 ::=  *
 ::=  *
 ::=  | 
 ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' |
  'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' |
  'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' |
  'Y' | 'Z'
 ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' |
  'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' |
  'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' |
  'y' | 'z'
 ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
 ::=  | 

But, as you noted, this does add a sort of type system to the language,
such that it's now possible to express documents which don't typecheck.

I agree this makes the format more complicated, but it does make the format
more amenable to mapping to statically typed programming languages. Also,
it's a rather simple type system, and one that can typecheck things in the
same pass as processing them (I believe; I have yet to implement it).
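
A rough sketch of checking tags while decoding, in Python. Everything here
is an assumption rather than the TJSON specification: tags are taken to
follow the last ':' in a member name, "s" is a string, "i" is an integer
written as a decimal string, "A<...>" is an array whose elements all match
the inner tag, and "O" is an object whose members carry their own tags. The
function names are hypothetical. The check runs inside json.loads through
its object_pairs_hook, so each object is checked as soon as the decoder
finishes it.

```python
import json
import re

TAG = re.compile(r"([A-Za-z][A-Za-z0-9]*)(?:<(.*)>)?")
INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)")


def parse_tag(text):
    """Turn 'A<A<s>>' into ('A', ('A', ('s', None)))."""
    m = TAG.fullmatch(text)
    if m is None:
        raise TypeError(f"malformed tag: {text!r}")
    name, inner = m.groups()
    return name, (parse_tag(inner) if inner is not None else None)


def check(value, tag):
    """Raise TypeError unless value matches the parsed tag."""
    name, inner = tag
    if name == "s" and inner is None:
        ok = isinstance(value, str)
    elif name == "i" and inner is None:
        ok = isinstance(value, str) and INTEGER.fullmatch(value) is not None
    elif name == "O" and inner is None:
        ok = isinstance(value, dict)  # its members were checked by the hook
    elif name == "A" and inner is not None:
        ok = isinstance(value, list)
        for item in (value if ok else []):
            check(item, inner)
    else:
        raise TypeError(f"unsupported tag: {name!r}")
    if not ok:
        raise TypeError(f"value {value!r} does not match tag {name!r}")


def typed_object(pairs):
    """object_pairs_hook: check every member against the tag in its name."""
    obj = {}
    for key, value in pairs:
        name, sep, tag = key.rpartition(":")
        if not sep:
            raise TypeError(f"untagged member name: {key!r}")
        if name in obj:
            raise TypeError(f"duplicate member name: {name!r}")
        check(value, parse_tag(tag))
        obj[name] = value
    return obj


def loads(text):
    """Decode a document, rejecting it if any member fails its tag."""
    doc = json.loads(text, object_pairs_hook=typed_object)
    if not isinstance(doc, dict):
        raise TypeError("top-level value must be an object")
    return doc
```

Stripping the tag from the returned member names is a design choice of this
sketch, not something the thread prescribes.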

> Wait a minute, why are you stopping at objects with the type
> refinement? Shouldn't you put your entire schema into the type?
>

Objects are self-describing product types, so no further type information is
necessary.

-- 
Tony Arcieri


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-11-03 Thread Sven M. Hallberg
Tony Arcieri  on Wed, Nov 02 2016:
> {"foo:s": "bar"}

Suddenly your grammar for the value depends on a piece of information
inside the key...


> This means the only type allowed for member names is a string (which seems
> fine to me).

This one I would actually suggest for consideration in the original
form. {"s:x": "s:foo", "s:y": "s:bar", "s:z": "s:baz"} just seems kind
of silly.


> {"dialpad:A": [["1","2","3"], ["4","5","6"], ["7","8","9"]]}

Now this looks definitely context-sensitive. One nested structure on the
right of the ':' depending on another to the left. You can no longer get
away with a grammar but you'll have all the fun of a type system.

Also, I'm sure some will want their heterogeneous lists back.


> {"myobjects:A": [{"foo:i":"1"},{"bar:i":"2"},{"baz:i":"3"}]}

Wait a minute, why are you stopping at objects with the type
refinement? Shouldn't you put your entire schema into the type?

Obviously, that's half ironic/rhetorical, but it seems clear that this
scheme is a complication of the original, so not a Pareto-efficient
improvement.


-pesco


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-10-26 Thread Jeffrey Goldberg
On Oct 26, 2016, at 11:34 AM, Tony Arcieri  wrote:

> On Tue, Oct 25, 2016 at 11:16 PM, Jeffrey Goldberg  
> wrote:
> > If the UTF8 strings aren't normalized, you will get different hashes for 
> > visually and semantically identical strings. 
> 
> Unicode normalization is presently an optional flag in objecthash,

I noticed that only after sending my message.

> but should be on by default (I think?) and supported by all implementations.

Yep.

Again, thanks for getting this started. This has been something I’ve been 
concerned about for a while, but not sufficiently concerned about to actually 
act on.

Cheers,

-j


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-10-26 Thread Tony Arcieri
Some serendipitous timing here: I just saw this article, "Parsing JSON is a
Minefield":

http://seriot.ch/parsing_json.html

It shows how dramatically differently various parsers handle various types
of malformed JSON. One of the things I've been trying to put together in
TJSON is a comprehensive set of test cases that ensure conforming parsers
have the same behavior:

https://github.com/tjson/tjson-spec/blob/master/draft-tjson-examples.txt

-- 
Tony Arcieri


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-10-26 Thread Sven M. Hallberg

On Oct 25, 2016, at 5:15 PM, Tony Arcieri  wrote:
> https://www.tjson.org/

Neat!

You describe the generic form of ":..." in BNF, but you can also
describe all your higher-level requirements in the grammar. Are there
plans to produce a fully grammatical specification?

You make a point of the language being a subset of JSON which "can be
understood by existing JSON parsers". A grammar for the subset is needed
to perform proper recognition before processing by a generic JSON
parser.


Jeffrey Goldberg  on Wed, Oct 26 2016:
> If the UTF8 strings aren't normalized, you will get different hashes
> for visually and semantically identical strings.

Along the same lines, beware of surrogate pairs escape-encoded in the
string. E.g.:

  "s:\u\u"

Here is the relevant piece of ABNF I once wrote for my JSON-like pet
project^1:

  esc-unicode = u (u-basic / u-surro)

  u-surro     = u-surro-hi backslash u u-surro-lo
  u-basic     = (r0C / rEF) hexdig hexdig hexdig   ; not D...
              / dD r07 hexdig hexdig               ; D[0-7]..
  u-surro-hi  = dD r8B hexdig hexdig               ; D[8-B]..
  u-surro-lo  = dD rCF hexdig hexdig               ; D[C-F]..

  ; hex ranges
  r0C = %x30-39 / %x41-43 / %x61-63   ; 0-9 A B C
  rEF = %x45-46 / %x65-66             ; E F
  r07 = %x30-37                       ; 0-7
  r8B = %x38-39 / %x41-42 / %x61-62   ; 8 9 A B
  rCF = %x43-46 / %x63-66             ; C D E F

  u  = %x75                           ; u
  dD = %x44 / %x64                    ; d D
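
The same rule can be sketched as a Python regular expression. This is a
transliteration under assumptions, not part of the project linked below:
the pattern includes the leading backslash that the ABNF handles outside
esc-unicode, and the function name unicode_escapes_ok is invented. It
accepts a Unicode escape only if it is a non-surrogate code unit or a high
surrogate immediately followed by a low surrogate.

```python
import re

# Non-surrogate escape: first hex digit is not D, or it is D followed by 0-7.
U_BASIC = r"\\u(?:[0-9A-CEFa-cef][0-9A-Fa-f]{3}|[Dd][0-7][0-9A-Fa-f]{2})"
# High surrogate (D800-DBFF) directly followed by a low surrogate (DC00-DFFF).
U_SURRO = r"\\u[Dd][89ABab][0-9A-Fa-f]{2}\\u[Dd][C-Fc-f][0-9A-Fa-f]{2}"
ESC_UNICODE = re.compile(f"(?:{U_BASIC}|{U_SURRO})")


def unicode_escapes_ok(raw):
    """Check the Unicode escapes in the raw text between a JSON string's quotes.

    Only Unicode escapes are validated; other escapes are skipped, not checked.
    """
    i = 0
    while i < len(raw):
        if raw[i] != "\\":
            i += 1
        elif raw.startswith("\\u", i):
            m = ESC_UNICODE.match(raw, i)
            if m is None:
                return False  # malformed escape or unpaired surrogate
            i = m.end()
        else:
            i += 2  # some other two-character escape, such as an escaped backslash
    return True
```

A check like this has to run on the raw text before a generic decoder sees
it: Python's json module, for one, accepts an unpaired surrogate escape and
returns a str that cannot be encoded as UTF-8.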

^1: http://khjk.org/log/2012/jun/datalang.html


-pesco


Re: [langsec-discuss] TJSON: Tagged JSON with Rich Types

2016-10-26 Thread Jeffrey Goldberg
On Oct 25, 2016, at 5:15 PM, Tony Arcieri  wrote:

> I wanted to give LANGSEC a sneak peek of a project I've been working on with 
> Ben Laurie before circulating it more widely:
> 
> https://www.tjson.org/

Thank you! I have wanted something like this to exist. 

> If there are any other notable problems you think should be addressed, I'd be 
> curious to hear them.

If the UTF8 strings aren't normalized, you will get different hashes for 
visually and semantically identical strings. 
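
A small Python illustration of the problem. Plain SHA-256 over the UTF-8
bytes stands in for the real hashing scheme here, and the choice of NFC is
an assumption made for the example; the helper name digest is invented.

```python
import hashlib
import unicodedata

composed = "\u00e9"      # e with acute accent as one code point (U+00E9)
decomposed = "e\u0301"   # e followed by a combining acute accent (U+0301)


def digest(s, normalize=False):
    """SHA-256 hex digest of the UTF-8 bytes of s, optionally after NFC."""
    if normalize:
        s = unicodedata.normalize("NFC", s)
    return hashlib.sha256(s.encode("utf-8")).hexdigest()
```

The two strings render identically, yet digest(composed) and
digest(decomposed) differ because the underlying bytes differ. With
normalize=True both calls hash the same byte sequence and agree.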

Cheers,

-j