Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-06-22 Thread Christopher Allan Webber
Mark H Weaver writes:

> Hi Chris,
>
> I'm terribly sorry for the long delay on this.  For better or worse,
> I've become extremely concerned about computer security, and so I feel a
> heavy responsibility to be extra careful about code that is expected to
> parse hostile data.

No worries.  I also know you've been working hard to keep us secure in
your work on Guix and etc, and all that work is valued / appreciated!

> I was also looking for a cleaner way to express this parser, and to add
> better error reporting, while allowing flexibility for users to
> customize the Scheme representation.  Toward these ends, I ended up
> re-implementing the parser from scratch.
>
> I've attached my current draft of the new parser.  By default, JSON
> objects are represented as (@ . ) where  has dotted pairs,
> but it's easy to ask for your preferred two-element lists using
> 'make-json-parser'.

I like the interface to make-json-reader.  It looks like it would work
well by default, but it's also obvious to me how to extend it.  It also
looks far less disasterous than my attempt to be flexible on using
fashes or s-exps via an overdose of if clauses. ;)  Looks good!

> The json writer is mostly okay, but it also needs to be generalized to
> support the customizable representation (and maybe I went too far here
> with the parser, dunno).

I don't see the writer included below?

> Also, there are a few cases where it will generate invalid JSON,
> notably: if ASCII control characters are present in strings, hex
> escapes must be printed instead of the raw character, and real numbers
> cannot simply be printed using 'display' because of infinities and
> NaNs.
>
> Are you okay with this general direction?

I'm good with it!  It's a bit more complicated to read than David's
original design but it looks well formed (and I followed it well still),
and I'm confident in your reasoning behind its design.  I feel like the
flexibility of plugging in different datastructures as needed is a big
win, anyway.

Speaking of security issues, I've wondered before about someone giving a
json structure so large or deeply nested that it's hard to fit in memory
(maybe a structure that even just looks like "[[...").
Maybe a recursive limit, or some other type of limit, would be useful?
but I imagine that could complexify the design considerably, and maybe
it's not worth it.

I tested this on a few examples from
https://www.w3.org/TR/activitystreams-vocabulary/
and they all seemed to work fine.

Thanks for all your hard work on this!  Looking forward to seeing this land!

 - Chris

PS:

> ;; XXX Consider using conditions and exceptions from SRFI-34 + SRFI-35
> ;; or R6RS (they should be merged!)

Yeah I've wondered about this... I've come to the point where I needed
the kind of condition subtyping that srfi-35 and r6rs conditions
provide, but I'm not sure which one to use.  Advice?



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-06-20 Thread Mark H Weaver
I wrote:
> and real numbers cannot simply be printed using 'display' because of
> infinities and NaNs.

... and exact rationals.

  Mark



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-06-20 Thread Mark H Weaver
I wrote:

> I was also looking for a cleaner way to express this parser, and to add
> better error reporting, while allowing flexibility for users to
> customize the Scheme representation.

I forgot to mention that another goal was to minimize heap allocations,
e.g. eliminating the allocation of intermediate lists of characters or
digits.

>   (define (read-array port)
> "Read a JSON array from PORT and return the result of calling
> ARRAY-FINALIZE on the final seed produced using ARRAY-KNIL and
> ARRAY-KONS."
> (match-next* port
>   (#\[ (match-next* port
>  (#\] array-knil)
>  (else (let ((seed (read-array-elements array-knil port)))
>  (match-next* port
>(#\] (array-finalize seed)

Sorry, I forgot to apply 'array-finalize' in the empty-array case.
Here's a corrected version:

  (define (read-array port)
"Read a JSON array from PORT and return the result of calling
ARRAY-FINALIZE on the final seed produced using ARRAY-KNIL and
ARRAY-KONS."
(array-finalize
 (match-next* port
   (#\[ (match-next* port
  (#\] array-knil)
  (else (let ((seed (read-array-elements array-knil port)))
  (match-next* port
(#\] seed)

  Mark



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-06-20 Thread Mark H Weaver
Hi Chris,

I'm terribly sorry for the long delay on this.  For better or worse,
I've become extremely concerned about computer security, and so I feel a
heavy responsibility to be extra careful about code that is expected to
parse hostile data.

I was also looking for a cleaner way to express this parser, and to add
better error reporting, while allowing flexibility for users to
customize the Scheme representation.  Toward these ends, I ended up
re-implementing the parser from scratch.

I've attached my current draft of the new parser.  By default, JSON
objects are represented as (@ . ) where  has dotted pairs,
but it's easy to ask for your preferred two-element lists using
'make-json-parser'.

The json writer is mostly okay, but it also needs to be generalized to
support the customizable representation (and maybe I went too far here
with the parser, dunno).  Also, there are a few cases where it will
generate invalid JSON, notably: if ASCII control characters are present
in strings, hex escapes must be printed instead of the raw character,
and real numbers cannot simply be printed using 'display' because of
infinities and NaNs.

Are you okay with this general direction?

  Mark

(define-module (ice-9 json)
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-11)  ; let-values
  #:export (read-json make-json-reader))

;; XXX Consider using conditions and exceptions from SRFI-34 + SRFI-35
;; or R6RS (they should be merged!)
(define (json-error port message . args)
  (throw 'json-error port (peek-char port) message args))

(define-syntax match-next
  (syntax-rules (else)
"Like 'match' from (ice-9 match), but with a few differences.  The
first operand is PORT, which must be a variable bound to an open input
port.  The value to be matched is the result of (peek-char PORT).  If
a pattern from one of the clauses matches the peeked character (or
eof-object), then (read-char PORT) is implicitly performed before
evaluating the associated body.  If none of the patterns match, then
leave the port position unchanged, and evaluate the body of the final
'else' clause if present, or else raise a JSON error."
((match-next port
   (c body0 body ...)
   ...
   (else ebody0 ebody ...))
 (match (peek-char port)
   (c (read-char port) body0 body ...)
   ...
   (_ ebody0 ebody ...)))

((match-next port
   (c body0 body ...)
   ...)
 (match-next port
   (c body0 body ...)
   ...
   (else (json-error port "expected" '(c ...)))

(define-syntax match-next*
  (syntax-rules (else)
"Like 'match-next', but with an implicit final clause that consumes
JSON whitespace characters and tries again.  Assuming that none of the
explicit patterns match JSON whitespace, this is equivalent to
consuming JSON whitespace before 'match-next', but it is more
efficient when JSON whitespace is not present."
((match-next* port
   (c body0 body ...)
   ...
   (else ebody0 ebody ...))
 (let loop ()
   (match-next port
 (c body0 body ...) ...
 ((or #\space #\tab #\newline #\return)
  (loop))
 (else ebody0 ebody ...

((match-next* port
   (c body0 body ...)
   ...)
 (match-next* port
   (c body0 body ...)
   ...
   (else (json-error port "expected" '(c ...)))

(define-syntax read-literal
  (syntax-rules ()
"Read the given characters C C* ... from PORT and return VALUE.
Raise a JSON error if the expected characters are not found."
((read-literal port value c c* ...)
 (match-next port
   (c (read-literal port value c* ...))
   (else (json-error port "expected" c
 "while reading" 'value
((read-literal port value)
 value)))

(define* (make-json-reader
  #:key
  (true   #t)
  (false  #f)
  (null   'null)
  (make-string(lambda (s) s))
  (make-number(lambda (x) x))
  (obj-knil   '())
  (obj-kons   (lambda (k v seed) (cons (cons k v) seed)))
  (obj-finalize   (lambda (seed) (cons '@ (reverse! seed
  (array-knil '())
  (array-kons cons)
  (array-finalize reverse!))

  ;; ws = *( %x20 /  ; Space
  ;; %x09 /  ; Horizontal tab
  ;; %x0A /  ; Line feed or New line
  ;; %x0D )  ; Carriage return
  (define (consume-whitespace port)
"Consume zero or more JSON whitespace characters (#\\space #\\tab
#\\newline or #\\return) from PORT."
(match-next* port
  (else #t)))

  ;; DIGIT
  (define (try-read-digit port)
"Peeks at the next character to be read from PORT.  If it's a
decimal digit, then read it and return its value.  Otherwise, leave
the port position unchanged and return false."
(match-next port
  (#\0 0)
  (#\1 1)
  (#\2 2)
  (#\3 3)
  (#\4 4)
  (#\5 5)
  (#\6 6)
  (#\7 7)
  

Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-05-15 Thread Christopher Allan Webber
Mark H Weaver writes:

> I wrote:
>> Most of the modifications you've made are good, but I'm very
>> uncomfortable with the use of #nil in this API.  [...]
>
> Christopher Allan Webber  writes:
>> Oh!  No you got it backwards, the library *was* using #nil initially,
>> and I modified it to use 'null now instead. :)
>
> Ah, my mistake.  Excellent!
>
> Having now looked more closely, I'm mostly happy with the API, except
> for one issue: I don't like the way fash support was hacked in, with the
> 'use-fash' flag and the (if use-fash [fash-code] [alist-code]) sprinkled
> around.  If this truly needs to be done within the json library itself
> (which I wonder), then I'd prefer to generalize it to support any
> dictionary data structure, and thereby remove the dependency on fashes.

I agree that it's pretty hacky.  Allowing other dictionary structures is
fine by me.

> My main concern about fashes, besides the fact that Andy hasn't yet
> proposed adding them to Guile himself, is that the implementation is
> very complex, and I'd like to achieve some degree of confidence in its
> correctness before adding it.  I'd also tend to favor adding a simpler,
> truly immutable dictionary data structure based on Phil Bagwell's HAMTs
> (Hash Array Mapped Tries) to eliminate the need for thread
> synchronization, but I'm open to suggestions.

I don't really understand enough of the field to really know what the
right direction is.  I do know that I need something that's not O(n) for
json-ld processing, though I guess one option always would have been to
read in the sexp structure and transform it before doing all that
processing.   I've long wanted a better immutable dictionary
structure in Guile though, but am open to what it would be.

> Anyway, since writing my previous message in this thread, I've started
> carefully reviewing the code, making modifications as I go.  At this
> point, my proposed modifications have become quite extensive.  So far,
> I've reworked the code to greatly reduce heap allocations, support
> arbitrary dictionary types (removing the fash dependency, while still
> allowing its use), and fix various bugs (e.g. relying on unspecified
> evaluation order, failure to handle 12-character hex escapes properly,
> producing and accepting invalid JSON in some cases, etc).
>
> I'll followup with another message when I've completed my proposed
> revisions.  Feel free to ping me if it takes more than a week.

Wow, exciting!

>>> Otherwise, I'm generally in favor of incorporating this library into
>>> Guile, after we make sure that it is robust against malicious inputs.
>>
>> Okay, cool!  The other thing is to add more specific error messages, as
>> discussed.
>
> Indeed, better error messages would be a good thing.
>
>> What examples of malicious inputs should we test against?
>
> I'm mostly trying to address that by careful code review.

Yay!  Thank you for doing it.



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-05-15 Thread Mark H Weaver
I wrote:
> Most of the modifications you've made are good, but I'm very
> uncomfortable with the use of #nil in this API.  [...]

Christopher Allan Webber  writes:
> Oh!  No you got it backwards, the library *was* using #nil initially,
> and I modified it to use 'null now instead. :)

Ah, my mistake.  Excellent!

Having now looked more closely, I'm mostly happy with the API, except
for one issue: I don't like the way fash support was hacked in, with the
'use-fash' flag and the (if use-fash [fash-code] [alist-code]) sprinkled
around.  If this truly needs to be done within the json library itself
(which I wonder), then I'd prefer to generalize it to support any
dictionary data structure, and thereby remove the dependency on fashes.

My main concern about fashes, besides the fact that Andy hasn't yet
proposed adding them to Guile himself, is that the implementation is
very complex, and I'd like to achieve some degree of confidence in its
correctness before adding it.  I'd also tend to favor adding a simpler,
truly immutable dictionary data structure based on Phil Bagwell's HAMTs
(Hash Array Mapped Tries) to eliminate the need for thread
synchronization, but I'm open to suggestions.

Anyway, since writing my previous message in this thread, I've started
carefully reviewing the code, making modifications as I go.  At this
point, my proposed modifications have become quite extensive.  So far,
I've reworked the code to greatly reduce heap allocations, support
arbitrary dictionary types (removing the fash dependency, while still
allowing its use), and fix various bugs (e.g. relying on unspecified
evaluation order, failure to handle 12-character hex escapes properly,
producing and accepting invalid JSON in some cases, etc).

I'll followup with another message when I've completed my proposed
revisions.  Feel free to ping me if it takes more than a week.

>> Otherwise, I'm generally in favor of incorporating this library into
>> Guile, after we make sure that it is robust against malicious inputs.
>
> Okay, cool!  The other thing is to add more specific error messages, as
> discussed.

Indeed, better error messages would be a good thing.

> What examples of malicious inputs should we test against?

I'm mostly trying to address that by careful code review.

 Regards,
   Mark



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-05-13 Thread Christopher Allan Webber
Mark H Weaver writes:

> Hi Chris,
>
> Christopher Allan Webber  writes:
>> So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
>> A few changes were requested, so it hadn't made it in.  In the meanwhile
>> I began using it for a number of projects.  I also added some
>> modifications and extensions: #nil became 'null for the representation
>> of null values,
>
> Most of the modifications you've made are good, but I'm very
> uncomfortable with the use of #nil in this API.  #nil is a terrible hack
> which may not even be adequate for its intended use case.  Its existence
> in any data structure is likely to cause misbehavior in other Scheme
> code that is exposed to it, because it violates a longstanding fact in
> Scheme that there is only one value that is treated as "false".  It
> would also make it difficult or impossible to port this library, and
> thus anything that depends on this library, to other Scheme systems.  We
> should not promote its use by incorporating it into new APIs.
>
> What do you think?

Oh!  No you got it backwards, the library *was* using #nil initially,
and I modified it to use 'null now instead. :)

So I think you'd be probably pretty happy!

> Otherwise, I'm generally in favor of incorporating this library into
> Guile, after we make sure that it is robust against malicious inputs.

Okay, cool!  The other thing is to add more specific error messages, as
discussed.

What examples of malicious inputs should we test against?



Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-05-12 Thread Mark H Weaver
Hi Chris,

Christopher Allan Webber  writes:
> So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
> A few changes were requested, so it hadn't made it in.  In the meanwhile
> I began using it for a number of projects.  I also added some
> modifications and extensions: #nil became 'null for the representation
> of null values,

Most of the modifications you've made are good, but I'm very
uncomfortable with the use of #nil in this API.  #nil is a terrible hack
which may not even be adequate for its intended use case.  Its existence
in any data structure is likely to cause misbehavior in other Scheme
code that is exposed to it, because it violates a longstanding fact in
Scheme that there is only one value that is treated as "false".  It
would also make it difficult or impossible to port this library, and
thus anything that depends on this library, to other Scheme systems.  We
should not promote its use by incorporating it into new APIs.

What do you think?

Otherwise, I'm generally in favor of incorporating this library into
Guile, after we make sure that it is robust against malicious inputs.

Regards,
  Mark



Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?

2017-05-08 Thread Christopher Allan Webber
Hello!

So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
A few changes were requested, so it hadn't made it in.  In the meanwhile
I began using it for a number of projects.  I also added some
modifications and extensions: #nil became 'null for the representation
of null values, the representation got a bit easier to read for deeply
nested lists-of-dicts-of-lists-of-dicts (moved from '(@ (key . val)) to
'(@ (key val)) after some discussion with David Thompson), I added a
pretty printer, and I also added fash support, snarfing fash.scm from
Andy Wingo (I had a number of cases where I had json documents with a
*lot* of key / value pairs and I was operating on them, and being able
to read/write from a constant time datastructure as an option was
needed).

I released this as an independed library, which is even now packaged in
Guix as guile-sjson.

However, I wonder if we should package this in Guile proper.  There is
still one issue to resolve iirc, I should add more specific exception
names.

The biggest problem to me seems that we would also want to include
Wingo's fash.scm in Guile proper.  Personally, I think this would be a
big win: highly performant immutable hashmaps are desirable (and though
we have vhashes, setting an existing value keeps the old value, and are
not as fast as fashes in my experience).

So:
 - Are Guile's developers open to having an (ice-9 fash) module?
 - And should I submit (ice-9 json), with my changes?

Thanks!
 - Chris