Re: [racket-users] How to write string-match?

2016-12-20 Thread Alex Knauth

> On Dec 20, 2016, at 3:45 PM, Alexis King  wrote:
> 
>> On Dec 20, 2016, at 07:54, Alex Knauth  wrote:
>> 
>> Oooh, that's pretty cool. Much better than my super-slow attempt.
>> 
>> Should you make this into a package (I would certainly use it a lot)
>> or would it make more sense to add in a pull request to the existing
>> `match-string` package?
> 
> I probably won’t take the time to turn it into a package, if only
> because I’m not sure I’d really want that behavior for anything “real”;
> I find that most of the time, when using regular expressions, (.*) is
> not what you want. In this very example, the code invokes string->number
> on one of the captured strings, which implies the pattern your actually
> want is:
> 
>  ^(.*)--(\d+) (.*) end$
> 
> In reality, my experience is that you often want to use negated
> character ranges instead of “.”, so the first capture group would be
> ([^-]*) instead.

This could be expressed with a restricted syntax of `and` patterns. For example 
`(and str-pat id)` would turn into a group containing a regexp-ified str-pat 
but bind the result of the group to the id.

If you make a pull request to add this as a separate module to my 
`match-string` package, I can work to try to combine the two in ways like this.

> At this point, in order to use regular expressions and
> have the constraints affect parsing, you need to compile the constraints
> into the patterns themselves, not apply them post-hoc. For that reason,
> you’d probably effectively want a s-expression regular expression
> syntax.

Would this be closer to what we would want then?
https://docs.racket-lang.org/automata/index.html#%28part._re%29 


It allows regular expressions to be written on lists of tokens instead of just 
strings, and it allows them to be combined with s-expression operations like 
complement, union, intersection, and star.

Alex Knauth

> Personally, I’d just write this:
> 
>[(regexp #px"^([^-]*)--(\\d+) (.*) end$"
> (list _ a b c))
>  ...]
> 
> If you want to add something like it to your match-string package for
> simple cases like this one, though, I won’t have any problem with that,
> of course. I’ve just found that, most of the time, once I have something
> that gets too complicated to read as an inline regular expression, it’s
> easier for me to switch to full-fledged parser combinators rather than
> try and prettify regexps.
> 
> (Admittedly, though, this has made me realize I should probably enhance
> megaparsack to cooperate with match in some way.)

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Alexis King
> On Dec 20, 2016, at 07:54, Alex Knauth  wrote:
> 
> Oooh, that's pretty cool. Much better than my super-slow attempt.
> 
> Should you make this into a package (I would certainly use it a lot)
> or would it make more sense to add in a pull request to the existing
> `match-string` package?

I probably won’t take the time to turn it into a package, if only
because I’m not sure I’d really want that behavior for anything “real”;
I find that most of the time, when using regular expressions, (.*) is
not what you want. In this very example, the code invokes string->number
on one of the captured strings, which implies the pattern your actually
want is:

  ^(.*)--(\d+) (.*) end$

In reality, my experience is that you often want to use negated
character ranges instead of “.”, so the first capture group would be
([^-]*) instead. At this point, in order to use regular expressions and
have the constraints affect parsing, you need to compile the constraints
into the patterns themselves, not apply them post-hoc. For that reason,
you’d probably effectively want a s-expression regular expression
syntax. Personally, I’d just write this:

[(regexp #px"^([^-]*)--(\\d+) (.*) end$"
 (list _ a b c))
  ...]

If you want to add something like it to your match-string package for
simple cases like this one, though, I won’t have any problem with that,
of course. I’ve just found that, most of the time, once I have something
that gets too complicated to read as an inline regular expression, it’s
easier for me to switch to full-fledged parser combinators rather than
try and prettify regexps.

(Admittedly, though, this has made me realize I should probably enhance
megaparsack to cooperate with match in some way.)

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Daniel Prager
Thanks Alexis!

That's such an elegant solution. I'm a little in awe of all the pieces that
you pulled together to make it work.

Fantastic!

In terms of surface syntax, I think that your straight extension to match
is preferable to what I asked for, allowing this sort of thing

  (match s
[(str a "--" b " " c " end") (list (string->symbol a) (string->number
b) c)]
[(str "the " object " is " adjective) (~a adjective " " object)]
[(str "what (" a ") the [" b "] was that?") (list a b)]
[(str "whole-string") 'whole]
[else 'no-match-found]))

On the greediness -- I agree that just allowing regex to do its thing makes
sense -- I gave that restriction with the aim of simplifying the
specification through constraint, but as it turns out it's a hindrance.

I tried several other test cases to try to trip it up, but it's pretty darn
good - nice use of regep-quote, BTW!

When you try patterns like (str "a" "b" c d), for example, its behavior
seems to be the most sensible, although a constraint forcing alternating
strings and ids would arguably make sense.

* * *

It would be great to have this available as a package.

I installed and tried out Alex's match-string package and it does the job,
but as Alex warned it is  "super-slow". Could it be completely
re-engineered to use regexes for performance?

Dan

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Alex Knauth

> On Dec 20, 2016, at 4:35 AM, Alexis King  wrote:
> 
> One relatively easy solution would be to just compile patterns to
> regular expressions and use Racket’s built-in match form. Writing this
> as a match-expander is fairly straightforward:
> 
>  #lang racket
> 
>  (require (for-syntax racket/string
>   syntax/parse/experimental/template)
>   syntax/parse/define)
> 
>  (begin-for-syntax
>(define-syntax-class svp
>  #:attributes [id regex-str]
>  [pattern id:id
>   #:attr regex-str "(.*)"]
>  [pattern str:str
>   #:attr regex-str (regexp-quote (syntax-e #'str))
>   #:attr id #f]))
> 
>  (define-match-expander str
>(syntax-parser
>  [(_ svp:svp ...)
>   #:do [(define regex-strs
>   (string-append* (attribute svp.regex-str)))]
>   #:with regex (regexp (string-append "^" regex-strs "$"))
>   (template (regexp 'regex (list _ (?? svp.id) ...)))]))
> 
> This will compile string-match patterns into regular expressions by
> converting identifiers into capturing groups, (.*), and converting
> string literals into patterns that match themselves using regexp-quote.


Oooh, that's pretty cool. Much better than my super-slow attempt.

Should you make this into a package (I would certainly use it a lot) or would 
it make more sense to add in a pull request to the existing `match-string` 
package?


> This match expander can be used pretty easily as-is, but if you want to
> use your string-match syntax directly, that’s just another simple macro
> away:
> 
>  (define-syntax-parser string-match
>#:literals [else]
>[(_ input:expr
>[(svp:svp ...) body:expr ...] ...
>{~optional [else else-body ...]})
> (template (match input
> [(str svp ...) body ...] ...
> (?? [_ else-body ...])))])
> 
> This handles your syntax, and ensures an else clause is both optional
> and can only occur at the end. For all of the above code, you could
> improve syntax-parse’s error reporting by adding some description
> annotations, but I’ve left them off for brevity.
> 
> With the above definitions, your sm function is valid syntax, and it
> works. However, the output is different from your expected output for
> the final case, which matches the first case, producing the following
> output:
> 
>> (sm "abc--123--456 bar end")
>  '(abc--123 456 "bar")
> 
> I think this behavior is correct, since that input does match the
> pattern specified without any further information (and I cannot come up
> with a useful parsing scheme that would not parse that string given that
> pattern). Greediness doesn’t really matter there, either, since a greedy
> parser would produce the above result (Racket’s regexps are always
> greedy), and a non-greedy parser would still parse, it would just break
> the string into the components "abc", "123--456", and "bar" instead.
> 
> Alexis
> 
>> On Dec 20, 2016, at 00:59, Daniel Prager  wrote:
>> 
>> While working through many of the puzzles in this year's adventofcode.com I 
>> tend to parse the input with a sequence of string-splits.
>> 
>> This isn't too bad, but What I'd *really* like is a "string-match" form to 
>> more elegantly process structured data, via a few strings based on a simple 
>> (and greedy) left-to-right algorithm.
>> 
>> For example:
>> 
>> (define (sm s)
>>  (string-match s
>>[(a "--" b " " c " end") (list (string->symbol a) (string->number b) c)]
>>[("the " object " is " adjective) (~a adjective " " object)] 
>>[("whole-string") 'whole]
>>[else 'no-match-found]))
>> 
>> (sm "abc--123 foo end") -> '(abc 123 "foo")
>> (sm "the fox is black") -> "black fox"
>> (sm "whole-string") -> 'whole
>> (sm "abc--123--456 bar end") -> 'no-match-found ; greedy strategy keeps 
>> things simple and explicit
>> 
>> 
>> But my macro-fu is too weak.
>> 
>> 1. Can someone show me how to write this style of macro?
>> 2. Is this of more general interest?
>> 
>> Dan
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Matthew Butterick

> On Dec 20, 2016, at 12:59 AM, Daniel Prager  wrote:
> 
> This isn't too bad, but What I'd *really* like is a "string-match" form to 
> more elegantly process structured data, via a few strings based on a simple 
> (and greedy) left-to-right algorithm.
> 
> But my macro-fu is too weak.
> 
> 1. Can someone show me how to write this style of macro?
> 2. Is this of more general interest?



IIRC Alex Knauth's `match-string` package does what you want. 

But vanilla syntax patterns will match strings and numbers literally, and thus 
arguably amount to a "simple (and greedy) left-to-right algorithm".

#lang racket
(require (for-syntax racket/string))

(define-syntax (sm stx)
  (syntax-case stx ()
[(_ "whole-string") #''whole]
[(_ str) (with-syntax ([TOKS (string-split (syntax->datum #'str) 
#px"--|\\s")])
   #'(sm . TOKS))]
[(_ X1 X2 X3 "end") #'(list (string->symbol X1) (string->number X2) X3)]
[(_ "the" ANIMAL "is" COLOR) #'(string-join (list COLOR ANIMAL) " ")]
[else #''no-match-found]))

(sm "abc--123 foo end") ;-> '(abc 123 "foo")
(sm "the fox is black") ;-> "black fox"
(sm "whole-string") ;-> 'whole
(sm "abc--123--456 bar end") ;-> 'no-match-found ; greedy strategy keeps things 
simple and explicit


The tokenizer stashed in the second case is a little shameful. If you were 
making a DSL it would be cleaner to put that part in the reader, and then `sm` 
could omit that case.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Jens Axel Søgaard
Excellent idea using a pattern expander.

Alexis didn't give an example:

   > (match "abc--123 foo end"
 [(str a "--" b " " c " end") (list a b c)])
 '("abc" "123" "foo")

/Jens Axel


2016-12-20 10:35 GMT+01:00 Alexis King :

> One relatively easy solution would be to just compile patterns to
> regular expressions and use Racket’s built-in match form. Writing this
> as a match-expander is fairly straightforward:
>
>   #lang racket
>
>   (require (for-syntax racket/string
>syntax/parse/experimental/template)
>syntax/parse/define)
>
>   (begin-for-syntax
> (define-syntax-class svp
>   #:attributes [id regex-str]
>   [pattern id:id
>#:attr regex-str "(.*)"]
>   [pattern str:str
>#:attr regex-str (regexp-quote (syntax-e #'str))
>#:attr id #f]))
>
>   (define-match-expander str
> (syntax-parser
>   [(_ svp:svp ...)
>#:do [(define regex-strs
>(string-append* (attribute svp.regex-str)))]
>#:with regex (regexp (string-append "^" regex-strs "$"))
>(template (regexp 'regex (list _ (?? svp.id) ...)))]))
>
> This will compile string-match patterns into regular expressions by
> converting identifiers into capturing groups, (.*), and converting
> string literals into patterns that match themselves using regexp-quote.
> This match expander can be used pretty easily as-is, but if you want to
> use your string-match syntax directly, that’s just another simple macro
> away:
>
>   (define-syntax-parser string-match
> #:literals [else]
> [(_ input:expr
> [(svp:svp ...) body:expr ...] ...
> {~optional [else else-body ...]})
>  (template (match input
>  [(str svp ...) body ...] ...
>  (?? [_ else-body ...])))])
>
> This handles your syntax, and ensures an else clause is both optional
> and can only occur at the end. For all of the above code, you could
> improve syntax-parse’s error reporting by adding some description
> annotations, but I’ve left them off for brevity.
>
> With the above definitions, your sm function is valid syntax, and it
> works. However, the output is different from your expected output for
> the final case, which matches the first case, producing the following
> output:
>
>   > (sm "abc--123--456 bar end")
>   '(abc--123 456 "bar")
>
> I think this behavior is correct, since that input does match the
> pattern specified without any further information (and I cannot come up
> with a useful parsing scheme that would not parse that string given that
> pattern). Greediness doesn’t really matter there, either, since a greedy
> parser would produce the above result (Racket’s regexps are always
> greedy), and a non-greedy parser would still parse, it would just break
> the string into the components "abc", "123--456", and "bar" instead.
>
> Alexis
>
> > On Dec 20, 2016, at 00:59, Daniel Prager 
> wrote:
> >
> > While working through many of the puzzles in this year's
> adventofcode.com I tend to parse the input with a sequence of
> string-splits.
> >
> > This isn't too bad, but What I'd *really* like is a "string-match" form
> to more elegantly process structured data, via a few strings based on a
> simple (and greedy) left-to-right algorithm.
> >
> > For example:
> >
> > (define (sm s)
> >   (string-match s
> > [(a "--" b " " c " end") (list (string->symbol a) (string->number b)
> c)]
> > [("the " object " is " adjective) (~a adjective " " object)]
> > [("whole-string") 'whole]
> > [else 'no-match-found]))
> >
> > (sm "abc--123 foo end") -> '(abc 123 "foo")
> > (sm "the fox is black") -> "black fox"
> > (sm "whole-string") -> 'whole
> > (sm "abc--123--456 bar end") -> 'no-match-found ; greedy strategy keeps
> things simple and explicit
> >
> >
> > But my macro-fu is too weak.
> >
> > 1. Can someone show me how to write this style of macro?
> > 2. Is this of more general interest?
> >
> > Dan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
-- 
Jens Axel Søgaard

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] How to write string-match?

2016-12-20 Thread Alexis King
One relatively easy solution would be to just compile patterns to
regular expressions and use Racket’s built-in match form. Writing this
as a match-expander is fairly straightforward:

  #lang racket

  (require (for-syntax racket/string
   syntax/parse/experimental/template)
   syntax/parse/define)

  (begin-for-syntax
(define-syntax-class svp
  #:attributes [id regex-str]
  [pattern id:id
   #:attr regex-str "(.*)"]
  [pattern str:str
   #:attr regex-str (regexp-quote (syntax-e #'str))
   #:attr id #f]))

  (define-match-expander str
(syntax-parser
  [(_ svp:svp ...)
   #:do [(define regex-strs
   (string-append* (attribute svp.regex-str)))]
   #:with regex (regexp (string-append "^" regex-strs "$"))
   (template (regexp 'regex (list _ (?? svp.id) ...)))]))

This will compile string-match patterns into regular expressions by
converting identifiers into capturing groups, (.*), and converting
string literals into patterns that match themselves using regexp-quote.
This match expander can be used pretty easily as-is, but if you want to
use your string-match syntax directly, that’s just another simple macro
away:

  (define-syntax-parser string-match
#:literals [else]
[(_ input:expr
[(svp:svp ...) body:expr ...] ...
{~optional [else else-body ...]})
 (template (match input
 [(str svp ...) body ...] ...
 (?? [_ else-body ...])))])

This handles your syntax, and ensures an else clause is both optional
and can only occur at the end. For all of the above code, you could
improve syntax-parse’s error reporting by adding some description
annotations, but I’ve left them off for brevity.

With the above definitions, your sm function is valid syntax, and it
works. However, the output is different from your expected output for
the final case, which matches the first case, producing the following
output:

  > (sm "abc--123--456 bar end")
  '(abc--123 456 "bar")

I think this behavior is correct, since that input does match the
pattern specified without any further information (and I cannot come up
with a useful parsing scheme that would not parse that string given that
pattern). Greediness doesn’t really matter there, either, since a greedy
parser would produce the above result (Racket’s regexps are always
greedy), and a non-greedy parser would still parse, it would just break
the string into the components "abc", "123--456", and "bar" instead.

Alexis

> On Dec 20, 2016, at 00:59, Daniel Prager  wrote:
> 
> While working through many of the puzzles in this year's adventofcode.com I 
> tend to parse the input with a sequence of string-splits.
> 
> This isn't too bad, but What I'd *really* like is a "string-match" form to 
> more elegantly process structured data, via a few strings based on a simple 
> (and greedy) left-to-right algorithm.
> 
> For example:
> 
> (define (sm s)
>   (string-match s
> [(a "--" b " " c " end") (list (string->symbol a) (string->number b) c)]
> [("the " object " is " adjective) (~a adjective " " object)] 
> [("whole-string") 'whole]
> [else 'no-match-found]))
> 
> (sm "abc--123 foo end") -> '(abc 123 "foo")
> (sm "the fox is black") -> "black fox"
> (sm "whole-string") -> 'whole
> (sm "abc--123--456 bar end") -> 'no-match-found ; greedy strategy keeps 
> things simple and explicit
> 
> 
> But my macro-fu is too weak.
> 
> 1. Can someone show me how to write this style of macro?
> 2. Is this of more general interest?
> 
> Dan

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[racket-users] How to write string-match?

2016-12-20 Thread Daniel Prager
While working through many of the puzzles in this year's adventofcode.com I
tend to parse the input with a sequence of string-splits.

This isn't too bad, but What I'd *really* like is a "string-match" form to
more elegantly process structured data, via a few strings based on a simple
(and greedy) left-to-right algorithm.

For example:

(define (sm s)
  (string-match s
[(a "--" b " " c " end") (list (string->symbol a) (string->number b) c)]
[("the " object " is " adjective) (~a adjective " " object)]
[("whole-string") 'whole]
[else 'no-match-found]))

(sm "abc--123 foo end") -> '(abc 123 "foo")
(sm "the fox is black") -> "black fox"
(sm "whole-string") -> 'whole
(sm "abc--123--456 bar end") -> 'no-match-found ; greedy strategy keeps
things simple and explicit



But my macro-fu is too weak.

1. Can someone show me how to write this style of macro?
2. Is this of more general interest?

Dan

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.