Re: [racket-users] Quick regexp question

2018-02-02 Thread nocheroot
It is nice.  I'll have to play with this as well.  Thanks again everyone.

On Friday, February 2, 2018 at 5:49:15 PM UTC-6, johnbclements wrote:
>
> Oh, that’s nice. 
>
> In fact, I’ll tell you what I *really* like about that; it could radically 
> simplify the irritating process of debugging regexps by breaking them in 
> various places to perform a binary search; you could instead provide a nice 
> error message specifying exactly which part of the regexp failed to match. 
>
> One thing to be aware of is that you’d need to make sure that your regexp 
> still works without backtracking. If you broke #px".*abc" into #px".*" and
> #px"abc", it wouldn’t mean the same thing any more.
>
> John 

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] Quick regexp question

2018-02-02 Thread 'John Clements' via Racket Users


> On Feb 2, 2018, at 3:21 PM, Matthew Butterick  wrote:
> 
> 
>> On Feb 2, 2018, at 10:23 AM, 'John Clements' via Racket Users 
>>  wrote:
>> 
>> This macro gets the names in much closer to the corresponding patterns than 
>> matching by index, but it doesn’t actually embed the names into the regexp.
> 
> 
> If you like keeping the names and patterns together, you could also create an 
> association list of the names and subpatterns, and iterate:
> 
> #lang racket
> 
> (define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")
> (with-input-from-string msg
>   (thunk
>    (for/hash ([(name pat) (in-dict '((date . "[-\\dT:]+")
>                                      (username . "\\w+")
>                                      (hostname . "[-\\w\\d]+")
>                                      (ip . "[\\d\\.]+")
>                                      (message . ".+")))])
>      (values name (car (regexp-match (pregexp pat) (current-input-port)))))))

Oh, that’s nice.

In fact, I’ll tell you what I *really* like about that; it could radically 
simplify the irritating process of debugging regexps by breaking them in 
various places to perform a binary search; you could instead provide a nice 
error message specifying exactly which part of the regexp failed to match.
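
A minimal sketch of that idea, assuming a hypothetical helper named match-fields (it is not part of Racket or of the code quoted above). It walks the association list in order and raises an error naming the first subpattern that fails to match, instead of failing on (car #f):

```racket
#lang racket

;; Hypothetical helper: match each named subpattern in order against a
;; string port; raise an error naming the first subpattern that fails.
(define (match-fields fields str)
  (with-input-from-string str
    (thunk
     (for/list ([(name pat) (in-dict fields)])
       (define m (regexp-match (pregexp pat) (current-input-port)))
       (unless m
         (error 'match-fields "subpattern ~a (~s) failed to match" name pat))
       (cons name (car m))))))

;; A shortened field list, for illustration only.
(define fields '((date . "[-\\dT:]+")
                 (username . "\\w+")
                 (ip . "[\\d\\.]+")))

(match-fields fields "2018-02-02T11:26:34 someuser 233.194.20.110")
```

Calling it on an input with no IP address, such as "2018-02-02T11:26:34 someuser", raises an error that mentions ip. Because the subpatterns are unanchored, each one skips over text it cannot match, so this reports a part that never matches, not a part that matches in the wrong place.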

One thing to be aware of is that you’d need to make sure that your regexp still 
works without backtracking. If you broke #px".*abc" into #px".*" and #px"abc",
it wouldn’t mean the same thing any more.
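
A small sketch of the difference, using the made-up input "xxabc" (the names whole and split are only for illustration):

```racket
#lang racket

;; As one regexp, the greedy .* backtracks so that "abc" can still match.
(define whole (regexp-match #px".*abc" "xxabc"))

;; Split in two and run in sequence on a port, the greedy .* consumes
;; all of the input, so nothing is left for "abc" to match.
(define split
  (with-input-from-string "xxabc"
    (thunk
     (list (regexp-match #px".*" (current-input-port))
           (regexp-match #px"abc" (current-input-port))))))
```

Here whole is a successful match on the full string, while the second element of split is #f.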

John

> 
> 
> '#hash((message . #" something broke")
>   (date . #"2018-02-02T11:26:34")
>   (username . #"someuser")
>   (hostname . #"some-computername01")
>   (ip . #"233.194.20.110"))





Re: [racket-users] Quick regexp question

2018-02-02 Thread Matthew Butterick

> On Feb 2, 2018, at 10:23 AM, 'John Clements' via Racket Users 
>  wrote:
> 
> This macro gets the names in much closer to the corresponding patterns than 
> matching by index, but it doesn’t actually embed the names into the regexp.


If you like keeping the names and patterns together, you could also create an 
association list of the names and subpatterns, and iterate:

#lang racket

(define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")
(with-input-from-string msg
  (thunk
   (for/hash ([(name pat) (in-dict '((date . "[-\\dT:]+")
                                     (username . "\\w+")
                                     (hostname . "[-\\w\\d]+")
                                     (ip . "[\\d\\.]+")
                                     (message . ".+")))])
     (values name (car (regexp-match (pregexp pat) (current-input-port)))))))


'#hash((message . #" something broke")
   (date . #"2018-02-02T11:26:34")
   (username . #"someuser")
   (hostname . #"some-computername01")
   (ip . #"233.194.20.110"))
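
The values come back as byte strings because regexp-match was applied to an input port rather than a string, and message keeps its leading space because no subpattern consumed the separator. If strings are preferred, one possible cleanup step looks like this (a sketch only; raw and cleaned are illustrative names, and raw stands in for part of the result above):

```racket
#lang racket

;; Stand-in for part of the hash produced above.
(define raw (hash 'message #" something broke"
                  'ip #"233.194.20.110"))

;; Decode each byte string as UTF-8 and trim surrounding whitespace.
(define cleaned
  (for/hash ([(name val) (in-hash raw)])
    (values name (string-trim (bytes->string/utf-8 val)))))
```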



Re: [racket-users] Quick regexp question

2018-02-02 Thread nocheroot
In the long run this is probably better than what I wanted.  Thank you

On Friday, February 2, 2018 at 12:23:48 PM UTC-6, johnbclements wrote:
>
> Not sure if this gets you as far as you want, but you could use a macro to 
> associate names with paren-wrapped items: 
>
> #lang racket 
>
> (define-syntax re-match
>   (syntax-rules ()
>     [(_ str re name ...)
>      (match str
>        [(regexp re (list _ name ...))
>         (list (list (quote name) name) ...)])]))
>
> (define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")
>
> (re-match msg 
>   #px"^([-\\dT:]+)\\s(\\w+)\\s([-\\w\\d]+)\\s([\\d\\.]+)\\s(.+)$" 
>   date username hostname ip message) 
>
> … produces: 
>
> '((date "2018-02-02T11:26:34") 
>   (username "someuser") 
>   (hostname "some-computername01") 
>   (ip "233.194.20.110") 
>   (message "something broke"))
>
>
> This macro gets the names in much closer to the corresponding patterns 
> than matching by index, but it doesn’t actually embed the names into the 
> regexp. 
>
> John Clements 



Re: [racket-users] Quick regexp question

2018-02-02 Thread 'John Clements' via Racket Users
Not sure if this gets you as far as you want, but you could use a macro to 
associate names with paren-wrapped items:

#lang racket

(define-syntax re-match
  (syntax-rules ()
    [(_ str re name ...)
     (match str
       [(regexp re (list _ name ...))
        (list (list (quote name) name) ...)])]))

(define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")

(re-match msg
  #px"^([-\\dT:]+)\\s(\\w+)\\s([-\\w\\d]+)\\s([\\d\\.]+)\\s(.+)$"
  date username hostname ip message)

… produces:

'((date "2018-02-02T11:26:34")
  (username "someuser")
  (hostname "some-computername01")
  (ip "233.194.20.110")
  (message "something broke"))


This macro gets the names in much closer to the corresponding patterns than 
matching by index, but it doesn’t actually embed the names into the regexp.
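
One way to keep each name next to its subpattern while still matching with a single regexp is to build the regexp from an association list. This is only a sketch: match-named is a hypothetical helper, and it assumes the fields are separated by single whitespace characters and that no subpattern contains capture groups of its own.

```racket
#lang racket

;; Hypothetical helper: build one regexp from (name . subpattern) pairs,
;; wrapping each subpattern in a capture group and joining them with \s.
(define (match-named fields str)
  (define px
    (pregexp (string-append
              "^"
              (string-join (for/list ([f fields]) (format "(~a)" (cdr f)))
                           "\\s")
              "$")))
  (define m (regexp-match px str))
  ;; Pair each name with its captured group, or return #f on no match.
  (and m
       (for/hash ([f fields] [v (cdr m)])
         (values (car f) v))))

(define msg
  "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")

(match-named '((date . "[-\\dT:]+")
               (username . "\\w+")
               (hostname . "[-\\w\\d]+")
               (ip . "[\\d\\.]+")
               (message . ".+"))
             msg)
```

The result is a hash from field names to matched strings, so (hash-ref result 'ip) plays the role of $matches.IP or m['IP'].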

John Clements


> On Feb 2, 2018, at 10:01 AM, nocher...@gmail.com wrote:
> 
> Sorry if I've missed this in the documentation, but I don't see it, and it is 
> starting to bother me.
> 
> In PowerShell, Python, and Splunk I'm able to perform automatic field 
> extraction on strings and access the values of fields by name.  Is there a 
> way to do this in Racket?  Of course, pairing matches with field names by 
> index is an option, but not as convenient in some situations.
> 
> Take string "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 
> something broke" as a trivial example.
> 
> PowerShell:
> "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke" -match "^(?<date>[\d\-T:]+)\s(?<username>\w+)\s(?<hostname>[\w\-\d]+)\s(?<IP>[\d\.]+)\s(?<message>.+)$" | Out-Null
> 
> $matches.date
> $matches.username
> $matches.hostname
> $matches.IP
> $matches.message
> 
> Python:
> import re
> 
> m = re.match(
>     r"^(?P<Date>[\d\-T:]+)\s(?P<Username>\w+)\s(?P<Hostname>[\w\-\d]+)\s(?P<IP>[\d\.]+)\s(?P<Message>.+)$",
>     "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")
> 
> m['Date']
> m['Username']
> m['Hostname']
> m['IP']
> m['Message']
> 
> Both output:
> 
> 2018-02-02T11:26:34
> someuser
> some-computername01
> 233.194.20.110
> something broke





[racket-users] Quick regexp question

2018-02-02 Thread nocheroot
Sorry if I've missed this in the documentation, but I don't see it, and it 
is starting to bother me.

In PowerShell, Python, and Splunk I'm able to perform automatic field 
extraction on strings and access the values of fields by name.  Is there a 
way to do this in Racket?  Of course, pairing matches with field names by 
index is an option, but not as convenient in some situations.

Take the string "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110
something broke" as a trivial example.

PowerShell:
"2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke" -match "^(?<date>[\d\-T:]+)\s(?<username>\w+)\s(?<hostname>[\w\-\d]+)\s(?<IP>[\d\.]+)\s(?<message>.+)$" | Out-Null

$matches.date
$matches.username
$matches.hostname
$matches.IP
$matches.message

Python:
import re

m = re.match(
    r"^(?P<Date>[\d\-T:]+)\s(?P<Username>\w+)\s(?P<Hostname>[\w\-\d]+)\s(?P<IP>[\d\.]+)\s(?P<Message>.+)$",
    "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")

m['Date']
m['Username']
m['Hostname']
m['IP']
m['Message']

Both output:

2018-02-02T11:26:34
someuser
some-computername01
233.194.20.110
something broke
