Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Peter Bex
On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
> > The valid-index? predicate does not return a boolean #t value:
> >
> > #;9> (irregex-match-valid-index? m 3)
> > 0
>
> It returns #t for this in the upstream irregex.

I'll look into that. It's probably a bug introduced by a
Chicken-specific optimization.

> *-valid-index? just states whether the submatch _may_ exist.
>
> We could add a utility irregex-match-matched-index? to test
> if a specific index was successfully matched.

That's a horrible name.  I think we shouldn't need this if
the procedures just returned #f in case of no match.

> An index which could never be a valid submatch should
> arguably always throw an error.

Agreed.

> An index which is valid, but failed to match, could either
> throw an error or return #f.  The -index and -substring
> operations are inconsistent in this respect, so we should
> fix that.

IMHO they all should behave like -substring; return #f if
there was no match.

> It may be good to provide both sets, with a /default version
> analogous to SRFI-69 hash-table-ref and
> hash-table-ref/default:
>
>   (irregex-match-substring m invalid-i)    => error
>   (irregex-match-substring m unmatched-i)  => error
>
>   (irregex-match-substring/default m invalid-i #f)    => error
>   (irregex-match-substring/default m unmatched-i #f)  => #f
>
> Thoughts?

I think this is pointless.  The hash table has a way to specify a
default value because it's possible to have #f as a value in your
hash table, which makes returning #f ambiguous.  That's why there's
a way to specify the default.

However, in case of substring and index operations, the result is
always an integer/a string.  Returning #f is completely unambiguous
in those cases, so I don't see the need to add yet another procedure.
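
To make the hash-table ambiguity concrete, here is a minimal sketch using SRFI-69 on CHICKEN 4. The table `h` and its keys are made up for the example.

```scheme
(use srfi-69)

(define h (make-hash-table))
(hash-table-set! h 'flag #f)            ; a key whose stored value is #f

;; With #f as the default, a present key holding #f and an absent key
;; give the same answer:
(hash-table-ref/default h 'flag #f)     ; => #f (key exists)
(hash-table-ref/default h 'nope #f)     ; => #f (key missing)

;; A distinct default tells the two cases apart:
(hash-table-ref/default h 'flag 'none)  ; => #f
(hash-table-ref/default h 'nope 'none)  ; => none
```

A submatch accessor has no such problem, because a successful result is always a string or an integer and never #f.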

It would be preferable to have this behaviour:

 (irregex-match-substring m invalid-i)      => error
 (irregex-match-substring m unmatched-i)    => #f

 (irregex-match-start-index m invalid-i)    => error
 (irregex-match-start-index m unmatched-i)  => #f
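
As a rough sketch of why a separate /default procedure would add little: with an accessor that returns #f for an unmatched group, a caller can supply a fallback with plain `or`. The regex and input are the ones from Jim's transcript in this thread; the results shown for irregex-match-substring are the ones reported there, and the fallback string is arbitrary.

```scheme
(use irregex)

(define m (irregex-search "(abc)|(def)|(ghi)" "ghi"))

;; Supply a fallback with plain `or`; no /default variant is needed.
(or (irregex-match-substring m 2) "no match")   ; => "no match"
(or (irregex-match-substring m 3) "no match")   ; => "ghi"

;; Or branch on the result with cond's => clause. This assumes the
;; proposed behaviour, where an unmatched but valid index yields #f
;; from irregex-match-start-index instead of raising an error.
(cond ((irregex-match-start-index m 3)
       => (lambda (start) (list 'matched-at start)))
      (else 'unmatched))
```

Note that a start index of 0 is still a true value in Scheme, so the `cond` clause fires for a match at the beginning of the string.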

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music.
-- Donald Knuth

___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Jim Ursetto
On Mon, Oct 11, 2010 at 02:51, Peter Bex peter@xs4all.nl wrote:
> On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
>
> However, in case of substring and index operations, the result is
> always an integer/a string.  Returning #f is completely unambiguous
> in those cases, so I don't see the need to add yet another procedure.
>
> It would be preferable to have this behaviour:
>
>  (irregex-match-substring m invalid-i)    => error
>  (irregex-match-substring m unmatched-i)  => #f
>
>  (irregex-match-start-index m invalid-i)    => error
>  (irregex-match-start-index m unmatched-i)  => #f

I agree with Peter: the /default procedures seem like a needless
abstraction, since returning a totally unambiguous #f is common practice
(SRFI-13 string-index, for example), unless this practice is going to be
deprecated somehow by R7RS.
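
For reference, a short sketch of the SRFI-13 convention mentioned above, on CHICKEN 4. The input string is arbitrary.

```scheme
(use srfi-13)

;; string-index returns the position of the first matching character,
;; or #f when there is none. An index is never #f, so the result is
;; unambiguous.
(string-index "chicken" #\k)   ; => 4
(string-index "chicken" #\z)   ; => #f
```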



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Peter Bex
On Mon, Oct 11, 2010 at 09:51:15AM +0200, Peter Bex wrote:
> > > #;9> (irregex-match-valid-index? m 3)
> > > 0
> >
> > It returns #t for this in the upstream irregex.
>
> I'll look into that. It's probably a bug introduced by a
> Chicken-specific optimization.

Yeah, it was a small oversight in a manual merge of a failed patch hunk
for irregex upstream changeset 9c903144d459.
It has been fixed in experimental 0ea0570b4555c737e35288ba9f43e45b25539913.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-11 Thread Alex Shinn
Jim Ursetto zbignie...@gmail.com writes:

> I agree with Peter, the /default procedures seem like a needless
> abstraction as a totally unambiguous #f is common practice.  For
> example, srfi-13 string-index.

No, in retrospect I'm not sure why I didn't suggest that to
begin with - I think I've been working too much with type
inference lately, which makes such ambiguous return types
undesirable.

-- 
Alex



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-10 Thread Alex Shinn
Jim Ursetto zbignie...@gmail.com writes:

> There is some inconsistency in the docs:
>
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Not quite, *-valid-index? says invalid, not undefined.

*-num-submatches just tells you the total number of
submatches that are defined in the regexp, regardless of
what has been matched, and irregex-match-num-submatches on a
match result will always return the same result as
irregex-num-submatches on the corresponding regexp.
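
A small sketch of that relationship, using the regex from Jim's transcript. The helper `all-submatches` is hypothetical, and it relies on irregex-match-substring returning #f for an unmatched group, which is the behaviour reported in this thread.

```scheme
(use irregex)

(define rx (irregex "(abc)|(def)|(ghi)"))
(define m  (irregex-search rx "ghi"))

(irregex-num-submatches rx)        ; => 3
(irregex-match-num-submatches m)   ; => 3, even though only group 3 matched

;; Walk every defined index, from the last down to 1, and collect what
;; each submatch captured (#f for groups that did not take part).
(define (all-submatches m)         ; hypothetical helper, not part of irregex
  (let loop ((i (irregex-match-num-submatches m)) (acc '()))
    (if (= i 0)
        acc
        (loop (- i 1) (cons (irregex-match-substring m i) acc)))))

(all-submatches m)                 ; => (#f #f "ghi")
```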

> The valid-index? predicate does not return a boolean #t value:
>
> #;9> (irregex-match-valid-index? m 3)
> 0

It returns #t for this in the upstream irregex.

> I prefer the old behavior for consistency because if irregex tells me
> that 3 submatches exist, I expect to be able to access them without an
> exception being thrown.

*-valid-index? just states whether the submatch _may_ exist.

We could add a utility irregex-match-matched-index? to test
if a specific index was successfully matched.

An index which could never be a valid submatch should
arguably always throw an error.

An index which is valid, but failed to match, could either
throw an error or return #f.  The -index and -substring
operations are inconsistent in this respect, so we should
fix that.

It may be good to provide both sets, with a /default version
analogous to SRFI-69 hash-table-ref and
hash-table-ref/default:

  (irregex-match-substring m invalid-i)    => error
  (irregex-match-substring m unmatched-i)  => error

  (irregex-match-substring/default m invalid-i #f)    => error
  (irregex-match-substring/default m unmatched-i #f)  => #f

Thoughts?

-- 
Alex



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Peter Bex
On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> Does this mean for every egg that uses the irregex API directly, I
> need to insert this [cond-expand] blob of code?

You have three options:
- Add a dependency on the regex egg and keep doing
   (require-library regex)(import irregex) like before
- Insert this blob of code to ensure it works with old and new Chickens
- Drop the blob if you don't care about older Chickens.

> There is some inconsistency in the docs:
>
> irregex-match-num-submatches: Returns the number of numbered
> submatches that are defined in the irregex or match object.
> irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
> named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says defined:

Hm, I'll have to take this up with Alex, it looks like a bug indeed.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Peter Bex
On Fri, Oct 08, 2010 at 09:05:10AM +0200, Peter Bex wrote:
> On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> > Does this mean for every egg that uses the irregex API directly, I
> > need to insert this [cond-expand] blob of code?
>
> You have three options:
> - Add a dependency on the regex egg and keep doing
>    (require-library regex)(import irregex) like before

That's not quite true; some of the compatibility code is still necessary
to make up for the changes in the API.

Cheers,
Peter



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Jim Ursetto
On Fri, Oct 8, 2010 at 02:09, Peter Bex peter@xs4all.nl wrote:
> That's not quite true; some of the compatibility code is still necessary
> to make up for the changes in the API.

If that's the case, it means that eggs compiled with 4.6.0 aren't
compatible with those compiled with 4.6.2, because that compatibility
code is selected at compile-time.  It's looking to me more and more
that the binversion should be bumped from 5 to 6 (as much as I dislike
this).

Why can't the compatibility code be included in the new irregex unit?
In other words, the old procedure names and behavior could be
deprecated but left in so that 1) we don't have to add a blob of
compatibility code to every egg, and 2) eggs using the old irregex API
would be compatible with all Chicken versions without rebuilding.
It's not very nice to the end-user to just remove procedures without
going through a deprecation phase.
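
To sketch what I have in mind (this is only an illustration, not the actual contents of irregex.scm): the old names are taken from the renames in Peter's compatibility block, and the dispatching definition of irregex-submatches is my assumption about how the old procedure behaved.

```scheme
;; Hypothetical deprecated aliases that the new irregex unit could export
;; alongside the new names.
(define irregex-match-start irregex-match-start-index)
(define irregex-match-end   irregex-match-end-index)

;; The old irregex-submatches apparently accepted either a regex or a
;; match object (the compatibility block maps both new names onto it),
;; so an alias would have to dispatch on the argument type.
(define (irregex-submatches x)
  (if (irregex-match-data? x)
      (irregex-match-num-submatches x)
      (irregex-num-submatches x)))
```

Plain aliases like these would not restore the old failure behavior (returning #f versus raising an error), so that would still need to be handled separately.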

Thoughts?
Jim



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-08 Thread Jim Ursetto
Eh, let me clarify #2.  Eggs built with 4.6.0 need to be recompiled
with 4.6.2 regardless due to the C_regex_toplevel linking issues.
However, once they are, they would also work with 4.6.0 again, as long
as they stuck to the old irregex API.  I think.  This is pretty
confusing.  Maybe we should bump binversion to 6 after all. :(

On Fri, Oct 8, 2010 at 16:00, Jim Ursetto zbignie...@gmail.com wrote:
> 2) eggs using the old irregex API would be compatible with all Chicken
> versions without rebuilding.



Re: [Chicken-users] Using irregex safely & responsibly

2010-10-08 Thread Felix
From: Jim Ursetto zbignie...@gmail.com
Subject: Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]
Date: Fri, 8 Oct 2010 16:00:05 -0500

> Why can't the compatibility code be included in the new irregex unit?
> In other words, the old procedure names and behavior could be
> deprecated but left in so that 1) we don't have to add a blob of
> compatibility code to every egg, and 2) eggs using the old irregex API
> would be compatible with all Chicken versions without rebuilding.
> It's not very nice to the end-user to just remove procedures without
> going through a deprecation phase.

That's a good idea. This code could be added to irregex.scm without
touching irregex-core.scm (the upstream code).


cheers,
felix



Re: [Chicken-users] Using irregex safely & responsibly [Was: Re: dev-snapshot 4.6.3]

2010-10-07 Thread Jim Ursetto
On Thu, Oct 7, 2010 at 15:53, Peter Bex peter@xs4all.nl wrote:
> In your egg's file, where you would previously use this idiom:
>
> (require-library regex)      ; or (use regex) for the lazy & sloppy
> (import irregex)
>
> you can now replace it with this block (you can delete emulation of
> procedures you are sure you aren't using):
>
> (cond-expand
>   (total-irregex
>    (use irregex))
>   (else
>    (require-library regex)
>    (import (rename irregex
>                    (irregex-match-start irregex-match-start-index)
>                    (irregex-match-end irregex-match-end-index)))
>    (define irregex-num-submatches irregex-submatches)
>    (define irregex-match-num-submatches irregex-submatches)
>    (define (irregex-match-valid-index? m i)
>      (and (irregex-match-start-index m i) #t))
>    (define (maybe-string->sre obj)
>      (if (string? obj) (string->sre obj) obj))))

Does this mean for every egg that uses the irregex API directly, I
need to insert this blob of code?

There is some inconsistency in the docs:

irregex-match-num-submatches: Returns the number of numbered
submatches that are defined in the
irregex or match object.
irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
named submatch or index is defined in the {{match}} object.

But below, *-valid-index? says undefined when *-num-submatches says defined:

#;1> (define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
#;2> (irregex-match-num-submatches m)
3
#;3> (irregex-match-valid-index? m 2)
#f

The valid-index? predicate does not return a boolean #t value:

#;9> (irregex-match-valid-index? m 3)
0
#;9> (irregex-match-substring m 3)
"ghi"

Failure behavior for match-start-index and match-substring is
unspecified in the docs.  The former throws an error and the latter
returns #f:

#;3> (irregex-match-start-index m 2)
Error: (irregex-match-start-index) not a valid index
#<regexp-match (3 submatches)>
2

#;6> (irregex-match-substring m 2)
#f

I prefer the old behavior for consistency because if irregex tells me
that 3 submatches exist, I expect to be able to access them without an
exception being thrown.
