Re: [Chicken-users] Using irregex safely responsibly
On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
> > The valid-index? predicate does not return a boolean #t value:
> >
> >   #;9> (irregex-match-valid-index? m 3)
> >   0
>
> It returns #t for this in the upstream irregex.

I'll look into that. It's probably a bug introduced by a Chicken-specific
optimization.

> *-valid-index? just states whether the submatch _may_ exist. We could add
> a utility irregex-match-matched-index? to test if a specific index was
> successfully matched.

That's a horrible name. I think we shouldn't need this if the procedures
just returned #f in case of no match.

> An index which could never be a valid submatch should arguably always
> throw an error.

Agreed.

> An index which is valid, but failed to match, could either throw an error
> or return #f. The -index and -substring operations are inconsistent in
> this respect, so we should fix that.

IMHO they all should behave like -substring; return #f if there was no
match.

> It may be good to provide both sets, with a /default version analogous to
> SRFI-69 hash-table-ref and hash-table-ref/default:
>
>   (irregex-match-substring m invalid-i)              => error
>   (irregex-match-substring m unmatched-i)            => error
>   (irregex-match-substring/default m invalid-i #f)   => error
>   (irregex-match-substring/default m unmatched-i #f) => #f
>
> Thoughts?

I think this is pointless. The hash table has a way to specify a default
value because it's possible to have #f as a value in your hash table, which
makes returning #f ambiguous. That's why there's a way to specify the
default.

However, in case of substring and index operations, the result is always an
integer/a string. Returning #f is completely unambiguous in those cases, so
I don't see the need to add yet another procedure.

It would be preferable to have this behaviour:

  (irregex-match-substring m invalid-i)     => error
  (irregex-match-substring m unmatched-i)   => #f
  (irregex-match-start-index m invalid-i)   => error
  (irregex-match-start-index m unmatched-i) => #f

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
	-- Donald Knuth

___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users
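[Editorial sketch] The behaviour Peter asks for can be approximated in user code. The wrapper below is only an illustration: match-start-index/f is a hypothetical name that is not part of irregex, and the sketch assumes the upstream behaviour in which irregex-match-valid-index? is true for every group defined in the pattern, whether or not it matched. The cond-expand on chicken-5 is there only so the same text loads on later CHICKEN releases.

```scheme
;; Assumption: irregex is available as a core unit (CHICKEN 4.6.2 or later).
(cond-expand
  (chicken-5 (import (chicken irregex) (chicken condition)))
  (else (use irregex)))

;; Hypothetical helper, not an irregex procedure.
;; Invalid index   -> signals an error.
;; Valid, no match -> #f.
;; Valid, matched  -> the start position.
(define (match-start-index/f m i)
  (if (irregex-match-valid-index? m i)
      ;; Some versions raise an error for an unmatched group and others
      ;; return #f; handle-exceptions turns the first case into #f too.
      (handle-exceptions exn #f (irregex-match-start-index m i))
      (error 'match-start-index/f "not a valid index" m i)))
```

Because a start position is always an integer, callers can test the result directly, for example with (cond ((match-start-index/f m 2) => ...) (else ...)).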
Re: [Chicken-users] Using irregex safely responsibly
On Mon, Oct 11, 2010 at 02:51, Peter Bex peter@xs4all.nl wrote:
> On Mon, Oct 11, 2010 at 01:17:49PM +0900, Alex Shinn wrote:
>
> However, in case of substring and index operations, the result is always
> an integer/a string. Returning #f is completely unambiguous in those
> cases, so I don't see the need to add yet another procedure.
>
> It would be preferable to have this behaviour:
>
>   (irregex-match-substring m invalid-i)     => error
>   (irregex-match-substring m unmatched-i)   => #f
>   (irregex-match-start-index m invalid-i)   => error
>   (irregex-match-start-index m unmatched-i) => #f

I agree with Peter, the /default procedures seem like a needless
abstraction, as a totally unambiguous #f is common practice. For example,
srfi-13 string-index. Unless this practice is going to be deprecated
somehow by R7RS.
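[Editorial sketch] For readers who have not used the idiom Jim refers to: SRFI-13 string-index returns an integer position, or #f when nothing is found. The helper name describe-position below is made up for illustration. On CHICKEN 5, srfi-13 is an egg that has to be installed first; that branch is an assumption about the reader's setup.

```scheme
(cond-expand
  (chicken-5 (import srfi-13))   ; assumes the srfi-13 egg is installed
  (else (use srfi-13)))

;; Hypothetical helper showing the "#f means not found" convention.
;; Position 0 is a true value in Scheme, so cond's => clause still fires
;; when the character is found at the very start of the string.
(define (describe-position str ch)
  (cond ((string-index str ch)
         => (lambda (i) (list 'found-at i)))
        (else 'not-found)))
```

The same shape would work for irregex-match-substring and irregex-match-start-index if both returned #f for an unmatched group.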
Re: [Chicken-users] Using irregex safely responsibly
On Mon, Oct 11, 2010 at 09:51:15AM +0200, Peter Bex wrote:
> > >   #;9> (irregex-match-valid-index? m 3)
> > >   0
> >
> > It returns #t for this in the upstream irregex.
>
> I'll look into that. It's probably a bug introduced by a Chicken-specific
> optimization.

Yeah, it was a small oversight in a manual merge of a failed patch hunk for
irregex upstream changeset 9c903144d459. It has been fixed in experimental
0ea0570b4555c737e35288ba9f43e45b25539913.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely responsibly
Jim Ursetto zbignie...@gmail.com writes:

> I agree with Peter, the /default procedures seem like a needless
> abstraction as a totally unambiguous #f is common practice. For example,
> srfi-13 string-index.

No, in retrospect I'm not sure why I didn't suggest that to begin with - I
think I've been working too much with type inference lately, which makes
such ambiguous return types undesirable.

--
Alex
Re: [Chicken-users] Using irregex safely responsibly
Jim Ursetto zbignie...@gmail.com writes:

> There is some inconsistency in the docs:
>
>   irregex-match-num-submatches: Returns the number of numbered submatches
>   that are defined in the irregex or match object.
>
>   irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
>   named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says
> defined:

Not quite, *-valid-index? says invalid, not undefined. *-num-submatches
just tells you the total number of submatches that are defined in the
regexp, regardless of what has been matched, and
irregex-match-num-submatches on a match result will always return the same
result as irregex-num-submatches on the corresponding regexp.

> The valid-index? predicate does not return a boolean #t value:
>
>   #;9> (irregex-match-valid-index? m 3)
>   0

It returns #t for this in the upstream irregex.

> I prefer the old behavior for consistency because if irregex tells me
> that 3 submatches exist, I expect to be able to access them without an
> exception being thrown.

*-valid-index? just states whether the submatch _may_ exist. We could add a
utility irregex-match-matched-index? to test if a specific index was
successfully matched.

An index which could never be a valid submatch should arguably always throw
an error.

An index which is valid, but failed to match, could either throw an error
or return #f. The -index and -substring operations are inconsistent in this
respect, so we should fix that.

It may be good to provide both sets, with a /default version analogous to
SRFI-69 hash-table-ref and hash-table-ref/default:

  (irregex-match-substring m invalid-i)              => error
  (irregex-match-substring m unmatched-i)            => error
  (irregex-match-substring/default m invalid-i #f)   => error
  (irregex-match-substring/default m unmatched-i #f) => #f

Thoughts?

--
Alex
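[Editorial sketch] The SRFI-69 pair that Alex uses as a model exists because a hash table can legitimately store #f, so a plain #f result could not distinguish "stored #f" from "no such key". The table contents and the 'none marker below are invented for illustration. On CHICKEN 5, srfi-69 is an egg; that branch assumes it is installed.

```scheme
(cond-expand
  (chicken-5 (import srfi-69))   ; assumes the srfi-69 egg is installed
  (else (use srfi-69)))

(define settings (make-hash-table))
(hash-table-set! settings 'debug #f)   ; #f stored as a real value

;; Key present: the stored value is returned, even though it is #f.
(define stored (hash-table-ref/default settings 'debug 'none))

;; Key absent: the caller-supplied default is returned instead.
(define absent (hash-table-ref/default settings 'verbose 'none))

;; hash-table-ref takes an optional thunk that is called for a missing key.
(define absent-via-thunk
  (hash-table-ref settings 'verbose (lambda () 'none)))
```

A submatch substring or index, by contrast, is never #f when the group matched, which is the distinction the replies in this thread rely on.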
Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> Does this mean for every egg that uses the irregex API directly, I need
> to insert this [cond-expand] blob of code?

You have three options:
- Add a dependency on the regex egg and keep doing
  (require-library regex)(import irregex) like before
- Insert this blob of code to ensure it works with old and new Chickens
- Drop the blob if you don't care about older Chickens.

> There is some inconsistency in the docs:
>
>   irregex-match-num-submatches: Returns the number of numbered submatches
>   that are defined in the irregex or match object.
>
>   irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
>   named submatch or index is defined in the {{match}} object.
>
> But below, *-valid-index? says undefined when *-num-submatches says
> defined:

Hm, I'll have to take this up with Alex, it looks like a bug indeed.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
On Fri, Oct 08, 2010 at 09:05:10AM +0200, Peter Bex wrote:
> On Thu, Oct 07, 2010 at 08:37:59PM -0500, Jim Ursetto wrote:
> > Does this mean for every egg that uses the irregex API directly, I need
> > to insert this [cond-expand] blob of code?
>
> You have three options:
> - Add a dependency on the regex egg and keep doing
>   (require-library regex)(import irregex) like before

That's not quite true; some of the compatibility code is still necessary to
make up for the changes in the API.

Cheers,
Peter
Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
On Fri, Oct 8, 2010 at 02:09, Peter Bex peter@xs4all.nl wrote:
> That's not quite true; some of the compatibility code is still necessary
> to make up for the changes in the API.

If that's the case, it means that eggs compiled with 4.6.0 aren't
compatible with those compiled with 4.6.2, because that compatibility code
is selected at compile-time. It's looking to me more and more that the
binversion should be bumped from 5 to 6 (as much as I dislike this).

Why can't the compatibility code be included in the new irregex unit? In
other words, the old procedure names and behavior could be deprecated but
left in so that

1) we don't have to add a blob of compatibility code to every egg, and
2) eggs using the old irregex API would be compatible with all Chicken
   versions without rebuilding.

It's not very nice to the end-user to just remove procedures without going
through a deprecation phase.

Thoughts?

Jim
Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
Eh, let me clarify #2. Eggs built with 4.6.0 need to be recompiled with
4.6.2 regardless due to the C_regex_toplevel linking issues. However, once
they are, they would also work with 4.6.0 again, as long as they stuck to
the old irregex API. I think.

This is pretty confusing. Maybe we should bump binversion to 6 after
all. :(

On Fri, Oct 8, 2010 at 16:00, Jim Ursetto zbignie...@gmail.com wrote:
> 2) eggs using the old irregex API would be compatible with all Chicken
>    versions without rebuilding.
Re: [Chicken-users] Using irregex safely responsibly
From: Jim Ursetto zbignie...@gmail.com
Subject: Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
Date: Fri, 8 Oct 2010 16:00:05 -0500

> Why can't the compatibility code be included in the new irregex unit? In
> other words, the old procedure names and behavior could be deprecated but
> left in so that
>
> 1) we don't have to add a blob of compatibility code to every egg, and
> 2) eggs using the old irregex API would be compatible with all Chicken
>    versions without rebuilding.
>
> It's not very nice to the end-user to just remove procedures without
> going through a deprecation phase.

That's a good idea. This code could be added to irregex.scm without
touching irregex-core.scm (the upstream code).

cheers,
felix
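[Editorial sketch] One possible shape for such a deprecation layer is shown below. It is not the code that was committed to irregex.scm; it only maps the three old names that appear in Peter's cond-expand block (irregex-match-start, irregex-match-end, irregex-submatches) onto their new counterparts, and it assumes the old and new procedures agree on arguments. It would not restore any difference in failure behaviour between the two APIs.

```scheme
;; Assumption: the new irregex unit is available (CHICKEN 4.6.2 or later).
(cond-expand
  (chicken-5 (import (chicken irregex)))
  (else (use irregex)))

;; Old name -> new name, taken from the rename clauses quoted in this thread.
(define irregex-match-start irregex-match-start-index)
(define irregex-match-end   irregex-match-end-index)

;; The compatibility block aliases both new counting procedures to the old
;; irregex-submatches, so the old procedure is assumed here to have accepted
;; either a match object or a regexp; dispatch on the argument type.
(define (irregex-submatches x)
  (if (irregex-match-data? x)
      (irregex-match-num-submatches x)
      (irregex-num-submatches x)))
```

Keeping the aliases outside irregex-core.scm, as felix suggests, would leave the upstream file untouched.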
Re: [Chicken-users] Using irregex safely responsibly [Was: Re: dev-snapshot 4.6.3]
On Thu, Oct 7, 2010 at 15:53, Peter Bex peter@xs4all.nl wrote:
> In your egg's file, where you would previously use this idiom:
>
>   (require-library regex) ; or (use regex) for the lazy and sloppy
>   (import irregex)
>
> you can now replace it with this block (you can delete emulation of
> procedures you are sure you aren't using):
>
>   (cond-expand
>    (total-irregex
>     (use irregex))
>    (else
>     (require-library regex)
>     (import (rename irregex
>                     (irregex-match-start irregex-match-start-index)
>                     (irregex-match-end irregex-match-end-index)))
>     (define irregex-num-submatches irregex-submatches)
>     (define irregex-match-num-submatches irregex-submatches)
>     (define (irregex-match-valid-index? m i)
>       (and (irregex-match-start-index m i) #t))
>     (define (maybe-string->sre obj)
>       (if (string? obj) (string->sre obj) obj))))

Does this mean for every egg that uses the irregex API directly, I need to
insert this blob of code?

There is some inconsistency in the docs:

  irregex-match-num-submatches: Returns the number of numbered submatches
  that are defined in the irregex or match object.

  irregex-match-valid-index?: Returns {{#t}} iff the {{index-or-name}}
  named submatch or index is defined in the {{match}} object.

But below, *-valid-index? says undefined when *-num-submatches says
defined:

  #;1> (define m (irregex-search (irregex "(abc)|(def)|(ghi)") "ghi"))
  #;2> (irregex-match-num-submatches m)
  3
  #;3> (irregex-match-valid-index? m 2)
  #f

The valid-index? predicate does not return a boolean #t value:

  #;9> (irregex-match-valid-index? m 3)
  0
  #;9> (irregex-match-substring m 3)
  "ghi"

Failure behavior for match-start-index and match-substring is unspecified
in the docs. The former throws an error and the latter returns #f:

  #;3> (irregex-match-start-index m 2)
  Error: (irregex-match-start-index) not a valid index
  #<regexp-match (3 submatches)>
  2
  #;6> (irregex-match-substring m 2)
  #f

I prefer the old behavior for consistency because if irregex tells me that
3 submatches exist, I expect to be able to access them without an exception
being thrown.
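[Editorial sketch] Jim's expectation, that every submatch counted by irregex-match-num-submatches can be read without an exception, can be written as a loop. The helper name submatch-list is hypothetical. The sketch relies only on the behaviour shown in the transcript above, where irregex-match-substring returns #f for a group that did not take part in the match.

```scheme
;; Assumption: irregex is available as a core unit (CHICKEN 4.6.2 or later).
(cond-expand
  (chicken-5 (import (chicken irregex)))
  (else (use irregex)))

;; Hypothetical helper: collect the substrings of groups 1..n in order,
;; with #f standing in for groups that did not match.
(define (submatch-list m)
  (let loop ((i (irregex-match-num-submatches m))
             (acc '()))
    (if (zero? i)
        acc
        (loop (- i 1)
              (cons (irregex-match-substring m i) acc)))))
```

For the match object m from the transcript, this should give a three-element list in which only the last element is a string.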