[PATCH] Update irregex to 0.9.11
Hi all, Here's a patch to bring irregex in line with upstream 0.9.11 (plus one more commit that fixes a bug in sre->string). It would be nice to have that for 5.4.0. Not that much changed in this release, but a lot of code was restructured, so it would make it easier to port future changes once this has been applied. Also, there are a few fixes related to utf8-mode, so maybe these help with the UTF-8 branch. Cheers, Peter >From 411dcdd39fde172f414e29a216ed413a4758 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 23 Apr 2024 08:26:07 +0200 Subject: [PATCH] Bump irregex to upstream commit 923cfc39, which is 0.9.11 plus a bugfix --- NEWS | 5 + irregex-core.scm | 19 +-- irregex-utils.scm | 2 +- tests/test-irregex.scm | 15 +++ 4 files changed, 34 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index 6b09db47..b1bf9e1c 100644 --- a/NEWS +++ b/NEWS @@ -37,6 +37,11 @@ an `errno' property. - Deprecated "chicken-home" and added "include-path" in the chicken.platform module. + - Irregex has been updated to upstream 0.9.11 plus an additional fix +for sre->string. The 0.9.11 release fixes a few problems related to +utf-8 handling (which should not affect CHICKEN) and expands the +definition for the 'whitespace character set to include vertical tab, +carriage return and form feed. - Tools - The -R option for csi and csc now accepts list-notation like diff --git a/irregex-core.scm b/irregex-core.scm index 55e9a6c0..5550ace8 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -1,6 +1,6 @@ irregex.scm -- IrRegular Expressions ;; -;; Copyright (c) 2005-2021 Alex Shinn. All rights reserved. +;; Copyright (c) 2005-2024 Alex Shinn. All rights reserved. ;; BSD-style license: http://synthcode.com/license.txt @@ -30,6 +30,7 @@ History +;; 0.9.11: 2024/02/23 - Guile test and packaging support from Tomas Volf. ;; 0.9.10: 2021/07/06 - fixes for submatches under kleene star, empty seqs ;; in alternations, and bol in folds for backtracking ;; matcher (thanks John Clements and snan for reporting @@ -425,7 +426,12 @@ ;; (define *all-chars* `(/ ,(integer->char (- (char->integer #\space) 32)) ,(integer->char (+ (char->integer #\space) 223 ;; set to #f to ignore even an explicit request for utf8 handling -(define *allow-utf8-mode?* #t) +;; The utf8-mode is undesired on any implementation with native unicode support. +;; It is a workaround for those that treat strings as a raw byte sequences, and +;; does not work well otherwise. So disable it on implementations known to +;; handle unicode natively. +(define *allow-utf8-mode?* (cond-expand ((and chicken (not full-unicode)) #t) +(else #f))) ;; (define *named-char-properties* '()) @@ -1568,8 +1574,8 @@ (cons (car sre) (map rec (cdr sre)) (else (case sre -((any) 'utf8-any) -((nonl) 'utf8-nonl) +((any) (if utf8? 'utf8-any 'any)) +((nonl) (if utf8? 'utf8-nonl 'nonl)) (else (if (and utf8? (char? sre) (high-char? sre)) (sre-sequence (map integer->char (char->utf8-list sre))) @@ -2292,10 +2298,11 @@ . (or alphanumeric punctuation #\$ #\+ #\< #\= #\> #\^ #\` #\| #\~)) (graph . graphic) (blank . (or #\space ,(integer->char (- (char->integer #\space) 23 -(whitespace . (or blank #\newline)) +;; 0B - vertical tab, 0C - form feed +(whitespace . (or blank #\newline #\x0C #\return #\x0B)) (space . whitespace) (white . whitespace) -(printing or graphic whitespace) +(printing . (or graphic whitespace)) (print . printing) ;; we assume a (possibly shifted) ASCII-based ordering diff --git a/irregex-utils.scm b/irregex-utils.scm index 291b03ea..37313666 100644 --- a/irregex-utils.scm +++ b/irregex-utils.scm @@ -104,7 +104,7 @@ (display ")" out)) ((* + ? *? ??) (cond -((pair? (cddr x)) +((or (pair? (cddr x)) (and (string? (cadr x)) (not (= 1 (string-length (cadr x)) (display "(?:" out) (for-each lp (cdr x)) (display ")" out)) (else (lp (cadr x (display (car x) out)) diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm index 0888f09b..8c0464ad 100644 --- a/tests/test-irregex.scm +++ b/tests/test-irregex.scm @@ -419,6 +419,12 @@ (test-equal "***x***" (irregex-replace/all (irregex '(: #\space) 'dfa) " x " "*")) + (test-equal "A:42" + (irregex-replace/all "^" "42" "A:&quo
Re: [PATCH] fix compiler-syntax handling of ##sys#override
On Tue, Jan 16, 2024 at 06:09:33PM +0100, felix.winkelm...@bevuta.com wrote: > This patch addresses the problem reported recently my Mario, regarding > failing compilation of http-client. > > The problem was that compiler-syntax definitions changed the "override" > status and disabled the existing value binding. Thanks, this fixes the error after recompiling intarweb and then http-client. Pushed! Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] allow overriding value/syntax toplevel bindings (#1166)
On Tue, Nov 14, 2023 at 01:22:59PM +0100, felix.winkelm...@bevuta.com wrote: > Yes, this is admittedly all a bit ugly. Please find attached 2 patches: > the first addresses the endless expansion loop caused by our recent change > in ##sys#canonicalize-body. The second patch is a new version of the > "override" > patch, with some additional changes to address the example above and a > followup problem that came up during testing. Finally got around to having a proper look (sorry for taking so long!). It's hacky indeed, but the idea seems sound to me. Pushed that second larger patch. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] allow overriding value/syntax toplevel bindings (#1166)
On Sun, Nov 12, 2023 at 01:45:06PM +0100, felix.winkelm...@bevuta.com wrote: > See commit message. Nice to make some progress on this! However, I tested the example given in the ticket: (define begin -) (begin 0 1) => 1 ;; expected: -1 This still evaluates to 1. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] avoid loop in lookup (fix Salmonella hang with transducers)
On Sun, Nov 12, 2023 at 12:45:49PM +0100, felix.winkelm...@bevuta.com wrote: > The attached patch addresses an endless expander loop while compiling > the transducers egg. The egg uses unconvential (and only slightly legal) > alias identifiers that confuse the expander, combined with a recent > change in how it resolves macros in bodies. The problem was reported by > Mario and wasamasa. Oof, nasty. Pushed the change. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] warn on unexpected egg-properties
On Wed, Nov 08, 2023 at 02:59:17PM +0100, felix.winkelm...@bevuta.com wrote: > Attached is a patch to address #1492. Applied. Do we want to turn this into an error for CHICKEN 6? Warnings have a tendency of just scrolling by and getting lost in the noise of compiler output. OTOH, making it an error would make it harder to add new properties and be backward compatible in an egg (though cond-expand would help). Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] fix for #1132
On Wed, Nov 08, 2023 at 12:34:23PM +0100, felix.winkelm...@bevuta.com wrote: > See commit message. Shouldn't we modify the test instead of dropping it? Something like this: (module m3 () (import (rename scheme (define s:define))) (import (only (chicken base) assert)) (define-syntax define (syntax-rules () ((_) (display 'oink)) ((_ var value) (s:define var (+ value 1) (define) (let () (define a 1) (assert (= a 2))) (define) (newline)) Not sure it's useful to have that final define and newline there. Perhaps we can drop that, or change it do something more meaningful. We could simply change it to (define b 5) followed by (assert (= b 6)). Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] export/rename
On Tue, Oct 10, 2023 at 08:14:54AM +0200, Peter Bex wrote: > After some more thought on this, I think you convinced me. > Let's go with the original export/rename patch, unless anyone objects. A week has passed with no objections. I've pushed the original patch. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] export/rename
On Mon, Oct 09, 2023 at 05:01:47PM +0200, felix.winkelm...@bevuta.com wrote: > The export/rename localizes the functionality and doesn't > require changes to existing code. The precedent of the syntax: and > interface: never was a particularly good one (we should have used separate > export forms for those right from the start), so adding funny markers or > possibly ambiguous special cases is not my favorite approach. After some more thought on this, I think you convinced me. Let's go with the original export/rename patch, unless anyone objects. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] export/rename
On Tue, Oct 03, 2023 at 12:51:57PM +0200, felix.winkelm...@bevuta.com wrote: > Ok, I find attached a variant, both more ugly in interface and > implementation, since ##core#module and functor do not yet allow > renamings to be handled. Also, "define-interface" has no notion > of this, so "(rename: OLD NEW)" is purely applicable in "export". I'd prefer if we could develop this a bit further to make the rename form accessible to module as well. Not sure if that would be useful in functors. I've never used them, but it seems to me that the functor's implementation is in a better position to determine whether to rename on export or not, not the functor definition itself. If you agree that it makes little sense to support it in functors, perhaps it's better to call ##sys#syntax-error when the functor definition includes renamed exports? > I prefer the first variant, but feel free to push which one you > like more. As it is, the first variant is cleaner indeed, but if we decide to support renames in the "module" form as well, I think we should go with the second variant. I had a quick look and the main obstacle seems to be that ##core#module is not very extendable, because it assumes the first two "arguments" are special, and the rest is the body (IIUC). Would it perhaps be possible to simply expand (module FOO (exps) BODY) to something like the following? (##core#module FOO (non-renamed-exps) (export (rename: name1 renamed1) ...) BODY) Perhaps a bit hand-wavey (sorry), but I think this is simpler than trying to change ##core#module. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] export/rename
On Mon, Oct 02, 2023 at 06:31:44PM +0200, felix.winkelm...@bevuta.com wrote: > This patch adds a new special form to explicitly export renamed bindings > from a module: > > (export/rename (OLD NEW) ...) Why not add it to the regular "export" form? It's already extendable, as it has syntax: and interface:. So for example we could have: (export (rename: OLD NEW) ...) Saves having yet another top-level special-case form. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Disallow empty "or" type specifier
On Tue, Sep 26, 2023 at 12:33:43PM +0300, elf wrote: > It certainly shouldn't be *, nor should it be an error... technically an > empty union should be a null set, which would correspond to either a > non-extant type or to no return/value at all... We've discussed this on IRC. Such a null type doesn't exist in Scheme, and "no return value" would correspond to a zero "multiple return values", which has a different notation in our type system. The (or) is an individual value, which could occur multiple times in a "multiple return values" list. We don't splice MV values together, so this is either a null type or an error. So in our case, it should definitely be an error since we don't have null types. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] recover CHICKEN_INSTALL_PREFIX
On Wed, Aug 16, 2023 at 05:25:45PM +0200, felix.winkelm...@bevuta.com wrote: > > Why are install-path and repo-path in chicken-install.scm still using > > destination-repository instead of effective-destination-repository? > > Those are the only places that still use it, aside from one in > > chicken-status and chicken-uninstall. > > The prefix variable is only used for designating where the build > artifacts are to be stored, not where extensions are located or > where .egg-info files are to be stored. It also makes sense to set > CHICKEN_REPOSITORY_PATH in addition to CHICKEN_INSTALL_PREFIX when > dependency extensions installed and used during the build of another > extension are needed to be located. This _is_ confusing, no doubt > about that, but we need a way to override install target locations, > especially for include-files and binaries and has been requested > several times by users, IIRC. It's extremely confusing and hard to use correctly, but it seems to work as intended, so I've pushed the patch. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] recover CHICKEN_INSTALL_PREFIX
On Fri, Jul 21, 2023 at 04:12:52PM +0200, felix.winkelm...@bevuta.com wrote: > Indeed - sorry about that, here an improved version. Sorry for the delay, but I finally had a bit of time to look into this more seriously, but honestly I'm confused by all the parameters. Why are install-path and repo-path in chicken-install.scm still using destination-repository instead of effective-destination-repository? Those are the only places that still use it, aside from one in chicken-status and chicken-uninstall. I checked, but if I call chicken-uninstall with CHICKEN_EGG_PREFIX, it uninstalls (or attempts to uninstall) the egg in the "standard" location. That means chicken-uninstall doesn't work if you're using CHICKEN_EGG_PREFIX. For chicken-status, the same is true - it won't list the eggs you installed in the prefix. So why not simply replace the definition of destination-repository with the new code? Cheers, Peter signature.asc Description: PGP signature
Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))
On Thu, Jul 27, 2023 at 12:30:46PM +0200, Peter Bex wrote: > 1) There's no (efficient) way to know if an object is a finalizable one. >We need this because we can't simply clear *all* the objects inside >a finalizable object that aren't referenced elsewhere, because we do >want to keep foreign pointers etc which are only referenced by the >finalized object itself. So we'd need an efficient way to know if >an object pointed to by a finalizable object is itself finalizable. Strike that - I think it can be done at the cost of an additional pointer per finalizer to encode the boundaries of objects that belong to that finalizer's "reachable set". But still, that leaves us with #2. > 2) We have no (canonical) way of breaking strong references. For weak >references, it is clear that we have some special indicator, the >"broken weak pointer" placeholder that gets put there when an object >reference is cleared. Cheers, Peter signature.asc Description: PGP signature
Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))
On Wed, Jul 26, 2023 at 04:45:53PM +0100, Andy Bennett wrote: > So how can we finalise a circular list of objects all of which have > finalisers and still maintain atomicity? > > The docs say the order is "undefined". > It seems that the best way to finalise this structure is to explicitly break > all the strong references between components of the list (as we do for > external weak references) before any of the finalisers are called. I thought about this one too - it would be nice if all the finalized objects that refer to other finalized objects would have these links cleared. However, there are two main obstacles to that: 1) There's no (efficient) way to know if an object is a finalizable one. We need this because we can't simply clear *all* the objects inside a finalizable object that aren't referenced elsewhere, because we do want to keep foreign pointers etc which are only referenced by the finalized object itself. So we'd need an efficient way to know if an object pointed to by a finalizable object is itself finalizable. 2) We have no (canonical) way of breaking strong references. For weak references, it is clear that we have some special indicator, the "broken weak pointer" placeholder that gets put there when an object reference is cleared. > It may also improve the memory model if we define the object that the > finaliser receives as a "copy" of the object that has ("already") been > garbage collected. I don't really think this will make much of a difference either way - as it is currently, you couldn't distinguish the "original" object from a copy. Possibly a better way is what MIT Scheme (and, gasp, JavaScript) does: register a finalizer on object with an extraction procedure that returns the value to finalize. That way, the object getting deleted is not the object that is getting finalized. For instance, when a port would be finalized, the finalization procedure receives the underlying file descriptor and not the entire port object. Although after giving that some more thought, I'm not 100% sure this would really solve the issue - you can still extract complex objects from the to-be-finalized object (or have the "identity" procedure as the "extractor", so it returns the object itself) and therefore have an object with other things pointing into it, and that could still be resurrected... Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] recover CHICKEN_INSTALL_PREFIX
On Fri, Jul 21, 2023 at 03:54:41PM +0200, felix.winkelm...@bevuta.com wrote: > diff --git a/egg-compile.scm b/egg-compile.scm > index 99a94fe8..b14535da 100644 > --- a/egg-compile.scm > +++ b/egg-compile.scm > @@ -412,7 +420,7 @@ > (define (compile-common info walk) >(case (car info) > ((target) > - (when (eq? mode 'target) > + (when (eq? mode =) > (for-each walk (cdr info > ((host) > (when (eq? mode 'host) This hunk looks incorrect. What did you intend to change this to? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] finalizer API (was: thread-safe handling of asynchronous events)
On Fri, Jul 07, 2023 at 10:43:29AM +0200, felix.winkelm...@bevuta.com wrote: > Here another attempt at a finalizer API, allowing adding > finalized objects to an existing finalizer after it was > created. Thanks, I've pushed both this and the srfi 18 changes. Should we tag a new srfi-18 release? Cheers, Peter signature.asc Description: PGP signature
[PATCH] Rename read/source-info to avoid self-reference and hang using older compilers
Hi all, Here's a patch to change the name of the read/source-info we export from chicken.syntax. This avoids a problem we've been seeing on the Salmonella machines with hanging processes. Cheers, Peter From 520c12c7642448af2bc1c6e0a1790794d06dc2d4 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 11 Jul 2023 15:40:45 +0200 Subject: [PATCH] Change official name of read/source-info to avoid conflict Somehow, when using an older compiler to bootstrap, read/source-info from support.scm was referencing itself, even though it refers to chicken.syntax#read-with-source-info explicitly. This caused the Salmonella test servers to hang. At some point, we'll have to find out why this happened, but for now let's take the easy way out and rename the officially exported name to the full name "read-with-source-info". --- DEPRECATED | 6 +++--- NEWS | 6 +++--- batch-driver.scm | 2 +- csi.scm| 6 +++--- eval.scm | 10 +- expand.scm | 10 +- manual/Module (chicken syntax) | 6 +++--- repl.scm | 2 +- support.scm| 2 +- types.db | 2 +- 10 files changed, 26 insertions(+), 26 deletions(-) diff --git a/DEPRECATED b/DEPRECATED index 6315e591..6c2a009e 100644 --- a/DEPRECATED +++ b/DEPRECATED @@ -2,9 +2,9 @@ Deprecated functions and variables == 5.3.1 -- read/source-info was moved from the internal, undocumented module - (chicken compiler support) into (chicken syntax). Using it from - the former module is deprecated. +- read/source-info from the internal, undocumented module + (chicken compiler support) is deprecated. Instead, use + read-with-source-info from (chicken syntax). 5.2.1 - current-milliseconds and its C implementations C_milliseconds and diff --git a/NEWS b/NEWS index 68940ef5..a371ff8e 100644 --- a/NEWS +++ b/NEWS @@ -21,9 +21,9 @@ filename. Previously, the directory part would be stripped. - Added support for embedded strings and characters in SRFI-4 vector literals. - - read/source-info is now documented and officially supported, from -the (chicken syntax) module. It is still exported from the -undocumented internal (chicken compiler support) module, but + - read-with-source-info is now documented and officially supported, +from the (chicken syntax) module. read/source-info is still exported +from the undocumented internal (chicken compiler support) module, but using it from there is deprecated. - Tools diff --git a/batch-driver.scm b/batch-driver.scm index f0a5fc08..8f0a4f35 100644 --- a/batch-driver.scm +++ b/batch-driver.scm @@ -609,7 +609,7 @@ (in (check-and-open-input-file f)) ) (fluid-let ((##sys#current-source-filename f)) (let loop () - (let ((x (chicken.syntax#read/source-info in))) ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing + (let ((x (chicken.syntax#read-with-source-info in))) ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing (cond ((eof-object? x) (close-checked-input-file in f) ) diff --git a/csi.scm b/csi.scm index 28ab16a0..63ff4221 100644 --- a/csi.scm +++ b/csi.scm @@ -280,7 +280,7 @@ EOF (define default-evaluator (let ((eval eval) (load-noisily load-noisily) - (read (lambda () (chicken.syntax#read/source-info (current-input-port ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing + (read (lambda () (chicken.syntax#read-with-source-info (current-input-port ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing (read-line read-line) (display display) (string-split string-split) @@ -1047,8 +1047,8 @@ EOF (load home-fn) ) ) ) ) (define (evalstring str #!optional (rec (lambda _ (void (let ((in (open-input-string str)) - (read/source-info chicken.syntax#read/source-info)) ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing - (do ([x (read/source-info in) (read/source-info in)]) + (read-with-source-info chicken.syntax#read-with-source-info)) ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing + (do ([x (read-with-source-info in) (read-with-source-info in)]) ((eof-object? x)) (rec (receive (eval x))) ) ) ) (when (member* '("-h" "-help" "--help") args) diff --git a/eval.scm b/eval.scm index d54be504..929acc38 100644 --- a/eval.scm +++ b/eval.scm @@ -1079,7 +1079,7 @@ (##sys#current-load-filenam
Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))
On Mon, Jul 10, 2023 at 09:28:19PM +0200, felix.winkelm...@bevuta.com wrote: > After thinking some more about this, I realize that your approach > (clearing weak ref's to finalized data) is the right thing, since > any other behaviour in the presence of multithreading leads to > disaster. Thank you for seeing the light. Cheers, Peter signature.asc Description: PGP signature
How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))
On Fri, Jul 07, 2023 at 11:23:17PM +0200, felix.winkelm...@bevuta.com wrote: > I'm not very comfortable with this change. This feels like trading in > one inconsistency (weak refs being cleared for a potentially non-dead > object) for another (potentially inconsistent ties of GC-controlled > memory to non-GC'd resources). It depends on how you view finalizers. Personally, I would think a finalizer should get run on what's *essentially* "already GCed" data. Therefore, it makes no sense to pass a finalizer data that still holds onto other cleared data. But like I said, I can get behind your POV - you could also argue that if an object holds onto other things and it's "already GCed", all the things it (and nobody else) holds onto (even strongly) should be cleared, and that absolutely makes no sense whatsoever. (come to think of it, the ideal solution would probably be to clear "outside" weak references to finalizable data but keep the object itself internally intact. But that's extremely hard to track in the GC) However, there's one more concern: > The potential use-after-free scenario can still happen if the object is > kept alive, regardless of how we handle weak refs, this is unavoidable > if we allow finalizers and keep the possibility of resurrection. I have thought about this a bit more but I came to the conclusion that from an abstraction point of view it's better to clear weak refs to finalized data. The reason is that when a module exposes an object, the *user* should not need to know or care exactly how that object is implemented and that it happens to use a finalizer. So let's say an egg exposes an object, and I'm using it, but I want to reference it weakly. Then, all things will work fine most of the time. However, there's a nasty race condition lurking: if the finalizer happens to run before my code extracts the object from the weak ref, but I extract it before the next GC, my code may crash. Or it may not, use-after-free is tricky like that. This should not be the user's concern, and it's a breach of abstraction. It's also a global issue, as it can affect *any* code that holds onto an object from a "3rd party" (other module). Note that this would be problematic *even* if the 3rd party wrote the code so carefully that it clears the pointer from the object such that there's no use-after-free bug. Because the code will still raise an exception when passed this invalidated object. And again, that will be a race condition for the person who uses this module. This means that the problem is spread to every single user who decides to use a weak reference to a 3rd party object that involves resources that must be freed. On the other hand, if I'm writing an egg that exposes a foreign object that needs to be collected, it *is* of my concern (and not a breach of abstraction) that the data gets freed properly. Let's say I decide to store weak refs inside the object (which is not even *that* likely), and those weak refs point to other things inside the same object, which I know to be GCed at the time the finalizer is called. Then that's something I know and must take care of when writing the finalizer. Finalizers of objects not seeing weak refs into that same object is very much a localized concernn. Also, perhaps more importantly, this will break *immediately* and *consistently*. The finalizer will simply see broken weak pointers, all the time, every time it runs on the "collected" object. Since this is also documented in my patch, I think this is the least impactful way of doing things. I'm not even all that concerned about finalizers "reviving" dead objects, but in that case it's *definitely* the responsibility of the finalizer's author to make sure it doesn't cause any trouble. So overall, IMHO the problem is a lot more self-contained/localized, and more deterministic if we clear the references before running the finalizer. That has to count for something, I think! Cheers, Peter signature.asc Description: PGP signature
[PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events)
On Thu, Jul 06, 2023 at 09:05:03PM +0200, felix.winkelm...@bevuta.com wrote: > > This would be problematic if the finalizer has run and deleted the > > foreign object, while there are still weak references that hold onto > > the object. This has then become invalid/inconsistent. > > I don't understand this, I'm afraid. Finalizers can always "revive" > objects, this can't be avoided and may sometimes even be required. > If weak refs suddenly make our memory model unsound, then the whole idea > of weak references stands to discussion. I don't think they make the memory model unsound per say. But it is an issue, and one avoided by MIT Scheme by simply making it impossible to revive collected objects. This thread made me consider what to do with such weak references and I decided that we should clear references to finalizable (which may already be finalized) objects. Attached is a patch to ensure that "live" weak references don't hold onto objects that may have been processed by a finalizer. I think this removes the worst potential use-after-free footguns. NOTE: This issue already exists in older CHICKEN versions with weak locatives, it is not newly introduced by weak pairs! While working on this, I also noticed a remaining bug in the weak locative handling: when a locative has already been cleared, it contains a NULL pointer, and we need to avoid dereferencing it on the next GC. Cheers, Peter From e9a267e6a1268f8f7eafdcca8b609c443d717e67 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 7 Jul 2023 11:07:43 +0200 Subject: [PATCH 1/2] Skip weak locatives that were already invalidated This avoids a NULL pointer dereference --- runtime.c | 1 + 1 file changed, 1 insertion(+) diff --git a/runtime.c b/runtime.c index edda5377..fbce94fd 100644 --- a/runtime.c +++ b/runtime.c @@ -4126,6 +4126,7 @@ static C_regparm void C_fcall update_locatives(int mode) (mode == GC_REALLOC && !C_in_stackp(loc) && !C_in_heapp(loc))); /* NB: *old* heap! */ ptr = C_block_item(loc, 0); /* fix up ptr */ +if (ptr == 0) continue; /* Skip already dropped weak locatives */ offset = C_unfix(C_block_item(loc, 1)); obj = ptr - offset; -- 2.40.1 From 9d1d1f3ee1284ba9e298d5dacdb77e8a3ad5fd27 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 7 Jul 2023 11:45:51 +0200 Subject: [PATCH 2/2] Don't retain weak references to finalizable objects When an object would be garbage, but has a finalizer, it will be artificially kept alive at the end of a GC cycle so that the finalizer gets a chance to run, with the original event as an argument. However, this object is effectively supposed to be garbage, and the finalizer may put the object in a defunct state. This means we don't want code to be able to extract such an object through a weak reference, because it should be considered garbage and *might* be already cleaned up. This could be a potential footgun, making use after free bugs more likely. To fix this, we make these "garbage" objects appear as garbage for the purpose of clearing weak pointers. The implementation is relatively trivial - we simply remember the heap top pointer before we revive garbage objects that have finalizers, and when dereferencing the pointer to the new object's location, where we check that the object is in the target heap, we also check that it's not in the portion of the heap where we've put the finalizable objects and anything they reference. --- manual/Module (chicken gc) | 6 ++ runtime.c | 38 +++- tests/weak-pointer-test.scm | 113 3 files changed, 141 insertions(+), 16 deletions(-) diff --git a/manual/Module (chicken gc) b/manual/Module (chicken gc) index 48653e3a..25a29373 100644 --- a/manual/Module (chicken gc) +++ b/manual/Module (chicken gc) @@ -53,6 +53,12 @@ Multiple finalizers can be registered for the same object. The order in which the finalizers run is undefined. Execution of finalizers may be nested. +NOTE: When a finalizable object has any weak references (i.e., weak +locatives or weak pairs) to objects that are only reachable through it +or other finalizable objects, those references will be broken like +when the objects had already been collected. This is done in order to +avoid user code from accessing objects that are possibly in an +invalid state. === force-finalizers diff --git a/runtime.c b/runtime.c index fbce94fd..6fdd7fca 100644 --- a/runtime.c +++ b/runtime.c @@ -543,8 +543,8 @@ static void C_fcall mark_nested_objects(C_byte *heap_scan_top, C_byte *tgt_space static void C_fcall mark_live_objects(C_byte *tgt_space_start, C_byte **tgt_space_top, C_byte *tgt_space_limit) C_regparm; static void C_fcall mark_live_heap_only_objects(C_byte *tgt_space_start, C_byte **tgt_space_top, C_byte *tgt_space_limit) C_regparm; static C_word C_fcall intern0(C_char *name) C_re
Re: [PATCH] thread-safe handling of asynchronous events
On Thu, Jul 06, 2023 at 02:51:43PM +0200, felix.winkelm...@bevuta.com wrote: > That is indeed a shortcoming of the API. I must say that I'm not > too keen on the "guardian" approach, though, or your suggested hacky > workaround. There would be too many meanings attached to the > argument for the polling procedure. I agree, which is why I called it hacky. > I also don't like the MIT apporach, since our very basic > "set-finalizer!" has worked fine so far (with the execution context > being the main problem). Yeah, MIT's is rather heavy-handed. Perhaps simply we can just return two values? The first a polling procedure and the second a procedure to add new objects to the finalizer? You can just receive the polling procedure in a single-value context and ignore the object-adder if you don't want to use it. Alternatively, make-finalizer could return a new structure type that represents the finalizer. It could just be a wrapper for the queue internally. We'd have to add a getter and an adder procedure that accepts this new object type (and perhaps a predicate). > > Separating the collectable object from the context prevents the > > collected object from re-entering the live system through the > > finalization procedure (it may set! some variable to it, for example > > and that's not desirable). Apparently, this makes ephemerons easier to > > implement. > > I found this through Taylor Campbell's comment on Andy Wingo's blog post: > > https://wingolog.org/archives/2022/10/31/ephemerons-and-finalizers > > I don't quite understand why many APIs are so afraid of retaining > the finalized object. The point is becoming aware of the object > being reclaimable. If it survives yet another GC cycle, so what? This would be problematic if the finalizer has run and deleted the foreign object, while there are still weak references that hold onto the object. This has then become invalid/inconsistent. And vice versa, if there are weak references which are now broken, and the finalizer restores the object, this might cause different kinds of inconsistencies to arise. > Unless we are severely memory-constrained and must at all costs release > the object, I see no need in separating object and context for > finalization. I don't think this separation was ever about memory efficiency. > Moreover, there may be cases when we may need full access > to the object to perform our finalization action. Yeah, hiding the object could be problematic if it holds onto several foreign objects that all have to be cleaned up. > > Now I think this API is rather heavy-handed, but perhaps we can settle > > on some sort of middle ground. I also think that maybe separating the > > finalization object from the "context" is too late for us, unless we > > decide to rework the set-finalizer! API as well for CHICKEN 6. > > Indeed. I wouldn't want to touch the underlying machinery. it works > pretty well, I think. Agreed. > [ event queue API ] > That sounds good, and I understand the motivation but I don't want to > touch the scheduler, to be honest. The UTF transition is already enough > work as it is. The original subject is merely a question of API choice, > let's not try to fix everything right now. Like I said, just a brain dump. Perhaps something to keep in mind for CHICKEN 7. But as we design a new API we'd do well to make such a change possible without too many changes in user code. > I will try to come up with something better, to allow adding objects > to an existing finalizer. Looking forward to it! Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] thread-safe handling of asynchronous events
On Wed, Jul 05, 2023 at 03:28:54PM +0200, felix.winkelm...@bevuta.com wrote: > The first patch provides the event-queue mechanism and cleans up the > scheduler a bit (hiding internal variables while also exposing ##sys#... > procedures to access them). This also defines hooks that a threading > API should override to allow suspension/resumption on events. I'm not super happy with exposing ##sys#fd-list and ##sys#timeout-list as a procedure under the exact same name that originally held a list. If there's any code that uses it, that would lead to strange errors. Might be better to either not expose it at all, or use a completely different name. Perhaps now is a good time to make the scheduler into a proper module that exposes procedures in an unprefixed way? > The second patch provides a new signal API: "make-signal-handler", > which creates a handler for one or more signals and "signal-ignore" and > "signal-default" (to ignore a signal or set the default disposition). > This is roughly modelled after the Racket API. The old "signal-handler" > and "set-signal-handler!" procedures have been deprecated. > > The third patch provides "make-finalizer", inspired by Chez' guardians, > but slightly different to allow blocking/non-blocking tests for > finalizations. The old API is still available, as often finalization > performs only very basic operations independent of the currently > executing context. Finalization procedures can be composed, as a > finalizer itself becomes subject to finalization once all associated > objects have been collected. Building guardians on top of this > should be straightforward, but the latter does not (to my knowledge) > allow blocking a thread until a finalizer triggers so I chose not > to implement this interface. This API looks nice, but as I understand it, make-finalizer requires all objects in each "pool" to be known when calling it. AFAIK this means that when you are creating objects on-the-fly, you'd have to call make-finalizer N times, and when retrieving objects you'd have to poll N finalizer "pools", and therefore you can't block on all objects you'd want to collect. So what's missing is a way to add an object to an existing finalizer. Note that this is what guardians support too - when you call a guardian procedure with an argument, that object is added to the pool. A (somewhat hacky) way to do this would be to have the finalizer closure check whether the "mode" argument is a boolean. If "mode" is not a boolean, it is assumed to be an object to add to the queue. Perhaps with an extra check that it's not an immediate value. Alternatively, we could steal (part of) MIT Scheme's (undocumented) finalizer API: https://git.savannah.gnu.org/cgit/mit-scheme.git/tree/src/runtime/gcfinal.scm?id=176cd871bdd9c9dabcb8ad602da0b618be2d0373#n36 There, you call (make-gc-finalizer) with the finalization procedure, a predicate for the type of object that may be collected, a procedure that can be used to extract the "context" from the collectable object and a procedure that can be used to set the context inside the collectable object. The "context" here is the value that gets passed to the finalizer procedure (typically, this would be the foreign pointer inside a finalizable struct object). Then, add-to-gc-finalizer! can be used to add objects to the finalizer and remove-from-gc-finalizer! can be used to remove objects from the finalizer. Separating the collectable object from the context prevents the collected object from re-entering the live system through the finalization procedure (it may set! some variable to it, for example and that's not desirable). Apparently, this makes ephemerons easier to implement. I found this through Taylor Campbell's comment on Andy Wingo's blog post: https://wingolog.org/archives/2022/10/31/ephemerons-and-finalizers Now I think this API is rather heavy-handed, but perhaps we can settle on some sort of middle ground. I also think that maybe separating the finalization object from the "context" is too late for us, unless we decide to rework the set-finalizer! API as well for CHICKEN 6. Perhaps another API that exposes the queues directly might be better. This way one could use one queue for different things, so that finalization, signals and file descriptor readiness could happen on the same queue, so that a thread could block on "any event". A bit like the waitForMultipleObjects API in Windows. Something like: (make-event-queue) => q (enqueue-read q PORT) ;; and a low-level (enqueue-read/fd q FD)? (enqueue-write q PORT) (enqueue-i/o q PORT) ;; read or write (enqueue-finalization! q OBJ) (wait-for-event! q) ;; blocks for any "enqueued" event (peek-event q) ;; returns #f if nothing ready This makes it easy to pick off events from a queue, regardless of the type and you'll get an object of the type. I don't know if it can be done easily, but it would allow GUI libraries to hook into this too by having an event loop provide
Re: [PATCH] fix "tail?"
On Wed, Jul 05, 2023 at 07:27:43PM +0200, felix.winkelm...@bevuta.com wrote: > Reported by "acdw": make "tail?" more robust. Also generalizes > it to accept all types of data, remove unneeded fast path expecting > proper list argument. Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
[PATCH] Fix get-call-chain thread filtering
Hi there, The attached patch fixes a small but important oversight in get-call-chain. With this patch, the "trace" egg's tests pass again. Cheers, Peter From 7e20f8dc5310ee8fb41522bff062783b224b07f3 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 30 Jun 2023 09:42:48 +0200 Subject: [PATCH] Fix thread id extraction in get-call-chain This was missed when changing the trace buffer struct; all accessors had to be adjusted by adding one to the index, except for the "raw" slot. Without it, the filtering of call chain entries for the current thread will fail for anything with "frameinfo" / a filled cooked2 slot, so the call chain will be a lot smaller than it ought to be. --- library.scm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/library.scm b/library.scm index 94f304b8..b04b330a 100644 --- a/library.scm +++ b/library.scm @@ -4975,7 +4975,7 @@ EOF (let loop ((i 0)) (if (fx>= i n) '() - (let ((t (##sys#slot vec (fx+ i 3 ; thread id + (let ((t (##sys#slot vec (fx+ i 4 ; thread id (if (or (not t) (not thread) (eq? t-id t)) (cons (vector (or (##sys#slot vec (fx+ i 1)) ; cooked_location -- 2.40.1 signature.asc Description: PGP signature
[PATCH] Restore read/source-info in support.scm and export it officially from (chicken syntax)
Hi all, I was checking the Salmonella results and it seems a good number of eggs are actually relying on chicken.compiler.support#read/source-info and therefore break with the current master: http://salmonella-linux-x86.call-cc.org/master/clang/linux/x86/2023/06/28/yesterday-diff/ The attached patch restores the procedure in support.scm, but keeps the implementation itself in chicken.syntax, where it makes more sense. Since these eggs are using it, that TODO above ##sys#read/source-info is settled - it's useful enough for those eggs, so let's just expose it publically and document it. I noticed there's also an error in the "trace" egg. This probably has something to do with the changes in the call chain. Not sure how, but I'll investigate. Cheers, Peter From ed691a470bd45bf1ecb9a8d8f53fdfbe4d8c952a Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 30 Jun 2023 09:01:36 +0200 Subject: [PATCH] Restore read/source-info in support.scm and export from (chicken syntax) It turns out several eggs got broken by dropping chicken.compiler.support#read/source-info because they are using it directly, even though it's undocumented and supposed to be "internal". So, restore it in support.scm as an alias for the one in (chicken syntax) to fix those eggs, but mark it as deprecated. Since several eggs make use of it, that settles the question of whether it is useful to export it from (chicken syntax), so do this. Also, since it's now an official API, make it safe to use by checking the argument for being a port, and make the argument optional, like in the regular (read) procedure. --- DEPRECATED | 5 + NEWS | 4 batch-driver.scm | 4 +++- chicken.syntax.import.scm | 1 + csi.scm| 7 --- eval.scm | 13 - expand.scm | 5 +++-- manual/Module (chicken syntax) | 12 repl.scm | 7 --- rules.make | 1 + support.scm| 6 +- types.db | 5 + 12 files changed, 55 insertions(+), 15 deletions(-) diff --git a/DEPRECATED b/DEPRECATED index 7d9cffe7..6315e591 100644 --- a/DEPRECATED +++ b/DEPRECATED @@ -1,6 +1,11 @@ Deprecated functions and variables == +5.3.1 +- read/source-info was moved from the internal, undocumented module + (chicken compiler support) into (chicken syntax). Using it from + the former module is deprecated. + 5.2.1 - current-milliseconds and its C implementations C_milliseconds and C_a_i_current_milliseconds have been deprecated in favor of diff --git a/NEWS b/NEWS index 023d8939..68940ef5 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,10 @@ filename. Previously, the directory part would be stripped. - Added support for embedded strings and characters in SRFI-4 vector literals. + - read/source-info is now documented and officially supported, from +the (chicken syntax) module. It is still exported from the +undocumented internal (chicken compiler support) module, but +using it from there is deprecated. - Tools - The -R option for csi and csc now accepts list-notation like diff --git a/batch-driver.scm b/batch-driver.scm index ee7ae28a..f0a5fc08 100644 --- a/batch-driver.scm +++ b/batch-driver.scm @@ -47,6 +47,7 @@ chicken.pretty-print chicken.process-context chicken.string + chicken.syntax chicken.port chicken.time chicken.condition @@ -608,7 +609,8 @@ (in (check-and-open-input-file f)) ) (fluid-let ((##sys#current-source-filename f)) (let loop () - (let ((x (##sys#read/source-info in))) + (let ((x (chicken.syntax#read/source-info in))) ; OBSOLETE - after bootstrapping we can get rid of this explicit namespacing + (cond ((eof-object? x) (close-checked-input-file in f) ) (else diff --git a/chicken.syntax.import.scm b/chicken.syntax.import.scm index 7ce92fea..0aa5b36d 100644 --- a/chicken.syntax.import.scm +++ b/chicken.syntax.import.scm @@ -32,6 +32,7 @@ 'expand '((expand . chicken.syntax#expand) (get-line-number . chicken.syntax#get-line-number) + (read/source-info . chicken.syntax#read/source-info) (strip-syntax . chicken.syntax#strip-syntax) (syntax-error . chicken.syntax#syntax-error) (er-macro-transformer . chicken.syntax#er-macro-transformer) diff --git a/csi.scm b/csi.scm index fca1da4e..28ab16a0 100644 --- a/csi.scm +++ b/csi.scm @@ -280,7 +280,7 @@ EOF (define default-evaluator (let ((eval eval) (load-noisily load-noisily) - (read (lambda () (##sys#read/source-info (current-input-port +
[PATCH] Use weak chain approach for locatives, too
Hi all, Here's a patch to replace the locative table with the same approach we use for weak pair tracking. Not much to say, the idea is simple, the patch a bit bigger (because it rips out the locative table). There's no manual or NEWS entry because this should not be user-visible unless code creates lots of garbage locatives (as those would not be traversed during GC, unlike before). As I point out in the commit message, I think it would be worthwile for CHICKEN 6 to change the representation of locatives. Let me elaborate: Currently, we store a direct C pointer to the target object's slot in the locative's first slot. Because we also need to update said pointer during GC, we also store the offset (in the second slot) and do lots of calculations on *every* GC, for *every* locative, to find the original object, chase its forwarding pointer and then re-apply the offset. There's a fourth slot for the type of object (which we use to determine the offset's "stride" and when dereferencing to figure out how to "convert" it. Finally, there's a fifth slot which holds either the object (again!) or #f if it's a weak locative. Graphically: | HEADER | PTR | FX offset | FX subtype | OBJ/#f | If we change it such that we *always* store the object in the first slot, we can make the distinction of weak/strong refs like we do for pairs: set the C_SPECIALBLOCK_BIT conditionally when the reference is weak, and clear it if it's strong. Graphically: --- | HEADER | OBJ | FX offset | FX subtype | --- This change has two benefits: - locatives are smaller, so (slightly) less GC pressure if lots of these are made. But more importantly: - when traversing locatives, we *only* have to process weak locatives specially in the GC, and there's no need to recalculate the pointer. This means if a program uses only strong locatives, there is *no additional penalty* incurred after a garbage collection run. The only disadvantage is that every access of a locative will now have to do the calculation to find the base pointer. This might slow down locative-ref somewhat. But I think it's manageable - we already have a switch statement in locative_ref, so adding the offset calculation for all the different types of locatives is trivial and should only add *very* minor overhead, while the GCs should all be faster. The change should be relatively minor, but because it's a change in the representation, and there may be code out there that relies on the current representation, I think this change is better left for CHICKEN 6. Cheers, Peter From 8f753a2d031a3c23198c397223f55ce3ae7ec087 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 27 Jun 2023 08:13:59 +0200 Subject: [PATCH] Replace locative table with simpler "weak chain" solution Instead of keeping track of every locative in a table, we instead use the same approach as for tracking weak pairs: during GC, as we encounter live locatives, build up a chain which we traverse when the GC has completed. We "recycle" the first slot of the locative when it is turned into a forwarding pointer for storing the chain pointer. Unlike weak pairs, we have to traverse both strong *and* weak locatives, because their pointer slots need to be fixed up. This could be improved if we change the representation of locatives to be object+offset instead of pointer+offset(+object), and have the C_SPECIALBLOCK_BIT set depending on whether it is weak/strong. This would be a fundamental representational change, so this would be better left for CHICKEN 6. --- runtime.c | 205 +++- tests/weak-pointer-test.scm | 75 - 2 files changed, 136 insertions(+), 144 deletions(-) diff --git a/runtime.c b/runtime.c index 5ff08f3f..0dc05feb 100644 --- a/runtime.c +++ b/runtime.c @@ -153,7 +153,6 @@ static C_TLS int timezone; #define DEFAULT_HEAP_MIN_FREE (4 * 1024 * 1024) #define HEAP_SHRINK_COUNTS 10 #define DEFAULT_FORWARDING_TABLE_SIZE 32 -#define DEFAULT_LOCATIVE_TABLE_SIZE32 #define DEFAULT_COLLECTIBLES_SIZE 1024 #define DEFAULT_TRACE_BUFFER_SIZE 16 #define MIN_TRACE_BUFFER_SIZE 3 @@ -402,8 +401,8 @@ static C_TLS C_word **mutation_stack_limit, **mutation_stack_top, *stack_bottom, - *locative_table, weak_pair_chain, + locative_chain, error_location, interrupt_hook_symbol, current_thread_symbol, @@ -468,8 +467,6 @@ static C_TLS double static C_TLS LF_LIST *lf_list; static C_TLS int signal_mapping_table[ NSIG ]; static C_TLS int - locative_table_size, - locative_table_count, live_finalizer_count, allocated_finalizer_count, pending_finalizer_count, @@ -544,8 +541,8 @@ static void C_fcall mark_nested_objects(C_byte *heap_scan_top
Re: [PATCH] fix some problems with the SRFI-4 syntax extensions added recently
On Sun, Jun 25, 2023 at 08:06:01AM +0200, felix.winkelm...@bevuta.com wrote: > Hi! > > Attached a patch to fix some problems with the newly introduced > extension to SRFI-4 vector read syntax. "Siiky" pointed out a bogus > empty string comparison and further testing showed that empty strings > were not handled properly in certain situations. Thanks, pushed! > As Peter remarks, this code is too hairy. I added a comment trying > to explain the reason for this. Thanks for adding a bit of explanation. Are the lists really that long though? I'd expect this syntax to be used only in user-typed literals, not in data to be read in. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Add line number tracking to the interpreter
Hi all, I noticed that the interpreter got quite a bit slower due to fetching the line number information for an expression when emitting info into the trace buffer. Attached is an additional patch to bring things back to our original performance by pre-fetching the line number info and passing it as an additional argument to emit-trace-info. Patch has also been pushed to the line-numbers-in-csi branch. Cheers, Peter From df10eb66dc22a1538bdab4ceafd3f15812c18842 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Sun, 25 Jun 2023 10:22:01 +0200 Subject: [PATCH] Fetch line number from info in evaluator before compiling applications Emitting trace information should be as cheap as possible, it is called many times during a program's execution (upon every procedure application). The get-line-number call would slow things down by a factor of almost 2. Instead, retrieve the line number at closure compilation time and pass it directly to emit-trace-info. --- eval.scm | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/eval.scm b/eval.scm index 9a69e051..7fa72028 100644 --- a/eval.scm +++ b/eval.scm @@ -109,11 +109,11 @@ ((eq? x (##sys#slot lst 0)) i) (else (loop (##sys#slot lst 1) (fx+ i 1))) ) ) ) - (define (emit-trace-info tf info cntr e v) + (define (emit-trace-info tf ln info cntr e v) (when tf (##core#inline "C_emit_trace_info" - (or (get-line-number info) "") + ln info (##sys#make-structure 'frameinfo cntr e v) (thread-id ##sys#current-thread) ) ) ) @@ -689,37 +689,38 @@ (compile (##sys#slot x 0) e #f tf cntr #f))) (args (##sys#slot x 1)) (argc (checked-length args)) - (info x) ) + (info x) + (ln (or (get-line-number info) ""))) (case argc ((#f) (##sys#syntax-error/context "malformed expression" x)) ((0) (lambda (v) - (emit-trace-info tf info cntr e v) + (emit-trace-info tf ln info cntr e v) ((##core#app fn v ((1) (let ((a1 (compile (##sys#slot args 0) e #f tf cntr #f))) (lambda (v) -(emit-trace-info tf info cntr e v) +(emit-trace-info tf ln info cntr e v) ((##core#app fn v) (##core#app a1 v))) ) ) ((2) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f)) (a2 (compile (##core#inline "C_u_i_list_ref" args 1) e #f tf cntr #f)) ) (lambda (v) -(emit-trace-info tf info cntr e v) +(emit-trace-info tf ln info cntr e v) ((##core#app fn v) (##core#app a1 v) (##core#app a2 v))) ) ) ((3) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f)) (a2 (compile (##core#inline "C_u_i_list_ref" args 1) e #f tf cntr #f)) (a3 (compile (##core#inline "C_u_i_list_ref" args 2) e #f tf cntr #f)) ) (lambda (v) -(emit-trace-info tf info cntr e v) +(emit-trace-info tf ln info cntr e v) ((##core#app fn v) (##core#app a1 v) (##core#app a2 v) (##core#app a3 v))) ) ) ((4) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f)) (a2 (compile (##core#inline "C_u_i_list_ref" args 1) e #f tf cntr #f)) (a3 (compile (##core#inline "C_u_i_list_ref" args 2) e #f tf cntr #f)) (a4 (compile (##core#inline "C_u_i_list_ref" args 3) e #f tf cntr #f)) ) (lambda (v) -(emit-trace-info tf info cntr e v) +(emit-trace-info tf ln info cntr e v) ((##core#app fn v) (##core#app a1 v) (##core#app a2 v) (##core#app a3 v) (##core#app a4 v))) ) ) (else (let ((as (##sys#map (lambda (a) (compile a e #f tf cntr #f)) args))) (lambda (v) - (emit-trace-info tf info cntr e v) + (emit-trace-info tf ln info cntr e v) (apply (##core#app fn v) (##sys#map (lambda (a) (##core#app a v)) as))) ) ) ) ) ) (compile exp env #f (fx> (##sys#eval-debug-level) 0) cntr tl?) ) ) ) -- 2.40.1 signature.asc Description: PGP signature
Re: [PATCH] Extend SRFI-4 vector syntax
On Wed, Jun 07, 2023 at 09:12:51PM +0200, felix.winkelm...@bevuta.com wrote: > This patch allows strings and chars in homogenous number vectors, > as suggested by klm at the meetup. Code is a bit hairy, but seems to work. Pushed! Cheers, Peter signature.asc Description: PGP signature
[PATCH] Add line number tracking to the interpreter
Hi all, Attached is a set of patches (also to be found in the "line-numbers-in-csi" in the branch) to add line number tracking to the interpreter (both for csi and user-invoked (repl) calls). The first commit simply moves some code around. Having the line number database accessors in support.scm makes them only available to the compiler, while we now want to be able to use them from the interpreter as well. So we move it to expand.scm. The second patch changes the interpreter to track line numbers via fluid-letting the ##sys#read/source-info-hook, like the compiler does. In order to store the line number info in the trace buffer, we have to also extend the trace buffer struct to hold either raw C strings or Scheme strings for the location. The third patch changes the alist we store in the line number hash table to weakly hold onto the form (using the new weak-pairs feature), so that we can drop unreferenced forms from the database. I tested this with a trivial program consisting of two files: ;; loader.scm (import chicken.syntax) (let lp ((i 0)) #;(when (= 0 (modulo i 999)) (##sys#display-line-number-database)) (when (< i 999) (load "def.scm") (lp (add1 i ;; def.scm (define x (lambda (y) (+ y 1))) If you run csi -s loader.scm, this would eat up more and more memory with only the first two patches, but it stays constant with the third patch. The fourth patch improves line number tracking a bit to ensure we keep line numbers during syntax expansion (like in the compiler). Without this, you see a lot of un-numbered entries in the csi trace output. The fifth patch moves the tracking to (repl), so it also works for user-created REPLs, not just csi. The sixth patch is a small refactor so we're not using fluid-let to override the ##sys#default-read-info-hook for (read), but instead use ##sys#read/source-info directly in all code that will read and then evaluate code. I'm only wondering if we need to make the hash table itself weak as well. This would require quite a few more changes and I'm not sure of the benefits. Cheers, Peter From 1330e7d252eac583d6c0bbabc4917a99a1437135 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 20 Jun 2023 15:14:53 +0200 Subject: [PATCH 1/6] Move line number database accessors from support.scm to expand.scm Because support.scm is only available inside the compiler, we can't call its procedures in the interpreter. Since the line number database variable itself is defined in expand.scm, it makes sense to put the accessors there as well. While moving, it became apparent that get-line from support.scm is essentially the same as get-line-number from expand.scm, so use that everywhere instead. The other procedures are now prefixed with ##sys# because we don't want to export them (yet?) from the user-visible chicken.syntax module. Rewrite display-line-number-database from printf to ##sys#print to avoid adding a dependency on extras.scm / (chicken format) in expand.scm. It's still not 100% isolated - it would be better if the line number database itself would purely be accessed through accessors, but it is currently mutated directly in various places. --- batch-driver.scm| 6 +++--- compiler-syntax.scm | 7 --- core.scm| 32 +++ expand.scm | 46 +++-- rules.make | 8 ++-- support.scm | 44 +++ 6 files changed, 76 insertions(+), 67 deletions(-) diff --git a/batch-driver.scm b/batch-driver.scm index 0b3d7e02..ee7ae28a 100644 --- a/batch-driver.scm +++ b/batch-driver.scm @@ -27,7 +27,7 @@ (declare (unit batch-driver) - (uses extras data-structures pathname + (uses extras data-structures pathname expand support compiler-syntax compiler optimizer internal ;; TODO: Backend should be configurable scrutinizer lfa2 c-platform c-backend user-pass)) @@ -608,7 +608,7 @@ (in (check-and-open-input-file f)) ) (fluid-let ((##sys#current-source-filename f)) (let loop () - (let ((x (read/source-info in))) + (let ((x (##sys#read/source-info in))) (cond ((eof-object? x) (close-checked-input-file in f) ) (else @@ -670,7 +670,7 @@ (when (debugging '|N| "real name table:") (display-real-name-table) ) (when (debugging 'n "line number database:") - (display-line-number-database) ) + (##sys#display-line-number-database) ) (set! ##sys#line-number-database line-number-database-2) (set! line-number-database-2 #f) diff --git a/compiler-syntax.scm b/compiler
Re: [PATCH] fix segfault in csi's ",d"
On Mon, Jun 19, 2023 at 01:38:13PM +0200, felix.winkelm...@bevuta.com wrote: > Attached patch fixes a segfault when trying to describe a brokwn weak > ptr. I also added a clause in the case construct to dispatch on a > described value to catch unknown immediate values instead of crashing > in the locative test (which expects block values). Thanks for fixing my oversight ;) Good call on "hardening" it a bit against unexpected other values. I've pushed the patch. Cheers, Peter signature.asc Description: PGP signature
Re: Hyperbolic Functions Patch Set
On Mon, Jun 05, 2023 at 08:12:01PM +0200, felix.winkelm...@bevuta.com wrote: > Thanks a lot, Christian, much obliged. Indeed, thanks Christian! > I have combined the two patches and amended the result slightly (wrappers in > runtime.c for all trigonometric functions were not calling the > aliases from chicken.h, something we forgot and which I change here, > while we're at it. I also added a few lines of documentation). I added a NEWS entry and updated the types.db file to have these new procedures as well. > We now just need some other -hacker to sign off push this. Done! Cheers, Peter signature.asc Description: PGP signature
[PATCH] Allow collecting weak pairs in minor GC (was: Re: [PATCH] Add user-facing weak pair API)
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote: > Attached are 5 patches to add the weak pair support. I've also pushed > this to the git repo under the "user-facing-weak-pairs" branch. Attached are two more patches that together wrap up the weak pairs support. The first patch gets rid of the hacks that were needed to get weak pairs working *efficiently* in the original implementation, by "ignoring" weak pairs in minor GCs and treating them like regular pairs instead. Those hacks were needed because initially, weak pairs required traversing the entire symbol table which would be prohibitively expensive during minor GC. The second patch is a minor performance / quality of implementation improvement that came from an insight I had while enabling minor GC for weak pairs. I realised that sometimes, we'd hit the assertion that a weak pair's car must be a block object in update_weak_pairs(). This got hit because "somehow" the car already contained a #!bwp object. Then I realised that's because the same pair gets linked up in the weak pair chain more than once, and this is more likely to trigger when minor GCs are allowed to collect weak pairs because minor GCs are likely to trigger major GCs. In the first patch I papered over this by adding a special case for this in update_weak_pairs(), but it's cleaner to avoid the situation in the first place. This is done by resetting the weak pair chain to NULL when a new GC "mode" is entered. That works because each forwarded weak pair is guaranteed to be forwarded again in the new mode, and every non-forwarded weak pair is *also* guaranteed to be forwarded, if it is live. I've pushed these two patches to the user-facing-weak-pairs branch as well. I ran a benchmark in hopes it would improve performance, but it doesn't seem to have any effect, which makes sense of course. The benchmark programs are all compiled, so their symbols won't be collected, as the strings for the symbols are all in static ("permanent") storage. I expect it to have a larger effect on interpreted code and code that generates a lot of short-lived symbols (e.g. macro-heavy code). Anyway, remaining TODOs that can be tackled after merging: - Doing the same for locatives (should greatly affect GC times in locative-heavy code, even for minor GCs!) - Adding line numbers (why we did this in the first place!) Cheers, Peter From 62fcf64a9d6f311730abfae7c664b1a29950f318 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 9 Jun 2023 08:17:13 +0200 Subject: [PATCH 1/2] Enable minor collection of weak pairs In the initial weak symbol GC implementation, we added a temporary hack in mark_nested_objects() to skip weak pairs in minor GC. This was required because back then, we still had to scan the entire symbol table in order to correctly drop GCed symbols. Now that we're not doing that anymore, we can collect weak pairs in minor GCs too. This allows us to drop said hack. We also can drop an extra check in really_mark which was added to prevent adding weak pairs to the weak pair chain in minor GC. Drop the assert in update_weak_pairs and refactor the liveness checks to be more comprehensive (the old checks didn't account for pairs allocated in static memory or malloced memory - not likely, but might be possible now they're user-facing), and to include stack checks for the minor GC. Change the weak pointer tests not to do major collections, and add (slightly paranoid) checks that "permanent" symbols are kept around. They're paranoid because permanent symbols don't actually live in static memory - only their strings do. So they should not result in broken weak pointers anyway. While working on this, I noticed that there's a small but odd limitation - when using set-car! on a weak pair, depending on where the pair and the car value live, we add it to the mutation stack, which gets marked unconditionally. Therefore, the value will be kept alive on the next GC even if nothing else holds onto it. To avoid this, we would need to know in C_mutate whether the slot we're mutating is a GC-managed slot (ie it is not the first slot of a "special block"). We don't have this information there, and making it available would entail a massive API change. This is not worth the hassle, so document this minor limitation, instead. --- manual/Module (chicken base) | 4 runtime.c| 40 ++-- tests/weak-pointer-test.scm | 38 ++ 3 files changed, 53 insertions(+), 29 deletions(-) diff --git a/manual/Module (chicken base) b/manual/Module (chicken base) index d26ba5ad..11d9c441 100644 --- a/manual/Module (chicken base) +++ b/manual/Module (chicken base) @@ -176,6 +176,10 @@ They're the same as regular pairs for all intents and purposes. However, there's a {{weak-pair?}} predicate which ''can'' distinguish between r
Re: [PATCH] Compact symbol table in lookup/intern, and fix a few gitignores (was: [PATCH] Add user-facing weak pair API)
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote: > Attached are 5 patches to add the weak pair support. I've also pushed > this to the git repo under the "user-facing-weak-pairs" branch. I've pushed two more patches, attached here as well. The first one implements the idea of moving the "compaction" of the symbol table out of the GC and into the lookup/intern process, so that we don't traverse the (possibly unchanged) entire symbol table every time we do a major GC. It seems to give a minor performance improvement to a few benchmarks, but it could just be noise. See the attached benchmark.diff file. The second just adds a few missing .gitignore rules for two files I noticed under "Untracked files": libchicken.so.11 and srfi-4.import.scm Cheers, Peter From 59c9cbc180506c61d64e7d0cb6532fa64e3347c9 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 7 Jun 2023 09:56:18 +0200 Subject: [PATCH 1/2] Move symbol table compaction to lookup/interning of symbols Instead of adding an additional pass to the GC in which we scan over the symbol tables and drop empty buckets, do this in lookup(). The lookup() function is used by all the *intern*() functions and in C_lookup_symbol() as well, so this is done on every symbol lookup and interning of symbols. The check is relatively cheap and we only compact buckets we actually have to look at anyway. This means we may keep around "broken" buckets for longer than necessary, if we don't perform any lookups or interning in between GCs. This is probably fine - only symbol-heavy code would create lots of broken buckets anyway, and symbol-heavy code would be doing lookups, so we'd be doing the compaction more often as well. This does not touch lookup_bucket() which is used to persist/unpersist a given symbol by toggling its bucket's "is a weak pair" bit aka specialblock bit. We could do it there too, but I figured it makes more sense to only do this when we have to walk the table for string->symbol and ##sys#interned-symbol?, as those are typically expected to be somewhat "heavy" already. We also have to tweak compute_symbol_table_load(), as that is the function we use in the symbolgc test to find out how many live symbols we still have. And more in general, it makes sense to not count buckets with broken weak pointers as "live symbols" anyway. While at it, drop some of the paranoid debugging checks as it would be a bit awkward to try and shoehorn those into the lookup function(). --- runtime.c | 139 +++--- 1 file changed, 27 insertions(+), 112 deletions(-) diff --git a/runtime.c b/runtime.c index d3a3b750..755a9bec 100644 --- a/runtime.c +++ b/runtime.c @@ -546,8 +546,6 @@ static void C_fcall mark_live_heap_only_objects(C_byte *tgt_space_start, C_byte static C_word C_fcall intern0(C_char *name) C_regparm; static void C_fcall update_locative_table(int mode) C_regparm; static void C_fcall update_weak_pairs(int mode) C_regparm; -static void C_fcall fixup_symbol_forwards(C_word sym) C_regparm; -static void C_fcall update_symbol_tables(int mode) C_regparm; static LF_LIST *find_module_handle(C_char *name); static void set_profile_timer(C_uword freq); static void take_profile_sample(); @@ -2449,16 +2447,24 @@ C_regparm C_word C_fcall hash_string(int len, C_char *str, C_word m, C_word r, i C_regparm C_word C_fcall lookup(C_word key, int len, C_char *str, C_SYMBOL_TABLE *stable) { - C_word bucket, sym, s; + C_word bucket, last = 0, sym, s; for(bucket = stable->table[ key ]; bucket != C_SCHEME_END_OF_LIST; bucket = C_block_item(bucket,1)) { sym = C_block_item(bucket,0); -s = C_block_item(sym, 1); -if(C_header_size(s) == (C_word)len - && !C_memcmp(str, (C_char *)C_data_pointer(s), len)) - return sym; +/* If the symbol is unreferenced, drop it: */ +if (sym == C_SCHEME_BROKEN_WEAK_PTR) { + if (last) C_set_block_item(last, 1, C_block_item(bucket, 1)); + else stable->table[ key ] = C_block_item(bucket,1); +} else { + last = bucket; + s = C_block_item(sym, 1); + + if(C_header_size(s) == (C_word)len + && !C_memcmp(str, (C_char *)C_data_pointer(s), len)) +return sym; +} } return C_SCHEME_FALSE; @@ -2538,14 +2544,23 @@ C_regparm C_word C_fcall lookup_bucket(C_word sym, C_SYMBOL_TABLE *stable) double compute_symbol_table_load(double *avg_bucket_len, int *total_n) { - C_word bucket; + C_word bucket, last; int i, j, alen = 0, bcount = 0, total = 0; for(i = 0; i < symbol_table->size; ++i) { -bucket = symbol_table->table[ i ]; - -for(j = 0; bucket != C_SCHEME_END_OF_LIST; ++j) - bucket = C_block_item(bucket,1); +last = 0; +j = 0; +for(bucket = symbol_table->table[ i ]; bucket != C_SCHEME_END_OF_LIST; +bucket = C_block_item(bucket,1)) { +
Re: [PATCH] Add user-facing weak pair API
On Tue, Jun 06, 2023 at 07:46:10PM +0200, felix.winkelm...@bevuta.com wrote: > > Do we even want to make a 5.4 release at all? > > I would suggest to add this for 5.4, for the reasons you > state. I also would be for making at least one minor release > before C6, as I doubt that we will get UTF support with > all the eggs into a robust state that soon. Code that > wants to test for this feature could cond-expand on 5.4 > (and later versions). Sure, that's probably the better way to go. Perhaps the feature name should simply be "weak-pairs". WDYT? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Add user-facing weak pair API
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote: > Dear hackers, > > At the Village CHICKENs event, I gave a presentation about how we could > add "proper" support for weak pairs, so that they can be exposed to the > user. PPS: When writing tests for weak pairs, I realised I made a mistake in my presentation: cyclic lists of weak pairs are not a problem, because the cars would *still* be cleared. The issue, IIUC, is when both the cdr and the car (or a car of one weak pair and a cdr of another) refer to the same object, *then* it can't be collected. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Add user-facing weak pair API
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote: > Dear hackers, > > At the Village CHICKENs event, I gave a presentation about how we could > add "proper" support for weak pairs, so that they can be exposed to the > user. Right now we have half-baked support for weak pairs as a hidden > implementation detail of the symbol table. PS: I'm not sure if this should go into 5.4 or 6.0. On the one hand, it's a relatively minor addition to the API. On the other hand, if anyone wants to rely on it we might need to provide a feature for it whereas in 6.0 it would be trivial. Do we even want to make a 5.4 release at all? Cheers, Peter signature.asc Description: PGP signature
[PATCH] Add user-facing weak pair API
Dear hackers, At the Village CHICKENs event, I gave a presentation about how we could add "proper" support for weak pairs, so that they can be exposed to the user. Right now we have half-baked support for weak pairs as a hidden implementation detail of the symbol table. This would be useful for users, but also for core: this would make it easier to support line numbers in the interpreter without unbounded memory usage - the expression in source code would be the key, the line number where it occurs would be the value in the database. When an expression gets discarded because nothing hangs on to it, we can drop the corresponding entry from the database. Slides of the talk are here: https://code.more-magic.net/weak-refs-for-chicken/tree/weak-refs.org I also expect to publish a more accessible writeup of this on my blog, but don't hold your breath - it may take a while. Attached are 5 patches to add the weak pair support. I've also pushed this to the git repo under the "user-facing-weak-pairs" branch. The patches contain more information about the specific approach. Especially of interest is the trick to build a linked list in the "dead space" of the old cdr that is still allocated behind forwarding pointers of weak pairs that makes this efficient. I've taking this idea from MIT Scheme, see "src/microcode/gcloop.c" in their source code repository for more info. Viewable online at: https://git.savannah.gnu.org/cgit/mit-scheme.git/tree/src/microcode/gcloop.c?id=545efe9dcaac695d4d5c34de7497543b8dd4f2b0#n789 Even though we take this neat trick from MIT Scheme, I decided not to use their API as it is a bit awkward. It may be more Schemely as weak pairs are completely distinct from regular pairs in MIT Scheme, but there's an issue with the way you can detect broken weak pairs; you have to follow the pattern in the manual where you access the pair to make sure it wasn't actually storing #f. See here: https://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Weak-Pairs.html Instead, this set of patches follows the Chez Scheme API: it adds only three procedures: weak-cons, weak-pair? and the broken-weak-pointer? predicate, whilst all the regular pair operations transparently accept either regular or weak pairs. It is much more expedient as it means you get to use all the list operators, and you get printing of weak pairs for free, since they get serialized like regular pairs. And it's extensible - if we ever decide to support "ephemerons", we could also go with the ephemerons API from Chez. See the following page for more info about Chez's API: https://cisco.github.io/ChezScheme/csug9.5/smgmt.html#./smgmt:h2 NOTE: Changing locatives to use the "chain of forwarding pointers to weak container objects" trick (so we can drop the locative table) is still to be done. NOTE 2: I wasn't 100% sure if (chicken base) is the best place to put these new procedures. We could also add them to a new module, e.g. (chicken weak-pair) or (chicken weak-pointer) or some such, or add them to the extant (chicken gc) module (although that's a bit odd since locatives are in their own rather than under (chicken gc))... However, it's only three procedures, they sort of "match" cons, pair? etc in (scheme base), and finally Chez also puts them in their (chezscheme) module (which kind of makes sense in that it holds the "core" data types). This is great bikeshed discussion fodder, of course. NOTE 3: I tried to keep the commits bite-sized and form a narrative of sorts so that it should be easy to understand even if you haven't seen my talk at the Village CHICKEN event. Cheers, Peter From 1d477b2857ec3f01c5367dadc5725b85e7b53799 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Sat, 3 Jun 2023 11:20:40 +0200 Subject: [PATCH 1/5] Replace special-casing of weak symbol GC with generic weak pair GC When we introduced weak pairs, it was initially rather "customized" for the specific case of bucket chains of internal symbol tables. This is the first step to make make it more generic, so that we can eventually support a user-facing weak pair type. In this change, when we mark a weak pair in the GC: - First we convert it to a forwarding pointer (as usual) - Then, we "recycle" the old value's car field to make it part of a chain of all the forwarded weak pairs. We can get away with that because the pair remains allocated as 3 words, even if the forwarding pointer we replace it with is only 1 word. - When GC is done, we traverse this chain and follow each forwarding pointer. The car field of the weak pair in the tospace will still hold a pointer into fromspace, which needs to be either fixed up (if the target value was replaced with a forwarding pointer) or cleared. --- runtime.c | 181 +++--- 1 file changed, 133 insertions(+), 48 deletions(-) diff --git a/runtime.c b/
[PATCH] Stop run-time option processing after first "-:" or non-runtime option (was: Re: [PATCH] stop run-time option processing after "--")
On Tue, Mar 14, 2023 at 08:43:52AM +0100, Peter Bex wrote: > On Tue, Mar 14, 2023 at 08:40:18AM +0100, felix.winkelm...@bevuta.com wrote: > > Backwards compatibility is from now on fucked anyway, so let's > > try to find somethin simple and sensible. > > Then we should do both. I.e., have -: to explicitly tell it to stop > parsing (and have that dropped from the command-line-options) and > stop at the first argument. The -: would be in situations where > a script just passes on all its arguments, like > > ./my-program -: "$@" Here's a patch that implements this proposal. Cheers, Peter From dce61eb2d7caf7f570701161ee1cada07a862835 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 24 Mar 2023 10:42:49 +0100 Subject: [PATCH] Stop run-time option processing after "-:" or the first non-runtime option This means that runtime options after non-runtime options will no longer be processed. Programs using command-line-arguments will see all options starting at the first non-runtime option, even if later options start with "-:". Processing can also be stopped explicitly with a single "-:" option, which will not be seen by programs using command-line-arguments. Every argument following this "-:" will be seen, even if they start with "-:". --- NEWS| 4 + library.scm | 22 +- manual/Module (chicken process-context) | 22 +- manual/Using the compiler | 15 +- runtime.c | 258 5 files changed, 179 insertions(+), 142 deletions(-) diff --git a/NEWS b/NEWS index 23661fc2..6ecbbad2 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,10 @@ to avoid arbitrary OS command injection during egg installation, reported by Vasilij Schneidermann who also provided the necessary patches to mitigate this problem. + - The runtime option "-:b" has been removed, as it was deemed too +insecure to be able to drop to a REPL from the CLI of any program. + - Runtime option processing has been hardened: processing now stops on +the first non-runtime option or after "-:", whichever comes first. - Core libraries - Added "locative-index", kindly contributed by John Croisant. diff --git a/library.scm b/library.scm index 827666d5..819dacfe 100644 --- a/library.scm +++ b/library.scm @@ -6024,17 +6024,23 @@ static C_word C_fcall C_setenv(C_word x, C_word y) { (define command-line-arguments (make-parameter - (let ([args (argv)]) + (let ((args (argv))) (if (pair? args) -(let loop ([args (##sys#slot args 1)]) +(let loop ((args (##sys#slot args 1))) ; Skip over program name (argv[0]) (if (null? args) '() - (let ([arg (##sys#slot args 0)] -[r (##sys#slot args 1)] ) -(if (and (fx>= (##sys#size arg) 3) - (string=? "-:" (##sys#substring arg 0 2))) -(loop r) -(cons arg (loop r)) ) ) ) ) + (let ((arg (##sys#slot args 0)) +(rest (##sys#slot args 1)) ) +(cond + ((string=? "-:" arg) ; Consume first "empty" runtime options list, return rest + rest) + + ((and (fx>= (##sys#size arg) 3) + (string=? "-:" (##sys#substring arg 0 2))) + (loop rest)) + + ;; First non-runtime option and everything following it is returned as-is + (else args) ) ) ) ) args) ) (lambda (x) (##sys#check-list x 'command-line-arguments) diff --git a/manual/Module (chicken process-context) b/manual/Module (chicken process-context) index 5299905e..87165187 100644 --- a/manual/Module (chicken process-context) +++ b/manual/Module (chicken process-context) @@ -13,7 +13,7 @@ This module provides access to the current process context. Returns two values: an integer and a foreign-pointer object representing the {{argc}} and {{argv}} arguments passed to the current -process. +process. See also {{argv}} below. argv @@ -24,13 +24,27 @@ the list is a string containing the name of the executing program. The other items are the arguments passed to the application. It depends on the host-shell whether arguments are expanded ('globbed') or not. +NOTE: This is the "raw" unprocessed argument list, including runtime +options (starting with {{-:}}) which are consumed by the runtime +library. + command-line-arguments (command-line-arguments) -Contains the list of arguments passed to this program, with the name of -the program and any runtime options (all options starting with {{-:}}) -removed. +Contains the list of arguments passed to this program. + +This ''excludes'' the name
Re: [PATCH] stop run-time option processing after "--"
On Tue, Mar 14, 2023 at 02:48:28PM +0100, Mario Domenech Goulart wrote: What should the behavior of executables be in case they are compiled > with the runtime options parser disabled (e.g., > -disable-runtime-options, as per Felix' previous patch)? Assuming we go with the decision to process until we hit either "-:" or the first non-runtime option, I think programs with the parser disabled could still consume the first argument if it is identical to "-:". I prefer that over doing completely nothing, as that makes it easier to disable runtime parsing in any existing program at any point - all the existing scripts for that program which already use "-:" to disable runtime option parsing would continue to work without modification. Thanks for asking, this is an important point to discuss! NOTE: Perhaps a simpler alternative to that patch would be to treat any *non-empty* runtime option as an error (like we do for unknown options now) and error out with a message saying runtime option parsing has been disabled. This would give a better hint to the user that they could decide to recompile the program with runtime parsing enabled. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] stop run-time option processing after "--"
On Tue, Mar 14, 2023 at 08:40:18AM +0100, felix.winkelm...@bevuta.com wrote: > > Of course. In fact, I think it would make more sense to simply tell the > > runtime options parser to stop after the first non-"-:"-prefixed > > argument. That makes runtime argument stuffing harder and allows it > > to play nice with _any_ option parser, and makes the "--" behaviour > > automatically work if the program already handles getopt-style options. > > That's another idea. Backwards compatibility is from now on > fucked anyway, so let's try to find somethin simple and sensible. Then we should do both. I.e., have -: to explicitly tell it to stop parsing (and have that dropped from the command-line-options) and stop at the first argument. The -: would be in situations where a script just passes on all its arguments, like ./my-program -: "$@" Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] stop run-time option processing after "--"
On Tue, Mar 14, 2023 at 08:39:22AM +0200, Lassi Kortela wrote: > Is there any reason -: runtime options need to be recognized beyond the > first argument? Gambit also uses the -: syntax, and it does not. Making it compatible might be nice, but involves a change in how these options are parsed - they're comma separated. I wouldn't be opposed to this change, but it deviates further from what we have making backwards compatibility more of an issue. This might not be a problem since these options are relatively obscure and it should be easy to tweak a calling script. It would be harder to make a script that can work with older and newer versions of CHICKEN, since the old one wouldn't grok commas in the options and the new one wouldn't grok separately supplied options. Perhaps something for CHICKEN 6? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] stop run-time option processing after "--"
On Mon, Mar 13, 2023 at 09:35:53PM +0100, felix.winkelm...@bevuta.com wrote: > > In other words, "./my-program -:a100 -- foo" is seen as '("--" "foo") > > when you ask for (command-line-arguments), while the intention probably > > was '("foo"). > > Actually not. Currently "--" is not specially handled, so it makes > no difference whether r/t options are passed or not ("command-line-arguments") > returns exactly the same. With or without this patch, "--" will be > seen by user code. I know, my point is that this change breaks backwards compatibility. Let's say one has a pre-existing program which has no flag handling code. For example a simple program that only accepts file names on its command line, but doesn't do option processing. Let's say the user would like to "harden" the invocation to said program, they'd have to _add_ the "--". This would then be interpreted by said program as a file name. My point is that said program would still be able to parse run-time options (because that's baked in to each program). Therefore, it makes sense to have the run-time option parser be 100% transparent to the program and have its own way of stopping processing, invisible to the program. > If the program does not parse the command line, having > a "--" has no effect. If it does, it is probably a good idea (and > good style) to be aware of it and handle the case appropriately. For > "my-program" above, the "--" in the example invocation is immaterial. Of course. In fact, I think it would make more sense to simply tell the runtime options parser to stop after the first non-"-:"-prefixed argument. That makes runtime argument stuffing harder and allows it to play nice with _any_ option parser, and makes the "--" behaviour automatically work if the program already handles getopt-style options. Note that this is _also_ a backward compatibility breaker, but IMO the lesser of two evils as it doesn't introduce spurious command line arguments into existing programs. > But if "my-program" does process the command-line, it may be > appropriate to handle "--" anyway (mostly to distinguish its own options > from non-option arguments). Indeed. It might make sense, but the point was to be able to disable runtime option processing, and IMO that should be possible even without requiring the program to cooperate. Note that it's also a very easy thing to forget to handle "--" specially in programs that don't use a gettext-style options parser (i.e., all of the core tools! If we do go with this approach, all those would need to be patched to handle "--" as well because they all have hand-rolled options parsers. Most of my own programs do, too). Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] stop run-time option processing after "--"
On Mon, Mar 13, 2023 at 10:37:30AM +0100, felix.winkelm...@bevuta.com wrote: > Another problem with run-time option processing pointed out by > florz. This allows passing arguments that look like run-time > options, when preceded by "--". This needs more thought - simply adding "--" to stop it from processing options leaves us with ("--") as the first command-line argument. In other words, "./my-program -:a100 -- foo" is seen as '("--" "foo") when you ask for (command-line-arguments), while the intention probably was '("foo"). Just dropping "--" from the arguments list is probably also not desirable, because our option parser doesn't have any getopt-like behaviour, so -xyz or --xyz aren't treated specially in any way. Perhaps the safest here is to stop processing after reading ":-" or some such (and discard it!) - that would be guaranteed to never pass through to the program anyway, because it might be a runtime option and already gets intercepted by the option parser. Only potential issue is that it's not exactly standard when compared to other programs... What do y'all think? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] drop "b" runtime option
On Sun, Mar 12, 2023 at 06:04:49PM +0100, felix.winkelm...@bevuta.com wrote: > As rightly pointed out by florz and others, the "b" (break on > repl) offers a potential security vulnerability by allowing > code from stdin to be interpreted. This patch simply removes > the option, as it doesn't seem to be widely used, anyway. Good idea to drop it, like you said it's not widely used (I didn't even really know about it). Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] rename egg status files
On Sun, Oct 30, 2022 at 02:47:47PM +0100, felix.winkelm...@bevuta.com wrote: > See #1753, as reported by Kon. This will bust egg caches, but should otherwise > not cause any incompatibilities. Pushed, thanks Felix. However, I noticed it would still just install an egg I had previously downloaded, keeping the original VERSION, STATUS and TIMESTAMP files and not creating _VERSION, _STATUS and _TIMESTAMP. Not sure if that's correct, since you mention it would bust egg caches... Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] try to make generation of egg-info file on Windows more robust
On Wed, Nov 02, 2022 at 10:24:13AM +0100, felix.winkelm...@bevuta.com wrote: > The attached patch was contributed by Vassilij. See #1800. Thanks guys, pushed the updated version of this we discussed elsewhere. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] handle misquoting of target in build-scripts
On Mon, Oct 31, 2022 at 05:46:55PM +0100, felix.winkelm...@bevuta.com wrote: > Thanks, pushed. I had to hand-apply the patch as git-am didn't like it. I also > fixed a scrutinizer warning. I suppose the warning was about target-librarian-options not being a list? Unfortunately, splicing in these options wouldn't work properly - they're taken from a baked in foreign value which may contain spaces. We'll have to figure out a way to keep certain arguments "as-is". Attached is a pretty straightforward proposal to make this work. PS: I noticed two more mistakes that have nothing to do with this - just the platform was still being passed in to "filelist" and was *not* being passed in to print-build-command. I've fixed these already and pushed them as trivial patches. Cheers, Peter From 651f70a66ed2ab3e70334466ae8a93b57ffbaa10 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 1 Nov 2022 10:39:32 +0100 Subject: [PATCH] Add a way to pass in already-quoted arguments to qs* The librarian options are kept around in the binary as a foreign string which may contain multiple flags separated by spaces. These should be taken verbatim by print-build-command. The way we do this is by wrapping such verbatim snippets in a "raw" record type and detecting these values in qs* and unpacking their strings directly from the struct instead of quoting them. --- egg-compile.scm | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/egg-compile.scm b/egg-compile.scm index 662fad3d..c1f2ceb0 100644 --- a/egg-compile.scm +++ b/egg-compile.scm @@ -650,7 +650,7 @@ link-objects (print-build-command (list out3) `(,out2 ,@lobjs) -`(,target-librarian ,target-librarian-options ,out3 ,out2 ,@lobjs) +`(,target-librarian ,(raw-arg target-librarian-options) ,out3 ,out2 ,@lobjs) platform))) (print-end-command platform))) @@ -1152,10 +1152,19 @@ EOF ;; have to undo that here again. It can also convert slashes to ;; backslashes on Windows, which is necessary in many cases when ;; running programs via "cmd". +;; +;; It also supports already-quoted arguments which can be taken as-is. (define (qs* arg platform #!optional slashify?) - (let* ((arg (->string arg)) -(path (if slashify? (slashify arg platform) arg))) -(qs path (if (eq? platform 'windows) 'mingw32 platform + (if (raw-arg? arg) + (raw-arg-value arg) + (let* ((arg (->string arg)) +(path (if slashify? (slashify arg platform) arg))) + (qs path (if (eq? platform 'windows) 'mingw32 platform) + +(define-record-type raw-arg + (raw-arg value) + raw-arg? + (value raw-arg-value)) (define (slashify str platform) (if (eq? platform 'windows) -- 2.36.2 signature.asc Description: PGP signature
[PATCH] Fix double normalize-destination call
Hi all, While testing my previous patch on Windows, I noticed some weird error output during the installation phase. It would say several times in a row: The filename, directory name, or volume label syntax is incorrect. If you look at the foo.install.bat file, there's calls to mkdir which include the chicken PREFIX twice. Then, I realised this also happens in *nix. For example, try to install the "amb" egg. Just before it emits the egg-info file, it emits several calls to mkdir -p /path/to/chicken/path/to/chicken/share, which ought to be just /path/to/chicken/share. After investigating further, I realised these 5 calls correspond to the (data ...) components in the egg, which aren't actually installed! If you install the amb egg and then run the following command from the target directory: "find -name amb-dwelling.scm", it turns up nothing. The amb-dwelling.scm file also doesn't even occur in the install script. It looks like that's actually a bug in the egg though - contrast to the srfi-19 egg which *does* work, it has (data srf-29-bundles (files ...)) instead. The attached patch fixes the double call to normalize-destination on the directory for "share" files (data files) Cheers, Peter From e8341cca08c43e8bac1c5d08aaac670f5c37d211 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Mon, 31 Oct 2022 13:06:26 +0100 Subject: [PATCH] Do not double call normalize-destination on share dir In install-random-files, don't call normalize-destination on the destination directory - this is already done in compile-egg-info Before, we'd see things like in the install script like: mkdir -p /path/to/chicken/path/to/chicken/share With this patch, it becomes: mkdir -p /path/to/chicken/share which is as it should be. --- egg-compile.scm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/egg-compile.scm b/egg-compile.scm index 6818c6e4..5a59a43f 100644 --- a/egg-compile.scm +++ b/egg-compile.scm @@ -990,7 +990,7 @@ (root (string-append srcdir "/")) (mkdir (mkdir-command platform)) (sfiles (map (cut prefix srcdir <>) files)) - (dfile (qs* (normalize-destination dest mode) platform #t)) + (dfile (qs* dest platform #t)) (ddir (shell-variable "DESTDIR" platform))) (print "\n" mkdir " " ddir dfile) (let-values (((ds fs) (partition directory? sfiles))) -- 2.36.2 signature.asc Description: PGP signature
Re: [PATCH] handle misquoting of target in build-scripts
On Sun, Oct 30, 2022 at 02:46:49PM +0100, felix.winkelm...@bevuta.com wrote: > Another follow-up patch on the recent change of quoting in chicken-install, > which causes > chicken-do to always fire (and thus rebuilding any egg, regardless of status). I noticed there were still inconsistencies here. How about the attached patch instead? It may be a little bit trickier to read (but not by much), but should eliminate the problem entirely for chicken-do calls. Cheers, Peter From 730c74994a5feb96ef5d1d7ac2afdefbad65db7a Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Mon, 31 Oct 2022 11:44:52 +0100 Subject: [PATCH] Hopefully completely fix quoting hell in generated build commands Instead of using a mess of "qs*", "joins", "slashify" and "filelist" and hoping for the best when generating chicken-do commands, use a more principled method by way of a helper procedure to print the build command. This helper will receive *only* unquoted arguments: the list of targets, list of dependencies and the build command line with flags as a list. It then calls "qs*" on all of these where needed and emits a corresponding chicken-do line. By doing it this way, it's much more obvious where quotation happens: only in print-build-command, and never in the procedure that calls it. For consistency, also change prepare-custom-command so that it accepts an unquoted filename, so that quotation is delegated to it. For now, leave the code that emits installation commands untouched. We'll probably want to do the same for these though. --- egg-compile.scm | 333 +--- 1 file changed, 142 insertions(+), 191 deletions(-) diff --git a/egg-compile.scm b/egg-compile.scm index 23af8b4e..6818c6e4 100644 --- a/egg-compile.scm +++ b/egg-compile.scm @@ -595,9 +595,8 @@ link-objects modules custom types-file inline-file) srcdir platform) - (let* ((cmd (qs* (or (custom-cmd custom srcdir platform) - default-csc) - platform)) + (let* ((cmd (or (custom-cmd custom srcdir platform) + default-csc)) (sname (prefix srcdir name)) (tfile (prefix srcdir (conc types-file ".types"))) (ifile (prefix srcdir (conc inline-file ".inline"))) @@ -613,51 +612,46 @@ (list "-emit-inline-file" ifile) '( (out1 (conc sname ".static")) - (out2 (qs* (target-file (conc out1 - (object-extension platform)) - mode) -platform)) + (out2 (target-file (conc out1 + (object-extension platform)) +mode)) (out3 (if (null? link-objects) out2 - (qs* (target-file (conc out1 - (archive-extension platform)) - mode) -platform))) + (target-file (conc out1 + (archive-extension platform)) +mode))) (targets (append (list out3 lfile) (maybe types-file tfile) (maybe inline-file ifile) (map (lambda (m) (prefix srcdir (conc m ".import.scm"))) (or modules '() - (src (qs* (or source (conc name ".scm")) platform))) + (src (or source (conc name ".scm" (when custom (prepare-custom-command cmd platform)) -(print "\n" (qs* default-builder platform #t) " " - (joins targets platform) " : " - src " " (qs* eggfile platform) " " - (if custom cmd "") " " - (filelist srcdir source-dependencies platform) - " : " cmd - (if keep-generated-files " -k" "") - " -regenerate-import-libraries" - (if modules " -J" "") " -M" - " -setup-mode -static -I " srcdir - " -emit-link-file " (qs* lfile platform) - (if (eq? mode 'host) " -host" "") - " -D compiling-extension -c -unit " name - " -D compiling-static-extension" - " -C -I" srcdir " " (joins opts platform) - " " src " -o " out2) +(print-build-command targets +`(,@(filelist srcdir source-dependencies) ,src ,eggfile +
Re: [PATCH] simplification/refactoring
On Sun, Oct 30, 2022 at 02:48:27PM +0100, felix.winkelm...@bevuta.com wrote: > in egg-compile.scm, as suggested by Peter. Thanks, pushed. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] always check servers if version not given
On Fri, Oct 28, 2022 at 05:47:22PM +1300, Evan Hanson wrote: > Applied! > > Lots of tickets closed lately, very cool. Indeed. I've taken the liberty of adding these two changes to the NEWS file. Even though they're relatively minor, they correspond to tickets/bug reports by users, so it's worthwhile adding it there. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] fix for double-quoting behaviour in egg-compilation
On Tue, Oct 25, 2022 at 12:09:55PM +0200, Peter Bex wrote: > Also, we forgot to look at Salmonella or we'd noticed: > http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/24/yesterday-diff/ So above, that's "-151" for succesfull installation, i.e. we broke 151 eggs (either directly or through transitive dependencies). I checked Salmonella after the fixes and it fixed 147 eggs: http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/26/yesterday-diff/ That leaves 4 eggs that are broken still. It turns out these are: - expat (Felix and Shawn Rutledge) - dbus (Shawn Rutledge) - icu (Diego A. Mundo) - espeak (Diego A. Mundo) These eggs all rely on something a la the following in their .egg files: (csc-options -C "`pkg-config --cflags expat`" -L "`pkg-config --libs expat`") I would argue that's a bug and relying on undocumented, unintented functionality. Since it's only 4 eggs I think it's reasonable to expect their authors to fix the eggs themselves. Probably this requires using a custom build script like we do in for example imlib2: https://anonym...@code.call-cc.org/svn/chicken-eggs/release/5/imlib2/trunk/ Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] fix for double-quoting behaviour in egg-compilation
On Mon, Oct 24, 2022 at 05:23:24PM +0200, felix.winkelm...@bevuta.com wrote: > As reported by Kon, our previous "fix" for shell argument quoting > exposed another bug, this is an attempt to improve on that. Thanks, pushed. Also, we forgot to look at Salmonella or we'd noticed: http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/24/yesterday-diff/ One question though: "joins" is essentially the same as "arglist", isn't it? I mean, it's used in different contexts but the effect is the same. Perhaps we should merge them. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] get rid of obscure mac resource file
On Fri, Oct 21, 2022 at 03:58:42PM +0200, felix.winkelm...@bevuta.com wrote: > This stuff is obscure and, as described in #1774, packaging/"deploying" > applications is something > that should be left to application developers. Agreed, patch pushed. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] handling of prelude/postlude forms
On Mon, Oct 24, 2022 at 11:23:09AM +0200, felix.winkelm...@bevuta.com wrote: > The attached patch addresses the broken semantics of -prelude/-postlude > in the compiler driver, as reported by sjamaan. Thanks, pushed! I wonder how this got so broken. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] slight improvement in option quoting for csc
On Sun, Oct 23, 2022 at 12:36:20PM +0200, felix.winkelm...@bevuta.com wrote: > > On Fri, Oct 21, 2022 at 04:36:58PM +0200, felix.winkelm...@bevuta.com wrote: > > > See #1302. > > > > Why do we even attempt to detect whether to quote? That looks a bit > > brittle to me. It should be simpler to just always quote (e.g. using "qs"), > > like we do in other places, for example in chicken-install, or is there > > some reason we can't do that here? > > > > Ok, that is of course a much simpler approach, please find attached a new > patch. Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] slight improvement in option quoting for csc
On Fri, Oct 21, 2022 at 04:36:58PM +0200, felix.winkelm...@bevuta.com wrote: > See #1302. Why do we even attempt to detect whether to quote? That looks a bit brittle to me. It should be simpler to just always quote (e.g. using "qs"), like we do in other places, for example in chicken-install, or is there some reason we can't do that here? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] allow list-notation for -R option
On Fri, Oct 21, 2022 at 11:26:05AM +0200, felix.winkelm...@bevuta.com wrote: > See #1809, as suggested by Zipheir. Thanks, pushed! I also took the liberty of adding Zipheir to the acknowledgments and adding the change to the NEWS file. > +(define (string-trim str) > + (let loop ((front 0) > + (back (sub1 (string-length str > +(cond ((= front back) "") > + ((char-whitespace? (string-ref str front)) > + (loop (add1 front) back)) > + ((char-whitespace? (string-ref str back)) > + (loop front (sub1 back))) > + (else (substring str front (add1 back)) I was a bit surprised to remember that we don't really have this in (chicken string), all we have is the somewhat whimsical string-chomp. First I thought maybe moving this definition to (chicken string), but then I realised we probably want to at least somehow specify what's considered whitespace, but then you end up with a slow implementation. And we do have a version of this in srfi-13 (which includes a predicate/char-set argument), so maybe we don't really need it after all. But still, it's a bit weird not to have this in the standard lib... What do the other hackers think? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] chicken-do should warn but ignore missing dependencies
On Tue, Oct 12, 2021 at 12:13:17PM +0200, felix.winkelm...@bevuta.com wrote: > This patch changes chicken-do to emit a warning when a dependency is missing, > but lets it still continue. That way missing dependencies in egg files can at > least > not break the build. Why not break the build though? A missing dependency is an error, isn't it? With all the output you get, it's easy to overlook this warning, but if it halts, that at least would get my attention ;) Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] fix #1797
On Thu, May 19, 2022 at 12:03:16PM +0200, felix.winkelm...@bevuta.com wrote: > This patch addresses the unnecessary interning in char-comparison > procedures. Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] disable umask POSIX tests on windows
On Sat, Mar 26, 2022 at 05:59:24PM +0100, felix.winkelm...@bevuta.com wrote: > Hi! > > Attached a patch to address the issue reported by Claude Marinier. > I tried to understand what the expected behaviour of umask should > be, but in the end I just dropped the whole test, since I don't > understand how the permission bits on cygwin ought to behave. Agreed, let's just drop it. I don't have the time or energy to try and understand it fully. Pushed. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] additional tests for fp*+
On Thu, May 19, 2022 at 02:34:09PM +0200, felix.winkelm...@bevuta.com wrote: > Oh boy... I should have looked at that a bit more seriously, Sorry about that. No worries! > Attached a new patch. Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] additional tests for fp*+
On Thu, May 19, 2022 at 12:01:43PM +0200, felix.winkelm...@bevuta.com wrote: > These tests were contributed by Christian Himpe. > +;;; Tests contributed by Christian Himpe: > + > +;; dummy testee > +(define (fp*+ a b c) > + (+ (* a b) c)) > + IIUC that means these tests aren't actually exercising the real fp*+. Is there a reason to override it like this? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Fix the type of make-absolute-pathname
On Sun, Feb 20, 2022 at 01:58:31PM +0100, felix.winkelm...@bevuta.com wrote: > > Hi all, > > > > First time contributing to CHICKEN itself. Is this all? > > > > Basically, yes. Attached is a signed off copy of the patch, now > we wait for some other core developer to apply it... Sorry for the long wait. Pushed! > Thanks for your contribution! +1 to that, much appreciated. Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Don't add LIBDIR to rpath by default
On Sat, Jan 29, 2022 at 12:08:17PM +0100, Sören Tempel wrote: > I don't think there is a semi-portable way of doing this, or at least I > am personally not aware of one. However, maybe a Makefile configuration > variable could be introduced specifically for creating CHICKEN Scheme > packages for a distribution (for example as in `make PACKAGING=1 > install`)? If $PACKAGING is set it could then be assumed that $(LIBDIR) > and other variables point to directories which are already in the > default search path. Hi Sören, I think that too is too simplistic - as explained before, in for example pkgsrc on NetBSD you still need to set the rpath, even though that's a package manager. And when you run pkgsrc on Linux, I suppose the rpath might be unnecessary, strictly speaking. So, in my opinion it doesn't seem to be a packaging/non-packaging question. It's more a question of whether the libdir to be installed to happens to be in the default search path, which is highly OS-dependent. But, let me flip the question around: how exactly is it *wrong* to include a "redundant" rpath? Technically speaking, is there a good reason to want to avoid explicitly baking in the rpath for default search paths? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Don't add LIBDIR to rpath by default
On Sun, Jan 23, 2022 at 12:54:58PM +0100, Sören Tempel wrote: > I recently started working on Alpine packages for Scheme software which > I compiled with CHICKEN. While doing so, I noticed that csc(1) > unconditionally adds $(LIBDIR) as an rpath entry by adding > -Wl,-rpath=$(LIBDIR) to the linker flags. I find this surprising as (in > the common case) LIBDIR should already be part of the default rpath on > (hopefully) all systems (otherwise it doesn't make sense to install .so > files to it). Since this rpath entry is redundant it causes a warning on > Alpine. Hi Sören, Thanks for pointing this out. If memory serves, we started adding rpath because BSD systems require it. They don't have a ld.so.conf or something similar (I don't know what modern Linux uses to set the default rpath). Even on Linux, many people also install CHICKEN to nonstandard locations. For instance, if you don't have root, it's very convenient to install CHICKEN to your homedir, and in our own infrastructure we typically install CHICKEN to /usr/local/chickens/ and then symlink to it as /usr/local/chickens/qwiki for example, to indicate that the CHICKEN installation for qwiki is . This allows us to gradually upgrade our infrastructure. > I was wondering if it would be possible to only add additional rpath > entries if -deployed is given or non-host-mode is used. The attached > git-format-patch(1) implements this. Would it be possible to integrate > it upstream? Given what I said above, I think simply removing the rpath entirely for host-local builds is not the right thing to do, as changing the C flags to include -rpath is too onerous and obscure for users to care about. Is there a simple way of detecting whether an rpath is the default? Cheers, Peter signature.asc Description: PGP signature
[PATCH] Remove some dead code from runtime.c
Hi all, In #chicken Guest-liao was asking some questions about compiling runtime.c manually, and I noticed that they didn't have chicken-config.h, and passing in -DHAVE_CHICKEN_CONFIG_H is needed. While looking into this, I noticed we also used to have a HAVE_CONFIG_H macro which got checked and defined, but this no longer seems to be in use. The attached patch removes the obsolete #ifdef HAVE_CONFIG_H block from runtime.c Cheers, Peter From dba340c1fe1b3e966e69ee51bb951b121e84079a Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 20 Oct 2021 09:38:45 +0200 Subject: [PATCH] Remove obsolete HAVE_CONFIG_H check in runtime.c We never define this, only HAVE_CHICKEN_CONFIG_H, so this entire block of #ifdef can be removed. Also, PACKAGE and VERSION are not used anywhere, so it's pointless to undefine them. --- chicken.h | 2 +- runtime.c | 14 -- 2 files changed, 1 insertion(+), 15 deletions(-) diff --git a/chicken.h b/chicken.h index 7e51a38f..cc3be8ff 100644 --- a/chicken.h +++ b/chicken.h @@ -44,7 +44,7 @@ /* * N.B. This file MUST not rely upon "chicken-config.h" */ -#if defined(HAVE_CONFIG_H) || defined(HAVE_CHICKEN_CONFIG_H) +#if defined(HAVE_CHICKEN_CONFIG_H) # include "chicken-config.h" #endif diff --git a/runtime.c b/runtime.c index c46cde6f..981996f4 100644 --- a/runtime.c +++ b/runtime.c @@ -130,20 +130,6 @@ static C_TLS int timezone; # include #endif -#ifdef HAVE_CONFIG_H -# ifdef PACKAGE -# undef PACKAGE -# endif -# ifdef VERSION -# undef VERSION -# endif -# include - -# ifndef HAVE_ALLOCA -# error this package requires "alloca()" -# endif -#endif - /* Parameters: */ #define RELAX_MULTIVAL_CHECK -- 2.31.1 signature.asc Description: PGP signature
Re: [PATCH] Default to "cc" on *BSD systems
On Tue, Oct 12, 2021 at 12:18:52PM +0200, felix.winkelm...@bevuta.com wrote: > See attached patch. This change is not intended for the current RC. Thanks, applied. I changed 5.4.0 to 5.3.1 in the NEWS file, as this change would appear in the first snapshot we make (assuming we will make one during the 5.3.0->5.4.0 cycle). Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Fix for #1788
On Tue, Sep 28, 2021 at 04:52:26PM +0200, felix.winkelm...@bevuta.com wrote: > Hi! > > The attached patch should fix the issue of builtin module ids appearing in > generated > ..link files. Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
[PATCH] Check argument type for setters in (chicken process-context posix)
Hi there, I saw this ticket come in and thought it's trivial enough that it's probably a good idea to make it into 5.3.0. In any case, even if not, this patch can always go into master. NOTE: It also fixes indentation for some of these setters. Sorry if that makes the patch slightly larger. Cheers, Peter From da9f177e085c8c49efe6cbda0b90939f7f1e759a Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 24 Sep 2021 08:25:07 +0200 Subject: [PATCH] Add checks to current user/group id setters in (chicken process-context posix) Fixes #1787 --- NEWS| 9 + manual/Acknowledgements | 4 ++-- posixunix.scm | 26 +++--- 3 files changed, 26 insertions(+), 13 deletions(-) diff --git a/NEWS b/NEWS index 3118461b..0ce3a1bd 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,12 @@ +5.3.0rc4 + +- Core libraries + - In (chicken process-context posix), the setters for current-user-id, +current-effective-user-id and current-group-id now check that the +new user/group value is a fixnum instead of blindly passing it on +to the C implementation (which would cause bogus user ids to be set). +Fixes #1787, thanks to Christopher Brannon. + 5.3.0rc3 - Build system diff --git a/manual/Acknowledgements b/manual/Acknowledgements index fcaaa92f..8e6ac12b 100644 --- a/manual/Acknowledgements +++ b/manual/Acknowledgements @@ -7,8 +7,8 @@ Annis, Jason E. Aten, Marc Baily, Peter Barabas, Andrei Barbu, Jonah Beckford, Arto Bendiken, Andy Bennett, Kevin Beranek, Peter Bex, Jean-Francois Bignolles, Oivind Binde, Alaric Blagrave Snell-Pym, Dave Bodenstab, Fabian Böhlke, T. Kurt Bond, Ashley Bone, Dominique Boucher, -Terence Brannon, Roy Bryant, Adam Buchbinder, Hans Bulfone, "Category -5", Taylor Campbell, Naruto Canada, Mark Carter, Esteban U. Caamano +Christopher Brannon, Terence Brannon, Roy Bryant, Adam Buchbinder, Hans Bulfone, +"Category 5", Taylor Campbell, Naruto Canada, Mark Carter, Esteban U. Caamano Castro, Semih Cemiloglu, Alex Charlton, Franklin Chen, Joo ChurlSoo, Thomas Chust, Gian Paolo Ciceri, Fulvio Ciriaco, Paul Colby, Tobia Conforto, John Cowan, Grzegorz Chrupala, James Crippen, Evan Hanson, diff --git a/posixunix.scm b/posixunix.scm index e8cf8526..992d8eed 100644 --- a/posixunix.scm +++ b/posixunix.scm @@ -602,6 +602,7 @@ static int set_file_mtime(char *filename, C_word atime, C_word mtime) (getter-with-setter (foreign-lambda int "C_getuid") (lambda (id) + (##sys#check-fixnum id 'current-user-id) (when (fx< (##core#inline "C_setuid" id) 0) (##sys#update-errno) (##sys#error 'current-user-id!-setter "cannot set user ID" id) ) ) @@ -611,29 +612,32 @@ static int set_file_mtime(char *filename, C_word atime, C_word mtime) (getter-with-setter (foreign-lambda int "C_geteuid") (lambda (id) -(when (fx< (##core#inline "C_seteuid" id) 0) - (##sys#update-errno) - (##sys#error - 'effective-user-id!-setter "cannot set effective user ID" id) ) ) + (##sys#check-fixnum id 'current-effective-user-id) + (when (fx< (##core#inline "C_seteuid" id) 0) + (##sys#update-errno) + (##sys#error + 'effective-user-id!-setter "cannot set effective user ID" id) ) ) "(chicken.process-context.posix#current-effective-user-id)")) (set! chicken.process-context.posix#current-group-id (getter-with-setter (foreign-lambda int "C_getgid") (lambda (id) -(when (fx< (##core#inline "C_setgid" id) 0) - (##sys#update-errno) - (##sys#error 'current-group-id!-setter "cannot set group ID" id) ) ) + (##sys#check-fixnum id 'current-group-id) + (when (fx< (##core#inline "C_setgid" id) 0) + (##sys#update-errno) + (##sys#error 'current-group-id!-setter "cannot set group ID" id) ) ) "(chicken.process-context.posix#current-group-id)") ) (set! chicken.process-context.posix#current-effective-group-id (getter-with-setter (foreign-lambda int "C_getegid") (lambda (id) -(when (fx< (##core#inline "C_setegid" id) 0) - (##sys#update-errno) - (##sys#error - 'effective-group-id!-setter "cannot set effective group ID" id) ) ) + (##sys#check-fixnum id 'current-effective-group-id) + (when (fx< (##core#inline "C_setegid" id) 0) + (##sys#update-errno) + (##sys#error + 'effective-group-id!-setter "cannot set effective group ID" id) ) ) "(chicken.process-context.posix#current-effective-group-id)") ) (define-foreign-variable _user-name nonnull-c-string "C_user->pw_name") -- 2.31.1 signature.asc Description: PGP signature
[PATCH] Fix tests under Windows
Hi all, As Vasilij already reported, there are two failing tests under mingw. They initially fail for the same reason - they try to delete files that have been opened but not closed, which is an error under Windows. After fixing this, the posix-tests fail due to the fact that Windows doesn't fully support file modes. It doesn't know about execute permissions, and doesn't support making files write-only or having files with no permissions. To fix that, we can conceptually "right-extend" the user permissions over group and others after adding read permissions (as Windows does not have files without read permissions) and dropping execute permissions. In code, I cond-expand on Windows, shift the group and other bits out to obtain the user bits, and then dispatch on the values to get read and write access or read-only access. With these patches in place, the tests pass once again on mingw-msys. Why we never noticed before? The POSIX tests were extended in ffe553, and the read-lines-test was added in cfa1e7, both of which were added somewhere between 5.2.0 and 5.3.0rc1. And on *nix these tests work fine, of course... Cheers, Peter From 00f5fafd86f625c25231688a1b975c69bff630d1 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 2 Sep 2021 08:03:39 +0200 Subject: [PATCH 1/2] Ensure all ports are closed in tests when deleting the file Under Windows, read-lines-test.scm and posix-tests.scm would fail with "permission denied" upon deletion of the output file, but that's because there was still an open file handle to it (and on Windows you can't delete open files). In read-lines-test.scm, it is my favorite Scheme footgun: call-with-input-file doesn't close its port when the dynamic extent is exited via other ways than a regular procedure return. In posix-tests.scm it was a more mundane error - we opened a file using the descriptor-returning POSIX procedure file-open, but never closed the descriptor. --- tests/posix-tests.scm | 2 +- tests/read-lines-tests.scm | 7 +-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/tests/posix-tests.scm b/tests/posix-tests.scm index 361f55c1..26f495e7 100644 --- a/tests/posix-tests.scm +++ b/tests/posix-tests.scm @@ -98,7 +98,7 @@ (let ((mode (file-creation-mode))) (set! (file-creation-mode) umask) (delete-file* "posix-tests.out") - (file-open "posix-tests.out" open/creat given ...) + (file-close (file-open "posix-tests.out" open/creat given ...)) (assert (equal? (file-permissions "posix-tests.out") expected)) (set! (file-creation-mode) mode)) ;; default file mode diff --git a/tests/read-lines-tests.scm b/tests/read-lines-tests.scm index 28cdd223..e107c26f 100644 --- a/tests/read-lines-tests.scm +++ b/tests/read-lines-tests.scm @@ -46,8 +46,11 @@ EOF (test-error "with an invalid second argument (max)" (call-with-input-file input-test-file - (lambda (port) -(read-lines port 2.0 + (lambda (port) + (dynamic-wind + void + (lambda () (read-lines port 2.0)) + (lambda () (close-input-port port)) (test-end "read-lines") -- 2.31.1 From 9ce043d50ea6e7005bf2a434fada84fac5e3 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 2 Sep 2021 09:44:13 +0200 Subject: [PATCH 2/2] Fix posix-tests file permission test on Windows In Windows, there are three problems with this test: - Deleting read-only files gives a permission error. - Permissions don't know about "group" and "other" permissions - the permissions are always extended from the user so the rest match it. - It is impossible to give write-only or "no" permissions on a file. So we make files writable before attempting to delete them, and we provide a mapping for expected permissions so that we only look at the user part of the file mode, and always have it include "readable" as a permission. See also https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/umask and https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/remove-wremove --- tests/posix-tests.scm | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/tests/posix-tests.scm b/tests/posix-tests.scm index 26f495e7..1c695d9d 100644 --- a/tests/posix-tests.scm +++ b/tests/posix-tests.scm @@ -1,4 +1,5 @@ -(import (chicken pathname) +(import (chicken bitwise) + (chicken pathname) (chicken file) (chicken file posix) (chicken platform) @@ -89,17 +90,38 @@ (assert (not (get-environment-variable "FOO"))) ;; file creation and umask interaction +(define (permission-expectation original-expectation) + (cond-expand +;; In Windows, all files are always readable. You cannot give +
Re: [PATCH] Re: patch for chicken on 64-bit windows (msys and mingw-w64)
On Mon, Aug 30, 2021 at 08:10:46PM +0300, elf wrote: > I do use Windows regularly and am looking to improve chicken support for it. Hi Elf, Cool to hear. What are your thoughts on the current situation? Which flavour of Windows build should we keep and which should we drop (if any)? What about WSL and MSVC? One thing I've been wondering about for a while now: Do you think we should switch our scripts from CMD to PowerShell? Would that reduce some of the pain or just bring new and interesting failure modes because it's even further removed from the UNIX world view? Cheers, Peter signature.asc Description: PGP signature
[PATCH] Re: patch for chicken on 64-bit windows (msys and mingw-w64)
On Sun, Aug 29, 2021 at 09:21:38PM +0200, Mario Domenech Goulart wrote: > Hi, > > On Fri, 27 Aug 2021 18:09:09 +0200 felix.winkelm...@bevuta.com wrote: > > Using the already provided implementation is preferable, I'd say - one way > > to shift responsibility, which, in this utter mess of Windows API, seems to > > be > > the only way forward. > > I agree. Alright, I'm convinced. Attached is a patch to drop the damn thing. > > Do we already have test results on all supported Windows platforms? I lost > > track. > > I think we have a set of problems related to Windows, which is going to > be very hard for us to handle without help: > > * As far as I can tell, none of the CHICKEN developers use Windows > regularly. True. I have it running in a VM, but I always start it with a lot of trepidation and I don't fully understand the OS itself. For example, suddenly now "make check" has started to give a "permission denied - cannot delete read-lines.in" error where it's always worked before and I have no idea why it can't delete it. > * There are many combinations of Windows environments, leading to a > great variety of ways to break stuff. They all suck in their own way. I'm kind of thinking maybe we don't really need the cygwin build anymore now that there's WSL? > * We do not have automated tests on Windows. > > As a start, it would be nice to collect the information we need in a > wiki page (e.g., which variants of Windows we want to/can support, how > they differ etc.). At this point I'm lost at the combinations of > different Windows things that affect the build/execution of CHICKEN. > > I'd say we need help at that so that we can improve support on Windows. That's a catch-22 though: in order to get that help, we need CHICKEN in a useful-enough state that people can use it on Windows. And in general, out of the total set of users, only a small subset will actually contribute. Cheers, Peter From 1edd51e77982ebce33b165e481da8df3f7ec813a Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Mon, 30 Aug 2021 12:20:03 +0200 Subject: [PATCH] Drop custom opendir/readdir implementation in Windows Windows doesn't have these functions, but MinGW provides them as part of its crt library. Our own stub implementation had some issues which caused an error to trigger a loop which resulted in a hanging process when running chicken-install -update-db --- NEWS | 3 ++- file.scm | 82 ++-- 2 files changed, 4 insertions(+), 81 deletions(-) diff --git a/NEWS b/NEWS index c30c4395..00e79659 100644 --- a/NEWS +++ b/NEWS @@ -3,7 +3,8 @@ - Core libraries - The srfi-17 module now exports the `getter-with-setter` and `setter` procedures, not just the set! macro (thanks to Lassi Kortela) - + - Fix hang in chicken-install -update-db on Windows (thanks to Mark +Fisher for reporting and Jani Hakala for debugging and patch). 5.3.0rc1 diff --git a/file.scm b/file.scm index 18ce00f1..56e4c909 100644 --- a/file.scm +++ b/file.scm @@ -65,86 +65,8 @@ # define C_mkdir(str) C_fix(mkdir(C_c_string(str))) #endif -#if !defined(_WIN32) || defined(__CYGWIN__) -# include -# include -#else -struct dirent -{ -char * d_name; -}; - -typedef struct -{ -struct _finddata_t fdata; -int handle; -struct dirent current; -} DIR; - -static DIR * C_fcall -opendir(const char *name) -{ -int name_len = strlen(name); -int what_len = name_len + 3; -DIR *dir = (DIR *)malloc(sizeof(DIR)); -char *what; -if (!dir) -{ - errno = ENOMEM; - return NULL; -} -what = (char *)malloc(what_len); -if (!what) -{ - free(dir); - errno = ENOMEM; - return NULL; -} -C_strlcpy(what, name, what_len); -if (strchr("\\/", name[name_len - 1])) - C_strlcat(what, "*", what_len); -else - C_strlcat(what, "\\*", what_len); - -dir->handle = _findfirst(what, >fdata); -if (dir->handle == -1) -{ - free(what); - free(dir); - return NULL; -} -dir->current.d_name = NULL; /* as the first-time indicator */ -free(what); -return dir; -} - -static int C_fcall -closedir(DIR * dir) -{ -if (dir) -{ - int res = _findclose(dir->handle); - free(dir); - return res; -} -return -1; -} - -static struct dirent * C_fcall -readdir(DIR * dir) -{ -if (dir) -{ - if (!dir->current.d_name /* first time after opendir */ - || _findnext(dir->handle, >fdata) != -1) - { - dir->current.d_name = dir->fdata.name; - return >current; - } -} -return NULL; -} -#endif +#include +#include #define C_opendir(s,h) C_set_block_item(h, 0, (C_word) opendir(C_c_string(s))) #define C_readdir(h,e) C_set_block_item(e, 0, (C_word) readdir((DIR *)C_block_item(h, 0))) -- 2.31.1 signature.asc Description: PGP signature
Re: patch for chicken on 64-bit windows (msys and mingw-w64)
On Wed, Aug 25, 2021 at 09:29:31PM +0300, Jani Hakala wrote: > > Hi, > > This patch seems to fix the problem with chicken-install when > > chicken-install -defaults ./setup.defaults -update-db > > appears to hang when installing chicken. I've done some investigation. First off, Windows doesn't have dirent.h nor any of its functions in the C library. So I was wondering why it works if you simply drop this check and our custom implementation, and then simply include dirent.h, because it does! It appears that Mingw comes with its own implementation of dirent.h, which is included in its crt. I've also briefly checked mingw32 (the barely maintained "original" Mingw) and it looks like it has it too. I suppose the #ifdef check was either cargo-culted from somewhere else, came from a distant time when Mingw didn't have its own dirent.h or was still there from when we also supported building with MSVC. I'm a bit on the fence whether we should simply apply this patch (which I'm not really clear on if it should work and why) or drop the check completely and rely on Mingw's (hopefully nicely debugged, tested and maintained) version of dirent.h My gut says we should just drop our custom version and rely on Mingw's dirent.h , but that is a bigger change for the RC (but we can ask people to test again of course) and would make supporting MSVC in the future a bit more difficult again. Thoughts? Opinions? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Export procedures from srfi-17
On Thu, Aug 19, 2021 at 01:58:48PM +0200, Peter Bex wrote: > PS: aren't all the register-primitive-module invocations for srfis wrong, > in that they register a module under a similarly named unit, but there > is no such unit library (in any of these cases)? I thought about it some more and realised that the "module-library" field of the module record type is only used to emit (##core#require ...) in chicken.syntax#import-definition, and ##core#require treats "builtin-features" as being already linked, which includes the srfis. It also treats "core-units" as being already linked, which includes "library" among others, of course. I think we can simplify this logic by dropping the builtin-features bit, as that seems like it's a hack specific for SRFIs. I've filed https://bugs.call-cc.org/ticket/1777 for this. Of course not something we should do for 5.3. Cheers, Peter signature.asc Description: PGP signature
[PATCH] Export procedures from srfi-17
Hi all, Lassi Kortela posted a patch on IRC to fix the fact that our SRFI-17 module doesn't export the "setter" and "getter-with-setter" procedures. If this is approved, please also apply it to the prerelease branch, as it is both trivial and important enough to include. PS: aren't all the register-primitive-module invocations for srfis wrong, in that they register a module under a similarly named unit, but there is no such unit library (in any of these cases)? In other words, shouldn't register-primitive-module be defined like so: ;; same as register-core-module, but uses "library" as the library name (define (##sys#register-primitive-module name vexports #!optional (sexports '())) (##sys#register-core-module name 'library vexports sexports)) Or, perhaps, it should be #f, since module-library can be false and syntax-only libraries don't require a unit to be linked in. Cheers, Peter From 86530c17b873c5063ff56275ea4d8e4af29d123f Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 19 Aug 2021 13:45:29 +0200 Subject: [PATCH] Also export setter and getter-with-setter from builtin srfi-17 module Thanks to Lassi Kortela for the patch. --- NEWS| 9 - modules.scm | 7 +-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 051b212d..c30c4395 100644 --- a/NEWS +++ b/NEWS @@ -1,4 +1,11 @@ -5.3.0 +5.3.0rc2 + +- Core libraries + - The srfi-17 module now exports the `getter-with-setter` and `setter` +procedures, not just the set! macro (thanks to Lassi Kortela) + + +5.3.0rc1 - Core libraries - Fixed an inadvertant error during error reporting in the `numerator` diff --git a/modules.scm b/modules.scm index 36d32032..cbe48fe7 100644 --- a/modules.scm +++ b/modules.scm @@ -1152,8 +1152,11 @@ (##sys#register-primitive-module 'srfi-16 '() (se-subset '(case-lambda) ##sys#chicken.base-macro-environment)) -(##sys#register-primitive-module - 'srfi-17 '() (se-subset '(set!) ##sys#default-macro-environment)) +(##sys#register-core-module + 'srfi-17 'library + '((getter-with-setter . chicken.base#getter-with-setter) + (setter . chicken.base#setter)) + (se-subset '(set!) ##sys#default-macro-environment)) (##sys#register-core-module 'srfi-23 'library '((error . chicken.base#error))) -- 2.30.2 signature.asc Description: PGP signature
Re: [PATCH] use "share/chicken" also on compilation target
On Fri, Aug 06, 2021 at 10:22:42PM +0200, felix.winkelm...@bevuta.com wrote: > This patch changes TARGET_SHARE_HOME to also include the "chicken" suffix, > so it should be identical to INSTALL_SHARE_HOME in normal situations (when > not cross-compiling). This also addresses the installation location of "mac.r" > (only on macs, which should use IDATADIR and not ISHAREDIR). Thanks, pushed! Cheers, Peter signature.asc Description: PGP signature
Release process?
Hi all, Evan just closed the last ticket in our roadmap for 5.3 (https://bugs.call-cc.org/roadmap). So I think now is a good time to start the (pre)release process. However, there are a few questions remaining regarding uninstallation of files and the file tree in the README. Should we wait until those are resolved? Is anyone working on them at all? Cheers, Peter signature.asc Description: PGP signature
Re: Layout of files installed under "share"
On Tue, Aug 03, 2021 at 08:07:33PM +0300, Lassi Kortela wrote: > README shows the following layout for files installed under "share": > > `-- share > |-- chicken > | |-- doc > | | |-- LICENSE > | | |-- README > | | |-- DEPRECATED > | | |-- mac.r (Macintosh) > | | |-- CHICKEN.icns (Macintosh) > | | |-- manual-html > | | |-- chicken.png > | | |-- feathers.tcl > | | `-- *.html > | `-- setup.defaults > > Should "mac.r", "CHICKEN.icns", and "feathers.tcl" go directly under > "share/chicken" like "setup.defaults" does, instead of going under > "share/chicken/doc"? hm, it looks like feathers.tcl is definitely in the wrong place in the diagram. I don't know about the mac-specific files, but at least the installation rules seem to indicate they're supposed to go directly under share/chicken too. > Could someone more experienced than me sanity-check the diagram for > "share" in README, as well as the corresponding install and uninstall > commands in rules.make? Those mac-specific files should also be uninstalled, they're currently only installed. Cheers, Peter signature.asc Description: PGP signature
Re: `make uninstall` does not uninstall the chicken-do manpage
On Tue, Aug 03, 2021 at 07:06:23PM +0300, Lassi Kortela wrote: > Patch attached to fix the problem. Thanks, pushed! > Even with this patch applied, `make uninstall` still leaves behind some > empty directories: > > |-- bin > |-- include > | `-- chicken > |-- lib > | `-- chicken > `-- share > `-- man > `-- man1 > > This is probably OK? I think this is normal - AFAIK most programs leave around directories. But I suppose we could delete lib/chicken and include/chicken. What do the other hackers think? Cheers, Peter signature.asc Description: PGP signature
Re: [PATCH] Add regression test for #1771
And now with reexport-m9.scm and reexport-m10.scm From ab6b7e798151e8af2d987e91acdf86f98f0251f4 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Sun, 1 Aug 2021 15:35:44 +0200 Subject: [PATCH] Add regression test for #1771 This bug was fixed by the fix for #1772, but this was supposed to only be a performance improvement, so the fix was "accidental" (and, accordingly, there was no change to the test suite). Therefore it's a good idea to add a regression test. --- NEWS | 2 ++ distribution/manifest | 2 ++ tests/reexport-m10.scm | 3 +++ tests/reexport-m9.scm | 6 ++ tests/reexport-tests-2.scm | 5 + tests/runtests.bat | 4 tests/runtests.sh | 2 ++ 7 files changed, 24 insertions(+) create mode 100644 tests/reexport-m10.scm create mode 100644 tests/reexport-m9.scm diff --git a/NEWS b/NEWS index 899ddb2f..91507e1d 100644 --- a/NEWS +++ b/NEWS @@ -36,6 +36,8 @@ - Module system - Reexported macros now work when the reexporting module redefines identifiers from the original (fixes #1757, reported by Sandra Snan). + - When using "except" in "import" to omit reexported macros, +they are really not imported (#1771, reported by Sandra Snan). - Runtime system - Sleeping primordial thread doesn't forget mutations made to diff --git a/distribution/manifest b/distribution/manifest index 53aeaa4f..00a8c515 100644 --- a/distribution/manifest +++ b/distribution/manifest @@ -198,6 +198,8 @@ tests/reexport-m5.scm tests/reexport-m6.scm tests/reexport-m7.scm tests/reexport-m8.scm +tests/reexport-m9.scm +tests/reexport-m10.scm tests/reexport-tests.scm tests/reexport-tests-2.scm tests/ec.scm diff --git a/tests/reexport-m10.scm b/tests/reexport-m10.scm new file mode 100644 index ..f485bad2 --- /dev/null +++ b/tests/reexport-m10.scm @@ -0,0 +1,3 @@ +(module reexport-m10 () + (import (only reexport-m9 define) (chicken module)) + (reexport (only reexport-m9 define))) diff --git a/tests/reexport-m9.scm b/tests/reexport-m9.scm new file mode 100644 index ..8bc3a425 --- /dev/null +++ b/tests/reexport-m9.scm @@ -0,0 +1,6 @@ +(module reexport-m9 (define) + (import (only scheme define-syntax syntax-rules) (only (chicken base) error)) + (define-syntax define +(syntax-rules () + ((_ a b) + (error "wrong define!") diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm index fbaad6cf..8f57808b 100644 --- a/tests/reexport-tests-2.scm +++ b/tests/reexport-tests-2.scm @@ -11,3 +11,8 @@ (import reexport-m8) (assert (equal? '(d c b a) (reexported-reverse '(a b c d (assert (equal? '(d c b a) (reexported-local-reverse '(a b c d + +;; Regression test for #1771 where reexports would get ignored by +;; "except" specifier. +(import (except reexport-m10 define)) +(define something 1) ;; If reexport messed up, this would be syntax from reexport-m9 expanding to (error ...) diff --git a/tests/runtests.bat b/tests/runtests.bat index b696899c..8411ec72 100644 --- a/tests/runtests.bat +++ b/tests/runtests.bat @@ -334,6 +334,10 @@ if errorlevel 1 exit /b 1 if errorlevel 1 exit /b 1 %compile_s% reexport-m8.scm -J if errorlevel 1 exit /b 1 +%compile_s% reexport-m9.scm -J +if errorlevel 1 exit /b 1 +%compile_s% reexport-m10.scm -J +if errorlevel 1 exit /b 1 %compile% reexport-tests-2.scm if errorlevel 1 exit /b 1 a.out diff --git a/tests/runtests.sh b/tests/runtests.sh index 11503b3c..e7dd37b3 100755 --- a/tests/runtests.sh +++ b/tests/runtests.sh @@ -264,6 +264,8 @@ $compile_s reexport-m5.scm -J $compile_s reexport-m6.scm -J $compile_s reexport-m7.scm -J $compile_s reexport-m8.scm -J +$compile_s reexport-m9.scm -J +$compile_s reexport-m10.scm -J $compile reexport-tests-2.scm ./a.out -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Add regression test for #1771
Hi all, As Evan found out, my fix for #1772 happened to also fix #1771, but that was not what I expected. So, I think it's a good idea to add a regression test so that the bug can't sneak back in when we make further changes to the import stuff. Cheers, Peter From 084b189e6adaf4809649ee44e72bec0c8115b0cd Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Sun, 1 Aug 2021 15:35:44 +0200 Subject: [PATCH] Add regression test for #1771 This bug was fixed by the fix for #1772, but this was supposed to only be a performance improvement, so the fix was "accidental" (and, accordingly, there was no change to the test suite). Therefore it's a good idea to add a regression test. --- NEWS | 2 ++ distribution/manifest | 2 ++ tests/reexport-tests-2.scm | 5 + tests/runtests.bat | 4 tests/runtests.sh | 2 ++ 5 files changed, 15 insertions(+) diff --git a/NEWS b/NEWS index 899ddb2f..91507e1d 100644 --- a/NEWS +++ b/NEWS @@ -36,6 +36,8 @@ - Module system - Reexported macros now work when the reexporting module redefines identifiers from the original (fixes #1757, reported by Sandra Snan). + - When using "except" in "import" to omit reexported macros, +they are really not imported (#1771, reported by Sandra Snan). - Runtime system - Sleeping primordial thread doesn't forget mutations made to diff --git a/distribution/manifest b/distribution/manifest index 53aeaa4f..00a8c515 100644 --- a/distribution/manifest +++ b/distribution/manifest @@ -198,6 +198,8 @@ tests/reexport-m5.scm tests/reexport-m6.scm tests/reexport-m7.scm tests/reexport-m8.scm +tests/reexport-m9.scm +tests/reexport-m10.scm tests/reexport-tests.scm tests/reexport-tests-2.scm tests/ec.scm diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm index fbaad6cf..8f57808b 100644 --- a/tests/reexport-tests-2.scm +++ b/tests/reexport-tests-2.scm @@ -11,3 +11,8 @@ (import reexport-m8) (assert (equal? '(d c b a) (reexported-reverse '(a b c d (assert (equal? '(d c b a) (reexported-local-reverse '(a b c d + +;; Regression test for #1771 where reexports would get ignored by +;; "except" specifier. +(import (except reexport-m10 define)) +(define something 1) ;; If reexport messed up, this would be syntax from reexport-m9 expanding to (error ...) diff --git a/tests/runtests.bat b/tests/runtests.bat index b696899c..8411ec72 100644 --- a/tests/runtests.bat +++ b/tests/runtests.bat @@ -334,6 +334,10 @@ if errorlevel 1 exit /b 1 if errorlevel 1 exit /b 1 %compile_s% reexport-m8.scm -J if errorlevel 1 exit /b 1 +%compile_s% reexport-m9.scm -J +if errorlevel 1 exit /b 1 +%compile_s% reexport-m10.scm -J +if errorlevel 1 exit /b 1 %compile% reexport-tests-2.scm if errorlevel 1 exit /b 1 a.out diff --git a/tests/runtests.sh b/tests/runtests.sh index 11503b3c..e7dd37b3 100755 --- a/tests/runtests.sh +++ b/tests/runtests.sh @@ -264,6 +264,8 @@ $compile_s reexport-m5.scm -J $compile_s reexport-m6.scm -J $compile_s reexport-m7.scm -J $compile_s reexport-m8.scm -J +$compile_s reexport-m9.scm -J +$compile_s reexport-m10.scm -J $compile reexport-tests-2.scm ./a.out -- 2.20.1 signature.asc Description: PGP signature
Re: [PATCH] Fix #1757 by not merging module environment into the syntactic environments of reexported macros
On Sat, Jul 31, 2021 at 04:05:52PM +1200, Evan Hanson wrote: > I was wondering whether there are any implications in changing the order > of the `sexports` field of the module, now that we're putting all > "normal" syntax exports before any re-exported ones. However, after a > bit of digging, I don't think this should matter? Good call, I hadn't realised this was changing the order. Although you're right - the order shouldn't really matter. Cheers, Peter signature.asc Description: PGP signature
[PATCH] Fix memory leak (#1772) when importing modules more than once
Hi all, Attached is a straightforward fix for #1772. I put it on the milestone 5.4, but since it's such a simple fix, I think we could include it in 5.3 anyway. Cheers, Peter From 70e45fe529d51421fbeea8828dc1704e3a821d40 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 22 Jul 2021 13:22:00 +0200 Subject: [PATCH] Use merge-se, not append in ##sys#import to fix memory leak (#1772) Using append on the current module (meta) environment in "import" will extend the environment correctly, but it will grow the list. Instead, we want to replace add those bindings which are new (and possibly replace existing bindings of the same name). This is what merge-se was made for. This potentially slows down performance on import, but should be relatively harmless. --- NEWS| 4 modules.scm | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index e65f885a..060a9d0e 100644 --- a/NEWS +++ b/NEWS @@ -81,6 +81,10 @@ - Fixed bug in chicken-install regarding variable quotation on UNIX-like systems which prevented installation into paths with spaces (#1685). +- Module system + - Fixed a memory leak when calling (import) multiple times in a row +on the same module (#1772; reported by "plugd" on IRC). + 5.2.0 - Core libraries diff --git a/modules.scm b/modules.scm index 0ba1df49..d91178ff 100644 --- a/modules.scm +++ b/modules.scm @@ -820,8 +820,8 @@ cm (merge-se (module-iexports cm) vsi)) (dm "export-list: " (module-export-list cm))) -(import-env (append vsv (import-env))) -(macro-env (append vss (macro-env) +(import-env (merge-se (import-env) vsv)) +(macro-env (merge-se (macro-env) vss (define (module-rename sym prefix) (##sys#string->symbol -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Remove severely outdated README section
Hi all, As I was helping someone on IRC, I glanced over the README and noticed a section talking about compatibility, but that mentioned only the macro system and module system rewrite from CHICKEN 4. We could add information about a CHICKEN 4 to 5 upgrade, but I think the 5 release is old enough and most people have migrated already anyway, and the instructions on the wiki are much more complete anyway. The README is pretty long already, so I think it's fine to just drop this section. That's what the attached patch does. Cheers, Peter From 668e97444f9588788774978c31e5cbcf50dcb561 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 21 Jul 2021 11:35:18 +0200 Subject: [PATCH] Remove severely outdated README section on compatibility This mentioned only the upgrade from 3 to 4, and had nothing about 4 to 5 anyway. Better to remove it completely to avoid confusion. --- README | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/README b/README index b2cc4fa5..20c4546b 100644 --- a/README +++ b/README @@ -687,18 +687,7 @@ _/_/_/_/_/_/ _/_/_/_/_/ extensions for Scheme and CHICKEN programming. - 8. Compatibility notes - -In CHICKEN 4, the macro system has been reimplemented - completely and provides module system, which has considerably - more flexibility and power, but will require the - re-implementation of macros in code that previously was used - with CHICKEN 3. Notably, `define-macro' is not available - anymore. See the manual on how to translate such macros to - low-level hygienic macros or ask on the CHICKEN mailing list. - - - 9. What's next? + 8. What's next? If you find any bugs, or want to report a problem, please send a detailed bug report. -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Fix #1727 with patch from ticket
Hi all, I finally gathered enough gumption to spin up a Windows VM and test the patch from #1727. It seems to work properly, so here is a signed-off version of the patch. Cheers, Peter From 46fd66aef7b2345a92ac944282b313a8a584202d Mon Sep 17 00:00:00 2001 From: Vasilij Schneidermann Date: Sun, 11 Apr 2021 20:01:23 +0200 Subject: [PATCH] Correctly quote set calls in Windows scripts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to ​https://ss64.com/nt/set.html, the set command should quote its entire argument, including the name. Likewise, quotes inside paths are illegal, therefore no quotes should be used inside PATH. It appears that the Windows shell cannot deal with quoting inside an argument, only when quotes surround the argument. This is not the case on Unix environments at all. Closes #1727 Signed-off-by: Peter Bex --- NEWS| 3 +++ egg-compile.scm | 16 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index e65f885a..90d62c7d 100644 --- a/NEWS +++ b/NEWS @@ -80,6 +80,9 @@ cache directory matching its program name (#1713, thanks to Alice Maz) - Fixed bug in chicken-install regarding variable quotation on UNIX-like systems which prevented installation into paths with spaces (#1685). + - Fixed a similar bug in chicken-install for paths with spaces on mingw +and mingw-msys (#1727, thanks to Josh Helzer for reporting and Vasilij +Schneidermann for the patch) 5.2.0 diff --git a/egg-compile.scm b/egg-compile.scm index 186a7227..205505a6 100644 --- a/egg-compile.scm +++ b/egg-compile.scm @@ -1128,16 +1128,16 @@ EOF ((windows) (printf #< signature.asc Description: PGP signature
[PATCH] Fix #1757 by not merging module environment into the syntactic environments of reexported macros
Hi all, Here's a reasonably straightforward patch (I hope) for #1757. Commit should speak for itself, even if the stuff itself is a bit tricky. I'm not 100% sure about the other places where "sexps" is used, if we should use the local macros only there, or not. All the tests seem to work if we keep it referring to only local macros, and I think we shouldn't mess too much with the environments where that's not needed, so I think it's fine. Cheers, Peter From 96827b77554e8b56838b0e47363326011b6e Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 14 Jul 2021 12:18:56 +0200 Subject: [PATCH] Don't merge syntactic environment for reexported macros (#1757) Separate the syntactic exports into local and reexported macros, so we can merge the syntactic environment into the local macros. The original code would merge the reexporting module's syntactic environment into every macro it exports, but this is incorrect for reexported macros - those should stick with their original syntactic environment, as it was in the defining module. --- NEWS | 4 distribution/manifest | 2 ++ modules.scm| 14 -- tests/reexport-m7.scm | 17 + tests/reexport-m8.scm | 8 tests/reexport-tests-2.scm | 6 ++ tests/runtests.bat | 4 tests/runtests.sh | 2 ++ 8 files changed, 51 insertions(+), 6 deletions(-) create mode 100644 tests/reexport-m7.scm create mode 100644 tests/reexport-m8.scm diff --git a/NEWS b/NEWS index e65f885a..e0e2147d 100644 --- a/NEWS +++ b/NEWS @@ -33,6 +33,10 @@ - Made topological-sort behave better when dependency target is listed multiple times by concatenating dependencies (fixes #1185). +- Module system + - Reexported macros now work when the reexporting module redefines +identifiers from the original (fixes #1757, reported by Sandra Snan). + - Runtime system - Sleeping primordial thread doesn't forget mutations made to parameters in interrupt handlers anymore. (See #1638. Fix diff --git a/distribution/manifest b/distribution/manifest index 5416eff7..53aeaa4f 100644 --- a/distribution/manifest +++ b/distribution/manifest @@ -196,6 +196,8 @@ tests/reexport-m3.scm tests/reexport-m4.scm tests/reexport-m5.scm tests/reexport-m6.scm +tests/reexport-m7.scm +tests/reexport-m8.scm tests/reexport-tests.scm tests/reexport-tests-2.scm tests/ec.scm diff --git a/modules.scm b/modules.scm index 0ba1df49..77c9af52 100644 --- a/modules.scm +++ b/modules.scm @@ -385,16 +385,18 @@ 'import "cannot find implementation of re-exported syntax" name (let* ((sexps - (map (lambda (se) - (if (symbol? se) - (find-reexport se) - (list (car se) #f (##sys#ensure-transformer (cdr se) (car se) - sexports)) + (filter-map (lambda (se) + (and (not (symbol? se)) + (list (car se) #f (##sys#ensure-transformer (cdr se) (car se) + sexports)) + (reexp-sexps + (filter-map (lambda (se) (and (symbol? se) (find-reexport se))) + sexports)) (nexps (map (lambda (ne) (list (car ne) #f (##sys#ensure-transformer (cdr ne) (car ne sdefs)) - (mod (make-module name lib '() vexports sexps iexports)) + (mod (make-module name lib '() vexports (append sexps reexp-sexps) iexports)) (senv (if (or (not (null? sexps)) ; Only macros have an senv (not (null? nexps))) ; which must be patched up (merge-se diff --git a/tests/reexport-m7.scm b/tests/reexport-m7.scm new file mode 100644 index ..3c62fabf --- /dev/null +++ b/tests/reexport-m7.scm @@ -0,0 +1,17 @@ +(module reexport-m7 +(reexported-reverse (reexported-local-reverse my-reverse)) + + (import scheme (chicken syntax)) + + (define (my-reverse lis) +(reverse lis)) + + (define-syntax reexported-local-reverse +(syntax-rules () + ((_ lis) + (my-reverse lis + + (define-syntax reexported-reverse +(syntax-rules () + ((_ lis) + (reverse lis) diff --git a/tests/reexport-m8.scm b/tests/reexport-m8.scm new file mode 100644 index ..2fe2b9ac --- /dev/null +++ b/tests/reexport-m8.scm @@ -0,0 +1,8 @@ +(module reexport-m8 () + ;; NOTE: Reexport only works properly if m7 is imported here, when + ;; the implementing library isn't required by the user of m8.. + (import reexport-m7 + (rename scheme (reverse reverse-og)) + (rename (chicken base) (identity reverse)) + (chicken module)) + (reexport reexport-m7)) diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm index 14b51c5a..61d81993 100644 --- a/tests/reexport-tests-2.scm +++ b/tests/reexport-tests-2.scm @@ -6,3 +6,9 @@ (import reexport-m6) (f:s1); expands to s2, which is reexported and refers to "s2", which is also visible in this context as "f:s2" (f:s2) + +;; reexport of syntax using shadowed identifiers in new module (#1757) +(import reexport-m8)
[PATCH] Another fix for irregex to bring us to 0.9.10
Hi all, Here's one more patch to bring Irregex completely up to date with the just released 0.9.10 version. It fixes an issue where "bol" would overlap with newline characters in a weird way. Cheers, Peter From efe932f4fa7afbc56865d33edfbf6836c34ce919 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Tue, 6 Jul 2021 15:15:34 +0200 Subject: [PATCH] Bump irregex to upstream commit 29334af, bringing us to version 0.9.10 This fixes upstream ticket #25, where newlines would overlap with "bol" in situations where a string matches multiple times due to inconsistent handling. --- NEWS | 6 -- irregex-core.scm | 16 +++- tests/test-irregex.scm | 4 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index 53a40f0f..2e254e48 100644 --- a/NEWS +++ b/NEWS @@ -6,15 +6,17 @@ - Fixed a bug where optimisations for `irregex-match?` would cause runtime errors due to the inlined specialisations not being fully-expanded (see #1690). - - Irregex has been updated to upstream 0.9.9, which fixes behaviour + - Irregex has been updated to upstream 0.9.10, which fixes behaviour of irregex-replace/all with positive lookbehind so all matches are replaced instead of only the first (reported by Kay Rhodes), and a regression regarding replacing empty matches which was introduced by the fixes in 0.9.7 (reported by Sandra Snan). Also, the http-url shorthand now allows any top-level domain and the old "top-level-domain" now also supports "edu" (fixed by Sandra Snan). -Finally, a problem was fixed with capturing groups inside a kleene +Also, a problem was fixed with capturing groups inside a kleene star, which could sometimes return incorrect parts of the match. +Finally, "bol" handling was fixed to handle newlines consistently +so that multiple matches don't overlap (reported by Sandra Snan). - current-milliseconds has been deprecated in favor of the name current-process-milliseconds, to avoid confusion due to naming of current-milliseconds versus current-seconds, which do something diff --git a/irregex-core.scm b/irregex-core.scm index f86b7992..55e9a6c0 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -30,6 +30,10 @@ History +;; 0.9.10: 2021/07/06 - fixes for submatches under kleene star, empty seqs +;; in alternations, and bol in folds for backtracking +;; matcher (thanks John Clements and snan for reporting +;; and Peter Bex for fixing) ;; 0.9.9: 2021/05/14 - more comprehensive fix for repeated empty matches ;; 0.9.8: 2020/07/13 - fix irregex-replace/all with look-behind patterns ;; 0.9.7: 2019/12/31 - more intuitive handling of empty matches in -fold, @@ -3508,9 +3512,10 @@ (fail ((bol) (lambda (cnk init src str i end matches fail) - (if (or (and (eq? src (car init)) (eqv? i (cdr init))) - (and (> i ((chunker-get-start cnk) src)) -(eqv? #\newline (string-ref str (- i 1) + (if (let ((ch (if (> i ((chunker-get-start cnk) src)) + (string-ref str (- i 1)) + (chunker-prev-char cnk init src + (or (not ch) (eqv? #\newline ch))) (next cnk init src str i end matches fail) (fail ((bow) @@ -3908,13 +3913,14 @@ matches))) (if (not m) (finish from acc) -(let ((j (%irregex-match-end-index m 0)) +(let ((j-start (%irregex-match-start-index m 0)) + (j (%irregex-match-end-index m 0)) (acc (kons from m acc))) (irregex-reset-matches! matches) (cond ((flag-set? (irregex-flags irx) ~consumer?) (finish j acc)) - ((= j i) + ((= j j-start) ;; skip one char forward if we match the empty string (lp (list str j end) j (+ j 1) acc)) (else diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm index 5cf5b685..0888f09b 100644 --- a/tests/test-irregex.scm +++ b/tests/test-irregex.scm @@ -451,6 +451,10 @@ (irregex-extract (irregex "[aeiou]*") "foobarbaz")) (test-equal '("Line 1\n" "Line 2\n" "Line 3") (irregex-split 'bol "Line 1\nLine 2\nLine 3")) + (test-equal '("foo\n" "bar\n" "baz\n") + (irregex-extract '(: bol (+ alpha) newline) "\nfoo\nbar\nbaz\n")) + (test-equal '("\nblah" "\nblah" "\nblah") + (irregex-extract '(: newline "blah" eol) "\nblah\nblah\nblah\n")) ) -- 2.20.1 signature.asc Description: PGP signature
Re: [PATCH] show more helpful error message with "-l"
On Thu, Jul 01, 2021 at 10:44:22PM +0200, felix.winkelm...@bevuta.com wrote: > See patch, suggested by jcroisant on #chicken. Since -lxxx may > be ambiguous, not all cases may be caught, but it can still be helpful to new > users. Thanks, applied. The patch has some weird indentation and also git complained about spaces occurring before tabs, both of which I took the liberty to fix. Cheers, Peter signature.asc Description: PGP signature
[PATCH] Fix for incorrect NFA compilation of "or" with empty sequence [was: Re: [PATCH] Fix for irregex incorrect matches]
On Mon, Jul 05, 2021 at 11:45:36AM +0200, Peter Bex wrote: > Hi there, > > The attached patch is a port of the most recent commit in the irregex > repository, which fixes this upstream ticket: > https://github.com/ashinn/irregex/issues/27 And here is another patch to fix a different issue: https://github.com/ashinn/irregex/issues/26 I would suggest applying this patch after the one I sent earlier, because both touch the same paragraph in the NEWS file. If applied the other way around, you'll get conflicts. Cheers, Peter From 7606aed8e97666ef3100d9432f9c8575e2094ba6 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Mon, 5 Jul 2021 15:27:32 +0200 Subject: [PATCH] Update irregex to upstream b3116764 (fc1adacb) to fix issue with "or" When compiling an NFA from a SRE object containing an (or) which contains an empty sequence, the resulting state machine would be invalid, causing a crash when trying to convert it to a DFA. This was due a mismatch in internal bookkeeping between state numbers and the actual state transitions. --- irregex-core.scm | 2 +- tests/test-irregex.scm | 9 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/irregex-core.scm b/irregex-core.scm index a8e7c97f..f86b7992 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -2563,7 +2563,7 @@ flags next (and a - (let ((c (add-state! (new-state-number a) + (let ((c (add-state! (new-state-number (max a b)) '( (nfa-add-epsilon! buf c a #f) (nfa-add-epsilon! buf c b #f) diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm index f1aefc21..5cf5b685 100644 --- a/tests/test-irregex.scm +++ b/tests/test-irregex.scm @@ -567,6 +567,15 @@ ;; irregex-flags, irregex-lengths ) +(test-group "SRE representation edge cases" + ;; NFA compilation skipped alternative after empty sequence (#26, found by John Clements) + (test-equal "empty sequence in \"or\"" + "" + (irregex-match-substring (irregex-search `(or (seq) "foo") ""))) + (test-equal "alternative to empty sequence in \"or\"" + "foo" + (irregex-match-substring (irregex-search `(or (seq) "foo") "foo" + (test-end) -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Fix for irregex incorrect matches
Hi there, The attached patch is a port of the most recent commit in the irregex repository, which fixes this upstream ticket: https://github.com/ashinn/irregex/issues/27 Cheers, Peter From b552052f4085e84d662f70bb76cb4abf41ab25bc Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Mon, 5 Jul 2021 11:38:43 +0200 Subject: [PATCH] Update irregex to upstream 960fa22b, fixing a group matching issue When a kleene star is used around an alternative containing submatches, in some circumstances the DFA compilation would emit reordering commands which would cause the regex capturing to go wrong, returning faulty matches. This would go wrong because the ordering commands would read from a memory slot and write to a target memory slot. For example, the following set of reordering commands has no "correct" order in which they can be executed: p[0] <- p[1] p[1] <- p[0] After executing both of them in either order, both of the slots will contain the same value, instead of swapping them as was the intention. This is fixed by executing the ordering commands after first fetching the old memory slot locations into a closure. Fixes upstream issue #27 --- NEWS | 4 +++- irregex-core.scm | 18 -- tests/re-tests.txt | 1 + 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index 46af9bd1..53a40f0f 100644 --- a/NEWS +++ b/NEWS @@ -10,9 +10,11 @@ of irregex-replace/all with positive lookbehind so all matches are replaced instead of only the first (reported by Kay Rhodes), and a regression regarding replacing empty matches which was introduced -by the fixes in 0.9.7 (reported by Sandra Snan). Finally, the +by the fixes in 0.9.7 (reported by Sandra Snan). Also, the http-url shorthand now allows any top-level domain and the old "top-level-domain" now also supports "edu" (fixed by Sandra Snan). +Finally, a problem was fixed with capturing groups inside a kleene +star, which could sometimes return incorrect parts of the match. - current-milliseconds has been deprecated in favor of the name current-process-milliseconds, to avoid confusion due to naming of current-milliseconds versus current-seconds, which do something diff --git a/irregex-core.scm b/irregex-core.scm index 8f672333..a8e7c97f 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -2235,12 +2235,18 @@ (chunk (cons src (+ i 1 (vector-set! slot (car s) chunk))) (cdr cmds)) - (for-each (lambda (c) - (let* ((tag (vector-ref c 0)) - (ss (vector-ref memory (vector-ref c 1))) - (ds (vector-ref memory (vector-ref c 2 -(vector-set! ds tag (vector-ref ss tag -(car cmds) + ;; Reassigning commands may be in an order which + ;; causes memory cells to be clobbered before + ;; they're read out. Make 2 passes to maintain + ;; old values by copying them into a closure. + (for-each (lambda (execute!) (execute!)) +(map (lambda (c) + (let* ((tag (vector-ref c 0)) + (ss (vector-ref memory (vector-ref c 1))) + (ds (vector-ref memory (vector-ref c 2))) + (value-from (vector-ref ss tag))) + (lambda () (vector-set! ds tag value-from + (car cmds)) (if new-finalizer (lp2 (+ i 1) next src (+ i 1) new-finalizer) (lp2 (+ i 1) next res-src res-index #f diff --git a/tests/re-tests.txt b/tests/re-tests.txt index 7a56edb7..39a747e6 100644 --- a/tests/re-tests.txt +++ b/tests/re-tests.txt @@ -171,3 +171,4 @@ multiple words multiple words, yeah y & multiple words (a([^a])*)* abcaBC y &-\1-\2 abcaBC-aBC-C ([Aa]b).*\1 abxyzab y &-\1 abxyzab-ab a([\/\\]*)b a//\\b y &-\1 a//\\b-//\\ +(?:[[:alnum:]]|(@[[:alnum:]]))* oeh@2tu@2n342 y \1 @2 -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Add runtime option to fix randomization seed on startup
Hi all, Megane looked into #1650 and found out that the srand(time(NULL)) call is the culprit; due to the hash table randomization the number of GCs can differ on every run. This led me to think that this could be one reason our benchmarks are so noisy, so I added a runtime option to fix the randomization factor on startup. This does indeed completely remove all deviation on the "number of major GCs" metric in the CHICKEN benchmarks, but unfortunately doesn't seem to do all that much for the cpu-time or GC time metrics. Oh well, I think it could be useful enough anyway, and perhaps a good first stab at reducing the noisiness of our benchmarks. The patch should speak for itself. Cheers, Peter From eecb550157a0bf809132329928b9338c37875bd8 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 17 Jun 2021 13:28:57 +0200 Subject: [PATCH] Add new -:R runtime option to influence how srand() is called Without this, there's some nondeterminism when running an application, which defends against symbol table stuffing attacks but can make debugging or benchmarking GC issues difficult. As noted in #1650, it is unsettling when a program behaves differently every time you run it. While we're at it, remove unnecessary C_fix() call on the output of time(NULL), as srand() is not a CHICKEN C function accepting fixnums, but a native C function accepting regular integers. --- NEWS | 2 ++ manual/Using the compiler | 2 ++ runtime.c | 12 +++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index e06914c7..46af9bd1 100644 --- a/NEWS +++ b/NEWS @@ -50,6 +50,8 @@ - Garbage collection algorithm has been changed to reduce thrashing when heap is almost full, by growing the heap sooner. A new -:hf option was added to tweak when heap growth should occur. + - Added `-:R' runtime option to initialize rand() state +deterministically (should help with #1650 and benchmarking). - Compiler - Avoid re-using argvector when inline rest operations are being diff --git a/manual/Using the compiler b/manual/Using the compiler index 7e8ff6de..68f09908 100644 --- a/manual/Using the compiler +++ b/manual/Using the compiler @@ -237,6 +237,8 @@ by the startup code and will not be contained in the result of ; {{-:r}} : Writes trace output to stderr. This option has no effect in files compiled with the {{-no-trace}} options. +; {{-:RNUMBER}} : Specifies the initial number passed to seed the {{rand()}} PRNG (which is currently only used to randomize the symbol table). If not supplied, the current system time is used. This can be useful when debugging or benchmarking because it removes a source of nondeterminism which can affect how soon or how often the GC is triggered. + ; {{-:sNUMBER}} : Specifies stack size. ; {{-:tNUMBER}} : Specifies symbol table size. diff --git a/runtime.c b/runtime.c index 580c6fbe..93dd9d29 100644 --- a/runtime.c +++ b/runtime.c @@ -454,6 +454,7 @@ static C_TLS int stack_size_changed, dlopen_flags, heap_size_changed, + random_state_initialized = 0, chicken_is_running, chicken_ran_once, pass_serious_signals = 1, @@ -845,7 +846,10 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel) current_module_handle = NULL; callback_continuation_level = 0; gc_ms = 0; - srand(C_fix(time(NULL))); + if (!random_state_initialized) { +srand(time(NULL)); +random_state_initialized = 1; + } for(i = 0; i < C_RANDOM_STATE_SIZE / sizeof(C_uword); ++i) random_state[ i ] = rand(); @@ -1379,6 +1383,7 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st " -:huPERCENTAGE set percentage of memory used at which heap will be shrunk\n" " -:hSIZE set fixed heap size\n" " -:r write trace output to stderr\n" + " -:RSEED initialize rand() seed with SEED (helpful for benchmark stability)\n" " -:p collect statistical profile and write to file at exit\n" " -:PFREQUENCY like -:p, specifying sampling frequency in us (default: 1)\n" " -:sSIZE set nursery (stack) size\n" @@ -1494,6 +1499,11 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st show_trace = 1; break; + case 'R': + srand((unsigned int)arg_val(ptr)); + random_state_initialized = 1; + goto next; + case 'x': C_abort_on_thread_exceptions = 1; break; -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Fix #1756 by replacing rest ops for explicitly consed rest args
Hi all, The attached patches fix #1756 by detecting when a procedure with a rest argument is converted into a regular procedure with the last argument being a list which is consed up at the call sites (aka an "explicitly consed rest argument"). When this happens, we "undo" the argvector-based rest operations by replacing them with regular car/cdr/length list operations, like we also already do in closure conversion when the rest arg has been reified into a list and stored inside the current closure. To avoid code duplication, there's a commit which first moves that code into a procedure in support.scm, followed by a commit which calls this new procedure in the situation described above. The first commit simply adds tracking of the situation so that we know when a rest argument variable has become an explicitly consed parameter. So, while this is a medium-sized change, hopefully the way the commits are structured should make it easy to review. Cheers, Peter From bf48a7f9dbb26e3461d1090da63ca9496d902287 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 16 Jun 2021 08:32:28 +0200 Subject: [PATCH 1/3] Add information to the db for rest args which are explicitly consed Before, this would only be stored as an "explicit-rest" property on the procedure variable. In contexts where you only have the variable it was difficult to find out whether the rest arg was explicitly consed. Rest-ops only hold a reference to the rest arg variable, not the procedure. --- batch-driver.scm | 2 +- core.scm | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/batch-driver.scm b/batch-driver.scm index 78296c9d..3f456531 100644 --- a/batch-driver.scm +++ b/batch-driver.scm @@ -147,7 +147,7 @@ ((potential-values) (set! pvals (cdar es))) ((replacable home contains contained-in use-expr closure-size rest-parameter -captured-variables explicit-rest rest-cdr rest-null?) +captured-variables explicit-rest rest-cdr rest-null? consed-rest-arg) (printf "\t~a=~s" (caar es) (cdar es)) ) ((derived-rest-vars) (set! derived-rvars (cdar es))) diff --git a/core.scm b/core.scm index 2a380f03..c484071c 100644 --- a/core.scm +++ b/core.scm @@ -263,6 +263,7 @@ ; extended-binding -> If true: variable names an extended binding ; unused -> If true: variable is a formal parameter that is never used ; rest-parameter -> #f | 'list If true: variable holds rest-argument list +; consed-rest-arg -> If true: variable is a rest variable in a procedure called with consed rest list ; rest-cdr -> (rvar . n) Variable references the cdr of rest list rvar after n cdrs (0 = rest list itself) ; rest-null? -> (rvar . n) Variable checks if the cdr of rest list rvar after n cdrs is empty (0 = rest list itself) ; derived-rest-vars -> (v1 v2 ...) Other variables aliasing or referencing cdrs of a rest variable @@ -2435,7 +2436,8 @@ (cond ((and has (not (rassoc sym callback-names eq?))) (db-put! db (first lparams) 'has-unused-parameters #t) ) (rest - (db-put! db (first lparams) 'explicit-rest #t) ) ) ) ) ) ) ) ) ) + (db-put! db (first lparams) 'explicit-rest #t) + (db-put! db rest 'consed-rest-arg #t) ) ) ) ) ) ) ) ) ) ;; Make 'removable, if it has no references and is not assigned to, and one of the following: ;; - it has either a value that does not cause any side-effects -- 2.20.1 From ed34e4857f319f3654f37680ebd4c358c494f286 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Wed, 16 Jun 2021 08:35:33 +0200 Subject: [PATCH 2/3] Refactor replacing of rest args to make it reusable This moves the replacing of rest args with corresponding list ops into a procedure in support.scm --- core.scm| 37 ++--- support.scm | 26 +- 2 files changed, 31 insertions(+), 32 deletions(-) diff --git a/core.scm b/core.scm index c484071c..630bfd04 100644 --- a/core.scm +++ b/core.scm @@ -2649,37 +2649,12 @@ ;; This can be improved, as it can actually introduce ;; many more cdr calls than necessary. (cond ((eq? class '##core#rest-cdr) - (let lp ((cdr-calls (add1 (second params))) - (var rest-var)) - (if (zero? cdr-calls) - (transform var here closure) - (lp (sub1 cdr-calls) - (make-node '##core#inline (list "C_i_cdr") (list var)) - - ;; If customizable, the list is consed up at the - ;; call site and there is no argvector. So convert - ;; back to list-ref/list-tail calls. - ;; - ;; Alternatively, if n isn't val, this node was - ;; processed and the variable got replaced by a - ;; closure access. - ((or (test here 'customizable) - (not (eq? val n))) - (case class - ((##core#rest-car) -
[PATCH] Fix #1675 by using theoretical maximum addressable range on 64-bit platforms
Hi all, As discussed in the ticket and on IRC, the default maximum heap size is *not* to kill processes early to prevent programs from running amok, but as a way to avoid allocating more than our C_word pointers can address with the masked out bits. Therefore, the value on 64-bit platforms is not correct - it is 2GB there, which is the 32-bit limit. Attached is a signed-off copy of Dieggsy's trivial patch with a change in the NEWS file to point this out to users as this may be an important change for people relying on this limit for other reasons. Cheers, Peter From d6cc3d958c6f73a1bc7adaa89a3c71c7d0394b11 Mon Sep 17 00:00:00 2001 From: dieggsy Date: Sat, 10 Apr 2021 13:36:18 -0400 Subject: [PATCH] Increase maximum heap size on 64-bit machines Signed-off-by: Peter Bex --- NEWS | 3 +++ runtime.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index c5eb0f03..b283d436 100644 --- a/NEWS +++ b/NEWS @@ -35,6 +35,9 @@ contributed by Sebastien Marie) - A feature corresponding to the word size is available regardless of the word size (#1693) + - The default maximum heap size on 64-bit platforms is now the +theoretical maximum addressable memory size (#1675). Use -:m +if you would like to override this at run-time. - Deprecated C_(a_i_current_)milliseconds in favor of C_(a_i_)current_process_milliseconds to match the Scheme-level deprecation of current-milliseconds. diff --git a/runtime.c b/runtime.c index 51cfa7be..dfc2e0ee 100644 --- a/runtime.c +++ b/runtime.c @@ -150,8 +150,10 @@ static C_TLS int timezone; #ifdef C_SIXTY_FOUR # define DEFAULT_STACK_SIZE(1024 * 1024) +# define DEFAULT_MAXIMAL_HEAP_SIZE 0x7ff0 #else # define DEFAULT_STACK_SIZE(256 * 1024) +# define DEFAULT_MAXIMAL_HEAP_SIZE 0x7ff0 #endif #define DEFAULT_SYMBOL_TABLE_SIZE 2999 @@ -159,7 +161,6 @@ static C_TLS int timezone; #define DEFAULT_HEAP_SIZE DEFAULT_STACK_SIZE #define MINIMAL_HEAP_SIZE DEFAULT_STACK_SIZE #define DEFAULT_SCRATCH_SPACE_SIZE 256 -#define DEFAULT_MAXIMAL_HEAP_SIZE 0x7ff0 #define DEFAULT_HEAP_GROWTH200 #define DEFAULT_HEAP_SHRINKAGE 50 #define DEFAULT_HEAP_SHRINKAGE_USED25 -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Partial fix for #1685
Hi all, Here's a more "official" version of Mario's patch for #1685, which works on UNIX only. For Windows we'll probably need to overhaul egg-compile.scm into some sort of structured system instead of using ad-hoc quoting, because on Windows it is context-dependent how one should quote variables and paths (see #1727 for more details). To me this would be more suitable to fully (i.e. also Windows) fix it in 5.4, but the UNIX path quoting seems to me like a trivial and nice improvement to go into 5.3 already. Cheers, Peter From 9fd1dda69f0f5edd0db1b13078116c16c77880dc Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 20 May 2021 09:57:17 +0200 Subject: [PATCH] Partial fix for #1685 - quote shell variables on UNIX platforms This allows having shell metacharacters (like spaces) in path names. This does not yet fix the situation for Windows - there, quotation is context-dependent, so quite a bit harder to fix properly. Signed-off-by: Peter Bex --- NEWS| 2 ++ egg-compile.scm | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index c5eb0f03..7f9faecd 100644 --- a/NEWS +++ b/NEWS @@ -65,6 +65,8 @@ `Error: (string->number) bad argument type: #!eof` in some cases. - If chicken-install has a program prefix/suffix, it now writes to a cache directory matching its program name (#1713, thanks to Alice Maz) + - Fixed bug in chicken-install regarding variable quotation on UNIX-like +systems which prevented installation into paths with spaces (#1685). 5.2.0 diff --git a/egg-compile.scm b/egg-compile.scm index a4c4bf0c..186a7227 100644 --- a/egg-compile.scm +++ b/egg-compile.scm @@ -1239,7 +1239,7 @@ EOF (define (shell-variable var platform) (case platform -((unix) (string-append "${" var "}")) +((unix) (string-append "\"${" var "}\"")) ((windows) (string-append "%" var "%" ;; NOTE `cmd' must already be quoted for shell -- 2.20.1 signature.asc Description: PGP signature
[PATCH] Update irregex to 0.9.9
Hi all, Attached is a patch to bring irregex up to date with the latest upstream. It includes an important bugfix regarding irregex-replace/all when an empty match at the beginning of a string is replaced - it would incorrectly duplicate the first character. It also includes changes so that the top-level-domain shorthand now also allows "edu", but this is an incomplete list anyway now that anyone can buy their own, and the http-url shorthand now goes through "domain" rather than "domain-common" to allow for top-level domains not listed in the aforementioned top-level-domain. Cheers, Peter From 1a7ee0d480b296d3a93f81d74854a2d899ed67f2 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Fri, 14 May 2021 08:32:48 +0200 Subject: [PATCH] Update irregex to upstream 0.9.9 (rev 3c367082) This fixes an edge case from upstream bug #24 where replacing empty strings with replace/all at the start of a string would incorrectly duplicate the starting character, found by Sandra Snan. --- NEWS | 8 ++-- irregex-core.scm | 38 -- tests/test-irregex.scm | 5 - 3 files changed, 26 insertions(+), 25 deletions(-) diff --git a/NEWS b/NEWS index 1e852d90..c5eb0f03 100644 --- a/NEWS +++ b/NEWS @@ -6,9 +6,13 @@ - Fixed a bug where optimisations for `irregex-match?` would cause runtime errors due to the inlined specialisations not being fully-expanded (see #1690). - - Irregex has been updated to upstream 0.9.8, which fixes behaviour + - Irregex has been updated to upstream 0.9.9, which fixes behaviour of irregex-replace/all with positive lookbehind so all matches are -replaced instead of only the first (reported by Kay Rhodes). +replaced instead of only the first (reported by Kay Rhodes), and +a regression regarding replacing empty matches which was introduced +by the fixes in 0.9.7 (reported by Sandra Snan). Finally, the +http-url shorthand now allows any top-level domain and the old +"top-level-domain" now also supports "edu" (fixed by Sandra Snan). - current-milliseconds has been deprecated in favor of the name current-process-milliseconds, to avoid confusion due to naming of current-milliseconds versus current-seconds, which do something diff --git a/irregex-core.scm b/irregex-core.scm index 42f2a806..8f672333 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -1,6 +1,6 @@ irregex.scm -- IrRegular Expressions ;; -;; Copyright (c) 2005-2020 Alex Shinn. All rights reserved. +;; Copyright (c) 2005-2021 Alex Shinn. All rights reserved. ;; BSD-style license: http://synthcode.com/license.txt @@ -30,6 +30,7 @@ History +;; 0.9.9: 2021/05/14 - more comprehensive fix for repeated empty matches ;; 0.9.8: 2020/07/13 - fix irregex-replace/all with look-behind patterns ;; 0.9.7: 2019/12/31 - more intuitive handling of empty matches in -fold, ;; -replace and -split @@ -2348,8 +2349,8 @@ (domain . (seq domain-atom (+ #\. domain-atom))) ;; now anything can be a top-level domain, but this is still handy (top-level-domain . (w/nocase (or "arpa" "com" "gov" "mil" "net" "org" - "aero" "biz" "coop" "info" "museum" - "name" "pro" (= 2 alpha + "edu" "aero" "biz" "coop" "info" + "museum" "name" "pro" (= 2 alpha (domain/common . (seq (+ domain-atom #\.) top-level-domain)) ;;(email-local-part . (seq (+ (or (~ #\") string (email-local-part . (+ (or alphanumeric #\_ #\- #\. #\+))) @@ -2360,7 +2361,7 @@ (seq "%" hex-digit hex-digit))) (http-url . (w/nocase "http" (? "s") "://" - (or domain/common ipv4-address) ;; (seq "[" ipv6-address "]") + (or domain ipv4-address) ;; (seq "[" ipv6-address "]") (? ":" (+ numeric)) ;; port ;; path (? "/" (* (or url-char "/")) @@ -3889,9 +3890,9 @@ (if (not (and (integer? end) (exact? end))) (error 'irregex-fold "not an exact integer" end)) (irregex-match-chunker-set! matches irregex-basic-string-chunker) -(let lp ((src init-src) (i start) (acc knil)) +(let lp ((src init-src) (from start) (i start) (acc knil)) (if (>= i end) - (finish i acc) + (finish from acc)
Re: [PATCH] make chicken-install cache depend on PROGRAM_PREFIX/SUFFIX
On Thu, Apr 15, 2021 at 11:05:36AM +0200, felix.winkelm...@bevuta.com wrote: > A patch by alicemaz, see #1713. Thanks, pushed. Cheers, Peter signature.asc Description: PGP signature