[PATCH] Update irregex to 0.9.11

2024-04-23 Thread Peter Bex
Hi all,

Here's a patch to bring irregex in line with upstream 0.9.11 (plus one
more commit that fixes a bug in sre->string).  It would be nice to have
that for 5.4.0.

Not that much changed in this release, but a lot of code was
restructured, so it would make it easier to port future changes once
this has been applied.

Also, there are a few fixes related to utf8-mode, so maybe these help
with the UTF-8 branch.

Cheers,
Peter
>From 411dcdd39fde172f414e29a216ed413a4758 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 23 Apr 2024 08:26:07 +0200
Subject: [PATCH] Bump irregex to upstream commit 923cfc39, which is 0.9.11
 plus a bugfix

---
 NEWS   |  5 +
 irregex-core.scm   | 19 +--
 irregex-utils.scm  |  2 +-
 tests/test-irregex.scm | 15 +++
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index 6b09db47..b1bf9e1c 100644
--- a/NEWS
+++ b/NEWS
@@ -37,6 +37,11 @@
 an `errno' property.
   - Deprecated "chicken-home" and added "include-path" in the
 chicken.platform module.
+  - Irregex has been updated to upstream 0.9.11 plus an additional fix
+for sre->string.  The 0.9.11 release fixes a few problems related to
+utf-8 handling (which should not affect CHICKEN) and expands the
+definition for the 'whitespace character set to include vertical tab,
+carriage return and form feed.
 
 - Tools
   - The -R option for csi and csc now accepts list-notation like
diff --git a/irregex-core.scm b/irregex-core.scm
index 55e9a6c0..5550ace8 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -1,6 +1,6 @@
  irregex.scm -- IrRegular Expressions
 ;;
-;; Copyright (c) 2005-2021 Alex Shinn.  All rights reserved.
+;; Copyright (c) 2005-2024 Alex Shinn.  All rights reserved.
 ;; BSD-style license: http://synthcode.com/license.txt
 
 
@@ -30,6 +30,7 @@
 
 
  History
+;; 0.9.11: 2024/02/23 - Guile test and packaging support from Tomas Volf.
 ;; 0.9.10: 2021/07/06 - fixes for submatches under kleene star, empty seqs
 ;; in alternations, and bol in folds for backtracking
 ;; matcher (thanks John Clements and snan for reporting
@@ -425,7 +426,12 @@
 ;; (define *all-chars* `(/ ,(integer->char (- (char->integer #\space) 32)) 
,(integer->char (+ (char->integer #\space) 223
 
 ;; set to #f to ignore even an explicit request for utf8 handling
-(define *allow-utf8-mode?* #t)
+;; The utf8-mode is undesired on any implementation with native unicode 
support.
+;; It is a workaround for those that treat strings as a raw byte sequences, and
+;; does not work well otherwise.  So disable it on implementations known to
+;; handle unicode natively.
+(define *allow-utf8-mode?* (cond-expand ((and chicken (not full-unicode)) #t)
+(else #f)))
 
 ;; (define *named-char-properties* '())
 
@@ -1568,8 +1574,8 @@
  (cons (car sre) (map rec (cdr sre))
  (else
   (case sre
-((any) 'utf8-any)
-((nonl) 'utf8-nonl)
+((any) (if utf8? 'utf8-any 'any))
+((nonl) (if utf8? 'utf8-nonl 'nonl))
 (else
  (if (and utf8? (char? sre) (high-char? sre))
  (sre-sequence (map integer->char (char->utf8-list sre)))
@@ -2292,10 +2298,11 @@
  . (or alphanumeric punctuation #\$ #\+ #\< #\= #\> #\^ #\` #\| #\~))
 (graph . graphic)
 (blank . (or #\space ,(integer->char (- (char->integer #\space) 23
-(whitespace . (or blank #\newline))
+;; 0B - vertical tab, 0C - form feed
+(whitespace . (or blank #\newline #\x0C #\return #\x0B))
 (space . whitespace)
 (white . whitespace)
-(printing or graphic whitespace)
+(printing . (or graphic whitespace))
 (print . printing)
 
 ;;  we assume a (possibly shifted) ASCII-based ordering
diff --git a/irregex-utils.scm b/irregex-utils.scm
index 291b03ea..37313666 100644
--- a/irregex-utils.scm
+++ b/irregex-utils.scm
@@ -104,7 +104,7 @@
(display ")" out))
   ((* + ? *? ??)
(cond
-((pair? (cddr x))
+((or (pair? (cddr x)) (and (string? (cadr x)) (not (= 1 
(string-length (cadr x))
  (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
 (else (lp (cadr x
(display (car x) out))
diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm
index 0888f09b..8c0464ad 100644
--- a/tests/test-irregex.scm
+++ b/tests/test-irregex.scm
@@ -419,6 +419,12 @@
   (test-equal "***x***"
   (irregex-replace/all
(irregex '(: #\space) 'dfa) "   x   " "*"))
+  (test-equal "A:42"
+  (irregex-replace/all "^" "42" "A:&quo

Re: [PATCH] fix compiler-syntax handling of ##sys#override

2024-01-16 Thread Peter Bex
On Tue, Jan 16, 2024 at 06:09:33PM +0100, felix.winkelm...@bevuta.com wrote:
> This patch addresses the problem reported recently my Mario, regarding
> failing compilation of http-client.
> 
> The problem was that compiler-syntax definitions changed the "override"
> status and disabled the existing value binding.

Thanks, this fixes the error after recompiling intarweb and then
http-client.

Pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] allow overriding value/syntax toplevel bindings (#1166)

2023-12-20 Thread Peter Bex
On Tue, Nov 14, 2023 at 01:22:59PM +0100, felix.winkelm...@bevuta.com wrote:
> Yes, this is admittedly all a bit ugly. Please find attached 2 patches:
> the first addresses the endless expansion loop caused by our recent change
> in ##sys#canonicalize-body. The second patch is a new version of the 
> "override"
> patch, with some additional changes to address the example above and a
> followup problem that came up during testing.

Finally got around to having a proper look (sorry for taking so long!).
It's hacky indeed, but the idea seems sound to me.  Pushed that second
larger patch.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] allow overriding value/syntax toplevel bindings (#1166)

2023-11-14 Thread Peter Bex
On Sun, Nov 12, 2023 at 01:45:06PM +0100, felix.winkelm...@bevuta.com wrote:
> See commit message.

Nice to make some progress on this!

However, I tested the example given in the ticket:

 (define begin -)
 (begin 0 1) => 1  ;; expected: -1

This still evaluates to 1.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] avoid loop in lookup (fix Salmonella hang with transducers)

2023-11-12 Thread Peter Bex
On Sun, Nov 12, 2023 at 12:45:49PM +0100, felix.winkelm...@bevuta.com wrote:
> The attached patch addresses an endless expander loop while compiling
> the transducers egg. The egg uses unconvential (and only slightly legal)
> alias identifiers that confuse the expander, combined with a recent
> change in how it resolves macros in bodies. The problem was reported by
> Mario and wasamasa.

Oof, nasty.  Pushed the change.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] warn on unexpected egg-properties

2023-11-10 Thread Peter Bex
On Wed, Nov 08, 2023 at 02:59:17PM +0100, felix.winkelm...@bevuta.com wrote:
> Attached is a patch to address #1492.

Applied.  Do we want to turn this into an error for CHICKEN 6?
Warnings have a tendency of just scrolling by and getting lost in the
noise of compiler output.

OTOH, making it an error would make it harder to add new properties and
be backward compatible in an egg (though cond-expand would help).

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] fix for #1132

2023-11-08 Thread Peter Bex
On Wed, Nov 08, 2023 at 12:34:23PM +0100, felix.winkelm...@bevuta.com wrote:
> See commit message.

Shouldn't we modify the test instead of dropping it?

Something like this:

(module m3 ()
  (import (rename scheme (define s:define)))
  (import (only (chicken base) assert))
  (define-syntax define
(syntax-rules ()
  ((_) (display 'oink))
  ((_ var value) (s:define var (+ value 1)
  (define)
  (let ()
(define a 1)
(assert (= a 2)))
  (define)
  (newline))

Not sure it's useful to have that final define and newline there.
Perhaps we can drop that, or change it do something more meaningful.
We could simply change it to (define b 5) followed by (assert (= b 6)).

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] export/rename

2023-10-17 Thread Peter Bex
On Tue, Oct 10, 2023 at 08:14:54AM +0200, Peter Bex wrote:
> After some more thought on this, I think you convinced me.
> Let's go with the original export/rename patch, unless anyone objects.

A week has passed with no objections.  I've pushed the original patch.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] export/rename

2023-10-10 Thread Peter Bex
On Mon, Oct 09, 2023 at 05:01:47PM +0200, felix.winkelm...@bevuta.com wrote:
> The export/rename localizes the functionality and doesn't
> require changes to existing code. The precedent of the syntax: and
> interface: never was a particularly good one (we should have used separate
> export forms for those right from the start), so adding funny markers or
> possibly ambiguous special cases is not my favorite approach.

After some more thought on this, I think you convinced me.
Let's go with the original export/rename patch, unless anyone objects.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] export/rename

2023-10-09 Thread Peter Bex
On Tue, Oct 03, 2023 at 12:51:57PM +0200, felix.winkelm...@bevuta.com wrote:
> Ok, I find attached a variant, both more ugly in interface and 
> implementation, since ##core#module and functor do not yet allow
> renamings to be handled. Also, "define-interface" has no notion
> of this, so "(rename: OLD NEW)" is purely applicable in "export".

I'd prefer if we could develop this a bit further to make the rename
form accessible to module as well.

Not sure if that would be useful in functors.  I've never used them,
but it seems to me that the functor's implementation is in a better
position to determine whether to rename on export or not, not the
functor definition itself.

If you agree that it makes little sense to support it in functors,
perhaps it's better to call ##sys#syntax-error when the functor
definition includes renamed exports?

> I prefer the first variant, but feel free to push which one you
> like more.

As it is, the first variant is cleaner indeed, but if we decide to
support renames in the "module" form as well, I think we should go
with the second variant.

I had a quick look and the main obstacle seems to be that
##core#module is not very extendable, because it assumes the first two
"arguments" are special, and the rest is the body (IIUC).

Would it perhaps be possible to simply expand (module FOO (exps) BODY) to
something like the following?

  (##core#module FOO (non-renamed-exps)
(export (rename: name1 renamed1) ...)
BODY)

Perhaps a bit hand-wavey (sorry), but I think this is simpler than
trying to change ##core#module.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] export/rename

2023-10-03 Thread Peter Bex
On Mon, Oct 02, 2023 at 06:31:44PM +0200, felix.winkelm...@bevuta.com wrote:
> This patch adds a new special form to explicitly export renamed bindings 
> from a module:
> 
> (export/rename (OLD NEW) ...)

Why not add it to the regular "export" form?  It's already extendable,
as it has syntax: and interface:.  So for example we could have:

  (export (rename: OLD NEW) ...)

Saves having yet another top-level special-case form.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Disallow empty "or" type specifier

2023-09-26 Thread Peter Bex
On Tue, Sep 26, 2023 at 12:33:43PM +0300, elf wrote:
> It certainly shouldn't be *, nor should it be an error... technically an 
> empty union should be a null set, which would correspond to either a 
> non-extant type or to no return/value at all...

We've discussed this on IRC.  Such a null type doesn't exist in Scheme,
and "no return value" would correspond to a zero "multiple return
values", which has a different notation in our type system.

The (or) is an individual value, which could occur multiple times in a
"multiple return values" list.  We don't splice MV values together,
so this is either a null type or an error.

So in our case, it should definitely be an error since we don't have
null types.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] recover CHICKEN_INSTALL_PREFIX

2023-09-19 Thread Peter Bex
On Wed, Aug 16, 2023 at 05:25:45PM +0200, felix.winkelm...@bevuta.com wrote:
> > Why are install-path and repo-path in chicken-install.scm still using
> > destination-repository instead of effective-destination-repository?
> > Those are the only places that still use it, aside from one in
> > chicken-status and chicken-uninstall.
> 
> The prefix variable is only used for designating where the build
> artifacts are to be stored, not where extensions are located or
> where .egg-info files are to be stored. It also makes sense to set
> CHICKEN_REPOSITORY_PATH in addition to CHICKEN_INSTALL_PREFIX when
> dependency extensions installed and used during the build of another
> extension are needed to be located. This _is_ confusing, no doubt
> about that, but we need a way to override install target locations,
> especially for include-files and binaries and has been requested 
> several times by users, IIRC.

It's extremely confusing and hard to use correctly, but it seems to work
as intended, so I've pushed the patch.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] recover CHICKEN_INSTALL_PREFIX

2023-08-02 Thread Peter Bex
On Fri, Jul 21, 2023 at 04:12:52PM +0200, felix.winkelm...@bevuta.com wrote:
> Indeed - sorry about that, here an improved version.

Sorry for the delay, but I finally had a bit of time to look into this
more seriously, but honestly I'm confused by all the parameters.

Why are install-path and repo-path in chicken-install.scm still using
destination-repository instead of effective-destination-repository?
Those are the only places that still use it, aside from one in
chicken-status and chicken-uninstall.

I checked, but if I call chicken-uninstall with CHICKEN_EGG_PREFIX,
it uninstalls (or attempts to uninstall) the egg in the "standard"
location.  That means chicken-uninstall doesn't work if you're using
CHICKEN_EGG_PREFIX.

For chicken-status, the same is true - it won't list the eggs you
installed in the prefix.

So why not simply replace the definition of destination-repository
with the new code?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-27 Thread Peter Bex
On Thu, Jul 27, 2023 at 12:30:46PM +0200, Peter Bex wrote:
> 1) There's no (efficient) way to know if an object is a finalizable one.
>We need this because we can't simply clear *all* the objects inside
>a finalizable object that aren't referenced elsewhere, because we do
>want to keep foreign pointers etc which are only referenced by the
>finalized object itself.  So we'd need an efficient way to know if
>an object pointed to by a finalizable object is itself finalizable.

Strike that - I think it can be done at the cost of an additional
pointer per finalizer to encode the boundaries of objects that belong to
that finalizer's "reachable set".  But still, that leaves us with #2.

> 2) We have no (canonical) way of breaking strong references.  For weak
>references, it is clear that we have some special indicator, the
>"broken weak pointer" placeholder that gets put there when an object
>reference is cleared.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-27 Thread Peter Bex
On Wed, Jul 26, 2023 at 04:45:53PM +0100, Andy Bennett wrote:
> So how can we finalise a circular list of objects all of which have
> finalisers and still maintain atomicity?
> 
> The docs say the order is "undefined".
> It seems that the best way to finalise this structure is to explicitly break
> all the strong references between components of the list (as we do for
> external weak references) before any of the finalisers are called.

I thought about this one too - it would be nice if all the finalized objects
that refer to other finalized objects would have these links cleared.
However, there are two main obstacles to that:

1) There's no (efficient) way to know if an object is a finalizable one.
   We need this because we can't simply clear *all* the objects inside
   a finalizable object that aren't referenced elsewhere, because we do
   want to keep foreign pointers etc which are only referenced by the
   finalized object itself.  So we'd need an efficient way to know if
   an object pointed to by a finalizable object is itself finalizable.

2) We have no (canonical) way of breaking strong references.  For weak
   references, it is clear that we have some special indicator, the
   "broken weak pointer" placeholder that gets put there when an object
   reference is cleared.

> It may also improve the memory model if we define the object that the
> finaliser receives as a "copy" of the object that has ("already") been
> garbage collected.

I don't really think this will make much of a difference either way - as
it is currently, you couldn't distinguish the "original" object from a
copy.

Possibly a better way is what MIT Scheme (and, gasp, JavaScript) does:
register a finalizer on object with an extraction procedure that returns
the value to finalize.  That way, the object getting deleted is not the
object that is getting finalized.  For instance, when a port would be
finalized, the finalization procedure receives the underlying file
descriptor and not the entire port object.

Although after giving that some more thought, I'm not 100% sure this
would really solve the issue - you can still extract complex objects
from the to-be-finalized object (or have the "identity" procedure as
the "extractor", so it returns the object itself) and therefore have
an object with other things pointing into it, and that could still
be resurrected...

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] recover CHICKEN_INSTALL_PREFIX

2023-07-21 Thread Peter Bex
On Fri, Jul 21, 2023 at 03:54:41PM +0200, felix.winkelm...@bevuta.com wrote:
> diff --git a/egg-compile.scm b/egg-compile.scm
> index 99a94fe8..b14535da 100644
> --- a/egg-compile.scm
> +++ b/egg-compile.scm
> @@ -412,7 +420,7 @@
>  (define (compile-common info walk)
>(case (car info)
>  ((target)
> - (when (eq? mode 'target)
> + (when (eq? mode =)
> (for-each walk (cdr info
>  ((host)
>   (when (eq? mode 'host)

This hunk looks incorrect.  What did you intend to change this to?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] finalizer API (was: thread-safe handling of asynchronous events)

2023-07-18 Thread Peter Bex
On Fri, Jul 07, 2023 at 10:43:29AM +0200, felix.winkelm...@bevuta.com wrote:
> Here another attempt at a finalizer API, allowing adding
> finalized objects to an existing finalizer after it was
> created.

Thanks, I've pushed both this and the srfi 18 changes.
Should we tag a new srfi-18 release?

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Rename read/source-info to avoid self-reference and hang using older compilers

2023-07-11 Thread Peter Bex
Hi all,

Here's a patch to change the name of the read/source-info we export from
chicken.syntax.  This avoids a problem we've been seeing on the
Salmonella machines with hanging processes.

Cheers,
Peter
From 520c12c7642448af2bc1c6e0a1790794d06dc2d4 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 11 Jul 2023 15:40:45 +0200
Subject: [PATCH] Change official name of read/source-info to avoid conflict

Somehow, when using an older compiler to bootstrap, read/source-info
from support.scm was referencing itself, even though it refers to
chicken.syntax#read-with-source-info explicitly.  This caused the
Salmonella test servers to hang.

At some point, we'll have to find out why this happened, but for now
let's take the easy way out and rename the officially exported name to
the full name "read-with-source-info".
---
 DEPRECATED |  6 +++---
 NEWS   |  6 +++---
 batch-driver.scm   |  2 +-
 csi.scm|  6 +++---
 eval.scm   | 10 +-
 expand.scm | 10 +-
 manual/Module (chicken syntax) |  6 +++---
 repl.scm   |  2 +-
 support.scm|  2 +-
 types.db   |  2 +-
 10 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/DEPRECATED b/DEPRECATED
index 6315e591..6c2a009e 100644
--- a/DEPRECATED
+++ b/DEPRECATED
@@ -2,9 +2,9 @@ Deprecated functions and variables
 ==
 
 5.3.1
-- read/source-info was moved from the internal, undocumented module
-  (chicken compiler support) into (chicken syntax).  Using it from
-  the former module is deprecated.
+- read/source-info from the internal, undocumented module
+  (chicken compiler support) is deprecated.  Instead, use
+  read-with-source-info from (chicken syntax).
 
 5.2.1
 - current-milliseconds and its C implementations C_milliseconds and
diff --git a/NEWS b/NEWS
index 68940ef5..a371ff8e 100644
--- a/NEWS
+++ b/NEWS
@@ -21,9 +21,9 @@
 filename. Previously, the directory part would be stripped.
   - Added support for embedded strings and characters in SRFI-4 vector
 literals.
-  - read/source-info is now documented and officially supported, from
-the (chicken syntax) module.  It is still exported from the
-undocumented internal (chicken compiler support) module, but
+  - read-with-source-info is now documented and officially supported,
+from the (chicken syntax) module.  read/source-info is still exported
+from the undocumented internal (chicken compiler support) module, but
 using it from there is deprecated.
 
 - Tools
diff --git a/batch-driver.scm b/batch-driver.scm
index f0a5fc08..8f0a4f35 100644
--- a/batch-driver.scm
+++ b/batch-driver.scm
@@ -609,7 +609,7 @@
   (in (check-and-open-input-file f)) )
  (fluid-let ((##sys#current-source-filename f))
(let loop ()
- (let ((x (chicken.syntax#read/source-info in))) ; 
OBSOLETE - after bootstrapping we can get rid of this explicit namespacing
+ (let ((x (chicken.syntax#read-with-source-info 
in))) ; OBSOLETE - after bootstrapping we can get rid of this explicit 
namespacing
 
(cond ((eof-object? x) 
   (close-checked-input-file in f) )
diff --git a/csi.scm b/csi.scm
index 28ab16a0..63ff4221 100644
--- a/csi.scm
+++ b/csi.scm
@@ -280,7 +280,7 @@ EOF
 (define default-evaluator
   (let ((eval eval)
(load-noisily load-noisily)
-   (read (lambda () (chicken.syntax#read/source-info 
(current-input-port ; OBSOLETE - after bootstrapping we can get rid of this 
explicit namespacing
+   (read (lambda () (chicken.syntax#read-with-source-info 
(current-input-port ; OBSOLETE - after bootstrapping we can get rid of this 
explicit namespacing
(read-line read-line)
(display display)
(string-split string-split)
@@ -1047,8 +1047,8 @@ EOF
  (load home-fn) ) ) ) )
   (define (evalstring str #!optional (rec (lambda _ (void
(let ((in (open-input-string str))
- (read/source-info chicken.syntax#read/source-info)) ; OBSOLETE - 
after bootstrapping we can get rid of this explicit namespacing
- (do ([x (read/source-info in) (read/source-info in)])
+ (read-with-source-info chicken.syntax#read-with-source-info)) ; 
OBSOLETE - after bootstrapping we can get rid of this explicit namespacing
+ (do ([x (read-with-source-info in) (read-with-source-info in)])
  ((eof-object? x))
(rec (receive (eval x))) ) ) )
   (when (member* '("-h" "-help" "--help") args)
diff --git a/eval.scm b/eval.scm
index d54be504..929acc38 100644
--- a/eval.scm
+++ b/eval.scm
@@ -1079,7 +1079,7 @@
 (##sys#current-load-filenam

Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-11 Thread Peter Bex
On Mon, Jul 10, 2023 at 09:28:19PM +0200, felix.winkelm...@bevuta.com wrote:
> After thinking some more about this, I realize that your approach
> (clearing weak ref's to finalized data) is the right thing, since
> any other behaviour in the presence of multithreading leads to
> disaster.

Thank you for seeing the light.

Cheers,
Peter


signature.asc
Description: PGP signature


How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-10 Thread Peter Bex
On Fri, Jul 07, 2023 at 11:23:17PM +0200, felix.winkelm...@bevuta.com wrote:
> I'm not very comfortable with this change. This feels like trading in
> one inconsistency (weak refs being cleared for a potentially non-dead
> object) for another (potentially inconsistent ties of GC-controlled
> memory to non-GC'd resources).

It depends on how you view finalizers.  Personally, I would think a
finalizer should get run on what's *essentially* "already GCed" data.
Therefore, it makes no sense to pass a finalizer data that still holds
onto other cleared data.

But like I said, I can get behind your POV - you could also argue that
if an object holds onto other things and it's "already GCed", all the
things it (and nobody else) holds onto (even strongly) should be
cleared, and that absolutely makes no sense whatsoever.

(come to think of it, the ideal solution would probably be to clear
"outside" weak references to finalizable data but keep the object
itself internally intact.  But that's extremely hard to track in the GC)

However, there's one more concern:

> The potential use-after-free scenario can still happen if the object is
> kept alive, regardless of how we handle weak refs, this is unavoidable
> if we allow finalizers and keep the possibility of resurrection.

I have thought about this a bit more but I came to the conclusion that
from an abstraction point of view it's better to clear weak refs to
finalized data.  The reason is that when a module exposes an object, the
*user* should not need to know or care exactly how that object is
implemented and that it happens to use a finalizer.

So let's say an egg exposes an object, and I'm using it, but I want to
reference it weakly.  Then, all things will work fine most of the time.
However, there's a nasty race condition lurking: if the finalizer
happens to run before my code extracts the object from the weak ref,
but I extract it before the next GC, my code may crash.  Or it may not,
use-after-free is tricky like that.

This should not be the user's concern, and it's a breach of abstraction.
It's also a global issue, as it can affect *any* code that holds onto
an object from a "3rd party" (other module).

Note that this would be problematic *even* if the 3rd party wrote the
code so carefully that it clears the pointer from the object such that
there's no use-after-free bug.  Because the code will still raise an
exception when passed this invalidated object.  And again, that will
be a race condition for the person who uses this module.  This means
that the problem is spread to every single user who decides to use a
weak reference to a 3rd party object that involves resources that must
be freed.

On the other hand, if I'm writing an egg that exposes a foreign object
that needs to be collected, it *is* of my concern (and not a breach of
abstraction) that the data gets freed properly.  Let's say I decide to
store weak refs inside the object (which is not even *that* likely),
and those weak refs point to other things inside the same object, which
I know to be GCed at the time the finalizer is called.  Then that's
something I know and must take care of when writing the finalizer.
Finalizers of objects not seeing weak refs into that same object is
very much a localized concernn.

Also, perhaps more importantly, this will break *immediately* and
*consistently*.  The finalizer will simply see broken weak pointers, all
the time, every time it runs on the "collected" object.  Since this is
also documented in my patch, I think this is the least impactful way of
doing things.

I'm not even all that concerned about finalizers "reviving" dead
objects, but in that case it's *definitely* the responsibility of the
finalizer's author to make sure it doesn't cause any trouble.

So overall, IMHO the problem is a lot more self-contained/localized,
and more deterministic if we clear the references before running the
finalizer.  That has to count for something, I think!

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events)

2023-07-07 Thread Peter Bex
On Thu, Jul 06, 2023 at 09:05:03PM +0200, felix.winkelm...@bevuta.com wrote:
> > This would be problematic if the finalizer has run and deleted the
> > foreign object, while there are still weak references that hold onto
> > the object.  This has then become invalid/inconsistent.
> 
> I don't understand this, I'm afraid. Finalizers can always "revive"
> objects, this can't be avoided and may sometimes even be required.
> If weak refs suddenly make our memory model unsound, then the whole idea
> of weak references stands to discussion.

I don't think they make the memory model unsound per say.  But it is
an issue, and one avoided by MIT Scheme by simply making it impossible
to revive collected objects.  This thread made me consider what to do
with such weak references and I decided that we should clear references
to finalizable (which may already be finalized) objects.

Attached is a patch to ensure that "live" weak references don't hold
onto objects that may have been processed by a finalizer.  I think this
removes the worst potential use-after-free footguns.

NOTE: This issue already exists in older CHICKEN versions with weak
locatives, it is not newly introduced by weak pairs!

While working on this, I also noticed a remaining bug in the weak locative
handling: when a locative has already been cleared, it contains a NULL
pointer, and we need to avoid dereferencing it on the next GC.

Cheers,
Peter
From e9a267e6a1268f8f7eafdcca8b609c443d717e67 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 7 Jul 2023 11:07:43 +0200
Subject: [PATCH 1/2] Skip weak locatives that were already invalidated

This avoids a NULL pointer dereference
---
 runtime.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/runtime.c b/runtime.c
index edda5377..fbce94fd 100644
--- a/runtime.c
+++ b/runtime.c
@@ -4126,6 +4126,7 @@ static C_regparm void C_fcall update_locatives(int mode)
(mode == GC_REALLOC && !C_in_stackp(loc) && !C_in_heapp(loc))); /* 
NB: *old* heap! */
 
 ptr = C_block_item(loc, 0); /* fix up ptr */
+if (ptr == 0) continue; /* Skip already dropped weak locatives */
 offset = C_unfix(C_block_item(loc, 1));
 obj = ptr - offset;
 
-- 
2.40.1

From 9d1d1f3ee1284ba9e298d5dacdb77e8a3ad5fd27 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 7 Jul 2023 11:45:51 +0200
Subject: [PATCH 2/2] Don't retain weak references to finalizable objects

When an object would be garbage, but has a finalizer, it will be
artificially kept alive at the end of a GC cycle so that the finalizer
gets a chance to run, with the original event as an argument.

However, this object is effectively supposed to be garbage, and the
finalizer may put the object in a defunct state.  This means we don't
want code to be able to extract such an object through a weak
reference, because it should be considered garbage and *might* be
already cleaned up.  This could be a potential footgun, making use
after free bugs more likely.

To fix this, we make these "garbage" objects appear as garbage for the
purpose of clearing weak pointers.

The implementation is relatively trivial - we simply remember the heap
top pointer before we revive garbage objects that have finalizers, and
when dereferencing the pointer to the new object's location, where we
check that the object is in the target heap, we also check that it's
not in the portion of the heap where we've put the finalizable
objects and anything they reference.
---
 manual/Module (chicken gc)  |   6 ++
 runtime.c   |  38 +++-
 tests/weak-pointer-test.scm | 113 
 3 files changed, 141 insertions(+), 16 deletions(-)

diff --git a/manual/Module (chicken gc) b/manual/Module (chicken gc)
index 48653e3a..25a29373 100644
--- a/manual/Module (chicken gc)
+++ b/manual/Module (chicken gc)
@@ -53,6 +53,12 @@ Multiple finalizers can be registered for the same object. 
The order
 in which the finalizers run is undefined. Execution of finalizers
 may be nested.
 
+NOTE: When a finalizable object has any weak references (i.e., weak
+locatives or weak pairs) to objects that are only reachable through it
+or other finalizable objects, those references will be broken like
+when the objects had already been collected.  This is done in order to
+avoid user code from accessing objects that are possibly in an
+invalid state.
 
 === force-finalizers
 
diff --git a/runtime.c b/runtime.c
index fbce94fd..6fdd7fca 100644
--- a/runtime.c
+++ b/runtime.c
@@ -543,8 +543,8 @@ static void C_fcall mark_nested_objects(C_byte 
*heap_scan_top, C_byte *tgt_space
 static void C_fcall mark_live_objects(C_byte *tgt_space_start, C_byte 
**tgt_space_top, C_byte *tgt_space_limit) C_regparm;
 static void C_fcall mark_live_heap_only_objects(C_byte *tgt_space_start, 
C_byte **tgt_space_top, C_byte *tgt_space_limit) C_regparm;
 static C_word C_fcall intern0(C_char *name) C_re

Re: [PATCH] thread-safe handling of asynchronous events

2023-07-06 Thread Peter Bex
On Thu, Jul 06, 2023 at 02:51:43PM +0200, felix.winkelm...@bevuta.com wrote:
> That is indeed a shortcoming of the API. I must say that I'm not
> too keen on the "guardian" approach, though, or your suggested hacky
> workaround. There would be too many meanings attached to the
> argument for the polling procedure.

I agree, which is why I called it hacky.

> I also don't like the MIT apporach, since our very basic
> "set-finalizer!" has worked fine so far (with the execution context
> being the main problem).

Yeah, MIT's is rather heavy-handed.

Perhaps simply we can just return two values?  The first a polling
procedure and the second a procedure to add new objects to the
finalizer?  You can just receive the polling procedure in a
single-value context and ignore the object-adder if you don't want
to use it.

Alternatively, make-finalizer could return a new structure type that
represents the finalizer.  It could just be a wrapper for the queue
internally.  We'd have to add a getter and an adder procedure that
accepts this new object type (and perhaps a predicate).

> > Separating the collectable object from the context prevents the
> > collected object from re-entering the live system through the
> > finalization procedure (it may set! some variable to it, for example
> > and that's not desirable).  Apparently, this makes ephemerons easier to
> > implement.
> > I found this through Taylor Campbell's comment on Andy Wingo's blog post:
> > https://wingolog.org/archives/2022/10/31/ephemerons-and-finalizers
> 
> I don't quite understand why many APIs are so afraid of retaining
> the finalized object. The point is becoming aware of the object
> being reclaimable. If it survives yet another GC cycle, so what?

This would be problematic if the finalizer has run and deleted the
foreign object, while there are still weak references that hold onto
the object.  This has then become invalid/inconsistent.

And vice versa, if there are weak references which are now broken, and
the finalizer restores the object, this might cause different kinds of
inconsistencies to arise.

> Unless we are severely memory-constrained and must at all costs release
> the object, I see no need in separating object and context for
> finalization.

I don't think this separation was ever about memory efficiency.

> Moreover, there may be cases when we may need full access
> to the object to perform our finalization action.

Yeah, hiding the object could be problematic if it holds onto
several foreign objects that all have to be cleaned up.

> > Now I think this API is rather heavy-handed, but perhaps we can settle
> > on some sort of middle ground.  I also think that maybe separating the
> > finalization object from the "context" is too late for us, unless we
> > decide to rework the set-finalizer! API as well for CHICKEN 6.
> 
> Indeed. I wouldn't want to touch the underlying machinery. it works
> pretty well, I think.

Agreed.


> [ event queue API ]

> That sounds good, and I understand the motivation but I don't want to
> touch the scheduler, to be honest. The UTF transition is already enough
> work as it is. The original subject is merely a question of API choice,
> let's not try to fix everything right now.

Like I said, just a brain dump.  Perhaps something to keep in mind for
CHICKEN 7.  But as we design a new API we'd do well to make such a
change possible without too many changes in user code.

> I will try to come up with something better, to allow adding objects
> to an existing finalizer.

Looking forward to it!

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] thread-safe handling of asynchronous events

2023-07-06 Thread Peter Bex
On Wed, Jul 05, 2023 at 03:28:54PM +0200, felix.winkelm...@bevuta.com wrote:
> The first patch provides the event-queue mechanism and cleans up the
> scheduler a bit (hiding internal variables while also exposing ##sys#...
> procedures to access them). This also defines hooks that a threading
> API should override to allow suspension/resumption on events.

I'm not super happy with exposing ##sys#fd-list and ##sys#timeout-list as
a procedure under the exact same name that originally held a list.
If there's any code that uses it, that would lead to strange errors.
Might be better to either not expose it at all, or use a completely
different name.

Perhaps now is a good time to make the scheduler into a proper module
that exposes procedures in an unprefixed way?

> The second patch provides a new signal API: "make-signal-handler",
> which creates a handler for one or more signals and "signal-ignore" and
> "signal-default" (to ignore a signal or set the default disposition).
> This is roughly modelled after the Racket API. The old "signal-handler"
> and "set-signal-handler!" procedures have been deprecated.
> 
> The third patch provides "make-finalizer", inspired by Chez' guardians,
> but slightly different to allow blocking/non-blocking tests for
> finalizations. The old API is still available, as often finalization
> performs only very basic operations independent of the currently
> executing context. Finalization procedures can be composed, as a
> finalizer itself becomes subject to finalization once all associated
> objects have been collected. Building guardians on top of this
> should be straightforward, but the latter does not (to my knowledge)
> allow blocking a thread until a finalizer triggers so I chose not
> to implement this interface.

This API looks nice, but as I understand it, make-finalizer requires
all objects in each "pool" to be known when calling it.  AFAIK this
means that when you are creating objects on-the-fly, you'd have to
call make-finalizer N times, and when retrieving objects you'd have
to poll N finalizer "pools", and therefore you can't block on all
objects you'd want to collect.

So what's missing is a way to add an object to an existing finalizer.
Note that this is what guardians support too - when you call a
guardian procedure with an argument, that object is added to the pool.

A (somewhat hacky) way to do this would be to have the finalizer closure
check whether the "mode" argument is a boolean.  If "mode" is not a
boolean, it is assumed to be an object to add to the queue.
Perhaps with an extra check that it's not an immediate value.

Alternatively, we could steal (part of) MIT Scheme's (undocumented)
finalizer API:

https://git.savannah.gnu.org/cgit/mit-scheme.git/tree/src/runtime/gcfinal.scm?id=176cd871bdd9c9dabcb8ad602da0b618be2d0373#n36

There, you call (make-gc-finalizer) with the finalization procedure,
a predicate for the type of object that may be collected, a procedure
that can be used to extract the "context" from the collectable object
and a procedure that can be used to set the context inside the
collectable object.  The "context" here is the value that gets passed
to the finalizer procedure (typically, this would be the foreign pointer
inside a finalizable struct object).

Then, add-to-gc-finalizer! can be used to add objects to the finalizer
and remove-from-gc-finalizer! can be used to remove objects from the
finalizer.

Separating the collectable object from the context prevents the
collected object from re-entering the live system through the
finalization procedure (it may set! some variable to it, for example
and that's not desirable).  Apparently, this makes ephemerons easier to
implement.
I found this through Taylor Campbell's comment on Andy Wingo's blog post:
https://wingolog.org/archives/2022/10/31/ephemerons-and-finalizers

Now I think this API is rather heavy-handed, but perhaps we can settle
on some sort of middle ground.  I also think that maybe separating the
finalization object from the "context" is too late for us, unless we
decide to rework the set-finalizer! API as well for CHICKEN 6.

Perhaps another API that exposes the queues directly might be better.
This way one could use one queue for different things, so that
finalization, signals and file descriptor readiness could happen on
the same queue, so that a thread could block on "any event".  A bit
like the waitForMultipleObjects API in Windows.

Something like:

(make-event-queue) => q

(enqueue-read q PORT)  ;; and a low-level (enqueue-read/fd q FD)?
(enqueue-write q PORT)
(enqueue-i/o q PORT)   ;; read or write
(enqueue-finalization! q OBJ)

(wait-for-event! q)  ;; blocks for any "enqueued" event
(peek-event q)   ;; returns #f if nothing ready

This makes it easy to pick off events from a queue, regardless of the
type and you'll get an object of the type.  I don't know if it can be
done easily, but it would allow GUI libraries to hook into this too
by having an event loop provide 

Re: [PATCH] fix "tail?"

2023-07-06 Thread Peter Bex
On Wed, Jul 05, 2023 at 07:27:43PM +0200, felix.winkelm...@bevuta.com wrote:
> Reported by "acdw": make "tail?" more robust. Also generalizes
> it to accept all types of data, remove unneeded fast path expecting
> proper list argument.

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Fix get-call-chain thread filtering

2023-06-30 Thread Peter Bex
Hi there,

The attached patch fixes a small but important oversight in
get-call-chain.  With this patch, the "trace" egg's tests pass again.

Cheers,
Peter
From 7e20f8dc5310ee8fb41522bff062783b224b07f3 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 30 Jun 2023 09:42:48 +0200
Subject: [PATCH] Fix thread id extraction in get-call-chain

This was missed when changing the trace buffer struct; all accessors
had to be adjusted by adding one to the index, except for the "raw"
slot.

Without it, the filtering of call chain entries for the current thread
will fail for anything with "frameinfo" / a filled cooked2 slot, so
the call chain will be a lot smaller than it ought to be.
---
 library.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/library.scm b/library.scm
index 94f304b8..b04b330a 100644
--- a/library.scm
+++ b/library.scm
@@ -4975,7 +4975,7 @@ EOF
(let loop ((i 0))
  (if (fx>= i n)
  '()
- (let ((t (##sys#slot vec (fx+ i 3 ; thread id
+ (let ((t (##sys#slot vec (fx+ i 4 ; thread id
(if (or (not t) (not thread) (eq? t-id t))
(cons (vector
   (or (##sys#slot vec (fx+ i 1)) ; cooked_location
-- 
2.40.1



signature.asc
Description: PGP signature


[PATCH] Restore read/source-info in support.scm and export it officially from (chicken syntax)

2023-06-30 Thread Peter Bex
Hi all,

I was checking the Salmonella results and it seems a good number of eggs
are actually relying on chicken.compiler.support#read/source-info and
therefore break with the current master:
http://salmonella-linux-x86.call-cc.org/master/clang/linux/x86/2023/06/28/yesterday-diff/

The attached patch restores the procedure in support.scm, but keeps the
implementation itself in chicken.syntax, where it makes more sense.
Since these eggs are using it, that TODO above ##sys#read/source-info
is settled - it's useful enough for those eggs, so let's just expose
it publically and document it.

I noticed there's also an error in the "trace" egg.  This probably has
something to do with the changes in the call chain.  Not sure how, but
I'll investigate.

Cheers,
Peter
From ed691a470bd45bf1ecb9a8d8f53fdfbe4d8c952a Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 30 Jun 2023 09:01:36 +0200
Subject: [PATCH] Restore read/source-info in support.scm and export from
 (chicken syntax)

It turns out several eggs got broken by dropping
chicken.compiler.support#read/source-info because they are using it
directly, even though it's undocumented and supposed to be "internal".

So, restore it in support.scm as an alias for the one in
(chicken syntax) to fix those eggs, but mark it as deprecated.

Since several eggs make use of it, that settles the question of
whether it is useful to export it from (chicken syntax), so do this.
Also, since it's now an official API, make it safe to use by checking
the argument for being a port, and make the argument optional, like in
the regular (read) procedure.
---
 DEPRECATED |  5 +
 NEWS   |  4 
 batch-driver.scm   |  4 +++-
 chicken.syntax.import.scm  |  1 +
 csi.scm|  7 ---
 eval.scm   | 13 -
 expand.scm |  5 +++--
 manual/Module (chicken syntax) | 12 
 repl.scm   |  7 ---
 rules.make |  1 +
 support.scm|  6 +-
 types.db   |  5 +
 12 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/DEPRECATED b/DEPRECATED
index 7d9cffe7..6315e591 100644
--- a/DEPRECATED
+++ b/DEPRECATED
@@ -1,6 +1,11 @@
 Deprecated functions and variables
 ==
 
+5.3.1
+- read/source-info was moved from the internal, undocumented module
+  (chicken compiler support) into (chicken syntax).  Using it from
+  the former module is deprecated.
+
 5.2.1
 - current-milliseconds and its C implementations C_milliseconds and
   C_a_i_current_milliseconds have been deprecated in favor of
diff --git a/NEWS b/NEWS
index 023d8939..68940ef5 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,10 @@
 filename. Previously, the directory part would be stripped.
   - Added support for embedded strings and characters in SRFI-4 vector
 literals.
+  - read/source-info is now documented and officially supported, from
+the (chicken syntax) module.  It is still exported from the
+undocumented internal (chicken compiler support) module, but
+using it from there is deprecated.
 
 - Tools
   - The -R option for csi and csc now accepts list-notation like
diff --git a/batch-driver.scm b/batch-driver.scm
index ee7ae28a..f0a5fc08 100644
--- a/batch-driver.scm
+++ b/batch-driver.scm
@@ -47,6 +47,7 @@
chicken.pretty-print
chicken.process-context
chicken.string
+   chicken.syntax
 chicken.port
chicken.time
 chicken.condition
@@ -608,7 +609,8 @@
   (in (check-and-open-input-file f)) )
  (fluid-let ((##sys#current-source-filename f))
(let loop ()
- (let ((x (##sys#read/source-info in)))
+ (let ((x (chicken.syntax#read/source-info in))) ; 
OBSOLETE - after bootstrapping we can get rid of this explicit namespacing
+
(cond ((eof-object? x) 
   (close-checked-input-file in f) )
  (else
diff --git a/chicken.syntax.import.scm b/chicken.syntax.import.scm
index 7ce92fea..0aa5b36d 100644
--- a/chicken.syntax.import.scm
+++ b/chicken.syntax.import.scm
@@ -32,6 +32,7 @@
  'expand
  '((expand . chicken.syntax#expand)
(get-line-number . chicken.syntax#get-line-number)
+   (read/source-info . chicken.syntax#read/source-info)
(strip-syntax . chicken.syntax#strip-syntax)
(syntax-error . chicken.syntax#syntax-error)
(er-macro-transformer . chicken.syntax#er-macro-transformer)
diff --git a/csi.scm b/csi.scm
index fca1da4e..28ab16a0 100644
--- a/csi.scm
+++ b/csi.scm
@@ -280,7 +280,7 @@ EOF
 (define default-evaluator
   (let ((eval eval)
(load-noisily load-noisily)
-   (read (lambda () (##sys#read/source-info (current-input-port
+

[PATCH] Use weak chain approach for locatives, too

2023-06-27 Thread Peter Bex
Hi all,

Here's a patch to replace the locative table with the same approach we
use for weak pair tracking.  Not much to say, the idea is simple, the
patch a bit bigger (because it rips out the locative table).

There's no manual or NEWS entry because this should not be user-visible
unless code creates lots of garbage locatives (as those would not be
traversed during GC, unlike before).

As I point out in the commit message, I think it would be worthwile for
CHICKEN 6 to change the representation of locatives.  Let me elaborate:

Currently, we store a direct C pointer to the target object's slot
in the locative's first slot.  Because we also need to update said
pointer during GC, we also store the offset (in the second slot) and
do lots of calculations on *every* GC, for *every* locative, to find
the original object, chase its forwarding pointer and then re-apply
the offset.  There's a fourth slot for the type of object (which we
use to determine the offset's "stride" and when dereferencing to
figure out how to "convert" it. Finally, there's a fifth slot which
holds either the object (again!) or #f if it's a weak locative.

Graphically:

   
  | HEADER | PTR | FX offset | FX subtype | OBJ/#f |
   

If we change it such that we *always* store the object
in the first slot, we can make the distinction of weak/strong refs
like we do for pairs: set the C_SPECIALBLOCK_BIT conditionally
when the reference is weak, and clear it if it's strong.

Graphically:

   ---
  | HEADER | OBJ | FX offset | FX subtype |
   ---

This change has two benefits:

- locatives are smaller, so (slightly) less GC pressure if
  lots of these are made.  But more importantly:
- when traversing locatives, we *only* have to process
  weak locatives specially in the GC, and there's no need to
  recalculate the pointer.  This means if a program uses only
  strong locatives, there is *no additional penalty* incurred
  after a garbage collection run.

The only disadvantage is that every access of a locative will
now have to do the calculation to find the base pointer.  This
might slow down locative-ref somewhat.  But I think it's
manageable - we already have a switch statement in locative_ref,
so adding the offset calculation for all the different types of
locatives is trivial and should only add *very* minor overhead,
while the GCs should all be faster.

The change should be relatively minor, but because it's a change
in the representation, and there may be code out there that
relies on the current representation, I think this change is
better left for CHICKEN 6.

Cheers,
Peter
From 8f753a2d031a3c23198c397223f55ce3ae7ec087 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 27 Jun 2023 08:13:59 +0200
Subject: [PATCH] Replace locative table with simpler "weak chain" solution

Instead of keeping track of every locative in a table, we instead use
the same approach as for tracking weak pairs: during GC, as we
encounter live locatives, build up a chain which we traverse when the
GC has completed.  We "recycle" the first slot of the locative when it
is turned into a forwarding pointer for storing the chain pointer.

Unlike weak pairs, we have to traverse both strong *and* weak
locatives, because their pointer slots need to be fixed up.  This
could be improved if we change the representation of locatives to be
object+offset instead of pointer+offset(+object), and have the
C_SPECIALBLOCK_BIT set depending on whether it is weak/strong.  This
would be a fundamental representational change, so this would be
better left for CHICKEN 6.
---
 runtime.c   | 205 +++-
 tests/weak-pointer-test.scm |  75 -
 2 files changed, 136 insertions(+), 144 deletions(-)

diff --git a/runtime.c b/runtime.c
index 5ff08f3f..0dc05feb 100644
--- a/runtime.c
+++ b/runtime.c
@@ -153,7 +153,6 @@ static C_TLS int timezone;
 #define DEFAULT_HEAP_MIN_FREE  (4 * 1024 * 1024)
 #define HEAP_SHRINK_COUNTS 10
 #define DEFAULT_FORWARDING_TABLE_SIZE  32
-#define DEFAULT_LOCATIVE_TABLE_SIZE32
 #define DEFAULT_COLLECTIBLES_SIZE  1024
 #define DEFAULT_TRACE_BUFFER_SIZE  16
 #define MIN_TRACE_BUFFER_SIZE  3
@@ -402,8 +401,8 @@ static C_TLS C_word
   **mutation_stack_limit,
   **mutation_stack_top,
   *stack_bottom,
-  *locative_table,
   weak_pair_chain,
+  locative_chain,
   error_location,
   interrupt_hook_symbol,
   current_thread_symbol,
@@ -468,8 +467,6 @@ static C_TLS double
 static C_TLS LF_LIST *lf_list;
 static C_TLS int signal_mapping_table[ NSIG ];
 static C_TLS int
-  locative_table_size,
-  locative_table_count,
   live_finalizer_count,
   allocated_finalizer_count,
   pending_finalizer_count,
@@ -544,8 +541,8 @@ static void C_fcall mark_nested_objects(C_byte 
*heap_scan_top

Re: [PATCH] fix some problems with the SRFI-4 syntax extensions added recently

2023-06-26 Thread Peter Bex
On Sun, Jun 25, 2023 at 08:06:01AM +0200, felix.winkelm...@bevuta.com wrote:
> Hi!
> 
> Attached a patch to fix some problems with the newly introduced
> extension to SRFI-4 vector read syntax. "Siiky" pointed out a bogus
> empty string comparison and further testing showed that empty strings
> were not handled properly in certain situations.

Thanks, pushed!

> As Peter remarks, this code is too hairy. I added a comment trying
> to explain the reason for this.

Thanks for adding a bit of explanation.  Are the lists really that long
though?  I'd expect this syntax to be used only in user-typed literals,
not in data to be read in.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Add line number tracking to the interpreter

2023-06-25 Thread Peter Bex
Hi all,

I noticed that the interpreter got quite a bit slower due to fetching
the line number information for an expression when emitting info into
the trace buffer.  Attached is an additional patch to bring things back
to our original performance by pre-fetching the line number info and
passing it as an additional argument to emit-trace-info.

Patch has also been pushed to the line-numbers-in-csi branch.

Cheers,
Peter
From df10eb66dc22a1538bdab4ceafd3f15812c18842 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Sun, 25 Jun 2023 10:22:01 +0200
Subject: [PATCH] Fetch line number from info in evaluator before compiling
 applications

Emitting trace information should be as cheap as possible, it is
called many times during a program's execution (upon every procedure
application).  The get-line-number call would slow things down by a
factor of almost 2.  Instead, retrieve the line number at closure
compilation time and pass it directly to emit-trace-info.
---
 eval.scm | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/eval.scm b/eval.scm
index 9a69e051..7fa72028 100644
--- a/eval.scm
+++ b/eval.scm
@@ -109,11 +109,11 @@
((eq? x (##sys#slot lst 0)) i)
(else (loop (##sys#slot lst 1) (fx+ i 1))) ) ) )
 
-  (define (emit-trace-info tf info cntr e v) 
+  (define (emit-trace-info tf ln info cntr e v)
(when tf
  (##core#inline 
   "C_emit_trace_info"
-  (or (get-line-number info) "")
+  ln
   info
   (##sys#make-structure 'frameinfo cntr e v)
   (thread-id ##sys#current-thread) ) ) )
@@ -689,37 +689,38 @@
   (compile (##sys#slot x 0) e #f tf cntr #f)))
   (args (##sys#slot x 1))
   (argc (checked-length args))
-  (info x) )
+  (info x)
+  (ln (or (get-line-number info) "")))
  (case argc
((#f) (##sys#syntax-error/context "malformed expression" x))
((0) (lambda (v)
-  (emit-trace-info tf info cntr e v)
+  (emit-trace-info tf ln info cntr e v)
   ((##core#app fn v
((1) (let ((a1 (compile (##sys#slot args 0) e #f tf cntr #f)))
   (lambda (v)
-(emit-trace-info tf info cntr e v)
+(emit-trace-info tf ln info cntr e v)
 ((##core#app fn v) (##core#app a1 v))) ) )
((2) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f))
(a2 (compile (##core#inline "C_u_i_list_ref" args 1) e 
#f tf cntr #f)) )
   (lambda (v)
-(emit-trace-info tf info cntr e v)
+(emit-trace-info tf ln info cntr e v)
 ((##core#app fn v) (##core#app a1 v) (##core#app a2 v))) ) 
)
((3) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f))
(a2 (compile (##core#inline "C_u_i_list_ref" args 1) e 
#f tf cntr #f))
(a3 (compile (##core#inline "C_u_i_list_ref" args 2) e 
#f tf cntr #f)) )
   (lambda (v)
-(emit-trace-info tf info cntr e v)
+(emit-trace-info tf ln info cntr e v)
 ((##core#app fn v) (##core#app a1 v) (##core#app a2 v) 
(##core#app a3 v))) ) )
((4) (let* ((a1 (compile (##sys#slot args 0) e #f tf cntr #f))
(a2 (compile (##core#inline "C_u_i_list_ref" args 1) e 
#f tf cntr #f))
(a3 (compile (##core#inline "C_u_i_list_ref" args 2) e 
#f tf cntr #f))
(a4 (compile (##core#inline "C_u_i_list_ref" args 3) e 
#f tf cntr #f)) )
   (lambda (v)
-(emit-trace-info tf info cntr e v)
+(emit-trace-info tf ln info cntr e v)
 ((##core#app fn v) (##core#app a1 v) (##core#app a2 v) 
(##core#app a3 v) (##core#app a4 v))) ) )
(else (let ((as (##sys#map (lambda (a) (compile a e #f tf cntr #f)) 
args)))
(lambda (v)
- (emit-trace-info tf info cntr e v)
+ (emit-trace-info tf ln info cntr e v)
  (apply (##core#app fn v) (##sys#map (lambda (a) 
(##core#app a v)) as))) ) ) ) ) )
 
   (compile exp env #f (fx> (##sys#eval-debug-level) 0) cntr tl?) ) ) )
-- 
2.40.1



signature.asc
Description: PGP signature


Re: [PATCH] Extend SRFI-4 vector syntax

2023-06-24 Thread Peter Bex
On Wed, Jun 07, 2023 at 09:12:51PM +0200, felix.winkelm...@bevuta.com wrote:
> This patch allows strings and chars in homogenous number vectors,
> as suggested by klm at the meetup.

Code is a bit hairy, but seems to work.  Pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Add line number tracking to the interpreter

2023-06-23 Thread Peter Bex
Hi all,

Attached is a set of patches (also to be found in the
"line-numbers-in-csi" in the branch) to add line number tracking to the
interpreter (both for csi and user-invoked (repl) calls).

The first commit simply moves some code around.  Having the line
number database accessors in support.scm makes them only available
to the compiler, while we now want to be able to use them from the
interpreter as well.  So we move it to expand.scm.

The second patch changes the interpreter to track line numbers via
fluid-letting the ##sys#read/source-info-hook, like the compiler does.
In order to store the line number info in the trace buffer, we have
to also extend the trace buffer struct to hold either raw C strings
or Scheme strings for the location.

The third patch changes the alist we store in the line number hash table
to weakly hold onto the form (using the new weak-pairs feature), so
that we can drop unreferenced forms from the database.  I tested this
with a trivial program consisting of two files:

  ;; loader.scm
  (import chicken.syntax)
  (let lp ((i 0))
#;(when (= 0 (modulo i 999))
  (##sys#display-line-number-database))
(when (< i 999)
  (load "def.scm")
  (lp (add1 i

  ;; def.scm
  (define x (lambda (y)
 (+ y 1)))

If you run csi -s loader.scm, this would eat up more and more memory
with only the first two patches, but it stays constant with the third
patch.

The fourth patch improves line number tracking a bit to ensure we
keep line numbers during syntax expansion (like in the compiler).
Without this, you see a lot of un-numbered  entries in the
csi trace output.

The fifth patch moves the tracking to (repl), so it also works for
user-created REPLs, not just csi.

The sixth patch is a small refactor so we're not using fluid-let to
override the ##sys#default-read-info-hook for (read), but instead
use ##sys#read/source-info directly in all code that will read and then
evaluate code.

I'm only wondering if we need to make the hash table itself weak as
well.  This would require quite a few more changes and I'm not sure of
the benefits.

Cheers,
Peter
From 1330e7d252eac583d6c0bbabc4917a99a1437135 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 20 Jun 2023 15:14:53 +0200
Subject: [PATCH 1/6] Move line number database accessors from support.scm to
 expand.scm

Because support.scm is only available inside the compiler, we can't
call its procedures in the interpreter.  Since the line number
database variable itself is defined in expand.scm, it makes sense to
put the accessors there as well.

While moving, it became apparent that get-line from support.scm is
essentially the same as get-line-number from expand.scm, so use that
everywhere instead.

The other procedures are now prefixed with ##sys# because we don't
want to export them (yet?) from the user-visible chicken.syntax
module.

Rewrite display-line-number-database from printf to ##sys#print to
avoid adding a dependency on extras.scm / (chicken format) in
expand.scm.

It's still not 100% isolated - it would be better if the line number
database itself would purely be accessed through accessors, but it is
currently mutated directly in various places.
---
 batch-driver.scm|  6 +++---
 compiler-syntax.scm |  7 ---
 core.scm| 32 +++
 expand.scm  | 46 +++--
 rules.make  |  8 ++--
 support.scm | 44 +++
 6 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/batch-driver.scm b/batch-driver.scm
index 0b3d7e02..ee7ae28a 100644
--- a/batch-driver.scm
+++ b/batch-driver.scm
@@ -27,7 +27,7 @@
 
 (declare
   (unit batch-driver)
-  (uses extras data-structures pathname
+  (uses extras data-structures pathname expand
support compiler-syntax compiler optimizer internal
;; TODO: Backend should be configurable
scrutinizer lfa2 c-platform c-backend user-pass))
@@ -608,7 +608,7 @@
   (in (check-and-open-input-file f)) )
  (fluid-let ((##sys#current-source-filename f))
(let loop ()
- (let ((x (read/source-info in)))
+ (let ((x (##sys#read/source-info in)))
(cond ((eof-object? x) 
   (close-checked-input-file in f) )
  (else
@@ -670,7 +670,7 @@
 (when (debugging '|N| "real name table:")
   (display-real-name-table) )
 (when (debugging 'n "line number database:")
-  (display-line-number-database) )
+  (##sys#display-line-number-database) )
 
 (set! ##sys#line-number-database line-number-database-2)
 (set! line-number-database-2 #f)
diff --git a/compiler-syntax.scm b/compiler

Re: [PATCH] fix segfault in csi's ",d"

2023-06-19 Thread Peter Bex
On Mon, Jun 19, 2023 at 01:38:13PM +0200, felix.winkelm...@bevuta.com wrote:
> Attached patch fixes a segfault when trying to describe a brokwn weak
> ptr. I also added a clause in the case construct to dispatch on a
> described value to catch unknown immediate values instead of crashing
> in the locative test (which expects block values).

Thanks for fixing my oversight ;)

Good call on "hardening" it a bit against unexpected other values.

I've pushed the patch.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: Hyperbolic Functions Patch Set

2023-06-10 Thread Peter Bex
On Mon, Jun 05, 2023 at 08:12:01PM +0200, felix.winkelm...@bevuta.com wrote:
> Thanks a lot, Christian, much obliged.

Indeed, thanks Christian!

> I have combined the two patches and amended the result slightly (wrappers in
> runtime.c for all trigonometric functions were not calling the
> aliases from chicken.h, something we forgot and which I change here,
> while we're at it. I also added a few lines of documentation).

I added a NEWS entry and updated the types.db file to have these new
procedures as well.

> We now just need some other -hacker to sign off push this.

Done!

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Allow collecting weak pairs in minor GC (was: Re: [PATCH] Add user-facing weak pair API)

2023-06-09 Thread Peter Bex
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote:
> Attached are 5 patches to add the weak pair support.  I've also pushed
> this to the git repo under the "user-facing-weak-pairs" branch.

Attached are two more patches that together wrap up the weak pairs
support.  The first patch gets rid of the hacks that were needed
to get weak pairs working *efficiently* in the original
implementation, by "ignoring" weak pairs in minor GCs and treating them
like regular pairs instead.

Those hacks were needed because initially, weak pairs required
traversing the entire symbol table which would be prohibitively
expensive during minor GC.

The second patch is a minor performance / quality of implementation
improvement that came from an insight I had while enabling minor GC
for weak pairs.  I realised that sometimes, we'd hit the assertion
that a weak pair's car must be a block object in update_weak_pairs().
This got hit because "somehow" the car already contained a #!bwp
object.  Then I realised that's because the same pair gets linked up
in the weak pair chain more than once, and this is more likely to
trigger when minor GCs are allowed to collect weak pairs because
minor GCs are likely to trigger major GCs.  In the first patch
I papered over this by adding a special case for this in
update_weak_pairs(), but it's cleaner to avoid the situation in the
first place.  This is done by resetting the weak pair chain to NULL
when a new GC "mode" is entered.

That works because each forwarded weak pair is guaranteed to be
forwarded again in the new mode, and every non-forwarded weak pair
is *also* guaranteed to be forwarded, if it is live.

I've pushed these two patches to the user-facing-weak-pairs branch
as well.

I ran a benchmark in hopes it would improve performance, but it doesn't
seem to have any effect, which makes sense of course.  The benchmark
programs are all compiled, so their symbols won't be collected, as
the strings for the symbols are all in static ("permanent") storage.
I expect it to have a larger effect on interpreted code and code that
generates a lot of short-lived symbols (e.g. macro-heavy code).

Anyway, remaining TODOs that can be tackled after merging:
- Doing the same for locatives (should greatly affect GC times
  in locative-heavy code, even for minor GCs!)
- Adding line numbers (why we did this in the first place!)

Cheers,
Peter
From 62fcf64a9d6f311730abfae7c664b1a29950f318 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 9 Jun 2023 08:17:13 +0200
Subject: [PATCH 1/2] Enable minor collection of weak pairs

In the initial weak symbol GC implementation, we added a temporary
hack in mark_nested_objects() to skip weak pairs in minor GC.

This was required because back then, we still had to scan the entire
symbol table in order to correctly drop GCed symbols.  Now that we're
not doing that anymore, we can collect weak pairs in minor GCs too.
This allows us to drop said hack.  We also can drop an extra check in
really_mark which was added to prevent adding weak pairs to the weak
pair chain in minor GC.

Drop the assert in update_weak_pairs and refactor the liveness checks
to be more comprehensive (the old checks didn't account for pairs
allocated in static memory or malloced memory - not likely, but might
be possible now they're user-facing), and to include stack checks for
the minor GC.

Change the weak pointer tests not to do major collections, and add
(slightly paranoid) checks that "permanent" symbols are kept around.
They're paranoid because permanent symbols don't actually live in
static memory - only their strings do.  So they should not result
in broken weak pointers anyway.

While working on this, I noticed that there's a small but odd
limitation - when using set-car! on a weak pair, depending on where
the pair and the car value live, we add it to the mutation stack,
which gets marked unconditionally.  Therefore, the value will be kept
alive on the next GC even if nothing else holds onto it.

To avoid this, we would need to know in C_mutate whether the slot
we're mutating is a GC-managed slot (ie it is not the first slot of a
"special block").  We don't have this information there, and making it
available would entail a massive API change.  This is not worth the
hassle, so document this minor limitation, instead.
---
 manual/Module (chicken base) |  4 
 runtime.c| 40 ++--
 tests/weak-pointer-test.scm  | 38 ++
 3 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/manual/Module (chicken base) b/manual/Module (chicken base)
index d26ba5ad..11d9c441 100644
--- a/manual/Module (chicken base)  
+++ b/manual/Module (chicken base)  
@@ -176,6 +176,10 @@ They're the same as regular pairs for all intents and 
purposes.
 However, there's a {{weak-pair?}} predicate which ''can'' distinguish
 between r

Re: [PATCH] Compact symbol table in lookup/intern, and fix a few gitignores (was: [PATCH] Add user-facing weak pair API)

2023-06-07 Thread Peter Bex
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote:
> Attached are 5 patches to add the weak pair support.  I've also pushed
> this to the git repo under the "user-facing-weak-pairs" branch.

I've pushed two more patches, attached here as well.

The first one implements the idea of moving the "compaction"
of the symbol table out of the GC and into the lookup/intern
process, so that we don't traverse the (possibly unchanged)
entire symbol table every time we do a major GC.

It seems to give a minor performance improvement to a few benchmarks,
but it could just be noise.  See the attached benchmark.diff
file.

The second just adds a few missing .gitignore rules for two files
I noticed under "Untracked files": libchicken.so.11 and
srfi-4.import.scm

Cheers,
Peter
From 59c9cbc180506c61d64e7d0cb6532fa64e3347c9 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 7 Jun 2023 09:56:18 +0200
Subject: [PATCH 1/2] Move symbol table compaction to lookup/interning of
 symbols

Instead of adding an additional pass to the GC in which we scan over
the symbol tables and drop empty buckets, do this in lookup().  The
lookup() function is used by all the *intern*() functions and in
C_lookup_symbol() as well, so this is done on every symbol lookup and
interning of symbols.  The check is relatively cheap and we only
compact buckets we actually have to look at anyway.

This means we may keep around "broken" buckets for longer than
necessary, if we don't perform any lookups or interning in between
GCs.  This is probably fine - only symbol-heavy code would create lots
of broken buckets anyway, and symbol-heavy code would be doing
lookups, so we'd be doing the compaction more often as well.

This does not touch lookup_bucket() which is used to persist/unpersist
a given symbol by toggling its bucket's "is a weak pair" bit aka
specialblock bit.  We could do it there too, but I figured it makes
more sense to only do this when we have to walk the table for
string->symbol and ##sys#interned-symbol?, as those are typically
expected to be somewhat "heavy" already.

We also have to tweak compute_symbol_table_load(), as that is the
function we use in the symbolgc test to find out how many live symbols
we still have.  And more in general, it makes sense to not count
buckets with broken weak pointers as "live symbols" anyway.

While at it, drop some of the paranoid debugging checks as it would be
a bit awkward to try and shoehorn those into the lookup function().
---
 runtime.c | 139 +++---
 1 file changed, 27 insertions(+), 112 deletions(-)

diff --git a/runtime.c b/runtime.c
index d3a3b750..755a9bec 100644
--- a/runtime.c
+++ b/runtime.c
@@ -546,8 +546,6 @@ static void C_fcall mark_live_heap_only_objects(C_byte 
*tgt_space_start, C_byte
 static C_word C_fcall intern0(C_char *name) C_regparm;
 static void C_fcall update_locative_table(int mode) C_regparm;
 static void C_fcall update_weak_pairs(int mode) C_regparm;
-static void C_fcall fixup_symbol_forwards(C_word sym) C_regparm;
-static void C_fcall update_symbol_tables(int mode) C_regparm;
 static LF_LIST *find_module_handle(C_char *name);
 static void set_profile_timer(C_uword freq);
 static void take_profile_sample();
@@ -2449,16 +2447,24 @@ C_regparm C_word C_fcall hash_string(int len, C_char 
*str, C_word m, C_word r, i
 
 C_regparm C_word C_fcall lookup(C_word key, int len, C_char *str, 
C_SYMBOL_TABLE *stable)
 {
-  C_word bucket, sym, s;
+  C_word bucket, last = 0, sym, s;
 
   for(bucket = stable->table[ key ]; bucket != C_SCHEME_END_OF_LIST; 
   bucket = C_block_item(bucket,1)) {
 sym = C_block_item(bucket,0);
-s = C_block_item(sym, 1);
 
-if(C_header_size(s) == (C_word)len
-   && !C_memcmp(str, (C_char *)C_data_pointer(s), len))
-  return sym;
+/* If the symbol is unreferenced, drop it: */
+if (sym == C_SCHEME_BROKEN_WEAK_PTR) {
+   if (last) C_set_block_item(last, 1, C_block_item(bucket, 1));
+   else stable->table[ key ] = C_block_item(bucket,1);
+} else {
+  last = bucket;
+  s = C_block_item(sym, 1);
+
+  if(C_header_size(s) == (C_word)len
+ && !C_memcmp(str, (C_char *)C_data_pointer(s), len))
+return sym;
+}
   }
 
   return C_SCHEME_FALSE;
@@ -2538,14 +2544,23 @@ C_regparm C_word C_fcall lookup_bucket(C_word sym, 
C_SYMBOL_TABLE *stable)
 
 double compute_symbol_table_load(double *avg_bucket_len, int *total_n)
 {
-  C_word bucket;
+  C_word bucket, last;
   int i, j, alen = 0, bcount = 0, total = 0;
 
   for(i = 0; i < symbol_table->size; ++i) {
-bucket = symbol_table->table[ i ];
-
-for(j = 0; bucket != C_SCHEME_END_OF_LIST; ++j)
-  bucket = C_block_item(bucket,1);
+last = 0;
+j = 0;
+for(bucket = symbol_table->table[ i ]; bucket != C_SCHEME_END_OF_LIST; 
+bucket = C_block_item(bucket,1)) {
+

Re: [PATCH] Add user-facing weak pair API

2023-06-07 Thread Peter Bex
On Tue, Jun 06, 2023 at 07:46:10PM +0200, felix.winkelm...@bevuta.com wrote:
> > Do we even want to make a 5.4 release at all?
> 
> I would suggest to add this for 5.4, for the reasons you
> state. I also would be for making at least one minor release
> before C6, as I doubt that we will get UTF support with
> all the eggs into a robust state that soon. Code that
> wants to test for this feature could cond-expand on 5.4
> (and later versions).

Sure, that's probably the better way to go.  Perhaps the feature
name should simply be "weak-pairs".  WDYT?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Add user-facing weak pair API

2023-06-03 Thread Peter Bex
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote:
> Dear hackers,
> 
> At the Village CHICKENs event, I gave a presentation about how we could
> add "proper" support for weak pairs, so that they can be exposed to the
> user.

PPS: When writing tests for weak pairs, I realised I made a mistake in
my presentation: cyclic lists of weak pairs are not a problem, because
the cars would *still* be cleared.

The issue, IIUC, is when both the cdr and the car (or a car of one weak
pair and a cdr of another) refer to the same object, *then* it can't be
collected.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Add user-facing weak pair API

2023-06-03 Thread Peter Bex
On Sat, Jun 03, 2023 at 11:12:40PM +0200, Peter Bex wrote:
> Dear hackers,
> 
> At the Village CHICKENs event, I gave a presentation about how we could
> add "proper" support for weak pairs, so that they can be exposed to the
> user.  Right now we have half-baked support for weak pairs as a hidden
> implementation detail of the symbol table.

PS: I'm not sure if this should go into 5.4 or 6.0.  On the one hand,
it's a relatively minor addition to the API.  On the other hand, if
anyone wants to rely on it we might need to provide a feature for it
whereas in 6.0 it would be trivial.

Do we even want to make a 5.4 release at all?

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Add user-facing weak pair API

2023-06-03 Thread Peter Bex
Dear hackers,

At the Village CHICKENs event, I gave a presentation about how we could
add "proper" support for weak pairs, so that they can be exposed to the
user.  Right now we have half-baked support for weak pairs as a hidden
implementation detail of the symbol table.

This would be useful for users, but also for core: this would make it
easier to support line numbers in the interpreter without unbounded
memory usage - the expression in source code would be the key, the line
number where it occurs would be the value in the database.  When an
expression gets discarded because nothing hangs on to it, we can drop
the corresponding entry from the database.

Slides of the talk are here:
https://code.more-magic.net/weak-refs-for-chicken/tree/weak-refs.org

I also expect to publish a more accessible writeup of this on my blog,
but don't hold your breath - it may take a while.

Attached are 5 patches to add the weak pair support.  I've also pushed
this to the git repo under the "user-facing-weak-pairs" branch.

The patches contain more information about the specific approach.
Especially of interest is the trick to build a linked list in the
"dead space" of the old cdr that is still allocated behind forwarding
pointers of weak pairs that makes this efficient.  I've taking this
idea from MIT Scheme, see "src/microcode/gcloop.c" in their source
code repository for more info.  Viewable online at:
https://git.savannah.gnu.org/cgit/mit-scheme.git/tree/src/microcode/gcloop.c?id=545efe9dcaac695d4d5c34de7497543b8dd4f2b0#n789

Even though we take this neat trick from MIT Scheme, I decided not to
use their API as it is a bit awkward.  It may be more Schemely as
weak pairs are completely distinct from regular pairs in MIT Scheme,
but there's an issue with the way you can detect broken weak pairs;
you have to follow the pattern in the manual where you access the
pair to make sure it wasn't actually storing #f.  See here:
https://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Weak-Pairs.html

Instead, this set of patches follows the Chez Scheme API: it adds
only three procedures: weak-cons, weak-pair? and the
broken-weak-pointer? predicate, whilst all the regular pair operations
transparently accept either regular or weak pairs.  It is much more
expedient as it means you get to use all the list operators, and you
get printing of weak pairs for free, since they get serialized like
regular pairs.  And it's extensible - if we ever decide to support
"ephemerons", we could also go with the ephemerons API from Chez.

See the following page for more info about Chez's API:
https://cisco.github.io/ChezScheme/csug9.5/smgmt.html#./smgmt:h2

NOTE: Changing locatives to use the "chain of forwarding pointers to
weak container objects" trick (so we can drop the locative table) is
still to be done.

NOTE 2: I wasn't 100% sure if (chicken base) is the best place to put
these new procedures.  We could also add them to a new module, e.g.
(chicken weak-pair) or (chicken weak-pointer) or some such, or add
them to the extant (chicken gc) module (although that's a bit odd
since locatives are in their own rather than under (chicken gc))...
However, it's only three procedures, they sort of "match" cons, pair?
etc in (scheme base), and finally Chez also puts them in their
(chezscheme) module (which kind of makes sense in that it holds the
"core" data types).  This is great bikeshed discussion fodder, of
course.

NOTE 3: I tried to keep the commits bite-sized and form a narrative
of sorts so that it should be easy to understand even if you haven't
seen my talk at the Village CHICKEN event.

Cheers,
Peter
From 1d477b2857ec3f01c5367dadc5725b85e7b53799 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Sat, 3 Jun 2023 11:20:40 +0200
Subject: [PATCH 1/5] Replace special-casing of weak symbol GC with generic
 weak pair GC

When we introduced weak pairs, it was initially rather "customized"
for the specific case of bucket chains of internal symbol tables.
This is the first step to make make it more generic, so that we can
eventually support a user-facing weak pair type.

In this change, when we mark a weak pair in the GC:
- First we convert it to a forwarding pointer (as usual)
- Then, we "recycle" the old value's car field to make it part of a
   chain of all the forwarded weak pairs.  We can get away with that
   because the pair remains allocated as 3 words, even if the
   forwarding pointer we replace it with is only 1 word.
- When GC is done, we traverse this chain and follow each forwarding
   pointer.  The car field of the weak pair in the tospace will still
   hold a pointer into fromspace, which needs to be either fixed up
   (if the target value was replaced with a forwarding pointer) or
   cleared.
---
 runtime.c | 181 +++---
 1 file changed, 133 insertions(+), 48 deletions(-)

diff --git a/runtime.c b/

[PATCH] Stop run-time option processing after first "-:" or non-runtime option (was: Re: [PATCH] stop run-time option processing after "--")

2023-03-24 Thread Peter Bex
On Tue, Mar 14, 2023 at 08:43:52AM +0100, Peter Bex wrote:
> On Tue, Mar 14, 2023 at 08:40:18AM +0100, felix.winkelm...@bevuta.com wrote:
> > Backwards compatibility is from now on fucked anyway, so let's
> > try to find somethin simple and sensible.
> 
> Then we should do both.  I.e., have -: to explicitly tell it to stop
> parsing (and have that dropped from the command-line-options) and
> stop at the first argument.  The -: would be in situations where
> a script just passes on all its arguments, like
> 
> ./my-program -: "$@"

Here's a patch that implements this proposal.

Cheers,
Peter

From dce61eb2d7caf7f570701161ee1cada07a862835 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 24 Mar 2023 10:42:49 +0100
Subject: [PATCH] Stop run-time option processing after "-:" or the first
 non-runtime option

This means that runtime options after non-runtime options will no
longer be processed.  Programs using command-line-arguments will
see all options starting at the first non-runtime option, even if
later options start with "-:".

Processing can also be stopped explicitly with a single "-:" option,
which will not be seen by programs using command-line-arguments.
Every argument following this "-:" will be seen, even if they start
with "-:".
---
 NEWS|   4 +
 library.scm |  22 +-
 manual/Module (chicken process-context) |  22 +-
 manual/Using the compiler   |  15 +-
 runtime.c   | 258 
 5 files changed, 179 insertions(+), 142 deletions(-)

diff --git a/NEWS b/NEWS
index 23661fc2..6ecbbad2 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,10 @@
 to avoid arbitrary OS command injection during egg installation,
 reported by Vasilij Schneidermann who also provided the necessary
 patches to mitigate this problem.
+  - The runtime option "-:b" has been removed, as it was deemed too
+insecure to be able to drop to a REPL from the CLI of any program.
+  - Runtime option processing has been hardened: processing now stops on
+the first non-runtime option or after "-:", whichever comes first.
 
 - Core libraries
   - Added "locative-index", kindly contributed by John Croisant.
diff --git a/library.scm b/library.scm
index 827666d5..819dacfe 100644
--- a/library.scm
+++ b/library.scm
@@ -6024,17 +6024,23 @@ static C_word C_fcall C_setenv(C_word x, C_word y) {
 
 (define command-line-arguments
   (make-parameter
-   (let ([args (argv)])
+   (let ((args (argv)))
  (if (pair? args)
-(let loop ([args (##sys#slot args 1)])
+(let loop ((args (##sys#slot args 1))) ; Skip over program name 
(argv[0])
   (if (null? args)
   '()
-  (let ([arg (##sys#slot args 0)]
-[r (##sys#slot args 1)] )
-(if (and (fx>= (##sys#size arg) 3)
- (string=? "-:" (##sys#substring arg 0 2)))
-(loop r)
-(cons arg (loop r)) ) ) ) )
+  (let ((arg (##sys#slot args 0))
+(rest (##sys#slot args 1)) )
+(cond
+ ((string=? "-:" arg)  ; Consume first "empty" runtime options 
list, return rest
+  rest)
+
+ ((and (fx>= (##sys#size arg) 3)
+   (string=? "-:" (##sys#substring arg 0 2)))
+  (loop rest))
+
+ ;; First non-runtime option and everything following it is 
returned as-is
+ (else args) ) ) ) )
 args) )
(lambda (x)
  (##sys#check-list x 'command-line-arguments)
diff --git a/manual/Module (chicken process-context) b/manual/Module (chicken 
process-context)
index 5299905e..87165187 100644
--- a/manual/Module (chicken process-context)   
+++ b/manual/Module (chicken process-context)   
@@ -13,7 +13,7 @@ This module provides access to the current process context.
 
 Returns two values: an integer and a foreign-pointer object
 representing the {{argc}} and {{argv}} arguments passed to the current
-process.
+process.  See also {{argv}} below.
 
  argv
 
@@ -24,13 +24,27 @@ the list is a string containing the name of the executing 
program. The
 other items are the arguments passed to the application. It depends on
 the host-shell whether arguments are expanded ('globbed') or not.
 
+NOTE: This is the "raw" unprocessed argument list, including runtime
+options (starting with {{-:}}) which are consumed by the runtime
+library.
+
  command-line-arguments
 
 (command-line-arguments)
 
-Contains the list of arguments passed to this program, with the name of
-the program and any runtime options (all options starting with {{-:}})
-removed.
+Contains the list of arguments passed to this program.
+
+This ''excludes'' the name

Re: [PATCH] stop run-time option processing after "--"

2023-03-14 Thread Peter Bex
On Tue, Mar 14, 2023 at 02:48:28PM +0100, Mario Domenech Goulart wrote:
 What should the behavior of executables be in case they are compiled
> with the runtime options parser disabled (e.g.,
> -disable-runtime-options, as per Felix' previous patch)?

Assuming we go with the decision to process until we hit either "-:" or
the first non-runtime option, I think programs with the parser disabled
could still consume the first argument if it is identical to "-:".

I prefer that over doing completely nothing, as that makes it easier to
disable runtime parsing in any existing program at any point - all the
existing scripts for that program which already use "-:" to disable
runtime option parsing would continue to work without modification.

Thanks for asking, this is an important point to discuss!

NOTE: Perhaps a simpler alternative to that patch would be to treat
any *non-empty* runtime option as an error (like we do for unknown
options now) and error out with a message saying runtime option parsing
has been disabled.  This would give a better hint to the user that
they could decide to recompile the program with runtime parsing
enabled.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] stop run-time option processing after "--"

2023-03-14 Thread Peter Bex
On Tue, Mar 14, 2023 at 08:40:18AM +0100, felix.winkelm...@bevuta.com wrote:
> > Of course.  In fact, I think it would make more sense to simply tell the
> > runtime options parser to stop after the first non-"-:"-prefixed
> > argument.  That makes runtime argument stuffing harder and allows it
> > to play nice with _any_ option parser, and makes the "--" behaviour
> > automatically work if the program already handles getopt-style options.
> 
> That's another idea. Backwards compatibility is from now on
> fucked anyway, so let's try to find somethin simple and sensible.

Then we should do both.  I.e., have -: to explicitly tell it to stop
parsing (and have that dropped from the command-line-options) and
stop at the first argument.  The -: would be in situations where
a script just passes on all its arguments, like

./my-program -: "$@"

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] stop run-time option processing after "--"

2023-03-14 Thread Peter Bex
On Tue, Mar 14, 2023 at 08:39:22AM +0200, Lassi Kortela wrote:
> Is there any reason -: runtime options need to be recognized beyond the
> first argument? Gambit also uses the -: syntax, and it does not.

Making it compatible might be nice, but involves a change in how these
options are parsed - they're comma separated.  I wouldn't be opposed to
this change, but it deviates further from what we have making backwards
compatibility more of an issue.  This might not be a problem since these
options are relatively obscure and it should be easy to tweak a calling
script.

It would be harder to make a script that can work with older and newer
versions of CHICKEN, since the old one wouldn't grok commas in the
options and the new one wouldn't grok separately supplied options.

Perhaps something for CHICKEN 6?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] stop run-time option processing after "--"

2023-03-14 Thread Peter Bex
On Mon, Mar 13, 2023 at 09:35:53PM +0100, felix.winkelm...@bevuta.com wrote:
> > In other words, "./my-program -:a100 -- foo" is seen as '("--" "foo")
> > when you ask for (command-line-arguments), while the intention probably
> > was '("foo").
> 
> Actually not. Currently "--" is not specially handled, so it makes
> no difference whether r/t options are passed or not ("command-line-arguments")
> returns exactly the same. With or without this patch, "--" will be
> seen by user code.

I know, my point is that this change breaks backwards compatibility.
Let's say one has a pre-existing program which has no flag handling
code.  For example a simple program that only accepts file names on
its command line, but doesn't do option processing.

Let's say the user would like to "harden" the invocation to said
program, they'd have to _add_ the "--".  This would then be interpreted
by said program as a file name.

My point is that said program would still be able to parse run-time
options (because that's baked in to each program).  Therefore, it makes
sense to have the run-time option parser be 100% transparent to the
program and have its own way of stopping processing, invisible to the
program.

> If the program does not parse the command line, having
> a "--" has no effect. If it does, it is probably a good idea (and
> good style) to be aware of it and handle the case appropriately. For
> "my-program" above, the "--" in the example invocation is immaterial.

Of course.  In fact, I think it would make more sense to simply tell the
runtime options parser to stop after the first non-"-:"-prefixed
argument.  That makes runtime argument stuffing harder and allows it
to play nice with _any_ option parser, and makes the "--" behaviour
automatically work if the program already handles getopt-style options.

Note that this is _also_ a backward compatibility breaker, but IMO the
lesser of two evils as it doesn't introduce spurious command line
arguments into existing programs.

> But if "my-program" does process the command-line, it may be
> appropriate to handle "--" anyway (mostly to distinguish its own options
> from non-option arguments).

Indeed.  It might make sense, but the point was to be able to disable
runtime option processing, and IMO that should be possible even without
requiring the program to cooperate.

Note that it's also a very easy thing to forget to handle "--" specially
in programs that don't use a gettext-style options parser (i.e., all of
the core tools!  If we do go with this approach, all those would need to
be patched to handle "--" as well because they all have hand-rolled
options parsers.  Most of my own programs do, too).

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] stop run-time option processing after "--"

2023-03-13 Thread Peter Bex
On Mon, Mar 13, 2023 at 10:37:30AM +0100, felix.winkelm...@bevuta.com wrote:
> Another problem with run-time option processing pointed out by
> florz. This allows passing arguments that look like run-time
> options, when preceded by "--".

This needs more thought - simply adding "--" to stop it from processing
options leaves us with ("--") as the first command-line argument.
In other words, "./my-program -:a100 -- foo" is seen as '("--" "foo")
when you ask for (command-line-arguments), while the intention probably
was '("foo").

Just dropping "--" from the arguments list is probably also not
desirable, because our option parser doesn't have any getopt-like
behaviour, so -xyz or --xyz aren't treated specially in any way.

Perhaps the safest here is to stop processing after reading ":-"
or some such (and discard it!) - that would be guaranteed to never
pass through to the program anyway, because it might be a runtime
option and already gets intercepted by the option parser.

Only potential issue is that it's not exactly standard when compared to
other programs...

What do y'all think?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] drop "b" runtime option

2023-03-13 Thread Peter Bex
On Sun, Mar 12, 2023 at 06:04:49PM +0100, felix.winkelm...@bevuta.com wrote:
> As rightly pointed out by florz and others, the "b" (break on
> repl) offers a potential security vulnerability by allowing
> code from stdin to be interpreted. This patch simply removes
> the option, as it doesn't seem to be widely used, anyway.

Good idea to drop it, like you said it's not widely used (I didn't even
really know about it).

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] rename egg status files

2023-01-11 Thread Peter Bex
On Sun, Oct 30, 2022 at 02:47:47PM +0100, felix.winkelm...@bevuta.com wrote:
> See #1753, as reported by Kon. This will bust egg caches, but should otherwise
> not cause any incompatibilities.

Pushed, thanks Felix.

However, I noticed it would still just install an egg I had previously
downloaded, keeping the original VERSION, STATUS and TIMESTAMP files
and not creating _VERSION, _STATUS and _TIMESTAMP.  Not sure if that's
correct, since you mention it would bust egg caches...

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] try to make generation of egg-info file on Windows more robust

2022-11-10 Thread Peter Bex
On Wed, Nov 02, 2022 at 10:24:13AM +0100, felix.winkelm...@bevuta.com wrote:
> The attached patch was contributed by Vassilij. See #1800.

Thanks guys, pushed the updated version of this we discussed elsewhere.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] handle misquoting of target in build-scripts

2022-11-01 Thread Peter Bex
On Mon, Oct 31, 2022 at 05:46:55PM +0100, felix.winkelm...@bevuta.com wrote:
> Thanks, pushed. I had to hand-apply the patch as git-am didn't like it. I also
> fixed a scrutinizer warning.

I suppose the warning was about target-librarian-options not being a
list?  Unfortunately, splicing in these options wouldn't work properly -
they're taken from a baked in foreign value which may contain spaces.

We'll have to figure out a way to keep certain arguments "as-is".
Attached is a pretty straightforward proposal to make this work.

PS: I noticed two more mistakes that have nothing to do with this - just
the platform was still being passed in to "filelist" and was *not* being
passed in to print-build-command.  I've fixed these already and pushed
them as trivial patches.

Cheers,
Peter
From 651f70a66ed2ab3e70334466ae8a93b57ffbaa10 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 1 Nov 2022 10:39:32 +0100
Subject: [PATCH] Add a way to pass in already-quoted arguments to qs*

The librarian options are kept around in the binary as a foreign
string which may contain multiple flags separated by spaces.  These
should be taken verbatim by print-build-command.

The way we do this is by wrapping such verbatim snippets in a "raw"
record type and detecting these values in qs* and unpacking their
strings directly from the struct instead of quoting them.
---
 egg-compile.scm | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/egg-compile.scm b/egg-compile.scm
index 662fad3d..c1f2ceb0 100644
--- a/egg-compile.scm
+++ b/egg-compile.scm
@@ -650,7 +650,7 @@
link-objects
(print-build-command (list out3)
 `(,out2 ,@lobjs)
-`(,target-librarian ,target-librarian-options 
,out3 ,out2 ,@lobjs)
+`(,target-librarian ,(raw-arg 
target-librarian-options) ,out3 ,out2 ,@lobjs)
 platform)))
 (print-end-command platform)))
 
@@ -1152,10 +1152,19 @@ EOF
 ;; have to undo that here again.  It can also convert slashes to
 ;; backslashes on Windows, which is necessary in many cases when
 ;; running programs via "cmd".
+;;
+;; It also supports already-quoted arguments which can be taken as-is.
 (define (qs* arg platform #!optional slashify?)
-  (let* ((arg (->string arg))
-(path (if slashify? (slashify arg platform) arg)))
-(qs path (if (eq? platform 'windows) 'mingw32 platform
+  (if (raw-arg? arg)
+  (raw-arg-value arg)
+  (let* ((arg (->string arg))
+(path (if slashify? (slashify arg platform) arg)))
+   (qs path (if (eq? platform 'windows) 'mingw32 platform)
+
+(define-record-type raw-arg
+  (raw-arg value)
+  raw-arg?
+  (value raw-arg-value))
 
 (define (slashify str platform)
   (if (eq? platform 'windows)
-- 
2.36.2



signature.asc
Description: PGP signature


[PATCH] Fix double normalize-destination call

2022-10-31 Thread Peter Bex
Hi all,

While testing my previous patch on Windows, I noticed some weird error
output during the installation phase.  It would say several times in a
row:
The filename, directory name, or volume label syntax is incorrect.

If you look at the foo.install.bat file, there's calls to mkdir which
include the chicken PREFIX twice.

Then, I realised this also happens in *nix.  For example, try to install
the "amb" egg.  Just before it emits the egg-info file, it emits several
calls to mkdir -p /path/to/chicken/path/to/chicken/share, which ought
to be just /path/to/chicken/share.

After investigating further, I realised these 5 calls correspond to
the (data ...) components in the egg, which aren't actually installed!

If you install the amb egg and then run the following command from the
target directory: "find -name amb-dwelling.scm", it turns up nothing.
The amb-dwelling.scm file also doesn't even occur in the install script.

It looks like that's actually a bug in the egg though - contrast to the
srfi-19 egg which *does* work, it has (data srf-29-bundles (files ...))
instead.

The attached patch fixes the double call to normalize-destination on the
directory for "share" files (data files)

Cheers,
Peter
From e8341cca08c43e8bac1c5d08aaac670f5c37d211 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Mon, 31 Oct 2022 13:06:26 +0100
Subject: [PATCH] Do not double call normalize-destination on share dir

In install-random-files, don't call normalize-destination on the
destination directory - this is already done in compile-egg-info

Before, we'd see things like in the install script like:
  mkdir -p /path/to/chicken/path/to/chicken/share
With this patch, it becomes:
  mkdir -p /path/to/chicken/share
which is as it should be.
---
 egg-compile.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/egg-compile.scm b/egg-compile.scm
index 6818c6e4..5a59a43f 100644
--- a/egg-compile.scm
+++ b/egg-compile.scm
@@ -990,7 +990,7 @@
  (root (string-append srcdir "/"))
  (mkdir (mkdir-command platform))
  (sfiles (map (cut prefix srcdir <>) files))
- (dfile (qs* (normalize-destination dest mode) platform #t))
+ (dfile (qs* dest platform #t))
  (ddir (shell-variable "DESTDIR" platform)))
 (print "\n" mkdir " " ddir dfile)
 (let-values (((ds fs) (partition directory? sfiles)))
-- 
2.36.2



signature.asc
Description: PGP signature


Re: [PATCH] handle misquoting of target in build-scripts

2022-10-31 Thread Peter Bex
On Sun, Oct 30, 2022 at 02:46:49PM +0100, felix.winkelm...@bevuta.com wrote:
> Another follow-up patch on the recent change of quoting in chicken-install, 
> which causes
> chicken-do to always fire (and thus rebuilding any egg, regardless of status).

I noticed there were still inconsistencies here.
How about the attached patch instead?  It may be a little bit
trickier to read (but not by much), but should eliminate the
problem entirely for chicken-do calls.

Cheers,
Peter
From 730c74994a5feb96ef5d1d7ac2afdefbad65db7a Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Mon, 31 Oct 2022 11:44:52 +0100
Subject: [PATCH] Hopefully completely fix quoting hell in generated build
 commands

Instead of using a mess of "qs*", "joins", "slashify" and "filelist"
and hoping for the best when generating chicken-do commands, use a
more principled method by way of a helper procedure to print the build
command.

This helper will receive *only* unquoted arguments: the list of
targets, list of dependencies and the build command line with flags as
a list.  It then calls "qs*" on all of these where needed and emits a
corresponding chicken-do line.

By doing it this way, it's much more obvious where quotation happens:
only in print-build-command, and never in the procedure that calls it.
For consistency, also change prepare-custom-command so that it accepts
an unquoted filename, so that quotation is delegated to it.

For now, leave the code that emits installation commands untouched.
We'll probably want to do the same for these though.
---
 egg-compile.scm | 333 +---
 1 file changed, 142 insertions(+), 191 deletions(-)

diff --git a/egg-compile.scm b/egg-compile.scm
index 23af8b4e..6818c6e4 100644
--- a/egg-compile.scm
+++ b/egg-compile.scm
@@ -595,9 +595,8 @@
link-objects modules
custom types-file inline-file)
  srcdir platform)
-  (let* ((cmd (qs* (or (custom-cmd custom srcdir platform)
-  default-csc)
-  platform))
+  (let* ((cmd (or (custom-cmd custom srcdir platform)
+ default-csc))
  (sname (prefix srcdir name))
  (tfile (prefix srcdir (conc types-file ".types")))
  (ifile (prefix srcdir (conc inline-file ".inline")))
@@ -613,51 +612,46 @@
(list "-emit-inline-file" ifile)
'(
  (out1 (conc sname ".static"))
- (out2 (qs* (target-file (conc out1
-   (object-extension platform))
- mode)
-platform))
+ (out2 (target-file (conc out1
+  (object-extension platform))
+mode))
  (out3 (if (null? link-objects)
out2
-   (qs* (target-file (conc out1
-   (archive-extension platform))
- mode)
-platform)))
+   (target-file (conc out1
+  (archive-extension platform))
+mode)))
  (targets (append (list out3 lfile)
   (maybe types-file tfile)
   (maybe inline-file ifile)
   (map (lambda (m)
  (prefix srcdir (conc m ".import.scm")))
(or modules '()
- (src (qs* (or source (conc name ".scm")) platform)))
+ (src (or source (conc name ".scm"
 (when custom
   (prepare-custom-command cmd platform))
-(print "\n" (qs* default-builder platform #t) " "
-   (joins targets platform) " : "
-   src " " (qs* eggfile platform) " "
-   (if custom cmd "") " "
-   (filelist srcdir source-dependencies platform)
-   " : " cmd
-   (if keep-generated-files " -k" "")
-   " -regenerate-import-libraries"
-   (if modules " -J" "") " -M"
-   " -setup-mode -static -I " srcdir 
-   " -emit-link-file " (qs* lfile platform)
-   (if (eq? mode 'host) " -host" "")
-   " -D compiling-extension -c -unit " name
-   " -D compiling-static-extension"
-   " -C -I" srcdir " " (joins opts platform) 
-   " " src " -o " out2)
+(print-build-command targets
+`(,@(filelist srcdir source-dependencies) ,src ,eggfile
+

Re: [PATCH] simplification/refactoring

2022-10-31 Thread Peter Bex
On Sun, Oct 30, 2022 at 02:48:27PM +0100, felix.winkelm...@bevuta.com wrote:
> in egg-compile.scm, as suggested by Peter.

Thanks, pushed.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] always check servers if version not given

2022-10-27 Thread Peter Bex
On Fri, Oct 28, 2022 at 05:47:22PM +1300, Evan Hanson wrote:
> Applied!
> 
> Lots of tickets closed lately, very cool.

Indeed.  I've taken the liberty of adding these two changes to the NEWS
file.  Even though they're relatively minor, they correspond to
tickets/bug reports by users, so it's worthwhile adding it there.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] fix for double-quoting behaviour in egg-compilation

2022-10-27 Thread Peter Bex
On Tue, Oct 25, 2022 at 12:09:55PM +0200, Peter Bex wrote:
> Also, we forgot to look at Salmonella or we'd noticed:
> http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/24/yesterday-diff/

So above, that's "-151" for succesfull installation, i.e. we broke 151
eggs (either directly or through transitive dependencies).

I checked Salmonella after the fixes and it fixed 147 eggs:
http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/26/yesterday-diff/

That leaves 4 eggs that are broken still.  It turns out these are:
- expat (Felix and Shawn Rutledge)
- dbus (Shawn Rutledge)
- icu (Diego A. Mundo)
- espeak (Diego A. Mundo)

These eggs all rely on something a la the following in their .egg files:
(csc-options -C "`pkg-config --cflags expat`" -L "`pkg-config --libs expat`")

I would argue that's a bug and relying on undocumented, unintented
functionality.  Since it's only 4 eggs I think it's reasonable to expect
their authors to fix the eggs themselves.  Probably this requires using
a custom build script like we do in for example imlib2:

https://anonym...@code.call-cc.org/svn/chicken-eggs/release/5/imlib2/trunk/

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] fix for double-quoting behaviour in egg-compilation

2022-10-25 Thread Peter Bex
On Mon, Oct 24, 2022 at 05:23:24PM +0200, felix.winkelm...@bevuta.com wrote:
> As reported by Kon, our previous "fix" for shell argument quoting
> exposed another bug, this is an attempt to improve on that.

Thanks, pushed.

Also, we forgot to look at Salmonella or we'd noticed:
http://salmonella-linux-x86-64.call-cc.org/master/gcc/linux/x86-64/2022/10/24/yesterday-diff/

One question though: "joins" is essentially the same as "arglist", isn't
it?  I mean, it's used in different contexts but the effect is the same.
Perhaps we should merge them.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] get rid of obscure mac resource file

2022-10-24 Thread Peter Bex
On Fri, Oct 21, 2022 at 03:58:42PM +0200, felix.winkelm...@bevuta.com wrote:
> This stuff is obscure and, as described in #1774, packaging/"deploying" 
> applications is something
> that should be left to application developers.

Agreed, patch pushed.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] handling of prelude/postlude forms

2022-10-24 Thread Peter Bex
On Mon, Oct 24, 2022 at 11:23:09AM +0200, felix.winkelm...@bevuta.com wrote:
> The attached patch addresses the broken semantics of -prelude/-postlude
> in the compiler driver, as reported by sjamaan.

Thanks, pushed!

I wonder how this got so broken.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] slight improvement in option quoting for csc

2022-10-23 Thread Peter Bex
On Sun, Oct 23, 2022 at 12:36:20PM +0200, felix.winkelm...@bevuta.com wrote:
> > On Fri, Oct 21, 2022 at 04:36:58PM +0200, felix.winkelm...@bevuta.com wrote:
> > > See #1302.
> >
> > Why do we even attempt to detect whether to quote?  That looks a bit
> > brittle to me.  It should be simpler to just always quote (e.g. using "qs"),
> > like we do in other places, for example in chicken-install, or is there
> > some reason we can't do that here?
> >
> 
> Ok, that is of course a much simpler approach, please find attached a new
> patch.

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] slight improvement in option quoting for csc

2022-10-22 Thread Peter Bex
On Fri, Oct 21, 2022 at 04:36:58PM +0200, felix.winkelm...@bevuta.com wrote:
> See #1302.

Why do we even attempt to detect whether to quote?  That looks a bit
brittle to me.  It should be simpler to just always quote (e.g. using "qs"),
like we do in other places, for example in chicken-install, or is there
some reason we can't do that here?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] allow list-notation for -R option

2022-10-21 Thread Peter Bex
On Fri, Oct 21, 2022 at 11:26:05AM +0200, felix.winkelm...@bevuta.com wrote:
> See #1809, as suggested by Zipheir.

Thanks, pushed! I also took the liberty of adding Zipheir to the
acknowledgments and adding the change to the NEWS file.

> +(define (string-trim str)
> +  (let loop ((front 0) 
> + (back (sub1 (string-length str
> +(cond ((= front back) "")
> +  ((char-whitespace? (string-ref str front))
> +   (loop (add1 front) back))
> +  ((char-whitespace? (string-ref str back))
> +   (loop front (sub1 back)))
> +  (else (substring str front (add1 back))

I was a bit surprised to remember that we don't really have this in
(chicken string), all we have is the somewhat whimsical string-chomp.

First I thought maybe moving this definition to (chicken string), but
then I realised we probably want to at least somehow specify what's
considered whitespace, but then you end up with a slow implementation.
And we do have a version of this in srfi-13 (which includes a
predicate/char-set argument), so maybe we don't really need it after all.

But still, it's a bit weird not to have this in the standard lib...

What do the other hackers think?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] chicken-do should warn but ignore missing dependencies

2022-10-21 Thread Peter Bex
On Tue, Oct 12, 2021 at 12:13:17PM +0200, felix.winkelm...@bevuta.com wrote:
> This patch changes chicken-do to emit a warning when a dependency is missing,
> but lets it still continue. That way missing dependencies in egg files can at 
> least
> not break the build.

Why not break the build though? A missing dependency is an error, isn't it?
With all the output you get, it's easy to overlook this warning, but if
it halts, that at least would get my attention ;)

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] fix #1797

2022-05-20 Thread Peter Bex
On Thu, May 19, 2022 at 12:03:16PM +0200, felix.winkelm...@bevuta.com wrote:
> This patch addresses the unnecessary interning in char-comparison
> procedures.

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] disable umask POSIX tests on windows

2022-05-20 Thread Peter Bex
On Sat, Mar 26, 2022 at 05:59:24PM +0100, felix.winkelm...@bevuta.com wrote:
> Hi!
> 
> Attached a patch to address the issue reported by Claude Marinier.
> I tried to understand what the expected behaviour of umask should
> be, but in the end I just dropped the whole test, since I don't
> understand how the permission bits on cygwin ought to behave.

Agreed, let's just drop it.  I don't have the time or energy to try and
understand it fully.  Pushed.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] additional tests for fp*+

2022-05-20 Thread Peter Bex
On Thu, May 19, 2022 at 02:34:09PM +0200, felix.winkelm...@bevuta.com wrote:
> Oh boy... I should have looked at that a bit more seriously, Sorry about that.

No worries!

> Attached a new patch.

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] additional tests for fp*+

2022-05-19 Thread Peter Bex
On Thu, May 19, 2022 at 12:01:43PM +0200, felix.winkelm...@bevuta.com wrote:
> These tests were contributed by Christian Himpe.

> +;;; Tests contributed by Christian Himpe:
> +
> +;; dummy testee
> +(define (fp*+ a b c)
> +  (+ (* a b) c))
> +

IIUC that means these tests aren't actually exercising the real fp*+.
Is there a reason to override it like this?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Fix the type of make-absolute-pathname

2022-03-10 Thread Peter Bex
On Sun, Feb 20, 2022 at 01:58:31PM +0100, felix.winkelm...@bevuta.com wrote:
> > Hi all,
> >
> > First time contributing to CHICKEN itself. Is this all?
> >
> 
> Basically, yes. Attached is a signed off copy of the patch, now
> we wait for some other core developer to apply it...

Sorry for the long wait.  Pushed!

> Thanks for your contribution!

+1 to that, much appreciated.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Don't add LIBDIR to rpath by default

2022-01-29 Thread Peter Bex
On Sat, Jan 29, 2022 at 12:08:17PM +0100, Sören Tempel wrote:
> I don't think there is a semi-portable way of doing this, or at least I
> am personally not aware of one. However, maybe a Makefile configuration
> variable could be introduced specifically for creating CHICKEN Scheme
> packages for a distribution (for example as in `make PACKAGING=1
> install`)? If $PACKAGING is set it could then be assumed that $(LIBDIR)
> and other variables point to directories which are already in the
> default search path.

Hi Sören,

I think that too is too simplistic - as explained before, in for example
pkgsrc on NetBSD you still need to set the rpath, even though that's a
package manager.  And when you run pkgsrc on Linux, I suppose the rpath
might be unnecessary, strictly speaking.  So, in my opinion it doesn't
seem to be a packaging/non-packaging question.  It's more a question of
whether the libdir to be installed to happens to be in the default search
path, which is highly OS-dependent.

But, let me flip the question around: how exactly is it *wrong* to
include a "redundant" rpath?  Technically speaking, is there a good
reason to want to avoid explicitly baking in the rpath for default
search paths?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Don't add LIBDIR to rpath by default

2022-01-23 Thread Peter Bex
On Sun, Jan 23, 2022 at 12:54:58PM +0100, Sören Tempel wrote:
> I recently started working on Alpine packages for Scheme software which
> I compiled with CHICKEN. While doing so, I noticed that csc(1)
> unconditionally adds $(LIBDIR) as an rpath entry by adding
> -Wl,-rpath=$(LIBDIR) to the linker flags. I find this surprising as (in
> the common case) LIBDIR should already be part of the default rpath on
> (hopefully) all systems (otherwise it doesn't make sense to install .so
> files to it). Since this rpath entry is redundant it causes a warning on
> Alpine.

Hi Sören,

Thanks for pointing this out.  If memory serves, we started adding rpath
because BSD systems require it.  They don't have a ld.so.conf or
something similar (I don't know what modern Linux uses to set the
default rpath).

Even on Linux, many people also install CHICKEN to nonstandard locations.
For instance, if you don't have root, it's very convenient to install
CHICKEN to your homedir, and in our own infrastructure we typically
install CHICKEN to /usr/local/chickens/ and then symlink
to it as /usr/local/chickens/qwiki for example, to indicate that the
CHICKEN installation for qwiki is .  This allows us to
gradually upgrade our infrastructure.

> I was wondering if it would be possible to only add additional rpath
> entries if -deployed is given or non-host-mode is used. The attached
> git-format-patch(1) implements this. Would it be possible to integrate
> it upstream?

Given what I said above, I think simply removing the rpath entirely for
host-local builds is not the right thing to do, as changing the C flags
to include -rpath is too onerous and obscure for users to care about.

Is there a simple way of detecting whether an rpath is the default?

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Remove some dead code from runtime.c

2021-10-20 Thread Peter Bex
Hi all,

In #chicken Guest-liao was asking some questions about compiling
runtime.c manually, and I noticed that they didn't have
chicken-config.h, and passing in -DHAVE_CHICKEN_CONFIG_H is needed.
While looking into this, I noticed we also used to have a HAVE_CONFIG_H
macro which got checked and defined, but this no longer seems to be in
use.  The attached patch removes the obsolete #ifdef HAVE_CONFIG_H block
from runtime.c

Cheers,
Peter
From dba340c1fe1b3e966e69ee51bb951b121e84079a Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 20 Oct 2021 09:38:45 +0200
Subject: [PATCH] Remove obsolete HAVE_CONFIG_H check in runtime.c

We never define this, only HAVE_CHICKEN_CONFIG_H, so this
entire block of #ifdef can be removed.

Also, PACKAGE and VERSION are not used anywhere, so it's
pointless to undefine them.
---
 chicken.h |  2 +-
 runtime.c | 14 --
 2 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/chicken.h b/chicken.h
index 7e51a38f..cc3be8ff 100644
--- a/chicken.h
+++ b/chicken.h
@@ -44,7 +44,7 @@
 /*
  * N.B. This file MUST not rely upon "chicken-config.h"
  */
-#if defined(HAVE_CONFIG_H) || defined(HAVE_CHICKEN_CONFIG_H)
+#if defined(HAVE_CHICKEN_CONFIG_H)
 # include "chicken-config.h"
 #endif
 
diff --git a/runtime.c b/runtime.c
index c46cde6f..981996f4 100644
--- a/runtime.c
+++ b/runtime.c
@@ -130,20 +130,6 @@ static C_TLS int timezone;
 # include 
 #endif
 
-#ifdef HAVE_CONFIG_H
-# ifdef PACKAGE
-#  undef PACKAGE
-# endif
-# ifdef VERSION
-#  undef VERSION
-# endif
-# include 
-
-# ifndef HAVE_ALLOCA
-#  error this package requires "alloca()"
-# endif
-#endif
-
 /* Parameters: */
 
 #define RELAX_MULTIVAL_CHECK
-- 
2.31.1



signature.asc
Description: PGP signature


Re: [PATCH] Default to "cc" on *BSD systems

2021-10-13 Thread Peter Bex
On Tue, Oct 12, 2021 at 12:18:52PM +0200, felix.winkelm...@bevuta.com wrote:
> See attached patch. This change is not intended for the current RC.

Thanks, applied.  I changed 5.4.0 to 5.3.1 in the NEWS file, as this
change would appear in the first snapshot we make (assuming we will
make one during the 5.3.0->5.4.0 cycle).

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Fix for #1788

2021-10-05 Thread Peter Bex
On Tue, Sep 28, 2021 at 04:52:26PM +0200, felix.winkelm...@bevuta.com wrote:
> Hi!
> 
> The attached patch should fix the issue of builtin module ids appearing in 
> generated
> ..link files.

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Check argument type for setters in (chicken process-context posix)

2021-09-24 Thread Peter Bex
Hi there,

I saw this ticket come in and thought it's trivial enough that it's
probably a good idea to make it into 5.3.0.  In any case, even if
not, this patch can always go into master.

NOTE: It also fixes indentation for some of these setters.  Sorry
if that makes the patch slightly larger.

Cheers,
Peter
From da9f177e085c8c49efe6cbda0b90939f7f1e759a Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 24 Sep 2021 08:25:07 +0200
Subject: [PATCH] Add checks to current user/group id setters in (chicken
 process-context posix)

Fixes #1787
---
 NEWS|  9 +
 manual/Acknowledgements |  4 ++--
 posixunix.scm   | 26 +++---
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index 3118461b..0ce3a1bd 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,12 @@
+5.3.0rc4
+
+- Core libraries
+  - In (chicken process-context posix), the setters for current-user-id,
+current-effective-user-id and current-group-id now check that the
+new user/group value is a fixnum instead of blindly passing it on
+to the C implementation (which would cause bogus user ids to be set).
+Fixes #1787, thanks to Christopher Brannon.
+
 5.3.0rc3
 
 - Build system
diff --git a/manual/Acknowledgements b/manual/Acknowledgements
index fcaaa92f..8e6ac12b 100644
--- a/manual/Acknowledgements
+++ b/manual/Acknowledgements
@@ -7,8 +7,8 @@ Annis, Jason E. Aten, Marc Baily, Peter Barabas, Andrei Barbu, Jonah
 Beckford, Arto Bendiken, Andy Bennett, Kevin Beranek, Peter Bex,
 Jean-Francois Bignolles, Oivind Binde, Alaric Blagrave Snell-Pym, Dave
 Bodenstab, Fabian Böhlke, T. Kurt Bond, Ashley Bone, Dominique Boucher,
-Terence Brannon, Roy Bryant, Adam Buchbinder, Hans Bulfone, "Category
-5", Taylor Campbell, Naruto Canada, Mark Carter, Esteban U. Caamano
+Christopher Brannon, Terence Brannon, Roy Bryant, Adam Buchbinder, Hans Bulfone,
+"Category 5", Taylor Campbell, Naruto Canada, Mark Carter, Esteban U. Caamano
 Castro, Semih Cemiloglu, Alex Charlton, Franklin Chen, Joo ChurlSoo,
 Thomas Chust, Gian Paolo Ciceri, Fulvio Ciriaco, Paul Colby, Tobia
 Conforto, John Cowan, Grzegorz Chrupala, James Crippen, Evan Hanson,
diff --git a/posixunix.scm b/posixunix.scm
index e8cf8526..992d8eed 100644
--- a/posixunix.scm
+++ b/posixunix.scm
@@ -602,6 +602,7 @@ static int set_file_mtime(char *filename, C_word atime, C_word mtime)
   (getter-with-setter
(foreign-lambda int "C_getuid")
(lambda (id)
+ (##sys#check-fixnum id 'current-user-id)
  (when (fx< (##core#inline "C_setuid" id) 0)
(##sys#update-errno)
(##sys#error 'current-user-id!-setter "cannot set user ID" id) ) )
@@ -611,29 +612,32 @@ static int set_file_mtime(char *filename, C_word atime, C_word mtime)
   (getter-with-setter
(foreign-lambda int "C_geteuid")
(lambda (id)
-(when (fx< (##core#inline "C_seteuid" id) 0)
-  (##sys#update-errno)
-  (##sys#error 
-	 'effective-user-id!-setter "cannot set effective user ID" id) ) )
+ (##sys#check-fixnum id 'current-effective-user-id)
+ (when (fx< (##core#inline "C_seteuid" id) 0)
+   (##sys#update-errno)
+   (##sys#error 
+	'effective-user-id!-setter "cannot set effective user ID" id) ) )
"(chicken.process-context.posix#current-effective-user-id)"))
 
 (set! chicken.process-context.posix#current-group-id
   (getter-with-setter
(foreign-lambda int "C_getgid")
(lambda (id)
-(when (fx< (##core#inline "C_setgid" id) 0)
-  (##sys#update-errno)
-  (##sys#error 'current-group-id!-setter "cannot set group ID" id) ) )
+ (##sys#check-fixnum id 'current-group-id)
+ (when (fx< (##core#inline "C_setgid" id) 0)
+   (##sys#update-errno)
+   (##sys#error 'current-group-id!-setter "cannot set group ID" id) ) )
"(chicken.process-context.posix#current-group-id)") )
 
 (set! chicken.process-context.posix#current-effective-group-id
   (getter-with-setter 
(foreign-lambda int "C_getegid")
(lambda (id)
-(when (fx< (##core#inline "C_setegid" id) 0)
-  (##sys#update-errno)
-  (##sys#error 
-	 'effective-group-id!-setter "cannot set effective group ID" id) ) )
+ (##sys#check-fixnum id 'current-effective-group-id)
+ (when (fx< (##core#inline "C_setegid" id) 0)
+   (##sys#update-errno)
+   (##sys#error 
+	'effective-group-id!-setter "cannot set effective group ID" id) ) )
"(chicken.process-context.posix#current-effective-group-id)") )
 
 (define-foreign-variable _user-name nonnull-c-string "C_user->pw_name")
-- 
2.31.1



signature.asc
Description: PGP signature


[PATCH] Fix tests under Windows

2021-09-02 Thread Peter Bex
Hi all,

As Vasilij already reported, there are two failing tests under mingw.
They initially fail for the same reason - they try to delete files
that have been opened but not closed, which is an error under Windows.

After fixing this, the posix-tests fail due to the fact that Windows
doesn't fully support file modes.  It doesn't know about execute
permissions, and doesn't support making files write-only or having
files with no permissions.

To fix that, we can conceptually "right-extend" the user permissions
over group and others after adding read permissions (as Windows does
not have files without read permissions) and dropping execute
permissions.  In code, I cond-expand on Windows, shift the group and
other bits out to obtain the user bits, and then dispatch on the values
to get read and write access or read-only access.

With these patches in place, the tests pass once again on mingw-msys.

Why we never noticed before?  The POSIX tests were extended in ffe553,
and the read-lines-test was added in cfa1e7, both of which were added
somewhere between 5.2.0 and 5.3.0rc1.  And on *nix these tests work
fine, of course...

Cheers,
Peter
From 00f5fafd86f625c25231688a1b975c69bff630d1 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 2 Sep 2021 08:03:39 +0200
Subject: [PATCH 1/2] Ensure all ports are closed in tests when deleting the
 file

Under Windows, read-lines-test.scm and posix-tests.scm would
fail with "permission denied" upon deletion of the output file,
but that's because there was still an open file handle to it
(and on Windows you can't delete open files).

In read-lines-test.scm, it is my favorite Scheme footgun:
call-with-input-file doesn't close its port when the dynamic
extent is exited via other ways than a regular procedure return.

In posix-tests.scm it was a more mundane error - we opened a
file using the descriptor-returning POSIX procedure file-open,
but never closed the descriptor.
---
 tests/posix-tests.scm  | 2 +-
 tests/read-lines-tests.scm | 7 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tests/posix-tests.scm b/tests/posix-tests.scm
index 361f55c1..26f495e7 100644
--- a/tests/posix-tests.scm
+++ b/tests/posix-tests.scm
@@ -98,7 +98,7 @@
  (let ((mode (file-creation-mode)))
(set! (file-creation-mode) umask)
(delete-file* "posix-tests.out")
-   (file-open "posix-tests.out" open/creat given ...)
+   (file-close (file-open "posix-tests.out" open/creat given ...))
(assert (equal? (file-permissions "posix-tests.out") expected))
(set! (file-creation-mode) mode))
   ;; default file mode
diff --git a/tests/read-lines-tests.scm b/tests/read-lines-tests.scm
index 28cdd223..e107c26f 100644
--- a/tests/read-lines-tests.scm
+++ b/tests/read-lines-tests.scm
@@ -46,8 +46,11 @@ EOF
 (test-error
  "with an invalid second argument (max)"
  (call-with-input-file input-test-file
-  (lambda (port)
-(read-lines port 2.0
+   (lambda (port)
+ (dynamic-wind
+	 void
+	 (lambda () (read-lines port 2.0))
+	 (lambda () (close-input-port port))
 
 (test-end "read-lines")
 
-- 
2.31.1

From 9ce043d50ea6e7005bf2a434fada84fac5e3 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 2 Sep 2021 09:44:13 +0200
Subject: [PATCH 2/2] Fix posix-tests file permission test on Windows

In Windows, there are three problems with this test:

- Deleting read-only files gives a permission error.
- Permissions don't know about "group" and "other" permissions - the
  permissions are always extended from the user so the rest match it.
- It is impossible to give write-only or "no" permissions on a file.

So we make files writable before attempting to delete them, and
we provide a mapping for expected permissions so that we only look
at the user part of the file mode, and always have it include
"readable" as a permission.

See also
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/umask
and
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/remove-wremove
---
 tests/posix-tests.scm | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/tests/posix-tests.scm b/tests/posix-tests.scm
index 26f495e7..1c695d9d 100644
--- a/tests/posix-tests.scm
+++ b/tests/posix-tests.scm
@@ -1,4 +1,5 @@
-(import (chicken pathname)
+(import (chicken bitwise)
+	(chicken pathname)
 (chicken file)
 (chicken file posix)
 (chicken platform)
@@ -89,17 +90,38 @@
 (assert (not (get-environment-variable "FOO")))
 
 ;; file creation and umask interaction
+(define (permission-expectation original-expectation)
+  (cond-expand
+;; In Windows, all files are always readable.  You cannot give
+ 

Re: [PATCH] Re: patch for chicken on 64-bit windows (msys and mingw-w64)

2021-08-31 Thread Peter Bex
On Mon, Aug 30, 2021 at 08:10:46PM +0300, elf wrote:
> I do use Windows regularly and am looking to improve chicken support for it.

Hi Elf,

Cool to hear.  What are your thoughts on the current situation?  Which
flavour of Windows build should we keep and which should we drop (if any)?
What about WSL and MSVC?

One thing I've been wondering about for a while now: Do you think we
should switch our scripts from CMD to PowerShell?  Would that reduce
some of the pain or just bring new and interesting failure modes because
it's even further removed from the UNIX world view?

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Re: patch for chicken on 64-bit windows (msys and mingw-w64)

2021-08-30 Thread Peter Bex
On Sun, Aug 29, 2021 at 09:21:38PM +0200, Mario Domenech Goulart wrote:
> Hi,
> 
> On Fri, 27 Aug 2021 18:09:09 +0200 felix.winkelm...@bevuta.com wrote:
> > Using the already provided implementation is preferable, I'd say - one way
> > to shift responsibility, which, in this utter mess of Windows API, seems to 
> > be
> > the only way forward.
> 
> I agree.

Alright, I'm convinced.  Attached is a patch to drop the damn thing.

> > Do we already have test results on all supported Windows platforms? I lost
> > track.
> 
> I think we have a set of problems related to Windows, which is going to
> be very hard for us to handle without help:
> 
> * As far as I can tell, none of the CHICKEN developers use Windows
>   regularly.

True.  I have it running in a VM, but I always start it with a lot of
trepidation and I don't fully understand the OS itself.  For example,
suddenly now "make check" has started to give a "permission denied -
cannot delete read-lines.in" error where it's always worked before
and I have no idea why it can't delete it.

> * There are many combinations of Windows environments, leading to a
>   great variety of ways to break stuff.

They all suck in their own way.  I'm kind of thinking maybe we don't
really need the cygwin build anymore now that there's WSL?

> * We do not have automated tests on Windows.
> 
> As a start, it would be nice to collect the information we need in a
> wiki page (e.g., which variants of Windows we want to/can support, how
> they differ etc.).  At this point I'm lost at the combinations of
> different Windows things that affect the build/execution of CHICKEN.
> 
> I'd say we need help at that so that we can improve support on Windows.

That's a catch-22 though: in order to get that help, we need CHICKEN
in a useful-enough state that people can use it on Windows.  And in
general, out of the total set of users, only a small subset will
actually contribute.

Cheers,
Peter
From 1edd51e77982ebce33b165e481da8df3f7ec813a Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Mon, 30 Aug 2021 12:20:03 +0200
Subject: [PATCH] Drop custom opendir/readdir implementation in Windows

Windows doesn't have these functions, but MinGW provides
them as part of its crt library.

Our own stub implementation had some issues which caused
an error to trigger a loop which resulted in a hanging
process when running chicken-install -update-db
---
 NEWS |  3 ++-
 file.scm | 82 ++--
 2 files changed, 4 insertions(+), 81 deletions(-)

diff --git a/NEWS b/NEWS
index c30c4395..00e79659 100644
--- a/NEWS
+++ b/NEWS
@@ -3,7 +3,8 @@
 - Core libraries
   - The srfi-17 module now exports the `getter-with-setter` and `setter`
 procedures, not just the set! macro (thanks to Lassi Kortela)
-
+  - Fix hang in chicken-install -update-db on Windows (thanks to Mark
+Fisher for reporting and Jani Hakala for debugging and patch).
 
 5.3.0rc1
 
diff --git a/file.scm b/file.scm
index 18ce00f1..56e4c909 100644
--- a/file.scm
+++ b/file.scm
@@ -65,86 +65,8 @@
 # define C_mkdir(str)   C_fix(mkdir(C_c_string(str)))
 #endif
 
-#if !defined(_WIN32) || defined(__CYGWIN__)
-# include 
-# include 
-#else
-struct dirent
-{
-char *  d_name;
-};
-
-typedef struct
-{
-struct _finddata_t  fdata;
-int handle;
-struct dirent   current;
-} DIR;
-
-static DIR * C_fcall
-opendir(const char *name)
-{
-int name_len = strlen(name);
-int what_len = name_len + 3;
-DIR *dir = (DIR *)malloc(sizeof(DIR));
-char *what;
-if (!dir)
-{
-	errno = ENOMEM;
-	return NULL;
-}
-what = (char *)malloc(what_len);
-if (!what)
-{
-	free(dir);
-	errno = ENOMEM;
-	return NULL;
-}
-C_strlcpy(what, name, what_len);
-if (strchr("\\/", name[name_len - 1]))
-	C_strlcat(what, "*", what_len);
-else
-	C_strlcat(what, "\\*", what_len);
-
-dir->handle = _findfirst(what, >fdata);
-if (dir->handle == -1)
-{
-	free(what);
-	free(dir);
-	return NULL;
-}
-dir->current.d_name = NULL; /* as the first-time indicator */
-free(what);
-return dir;
-}
-
-static int C_fcall
-closedir(DIR * dir)
-{
-if (dir)
-{
-	int res = _findclose(dir->handle);
-	free(dir);
-	return res;
-}
-return -1;
-}
-
-static struct dirent * C_fcall
-readdir(DIR * dir)
-{
-if (dir)
-{
-	if (!dir->current.d_name /* first time after opendir */
-	 || _findnext(dir->handle, >fdata) != -1)
-	{
-	dir->current.d_name = dir->fdata.name;
-	return >current;
-	}
-}
-return NULL;
-}
-#endif
+#include 
+#include 
 
 #define C_opendir(s,h)  C_set_block_item(h, 0, (C_word) opendir(C_c_string(s)))
 #define C_readdir(h,e)  C_set_block_item(e, 0, (C_word) readdir((DIR *)C_block_item(h, 0)))
-- 
2.31.1



signature.asc
Description: PGP signature


Re: patch for chicken on 64-bit windows (msys and mingw-w64)

2021-08-27 Thread Peter Bex
On Wed, Aug 25, 2021 at 09:29:31PM +0300, Jani Hakala wrote:
> 
> Hi,
> 
> This patch seems to fix the problem with chicken-install when
> 
> chicken-install -defaults ./setup.defaults -update-db
> 
> appears to hang when installing chicken.

I've done some investigation.  First off, Windows doesn't have dirent.h
nor any of its functions in the C library.  So I was wondering why
it works if you simply drop this check and our custom implementation,
and then simply include dirent.h, because it does!

It appears that Mingw comes with its own implementation of dirent.h,
which is included in its crt.  I've also briefly checked mingw32 (the
barely maintained "original" Mingw) and it looks like it has it too.

I suppose the #ifdef check was either cargo-culted from somewhere else,
came from a distant time when Mingw didn't have its own dirent.h or was
still there from when we also supported building with MSVC.

I'm a bit on the fence whether we should simply apply this patch (which
I'm not really clear on if it should work and why) or drop the check
completely and rely on Mingw's (hopefully nicely debugged, tested and
maintained) version of dirent.h

My gut says we should just drop our custom version and rely on Mingw's
dirent.h , but that is a bigger change for the RC (but we can ask people
to test again of course) and would make supporting MSVC in the future a
bit more difficult again.

Thoughts?  Opinions?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Export procedures from srfi-17

2021-08-19 Thread Peter Bex
On Thu, Aug 19, 2021 at 01:58:48PM +0200, Peter Bex wrote:
> PS: aren't all the register-primitive-module invocations for srfis wrong,
> in that they register a module under a similarly named unit, but there
> is no such unit library (in any of these cases)?

I thought about it some more and realised that the "module-library" field
of the module record type is only used to emit (##core#require ...) in
chicken.syntax#import-definition, and ##core#require treats
"builtin-features" as being already linked, which includes the srfis.
It also treats "core-units" as being already linked, which includes
"library" among others, of course.

I think we can simplify this logic by dropping the builtin-features bit,
as that seems like it's a hack specific for SRFIs.  I've filed
https://bugs.call-cc.org/ticket/1777 for this.  Of course not something
we should do for 5.3.

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Export procedures from srfi-17

2021-08-19 Thread Peter Bex
Hi all,

Lassi Kortela posted a patch on IRC to fix the fact that our SRFI-17
module doesn't export the "setter" and "getter-with-setter" procedures.
If this is approved, please also apply it to the prerelease branch,
as it is both trivial and important enough to include.

PS: aren't all the register-primitive-module invocations for srfis wrong,
in that they register a module under a similarly named unit, but there
is no such unit library (in any of these cases)?

In other words, shouldn't register-primitive-module be defined like so:

;; same as register-core-module, but uses "library" as the library name
(define (##sys#register-primitive-module name vexports #!optional (sexports 
'()))
  (##sys#register-core-module name 'library vexports sexports))

Or, perhaps, it should be #f, since module-library can be false and
syntax-only libraries don't require a unit to be linked in.

Cheers,
Peter
From 86530c17b873c5063ff56275ea4d8e4af29d123f Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 19 Aug 2021 13:45:29 +0200
Subject: [PATCH] Also export setter and getter-with-setter from builtin
 srfi-17 module

Thanks to Lassi Kortela for the patch.
---
 NEWS| 9 -
 modules.scm | 7 +--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 051b212d..c30c4395 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,11 @@
-5.3.0
+5.3.0rc2
+
+- Core libraries
+  - The srfi-17 module now exports the `getter-with-setter` and `setter`
+procedures, not just the set! macro (thanks to Lassi Kortela)
+
+
+5.3.0rc1
 
 - Core libraries
   - Fixed an inadvertant error during error reporting in the `numerator`
diff --git a/modules.scm b/modules.scm
index 36d32032..cbe48fe7 100644
--- a/modules.scm
+++ b/modules.scm
@@ -1152,8 +1152,11 @@
 (##sys#register-primitive-module
  'srfi-16 '() (se-subset '(case-lambda) ##sys#chicken.base-macro-environment))
 
-(##sys#register-primitive-module
- 'srfi-17 '() (se-subset '(set!) ##sys#default-macro-environment))
+(##sys#register-core-module
+ 'srfi-17 'library
+ '((getter-with-setter . chicken.base#getter-with-setter)
+   (setter . chicken.base#setter))
+ (se-subset '(set!) ##sys#default-macro-environment))
 
 (##sys#register-core-module
  'srfi-23 'library '((error . chicken.base#error)))
-- 
2.30.2



signature.asc
Description: PGP signature


Re: [PATCH] use "share/chicken" also on compilation target

2021-08-10 Thread Peter Bex
On Fri, Aug 06, 2021 at 10:22:42PM +0200, felix.winkelm...@bevuta.com wrote:
> This patch changes TARGET_SHARE_HOME to also include the "chicken" suffix,
> so it should be identical to INSTALL_SHARE_HOME in normal situations (when
> not cross-compiling). This also addresses the installation location of "mac.r"
> (only on macs, which should use IDATADIR and not ISHAREDIR).

Thanks, pushed!

Cheers,
Peter


signature.asc
Description: PGP signature


Release process?

2021-08-06 Thread Peter Bex
Hi all,

Evan just closed the last ticket in our roadmap for 5.3
(https://bugs.call-cc.org/roadmap).  So I think now is a good
time to start the (pre)release process.

However, there are a few questions remaining regarding uninstallation
of files and the file tree in the README.  Should we wait until those
are resolved?  Is anyone working on them at all?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: Layout of files installed under "share"

2021-08-04 Thread Peter Bex
On Tue, Aug 03, 2021 at 08:07:33PM +0300, Lassi Kortela wrote:
> README shows the following layout for files installed under "share":
> 
>   `-- share
>   |-- chicken
>   |   |-- doc
>   |   |   |-- LICENSE
>   |   |   |-- README
>   |   |   |-- DEPRECATED
>   |   |   |-- mac.r (Macintosh)
>   |   |   |-- CHICKEN.icns  (Macintosh)
>   |   |   |-- manual-html
>   |   |   |-- chicken.png
>   |   |   |-- feathers.tcl
>   |   |   `-- *.html
>   |   `-- setup.defaults
> 
> Should "mac.r", "CHICKEN.icns", and "feathers.tcl" go directly under
> "share/chicken" like "setup.defaults" does, instead of going under
> "share/chicken/doc"?

hm, it looks like feathers.tcl is definitely in the wrong place in the
diagram.  I don't know about the mac-specific files, but at least the
installation rules seem to indicate they're supposed to go directly
under share/chicken too.

> Could someone more experienced than me sanity-check the diagram for
> "share" in README, as well as the corresponding install and uninstall
> commands in rules.make?

Those mac-specific files should also be uninstalled, they're currently
only installed.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: `make uninstall` does not uninstall the chicken-do manpage

2021-08-04 Thread Peter Bex
On Tue, Aug 03, 2021 at 07:06:23PM +0300, Lassi Kortela wrote:
> Patch attached to fix the problem.

Thanks, pushed!

> Even with this patch applied, `make uninstall` still leaves behind some
> empty directories:
> 
> |-- bin
> |-- include
> |   `-- chicken
> |-- lib
> |   `-- chicken
> `-- share
> `-- man
> `-- man1
> 
> This is probably OK?

I think this is normal - AFAIK most programs leave around directories.
But I suppose we could delete lib/chicken and include/chicken.

What do the other hackers think?

Cheers,
Peter


signature.asc
Description: PGP signature


Re: [PATCH] Add regression test for #1771

2021-08-01 Thread Peter Bex
And now with reexport-m9.scm and reexport-m10.scm
From ab6b7e798151e8af2d987e91acdf86f98f0251f4 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Sun, 1 Aug 2021 15:35:44 +0200
Subject: [PATCH] Add regression test for #1771

This bug was fixed by the fix for #1772, but this was supposed to only
be a performance improvement, so the fix was "accidental" (and,
accordingly, there was no change to the test suite).  Therefore it's a
good idea to add a regression test.
---
 NEWS   | 2 ++
 distribution/manifest  | 2 ++
 tests/reexport-m10.scm | 3 +++
 tests/reexport-m9.scm  | 6 ++
 tests/reexport-tests-2.scm | 5 +
 tests/runtests.bat | 4 
 tests/runtests.sh  | 2 ++
 7 files changed, 24 insertions(+)
 create mode 100644 tests/reexport-m10.scm
 create mode 100644 tests/reexport-m9.scm

diff --git a/NEWS b/NEWS
index 899ddb2f..91507e1d 100644
--- a/NEWS
+++ b/NEWS
@@ -36,6 +36,8 @@
 - Module system
   - Reexported macros now work when the reexporting module redefines
 identifiers from the original (fixes #1757, reported by Sandra Snan).
+  - When using "except" in "import" to omit reexported macros,
+they are really not imported (#1771, reported by Sandra Snan).
 
 - Runtime system
   - Sleeping primordial thread doesn't forget mutations made to
diff --git a/distribution/manifest b/distribution/manifest
index 53aeaa4f..00a8c515 100644
--- a/distribution/manifest
+++ b/distribution/manifest
@@ -198,6 +198,8 @@ tests/reexport-m5.scm
 tests/reexport-m6.scm
 tests/reexport-m7.scm
 tests/reexport-m8.scm
+tests/reexport-m9.scm
+tests/reexport-m10.scm
 tests/reexport-tests.scm
 tests/reexport-tests-2.scm
 tests/ec.scm
diff --git a/tests/reexport-m10.scm b/tests/reexport-m10.scm
new file mode 100644
index ..f485bad2
--- /dev/null
+++ b/tests/reexport-m10.scm
@@ -0,0 +1,3 @@
+(module reexport-m10 ()
+  (import (only reexport-m9 define) (chicken module))
+  (reexport (only reexport-m9 define)))
diff --git a/tests/reexport-m9.scm b/tests/reexport-m9.scm
new file mode 100644
index ..8bc3a425
--- /dev/null
+++ b/tests/reexport-m9.scm
@@ -0,0 +1,6 @@
+(module reexport-m9 (define)
+  (import (only scheme define-syntax syntax-rules) (only (chicken base) error))
+  (define-syntax define
+(syntax-rules ()
+  ((_ a b)
+   (error "wrong define!")
diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm
index fbaad6cf..8f57808b 100644
--- a/tests/reexport-tests-2.scm
+++ b/tests/reexport-tests-2.scm
@@ -11,3 +11,8 @@
 (import reexport-m8)
 (assert (equal? '(d c b a) (reexported-reverse '(a b c d
 (assert (equal? '(d c b a) (reexported-local-reverse '(a b c d
+
+;; Regression test for #1771 where reexports would get ignored by
+;; "except" specifier.
+(import (except reexport-m10 define))
+(define something 1) ;; If reexport messed up, this would be syntax from reexport-m9 expanding to (error ...)
diff --git a/tests/runtests.bat b/tests/runtests.bat
index b696899c..8411ec72 100644
--- a/tests/runtests.bat
+++ b/tests/runtests.bat
@@ -334,6 +334,10 @@ if errorlevel 1 exit /b 1
 if errorlevel 1 exit /b 1
 %compile_s% reexport-m8.scm -J
 if errorlevel 1 exit /b 1
+%compile_s% reexport-m9.scm -J
+if errorlevel 1 exit /b 1
+%compile_s% reexport-m10.scm -J
+if errorlevel 1 exit /b 1
 %compile% reexport-tests-2.scm
 if errorlevel 1 exit /b 1
 a.out
diff --git a/tests/runtests.sh b/tests/runtests.sh
index 11503b3c..e7dd37b3 100755
--- a/tests/runtests.sh
+++ b/tests/runtests.sh
@@ -264,6 +264,8 @@ $compile_s reexport-m5.scm -J
 $compile_s reexport-m6.scm -J
 $compile_s reexport-m7.scm -J
 $compile_s reexport-m8.scm -J
+$compile_s reexport-m9.scm -J
+$compile_s reexport-m10.scm -J
 $compile reexport-tests-2.scm
 ./a.out
 
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Add regression test for #1771

2021-08-01 Thread Peter Bex
Hi all,

As Evan found out, my fix for #1772 happened to also fix #1771, but
that was not what I expected.  So, I think it's a good idea to add a
regression test so that the bug can't sneak back in when we make
further changes to the import stuff.

Cheers,
Peter
From 084b189e6adaf4809649ee44e72bec0c8115b0cd Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Sun, 1 Aug 2021 15:35:44 +0200
Subject: [PATCH] Add regression test for #1771

This bug was fixed by the fix for #1772, but this was supposed to only
be a performance improvement, so the fix was "accidental" (and,
accordingly, there was no change to the test suite).  Therefore it's a
good idea to add a regression test.
---
 NEWS   | 2 ++
 distribution/manifest  | 2 ++
 tests/reexport-tests-2.scm | 5 +
 tests/runtests.bat | 4 
 tests/runtests.sh  | 2 ++
 5 files changed, 15 insertions(+)

diff --git a/NEWS b/NEWS
index 899ddb2f..91507e1d 100644
--- a/NEWS
+++ b/NEWS
@@ -36,6 +36,8 @@
 - Module system
   - Reexported macros now work when the reexporting module redefines
 identifiers from the original (fixes #1757, reported by Sandra Snan).
+  - When using "except" in "import" to omit reexported macros,
+they are really not imported (#1771, reported by Sandra Snan).
 
 - Runtime system
   - Sleeping primordial thread doesn't forget mutations made to
diff --git a/distribution/manifest b/distribution/manifest
index 53aeaa4f..00a8c515 100644
--- a/distribution/manifest
+++ b/distribution/manifest
@@ -198,6 +198,8 @@ tests/reexport-m5.scm
 tests/reexport-m6.scm
 tests/reexport-m7.scm
 tests/reexport-m8.scm
+tests/reexport-m9.scm
+tests/reexport-m10.scm
 tests/reexport-tests.scm
 tests/reexport-tests-2.scm
 tests/ec.scm
diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm
index fbaad6cf..8f57808b 100644
--- a/tests/reexport-tests-2.scm
+++ b/tests/reexport-tests-2.scm
@@ -11,3 +11,8 @@
 (import reexport-m8)
 (assert (equal? '(d c b a) (reexported-reverse '(a b c d
 (assert (equal? '(d c b a) (reexported-local-reverse '(a b c d
+
+;; Regression test for #1771 where reexports would get ignored by
+;; "except" specifier.
+(import (except reexport-m10 define))
+(define something 1) ;; If reexport messed up, this would be syntax from reexport-m9 expanding to (error ...)
diff --git a/tests/runtests.bat b/tests/runtests.bat
index b696899c..8411ec72 100644
--- a/tests/runtests.bat
+++ b/tests/runtests.bat
@@ -334,6 +334,10 @@ if errorlevel 1 exit /b 1
 if errorlevel 1 exit /b 1
 %compile_s% reexport-m8.scm -J
 if errorlevel 1 exit /b 1
+%compile_s% reexport-m9.scm -J
+if errorlevel 1 exit /b 1
+%compile_s% reexport-m10.scm -J
+if errorlevel 1 exit /b 1
 %compile% reexport-tests-2.scm
 if errorlevel 1 exit /b 1
 a.out
diff --git a/tests/runtests.sh b/tests/runtests.sh
index 11503b3c..e7dd37b3 100755
--- a/tests/runtests.sh
+++ b/tests/runtests.sh
@@ -264,6 +264,8 @@ $compile_s reexport-m5.scm -J
 $compile_s reexport-m6.scm -J
 $compile_s reexport-m7.scm -J
 $compile_s reexport-m8.scm -J
+$compile_s reexport-m9.scm -J
+$compile_s reexport-m10.scm -J
 $compile reexport-tests-2.scm
 ./a.out
 
-- 
2.20.1



signature.asc
Description: PGP signature


Re: [PATCH] Fix #1757 by not merging module environment into the syntactic environments of reexported macros

2021-07-31 Thread Peter Bex
On Sat, Jul 31, 2021 at 04:05:52PM +1200, Evan Hanson wrote:
> I was wondering whether there are any implications in changing the order
> of the `sexports` field of the module, now that we're putting all
> "normal" syntax exports before any re-exported ones. However, after a
> bit of digging, I don't think this should matter?

Good call, I hadn't realised this was changing the order.  Although
you're right - the order shouldn't really matter.

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Fix memory leak (#1772) when importing modules more than once

2021-07-22 Thread Peter Bex
Hi all,

Attached is a straightforward fix for #1772.  I put it on the
milestone 5.4, but since it's such a simple fix, I think we
could include it in 5.3 anyway.

Cheers,
Peter
From 70e45fe529d51421fbeea8828dc1704e3a821d40 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 22 Jul 2021 13:22:00 +0200
Subject: [PATCH] Use merge-se, not append in ##sys#import to fix memory leak
 (#1772)

Using append on the current module (meta) environment in "import" will
extend the environment correctly, but it will grow the list.  Instead,
we want to replace add those bindings which are new (and possibly
replace existing bindings of the same name).  This is what merge-se
was made for.

This potentially slows down performance on import, but should be
relatively harmless.
---
 NEWS| 4 
 modules.scm | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index e65f885a..060a9d0e 100644
--- a/NEWS
+++ b/NEWS
@@ -81,6 +81,10 @@
   - Fixed bug in chicken-install regarding variable quotation on UNIX-like
 systems which prevented installation into paths with spaces (#1685).
 
+- Module system
+  - Fixed a memory leak when calling (import) multiple times in a row
+on the same module (#1772; reported by "plugd" on IRC).
+
 5.2.0
 
 - Core libraries
diff --git a/modules.scm b/modules.scm
index 0ba1df49..d91178ff 100644
--- a/modules.scm
+++ b/modules.scm
@@ -820,8 +820,8 @@
cm
(merge-se (module-iexports cm) vsi))
   (dm "export-list: " (module-export-list cm)))
-(import-env (append vsv (import-env)))
-(macro-env (append vss (macro-env)
+(import-env (merge-se (import-env) vsv))
+(macro-env (merge-se (macro-env) vss
 
 (define (module-rename sym prefix)
   (##sys#string->symbol
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Remove severely outdated README section

2021-07-21 Thread Peter Bex
Hi all,

As I was helping someone on IRC, I glanced over the README
and noticed a section talking about compatibility, but that
mentioned only the macro system and module system rewrite
from CHICKEN 4.

We could add information about a CHICKEN 4 to 5 upgrade, but
I think the 5 release is old enough and most people have migrated
already anyway, and the instructions on the wiki are much more
complete anyway.

The README is pretty long already, so I think it's fine to just
drop this section.  That's what the attached patch does.

Cheers,
Peter
From 668e97444f9588788774978c31e5cbcf50dcb561 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 21 Jul 2021 11:35:18 +0200
Subject: [PATCH] Remove severely outdated README section on compatibility

This mentioned only the upgrade from 3 to 4, and had nothing
about 4 to 5 anyway.  Better to remove it completely to avoid
confusion.
---
 README | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/README b/README
index b2cc4fa5..20c4546b 100644
--- a/README
+++ b/README
@@ -687,18 +687,7 @@ _/_/_/_/_/_/  _/_/_/_/_/
 	extensions for Scheme and CHICKEN programming.
 
 
- 8. Compatibility notes
-
-In CHICKEN 4, the macro system has been reimplemented
-	completely and provides module system, which has considerably
-	more flexibility and power, but will require the
-	re-implementation of macros in code that previously was used
-	with CHICKEN 3. Notably, `define-macro' is not available
-	anymore. See the manual on how to translate such macros to
-	low-level hygienic macros or ask on the CHICKEN mailing list.
-
-
- 9. What's next?
+ 8. What's next?
 
 If you find any bugs, or want to report a problem, please send
 a detailed bug report.
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Fix #1727 with patch from ticket

2021-07-15 Thread Peter Bex
Hi all,

I finally gathered enough gumption to spin up a Windows VM and
test the patch from #1727.  It seems to work properly, so here
is a signed-off version of the patch.

Cheers,
Peter
From 46fd66aef7b2345a92ac944282b313a8a584202d Mon Sep 17 00:00:00 2001
From: Vasilij Schneidermann 
Date: Sun, 11 Apr 2021 20:01:23 +0200
Subject: [PATCH] Correctly quote set calls in Windows scripts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

According to ​https://ss64.com/nt/set.html, the set command should
quote its entire argument, including the name. Likewise, quotes inside
paths are illegal, therefore no quotes should be used inside PATH.

It appears that the Windows shell cannot deal with quoting inside an
argument, only when quotes surround the argument. This is not the case
on Unix environments at all.

Closes #1727

Signed-off-by: Peter Bex 
---
 NEWS|  3 +++
 egg-compile.scm | 16 
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/NEWS b/NEWS
index e65f885a..90d62c7d 100644
--- a/NEWS
+++ b/NEWS
@@ -80,6 +80,9 @@
 cache directory matching its program name (#1713, thanks to Alice Maz)
   - Fixed bug in chicken-install regarding variable quotation on UNIX-like
 systems which prevented installation into paths with spaces (#1685).
+  - Fixed a similar bug in chicken-install for paths with spaces on mingw
+and mingw-msys (#1727, thanks to Josh Helzer for reporting and Vasilij
+Schneidermann for the patch)
 
 5.2.0
 
diff --git a/egg-compile.scm b/egg-compile.scm
index 186a7227..205505a6 100644
--- a/egg-compile.scm
+++ b/egg-compile.scm
@@ -1128,16 +1128,16 @@ EOF
 ((windows)
  (printf #<

signature.asc
Description: PGP signature


[PATCH] Fix #1757 by not merging module environment into the syntactic environments of reexported macros

2021-07-14 Thread Peter Bex
Hi all,

Here's a reasonably straightforward patch (I hope) for #1757.
Commit should speak for itself, even if the stuff itself is
a bit tricky.

I'm not 100% sure about the other places where "sexps" is used,
if we should use the local macros only there, or not.  All the
tests seem to work if we keep it referring to only local macros,
and I think we shouldn't mess too much with the environments
where that's not needed, so I think it's fine.

Cheers,
Peter
From 96827b77554e8b56838b0e47363326011b6e Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 14 Jul 2021 12:18:56 +0200
Subject: [PATCH] Don't merge syntactic environment for reexported macros
 (#1757)

Separate the syntactic exports into local and reexported macros,
so we can merge the syntactic environment into the local macros.

The original code would merge the reexporting module's syntactic
environment into every macro it exports, but this is incorrect for
reexported macros - those should stick with their original syntactic
environment, as it was in the defining module.
---
 NEWS   |  4 
 distribution/manifest  |  2 ++
 modules.scm| 14 --
 tests/reexport-m7.scm  | 17 +
 tests/reexport-m8.scm  |  8 
 tests/reexport-tests-2.scm |  6 ++
 tests/runtests.bat |  4 
 tests/runtests.sh  |  2 ++
 8 files changed, 51 insertions(+), 6 deletions(-)
 create mode 100644 tests/reexport-m7.scm
 create mode 100644 tests/reexport-m8.scm

diff --git a/NEWS b/NEWS
index e65f885a..e0e2147d 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,10 @@
   - Made topological-sort behave better when dependency target is listed
 multiple times by concatenating dependencies (fixes #1185).
 
+- Module system
+  - Reexported macros now work when the reexporting module redefines
+identifiers from the original (fixes #1757, reported by Sandra Snan).
+
 - Runtime system
   - Sleeping primordial thread doesn't forget mutations made to
 parameters in interrupt handlers anymore. (See #1638. Fix
diff --git a/distribution/manifest b/distribution/manifest
index 5416eff7..53aeaa4f 100644
--- a/distribution/manifest
+++ b/distribution/manifest
@@ -196,6 +196,8 @@ tests/reexport-m3.scm
 tests/reexport-m4.scm
 tests/reexport-m5.scm
 tests/reexport-m6.scm
+tests/reexport-m7.scm
+tests/reexport-m8.scm
 tests/reexport-tests.scm
 tests/reexport-tests-2.scm
 tests/ec.scm
diff --git a/modules.scm b/modules.scm
index 0ba1df49..77c9af52 100644
--- a/modules.scm
+++ b/modules.scm
@@ -385,16 +385,18 @@
 	   'import "cannot find implementation of re-exported syntax"
 	   name
   (let* ((sexps
-	  (map (lambda (se)
-		 (if (symbol? se)
-		 (find-reexport se)
-		 (list (car se) #f (##sys#ensure-transformer (cdr se) (car se)
-	   sexports))
+	  (filter-map (lambda (se)
+			(and (not (symbol? se))
+			 (list (car se) #f (##sys#ensure-transformer (cdr se) (car se)
+		  sexports))
+	 (reexp-sexps
+	  (filter-map (lambda (se) (and (symbol? se) (find-reexport se)))
+		  sexports))
 	 (nexps
 	  (map (lambda (ne)
 		 (list (car ne) #f (##sys#ensure-transformer (cdr ne) (car ne
 	   sdefs))
-	 (mod (make-module name lib '() vexports sexps iexports))
+	 (mod (make-module name lib '() vexports (append sexps reexp-sexps) iexports))
 	 (senv (if (or (not (null? sexps))  ; Only macros have an senv
 		   (not (null? nexps))) ; which must be patched up
 		   (merge-se
diff --git a/tests/reexport-m7.scm b/tests/reexport-m7.scm
new file mode 100644
index ..3c62fabf
--- /dev/null
+++ b/tests/reexport-m7.scm
@@ -0,0 +1,17 @@
+(module reexport-m7
+(reexported-reverse (reexported-local-reverse my-reverse))
+
+  (import scheme (chicken syntax))
+
+  (define (my-reverse lis)
+(reverse lis))
+
+  (define-syntax reexported-local-reverse
+(syntax-rules ()
+  ((_ lis)
+   (my-reverse lis
+
+  (define-syntax reexported-reverse
+(syntax-rules ()
+  ((_ lis)
+   (reverse lis)
diff --git a/tests/reexport-m8.scm b/tests/reexport-m8.scm
new file mode 100644
index ..2fe2b9ac
--- /dev/null
+++ b/tests/reexport-m8.scm
@@ -0,0 +1,8 @@
+(module reexport-m8 ()
+  ;; NOTE: Reexport only works properly if m7 is imported here, when
+  ;; the implementing library isn't required by the user of m8..
+  (import reexport-m7
+  (rename scheme (reverse reverse-og))
+	  (rename (chicken base) (identity reverse))
+	  (chicken module))
+  (reexport reexport-m7))
diff --git a/tests/reexport-tests-2.scm b/tests/reexport-tests-2.scm
index 14b51c5a..61d81993 100644
--- a/tests/reexport-tests-2.scm
+++ b/tests/reexport-tests-2.scm
@@ -6,3 +6,9 @@
 (import reexport-m6)
 (f:s1); expands to s2, which is reexported and refers to "s2", which is also visible in this context as "f:s2"
 (f:s2)
+
+;; reexport of syntax using shadowed identifiers in new module (#1757)
+(import reexport-m8)

[PATCH] Another fix for irregex to bring us to 0.9.10

2021-07-06 Thread Peter Bex
Hi all,

Here's one more patch to bring Irregex completely up to date with the
just released 0.9.10 version.  It fixes an issue where "bol" would
overlap with newline characters in a weird way.

Cheers,
Peter
From efe932f4fa7afbc56865d33edfbf6836c34ce919 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Tue, 6 Jul 2021 15:15:34 +0200
Subject: [PATCH] Bump irregex to upstream commit 29334af, bringing us to
 version 0.9.10

This fixes upstream ticket #25, where newlines would overlap with
"bol" in situations where a string matches multiple times due to
inconsistent handling.
---
 NEWS   |  6 --
 irregex-core.scm   | 16 +++-
 tests/test-irregex.scm |  4 
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index 53a40f0f..2e254e48 100644
--- a/NEWS
+++ b/NEWS
@@ -6,15 +6,17 @@
   - Fixed a bug where optimisations for `irregex-match?` would cause
 runtime errors due to the inlined specialisations not being
 fully-expanded (see #1690).
-  - Irregex has been updated to upstream 0.9.9, which fixes behaviour
+  - Irregex has been updated to upstream 0.9.10, which fixes behaviour
 of irregex-replace/all with positive lookbehind so all matches are
 replaced instead of only the first (reported by Kay Rhodes), and
 a regression regarding replacing empty matches which was introduced
 by the fixes in 0.9.7 (reported by Sandra Snan).  Also, the
 http-url shorthand now allows any top-level domain and the old
 "top-level-domain" now also supports "edu" (fixed by Sandra Snan).
-Finally, a problem was fixed with capturing groups inside a kleene
+Also, a problem was fixed with capturing groups inside a kleene
 star, which could sometimes return incorrect parts of the match.
+Finally, "bol" handling was fixed to handle newlines consistently
+so that multiple matches don't overlap (reported by Sandra Snan).
   - current-milliseconds has been deprecated in favor of the name
 current-process-milliseconds, to avoid confusion due to naming
 of current-milliseconds versus current-seconds, which do something
diff --git a/irregex-core.scm b/irregex-core.scm
index f86b7992..55e9a6c0 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -30,6 +30,10 @@
 
 
  History
+;; 0.9.10: 2021/07/06 - fixes for submatches under kleene star, empty seqs
+;; in alternations, and bol in folds for backtracking
+;; matcher (thanks John Clements and snan for reporting
+;; and Peter Bex for fixing)
 ;; 0.9.9: 2021/05/14 - more comprehensive fix for repeated empty matches
 ;; 0.9.8: 2020/07/13 - fix irregex-replace/all with look-behind patterns
 ;; 0.9.7: 2019/12/31 - more intuitive handling of empty matches in -fold,
@@ -3508,9 +3512,10 @@
(fail
 ((bol)
  (lambda (cnk init src str i end matches fail)
-   (if (or (and (eq? src (car init)) (eqv? i (cdr init)))
-   (and (> i ((chunker-get-start cnk) src))
-(eqv? #\newline (string-ref str (- i 1)
+   (if (let ((ch (if (> i ((chunker-get-start cnk) src))
+ (string-ref str (- i 1))
+ (chunker-prev-char cnk init src
+ (or (not ch) (eqv? #\newline ch)))
(next cnk init src str i end matches fail)
(fail
 ((bow)
@@ -3908,13 +3913,14 @@
 matches)))
 (if (not m)
 (finish from acc)
-(let ((j (%irregex-match-end-index m 0))
+(let ((j-start (%irregex-match-start-index m 0))
+  (j (%irregex-match-end-index m 0))
   (acc (kons from m acc)))
   (irregex-reset-matches! matches)
   (cond
((flag-set? (irregex-flags irx) ~consumer?)
 (finish j acc))
-   ((= j i)
+   ((= j j-start)
 ;; skip one char forward if we match the empty string
 (lp (list str j end) j (+ j 1) acc))
(else
diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm
index 5cf5b685..0888f09b 100644
--- a/tests/test-irregex.scm
+++ b/tests/test-irregex.scm
@@ -451,6 +451,10 @@
   (irregex-extract (irregex "[aeiou]*") "foobarbaz"))
   (test-equal '("Line 1\n" "Line 2\n" "Line 3")
   (irregex-split 'bol "Line 1\nLine 2\nLine 3"))
+  (test-equal '("foo\n" "bar\n" "baz\n")
+  (irregex-extract '(: bol (+ alpha) newline) "\nfoo\nbar\nbaz\n"))
+  (test-equal '("\nblah" "\nblah" "\nblah")
+  (irregex-extract '(: newline "blah" eol) "\nblah\nblah\nblah\n"))
   )
 
 
-- 
2.20.1



signature.asc
Description: PGP signature


Re: [PATCH] show more helpful error message with "-l"

2021-07-06 Thread Peter Bex
On Thu, Jul 01, 2021 at 10:44:22PM +0200, felix.winkelm...@bevuta.com wrote:
> See patch, suggested by jcroisant on #chicken. Since -lxxx may
> be ambiguous, not all cases may be caught, but it can still be helpful to new
> users.

Thanks, applied.  The patch has some weird indentation and also git
complained about spaces occurring before tabs, both of which I took
the liberty to fix.

Cheers,
Peter


signature.asc
Description: PGP signature


[PATCH] Fix for incorrect NFA compilation of "or" with empty sequence [was: Re: [PATCH] Fix for irregex incorrect matches]

2021-07-05 Thread Peter Bex
On Mon, Jul 05, 2021 at 11:45:36AM +0200, Peter Bex wrote:
> Hi there,
> 
> The attached patch is a port of the most recent commit in the irregex
> repository, which fixes this upstream ticket:
> https://github.com/ashinn/irregex/issues/27

And here is another patch to fix a different issue:
https://github.com/ashinn/irregex/issues/26

I would suggest applying this patch after the one I sent earlier,
because both touch the same paragraph in the NEWS file.  If applied the
other way around, you'll get conflicts.

Cheers,
Peter
From 7606aed8e97666ef3100d9432f9c8575e2094ba6 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Mon, 5 Jul 2021 15:27:32 +0200
Subject: [PATCH] Update irregex to upstream b3116764 (fc1adacb) to fix issue
 with "or"

When compiling an NFA from a SRE object containing an (or) which
contains an empty sequence, the resulting state machine would be
invalid, causing a crash when trying to convert it to a DFA.
This was due a mismatch in internal bookkeeping between state
numbers and the actual state transitions.
---
 irregex-core.scm   | 2 +-
 tests/test-irregex.scm | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/irregex-core.scm b/irregex-core.scm
index a8e7c97f..f86b7992 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -2563,7 +2563,7 @@
flags
next
   (and a
-   (let ((c (add-state! (new-state-number a)
+   (let ((c (add-state! (new-state-number (max a b))
 '(
  (nfa-add-epsilon! buf c a #f)
  (nfa-add-epsilon! buf c b #f)
diff --git a/tests/test-irregex.scm b/tests/test-irregex.scm
index f1aefc21..5cf5b685 100644
--- a/tests/test-irregex.scm
+++ b/tests/test-irregex.scm
@@ -567,6 +567,15 @@
   ;; irregex-flags, irregex-lengths
   )
 
+(test-group "SRE representation edge cases"
+  ;; NFA compilation skipped alternative after empty sequence (#26, found by John Clements)
+  (test-equal "empty sequence in \"or\""
+  ""
+  (irregex-match-substring (irregex-search `(or (seq) "foo") "")))
+  (test-equal "alternative to empty sequence in \"or\""
+  "foo"
+  (irregex-match-substring (irregex-search `(or (seq) "foo") "foo"
+
 (test-end)
 
 
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Fix for irregex incorrect matches

2021-07-05 Thread Peter Bex
Hi there,

The attached patch is a port of the most recent commit in the irregex
repository, which fixes this upstream ticket:
https://github.com/ashinn/irregex/issues/27

Cheers,
Peter
From b552052f4085e84d662f70bb76cb4abf41ab25bc Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Mon, 5 Jul 2021 11:38:43 +0200
Subject: [PATCH] Update irregex to upstream 960fa22b, fixing a group matching
 issue

When a kleene star is used around an alternative containing
submatches, in some circumstances the DFA compilation would emit
reordering commands which would cause the regex capturing to go wrong,
returning faulty matches.

This would go wrong because the ordering commands would read from a
memory slot and write to a target memory slot.

For example, the following set of reordering commands has no "correct"
order in which they can be executed:

p[0] <- p[1]
p[1] <- p[0]

After executing both of them in either order, both of the slots will
contain the same value, instead of swapping them as was the intention.
This is fixed by executing the ordering commands after first fetching
the old memory slot locations into a closure.

Fixes upstream issue #27
---
 NEWS   |  4 +++-
 irregex-core.scm   | 18 --
 tests/re-tests.txt |  1 +
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index 46af9bd1..53a40f0f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,9 +10,11 @@
 of irregex-replace/all with positive lookbehind so all matches are
 replaced instead of only the first (reported by Kay Rhodes), and
 a regression regarding replacing empty matches which was introduced
-by the fixes in 0.9.7 (reported by Sandra Snan).  Finally, the
+by the fixes in 0.9.7 (reported by Sandra Snan).  Also, the
 http-url shorthand now allows any top-level domain and the old
 "top-level-domain" now also supports "edu" (fixed by Sandra Snan).
+Finally, a problem was fixed with capturing groups inside a kleene
+star, which could sometimes return incorrect parts of the match.
   - current-milliseconds has been deprecated in favor of the name
 current-process-milliseconds, to avoid confusion due to naming
 of current-milliseconds versus current-seconds, which do something
diff --git a/irregex-core.scm b/irregex-core.scm
index 8f672333..a8e7c97f 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -2235,12 +2235,18 @@
 (chunk (cons src (+ i 1
 (vector-set! slot (car s) chunk)))
 (cdr cmds))
-  (for-each (lambda (c)
-  (let* ((tag (vector-ref c 0))
- (ss (vector-ref memory (vector-ref c 1)))
- (ds (vector-ref memory (vector-ref c 2
-(vector-set! ds tag (vector-ref ss tag
-(car cmds)
+		  ;; Reassigning commands may be in an order which
+  ;; causes memory cells to be clobbered before
+  ;; they're read out.  Make 2 passes to maintain
+  ;; old values by copying them into a closure.
+  (for-each (lambda (execute!) (execute!))
+(map (lambda (c)
+   (let* ((tag (vector-ref c 0))
+  (ss (vector-ref memory (vector-ref c 1)))
+  (ds (vector-ref memory (vector-ref c 2)))
+  (value-from (vector-ref ss tag)))
+ (lambda () (vector-set! ds tag value-from
+ (car cmds))
   (if new-finalizer
   (lp2 (+ i 1) next src (+ i 1) new-finalizer)
   (lp2 (+ i 1) next res-src res-index #f
diff --git a/tests/re-tests.txt b/tests/re-tests.txt
index 7a56edb7..39a747e6 100644
--- a/tests/re-tests.txt
+++ b/tests/re-tests.txt
@@ -171,3 +171,4 @@ multiple words	multiple words, yeah	y	&	multiple words
 (a([^a])*)*	abcaBC	y	&-\1-\2	abcaBC-aBC-C
 ([Aa]b).*\1	abxyzab	y	&-\1	abxyzab-ab
 a([\/\\]*)b	a//\\b	y	&-\1	a//\\b-//\\
+(?:[[:alnum:]]|(@[[:alnum:]]))*	oeh@2tu@2n342	y	\1	@2
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Add runtime option to fix randomization seed on startup

2021-06-17 Thread Peter Bex
Hi all,

Megane looked into #1650 and found out that the srand(time(NULL)) call
is the culprit; due to the hash table randomization the number of GCs
can differ on every run.

This led me to think that this could be one reason our benchmarks are so
noisy, so I added a runtime option to fix the randomization factor on
startup.  This does indeed completely remove all deviation on the
"number of major GCs" metric in the CHICKEN benchmarks, but unfortunately
doesn't seem to do all that much for the cpu-time or GC time metrics.

Oh well, I think it could be useful enough anyway, and perhaps a good
first stab at reducing the noisiness of our benchmarks.

The patch should speak for itself.

Cheers,
Peter
From eecb550157a0bf809132329928b9338c37875bd8 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 17 Jun 2021 13:28:57 +0200
Subject: [PATCH] Add new -:R runtime option to influence how srand() is called

Without this, there's some nondeterminism when running an
application, which defends against symbol table stuffing attacks
but can make debugging or benchmarking GC issues difficult.

As noted in #1650, it is unsettling when a program behaves
differently every time you run it.

While we're at it, remove unnecessary C_fix() call on the output of
time(NULL), as srand() is not a CHICKEN C function accepting fixnums,
but a native C function accepting regular integers.
---
 NEWS  |  2 ++
 manual/Using the compiler |  2 ++
 runtime.c | 12 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index e06914c7..46af9bd1 100644
--- a/NEWS
+++ b/NEWS
@@ -50,6 +50,8 @@
   - Garbage collection algorithm has been changed to reduce thrashing
 when heap is almost full, by growing the heap sooner.  A new -:hf
 option was added to tweak when heap growth should occur.
+  - Added `-:R' runtime option to initialize rand() state
+deterministically (should help with #1650 and benchmarking).
 
 - Compiler
   - Avoid re-using argvector when inline rest operations are being
diff --git a/manual/Using the compiler b/manual/Using the compiler
index 7e8ff6de..68f09908 100644
--- a/manual/Using the compiler	
+++ b/manual/Using the compiler	
@@ -237,6 +237,8 @@ by the startup code and will not be contained in the result of
 
 ; {{-:r}} : Writes trace output to stderr. This option has no effect in files compiled with the {{-no-trace}} options.
 
+; {{-:RNUMBER}} : Specifies the initial number passed to seed the {{rand()}} PRNG (which is currently only used to randomize the symbol table).  If not supplied, the current system time is used.  This can be useful when debugging or benchmarking because it removes a source of nondeterminism which can affect how soon or how often the GC is triggered.
+
 ; {{-:sNUMBER}} : Specifies stack size.
 
 ; {{-:tNUMBER}} : Specifies symbol table size.
diff --git a/runtime.c b/runtime.c
index 580c6fbe..93dd9d29 100644
--- a/runtime.c
+++ b/runtime.c
@@ -454,6 +454,7 @@ static C_TLS int
   stack_size_changed,
   dlopen_flags,
   heap_size_changed,
+  random_state_initialized = 0,
   chicken_is_running,
   chicken_ran_once,
   pass_serious_signals = 1,
@@ -845,7 +846,10 @@ int CHICKEN_initialize(int heap, int stack, int symbols, void *toplevel)
   current_module_handle = NULL;
   callback_continuation_level = 0;
   gc_ms = 0;
-  srand(C_fix(time(NULL)));
+  if (!random_state_initialized) {
+srand(time(NULL));
+random_state_initialized = 1;
+  }
 
   for(i = 0; i < C_RANDOM_STATE_SIZE / sizeof(C_uword); ++i)
 random_state[ i ] = rand();
@@ -1379,6 +1383,7 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 		 " -:huPERCENTAGE   set percentage of memory used at which heap will be shrunk\n"
 		 " -:hSIZE  set fixed heap size\n"
 		 " -:r  write trace output to stderr\n"
+		 " -:RSEED  initialize rand() seed with SEED (helpful for benchmark stability)\n"
 		 " -:p  collect statistical profile and write to file at exit\n"
 		 " -:PFREQUENCY like -:p, specifying sampling frequency in us (default: 1)\n"
 		 " -:sSIZE  set nursery (stack) size\n"
@@ -1494,6 +1499,11 @@ void CHICKEN_parse_command_line(int argc, char *argv[], C_word *heap, C_word *st
 	  show_trace = 1;
 	  break;
 
+	case 'R':
+	  srand((unsigned int)arg_val(ptr));
+	  random_state_initialized = 1;
+	  goto next;
+
 	case 'x':
 	  C_abort_on_thread_exceptions = 1;
 	  break;
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Fix #1756 by replacing rest ops for explicitly consed rest args

2021-06-16 Thread Peter Bex
Hi all,

The attached patches fix #1756 by detecting when a procedure with a rest
argument is converted into a regular procedure with the last argument
being a list which is consed up at the call sites (aka an "explicitly
consed rest argument").  When this happens, we "undo" the argvector-based
rest operations by replacing them with regular car/cdr/length list
operations, like we also already do in closure conversion when the
rest arg has been reified into a list and stored inside the current
closure.

To avoid code duplication, there's a commit which first moves that code
into a procedure in support.scm, followed by a commit which calls this
new procedure in the situation described above.

The first commit simply adds tracking of the situation so that we know
when a rest argument variable has become an explicitly consed parameter.

So, while this is a medium-sized change, hopefully the way the commits
are structured should make it easy to review.

Cheers,
Peter
From bf48a7f9dbb26e3461d1090da63ca9496d902287 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 16 Jun 2021 08:32:28 +0200
Subject: [PATCH 1/3] Add information to the db for rest args which are
 explicitly consed

Before, this would only be stored as an "explicit-rest" property on
the procedure variable.  In contexts where you only have the variable
it was difficult to find out whether the rest arg was explicitly
consed.  Rest-ops only hold a reference to the rest arg variable, not
the procedure.
---
 batch-driver.scm | 2 +-
 core.scm | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/batch-driver.scm b/batch-driver.scm
index 78296c9d..3f456531 100644
--- a/batch-driver.scm
+++ b/batch-driver.scm
@@ -147,7 +147,7 @@
 		   ((potential-values)
 			(set! pvals (cdar es)))
 		   ((replacable home contains contained-in use-expr closure-size rest-parameter
-captured-variables explicit-rest rest-cdr rest-null?)
+captured-variables explicit-rest rest-cdr rest-null? consed-rest-arg)
 			(printf "\t~a=~s" (caar es) (cdar es)) )
 		   ((derived-rest-vars)
 			(set! derived-rvars (cdar es)))
diff --git a/core.scm b/core.scm
index 2a380f03..c484071c 100644
--- a/core.scm
+++ b/core.scm
@@ -263,6 +263,7 @@
 ;   extended-binding -> If true: variable names an extended binding
 ;   unused ->   If true: variable is a formal parameter that is never used
 ;   rest-parameter -> #f | 'list If true: variable holds rest-argument list
+;   consed-rest-arg ->  If true: variable is a rest variable in a procedure called with consed rest list
 ;   rest-cdr -> (rvar . n)   Variable references the cdr of rest list rvar after n cdrs (0 = rest list itself)
 ;   rest-null? -> (rvar . n) Variable checks if the cdr of rest list rvar after n cdrs is empty (0 = rest list itself)
 ;   derived-rest-vars -> (v1 v2 ...) Other variables aliasing or referencing cdrs of a rest variable
@@ -2435,7 +2436,8 @@
 		  (cond ((and has (not (rassoc sym callback-names eq?)))
 			 (db-put! db (first lparams) 'has-unused-parameters #t) )
 			(rest
-			 (db-put! db (first lparams) 'explicit-rest #t) ) ) ) ) ) ) ) ) )
+			 (db-put! db (first lparams) 'explicit-rest #t)
+			 (db-put! db rest 'consed-rest-arg #t) ) ) ) ) ) ) ) ) )
 
 	 ;; Make 'removable, if it has no references and is not assigned to, and one of the following:
 	 ;; - it has either a value that does not cause any side-effects
-- 
2.20.1

From ed34e4857f319f3654f37680ebd4c358c494f286 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Wed, 16 Jun 2021 08:35:33 +0200
Subject: [PATCH 2/3] Refactor replacing of rest args to make it reusable

This moves the replacing of rest args with corresponding list ops into
a procedure in support.scm
---
 core.scm| 37 ++---
 support.scm | 26 +-
 2 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/core.scm b/core.scm
index c484071c..630bfd04 100644
--- a/core.scm
+++ b/core.scm
@@ -2649,37 +2649,12 @@
 	 ;; This can be improved, as it can actually introduce
 	 ;; many more cdr calls than necessary.
 	 (cond ((eq? class '##core#rest-cdr)
-		(let lp ((cdr-calls (add1 (second params)))
-			 (var rest-var))
-		  (if (zero? cdr-calls)
-			  (transform var here closure)
-			  (lp (sub1 cdr-calls)
-			  (make-node '##core#inline (list "C_i_cdr") (list var))
-
-		   ;; If customizable, the list is consed up at the
-		   ;; call site and there is no argvector.  So convert
-		   ;; back to list-ref/list-tail calls.
-		   ;;
-		   ;; Alternatively, if n isn't val, this node was
-		   ;; processed and the variable got replaced by a
-		   ;; closure access.
-		   ((or (test here 'customizable)
-			(not (eq? val n)))
-		(case class
-		  ((##core#rest-car)
-		   

[PATCH] Fix #1675 by using theoretical maximum addressable range on 64-bit platforms

2021-05-20 Thread Peter Bex
Hi all,

As discussed in the ticket and on IRC, the default maximum heap size
is *not* to kill processes early to prevent programs from running amok,
but as a way to avoid allocating more than our C_word pointers can
address with the masked out bits.

Therefore, the value on 64-bit platforms is not correct - it is 2GB
there, which is the 32-bit limit.  Attached is a signed-off copy of
Dieggsy's trivial patch with a change in the NEWS file to point this
out to users as this may be an important change for people relying
on this limit for other reasons.

Cheers,
Peter
From d6cc3d958c6f73a1bc7adaa89a3c71c7d0394b11 Mon Sep 17 00:00:00 2001
From: dieggsy 
Date: Sat, 10 Apr 2021 13:36:18 -0400
Subject: [PATCH] Increase maximum heap size on 64-bit machines

Signed-off-by: Peter Bex 
---
 NEWS  | 3 +++
 runtime.c | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index c5eb0f03..b283d436 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,9 @@
 contributed by Sebastien Marie)
   - A feature corresponding to the word size is available
 regardless of the word size (#1693)
+  - The default maximum heap size on 64-bit platforms is now the
+theoretical maximum addressable memory size (#1675).  Use -:m
+if you would like to override this at run-time.
   - Deprecated C_(a_i_current_)milliseconds in favor of
 C_(a_i_)current_process_milliseconds to match the Scheme-level
 deprecation of current-milliseconds.
diff --git a/runtime.c b/runtime.c
index 51cfa7be..dfc2e0ee 100644
--- a/runtime.c
+++ b/runtime.c
@@ -150,8 +150,10 @@ static C_TLS int timezone;
 
 #ifdef C_SIXTY_FOUR
 # define DEFAULT_STACK_SIZE(1024 * 1024)
+# define DEFAULT_MAXIMAL_HEAP_SIZE 0x7ff0
 #else
 # define DEFAULT_STACK_SIZE(256 * 1024)
+# define DEFAULT_MAXIMAL_HEAP_SIZE 0x7ff0
 #endif
 
 #define DEFAULT_SYMBOL_TABLE_SIZE  2999
@@ -159,7 +161,6 @@ static C_TLS int timezone;
 #define DEFAULT_HEAP_SIZE  DEFAULT_STACK_SIZE
 #define MINIMAL_HEAP_SIZE  DEFAULT_STACK_SIZE
 #define DEFAULT_SCRATCH_SPACE_SIZE 256
-#define DEFAULT_MAXIMAL_HEAP_SIZE  0x7ff0
 #define DEFAULT_HEAP_GROWTH200
 #define DEFAULT_HEAP_SHRINKAGE 50
 #define DEFAULT_HEAP_SHRINKAGE_USED25
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Partial fix for #1685

2021-05-20 Thread Peter Bex
Hi all,

Here's a more "official" version of Mario's patch for #1685, which
works on UNIX only.  For Windows we'll probably need to overhaul
egg-compile.scm into some sort of structured system instead of using
ad-hoc quoting, because on Windows it is context-dependent how one
should quote variables and paths (see #1727 for more details).

To me this would be more suitable to fully (i.e. also Windows) fix it
in 5.4, but the UNIX path quoting seems to me like a trivial and nice
improvement to go into 5.3 already.

Cheers,
Peter
From 9fd1dda69f0f5edd0db1b13078116c16c77880dc Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Thu, 20 May 2021 09:57:17 +0200
Subject: [PATCH] Partial fix for #1685 - quote shell variables on UNIX
 platforms

This allows having shell metacharacters (like spaces) in path names.
This does not yet fix the situation for Windows - there, quotation
is context-dependent, so quite a bit harder to fix properly.

Signed-off-by: Peter Bex 
---
 NEWS| 2 ++
 egg-compile.scm | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index c5eb0f03..7f9faecd 100644
--- a/NEWS
+++ b/NEWS
@@ -65,6 +65,8 @@
 `Error: (string->number) bad argument type: #!eof` in some cases.
   - If chicken-install has a program prefix/suffix, it now writes to a
 cache directory matching its program name (#1713, thanks to Alice Maz)
+  - Fixed bug in chicken-install regarding variable quotation on UNIX-like
+systems which prevented installation into paths with spaces (#1685).
 
 5.2.0
 
diff --git a/egg-compile.scm b/egg-compile.scm
index a4c4bf0c..186a7227 100644
--- a/egg-compile.scm
+++ b/egg-compile.scm
@@ -1239,7 +1239,7 @@ EOF
 
 (define (shell-variable var platform)
   (case platform
-((unix) (string-append "${" var "}"))
+((unix) (string-append "\"${" var "}\""))
 ((windows) (string-append "%" var "%"
 
 ;; NOTE `cmd' must already be quoted for shell
-- 
2.20.1



signature.asc
Description: PGP signature


[PATCH] Update irregex to 0.9.9

2021-05-14 Thread Peter Bex
Hi all,

Attached is a patch to bring irregex up to date with the latest upstream.
It includes an important bugfix regarding irregex-replace/all when an
empty match at the beginning of a string is replaced - it would
incorrectly duplicate the first character.

It also includes changes so that the top-level-domain shorthand now also
allows "edu", but this is an incomplete list anyway now that anyone can
buy their own, and the http-url shorthand now goes through "domain"
rather than "domain-common" to allow for top-level domains not listed
in the aforementioned top-level-domain.

Cheers,
Peter
From 1a7ee0d480b296d3a93f81d74854a2d899ed67f2 Mon Sep 17 00:00:00 2001
From: Peter Bex 
Date: Fri, 14 May 2021 08:32:48 +0200
Subject: [PATCH] Update irregex to upstream 0.9.9 (rev 3c367082)

This fixes an edge case from upstream bug #24 where replacing empty
strings with replace/all at the start of a string would incorrectly
duplicate the starting character, found by Sandra Snan.
---
 NEWS   |  8 ++--
 irregex-core.scm   | 38 --
 tests/test-irregex.scm |  5 -
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/NEWS b/NEWS
index 1e852d90..c5eb0f03 100644
--- a/NEWS
+++ b/NEWS
@@ -6,9 +6,13 @@
   - Fixed a bug where optimisations for `irregex-match?` would cause
 runtime errors due to the inlined specialisations not being
 fully-expanded (see #1690).
-  - Irregex has been updated to upstream 0.9.8, which fixes behaviour
+  - Irregex has been updated to upstream 0.9.9, which fixes behaviour
 of irregex-replace/all with positive lookbehind so all matches are
-replaced instead of only the first (reported by Kay Rhodes).
+replaced instead of only the first (reported by Kay Rhodes), and
+a regression regarding replacing empty matches which was introduced
+by the fixes in 0.9.7 (reported by Sandra Snan).  Finally, the
+http-url shorthand now allows any top-level domain and the old
+"top-level-domain" now also supports "edu" (fixed by Sandra Snan).
   - current-milliseconds has been deprecated in favor of the name
 current-process-milliseconds, to avoid confusion due to naming
 of current-milliseconds versus current-seconds, which do something
diff --git a/irregex-core.scm b/irregex-core.scm
index 42f2a806..8f672333 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -1,6 +1,6 @@
  irregex.scm -- IrRegular Expressions
 ;;
-;; Copyright (c) 2005-2020 Alex Shinn.  All rights reserved.
+;; Copyright (c) 2005-2021 Alex Shinn.  All rights reserved.
 ;; BSD-style license: http://synthcode.com/license.txt
 
 
@@ -30,6 +30,7 @@
 
 
  History
+;; 0.9.9: 2021/05/14 - more comprehensive fix for repeated empty matches
 ;; 0.9.8: 2020/07/13 - fix irregex-replace/all with look-behind patterns
 ;; 0.9.7: 2019/12/31 - more intuitive handling of empty matches in -fold,
 ;; -replace and -split
@@ -2348,8 +2349,8 @@
 (domain . (seq domain-atom (+ #\. domain-atom)))
 ;;  now anything can be a top-level domain, but this is still handy
 (top-level-domain . (w/nocase (or "arpa" "com" "gov" "mil" "net" "org"
-  "aero" "biz" "coop" "info" "museum"
-  "name" "pro" (= 2 alpha
+  "edu" "aero" "biz" "coop" "info"
+  "museum" "name" "pro" (= 2 alpha
 (domain/common . (seq (+ domain-atom #\.) top-level-domain))
 ;;(email-local-part . (seq (+ (or (~ #\") string
 (email-local-part . (+ (or alphanumeric #\_ #\- #\. #\+)))
@@ -2360,7 +2361,7 @@
   (seq "%" hex-digit hex-digit)))
 (http-url . (w/nocase
  "http" (? "s") "://"
- (or domain/common ipv4-address) ;; (seq "[" ipv6-address "]")
+ (or domain ipv4-address) ;; (seq "[" ipv6-address "]")
  (? ":" (+ numeric)) ;; port
  ;; path
  (? "/" (* (or url-char "/"))
@@ -3889,9 +3890,9 @@
 (if (not (and (integer? end) (exact? end)))
 (error 'irregex-fold "not an exact integer" end))
 (irregex-match-chunker-set! matches irregex-basic-string-chunker)
-(let lp ((src init-src) (i start) (acc knil))
+(let lp ((src init-src) (from start) (i start) (acc knil))
   (if (>= i end)
-  (finish i acc)
+  (finish from acc)
 

Re: [PATCH] make chicken-install cache depend on PROGRAM_PREFIX/SUFFIX

2021-04-15 Thread Peter Bex
On Thu, Apr 15, 2021 at 11:05:36AM +0200, felix.winkelm...@bevuta.com wrote:
> A patch by alicemaz, see #1713.

Thanks, pushed.

Cheers,
Peter


signature.asc
Description: PGP signature


  1   2   3   4   5   6   7   8   9   10   >