Re: New egg: CHICKEN Transducers

2023-01-15 Thread Tomas Hlavaty
Hi Jeremy,

thank you for your detailed response.

On Fri 13 Jan 2023 at 17:56, Jeremy Steward  wrote:
>> On Wed 04 Jan 2023 at 18:48, Jeremy Steward  wrote:
>>> 
>> 
>> My main problem with generators and accumulators is that they
>> basically replace all our existing data types with a new type
>> (generators) that can then be mapped / filtered over in a unified
>> way. After one is done with gmap / gfilter / etc. they can then
>> convert this generator back into some kind of type using an
>> accumulator or one of the built in generator->type procedures. This
>> solves a problem of abstraction by routing around it. Rather than
>> worry about what operations are defined on a type, we instead just
>> create a type that has all operations work on it.
>> 
>> What are the procedures reduced? and unwrap in your example?
>> Don't they point to similar issue with transducers?
>
> |reduced?| is a predicate that checks if a value has been wrapped by 
> |make-reduced|. |unwrap| unwraps a value that has been wrapped by 
> |make-reduced|. A "reduced" value isn't particularly special, but it's 
> using a unique record type (and predicate) to identify a particular kind 
> of value.
>
> This is how early termination is done with transducers. We could use 
> call/cc, but that would involve injecting call/cc somehow through all 
> the transducers / collectors / etc. Instead, if a transducer (e.g. 
> |take|) wants to stop folding over the collection early, it can just 
> wrap the current result in |make-reduced| which will tell the folder 
> "hey, I'm fully reduced" and the folder will stop.
>
> It doesn't really point to the same problem. See, transducers are still 
> monomorphized to the specific type. There is no polymorphism in 
> transducers as the library currently exists. So there's no type 
> deflection going on here.

I understand how the hardoced use-case of early termination works.

I do not understand, why you say that

   "generators replace all our existing data types with a new type"

i.e. original type -> generator

and at the same time it seems to me that you are doing this also in your
transducers:

i.e. original type -> wrapper record

So the difference is not that your transducers replace existing data
types with a new type and transducers do not, but that "your" generators
replace existing data types with closure but your transducers with
record.  Maybe there is a difference, that the generator represents the
whole while the wrapper record represents an item but I am not sure if
that is an interesting distiction to make.

Are you allocating a new record per element of the input sequence?
Is that a good idea?

It seems to me that your choice of using record is arbirtary.
There are other ways to implement this.

I would probably just use a unique (eq) sentinel value.
iirc you already use one.
The if branching and unwrapping would not be necessary as
the collector already uses a sentinel value to stop iirc.
Is there a reason, why this simple solution was not used?

>> This kind of approach is a good first abstraction, but fails because
>> it is generally impossible to make this fast. It also doesn’t solve
>> the fact that we have type->list and list->type proliferation. If
>> anything, it makes it worse because most libraries are not
>> generator-aware, and writing generators correctly can be
>> tricky. That’s not to say writing any code cannot be tricky, but the
>> obvious ways to write a generator are often using
>> make-coroutine-generator which uses call/cc4 and as a result is
>> pretty slow on most Schemes.
>> 
>> It seem that the problem is with the way people write generators in scheme.
>> Not that generators are "generally impossible to make fast".
>
> Generators are "generally impossible to make fast." If you accept a 
> generator as an argument to a procedure, then you don't know (from 
> within that procedure) if that generator produced values directly or is 
> the result of several layers of (gmap ... (gmap ... (gmap ... 
> generator))). Note this isn't true of lists and vectors and other 
> collection types: getting a value from a vector is O(1), and getting all 
> values sequentially is O(n).
>
> Because of that, you can very quickly end up with generators that call 
> generators that call generators that call generators and so on. This 
> isn't inherently slow, but those functions aren't local and it's a lot 
> more expensive than a simple fold / named-let over the data.
>
> Now, for a lot of code that may not be a problem. You might keep things 
> simple or only use generators that directly generate values and aren't 
> chained through several intermediate processes / mappers like above. 
> Alternatively, you might find that a generator is doing something 
> expensive anyways, like a network call. In such cases, maybe the 
> difference between 

Re: New egg: CHICKEN Transducers

2023-01-13 Thread Jeremy Steward

On 1/11/23 16:20, Tomas Hlavaty wrote:

Hi Jeremy,

thank you for interesting reading.


Thank you for taking the time to go through it :)


On Wed 04 Jan 2023 at 18:48, Jeremy Steward  wrote:




My main problem with generators and accumulators is that they
basically replace all our existing data types with a new type
(generators) that can then be mapped / filtered over in a unified
way. After one is done with gmap / gfilter / etc. they can then
convert this generator back into some kind of type using an
accumulator or one of the built in generator->type procedures. This
solves a problem of abstraction by routing around it. Rather than
worry about what operations are defined on a type, we instead just
create a type that has all operations work on it.

What are the procedures reduced? and unwrap in your example?
Don't they point to similar issue with transducers?


|reduced?| is a predicate that checks if a value has been wrapped by 
|make-reduced|. |unwrap| unwraps a value that has been wrapped by 
|make-reduced|. A "reduced" value isn't particularly special, but it's 
using a unique record type (and predicate) to identify a particular kind 
of value.


This is how early termination is done with transducers. We could use 
call/cc, but that would involve injecting call/cc somehow through all 
the transducers / collectors / etc. Instead, if a transducer (e.g. 
|take|) wants to stop folding over the collection early, it can just 
wrap the current result in |make-reduced| which will tell the folder 
"hey, I'm fully reduced" and the folder will stop.


It doesn't really point to the same problem. See, transducers are still 
monomorphized to the specific type. There is no polymorphism in 
transducers as the library currently exists. So there's no type 
deflection going on here.




This kind of approach is a good first abstraction, but fails because
it is generally impossible to make this fast. It also doesn’t solve
the fact that we have type->list and list->type proliferation. If
anything, it makes it worse because most libraries are not
generator-aware, and writing generators correctly can be
tricky. That’s not to say writing any code cannot be tricky, but the
obvious ways to write a generator are often using
make-coroutine-generator which uses call/cc4 and as a result is
pretty slow on most Schemes.

It seem that the problem is with the way people write generators in scheme.
Not that generators are "generally impossible to make fast".


Generators are "generally impossible to make fast." If you accept a 
generator as an argument to a procedure, then you don't know (from 
within that procedure) if that generator produced values directly or is 
the result of several layers of (gmap ... (gmap ... (gmap ... 
generator))). Note this isn't true of lists and vectors and other 
collection types: getting a value from a vector is O(1), and getting all 
values sequentially is O(n).


Because of that, you can very quickly end up with generators that call 
generators that call generators that call generators and so on. This 
isn't inherently slow, but those functions aren't local and it's a lot 
more expensive than a simple fold / named-let over the data.


Now, for a lot of code that may not be a problem. You might keep things 
simple or only use generators that directly generate values and aren't 
chained through several intermediate processes / mappers like above. 
Alternatively, you might find that a generator is doing something 
expensive anyways, like a network call. In such cases, maybe the 
difference between transducers vs. higher-order generator functions 
don't matter.


For me though, I don't really want to work off that assumption. 
Transducers are plenty fast and there's not really the same issue of 
non-locality of the data / functions that one has with generators. The 
general advice of "benchmark your code and target performance 
improvements" applies, but if I write a function that accepts a 
generator, I'd like to be able to guess at what I'm dealing with.


Fun experiment: for any given set of procedures over a generator I'd 
wager that transducers will probably be faster. You can test this yourself:


(transduce reader-fold
   (compose
 ;; All variants of map / filter / etc
 )
   collect-list
   generator)

I am willing to bet this will be faster than almost any combination of 
gmap / gfilter / etc. I say almost because I know if your generator is 
doing a lot of IO or some kind of expensive network call then the 
differences will be completely dwarfed.


So in the most general case, I'd say yeah - generators are pretty much 
impossible to make fast. Compared to iterating over a list or vector, I 
can't know upfront that any given process is going to be bounded within 
some time / 

Re: New egg: CHICKEN Transducers

2023-01-11 Thread Tomas Hlavaty
Hi Jeremy,

thank you for interesting reading.

On Wed 04 Jan 2023 at 18:48, Jeremy Steward  wrote:
> 

   My main problem with generators and accumulators is that they
   basically replace all our existing data types with a new type
   (generators) that can then be mapped / filtered over in a unified
   way. After one is done with gmap / gfilter / etc. they can then
   convert this generator back into some kind of type using an
   accumulator or one of the built in generator->type procedures. This
   solves a problem of abstraction by routing around it. Rather than
   worry about what operations are defined on a type, we instead just
   create a type that has all operations work on it.

What are the procedures reduced? and unwrap in your example?
Don't they point to similar issue with transducers?

   This kind of approach is a good first abstraction, but fails because
   it is generally impossible to make this fast. It also doesn’t solve
   the fact that we have type->list and list->type proliferation. If
   anything, it makes it worse because most libraries are not
   generator-aware, and writing generators correctly can be
   tricky. That’s not to say writing any code cannot be tricky, but the
   obvious ways to write a generator are often using
   make-coroutine-generator which uses call/cc4 and as a result is
   pretty slow on most Schemes.

It seem that the problem is with the way people write generators in scheme.
Not that generators are "generally impossible to make fast".

Regards,

Tomas



Re: New egg: CHICKEN Transducers

2023-01-10 Thread Jeremy Steward

Hey! These are some interesting changes, let me go through them bit-by-bit:

On 1/9/23 11:45, siiky wrote:

[^1]: This can be easily remied if the sentinel is given to the collect
instead of the transduce, in which case the current def of transduce is
a single case-lambda case instead of two:

(transduce fold (map *) (collect 'sentinel-value) data)


I was "forced" to procrastinate yesterday by this incredible force... ^^'


Before jumping into the remaining bits, I saw that there was a commit in 
your fork of the repo regarding this. It's an interesting idea and I'm 
inclined to actually merge this in proper, but there is a trade-off (as 
there always is).


Namely, as transducers currently are one can use variable argument 
procedures as "collectors." I somewhat weighed on this back and forth 
when writing 0.1.0: is it worth having separate sum / product procedures 
or should I allow for collectors to be any procedure that is defined 
with 0-arity to 2-arity?


(transduce list-fold
   values
   +
   (list 1 2 3 4 5))

See, if you want to be consistent and make it so that collectors always 
define their sentinel (which makes the API a bit more consistent), then 
you cannot do the above, since there's no way to call |(+ 100)| as a 
collector. You need a separate "sum" or "collect-sum" to function for 
you here.


I left it as-is because the broader Clojure community uses a similar 
strategy for their transducers. I think a difference here isn't 
necessarily /bad/ but it may mean that certain combinations collector + 
sentinel aren't as easy to express (without doing your own case-lambda, 
which means writing collectors yourself).


Thoughts? I don't think giving every collector an optional argument is 
necessarily wrong in terms of API but the trade-off does exist. Given 
that there's little code written with transducers today I'm not sure I 
have any argument-from-inertia to say which way is better or more useful 
but I think this is a good question to solve before a 1.0 release.



I've been thinking for the past couple of days about this and couldn't
come up with any way to make a generic "zip" work with transducers other
than (a) rewriting the basic algorithm of the egg (which is simple and I
prefer that way), or (b) introducing the concept of an "iterator" (which
is, I think, how Rust does it, and is equivalent to C++ Ranges (cf.
"Functional Programming in C++", Ivan Čukić)). The problem is that
type-fold does all the work from start to finish and can't be
"suspended" to look to the side for a second.


I personally think a fully generic zip / chain / interleave / flatten 
aren't that appealing. You (should) almost always know the type of 
something, and if you're doing a partial application, e.g.:


(define my-process
  (compose
(zip-list (list 'a 'b 'c))
(map ...)
(filter ...)))

to later call |my-process| in a transducer, then the "zip" is 
well-defined by this point in most scenarios (in my experience, although 
that might be me conflating how I write code based on the API 
restrictions I have presented for myself). You have to monomorphize your 
function at some point, else you're using an intermediate type (which is 
just monomorphizing by indirection).



Because (a) goes against extending this egg, replacing its
implementation instead, and because it's a lot more work than (and not
as obvious as) (b), I went with (b).

An iterator is some object that keeps internal state about a collection
and how to access it, that can be used to incrementally[^0] traverse
this collection's elements in some (iterator-defined) order. Just like
streams![^1] So after a few hours hacking at it, it's as simple as this:


I do like your process of discovery; In fact, I actually started writing 
this egg with this exact thinking in mind: why not try to define a 
common iterator type for everything and then only provide one 
|iterable-fold| to rule them all? Well, it ended up being pretty slow, 
which forced me to re-think my assumptions about performance and how I 
should think about types in Scheme. Trying too hard to avoid any 
type-specification ends up being a problem in its own right if you take 
it too far.


Nevertheless, Rust's iterators are lazy (and very similar to streams). 
SRFI-41 streams have some performance questions compared to e.g. 
generators though, which is why I opted to make sure that reader-fold 
works with generators off the bat.


Generators are the more appropriate solution to laziness in most cases. 
Delay and Force seem to come with too much performance baggage for my 
taste, although I could absolutely see providing (transducers streams) 
and doing basically what you've done with "iterators" within transducers 
in a future version of the library.


After all, if you had (transducers streams) you could use the base and 
streams sub-modules and write whatever extension you wanted extant to 
that. This to me is why I 

Re: New egg: CHICKEN Transducers

2023-01-09 Thread siiky

Hiya,

This wouldn't work very well with the current API in Scheme, because the 
fold is passed to transduce explicitly.  And given that, either a 
restriction must be imposed on the input data structures (they must all 
be of the same type); or the folder of each input data structure must be 
given. If the latter something like this would be kinda awkward (though 
not impossible, I guess):


(transduce
    list-fold vector-fold
    (map *)
    collect-list
    lst vec)

And accepting the optional sentinel with this new API would also be 
awkward to implement[^1]...



[^1]: This can be easily remied if the sentinel is given to the collect 
instead of the transduce, in which case the current def of transduce is 
a single case-lambda case instead of two:


(transduce fold (map *) (collect 'sentinel-value) data)


I was "forced" to procrastinate yesterday by this incredible force... ^^'

I've been thinking for the past couple of days about this and couldn't 
come up with any way to make a generic "zip" work with transducers other 
than (a) rewriting the basic algorithm of the egg (which is simple and I 
prefer that way), or (b) introducing the concept of an "iterator" (which 
is, I think, how Rust does it, and is equivalent to C++ Ranges (cf. 
"Functional Programming in C++", Ivan Čukić)). The problem is that 
type-fold does all the work from start to finish and can't be 
"suspended" to look to the side for a second.


Because (a) goes against extending this egg, replacing its 
implementation instead, and because it's a lot more work than (and not 
as obvious as) (b), I went with (b).


An iterator is some object that keeps internal state about a collection 
and how to access it, that can be used to incrementally[^0] traverse 
this collection's elements in some (iterator-defined) order. Just like 
streams![^1] So after a few hours hacking at it, it's as simple as this:



(transduce
  iterator-fold
  (map (cute apply * <>))
  collect-list
  (iterator-zip (list->iterator '(0 1 2 3 4))
(vector->iterator #(5 6 7 8 9))
(range->iterator 5)))

;=> (0 6 28 72 144)


By implementing iterator-fold and the type->iterator functions 
everything works OOB with the transducers egg. With some small changes 
to the egg[0] (especially [1]), it can be improved (IMO) to this:



(transduce
  (zipped-iterator-fold list->iterator vector->iterator range->iterator)
  (map (cute apply * <>))
  (collect-list)
  '(0 1 2 3 4) #(5 6 7 8 9) 5)


This small PoC can be found at [2] (just a bunch of SRFI 41 
renames[^2]). The pros/cons of this approach, the way I see it, are:


A new iterator-fold is needed to use the transducers egg with iterators. 
However, if you don't want/need iterators, you don't have to implement 
type->iterator for your data type(s), just use type-fold and there's no 
performance penalty (IIRC the main selling point of one of the 
generators SRFI was performance compared with streams?).  OTOH if you do 
want/need iterators, you have to implement type->iterator too (I say 
"too" but one doesn't strictly need both type-fold and type->iterator).


Nevertheless, from my understanding, using transducers on streams should 
still be better performance-wise than composing stream-map/filter/...




[^0]: it can be started/stopped at will

[^1]: I also realized why using lists as the intermediate sequence type 
is not such a big problem in Haskell and other lazy-by-default 
languages: they're actually streams!


[^2]: I also experimented with a hand-written stream-like type. Maybe 
it's possible to do better than streams but I dropped it for now.



[0]: https://git.sr.ht/~siiky/chicken-transducers/log/multiple-iterables
[1]: 
https://git.sr.ht/~siiky/chicken-transducers/commit/68f9d6d3faaf80e1b57c7bda5dec888b34c670dd
[2]: 
https://git.sr.ht/~siiky/experiments/tree/main/item/iterators/transducers-example.scm



PS: great work on the docs, Jeremy, they look very comprehensive!


siiky





Re: New egg: CHICKEN Transducers

2023-01-06 Thread siiky
For one, that's not the same code. :p The SRFI 42 code I showed loops 
over all possible pairs, doesn't zip the inputs.


Just occured to me that I could put one transducer inside the other, 
something like this:


(transduce list-fold
   (map (lambda (x)
  (transduce vector-fold
 (map (cute * x <>))
 collect-list
 vec)))
   collect-concat-list)

Assuming collect-concat-list concatenates the resulting list-of-lists. 
It's not as pretty but doable.






Re: New egg: CHICKEN Transducers

2023-01-06 Thread siiky

On 1/5/23 06:11, siiky wrote:
I think the easiest way for this would be to just zip the list together:

  (transduce list-fold
 (compose
   (zip-list lst2)
   (filter (lambda (p)
 (even? (* (car p) (cdr p))
 collect-list
 lst1)


For one, that's not the same code. :p The SRFI 42 code I showed loops 
over all possible pairs, doesn't zip the inputs.


But the biggest problem with zipping like that, is that then the 
transducer abstraction breaks -- zip has to be implemented for each 
input data structure _pair_(?). With SRFI 42 I could also do this:


(list-ec (:list x lst)
 (:vector y vec)
 (if (even? (* x y)))
 (cons x y))


FYI: Clojure's `sequence` function accepts varargs like `map` and also a
transducer which allows you to express this like so:

 (sequence (comp (map *) (filter odd?)) lst1 lst2)

The docstring describes it like this:

(...)

I think you could offer the same API for `transduce`. Of course, you'll
have to adjust the `map` transducer accordingly, too.


This wouldn't work very well with the current API in Scheme, because the 
fold is passed to transduce explicitly.  And given that, either a 
restriction must be imposed on the input data structures (they must all 
be of the same type); or the folder of each input data structure must be 
given. If the latter something like this would be kinda awkward (though 
not impossible, I guess):


(transduce
   list-fold vector-fold
   (map *)
   collect-list
   lst vec)

And accepting the optional sentinel with this new API would also be 
awkward to implement[^1]...



[^1]: This can be easily remied if the sentinel is given to the collect 
instead of the transduce, in which case the current def of transduce is 
a single case-lambda case instead of two:


(transduce fold (map *) (collect 'sentinel-value) data)





Re: New egg: CHICKEN Transducers

2023-01-06 Thread siiky

(...) My reasoning is that (intuitively, without having looked at
an impl of either) type-fold is easier to implement than
type-transduce (...)


Indeed, and this has been the reasoning in why I have left it for the 
most part. An egg like this is nothing if it cannot be extended easily. 
Noticing the distinction between implementing fold, collect, or some 
entire abstraction that tries to do both was something I realized about 
halfway through writing it (specifically, when I started adding fold / 
collect procedures outside of the list variants).



I took some time yesterday to read and try to understand the impl 
(transduce, list-fold, list-collect, vector-fold, range-fold, ...) and 
it is indeed pretty uncomplicated. Amazing how something this simple is 
also this useful!


After reading the larger chunks of the impl, transduce is actually just 
a bit of boiler plate around the type-fold. I wonder if it would be 
worth using `type-transduce` instead, if there's also a `transduce*` (or 
something) that takes care of that boiler plate.



(define ((transduce* fold) xform collect) ...)

(define (type-fold ...) ...)
(define type-transduce (transduce* type-fold))


On one hand, it's a bit of typing saved for very little extra 
abstraction; on the other, it's a bit of abstraction for very little 
extra typing... ^^'


I figured this criticism would come up. My thinking was that keeping 
collectors prefixed with `collect-` helps autocomplete and gives you an 
interesting symmetry:


     (transduce list-fold (compose ...) collect-vector (list ...))

Notice how you start with list- on the left and -vector on the right? 
The transducer is in the middle and so you have list-transducer-vector 
if you squint at it hard enough.


As long as there's a reason that makes sense I'm good. :)





Re: New egg: CHICKEN Transducers

2023-01-06 Thread Moritz Heidkamp
Hi,

On  5 January 2023 18:48 -07, Jeremy Steward wrote:

> On 1/5/23 06:11, siiky wrote:
> I think the easiest way for this would be to just zip the list together:
>
>  (transduce list-fold
> (compose
>   (zip-list lst2)
>   (filter (lambda (p)
> (even? (* (car p) (cdr p))
> collect-list
> lst1)


FYI: Clojure's `sequence` function accepts varargs like `map` and also a
transducer which allows you to express this like so:

(sequence (comp (map *) (filter odd?)) lst1 lst2)

The docstring describes it like this:

> When a transducer is supplied, returns a lazy sequence of applications
> of the transform to the items in coll(s), i.e. to the set of first
> items of each coll, followed by the set of second items in each coll,
> until any one of the colls is exhausted.  Any remaining items in other
> colls are ignored. The transform should accept number-of-colls
> arguments

I think you could offer the same API for `transduce`. Of course, you'll
have to adjust the `map` transducer accordingly, too.

Cheers
Moritz



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Chris Brannon
Jeremy Steward  writes:

> 2. Macro-based: This makes it pretty difficult to extend, especially
> if you don't have a clear "standard" or "winning" data structure that
> the community can rely on. Not that it's impossible, but I'd have an
> easier time as a user writing a custom fold procedure (and maybe a
> collect?) and working with transducers than extending eager
> comprehension macros to support new types.

I was reading through the transducers documentation today when this
jumped out at me.
"The fantastic thing about transducers is that they can be very easily
composed together."
Not to mention curried, partially-applied, and all the lovely
higher-order things Scheme makes possible.
I'm pretty much sold.

> 3. Aesthetics: related to syntax / macro based solutions, I have
> strongly religious (read that as: entirely subjective) reasons to
> prefer to keep my Scheme code mostly syntax free.

I get that.  Some really amazing things can be done with syntax, but
procedures are really the most fundamental building block.  Macros can
be, at least for me, devilishly difficult to debug.

-- Chris



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Jeremy Steward

On 1/5/23 09:45, Walter Lewis wrote:

On 1/4/23 8:48 PM, Jeremy Steward wrote:

And I've written a short blog post outlining some of my frustrations
that led me to writing this egg:



Happy to engage with the rest of the community on what the next
priority for such a library should be.


Great post. I would love to see a portable R7RS library, which you
mention as a possibility. For example, I am working on an R7RS
containers library and it would be straightforward to include
transducers as part of it.


Excellent. I'll keep this on the radar then. I'll probably need to read 
(or perhaps if I choose to whip myself enough, write) a guide for what 
needs to happen to go from CHICKEN -> R7RS.


I doubt it's too much, but I do use the check-errors egg so a portable 
shim is necessary, and I think I need whatever the fixnum SRFI is.


Either way, I'll post here to the list if and when that is completed.

Cheers,
--
Jeremy Steward



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Jeremy Steward

On 1/5/23 05:49, siiky wrote:

Awesomesauce, thanks for the egg and the post! This is a smnall-ish pain
point of mine too. I'm still processing the post as well, but I can tell
I'll start using this frequently!


That's great to hear. I appreciate that you enjoyed the post and will 
start using transducers.



First a couple of typos: In footnote 4 it should be `call/cc` and not
`call-cc`? And "its not the Scheme I reach for" :p


Oof. Thanks for doing the editing work for me. It should be updated now.


About question (1) of "Some open problems ...": IMO (transduce type-fold
...) is a good compromise, and it seems better to me than
(type-transduce ...) though it means a tiny bit more typing. My
reasoning is that (intuitively, without having looked at an impl of
either) type-fold is easier to implement than type-transduce, and thus
it should also be easier for someone to extend the transducers egg with
their own fold+collect rather than transduce+collect.  An usecase of
extending the transducers egg is e.g. if I have an egg where I define a
data structure foo and would like users of my egg to use transducers on
(with?) foo, but without having to change the transducers egg itself.


Indeed, and this has been the reasoning in why I have left it for the 
most part. An egg like this is nothing if it cannot be extended easily. 
Noticing the distinction between implementing fold, collect, or some 
entire abstraction that tries to do both was something I realized about 
halfway through writing it (specifically, when I started adding fold / 
collect procedures outside of the list variants).


There is always the possibility to use the wrong fold with the wrong 
datatype or some-such, but there's really no way for me to get around 
that without a full object system and types, so ¯\_(ツ)_/¯



Regarding (4) and strings: if we survived without transducers until now,
I think can survive until transducers for strings until C6 too and use
string->list+list-fold :) (I'm assuming UTF-8 will only become core in
C6, maybe I'm wrong)



If there were enough of a strong signal to just go UTF-8 only I'd 
probably just do that. The problem is that transduction happens on an 
item-by-item basis and I'm not sure that any string-fold will work the 
way you expect it to. In my experience you don't want individual unicode 
code-points, you want the full character.


I guess if I had a special transducer (similar to chunks or 
chunks-exact) and a collector to take individual code-points and 
generate a full grapheme, then I could get something that approximates 
useful unicode transducers.


But as it stands, I think anything regarding strings is going to be 
hairy and icky, because locales suck and internationalization means that 
any string logic will always be (somewhat) fragile.


My kingdom for a day in which users can use strings in any locale as 
easily as some of us abuse ASCII.



Small nitpick comment: I think it makes more sense to have either
type-fold+type-collect or fold-type+collect-type. It's more common in
Scheme to name type-operation so I'd give preference to that, but the
other is more natural and equally good IMO.


I figured this criticism would come up. My thinking was that keeping 
collectors prefixed with `collect-` helps autocomplete and gives you an 
interesting symmetry:


(transduce list-fold (compose ...) collect-vector (list ...))

Notice how you start with list- on the left and -vector on the right? 
The transducer is in the middle and so you have list-transducer-vector 
if you squint at it hard enough.


But that said: I'm completely willing to change that if I get enough 
people telling me they get it backwards and it's incoherent. Ping me so 
I can keep a count of how many times / how many individuals this trips up.


Regards, and thanks again for the comments, I really appreciate the 
feedback.

--
Jeremy Steward



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Jeremy Steward

On 1/5/23 06:11, siiky wrote:

There are some things I would still use SRFI 42 for, at least for now
that I'm new to transducers. For example:

(list-ec (:list x lst1)
   (:list y lst2)
   (if (even? (* x y)))
   (cons x y))

Which is similar to  (filter even (map * lst1 lst2))  but more
performant because there's no intermediate list. I have no idea how to
do this with transducers.



I think the easiest way for this would be to just zip the list together:

(transduce list-fold
   (compose
 (zip-list lst2)
 (filter (lambda (p)
   (even? (* (car p) (cdr p))
   collect-list
   lst1)

Thoughts?
--
Jeremy Steward



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Jeremy Steward

On 1/5/23 04:22, Chris Brannon wrote:

Jeremy Steward  writes:


And I've written a short blog post outlining some of my frustrations
that led me to writing this egg:




That was a fantastic post.  I'm still digesting it.  How about SRFI 42?
Here's your example from the transducer benchmark, written with SRFI 42:

(import srfi-42)
(time (fold-ec 0 (:range x 1) (if (odd? x)) (* x 3) +))


Thank you as well for the kind words. I am glad you enjoyed reading it.

You are not the first to bring up SRFI-42! There was someone on Mastodon 
earlier who made a similar point. I don't think it's a bad library, but 
it seemed to have a handful of things going against it:


1. Locked in stone: same argument for SRFI-171 really, although SRFI-42 
is a lot older and more battle-tested.


2. Macro-based: This makes it pretty difficult to extend, especially if 
you don't have a clear "standard" or "winning" data structure that the 
community can rely on. Not that it's impossible, but I'd have an easier 
time as a user writing a custom fold procedure (and maybe a collect?) 
and working with transducers than extending eager comprehension macros 
to support new types.


A corollary to that: transducers are a bit more general because they 
separate the collection from the primary form (transduce, or fold-ec in 
your example). This isn't really a strong argument to dismiss it though.


3. Aesthetics: related to syntax / macro based solutions, I have 
strongly religious (read that as: entirely subjective) reasons to prefer 
to keep my Scheme code mostly syntax free. Transducers are a bit more 
"functional programming" in some ways (we still use set! though?), which 
is acceptable to me.


To put it in perspective: if the SRFI community said "we're not writing 
new fold operations, no more (f k x) nonsense, and everyone has to learn 
SRFI-42 or write their procedures in their own Scheme modules" I'd 
probably still find that acceptable. The thing is that eager 
comprehensions aren't really pointed out or taught or talked about 
really all that often, especially with regards to newbies learning the 
language, or to try and guide "best practices" in Scheme. Of course, we 
don't want to tell people too much how to program, but I think 
encouraging code that is more performant or easier to "fix" (or debug) 
when things go wrong is probably better than not.


So if I had to decide between transducers and SRFI-42, I'd be sad 
because both work well. But I'm happy that we can choose either. I just 
have a stronger preference for the transducers approach (and 
|collect-vector| variants are extremely underrated relative to how 
useful I believe they are).



And here is a comparison of performance characteristics:

Script started on 2023-01-05 03:14:57-08:00 [TERM="screen.linux" TTY="/dev/pts/4" 
COLUMNS="80" LINES="30"]
$ ./transducer-bench
12.278s CPU time, 0.002s GC time (major), 1/108400 GCs (major/minor), maximum 
live heap: 879.61 KiB
$ ./srfi-42-bench
3.5s CPU time, 0/70231 GCs (major/minor), maximum live heap: 478.02 KiB
$ exit


Oof, I'm surprised to see such a differential. On my machine it's still 
faster, but it's:


transducers: 3.540s
srfi-42: 1.623s

Still though, looks like I have my work cut out for me.
--
Jeremy Steward



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Jeremy Steward

On 1/5/23 00:44, Peter Bex wrote:

And I've written a short blog post outlining some of my frustrations that
led me to writing this egg:




Excellent observations, especially regarding the fold argument inconsistency,
that's so cringeworthy it isn't even funny.  I thoroughly enjoyed reading
the post.  Even if it's critical of Scheme, it's done in a loving way,
as you mention ;)  I also agree with you that SRFI has probably had its
best time.


Thanks for the kind words, I do hope it is recognized that what I am 
saying in the post is with love for Scheme. The fold thing gets me every 
time in every language I ever work in, so it was somewhat cathartic to 
write that down.



Happy to engage with the rest of the community on what the next priority for
such a library should be. My hope is that this library helps us all move
away from XXX-map and XXX-filter procedures for each individual type.


Well, like I said I kinda prefer having the type-specific procedures.
I rarely deal with situations where I don't know the type on which I'm
operating.  I usually treat generic code with suspicion, as it's not
going to be as fast as it could be, and higher-order code tends to be
more difficult to follow than necessary.


Well, the final transducers produced /are/ type-specific. The input and 
output types are parameterized by the folder and collector, 
respectively. This is a bit different than Clojure where protocols are 
used to paper over the types entirely. E.g.


(transduce list-fold
   (map add1)
   (collect-vector)
   (list 1 2 3))

You will get a type error here from |list-fold| if you pass in a vector 
in place of a list in the last position, for example. Likewise, 
|collect-vector| can only ever collect a vector, and how fast it is 
depends on the size & number of re-allocations (hence why it collects in 
amortized O(1) time, and has an optional size-hint).


I tried my best to avoid any of the higher-level performance pitfalls 
that would come about with transducers. Of course, this doesn't mean bad 
code can't be subbed in, but this is a library with versions and an 
issue tracker and a repo, not just a spec. :) I fully welcome anybody to 
tell me how to make things more performant or easier to debug if there's 
good ideas lying around.


You are right though that higher-order code does tend towards being more 
difficult to follow. My hope is that as long as the parameterizations in 
transducers (as they currently exist) remain explicit that this can be 
mostly avoided.



However, transducers offer one thing that the type-specific procedures
do not - pipelining without accumulating lots of intermediate results.
For me that's the killer feature of transducers.


Yes, the ability to avoid intermediates is perhaps the main advantage. I 
also think that the fact that collection is distinct from folding and 
transducing is a separate advantage -- e.g. if you know that you need a 
vector at the end there's no need to worry about whether that aligns 
with the input / processing steps.


Regards, and thanks again for the thoughtful reply,
--
Jeremy Steward



Re: New egg: CHICKEN Transducers

2023-01-05 Thread Walter Lewis

On 1/4/23 8:48 PM, Jeremy Steward wrote:
And I've written a short blog post outlining some of my frustrations 
that led me to writing this egg:




Happy to engage with the rest of the community on what the next 
priority for such a library should be.


Great post. I would love to see a portable R7RS library, which you 
mention as a possibility. For example, I am working on an R7RS 
containers library and it would be straightforward to include 
transducers as part of it.


Best,

Walter




Re: New egg: CHICKEN Transducers

2023-01-05 Thread siiky

How about SRFI 42?


I wrote a little bit in #chicken but think it should be added here as well.


Pros of SRFI 42:

+ It's short and clear. It feels very natural to me. Maybe just because 
I've used it for some time while I've just been introduced to 
transducers now, but I have the impression that's not it.


Pros of transducers (the egg):

+ Seems to be more generic: the operation and the resulting data 
structure are decoupled.
+ Seems to be easier to implement/understand (based on the examples of 
the post): look at "Why this whole :dispatched business?" from the SRFI 
42 doc[0] to compare with a `type-fold`, or the sample implementation[1].



There are some things I would still use SRFI 42 for, at least for now 
that I'm new to transducers. For example:


(list-ec (:list x lst1)
 (:list y lst2)
 (if (even? (* x y)))
 (cons x y))

Which is similar to  (filter even (map * lst1 lst2))  but more 
performant because there's no intermediate list. I have no idea how to 
do this with transducers.



[0]: https://srfi.schemers.org/srfi-42/srfi-42.html
[1]: https://srfi.schemers.org/srfi-42/ec.scm





Re: New egg: CHICKEN Transducers

2023-01-05 Thread siiky
Awesomesauce, thanks for the egg and the post! This is a smnall-ish pain 
point of mine too. I'm still processing the post as well, but I can tell 
I'll start using this frequently!


First a couple of typos: In footnote 4 it should be `call/cc` and not 
`call-cc`? And "its not the Scheme I reach for" :p



About question (1) of "Some open problems ...": IMO (transduce type-fold 
...) is a good compromise, and it seems better to me than 
(type-transduce ...) though it means a tiny bit more typing. My 
reasoning is that (intuitively, without having looked at an impl of 
either) type-fold is easier to implement than type-transduce, and thus 
it should also be easier for someone to extend the transducers egg with 
their own fold+collect rather than transduce+collect.  An usecase of 
extending the transducers egg is e.g. if I have an egg where I define a 
data structure foo and would like users of my egg to use transducers on 
(with?) foo, but without having to change the transducers egg itself.


Regarding (4) and strings: if we survived without transducers until now, 
I think can survive until transducers for strings until C6 too and use 
string->list+list-fold :) (I'm assuming UTF-8 will only become core in 
C6, maybe I'm wrong)



Small nitpick comment: I think it makes more sense to have either 
type-fold+type-collect or fold-type+collect-type. It's more common in 
Scheme to name type-operation so I'd give preference to that, but the 
other is more natural and equally good IMO.



Thanks,
siiky





Re: New egg: CHICKEN Transducers

2023-01-05 Thread Chris Brannon
Jeremy Steward  writes:

> And I've written a short blog post outlining some of my frustrations
> that led me to writing this egg:
>
> 

That was a fantastic post.  I'm still digesting it.  How about SRFI 42?
Here's your example from the transducer benchmark, written with SRFI 42:

(import srfi-42)
(time (fold-ec 0 (:range x 1) (if (odd? x)) (* x 3) +))

And here is a comparison of performance characteristics:

Script started on 2023-01-05 03:14:57-08:00 [TERM="screen.linux" 
TTY="/dev/pts/4" COLUMNS="80" LINES="30"]
$ ./transducer-bench
12.278s CPU time, 0.002s GC time (major), 1/108400 GCs (major/minor), maximum 
live heap: 879.61 KiB
$ ./srfi-42-bench
3.5s CPU time, 0/70231 GCs (major/minor), maximum live heap: 478.02 KiB
$ exit

Script done on 2023-01-05 03:15:28-08:00 [COMMAND_EXIT_CODE="0"]

-- Chris



Re: New egg: CHICKEN Transducers

2023-01-04 Thread Peter Bex
On Wed, Jan 04, 2023 at 06:48:56PM -0700, Jeremy Steward wrote:
> I've been somewhat bothered by the fragmentation in a certain aspect of
> Scheme / Lisp: notably that there isn't really something akin to Rust's
> Iterator trait in Scheme, and as a result working across various collections
> and data types is a pain.

Arguably that's a feature - in Scheme you'll always be sure of the
performance characteristics of the type you're working with, and you'll
always know what type you're working with.  Also, by not having generics
for collection types you avoid quite a performance overhead related to
dispatch. But I understand many people disagree that this is desirable.

> So I introduce to you - Transducers! These are very similar to Clojure's
> transducers, but add one more parameter: a fold procedure. I was unhappy
> with SRFI-158 (Generators & Accumulators) and also unhappy with SRFI-171
> (mostly because of how sparse the API is).

This looks quite nice and well done!

> And I've written a short blog post outlining some of my frustrations that
> led me to writing this egg:
> 
> 

Excellent observations, especially regarding the fold argument inconsistency,
that's so cringeworthy it isn't even funny.  I thoroughly enjoyed reading
the post.  Even if it's critical of Scheme, it's done in a loving way,
as you mention ;)  I also agree with you that SRFI has probably had its
best time.

Besides just being "etched in stone", the SRFI process has been hijacked
by a small group of people and is producing SRFIs at breakneck speed with
perhaps not enough community input.  Usually by the time I learn about a
SRFI about a subject I find interesting, it's already finalized ;)
That makes SRFIs rather "take it or leave it".  And often they're not
especially ergonomic either.

> Happy to engage with the rest of the community on what the next priority for
> such a library should be. My hope is that this library helps us all move
> away from XXX-map and XXX-filter procedures for each individual type.

Well, like I said I kinda prefer having the type-specific procedures.
I rarely deal with situations where I don't know the type on which I'm
operating.  I usually treat generic code with suspicion, as it's not
going to be as fast as it could be, and higher-order code tends to be
more difficult to follow than necessary.

However, transducers offer one thing that the type-specific procedures
do not - pipelining without accumulating lots of intermediate results.
For me that's the killer feature of transducers.
 
> First things first: I'll probably keep trying to add more common data type
> support (SRFI 146 mappings and SRFI 69 hash-tables come to mind) as I can.
> Let me know if you find this useful!

Sounds good - the more native types are supported, the more useful a
"generic" transducer is.

Cheers,
Peter


signature.asc
Description: PGP signature


Re: New egg: CHICKEN Transducers

2023-01-04 Thread Mario Domenech Goulart
On Wed, 4 Jan 2023 18:48:56 -0700 Jeremy Steward  wrote:

> I've been somewhat bothered by the fragmentation in a certain aspect
> of Scheme / Lisp: notably that there isn't really something akin to
> Rust's Iterator trait in Scheme, and as a result working across
> various collections and data types is a pain.
>
> So I introduce to you - Transducers! These are very similar to
> Clojure's transducers, but add one more parameter: a fold procedure. I
> was unhappy with SRFI-158 (Generators & Accumulators) and also unhappy
> with SRFI-171 (mostly because of how sparse the API is).
>
> test-new-egg passes with the following release-info:
>
> 
>
> I've already added extensive documentation to the wiki page:
>
> 
>
> And I've written a short blog post outlining some of my frustrations
> that led me to writing this egg:
>
> 
>
> Happy to engage with the rest of the community on what the next
> priority for such a library should be. My hope is that this library
> helps us all move away from XXX-map and XXX-filter procedures for each
> individual type.
>
> First things first: I'll probably keep trying to add more common data
> type support (SRFI 146 mappings and SRFI 69 hash-tables come to mind)
> as I can. Let me know if you find this useful!

Many thanks, Jeremy.  Your egg has been added to the coop.

All the best.
Mario
-- 
http://parenteses.org/mario



New egg: CHICKEN Transducers

2023-01-04 Thread Jeremy Steward

Hey all,

I've been somewhat bothered by the fragmentation in a certain aspect of 
Scheme / Lisp: notably that there isn't really something akin to Rust's 
Iterator trait in Scheme, and as a result working across various 
collections and data types is a pain.


So I introduce to you - Transducers! These are very similar to Clojure's 
transducers, but add one more parameter: a fold procedure. I was unhappy 
with SRFI-158 (Generators & Accumulators) and also unhappy with SRFI-171 
(mostly because of how sparse the API is).


test-new-egg passes with the following release-info:



I've already added extensive documentation to the wiki page:



And I've written a short blog post outlining some of my frustrations 
that led me to writing this egg:




Happy to engage with the rest of the community on what the next priority 
for such a library should be. My hope is that this library helps us all 
move away from XXX-map and XXX-filter procedures for each individual type.


First things first: I'll probably keep trying to add more common data 
type support (SRFI 146 mappings and SRFI 69 hash-tables come to mind) as 
I can. Let me know if you find this useful!


Cheers,
--
Jeremy Steward