Re: %default-port-conversion-strategy and string ports

2012-06-03 Thread Daniel Krueger
Hey,

I don't want to comment on what guile should choose to do in the
future but just wanted to say which interface would be clear to me.

In the first place I agree that ports should be separated and not
mixed textual/binary (mind, I know it may be that this can't be
changed that easily in guile). There you either work with characters
and don't mind how they are encoded, or you work on binary data, which
has some encoding, whether it is for letters or something else like rgb
data etc.
Then there are procedures which explicitly convert numbers<->chars
and bytevectors<->strings, which are partly there (number->char and
char->number afaik), but which lack the ability to specify which
encoding they should use (I don't know if they have to use a specific
encoding in rnrs).
And the last is, you have to have open-string-input/output-port, which
opens a textual port and open-bytevector-input/output-port, which
opens a binary port. So if you want to work with the encoded
characters of a string you say `(open-bytevector-input-port
(string->bytevector "my string" (character-encoding UTF-8)))`.
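
A minimal sketch of that last step in Guile, assuming UTF-8 is the wanted encoding. It uses `string->utf8` from (rnrs bytevectors) instead of the `(character-encoding UTF-8)` spelling above, which is illustrative only. The helper name `string->utf8-via-port` is hypothetical.

```scheme
(use-modules (rnrs bytevectors)      ; string->utf8, utf8->string
             (ice-9 binary-ports))   ; open-bytevector-input-port, get-bytevector-all

;; Encode the string explicitly, then read the bytes back through a
;; binary port.  No port encoding is involved at any point.
(define (string->utf8-via-port s)
  (let ((port (open-bytevector-input-port (string->utf8 s))))
    (get-bytevector-all port)))

(utf8->string (string->utf8-via-port "my string"))   ; => "my string"
```

The encoding decision sits in one visible call (`string->utf8`) rather than in a fluid that happens to be in effect when the port is opened.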

WDYT?

- Daniel

On Sat, Jun 2, 2012 at 4:55 PM, David Kastrup d...@gnu.org wrote:
 l...@gnu.org (Ludovic Courtès) writes:

 Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:
 Ports in Guile can be used to write characters, or bytes, or both.  In
 particular, every port (including string ports, void ports, etc.) has an
 “encoding”, which is actually only used for textual I/O.

 Conversely, an R6RS port is either textual or binary, but not both.

 IMO, one advantage of mixed text/binary ports is to allow things like this:

   scheme@(guile-user)> (define (string->utf16 s)
                          (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                     (open-input-string s))))
                            (get-bytevector-all p)))
   scheme@(guile-user)> (string->utf16 "hello")
   $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
   scheme@(guile-user)> (use-modules (rnrs bytevectors))
   scheme@(guile-user)> (utf16->string $4)
   $5 = "hello"

 IMHO, this is a bad hack that exposes internal details of our
 implementation of string ports

 Which details?

 It exposes the fact that ports in general are mixed textual/binary, but
 nothing specific to string ports AFAICS.

 If I can't rely on

 (string= (with-output-to-string (display x)) x)

 then the interface is seriously rotten.  There is lots of code around
 that depends on the ability to bounce material between strings and the
 default output port.

 We’d have to dig r6rs-discuss, but my recollection is that there were
 arguments both in favor and against separate binary/textual ports.

 The question binary/textual concerns ports connected to a file.  String
 ports and Scheme ports should be _transparent_: input and output
 identical.  They are used for connecting character streams within Scheme
 and should not tamper with them.

 --
 David Kastrup





Re: %default-port-conversion-strategy and string ports

2012-06-02 Thread Ludovic Courtès
Hi Mark,

Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:
 Ports in Guile can be used to write characters, or bytes, or both.  In
 particular, every port (including string ports, void ports, etc.) has an
 “encoding”, which is actually only used for textual I/O.

 Conversely, an R6RS port is either textual or binary, but not both.

 IMO, one advantage of mixed text/binary ports is to allow things like this:

   scheme@(guile-user)> (define (string->utf16 s)
                          (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                     (open-input-string s))))
                            (get-bytevector-all p)))
   scheme@(guile-user)> (string->utf16 "hello")
   $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
   scheme@(guile-user)> (use-modules (rnrs bytevectors))
   scheme@(guile-user)> (utf16->string $4)
   $5 = "hello"

 IMHO, this is a bad hack that exposes internal details of our
 implementation of string ports

Which details?

It exposes the fact that ports in general are mixed textual/binary, but
nothing specific to string ports AFAICS.

 and whose only utility AFAIK is to (partially) make up for our lack
 of a proper 'iconv' interface from Scheme.

Well, ‘string->pointer’ and ‘scm_to_stringn’ can be thought of as an
iconv interface, no?

I understand you’re in favor of separate textual/binary ports.  It
may have its advantages; yet, it’s not clear to me that mixed
binary/textual ports are a kludge either.

We’d have to dig r6rs-discuss, but my recollection is that there were
arguments both in favor and against separate binary/textual ports.

Last but not least, mixed ports were decided a couple of years ago as a
reasonable option to allow a smooth transition from 1.8 where the API
was already mixed (think ‘write’, vs. ‘uniform-vector-write’,
vs. ‘scm_c_write’, etc.)  It may not be ideal, but I’m convinced it’s a
good compromise.  It’s not all black and white.

[...]

 I guess that your proposed solution (to make
 SRFI-6 export alternative versions of 'open-input-string' and
 'open-output-string' that always use UTF-8 for the port encoding) is the
 best we can do now.

OK, I’ll do that.

 I have concerns about this solution, but I can't think of a better one,
 given the unfortunate fact that our current semantics have been widely
 deployed (and worse, that the above hack has been advertised on
 guile-user as a way to work around our lack of 'iconv' from Scheme).

Again, I’ve advertised it because I find it useful, and because I don’t
consider it such an ugly kludge.

Thanks,
Ludo’.



Re: %default-port-conversion-strategy and string ports

2012-06-02 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

 Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:
 Ports in Guile can be used to write characters, or bytes, or both.  In
 particular, every port (including string ports, void ports, etc.) has an
 “encoding”, which is actually only used for textual I/O.

 Conversely, an R6RS port is either textual or binary, but not both.

 IMO, one advantage of mixed text/binary ports is to allow things like this:

   scheme@(guile-user)> (define (string->utf16 s)
                          (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                     (open-input-string s))))
                            (get-bytevector-all p)))
   scheme@(guile-user)> (string->utf16 "hello")
   $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
   scheme@(guile-user)> (use-modules (rnrs bytevectors))
   scheme@(guile-user)> (utf16->string $4)
   $5 = "hello"

 IMHO, this is a bad hack that exposes internal details of our
 implementation of string ports

 Which details?

 It exposes the fact that ports in general are mixed textual/binary, but
 nothing specific to string ports AFAICS.

If I can't rely on

(string= (with-output-to-string (display x)) x)

then the interface is seriously rotten.  There is lots of code around
that depends on the ability to bounce material between strings and the
default output port.
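
For reference, `with-output-to-string` takes a thunk, so a runnable form of this check wraps the `display` call in a lambda. The sketch below is one assumed way to phrase the invariant as a predicate; `round-trips?` is a hypothetical name. Whether it holds for non-ASCII strings on a given Guile depends on the string port encoding behavior debated in this thread.

```scheme
;; #t when displaying S to a string port yields S again, unchanged.
(define (round-trips? s)
  (string=? (with-output-to-string (lambda () (display s))) s))

(round-trips? "hello")   ; => #t
```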

 We’d have to dig r6rs-discuss, but my recollection is that there were
 arguments both in favor and against separate binary/textual ports.

The question binary/textual concerns ports connected to a file.  String
ports and Scheme ports should be _transparent_: input and output
identical.  They are used for connecting character streams within Scheme
and should not tamper with them.

-- 
David Kastrup




Re: %default-port-conversion-strategy and string ports

2012-06-01 Thread Ludovic Courtès
Hi!

Mark H Weaver m...@netris.org skribis:

 SRFI-6 (string ports) says nothing about port encodings, and yet
 portable code written for SRFI-6 will fail on Guile 2.0 unless the
 string is constrained to whatever the default port encoding happens to
 be.  This is not just a theoretical issue; it has caused trouble in
 practice, e.g.:

   http://bugs.gnu.org/11197

Hey, there’s a patch for SRFI-6 there.  Could we resume the discussion
in that bug?

Guile ports are mixed textual/binary ports.  Whether this or separate
binary/textual ports as in R6 is best is an interesting question, but as
you note, we cannot really change that currently.

Thanks,
Ludo’.



Re: %default-port-conversion-strategy and string ports

2012-06-01 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

 Hi!

 Mark H Weaver m...@netris.org skribis:

 SRFI-6 (string ports) says nothing about port encodings, and yet
 portable code written for SRFI-6 will fail on Guile 2.0 unless the
 string is constrained to whatever the default port encoding happens to
 be.  This is not just a theoretical issue; it has caused trouble in
 practice, e.g.:

   http://bugs.gnu.org/11197

 Hey, there’s a patch for SRFI-6 there.  Could we resume the discussion
 in that bug?

 Guile ports are mixed textual/binary ports.  Whether this or separate
 binary/textual ports as in R6 is best is an interesting question, but as
 you note, we cannot really change that currently.

I don't understand this distinction.  A port transfers characters, like
strings contain characters.  The relation is 1:1.  The question of
encoding only concerns ports connected to a file, or a terminal, and
then textual/binary is a question of encoding/decoding.  A port that
stays within Guile has no business being concerned with encoding.  It
has to reproduce the characters from its input to its output without
change.

Things are complicated enough talking to the outside.  There is no point
in Guile being confused even when talking to itself.

-- 
David Kastrup




Re: %default-port-conversion-strategy and string ports

2012-06-01 Thread Ludovic Courtès
Hi David,

Ports in Guile can be used to write characters, or bytes, or both.  In
particular, every port (including string ports, void ports, etc.) has an
“encoding”, which is actually only used for textual I/O.

Conversely, an R6RS port is either textual or binary, but not both.

IMO, one advantage of mixed text/binary ports is to allow things like this:

  scheme@(guile-user)> (define (string->utf16 s)
                         (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                    (open-input-string s))))
                           (get-bytevector-all p)))
  scheme@(guile-user)> (string->utf16 "hello")
  $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
  scheme@(guile-user)> (use-modules (rnrs bytevectors))
  scheme@(guile-user)> (utf16->string $4)
  $5 = "hello"

Thanks,
Ludo’.




Re: %default-port-conversion-strategy and string ports

2012-06-01 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes:
 Ports in Guile can be used to write characters, or bytes, or both.  In
 particular, every port (including string ports, void ports, etc.) has an
 “encoding”, which is actually only used for textual I/O.

 Conversely, an R6RS port is either textual or binary, but not both.

 IMO, one advantage of mixed text/binary ports is to allow things like this:

   scheme@(guile-user)> (define (string->utf16 s)
                          (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                     (open-input-string s))))
                            (get-bytevector-all p)))
   scheme@(guile-user)> (string->utf16 "hello")
   $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
   scheme@(guile-user)> (use-modules (rnrs bytevectors))
   scheme@(guile-user)> (utf16->string $4)
   $5 = "hello"

IMHO, this is a bad hack that exposes internal details of our
implementation of string ports, and whose only utility AFAIK is to
(partially) make up for our lack of a proper 'iconv' interface from
Scheme.  If we could enable this hack without compromising the
robustness of the _primary_ use case for string ports, then I would
merely frown at this leak of internal implementation details.
Unfortunately, it's worse than that.
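
For the UTF family specifically, the same bytes can be had without touching any port encoding, via the R6RS procedures in (rnrs bytevectors). This is only a sketch of that narrower case; it does not cover arbitrary iconv encodings, which is the gap being discussed.

```scheme
(use-modules (rnrs bytevectors))   ; string->utf16, utf16->string, endianness

;; Explicit big-endian UTF-16 encoding, no string port involved.
(define encoded (string->utf16 "hello" (endianness big)))
;; encoded => #vu8(0 104 0 101 0 108 0 108 0 111)

(define decoded (utf16->string encoded (endianness big)))
;; decoded => "hello"
```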

Anyway, I suppose that we are now locked into this broken behavior, at
least for Guile 2.0.  I guess that your proposed solution (to make
SRFI-6 export alternative versions of 'open-input-string' and
'open-output-string' that always use UTF-8 for the port encoding) is the
best we can do now.
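
A rough sketch of what such alternative versions could look like, assuming they simply bind `%default-port-encoding` to UTF-8 while the port is created. The `/utf8` names are hypothetical; the thread does not say how the SRFI-6 variants are named or implemented.

```scheme
;; Hypothetical wrappers: the port is created while the default port
;; encoding is UTF-8, so it can carry any character.
(define (open-input-string/utf8 s)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-input-string s)))

(define (open-output-string/utf8)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (open-output-string)))

;; Round-trip a character outside Latin-1 (U+03BB, Greek small lambda).
(define lam (integer->char #x3BB))

(define out (open-output-string/utf8))
(write-char lam out)
(get-output-string out)                             ; one-character string, U+03BB
(read-char (open-input-string/utf8 (string lam)))   ; the character U+03BB
```

The fluid binding only matters at port creation time, which is why the wrappers can return the port and let the binding go out of scope.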

I have concerns about this solution, but I can't think of a better one,
given the unfortunate fact that our current semantics have been widely
deployed (and worse, that the above hack has been advertised on
guile-user as a way to work around our lack of 'iconv' from Scheme).

 Mark



Re: %default-port-conversion-strategy and string ports

2012-05-31 Thread Ludovic Courtès
Hi,

David Kastrup d...@gnu.org skribis:

 Shouldn't strings be in internal encoding anyway?  The whole point of
 a string is to be an array of characters.  Not an array of arbitrarily
 encoded bytes.

Yes, but I was referring to “string ports”, which may actually be fed
arbitrary binary data, not just characters.

Thanks,
Ludo’.




Re: %default-port-conversion-strategy and string ports

2012-05-31 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

 Hi,

 David Kastrup d...@gnu.org skribis:

 Shouldn't strings be in internal encoding anyway?  The whole point of
 a string is to be an array of characters.  Not an array of arbitrarily
 encoded bytes.

 Yes, but I was referring to “string ports”, which may actually be fed
 arbitrary binary data, not just characters.

How so?  A string is an array of characters.  Arbitrary binary data is
an array of bytes.  Merrily mixing the two is not going to lead to
consistent results: you are going to have things accidentally decoded
more than once or not at all, and accidentally encoded more than once or
not at all.
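
A small sketch of the "encoded more than once" failure, assuming UTF-8 as the encoding and a Latin-1 misreading of the bytes in between. The helper `latin1-decode` and the variable names are hypothetical.

```scheme
(use-modules (rnrs bytevectors))   ; string->utf8, bytevector->u8-list

;; Latin-1 decoding maps each byte to the code point of the same value.
(define (latin1-decode bv)
  (list->string (map integer->char (bytevector->u8-list bv))))

(define e-acute (string (integer->char #xE9)))       ; U+00E9, by code point

(define once  (string->utf8 e-acute))                ; #vu8(195 169)
(define twice (string->utf8 (latin1-decode once)))   ; #vu8(195 131 194 169)
```

Treating the already-encoded bytes as characters and encoding them again turns two bytes into four, and the original character is no longer recoverable by a single decode.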

Emacs _does_ have unibyte-string as a data structure of raw bytes for
efficiency reasons, but it is not clear that the hassle is worth it.
You _can_ read binary data into multibyte strings: non-utf-8 sequences
are then put into special code places so that they can be recovered
unchanged when encoding again.  So for strings, Emacs has the two kinds:
unibyte (raw data) and multibyte (conceptually an array of Unicode
characters in some hidden multibyte encoding incidentally quite close to
utf-8).

And that is all.  The rest is decoded (typically from unibyte) into
multibyte, and encoded back when writing it somewhere.  How are you even
supposed to deal with combining strings when they can be encoded
differently?

-- 
David Kastrup




Re: %default-port-conversion-strategy and string ports

2012-05-31 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes:
 David Kastrup d...@gnu.org skribis:

 Shouldn't strings be in internal encoding anyway?  The whole point of
 a string is to be an array of characters.  Not an array of arbitrarily
 encoded bytes.

 Yes, but I was referring to “string ports”, which may actually be fed
 arbitrary binary data, not just characters.

For the record, I agree with David.  String ports should be textual
ports, not binary ports.  In particular, you should be able to
write/read _any_ character to/from a string port, regardless of what the
current default port encoding happens to be.

SRFI-6 (string ports) says nothing about port encodings, and yet
portable code written for SRFI-6 will fail on Guile 2.0 unless the
string is constrained to whatever the default port encoding happens to
be.  This is not just a theoretical issue; it has caused trouble in
practice, e.g.:

  http://bugs.gnu.org/11197

Unfortunately, we are now in an awkward situation.  The current behavior
of Guile 2.0 is conceptually broken and breaks portable SRFI-6 code, and
yet it is possible that some users have grown to depend on our current
behavior.

 Mark



Re: %default-port-conversion-strategy and string ports

2012-05-30 Thread Mike Gran
 Second, in commit 9f6e3f5a997f484548bd03e7e7573c38a95c8d09, I changed
 string ports to honor it, like other port types, instead of forcing
 'error.  This seems like the right thing to me, for the sake of
 consistency (in fact, I’d consider the previous behavior as a bug), but
 it’s an observable change.

Sounds fair.  The 'error behavior was once coupled with UTF-8, making
the error mostly moot at the time, but, if string ports are honoring encoding
they should probably honor the conversion strategy as well.
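
As a sketch of how the fluid is meant to be used: `%default-port-conversion-strategy` can be bound with `with-fluids` in the same way as `%default-port-encoding`, and the Guile manual lists 'error, 'substitute and 'escape as the strategies. The helper name `call-with-conversion-strategy` is hypothetical, and what a string port created inside the thunk actually does with the setting depends on the Guile version.

```scheme
;; Run THUNK with STRATEGY as the default conversion strategy for
;; ports created during the call.
(define (call-with-conversion-strategy strategy thunk)
  (with-fluids ((%default-port-conversion-strategy strategy))
    (thunk)))

(call-with-conversion-strategy
 'substitute
 (lambda () (fluid-ref %default-port-conversion-strategy)))   ; => substitute
```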
 
-Mike



Re: %default-port-conversion-strategy and string ports

2012-05-30 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

 Hello!

 Commit b22e94db7c91d7661204e33f3bc2bfead002c9b7 adds
 ‘%default-port-conversion-strategy’, a natural friend of
 ‘%default-port-encoding’.

 First, I’m wondering whether ‘port’ should be part of the name, given
 that it’s also referred to by ‘scm_stringn’ & co.  It’s good to have it
 in the name, for the symmetry with ‘%default-port-encoding’, but it’s
 not accurate.

 Second, in commit 9f6e3f5a997f484548bd03e7e7573c38a95c8d09, I changed
 string ports to honor it, like other port types, instead of forcing
 'error.  This seems like the right thing to me, for the sake of
 consistency (in fact, I’d consider the previous behavior as a bug), but
 it’s an observable change.

 WDYT?

Shouldn't strings be in internal encoding anyway?  The whole point of
a string is to be an array of characters.  Not an array of arbitrarily
encoded bytes.

-- 
David Kastrup