Re: Playing with guile (vs python). Generate file for GDP suitable for gnuplot.

2017-01-30 Thread Alex Vong
Hello,

Germán Diago  writes:

> Hello everyone,
>
> I wrote a script that parses a file with GDP data since 1970 for many
> countries.  I filter the file, discard uninteresting fields, and then
> write the result in a format suitable for gnuplot.
>
Wow, you started programming before I was born!

> I did this in python and guile.
>
> In python it takes around 1.1 seconds in my raspberry pi.
>
> In Guile it is taking around 11 seconds.
>
> I do not claim they are doing exactly the same thing: in python I use arrays
> and dictionaries, while in guile I am using mainly lists.  I would like to
> know if you could give me advice on how to optimize it; I am just practicing
> for now.
>
I know very little about compiler optimization in general.  But I
remember a trick in guile: *top-level* definitions don't get inlined,
so you put everything in one big 'let', so that the compiler can do
its job.

For example, instead of:

  (define (f) (display "hello world"))
  (define (g) (newline))

  (f)
  (g)

you write:

  (let ()
    (define (f) (display "hello world"))
    (define (g) (newline))

    (f)
    (g))

In addition, in the guile REPL, you can see how your code is optimized
by using the ',opt' command. For example:

  ,opt (let ()
         (define (f) (display "hello world"))
         (define (g) (newline))

         (f)
         (g))

gives:

  $1 = (begin (display "hello world") (newline))
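
By the way, if you want to reproduce a profile like the one quoted below
from the REPL, the ',profile' meta-command (a front-end to statprof)
should do it.  Something like this, where the file name is only a
placeholder for your script:

  ,profile (load "gdp.scm")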

Hope these help!

> The scripts in both python and guile are attached and the profile data for
> scheme is below. Just place the .csv file in the same directory and it
> should generate an output file with the data ready for gnuplot :)
>
> % cumulative   self
> time   seconds seconds  name
>  26.24  3.45  3.43  %read-line
>  20.51  2.68  2.68  string->number
>  15.54  2.05  2.03  string-delete
>   7.39  7.75  0.97  map
>   5.13  3.96  0.67  transform-data
>   4.07  1.75  0.53  format:format-work
>   3.17  0.41  0.41  string=?
>   2.87  0.37  0.37  string-ref
>   1.81  2.50  0.24  tilde-dispatch
>   1.81  0.24  0.24  number->string
>   1.51  0.34  0.20  is-a-digit
>   1.06  0.28  0.14  anychar-dispatch
>   1.06  0.14  0.14  display
>   1.06  0.14  0.14  string-length
>   1.06  0.14  0.14  char>=?
>   1.06  0.14  0.14  char<=?
>   1.06  0.14  0.14  string-split
>   0.60  0.08  0.08  length
>   0.45  0.49  0.06  format:out-num-padded
>   0.45  0.06  0.06  remove-dots
>   0.30  0.04  0.04  %after-gc-thunk
>   0.30  0.04  0.04  list-tail
>   0.30  0.04  0.04  write-char
>   0.15  3.53  0.02  loop
>   0.15  3.47  0.02  read-line
>   0.15  0.02  0.02  substring
>   0.15  0.02  0.02  list-ref
>   0.15  0.02  0.02  reverse!
>   0.15  0.02  0.02  # (e)>
>   0.15  0.02  0.02  integer?
>   0.15  0.02  0.02  char=?
>   0.00 13.07  0.00  load-compiled/vm
>   0.00 13.07  0.00  # (thunk)>
>   0.00 13.07  0.00  # ()>
>   0.00 13.07  0.00  call-with-prompt
>   0.00 13.07  0.00  # ()>
>   0.00 13.07  0.00  apply-smob/1
>   0.00 13.07  0.00  catch
>   0.00 13.07  0.00  #
>   0.00 13.07  0.00  run-repl*
>   0.00 13.07  0.00  save-module-excursion
>   0.00 13.07  0.00  statprof
>   0.00 13.07  0.00  start-repl*
>   0.00 11.22  0.00  #
>   0.00  3.53  0.00  call-with-input-file
>   0.00  1.85  0.00  call-with-output-file
>   0.00  1.79  0.00  for-each
>   0.00  1.75  0.00  format
>   0.00  0.14  0.00  get-fields
>   0.00  0.10  0.00  # (year)>
>   0.00  0.06  0.00  #
>   0.00  0.02  0.00  format:out-obj-padded
>   0.00  0.02  0.00  remove
>   0.00  0.02  0.00  call-with-output-string




Re: guile can't find a chinese named file

2017-01-30 Thread Eli Zaretskii
> Date: Mon, 30 Jan 2017 20:42:38 + (UTC)
> From: Mike Gran 
> Cc: "guile-user@gnu.org" 
> 
> Earlier in the 2.0.x release series, Guile had a hack where it started
> up in a Latin-1 encoding, which would be capable of storing any
> 8-bit string of bytes, even if they weren't Latin-1.

Latin-1 has holes in the 0..255 range, so it isn't very appropriate in
this situation.

> And what was supposed to happen after setlocale was called?

What Emacs does is explicitly decode any variable produced until that
moment that is known to hold unibyte strings.

> As an aside, GTK and GLIB based applications often use a method where
> you may need to set the environment variable G_FILENAME_ENCODING
> if your filename encoding is different from your locale encoding.
> GTK/GLIB also likes to store strings internally as UTF-8, and will
> convert to UTF-8 from either the locale or the G_FILENAME_ENCODING-
> specified encoding.

Emacs stores all environment variables in their original
locale-specific encoding, as unibyte strings, and only decodes them
when they are actually used or handed to Lisp.



Re: guile can't find a chinese named file

2017-01-30 Thread Marko Rauhamaa
Eli Zaretskii :

>> From: Marko Rauhamaa 
>> 
>> UTF-8 beautifully bridges the interpretation gap between 8-bit character
>> strings and text. However, the interpretation step should be done in the
>> application and not in the programming language.
>
> You can't do that in an environment that specifically targets
> sophisticated multi-lingual text processing independent of the outside
> locale.  Unless you can interpret byte sequences as characters, you
> will be unable to even count characters in a range of text,

If you need to operate on Unicode text, have the application invoke the
UTF-8 (or locale-specific) decoder. However, have the application
request it instead of guessing that the environment is all Unicode.
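
A minimal sketch of what I mean (assuming Guile's (rnrs bytevectors)
module; the file name is just an example):

    (use-modules (rnrs bytevectors))

    ;; Pretend these bytes arrived from the operating system:
    (define raw-name (string->utf8 "名字.scm"))

    ;; The application, not the language runtime, decides when and with
    ;; which encoding those bytes are to be interpreted as text:
    (display (utf8->string raw-name))
    (newline)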

> You do need "other typesetting effects", naturally, but that doesn't
> mean you can get away without more or less full support of Unicode
> nowadays.

Do support it, fully even, but let the application invoke the
conversion when appropriate.

> You are talking about programming, but we should instead think about
> applications -- those of them which need to process text, or even
> access files, as this discussion shows, do need decent Unicode
> support.

Why should opening a file require Unicode support if the underlying
operating system knows nothing about Unicode? I can open any given
file in a tiny C program without any Unicode support, under Linux, that
is.

> E.g., users generally expect that decomposed and composed character
> sequences behave and are treated identically, although they are
> different byte-stream wise.

Linux begs to differ. Regardless of the locale, two different octet
sequences that ought to be equivalent UTF-8-wise will be considered
different pathnames under Linux.

I don't need a helicopter to walk across the street.
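
To make the path name point concrete, a sketch (assuming a Guile that
provides string-normalize-nfc, which 2.0 does as far as I know; the two
definitions spell "é" in its composed and decomposed forms):

   (define composed   (string (integer->char #xE9)))        ; U+00E9
   (define decomposed (string #\e (integer->char #x301)))   ; "e" + U+0301

   (string=? composed decomposed)                      ;; => #f
   (string=? (string-normalize-nfc composed)
             (string-normalize-nfc decomposed))        ;; => #t

Both spellings look the same to the user, but to the kernel they are two
distinct path names.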

>> But it is also causing unnecessary grief in the computer-computer
>> interface, where the classic textual naming and textual protocols
>> are actually cutely chosen octet-aligned binary formats.
>
> The universal acceptance of UTF-8 nowadays makes this much less of an
> issue, IME.

You are jumping the gun. Linux won't be there for a long time if ever.
Nothing prevents a pathname, or a command-line argument, or an
environment variable, or the standard input from containing illegal
UTF-8.

I also wouldn't like my SMTP server to throw a UTF-8 decoding exception
on parsing a command.

(Also note that even Windows allows pathnames with illegal Unicode in
them if I'm not mistaken.)


Marko



Re: guile can't find a chinese named file

2017-01-30 Thread Mike Gran

On Monday, January 30, 2017 12:00 PM, Eli Zaretskii  wrote:
> Actually, the need arises even sooner.  Consider how load-path is set
> up during startup: it starts with the directory from which Emacs was
> invoked, either from argv[0] or by looking up PATH.  Either way, you
> get a file name that is encoded in the locale-specific encoding.  Then
> you cons load-path by expanding file names relative to the startup
> directory.  So you immediately need to be able to create file names
> from directories, check whether a file exists and is a directory,
> etc. -- all of that before you even know in what locale you started,
> so you cannot decode these file names into the internal

> representation, before using them.

Earlier in the 2.0.x release series, Guile had a hack where it started
up in a Latin-1 encoding, which would be capable of storing any
8-bit string of bytes, even if they weren't Latin-1.  I was the author
of the first version of that hack.  Anyway, while it was technically
incorrect, it did get the job done for some of these locale-free
byte string problems.  It could open non-ASCII paths without really
having an encoding, if I recall correctly.


It was an uneasy middle ground, though.  Error messages with regard
to file names would be mojibake.  And string ports were a mess.  And
what was supposed to happen after setlocale was called?


As an aside, GTK and GLIB based applications often use a method where
you may need to set the environment variable G_FILENAME_ENCODING
if your filename encoding is different from your locale encoding.
GTK/GLIB also likes to store strings internally as UTF-8, and will
convert to UTF-8 from either the locale or the G_FILENAME_ENCODING-
specified encoding.

As another aside, OpenBSD removed support for non-UTF8 locales.


-Mike Gran



Re: guile can't find a chinese named file

2017-01-30 Thread Eli Zaretskii
> Date: Mon, 30 Jan 2017 21:32:41 +0200
> From: Eli Zaretskii 
> Cc: guile-user@gnu.org
> 
> > Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
> > (and Stephen considers them a complication and abomination that should
> > never have been left in Emacs), so it must be possible to get away
> > without them.
> 
> I doubt that's possible, at least not in general.  (You could get away
> if you assumed UTF-8 encoded file names.)  Some translation tables for
> some encodings must load files using the likes of load-path, and if
> that includes non-ASCII file names, you are screwed unless you can use
> unibyte strings.

Actually, the need arises even sooner.  Consider how load-path is set
up during startup: it starts with the directory from which Emacs was
invoked, either from argv[0] or by looking up PATH.  Either way, you
get a file name that is encoded in the locale-specific encoding.  Then
you cons load-path by expanding file names relative to the startup
directory.  So you immediately need to be able to create file names
from directories, check whether a file exists and is a directory,
etc. -- all of that before you even know in what locale you started,
so you cannot decode these file names into the internal
representation, before using them.



Re: guile can't find a chinese named file

2017-01-30 Thread Eli Zaretskii
> From: Marko Rauhamaa 
> Date: Mon, 30 Jan 2017 21:01:31 +0200
> Cc: guile-user@gnu.org
> 
> UTF-8 beautifully bridges the interpretation gap between 8-bit character
> strings and text. However, the interpretation step should be done in the
> application and not in the programming language.

You can't do that in an environment that specifically targets
sophisticated multi-lingual text processing independent of the outside
locale.  Unless you can interpret byte sequences as characters, you
will be unable to even count characters in a range of text, let alone
render it for display.  And you cannot request applications to do
those low-level chores.

> Support libraries for Unicode are naturally welcome.

Well, in that case Emacs core is one huge "support library".  And I
don't see why Guile couldn't be another one; it should, IMO.

> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects.

You do need "other typesetting effects", naturally, but that doesn't
mean you can get away without more or less full support of Unicode
nowadays.  You are talking about programming, but we should instead
think about applications -- those of them which need to process text,
or even access files, as this discussion shows, do need decent Unicode
support.  E.g., users generally expect that decomposed and composed
character sequences behave and are treated identically, although they
are different byte-stream wise.

> But it is also causing unnecessary grief in the computer-computer
> interface, where the classic textual naming and textual protocols
> are actually cutely chosen octet-aligned binary formats.

The universal acceptance of UTF-8 nowadays makes this much less of an
issue, IME.



Re: guile can't find a chinese named file

2017-01-30 Thread Eli Zaretskii
> From: David Kastrup 
> Cc: ma...@pacujo.net,  guile-user@gnu.org
> Date: Mon, 30 Jan 2017 20:00:03 +0100
> 
> Eli Zaretskii  writes:
> 
> > One other crucial detail is that Emacs also has unibyte strings
> > (arrays of bytes), which are necessary during startup, when Emacs
> > doesn't yet know how to decode non-ASCII strings.  Without that, you
> > wouldn't be able to start Emacs in a directory whose name includes
> > non-ASCII characters, because it couldn't access files it needs to
> > read to set up some of its decoding machinery.
> 
> Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
> (and Stephen considers them a complication and abomination that should
> never have been left in Emacs), so it must be possible to get away
> without them.

I doubt that's possible, at least not in general.  (You could get away
if you assumed UTF-8 encoded file names.)  Some translation tables for
some encodings must load files using the likes of load-path, and if
that includes non-ASCII file names, you are screwed unless you can use
unibyte strings.  That is why all Emacs primitives that accept file
names support both unibyte and multibyte strings as file names.

> And I don't think that the comparatively worse Mule implementation
> of XEmacs is due to that decision.

Emacs 20 vintage Mule didn't have all the sophisticated Unicode
support machinery we have today, so maybe for that subset the above
wasn't necessary.  Then again, Emacs couldn't be safely built or
started in a non-ASCII directory until just a few years ago, so
perhaps no one bothered to test that thoroughly with XEmacs, except in
ISO 2022 locales.



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
Marko Rauhamaa  writes:

> David Kastrup :
>
>> Marko Rauhamaa  writes:
>>> Guile's mistake was to move to Unicode strings in the operating system
>>> interface.
>>
>> Emacs uses a UTF-8 based encoding internally [...]
>
> C uses 8-bit characters. That is a model worth emulating.

That's Guile-1.8.  Guile-2 uses either Latin-1 or UCS-4 in its string
internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its
string port internals.

> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

Elisp is focused enough on text that I think its choice of going
UTF-8 internally with a Unicode character type is reasonably sane.  Its
strings (the quirky unibyte strings excluded) are its own variant of
UTF-8 internally, and its string port equivalent (buffers) are that same
variant of UTF-8.  And its API talks UTF-8 for strings, Unicode (or
higher) for characters, and it indexes strings and buffers via Unicode
character counts.  Not O(1), but with enough trickery that it works well
enough in practice.  If strings are to be implemented strictly
Scheme-standard-conforming, they need to be O(1) indexable.  The Scheme
standard is rather silent about Unicode however.  I am not sure that
sticking to the standard where it does not deal with reality is the best
choice.

I think the case for Guile-2 to _also_ support "unibyte strings" would
be considerably stronger than for Emacs (byte arrays and binary string ports
don't allow using Guile's string processing functions).  As it stands,
the design of Guile-2 in my book currently involves too many mandatory
conversions for just passing data around with Guile itself and
Guile-based applications.

> Support libraries for Unicode are naturally welcome.
>
> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects. But it is also causing unnecessary
> grief in the computer-computer interface, where the classic textual
> naming and textual protocols are actually cutely chosen octet-aligned
> binary formats.

Sometimes yes, sometimes not.  As long as Guile wants to be a
general-purpose programming and extension language, it should deal
reliably and robustly and reproducibly with whatever is thrown at it.
Its choice of libraries does not currently make it so, but that could be
fixed by either working on the (GNU) libraries or by giving Guile its
own implementation.

But that needs to be considered a priority.  Nobody will do this just
for fun and kicks.

-- 
David Kastrup



Re: guile can't find a chinese named file

2017-01-30 Thread Marko Rauhamaa
David Kastrup :

> Marko Rauhamaa  writes:
>> Guile's mistake was to move to Unicode strings in the operating system
>> interface.
>
> Emacs uses a UTF-8 based encoding internally [...]

C uses 8-bit characters. That is a model worth emulating.

UTF-8 beautifully bridges the interpretation gap between 8-bit character
strings and text. However, the interpretation step should be done in the
application and not in the programming language. Support libraries for
Unicode are naturally welcome.

Plain Unicode text is actually quite a rare programming need. It is
woefully inadequate for the human interface, which generally requires
numerous other typesetting effects. But it is also causing unnecessary
grief in the computer-computer interface, where the classic textual
naming and textual protocols are actually cutely chosen octet-aligned
binary formats.


Marko



Re: guile can't find a chinese named file

2017-01-30 Thread Eli Zaretskii
> From: David Kastrup 
> Date: Mon, 30 Jan 2017 19:32:14 +0100
> Cc: guile-user@gnu.org
> 
> Emacs uses a UTF-8 based encoding internally: basically, valid UTF-8 is
> represented as itself, there is a number of coding points beyond the
> actual limit of UTF-8 that is used for non-Unicode character sets, and
> single bytes not properly belonging to the read encoding are represented
> with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xbf (the
> latter two ranges are "overlong" encodings of 0x00...0x7f and
> consequently also not valid utf-8).

One other crucial detail is that Emacs also has unibyte strings
(arrays of bytes), which are necessary during startup, when Emacs
doesn't yet know how to decode non-ASCII strings.  Without that, you
wouldn't be able to start Emacs in a directory whose name includes
non-ASCII characters, because it couldn't access files it needs to
read to set up some of its decoding machinery.



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
Eli Zaretskii  writes:

>> From: David Kastrup 
>> Date: Mon, 30 Jan 2017 19:32:14 +0100
>> Cc: guile-user@gnu.org
>> 
>> Emacs uses a UTF-8 based encoding internally: basically, valid UTF-8 is
>> represented as itself, there is a number of coding points beyond the
>> actual limit of UTF-8 that is used for non-Unicode character sets, and
>> single bytes not properly belonging to the read encoding are represented
>> with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xbf (the
>> latter two ranges are "overlong" encodings of 0x00...0x7f and
>> consequently also not valid utf-8).
>
> One other crucial detail is that Emacs also has unibyte strings
> (arrays of bytes), which are necessary during startup, when Emacs
> doesn't yet know how to decode non-ASCII strings.  Without that, you
> wouldn't be able to start Emacs in a directory whose name includes
> non-ASCII characters, because it couldn't access files it needs to
> read to set up some of its decoding machinery.

Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
(and Stephen considers them a complication and abomination that should
never have been left in Emacs), so it must be possible to get away
without them.  And I don't think that the comparatively worse Mule
implementation of XEmacs is due to that decision.

-- 
David Kastrup



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
Marko Rauhamaa  writes:

> David Kastrup :
>
>> But at any rate, this cannot easily be fixed since Guile uses libraries
>> for encoding/decoding that cannot deal reproducibly with improper byte
>> patterns.
>
> Guile's mistake was to move to Unicode strings in the operating system
> interface.

Emacs uses a UTF-8 based encoding internally: basically, valid UTF-8 is
represented as itself, there is a number of coding points beyond the
actual limit of UTF-8 that is used for non-Unicode character sets, and
single bytes not properly belonging to the read encoding are represented
with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xbf (the
latter two ranges are "overlong" encodings of 0x00...0x7f and
consequently also not valid utf-8).

The result is that random binary files read as "utf-8" grow by less than
50% in the internal representation (0x00-0x7f gets represented as
itself, and 0x80-0xff gets encoded with two bytes only when not being a
part of a valid utf-8 sequence).  The internal representation has
several guarantees for processing.  And when reencoding to utf-8 as
output encoding, the input gets reconstructed perfectly even when it
wasn't actually utf-8 to start with.

Emacs does not use "Unicode strings in the operating system interface"
but rather has a number of explicit encodings:

file-name-coding-system is a variable defined in ‘C source code’.
Its value is nil

Documentation:
Coding system for encoding file names.
If it is nil, ‘default-file-name-coding-system’ (which see) is used.

On MS-Windows, the value of this variable is largely ignored if
‘w32-unicode-filenames’ (which see) is non-nil.  Emacs on Windows
behaves as if file names were encoded in ‘utf-8’.

[back]


Coding system for saving this buffer:
  U -- utf-8-emacs-unix (alias: emacs-internal)

Default coding system (for new files):
  U -- utf-8-unix (alias: mule-utf-8-unix)

Coding system for keyboard input:
  U -- utf-8-unix (alias: mule-utf-8-unix)

Coding system for terminal output:
  U -- utf-8-unix (alias: mule-utf-8-unix)

Coding system for inter-client cut and paste:
  nil
Defaults for subprocess I/O:
  decoding: U -- utf-8-unix (alias: mule-utf-8-unix)

  encoding: U -- utf-8-unix (alias: mule-utf-8-unix)


Priority order for recognizing coding systems when reading files:
  1. utf-8 (alias: mule-utf-8)
  2. iso-2022-7bit 
  3. iso-latin-1 (alias: iso-8859-1 latin-1)
  4. iso-2022-7bit-lock (alias: iso-2022-int-1)
  5. iso-2022-8bit-ss2 
  6. emacs-mule 
  7. raw-text 
  8. iso-2022-jp (alias: junet)
  9. in-is13194-devanagari (alias: devanagari)
  10. chinese-iso-8bit (alias: cn-gb-2312 euc-china euc-cn cn-gb gb2312)
  11. utf-8-auto 
  12. utf-8-with-signature 
  13. utf-16 
  14. utf-16be-with-signature (alias: utf-16-be)
  15. utf-16le-with-signature (alias: utf-16-le)
  16. utf-16be 
  17. utf-16le 
  18. japanese-shift-jis (alias: shift_jis sjis)
  19. chinese-big5 (alias: big5 cn-big5 cp950)
  20. undecided 

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

Particular coding systems specified for certain file names:

  OPERATION     TARGET PATTERN          CODING SYSTEM(s)
  ---------     --------------          ----------------
  File I/O  "\\.dz\\'"  (no-conversion . no-conversion)
"\\.txz\\'" (no-conversion . no-conversion)
"\\.xz\\'"  (no-conversion . no-conversion)
"\\.lzma\\'"(no-conversion . no-conversion)
"\\.lz\\'"  (no-conversion . no-conversion)
"\\.g?z\\'" (no-conversion . no-conversion)
"\\.\\(?:tgz\\|svgz\\|sifz\\)\\'"
(no-conversion . no-conversion)
"\\.tbz2?\\'"   (no-conversion . no-conversion)
"\\.bz2\\'" (no-conversion . no-conversion)
"\\.Z\\'"   (no-conversion . no-conversion)
"\\.elc\\'" utf-8-emacs
"\\.el\\'"  prefer-utf-8
"\\.utf\\(-8\\)?\\'"utf-8
"\\.xml\\'" xml-find-file-coding-system
"\\(\\`\\|/\\)loaddefs.el\\'"
(raw-text . raw-text-unix)
"\\.tar\\'" (no-conversion . no-conversion)
"\\.po[tx]?\\'\\|\\.po\\."
po-find-file-coding-system
"\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"
latexenc-find-file-coding-system
""  (undecided)
  Process I/O   nothing specified
  Network I/O   nothing specified

[back]


So in short: this is a rather complex domain.  And Elisp, as a
text-manipulating platform, has a whole lot of tools and 

Re: guile can't find a chinese named file

2017-01-30 Thread Marko Rauhamaa
David Kastrup :

> But at any rate, this cannot easily be fixed since Guile uses libraries
> for encoding/decoding that cannot deal reproducibly with improper byte
> patterns.

Guile's mistake was to move to Unicode strings in the operating system
interface.

> The problem here is that Guile cannot even deal with _properly_
> encoded UTF-8 file names on the command line.

Ok.


Marko



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

[...]

> However, in 2.0, the current locale is *not* installed; you have to
> either call ‘setlocale’ explicitly (like in C), or set this environment
> variable (info "(guile) Environment Variables"):
>
>   GUILE_INSTALL_LOCALE=1
>
> When you do that (and this will be the default in 2.2), things work as
> expected:

But shouldn't that be done temporarily by default when processing the
command line?  Or alternatively, shouldn't Guile just pass the command
line byte-transparently to the file open calls?

It seems strange that Guile is unable to just pass what it received to
the file open call: if it is in 8bit-mode, this should work, and if it
is in UTF-8 mode (and the error messages suggest that it is), this
should work as well.

-- 
David Kastrup




Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
Marko Rauhamaa  writes:

> David Kastrup :
>
>> Marko Rauhamaa  writes:
>>> l...@gnu.org (Ludovic Courtès):
 Guile assumes its command-line arguments are UTF-8-encoded and
 decodes them accordingly.
>>>
>>> I'm afraid that choice (which Python made, as well) was a bad one
>>> because Linux doesn't guarantee UTF-8 purity.
>>
>> Have you looked at the error messages? They are all perfect UTF-8. As
>> was the command line locale.
>
> I was responding to Ludovic.
>
>> Apparently, Guile can open the file just fine, and it sees the command
>> line just fine as encoded in utf-8.
>
> My problem is when it is not valid UTF-8.
>
>> So I really, really, really suggest that before people post their
>> theories that they actually bother cross-checking them with Guile.
>
> Well, execute these commands from bash:
>
>$ touch $'\xee'
>$ touch xyz
>$ ls -a
>.  ..  ''$'\356'  xyz

We are not talking about file names not encoded in UTF-8.  It is
well-known that Guile is unable to work with strings in UTF-8-encoding
when their byte-pattern is not valid UTF-8.

This is a red herring.  The problem is not that Guile is unable to deal
with badly encoded UTF-8 file names.  The problem is that Guile is
unable to deal with properly encoded UTF-8 file names when it is
supposed to execute them from the command line.

> Then, execute this guile program:
>
> 
> (let ((dir (opendir ".")))
>   (let loop ()
>     (let ((filename (readdir dir)))
>       (if (not (eof-object? filename))
>           (begin
>             (if (access? filename R_OK)
>                 (format #t "~s\n" filename))
>             (loop))))))
> 
>
> It outputs:
>
>".."
>"."
>"xyz"
>
> skipping a file. This is a security risk. Files like these appear easily
> when extracting zip files, for example.

I am surprised this does not just throw a bad encoding exception.

But at any rate, this cannot easily be fixed since Guile uses libraries
for encoding/decoding that cannot deal reproducibly with improper byte
patterns.

The problem here is that Guile cannot even deal with _properly_ encoded
UTF-8 file names on the command line.

-- 
David Kastrup




Re: guile can't find a chinese named file

2017-01-30 Thread Ludovic Courtès
Hey Dave!

David Kastrup  skribis:

> l...@gnu.org (Ludovic Courtès) writes:

[...]

>>> ERROR: In procedure open-file: No such file or directory:
>>> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"
>>
>> In C, argv is just an array of byte sequences, but in Guile,
>> (command-line) returns a list of strings, not a list of bytevectors.
>>
>> Guile decodes its arguments according to the encoding of the current
>> locale.  So if you’re in a UTF-8 locale (say, zn_CH.utf8 or en_US.utf8),
>> Guile assumes its command-line arguments are UTF-8-encoded and decodes
>> them accordingly.
>>
>> In the example above, it seems that the file name encoding was different
>> from the locale encoding, leading to this error.
>>
>> HTH!
>
> Did you actually test this?

Oops, let me clarify.

Command-line arguments are indeed decoded according to the locale
encoding (that’s commit ed4c3739668b4b111b38555b8bc101cb74c87c1c.)

When making a syscall like open(2), Guile converts strings to the locale
encoding.

However, in 2.0, the current locale is *not* installed; you have to
either call ‘setlocale’ explicitly (like in C), or set this environment
variable (info "(guile) Environment Variables"):

  GUILE_INSTALL_LOCALE=1

When you do that (and this will be the default in 2.2), things work as
expected:

--8<---cut here---start->8---
$ GUILE_INSTALL_LOCALE=1 guile λ.scm
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;   or pass the --no-auto-compile argument to disable.
;;; compiling /home/ludo/src/guile/λ.scm
;;; compiled /home/ludo/.cache/guile/ccache/2.0-LE-8-2.0/home/ludo/src/guile/λ.scm.go
hello λ!
$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER=fr_FR.utf8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
--8<---cut here---end--->8---
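
The explicit-setlocale route looks like this (a sketch; the actual
λ.scm isn't shown above, so take the body as a placeholder):

  ;; λ.scm
  (setlocale LC_ALL "")   ; install the locale from the environment,
                          ; like GUILE_INSTALL_LOCALE=1 does
  (display "hello λ!")
  (newline)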

Sorry for the confusion!

Ludo’.




Re: guile can't find a chinese named file

2017-01-30 Thread Marko Rauhamaa
David Kastrup :

> Marko Rauhamaa  writes:
>> l...@gnu.org (Ludovic Courtès):
>>> Guile assumes its command-line arguments are UTF-8-encoded and
>>> decodes them accordingly.
>>
>> I'm afraid that choice (which Python made, as well) was a bad one
>> because Linux doesn't guarantee UTF-8 purity.
>
> Have you looked at the error messages? They are all perfect UTF-8. As
> was the command line locale.

I was responding to Ludovic.

> Apparently, Guile can open the file just fine, and it sees the command
> line just fine as encoded in utf-8.

My problem is when it is not valid UTF-8.

> So I really, really, really suggest that before people post their
> theories that they actually bother cross-checking them with Guile.

Well, execute these commands from bash:

   $ touch $'\xee'
   $ touch xyz
   $ ls -a
   .  ..  ''$'\356'  xyz

Then, execute this guile program:


(let ((dir (opendir ".")))
  (let loop ()
    (let ((filename (readdir dir)))
      (if (not (eof-object? filename))
          (begin
            (if (access? filename R_OK)
                (format #t "~s\n" filename))
            (loop))))))


It outputs:

   ".."
   "."
   "xyz"

skipping a file. This is a security risk. Files like these appear easily
when extracting zip files, for example.


Marko



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
Marko Rauhamaa  writes:

> l...@gnu.org (Ludovic Courtès):
>
>> In C, argv is just an array of byte sequences, but in Guile,
>> (command-line) returns a list of strings, not a list of bytevectors.
>>
>> Guile decodes its arguments according to the encoding of the current
>> locale. So if you’re in a UTF-8 locale (say, zn_CH.utf8 or
>> en_US.utf8), Guile assumes its command-line arguments are
>> UTF-8-encoded and decodes them accordingly.
>>
>> In the example above, it seems that the file name encoding was
>> different from the locale encoding, leading to this error.
>
> I'm afraid that choice (which Python made, as well) was a bad one
> because Linux doesn't guarantee UTF-8 purity.

Have you looked at the error messages?  They are all perfect UTF-8.  As
was the command line locale.

Here, have another data point:

dak@lola:/usr/local/tmp/lilypond$ guile-2.0 /tmp/f♯.scm 
;;; Stat of /tmp/f?.scm failed:
;;; ERROR: In procedure stat: No such file or directory: "/tmp/f\u266f.scm"
Backtrace:
In ice-9/boot-9.scm:
 160: 8 [catch #t # ...]
In unknown file:
   ?: 7 [apply-smob/1 #]
In ice-9/boot-9.scm:
  66: 6 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 5 [eval # #]
In ice-9/boot-9.scm:
2404: 4 [save-module-excursion #]
4056: 3 [#]
1727: 2 [%start-stack load-stack ...]
1732: 1 [#]
In unknown file:
   ?: 0 [primitive-load "/tmp/f\u266f.scm"]

ERROR: In procedure primitive-load:
ERROR: In procedure open-file: No such file or directory: "/tmp/f\u266f.scm"
dak@lola:/usr/local/tmp/lilypond$ guile-2.0 
GNU Guile 2.0.13
Copyright (C) 1995-2016 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (open-input-
open-input-file    open-input-string
scheme@(guile-user)> (open-input-file "/tmp/f\u266f.scm")
$1 = #
scheme@(guile-user)> (open-input-file "/tmp/non-existent")
ERROR: In procedure open-file:
ERROR: In procedure open-file: No such file or directory: "/tmp/non-existent"

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> 

Apparently, Guile can open the file just fine, and it sees the command
line just fine as encoded in utf-8.

But during command line processing rather than afterwards, it fails
opening the file.

So I really, really, really suggest that before people post their
theories that they actually bother cross-checking them with Guile.

-- 
David Kastrup




Re: guile can't find a chinese named file

2017-01-30 Thread Marko Rauhamaa
l...@gnu.org (Ludovic Courtès):

> In C, argv is just an array of byte sequences, but in Guile,
> (command-line) returns a list of strings, not a list of bytevectors.
>
> Guile decodes its arguments according to the encoding of the current
> locale. So if you’re in a UTF-8 locale (say, zn_CH.utf8 or
> en_US.utf8), Guile assumes its command-line arguments are
> UTF-8-encoded and decodes them accordingly.
>
> In the example above, it seems that the file name encoding was
> different from the locale encoding, leading to this error.

I'm afraid that choice (which Python made, as well) was a bad one
because Linux doesn't guarantee UTF-8 purity.


Marko



Re: guile can't find a chinese named file

2017-01-30 Thread David Kastrup
l...@gnu.org (Ludovic Courtès) writes:

> Hi!
>
> Thomas Morley  skribis:
>
>> guile filename_名字.scm
>> ;;; Stat of /home/hermann/Desktop/filename_??.scm failed:
>> ;;; ERROR: In procedure stat: No such file or directory:
>> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"
>> Backtrace:
>> In ice-9/boot-9.scm:
>>  160: 8 [catch #t # ...]
>> In unknown file:
>>?: 7 [apply-smob/1 #]
>> In ice-9/boot-9.scm:
>>   66: 6 [call-with-prompt prompt0 ...]
>> In ice-9/eval.scm:
>>  432: 5 [eval # #]
>> In ice-9/boot-9.scm:
>> 2404: 4 [save-module-excursion #> ice-9/boot-9.scm:4051:3 ()>]
>> 4058: 3 [#]
>> 1727: 2 [%start-stack load-stack ...]
>> 1732: 1 [#]
>> In unknown file:
>>?: 0 [primitive-load "/home/hermann/Desktop/filename_\u540d\u5b57.scm"]
>>
>> ERROR: In procedure primitive-load:
>> ERROR: In procedure open-file: No such file or directory:
>> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"
>
> In C, argv is just an array of byte sequences, but in Guile,
> (command-line) returns a list of strings, not a list of bytevectors.
>
> Guile decodes its arguments according to the encoding of the current
> locale.  So if you’re in a UTF-8 locale (say, zn_CH.utf8 or en_US.utf8),
> Guile assumes its command-line arguments are UTF-8-encoded and decodes
> them accordingly.
>
> In the example above, it seems that the file name encoding was different
> from the locale encoding, leading to this error.
>
> HTH!

Did you actually test this?

dak@lola:/usr/local/tmp/lilypond$ locale
LANG=en_US.UTF-8
LANGUAGE=en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
dak@lola:/usr/local/tmp/lilypond$ touch /tmp/f♯.scm
dak@lola:/usr/local/tmp/lilypond$ guile-2.0 /tmp/f♯.scm
;;; Stat of /tmp/f?.scm failed:
;;; ERROR: In procedure stat: No such file or directory: "/tmp/f\u266f.scm"
Backtrace:
In ice-9/boot-9.scm:
 160: 8 [catch #t # ...]
In unknown file:
   ?: 7 [apply-smob/1 #]
In ice-9/boot-9.scm:
  66: 6 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 5 [eval # #]
In ice-9/boot-9.scm:
2404: 4 [save-module-excursion #]
4056: 3 [#]
1727: 2 [%start-stack load-stack ...]
1732: 1 [#]
In unknown file:
   ?: 0 [primitive-load "/tmp/f\u266f.scm"]

ERROR: In procedure primitive-load:
ERROR: In procedure open-file: No such file or directory: "/tmp/f\u266f.scm"
dak@lola:/usr/local/tmp/lilypond$ ls -l /tmp/f*.scm
-rw-rw-r-- 1 dak dak 0 Jan 30 16:42 /tmp/f♯.scm

-- 
David Kastrup




Re: Guile benchmark

2017-01-30 Thread Ludovic Courtès
Hi,

Rchar  skribis:

> Does Guile Scheme slow down the entire GuixSD (if Guile speeds up, does
> GuixSD also speed up)?

GuixSD is a GNU/Linux distro, and even though some components are written in
Guile (such as the Shepherd and some low-level helpers), most of it is
written in C.

Guile’s speed has an influence on the speed of ‘guix’ commands though.

Ludo’.




Re: guile can't find a chinese named file

2017-01-30 Thread Ludovic Courtès
Hi!

Thomas Morley  skribis:

> guile filename_名字.scm
> ;;; Stat of /home/hermann/Desktop/filename_??.scm failed:
> ;;; ERROR: In procedure stat: No such file or directory:
> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"
> Backtrace:
> In ice-9/boot-9.scm:
>  160: 8 [catch #t # ...]
> In unknown file:
>?: 7 [apply-smob/1 #]
> In ice-9/boot-9.scm:
>   66: 6 [call-with-prompt prompt0 ...]
> In ice-9/eval.scm:
>  432: 5 [eval # #]
> In ice-9/boot-9.scm:
> 2404: 4 [save-module-excursion # ice-9/boot-9.scm:4051:3 ()>]
> 4058: 3 [#]
> 1727: 2 [%start-stack load-stack ...]
> 1732: 1 [#]
> In unknown file:
>?: 0 [primitive-load "/home/hermann/Desktop/filename_\u540d\u5b57.scm"]
>
> ERROR: In procedure primitive-load:
> ERROR: In procedure open-file: No such file or directory:
> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"

In C, argv is just an array of byte sequences, but in Guile,
(command-line) returns a list of strings, not a list of bytevectors.

Guile decodes its arguments according to the encoding of the current
locale.  So if you’re in a UTF-8 locale (say, zn_CH.utf8 or en_US.utf8),
Guile assumes its command-line arguments are UTF-8-encoded and decodes
them accordingly.

In the example above, it seems that the file name encoding was different
from the locale encoding, leading to this error.

HTH!

Ludo’.




Re: extension paths

2017-01-30 Thread Ludovic Courtès
Hi!

Linas Vepstas  skribis:

> I'd like to ask for help/clarification (and maybe even volunteer to
> write the required code) to resolve this extension-loading problem.
>
> I have almost a dozen C++ shared libs that implement guile modules,
> and regularly struggle to get them loaded correctly.   First, they
> need to be installed into one of 8 different places:
>
> /usr/lib/guile/2.0/extensions
> /usr/local/lib/guile/2.0/extensions
> /usr/lib64/guile/2.0/extensions
> /usr/local/lib64/guile/2.0/extensions
> /usr/lib/guile/2.2/extensions
> /usr/local/lib/guile/2.2/extensions
> /usr/lib64/guile/2.2/extensions
> /usr/local/lib64/guile/2.2/extensions

You can get the default location by running:

  pkg-config guile-2.0 --variable extensiondir

Or you can simply install to $extensiondir, where:

  libdir=$prefix/lib
  extensiondir=$libdir/guile/@GUILE_EFFECTIVE_VERSION@/extensions

If you use Autoconf, the GUILE_PKG macro defines and substitutes
‘GUILE_EFFECTIVE_VERSION’ (info "(guile) Autoconf Macros").
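
A module installed that way can then load its compiled part without any
extra search-path setup (a sketch; the library and init function names
below are placeholders, and if I'm not mistaken load-extension searches
$extensiondir by default):

  (define-module (my module))

  ;; Expects libguile-my-module.so under the extensions directory and
  ;; calls its init_my_module entry point to define the bindings.
  (load-extension "libguile-my-module" "init_my_module")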

HTH!

Ludo’.