Re: Playing with guile (vs python). Generate file for GDP suitable for gnuplot.
Hello,

Germán Diago writes:

> Hello everyone,
>
> I did a script that parses a file with the GDP since 1970 for many
> countries. I filter the file and discard uninteresting fields; later I
> write it in a format suitable for gnuplot.

Wow, you started programming before I was born!

> I did this in python and guile.
>
> In python it takes around 1.1 seconds on my raspberry pi.
>
> In Guile it is taking around 11 seconds.
>
> I do not claim they are doing exactly the same: in python I use arrays
> and dictionaries, in guile I am using mainly lists. I would like to
> know if you could give me advice on how to optimize it. I am just
> training for now.

I know very little about compiler optimization in general, but I remember a trick in Guile: *top-level* definitions don't get inlined, so you put everything in one big 'let' so that the compiler can do its job. For example, instead of:

    (define (f) (display "hello world"))
    (define (g) (newline))
    (f)
    (g)

you write:

    (let ()
      (define (f) (display "hello world"))
      (define (g) (newline))
      (f)
      (g))

In addition, in the Guile REPL you can see how your code is optimized by using the ',opt' command. For example:

    ,opt (let ()
           (define (f) (display "hello world"))
           (define (g) (newline))
           (f)
           (g))

gives:

    $1 = (begin (display "hello world") (newline))

Hope this helps!

> The scripts in both python and guile are attached, and the profile
> data for scheme is below. Just place the .csv file in the same
> directory and it should generate an output file with the data ready
> for gnuplot :)
>
>   %   cumulative   self
>  time   seconds   seconds  name
> 26.24      3.45      3.43  %read-line
> 20.51      2.68      2.68  string->number
> 15.54      2.05      2.03  string-delete
>  7.39      7.75      0.97  map
>  5.13      3.96      0.67  transform-data
>  4.07      1.75      0.53  format:format-work
>  3.17      0.41      0.41  string=?
>  2.87      0.37      0.37  string-ref
>  1.81      2.50      0.24  tilde-dispatch
>  1.81      0.24      0.24  number->string
>  1.51      0.34      0.20  is-a-digit
>  1.06      0.28      0.14  anychar-dispatch
>  1.06      0.14      0.14  display
>  1.06      0.14      0.14  string-length
>  1.06      0.14      0.14  char>=?
>  1.06      0.14      0.14  char<=?
>  1.06      0.14      0.14  string-split
>  0.60      0.08      0.08  length
>  0.45      0.49      0.06  format:out-num-padded
>  0.45      0.06      0.06  remove-dots
>  0.30      0.04      0.04  %after-gc-thunk
>  0.30      0.04      0.04  list-tail
>  0.30      0.04      0.04  write-char
>  0.15      3.53      0.02  loop
>  0.15      3.47      0.02  read-line
>  0.15      0.02      0.02  substring
>  0.15      0.02      0.02  list-ref
>  0.15      0.02      0.02  reverse!
>  0.15      0.02      0.02  # (e)>
>  0.15      0.02      0.02  integer?
>  0.15      0.02      0.02  char=?
>  0.00     13.07      0.00  load-compiled/vm
>  0.00     13.07      0.00  # (thunk)>
>  0.00     13.07      0.00  # ()>
>  0.00     13.07      0.00  call-with-prompt
>  0.00     13.07      0.00  # ()>
>  0.00     13.07      0.00  apply-smob/1
>  0.00     13.07      0.00  catch
>  0.00     13.07      0.00  #
>  0.00     13.07      0.00  run-repl*
>  0.00     13.07      0.00  save-module-excursion
>  0.00     13.07      0.00  statprof
>  0.00     13.07      0.00  start-repl*
>  0.00     11.22      0.00  #
>  0.00      3.53      0.00  call-with-input-file
>  0.00      1.85      0.00  call-with-output-file
>  0.00      1.79      0.00  for-each
>  0.00      1.75      0.00  format
>  0.00      0.14      0.00  get-fields
>  0.00      0.10      0.00  # (year)>
>  0.00      0.06      0.00  #
>  0.00      0.02      0.00  format:out-obj-padded
>  0.00      0.02      0.00  remove
>  0.00      0.02      0.00  call-with-output-string
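The thread above compares Python and Guile versions of the same CSV-to-gnuplot filter, but the attached scripts are not reproduced in this archive. As a stand-in, here is a minimal Python sketch of the kind of pipeline being profiled; the column layout (country code, year, GDP), the function name, and the sample data are assumptions for illustration, not the poster's actual script.

```python
import csv
import io

def gdp_to_gnuplot(src, dst, country="ESP"):
    """Filter one country's rows from a CSV assumed to be shaped as
    (country_code, year, gdp) and emit 'year gdp' pairs for gnuplot."""
    for code, year, gdp in csv.reader(src):
        if code == country:
            dst.write(f"{year} {gdp}\n")

# Usage with in-memory streams standing in for the real files:
src = io.StringIO("ESP,1970,40.7\nFRA,1970,148.5\nESP,1971,46.6\n")
dst = io.StringIO()
gdp_to_gnuplot(src, dst)
```

With real files you would pass `open("gdp.csv")` and `open("out.dat", "w")` instead of the `StringIO` objects.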
Re: guile can't find a chinese named file
> Date: Mon, 30 Jan 2017 20:42:38 + (UTC)
> From: Mike Gran
> Cc: "guile-user@gnu.org"
>
> Earlier in the 2.0.x release series, Guile had a hack where it started
> up in a Latin-1 encoding, which would be capable of storing any 8-bit
> string of bytes, even if they weren't Latin-1.

Latin-1 has holes in the 0..255 range, so it isn't very appropriate in this situation.

> And what was supposed to happen after setlocale was called?

What Emacs does is explicitly decode any variable produced until that moment that is known to hold unibyte strings.

> As an aside, GTK and GLIB based applications often use a method where
> you may need to set the environment variable G_FILENAME_ENCODING if
> your filename encoding is different from your locale encoding.
> GTK/GLIB also likes to store strings internally as UTF-8, and will
> convert to UTF-8 from either the locale or the G_FILENAME_ENCODING-
> specified encoding.

Emacs stores all environment variables in their original locale-specific encoding, as unibyte strings, and only decodes them when they are actually used or handed to Lisp.
Re: guile can't find a chinese named file
Eli Zaretskii:

>> From: Marko Rauhamaa
>>
>> UTF-8 beautifully bridges the interpretation gap between 8-bit
>> character strings and text. However, the interpretation step should
>> be done in the application and not in the programming language.
>
> You can't do that in an environment that specifically targets
> sophisticated multi-lingual text processing independent of the outside
> locale. Unless you can interpret byte sequences as characters, you
> will be unable to even count characters in a range of text,

If you need to operate on Unicode text, have the application invoke the UTF-8 (or locale-specific) decoder. However, have the application request it instead of guessing that the environment is all Unicode.

> You do need "other typesetting effects", naturally, but that doesn't
> mean you can get away without more or less full support of Unicode
> nowadays.

Do support it, fully even, but let the application invoke the conversion when appropriate.

> You are talking about programming, but we should instead think about
> applications -- those of them which need to process text, or even
> access files, as this discussion shows, do need decent Unicode
> support.

Why should opening a file require Unicode support if the underlying operating system knows nothing about Unicode? I can open any given file in a tiny C program without any Unicode support; under Linux, that is.

> E.g., users generally expect that decomposed and composed character
> sequences behave and are treated identically, although they are
> different byte-stream wise.

Linux begs to differ. Regardless of the locale, two different octet sequences that ought to be equivalent UTF-8-wise will be considered different pathnames under Linux. I don't need a helicopter to walk across the street.

>> But is also causing unnecessary grief in the computer-computer
>> interface, where the classic textual naming and textual protocols
>> are actually cutely chosen octet-aligned binary formats.
> The universal acceptance of UTF-8 nowadays makes this much less of an
> issue, IME.

You are jumping the gun. Linux won't be there for a long time, if ever. Nothing prevents a pathname, or a command-line argument, or an environment variable, or the standard input from containing illegal UTF-8. I also wouldn't like my SMTP server to throw a UTF-8 decoding exception on parsing a command.

(Also note that even Windows allows pathnames with illegal Unicode in them, if I'm not mistaken.)

Marko
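Marko's point that pathnames, arguments, and input streams may contain illegal UTF-8 is exactly the case Python handles with PEP 383 ("surrogateescape") -- the design referred to elsewhere in this thread as "the choice which Python made". A minimal sketch of that mechanism, using the stray byte 0xEE as an example:

```python
# A byte string that is not valid UTF-8.  PEP 383 decodes it with
# errors='surrogateescape', mapping each stray byte 0xXX to the lone
# surrogate U+DCXX instead of raising an exception.
raw = b"\xee"
name = raw.decode("utf-8", "surrogateescape")
assert name == "\udcee"   # representable and comparable as a str

# Encoding with the same handler reconstructs the original bytes, so
# the value can still be handed back to open()/stat() losslessly.
assert name.encode("utf-8", "surrogateescape") == raw
```

The decoded value is deliberately not printable as real text; the escape only guarantees that nothing is lost or rejected on the way through.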
Re: guile can't find a chinese named file
On Monday, January 30, 2017 12:00 PM, Eli Zaretskii wrote:

> Actually, the need arises even sooner. Consider how load-path is set
> up during startup: it starts with the directory from which Emacs was
> invoked, either from argv[0] or by looking up PATH. Either way, you
> get a file name that is encoded in the locale-specific encoding. Then
> you cons load-path by expanding file names relative to the startup
> directory. So you immediately need to be able to create file names
> from directories, check whether a file exists and is a directory,
> etc. -- all of that before you even know in what locale you started,
> so you cannot decode these file names into the internal
> representation before using them.

Earlier in the 2.0.x release series, Guile had a hack where it started up in a Latin-1 encoding, which would be capable of storing any 8-bit string of bytes, even if they weren't Latin-1. I was the author of the first version of that hack. Anyway, while it was technically incorrect, it did get the job done for some of these locale-free byte string problems. It could open non-ASCII paths without really having an encoding, if I recall correctly.

It was an uneasy middle ground, tho. Error messages with regards to file names would be mojibake. And string ports were a mess. And what was supposed to happen after setlocale was called?

As an aside, GTK and GLIB based applications often use a method where you may need to set the environment variable G_FILENAME_ENCODING if your filename encoding is different from your locale encoding. GTK/GLIB also likes to store strings internally as UTF-8, and will convert to UTF-8 from either the locale or the G_FILENAME_ENCODING-specified encoding.

As another aside, OpenBSD removed support for non-UTF8 locales.

-Mike Gran
Re: guile can't find a chinese named file
> Date: Mon, 30 Jan 2017 21:32:41 +0200
> From: Eli Zaretskii
> Cc: guile-user@gnu.org
>
>> Hm, I know that XEmacs-Mule emphatically does not have unibyte
>> strings (and Stephen considers them a complication and abomination
>> that should never have been left in Emacs), so it must be possible
>> to get away without them.
>
> I doubt that's possible, at least not in general. (You could get away
> if you assumed UTF-8 encoded file names.) Some translation tables for
> some encodings must load files using the likes of load-path, and if
> that includes non-ASCII file names, you are screwed unless you can use
> unibyte strings.

Actually, the need arises even sooner. Consider how load-path is set up during startup: it starts with the directory from which Emacs was invoked, either from argv[0] or by looking up PATH. Either way, you get a file name that is encoded in the locale-specific encoding. Then you cons load-path by expanding file names relative to the startup directory. So you immediately need to be able to create file names from directories, check whether a file exists and is a directory, etc. -- all of that before you even know in what locale you started, so you cannot decode these file names into the internal representation before using them.
Re: guile can't find a chinese named file
> From: Marko Rauhamaa
> Date: Mon, 30 Jan 2017 21:01:31 +0200
> Cc: guile-user@gnu.org
>
> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

You can't do that in an environment that specifically targets sophisticated multi-lingual text processing independent of the outside locale. Unless you can interpret byte sequences as characters, you will be unable to even count characters in a range of text, let alone render it for display. And you cannot request applications to do those low-level chores.

> Support libraries for Unicode are naturally welcome.

Well, in that case Emacs core is one huge "support library". And I don't see why Guile couldn't be another one; it should, IMO.

> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects.

You do need "other typesetting effects", naturally, but that doesn't mean you can get away without more or less full support of Unicode nowadays. You are talking about programming, but we should instead think about applications -- those of them which need to process text, or even access files, as this discussion shows, do need decent Unicode support. E.g., users generally expect that decomposed and composed character sequences behave and are treated identically, although they are different byte-stream wise.

> But is also causing unnecessary grief in the computer-computer
> interface, where the classic textual naming and textual protocols
> are actually cutely chosen octet-aligned binary formats.

The universal acceptance of UTF-8 nowadays makes this much less of an issue, IME.
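Eli's composed/decomposed example can be made concrete. A small sketch using Python's standard unicodedata module (standing in for whatever normalization library an application would actually use): the two spellings of 'é' differ at the byte level, which is why they name different files on Linux, yet users expect them to be "the same" text.

```python
import unicodedata

composed = "\u00e9"      # 'é' as one code point (NFC form)
decomposed = "e\u0301"   # 'e' plus combining acute accent (NFD form)

# Different code point sequences, and therefore different UTF-8 bytes
# (and different pathnames on Linux):
assert composed != decomposed
assert composed.encode("utf-8") != decomposed.encode("utf-8")

# An application that wants user-level equality has to normalize first:
assert unicodedata.normalize("NFC", decomposed) == composed
```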
Re: guile can't find a chinese named file
> From: David Kastrup
> Cc: ma...@pacujo.net, guile-user@gnu.org
> Date: Mon, 30 Jan 2017 20:00:03 +0100
>
> Eli Zaretskii writes:
>
>> One other crucial detail is that Emacs also has unibyte strings
>> (arrays of bytes), which are necessary during startup, when Emacs
>> doesn't yet know how to decode non-ASCII strings. Without that, you
>> wouldn't be able to start Emacs in a directory whose name includes
>> non-ASCII characters, because it couldn't access files it needs to
>> read to set up some of its decoding machinery.
>
> Hm, I know that XEmacs-Mule emphatically does not have unibyte strings
> (and Stephen considers them a complication and abomination that should
> never have been left in Emacs), so it must be possible to get away
> without them.

I doubt that's possible, at least not in general. (You could get away if you assumed UTF-8 encoded file names.) Some translation tables for some encodings must load files using the likes of load-path, and if that includes non-ASCII file names, you are screwed unless you can use unibyte strings. That is why all Emacs primitives that accept file names support both unibyte and multibyte strings as file names.

> And I don't think that the comparatively worse Mule implementation
> of XEmacs is due to that decision.

Emacs 20 vintage Mule didn't have all the sophisticated Unicode support machinery we have today, so maybe for that subset the above wasn't necessary. Then again, Emacs couldn't be safely built or started in a non-ASCII directory until just a few years ago, so perhaps no one bothered to test that thoroughly with XEmacs, except in ISO 2022 locales.
Re: guile can't find a chinese named file
Marko Rauhamaa writes:

> David Kastrup :
>
>> Marko Rauhamaa writes:
>>> Guile's mistake was to move to Unicode strings in the operating
>>> system interface.
>>
>> Emacs uses an UTF-8 based encoding internally [...]
>
> C uses 8-bit characters. That is a model worth emulating.

That's Guile-1.8. Guile-2 uses either Latin-1 or UCS-32 in its string internals, either Latin-1 or UTF-8 in its string API, and UTF-8 in its string port internals.

> UTF-8 beautifully bridges the interpretation gap between 8-bit
> character strings and text. However, the interpretation step should be
> done in the application and not in the programming language.

Elisp is focused enough on text that I think its choice of going UTF-8 internally with a Unicode character type is reasonably sane. Its strings (the quirky unibyte strings excluded) are its own variant of UTF-8 internally, and its string port equivalent (buffers) are that same variant of UTF-8. And its API talks UTF-8 for strings, Unicode (or higher) for characters, and it indexes strings and buffers via Unicode character counts. Not O(1), but with enough trickery that it works well enough in practice.

If strings are to be implemented strictly Scheme-standard-conforming, they need to be O(1) indexable. The Scheme standard is rather silent about Unicode, however. I am not sure that sticking to the standard where it does not deal with reality is the best choice.

I think the case for Guile-2 to _also_ support "unibyte strings" would be quite a bit stronger than for Emacs (byte arrays and binary string ports don't allow using Guile's string processing functions). As it stands, the design of Guile-2 in my book currently involves too many mandatory conversions for just passing data around with Guile itself and Guile-based applications.

> Support libraries for Unicode are naturally welcome.
>
> Plain Unicode text is actually quite a rare programming need. It is
> woefully inadequate for the human interface, which generally requires
> numerous other typesetting effects. But is also causing unnecessary
> grief in the computer-computer interface, where the classic textual
> naming and textual protocols are actually cutely chosen octet-aligned
> binary formats.

Sometimes yes, sometimes not. As long as Guile wants to be a general-purpose programming and extension language, it should deal reliably, robustly, and reproducibly with whatever is thrown at it. Its choice of libraries does not currently make it so, but that could be fixed by either working on the (GNU) libraries or by giving Guile its own implementation. But that needs to be considered a priority. Nobody will do this just for fun and kicks.

-- David Kastrup
Re: guile can't find a chinese named file
David Kastrup:

> Marko Rauhamaa writes:
>> Guile's mistake was to move to Unicode strings in the operating
>> system interface.
>
> Emacs uses an UTF-8 based encoding internally [...]

C uses 8-bit characters. That is a model worth emulating.

UTF-8 beautifully bridges the interpretation gap between 8-bit character strings and text. However, the interpretation step should be done in the application and not in the programming language. Support libraries for Unicode are naturally welcome.

Plain Unicode text is actually quite a rare programming need. It is woefully inadequate for the human interface, which generally requires numerous other typesetting effects. But it is also causing unnecessary grief in the computer-computer interface, where the classic textual naming and textual protocols are actually cutely chosen octet-aligned binary formats.

Marko
Re: guile can't find a chinese named file
> From: David Kastrup
> Date: Mon, 30 Jan 2017 19:32:14 +0100
> Cc: guile-user@gnu.org
>
> Emacs uses an UTF-8 based encoding internally: basically, valid UTF-8
> is represented as itself, there is a number of code points beyond the
> actual limit of UTF-8 that are used for non-Unicode character sets,
> and single bytes not properly belonging to the read encoding are
> represented with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80
> ... 0xc1 0xbf (the latter two ranges are "overlong" encodings of
> 0x00...0x7f and consequently also not valid utf-8).

One other crucial detail is that Emacs also has unibyte strings (arrays of bytes), which are necessary during startup, when Emacs doesn't yet know how to decode non-ASCII strings. Without that, you wouldn't be able to start Emacs in a directory whose name includes non-ASCII characters, because it couldn't access files it needs to read to set up some of its decoding machinery.
Re: guile can't find a chinese named file
Eli Zaretskii writes:

>> From: David Kastrup
>> Date: Mon, 30 Jan 2017 19:32:14 +0100
>> Cc: guile-user@gnu.org
>>
>> Emacs uses an UTF-8 based encoding internally: basically, valid
>> UTF-8 is represented as itself, there is a number of coding points
>> beyond the actual limit of UTF-8 that is used for non-Unicode
>> character sets, and single bytes not properly belonging to the read
>> encoding are represented with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf
>> and 0xc1 0x80 ... 0xbf (the latter two ranges are "overlong"
>> encodings of 0x00...0x7f and consequently also not valid utf-8).
>
> One other crucial detail is that Emacs also has unibyte strings
> (arrays of bytes), which are necessary during startup, when Emacs
> doesn't yet know how to decode non-ASCII strings. Without that, you
> wouldn't be able to start Emacs in a directory whose name includes
> non-ASCII characters, because it couldn't access files it needs to
> read to set up some of its decoding machinery.

Hm, I know that XEmacs-Mule emphatically does not have unibyte strings (and Stephen considers them a complication and abomination that should never have been left in Emacs), so it must be possible to get away without them. And I don't think that the comparatively worse Mule implementation of XEmacs is due to that decision.

-- David Kastrup
Re: guile can't find a chinese named file
Marko Rauhamaa writes:

> David Kastrup :
>
>> But at any rate, this cannot easily be fixed since Guile uses
>> libraries for encoding/decoding that cannot deal reproducibly with
>> improper byte patterns.
>
> Guile's mistake was to move to Unicode strings in the operating system
> interface.

Emacs uses an UTF-8 based encoding internally: basically, valid UTF-8 is represented as itself, there is a number of code points beyond the actual limit of UTF-8 that are used for non-Unicode character sets, and single bytes not properly belonging to the read encoding are represented with 0x00...0x7f, 0xc0 0x80 ... 0xc0 0xbf and 0xc1 0x80 ... 0xc1 0xbf (the latter two ranges are "overlong" encodings of 0x00...0x7f and consequently also not valid utf-8).

The result is that random binary files read as "utf-8" grow by less than 50% in the internal representation (0x00-0x7f gets represented as itself, and 0x80-0xff gets encoded with two bytes only when not being part of a valid utf-8 sequence). The internal representation has several guarantees for processing. And when re-encoding to utf-8 as the output encoding, the input gets reconstructed perfectly even when it wasn't actually utf-8 to start with.

Emacs does not use "Unicode strings in the operating system interface" but rather has a number of explicit encodings:

    file-name-coding-system is a variable defined in 'C source code'.
    Its value is nil

    Documentation:
    Coding system for encoding file names.
    If it is nil, 'default-file-name-coding-system' (which see) is used.

    On MS-Windows, the value of this variable is largely ignored if
    'w32-unicode-filenames' (which see) is non-nil.  Emacs on Windows
    behaves as if file names were encoded in 'utf-8'.

    [back]

    Coding system for saving this buffer:
      U -- utf-8-emacs-unix (alias: emacs-internal)
    Default coding system (for new files):
      U -- utf-8-unix (alias: mule-utf-8-unix)
    Coding system for keyboard input:
      U -- utf-8-unix (alias: mule-utf-8-unix)
    Coding system for terminal output:
      U -- utf-8-unix (alias: mule-utf-8-unix)
    Coding system for inter-client cut and paste:
      nil
    Defaults for subprocess I/O:
      decoding: U -- utf-8-unix (alias: mule-utf-8-unix)
      encoding: U -- utf-8-unix (alias: mule-utf-8-unix)

    Priority order for recognizing coding systems when reading files:
      1. utf-8 (alias: mule-utf-8)
      2. iso-2022-7bit
      3. iso-latin-1 (alias: iso-8859-1 latin-1)
      4. iso-2022-7bit-lock (alias: iso-2022-int-1)
      5. iso-2022-8bit-ss2
      6. emacs-mule
      7. raw-text
      8. iso-2022-jp (alias: junet)
      9. in-is13194-devanagari (alias: devanagari)
      10. chinese-iso-8bit (alias: cn-gb-2312 euc-china euc-cn cn-gb gb2312)
      11. utf-8-auto
      12. utf-8-with-signature
      13. utf-16
      14. utf-16be-with-signature (alias: utf-16-be)
      15. utf-16le-with-signature (alias: utf-16-le)
      16. utf-16be
      17. utf-16le
      18. japanese-shift-jis (alias: shift_jis sjis)
      19. chinese-big5 (alias: big5 cn-big5 cp950)
      20. undecided

    Other coding systems cannot be distinguished automatically from
    these, and therefore cannot be recognized automatically with the
    present coding system priorities.

    Particular coding systems specified for certain file names:

    OPERATION    TARGET PATTERN                      CODING SYSTEM(s)
    ---------    --------------                      ----------------
    File I/O     "\\.dz\\'"                          (no-conversion . no-conversion)
                 "\\.txz\\'"                         (no-conversion . no-conversion)
                 "\\.xz\\'"                          (no-conversion . no-conversion)
                 "\\.lzma\\'"                        (no-conversion . no-conversion)
                 "\\.lz\\'"                          (no-conversion . no-conversion)
                 "\\.g?z\\'"                         (no-conversion . no-conversion)
                 "\\.\\(?:tgz\\|svgz\\|sifz\\)\\'"   (no-conversion . no-conversion)
                 "\\.tbz2?\\'"                       (no-conversion . no-conversion)
                 "\\.bz2\\'"                         (no-conversion . no-conversion)
                 "\\.Z\\'"                           (no-conversion . no-conversion)
                 "\\.elc\\'"                         utf-8-emacs
                 "\\.el\\'"                          prefer-utf-8
                 "\\.utf\\(-8\\)?\\'"                utf-8
                 "\\.xml\\'"                         xml-find-file-coding-system
                 "\\(\\`\\|/\\)loaddefs.el\\'"       (raw-text . raw-text-unix)
                 "\\.tar\\'"                         (no-conversion . no-conversion)
                 "\\.po[tx]?\\'\\|\\.po\\."          po-find-file-coding-system
                 "\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'" latexenc-find-file-coding-system
                 ""                                  (undecided)
    Process I/O  nothing specified
    Network I/O  nothing specified

    [back]

So in short: this is a rather complex domain. And Elisp, as a text-manipulating platform, has a whole lot of tools and
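David's two quantitative claims about the Emacs-internal representation (stray bytes survive a decode/re-encode round trip, and escaped bytes cost two bytes each, bounding growth at 50%) can be checked with Python's "surrogateescape" handler, which is an analogous -- not identical -- escape mechanism. The size-accounting function below is a toy model for illustration, not Emacs's actual storage format.

```python
def escaped_size(data: bytes) -> int:
    """Toy size model: bytes that decode as real UTF-8 cost their UTF-8
    length; each stray byte (escaped to U+DC80..U+DCFF) costs 2 bytes."""
    text = data.decode("utf-8", "surrogateescape")
    return sum(2 if 0xDC80 <= ord(ch) <= 0xDCFF else len(ch.encode("utf-8"))
               for ch in text)

data = bytes(range(256))   # arbitrary binary, mostly invalid UTF-8
text = data.decode("utf-8", "surrogateescape")

# Re-encoding reconstructs the input perfectly, even though it was
# never valid UTF-8:
assert text.encode("utf-8", "surrogateescape") == data

# Growth is bounded at 50%.  This sample hits the bound exactly (every
# byte >= 0x80 here is a stray byte); data containing valid multi-byte
# sequences stays below it.
assert escaped_size(data) <= 1.5 * len(data)
```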
Re: guile can't find a chinese named file
David Kastrup:

> But at any rate, this cannot easily be fixed since Guile uses
> libraries for encoding/decoding that cannot deal reproducibly with
> improper byte patterns.

Guile's mistake was to move to Unicode strings in the operating system interface.

> The problem here is that Guile cannot even deal with _properly_
> encoded UTF-8 file names on the command line.

Ok.

Marko
Re: guile can't find a chinese named file
l...@gnu.org (Ludovic Courtès) writes:

[...]

> However, in 2.0, the current locale is *not* installed; you have to
> either call 'setlocale' explicitly (like in C), or set this
> environment variable (info "(guile) Environment Variables"):
>
>     GUILE_INSTALL_LOCALE=1
>
> When you do that (and this will be the default in 2.2), things work as
> expected:

But shouldn't that be done temporarily by default when processing the command line? Or alternatively, shouldn't Guile just pass the command line byte-transparently to the file open calls?

It seems strange that Guile is unable to just pass what it received to the file open call: if it is in 8-bit mode, this should work, and if it is in UTF-8 mode (and the error messages suggest that it is), this should work as well.

-- David Kastrup
Re: guile can't find a chinese named file
Marko Rauhamaa writes:

> David Kastrup :
>
>> Marko Rauhamaa writes:
>>> l...@gnu.org (Ludovic Courtès):
>>>> Guile assumes its command-line arguments are UTF-8-encoded and
>>>> decodes them accordingly.
>>>
>>> I'm afraid that choice (which Python made, as well) was a bad one
>>> because Linux doesn't guarantee UTF-8 purity.
>>
>> Have you looked at the error messages? They are all perfect UTF-8. As
>> was the command line locale.
>
> I was responding to Ludovic.
>
>> Apparently, Guile can open the file just fine, and it sees the
>> command line just fine as encoded in utf-8.
>
> My problem is when it is not valid UTF-8.
>
>> So I really, really, really suggest that before people post their
>> theories they actually bother cross-checking them with Guile.
>
> Well, execute these commands from bash:
>
>     $ touch $'\xee'
>     $ touch xyz
>     $ ls -a
>     .  ..  ''$'\356'  xyz

We are not talking about file names not encoded in UTF-8. It is well known that Guile is unable to work with strings in UTF-8 encoding when their byte pattern is not valid UTF-8. This is a red herring. The problem is not that Guile is unable to deal with badly encoded UTF-8 file names. The problem is that Guile is unable to deal with properly encoded UTF-8 file names when it is supposed to execute them from the command line.

> Then, execute this guile program:
>
>     (let ((dir (opendir ".")))
>       (let loop ()
>         (let ((filename (readdir dir)))
>           (if (not (eof-object? filename))
>               (begin
>                 (if (access? filename R_OK)
>                     (format #t "~s\n" filename))
>                 (loop))))))
>
> It outputs:
>
>     ".."
>     "."
>     "xyz"
>
> skipping a file. This is a security risk. Files like these appear
> easily when extracting zip files, for example.

I am surprised this does not just throw a bad-encoding exception. But at any rate, this cannot easily be fixed since Guile uses libraries for encoding/decoding that cannot deal reproducibly with improper byte patterns.
The problem here is that Guile cannot even deal with _properly_ encoded UTF-8 file names on the command line. -- David Kastrup
Re: guile can't find a chinese named file
Hey Dave!

David Kastrup skribis:

> l...@gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>>> ERROR: In procedure open-file: No such file or directory:
>>> "/home/hermann/Desktop/filename_\u540d\u5b57.scm"
>>
>> In C, argv is just an array of byte sequences, but in Guile,
>> (command-line) returns a list of strings, not a list of bytevectors.
>>
>> Guile decodes its arguments according to the encoding of the current
>> locale. So if you’re in a UTF-8 locale (say, zn_CH.utf8 or
>> en_US.utf8), Guile assumes its command-line arguments are
>> UTF-8-encoded and decodes them accordingly.
>>
>> In the example above, it seems that the file name encoding was
>> different from the locale encoding, leading to this error.
>>
>> HTH!
>
> Did you actually test this?

Oops, let me clarify.

Command-line arguments are indeed decoded according to the locale encoding (that’s commit ed4c3739668b4b111b38555b8bc101cb74c87c1c). When making a syscall like open(2), Guile converts strings to the locale encoding.

However, in 2.0, the current locale is *not* installed; you have to either call ‘setlocale’ explicitly (like in C), or set this environment variable (info "(guile) Environment Variables"):

    GUILE_INSTALL_LOCALE=1

When you do that (and this will be the default in 2.2), things work as expected:

--8<---cut here---start--->8---
$ GUILE_INSTALL_LOCALE=1 guile λ.scm
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;; or pass the --no-auto-compile argument to disable.
;;; compiling /home/ludo/src/guile/λ.scm
;;; compiled /home/ludo/.cache/guile/ccache/2.0-LE-8-2.0/home/ludo/src/guile/λ.scm.go
hello λ!
$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER=fr_FR.utf8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
--8<---cut here---end--->8---

Sorry for the confusion!

Ludo’.
Re: guile can't find a chinese named file
David Kastrup:

> Marko Rauhamaa writes:
>> l...@gnu.org (Ludovic Courtès):
>>> Guile assumes its command-line arguments are UTF-8-encoded and
>>> decodes them accordingly.
>>
>> I'm afraid that choice (which Python made, as well) was a bad one
>> because Linux doesn't guarantee UTF-8 purity.
>
> Have you looked at the error messages? They are all perfect UTF-8. As
> was the command line locale.

I was responding to Ludovic.

> Apparently, Guile can open the file just fine, and it sees the command
> line just fine as encoded in utf-8.

My problem is when it is not valid UTF-8.

> So I really, really, really suggest that before people post their
> theories that they actually bother cross-checking them with Guile.

Well, execute these commands from bash:

    $ touch $'\xee'
    $ touch xyz
    $ ls -a
    .  ..  ''$'\356'  xyz

Then, execute this guile program:

    (let ((dir (opendir ".")))
      (let loop ()
        (let ((filename (readdir dir)))
          (if (not (eof-object? filename))
              (begin
                (if (access? filename R_OK)
                    (format #t "~s\n" filename))
                (loop))))))

It outputs:

    ".."
    "."
    "xyz"

skipping a file. This is a security risk. Files like these appear easily when extracting zip files, for example.

Marko
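For comparison with Marko's readdir experiment, Python sidesteps the skipped-file trap: given a bytes path it returns raw byte names untouched, and given a str path it escapes the stray byte (PEP 383) rather than dropping the entry. A sketch, assuming a POSIX filesystem that permits the 0xEE byte in names:

```python
import os
import tempfile

# Recreate the directory from the shell transcript: a file whose name
# is the single byte 0xEE (not valid UTF-8) next to a plain 'xyz'.
d = tempfile.mkdtemp()
open(os.path.join(d.encode(), b"\xee"), "wb").close()
open(os.path.join(d, "xyz"), "w").close()

# Listing with a bytes path returns raw bytes names: nothing is
# skipped, so the 0xEE file stays visible.
names = set(os.listdir(d.encode()))
assert names == {b"\xee", b"xyz"}

# Listing with a str path decodes each name via os.fsdecode (i.e. the
# filesystem encoding with surrogateescape), so the entry still
# appears -- on a UTF-8 filesystem encoding, as '\udcee'.
assert set(os.listdir(d)) == {os.fsdecode(b"\xee"), "xyz"}
```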
Re: guile can't find a chinese named file
Marko Rauhamaawrites: > l...@gnu.org (Ludovic Courtès): > >> In C, argv is just an array of byte sequences, but in Guile, >> (command-line) returns a list of strings, not a list of bytevectors. >> >> Guile decodes its arguments according to the encoding of the current >> locale. So if you’re in a UTF-8 locale (say, zn_CH.utf8 or >> en_US.utf8), Guile assumes its command-line arguments are >> UTF-8-encoded and decodes them accordingly. >> >> In the example above, it seems that the file name encoding was >> different from the locale encoding, leading to this error. > > I'm afraid that choice (which Python made, as well) was a bad one > because Linux doesn't guarantee UTF-8 purity. Have you looked at the error messages? They are all perfect UTF-8. As was the command line locale. Here, have another data point: dak@lola:/usr/local/tmp/lilypond$ guile-2.0 /tmp/f♯.scm ;;; Stat of /tmp/f?.scm failed: ;;; ERROR: In procedure stat: No such file or directory: "/tmp/f\u266f.scm" Backtrace: In ice-9/boot-9.scm: 160: 8 [catch #t # ...] In unknown file: ?: 7 [apply-smob/1 #] In ice-9/boot-9.scm: 66: 6 [call-with-prompt prompt0 ...] In ice-9/eval.scm: 432: 5 [eval # #] In ice-9/boot-9.scm: 2404: 4 [save-module-excursion #] 4056: 3 [#] 1727: 2 [%start-stack load-stack ...] 1732: 1 [#] In unknown file: ?: 0 [primitive-load "/tmp/f\u266f.scm"] ERROR: In procedure primitive-load: ERROR: In procedure open-file: No such file or directory: "/tmp/f\u266f.scm" dak@lola:/usr/local/tmp/lilypond$ guile-2.0 GNU Guile 2.0.13 Copyright (C) 1995-2016 Free Software Foundation, Inc. Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'. This program is free software, and you are welcome to redistribute it under certain conditions; type `,show c' for details. Enter `,help' for help. 
scheme@(guile-user)> (open-input- [TAB] open-input-file open-input-string scheme@(guile-user)> (open-input-file "/tmp/f\u266f.scm") $1 = # scheme@(guile-user)> (open-input-file "/tmp/non-existent") ERROR: In procedure open-file: ERROR: In procedure open-file: No such file or directory: "/tmp/non-existent" Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue. scheme@(guile-user) [1]> Apparently, Guile can open the file just fine, and it sees the command line just fine as encoded in utf-8. But during command line processing rather than afterwards, it fails opening the file. So I really, really, really suggest that before people post their theories that they actually bother cross-checking them with Guile. -- David Kastrup
Re: guile can't find a chinese named file
l...@gnu.org (Ludovic Courtès): > In C, argv is just an array of byte sequences, but in Guile, > (command-line) returns a list of strings, not a list of bytevectors. > > Guile decodes its arguments according to the encoding of the current > locale. So if you’re in a UTF-8 locale (say, zh_CN.utf8 or > en_US.utf8), Guile assumes its command-line arguments are > UTF-8-encoded and decodes them accordingly. > > In the example above, it seems that the file name encoding was > different from the locale encoding, leading to this error. I'm afraid that choice (which Python made, as well) was a bad one because Linux doesn't guarantee UTF-8 purity. Marko
Re: guile can't find a chinese named file
l...@gnu.org (Ludovic Courtès) writes: > Hi! > > Thomas Morley skribis: > >> guile filename_名字.scm >> ;;; Stat of /home/hermann/Desktop/filename_??.scm failed: >> ;;; ERROR: In procedure stat: No such file or directory: >> "/home/hermann/Desktop/filename_\u540d\u5b57.scm" >> Backtrace: >> In ice-9/boot-9.scm: >> 160: 8 [catch #t # ...] >> In unknown file: >>?: 7 [apply-smob/1 #] >> In ice-9/boot-9.scm: >> 66: 6 [call-with-prompt prompt0 ...] >> In ice-9/eval.scm: >> 432: 5 [eval # #] >> In ice-9/boot-9.scm: >> 2404: 4 [save-module-excursion #> ice-9/boot-9.scm:4051:3 ()>] >> 4058: 3 [#] >> 1727: 2 [%start-stack load-stack ...] >> 1732: 1 [#] >> In unknown file: >>?: 0 [primitive-load "/home/hermann/Desktop/filename_\u540d\u5b57.scm"] >> >> ERROR: In procedure primitive-load: >> ERROR: In procedure open-file: No such file or directory: >> "/home/hermann/Desktop/filename_\u540d\u5b57.scm" > > In C, argv is just an array of byte sequences, but in Guile, > (command-line) returns a list of strings, not a list of bytevectors. > > Guile decodes its arguments according to the encoding of the current > locale. So if you’re in a UTF-8 locale (say, zh_CN.utf8 or en_US.utf8), > Guile assumes its command-line arguments are > UTF-8-encoded and decodes > them accordingly. > > In the example above, it seems that the file name encoding was different > from the locale encoding, leading to this error. > > HTH! Did you actually test this?
dak@lola:/usr/local/tmp/lilypond$ locale LANG=en_US.UTF-8 LANGUAGE=en LC_CTYPE="en_US.UTF-8" LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE="en_US.UTF-8" LC_MONETARY=en_US.UTF-8 LC_MESSAGES="en_US.UTF-8" LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_ALL= dak@lola:/usr/local/tmp/lilypond$ touch /tmp/f♯.scm dak@lola:/usr/local/tmp/lilypond$ guile-2.0 /tmp/f♯.scm ;;; Stat of /tmp/f?.scm failed: ;;; ERROR: In procedure stat: No such file or directory: "/tmp/f\u266f.scm" Backtrace: In ice-9/boot-9.scm: 160: 8 [catch #t # ...] In unknown file: ?: 7 [apply-smob/1 #] In ice-9/boot-9.scm: 66: 6 [call-with-prompt prompt0 ...] In ice-9/eval.scm: 432: 5 [eval # #] In ice-9/boot-9.scm: 2404: 4 [save-module-excursion #] 4056: 3 [#] 1727: 2 [%start-stack load-stack ...] 1732: 1 [#] In unknown file: ?: 0 [primitive-load "/tmp/f\u266f.scm"] ERROR: In procedure primitive-load: ERROR: In procedure open-file: No such file or directory: "/tmp/f\u266f.scm" dak@lola:/usr/local/tmp/lilypond$ ls -l /tmp/f*.scm -rw-rw-r-- 1 dak dak 0 Jan 30 16:42 /tmp/f♯.scm -- David Kastrup
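[Editor's note: for completeness, the sharp sign in David's filename really is plain multi-byte UTF-8 on disk. A quick shell sketch, not from the thread, confirms that U+266F encodes as the three bytes e2 99 af, so the failure above is not caused by a malformed name:]

```shell
# Dump the bytes of the file name David used; U+266F (MUSIC SHARP SIGN)
# appears as the UTF-8 sequence e2 99 af between the ASCII characters.
printf 'f♯.scm' | od -An -tx1
```

This supports David's point: the name is valid UTF-8 and matches the locale, yet the load during command-line processing still fails.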
Re: Guile benchmark
Hi, Rchar skribis: > Does Guile Scheme slow down the entire GuixSD (if Guile speeds up, does > GuixSD also speed up)? GuixSD is a GNU/Linux distro, and even though some components are written in Guile (such as the Shepherd and some low-level helpers), most of it is written in C. Guile’s speed has an influence on the speed of ‘guix’ commands, though. Ludo’.
Re: guile can't find a chinese named file
Hi! Thomas Morley skribis: > guile filename_名字.scm > ;;; Stat of /home/hermann/Desktop/filename_??.scm failed: > ;;; ERROR: In procedure stat: No such file or directory: > "/home/hermann/Desktop/filename_\u540d\u5b57.scm" > Backtrace: > In ice-9/boot-9.scm: > 160: 8 [catch #t # ...] > In unknown file: >?: 7 [apply-smob/1 #] > In ice-9/boot-9.scm: > 66: 6 [call-with-prompt prompt0 ...] > In ice-9/eval.scm: > 432: 5 [eval # #] > In ice-9/boot-9.scm: > 2404: 4 [save-module-excursion # ice-9/boot-9.scm:4051:3 ()>] > 4058: 3 [#] > 1727: 2 [%start-stack load-stack ...] > 1732: 1 [#] > In unknown file: >?: 0 [primitive-load "/home/hermann/Desktop/filename_\u540d\u5b57.scm"] > > ERROR: In procedure primitive-load: > ERROR: In procedure open-file: No such file or directory: > "/home/hermann/Desktop/filename_\u540d\u5b57.scm" In C, argv is just an array of byte sequences, but in Guile, (command-line) returns a list of strings, not a list of bytevectors. Guile decodes its arguments according to the encoding of the current locale. So if you’re in a UTF-8 locale (say, zh_CN.utf8 or en_US.utf8), Guile assumes its command-line arguments are UTF-8-encoded and decodes them accordingly. In the example above, it seems that the file name encoding was different from the locale encoding, leading to this error. HTH! Ludo’.
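[Editor's note: as a point of comparison for the argv-decoding behaviour Ludovic describes, Python decodes argv and filenames with the `surrogateescape` error handler (PEP 383), so bytes that are invalid in the locale encoding become lone surrogates rather than an error, and re-encoding recovers them. A small sketch, not from the thread:]

```python
# A file name containing a byte that is never valid UTF-8.
name = b'filename_\xff.scm'

# Decoding with 'surrogateescape' maps the bad byte to a lone
# surrogate (U+DCFF) instead of raising UnicodeDecodeError.
decoded = name.decode('utf-8', 'surrogateescape')
assert decoded == 'filename_\udcff.scm'

# Encoding with the same handler round-trips the original bytes,
# so the string can still be passed back to open()/stat().
assert decoded.encode('utf-8', 'surrogateescape') == name
```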
Re: extension paths
Hi! Linas Vepstas skribis: > I'd like to ask for help/clarification (and maybe even volunteer to > write the required code) to resolve this extension-loading problem. > > I have almost a dozen C++ shared libs that implement guile modules, > and regularly struggle to get them loaded correctly. First, they > need to be installed into one of 8 different places: > > /usr/lib/guile/2.0/extensions > /usr/local/lib/guile/2.0/extensions > /usr/lib64/guile/2.0/extensions > /usr/local/lib64/guile/2.0/extensions > /usr/lib/guile/2.2/extensions > /usr/local/lib/guile/2.2/extensions > /usr/lib64/guile/2.2/extensions > /usr/local/lib64/guile/2.2/extensions You can get the default location by running: pkg-config guile-2.0 --variable extensiondir Or you can simply install to $extensiondir, where: libdir=$prefix/lib extensiondir=$libdir/guile/@GUILE_EFFECTIVE_VERSION@/extensions If you use Autoconf, the GUILE_PKG macro defines and substitutes ‘GUILE_EFFECTIVE_VERSION’ (info "(guile) Autoconf Macros"). HTH! Ludo’.
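[Editor's note: Ludovic's variable recipe can be spelled out as a tiny shell sketch; the prefix and version below are illustrative placeholders, not values queried from an installed Guile:]

```shell
# Compose the extension directory the way Ludovic describes. In a real
# build, prefix comes from configure, and the version from the GUILE_PKG
# Autoconf macro or `pkg-config guile-2.0 --variable extensiondir`.
prefix=/usr/local                    # illustrative prefix
GUILE_EFFECTIVE_VERSION=2.0          # illustrative version
libdir=$prefix/lib
extensiondir=$libdir/guile/$GUILE_EFFECTIVE_VERSION/extensions
echo "$extensiondir"                 # prints /usr/local/lib/guile/2.0/extensions
```

Installing to this computed path rather than hard-coding one of the eight directories is what keeps the build working across prefixes, lib/lib64 variants, and Guile versions.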