Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-13 Thread zimbatm
Related to that, the suckless conference talk on UTF-8[1] was pretty interesting. The complexity of Unicode and all that goes with it is pretty crazy. That being said, the libutf8 from the same guys seems to be pretty decent and picks sane defaults for a lot of these questions. [1]

Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-13 Thread Erik Rybakken
To anyone interested in this discussion, you may want to follow the issue here: https://github.com/NixOS/nix/issues/770 There I wrote the reason I want this functionality. The main reason is that I want to do as much as possible in Nix, instead of pushing stuff to a building script. Best, Erik

Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-12 Thread Christian Theune
Hi, there are sane approaches to dealing with Strings (encoded) vs. Text (decoded) properly. We might not be able to do this at the moment, but I find Python (3)’s byte/text model quite sane. It might be too much for us to support this with a quick fix, but we should keep that on the radar, I
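As a small illustration of the byte/text split mentioned above, here is a minimal Python 3 sketch. The variable names are made up for this example, and it only shows how Python separates decoded text (`str`) from encoded data (`bytes`); it says nothing about how Nix would implement such a model.

```python
# Decoded text: a sequence of Unicode code points.
text = "å"

# Encoded data: the UTF-8 byte representation of that text.
data = text.encode("utf-8")

text_length = len(text)   # counts code points
byte_length = len(data)   # counts bytes

# Decoding the bytes recovers the original text.
round_trip = data.decode("utf-8")

print(text_length, byte_length, round_trip == text)
```

The two lengths differ for any string containing non-ASCII characters, which is the same discrepancy the thread reports for `builtins.stringLength`.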

Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-12 Thread Jookia
On Mon, Jan 11, 2016 at 11:29:37PM +, Erik Rybakken wrote:
> Hi,
>
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned, instead of the actual
> number of characters:
>
> > nix-repl> builtins.stringLength "å"
> >

[Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-11 Thread Erik Rybakken
Hi,
In nix, when finding the length of a string containing non-ascii characters, the number of bytes in the representation is returned, instead of the actual number of characters:
> nix-repl> builtins.stringLength "å"
> 2
Is there any way to get the number of characters instead, or does this
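To make the byte-versus-character difference concrete, here is a rough Python sketch of one common way to count code points in a UTF-8 byte string: count every byte that is not a continuation byte (those of the form `10xxxxxx`). The function name `utf8_code_points` is hypothetical, the input is assumed to be valid UTF-8, and this is not how Nix computes `builtins.stringLength`.

```python
def utf8_code_points(data: bytes) -> int:
    """Count code points in (assumed valid) UTF-8 data.

    Each code point starts with exactly one lead byte; every further
    byte of a multi-byte sequence matches the bit pattern 10xxxxxx.
    Counting the bytes that do NOT match that pattern therefore
    yields the number of code points.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


encoded = "å".encode("utf-8")
print(len(encoded))               # byte count
print(utf8_code_points(encoded))  # code point count
```

Note that this counts code points, not user-perceived characters: "å" written as "a" followed by a combining ring (U+030A) would count as two.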

Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-11 Thread Vladimír Čunát
Hi.
On 01/12/2016 12:29 AM, Erik Rybakken wrote:
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned
I'm fairly certain it would need changes to the core of the evaluator to properly support UTF-8 (I assume that