Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-13 Thread zimbatm
Related to that, the suckless conference talk on UTF-8[1] was pretty
interesting. The complexity of Unicode and everything that goes with it is
pretty crazy.
That being said, the libutf8 from the same people seems pretty decent and
picks sane defaults for a lot of these questions.

[1] http://suckless.org/conference/ (last on the page)

On Tue, 12 Jan 2016 at 19:10 Christian Theune wrote:

> Hi,
>
> there are sane approaches to dealing with Strings (encoded) vs. Text
> (decoded) properly. We might not be able to do this at the moment, but I
> find Python (3)’s byte/text model quite sane.
>
> It might be too much for us to support this with a quick fix, but we
> should keep that on the radar, I guess.
>
> Christian
___
nix-dev mailing list
nix-dev@lists.science.uu.nl
http://lists.science.uu.nl/mailman/listinfo/nix-dev


Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-13 Thread Erik Rybakken
To anyone interested in this discussion, you may want to follow the
issue here: https://github.com/NixOS/nix/issues/770

I explained there why I want this functionality. The main reason is
that I want to do as much as possible in Nix, instead of pushing work
into a build script.

Best,
Erik

On 2016-01-11 23:29, Erik Rybakken wrote:
> Hi,
> 
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned, instead of the actual
> number of characters:
> 
> > nix-repl> builtins.stringLength "å"
> > 2
> 
> Is there any way to get the number of characters instead, or does this
> require changes in the core language?
> 
> Best Regards,
> Erik Rybakken


Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-12 Thread Christian Theune
Hi,

There are sane approaches to handling strings (encoded) vs. text (decoded)
properly. We might not be able to do this at the moment, but I find
Python 3's bytes/text model quite sane.

It might be too much for us to support this with a quick fix, but we should 
keep that on the radar, I guess.
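
For illustration, here is a minimal Python 3 sketch of the bytes/text split
mentioned above. It only shows how Python separates the two types; it is not
a proposal for how Nix would do it, and the variable names are made up for
the example.

```python
# Python 3 keeps encoded bytes and decoded text as separate types.
text = "\u00e5"                 # "å" as str: a sequence of Unicode code points
encoded = text.encode("utf-8")  # bytes: the UTF-8 representation of that text

byte_length = len(encoded)      # counts bytes, which is what stringLength reports
text_length = len(text)         # counts code points

print(byte_length, text_length)
```

Converting between the two always goes through an explicit encode or decode
step, so the unit being counted is never ambiguous.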

Christian

> On 12 Jan 2016, at 18:26, Jookia <166...@gmail.com> wrote:
> 
> On Mon, Jan 11, 2016 at 11:29:37PM +, Erik Rybakken wrote:
>> Hi,
>> 
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>> 
>>> nix-repl> builtins.stringLength "å"
>>> 2
>> 
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
> 
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
> 
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
> 
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
> 
>> Best Regards,
>> Erik Rybakken
> 
> Cheers,
> Jookia.

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-12 Thread Jookia
On Mon, Jan 11, 2016 at 11:29:37PM +, Erik Rybakken wrote:
> Hi,
>
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned, instead of the actual
> number of characters:
>
> > nix-repl> builtins.stringLength "å"
> > 2
>
> Is there any way to get the number of characters instead, or does this
> require changes in the core language?

It's probably best to leave it like it is now. A string's length is two if
that's the number of bytes it uses. You'd have to start asking some hard
questions if you want other behaviour like:

Why do you want the string's length? Do you want to truncate it? What if that
creates an invalid sequence of characters somehow? Do you want to compare
lengths or equality? Should text be normalized somehow? Which way?

What should the base 'unit' be for a string? A code point? A character? A
glyph? A grapheme? How would this be implemented?
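
To make these questions concrete, here is a small Python sketch (Python is
used only for illustration; none of this exists in Nix). It assumes the
standard unicodedata module and shows that the same visible character has
different lengths depending on the unit and the normalization form, and that
truncating at a byte boundary can leave an invalid sequence.

```python
import unicodedata

composed = "\u00e5"                                  # "å" as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "a" plus combining ring U+030A

# Same visible character, different lengths depending on unit and form.
lengths = {
    "composed code points": len(composed),
    "decomposed code points": len(decomposed),
    "composed UTF-8 bytes": len(composed.encode("utf-8")),
    "decomposed UTF-8 bytes": len(decomposed.encode("utf-8")),
}

# Equality only holds once both sides are normalized to the same form.
equal_raw = composed == decomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Truncating at a byte boundary can split a multi-byte sequence.
truncated = composed.encode("utf-8")[:1]
try:
    truncated.decode("utf-8")
    truncation_valid = True
except UnicodeDecodeError:
    truncation_valid = False

print(lengths, equal_raw, equal_nfc, truncation_valid)
```

Counting graphemes (what a user would call one character) needs the Unicode
segmentation rules on top of this, which the Python standard library does not
provide.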

> Best Regards,
> Erik Rybakken

Cheers,
Jookia.


[Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-11 Thread Erik Rybakken
Hi,

In Nix, when finding the length of a string containing non-ASCII characters,
the number of bytes in the representation is returned instead of the actual
number of characters:

> nix-repl> builtins.stringLength "å"
> 2

Is there any way to get the number of characters instead, or does this
require changes in the core language?

Best Regards,
Erik Rybakken


Re: [Nix-dev] How to get correct length of a string containing non-ascii characters

2016-01-11 Thread Vladimír Čunát
Hi.

On 01/12/2016 12:29 AM, Erik Rybakken wrote:
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned

I'm fairly certain it would need changes to the core of the evaluator to
properly support UTF-8 (I assume that is the encoding you have in mind).
You can file an issue.

Well, technically it should be possible to implement such a thing in
nixpkgs but I really wouldn't want to code that and it would be awfully
slow.
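
As a rough idea of the logic such an implementation would need, here is a
Python sketch (not Nix, and not taken from nixpkgs). It assumes the input is
valid UTF-8 and counts code points by skipping continuation bytes; the
function name count_code_points is made up for the example.

```python
def count_code_points(data: bytes) -> int:
    """Count UTF-8 code points in data, assuming data is valid UTF-8."""
    # Continuation bytes have the bit pattern 10xxxxxx; every other byte
    # starts a new code point.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


# ASCII, precomposed "å", "a" plus combining ring, and a 4-byte emoji.
samples = ["abc", "\u00e5", "a\u030a", "\U0001F600"]
counts = [count_code_points(s.encode("utf-8")) for s in samples]
print(counts)
```

This counts code points only, so "a" followed by a combining ring still
counts as two even though it displays as one character.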

--Vladimir



