Pierre Neidhardt writes:
>> The reference scanner, currently written in C++, traverses whole
>> directory trees. Being C++ it treats file names as byte arrays so it
>> doesn’t matter what the file name encoding is.
>
> But what matters then is that the filename encodings on the filesystem and
> The reference scanner, currently written in C++, traverses whole
> directory trees. Being C++ it treats file names as byte arrays so it
> doesn’t matter what the file name encoding is.
But what matters then is that the filename encodings on the filesystem and in
the binary match, right?
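
To make the byte-level point concrete, here is a minimal sketch in Python. It
is not the actual C++ scanner; it only assumes that scanning boils down to
looking for the ASCII bytes of a store item inside a blob. The function name
`contains_reference` and the store item name are hypothetical.

```python
def contains_reference(blob: bytes, store_item: str) -> bool:
    """Return True if the ASCII bytes of store_item occur verbatim in blob."""
    needle = store_item.encode("ascii")
    return needle in blob


# Hypothetical store item; real ones carry a hash in place of "abc123".
item = "/gnu/store/abc123-foo"
path = item + "/á"

# UTF-8 and Latin-1 leave the ASCII part of the path byte-for-byte intact,
# so the store item is found whatever encoding the non-ASCII suffix uses.
utf8_blob = b"\x00" + path.encode("utf-8") + b"\x00"
latin1_blob = b"\x00" + path.encode("latin-1") + b"\x00"

# UTF-16 puts a zero byte after every ASCII character, so the contiguous
# ASCII byte sequence no longer appears in the blob.
utf16_blob = path.encode("utf-16-le")

found_utf8 = contains_reference(utf8_blob, item)
found_latin1 = contains_reference(latin1_blob, item)
found_utf16 = contains_reference(utf16_blob, item)
```

Under these assumptions the first two lookups succeed and the UTF-16 one
fails, which matches the claim that only the UTF-16 and UTF-32 cases are
problematic for ASCII store file names.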
>
Pierre Neidhardt writes:
>> Every file in the store is properly scanned for references. It’s just
>> that users cannot create top-level items with a non-ASCII file name.
>
> So if '/gnu/store/...-foo/á' is stored as UTF-8 in a binary, then it will be
> found? Is it because the filesystem
Pierre Neidhardt writes:
> Just to be sure I understand: non-toplevel, non-ASCII file names will
> not be scanned properly, right?
Every file in the store is properly scanned for references. It’s just
that users cannot create top-level items with a non-ASCII file name.
I hope this clarifies
Hello,
Mark H Weaver writes:
> Pierre Neidhardt writes:
>
>>> : > Store file names are always ASCII so problems arise when they are stored
>>> : > as UTF-16 or UTF-32/UCS-4.
>>> :
>>> : I understand that most programs stick to ASCII filenames, but what about
>>> : the odd one using
Hi Danny,
Danny Milosavljevic writes:
> On Mon, 24 Dec 2018 13:12:23 -0500
> Mark H Weaver wrote:
>
>> Of course, the usual reason to choose UTF-32 is to support non-ASCII
>> characters while retaining fixed-width code points, so that string
>> lookups are straightforward and efficient.
>
>
Pierre Neidhardt writes:
>> : > Store file names are always ASCII so problems arise when they are stored
>> : > as UTF-16 or UTF-32/UCS-4.
>> :
>> : I understand that most programs stick to ASCII filenames, but what about
>> : the odd one using non-English, special characters?
>>
>> That’s
Hi Mark,
On Mon, 24 Dec 2018 13:12:23 -0500
Mark H Weaver wrote:
> Of course, the usual reason to choose UTF-32 is to support non-ASCII
> characters while retaining fixed-width code points, so that string
> lookups are straightforward and efficient.
This kind of lookup is almost never what is