Hello,
I have a question regarding text encoding of filenames on Unix
platforms. I’ve read the two related mailing list threads I could find
in the archive,
<https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg35875.html>
and
<https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg94184.html>.
Both of those explain that, on Unix platforms, the filename string is
passed unmodified by SQLite directly to the open() syscall.

From what I understand from reading a lot of information on the
Internet, this may or may not be correct, and nobody can agree. It seems
that, taking a survey of other software, GLib expects filenames to
always be UTF-8 but allows that to be changed via environment variable,
Qt expects filenames to always be in the locale encoding, and Coreutils
(“ls”) also expects filenames to be in the locale encoding (at least,
it sometimes decides to show filenames in '$'\XYZ escaped form, and it
decides whether or not to do that based on your $LANG and co.
variables, in a way which is consistent with it considering filenames to
be locale encoded). It seems, though I could be wrong, that more people
fall on the “locale encoded” side than on the “always UTF-8” side
(though thank goodness it’s becoming less and less relevant as more and
more systems are running with UTF-8 locales anyway).
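
To make the disagreement concrete, here is a small Python sketch showing
that one and the same name turns into different byte strings under the
two assumptions. The filename is a made-up example, and ISO-8859-1 merely
stands in for "some non-UTF-8 locale". Since open() just receives bytes,
these are two different names as far as the kernel is concerned.

```python
# Hypothetical filename ("cafe.db" with an acute accent on the e),
# chosen only for illustration.
name = "caf\u00e9.db"

# The bytes that would reach open() under each assumption.
utf8_bytes = name.encode("utf-8")          # "always UTF-8" view
latin1_bytes = name.encode("iso-8859-1")   # "locale encoded" view, Latin-1 locale


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(utf8_bytes)
print(latin1_bytes)
print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))
```

The Latin-1 form is not even valid UTF-8, which is the kind of name that
a tool running in a UTF-8 locale has to escape when displaying it.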

My question is this: In those two mailing list posts, it was explained
that SQLite’s current behaviour is to pass the string unmodified to the
open() syscall. Is this just an explanation of current behaviour, or is
it an official policy? That is to say, which of the following
statements is correct?

(1) SQLite developers believe that Unix filenames should be UTF-8 at the
syscall layer regardless of your locale, and therefore if your
particular box has a non-UTF-8 file on its disk, you shouldn’t be able
to access it.

(2) SQLite developers believe that Unix filenames should be
locale-encoded at the syscall layer, and therefore the missing
transcode is a bug.

(3) SQLite developers refuse to get into this argument and think it’s
up to the developer of the client application, who should pass a string
of whatever encoding they think right into sqlite3_open(), which in turn
passes it on to open().

I can’t really tell which of these is the official policy. If it’s #1,
the documentation and code are both fine, though it makes some files
inaccessible for some users. If it’s #2, the documentation and the code
are both wrong. If it’s #3, I think it would make sense if the
documentation were updated to explain this.

The reason I ask is that, in addition to the current behaviour (easy
to find out just by testing or reading the source code), I want some
idea of whether this might change in future. That is, if I just write
“known bug” and insert a workaround in my client code to pass a locale
string to sqlite3_open() instead of a UTF-8 string, is that workaround
going to break sometime in future when the bug (if you consider it a
bug) gets fixed?
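
For concreteness, the workaround would amount to something like the
following Python sketch. The helper name and the fallback to the locale's
preferred encoding are assumptions made for illustration, not anything
SQLite provides; in a C client the same step would presumably be done
with an iconv-style conversion before handing the string to SQLite.

```python
import locale


def utf8_to_locale_filename(utf8_name: bytes, encoding: str = "") -> bytes:
    """Transcode a UTF-8 filename to a target encoding.

    Hypothetical helper. If no encoding is given, the current locale's
    preferred encoding is used. Raises UnicodeEncodeError when the name
    cannot be represented in the target encoding.
    """
    target = encoding or locale.getpreferredencoding(False)
    return utf8_name.decode("utf-8").encode(target)


# In a Latin-1 locale the two-byte UTF-8 sequence collapses to one byte.
converted = utf8_to_locale_filename(b"caf\xc3\xa9.db", "iso-8859-1")
print(converted)

# In a UTF-8 locale the conversion is a no-op.
unchanged = utf8_to_locale_filename(b"caf\xc3\xa9.db", "utf-8")
print(unchanged)
```

If the library later started transcoding on its own, a name converted
this way would be converted a second time, which is exactly the breakage
I am worried about.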

Thanks for the clarification, and please note that I am not subscribed
to this list so I would appreciate being included explicitly in replies.
-- 
Christopher Head

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users