Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Al-Khanji Louai
It's not a platform bug. It's an application/framework (Qt) bug.


Unix paths are just a byte array, where certain bytes have special meaning 
(mostly just '/'). Passing around those byte arrays from/to platform functions 
will always work correctly.


The problem is that Qt tries to interpret those byte strings by converting them 
to Unicode for use inside QString. There's no guarantee that this conversion 
will succeed using the current system encoding, as the file name may have 
been created under a different encoding.


On Unix, the correct way to handle paths is to never encode/decode the byte 
array.


Showing a path in a UI is a separate issue from just passing it around and/or 
modifying it.
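
A minimal sketch of the byte-array view described above, written in Python rather than Qt. It assumes a POSIX filesystem that accepts arbitrary bytes in names (typical on Linux; some macOS filesystems may reject invalid UTF-8). The helper names `list_raw_names` and `demo` are made up for illustration.

```python
import os
import tempfile
from typing import List


def list_raw_names(directory: bytes) -> List[bytes]:
    """Return directory entries as raw bytes, with no decoding step."""
    # Passing bytes to os.listdir makes it return bytes names.
    return sorted(os.listdir(directory))


def demo() -> bytes:
    # b"caf\xe9.txt" is "café.txt" in Latin-1; it is not valid UTF-8.
    raw_name = b"caf\xe9.txt"
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Create the file using only byte paths.
        with open(os.path.join(tmp_bytes, raw_name), "wb") as f:
            f.write(b"hello")
        # Find it again and read it back, still without decoding the name.
        names = list_raw_names(tmp_bytes)
        with open(os.path.join(tmp_bytes, names[0]), "rb") as f:
            return f.read()
```

Nothing in this round trip ever needs to know which encoding the name was written in; that question only comes up when the name is shown to a user.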


-- Louai



From: development-bounces+louai.al-khanji=theqtcompany@qt-project.org on behalf 
of Konstantin Ritt ritt...@gmail.com
Sent: Thursday, October 9, 2014 3:44 AM
To: Marc Mutz
Cc: development@qt-project.org
Subject: Re: [Development] The life of a file name and other possibly 
mal-encoded strings on non-Windows systems

2014-10-09 3:57 GMT+04:00 Marc Mutz marc.m...@kdab.com:
Hi Julien,

On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
 However, i agree that changing this would :
 * break a lot of code

No, it cannot, if, as I propose, it's added to Qt 5.

 * permit only to solve really lower level / corner case issues

The value lies _also_ in being able to iterate over weird filenames (where
weird simply means plugging in a USB stick into an otherwise UTF-8-only
system).

Qt doesn't mount that USB stick, and Qt doesn't manage mounting [or whatever 
other system-wide] flags and settings; so should Qt ever care about such 
platform/misconfiguration issues?
IMO, issues like this one should (or even must) be fixed at the platform level, 
whilst high-level frameworks should not even try to work around them. This is 
exactly what we decided to do with the issue of space character(s) at the end 
of file names on Windows, BTW.

Regards,
Konstantin
___
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Kuba Ober
On Tue, Oct 7, 2014 at 11:41 AM, Thiago Macieira thiago.macie...@intel.com
wrote:

 On Tuesday 07 October 2014 10:38:47 Kuba Ober wrote:
  Just to be very clear: it is currently impossible to make a truly
 portable
  file management utility with Qt’s core APIs. Why? Because it will simply
  ignore all file names that it can’t decode when iterating the directory,
  and it won’t be able to take commandline arguments to open such files
  either. Furthermore, this is something that very basic C code using
 nothing
  but POSIX APIs can trivially deal with. Or that Python 2 trivially deals
  with. I consider it a serious enough problem.

 That's where we disagree: those file names are not common at all.


I have a server whose filesystems were established in the days of RHEL 2 and
the first Samba releases, with Windows NT 4 clients. There were thousands of
such files; I eventually grew tired of them being invisible to Qt and fixed
them all. Interestingly enough, on Windows machines Qt would see the files,
because Samba was pretending really hard that the names were representable
in UTF-16.

The problem manifests itself almost anytime you plug in a small USB memory
stick that has localized file names on FAT-16 into any Unix system, from
the most modern OS X to legacy stuff that seems to have just learned that
USB storage exists. Sure, you could argue that the distribution should be
set up to ask the user for filename encoding on such a medium, or to
transparently do a best-guess transcoding to UTF-8, or whatnot. But the
reality is that none of this happens, and the files become invisible. The
worst part of the problem is that usually not all files with non-ASCII
characters vanish, only those that are not valid UTF-8 code unit sequences
do. It's a behavior that utterly confuses the users.
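
A small sketch of why only some non-ASCII names vanish. It mimics the behavior described in this thread (names that fail to decode are dropped) for a UTF-8 locale; it is not Qt's actual code, and `is_valid_utf8` and `visible_names` are illustrative names.

```python
from typing import List


def is_valid_utf8(raw_name: bytes) -> bool:
    """True if the raw file name decodes cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def visible_names(raw_names: List[bytes]) -> List[bytes]:
    """Keep only the names a UTF-8-only decoder would accept."""
    return [name for name in raw_names if is_valid_utf8(name)]


names = [
    b"readme.txt",                    # plain ASCII, always fine
    "zażółć.txt".encode("utf-8"),     # non-ASCII, valid UTF-8: stays visible
    "zażółć.txt".encode("cp1250"),    # same text in a legacy codepage: dropped
]
```

The second and third entries look identical to the person who named the files, which is what makes the result so confusing.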

Cheers, Kuba


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Thiago Macieira
On Thursday 09 October 2014 01:57:05 Marc Mutz wrote:
 The value lies _also_ in being able to iterate over weird filenames
 (where  weird simply means plugging in a USB stick into an otherwise
 UTF-8-only system).

Mind these two facts:
* USB mass-storage devices came into being after Linux switched to UTF-8
* The number of USB flash drives with Unix filesystems is incredibly small

That means the number of USB flash drives containing Unix filesystems that 
aren't UTF-8 encoded will be effectively zero.

VFAT doesn't count, since it stores the file names as UTF-16. If you're not 
getting the right file names, your system is misconfigured.

The only ones that pose trouble are ISO-9660 CD-ROMs that have Rock Ridge 
extensions for Unix attributes and longer file names. Do people still have CD 
drives?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Thiago Macieira
On Thursday 09 October 2014 02:46:49 Kuba Ober wrote:
 The problem manifests itself almost anytime you plug in a small USB memory
 stick that has localized file names on FAT-16 into any Unix system

You mean a pre-Windows 95 floppy? Are you sure you have a floppy drive?

Everything since Windows 95 should be using VFAT, which means the file names 
are stored in UTF-16. There's no guessing. It's just plain system 
misconfiguration if you don't get it right.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Julien Blanc
On 09/10/2014 09:27, Thiago Macieira wrote:
 The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge
 extensions for Unix attributes and longer file names. Do people still have CD
 drives?

People also have zip files, which unfortunately may contain names in various 
encodings, since the only normative encoding for zip files is cp437…

I have a bunch of such zip files that:
- cannot be extracted with GUI tools
- will result in bad filenames (including invalid UTF-8) when extracted 
using "stupid" command-line tools.
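
A sketch of how such names go wrong and how they can sometimes be recovered. It assumes the archive was written with CP866 names but read as CP437 (which is what Python's `zipfile` does when the UTF-8 flag is not set). The real encoding is not recorded in the archive, so `actual_encoding` has to be known or guessed; `redecode_zip_name` is a hypothetical helper.

```python
def redecode_zip_name(name_as_cp437: str, actual_encoding: str) -> str:
    """Undo a CP437 decode and redo it with the encoding the author really used."""
    # CP437 maps all 256 byte values, so this step recovers the original bytes.
    raw = name_as_cp437.encode("cp437")
    return raw.decode(actual_encoding)


original = "привет.txt"
stored = original.encode("cp866")    # bytes as written by a DOS-era tool
misread = stored.decode("cp437")     # what a CP437-only reader shows
recovered = redecode_zip_name(misread, "cp866")
```

An extractor that instead copies `stored` to disk unchanged produces a file name that is not valid UTF-8, which is the situation discussed in this thread.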

IMHO saying "the problem does not exist" is not a good answer, because 
if it really didn’t, this issue would never have been raised. The 
questions to answer are:
- is it worth breaking a lot of code? (because it will: a good solution 
needs a complete refactoring of the Qt I/O code; just providing a QFilePath 
class will not be enough)
- will it be ready before C++ provides a core solution?
- is there someone willing to do it?

Regards,

Julien Blanc


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 Thread Thiago Macieira
On Thursday 09 October 2014 09:55:36 Julien Blanc wrote:
 On 09/10/2014 09:27, Thiago Macieira wrote:
  The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge
  extensions for Unix attributes and longer file names. Do people still have
  CD drives?
 
 People also have zip files, which unfortunately may have various
 encoding in them, since the only normative encoding for zip files is cp437…

And I've said time and again that the bug lies with the unzipping application. 
Copying CP437 encoded names is nonsense. Besides, CP850 is the best.

 I have a bunch of such zip files, that :
 - cannot be extracted with gui tools
 - will result in bad filenames (including invalid utf-8) when extracted
 using « stupid » command line tools.
 
 IMHO saying « the problem does not exist » is not a good answer, because
 if it really didn’t this issue would never have been raised. The
 questions to answer are :
 - is it worth breaking lot of code ? (because it will : a good solution
 needs a complete refactor of qt io code, just providing a QFilePath
 class will not be enough)
 - will it be ready before c++ provides a core solution ?
 - is there someone willing to do it ?

I didn't say the problem doesn't exist. It exists.

I'm saying it's smaller than what people are making it out to be in this thread.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Thiago Macieira
On Tuesday 07 October 2014 23:31:11 Christoph Feck wrote:
 On Tuesday 07 October 2014 23:19:23 Tony Van Eerd wrote:
   The problem is serious enough, indeed, that Python 3 has resorted
   to a hack where they use a private Unicode range to encode the
   bytes between 128 and 255 in strings that fail normal decoding.
   I think that putting this hack into QString is unthinkable, and
   the concept of a platform string has to be taken with heads up
   and in a manner that will make it useful, usable and
   unobtrusive. I don't claim that it's a trivial task, but then
   I'm not asking anyone else but myself to deal with it :)
   
   Cheers, Kuba
  
  I think that hack should be given serious consideration.  Sure it
  is a hack, but it might still be the best solution.
 
 We are using the same hack in KDE4Libs, but it relies on
 QFile::setDecodingFunction. Unfortunately, this function is no longer
 available in Qt5, so in a few years, we will see the same long
 discussion as in https://bugs.kde.org/show_bug.cgi?id=165044

That was done because from Qt 3 to early Qt 4, that's exactly what we did: the 
UTF-8 decoder used private-use characters to notify that it had bad decodings.

Python's solution is a copy of Qt's.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Thiago Macieira
On Wednesday 08 October 2014 09:44:56 Tomasz Siekierda wrote:
 On Tuesday 07 October 2014 23:31:11 Christoph Feck wrote:
  We are using the same hack in KDE4Libs, but it relies on
  QFile::setDecodingFunction. Unfortunately, this function is no longer
  available in Qt5, so in a few years, we will see the same long
  discussion as in https://bugs.kde.org/show_bug.cgi?id=165044
 
 Thank you for that link, I've read through the whole discussion, and I
 have to say I had no idea this problem can be that serious; I don't
 think I've ever encountered it myself. But it seems to be rather
 widespread and even made some people frustrated enough to stop using
 KDE :-(

It isn't that widespread. This used to be more of a problem in 2003 when Linux 
started to move from Latin1 and Latin9 to UTF-8.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Oswald Buddenhagen
On Mon, Oct 06, 2014 at 05:21:45PM +0200, Thiago Macieira wrote:
 File names that cannot be decoded using the locale codec are
 considered filesystem corruption and are silently dropped. They won't
 appear in directory listings.
 
 This was discussed to exhaustion in Qt 5's development process.
 
link:
http://lists.qt-project.org/pipermail/development/2012-June/004276.html

 The conclusion is to remain at status quo since there is no good,
 technical solution.

i actually don't see that conclusion. you mostly ignored the arguments
about doing some kind of 8-bit passthrough. you sort of liked the idea
to make QFile deal directly with urls, but it's obvious that this is a
non-starter due to the effort required at higher levels. i still like my
idea to do something akin to punycode ...


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Thiago Macieira
On Wednesday 08 October 2014 12:10:07 Oswald Buddenhagen wrote:
 On Mon, Oct 06, 2014 at 05:21:45PM +0200, Thiago Macieira wrote:
  File names that cannot be decoded using the locale codec are
  considered filesystem corruption and are silently dropped. They won't
  appear in directory listings.
  
  This was discussed to exhaustion in Qt 5's development process.
 
 link:
 http://lists.qt-project.org/pipermail/development/2012-June/004276.html
 
  The conclusion is to remain at status quo since there is no good,
  technical solution.
 
 i actually don't see that conclusion. you mostly ignored the arguments
 about doing some kind of 8-bit passthrough. you sort of liked the idea
 to make QFile deal directly with urls, but it's obvious that this is a
 non-starter due to the effort required at higher levels. i still like my
 idea to do something akin to punycode ...

The proposal by Kuba would be to have everything use a QFileName or similar 
class. So why not just use QUrl?
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Oswald Buddenhagen
On Wed, Oct 08, 2014 at 12:16:55PM +0200, Thiago Macieira wrote:
 The proposal by Kuba would be to have everything use a QFileName or
 similar class. So why not just use QUrl?
 
yes, i think kuba's proposal is just as impractical as the qurl one.

i'd have a much more favorable view towards a QFileName class if it
actually had some functionality of QFileInfo moved into it. but such
a refactoring of the entire i/o class structure (hey, another thing we
mentioned at the summit!) is a significantly bigger task than anyone was
willing to take on so far.

therefore i think the only solution that is in scope (for qt 5 at least)
is re-introducing escaping at the places where recoding must be done.
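
One possible reading of "escaping at the places where recoding must be done", sketched in Python. The `%XX` scheme and the names `escape_name` and `unescape_name` are assumptions made for illustration; the thread does not specify a format. Bytes that are not valid UTF-8 become `%XX`, and a literal `%` becomes `%25` so the mapping stays reversible.

```python
def escape_name(raw: bytes) -> str:
    """Decode a raw name to text, escaping undecodable bytes as %XX."""
    # surrogateescape turns each bad byte 0xNN into the code point U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append("%{:02X}".format(code - 0xDC00))
        elif ch == "%":
            out.append("%25")
        else:
            out.append(ch)
    return "".join(out)


def unescape_name(text: str) -> bytes:
    """Inverse of escape_name: rebuild the exact original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)
```

With this scheme `b"caf\xe9 100%.txt"` is shown as `caf%E9 100%25.txt`, an ordinary string that can be stored or edited and later turned back into the same bytes.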
___
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Marc Mutz
Hi Julien,

On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
 However, i agree that changing this would :
 * break a lot of code

No, it cannot, if, as I propose, it's added to Qt 5.

 * permit only to solve really lower level / corner case issues

The value lies _also_ in being able to iterate over weird filenames (where 
weird simply means plugging in a USB stick into an otherwise UTF-8-only 
system).

But the value of a QFilePath class _mainly_ lies in distinguishing strings 
from file paths. Google "Type-Rich Interfaces" Bjarne Stroustrup.

 * be redundant with the std::filesystem api when it will be standardized 
 (hopefully for C++17). I hope Qt will then add the relevant overrides to 
 make its use possible anywhere where relevant.

That's another benefit of QFilePath: it's simple to add a QFilePath::
{to,from}StdFileSystem(), and that's all we'll need to do until Qt requires 
C++17.

Thanks,
Marc

-- 
Qt Developer Days 2014 - October 6 - 8 at BCC, Berlin

Marc Mutz marc.m...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
www.kdab.com || Germany +49-30-521325470 || Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-Independent Software Solutions


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-08 Thread Konstantin Ritt
2014-10-09 3:57 GMT+04:00 Marc Mutz marc.m...@kdab.com:

 Hi Julien,

 On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
  However, i agree that changing this would :
  * break a lot of code

 No, it cannot, if, as I propose, it's added to Qt 5.

  * permit only to solve really lower level / corner case issues

 The value lies _also_ in being able to iterate over weird filenames
 (where
 weird simply means plugging in a USB stick into an otherwise UTF-8-only
 system).


Qt doesn't mount that USB stick, and Qt doesn't manage mounting [or whatever
other system-wide] flags and settings; so should Qt ever care about such
platform/misconfiguration issues?
IMO, issues like this one should (or even must) be fixed at the platform
level, whilst high-level frameworks should not even try to work around them.
This is exactly what we decided to do with the issue of space character(s)
at the end of file names on Windows, BTW.

Regards,
Konstantin


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Tomasz Siekierda
 This was discussed to exhaustion in Qt 5's development process. The 
 conclusion
 is to remain at status quo since there is no good, technical solution.

 I’d think that the solution could be to use a dedicated class for file 
 names, perhaps with a base class for uninterpreted platform strings.

Ugh, that begins to sound like Java. Let's have a wrapper for a
wrapper... please don't go that way.

  How do you pass it on the command-line? Mind you, QProcess takes a
  QStringList
  for arguments.

 It look as if we’d need something like QPlatformString that’s a “thin”
 wrapper
 around a QByteArray on unices, and around QString on Windows.

 No, thanks! :)

I fully agree with "No, thanks".


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Marc Mutz
Hi Kuba,

Your criticisms are completely valid, and the conclusions you draw from them 
are, too. The problems Thiago lists make this a daunting task, less because of 
complexity than because of the sheer volume of code that needs to be modified.

I believe it's worth it, but most of us here lack the time for such a change, 
so you're more than welcome to volunteer. I can at least help with review, and 
once the class is in and QDir*/QFile* are adapted to use it, I'd start using 
it in QtWidgets right away.

Many of the issues Thiago lists can be dodged. E.g. a file path class could 
lack streaming operators and instead force early users to use 
toEncoded()/toDecoded() explicitly.

On Monday 06 October 2014 19:30:29 Kuba Ober wrote:
 I’d think that the solution could be to use a dedicated class for file
 names,

Yes, with s/file/path/.

 perhaps with a base class for uninterpreted platform strings.

Value classes should not have base classes. And that class for uninterpreted 
platform strings may be an abstraction vehicle to streamline the implementation 
of the path name class, but it should not be public API unless you provide one 
or two more users of such an API.

Thanks,
Marc

-- 
Qt Developer Days 2014 - October 6 - 8 at BCC, Berlin

Marc Mutz marc.m...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
www.kdab.com || Germany +49-30-521325470 || Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-Independent Software Solutions


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Kuba Ober

On Oct 7, 2014, at 8:30 AM, Julien Blanc julien.bl...@nmc-company.com wrote:

 On 07/10/2014 12:11, Tomasz Siekierda wrote:
 For file paths, I feel QString is really enough.
 Changing it to something else because of a few corner cases seems like
 an overkill to me. We already have a lot of classes that are connected
 with paths and the file system (QFile, QFileInfo, QDir, QDirIterator,
 and more), that is enough. In my view, at least.
 
 Imho using QString for file path (or, more generally, using any string 
 objects with a static api) is somewhat a very widespread bad idea. The 
 std::experimental::filesystem  api, for example, looks really better.

Basically, on Unix, the idea that a file path has any particular encoding
doesn’t hold *by the very design*. On Unix, a file path is a string of nonzero
bytes with a special meaning for ‘/’ and ‘.’, and that’s it. It’s perhaps a
safe assumption that other bytes under 128 are ASCII, but even that’s not
an assumption one has to make.
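
To illustrate that path manipulation only needs the ‘/’ byte and not an encoding, here is a short Python sketch using `posixpath` on bytes. The path itself is a made-up example containing a byte (`\xe9`) that is not valid UTF-8.

```python
import posixpath

raw_path = b"/media/usb/caf\xe9/report.txt"

# Splitting and joining work purely on the b"/" separator.
directory, name = posixpath.split(raw_path)
sibling = posixpath.join(directory, b"copy.txt")

# Walking the components needs no decoding either.
components = raw_path.split(b"/")
```

Every operation here treats the undecodable byte as opaque data, which is all a tool needs in order to copy, move or open the file.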

Armin describes it rather succinctly:
it's a byte mess that for display purposes is decoded with an encoding hint.

Given that Qt’s lifecycle is very different from that of standard C++, whatever
solution Qt provides needs to stand on its own.

Cheers, Kuba


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Alejandro Exojo
El Tuesday 07 October 2014, Tomasz Siekierda escribió:
 For file paths, I feel QString is really enough.
 Changing it to something else because of a few corner cases seems like
 an overkill to me.

Just for the sake of documenting the issue and pointing to this thread if 
future questions arise: Is there some solution for those corner cases?

Say one writes a file manager with Qt, and it has to support renaming a file 
whose name has the wrong encoding to the right one. Should that person 
skip the Qt classes?

BTW, the subject says non-Windows systems, but IIRC Mac OS X doesn't allow two 
files with equivalent names (e.g. decomposed vs. precomposed characters that 
are canonically equivalent). Does it apply there as well?
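
A short Python sketch of the equivalence being asked about. The two strings below render identically but differ at the code-point and byte level; normalizing both to the same form makes them compare equal. Which form a given filesystem applies (if any) is outside what this sketch shows, and `same_after_normalization` is an illustrative helper.

```python
import unicodedata

precomposed = "caf\u00e9.txt"    # 'é' as a single code point (NFC)
decomposed = "cafe\u0301.txt"    # 'e' followed by a combining acute (NFD)


def same_after_normalization(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

A byte-oriented filesystem sees two different names here, while a normalizing one treats them as the same file.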

-- 
Alex (a.k.a. suy) | GPG ID 0x0B8B0BC2
http://barnacity.net/ | http://disperso.net


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Thiago Macieira
On Tuesday 07 October 2014 10:38:47 Kuba Ober wrote:
 Just to be very clear: it is currently impossible to make a truly portable
 file management utility with Qt’s core APIs. Why? Because it will simply
 ignore all file names that it can’t decode when iterating the directory,
 and it won’t be able to take commandline arguments to open such files
 either. Furthermore, this is something that very basic C code using nothing
 but POSIX APIs can trivially deal with. Or that Python 2 trivially deals
 with. I consider it a serious enough problem.

That's where we disagree: those file names are not common at all.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Tony Van Eerd
 
 The problem is serious enough, indeed, that Python 3 has resorted to a hack
 where they use a private Unicode range to encode the bytes between 128
 and 255 in strings that fail normal decoding. I think that putting this hack 
 into
 QString is unthinkable, and the concept of a platform string has to be taken
 with heads up and in a manner that will make it useful, usable and
 unobtrusive. I don't claim that it's a trivial task, but then I'm not asking
 anyone else but myself to deal with it :)
 
 Cheers, Kuba

I think that hack should be given serious consideration.  Sure it is a hack, 
but it might still be the best solution.
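
For reference, a sketch of the Python 3 mechanism under discussion, using the `surrogateescape` error handler from PEP 383. Strictly speaking it maps each undecodable byte 0x80..0xFF to a lone surrogate code point U+DC80..U+DCFF rather than to the Private Use Area. The helper names are illustrative.

```python
def decode_name(raw: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(text: str) -> bytes:
    """Reverse decode_name, turning the surrogates back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"        # Latin-1 bytes, not valid UTF-8
text = decode_name(raw)     # 'caf\udce9.txt'
back = encode_name(text)    # identical to raw
```

The resulting string survives a round trip, but it is not well-formed Unicode: encoding it with the default strict handler raises `UnicodeEncodeError`, so it cannot be printed or serialized without special care.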

Tony



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-07 Thread Christoph Feck
On Tuesday 07 October 2014 23:19:23 Tony Van Eerd wrote:
  The problem is serious enough, indeed, that Python 3 has resorted
  to a hack where they use a private Unicode range to encode the
  bytes between 128 and 255 in strings that fail normal decoding.
  I think that putting this hack into QString is unthinkable, and
  the concept of a platform string has to be taken with heads up
  and in a manner that will make it useful, usable and
  unobtrusive. I don't claim that it's a trivial task, but then
  I'm not asking anyone else but myself to deal with it :)
  
  Cheers, Kuba
 
 I think that hack should be given serious consideration.  Sure it
 is a hack, but it might still be the best solution.

We are using the same hack in KDE4Libs, but it relies on 
QFile::setDecodingFunction. Unfortunately, this function is no longer 
available in Qt5, so in a few years, we will see the same long 
discussion as in https://bugs.kde.org/show_bug.cgi?id=165044

-- 
Christoph Feck
http://kdepepo.wordpress.com/
KDE Quality Team


[Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Kuba Ober
I’ve just read Armin Ronacher’s blog post about Unicode in Python 3, and he has 
highlighted a possible problem with the “Unicode everywhere” approach to things 
that come from byte APIs like the ones on Unix.

File names can have any encoding, and a file name not being valid under the 
encoding given in LC_CTYPE or LANG shouldn’t cause any failures when attempting 
to open the file. After all, when using the byte APIs, everything “just works”. 
About the only problem might be when presenting the name to the user, but even 
then, the name must simply exist in two forms: the native byte form, and a 
QString with some question marks in it.
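
A minimal Python sketch of keeping a name in two forms: the native bytes for file operations and a lossy string for display. Python's `errors="replace"` substitutes U+FFFD (the replacement character) rather than a literal question mark. `display_form` and `list_with_display` are illustrative names.

```python
from typing import List, Tuple


def display_form(raw_name: bytes, encoding: str = "utf-8") -> str:
    """Lossy, display-only decoding; undecodable bytes become U+FFFD."""
    return raw_name.decode(encoding, errors="replace")


def list_with_display(raw_names: List[bytes]) -> List[Tuple[bytes, str]]:
    """Pair each native name with the string a UI would show for it."""
    return [(raw, display_form(raw)) for raw in raw_names]


pairs = list_with_display([b"readme.txt", b"caf\xe9.txt"])
```

Only the first element of each pair should ever be handed back to the filesystem; the second cannot be converted back reliably.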

Thus, how does Qt deal with a directory listing with such “invalid” file names? 
Do they survive the round-trip through a QString and QDirIterator? Would it be 
worthwhile to tackle this issue in a better fashion (whatever it might be) for 
Qt 6?

Cheers, Kuba Ober


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Thiago Macieira
On Monday 06 October 2014 11:12:57 Kuba Ober wrote:
 Thus, how does Qt deal with a directory listing with such “invalid” file
 names? Do they survive the round-trip through a QString and QDirIterator?
 Would it be worthwhile to tackle this issue in a better fashion (whatever
 it might be) for Qt 6?

File names that cannot be decoded using the locale codec are considered 
filesystem corruption and are silently dropped. They won't appear in directory 
listings.

This was discussed to exhaustion in Qt 5's development process. The conclusion 
is to remain at status quo since there is no good, technical solution.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Kuba Ober
On Oct 6, 2014, at 11:21 AM, Thiago Macieira thiago.macie...@intel.com wrote:

 On Monday 06 October 2014 11:12:57 Kuba Ober wrote:
 Thus, how does Qt deal with a directory listing with such “invalid” file
 names? Do they survive the round-trip through a QString and QDirIterator?
 Would it be worthwhile to tackle this issue in a better fashion (whatever
 it might be) for Qt 6?
 
 File names that cannot be decoded using the locale codec are considered 
 filesystem corruption and are silently dropped. They won't appear in 
 directory 
 listings.
 
 This was discussed to exhaustion in Qt 5's development process. The 
 conclusion 
 is to remain at status quo since there is no good, technical solution.

I’d think that the solution could be to use a dedicated class for file names, 
perhaps with a base class for uninterpreted platform strings. One would also 
need to think about console I/O, since no matter what the encoding is, it’d be 
nice to remain somewhat at the level of what bare C provides. For example, a C 
program doesn’t need to know the encoding of file names nor the encoding used 
by the console - as long as they are one and the same, printing file names will 
“just work”. That’s of course assuming that the encoding is ASCII-compatible.

So, I’m specifically thinking of:

1. Retaining platform-specific representation of a file name in a class like 
QFileName (it’d be a QString wrapper on Windows, for example).

2. Sending it out to `QTextStream(stdout)`, and inputting it from 
`QTextStream(stdin)`.

3. Using the platform representation in addition to visible representation in 
the filesystem model.
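
The first two points can be sketched roughly as follows, in Python rather than Qt. `FileName` is a hypothetical stand-in for the proposed QFileName on a byte-oriented platform, and `io.BytesIO` stands in for a binary stdout/stdin. The newline-terminated format is an assumption of the sketch and would not cope with names that themselves contain a newline.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class FileName:
    """Hypothetical value type that keeps the native bytes of a file name."""

    native: bytes

    def write_line(self, stream) -> None:
        # Write the untouched bytes to a binary stream
        # (for example sys.stdout.buffer or an io.BytesIO).
        stream.write(self.native + b"\n")

    @classmethod
    def read_line(cls, stream) -> "FileName":
        # Read one newline-terminated name back without decoding it.
        return cls(stream.readline().rstrip(b"\n"))


buffer = io.BytesIO()
FileName(b"caf\xe9.txt").write_line(buffer)
buffer.seek(0)
round_tripped = FileName.read_line(buffer)
```

As in the C case mentioned above, the name passes through unchanged as long as the producer and the consumer agree on the byte stream, whatever encoding the console happens to use.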

Would something like that have any chance of getting accepted into Qt 6, if 
done “properly”? Obviously I haven’t thought out the details yet and there may 
be a need for several prototypes before it turns into anything remotely 
acceptable.

Cheers, Kuba


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Thiago Macieira
On Monday 06 October 2014 13:30:29 Kuba Ober wrote:
  This was discussed to exhaustion in Qt 5's development process. The
  conclusion is to remain at status quo since there is no good, technical
  solution.

 I’d think that the solution could be to use a dedicated class for file
 names, perhaps with a base class for uninterpreted platform strings.

We thought about that. The problem comes when you need to pass the file name 
through other means besides the file management functions.

How do you write it to a file using QIODevice? How do you write it to a file 
using QTextStream?

How do you pass it on the command-line? Mind you, QProcess takes a QStringList 
for arguments.
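
For comparison, a sketch of how the command-line question looks where arguments may be bytes. In Python on POSIX, `subprocess` accepts `bytes` arguments and the child can recover the exact bytes of `sys.argv` with `os.fsencode`. This assumes a POSIX system; `CHILD` and `echo_argument_bytes` are illustrative names.

```python
import subprocess
import sys

# Child program: write the raw bytes of its first argument to stdout.
CHILD = (
    "import os, sys\n"
    "sys.stdout.buffer.write(os.fsencode(sys.argv[1]))\n"
)


def echo_argument_bytes(raw_arg: bytes) -> bytes:
    """Pass raw_arg to a child process and return what the child received."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, raw_arg],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

An API that accepts only well-formed Unicode strings for arguments has no equivalent way to express such a name, which is the limitation raised here.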

What should QCommandLineParser do? Assume that all arguments are QFileNames?

What happens when you display the file name for the user in a text edit? What 
happens if the user edits the text?

Will we replace all of the QString-based API with QFileName instead? QPixmap, 
QLibrary, QPluginLoader, etc.

And how will you teach people to concatenate using the correct functions, 
instead of manipulating using QString?

Also remember that everything is the opposite on Windows: the file names are 
really UTF-16 and cannot be encoded in the locale's 8-bit encoding.

 So, I’m specifically thinking of:
 
 1. Retaining platform-specific representation of a file name in a class like
 QFileName (it’d be a QString wrapper on Windows, for example).
 
 2. Sending it out to `QTextStream(stdout)`, and inputting it from
 `QTextStream(stdin)`.
 
 3. Using the platform representation in addition to visible representation
 in the filesystem model.
 
 Would something like that have any chance of getting accepted into Qt 6, if
 done “properly”? Obviously I haven’t thought out the details yet and there
 may be a need for several prototypes before it turns into anything remotely
 acceptable.

Chance, yes, if you can fix the rest of the problems.

This is a bigger problem than you realise.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Kuba Ober
On Oct 6, 2014, at 2:45 PM, Thiago Macieira thiago.macie...@intel.com wrote:

 On Monday 06 October 2014 13:30:29 Kuba Ober wrote:
 This was discussed to exhaustion in Qt 5's development process. The
 conclusion is to remain at status quo since there is no good, technical
 solution.
 
 I’d think that the solution could be to use a dedicated class for file
 names, perhaps with a base class for uninterpreted platform strings.
 
 We thought about that. The problem comes when you need to pass the file name 
 through other means besides the file management functions.
 
 How do you write it to a file using QIODevice? How do you write it to a file 
 using QTextStream?

How about:

1. Byte-oriented system: QIODevice: output as if it were a QByteArray.
   QTextStream: output while bypassing any encoding, as if it were already
   encoded.

2. UTF-16 (Windows): the underlying data is QString already.
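
A minimal sketch of point 1, written in standard C++ rather than Qt so that it compiles on its own. `std::string` stands in for QByteArray, `std::ostream` for the device, and the names `writeRawName` and `roundTrip` are hypothetical. The only point it makes is that the name is written as bytes, with no transcoding step that could fail or alter it.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a file name to a byte-oriented sink exactly as
// the platform gave it to us. std::string is used as a plain byte container
// (a stand-in for QByteArray); no encoding or decoding takes place.
inline void writeRawName(std::ostream &out, const std::string &rawName)
{
    out.write(rawName.data(), static_cast<std::streamsize>(rawName.size()));
}

// Returns what a reader of the sink would get back; shows the round trip.
inline std::string roundTrip(const std::string &rawName)
{
    std::ostringstream sink;
    writeRawName(sink, rawName);
    return sink.str();
}
```

A name such as `"caf\xE9.txt"` (Latin-1 bytes on a UTF-8 system) comes back byte for byte, which is what a later open() call needs.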

 How do you pass it on the command-line? Mind you, QProcess takes a
 QStringList for arguments.

It looks as if we’d need something like QPlatformString that’s a “thin” wrapper
around a QByteArray on unices, and around QString on Windows. QProcess would be
expanded to accept a QList<QVariant> and QList<QPlatformString> in addition to
QStringList. Of course QPlatformString would need to be handled by QVariant.
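
A rough sketch of the “thin wrapper” idea, assuming nothing about how it would actually look in Qt. The class name `PlatformString` is hypothetical, and `std::string` / `std::u16string` stand in for QByteArray / QString so the block compiles without Qt.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the proposed QPlatformString. The storage type is
// chosen per platform: UTF-16 code units on Windows, uninterpreted bytes
// elsewhere.
class PlatformString
{
public:
#ifdef _WIN32
    using Storage = std::u16string;
#else
    using Storage = std::string;
#endif

    PlatformString() = default;

    // explicit: ordinary strings do not silently become platform strings.
    explicit PlatformString(Storage native) : m_native(std::move(native)) {}

    // The exact value to hand back to platform functions.
    const Storage &native() const { return m_native; }

    bool operator==(const PlatformString &other) const
    {
        return m_native == other.m_native;
    }

private:
    Storage m_native;
};
```

An argument list for a process would then be a container of these values instead of a list of decoded strings.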

 What should QCommandLineParser do? Assume that all arguments are a QFileName?

Not necessarily, but they’d be QPlatformString. A QPlatformString could be
decoded into a QString, but not implicitly converted into it. The parser would
need to be able to process those strings. I presume that they’d need to be
treated as “anything below 128 is ASCII, anything above is left alone”.
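
To make that rule concrete, here is a small sketch in plain C++. It assumes an ASCII-compatible encoding, so that the bytes for `-` and `=` always mean those characters; `ParsedOption` and `parseOption` are hypothetical names, not an existing parser API.

```cpp
#include <cstddef>
#include <string>

// Result of splitting one raw argument of the form "--name=value".
struct ParsedOption
{
    bool valid = false;
    std::string name;   // ASCII only, safe to compare with option names
    std::string value;  // raw bytes, never decoded
};

// Hypothetical parser step: only bytes below 128 are interpreted; anything
// else is carried through untouched as part of the value.
inline ParsedOption parseOption(const std::string &rawArg)
{
    ParsedOption result;
    if (rawArg.size() < 3 || rawArg.compare(0, 2, "--") != 0)
        return result;

    const std::size_t eq = rawArg.find('=');
    const std::size_t nameEnd = (eq == std::string::npos) ? rawArg.size() : eq;
    if (nameEnd == 2)
        return result;  // "--=value" has no option name

    for (std::size_t i = 2; i < nameEnd; ++i) {
        if (static_cast<unsigned char>(rawArg[i]) >= 128)
            return result;  // option names must be plain ASCII
    }

    result.valid = true;
    result.name = rawArg.substr(2, nameEnd - 2);
    if (eq != std::string::npos)
        result.value = rawArg.substr(eq + 1);
    return result;
}
```

The option name can be matched as text, while the value keeps whatever bytes the shell passed in.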

 What happens when you display the file name for the user in a text edit? What 
 happens if the user edits the text?

Here there’s only one sane behavior: decoder errors are replaced with question
marks, *and* if there were no changes to the text, the original representation
is preserved. QTextEdit and friends would need to support both QString and
QPlatformString.
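
A sketch of that behavior, under heavy simplification: the class name `EditableName` is hypothetical, `std::string` stands in for both the raw name and the displayed text, and the “decoder” only understands ASCII and shows every other byte as a question mark. A real decoder would use the locale encoding and replace only the invalid sequences.

```cpp
#include <string>
#include <utility>

// Hypothetical model of an editable field holding a file name. The raw bytes
// are kept next to the text shown to the user.
class EditableName
{
public:
    explicit EditableName(std::string raw)
        : m_raw(std::move(raw)),
          m_shown(decodeForDisplay(m_raw)),
          m_edited(m_shown)
    {
    }

    // Simplified decoder: ASCII passes through, every other byte becomes '?'.
    static std::string decodeForDisplay(const std::string &raw)
    {
        std::string text;
        text.reserve(raw.size());
        for (unsigned char byte : raw)
            text += (byte < 128) ? static_cast<char>(byte) : '?';
        return text;
    }

    const std::string &displayText() const { return m_edited; }
    void setText(std::string text) { m_edited = std::move(text); }

    // Unchanged text yields the original bytes, so the file can still be
    // opened. Edited text is used exactly as the user typed it.
    std::string result() const
    {
        return (m_edited == m_shown) ? m_raw : m_edited;
    }

private:
    std::string m_raw;     // what the platform gave us
    std::string m_shown;   // what was initially displayed
    std::string m_edited;  // current content of the field
};
```

The lossy step happens only on the way to the screen; the stored name is never overwritten by its own display form.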

 Will we replace all of the QString-based API with QFileName instead? QPixmap, 
 QLibrary, QPluginLoader, etc.

Maybe not replace, but expand: anywhere a QString is taken for a file name,
a QFileName or QPlatformString would be accepted as well. I’d think that
QFileName or QFilePath could be publicly derived from QString to indicate
intent, but perhaps even that wouldn’t be necessary.

 And how will you teach people to concatenate using the correct functions, 
 instead of manipulating using QString?

There’d be no implicit conversions to/from QString, and it’d need to support
concatenation with ASCII.
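
Roughly what that could look like for the byte-oriented (Unix) case, as a sketch only. `RawPath` and `appendAscii` are hypothetical names, and `std::string` plays the role of both QByteArray and the “ordinary” string type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical byte-oriented path type.
class RawPath
{
public:
    explicit RawPath(std::string bytes) : m_bytes(std::move(bytes)) {}

    const std::string &bytes() const { return m_bytes; }

    // Appending another raw path never touches the encoding.
    RawPath &operator+=(const RawPath &other)
    {
        m_bytes += other.m_bytes;
        return *this;
    }

    // Appending a literal is allowed only if it is pure ASCII, which means
    // the same bytes in every ASCII-compatible encoding. The check is a
    // debug-build assertion in this sketch.
    RawPath &appendAscii(const char *literal)
    {
        for (const char *p = literal; *p != '\0'; ++p)
            assert(static_cast<unsigned char>(*p) < 128);
        m_bytes += literal;
        return *this;
    }

private:
    std::string m_bytes;
};

// No implicit conversion in either direction.
static_assert(!std::is_convertible<std::string, RawPath>::value,
              "construction from a string must be explicit");
static_assert(!std::is_convertible<RawPath, std::string>::value,
              "conversion to a string must be explicit");
```

Code that tries to mix the two types by accident then fails to compile, which is the teaching mechanism: the compiler points at every place where an encoding decision is being made.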

 Also remember that everything is the opposite on Windows: the file names are 
 really UTF-16 and cannot be encoded in the locale's 8-bit encoding.

That’s why on Windows, it’d be QString all the way down even when we pretend
otherwise. It’s really Unix that’s the bastard child…


It’s pretty clear, I think, that for source compatibility all of the
QString filename APIs would need to be retained and perhaps deprecated
when 6.0 comes by.

I’m thinking this could be made binary compatible and done for some 5.x
release. I volunteer, but any prototypes are still a month or two away at
this point.

Cheers, Kuba


Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-06 Thread Konstantin Ritt
2014-10-06 23:53 GMT+04:00 Kuba Ober k...@mareimbrium.org:

 On Oct 6, 2014, at 2:45 PM, Thiago Macieira thiago.macie...@intel.com
 wrote:

  How do you pass it on the command-line? Mind you, QProcess takes a
  QStringList for arguments.

 It looks as if we’d need something like QPlatformString that’s a “thin”
 wrapper around a QByteArray on unices, and around QString on Windows.

No, thanks! :)

Konstantin