Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
2014-10-09 12:18 GMT+04:00 Thiago Macieira:
> On Thursday 09 October 2014 09:55:36 Julien Blanc wrote:
> > On 09/10/2014 09:27, Thiago Macieira wrote:
> > > The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge
> > > extensions for Unix attributes and longer file names. Do people still have
> > > CD drives?
> >
> > People also have zip files, which unfortunately may have various
> > encoding in them, since the only normative encoding for zip files is cp437…
>
> And I've said time and again that the bug lies with the unzipping application.
> Copying CP437 encoded names is nonsense. Besides, CP850 is the best.

Actually, the .ZIP format has supported Unicode for quite a while (see http://www.pkware.com/documents/casestudies/APPNOTE.TXT, APPENDIX D - Language Encoding (EFS)).

Regards,
Konstantin

___
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
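The point about the language encoding flag can be illustrated with a minimal sketch. It assumes the rule from APPNOTE Appendix D that bit 11 of an entry's general purpose bit flag marks the stored name as UTF-8, and that names without the flag are CP437. The function name `decode_zip_name` and the constant name are made up for this illustration and are not part of any library.

```python
# Bit 11 of the general purpose bit flag: "language encoding flag (EFS)".
UTF8_NAME_FLAG = 0x0800  # constant name is hypothetical


def decode_zip_name(raw_name: bytes, flag_bits: int) -> str:
    """Decode a raw ZIP entry name according to its flag bits."""
    if flag_bits & UTF8_NAME_FLAG:
        # Flag set: the archiver promises UTF-8.
        return raw_name.decode("utf-8")
    # Flag clear: fall back to the historical default, CP437.
    return raw_name.decode("cp437")


# The same visible name, stored two different ways.
print(decode_zip_name("été.txt".encode("utf-8"), UTF8_NAME_FLAG))
print(decode_zip_name(b"\x82t\x82.txt", 0))
```

An extractor that skips this check and copies the stored bytes straight into a UTF-8 file system is what produces the invalid names discussed in this thread. Archives written by tools that store some other legacy code page without setting the flag still need guessing, which this sketch does not attempt.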
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Thursday 09 October 2014 09:55:36 Julien Blanc wrote:
> On 09/10/2014 09:27, Thiago Macieira wrote:
> > The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge
> > extensions for Unix attributes and longer file names. Do people still have
> > CD drives?
>
> People also have zip files, which unfortunately may have various
> encoding in them, since the only normative encoding for zip files is cp437…

And I've said time and again that the bug lies with the unzipping application. Copying CP437 encoded names is nonsense. Besides, CP850 is the best.

> I have a bunch of such zip files, that :
> - cannot be extracted with gui tools
> - will result in bad filenames (including invalid utf-8) when extracted
>   using « stupid » command line tools.
>
> IMHO saying « the problem does not exist » is not a good answer, because
> if it really didn’t this issue would never have been raised. The
> questions to answer are :
> - is it worth breaking lot of code ? (because it will : a good solution
>   needs a complete refactor of qt io code, just providing a QFilePath
>   class will not be enough)
> - will it be ready before c++ provides a core solution ?
> - is there someone willing to do it ?

I didn't say the problem doesn't exist. It exists. I'm saying it's smaller than what people are making it to be in this thread.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On 09/10/2014 09:27, Thiago Macieira wrote:
> The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge
> extensions for Unix attributes and longer file names. Do people still have CD
> drives?

People also have zip files, which unfortunately may have various encodings in them, since the only normative encoding for zip files is cp437…

I have a bunch of such zip files that:
- cannot be extracted with GUI tools
- will result in bad filenames (including invalid UTF-8) when extracted using « stupid » command line tools.

IMHO saying « the problem does not exist » is not a good answer, because if it really didn’t, this issue would never have been raised. The questions to answer are:
- is it worth breaking a lot of code? (because it will: a good solution needs a complete refactor of the Qt I/O code, just providing a QFilePath class will not be enough)
- will it be ready before C++ provides a core solution?
- is there someone willing to do it?

Regards,
Julien Blanc
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Thursday 09 October 2014 02:46:49 Kuba Ober wrote:
> The problem manifests itself almost anytime you plug in a small USB memory
> stick that has localized file names on FAT-16 into any Unix system

You mean a pre-Windows 95 floppy? Are you sure you have a floppy drive?

Everything since Windows 95 should be using VFAT, which means the file names are stored in UTF-16. There's no guessing. It's just plain system misconfiguration if you don't get it right.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Thursday 09 October 2014 01:57:05 Marc Mutz wrote:
> The value lies _also_ in being able to iterate over "weird" filenames
> (where weird simply means plugging in a USB stick into an otherwise
> UTF-8-only system).

Mind you these two facts:
* USB mass-storage devices came into being after Linux switched to UTF-8
* The number of USB flash drives with Unix filesystems is incredibly small

That means the number of USB flash drives containing Unix filesystems that aren't UTF-8 encoded will be effectively zero.

VFAT doesn't count, since it stores the file names as UTF-16. If you're not getting the right file names, your system is misconfigured.

The only one that poses trouble are ISO-9660 CD-ROMs that have Rock Ridge extensions for Unix attributes and longer file names. Do people still have CD drives?

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tue, Oct 7, 2014 at 11:41 AM, Thiago Macieira wrote:
> On Tuesday 07 October 2014 10:38:47 Kuba Ober wrote:
> > Just to be very clear: it is currently impossible to make a truly portable
> > file management utility with Qt’s core APIs. Why? Because it will simply
> > ignore all file names that it can’t decode when iterating the directory,
> > and it won’t be able to take commandline arguments to open such files
> > either. Furthermore, this is something that very basic C code using nothing
> > but POSIX APIs can trivially deal with. Or that Python 2 trivially deals
> > with. I consider it a serious enough problem.
>
> That's where we disagree: those file names are not common at all.

I have a server whose filesystems were established at the time of RHEL 2 and the first Samba releases, with Windows NT 4 clients. There were thousands of such files; I eventually grew tired of them being invisible to Qt and fixed them all. Interestingly enough, on Windows machines Qt would see the files, because Samba was pretending really hard that the names were representable in UTF-16.

The problem manifests itself almost anytime you plug in a small USB memory stick that has localized file names on FAT-16 into any Unix system, from the most modern OS X to legacy stuff that seems to have just learned that USB storage exists.

Sure, you could argue that the distribution should be set up to ask the user for filename encoding on such a medium, or to transparently do a best-guess transcoding to UTF-8, or whatnot. But the reality is that none of this happens, and the files become invisible.

The worst part of the problem is that usually not all files with non-ASCII characters vanish, only those that are not valid UTF-8 code unit sequences do. It's a behavior that utterly confuses the users.

Cheers,
Kuba
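The "only some names vanish" effect can be reproduced without Qt. The following is a rough simulation, not Qt's actual code: `visible_names` is a hypothetical helper that drops every raw name that fails strict decoding, which is the behaviour described in this thread for directory listings.

```python
def visible_names(raw_names, encoding="utf-8"):
    """Return only the names that decode cleanly; silently skip the rest."""
    shown = []
    for raw in raw_names:
        try:
            shown.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Treated like "corruption": the entry simply disappears.
            continue
    return shown


# Four directory entries as raw bytes, written under different locales.
raw_entries = [
    b"report.txt",                    # pure ASCII
    "żółw.txt".encode("utf-8"),       # non-ASCII, valid UTF-8
    "żółw-old.txt".encode("cp1250"),  # same letters, legacy code page
    "café.txt".encode("latin-1"),     # Latin-1 leftover
]

print(visible_names(raw_entries))
```

Only the first two entries survive. From the user's side, one file with Polish letters shows up and its older sibling does not, which is the confusing part.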
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
It's not a platform bug. It's an application/framework (Qt) bug.

Unix paths are just a byte array, where certain bytes have special meaning (mostly just '/'). Passing around those byte arrays from/to platform functions will always work correctly.

The problem is that Qt tries to interpret those byte strings by converting them to Unicode for use inside QString. There's no guarantee that this conversion will succeed using the current system encoding, as the file name may have previously been encoded with a different encoding.

On Unix, the "correct" way to handle paths is to never encode/decode the byte array. Showing a path in a UI is a separate issue from just passing it around and/or modifying it.

-- Louai

From: development-bounces+louai.al-khanji=theqtcompany@qt-project.org on behalf of Konstantin Ritt
Sent: Thursday, October 9, 2014 3:44 AM
To: Marc Mutz
Cc: development@qt-project.org
Subject: Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems

2014-10-09 3:57 GMT+04:00 Marc Mutz:
> Hi Julien,
>
> On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
> > However, i agree that changing this would :
> > * break a lot of code
>
> No, it cannot, if, as I propose, it's added to Qt 5.
>
> > * permit only to solve really lower level / corner case issues
>
> The value lies _also_ in being able to iterate over "weird" filenames (where
> weird simply means plugging in a USB stick into an otherwise UTF-8-only
> system).

Qt doesn't mount that USB stick, Qt doesn't manage mounting [and whatever else system-wide] flags and settings; so should Qt ever care about some platform/misconfiguration issues?

IMO, issues like this one should (or even must) be fixed at a platform level, whilst high-level frameworks should not even try to work around them. This is exactly what we decided to do with the "space character(s) at the end of file name" issue on Windows, BTW.

Regards,
Konstantin
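Louai's distinction between passing a path around and showing it can be sketched in a few lines. This is only an illustration under the assumption of a POSIX-style path; the helper names `rename_sibling` and `for_display` are invented here. `posixpath` from the Python standard library accepts `bytes` as well as `str`, so no decoding is needed for the path manipulation itself.

```python
import posixpath


def rename_sibling(raw_path: bytes, new_name: bytes) -> bytes:
    """Build the path of a sibling entry without ever decoding the bytes."""
    parent, _old_name = posixpath.split(raw_path)
    return posixpath.join(parent, new_name)


def for_display(raw: bytes) -> str:
    """Lossy, display-only rendering; never feed the result back to the OS."""
    return raw.decode("utf-8", errors="replace")


raw = b"/srv/share/caf\xe9.txt"          # Latin-1 'é', not valid UTF-8
print(rename_sibling(raw, b"cafe.txt"))  # path stays a byte string
print(for_display(posixpath.basename(raw)))
```

The byte string stays the identity of the file; the decoded text is a view of it that may lose information.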
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
2014-10-09 3:57 GMT+04:00 Marc Mutz:
> Hi Julien,
>
> On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
> > However, i agree that changing this would :
> > * break a lot of code
>
> No, it cannot, if, as I propose, it's added to Qt 5.
>
> > * permit only to solve really lower level / corner case issues
>
> The value lies _also_ in being able to iterate over "weird" filenames (where
> weird simply means plugging in a USB stick into an otherwise UTF-8-only
> system).

Qt doesn't mount that USB stick, Qt doesn't manage mounting [and whatever else system-wide] flags and settings; so should Qt ever care about some platform/misconfiguration issues?

IMO, issues like this one should (or even must) be fixed at a platform level, whilst high-level frameworks should not even try to work around them. This is exactly what we decided to do with the "space character(s) at the end of file name" issue on Windows, BTW.

Regards,
Konstantin
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
Hi Julien,

On Tuesday 07 October 2014 14:30:59 Julien Blanc wrote:
> However, i agree that changing this would :
> * break a lot of code

No, it cannot, if, as I propose, it's added to Qt 5.

> * permit only to solve really lower level / corner case issues

The value lies _also_ in being able to iterate over "weird" filenames (where weird simply means plugging in a USB stick into an otherwise UTF-8-only system).

But the value of a QFilePath class _mainly_ lies in distinguishing strings from file paths. Google "Type-Rich Interfaces Bjarne Stroustrup".

> * be redundant with the std::filesystem api when it will be standardized
> (hopefully for C++17). I hope Qt will then add the relevant overrides to
> make its use possible anywhere where relevant.

That's another benefit of QFilePath: it's simple to add a QFilePath::{to,from}StdFileSystem(), and that's all we'll need to do until Qt requires C++17.

Thanks,
Marc

--
Qt Developer Days 2014 - October 6 - 8 at BCC, Berlin
Marc Mutz | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
www.kdab.com || Germany +49-30-521325470 || Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-Independent Software Solutions
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Wed, Oct 08, 2014 at 12:16:55PM +0200, Thiago Macieira wrote:
> The proposal by Kuba would be to have everything use a QFileName or
> similar class. So why not just use QUrl?

yes, i think kuba's proposal is just as impractical as the qurl one.

i'd have a much more favorable view towards a QFileName class if it actually had some functionality of QFileInfo moved into it. but such a refactoring of the entire i/o class structure (hey, another thing we mentioned at the summit!) is a significantly bigger task than anyone was willing to take on so far.

therefore i think the only solution that is in scope (for qt 5 at least) is re-introducing escaping at the places where recoding must be done.
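The thread does not spell out what such escaping would look like, so the following is purely a sketch of one possible reversible scheme, not Oswald's proposal and not anything Qt implements. It assumes percent-style escapes: undecodable bytes become `%XX`, and a literal `%` becomes `%25` so that the mapping can be inverted. Both function names are hypothetical.

```python
def escape_name(raw: bytes) -> str:
    """Turn arbitrary bytes into plain text; bad bytes become %XX."""
    # surrogateescape marks each undecodable byte 0xNN as U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "%":
            out.append("%25")                     # escape the escape char
        elif 0xDC80 <= code <= 0xDCFF:
            out.append(f"%{code - 0xDC00:02X}")   # undecodable byte
        else:
            out.append(ch)
    return "".join(out)


def unescape_name(text: str) -> bytes:
    """Inverse of escape_name: recover the exact original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


raw = b"100%_caf\xe9.txt"
escaped = escape_name(raw)
print(escaped)
print(unescape_name(escaped) == raw)
```

The cost of any scheme of this kind is that names containing the escape character change their visible form, which is one reason the idea was contested in the earlier discussion linked in this thread.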
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Wednesday 08 October 2014 12:10:07 Oswald Buddenhagen wrote:
> On Mon, Oct 06, 2014 at 05:21:45PM +0200, Thiago Macieira wrote:
> > File names that cannot be decoded using the locale codec are
> > considered filesystem corruption and are silently dropped. They won't
> > appear in directory listings.
> >
> > This was discussed to exhaustion in Qt 5's development process.
>
> link:
> http://lists.qt-project.org/pipermail/development/2012-June/004276.html
>
> > The conclusion is to remain at status quo since there is no good,
> > technical solution.
>
> i actually don't see that conclusion. you mostly ignored the arguments
> about doing some kind of 8-bit passthrough. you sort of liked the idea
> to make QFile deal directly with urls, but it's obvious that this is a
> non-starter due to the effort required at higher levels. i still like my
> idea to do something akin to punycode ...

The proposal by Kuba would be to have everything use a QFileName or similar class. So why not just use QUrl?

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Mon, Oct 06, 2014 at 05:21:45PM +0200, Thiago Macieira wrote:
> File names that cannot be decoded using the locale codec are
> considered filesystem corruption and are silently dropped. They won't
> appear in directory listings.
>
> This was discussed to exhaustion in Qt 5's development process.

link: http://lists.qt-project.org/pipermail/development/2012-June/004276.html

> The conclusion is to remain at status quo since there is no good,
> technical solution.

i actually don't see that conclusion. you mostly ignored the arguments about doing some kind of 8-bit passthrough. you sort of liked the idea to make QFile deal directly with urls, but it's obvious that this is a non-starter due to the effort required at higher levels. i still like my idea to do something akin to punycode ...
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Wednesday 08 October 2014 09:44:56 Tomasz Siekierda wrote:
> On Tuesday 07 October 2014 23:31:11 Christoph Feck wrote:
> > We are using the same hack in KDE4Libs, but it relies on
> > QFile::setDecodingFunction. Unfortunately, this function is no longer
> > available in Qt5, so in a few years, we will see the same long
> > discussion as in https://bugs.kde.org/show_bug.cgi?id=165044
>
> Thank you for that link, I've read through the whole discussion, and I
> have to say I had no idea this problem can be that serious; I don't
> think I've ever encountered it myself. But it seems to be rather
> widespread and even made some people frustrated enough to stop using
> KDE :-(

It isn't that widespread. This used to be more of a problem in 2003 when Linux started to move from Latin1 and Latin9 to UTF-8.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014 23:31:11 Christoph Feck wrote:
> We are using the same hack in KDE4Libs, but it relies on
> QFile::setDecodingFunction. Unfortunately, this function is no longer
> available in Qt5, so in a few years, we will see the same long
> discussion as in https://bugs.kde.org/show_bug.cgi?id=165044

Thank you for that link, I've read through the whole discussion, and I have to say I had no idea this problem can be that serious; I don't think I've ever encountered it myself. But it seems to be rather widespread and even made some people frustrated enough to stop using KDE :-(
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014 23:31:11 Christoph Feck wrote:
> On Tuesday 07 October 2014 23:19:23 Tony Van Eerd wrote:
> > > The problem is serious enough, indeed, that Python 3 has resorted
> > > to a hack where they use a private Unicode range to encode the
> > > bytes between 128 and 255 in strings that fail normal decoding.
> > > I think that putting this hack into QString is unthinkable, and
> > > the concept of a platform string has to be taken with heads up
> > > and in a manner that will make it useful, usable and
> > > unobtrusive. I don't claim that it's a trivial task, but then
> > > I'm not asking anyone else but myself to deal with it :)
> > >
> > > Cheers, Kuba
> >
> > I think that hack should be given serious consideration. Sure it
> > is a hack, but it might still be the best solution.
>
> We are using the same hack in KDE4Libs, but it relies on
> QFile::setDecodingFunction. Unfortunately, this function is no longer
> available in Qt5, so in a few years, we will see the same long
> discussion as in https://bugs.kde.org/show_bug.cgi?id=165044

That was done because from Qt 3 to early Qt 4, that's exactly what we did: the UTF-8 decoder used private-use characters to notify that it had bad decodings.

Python's solution is a copy of Qt's.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014 23:19:23 Tony Van Eerd wrote:
> > The problem is serious enough, indeed, that Python 3 has resorted
> > to a hack where they use a private Unicode range to encode the
> > bytes between 128 and 255 in strings that fail normal decoding.
> > I think that putting this hack into QString is unthinkable, and
> > the concept of a platform string has to be taken with heads up
> > and in a manner that will make it useful, usable and
> > unobtrusive. I don't claim that it's a trivial task, but then
> > I'm not asking anyone else but myself to deal with it :)
> >
> > Cheers, Kuba
>
> I think that hack should be given serious consideration. Sure it
> is a hack, but it might still be the best solution.

We are using the same hack in KDE4Libs, but it relies on QFile::setDecodingFunction. Unfortunately, this function is no longer available in Qt5, so in a few years, we will see the same long discussion as in https://bugs.kde.org/show_bug.cgi?id=165044

--
Christoph Feck
http://kdepepo.wordpress.com/
KDE Quality Team
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
> The problem is serious enough, indeed, that Python 3 has resorted to a hack
> where they use a private Unicode range to encode the bytes between 128
> and 255 in strings that fail normal decoding. I think that putting this hack
> into QString is unthinkable, and the concept of a platform string has to be
> taken with heads up and in a manner that will make it useful, usable and
> unobtrusive. I don't claim that it's a trivial task, but then I'm not asking
> anyone else but myself to deal with it :)
>
> Cheers, Kuba

I think that hack should be given serious consideration. Sure it is a hack, but it might still be the best solution.

Tony
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014 10:38:47 Kuba Ober wrote:
> Just to be very clear: it is currently impossible to make a truly portable
> file management utility with Qt’s core APIs. Why? Because it will simply
> ignore all file names that it can’t decode when iterating the directory,
> and it won’t be able to take commandline arguments to open such files
> either. Furthermore, this is something that very basic C code using nothing
> but POSIX APIs can trivially deal with. Or that Python 2 trivially deals
> with. I consider it a serious enough problem.

That's where we disagree: those file names are not common at all.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014, Tomasz Siekierda wrote:
> For file paths, I feel QString is really enough.
> Changing it to something else because of a few corner cases seems like
> an overkill to me.

Just for the sake of documenting the issue and pointing to this thread if future questions arise: Is there some solution for those corner cases? Say one writes a file manager with Qt, and has to support renaming a file whose name has the wrong encoding to a correctly encoded one. Should that person skip the Qt classes?

BTW, the subject says non-Windows systems, but IIRC Mac OS X doesn't allow two files with equivalent names (e.g. decomposed vs. precomposed characters that are equivalent). Does it apply as well?

--
Alex (a.k.a. suy) | GPG ID 0x0B8B0BC2
http://barnacity.net/ | http://disperso.net
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Oct 7, 2014, at 8:30 AM, Julien Blanc wrote:
> On 07/10/2014 12:11, Tomasz Siekierda wrote:
>> For file paths, I feel QString is really enough.
>> Changing it to something else because of a few corner cases seems like
>> an overkill to me. We already have a lot of classes that are connected
>> with paths and the file system (QFile, QFileInfo, QDir, QDirIterator,
>> and more), that is enough. In my view, at least.
>
> Imho using QString for file path (or, more generally, using any string
> objects with a static api) is somewhat a very widespread bad idea. The
> std::experimental::filesystem api, for example, looks really better.

Basically, on Unix, the idea that a file path has any particular encoding doesn’t hold *by the very design*. On Unix, a file path is a string of nonzero bytes with a special meaning for ‘/’ and ‘.’ and that’s it. It’s a safe assumption that other bytes under 128 are ASCII, perhaps, but even that’s not an assumption one has to make.

Armin describes it rather succinctly: "it's a byte mess that for display purposes is decoded with an encoding hint."

Given that Qt’s lifecycle is very different from that of standard C++, whatever solution Qt provides needs to stand on its own.

Cheers,
Kuba
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Oct 7, 2014, at 6:11 AM, Tomasz Siekierda wrote:
> On 7 October 2014 11:16, Marc Mutz wrote:
>>> Ugh, that begins to sound like Java. Let's have a wrapper for a
>>> wrapper... please don't go that way.
>
>> We have QSize and QPoint and they're used ubiquitously in Qt. But, by your
>> rationale, everyone should be using two ints instead, so let's remove them!
>
>> How's that anything to do with Java? C++ is made from the ground up for
>> lightweight abstractions such as a size, a point and a file path. It's Java
>> that isn't.
>
> QSize, QPoint, QRect, etc. are useful, very convenient, intuitive and
> a good thing to have. For file paths, I feel QString is really enough.
> Changing it to something else because of a few corner cases seems like
> an overkill to me.

Just to be very clear: it is currently impossible to make a truly portable file management utility with Qt’s core APIs. Why? Because it will simply ignore all file names that it can’t decode when iterating the directory, and it won’t be able to take commandline arguments to open such files either. Furthermore, this is something that very basic C code using nothing but POSIX APIs can trivially deal with. Or that Python 2 trivially deals with. I consider it a serious enough problem.

The problem is serious enough, indeed, that Python 3 has resorted to a hack where they use a private Unicode range to encode the bytes between 128 and 255 in strings that fail normal decoding. I think that putting this hack into QString is unthinkable, and the concept of a platform string has to be taken with heads up and in a manner that will make it useful, usable and unobtrusive. I don’t claim that it’s a trivial task, but then I’m not asking anyone else but myself to deal with it :)

Cheers,
Kuba
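For reference, the Python 3 mechanism mentioned here is the `surrogateescape` error handler from PEP 383. Strictly speaking it maps each undecodable byte 0x80..0xFF to a lone surrogate code point U+DC80..U+DCFF rather than to a Private Use Area character. A short sketch of the round trip follows; the wrapper names `name_to_text` and `text_to_name` are made up for this illustration.

```python
def name_to_text(raw: bytes) -> str:
    """Decode a file name, smuggling undecodable bytes through as U+DCxx."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Encode back; escaped code points turn into the original bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"            # Latin-1 'é': invalid as UTF-8
text = name_to_text(raw)

print(repr(text))               # contains the lone surrogate U+DCE9
print(text_to_name(text) == raw)
```

The resulting `str` is not valid Unicode text: a strict `text.encode("utf-8")` raises `UnicodeEncodeError`. That is the trade-off under discussion: the name survives a round trip through the string type, but every consumer of the string has to tolerate the escaped code points.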
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On 07/10/2014 12:11, Tomasz Siekierda wrote:
> For file paths, I feel QString is really enough.
> Changing it to something else because of a few corner cases seems like
> an overkill to me. We already have a lot of classes that are connected
> with paths and the file system (QFile, QFileInfo, QDir, QDirIterator,
> and more), that is enough. In my view, at least.

Imho using QString for file path (or, more generally, using any string objects with a static api) is somewhat a very widespread bad idea. The std::experimental::filesystem api, for example, looks really better.

However, i agree that changing this would :
* break a lot of code
* permit only to solve really lower level / corner case issues
* be redundant with the std::filesystem api when it will be standardized (hopefully for C++17). I hope Qt will then add the relevant overrides to make its use possible anywhere where relevant.

Best regards,
Julien Blanc
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On 7 October 2014 11:16, Marc Mutz wrote:
>> Ugh, that begins to sound like Java. Let's have a wrapper for a
>> wrapper... please don't go that way.
>
> We have QSize and QPoint and they're used ubiquitously in Qt. But, by your
> rationale, everyone should be using two ints instead, so let's remove them!
>
> How's that anything to do with Java? C++ is made from the ground up for
> lightweight abstractions such as a size, a point and a file path. It's Java
> that isn't.

QSize, QPoint, QRect, etc. are useful, very convenient, intuitive and a good thing to have. For file paths, I feel QString is really enough. Changing it to something else because of a few corner cases seems like an overkill to me. We already have a lot of classes that are connected with paths and the file system (QFile, QFileInfo, QDir, QDirIterator, and more), that is enough. In my view, at least.

My reference to Java comes from their love for large amount of abstractions and interfaces, where even the simplest action requires creation of several objects of various classes that can't talk directly to one another. One of the beautiful things about Qt is that you can do a lot in a very few lines of code. Current use of QString - IMO - works just fine, but maybe I have misunderstood something here.
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
Hi Kuba,

Your criticisms are completely valid, and the conclusions you draw from them are, too. The problems Thiago lists make this a daunting task, but mostly not because of complexity, but of sheer volume of code that needs to be modified.

I believe it's worth it, but most of us here lack the time for such a change, so you're more than welcome to volunteer. I can at least help with review, and once the class is in and QDir*/QFile* are adapted to use it, I'd start using it in QtWidgets right away.

Many of the issues Thiago lists can be dodged. E.g. a file path class could lack streaming operators and instead force early users to use toEncoded()/toDecoded() explicitly.

On Monday 06 October 2014 19:30:29 Kuba Ober wrote:
> I’d think that the solution could be to use a dedicated class for file
> names,

Yes, with s/file/path/.

> perhaps with a base class for uninterpreted platform strings.

Value classes should not have base classes. And that class for uninterpreted platform strings may be an abstraction vehicle to streamline implementation of path name class, but not public API, unless you provide one or two more users of such API.

Thanks,
Marc

--
Marc Mutz | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Tuesday 07 October 2014 09:47:19 Tomasz Siekierda wrote:
> >> I’d think that the solution could be to use a dedicated class for file
> >> names, perhaps with a base class for uninterpreted platform strings.
>
> Ugh, that begins to sound like Java. Let's have a wrapper for a
> wrapper... please don't go that way.

We have QSize and QPoint and they're used ubiquitously in Qt. But, by your rationale, everyone should be using two ints instead, so let's remove them!

How's that anything to do with Java? C++ is made from the ground up for lightweight abstractions such as a size, a point and a file path. It's Java that isn't.

Thanks for this helpful comment,
Marc

--
Marc Mutz | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
>>> This was discussed to exhaustion in Qt 5's development process. The
>>> conclusion is to remain at status quo since there is no good,
>>> technical solution.
>>
>> I’d think that the solution could be to use a dedicated class for file
>> names, perhaps with a base class for uninterpreted platform strings.

Ugh, that begins to sound like Java. Let's have a wrapper for a wrapper... please don't go that way.

>> > How do you pass it on the command-line? Mind you, QProcess takes a
>> > QStringList for arguments.
>>
>> It looks as if we’d need something like QPlatformString that’s a “thin”
>> wrapper around a QByteArray on unices, and around QString on Windows.
>
> No, thanks! :)

I fully agree with "no, thanks".
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
2014-10-06 23:53 GMT+04:00 Kuba Ober:
> On Oct 6, 2014, at 2:45 PM, Thiago Macieira wrote:
>
> > How do you pass it on the command-line? Mind you, QProcess takes a
> > QStringList for arguments.
>
> It looks as if we’d need something like QPlatformString that’s a “thin”
> wrapper around a QByteArray on unices, and around QString on Windows.

No, thanks! :)

Konstantin
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Oct 6, 2014, at 2:45 PM, Thiago Macieira wrote:
> On Monday 06 October 2014 13:30:29 Kuba Ober wrote:
>>> This was discussed to exhaustion in Qt 5's development process. The
>>> conclusion is to remain at status quo since there is no good, technical
>>> solution.
>>
>> I’d think that the solution could be to use a dedicated class for file
>> names, perhaps with a base class for uninterpreted platform strings.
>
> We thought about that. The problem comes when you need to pass the file name
> through other means besides the file management functions.
>
> How do you write it to a file using QIODevice? How do you write it to a file
> using QTextStream?

How about:

1. Byte-oriented system:
   QIODevice: output as if it were a QByteArray.
   QTextStream: output while bypassing any encoding, as if it were already encoded.

2. UTF-16 (Windows): the underlying data is QString already.

> How do you pass it on the command-line? Mind you, QProcess takes a
> QStringList for arguments.

It looks as if we’d need something like QPlatformString that’s a “thin” wrapper around a QByteArray on unices, and around QString on Windows. QProcess would be expanded to accept a QList and QList in addition to QStringList. Of course, QPlatformString would need to be handled by QVariant.

> What should QCommandLineParser do? Assume that all arguments are QFileName?

Not necessarily, but they’d be QPlatformString. A QPlatformString could be decoded into a QString, but not implicitly converted into it. The parser would need to be able to process those strings. I presume that they’d need to be treated as “anything <128 is ASCII, anything above is left alone”.

> What happens when you display the file name for the user in a text edit? What
> happens if the user edits the text?

Here there’s only one sane behavior: decoder errors are replaced with question marks, *and* if there were no changes to the text, the original representation is preserved. QTextEdit and friends would need to support both QString and QPlatformString.

> Will we replace all of the QString-based API with QFileName instead? QPixmap,
> QLibrary, QPluginLoader, etc.

Maybe not replace, but expand: anywhere a QString is taken for a file name, a QFileName or QPlatformString would be accepted as well. I’d think that QFileName or QFilePath could be publicly derived from QString to indicate intent, but perhaps even that wouldn’t be necessary.

> And how will you teach people to concatenate using the correct functions,
> instead of manipulating using QString?

There’d be no implicit conversions to/from QString, and it’d need to support concatenation with ASCII.

> Also remember that everything is the opposite on Windows: the file names are
> really UTF-16 and cannot be encoded in the locale's 8-bit encoding.

That’s why on Windows, it’d be QString all the way down even when we pretend otherwise. It’s really Unix that’s the bastard child…

It’s pretty clear, I think, that for source compatibility all of the QString filename APIs would need to be retained and perhaps deprecated when 6.0 comes around. I’m thinking this could be made binary compatible (BC) and done for some 5.x release.

I volunteer, but any prototypes are still a month or two away at this point.

Cheers,
Kuba
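The “thin wrapper” described in this reply could look roughly like the sketch below. It is an illustration only: it uses standard C++ strings in place of QByteArray and QString so that it stands alone, and `PlatformStringSketch`, `Storage`, `native()` and `appendAscii()` are hypothetical names, not proposed Qt API. It shows the two properties the reply asks for: no implicit conversion in either direction, and concatenation with ASCII without knowing the encoding (assuming, as the thread does, an ASCII-compatible native encoding).

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the proposed QPlatformString.
class PlatformStringSketch {
public:
#ifdef _WIN32
    using Storage = std::u16string;  // UTF-16 code units on Windows
#else
    using Storage = std::string;     // uninterpreted bytes on Unix
#endif

    // Explicit: raw storage never turns into a platform string by accident.
    explicit PlatformStringSketch(Storage native) : native_(std::move(native)) {}

    // Access to the native form is always spelled out by the caller.
    const Storage& native() const { return native_; }

    // ASCII can be appended without decoding anything, because each ASCII
    // character maps to exactly one native code unit with the same value.
    PlatformStringSketch& appendAscii(const char* ascii) {
        for (; *ascii != '\0'; ++ascii) {
            assert(static_cast<unsigned char>(*ascii) < 0x80);  // ASCII only
            native_.push_back(static_cast<Storage::value_type>(*ascii));
        }
        return *this;
    }

private:
    Storage native_;
};

// Compile-time check: no implicit conversion to or from the storage type.
static_assert(!std::is_convertible<PlatformStringSketch::Storage,
                                   PlatformStringSketch>::value,
              "storage must not convert implicitly to the wrapper");
static_assert(!std::is_convertible<PlatformStringSketch,
                                   PlatformStringSketch::Storage>::value,
              "wrapper must not convert implicitly to its storage");
```

A caller would then write something like `name.appendAscii(".bak")` to build a backup file name, and the undecodable part of the original name is carried along unchanged.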
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Monday 06 October 2014 13:30:29 Kuba Ober wrote:
> > This was discussed to exhaustion in Qt 5's development process. The
> > conclusion is to remain at status quo since there is no good, technical
> > solution.
>
> I’d think that the solution could be to use a dedicated class for file
> names, perhaps with a base class for uninterpreted platform strings.

We thought about that. The problem comes when you need to pass the file name through other means besides the file management functions.

How do you write it to a file using QIODevice? How do you write it to a file using QTextStream?

How do you pass it on the command-line? Mind you, QProcess takes a QStringList for arguments.

What should QCommandLineParser do? Assume that all arguments are QFileName?

What happens when you display the file name for the user in a text edit? What happens if the user edits the text?

Will we replace all of the QString-based API with QFileName instead? QPixmap, QLibrary, QPluginLoader, etc.

And how will you teach people to concatenate using the correct functions, instead of manipulating using QString?

Also remember that everything is the opposite on Windows: the file names are really UTF-16 and cannot be encoded in the locale's 8-bit encoding.

> So, I’m specifically thinking of:
>
> 1. Retaining platform-specific representation of a file name in a class like
> QFileName (it’d be a QString wrapper on Windows, for example).
>
> 2. Sending it out to `QTextStream(stdout)`, and inputting it from
> `QTextStream(stdin)`.
>
> 3. Using the platform representation in addition to visible representation
> in the filesystem model.
>
> Would something like that have any chance of getting accepted into Qt 6, if
> done “properly”? Obviously I haven’t thought out the details yet and there
> may be a need for several prototypes before it turns into anything remotely
> acceptable.

Chance, yes, if you can fix the rest of the problems. This is a bigger problem than you realise.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Oct 6, 2014, at 11:21 AM, Thiago Macieira wrote:
> On Monday 06 October 2014 11:12:57 Kuba Ober wrote:
>> Thus, how does Qt deal with a directory listing with such “invalid” file
>> names? Do they survive the round-trip through a QString and QDirIterator?
>> Would it be worthwhile to tackle this issue in a better fashion (whatever
>> it might be) for Qt 6?
>
> File names that cannot be decoded using the locale codec are considered
> filesystem corruption and are silently dropped. They won't appear in
> directory listings.
>
> This was discussed to exhaustion in Qt 5's development process. The
> conclusion is to remain at status quo since there is no good, technical
> solution.

I’d think that the solution could be to use a dedicated class for file names, perhaps with a base class for uninterpreted platform strings.

One would also need to think about console I/O, since no matter what the encoding is, it’d be nice to remain somewhat at the level of what bare C provides. For example, a C program doesn’t need to know the encoding of file names nor the encoding used by the console - as long as they are one and the same, printing file names will “just work”. That’s of course assuming that the encoding is ASCII-compatible.

So, I’m specifically thinking of:

1. Retaining platform-specific representation of a file name in a class like QFileName (it’d be a QString wrapper on Windows, for example).

2. Sending it out to `QTextStream(stdout)`, and inputting it from `QTextStream(stdin)`.

3. Using the platform representation in addition to visible representation in the filesystem model.

Would something like that have any chance of getting accepted into Qt 6, if done “properly”? Obviously I haven’t thought out the details yet and there may be a need for several prototypes before it turns into anything remotely acceptable.

Cheers,
Kuba
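The console point in this message, that bytes passed through untouched “just work” when the file system and the console share an encoding, can be sketched in a few lines. This uses standard C++ streams rather than QTextStream, and `writeNativeName()` and `survivesRoundTrip()` are hypothetical helper names chosen for the illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes a native file name to a byte stream exactly as stored: no decoding
// on the way in, no re-encoding on the way out.
void writeNativeName(std::ostream& out, const std::string& nativeName) {
    out.write(nativeName.data(),
              static_cast<std::streamsize>(nativeName.size()));
    out.put('\n');
}

// Usage: an in-memory stream stands in for stdout. The bytes that come out
// are the bytes that went in, even for a name that is not valid UTF-8.
bool survivesRoundTrip(const std::string& nativeName) {
    std::ostringstream console;
    writeNativeName(console, nativeName);
    return console.str() == nativeName + "\n";
}
```

Whether the terminal then renders the name sensibly depends on the console using the same encoding as the file name, which is exactly the condition stated in the message.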
Re: [Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
On Monday 06 October 2014 11:12:57 Kuba Ober wrote:
> Thus, how does Qt deal with a directory listing with such “invalid” file
> names? Do they survive the round-trip through a QString and QDirIterator?
> Would it be worthwhile to tackle this issue in a better fashion (whatever
> it might be) for Qt 6?

File names that cannot be decoded using the locale codec are considered filesystem corruption and are silently dropped. They won't appear in directory listings.

This was discussed to exhaustion in Qt 5's development process. The conclusion is to remain at status quo since there is no good, technical solution.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
[Development] The life of a file name and other possibly mal-encoded strings on non-Windows systems
I’ve just read Armin Ronacher’s blog post about Unicode in Python 3, and he has highlighted a possible problem with the “Unicode everywhere” approach to things that come from byte APIs like the ones on Unix.

File names can have any encoding, and a file name not being valid under the encoding given in LC_CTYPE or LANG shouldn’t cause any failures when attempting to open the file. After all, when using the byte APIs, everything “just works”. About the only problem might be when presenting the name to the user, but even then, the name must simply exist in two forms: the native byte form, and a QString with some question marks in it.

Thus, how does Qt deal with a directory listing with such “invalid” file names? Do they survive the round-trip through a QString and QDirIterator? Would it be worthwhile to tackle this issue in a better fashion (whatever it might be) for Qt 6?

Cheers,
Kuba Ober
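The “two forms” idea from this message can be illustrated with a small sketch: keep the native bytes for opening the file, and derive a display string in which undecodable bytes become question marks. The sketch assumes a UTF-8 locale and uses standard C++ strings instead of QString; `FileNameSketch`, `makeFileName()` and `utf8SequenceLength()` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes at that position do not form one.
std::size_t utf8SequenceLength(const std::string& s, std::size_t i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned long cp = 0, min = 0;
    if (lead < 0x80) return 1;  // plain ASCII
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
    else return 0;                     // stray continuation or invalid lead
    if (i + len > s.size()) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

struct FileNameSketch {
    std::string native;   // exact bytes, used to open the file
    std::string display;  // UTF-8 for the user, '?' per undecodable byte
};

FileNameSketch makeFileName(const std::string& native) {
    FileNameSketch name{native, std::string()};
    for (std::size_t i = 0; i < native.size();) {
        const std::size_t len = utf8SequenceLength(native, i);
        if (len == 0) {
            name.display.push_back('?');  // lossy, for presentation only
            ++i;
        } else {
            name.display.append(native, i, len);
            i += len;
        }
    }
    return name;
}
```

Because `native` is never touched, opening the file keeps working even when `display` contains question marks; only code that goes back from `display` to a file name would lose information.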