On Sunday, 5 June 2011, 12:00:53, Thiago Macieira wrote:
> On Sunday, 5 June 2011, at 10:00:37, Ivan Čukić wrote:
> > It currently inherits KServiceType which will be changed, and it uses
> > KUrl which largely exists due to some parsing problems in QUrl. We are
> > hoping to push the fixes to QUrl to allow us to drop KUrl in kdelibs
> > 5.
>
> KUrl doesn't do any parsing. It uses QUrl for parsing. Therefore, "parsing
> problems in QUrl" cannot be true, as it would be KUrl parsing problems too.
>
> KUrl exists mostly to keep KDE 3 KURL API compatibility.
>
> In any case, QUrl in Qt 5 requires a rewrite of its API. Not the parsing --
> that one is fine. QUrl has a completely flawed API, owed to long-time
> misunderstanding of what a URL is.
>
> URLs and URIs are "designed by committee" and are simultaneously:
>  - Unicode
>  - UTF-8 encoded
>  - binary
>
> So the following two URLs are the same:
>   http://localhost/R%C3%A9sum%C3%A9.pdf
>   http://localhost/Résumé.pdf
> but the following URL is permitted too:
>   http://localhost/R%E9sum%E9.pdf
>
> Note how "é" expands to %C3%A9 (URLs are Unicode UTF-8 encoded) but at the
> same time the byte 0xE9 is permitted too (non-UTF8). QString is therefore
> inadequate to represent this in fully-decoded form for the path component:
> it is "/Résumé.pdf" for the first two URLs, but what is its value for the
> third?
>
> Also note how the following two URLs are *not* the same:
>   http://localhost/foo/bar
>   http://localhost/foo%2Fbar
> despite the slash character being 0x2F.
>
> So again QString is inadequate to represent a component of a URL in
> fully-decoded form, which is what QUrl::path() does. At the same time,
> QUrl::encodedPath() returning a QByteArray with %-encoding is hard to use.
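To make the ambiguity described so far concrete, here is a minimal sketch in Python rather than Qt. It uses the standard urllib.parse module purely as a stand-in for a fully-decoding accessor in the spirit of QUrl::path(); the variable names are made up for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

utf8_path = "/R%C3%A9sum%C3%A9.pdf"   # "é" written as the UTF-8 bytes C3 A9
latin1_path = "/R%E9sum%E9.pdf"       # raw byte E9, which is not valid UTF-8

# Valid UTF-8 sequences decode cleanly to a Unicode string.
decoded_utf8 = unquote(utf8_path)

# unquote() decodes as UTF-8 with errors="replace" by default, so the stray
# byte E9 turns into U+FFFD and the original byte value is lost.
decoded_latin1 = unquote(latin1_path)

# Only a byte-level decode keeps the third URL's path intact.
raw_latin1 = unquote_to_bytes(latin1_path)

# Full decoding also erases the difference between "/" and "%2F".
same_after_decoding = unquote("/foo/bar") == unquote("/foo%2Fbar")

# ascii() keeps the output printable regardless of the terminal encoding.
print(ascii(decoded_utf8), ascii(decoded_latin1), raw_latin1, same_after_decoding)
```

The point of the sketch is only that a Unicode string cannot hold the third path without loss, while a byte string can.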
>
> The slash character may be a corner case, but these two are also definitely
> not the same:
>   http://localhost/foo?arg=value#anchor
>   http://localhost/foo%3Farg=value%23anchor
>
> QUrl decodes the second URL properly, and QUrl::path() returns
> "/foo?arg=value#anchor", which is fine. But then if you call
> QUrl::toString(), you get the first URL, which is *not* fine, as we
> established that they are different URLs. And to top it all off, QUrl's
> constructor uses the same flawed fully-decoded notation.
>
> In my view, QUrl should be modified to use *only* partially-decoded
> components and provide a method (toEncoded()) that returns the
> fully-encoded form for proper network transfer. The partially-decoded form
> would decode %-encodings that are UTF-8 sequences, including %20 to space,
> but not including delimiter characters (so it won't decode %3F to a
> question mark in a path component, but it would decode it in the query and
> fragment component).
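One possible reading of the partially-decoded form for a path component is sketched below, in Python rather than against the real QUrl API. The helper partially_decode_path and the PATH_DELIMITERS set are hypothetical, and the exact set of characters that must stay encoded is my assumption, not something the proposal spells out.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption: characters that must stay percent-encoded inside a path so the
# result remains unambiguous ("%" is included so "%25" survives a round trip).
PATH_DELIMITERS = frozenset(b"/?#%")

# A run of one or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def partially_decode_path(encoded_path: str) -> str:
    """Decode %XX escapes that form valid UTF-8, except path delimiters."""

    def decode_run(match):
        raw = unquote_to_bytes(match.group(0))
        pieces = []
        # surrogateescape maps each undecodable byte 0xNN to U+DCNN, so
        # non-UTF-8 bytes can be recognised and put back as %NN.
        for ch in raw.decode("utf-8", "surrogateescape"):
            code = ord(ch)
            if 0xDC80 <= code <= 0xDCFF:
                pieces.append("%{:02X}".format(code - 0xDC00))
            elif code in PATH_DELIMITERS:
                pieces.append("%{:02X}".format(code))
            else:
                pieces.append(ch)
        return "".join(pieces)

    return _ESCAPE_RUN.sub(decode_run, encoded_path)


for sample in ("/R%C3%A9sum%C3%A9.pdf", "/R%E9sum%E9.pdf",
               "/foo%2Fbar", "/foo%3Farg=value%23anchor", "/my%20file"):
    # ascii() keeps the output printable regardless of the terminal encoding.
    print(sample, "->", ascii(partially_decode_path(sample)))
```

Under these assumptions the UTF-8 escapes and %20 are decoded, while %E9, %2F, %3F and %23 stay encoded, so distinct URLs remain distinct. A query or fragment component would need its own delimiter set.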
Would the planned changes also fix the problem that appears when file names
contain bytes that are not valid UTF-8? Such files cannot be renamed in the
UI [1], because the file name used internally simply replaces the "incorrect"
bytes with the Unicode replacement character "�" (a question mark in a
rhombus).

[1] At least in KDE.

_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
