On Sunday, 5 June 2011, at 10:00:37, Ivan Čukić wrote:
> It currently inherits KServiceType which will be changed, and it uses
> KUrl which largely exists due to some parsing problems in QUrl. We are
> hoping to push the fixes to QUrl to allow us to drop KUrl in kdelibs
> 5.

KUrl doesn't do any parsing. It uses QUrl for parsing. Therefore, "parsing
problems in QUrl" cannot be true, as it would be KUrl parsing problems too.

KUrl exists mostly to keep KDE 3 KURL API compatibility.

In any case, QUrl in Qt 5 requires a rewrite of its API. Not the parsing --
that one is fine. QUrl has a completely flawed API, owing to a long-standing
misunderstanding of what a URL is.

URLs and URIs are "designed by committee" and are simultaneously:
 - Unicode
 - UTF-8 encoded
 - binary

So the following two URLs are the same:
        http://localhost/R%C3%A9sum%C3%A9.pdf
        http://localhost/Résumé.pdf
but the following URL is permitted too:
        http://localhost/R%E9sum%E9.pdf

Note how "é" expands to %C3%A9 (URLs are Unicode UTF-8 encoded) but at the
same time the byte 0xE9 is permitted too (non-UTF8). QString is therefore
inadequate to represent this in fully-decoded form for the path component: it
is "/Résumé.pdf" for the first two URLs, but what is its value for the third?
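To make the point concrete, here is a minimal sketch in Python. It uses the standard urllib.parse module purely as a stand-in, not the QUrl API, and the helper name decodes_as_utf8 is made up for this illustration. It shows that the first path percent-decodes to bytes that are valid UTF-8, while the third does not, so no Unicode string can hold the third one in fully-decoded form without losing information.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the two path spellings down to raw bytes.
utf8_path = unquote_to_bytes("/R%C3%A9sum%C3%A9.pdf")
raw_path = unquote_to_bytes("/R%E9sum%E9.pdf")


def decodes_as_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if the bytes form valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The first path has a clean Unicode representation.
print(decodes_as_utf8(utf8_path), utf8_path.decode("utf-8"))
# The third path is just bytes: 0xE9 followed by "s" is not a UTF-8 sequence.
print(decodes_as_utf8(raw_path), raw_path)
```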

Also note how the following two URLs are *not* the same:
        http://localhost/foo/bar
        http://localhost/foo%2Fbar
despite the slash character being 0x2F.

So again QString is inadequate to represent a component of a URL in
fully-decoded form, which is what QUrl::path() returns. At the same time,
QUrl::encodedPath(), which returns a QByteArray with %-encoding, is hard to use.
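A small Python sketch of the slash case, again using urllib.parse only as a neutral stand-in for a URL parser (it is not how QUrl behaves internally). In encoded form the two paths have different segment structure; once fully decoded, they collapse into the same string and the distinction is gone.

```python
from urllib.parse import urlsplit, unquote

# urlsplit() keeps the path in its percent-encoded form.
path_a = urlsplit("http://localhost/foo/bar").path
path_b = urlsplit("http://localhost/foo%2Fbar").path

# Encoded form: two segments versus a single segment.
segments_a = path_a.split("/")[1:]
segments_b = path_b.split("/")[1:]
print(segments_a, segments_b)

# Fully-decoded form: both become "/foo/bar", so the difference is lost.
print(unquote(path_a) == unquote(path_b))
```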

The slash character may be a corner case, but these two are also definitely not
the same:
        http://localhost/foo?arg=value#anchor
        http://localhost/foo%3Farg=value%23anchor

QUrl decodes the second URL properly, and QUrl::path() returns
"/foo?arg=value#anchor", which is fine. But then if you call QUrl::toString(),
you get the first URL, which is *not* fine, as we established that they are
different URLs. And to top it all off, QUrl's constructor uses the same flawed
fully-decoded notation.
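The round-trip problem can be reproduced outside Qt. The following Python sketch is only an analogy: it fully decodes the path with urllib.parse.unquote and then naively glues the URL back together, which is an assumption about what "serialising from the fully-decoded form" amounts to, not a copy of QUrl's code. Re-parsing the result shows that part of the path has turned into a query and a fragment.

```python
from urllib.parse import urlsplit, unquote

original = "http://localhost/foo%3Farg=value%23anchor"
parts = urlsplit(original)

# Fully decode the path, as a path() accessor returning decoded text would.
decoded_path = unquote(parts.path)

# Naive serialisation from the fully-decoded component.
rebuilt = f"{parts.scheme}://{parts.netloc}{decoded_path}"
reparsed = urlsplit(rebuilt)

print(parts.path, parts.query, parts.fragment)
print(reparsed.path, reparsed.query, reparsed.fragment)
```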

In my view, QUrl should be modified to use *only* partially-decoded components
and provide a method (toEncoded()) that returns the fully-encoded form for
proper network transfer. The partially-decoded form would decode %-encodings
that are UTF-8 sequences, including %20 to space, but not including delimiter
characters (so it won't decode %3F to a question mark in a path component,
but it would decode it in the query and fragment components).
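A rough sketch of what such a partially-decoded form could look like, written in Python rather than against QUrl. Everything here is an assumption made for illustration: the function name partially_decode, the keep_encoded parameter (the delimiters that must stay encoded for a given component), and the choice to always leave %25 encoded so the result stays unambiguous. It ignores control characters and other details a real implementation would have to handle.

```python
import re

# One or more consecutive %XX escapes.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def partially_decode(component: str, keep_encoded: str) -> str:
    """Hypothetical helper: decode %-escapes that form valid UTF-8,
    except characters listed in keep_encoded (and '%' itself)."""

    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        out, i = [], 0
        while i < len(raw):
            # Find the shortest prefix (1 to 4 bytes) that is one UTF-8 character.
            for length in range(1, 5):
                chunk = raw[i:i + length]
                try:
                    char = chunk.decode("utf-8")
                except UnicodeDecodeError:
                    continue
                break
            else:
                # Not UTF-8 at this position: keep this single byte encoded.
                char, chunk = None, raw[i:i + 1]
            if char is None or char in keep_encoded or char == "%":
                out.append("".join(f"%{b:02X}" for b in chunk))
            else:
                out.append(char)
            i += len(chunk)
        return "".join(out)

    return _PCT_RUN.sub(decode_run, component)


# Path component: keep "/", "?" and "#" encoded.
print(partially_decode("/R%C3%A9sum%C3%A9.pdf", "/?#"))
print(partially_decode("/R%E9sum%E9.pdf", "/?#"))
print(partially_decode("/foo%3Farg=value%23anchor", "/?#"))
# Fragment component: nothing needs to stay encoded in this sketch.
print(partially_decode("a%3Fb", ""))
```

With this form, the three problem cases above stay distinguishable: the non-UTF-8 byte and the encoded delimiters survive as %-escapes, while ordinary text becomes readable.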

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Senior Product Manager - Nokia, Qt Development Frameworks
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

_______________________________________________
Qt5-feedback mailing list
[email protected]
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
