Re: [Sycamore-Dev] UTF-8 Filenames and URLs

Philip Neustrom Fri, 18 May 2007 14:32:33 -0700

I haven't seen anything screw up with the quote functions in a while.
What was the edge case you saw?


It's mad important to me to make sure URLs never break, so it's
important to be careful when playing with the URL encoding stuff.  Old
URLs should keep working forever, unless there's something insane that
needs to happen.  I'm not sure if that's an issue in this case, but I
just thought I'd throw it out there.  See:
http://daviswiki.org/index.scgi/Front_20Page :)

--Philip

On 5/18/07, Rottenchester <[EMAIL PROTECTED]> wrote:
> Scott, thanks for the reference.   It looks like that encoding is the
> intent of quoteFilename, and the fix I checked in this a.m. should
> handle edge cases that were causing an error in some testing we were
> doing.
>
> The remaining issue in UTF-8 handling is another error in search.py
> that apparently Philip has a fix for, but hasn't checked in [1].
> According to a note in that ticket (dated 1/26), Philip has it fixed
> in his wikis branch but was planning to port it to trunk.
>
> Maybe it would be better for the project if Philip would check in his
> wikis branch "as is" and then we could work on merging it as a
> community.   Philip, do you have some thoughts on that?
>
> I'll move on to other bugs until I hear back.
>
> ----------------
>
> [1] http://sycamore.devjavu.com/projects/sycamore/ticket/17
>
>
>
> On 5/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >> There are a couple of bugs in trac related to UTF-8.  It looks like
> > >> all file names and URLs are run through the pretty restrictive
> > >> quoteFilename in wikiutil.py.  This recodes all characters that aren't
> > >> in (A-Z,a-z,1-9).   In a UTF-8 environment, it doesn't work on UTF-8
> > >> URLs.
> > >
> > > It looks like[1] only these ASCII characters are allowed in a URI:
> > >
> > > Unreserved Characters (no encoding needed)
> > > A-Z (uppercase letters)
> > > a-z (lowercase letters)
> > > 0-9 (numbers)
> > > - (dash)
> > > _ (underscore)
> > > . (period)
> > > ~ (tilde)
> > >
> > > Reserved Characters (must be encoded when used as data, not as delimiters)
> > > ! = %21
> > > * = %2A
> > > ' = %27
> > > ( = %28
> > > ) = %29
> > > ; = %3B
> > > : = %3A
> > > @ = %40
> > > & = %26
> > > = = %3D
> > > + = %2B
> > > $ = %24
> > > , = %2C
> > > / = %2F
> > > ? = %3F
> > > % = %25
> > > # = %23
> > > [ = %5B
> > > ] = %5D
> > >
> > > If the filename is meant to be displayed in the browser it makes sense to
> > > encode it using percent encoding.
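As a rough illustration of the table above (not Sycamore's quoteFilename, whose implementation is not shown in this thread), Python 3's standard urllib.parse.quote leaves the unreserved characters alone and percent-encodes the rest. Passing safe="" is a choice made here so that "/" is encoded too; the variable names are only for illustration.

```python
from urllib.parse import quote

# Characters from the "unreserved" list pass through unchanged.
unreserved_sample = "A-Z_a-z.0-9"
encoded_unreserved = quote(unreserved_sample, safe="")

# Characters from the "reserved" list are replaced by %HH escapes.
# safe="" asks quote() not to exempt "/" (it is exempt by default).
reserved_sample = "!*'();:@&=+$,/?%#[]"
encoded_reserved = quote(reserved_sample, safe="")

# A page name with a space, encoded the standard way.
encoded_page = quote("Front Page", safe="")

print(encoded_unreserved)
print(encoded_reserved)
print(encoded_page)
```

Note that this is the standard %HH form. The daviswiki.org URL mentioned earlier in the thread (Front_20Page) uses a different, Sycamore-specific escape, which this sketch does not attempt to reproduce.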
> >
> > To clarify[1]...
> >
> > "For worldwide interoperability, URIs have to be encoded uniformly. To map
> > the wide range of characters used worldwide into the 60 or so allowed
> > characters in a URI, a two-step process is used:
> >
> >     * Convert the character string into a sequence of bytes using the
> > UTF-8 encoding
> >     * Convert each byte that is not an ASCII letter or digit to %HH, where
> > HH is the hexadecimal value of the byte"
> >
> > Scott
> > --------
> > [1] http://www.w3.org/International/O-URL-code.html
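The two-step process quoted above can be sketched in a few lines of Python 3. This is a minimal sketch under stated assumptions, not the code in wikiutil.py: the function name percent_encode is hypothetical, and it follows the quoted wording literally, so every byte that is not an ASCII letter or digit is escaped (including "-", "_", "." and "~", which the unreserved list would otherwise allow).

```python
def percent_encode(text):
    """Hypothetical helper: UTF-8 encode, then %HH-escape non-alphanumerics."""
    pieces = []
    # Step 1: convert the character string into UTF-8 bytes.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Step 2: keep ASCII letters and digits, escape every other byte.
        if byte < 0x80 and ch.isalnum():
            pieces.append(ch)
        else:
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)


print(percent_encode("Front Page"))
print(percent_encode("caf\u00e9"))
```

A non-ASCII character such as "é" becomes two escapes because its UTF-8 form is two bytes. Changing which bytes are left unescaped changes the resulting URLs, which is why any change of this kind needs care if old links are to keep working.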
> >
> > _______________________________________________
> > Sycamore-Dev mailing list
> > [EMAIL PROTECTED]
> > http://www.projectsycamore.org/
> > https://tools.cernio.com/mailman/listinfo/sycamore-dev
> >
