[Bug 9530] Section heading anchors shouldn't begin with invalid characters

bugzilla-daemon Mon, 15 Nov 2010 16:15:18 -0800

https://bugzilla.wikimedia.org/show_bug.cgi?id=9530


--- Comment #25 from Aryeh Gregor <[email protected]> 2010-11-16 00:15:10 UTC ---
(In reply to comment #20)
> What's the history of the practice of trying to encode the name of the section
> in the anchor for that section?

The same as the practice of trying to encode the name of the article in the URL
for the article, I imagine.  Pretty URLs are nice.

> It seems to get very messy and unpredictable unless the heading text is
> written in Latin characters without any punctuation.

$wgExperimentalHtmlIds is enabled in trunk, so this is no longer the case --
non-Latin scripts and punctuation will work fine.  (Although there are still a
bunch of other problems.)

> 1. The encoding does not follow a standard encoding algorithm, making any
> string that's not [a-zA-Z0-9 ] be converted to something that only someone who
> knows the algorithm well would expect.

This is no longer the case on trunk.  (Actually, legacy ids are just
URL-encoded as UTF-8, but with "%" replaced by ".", so that's not really
nonstandard.  But it is ugly.)
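
A minimal sketch of that legacy encoding, for illustration only.  The helper
name legacy_anchor is hypothetical, and the space-to-underscore step and the
exact set of characters left unescaped are assumptions, not a copy of what the
software does:

```python
from urllib.parse import quote


def legacy_anchor(heading: str) -> str:
    """Hypothetical helper approximating the legacy id encoding."""
    # Assumption: spaces become underscores before encoding.
    text = heading.strip().replace(" ", "_")
    # Percent-encode the UTF-8 bytes of everything outside a small safe set.
    encoded = quote(text, safe="_-.:")
    # Replace "%" with "." as described above.
    return encoded.replace("%", ".")


print(legacy_anchor("Café"))
print(legacy_anchor("A & B"))
```

Under these assumptions a heading like "Café" comes out as "Caf.C3.A9":
predictable if you know URL encoding, but not pretty.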

> 2. The anchor names could possibly intersect with IDs used on the page for
> other things. An effort has been made to make the IDs used by skins conform to
> the mw-* namespace, but it's still not a guarantee, just a bit less likely.

Yes, this is a problem.

> If anchor names were encoded in a predictable way, such as id="section-1.2",
> the anchors would be able to correspond to the table of contents, which is
> pretty simple and straightforward, plus we could know for sure that there
> would never be collisions with IDs so long as we never use the section-*
> namespace in skins or other software. Since we have more control over the
> software than the content, this seems like a superior approach.

We could also make our URLs use page_id instead of the article title, but I
don't think it's desirable.  The section name is more stable than the number,
because it doesn't change when sections are added or removed, and
adding/removing sections is more common than renaming them.  (No, I have no
stats on this, but it's clear to me from personal experience.)
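
To illustrate the stability point, here is a small sketch.  The flat
"section-N" scheme and the heading lists are made up for the example:

```python
def numbered_anchors(headings):
    """Hypothetical scheme: assign section-N ids by position."""
    return {h: f"section-{i}" for i, h in enumerate(headings, start=1)}


before = ["History", "Usage", "References"]
# Someone adds an "Installation" section ahead of "Usage".
after = ["History", "Installation", "Usage", "References"]

# An old number-based link to "Usage"...
old_link = numbered_anchors(before)["Usage"]
# ...now resolves to whatever section holds that number after the edit.
by_anchor = {anchor: h for h, anchor in numbered_anchors(after).items()}
now_points_to = by_anchor[old_link]

print(old_link, "->", now_points_to)
```

A name-based link to "Usage" is unaffected by the insertion, while the
number-based one silently lands on a different section.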

The section name can also be typed manually or copy-pasted from the rendered
page, not just copy-pasted from the URL, so it's more convenient.  You could
type a #section-1.2-style anchor manually too, but only if you count the
sections, which isn't worth it on large articles.

And the section name gives you an idea of what section you're being linked to
before you click the URL.  The section number is opaque.


There are indeed some problems with the way we do things in trunk.  Overall,
IMO, they're not enough to offset the (modest) advantages we get from using
section names instead of numbers.  It would be pretty easy to more or less
eliminate anchor collisions from headers by just making a big pattern of
reserved anchors, including unprefixed ones like "content" and "top", and
tweaking header ids if they matched -- we wouldn't get them all, but we'd make
the problem negligible.
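
A rough sketch of the reserved-pattern idea.  The pattern contents, the helper
name safe_heading_id and the "_2" suffix are all arbitrary choices for
illustration, not an existing implementation:

```python
import re

# Hypothetical reserved pattern: the mw- prefix plus some unprefixed ids.
RESERVED = re.compile(r"mw-.*|content|top")


def safe_heading_id(anchor: str) -> str:
    """Hypothetical helper: tweak a heading id that matches the pattern."""
    if RESERVED.fullmatch(anchor):
        # Assumption: appending a suffix is enough to avoid the collision.
        return anchor + "_2"
    return anchor


for heading_id in ["top", "mw-head", "History", "Topology"]:
    print(heading_id, "->", safe_heading_id(heading_id))
```

Only ids that match the whole pattern are touched, so a heading like
"Topology" keeps its pretty anchor.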

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
