https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #131 from Andrew Dunbar <[email protected]> 2009-11-19 08:11:38 UTC ---

> Well, there are people who know the architecture and the programming language
> and can presumably do this very quickly (at least the basic step)

Why do you presume it can be done quickly? It doesn't seem so to me, and I regard this bug as hugely important.

> if they realize how much of a priority it is. The first step is just to
> apply a few extra functions to category sort keys with the intention of
> converting them into collation keys.

Which functions would those be? What makes you think category sort keys can be converted to collation keys?

For instance, category sort keys are human-readable and human-editable, but collation keys are typically binary and not human-readable. Category sort keys are included directly in the text of a category link such as [[Category:foo|bar]]. Due to their binary nature, collation keys cannot appear here. So you would need to decide whether to remove all category sort keys or to make category sort keys interact with collation keys that would be added elsewhere.

For collation keys to replace category sort keys, you would need to establish that category sort keys have no legitimate uses other than forcing alphabetic order in cases where the current order results in nonalphabetic sequences. I can assure you that people do use category sort keys for other purposes, and some might be vociferously upset if these were removed without discussion.

For collation keys to interact with category sort keys, you need to generate and maintain in the database collation keys for each page title and for each category sort key, since collation keys must be of the same nature to be compared and hence sorted.

Now, Unicode does specify a "Unicode Collation Algorithm" (UCA), which we could and probably should use. It is language-agnostic but provides for "tailoring" for individual languages. The UCA definitely generates binary keys. Not printable.
Not human readable.

UCA keys can be very long. I use them in an offline tool for the English Wiktionary and initially set their maximum length to 1024, 4 times the maximum length of a page title. We already had about 10 pages for which 1024 was too short, so I had to set it to 2048! Many people might not like all page titles and category sort keys now requiring 9x their current amount of space in the database.

UCA does allow for various types of sort key compression, however. In that case we would need to choose one, since it will not be possible to mix and match them.

PHP currently seems to have no implementation of UCA. We would need to create one from scratch, or find a way to use one written in C.

For multilingual wikis such as Commons and all of the Wiktionaries, having just one collation language will not work, since users of each language will expect things to be in the correct order for their language. For the Wiktionaries this means each category needs a way to declare which language collation to use, and each page needs to declare which subset of possible language collation keys to generate for that page. For Commons I'm not sure what the requirements would be, but they may differ from those of the Wiktionaries.

These new fields will need support in the database schema. The ones requiring multiple language collations will require more drastic database changes, quite different from what we now have.

> Once that's in place, people can work on actually writing such functions for
> their particular languages. Later, when those functions are written, they can
> be used additionally to generate a proper alphabetically ordered table of
> pages for use in the contents listings. Or some similar workflow

UCA tailoring would make the particular language collations very easy, as long as we have a decent implementation of UCA that works easily with tailoring.
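
To make the difference between a readable sort key, a binary collation key, and a language tailoring concrete, here is a toy sketch in Python. It is emphatically not the UCA: the three-level weight scheme, the function name `toy_collation_key`, and the `SWEDISH` table are all invented for illustration, and a real implementation would take its weights from the Unicode collation data rather than from code points. The only thing it is meant to show is that the key is an opaque byte string, that it is several times longer than the text it was built from, and that a tailoring only has to override a few weights.

```python
import unicodedata

# All-zero separator between levels; lower than any non-NUL primary weight.
LEVEL_SEPARATOR = b"\x00\x00\x00"


def _pack(weights):
    # Fixed-width big-endian weights, so bytewise comparison matches numeric order.
    return b"".join(w.to_bytes(3, "big") for w in weights)


def toy_collation_key(text, tailoring=None):
    """Return a toy three-level binary sort key. This is NOT the real UCA."""
    tailoring = tailoring or {}
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFC", text):
        lowered = ch.lower()
        if lowered in tailoring:
            # Tailored letter: gets its own primary weight, no accent weight.
            primary.append(tailoring[lowered])
            secondary.append(0)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            # Primary weight: the base letter, ignoring case and accents.
            primary.append(ord(decomposed[0].lower()[0]) * 4)
            # Secondary weight: the first combining mark, if any.
            secondary.append(ord(decomposed[1]) if len(decomposed) > 1 else 0)
        # Tertiary weight: upper case sorts after lower case.
        tertiary.append(1 if ch != lowered else 0)
    return LEVEL_SEPARATOR.join(_pack(level) for level in (primary, secondary, tertiary))


# Hypothetical tailoring table: Swedish sorts å, ä, ö as separate letters after z.
Z = ord("z") * 4
SWEDISH = {"å": Z + 1, "ä": Z + 2, "ö": Z + 3}

words = ["zebra", "äpple", "apple", "Apple", "öl"]
print(sorted(words))                                               # raw code point order
print(sorted(words, key=toy_collation_key))                        # untailored toy key
print(sorted(words, key=lambda w: toy_collation_key(w, SWEDISH)))  # Swedish tailoring
print(len("apple"), len(toy_collation_key("apple")))               # text length vs key length
```

With these inputs, plain code point sorting puts "Apple" first and leaves both accented words after "zebra". The untailored toy key groups "äpple" with "apple" and "Apple" and puts "öl" before "zebra", and the Swedish table moves "äpple" and "öl" after "zebra". None of the resulting keys could be typed into a category link.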
> - but there needs to be a plan of action, and that can't be effected by just
> anyone, only by the devs who are in charge (no use pretending that everyone's
> equal - only certain devs actually have the power to make anything happen).

Not true. Anyone with commit access can add such code. Myself, for instance. My understanding is that there is technically no dev in charge at the moment, since Brion stepped down, though there certainly are a few, such as Tim, who are acknowledged to have a greater understanding of the entire codebase and hence greater trust. You definitely want those people to check such changes, and you should expect them to revert any premature commits.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
