Re: [Wikitech-l] Page title length
This is not an exact answer to your question, but rather a simple and powerful alternative. If you are thinking about using Semantic MediaWiki (which would be very applicable for a wiki about resources), you should have a look at:
http://www.mediawiki.org/wiki/Extension:SemanticTitle

The SemanticTitle extension currently seems to be unmaintained, but just yesterday there was a talk at SMWCon here in Berlin which revealed plans by MITRE to commit their recent changes to the extension soon and perhaps even maintain it in the future. More information about the talk:
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2013/Revolutionizing_page_naming_with_semantic_properties

The slides of the talk are already available on that page; a video recording should follow soon.

Cheers,
Daniel Werner

2013/10/24 Élie Roux elie.r...@telecom-bretagne.eu:
> Dear MediaWiki developers,
>
> I'm responsible for the development of a new wiki that will contain many Tibetan resources. Traditionally, Tibetan titles of books or even parts of books are extremely long, as you can see for instance here: http://www.tbrc.org/#!rid=W23922, and sometimes too long for MediaWiki, for instance the title of http://www.tbrc.org/?locale=bo#library_work_ViewByOutline-O01DG1049094951|W23922 , which is ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་ .
>
> This title is around 90 Tibetan characters, but each character being 3 bytes, it exceeds the limit for title length of 256 bytes that MediaWiki has.
>
> So I have two questions:
>
> 1. If I change this limit to 1023 in the structure of the database ('page_title' field of the 'page' table), will other things (such as the search engine) break? Is there a way to change it more cleanly?
>
> 2. May I propose making this limit 1023 instead of 255 (or making it easily configurable)? This would allow at least 256 characters (even for Asian languages) instead of 256 bytes, which seems more consistent with the fact that MediaWiki is well internationalized.
>
> Thank you in advance,
> --
> Elie

--
Daniel Werner
Software Engineer
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 26-0
http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Page title length
> Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations .

Thank you very much Nathan, Brion and Daniel for your research and good advice! It is indeed more complex than I naively thought, but I'll try to make the changes you advised soon, keeping track of everything so that a good recipe can be written.

Should I open a bug report about this?

Thank you,
--
Elie
Re: [Wikitech-l] Page title length
> Search around for '255' appearing in .php, .inc, or .js files and change the checks to 1023. [...]

Should someone maybe define a constant Title::MAX_LENGTH instead of hard-coding 255 in many PHP files?

DanB
Re: [Wikitech-l] Page title length
On Fri, Oct 25, 2013 at 8:19 AM, Daniel Barrett d...@vistaprint.com wrote:
> > Search around for '255' appearing in .php, .inc, or .js files and change the checks to 1023. [...]
>
> Should someone maybe define a constant Title::MAX_LENGTH instead of hard-coding 255 in many PHP files?

Yes, such a thing should be done. :)

We'll also have to find a way to export the value to JavaScript and to SQL. JS can probably use a global config variable or similar; SQL can probably use a magic comment plus token replacement, like the existing db type and prefix magic.

-- brion
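
A minimal sketch of the "define it once, export it everywhere" idea from the message above, written in Python purely for illustration. Everything named here is hypothetical: `TITLE_MAX_BYTES` stands in for the proposed Title::MAX_LENGTH, the `/*$titleMaxBytes*/` placeholder is an invented token in the spirit of the comment-style replacement Brion mentions, and neither helper is a MediaWiki API.

```python
import json

# Hypothetical single definition of the limit (the thread proposes Title::MAX_LENGTH).
TITLE_MAX_BYTES = 255

# Hypothetical placeholder token for schema templates; not an actual MediaWiki token.
SQL_TOKEN = "/*$titleMaxBytes*/"


def render_sql(template: str, limit: int = TITLE_MAX_BYTES) -> str:
    """Replace the placeholder token in a schema template with the numeric limit."""
    return template.replace(SQL_TOKEN, str(limit))


def render_js_config(limit: int = TITLE_MAX_BYTES) -> str:
    """Serialize the limit as JSON so client-side code could read it as a config value."""
    return json.dumps({"titleMaxBytes": limit})


schema = "page_title varchar(/*$titleMaxBytes*/) binary NOT NULL"
print(render_sql(schema))
print(render_js_config())
```

With this shape, raising the limit would mean changing one constant, and the PHP, JavaScript and SQL sides would no longer each carry their own hard-coded 255.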
[Wikitech-l] Page title length
Dear MediaWiki developers,

I'm responsible for the development of a new wiki that will contain many Tibetan resources. Traditionally, Tibetan titles of books or even parts of books are extremely long, as you can see for instance here: http://www.tbrc.org/#!rid=W23922, and sometimes too long for MediaWiki, for instance the title of http://www.tbrc.org/?locale=bo#library_work_ViewByOutline-O01DG1049094951|W23922 , which is ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་ .

This title is around 90 Tibetan characters, but each character being 3 bytes, it exceeds the limit for title length of 256 bytes that MediaWiki has.

So I have two questions:

1. If I change this limit to 1023 in the structure of the database ('page_title' field of the 'page' table), will other things (such as the search engine) break? Is there a way to change it more cleanly?

2. May I propose making this limit 1023 instead of 255 (or making it easily configurable)? This would allow at least 256 characters (even for Asian languages) instead of 256 bytes, which seems more consistent with the fact that MediaWiki is well internationalized.

Thank you in advance,
--
Elie
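
The byte arithmetic in the message above can be checked with a short Python sketch. It assumes the title string is copied exactly as it appears in the message, and it uses 255 as the limit (the value confirmed later in the thread); the helper names are made up for illustration. Note that Python's `len()` counts Unicode code points (including vowel signs, subjoined letters and the tsheg separator), which need not match what a reader would count as "characters".

```python
# Title copied from the message above; every code point is in the Tibetan
# block (U+0F00-U+0FFF), which UTF-8 encodes as 3 bytes each.
TITLE = "ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་"

MAX_TITLE_BYTES = 255  # assumption: the limit as clarified later in the thread


def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text: str, limit: int = MAX_TITLE_BYTES) -> bool:
    """Return True if the UTF-8 encoding of the text fits within the byte limit."""
    return utf8_length(text) <= limit


code_points = len(TITLE)
byte_length = utf8_length(TITLE)
print(code_points, byte_length, fits(TITLE))
```

An ASCII title of the same number of code points would fit comfortably, which is why a limit expressed in bytes affects scripts such as Tibetan far more than Latin-script titles.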
Re: [Wikitech-l] Page title length
On Thu, Oct 24, 2013 at 1:01 PM, Élie Roux elie.r...@telecom-bretagne.eu wrote:
> Dear MediaWiki developers,
>
> I'm responsible for the development of a new wiki that will contain many Tibetan resources. Traditionally, Tibetan titles of books or even parts of books are extremely long, as you can see for instance here: http://www.tbrc.org/#!rid=W23922, and sometimes too long for MediaWiki, for instance the title of http://www.tbrc.org/?locale=bo#library_work_ViewByOutline-O01DG1049094951%7CW23922 , which is ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་ .
>
> This title is around 90 Tibetan characters, but each character being 3 bytes, it exceeds the limit for title length of 256 bytes that MediaWiki has.
>
> So I have two questions:
>
> 1. If I change this limit to 1023 in the structure of the database ('page_title' field of the 'page' table), will other things (such as the search engine) break? Is there a way to change it more cleanly?
>
> 2. May I propose making this limit 1023 instead of 255 (or making it easily configurable)? This would allow at least 256 characters (even for Asian languages) instead of 256 bytes, which seems more consistent with the fact that MediaWiki is well internationalized.
>
> Thank you in advance,
> --
> Elie

Wouldn't ar_title, log_title, rc_title, and so on also need to have their sizes increased? I didn't see any bug report proposing an increase in page title size, although there was this one about comment size:
https://bugzilla.wikimedia.org/show_bug.cgi?id=4715
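
To make the point about the other title columns concrete, here is a hedged Python sketch that generates one MySQL-style ALTER statement per column. The table/column pairs are limited to the ones named in this thread; the list is certainly incomplete, the helper name is made up, and the column attributes other than the length (`BINARY NOT NULL`, no default) are assumptions that would have to be copied from the real schema. The generated SQL is untested.

```python
# Title-bearing columns named in the thread; an incomplete, illustrative list.
TITLE_COLUMNS = [
    ("page", "page_title"),
    ("archive", "ar_title"),
    ("logging", "log_title"),
    ("recentchanges", "rc_title"),
]


def alter_statements(columns, new_length):
    """Build one ALTER TABLE ... MODIFY statement per (table, column) pair.

    Assumption: every column is declared `varchar(n) binary NOT NULL`; any
    defaults in the real schema would need to be restated here as well.
    """
    return [
        f"ALTER TABLE {table} MODIFY {column} VARCHAR({new_length}) BINARY NOT NULL;"
        for table, column in columns
    ]


for statement in alter_statements(TITLE_COLUMNS, 1023):
    print(statement)
```

Keeping the column list in one place makes it harder to widen page_title while forgetting a table that stores copies of the same title.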
Re: [Wikitech-l] Page title length
On 2013-10-24 10:01 AM, Élie Roux wrote:
> 1. If I change this limit to 1023 in the structure of the database ('page_title' field of the 'page' table), will other things (such as the search engine) break? Is there a way to change it more cleanly?

Other things won't break. But the limit is hardcoded in the software, so altering the column alone won't actually change the limit. If you change both places, however, you will encounter bugs unless you also make a large number of other changes to the database and handle user names, either by changing every column that accepts a user name or by editing User.php to use a limit other than the one Title uses.

> 2. May I propose making this limit 1023 instead of 255 (or making it easily configurable)? This would allow at least 256 characters (even for Asian languages) instead of 256 bytes, which seems more consistent with the fact that MediaWiki is well internationalized.

Just to clarify (you switch between 255 and 256 in your message): it's 255 bytes.

The choice of 255 bytes for the title length wasn't arbitrary. 255 bytes (not characters) was picked because it is the maximum string length that can be represented using 1 byte: 8 bits give 2^8 = 256 values, i.e. the range 0-255, so a varchar of up to 255 bytes needs only 1 byte to declare the length of its contents. If you use ANY length larger than 255, the varchar will always require 2 bytes to store the length of the column.

Changing the maximum length of a title will require a huge number of changes and is not easy to make configurable, so we should do this only once, changing the maximum title length to the largest value we can feasibly support.

* It's hardcoded in Title::secureAndSplit.
* User::getCanonicalName will need some extra code to restrict user names to 255 bytes, as it currently depends on Title to do that. (And increasing the user name length is far more problematic than increasing the title length.)
* docs/title.txt and TitleTest.php will need an update.
* The canonical page.page_title needs an ALTER COLUMN.
* A number of other core tables will need column alters: (archive, page, template, image, category)links.(a, p, t, i, c)l_title, category.cat_title, (lang, iw)links.(l, iw)l_title, recentchanges.rc_title, and so on.
* All extensions that store titles in the database will need to do similar ALTER COLUMN updates, or else they will have unexpected errors.
  o It might be worth doing this first, since a larger varchar length in the database won't cause any bugs while the software still uses 255 bytes.

The next (and, to be honest, final) title max length shouldn't be something arbitrary; we should pick the maximum we can theoretically store (the max we can store with 2 bytes, i.e. a 16-bit uint, and the max we can possibly store are actually the same thing, at least on MySQL with regard to VARCHAR). The next step up from 1 byte representing 0-255 is not 0-1023 as you're thinking: 2 bytes can represent a length of 0-65535 bytes.

However, we cannot actually use VARCHAR(65535) for the title. There are a number of other limits which cap varchar column lengths below what a 16-bit uint length can represent.

* MySQL has a maximum row length of 65,535 bytes[1]; this includes the storage required in the row by all the columns of the table.
* PostgreSQL doesn't seem to have the same max row size issue as MySQL, because its row max, depending on whether you ask the about page[2] or the wiki FAQ[3], is either 1.6TB or 400GB, and the column max size is 1GB.
* However, PostgreSQL seems to say indexes cannot be created on columns longer than about 2,000 characters[1]. I don't know the precise details, but it might make our limit 2000 bytes. ((We'll need some more input from someone who knows PostgreSQL.))
* The varchar WP page[4] says Oracle's limit is 4000 bytes.
* Before MySQL 5.0.3, VARCHAR columns could only be declared with a maximum length of 255. ((This means we'll have to drop support for 5.0.2 and change our "5.0.2 or later" MySQL requirement to "5.0.3 or later".))
* MyISAM's index prefix maximum is 1000 bytes and InnoDB's is 767 (unless you use a dynamic/compressed row format and innodb_large_prefix). This means that [[A{1000 bytes}A]] and [[A{1000 bytes}B]] cannot both exist, as the index prefix is used to ensure uniqueness. This limit will be database dependent, and the only fix would be to add a new column containing a hash of the title text and drop the uniqueness constraint on page_title.

Deciding the title max we should use will probably need some more information than what I've gathered so far.

((Side topic: We use `varchar(n) binary` for the title now. However, anyone who feels like changing this to `varchar(n) CHARACTER SET utf8` needs to be wary that MySQL triples the (n) so it can store n utf8 chars instead of n bytes that happen to be utf8, so they may need to
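
The index-prefix problem described in the list above, and the proposed hash-column fix, can be illustrated with a small Python sketch. It models only the comparison logic, not a real database: 767 is the InnoDB prefix figure quoted in the message, SHA-1 is an arbitrary choice made here for illustration (the message does not name a hash), and both helper functions are hypothetical.

```python
import hashlib

INDEX_PREFIX_BYTES = 767  # InnoDB index prefix limit as quoted in the message


def prefix_key(title: bytes, prefix_len: int = INDEX_PREFIX_BYTES) -> bytes:
    """Return the part of the title that a prefix-limited index would compare."""
    return title[:prefix_len]


def hash_key(title: bytes) -> str:
    """Return a fixed-length digest of the whole title (assumption: SHA-1)."""
    return hashlib.sha1(title).hexdigest()


# Two titles that differ only after the indexed prefix, as in the
# [[A{1000 bytes}A]] versus [[A{1000 bytes}B]] example.
title_a = b"A" + b"x" * 1000 + b"A"
title_b = b"A" + b"x" * 1000 + b"B"

print(prefix_key(title_a) == prefix_key(title_b))  # the prefixes are identical
print(hash_key(title_a) == hash_key(title_b))      # the digests differ
```

A uniqueness constraint on the prefix would treat the two titles as duplicates, while a uniqueness constraint on a hash column distinguishes them regardless of how long the titles are.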
Re: [Wikitech-l] Page title length
On Thu, Oct 24, 2013 at 10:01 AM, Élie Roux elie.r...@telecom-bretagne.eu wrote:
> This title is around 90 Tibetan characters, but each character being 3 bytes, it exceeds the limit for title length of 256 bytes that MediaWiki has.
>
> So I have two questions:
>
> 1. If I change this limit to 1023 in the structure of the database ('page_title' field of the 'page' table), will other things (such as the search engine) break? Is there a way to change it more cleanly?
>
> 2. May I propose making this limit 1023 instead of 255 (or making it easily configurable)? This would allow at least 256 characters (even for Asian languages) instead of 256 bytes, which seems more consistent with the fact that MediaWiki is well internationalized.

As mentioned already in this thread, changing this limit is possible in theory but not simple to do reliably, as there are a number of database columns with the same limitation, and the 255-byte limit assumption is unfortunately scattered throughout a number of places in the code.

If you only need to *display* longer-form original page titles on the article, you might consider using the {{DISPLAYTITLE:}} keyword with $wgRestrictDisplayTitle disabled to allow fairly arbitrary customizations:
https://www.mediawiki.org/wiki/DISPLAYTITLE#Displaytitle
https://www.mediawiki.org/wiki/Manual:$wgRestrictDisplayTitle

If you really want the full-length titles as the low-level page title, you might try something like this:

Note that as of MySQL 5.0.3, VARCHAR columns are no longer limited to 255 bytes -- they can go up to 65,535 (or the maximum row size defined for the table). So it should be possible to change the column definitions... but I have not tested it and make no guarantees!

1) Before configuring MediaWiki, edit maintenance/tables.sql and change instances of VARCHAR(255) to VARCHAR(1023).

2) Search around for '255' appearing in .php, .inc, or .js files and change the checks to 1023.

You might be able to get away with mostly changing just this bit in includes/Title.php:

    # Limit the size of titles to 255 bytes. This is typically the size of the
    # underlying database field. We make an exception for special pages, which
    # don't need to be stored in the database, and may edge over 255 bytes due
    # to subpage syntax for long titles, e.g. [[Special:Block/Long name]]
    if ( ( $this->mNamespace != NS_SPECIAL && strlen( $dbkey ) > 255 ) ||
        strlen( $dbkey ) > 512 ) {
        return false;
    }

Update both the 255 and the 512 check with larger numbers (the 512 check is for Special: pages, since they often take other page titles as parameters).

There's a JavaScript clone of this check in resources/mediawiki/mediawiki.Title.js, which should probably be updated likewise.

There are similar checks for other fields such as the edit comment, which you might also want to make sure get increased, but that looks like the most important one.

Be sure *not* to change instances of '255' that are actually color components or other values -- you can't safely do a mass search and replace!

-- brion
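
For readers who want to experiment with the effect of changing the two numbers, here is a Python rendering of the length check quoted above. It is a sketch, not MediaWiki code: the function and parameter names are made up, `NS_SPECIAL = -1` is the namespace number MediaWiki uses for Special: pages, and PHP's byte-counting `strlen()` is approximated by measuring the UTF-8 encoding.

```python
NS_SPECIAL = -1          # MediaWiki's namespace number for Special: pages
TITLE_MAX_BYTES = 255    # limit for ordinary titles, as in the quoted check
SPECIAL_MAX_BYTES = 512  # looser limit for Special: pages, as in the quoted check


def title_length_ok(namespace: int, dbkey: str,
                    title_max: int = TITLE_MAX_BYTES,
                    special_max: int = SPECIAL_MAX_BYTES) -> bool:
    """Mirror the quoted check: return False where the PHP code returns false."""
    length = len(dbkey.encode("utf-8"))  # bytes, like PHP's strlen()
    if namespace != NS_SPECIAL and length > title_max:
        return False
    if length > special_max:
        return False
    return True


print(title_length_ok(0, "a" * 255))           # at the limit
print(title_length_ok(0, "a" * 256))           # one byte over
print(title_length_ok(NS_SPECIAL, "a" * 512))  # Special: pages get the higher cap
```

Passing larger values for both parameters, for example `title_max=1023` with a correspondingly larger `special_max`, models the change suggested in the message; the value to use for the Special: cap is left open in the thread.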
Re: [Wikitech-l] Page title length
On 2013-10-24 3:00 PM, Brion Vibber wrote:
> 2) Search around for '255' appearing in .php, .inc, or .js files and change the checks to 1023.
>
> You might be able to get away with mostly changing just this bit in includes/Title.php:
>
>     # Limit the size of titles to 255 bytes. This is typically the size of the
>     # underlying database field. We make an exception for special pages, which
>     # don't need to be stored in the database, and may edge over 255 bytes due
>     # to subpage syntax for long titles, e.g. [[Special:Block/Long name]]
>     if ( ( $this->mNamespace != NS_SPECIAL && strlen( $dbkey ) > 255 ) ||
>         strlen( $dbkey ) > 512 ) {
>         return false;
>     }
>
> Update both the 255 and the 512 check with larger numbers (the 512 check is for Special: pages, since they often take other page titles as parameters).
>
> There's a JavaScript clone of this check in resources/mediawiki/mediawiki.Title.js, which should probably be updated likewise.

Also, either update all user_name fields in the database or update User::getCanonicalName to use a 255-byte limit instead of depending only on Title for that.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
Re: [Wikitech-l] Page title length
> Changing the maximum length of a title will require a huge number of changes and is not easy to make configurable, so we should do this only once, changing the maximum title length to the largest value we can feasibly support.
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations .

--
Nathan Larson
https://mediawiki.org/wiki/User:Leucosticte
Distribution of my contributions to this email is hereby authorized pursuant to the CC0 license http://creativecommons.org/publicdomain/zero/1.0/ .
Re: [Wikitech-l] Page title length
On 2013-10-24 5:05 PM, Nathan Larson wrote:
> Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations .

I cleaned it up and added citations. In this instance, though, an RFC quoting the research I did on what needs changing, and on which limits we need to understand in order to pick the maximum title size, would have been better than a page.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]