Re: [Wikitech-l] Page title length

2013-10-31 Thread Daniel Werner
This is not an exact answer to your question, but rather a simple and
powerful alternative.

If you are thinking about using Semantic MediaWiki (which would be
very applicable to a wiki about resources), you should have a look
at:
http://www.mediawiki.org/wiki/Extension:SemanticTitle

The SemanticTitle extension currently seems to be unmaintained, but
just yesterday there was a talk at SMW-Con here in Berlin which
revealed plans by MITRE to commit their recent changes to the
extension soon and perhaps even maintain it in the future.

More information about the talk:
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2013/Revolutionizing_page_naming_with_semantic_properties
The slides of the talk are already available on that page; a video
recording should follow soon.

Cheers,
Daniel Werner

-- 
Daniel Werner
Software Engineer

Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 26-0

http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Page title length

2013-10-25 Thread Élie Roux

Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations .


Thank you very much Nathan, Brion and Daniel for your research and good
advice!

It does seem more complex than I naively thought, but I'll try to
make the changes you advised soon, keeping track of everything so that a
good recipe can be written.


Should I open a bug report about this?

Thank you,
--
Elie


Re: [Wikitech-l] Page title length

2013-10-25 Thread Daniel Barrett
Search around for '255' appearing in .php, .inc, or .js files and change the 
checks to 1023. [...]

Should someone maybe define a constant Title::MAX_LENGTH instead of hard-coding 
255 in many PHP files?

DanB

Re: [Wikitech-l] Page title length

2013-10-25 Thread Brion Vibber
On Fri, Oct 25, 2013 at 8:19 AM, Daniel Barrett d...@vistaprint.com wrote:

 Search around for '255' appearing in .php, .inc, or .js files and change
 the checks to 1023. [...]

 Should someone maybe define a constant Title::MAX_LENGTH instead of
 hard-coding 255 in many PHP files?


Yes, such a thing should be done. :)

We'll also have to find a way to export the value to JavaScript and to SQL.
JS can probably use a global config var or such; SQL can probably use a
magic comment+token replacement like the db type and prefix magic.
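
A minimal sketch of that idea, written in Python rather than PHP purely for illustration. The token `/*TITLE_MAX*/`, the key `titleMaxBytes`, and both helper names are hypothetical and are not existing MediaWiki features; the value 1023 is only the example figure used in this thread.

```python
import json

# Single source of truth for the limit (example value from this thread).
TITLE_MAX_BYTES = 1023

# Hypothetical schema template: the /*TITLE_MAX*/ token is invented for this sketch.
SCHEMA_TEMPLATE = (
    "CREATE TABLE page (page_title VARCHAR(/*TITLE_MAX*/) BINARY NOT NULL);"
)


def render_schema(template: str, max_bytes: int) -> str:
    """Replace the magic comment token with the configured limit."""
    return template.replace("/*TITLE_MAX*/", str(max_bytes))


def js_config(max_bytes: int) -> str:
    """Serialize the limit as JSON so client-side code could read it."""
    return json.dumps({"titleMaxBytes": max_bytes})


print(render_schema(SCHEMA_TEMPLATE, TITLE_MAX_BYTES))
print(js_config(TITLE_MAX_BYTES))
```

The point of the design is that PHP, JavaScript and SQL would all read one value instead of each hard-coding 255.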

-- brion

[Wikitech-l] Page title length

2013-10-24 Thread Élie Roux

Dear MediaWiki developers,

I'm responsible for the development of a new wiki that will contain many
Tibetan resources. Traditionally, Tibetan titles of books or even parts
of books are extremely long, as you can see for instance here:
http://www.tbrc.org/#!rid=W23922, and sometimes too long for MediaWiki,
for instance the title of


http://www.tbrc.org/?locale=bo#library_work_ViewByOutline-O01DG1049094951|W23922

, which is

ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་

. This title is around 90 Tibetan characters, but since each character takes 3
bytes, it exceeds the title length limit of 256 bytes that MediaWiki
has.
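
The byte arithmetic can be checked directly. This is a small Python sketch, not part of the original message, assuming titles are stored as UTF-8 and that the limit is 255 bytes (the figure given in the replies; the message itself says 256).

```python
# Title quoted in the message.
title = "ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་"

# Assumed limit; it is measured in bytes, not characters.
TITLE_MAX_BYTES = 255

code_points = len(title)                  # Unicode code points
utf8_bytes = len(title.encode("utf-8"))   # what a byte-based limit counts

print(f"{code_points} code points, {utf8_bytes} UTF-8 bytes")
print("fits" if utf8_bytes <= TITLE_MAX_BYTES else "too long")
```

Every code point in the Tibetan block takes three bytes in UTF-8, so a 255-byte limit leaves room for at most 85 such code points.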


So I have two questions:

1. If I change this limit to 1023 in the structure of the database
('page_title' field of the 'page' table), will other things (such as
the search engine) break? Is there a cleaner way to change it?


2. Could I propose making this limit 1023 instead of 255 (or making it
easily configurable)? This would allow at least 256 characters
(even for Asian languages) instead of 256 bytes, which seems more
consistent with the fact that MediaWiki is well internationalized.


Thank you in advance,
--
Elie


Re: [Wikitech-l] Page title length

2013-10-24 Thread Nathan Larson
On Thu, Oct 24, 2013 at 1:01 PM, Élie Roux elie.r...@telecom-bretagne.eu wrote:

 Dear MediaWiki developers,

 I'm responsible for the development of a new wiki that will contain many
 Tibetan resources. Traditionally, Tibetan titles of books or even parts of
 books are extremely long, as you can see for instance here:
 http://www.tbrc.org/#!rid=W23922, and sometimes too long for MediaWiki,
 for instance the title of

 http://www.tbrc.org/?locale=bo#library_work_ViewByOutline-O01DG1049094951|W23922

 , which is

 ཤངས་པ་བཀའ་བརྒྱུད་ཀྱི་གཞུང་བཀའ་ཕྱི་མ་རྣམས་ཕྱོགས་གཅིག་ཏུ་བསྒྲིལ་བའི་ཕྱག་ལེན་བདེ་ཆེན་སྙེ་མའི་ཆུན་པོ་

 . This title is around 90 Tibetan characters, but since each character takes 3
 bytes, it exceeds the title length limit of 256 bytes that MediaWiki
 has.

 So I have two questions:

 1. If I change this limit to 1023 in the structure of the database
 ('page_title' field of the 'page' table), will other things (such as the
 search engine) break? Is there a cleaner way to change it?

 2. Could I propose making this limit 1023 instead of 255 (or making
 it easily configurable)? This would allow at least 256 characters (even for
 Asian languages) instead of 256 bytes, which seems more consistent with the
 fact that MediaWiki is well internationalized.

 Thank you in advance,
 --
 Elie


Wouldn't ar_title, log_title, rc_title, and so on also need to have their
sizes increased? I didn't see any bug report proposing an increase in page
title size, although there was this one about comment size:
https://bugzilla.wikimedia.org/show_bug.cgi?id=4715

Re: [Wikitech-l] Page title length

2013-10-24 Thread Daniel Friesen
On 2013-10-24 10:01 AM, Élie Roux wrote:
 1. If I change this limit to 1023 in the structure of the database
 ('page_title' field of the 'page' table), will other things (such as
 the search engine) break? Is there a cleaner way to change it?
Other things won't break, but the limit is also hardcoded into the software,
so changing the database alone won't actually change the limit. However, if
you change both places you will encounter bugs unless you also make a large
number of other changes to the database and handle user names, either by
changing every column that accepts a user name or by editing User.php to
use a limit other than the one Title uses.

 2. Could I propose making this limit 1023 instead of 255 (or making
 it easily configurable)? This would allow at least 256 characters
 (even for Asian languages) instead of 256 bytes, which seems more
 consistent with the fact that MediaWiki is well internationalized.
Just to clarify (you switch between 255 and 256 in your
message): it's 255 bytes. The choice of 255 bytes for the title length
wasn't arbitrary. 255 bytes (not characters) was picked
because 255 is the maximum string length that can be represented
using 1 byte, i.e. 1 byte (8 bits, 2^8 = 256 values including 0) can
represent the range 0-255, so a varchar uses 1 byte to declare the length
of its contents. If you use ANY length larger than 255, the
varchar will always require 2 bytes to store the length of the column.
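
The length-prefix reasoning can be illustrated with a few lines of Python. This is only an illustration, not MediaWiki or MySQL code; the helper name `length_prefix_bytes` is made up, and it generalizes to any size even though the discussion here only concerns one or two length bytes.

```python
def length_prefix_bytes(max_len: int) -> int:
    """Bytes needed for an unsigned length prefix able to hold max_len."""
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    # bit_length() is 0 for 0, so reserve at least one byte.
    return max(1, (max_len.bit_length() + 7) // 8)


for declared in (255, 256, 1023, 65535):
    print(declared, length_prefix_bytes(declared))
```

Under this model 1023 costs exactly as much per value as 65535 does, which is why 1023 is no more natural a stopping point than any other length above 255.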

Changing the maximum length of a title will require a huge number of
changes and is not easy to make configurable, so we should do this only
once, changing the maximum title length to the maximum we can
feasibly make it.

  * It's hardcoded in Title::secureAndSplit.
  * User::getCanonicalName will need some extra code to restrict
    usernames to 255 bytes, as it currently depends on Title to do that.
    (And increasing the user name length is far more problematic than
    the title length.)
  * docs/title.txt and TitleTest.php will need an update.
  * The canonical page.page_title needs an ALTER COLUMN.
  * A number of other core tables will need column alters: (archive,
    page, template, image, category)links.(a, p, t, i, c)l_title,
    category.cat_title, (lang, iw)links.(l,iw)l_title,
    recentchanges.rc_title, and so on.
  * All extensions that store titles in the database will need to do
    similar ALTER COLUMN updates or else they will have unexpected errors.
      o It might be worth doing this first, since a larger varchar
        length in the database won't cause any bugs while the software
        still uses 255 bytes.


The next (and tbh final) title max length shouldn't be something
arbitrary; we should pick the maximum we can theoretically store (the max we
can store with 2 bytes/a 16-bit uint and the max we can possibly store
are actually the same thing, at least on MySQL with regard to VARCHAR).

The next step up from 1 byte representing 0-255 is not 0-1023 as you're
thinking. 2 bytes can represent a length of 0-65535 bytes.
However, we cannot actually use VARCHAR(65535) for the title. There are a
number of other limits which cap varchar column lengths below
what can be represented with a 16-bit uint length.

  * MySQL has a maximum row length of 65,535 bytes[1]. This includes the
    storage that all the columns on the table require in the row.
  * PostgreSQL doesn't seem to have the same max row size issues as
    MySQL because its row max (depending on whether you ask the about
    page[2] or the wiki FAQ[3]) is either 1.6TB or 400GB, and the column
    max size is 1GB.
  * However, PostgreSQL seems to say indexes cannot be created on
    columns longer than about 2,000 characters[1]. I don't know the
    precise details, but it might make our limit 2000 bytes. ((We'll need
    some more input from someone who knows PostgreSQL))
  * The varchar WP page[4] says Oracle's limit is 4000 bytes.
  * Before MySQL 5.0.3, VARCHAR columns could only be declared with a
    maximum length of 255. ((This means we'll have to drop support for
    5.0.2 and change our "5.0.2 or later" MySQL requirement to "5.0.3 or
    later"))
  * MyISAM's index prefix maximum is 1000 bytes and InnoDB's is 767
    (unless you use a dynamic/compressed row format and innodb_large_prefix).
    This means that [[A{1000 bytes}A]] and [[A{1000 bytes}B]] cannot
    both exist, as the index prefix is used to ensure uniqueness. This
    limit will be database dependent, and the only fix would be to add a
    new column containing a hash of the title text and drop the
    uniqueness constraint on page_title.
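
The prefix collision in the last item, and the hash-column workaround, can be sketched in Python. This only models the idea and does not talk to a database; the 767-byte figure is the InnoDB number quoted above, and the choice of SHA-1 and both helper names are assumptions made for the example.

```python
import hashlib

INDEX_PREFIX_BYTES = 767  # InnoDB figure quoted above; MyISAM would be 1000


def prefix_key(title: str) -> bytes:
    """What a prefix-limited unique index would effectively compare."""
    return title.encode("utf-8")[:INDEX_PREFIX_BYTES]


def title_hash(title: str) -> str:
    """Hypothetical hash column value computed over the full title."""
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


a = "A" + "x" * 1000 + "A"
b = "A" + "x" * 1000 + "B"

# The two titles differ only after the indexed prefix.
print("same prefix key:", prefix_key(a) == prefix_key(b))
print("same hash:", title_hash(a) == title_hash(b))
```

A unique index on such a hash column would have a fixed, small key size regardless of how long the title is.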

Deciding the title max we should use will probably need some more
information than what I've gathered so far.

((Side topic: We use `varchar(n) binary` for the title now. However
anyone that feels like changing this to `varchar(n) CHARACTER SET utf8`
needs to be wary that MySQL triples the (n) so it can store n utf8
chars instead of n bytes that happen to be utf8 so they may need to

Re: [Wikitech-l] Page title length

2013-10-24 Thread Brion Vibber
On Thu, Oct 24, 2013 at 10:01 AM, Élie Roux
elie.r...@telecom-bretagne.eu wrote:

 This title is around 90 Tibetan characters, but since each character takes 3
 bytes, it exceeds the title length limit of 256 bytes that MediaWiki
 has.

 So I have two questions:

 1. If I change this limit to 1023 in the structure of the database
 ('page_title' field of the 'page' table), will other things (such as the
 search engine) break? Is there a cleaner way to change it?

 2. Could I propose making this limit 1023 instead of 255 (or making
 it easily configurable)? This would allow at least 256 characters (even for
 Asian languages) instead of 256 bytes, which seems more consistent with the
 fact that MediaWiki is well internationalized.


As mentioned already in this thread, changing this limit is possible in
theory but not simple to do reliably, as there are a number of
database columns with the same limitation, and the 255-byte limit
assumption is unfortunately scattered throughout a number of places in the
code.


If you only need to *display* longer-form original page titles on the
article, you might consider using the {{DISPLAYTITLE:}} keyword with
$wgRestrictDisplayTitle disabled to allow fairly arbitrary customizations:

https://www.mediawiki.org/wiki/DISPLAYTITLE#Displaytitle
https://www.mediawiki.org/wiki/Manual:$wgRestrictDisplayTitle


If you really want the full-length titles as the low-level page title, you
might try something like this:

Note that as of MySQL 5.0.3, VARCHAR columns are no longer limited to 255
bytes -- they can go up to 65,535 (or the maximum row size defined for the
table). So it should be possible to change the column definitions... but I
have not tested it and make no guarantees!

1) Before configuring MediaWiki, edit maintenance/tables.sql and change
instances of VARCHAR(255) to VARCHAR(1023).

2) Search around for '255' appearing in .php, .inc, or .js files and change
the checks to 1023.

You might be able to get away with mostly changing just this bit in
includes/Title.php:

    # Limit the size of titles to 255 bytes. This is typically the size of the
    # underlying database field. We make an exception for special pages, which
    # don't need to be stored in the database, and may edge over 255 bytes due
    # to subpage syntax for long titles, e.g. [[Special:Block/Long name]]
    if (
        ( $this->mNamespace != NS_SPECIAL && strlen( $dbkey ) > 255 )
        || strlen( $dbkey ) > 512
    ) {
        return false;
    }

Update both the 255 and the 512 check with larger numbers (the 512 check is
for Special: pages, since they often take other page titles as parameters).
There's a JavaScript clone of this check in
resources/mediawiki/mediawiki.Title.js, which should probably be updated
likewise.
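
For reference, the logic of that check restated in Python with the two numbers pulled out as named constants. This is an illustrative port, not MediaWiki code; the constant and function names are invented here, and the byte length is taken from the UTF-8 encoding because PHP's strlen() counts bytes.

```python
NS_SPECIAL = -1  # namespace number MediaWiki uses for Special: pages

TITLE_MAX_BYTES = 255          # limit for pages stored in the database
SPECIAL_TITLE_MAX_BYTES = 512  # looser limit for Special: pages


def title_length_ok(namespace: int, dbkey: str) -> bool:
    """Return False where the quoted PHP check would reject the title."""
    size = len(dbkey.encode("utf-8"))
    if namespace != NS_SPECIAL and size > TITLE_MAX_BYTES:
        return False
    return size <= SPECIAL_TITLE_MAX_BYTES


print(title_length_ok(0, "a" * 255))
print(title_length_ok(0, "a" * 256))
print(title_length_ok(NS_SPECIAL, "a" * 512))
```

With the limits named once, raising them means editing two constants rather than hunting for literals.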

There are similar checks for other fields such as the edit comment, which
you might also want to make sure get increased, but that looks like the
most important one.

Be sure *not* to change instances of '255' that are actually color
components or other values -- you can't safely do a mass search and replace!
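
One way to make the manual review manageable is to list candidate lines first and decide on each by hand. The following Python sketch only finds standalone occurrences of 255; it cannot tell a title limit from a color component, so every hit still needs a human decision. The sample text and the helper name are made up for the example.

```python
import re

# Match 255 only when it is not part of a longer number or identifier.
NUMBER_255 = re.compile(r"(?<![\w.])255(?![\w.])")


def candidate_lines(source: str):
    """Yield (line_number, stripped_line) for lines with a standalone 255."""
    for number, line in enumerate(source.splitlines(), start=1):
        if NUMBER_255.search(line):
            yield number, line.strip()


sample = """if ( strlen( $dbkey ) > 255 ) {
$color = array( 255, 255, 0 );
$x = 1255;
"""

for number, line in candidate_lines(sample):
    print(number, line)
```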

-- brion




Re: [Wikitech-l] Page title length

2013-10-24 Thread Daniel Friesen
On 2013-10-24 3:00 PM, Brion Vibber wrote:
 2) Search around for '255' appearing in .php, .inc, or .js files and change
 the checks to 1023.

 You might be able to get away with mostly changing just this bit in
 includes/Title.php:

     # Limit the size of titles to 255 bytes. This is typically the size of the
     # underlying database field. We make an exception for special pages, which
     # don't need to be stored in the database, and may edge over 255 bytes due
     # to subpage syntax for long titles, e.g. [[Special:Block/Long name]]
     if (
         ( $this->mNamespace != NS_SPECIAL && strlen( $dbkey ) > 255 )
         || strlen( $dbkey ) > 512
     ) {
         return false;
     }

 Update both the 255 and the 512 check with larger numbers (the 512 check is
 for Special: pages, since they often take other page titles as parameters).
 There's a JavaScript clone of this check in
 resources/mediawiki/mediawiki.Title.js, which should probably be updated
 likewise.
Also either update all user_name fields in the database or update
User::getCanonicalName to use a 255 byte limit instead of depending only
on Title for that.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: [Wikitech-l] Page title length

2013-10-24 Thread Nathan Larson

 Changing the maximum length of a title will require a huge number of
 changes and is not easy to make configurable so we should do this only
 one time changing the maximum length of title to the maximum we can
 feasibly make it.



 ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]


Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations .

-- 
Nathan Larson https://mediawiki.org/wiki/User:Leucosticte
Distribution of my contributions to this email is hereby authorized
pursuant to the CC0 license: http://creativecommons.org/publicdomain/zero/1.0/

Re: [Wikitech-l] Page title length

2013-10-24 Thread Daniel Friesen
On 2013-10-24 5:05 PM, Nathan Larson wrote:
 Reposted to https://www.mediawiki.org/wiki/Page_title_size_limitations . 
I cleaned up and cited it. Though in this instance, writing up an RFC
quoting the research I did on what needs changing and on the
limits we need to understand to pick the maximum title size would have
been better than a page.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

