--- Comment #19 from Aryeh Gregor <simetrical+wikib...@gmail.com> 2010-03-03 18:54:20 UTC ---
(In reply to comment #18)
> It's not so much the age of the software as the inefficiency of adding and
> converting to larger fields in the database schema.
This isn't a big issue. We have a system in place for schema changes, even on
large tables. The problem is the efficiency of the resulting deployed system.
> It also doesn't help that most people equate one character with one byte and
> forget that users of non-Latin scripts are stuck with an encoding that takes
> two to three times as much storage space.
I hope by "most people" you don't mean to include developers and sysadmins.
> Nobody is suggesting anything like 1000 Unicode characters. 200 Unicode
> characters -- i.e. the same length Latin-script users get already -- would be
> more than enough, but currently scripts that are encoded with multibyte
> characters (almost anything non-Latin) don't get anywhere near that much.
I disagree. I regularly want to use English-language summaries longer than 255
ASCII characters. If we're going to allow more than 255 bytes, we'll be
allowing at least 64K bytes (likely 16M, if we use the text table) on the
backend, so how many are allowed in an actual comment will be
admin-configurable. I imagine some wikis would request more than 255
characters -- I'd support a software default of 500 or so. The current limit
is only because we use varchar(255).
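
To make the byte-versus-character gap concrete, here is a rough Python sketch. It is not MediaWiki code: the function name `max_chars_in_bytes` and the sample strings are made up for illustration, and it assumes the field stores UTF-8 and is limited to 255 bytes.

```python
def max_chars_in_bytes(text: str, limit: int = 255) -> int:
    """Count how many leading characters of `text` fit in `limit` UTF-8 bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 to 4 bytes per code point
        if used + size > limit:
            break
        used += size
        count += 1
    return count


# Hypothetical sample summaries, each 300 characters long.
latin = "a" * 300      # 1 byte per character in UTF-8
cyrillic = "я" * 300   # 2 bytes per character
cjk = "語" * 300        # 3 bytes per character

print(max_chars_in_bytes(latin))     # 255
print(max_chars_in_bytes(cyrillic))  # 127
print(max_chars_in_bytes(cjk))       # 85
```

So under these assumptions the same 255-byte field holds 255 ASCII characters but only 127 Cyrillic or 85 CJK characters.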
> Not really. Just cap the automatic summary length at a fixed number of
> characters, rather than bytes. No reason why non-Latin script shouldn't get to
> see the first ~200 characters of their articles too.
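
For illustration only, the difference between capping by bytes and capping by characters might look like the Python sketch below. The helper names `truncate_bytes` and `truncate_chars` are hypothetical and this is not how MediaWiki implements automatic summaries; it only assumes UTF-8 text.

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Cut to at most `limit` UTF-8 bytes without leaving a split character."""
    raw = text.encode("utf-8")[:limit]
    # A cut in the middle of a multibyte sequence leaves an incomplete
    # tail; errors="ignore" drops those stray bytes when decoding.
    return raw.decode("utf-8", errors="ignore")


def truncate_chars(text: str, limit: int) -> str:
    """Cut to at most `limit` characters (Unicode code points)."""
    return text[:limit]


summary = "я" * 300  # hypothetical Cyrillic text, 2 bytes per character

by_bytes = truncate_bytes(summary, 200)
by_chars = truncate_chars(summary, 200)

print(len(by_bytes))                  # 100 characters survive
print(len(by_chars))                  # 200 characters survive
print(len(by_chars.encode("utf-8")))  # 400 bytes needed to store them
```

The character-based cap gives every script the same visible length, but the result no longer fits a 255-byte column for multibyte scripts. Note that this counts code points, not user-perceived characters, so combining marks are counted separately.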