Re: [Wikitech-l] Category sorting and first letters
2011/1/18 Tim Starling tstarl...@wikimedia.org:
On 18/01/11 07:41, Amir E. Aharoni wrote:
And i don't know what to do when in the Lithuanian Wikipedia you sort names of places in the UK - should Islington come before or after York?
Before.
$collator = new Collator('lt')
print $collator->compare( 'Islington', 'York' )
-1
But more interestingly, York goes before London:
print $collator->compare( 'York', 'London' )
-1
'York' before 'London' makes sense in lt context, but 'York' before 'Islington' is weird, because to the best of my understanding, it's supposed to be sorted as if it was written 'Iork'. A dictionary that i have at home puts 'ylaragis' before 'įlašeti'.
I think attempting to do it any other way would be a lot of trouble, and not what is wanted anyway. To put the question another way: on the English Wikipedia, should Kybartai sort before Klaipėda? I would think not.
The intuitive answer is that in en.wikipedia Kybartai should usually be after Klaipėda, although some clever sorting is desirable. Even more so for Wiktionary. For lt.wikipedia, this is something that its editors and readers should decide.
___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
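The "sorted as if it was written 'Iork'" convention mentioned above can be sketched as a plain sort key. This is only an illustration in Python, not what ICU's Collator does: the folding table LT_FOLD and the function lt_dictionary_key are hypothetical names, and a real collator would typically still break ties between i, į and y at a lower comparison level.

```python
# Hypothetical folding table: letters filed together with "i" at the
# primary level, as in the dictionary convention described in the thread.
LT_FOLD = str.maketrans({"į": "i", "y": "i"})


def lt_dictionary_key(title):
    """Return a primary-level sort key that treats i, į and y as one letter."""
    return title.lower().translate(LT_FOLD)


places = ["London", "York", "Islington"]
ordered = sorted(places, key=lt_dictionary_key)
print(ordered)
```

Under this key 'York' is compared as 'iork', so it files among the I entries rather than at the end of the alphabet.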
Re: [Wikitech-l] WMDE Developer Meetup moved to May
On 17.01.2011 19:38, Bryan Tong Minh wrote:
On Mon, Jan 17, 2011 at 5:11 PM, Daniel Kinzler dan...@brightbyte.de wrote:
* There will be a hackathon hosted by Wikimedia Germany in (late) May, probably in Berlin, but that's not decided yet. This will mostly be about hacking, with a strong focus on GLAM-related stuff. There will be little in terms of presentations.
Hmm that would quite suck for me. Will it be during the weekend or during work days?
Not sure yet, but I'd prefer something weekendish - say, Friday and Saturday being the core days.
-- daniel
Re: [Wikitech-l] WMDE Developer Meetup moved to May
On 17.01.2011 19:53, Ashar Voultoiz wrote:
On 17/01/11 17:11, Daniel Kinzler wrote:
* There will be a hackathon hosted by Wikimedia Germany in (late) May, probably in Berlin, but that's not decided yet. This will mostly be about hacking, with a strong focus on GLAM-related stuff. There will be little in terms of presentations.
I will be able to attend this event wherever it is. Would it be possible to set the date as early as possible so we can arrange days off with our employers and get cheap flights?
We'll do our best.
Are there any blog / RSS feeds I can add to make sure I do not miss any information? ;)
The Google calendar at http://tinyurl.com/wmde-events is probably the best bet.
-- daniel
Re: [Wikitech-l] Category sorting and first letters
Tim Starling (2011-01-18 02:03):
On 18/01/11 07:41, Amir E. Aharoni wrote:
2011/1/17 Tim Starling tstarl...@wikimedia.org:
* It automatically drops accents, since accented letters sort the same as unaccented letters (at the primary level).
How locale aware is it? For example, in Swedish accented letters come at the end of the alphabet and in Lithuanian I, Į and Y are collated together as if they were one letter. There are many quirks of this kind in other languages.
It's not locale-aware. As I said, it's a compromise collation. I was hoping that other people might be interested in adding support for specific locales, that's part of the reason for my post. ICU supports lots of different locales, and there is locale-specific collation data in the CLDR.
And i don't know what to do when in the Lithuanian Wikipedia you sort names of places in the UK - should Islington come before or after York?
Before.
$collator = new Collator('lt')
print $collator->compare( 'Islington', 'York' )
-1
But more interestingly, York goes before London:
print $collator->compare( 'York', 'London' )
-1
I think attempting to do it any other way would be a lot of trouble, and not what is wanted anyway. To put the question another way: on the English Wikipedia, should Kybartai sort before Klaipėda? I would think not.
I've seen accent-insensitive sorting, so for example Bańka would be sorted as if it was Banka, but I haven't yet seen phone-insensitive sorting, or whatever you call it. What I mean is that in Poland rz is pronounced (almost) the same as ż, but rz is nowhere near ż when it comes to sorting. In fact it would be very counterintuitive for me (as would be 'York' before 'London'). I think it would not be helpful, especially for foreigners. I've also said that I've _seen_ accent-insensitive dictionaries, but _most_ are case sensitive, so ą > a, not ą = a; also, when it comes to the first letter, all dictionaries I know have Ż separate from Z.
You might see our collation as - without accent first and with accent second. This is the why we say are ABC. And it would be intuitive for to have English collation by it's ABC with Y coming just before Z. I think the problem should only be solved for letters which are not just Latin character + accent. How to sort them in Latin (and Latin based) characters. Regards, Nux.
[Wikitech-l] Minimum PHP now 5.2 in trunk (was: [Mediawiki-l] about requiring PHP 5.2)
On Wed, Nov 3, 2010 at 3:10 AM, Tim Starling tstarl...@wikimedia.org wrote:
I don't think JSON support is particularly important since it can easily be simulated, and I don't think you should use the filter extension in MediaWiki, regardless of whether it is supported.
I agree about filter. Having native JSON support is a nicety though, it's faster than a userland implementation.
However, I can think of a good argument for moving to PHP 5.2, which is to stop the high rate of bit rot in 5.1 support. In particular, support for callbacks with double-colons to indicate static method calls: call_user_func( 'Foo::bar' ) was added in PHP 5.2.3. Developers often use these, and don't realise that they are breaking PHP 5.1 support. So I think there's a good argument for making 5.2.3 the minimum.
+1 here. a::b syntax is fewer keystrokes than having to use an array. Also lets us remove the stupid hack from r68760[0] (probably similar things elsewhere in the code).
Another example of bit rot: the trunk has 3 calls to array_fill_keys(), with no simulation in GlobalFunctions.php; it was added in 5.2.0. Developers should really check the versions in the manual when they use a function, otherwise 5.2.x will soon be broken as well, in favour of 5.3.x. But in theory we can weed out calls to newly-added functions with grep. The 5.2.3 callback change was more subtle.
Other reasons 5.2 is cool:
- setcookie() allows httponly cookies (we conditionally support this)
- __toString() works properly
- Memory management improved
- Lots of other stuff here [1]
The consensus last time we brought this up (November) was fairly strong that we can start phasing out 5.1 support. After talking again on IRC with people today, I think we can safely break 5.1 in trunk (although let's not backport it). Once the 1.17 release is out, we should find a way to better update [2] so we can indicate that 1.17 will be the last release with 5.1 support.
-Chad [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/68760 [1] http://php.net/migration52 [2] http://www.mediawiki.org/wiki/Manual:Installation_requirements
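The idea of weeding out calls to newly-added functions with grep could be automated roughly as follows. This is a sketch in Python rather than anything from the MediaWiki tree: ADDED_IN and find_new_function_calls are hypothetical names, and the table contains only array_fill_keys (added in 5.2.0), which is the one function the thread names; further entries would have to be checked against the PHP manual.

```python
import re

# Function name -> PHP version that introduced it. Only array_fill_keys
# comes from the thread; the table is meant to be extended by hand.
ADDED_IN = {"array_fill_keys": "5.2.0"}


def find_new_function_calls(php_source, added_in=ADDED_IN):
    """Return (line number, function, version) for each suspicious call."""
    names = "|".join(re.escape(name) for name in added_in)
    # \b keeps user functions such as my_array_fill_keys() from matching.
    pattern = re.compile(r"\b(" + names + r")\s*\(")
    hits = []
    for line_no, line in enumerate(php_source.splitlines(), start=1):
        for match in pattern.finditer(line):
            name = match.group(1)
            hits.append((line_no, name, added_in[name]))
    return hits


sample = (
    "<?php\n"
    "$defaults = array_fill_keys( $names, null );\n"
    "$other = my_array_fill_keys( $names );\n"
)
hits = find_new_function_calls(sample)
print(hits)
```

Like grep, this is purely textual: it also flags matches inside comments, strings and method calls, and it cannot catch behaviour changes such as the 5.2.3 'Foo::bar' callback syntax, which is why that case is described as more subtle.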
Re: [Wikitech-l] Minimum PHP now 5.2 in trunk (was: [Mediawiki-l] about requiring PHP 5.2)
On Jan 18, 2011, at 2:00 PM, Chad wrote:
+1 here. a::b syntax is fewer keystrokes than having to use an array. Also lets us remove the stupid hack from r68760[0] (probably similar things elsewhere in the code).
Can't forget the hack that is MWFunction::callArray, which is also intended to work around PHP 5.1's incompatibility.
-X!
[Wikitech-l] FYI : Commons thumbnails slow
I get long wait times for images from Commons (10sec even for small thumbnails). The browsers (Commons, Firefox) also seem to load the same image repeatedly, even though it should be in cache. I'm in the UK. Everything else seems normal (well, I'm in the UK, but otherwise... ;-).
Re: [Wikitech-l] FYI : Commons thumbnails slow
On 18 Jan 2011, at 22:10, Magnus Manske wrote:
I get long wait times for images from Commons (10sec even for small thumbnails). The browsers (Commons, Firefox) also seem to load the same image repeatedly, even though it should be in cache. I'm in the UK. Everything else seems normal (well, I'm in the UK, but otherwise... ;-).
I have noticed the same. Small images (16x16) used in the WikiEditor (not the default buttons but those added by a user script) sometimes don't appear until half a minute after the page is loaded.
-- Krinkle
Re: [Wikitech-l] FYI : Commons thumbnails slow
On Tue, Jan 18, 2011 at 9:15 PM, Krinkle krinklem...@gmail.com wrote:
On 18 Jan 2011, at 22:10, Magnus Manske wrote:
I get long wait times for images from Commons (10sec even for small thumbnails). The browsers (Commons, Firefox) also seem to load the same image repeatedly, even though it should be in cache. I'm in the UK. Everything else seems normal (well, I'm in the UK, but otherwise... ;-).
I have noticed the same. Small images (16x16) used in the WikiEditor (not the default buttons but those added by a user script) sometimes don't appear until half a minute after the page is loaded.
Maybe the server-side caching time is too short, and all images are requested all the time, slowing the server?
Re: [Wikitech-l] FYI : Commons thumbnails slow
Same in the U.S. Seeing tons of missing thumbnails, although they do seem to load after a few minutes.
Ryan Kaldari
On 1/18/11 1:10 PM, Magnus Manske wrote:
I get long wait times for images from Commons (10sec even for small thumbnails). The browsers (Commons, Firefox) also seem to load the same image repeatedly, even though it should be in cache. I'm in the UK. Everything else seems normal (well, I'm in the UK, but otherwise... ;-).
Re: [Wikitech-l] FYI : Commons thumbnails slow
On Tue, Jan 18, 2011 at 1:10 PM, Magnus Manske magnusman...@googlemail.com wrote:
I get long wait times for images from Commons (10sec even for small thumbnails). The browsers (Commons, Firefox) also seem to load the same image repeatedly, even though it should be in cache. I'm in the UK. Everything else seems normal (well, I'm in the UK, but otherwise... ;-).
Thankfully, I do read wikitech-l often, but please, please, report this on IRC as well if possible, everyone. Thank you for the report, btw. We're looking into it now.
Respectfully, Ryan Lane
Re: [Wikitech-l] Category sorting and first letters
Maciej Jaros (2011-01-18 15:42):
Tim Starling (2011-01-18 02:03):
On 18/01/11 07:41, Amir E. Aharoni wrote:
2011/1/17 Tim Starling tstarl...@wikimedia.org:
* It automatically drops accents, since accented letters sort the same as unaccented letters (at the primary level).
How locale aware is it? For example, in Swedish accented letters come at the end of the alphabet and in Lithuanian I, Į and Y are collated together as if they were one letter. There are many quirks of this kind in other languages.
It's not locale-aware. As I said, it's a compromise collation. I was hoping that other people might be interested in adding support for specific locales, that's part of the reason for my post. ICU supports lots of different locales, and there is locale-specific collation data in the CLDR.
And i don't know what to do when in the Lithuanian Wikipedia you sort names of places in the UK - should Islington come before or after York?
Before.
$collator = new Collator('lt')
print $collator->compare( 'Islington', 'York' )
-1
But more interestingly, York goes before London:
print $collator->compare( 'York', 'London' )
-1
I think attempting to do it any other way would be a lot of trouble, and not what is wanted anyway. To put the question another way: on the English Wikipedia, should Kybartai sort before Klaipėda? I would think not.
I've seen accent-insensitive sorting, so for example Bańka would be sorted as if it was Banka, but I haven't yet seen phone-insensitive sorting, or whatever you call it. What I mean is that in Poland rz is pronounced (almost) the same as ż, but rz is nowhere near ż when it comes to sorting. In fact it would be very counterintuitive for me (as would be 'York' before 'London'). I think it would not be helpful, especially for foreigners. I've also said that I've _seen_ accent-insensitive dictionaries, but _most_ are case sensitive, so ą > a, not ą = a; also, when it comes to the first letter, all dictionaries I know have Ż separate from Z.
You might see our collation as - without accent first and with accent second. /This is the why we say are ABC. And it would be intuitive for to have English collation by it's ABC with Y coming just before Z./
Sorry, sometimes I type phonetically :-). The last sentences were supposed to be: This is the way we say our ABC. And it would be intuitive for me to have English collation by its ABC with Y coming just before Z.
I think the problem should only be solved for letters which are not just Latin character + accent. How to sort them in Latin (and Latin based) characters.
Regards, Nux.
Re: [Wikitech-l] FYI : Commons thumbnails slow
On Tue, Jan 18, 2011 at 10:23 PM, Ryan Lane rlan...@gmail.com wrote:
Thankfully, I do read wikitech-l often, but please, please, report this on IRC as well if possible, everyone.
I'm not doing much on IRC, and apparently I'm not alone with that here. Assuming that someone reporting a problem to wikitech-l will start a new thread (as I did), how about auto-posting each new thread subject line from wikitech-l on IRC automatically? (here I am again, throwing tech solutions at social problems...)
Cheers, Magnus
Re: [Wikitech-l] FYI : Commons thumbnails slow
I'm not doing much on IRC, and apparently I'm not alone with that here. Assuming that someone reporting a problem to wikitech-l will start a new thread (as I did), how about auto-posting each new thread subject line from wikitech-l on IRC automatically? (here I am again, throwing tech solutions at social problems...)
Almost all threads on wikitech-l are dev related, not ops. I happen to do both, so I noticed it. Pushing all thread subjects from wikitech-l to ops folks would pretty much be spam. Either way, a report is better than no report ;).
Respectfully, Ryan Lane
Re: [Wikitech-l] From page history to sentence history
On Mon, Jan 17, 2011 at 9:12 PM, Roan Kattouw roan.katt...@gmail.com wrote:
Wikimedia doesn't technically use delta compression. It concatenates a couple dozen adjacent revisions of the same page and compresses that (with gzip?), achieving very good compression ratios because there is a huge amount of duplication in, say, 20 adjacent revisions of [[Barack Obama]] (small changes to a large page, probably a few identical versions due to vandalism reverts, etc.).
We used to do this, but the problem was that many articles are much larger than the compression window of typical compression algorithms, so the redundancy between adjacent revisions wasn't helping compression except for short articles. Tim wrote a diff-based history storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and deployed it on Wikimedia, for 93% space savings: http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html
I don't know if this was ever deployed to all of external storage, though. In that thread Tim mentioned only recompressing about 40% of revisions, and said that the recompression script required care and human attention to work correctly, so maybe he never got around to recompressing all the rest -- I don't think he ever said, that I saw.
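The compression-window problem described above can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: it uses Python's zlib (DEFLATE, whose history window is at most 32 KB) and difflib as stand-ins, a randomly generated "article" of about 120 KB, and hypothetical names such as make_article and compare_storage. It is not the DiffHistoryBlob code from includes/HistoryBlob.php.

```python
import difflib
import random
import zlib


def make_article(n_lines=2000, line_len=60, seed=0):
    """Build a synthetic article (about 120 KB) as a list of lines."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    return ["".join(rng.choices(alphabet, k=line_len)) + "\n"
            for _ in range(n_lines)]


def compare_storage():
    """Return compressed sizes: one revision, two concatenated, base + diff."""
    rev1 = make_article()
    rev2 = list(rev1)
    rev2[1000] = "one small edit in the middle of a large page\n"

    text1 = "".join(rev1).encode("utf-8")
    text2 = "".join(rev2).encode("utf-8")

    # One revision on its own, as a baseline.
    single = len(zlib.compress(text1, 9))
    # Two adjacent revisions concatenated and compressed as one blob.
    concatenated = len(zlib.compress(text1 + text2, 9))
    # First revision in full, second revision stored only as a line diff.
    diff = "".join(difflib.unified_diff(rev1, rev2, n=0)).encode("utf-8")
    diff_based = single + len(zlib.compress(diff, 9))
    return single, concatenated, diff_based


single, concatenated, diff_based = compare_storage()
print(single, concatenated, diff_based)
```

On data like this the concatenated blob should come out close to twice the size of a single revision, because the second copy starts more than 32 KB after the first and the compressor can no longer see it, while the diff-based variant adds only a small amount on top of the first revision.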
Re: [Wikitech-l] From page history to sentence history
2011/1/19 Aryeh Gregor simetrical+wikil...@gmail.com:
We used to do this, but the problem was that many articles are much larger than the compression window of typical compression algorithms, so the redundancy between adjacent revisions wasn't helping compression except for short articles. Tim wrote a diff-based history storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and deployed it on Wikimedia, for 93% space savings: http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html
That's right, I forgot about that.
I don't know if this was ever deployed to all of external storage, though. In that thread Tim mentioned only recompressing about 40% of revisions, and said that the recompression script required care and human attention to work correctly, so maybe he never got around to recompressing all the rest -- I don't think he ever said, that I saw.
I think he finished recompressing a couple of months ago.
Roan Kattouw (Catrope)
Re: [Wikitech-l] From page history to sentence history
It seems a completely different topic, but: is there something to learn about text storage from the smart trick used for storing TeX formulas? I did a little bit of reverse engineering on that algorithm; I never found any useful application for it, but it was much fun. :-) Alex
Re: [Wikitech-l] FYI : Commons thumbnails slow
On 19 Jan 2011, at 00:11, Ryan Lane wrote:
I'm not doing much on IRC, and apparently I'm not alone with that here. Assuming that someone reporting a problem to wikitech-l will start a new thread (as I did), how about auto-posting each new thread subject line from wikitech-l on IRC automatically? (here I am again, throwing tech solutions at social problems...)
Almost all threads on wikitech-l are dev related, not ops. I happen to do both, so I noticed it. Pushing all thread subjects from wikitech-l to ops folks would pretty much be spam. Either way, a report is better than no report ;).
Respectfully, Ryan Lane
There's a bot in #wikimedia-toolserver reporting activity on [toolserver-l], I'll see if I can get a similar thing going for [wikitech-l]. In which channel would we want this, though? I assume #wikimedia-tech although, given what Ryan said about dev versus ops, perhaps it would be better in #wikimedia-dev (or both?)
-- Krinkle
Re: [Wikitech-l] FYI : Commons thumbnails slow
Krinkle wrote:
There's a bot in #wikimedia-toolserver reporting activity on [toolserver-l], I'll see if I can get a similar thing going for [wikitech-l]. In which channel would we want this, though? I assume #wikimedia-tech although, given what Ryan said about dev versus ops, perhaps it would be better in #wikimedia-dev (or both?)
I run the bot you're talking about in #wikimedia-toolserver. Her name is Reba and she's a fine and mostly reliable lady. I don't imagine anyone wants a bot in #wikimedia-tech or #wikimedia-dev. It's noisy and largely pointless (spamming every reply to a thread about threatening to rewrite the parser to an IRC channel doesn't help anything or anyone).
We need a better system for (power-)users to report site problems. Something that doesn't involve a mailing list, but something that likely has an IRC component (given that most of the ops idle there). A clean web UI --> IRC system could possibly work, but any system like that is open to abuse and misuse.
MZMcBride