Re: [Wikitech-l] Category sorting and first letters

2011-01-18 Thread Amir E. Aharoni
2011/1/18 Tim Starling <tstarl...@wikimedia.org>:
 On 18/01/11 07:41, Amir E. Aharoni wrote:
 And i don't know what to do when in the Lithuanian Wikipedia you sort
 names of places in the UK - should Islington come before or after
 York?

 Before.

 $collator = new Collator('lt')
 print $collator->compare( 'Islington', 'York' )
 -1

 But more interestingly, York goes before London:

 print $collator->compare( 'York', 'London' )
 -1

'York' before 'London' makes sense in lt context, but 'Islington' before
'York' is weird because, to the best of my understanding, 'York' is
supposed to be sorted as if it were written 'Iork'.

A dictionary that i have at home puts 'ylaragis' before 'įlašeti'.
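
As a rough illustration of the "sorted as if it was written 'Iork'" idea, here is a minimal Python sketch. It is not ICU and not what PHP's Collator does internally. It assumes a simplified primary-level key that drops accents and folds y into i; the names `FOLD` and `primary_key` are hypothetical, and real Lithuanian collation has more rules than this.

```python
import unicodedata

# Hypothetical folding table (an assumption for this sketch):
# treat "y" as "i" at the primary level, as described for Lithuanian.
FOLD = str.maketrans({"y": "i"})


def primary_key(word: str) -> str:
    """Return a simplified primary-level sort key for one word."""
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", word.casefold())
    # Drop the combining marks, so that "į" becomes "i" and "š" becomes "s".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold "y" into "i".
    return base.translate(FOLD)


places = ["Islington", "York", "London"]
print(sorted(places, key=primary_key))
# -> ['York', 'Islington', 'London']

print(sorted(["įlašeti", "ylaragis"], key=primary_key))
# -> ['ylaragis', 'įlašeti']
```

Under this key 'York' is compared as 'iork', so it lands before 'Islington', and 'ylaragis' lands before 'įlašeti', which matches the dictionary order mentioned above.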

 I think attempting to do it any other way would be a lot of trouble,
 and not what is wanted anyway. To put the question another way: on the
 English Wikipedia, should Kybartai sort before Klaipėda? I would think
 not.

The intuitive answer is that in en.wikipedia Kybartai should usually
be after Klaipėda, although some clever sorting is desirable. Even
more so for Wiktionary.

For lt.wikipedia, this is something that its editors and readers should decide.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] WMDE Developer Meetup moved to May

2011-01-18 Thread Daniel Kinzler
On 17.01.2011 19:38, Bryan Tong Minh wrote:
 On Mon, Jan 17, 2011 at 5:11 PM, Daniel Kinzler <dan...@brightbyte.de> wrote:
 * There will be a hackathon hosted by Wikimedia Germany in (late) May,
 probably in Berlin, but that's not decided yet. This will mostly be about
 hacking, with a strong focus on GLAM-related stuff. There will be little
 in terms of presentations.
 
 Hmm that would quite suck for me. Will it be during the weekend or
 during work days?

Not sure yet, but I'd prefer something weekendish - say, Friday and Saturday
being the core days.

-- daniel



Re: [Wikitech-l] WMDE Developer Meetup moved to May

2011-01-18 Thread Daniel Kinzler
On 17.01.2011 19:53, Ashar Voultoiz wrote:
 On 17/01/11 17:11, Daniel Kinzler wrote:
 * There will be a hackathon hosted by Wikimedia Germany in (late) May,
 probably in Berlin, but that's not decided yet. This will mostly be about
 hacking, with a strong focus on GLAM-related stuff. There will be little
 in terms of presentations.
 
 I will be able to attend this event wherever it is.  Would it be 
 possible to set the date as early as possible so we can arrange days off 
 with our employers and get cheap flights?

We'll do our best.

 Is there any blog / rss feeds I can add to make sure I do not miss any 
 information? ;)

The Google Calendar at http://tinyurl.com/wmde-events is probably the best bet.

-- daniel



Re: [Wikitech-l] Category sorting and first letters

2011-01-18 Thread Maciej Jaros
Tim Starling (2011-01-18 02:03):
 On 18/01/11 07:41, Amir E. Aharoni wrote:
 2011/1/17 Tim Starling <tstarl...@wikimedia.org>:
 * It automatically drops accents, since accented letters sort the same
 as unaccented letters (at the primary level).
 How locale aware is it? For example, in Swedish accented letters come
 at the end of the alphabet and in Lithuanian I, Į and Y are collated
 together as if they were one letter. There are many quirks of this
 kind in other languages.
 It's not locale-aware. As I said, it's a compromise collation. I was
 hoping that other people might be interested in adding support for
 specific locales, that's part of the reason for my post. ICU supports
 lots of different locales, and there is locale-specific collation data
 in the CLDR.

 And i don't know what to do when in the Lithuanian Wikipedia you sort
 names of places in the UK - should Islington come before or after
 York?
 Before.

 $collator = new Collator('lt')
 print $collator->compare( 'Islington', 'York' )
 -1

 But more interestingly, York goes before London:

 print $collator->compare( 'York', 'London' )
 -1

 I think attempting to do it any other way would be a lot of trouble,
 and not what is wanted anyway. To put the question another way: on the
 English Wikipedia, should Kybartai sort before Klaipėda? I would think
 not.

I've seen accent-insensitive sorting, where for example Bańka would be
sorted as if it was Banka, but I haven't yet seen phone-insensitive
sorting, or whatever you call it. What I mean is that in Poland rz is
pronounced the same (almost the same) as ż, but rz is nowhere near ż when
it comes to sorting. In fact it would be very counterintuitive for me (as
would be 'York' < 'London'). I think it would not be helpful, especially
for foreigners. I've also said that I've _seen_ accent-insensitive
dictionaries, but _most_ are case sensitive, and so ą > a, not ą = a.
Also, when it comes to the first letter, all dictionaries I know have Ż
separate from Z. You might see our collation as - without accent first
and with accent second. This is the why we say are ABC. And it would be 
intuitive for to have English collation by it's ABC with Y coming just 
before Z.

I think the only problem that needs solving is how to sort letters that
are not just a Latin character + accent among the Latin (and Latin-based)
characters.
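
The "without accent first and with accent second" ordering described above can be sketched as a two-level sort key. This is a minimal Python illustration under stated assumptions, not ICU's algorithm: `strip_accents` and `two_level_key` are hypothetical names, the secondary level is simply the folded word compared by code point, and letters without a Unicode decomposition (such as ł) are left untouched.

```python
import unicodedata
from typing import Tuple


def strip_accents(word: str) -> str:
    """Drop combining marks, e.g. 'bańka' becomes 'banka'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def two_level_key(word: str) -> Tuple[str, str]:
    """Primary level: base letters. Secondary level: the accented form."""
    folded = word.casefold()
    # Accents only matter when the base letters are identical.
    return (strip_accents(folded), folded)


words = ["żaba", "zebra", "bańka", "banka", "bar"]
print(sorted(words, key=two_level_key))
# -> ['banka', 'bańka', 'bar', 'żaba', 'zebra']
```

Note that this key puts 'żaba' before 'zebra', because ż only differs from z at the secondary level. A dictionary that treats Ż as a separate letter after Z would put it after 'zebra', which is exactly the kind of locale-specific rule a compromise collation cannot express.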

Regards,
Nux.



[Wikitech-l] Minimum PHP now 5.2 in trunk (was: [Mediawiki-l] about requiring PHP 5.2)

2011-01-18 Thread Chad
On Wed, Nov 3, 2010 at 3:10 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
 I don't think JSON support is particularly important since it can
 easily be simulated, and I don't think you should use the filter
 extension in MediaWiki, regardless of whether it is supported.


I agree about filter. Having native JSON support is a nicety
though, it's faster than a userland implementation.

 However, I can think of a good argument for moving to PHP 5.2, which
 is to stop the high rate of bit rot in 5.1 support. In particular,
 support for callbacks with double-colons to indicate static method calls:

 call_user_func( 'Foo::bar' )

 was added in PHP 5.2.3. Developers often use these, and don't realise
 that they are breaking PHP 5.1 support. So I think there's a good
 argument for making 5.2.3 the minimum.


+1 here. The a::b syntax is fewer keystrokes than having to use an array.
It also lets us remove the stupid hack from r68760[0] (and probably similar
things elsewhere in the code).

 Another example of bit rot: the trunk has 3 calls to
 array_fill_keys(), with no simulation in GlobalFunctions.php; it was
 added in 5.2.0. Developers should really check the versions in the
 manual when they use a function, otherwise 5.2.x will soon be broken
 as well, in favour of 5.3.x. But in theory we can weed out calls to
 newly-added functions with grep. The 5.2.3 callback change was more
 subtle.


Other reasons 5.2 is cool:
- setcookie() allows httponly cookies (we conditionally support this)
- __toString() works properly
- Memory management improved
- Lots of other stuff here [1]

The consensus last time we brought this up (November) was fairly
strong that we can start phasing out 5.1 support. After talking again
on IRC with people today, I think we can safely break 5.1 in trunk
(although let's not backport it).

Once the 1.17 release is out, we should find a way to better update
[2] so we can indicate that 1.17 will be the last release with 5.1
support.

-Chad

[0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/68760
[1] http://php.net/migration52
[2] http://www.mediawiki.org/wiki/Manual:Installation_requirements



Re: [Wikitech-l] Minimum PHP now 5.2 in trunk (was: [Mediawiki-l] about requiring PHP 5.2)

2011-01-18 Thread Soxred93

On Jan 18, 2011, at 2:00 PM, Chad wrote:

 +1 here. a::b syntax is less keystrokes having to use an array. Also
 lets us remove the stupid hack from r68760[0] (probably similar things
 elsewhere in the code)

Can't forget MWFunction::callArray, which is also a hack intended to work
around PHP 5.1's incompatibility.

-X!




[Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Magnus Manske
I get long wait times for images from Commons (10sec even for small
thumbnails). The browsers (Commons, Firefox) also seem to load the
same image repeatedly, even though it should be in cache.

I'm in the UK. Everything else seems normal (well, I'm in the UK, but
otherwise... ;-).



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Krinkle
On 18 Jan 2011, at 22:10, Magnus Manske wrote:

 I get long wait times for images from Commons (10sec even for small
 thumbnails). The browsers (Commons, Firefox) also seem to load the
 same image repeatedly, even though it should be in cache.

 I'm in the UK. Everything else seems normal (well, I'm in the UK, but
 otherwise... ;-).


I have noticed the same. Small images (16x16) used in the WikiEditor
(not the default buttons but those added by a user script) sometimes
don't appear until half a minute after the page is loaded.

--
Krinkle



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Magnus Manske
On Tue, Jan 18, 2011 at 9:15 PM, Krinkle <krinklem...@gmail.com> wrote:
 On 18 Jan 2011, at 22:10, Magnus Manske wrote:

 I get long wait times for images from Commons (10sec even for small
 thumbnails). The browsers (Commons, Firefox) also seem to load the
 same image repeatedly, even though it should be in cache.

 I'm in the UK. Everything else seems normal (well, I'm in the UK, but
 otherwise... ;-).


 I have noticed the same. Small images (16x16) used in the WikiEditor
 (not the default buttons but those added by a user script) sometimes
 don't appear until half a minute after the page is loaded.

Maybe the server-side caching time is too short, and all images are
requested all the time, slowing the server?



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Ryan Kaldari
Same in the U.S. Seeing tons of missing thumbnails, although they do 
seem to load after a few minutes.

Ryan Kaldari

On 1/18/11 1:10 PM, Magnus Manske wrote:
 I get long wait times for images from Commons (10sec even for small
 thumbnails). The browsers (Commons, Firefox) also seem to load the
 same image repeatedly, even though it should be in cache.

 I'm in the UK. Everything else seems normal (well, I'm in the UK, but
 otherwise... ;-).



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Ryan Lane
On Tue, Jan 18, 2011 at 1:10 PM, Magnus Manske
<magnusman...@googlemail.com> wrote:
 I get long wait times for images from Commons (10sec even for small
 thumbnails). The browsers (Commons, Firefox) also seem to load the
 same image repeatedly, even though it should be in cache.

 I'm in the UK. Everything else seems normal (well, I'm in the UK, but
 otherwise... ;-).


Thankfully, I do read wikitech-l often, but please, everyone, report
this on IRC as well if possible.

Thank you for the report, btw. We're looking into it now.

Respectfully,

Ryan Lane



Re: [Wikitech-l] Category sorting and first letters

2011-01-18 Thread Maciej Jaros
Maciej Jaros (2011-01-18 15:42):
 Tim Starling (2011-01-18 02:03):
 On 18/01/11 07:41, Amir E. Aharoni wrote:
 2011/1/17 Tim Starling <tstarl...@wikimedia.org>:
 * It automatically drops accents, since accented letters sort the same
 as unaccented letters (at the primary level).
 How locale aware is it? For example, in Swedish accented letters come
 at the end of the alphabet and in Lithuanian I, Į and Y are collated
 together as if they were one letter. There are many quirks of this
 kind in other languages.
 It's not locale-aware. As I said, it's a compromise collation. I was
 hoping that other people might be interested in adding support for
 specific locales, that's part of the reason for my post. ICU supports
 lots of different locales, and there is locale-specific collation data
 in the CLDR.

 And i don't know what to do when in the Lithuanian Wikipedia you sort
 names of places in the UK - should Islington come before or after
 York?
 Before.

 $collator = new Collator('lt')
 print $collator->compare( 'Islington', 'York' )
 -1

 But more interestingly, York goes before London:

 print $collator->compare( 'York', 'London' )
 -1

 I think attempting to do it any other way would be a lot of trouble,
 and not what is wanted anyway. To put the question another way: on the
 English Wikipedia, should Kybartai sort before Klaipėda? I would think
 not.
 I've seen sorting accent insensitive and so for example Bańka would be
 sorted as if it was Banka, but I haven't yet seen phone insensitive or
 whatever you call it. What I mean is in Poland rz is pronounced the
 same (almost the same) as ż, but rz is nowhere near ż when it
 comes to sorting. In fact it would be very counter intuitive for me (as
 would be 'York' < 'London'). I think it would not be helpful especially
 for foreigners. I've also said that I've _seen_ accent insensitive
 dictionaries, but _most_ are case sensitive and so ą > a not ą=a
 also when it comes to the first letter all dictionaries I know have Ż
 separate from Z. You might see our collation as - without accent first
 and with accent second. /This is the why we say are ABC. And it would be
 intuitive for to have English collation by it's ABC with Y coming just
 before Z./

Sorry, sometimes I type phonetically :-). The last sentences were 
supposed to be:

This is the way we say our ABC. And it would be intuitive for me to have 
English collation by its ABC with Y coming just before Z.


 I think the problem should only be solved for letters which are not just
 Latin character + accent. How to sort them in Latin (and Latin based)
 characters.

 Regards,
 Nux.



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Magnus Manske
On Tue, Jan 18, 2011 at 10:23 PM, Ryan Lane <rlan...@gmail.com> wrote:
 Thankfully, I do read wikitech-l often, but please, please, report
 this on IRC as well, if possible everyone.

I'm not doing much on IRC, and apparently I'm not alone in that here.

Assuming that someone reporting a problem to wikitech-l will start a
new thread (as I did), how about automatically posting each new thread's
subject line from wikitech-l on IRC?

(here I am again, throwing tech solutions at social problems...)
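
The "new threads only" filter suggested above could be sketched roughly as follows. This is a minimal Python illustration, not an existing bot: `REPLY_PREFIX`, `LIST_TAG` and `new_thread_subject` are hypothetical names, and it assumes that replies can be recognised purely by a leading "Re:" or "Fwd:" marker in the subject line.

```python
import re
from typing import Optional

# One or more leading reply/forward markers such as "Re:", "RE:" or "Fwd:".
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

# List tag that the mailing list software adds to every subject (assumed).
LIST_TAG = "[Wikitech-l]"


def new_thread_subject(subject: str) -> Optional[str]:
    """Return the cleaned subject for a new thread, or None for a reply."""
    if REPLY_PREFIX.match(subject):
        return None  # a reply to an existing thread: do not announce it
    return subject.replace(LIST_TAG, "", 1).strip()


for line in [
    "[Wikitech-l] FYI : Commons thumbnails slow",
    "Re: [Wikitech-l] FYI : Commons thumbnails slow",
]:
    print(repr(new_thread_subject(line)))
# -> 'FYI : Commons thumbnails slow'
# -> None
```

A filter like this would announce one line per thread rather than one per message, but it still cannot tell an outage report from a development discussion.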

Cheers,
Magnus



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Ryan Lane
 I'm not doing much on IRC, and apparently I'm not alone with that here.

 Assuming that someone reporting a problem to wikitech-l will start a
 new thread (as I did), how about auto-posting each new thread subject
 line from wikitech-l on IRC automatically?

 (here I am again, throwing tech solutions at social problems...)


Almost all threads on wikitech-l are dev related, not ops. I happen to
do both, so I noticed it. Pushing all thread subjects from wikitech-l
to ops folks would pretty much be spam.

Either way, a report is better than no report ;).

Respectfully,

Ryan Lane



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Aryeh Gregor
On Mon, Jan 17, 2011 at 9:12 PM, Roan Kattouw <roan.katt...@gmail.com> wrote:
 Wikimedia doesn't technically use delta compression. It concatenates a
 couple dozen adjacent revisions of the same page and compresses that
 (with gzip?), achieving very good compression ratios because there is
 a huge amount of duplication in, say, 20 adjacent revisions of
 [[Barack Obama]] (small changes to a large page, probably a few
 identical versions due to vandalism reverts, etc.).

We used to do this, but the problem was that many articles are much
larger than the compression window of typical compression algorithms,
so the redundancy between adjacent revisions wasn't helping
compression except for short articles.  Tim wrote a diff-based history
storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
deployed it on Wikimedia, for 93% space savings:

http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html

I don't know if this was ever deployed to all of external storage,
though.  In that thread Tim mentioned only recompressing about 40% of
revisions, and said that the recompression script required care and
human attention to work correctly, so maybe he never got around to
recompressing all the rest -- I don't think he ever said, that I saw.
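
The compression-window effect described above can be reproduced with a small experiment. This is a minimal Python sketch under stated assumptions, not how DiffHistoryBlob is implemented: it uses zlib (whose DEFLATE window is 32 KiB) and Python's difflib, a made-up article of about 60 KB of random text, and 20 revisions that each change one line. All names in it are hypothetical.

```python
import difflib
import random
import zlib

rng = random.Random(0)  # fixed seed so that the run is reproducible


def random_line(width: int = 60) -> str:
    """One line of hard-to-compress filler text."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(width))


def text(lines):
    """Serialise a revision (a list of lines) to bytes."""
    return "\n".join(lines).encode("utf-8")


# Hypothetical article: about 60 KB, larger than DEFLATE's 32 KiB window.
revision = [random_line() for _ in range(1000)]
revisions = [revision]
for _ in range(19):
    revision = list(revision)
    revision[rng.randrange(len(revision))] = random_line()  # one small edit
    revisions.append(revision)

# Strategy 1: concatenate adjacent revisions, then compress everything.
concat_size = len(zlib.compress(b"".join(text(r) for r in revisions), 9))

# Strategy 2: keep the first revision in full, then only line diffs.
parts = [text(revisions[0])]
for old, new in zip(revisions, revisions[1:]):
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    parts.append("\n".join(diff).encode("utf-8"))
diff_size = len(zlib.compress(b"\n".join(parts), 9))

print(f"concatenated: {concat_size} bytes, diff-based: {diff_size} bytes")
```

Because each revision starts more than 32 KiB after the previous one, the compressor cannot see the earlier copy, so the concatenated blob costs roughly as much as 20 separately compressed articles. The diff-based blob costs about one article plus a few short diffs.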



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Roan Kattouw
2011/1/19 Aryeh Gregor <simetrical+wikil...@gmail.com>:
 We used to do this, but the problem was that many articles are much
 larger than the compression window of typical compression algorithms,
 so the redundancy between adjacent revisions wasn't helping
 compression except for short articles.  Tim wrote a diff-based history
 storage method (see DiffHistoryBlob in includes/HistoryBlob.php) and
 deployed it on Wikimedia, for 93% space savings:

 http://lists.wikimedia.org/pipermail/wikitech-l/2010-March/047231.html

That's right, I forgot about that.

 I don't know if this was ever deployed to all of external storage,
 though.  In that thread Tim mentioned only recompressing about 40% of
 revisions, and said that the recompression script required care and
 human attention to work correctly, so maybe he never got around to
 recompressing all the rest -- I don't think he ever said, that I saw.

I think he finished recompressing a couple of months ago.

Roan Kattouw (Catrope)



Re: [Wikitech-l] From page history to sentence history

2011-01-18 Thread Alex Brollo
It seems a completely different topic, but is there something to learn about
text storage from the smart trick used for storing TeX formulas? I did a
little reverse engineering on that algorithm; I never found any useful
application for it, but it was a lot of fun. :-)

Alex


Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread Krinkle
On 19 Jan 2011, at 00:11, Ryan Lane wrote:

 I'm not doing much on IRC, and apparently I'm not alone with that here.

 Assuming that someone reporting a problem to wikitech-l will start a
 new thread (as I did), how about auto-posting each new thread subject
 line from wikitech-l on IRC automatically?

 (here I am again, throwing tech solutions at social problems...)


 Almost all threads on wikitech-l are dev related, not ops. I happen to
 do both, so I noticed it. Pushing all thread subjects from wikitech-l
 to ops folks would pretty much be spam.

 Either way, a report is better than no report ;).

 Respectfully,

 Ryan Lane

There's a bot in #wikimedia-toolserver reporting activity on
[toolserver-l]; I'll see if I can get a similar thing going for
[wikitech-l]. In which channel would we want this, though?

I assume #wikimedia-tech, although, given what Ryan said about dev vs. ops,
perhaps #wikimedia-dev would be better (or both?)

--
Krinkle



Re: [Wikitech-l] FYI : Commons thumbnails slow

2011-01-18 Thread MZMcBride
Krinkle wrote:
 There's a bot in #wikimedia-toolserver reporting activity on [toolserver-l],
 I'll see if I can get a similar thing going for [wikitech-l]. In which channel
 would we want this though ?
 
 I assume #wikimedia-tech although, like Ryan said about dev/ops related,
 perhaps better in #wikimedia-dev (or both?)

I run the bot you're talking about in #wikimedia-toolserver. Her name is
Reba and she's a fine and mostly reliable lady.

I don't imagine anyone wants a bot in #wikimedia-tech or #wikimedia-dev.
It's noisy and largely pointless (spamming every reply to a thread about
threatening to rewrite the parser to an IRC channel doesn't help anything or
anyone). We need a better system for (power-)users to report site problems.
Something that doesn't involve a mailing list, but something that likely has
an IRC component (given that most of the ops idle there). A clean web UI <-->
IRC system could possibly work, but any system like that is open to abuse
and misuse.

MZMcBride


