Re: [Wikitech-l] Syntax-highlighting JS & CSS code editor gadget embedding Ace
Just came across jsfiddle ( http://jsfiddle.net/ ) via d3 ( https://github.com/mbostock/d3 ): http://jsfiddle.net/mbostock/EVnvj/ . It uses CodeMirror ( http://codemirror.net/ ).

-J

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: 13 April 2011 18:31
To: Wikimedia developers
Subject: Re: [Wikitech-l] Syntax-highlighting JS & CSS code editor gadget embedding Ace

> On Wed, Apr 13, 2011 at 7:23 AM, Michael Dale <md...@wikimedia.org> wrote:
>> Very cool. Especially given the development trajectory of Ace to become the "Eclipse of web IDEs", there will be a lot of interesting possibilities, as we could develop our own MediaWiki-centric plugins for the platform. I can't help but think about where this is ideally headed ;) A Gitorious-type system for easy branching, with mediawiki.org code-review-style tools and in-browser editing. With seamless workflows for going from per-user developing and testing on the live site, to commits to your personal repository, to being reviewed and tested by other developers, to being enabled by interested users, to being enabled by default if so desired.
>
> [snip lots of awesome sauce]
>
> I, for one, welcome our new integrated development overlords! :D
>
> I started up a page of notes and smaller steps on the road to awesomeness, which we can start expanding on: http://www.mediawiki.org/wiki/Gadget_Studio
>
> The main things I want to hit in the immediate future are syntax highlighting (including clear detection of parse errors, which I don't think Ace does yet) for editing gadgets and site & user scripts. For the upcoming parser stuff we'll want to do lots of experiments, and rapid prototyping the JavaScript-side implementations seems like a good way to get stuff into preliminary testing quickly, so being able to tweak code and immediately re-run it on something is going to be nice.
>
> I like the idea of being able to work on a core or extension JS module in-place too, though; that could be interesting. :) Not everything will be amenable to being reloaded in the middle of a page view, but things that are could benefit from that kind of testing turnover.
>
> -- brion
Re: [Wikitech-l] Using MySQL as a NoSQL
Hi,

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Domas Mituzas
Sent: 24 December 2010 09:09
To: Wikimedia developers
Subject: Re: [Wikitech-l] Using MySQL as a NoSQL

> Hi!
>
>> A: It's easy to get fast results if you don't care about your reads being atomic (*), and I find it hard to believe they've managed to get atomic reads without going through MySQL.
>
> MySQL upper layers know nothing much about transactions; it is all engine-specific. BEGIN and COMMIT processing is deferred to table handlers. It would be incredibly easy for them to implement repeatable-read snapshots :) (if that's what you mean by "atomic read").

It seems from my tinkering that MySQL query cache handling is circumvented via HandlerSocket. So if you update/insert/delete via HandlerSocket, then query via SQL, you're not guaranteed to see the changes unless you use SQL_NO_CACHE.

>> (*) Among other possibilities, just use MyISAM.
>
> How is that applicable to any discussion?
>
> Domas

Jared
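To make the caveat concrete, a minimal sketch using mysqli; the table, credentials and the preceding HandlerSocket write are placeholders:

    <?php
    // Assume the row (k = 1) was just updated via HandlerSocket, which
    // bypasses the SQL layer and so never invalidates the query cache.
    $db = new mysqli( 'localhost', 'user', 'pass', 'test' );

    // May be served from a stale, previously cached result set.
    $stale = $db->query( "SELECT v FROM t WHERE k = 1" );

    // Skips the query cache and reads the current value from the engine.
    $fresh = $db->query( "SELECT SQL_NO_CACHE v FROM t WHERE k = 1" );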
Re: [Wikitech-l] Using MySQL as a NoSQL
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Domas Mituzas
Sent: 24 December 2010 13:42
To: Wikimedia developers
Subject: Re: [Wikitech-l] Using MySQL as a NoSQL

> Hi!
>
>> It seems from my tinkering that MySQL query cache handling is circumvented via HandlerSocket.
>
> On busy systems (I assume we talk about busy systems, as the discussion is about HS) the query cache is usually eliminated anyway. Either by compiling it out, or by patching the code not to use qcache mutexes unless it really, really is enabled. In the worst case, it is just simply disabled. :)
>
>> So if you update/insert/delete via HandlerSocket, then query via SQL, you're not guaranteed to see the changes unless you use SQL_NO_CACHE.
>
> You are probably right. Again, nobody cares about the qcache at those performance boundaries.
>
> Domas

Ah, interesting. The only reason I took a look at it was because you don't have to faff about with encoding/escaping values* the way you have to with SQL. SQL injection vulnerabilities don't exist.

* And the protocol handles binary values, which normally take some faffing about to get in and out of MySQL with the various PHP APIs.

Does seem a bit specialised; it could make a persistent cache, maybe as a session handler.

Jared
Re: [Wikitech-l] Using MySQL as a NoSQL
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Jared Williams
Sent: 24 December 2010 16:18
To: 'Wikimedia developers'
Subject: Re: [Wikitech-l] Using MySQL as a NoSQL

> [snip: the exchange with Domas, quoted in full in the previous message]
>
> Ah, interesting. The only reason I took a look at it was because you don't have to faff about with encoding/escaping values* the way you have to with SQL. SQL injection vulnerabilities don't exist.
>
> * And the protocol handles binary values, which normally take some faffing about to get in and out of MySQL with the various PHP APIs.
>
> Does seem a bit specialised; it could make a persistent cache, maybe as a session handler.

Maybe a session handler even.

Jared
Re: [Wikitech-l] New password hashing proposal
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Tim Starling
Sent: 19 August 2010 07:37
To: wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] New password hashing proposal

> It's been said (e.g. [1]) that hashing passwords with two rounds of MD5 is basically a waste of time these days, because brute-forcing even relatively long passwords is now feasible with cheap hardware. Indeed, you can buy software [2] which claims to be able to check 90 million MediaWiki passwords per second on an ordinary GPU. That would let you crack a random 8-letter password in 20 minutes.
>
> So the time has probably come for us to come up with a C-type password hashing scheme, to replace the B-type hashes that we use at the moment. I've been thinking along the lines of the following goals:
>
> 1. Future-proof: should be adaptable to faster hardware.
> 2. Upgradeable: it should be possible to compute the C-type hash from the B-type hash, to allow upgrades without bothering users.
> 3. Efficient in PHP, with default configure options.
> 4. MediaWiki-specific, so that generic software can't be used to crack our hashes.
>
> The problem with the standard key-strengthening algorithms, e.g. PBKDF1, is that they are not efficient in PHP. We don't want a C implementation of our scheme to be orders of magnitude faster than our PHP implementation, because that would allow brute-forcing to be more feasible than is necessary. The idea I came up with is to hash the output of str_repeat(). This increases the number of rounds of the compression function, while avoiding tight loops in PHP code.
>
> PHP's hash extension has been available by default since PHP 5.1.2, and we can always fall back to using B-type hashes if it's explicitly disabled. The WHIRLPOOL hash is supported. It has no patent or copyright restrictions, so it's not going to be yanked out of Debian or PHP for legal reasons. It has a 512-bit block size, the largest of any hash function available in PHP, and its security goals state that it can be truncated without compromising its properties.
>
> My proposed hash function is a B-type MD5 salted hash, which is then further hashed with a configurable number of invocations of WHIRLPOOL, with a 256-bit substring taken from a MediaWiki-specific location. The input to each WHIRLPOOL operation is expanded by a factor of 100 with str_repeat(). The number of WHIRLPOOL iterations is specified in the output string as a base-2 logarithm (whimsically padded out to 3 decimal digits to allow for future universe-sized computers). This number can be upgraded by taking the hash part of the output and applying more rounds to it.
>
> A count of 2^7 = 128 gives a time of 55ms on my laptop, and 12ms on one of our servers, so a reasonable default is probably 2^6 or 2^7.
>
> Demo code: http://p.defau.lt/?udYa5CYhHFrgk4SBFiTpGA
>
> Typical output:
>
>     :C:007:187aabf399e25aa1:9441ccffe8f1afd8c277f4d914ce03c6fcfe157457596709d846ff832022b037
>
> -- Tim Starling
>
> [1] http://www.theregister.co.uk/2010/08/16/password_security_analysis/
> [2] http://www.insidepro.com/eng/egb.shtml

PHP's crypt() has been upgraded in recent times to include Ulrich Drepper's SHA-crypt [1]. Certainly meets 1 & 3.

[1] http://www.akkadia.org/drepper/SHA-crypt.txt

Jared
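Since the demo link above is dead, here is a minimal sketch of the scheme as Tim describes it. The salt/password mixing follows MediaWiki's existing B-type hash; the substring offset is an arbitrary stand-in for the "MediaWiki-specific location":

    <?php
    function hashTypeC( $password, $salt, $logRounds = 7 ) {
        // Start from the existing B-type hash, per goal 2 (upgradeability).
        $hash = md5( $salt . '-' . md5( $password ) );
        for ( $i = 0; $i < ( 1 << $logRounds ); $i++ ) {
            // Expand the input 100x so most of the time is spent in the
            // compression function rather than in PHP bytecode.
            $hash = hash( 'whirlpool', str_repeat( $hash, 100 ) );
        }
        // 256-bit (64 hex digit) substring; the offset 32 is an assumption.
        $hash = substr( $hash, 32, 64 );
        return sprintf( ':C:%03d:%s:%s', $logRounds, $salt, $hash );
    }

    echo hashTypeC( 'hunter2', '187aabf399e25aa1' );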
Re: [Wikitech-l] hiphop progress
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Domas Mituzas
Sent: 01 March 2010 10:11
To: Wikimedia developers
Subject: [Wikitech-l] hiphop progress

> Howdy,
>
>> Most of the code in MediaWiki works just fine with it (since most of it is mundane) but things like dynamically including certain files, declaring classes, eval() and so on are all out.
>
> There're two types of includes in MediaWiki, ones I fixed for AutoLoader and ones I didn't; HPHP has all classes loaded, so AutoLoader is redundant. Generally, every include that just defines classes/functions is fine with HPHP; it is just some of MediaWiki's startup logic (Setup/WebStart) that depends on files included in a certain order, so we have to make sure HipHop understands those includes. There was some different behaviour with file including: in Zend you can say require("File.php") and it will try the current script's directory, but if you do require("../File.php") - it will
>
> We don't have any eval() at the moment, and actually there's a mode where eval() works; people are just scared too much of it.
>
> We had some double class definitions (depending on whether certain components are available), as well as double function definitions (ProfilerStub vs Profiler).
>
> One of the major problems is the simply still-incomplete function set that we'd need:
>
> * session - though we could sure work around it by setting up our own Session abstraction; the team at Facebook is already busy implementing full support
> * xdiff, mhash - the only two calls to it are from DiffHistoryBlob, so getting the feature to work is mandatory for production, not needed for testing :)

Mhash has been obsoleted by the hash extension, and HipHop has the hash extension (looking at the src). I think mhash has been implemented as a wrapper over the hash extension for a while (http://svn.php.net/viewvc?view=revision&revision=269961).

    assert(hash('adler32', 'foo', true) === mhash(MHASH_ADLER32, 'foo'));

Jared
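A shim in the spirit of Jared's observation: if mhash() is missing, it can be emulated on top of the hash extension. The constant value and the single mapped algorithm are illustrative only:

    <?php
    if ( !function_exists( 'mhash' ) ) {
        define( 'MHASH_ADLER32', 18 ); // numeric value is illustrative
        function mhash( $algo, $data ) {
            $map = array( MHASH_ADLER32 => 'adler32' );
            // mhash() returned raw binary output, hence the third argument.
            return hash( $map[$algo], $data, true );
        }
    }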
Re: [Wikitech-l] hiphop progress
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Ævar Arnfjörð Bjarmason
Sent: 01 March 2010 13:34
To: Wikimedia developers
Subject: Re: [Wikitech-l] hiphop progress

>> On Mon, Mar 1, 2010 at 10:10, Domas Mituzas <midom.li...@gmail.com> wrote:
>>> Howdy,
>>>
>>> [snip: the status mail quoted in full in the previous message]
>>>
>>> * tidy - have to call the binary now
>>>
>>> function_exists() is somewhat crippled, as far as I understand, so I had to work around certain issues there. There're some other crippled functions, which we hit through the testing... It is quite fun to hit all the various edge cases in the PHP language (e.g. interfaces may have constants) which are broken in hiphop. Good thing is having developers carefully reading/looking at those. Some things are still broken, some can be worked around in MediaWiki. Some of the crashes I hit are quite difficult to reproduce; it is easier to bypass that code for now, and come up with good reproduction cases later.
>>
>> Even if it wasn't, hotspots like the parser could still be compiled with hiphop and turned into a PECL extension.
>
> hiphop provides a major boost for actual MediaWiki initialization too: while Zend has to reinitialize objects and data all the time, having all that in the core process image is quite efficient.
>
>> One other nice thing about hiphop is that the compiler output is relatively readable compared to most compilers. Meaning that if you need to optimize some particular function it's easy to take the generated .cpp output and replace the generated code with something more native to C++ that doesn't lose speed because it needs to manipulate everything as a php object.
>
> That especially helps with debugging :)
>
> Well, that is not entirely true: if it manipulated everything as a PHP object (zval), it would be as slow and inefficient as PHP. The major cost benefit here is that it does strict type inference, and falls back to Variant only when it cannot come up with a decent type. And yes, one can find offending code that causes the expensive paths.
>
> I don't see manual C++ code optimizations as the way to go, though, because they'd be overwritten by the next code build.

The case I had in mind is when you have, say, a function in the parser that takes a $string and munges it. If that turns out to be a bottleneck you could just get a char* out of that $string and munge it at the C level instead of calling the PHP wrappers for things like explode() and other PHP string/array munging. That's some future project once it's working and those bottlenecks are found, though; I was just pleasantly surprised that hphp makes this relatively easy.

I would think that getting HipHop to compile the regular expressions in preg_*() calls out to C++ (like re2c does) would be the idea.

Jared
Re: [Wikitech-l] New phpunit tests eat ~1GB of memory
A guess would be to try PHP 5.3, and enable the garbage collector: http://www.php.net/manual/en/function.gc-enable.php

Jared

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Ævar Arnfjörð Bjarmason
Sent: 06 February 2010 01:05
To: wikitech-l@lists.wikimedia.org
Cc: mediawiki-...@lists.wikimedia.org
Subject: [Wikitech-l] New phpunit tests eat ~1GB of memory

> Since the tests were ported from t/ to phpunit's phase3/maintenance/tests/ in r61938 and other commits, running the tests on my machine takes up to 1GB of memory, and it grows as it runs more tests. It seems that phpunit uses the same instance of the php interpreter for running all the tests. Is there some way around this? Perhaps phpunit.xml could be tweaked so that it runs a new php for each test?
>
> Furthermore, when I run `make test' I get this:
>
>     Time: 03:35, Memory: 1849.25Mb
>
>     There were 2 failures:
>
>     1) LanguageConverterTest::testGetPreferredVariantUserOption
>     Failed asserting that two strings are equal.
>     --- Expected
>     +++ Actual
>     @@ @@
>     -tg-latn
>     +tg
>
>     /home/avar/src/mw/trunk/phase3/maintenance/tests/LanguageConverterTest.php:82
>
>     2) Warning
>     No tests found in class ParserUnitTest.
>
>     FAILURES!
>     Tests: 686, Assertions: 3431, Failures: 2, Incomplete: 34
>
> But when I run phpunit manually on the test then all tests pass:
>
>     $ phpunit LanguageConverterTest.php
>     PHPUnit 3.4.5 by Sebastian Bergmann.
>
>     .........
>
>     Time: 23 seconds, Memory: 23.75Mb
>
>     OK (9 tests, 34 assertions)
>
> Also, after I get "Tests: 686, Assertions: 3431, Failures: 2, Incomplete: 34" in the first output, phpunit doesn't exit and continues hogging my memory. Why is it still running? It has already run all the tests.
>
> On Wed, Feb 3, 2010 at 17:35, <ia...@svn.wikimedia.org> wrote:
>> http://www.mediawiki.org/wiki/Special:Code/MediaWiki/61938
>>
>> Revision: 61938
>> Author: ialex
>> Date: 2010-02-03 17:35:59 +0000 (Wed, 03 Feb 2010)
>>
>> Log Message:
>> * Port tests from t/inc/
>> * Added new tests to XmlTest
>>
>> Added Paths:
>> trunk/phase3/tests/LicensesTest.php
>> trunk/phase3/tests/SanitizerTest.php
>> trunk/phase3/tests/TimeAdjustTest.php
>> trunk/phase3/tests/TitleTest.php
>> trunk/phase3/tests/XmlTest.php
>>
>> Added: trunk/phase3/tests/LicensesTest.php
>> ===================================================================
>> --- trunk/phase3/tests/LicensesTest.php (rev 0)
>> +++ trunk/phase3/tests/LicensesTest.php 2010-02-03 17:35:59 UTC (rev 61938)
>> @@ -0,0 +1,17 @@
>> +<?php
>> +
>> +/**
>> + * @group Broken
>> + */
>> +class LicensesTest extends PHPUnit_Framework_TestCase {
>> +
>> +    function testLicenses() {
>> +        $str = "
>> +* Free licenses:
>> +** GFLD|Debian disagrees
>> +";
>> +
>> +        $lc = new Licenses( $str );
>> +        $this->assertTrue( is_a( $lc, 'Licenses' ), 'Correct class' );
>> +    }
>> +}
>> \ No newline at end of file
>>
>> Property changes on: trunk/phase3/tests/LicensesTest.php
>> ___________________________________________________________________
>> Added: svn:eol-style
>>    + native
>>
>> Added: trunk/phase3/tests/SanitizerTest.php
>> ===================================================================
>> --- trunk/phase3/tests/SanitizerTest.php (rev 0)
>> +++ trunk/phase3/tests/SanitizerTest.php 2010-02-03 17:35:59 UTC (rev 61938)
>> @@ -0,0 +1,71 @@
>> +<?php
>> +
>> +global $IP;
>> +require_once( "$IP/includes/Sanitizer.php" );
>> +
>> +class SanitizerTest extends PHPUnit_Framework_TestCase {
>> +
>> +    function testDecodeNamedEntities() {
>> +        $this->assertEquals(
>> +            "\xc3\xa9cole",
>> +            Sanitizer::decodeCharReferences( '&eacute;cole' ),
>> +            'decode named entities'
>> +        );
>> +    }
>> +
>> +    function testDecodeNumericEntities() {
>> +        $this->assertEquals(
>> +            "\xc4\x88io bonas dans l'\xc3\xa9cole!",
>> +            Sanitizer::decodeCharReferences( "&#x108;io bonas dans l'&#233;cole!" ),
>> +            'decode numeric entities'
>> +        );
>> +    }
>> +
>> +    function testDecodeMixedEntities() {
>> +        $this->assertEquals(
>> +            "\xc4\x88io bonas dans l'\xc3\xa9cole!",
>> +            Sanitizer::decodeCharReferences( "&#x108;io bonas dans l'&eacute;cole!" ),
>> +            'decode mixed numeric/named entities'
>> +        );
>> +    }
>> +
>> +    function testDecodeMixedComplexEntities() {
>> +        $this->assertEquals(
>> +            "\xc4\x88io bonas dans l'\xc3\xa9cole! (mais pas &#x108;io dans l'&eacute;cole)",
>> +            Sanitizer::decodeCharReferences(
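Stepping back from the quoted commit: a minimal illustration of the gc_enable() suggestion at the top of this message, assuming PHP 5.3+:

    <?php
    // Enable the cycle collector so a long-running test process can
    // reclaim objects caught in reference cycles.
    gc_enable();

    // ... run a batch of tests ...

    $freed = gc_collect_cycles(); // force a collection pass
    printf( "collected %d cycles, now using %.2f MB\n",
        $freed, memory_get_usage() / 1048576 );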
Re: [Wikitech-l] sessions in parallel
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of lee worden
Sent: 06 December 2009 23:26
To: Wikimedia developers
Subject: Re: [Wikitech-l] sessions in parallel

>> -----Original Message-----
>> From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of lee worden
>> Sent: 04 December 2009 19:14
>> To: Wikimedia developers
>> Subject: [Wikitech-l] sessions in parallel
>>
>>> Hi - I'm debugging my extension code against potential deadlock conditions, and am having a problem: when I request 2 pages simultaneously in different Firefox tabs, judging by the wfDebug output it seems like the second page request blocks at session_start() and waits until the first page is done. Is it supposed to do this? Does it depend on my configuration?
>>>
>>> Thanks -
>>> Lee Worden
>>> McMaster University
>>
>> If I remember correctly, the PHP "files" session handler has to acquire an exclusive lock on the session file in session_start(), thus blocking your second request until the first is complete.
>>
>> Jared
>
> Thanks! I have to log in as 2 people from 2 different browsers, I guess.
>
> LW

Or configure MediaWiki/PHP to use something like memcached: http://www.mediawiki.org/wiki/Manual:$wgSessionsInMemcached

Jared
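For reference, a minimal LocalSettings.php sketch of that option; the server address is a placeholder:

    <?php
    // Store sessions in memcached instead of files, avoiding the
    // per-session file lock taken by PHP's "files" handler.
    $wgMainCacheType = CACHE_MEMCACHED;
    $wgMemCachedServers = array( '127.0.0.1:11211' );
    $wgSessionsInMemcached = true;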
Re: [Wikitech-l] sessions in parallel
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of lee worden
Sent: 04 December 2009 19:14
To: Wikimedia developers
Subject: [Wikitech-l] sessions in parallel

> Hi - I'm debugging my extension code against potential deadlock conditions, and am having a problem: when I request 2 pages simultaneously in different Firefox tabs, judging by the wfDebug output it seems like the second page request blocks at session_start() and waits until the first page is done. Is it supposed to do this? Does it depend on my configuration?
>
> Thanks -
> Lee Worden
> McMaster University

If I remember correctly, the PHP "files" session handler has to acquire an exclusive lock on the session file in session_start(), thus blocking your second request until the first is complete.

Jared
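A related workaround, not raised in the thread: a request that only reads the session can release the lock early with session_write_close(), letting a parallel request from the same browser proceed. A minimal sketch:

    <?php
    session_start();         // acquires the exclusive lock on the session file
    $userId = $_SESSION['user'];
    session_write_close();   // releases the lock; writes after this are lost

    // ... long-running work that only uses the copied session data ...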
Re: [Wikitech-l] Advice needed
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Max Semenik
Sent: 19 October 2009 21:42
To: Wikimedia developers
Subject: [Wikitech-l] Advice needed

> As you may or may not know, most queries involving a LIKE clause are broken on the SQLite backend.[1] As a measure to fix it, I'm planning to replace all LIKEs with a function call that will provide the needed abstraction. However, I would like it to be convenient to use and provide automatic protection against SQL injection, so instead of something like
>
>     $sql = 'SELECT * FROM table WHERE field' . $db->like( $db->escapeLike( $text ) . '%' );
>
> I'd rather prefer Mr.Z-man's idea of
>
>     $sql = 'SELECT * FROM table WHERE field' . $db->like( $text, MATCH_STRING );
>
> The example patch is at [2], but there is a problem: due to PHP's duck typing, you can have tough times telling a string to be encoded from a constant that indicates a '%' or '_' placeholder. There are a few possible solutions:
>
> * Even comparing with === can't provide enough guarantee for integer constants.
> * We could use tricky float constants such as 3253427569845.236156471, as suggested by Aryeh Gregor, but it looks rather hackish.
> * Alternatively, there could be something like Database::asterisk() that would return unique objects.
>
> Can there be a better way of doing that? And which variant of constant names would you prefer: Mr.Z-man's original LIKE_UNDERSCORE/LIKE_PERCENT, MATCH_CHAR/MATCH_STRING proposed by me, or something else? Please opine.
>
> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=20275
> [2] https://bugzilla.wikimedia.org/attachment.cgi?id=6531&action=diff
>
> -- Max Semenik ([[User:MaxSem]])

I'd personally go with 3 functions, assuming you don't need the full flexibility of LIKE:

    startsWith($prefix) => LIKE '$prefix%'
    endsWith($suffix)   => LIKE '%$suffix'
    contains($infix)    => LIKE '%$infix%'

Looking at the grep results searching for LIKE, it seems like they would cover it.

Jared
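A sketch of those three helpers as they might look; escapeLike() handles the LIKE metacharacters, and addQuotes() stands in for the Database class's normal string quoting:

    <?php
    function escapeLike( $s ) {
        // Escape the escape character itself plus the '%' and '_' wildcards.
        return addcslashes( $s, '\%_' );
    }
    function startsWith( $db, $prefix ) {
        return ' LIKE ' . $db->addQuotes( escapeLike( $prefix ) . '%' );
    }
    function endsWith( $db, $suffix ) {
        return ' LIKE ' . $db->addQuotes( '%' . escapeLike( $suffix ) );
    }
    function contains( $db, $infix ) {
        return ' LIKE ' . $db->addQuotes( '%' . escapeLike( $infix ) . '%' );
    }

    // e.g. $sql = 'SELECT * FROM table WHERE field' . startsWith( $db, $text );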
Re: [Wikitech-l] MediaWiki memory usage
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Andrew Garrett
Sent: 04 October 2009 23:07
To: m...@aifb.uni-karlsruhe.de; Wikimedia developers
Subject: Re: [Wikitech-l] MediaWiki memory usage

> On 04/10/2009, at 7:27 PM, Markus Krötzsch wrote:
>
>> == How to do memory profiling? ==
>>
>> I tried to enable PHP memory profiling in xdebug, but all I got was time data, and I gave up on this for now. The aggregated outputs in profileinfo.php were not very useful for me either; in particular, I think that they do not take garbage collection into account, i.e. they only show new memory allocations, but not the freeing of old memory. So one piece of code may allocate 20M but never need more than 4M at a time, while another consumes the same amount and keeps it due to some memory leak. Especially, the sums and percentages apparently do not show the real impact that a piece of code has on PHP's memory use.
>>
>> So I based my memory estimations on the minimal PHP memory limit that would not return a blank page when creating a page preview (ini_set('memory_limit',...)). This measure is rather coarse for debugging, but it might be the one number that matters most to the user. The results were reproducible.
>
> For future reference, you can use the memory_get_usage function: http://php.net/manual-lookup.php?pattern=get+memory+usage&lang=en

The Memtrack extension is probably more useful: http://www.php.net/manual/en/intro.memtrack.php

Jared
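For comparison, the built-in counters are enough for coarse before/after numbers; the parse call here is a stand-in workload:

    <?php
    $before = memory_get_usage();
    $output = $wgParser->parse( $text, $wgTitle, $options ); // stand-in workload
    printf( "delta: %.2f MB, peak: %.2f MB\n",
        ( memory_get_usage() - $before ) / 1048576,
        memory_get_peak_usage() / 1048576 );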
Re: [Wikitech-l] Proposal for editing template calls within pages
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Dmitriy Sintsov
Sent: 25 September 2009 07:01
To: Wikimedia developers
Subject: Re: [Wikitech-l] Proposal for editing template calls within pages

> * Aryeh Gregor <simetrical+wikil...@gmail.com> [Thu, 24 Sep 2009 15:40:46 -0400]:
>> Templates and refs are by far the worst offenders, for sticking tons of content in the page that doesn't have any obvious relationship to the actual content. Getting rid of them would be a huge step forward. But stuff like '''bold''' and ==headings== are also a real problem.
>
> What's complex in '''bold''' and ==headings==? Here, when we installed the wiki for local historical records at the local Russian university, the humanities people got to understand such things really quickly. An MSc or PhD in History cannot be that stupid.. To me it looks like you are overstating the complexity of the wikitext. But yes, they do call technical staff for complex cases; it happens _rarely_. Historical records are mostly just plain text with links and occasional pictures.

The problem is the ambiguity of italics (''italics''). The current parser doesn't really make its final decision on what should be bold or what should be italic until it hits a newline. If there is an even number of both bold and italic markers, it assumes it interpreted the line correctly. However, if there is an uneven number of bold & italic markers, it starts searching for where it could have misinterpreted something. I think this is part of what makes wikitext impossible to describe in a formal grammar.

Jared
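A toy version of the even/odd heuristic Jared describes; the real logic lives in MediaWiki's Parser::doQuotes() and handles several more cases:

    <?php
    // Count apostrophe runs on one line of wikitext. The parser only
    // commits to an interpretation once the whole line has been seen
    // and the bold/italic toggle counts can be checked for balance.
    preg_match_all( "/'{2,}/", $line, $m );
    $bold = $italic = 0;
    foreach ( $m[0] as $run ) {
        $n = strlen( $run );
        if ( $n == 2 ) {
            $italic++;
        } elseif ( $n == 3 || $n == 4 ) {
            $bold++;           // '''' counts as bold plus a literal apostrophe
        } else {
            $bold++;           // ''''' (or longer) toggles both
            $italic++;
        }
    }
    // Only when both counts are even can the line be taken at face value;
    // otherwise the parser revisits it, treating one likely run (e.g. one
    // preceded by a letter, as in "L'''arc") as containing a plain apostrophe.
    $balanced = ( $bold % 2 == 0 ) && ( $italic % 2 == 0 );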
Re: [Wikitech-l] Proposal for editing template calls within pages
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Dmitriy Sintsov
Sent: 25 September 2009 11:09
To: Wikimedia developers
Subject: Re: [Wikitech-l] Proposal for editing template calls within pages

> * Jared Williams <jared.willia...@ntlworld.com> [Fri, 25 Sep 2009 10:49:54 +0100]:
>> The problem is the ambiguity of italics (''italics''). The current parser doesn't really make its final decision on what should be bold or what should be italic until it hits a newline. If there is an even number of both bold and italic markers, it assumes it interpreted the line correctly. However, if there is an uneven number of bold & italic markers, it starts searching for where it could have misinterpreted something.
>
> Shouldn't these cases be considered a syntax error? How common is it on Wikipedia to have uneven numbers of occurrences of that? Is there any use for that (weird templates)?
>
>> I think this is part of what makes wikitext impossible to describe in a formal grammar.
>
> Let's assume an odd occurrence of ''' will be converted to <wmf:bold> and an even occurrence of ''' to </wmf:bold> (begin/end of the node)? A non-paired occurrence would simply cause an XML parsing error: there should not be an uneven number of '' or '''.
>
> Dmitriy

The problem is that apostrophes are also valid as part of the textual content, so you could not have italics immediately before or after an apostrophe. As in

    L'''arc de triomphe''

which the current parser resolves to

    L'<i>arc de triomphe</i>

Jared
Re: [Wikitech-l] Proposal for editing template calls within pages
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Roan Kattouw
Sent: 25 September 2009 23:39
To: Wikimedia developers
Subject: Re: [Wikitech-l] Proposal for editing template calls within pages

> 2009/9/25 Platonides <platoni...@gmail.com>:
>> Those descriptions will have to be edited by the same user base that edits all other pages. Even if they are power users, it's not easy to write correct XML in the wiki textarea. We would need to create an editor for the language being created so a template editor can be made.
>
> Since the XML file describes the template, it need only be changed when the template is changed. Realistically, newbie editors don't edit templates; anyone skilled enough to edit templates can handle some simple XML.
>
>> I advocate for a simpler syntax for form definition (but we shouldn't reinvent wikitext along the way).
>
> Exactly. XML is a decent choice here because it has a well-defined, pre-existing grammar with parsers already available, which means it's easy to parse and easy to learn (assuming you've got some shred of a technical background; see my earlier point about newbies not editing templates).
>
> Roan Kattouw (Catrope)

One thing I think might be missing from the example template description is all the implicit parameters it depends on, like language.

Jared
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 25 September 2009 23:01
To: Wikimedia developers
Subject: Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)

> On Fri, Sep 25, 2009 at 3:46 PM, Steve Sanbeg <ssan...@ask.com> wrote:
>> I'm not sure that's entirely accurate. XSLT works on DOM trees, so malformed XML shouldn't really apply. Of course, the standard command line processors create this tree with a standard parser, usually an XML parser. But in PHP, creating the DOM with a parser and transforming it with XSLT are handled separately.
>
> Interesting. In that case, theoretically, you could use an HTML5 parser, which is guaranteed to *always* produce a DOM even on random garbage input (much like wikitext!). Now, who's up for writing an HTML5 parser in PHP whose performance is acceptable? I thought not. :P

libxml2, and therefore PHP, has a tag-soup HTML 4 parser: DOMDocument::loadHTML() ( http://xmlsoft.org/html/libxml-HTMLparser.html ).

Jared
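A minimal demonstration: even unbalanced markup yields a usable DOM tree (the '@' suppresses the parser's warnings about the bad input):

    <?php
    $doc = new DOMDocument();
    @$doc->loadHTML( '<p>unclosed <b>tag soup' );
    echo $doc->saveHTML();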
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 24 September 2009 15:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)

> On Thu, Sep 24, 2009 at 4:41 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
>> * Removes a few RTTs for non-pipelining clients
>
> Do you mean to imply that there's such a thing as a pipelining client on the real web? (Okay, okay, Opera.) This concern seems like it outweighs all the others put together pretty handily -- especially for script files that aren't at the end, which block page loading.
>
>> * Automatically create CSS sprites?
>
> That would be neat, but perhaps a bit tricky.

Just trying to think how it'd work. Given a CSS selector and an image, we should be able to construct a stylesheet which sets the background property of the CSS rules, plus a single combined image:

    (#toolbar-copy, toolbar-copy.png)
    (#toolbar-copy:hover, toolbar-copy-hover.png)

And the generated stylesheet would get concatenated with other stylesheets.

Jared
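A hypothetical builder for those (selector, image) pairs, stacking the PNGs vertically with GD and emitting the matching CSS; transparency handling and caching are omitted:

    <?php
    function buildSprite( array $map, $out ) {
        // $map is selector => PNG path; string keys survive array_map() here.
        $imgs = array_map( 'imagecreatefrompng', $map );
        $w = max( array_map( 'imagesx', $imgs ) );
        $h = array_sum( array_map( 'imagesy', $imgs ) );
        $sprite = imagecreatetruecolor( $w, $h );
        $css = '';
        $y = 0;
        foreach ( $imgs as $selector => $img ) {
            imagecopy( $sprite, $img, 0, $y, 0, 0, imagesx( $img ), imagesy( $img ) );
            // Shift the combined image up so only this icon shows through.
            $css .= sprintf( "%s { background: url(%s) 0 -%dpx no-repeat; }\n",
                $selector, $out, $y );
            $y += imagesy( $img );
        }
        imagepng( $sprite, $out );
        return $css;
    }

    echo buildSprite( array(
        '#toolbar-copy'       => 'toolbar-copy.png',
        '#toolbar-copy:hover' => 'toolbar-copy-hover.png',
    ), 'toolbar-sprite.png' );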
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Trevor Parscal
Sent: 24 September 2009 19:38
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)

> On 9/24/09 9:31 AM, Jared Williams wrote:
>>> * Automatically create CSS sprites?
>>>
>>> That would be neat, but perhaps a bit tricky.
>>
>> Just trying to think how it'd work. Given a CSS selector and an image, we should be able to construct a stylesheet which sets the background property of the CSS rules, plus a single combined image:
>>
>>     (#toolbar-copy, toolbar-copy.png)
>>     (#toolbar-copy:hover, toolbar-copy-hover.png)
>>
>> And the generated stylesheet would get concatenated with other stylesheets.
>
> Again, I like sprites a lot! But in reality, they are an optimization technique that needs careful attention and can cause problems if done improperly.

Providing CSS sprite support would be (I guess) just a service for modules/extensions to use, as part of the proposed client resource manager(?). So MediaWiki or an extension can put in a request for some stylesheet or JavaScript to be linked to; it could also request images, possibly via CSS sprites. So I don't see how it should cause a problem.

Jared
Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Trevor Parscal
Sent: 24 September 2009 21:49
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] JS2 design (was Re: Working towards branching MediaWiki 1.16)

> On 9/24/09 1:40 PM, Jared Williams wrote:
>
> [snip: the earlier sprite exchange, quoted in full in the previous message]
>
>> Providing CSS sprite support would be (I guess) just a service for modules/extensions to use, as part of the proposed client resource manager(?). So MediaWiki or an extension can put in a request for some stylesheet or JavaScript to be linked to; it could also request images, possibly via CSS sprites. So I don't see how it should cause a problem.
>
> So you are saying that you believe a generic set of sprite-generation utilities is going to be able to completely overcome the issues I identified, and be a better use of time (to design, develop and use) than just creating and using sprites manually?
>
> - Trevor

I wouldn't say there are issues with CSS sprites, but there are limitations which you have to be aware of before deciding to use them, and which therefore do not need overcoming. In the context of providing toolbar imagery for UIs like a WYSIWYG editor, or for playing video or audio, or for simple image editing, they can remove a lot of round-tripping.

Jared
Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 16 September 2009 19:39
To: Wikimedia developers
Subject: Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)

> On Wed, Sep 16, 2009 at 12:34 PM, Andrew Garrett <agarr...@wikimedia.org> wrote:
>>> * Heavy icon use means a lot of extra HTTP requests.
>>
>> Non-issue, I think. If we think that icons enhance usability, and we have appropriate placeholders in place, then we're willing to buy the extra servers.
>
> It's not just server load, it's also latency that's visible to the user. Both in loading the images, and any subsequent files. Browsers don't load things in parallel very aggressively.

We can distribute them across multiple domain names, thereby bypassing the browser/HTTP limits. Something along the lines of

    'c' . ( crc32( $title ) & 3 ) . '.en.wikipedia.org'

would at least attempt to download up to 4 times as many things.

Jared
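Jared's expression wrapped up as a function: the same title always maps to the same one of four hostnames, so per-title cacheability is preserved:

    <?php
    function shardHost( $title ) {
        // crc32() is stable across requests, so a given title is always
        // fetched from the same shard.
        return 'c' . ( crc32( $title ) & 3 ) . '.en.wikipedia.org';
    }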
Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Gregory Maxwell
Sent: 16 September 2009 22:35
To: Wikimedia developers
Subject: Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)

> On Wed, Sep 16, 2009 at 5:24 PM, Jared Williams <jared.willia...@ntlworld.com> wrote:
>> We can distribute them across multiple domain names, thereby bypassing the browser/HTTP limits. Something along the lines of 'c' . ( crc32( $title ) & 3 ) . '.en.wikipedia.org' would at least attempt to download up to 4 times as many things.
>
> Right, but it reduces connection reuse. So you end up taking more TCP handshakes and spending more time with a small transmission window. (Plus more DNS round-trips; relevant because Wikimedia uses low TTLs for GSLB reasons.) TNSTAAFL.

Indeed, it all rather depends on usage.

There is also the sprite option: combining all the icons into a single image, and using CSS tricks to display each icon. But that seems far too much faff to keep track of if we want the individual icons as wiki content.

My personal preferred solution would be to have the icons in SVG and embed them directly into the page. But I guess that is not acceptable for the browser-agnostic Wikipedia audience.

Jared
Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 17 September 2009 00:46
To: Wikimedia developers
Subject: Re: [Wikitech-l] Usability initiative (HotCat replacement/improvements etc.)

> On Wed, Sep 16, 2009 at 6:17 PM, Daniel Schwen <li...@schwen.de> wrote:
>>> My personal preferred solution would be to have the icons in SVG and embed them directly into the page. But I guess that is not acceptable for the browser-agnostic Wikipedia audience.
>>
>> There are always data: urls, which would also save a roundtrip. But without some server-side support to automatically embed those, this would create a maintenance nightmare.
>
> These icons are being added to the page by the software, so automatic embedding is no problem. But IE doesn't support data: before version 8. data: with SVG would avoid the extra requests and latency, but then of course you don't get to do caching!

Caching is still possible with SDCH compression, but only Chromium/Chrome (and possibly IE with the Google Toolbar) have implementations.

> Web development is fun.

No comment :)

Jared
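A sketch of the data: URL approach mentioned above (done server-side, so no JavaScript is involved); the icon path is a placeholder:

    <?php
    // Inline a small icon as a base64 data: URL (unsupported by IE < 8).
    $png = file_get_contents( 'icon.png' );
    printf( '<img src="data:image/png;base64,%s" alt=""/>',
        base64_encode( $png ) );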
Re: [Wikitech-l] Language variants
Doesn't having geographically located page caches reduce the doubling effect in any given location? Squids located in the US should be caching more en-US than en-GB, and those in Europe should have more en-GB than en-US.

Jared

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Gerard Meijssen
Sent: 12 September 2009 09:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] Language variants

> Hoi,
> When we are to do this for English, and have digitise and digitize, we have to keep in mind that this ONLY deals with issues that are differences between GB and US English. There are other varieties of English that may make this more complicated. Given the size of the GB and US populations, it would split the cache and effectively double the cache size.
>
> There are more languages where this would provide serious benefits. I can easily imagine that the German, Spanish and Portuguese communities would be interested. Then there are many other languages that may have an interest. The first order of business is not "can it be done" but who will implement and maintain the language part of this.
>
> Thanks,
> GerardM
>
> 2009/9/12 Ilmari Karonen <nos...@vyznev.net>
>
>> Happy-melon wrote:
>>> Ilmari Karonen wrote:
>>>>
>>>>     -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}};
>>>>     ar: {{GFDL/ar}}; ast: {{GFDL/ast}}; be: {{GFDL/be}};
>>>>     be-tarask: {{GFDL/be-tarask}};
>>>>     <!-- ...and so on for about 70 more languages -->
>>>>     }-
>>>
>>> The above begs the question, of course: would this switch actually work? And if it does, how does it affect the cache and link tables? More investigation needed, methinks.
>>
>> Indeed, that was what I was wondering about too. Without actually trying it out, my guess would be that it would indeed work, but at a cost: it'd first parse all the 75 or so subtemplates and then throw all but one of them away. Of course, that's what one would have to do anyway, to get full link-table consistency.
>>
>> It does seem to me that it might not be *that* inefficient, *if* the page were somehow cached in its pre-language-converted state but after the expensive template parsing has been done. Does such a cache actually exist, or, if not, could one be added with reasonable ease?
>>
>> -- Ilmari Karonen
Re: [Wikitech-l] On templates and programming languages
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: 30 June 2009 17:17
To: Wikimedia developers
Subject: [Wikitech-l] On templates and programming languages

> As many folks have noted, our current templating system works ok for simple things, but doesn't scale well -- even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise. And we all thought Perl was bad! ;)
>
> There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation. One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty. An _inherent_ disadvantage is that it's a fairly rarely-used language, so it still requires special learning on potential template programmers' part. An _implementation_ disadvantage is that it currently depends on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.
>
> There are perhaps three primary alternative contenders that don't involve making up our own scripting language (something I'd dearly like to avoid):
>
> * PHP
>   Advantage: Lots of webbish people have some experience with PHP or can easily find references.
>   Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)
>   Disadvantage: PHP is difficult to lock down for secure execution.
>
> * JavaScript
>   Advantage: Even more folks have been exposed to JavaScript programming, including Wikipedia power-users.
>   Disadvantage: Server-side interpreter not guaranteed to be present. Like Lua, would either restrict our portability or would require an interpreter reimplementation. :P
>
> * Python
>   Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)
>   Wash: Python is probably better known than Lua, but not as well as PHP or JS.
>   Disadvantage: Like PHP, Python is difficult to lock down securely.
>
> Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

Would you want the interpreter to translate the template into a PHP array of opcodes first, so you could dump that into APC/MemCache?

Jared
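A sketch of the caching idea in Jared's question, with APC as the store; compileTemplate() and runOps() are hypothetical:

    <?php
    $key = 'tpl-ops:' . md5( $source );
    $ops = apc_fetch( $key );
    if ( $ops === false ) {
        $ops = compileTemplate( $source ); // hypothetical compiler
        apc_store( $key, $ops, 3600 );     // cache the opcode array for an hour
    }
    runOps( $ops, $args ); // hypothetical interpreter loop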
Re: [Wikitech-l] On templates and programming languages
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 30 June 2009 20:56
To: Wikimedia developers
Subject: Re: [Wikitech-l] On templates and programming languages

> On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <br...@wikimedia.org> wrote:
>> * PHP
>>   Advantage: Lots of webbish people have some experience with PHP or can easily find references.
>>   Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)
>>   Disadvantage: PHP is difficult to lock down for secure execution.
>
> I think it would be easy to provide a very simple locked-down version, with most of the features gone. You could, for instance, only permit variable assignment, use of built-in operators, a small whitelist of functions, and conditionals. You could omit loops, function definitions, and abusable functions like str_repeat() (let alone exec(), eval(), etc.) from a first pass. This would still be vastly more powerful, more readable, and faster than ParserFunctions.

Pity there is not a method of locking down code execution to a single namespace (thinking ahead to PHP 5.3):

    namespace Template {
        function strlen( $string ) {
            return \strlen( $string ) * 2;
        }
        function exec() {
            throw new \Exception();
        }
        class Template {
            function paint() {
                // Redirect the \ namespace to Template, so \exec() is also caught.
                echo strlen( 'data' );
            }
        }
    }

Jared
Re: [Wikitech-l] Future of Javascript and mediaWiki
-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Platonides
Sent: 17 December 2008 00:20
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Future of Javascript and mediaWiki

> Jared Williams wrote:
>> SDCHing MediaWiki HTML would take some effort, as the page output is spread between the skin classes and OutputPage etc. Also would want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just determining what message goes through which wfMsg*() function, and whether the wikitext translations can be preconverted to HTML. But most of the HTML comes from article wikitext, so I wonder whether it'd beat gzip by anything significant.
>> Jared
>
> Note that SDCH is expected to be then gzipped, as they fulfil different needs. They aren't incompatible. You would use a dictionary for common skin bits, perhaps also adding some common page features, like the TOC code, '&amp;action=edit&amp;redlink=1" class="new"'... Having a second dictionary for language-dependent output could also be interesting, but not all messages should be provided.

Unfortunately, whilst the user agent can announce it has multiple dictionaries, the SDCH response can only indicate that it used a single dictionary.

> Simetrical wrote:
>> What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
>
> /Usually/ you don't create the dictionary output by hand, but pass the page to a dictionary compressor (or so is expected; this is all very experimental yet). If a parser function changed it completely, it will just be a literal. If you have a parametrized block, the vcdiff would see that this piece up to "Foo" matches this dictionary section before $1, and this other piece matches the text following "Foo"...

What I have at the moment just traverses a directory of templates, using PHP's built-in tokenizer to extract T_INLINE_HTML tokens into the dictionary (if longer than 3 bytes), replacing them with a call to output the vcdiff copy opcodes. So

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php $e($this->lang); ?>">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title><?php $e($this->title); ?>

becomes

    <?php $this->copy(0, 53); $e($this->lang); $this->copy(53, 91); $e($this->title);

PHP's output buffering captures the output from the PHP code within the template, which essentially becomes the data section of the vcdiff.

> Jared wrote:
>> I do have working PHP code that can parse PHP templates and language strings to generate the dictionary, and a new set of templates rewritten to output the vcdiff efficiently.
>
> Please share?

Intend to; I probably should document/add some comments first :)

Jared
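A sketch of the extraction step Jared describes, assuming each T_INLINE_HTML chunk longer than 3 bytes goes into the dictionary verbatim; the file names are placeholders:

    <?php
    $dict = '';
    foreach ( token_get_all( file_get_contents( 'template.php' ) ) as $tok ) {
        // T_INLINE_HTML is everything outside <?php ... ?> in the template.
        if ( is_array( $tok ) && $tok[0] === T_INLINE_HTML && strlen( $tok[1] ) > 3 ) {
            $dict .= $tok[1];
        }
    }
    file_put_contents( 'dictionary.dat', $dict );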