Re: [Wikitech-l] MW test infrastructure architecture

2009-08-12 Thread Andrew Garrett

On 12/08/2009, at 1:16 AM, Chad wrote:
 On the whole new code front. Can we all /please/ remember that
 we're writing PHP5 here. Visibility on all new functions and variables
 should also be a must.

I honestly don't think this is that important. We should err on the  
side of public, not private or protected.

Very frequently, I've had to change visibility on functions because  
somebody hasn't anticipated a case where a class function might be  
usefully called from outside that class.

--
Andrew Garrett
agarr...@wikimedia.org
http://werdn.us/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MW test infrastructure architecture

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 8:31 AM, Andrew Garrett <agarr...@wikimedia.org> wrote:

 On 12/08/2009, at 1:16 AM, Chad wrote:
 On the whole new code front. Can we all /please/ remember that
 we're writing PHP5 here. Visibility on all new functions and variables
 should also be a must.

 I honestly don't think this is that important. We should err on the
 side of public, not private or protected.

 Very frequently, I've had to change visibility on functions because
 somebody hasn't anticipated a case where a class function might be
 usefully called from outside that class.

 --
 Andrew Garrett
 agarr...@wikimedia.org
 http://werdn.us/


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


When writing maintenance code, I tried to make very conscious decisions
about what visibility to put on each method. For clearly defined public
interfaces, I made the methods public. Some of the setup functions
don't need to be accessed externally, and I probably don't want child
classes overriding them, so I made them private. I think making careful
decisions when coding can lead to well-written classes with proper
visibility. If a particular method/variable needs to become public at a
later point, it can easily be changed.
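
For illustration, a minimal sketch of that kind of deliberate visibility
choice (the class and method names here are invented, not taken from the real
maintenance code):

abstract class SomeMaintenanceScript {
    // Clearly defined public interface: callers are meant to use this.
    public function execute() {
        $this->setup();
        $this->run();
    }

    // Internal setup: nothing outside this class needs to call it, and
    // child classes shouldn't override it, so it stays private.
    private function setup() {
        /* read options, open database connections, ... */
    }

    // Child classes supply the actual work, so this one is protected.
    abstract protected function run();
}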

My main point is not so much "make things private" as "at least make it look
like we're writing code from the last four years."

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MW test infrastructure architecture

2009-08-12 Thread Aryeh Gregor
On Wed, Aug 12, 2009 at 8:31 AM, Andrew Garrett <agarr...@wikimedia.org> wrote:
 I honestly don't think this is that important. We should err on the
 side of public, not private or protected.

 Very frequently, I've had to change visibility on functions because
 somebody hasn't anticipated a case where a class function might be
 usefully called from outside that class.

On the other hand, I've often wanted to refactor code but been unable
to because the functions were public or protected and I didn't want to
break any third-party extensions that might call them.  If a function
is private and could usefully be public, then just make it public, no
problem.

The only people who are harmed by too many private functions are
people without commit access, who can't just make the functions
public.  Since these are also the only people whose code could be
broken by changing a public function, though -- if the code is
committed, then the one who changes the public function can just
change the caller -- you can make a good case that it's better to make
everything public, and let extension authors rely on the public
functions if they want without any guarantee that the functions will
remain stable.  They'd probably prefer that to having the function
private and not being able to even try to use it.

Generally speaking, I always make my functions private if I don't
explicitly intend to make them part of the class interface for
third-party use.  But I can see the other side too, so I don't know if
a firm policy is the best course of action.

On Wed, Aug 12, 2009 at 9:24 AM, Victor Vasiliev <vasi...@gmail.com> wrote:
 What comment would you expect for public function getArticleText()?

Well, I can't write a very good comment since I've never read that
function or even any significant part of that class until now.  (I'm
looking at RawPage::getArticleText() here.)  But maybe something like:

/**
 * Returns the parsed HTML corresponding to this page for action=raw output,
 * and outputs some HTTP headers.
 *
 * @return string Parsed HTML suitable for echoing
 */
private function getArticleText() {

The documentation is cursory because it's really just a helper method
that's designed to be called in exactly one place, so anyone who wants
to use it is already making changes to RawPage and will as likely as
not want to read and/or change the source of the method.

If you don't want to make it private for some reason, then I'd add
@private to the documentation and put a note saying what method you
actually want to call.  It has various side effects and
RawPage-specific logic that make it very unlikely to be useful to any
other classes.

If you were referring to a hypothetical public method in, say,
Revision, then I'd expect the comment to include all information that
callers might like to know, such as:

* Is the text parsed or unparsed?  If parsed, what method should be
used to get unparsed text?  If unparsed, what method to get parsed
text?
* What's the performance impact like?  Does this do any batching or
caching?  Does every call result in a database query?  Does it call
the parser?  Does it use memcached?  Is it possible that this will
return outdated text, or will it definitely be up-to-date?
* Is any kind of permissions checking done?  Will this retrieve text
for revisions that are deleted, oversighted, rev_deleted, or otherwise
hidden?  Does it check whether the current user has read permission?
* What happens if there's an error, like the revision doesn't exist?
Is an exception thrown, or some magic value returned?  What sorts of
errors should we expect to see?  Can we assume there are no errors?
E.g., a low-level function that always returns the text even if
deleted, and where the revision object was created with a constructor
that verified that the revision was legit, might never return an error
unless the database couldn't be reached or was manually tampered with
(in which case the request should probably just die anyway).

The answers don't have to be lengthy, and they can refer to
documentation of other methods within reason.  But they *should* be
there, in a form where most MediaWiki developers can understand them.
I think all those questions could be answered in about three or four
lines, say, depending on what the answers are.  I really have to go
now, or else I'd commit a good comment for Revision::get(Raw)Text() to
show you what I mean.

In fact, Revision::getText() already has 11 lines of comments.
Revision::getRawText() has only 2.  I think both could be improved a
bit.
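
As a purely illustrative sketch (this is not the actual comment in
Revision.php, and the specific answers below are placeholders rather than
verified behaviour), a comment that addresses the questions above might look
like:

/**
 * Fetch this revision's text as unparsed wikitext; run it through the
 * parser yourself if you need HTML.
 *
 * Performance: may hit the text table or external storage on every call;
 * no batching or caching is done here.  [placeholder answer]
 * Permissions: no audience check is applied, so callers must decide for
 * themselves whether deleted/hidden text may be shown.  [placeholder answer]
 *
 * @return string|false Wikitext, or false if the text could not be loaded
 */
public function getRawText() {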

 By the way, is there a policy which describes where we should use
 get*/set* methods and where we should use public fields?

I always use private member variables.  Some other developers always
use public member variables.  It's pretty much the same story as with
public/private methods.
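
For readers unfamiliar with the trade-off, a minimal illustration with
made-up names:

class StyleA {
    private $title;                                        // accessed only through methods
    public function getTitle() { return $this->title; }
    public function setTitle( $t ) { $this->title = $t; }
}

class StyleB {
    public $title;                                         // callers read and write the field directly
}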

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] User order on enhanced recentchanges

2009-08-12 Thread Nemo_bis
Do you know how the users who edited a page are ordered in the enhanced
recentchanges feature? See
http://meta.wikimedia.org/w/index.php?title=Help%3AEnhanced_recent_changes&diff=1570240&oldid=1476276
 

I can't tell whether there's a logical order or whether it's a bug.
Thanks,

Nemo


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread dan nessett
I'm starting a new thread because I noticed my news reader has glued together 
messages with the titles "A potential land mine" and "MW test infrastructure 
architecture", which may confuse someone coming into the discussion late. Also, 
the previous thread has branched into several topics and I want to concentrate 
on only one, specifically what can we assume about the system environment for a 
test infrastructure? These assumptions have direct impact on what test harness 
we use. Let me start by stating what I think can be assumed. Then people can 
tell me I am full of beans, add to the assumptions, subtract from them, etc.

The first thing I would assume is that a development system is less constrained 
than a production system in what can and what cannot be installed. For example, 
people shot down my proposal to automatically discover the MW root directory 
because some production systems have administrators without root access, 
without the ability to load code into the PEAR directory, etc. Fair enough 
(although minimizing the number of places where $IP is computed is still 
important). However, if you are doing MW development, then I think this 
assumption is too stringent. You need to run the tests in /tests/PHPUnitTests, 
which in at least one case requires the use of $wgDBadminuser and 
$wgDBadminpassword, something a non-privileged user would not be allowed to do.

If a developer has more system privileges than a production admin, to what 
extent? Can we assume he has root access? If not, can we assume he can get 
someone who has to do things like install PHPUnit? Can we assume the 
availability of Perl or should we only assume PHP? Can we assume *AMP (e.g., 
LAMP, WAMP, MAMP, XAMPP)? Can we assume PEAR? Can the developer install into 
PEAR?

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 10:59 AM, dan nessett <dness...@yahoo.com> wrote:
 I'm starting a new thread because I noticed my news reader has glued together 
 messages with the title A potential land mine and MW test infrastructure 
 architecture, which may confuse someone coming into the discussion late. 
 Also, the previous thread has branched into several topics and I want to 
 concentrate on only one, specifically what can we assume about the system 
 environment for a test infrastructure? These assumptions have direct impact 
 on what test harness we use. Let me start by stating what I think can be 
 assumed. Then people can tell me I am full of beans, add to the assumptions, 
 subtract from them, etc.

 The first thing I would assume is that a development system is less 
 constrained than a production system in what can and what cannot be 
 installed. For example, people shot down my proposal to automatically 
 discover the MW root directory because some production systems have 
 administrators without root access, without the ability to load code into the 
 PEAR directory, etc. Fair enough (although minimizing the number of places 
 where $IP is computed is still important). However, if you are doing MW 
 development, then I think this assumption is too stringent. You need to run 
 the tests in /tests/PHPUnitTests, which in at least one case requires the use 
 of $wgDBadminuser and $wgDBadminpassword, something a non-privileged user 
 would not be allowed to do.

 If a developer has more system privileges than a production admin, to what 
 extent? Can we assume he has root access? If not, can we assume he can get 
 someone who has to do things like install PHPUnit? Can we assume the 
 availability of Perl or should we only assume PHP? Can we assume *AMP (e.g., 
 LAMP, WAMP, MAMP, XAMPP)? Can we assume PEAR? Can the developer install into 
 PEAR?

 Dan




 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Tests should run in a vanilla install, with minimal dependency on
external stuff. PHPUnit (or whatever framework we use) would be
considered an acceptable dependency for test suites. If PHPUnit isn't
available (i.e., already installed and in the include path), then we
should bail out nicely.

In general, external dependencies should be used as seamlessly as
possible, with minimal involvement from the end-user. A good example is
wikidiff2: we load it if it exists, and fall back to the PHP diff mode
if we can't use it.
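
A sketch of the kind of checks meant here; the PHPUnit and wikidiff2 names are
the ones in use around this time, but treat the details as illustrative rather
than as the actual test-suite code:

// Bail out nicely if PHPUnit isn't already installed and in the include path.
if ( !class_exists( 'PHPUnit_Framework_TestCase' ) &&
     !@include_once( 'PHPUnit/Framework.php' ) ) {
    echo "PHPUnit is not available; skipping test suite.\n";
    exit( 0 );
}

// Seamless external dependency, wikidiff2-style: use the C extension if it
// exists, otherwise fall back to a pure-PHP diff.
function doDiff( $old, $new ) {
    if ( function_exists( 'wikidiff2_do_diff' ) ) {
        return wikidiff2_do_diff( $old, $new, 2 );
    }
    return phpDiff( $old, $new );      // hypothetical PHP fallback
}

function phpDiff( $old, $new ) {
    /* a pure-PHP diff implementation would go here */
    return '';
}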

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread dan nessett
--- On Wed, 8/12/09, Chad innocentkil...@gmail.com wrote:

 
 Tests should run in a vanilla install, with minimal
 dependency on
 external stuff. PHPUnit
 (or whatever framework we use) would be considered an
 acceptable dependency for
 test suites. If PHPUnit isn't available (ie: already
 installed and in
 the include path), then
 we should bail out nicely.
 
 In general, external dependencies should be used as
 seamlessly as
 possible, with minimal
 involvement from the end-user. A good example is
 wikidiff2...we load
 it if it exists, we fail
 back to PHP diff mode if we can't use it.

OK: graceful bail-out if an external dependency fails, and minimal dependency on 
external stuff. So far we have two categories of proposal for test 
infrastructure: 1) build it all ourselves, and 2) use some external stuff. How 
do we decide which to do?

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Preload doesn't work at MediaWiki namespace?

2009-08-12 Thread Helder Geovane Gomes de Lima
Hi!

When we go to links such as:
http://pt.wikibooks.org/w/index.php?action=edit&preload=Project:About&title=test
http://pt.wikibooks.org/w/index.php?action=edit&preload=Project:About&title=MediaWiki:test
we note that both "test" and "MediaWiki:test" don't exist. But using the
preload parameter, for the first page we get the text "#REDIRECT
[[wikibooks:Sobre]]" preloaded, although for the second we don't.

Is this the expected behavior for the MediaWiki namespace?

Helder
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Problem with phpunit --skeleton

2009-08-12 Thread dan nessett
I have been playing around with phpunit, in particular its facility for 
generating tests from existing PHP code. You do this by processing a suitably 
annotated (using /* @assert ... */ comment lines) version of the file with 
phpunit --skeleton. Unfortunately, the --skeleton option assumes the file 
contains a class with the same name as the file.
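
For example, the annotation looks roughly like this (class and values are
invented; the point is that --skeleton expects the file Calculator.php to
contain a class named Calculator):

// Calculator.php
class Calculator {
    /**
     * @assert (1, 2) == 3
     * @assert (0, 0) == 0
     */
    public function add( $a, $b ) {
        return $a + $b;
    }
}

Running phpunit --skeleton on that file should then generate a CalculatorTest
class whose test methods assert those input/output pairs.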

For example, I tried annotating Hook.php and then processing it with phpunit 
--skeleton. It didn't work. phpunit reported:

Could not find class Hook in .../phase3/tests/Hook.php

(where I have replaced my path to MW with ...).

Since there are many MW files that do not contain a class of the same name as 
the file (or even any classes at all), the --skeleton option is probably not very 
useful for MW phpunit test generation.

You can still use phpunit by inserting the appropriate test code manually, but 
the objective of automatically generating tests with the same convenience as 
adding lines to parserTests.txt isn't available.

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread Brion Vibber
On 8/12/09 9:51 AM, dan nessett wrote:
 --- On Wed, 8/12/09, Brion Vibber <br...@wikimedia.org> wrote:

 We *already* automatically discover the MW root directory.

 Yes, you're right. I should have said automatically discover the MW
 root directory without using file position dependent code.

The suggestions were for explicit manual configuration, not 
autodiscovery. Autodiscovery means *not* having to set anything. :)

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread dan nessett
--- On Wed, 8/12/09, Brion Vibber br...@wikimedia.org wrote:

 
 The suggestions were for explicit manual configuration, not
 
 autodiscovery. Autodiscovery means *not* having to set
 anything. :)

I must be insane to keep this going, but the proposal I made did not require doing 
anything manually (other than running the install script, which you have to do 
anyway). The install script knows (or can find out) where the MW root is 
located. It could then either: 1) rewrite php.ini to append the location 
of MWInit.php to the end of include_path, or 2) drop MWInit.php into a 
directory already searched by PHP for includes/requires (e.g., the PEAR 
directory).
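
A sketch of what the proposed MWInit.php might have contained (hypothetical:
no such file exists in MediaWiki; it only illustrates the idea of a file on
the include path that knows where the MW root is):

// MWInit.php, written out by the install script
class MWInit {
    // Filled in by the installer; placeholder value here.
    private static $root = '/path/to/mediawiki';

    public static function getRoot() {
        return self::$root;
    }
}

// Any entry point could then do:
//   require_once 'MWInit.php';
//   $IP = MWInit::getRoot();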

I gave up on the proposal when people pointed out that MW admins may not have 
the privileges that would allow the install script to do either.

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread dan nessett
--- On Wed, 8/12/09, Roan Kattouw roan.katt...@gmail.com wrote:

 On shared hosting, both are impossible. MediaWiki currently
 works with
 minimal write access requirements (only the config/
 directory for the
 installer and the images/ directory if you want uploads),
 and we'd
 like to keep it that way for people who are condemned to
 shared
 hosting.

Which is why I wrote in the message that is the subject of your reply:

I gave up on the proposal when people pointed out that MW admins may not have 
the privileges that would allow the install script to do either.

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] split LocalSettings.php into public and private parts

2009-08-12 Thread jidanni
 dn == dan nessett dness...@yahoo.com writes:
dn suggest breaking LocalSettings into two parts

I've always wanted a public part, say linked from Special:Version ("view
public initialization settings"), with a standard URL
(title=Special:PublicLocalSettings perhaps), and all without needing to
install an extension.

And a private part, containing passwords, secret spam filters expressions etc.

E.g., take your average hook. One can let users see its name on
Special:Version, but to let users see its content... well one needs to
do something like

$ perl -nlwe 'print if /#START PUBLIC/../#END PUBLIC/' \
LocalSettings.php > PublicLocalSettings.txt

often, and link to that. I.e., _you are not making being open source easy_.
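
For concreteness, a hypothetical LocalSettings.php laid out with the markers
that the one-liner above looks for (setting values invented):

# --- LocalSettings.php ---
#START PUBLIC
$wgSitename = 'ExampleWiki';
$wgEnableUploads = true;
#END PUBLIC

# Private part: never published.
$wgDBpassword = 'not-for-publication';
$wgSpamRegex = '/some-secret-pattern/';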

If one could easily see a wiki's public LocalSettings portion, one could
quickly answer many messages here on this list: "No wonder, Clodsworth,
you have set $wgNorfblatz='maybe', which is an illegal value."

And on
http://www.mediawiki.org/wiki/Manual:Wiki_family#Ultimate_minimalist_solution
I could provide an up-to-the-minute link ('see the real settings behind that 
wiki'), without needing to cut and paste, nor to use the above perl solution.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] License requirements for extensions in wikimedia SVN

2009-08-12 Thread Brion Vibber
On 8/12/09 11:19 AM, Lane, Ryan wrote:
 Are there any license requirements for extensions checked into wikimedia's
 SVN? I am taking over maintenance of an OpenSSO authentication extension
 from Sun, and it is currently CDDL licensed.

 Does it need to be GPL, or is CDDL an acceptable license?

Well, as I understand it, CDDL isn't compatible with the GPL, so if you're 
pulling it into MediaWiki at runtime you enter the GPL Mystery Zone(TM), 
where things may, or may not, be properly licensed. :)

But I wouldn't object to CDDL code being in the repository, as long as 
we're not mixing the files together.

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] split LocalSettings.php into public and private parts

2009-08-12 Thread Roan Kattouw
2009/8/12  jida...@jidanni.org:
 often, and link to that. I.e., _you are not making being open source easy_.

This has absolutely nothing to do with MediaWiki being open source or
not. It's not like there are closed-source wikis out there of equivalent
quality that don't have this nuisance.

On topic: I agree with Chad that it's trivial to segment LocalSettings
yourself if you really want to.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] License requirements for extensions in wikimedia SVN

2009-08-12 Thread Lane, Ryan
 Well, as I understand CDDL isn't compatible with GPL. so if you're 
 pulling it into MediaWiki at runtime you enter the GPL 
 Mystery Zone(TM) 
 where things may, or may not, be properly licensed. :)
 
 But I wouldn't object to CDDL code being in the repository, 
 as long as 
 we're not mixing the files together.
 

Thanks. I've asked the developer if Sun is willing to relicense the code. If
not, I'll add it as CDDL and ensure I don't mix code/files.

V/r,

Ryan Lane
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] split LocalSettings.php into public and private parts

2009-08-12 Thread jidanni
 RK == Roan Kattouw roan.katt...@gmail.com writes:
RK 2009/8/12  jida...@jidanni.org:
 often, and link to that. I.e., _you are not making being open source easy_...
RK This has absolutely nothing to do with MediaWiki being open source
...for hook authors.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread dan nessett
I am investigating how to write a comprehensive parser regression test. What I 
mean by this is something you wouldn't normally run frequently, but rather 
something that we could use to get past the "known to fail" tests that are now 
disabled. The problem is that no one understands the parser well enough to have 
confidence that fixing one of these tests will not break something else.

So, I thought, how about using the guts of DumpHTML to create a comprehensive 
parser regression test? The idea is to have two versions of phase3 + 
extensions, one without the change you make to the parser to fix a 
known-to-fail test (call this Base) and one with the change (call this 
Current). Modify DumpHTML to first visit a page through Base, saving the HTML, 
then visit the same page through Current and compare the two results. Do this 
for every page in the database. If there are no differences, the change in 
Current works.
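
A minimal sketch of that comparison loop; the two render functions are
hypothetical stand-ins for "render this title through the Base checkout" and
"render it through the Current checkout" (e.g. by driving each checkout's
DumpHTML), not real MediaWiki APIs:

function renderWithBase( $title )    { /* render via Base checkout */    return ''; }
function renderWithCurrent( $title ) { /* render via Current checkout */ return ''; }

function compareAllPages( array $titles ) {
    $failures = array();
    foreach ( $titles as $title ) {
        $baseHtml    = renderWithBase( $title );
        $currentHtml = renderWithCurrent( $title );
        if ( $baseHtml !== $currentHtml ) {
            $failures[] = $title;   // any difference means this page needs a look
        }
    }
    return $failures;               // empty array == no differences found
}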

Sitting here I can see the eyeballs of various developers bulging from their 
faces. "What?" they say. "If you ran this test on, for example, Wikipedia, it 
could take days to complete." Well, that is one of the things I want to find 
out. The key to making this test useful is getting the code in the loop 
(rendering the page twice and testing the results for equality) very efficient. 
I may not have the skills to do this, but I can at least develop an upper bound 
on the time it would take to run such a test.

A comprehensive parser regression test would be valuable for:

* fixing the known-to-fail tests.
* testing any new parser that some courageous developer decides to code.
* testing major releases before they are released.
* catching bugs that aren't found by the current parserTest tests.
* other things I haven't thought of.

Of course, you wouldn't run this thing nightly or, perhaps, even weekly. Maybe 
once a month would be enough to ensure the parser hasn't regressed out of sight.



  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Roan Kattouw
2009/8/12 dan nessett dness...@yahoo.com:
 I am investigating how to write a comprehensive parser regression test. What 
 I mean by this is something you wouldn't normally run frequently, but rather 
 something that we could use to get past the known to fail tests now 
 disabled. The problem is no one understands the parser well enough to have 
 confidence that if you fix one of these tests that you will not break 
 something else.

 So, I thought, how about using the guts of DumpHTML to create a comprehensive 
 parser regression test. The idea is to have two versions of phase3 + 
 extensions, one without the change you make to the parser to fix a 
 known-to-fail test (call this Base) and one with the change (call this 
 Current). Modify DumpHTML to first visit a page through Base, saving the HTML 
 then visit the same page through Current and compare the two results. Do this 
 for every page in the database. If there are no differences, the change in 
 Current works.

 Sitting here I can see the eyeballs of various developers bulging from their 
 faces. What? they say. If you ran this test on, for example, Wikipedia, it 
 could take days to complete. Well, that is one of the things I want to find 
 out. The key to making this test useful is getting the code in the loop 
 (rendering the page twice and testing the results for equality) very 
 efficient. I may not have the skills to do this, but I can at least develop 
 an upper bound on the time it would take to run such a test.

I read this paragraph first, then read the paragraph above and
couldn't help saying "WHAT?!?". Using a huge set of pages is a poor
replacement for decent tests. Also, how would you handle intentional
changes to the parser output, especially when they're non-trivial?

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread dan nessett
So. I checked out a copy of phase3 and extensions to start working on 
investigating the feasibility of a comprehensive parser regression test. After 
getting the working copy downloaded, I do what I usually do - blow away the 
extensions directory stub that comes with phase3 and soft link the downloaded 
copy of extensions in its place. I then began familiarizing myself with 
DumpHTML by starting it up in a debugger. Guess what happened.

It fell over. Why? Because DumpHTML is yet another software module that 
computes the value of $IP. So what? Well, DumpHTML.php is located in 
../extensions/DumpHTML. At lines 57-59 it executes:

$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
$IP = dirname(__FILE__).'/../..';
}

This works on a deployed version of MW, since the extensions directory is 
embedded in /phase3. But, in a development version, where /extensions is a 
separate subdirectory, '/../..' does not get you to phase3; it gets you to the MW 
root. So, when you execute the next line:

require_once( $IP . "/maintenance/commandLine.inc" );

DumpHTML fails.

Of course, since I am going to change DumpHTML anyway, I can move it to 
/phase3/maintenance and change the '/../..' to '/..' and get on with it. But 
for someone attempting to fix bugs in DumpHTML, code that relies on knowing 
where DumpHTML.php sits in the distribution tree is an issue.

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 4:39 PM, dan nessett <dness...@yahoo.com> wrote:
 So. I checked out a copy of phase3 and extensions to start working on 
 investigating the feasibility of a comprehensive parser regression test. 
 After getting the working copy downloaded, I do what I usually do - blow away 
 the extensions directory stub that comes with phase3 and soft link the 
 downloaded copy of extensions in its place. I then began familiarizing myself 
 with DumpHTML by starting it up in a debugger. Guess what happened.

 It fell over. Why? Because DumpHTML is yet another software module that 
 computes the value $IP. So what? Well, DumpHTML.php is located in 
 ../extensions/DumpHTML. At line 57-59 it executes:

 $IP = getenv( 'MW_INSTALL_PATH' );
 if ( $IP === false ) {
        $IP = dirname(__FILE__).'/../..';
 }

 This works on a deployed version of MW, since the extensions directory is 
 embedded in /phase3. But, in a development version, where /extensions is a 
 separate subdirectory, ./../.. does not get you to phase3, it gets you to 
 MW root. So, when you execute the next line:

 require_once( $IP . "/maintenance/commandLine.inc" );

 DumpHTML fails.

 Of course, since I am going to change DumpHTML anyway, I can move it to 
 /phase3/maintenance and change the '/../..' to '/..' and get on with it. But, 
 for someone attempting to fix bugs in DumpHTML, the code that uses a 
 knowledge of where DumpHTML.php is in the distribution tree is an issue.

 Dan




 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


DumpHTML will not be moved back to maintenance in the repo, it was
already removed from maintenance and made into an extension. Issues
with it as an extension should be fixed, but it should not be encouraged
to go back into core.

Also, on a meta note: can you possibly confine all of your testing comments
to a single thread? We don't need a new thread for each problem you find :)

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread dan nessett
--- On Wed, 8/12/09, Roan Kattouw roan.katt...@gmail.com wrote:

 I read this paragraph first, then read the paragraph above
 and
 couldn't help saying WHAT?!?. Using a huge set of pages
 is a poor
 replacement for decent tests.

I am not proposing that the CPRT be a substitute for decent tests. We still 
need a good set of tests for the whole MW product (not just the parser). Nor 
would I recommend making a change to the parser and then immediately running 
the CPRT. Any developer that isn't masochistic would first run the existing 
parserTests and ensure it passes. Then, you probably want to run the modified 
DumpHTML against a small random selection of pages in the WP DB. Only if it 
passes those tests would you then run the CPRT for final assurance. 

The CPRT I am proposing is about as good a test of the parser as I can think 
of. If a change to the parser passes it using the Wikipedia database (currently 
5 GB), then I would say for all practical purposes the changes made to the 
parser do not regress it.

 Also, how would you handle
 intentional
 changes to the parser output, especially when they're
 non-trivial?

I don't understand this point. Would you elaborate?

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread dan nessett
Chad innocentkil...@gmail.com wrote:
 
 DumpHTML will not be moved back to maintenance in the repo, it was
 already removed from maintenance and made into an extension. Issues
 with it as an extension should be fixed, but it should not be encouraged
 to go back into core.

What I meant was that I can move the code in DumpHTML, as the CPRT, to maintenance 
in my working copy and work on it there. Whether this code is simply a MacGyver 
test or something else is completely up in the air.

 Also, on a meta notecan you possibly confine all of your testing comments
 to a single thread? We don't need a new thread for each problem you find :)
 

My apologies.

Dan


  

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread Brion Vibber
On 8/12/09 1:39 PM, dan nessett wrote:
 This works on a deployed version of MW, since the extensions
 directory is embedded in /phase3. But, in a development version,
 where /extensions is a separate subdirectory

Your setup is incorrect: the extensions folder *always* goes inside the 
MediaWiki root dir. Always.

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 4:48 PM, dan nessett <dness...@yahoo.com> wrote:
 --- On Wed, 8/12/09, Roan Kattouw roan.katt...@gmail.com wrote:

 I read this paragraph first, then read the paragraph above
 and
 couldn't help saying WHAT?!?. Using a huge set of pages
 is a poor
 replacement for decent tests.

 I am not proposing that the CPRT be a substitute for decent tests. We still 
 need a a good set of tests for the whole MW product (not just the parser). 
 Nor would I recommend making a change to the parser and then immediately 
 running the CPRT. Any developer that isn't masochistic would first run the 
 existing parserTests and ensure it passes. Then, you probably want to run the 
 modified DumpHTML against a small random selection of pages in the WP DB. 
 Only if it passes those tests would you then run the CPRT for final assurance.

 The CPRT I am proposing is about as good a test of the parser that I can 
 think of. If a change to the parser passes it using the Wikipedia database 
 (currently 5 GB), then I would say for all practical purposes the changes 
 made to the parser do not regress it.

 Also, how would you handle
 intentional
 changes to the parser output, especially when they're
 non-trivial?

 I don't understand this point. Would you elaborate?

 Dan




 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


To elaborate on the final point. Sometimes the parser is changed and
it breaks output on purpose. Case in point was when Tim rewrote the
preprocessor. Some parts of syntax were intentionally changed. You'd
have to establish a new baseline for this new behavior at that point.

This also comes down to the fact that we don't have a formal grammar
for wikisyntax (basically it's whatever the Parser says it is at any given
time). This makes testing the parser hard--we can only give it input and
expected output, there's no standard to check against.

Finally, I don't think we need to dump all of enwiki. It can't require that
much content to describe the various combinations of wiki syntax...

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 6:04 PM, dan nessett <dness...@yahoo.com> wrote:
 --- On Wed, 8/12/09, Brion Vibber br...@wikimedia.org wrote:

 Your setup is incorrect: the extensions folder *always*
 goes inside the
 MediaWiki root dir. Always.


 Sorry, my inexperience with Subversion led me in the wrong direction. I 
 didn't realize I could check out phase3 then point Subversion to the 
 extensions directory in phase3 to check out extensions. I thought Subversion 
 would get confused, but I just tried it and it worked :-). At least 
 Subversion performed the checkout. I haven't tried doing an update.

 I just (case sensitively) searched the extensions directory for the string 
 $IP = and found 32 files that compute $IP on their own. How about creating 
 a standard bit of code that extensions and other modules can copy and use to 
 figure out MW root. For example, it is very unlikely that the name of the 
 first level directory (i.e., phase3) will change. The code can call dirname( 
 __FILE__ ) and then search from the end of the pathname until it finds 
 phase3. It then knows that the prefix is MW root.

 Dan




 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Not true. People can name the phase3 directory whatever they want. I have
phase3, wmf-deployment, new-install, maintenance-work and
w as wiki roots. You can name it anything you want, so don't
rely on the 'phase3' name at all.

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Tim Landscheidt
dan nessett dness...@yahoo.com wrote:

 I am investigating how to write a comprehensive parser regression test. What 
 I mean by this is something you wouldn't normally run frequently, but rather 
 something that we could use to get past the known to fail tests now 
 disabled. The problem is no one understands the parser well enough to have 
 confidence that if you fix one of these tests that you will not break 
 something else.

 So, I thought, how about using the guts of DumpHTML to create a comprehensive 
 parser regression test. The idea is to have two versions of phase3 + 
 extensions, one without the change you make to the parser to fix a 
 known-to-fail test (call this Base) and one with the change (call this 
 Current). Modify DumpHTML to first visit a page through Base, saving the HTML 
 then visit the same page through Current and compare the two results. Do this 
 for every page in the database. If there are no differences, the change in 
 Current works.
 [...]

I use a similar approach on a toolserver script in addition
to smaller tests: I saved several revisions of interesting
wiki pages, and the respective output of the then-current
script version, to the Subversion repository. Before
committing changes, I run a test of whether the current
script produces the same results. If the results are
different, either a bug needs to be fixed or the expected
output must be amended (in the same commit).
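
A sketch of that kind of check (the file layout and the runScript() helper are
invented; the real script and its inputs live in the toolserver repository):

$failures = 0;
foreach ( glob( 'tests/cases/*.input' ) as $inputFile ) {
    $expectedFile = preg_replace( '/\.input$/', '.expected', $inputFile );
    $actual   = runScript( file_get_contents( $inputFile ) );
    $expected = file_get_contents( $expectedFile );
    if ( $actual !== $expected ) {
        $failures++;
        echo "FAIL: $inputFile\n";  // fix the bug, or amend the expected output
    }
}
echo $failures ? "$failures case(s) differ\n" : "all cases match\n";

function runScript( $wikitext ) {
    /* stand-in for running the actual toolserver script */
    return $wikitext;
}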

Tim

P. S.: I find your dedication to QA very laudable; I think
   though that more people would read and embrace your
   thoughts if you would find a more concise way to put
   them across :-).


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] More fun and games with file position relative code

2009-08-12 Thread Daniel Friesen
dan nessett wrote:
 --- On Wed, 8/12/09, Brion Vibber br...@wikimedia.org wrote:

   
 Your setup is incorrect: the extensions folder *always*
 goes inside the 
 MediaWiki root dir. Always.

 

 Sorry, my inexperience with Subversion led me in the wrong direction. I 
 didn't realize I could check out phase3 then point Subversion to the 
 extensions directory in phase3 to check out extensions. I thought Subversion 
 would get confused, but I just tried it and it worked :-). At least 
 Subversion performed the checkout. I haven't tried doing an update.

 I just (case sensitively) searched the extensions directory for the string 
 $IP = and found 32 files that compute $IP on their own. How about creating 
 a standard bit of code that extensions and other modules can copy and use to 
 figure out MW root. For example, it is very unlikely that the name of the 
 first level directory (i.e., phase3) will change. The code can call dirname( 
 __FILE__ ) and then search from the end of the pathname until it finds 
 phase3. It then knows that the prefix is MW root.

 Dan
   
Since when? I myself have NEVER set up a MediaWiki installation with
the base directory named phase3.

Take a good second look at the chunk of code you posted.

$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
$IP = dirname(__FILE__).'/../..';
}

This is precisely the reason the MW_INSTALL_PATH environment variable
was introduced. This IS the standard bit of code meant for extensions
that use maintenance scripts to copy.

 export MW_INSTALL_PATH=$PWD
 php ../extensions/DumpHTML/DumpHTML.php
...

Trust me... I was the one with one of the strangest MediaWiki setups
ever. Symlinks all over the place, multiple extension directories,
multiple svn points, global configs, wiki installations set up almost
purely with symlinks to the base files, shell scripts for generating
those installations and upgrading svn points, and so on...
It was basically a personal mini-farm running all my wikis, which had
varying extensions set up and varying installations of MediaWiki (latest
stable for most, trunk for the various dev sites).
IIRC, That's how that chunk of code ended up inside MW in the first place.


-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MW test infrastructure architecture

2009-08-12 Thread Tim Landscheidt
Chad innocentkil...@gmail.com wrote:

 [...]
 What we need is something similar to parser tests, where it's
 absurdly easy to pop new tests in with little to no coding
 required at all. [...]

Is it really so much more difficult to write

| TestAddArticle ("Main Page", "blah blah");

than

| !! article
| Main Page
| !! text
| blah blah
| !! endarticle

or

| TestParser ("External links: trail", "Linktrails should not work for external 
links: [http://example.com link]s", "<p>Linktrails should not work for external 
links: <a href=\"http://example.com\" class=\"external text\" 
rel=\"nofollow\">link</a>s\n</p>");

than

| !! test
| External links: trail
| !! input
| Linktrails should not work for external links: [http://example.com link]s
| !! result
| <p>Linktrails should not work for external links: <a 
href="http://example.com" class="external text" rel="nofollow">link</a>s
| </p>
| !! end

I think the motivation for using standard techniques is to
lower the bar for newcomers to the code and not have them
master another (fairly self-explanatory) syntax.

Tim


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] MW test infrastructure architecture

2009-08-12 Thread Chad
On Wed, Aug 12, 2009 at 6:40 PM, Tim Landscheidt <t...@tim-landscheidt.de> wrote:
 Chad innocentkil...@gmail.com wrote:

 [...]
 What we need is something similar to parser tests, where it's
 absurdly easy to pop new tests in with little to no coding
 required at all. [...]

 Is it really so much more difficult to write

 | TestAddArticle ("Main Page", "blah blah");

 than

 | !! article
 | Main Page
 | !! text
 | blah blah
 | !! endarticle

 or

 | TestParser ("External links: trail", "Linktrails should not work for 
 external links: [http://example.com link]s", "<p>Linktrails should not work 
 for external links: <a href=\"http://example.com\" class=\"external text\" 
 rel=\"nofollow\">link</a>s\n</p>");

 than

 | !! test
 | External links: trail
 | !! input
 | Linktrails should not work for external links: [http://example.com link]s
 | !! result
 | <p>Linktrails should not work for external links: <a 
 href="http://example.com" class="external text" rel="nofollow">link</a>s
 | </p>
 | !! end

 I think the motivation for using standard techniques is to
 lower the bar for newcomers to the code and not have them
 to master another (fairly self-explanatory) syntax.

 Tim


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


I don't disagree here. I don't really care what the particular format
is. My main point (on which I think we agree) was that it needs to
be standardized, easy to learn and use, and flexible enough to cover the
variety of tests we could potentially want.

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Do no harm

2009-08-12 Thread Tim Landscheidt
Brion Vibber br...@wikimedia.org wrote:

 Here's what I do in similar circumstances.  Create another variable,
 $wgScriptPathEscaped.  Then, gradually make the changes.  Wait for tests.
 Change some more.  Eventually, most of the old ones will be gone.

 $wgScriptPath is a URL fragment, and by definition MUST be properly
 URL-escaped.

 There is no need for a second variable; it just needs to be initialized
 correctly -- something that never got done because it's extremely rare
 to stick a web app in a folder that has characters that would need escaping.

It just needs to be *documented* correctly :-).
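
For illustration (the directory name is hypothetical), "properly URL-escaped"
means the value must already be percent-encoded:

$wgScriptPath = "/my wiki";    // wrong: the raw space ends up in generated URLs
$wgScriptPath = "/my%20wiki";  // correct: the URL fragment is already escaped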

Tim


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Roan Kattouw
2009/8/12 Chad innocentkil...@gmail.com:
 To elaborate on the final point. Sometimes the parser is changed and
 it breaks output on purpose. Case in point was when Tim rewrote the
 preprocessor. Some parts of syntax were intentionally changed. You'd
 have to establish a new baseline for this new behavior at that point.

Another (but fundamentally different) case in point is when Aryeh
removed type= on script tags or when I added class=tocsection-n
(n being the section number) to the TOC <li>s. In some cases you want
to add/remove an attribute or otherwise intentionally change the HTML
output, and if that output is covered by parser tests those tests need
to be updated. When updating parser test cases it's easy enough to
tell the difference between intentional and accidental changes, but
this is not so apparent when comparing pre- and post-change parses of
big wiki pages (like when Tim added class=mw-redirect to all links
to redirects, which cluttered Special:ParserDiffTest output).

 Finally, I don't think we need to dump all of enwiki. It can't require that
 much content to describe the various combinations of wiki syntax...

Exactly. Instead of throwing a huge amount of wikitext at it and
hoping that'll cover everything, we should make our test suite more
comprehensive by adding lots of new parser tests. Of course there'll be
*some* crazy bugs concerning weird interactions that only happen in
the wild, you can't really write tests for those, but that's life.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] SSL certificates for Wikimedia sites

2009-08-12 Thread Brion Vibber
On 8/12/09 3:49 PM, Tim Landscheidt wrote:
 Brion Vibber <br...@wikimedia.org> wrote:

 [...]
 I'm primarily thinking about the certificates for:

 - wikitech.leuksman.com

 This is an old link from when we stuck our tech doc wiki on my personal
 site for a while; you'll see there's a nicer cert at the permanent URL:
 https://wikitech.wikimedia.org/
 [...]

 Hmmm, the latter now shows a self-signed certificate again?

Yeah, but it's got the right URL at least! ;)

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Brion Vibber
On 8/12/09 2:55 PM, Chad wrote:
 To elaborate on the final point. Sometimes the parser is changed and
 it breaks output on purpose. Case in point was when Tim rewrote the
 preprocessor. Some parts of syntax were intentionally changed. You'd
 have to establish a new baseline for this new behavior at that point.

Knowing when things change is important -- you want to know that things 
changed *on purpose* not *by accident*.

And yes, that means that baseline shifts can be painful as we have to go 
through and see if any of the changes were regressions, but that's the 
price of knowing what's going on. ;)

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] SSL certificates for Wikimedia sites

2009-08-12 Thread George Herbert
Unsecure Sockets Layer?

(I'll shut up now 8-)

On Wed, Aug 12, 2009 at 4:12 PM, Brion Vibber <br...@wikimedia.org> wrote:
 On 8/12/09 3:49 PM, Tim Landscheidt wrote:
 Brion Vibber <br...@wikimedia.org> wrote:

 [...]
 I'm primarily thinking about the certificates for:

 - wikitech.leuksman.com

 This is an old link from when we stuck our tech doc wiki on my personal
 site for a while; you'll see there's a nicer cert at the permanent URL:
 https://wikitech.wikimedia.org/
 [...]

 Hmmm, the latter now shows a self-signed certificate again?

 Yeah, but it's got the right URL at least! ;)

 -- brion

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
-george william herbert
george.herb...@gmail.com

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] A comprehensive parser regression test

2009-08-12 Thread Brion Vibber
On 8/12/09 4:05 PM, Roan Kattouw wrote:
 Exactly. Instead of throwing a huge amount of wikitext at it and
 hoping that'll cover everything, we should make our test suite more
 comprehensive by adding lots of new parser test. Of course there'll be
 *some* crazy bugs concerning weird interactions that only happen in
 the wild, you can't really write tests for those, but that's life.

Real-page testing is like fuzz testing; the point is never to replace 
explicit tests, but to help you find things you didn't think of 
yourself.

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Batik SVG-to-PNG server revisited

2009-08-12 Thread hk_kng
Helder Geovane kindly asked me to post this here, after I published a comment at
http://meta.wikimedia.org/wiki/Talk:SVG_image_support#Batik_server_revisited

This started out when a colleague posted a proposal on the strategy wiki to throw
some foundation money at the active development of rsvg. I wanted to counter
this with an old question: why not use Batik instead? The decision for rsvg in
2006 was based on the assumption that a Java application was a bad idea due to
overhead. At that time, a user proposed an HTTP server app based on Simple
(http://www.simplframework.org). He published a demonstration version
(http://batikd.sourceforge.net/), but no one reacted, and his work stopped. Now I
have taken a look at his approach, and I find the results are not that bad for
an early development stage: in most cases, the batikd rendering times are two or
three times those of rsvg, and in some specialized cases (extensive use of
filters), batikd is even faster.

I would say that, like rsvg improvements, a Batik server for a production environment
would need considerable work, and I am frankly not qualified to do it. So,
regarding the funding proposal, my position is that there are multiple
avenues to get better SVG support. (I have seen the comments about svgweb,
another project that seems to need work before it is usable.)

This post here is only meant as an encouragement if someone who knows more about
server development than I do would like to take a deeper look.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Assumptions for development machines (w.r.t. to test infrastructure)

2009-08-12 Thread Aryeh Gregor
On Wed, Aug 12, 2009 at 10:59 AM, dan nessett <dness...@yahoo.com> wrote:
 If a developer has more system privileges than a production admin, to what 
 extent? Can we assume he has root access? If not, can we assume he can get 
 someone who has to do things like install PHPUnit? Can we assume the 
 availability of PERL or should we only assume PHP? Can we assume *AMP (e.g., 
 LAMP, WAMP, MAMP, XAMP)? Can we assume PEAR? Can the developer install into 
 PEAR?

We should *try* to have no extra dependencies.  Every extra dependency
means fewer people will be able to run the tests.  If a reasonable
dependency would gain us a lot in some other way, that might be
acceptable, depending on what we actually gain from it.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l