[Wikitech-l] CPRT feasibility

2009-08-20 Thread dan nessett
I am looking into the feasibility of writing a comprehensive parser regression 
test (CPRT). Before writing code, I thought I would try to get some idea of how 
well such a tool would perform and what gotchas might pop up. An easy first 
step is to run dumpHTML and capture some data and statistics.

I tried to run the version of dumpHTML in r54724, but it failed. So, I went 
back to 1.14 and ran that version against a small personal wiki database I 
have. I did this to get an idea of what structures dumpHTML produces and also 
to collect some performance data with which to project runtime and resource 
usage.

I ran dumpHTML twice using the same MW version and same database. I then diff'd 
the two directories produced. One would expect no differences, but that 
expectation is wrong. I got a bunch of diffs of the following form (I have put 
a newline between the two file names to shorten the line length):

diff -r 
HTML_Dump/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
 
HTML_Dump2/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
77,78c77,78
< Post-expand include size: 16145/2097152 bytes
< Template argument size: 12139/2097152 bytes
---
> Post-expand include size: 16235/2097152 bytes
> Template argument size: 12151/2097152 bytes

I looked at one of the HTML files to see where these differences appear. They 
occur in an HTML comment:

<!-- 
NewPP limit report
Preprocessor node count: 1891/1000000
Post-expand include size: 16145/2097152 bytes
Template argument size: 12139/2097152 bytes
Expensive parser function count: 0/100
-->

Does anyone have an idea of what this is for? Is there any way to configure MW 
so it isn't produced?

I will post some performance data later.

Dan



Re: [Wikitech-l] CPRT feasibility

2009-08-20 Thread Andrew Garrett

On 20/08/2009, at 6:19 PM, dan nessett wrote:
 <!--
 NewPP limit report
 Preprocessor node count: 1891/1000000
 Post-expand include size: 16145/2097152 bytes
 Template argument size: 12139/2097152 bytes
 Expensive parser function count: 0/100
 -->

 Does anyone have an idea of what this is for? Is there any way to
 configure MW so it isn't produced?

As the title implies, it is a performance limit report. You can remove
it by changing the parser options passed to the parser. Look at the
ParserOptions and Parser classes.
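
For example, something like this (an untested sketch from memory of the
1.14-era API, where enableLimitReport() is the wfSetVar-style setter on
the mEnableLimitReport member; $wikitext and $title are assumed to be
in scope already):

$options = new ParserOptions();
// Suppress the <!-- NewPP limit report --> comment in the output.
$options->enableLimitReport( false );

$parser = new Parser();
$output = $parser->parse( $wikitext, $title, $options );
$html = $output->getText(); // rendered HTML, now without the report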

--
Andrew Garrett
agarr...@wikimedia.org
http://werdn.us/




Re: [Wikitech-l] CPRT feasibility

2009-08-20 Thread dan nessett
--- On Thu, 8/20/09, Andrew Garrett agarr...@wikimedia.org wrote:

 As the title implies, it is a performance limit report. You
 can remove it by changing the parser options passed to the parser.
 Look at the ParserOptions and Parser classes.

Thanks. It appears dumpHTML has no command-line option to turn off this report 
(the corresponding parser option is mEnableLimitReport).

A question for the developer community: is it better to change dumpHTML to 
accept a new option (to turn off limit reports), or to copy dumpHTML into a new 
CPRT extension and change the copy? I strongly feel that having two extensions 
with essentially the same functionality is bad practice. On the other hand, 
changing dumpHTML makes it dual-purpose, which has the potential of making it 
big and ugly. One compromise is to factor dumpHTML so that a common core 
provides shared functionality to two different upper layers, along the lines 
of the sketch below. However, I don't know whether that is acceptable practice 
for extensions.
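
Purely hypothetical sketch of that factoring (none of these class or
method names exist in dumpHTML today; the stubs only mark where its
existing logic would go):

abstract class DumpCore {
    // Shared machinery: walk every article, render it, and hand the
    // HTML to the front end.
    public function run() {
        foreach ( $this->getTitles() as $title ) {
            $this->handleOutput( $title, $this->renderPage( $title ) );
        }
    }

    protected function getTitles() {
        return array(); // would wrap dumpHTML's page enumeration
    }

    protected function renderPage( $title ) {
        return ''; // would wrap dumpHTML's parse-and-skin step
    }

    // Each front end decides what happens to the rendered HTML.
    abstract protected function handleOutput( $title, $html );
}

class StaticDump extends DumpCore {
    protected function handleOutput( $title, $html ) {
        // write $html into the static dump tree, as dumpHTML does now
    }
}

class CPRTDump extends DumpCore {
    protected function handleOutput( $title, $html ) {
        // normalize (e.g. strip the limit report) and save for diffing
    }
}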

A short-term fix is to pipe the output of dumpHTML through a filter that 
removes the limit report, along the lines of the sketch below. That would 
allow developers to use dumpHTML (as a CPRT) fairly quickly to find and fix 
the known-to-fail parser bugs. The downside is that it may significantly 
degrade performance.
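
For example (a sketch only; the script name and invocation are
illustrative, not part of dumpHTML):

<?php
// strip-limit-report.php: remove the <!-- NewPP limit report ... -->
// comment from HTML read on stdin, write the rest to stdout.
$html = stream_get_contents( STDIN );
echo preg_replace( '/<!--\s*NewPP limit report.*?-->\s*/s', '', $html );

which could then be run over each generated file, e.g.

php strip-limit-report.php < page.html > page.filtered.html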

Dan

