Hi gang:
> Is that really so with the code additions? Stay tuned for the next
> episode of this post-release fun series ;-)
Of course, the overall number of lines of code per module cannot
measure any in situ code modifications. The coding activity in
various modules can be better grasped by also considering (i) the
number of commits and (ii) the number of line changes (additions,
modifications, deletions) per module.
Firstly, let us exclude from the analysis all "generated" files that
we store in CVS, namely *.po, *.pot, *.rdf and intbitset.c files.
These files could alter the stats considerably, e.g. updating PO files
with new message references basically means updating 70,000+ LOCs,
but this update is quite effortless thanks to the GNU gettext
infrastructure.
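For illustration, the exclusion of generated files could be sketched as below. This is not the actual analysis tool, only a minimal sketch assuming shell-style patterns matched against the file name; the helper name `is_generated` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
import posixpath

# Patterns for the "generated" files named above.
GENERATED_PATTERNS = ("*.po", "*.pot", "*.rdf", "intbitset.c")

def is_generated(path):
    """Return True if the file name matches one of the generated-file patterns."""
    name = posixpath.basename(path)
    return any(fnmatchcase(name, pattern) for pattern in GENERATED_PATTERNS)

# Hypothetical example paths, for illustration only.
paths = [
    "po/fr.po",
    "miscutil/lib/intbitset.c",
    "miscutil/lib/intbitset.pyx",
    "websearch/lib/search_engine.py",
]
kept = [p for p in paths if not is_generated(p)]
print(kept)
```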
Secondly, it is not easy to distinguish line additions and line
deletions from line modifications. Every commit is therefore
represented by a number of lines "added" and lines "deleted", with a
line modification being represented by a line deletion (of the old
line version) and a line addition (of the new line version). Hence
the net difference of "lines added minus lines deleted" can be
positive in case new code was mostly added, zero in case the existing
code was only modified, and negative in case the old code was mostly
refactored away.
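The added/deleted bookkeeping can be sketched as follows. This is a minimal sketch, not the actual tool: it assumes input in the tab-separated form printed by `git log --numstat` (added, deleted, path; "-" for binary files) and assumes, hypothetically, that the module is the first path component. The sample data is made up.

```python
from collections import defaultdict

# Hypothetical sample in `git log --numstat` form: added<TAB>deleted<TAB>path.
SAMPLE_NUMSTAT = (
    "10\t0\twebsearch/lib/search_engine.py\n"   # pure addition
    "3\t3\twebsearch/lib/search_engine.py\n"    # pure modification: net zero
    "0\t7\telmsubmit/lib/elmsubmit.py\n"        # pure deletion
    "-\t-\twebstyle/img/logo.png\n"             # binary file, no line counts
)

def tally(numstat_text):
    """Sum added and deleted line counts per first path component."""
    totals = defaultdict(lambda: [0, 0])  # module -> [added, deleted]
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":
            continue  # skip blank or malformed lines and binary files
        added, deleted, path = int(parts[0]), int(parts[1]), parts[2]
        module = path.split("/", 1)[0]
        totals[module][0] += added
        totals[module][1] += deleted
    # Report (added, deleted, net) where net = added - deleted.
    return {m: (a, d, a - d) for m, (a, d) in totals.items()}

result = tally(SAMPLE_NUMSTAT)
print(result)
```

Note how the modified lines inflate both the "added" and "deleted" counts while leaving the net difference untouched.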
Thirdly, let us sort the results primarily not by the number of
commits, but by the number of line "additions" (= number of lines
added and/or modified) which is arguably a better representation of the
inherent coding activity, at least to a certain degree. (This is also
what the StatCVS software uses when it estimates the "Lines of Code
per Author".)
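To see why the sort key matters, here is a small sketch using three rows taken from the per-module results table in this post. The tuple layout is merely an assumption made for the example.

```python
# (module, commits, LOCs added, LOCs deleted), copied from the module table.
rows = [
    ("websearch", 441, 37372, 50388),
    ("miscutil", 574, 21674, 13136),
    ("websubmit", 710, 19407, 12031),
]

# Primary ordering used in this post: lines added and/or modified.
by_added = sorted(rows, key=lambda r: r[2], reverse=True)
# Alternative ordering: number of commits.
by_commits = sorted(rows, key=lambda r: r[1], reverse=True)

print([r[0] for r in by_added])
print([r[0] for r in by_commits])
```

For these three modules the two orderings are exactly reversed.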
So, after this necessary intro, here are the results obtained by my
little git log analysis tool:
>>> Commits and LOC changes, from 0.92.1 to 0.99.0:
Module # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
websearch 441 +37372 -50388 -13016
miscutil 574 +21674 -13136 8538
websubmit 710 +19407 -12031 7376
bibformat 576 +10023 -22073 -12050
webaccess 284 +8933 -6697 2236
webhelp 167 +8909 -8791 118
webjournal 155 +6400 -2248 4152
webstyle 221 +5508 -2677 2831
websession 275 +5480 -4547 933
bibedit 134 +5273 -3299 1974
bibupload 88 +4833 -2899 1934
bibrank 288 +4326 -4859 -533
OTHER 261 +3979 -5877 -1898
bibharvest 159 +3890 -3905 -15
bibindex 114 +3447 -3083 364
bibconvert 97 +3225 -2118 1107
webstat 69 +2905 -764 2141
bibsched 68 +1689 -1371 318
bibclassify 88 +1545 -1265 280
webcomment 94 +1265 -1083 182
webbasket 59 +1084 -918 166
webalert 78 +710 -731 -21
elmsubmit 76 +593 -11218 -10625
webmessage 56 +568 -545 23
bibmatch 25 +228 -265 -37
----------- ---------- -------------- -------------- --------------
TOTAL 5157 +163266 -166788 -3522
We can see an overall decrease in LOCs. How come? This stems from
us moving away from WML files (mostly to WebDoc, some to Python), from
IN files (mostly to Python files), and from removing some unnecessary
files (old PHP sources, old magic in ElmSubmit). Indeed, let us look
at the numbers broken down by file extension:
Filetype # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
py 2986 +81962 -50204 31758
webdoc 594 +51479 -27522 23957
xml 14 +6236 -331 5905
wml 348 +5874 -59119 -53245
sql 86 +4581 -2182 2399
am 514 +2167 -2381 -214
xsl 43 +2044 -410 1634
NONE 76 +1429 -1673 -244
in 126 +1275 -4477 -3202
pyx 31 +1237 -478 759
conf 13 +1053 -298 755
html 17 +876 -20 856
c 16 +645 -595 50
tpl 79 +581 -476 105
css 24 +539 -294 245
ac 40 +262 -304 -42
cfg 15 +230 -41 189
bft 24 +196 -173 23
js 3 +157 -17 140
h 14 +124 -84 40
tex 4 +81 -41 40
sed 2 +67 -67 0
cvsignore 37 +61 -77 -16
bfx 9 +40 -323 -283
el 5 +25 -11 14
sh 2 +24 -24 0
bfo 8 +8 -4 4
kb 7 +7 -8 -1
lisp 2 +4 -4 0
dtd 1 +2 -2 0
m4 1 +0 -502 -502
php 13 +0 -4541 -4541
KB 1 +0 -12 -12
magic 1 +0 -17 -17
ext 1 +0 -10076 -10076
----------- ---------- -------------- -------------- --------------
TOTAL 5157 +163266 -166788 -3522
We can see the massive disappearance of WML, EXT, PHP and IN files.
BTW, the huge number of LOC deletions for WebDoc files is thanks to a
smarter handling of I18N translations in WebDoc compared to WML:
20,002 LOCs = websearch/doc/guide.html.wml
5,251 LOCs = websearch/doc/search-guide.webdoc
and this despite the fact that the Search Guide is now fully
translated into French too.
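The file type buckets in the extension table could be derived roughly as below. This is only a sketch under assumptions inferred from the table labels: the extension is whatever follows the last dot of the file name, case is preserved (hence separate "kb" and "KB" rows), and files without a dot fall into "NONE". The function name `filetype` and the sample file names are hypothetical.

```python
import posixpath

def filetype(path):
    """Return the text after the last dot of the file name, or "NONE"."""
    name = posixpath.basename(path)
    _, dot, ext = name.rpartition(".")
    return ext if dot and ext else "NONE"

# Hypothetical file names, for illustration only.
for path in ("websearch/lib/search_engine.py", "configure.ac",
             ".cvsignore", "Makefile", "etc/example.KB"):
    print(path, "->", filetype(path))
```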
Let us look now more specifically at the Python source files, which
can be somewhat more representative of the coding activity:
>>> Commits and LOC changes from 0.92.1 to 0.99.0, *.py files only:
Module # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
websubmit 503 +15188 -7590 7598
websearch 303 +6463 -4838 1625
webjournal 130 +6203 -2219 3984
webaccess 179 +6180 -4048 2132
miscutil 229 +6107 -2736 3371
bibformat 347 +5847 -4646 1201
websession 238 +5241 -3585 1656
bibedit 99 +4862 -2878 1984
bibupload 61 +4292 -2454 1838
webstyle 140 +3548 -1714 1834
bibharvest 124 +3452 -2836 616
bibindex 82 +2838 -2468 370
bibrank 181 +2645 -2450 195
webstat 33 +2546 -390 2156
bibsched 44 +1542 -1188 354
webcomment 68 +1139 -917 222
webbasket 39 +985 -781 204
bibclassify 21 +600 -343 257
webalert 53 +541 -525 16
bibconvert 21 +493 -340 153
elmsubmit 40 +478 -531 -53
webmessage 33 +457 -398 59
OTHER 15 +244 -258 -14
bibmatch 3 +71 -71 0
----------- ---------- -------------- -------------- --------------
TOTAL 2986 +81962 -50204 31758
We can see that, indeed, WebSubmit received the most coding activity
in terms of lines added and/or modified too, as in the previous
episode; but now it is WebSearch that is the second most active module,
in spite of its lower net LOC change.
(Beginning of parenthesis. If you read carefully, maybe you have
noticed a difference in the net LOC change numbers when compared to
the previous episode. There, the LOC numbers came from
kwalitee-related stats, measured by including also Pythonic-like
bin/*.in files and by excluding Pythonic test suite files. Here, I
include any *.py files, which is why the net numbers are slightly
different. But they are mutually consistent, e.g. for WebSubmit: 7362
LOCs (kwalitee-based net change, episode 1) plus 210 LOCs
(thumbmaker.in in 0.92.1) plus 177 LOCs (tests in 0.99.0) minus 151
LOCs (tests in 0.92.1) equals 7598 LOCs, which is exactly the net
LOC change detected by the git log analysis tool here in episode 2.
End of parenthesis.)
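The WebSubmit reconciliation in the parenthesis above can be checked mechanically. The variable names are only labels chosen for this sketch; the numbers are the ones quoted in the text.

```python
kwalitee_net_change = 7362   # kwalitee-based net change, episode 1
thumbmaker_in_0_92_1 = 210   # thumbmaker.in in 0.92.1
tests_in_0_99_0 = 177        # tests in 0.99.0
tests_in_0_92_1 = 151        # tests in 0.92.1

git_log_net_change = (kwalitee_net_change + thumbmaker_in_0_92_1
                      + tests_in_0_99_0 - tests_in_0_92_1)
print(git_log_net_change)
```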
So, while the LOC numbers and the module order are not to be directly
compared to the ones cited in the previous episode (mostly due to the
exclusion of the test code), we clearly see that e.g. WebSearch
development was actually "more active" than BibEdit development, due
to higher "# LOCs added" (read: lines added and/or modified) in spite
of lower "# LOCs add-del" (read: net lines added), even after we exclude
the test code increase:
>>> Commits and LOC changes from 0.92.1 to 0.99.0, *_tests.py only:
Module # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
bibupload 22 +2261 -676 1585
miscutil 43 +1110 -485 625
websearch 51 +845 -494 351
bibformat 35 +478 -373 105
websession 29 +390 -150 240
webaccess 16 +246 -105 141
webbasket 6 +226 -80 146
bibharvest 19 +195 -120 75
bibrank 31 +189 -162 27
bibedit 15 +122 -116 6
bibconvert 7 +121 -56 65
bibindex 13 +105 -70 35
bibclassify 5 +85 -22 63
websubmit 10 +82 -56 26
webcomment 10 +58 -48 10
webstyle 6 +41 -28 13
elmsubmit 3 +25 -28 -3
webalert 3 +15 -15 0
webmessage 3 +15 -15 0
----------- ---------- -------------- -------------- --------------
TOTAL 327 +6609 -3099 3510
Here the same three modules come out on top with respect to both (i) the
net test suite LOC change (read: test code additions) and (ii) the test
suite LOC "dynamism" (read: test code additions and/or modifications).
Finally, what about the most hacked-upon Python files?
>>> Commits and LOC changes per file from 0.92.1 to 0.99.0, *.py only:
Sorted by the number of lines added and/or modified:
File # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
bibedit/lib/refextract.py 38 +3413 -1914 1499
bibupload/lib/bibupload_regression_tests.py 22 +2261 -676 1585
webaccess/lib/webaccessadmin_lib.py 32 +2133 -1850 283
websubmit/lib/websubmit_file_stamper.py 9 +2069 -700 1369
webjournal/lib/webjournal_utils.py 32 +1925 -688 1237
bibupload/lib/bibupload.py 30 +1862 -1610 252
webstyle/lib/webdoc.py 35 +1840 -938 902
websearch/lib/search_engine.py 82 +1665 -1166 499
websubmit/lib/bibdocfile.py 25 +1629 -238 1391
webjournal/lib/webjournal_webinterface.py 28 +1621 -996 625
miscutil/lib/inveniocfg.py 47 +1621 -617 1004
websubmit/lib/websubmit_templates.py 18 +1535 -510 1025
websubmit/web/publiline.py 24 +1373 -171 1202
webaccess/lib/access_control_admin.py 23 +1321 -945 376
bibharvest/lib/oaiarchive_engine.py 18 +1290 -1085 205
websearch/lib/websearch_templates.py 49 +1255 -821 434
bibindex/lib/bibindex_engine.py 29 +1196 -1370 -174
bibsched/lib/bibtask.py 19 +1110 -644 466
websubmit/lib/websubmit_engine.py 34 +1106 -667 439
[...]
Sorted by the number of commits:
File # Commits # LOCs added # LOCs deleted # LOCs add-del
----------- ---------- -------------- -------------- --------------
websearch/lib/search_engine.py 82 +1665 -1166 499
websession/lib/webuser.py 51 +860 -608 252
websearch/lib/websearch_templates.py 49 +1255 -821 434
miscutil/lib/inveniocfg.py 47 +1621 -617 1004
websession/lib/websession_templates.py 42 +809 -641 168
websession/lib/websession_webinterface.py 39 +716 -475 241
bibedit/lib/refextract.py 38 +3413 -1914 1499
websearch/lib/websearch_webinterface.py 37 +799 -430 369
websearch/lib/websearch_regression_tests.py 35 +726 -390 336
webstyle/lib/webdoc.py 35 +1840 -938 902
websubmit/lib/websubmit_engine.py 34 +1106 -667 439
webaccess/lib/access_control_config.py 33 +245 -183 62
webjournal/lib/webjournal_utils.py 32 +1925 -688 1237
webaccess/lib/webaccessadmin_lib.py 32 +2133 -1850 283
webstyle/lib/webstyle_templates.py 31 +420 -250 170
bibupload/lib/bibupload.py 30 +1862 -1610 252
websearch/lib/websearch_webcoll.py 30 +540 -700 -160
bibindex/lib/bibindex_engine.py 29 +1196 -1370 -174
bibrank/lib/bibrank_record_sorter.py 29 +292 -358 -66
[...]
In the next episode of this post-release fun series, we shall look at
some developer-related stats...
Best regards
--
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>