Hi gang:

> Is that really so with the code additions?  Stay tuned for the next
> episode of this post-release fun series ;-)

Of course, the overall number of lines of code per module cannot
measure any in situ code modifications.  The coding activity in
various modules can be better grasped by also considering (i) the
number of commits and (ii) the number of line changes (additions,
modifications, deletions) per module.

Firstly, let us exclude from the analysis all "generated" files that
we store in CVS, namely *.po, *.pot, *.rdf and intbitset.c files.
These files could alter the stats considerably, e.g. updating PO files
with new message references basically means updating 70,000+ LOCs,
but this update is quite effortless thanks to the GNU gettext
infrastructure.
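
The exclusion rule above could be sketched roughly as follows.  This
is only an illustration, not the actual analysis tool: the helper name
is_generated() is hypothetical, and matching on the base name of the
path is an assumption.

```python
from fnmatch import fnmatchcase

# Patterns for the generated files named above.  Matching against the
# base name only is an assumption of this sketch.
GENERATED_PATTERNS = ("*.po", "*.pot", "*.rdf", "intbitset.c")


def is_generated(path):
    """Return True if the path looks like one of the generated files."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in GENERATED_PATTERNS)
```

Any file for which is_generated() returns True would simply be skipped
before its line counts are added to the totals.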

Secondly, it is not easy to distinguish line modifications from line
additions and line deletions.  Every commit is therefore represented
by a number of lines "added" and lines "deleted", with a line
modification being represented by a line deletion (of the old line
version) and a line addition (of the new line version).  Hence the net
difference of "lines added minus lines deleted" can be positive in
case new code was mostly added, zero in case the existing code was
only modified, and negative in case the old code was mostly refactored
away.
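
For illustration, this bookkeeping could be sketched as below.  It is
not the actual analysis tool.  The sketch assumes input produced by
something like "git log --numstat --pretty=format:@%H" (the "@" marker
is just a convention chosen here), assumes sources live under
modules/<module>/, and counts a commit once for every module it
touches.  The function name aggregate_numstat() is hypothetical.

```python
from collections import defaultdict


def aggregate_numstat(log_text):
    """Sum added/deleted lines and count commits per module.

    Each commit is opened by a line starting with "@", followed by
    "added<TAB>deleted<TAB>path" lines as printed by --numstat.
    """
    stats = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
    seen = set()  # modules already counted for the current commit
    for line in log_text.splitlines():
        if line.startswith("@"):
            seen = set()  # a new commit starts here
            continue
        fields = line.split("\t")
        if len(fields) != 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # blank lines and binary files ("-<TAB>-<TAB>path")
        added, deleted, path = int(fields[0]), int(fields[1]), fields[2]
        parts = path.split("/")
        # Assumption: anything outside modules/<module>/ goes to OTHER.
        module = parts[1] if parts[0] == "modules" and len(parts) > 2 else "OTHER"
        entry = stats[module]
        entry["added"] += added
        entry["deleted"] += deleted
        if module not in seen:
            entry["commits"] += 1
            seen.add(module)
    for entry in stats.values():
        # A modified line counts once as added and once as deleted,
        # so pure modifications leave the net difference at zero.
        entry["net"] = entry["added"] - entry["deleted"]
    return dict(stats)
```

A commit that only modifies three existing lines would show up as +3
and -3, i.e. a net change of zero.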

Thirdly, let us sort the results primarily not by the number of
commits, but by the number of line "additions" (= number of lines
added and/or modified) which is arguably a better representation of the
inherent coding activity, at least to a certain degree.  (This is also
what the StatCVS software uses when it estimates the "Lines of Code
per Author".)
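
To illustrate how much the choice of sort key matters, here is a small
sketch using three rows copied from the per-module table in this
message.  The tuple layout is chosen only for this example.

```python
# (module, commits, LOCs added), copied from the per-module table.
rows = [
    ("websubmit", 710, 19407),
    ("miscutil", 574, 21674),
    ("websearch", 441, 37372),
]

# Sort in descending order by commits, then by lines added.
by_commits = sorted(rows, key=lambda row: row[1], reverse=True)
by_added = sorted(rows, key=lambda row: row[2], reverse=True)

print([row[0] for row in by_commits])  # ['websubmit', 'miscutil', 'websearch']
print([row[0] for row in by_added])    # ['websearch', 'miscutil', 'websubmit']
```

For these three modules the two orderings are exactly reversed.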

So, after this necessary intro, here are the results obtained by my
little git log analysis tool:

>>> Commits and LOC changes, from 0.92.1 to 0.99.0:

         Module  # Commits   # LOCs added # LOCs deleted # LOCs add-del
    ----------- ---------- -------------- -------------- --------------
      websearch        441         +37372         -50388         -13016
       miscutil        574         +21674         -13136           8538
      websubmit        710         +19407         -12031           7376
      bibformat        576         +10023         -22073         -12050
      webaccess        284          +8933          -6697           2236
        webhelp        167          +8909          -8791            118
     webjournal        155          +6400          -2248           4152
       webstyle        221          +5508          -2677           2831
     websession        275          +5480          -4547            933
        bibedit        134          +5273          -3299           1974
      bibupload         88          +4833          -2899           1934
        bibrank        288          +4326          -4859           -533
          OTHER        261          +3979          -5877          -1898
     bibharvest        159          +3890          -3905            -15
       bibindex        114          +3447          -3083            364
     bibconvert         97          +3225          -2118           1107
        webstat         69          +2905           -764           2141
       bibsched         68          +1689          -1371            318
    bibclassify         88          +1545          -1265            280
     webcomment         94          +1265          -1083            182
      webbasket         59          +1084           -918            166
       webalert         78           +710           -731            -21
      elmsubmit         76           +593         -11218         -10625
     webmessage         56           +568           -545             23
       bibmatch         25           +228           -265            -37
    ----------- ---------- -------------- -------------- --------------
          TOTAL       5157        +163266        -166788          -3522


We can see an overall decrease in LOCs.  How come?  This stems from
our moving away from WML files (mostly to WebDoc, some to Python) and
from IN files (mostly to Python files), and from our removing some
unnecessary files (old PHP sources, old magic in ElmSubmit).  Indeed,
let us look at the numbers broken down by file extension:

       Filetype  # Commits   # LOCs added # LOCs deleted # LOCs add-del
    ----------- ---------- -------------- -------------- --------------
             py       2986         +81962         -50204          31758
         webdoc        594         +51479         -27522          23957
            xml         14          +6236           -331           5905
            wml        348          +5874         -59119         -53245
            sql         86          +4581          -2182           2399
             am        514          +2167          -2381           -214
            xsl         43          +2044           -410           1634
           NONE         76          +1429          -1673           -244
             in        126          +1275          -4477          -3202
            pyx         31          +1237           -478            759
           conf         13          +1053           -298            755
           html         17           +876            -20            856
              c         16           +645           -595             50
            tpl         79           +581           -476            105
            css         24           +539           -294            245
             ac         40           +262           -304            -42
            cfg         15           +230            -41            189
            bft         24           +196           -173             23
             js          3           +157            -17            140
              h         14           +124            -84             40
            tex          4            +81            -41             40
            sed          2            +67            -67              0
      cvsignore         37            +61            -77            -16
            bfx          9            +40           -323           -283
             el          5            +25            -11             14
             sh          2            +24            -24              0
            bfo          8             +8             -4              4
             kb          7             +7             -8             -1
           lisp          2             +4             -4              0
            dtd          1             +2             -2              0
             m4          1             +0           -502           -502
            php         13             +0          -4541          -4541
             KB          1             +0            -12            -12
          magic          1             +0            -17            -17
            ext          1             +0         -10076         -10076
    ----------- ---------- -------------- -------------- --------------
          TOTAL       5157        +163266        -166788          -3522

We can see the massive disappearance of WML, EXT, PHP and IN files.
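
The "Filetype" labels in the table above are consistent with a simple
rule: take whatever follows the last dot of the file name,
case-sensitively, and use NONE when there is no dot.  The sketch below
is a guess at such a rule, not the actual tool, and the function name
filetype() is hypothetical.

```python
def filetype(path):
    """Return the text after the last dot of the file name, or "NONE"."""
    basename = path.rsplit("/", 1)[-1]
    if "." not in basename:
        return "NONE"
    return basename.rsplit(".", 1)[-1]
```

Note that os.path.splitext() would not give the same labels, since it
treats a leading-dot name such as .cvsignore as having no extension,
whereas the table lists "cvsignore" as a file type of its own.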

BTW, the huge number of LOC deletions for WebDoc files is thanks to
the smarter handling of I18N translations in WebDoc as compared to
WML:

    20,002 LOCs = websearch/doc/guide.html.wml
     5,251 LOCs = websearch/doc/search-guide.webdoc

and this despite the fact that the Search Guide is now fully
translated into French too.

Let us look now more specifically at the Python source files, which
can be somewhat more representative of the coding activity:

>>> Commits and LOC changes from 0.92.1 to 0.99.0, *.py files only:

         Module  # Commits   # LOCs added # LOCs deleted # LOCs add-del
    ----------- ---------- -------------- -------------- --------------
      websubmit        503         +15188          -7590           7598
      websearch        303          +6463          -4838           1625
     webjournal        130          +6203          -2219           3984
      webaccess        179          +6180          -4048           2132
       miscutil        229          +6107          -2736           3371
      bibformat        347          +5847          -4646           1201
     websession        238          +5241          -3585           1656
        bibedit         99          +4862          -2878           1984
      bibupload         61          +4292          -2454           1838
       webstyle        140          +3548          -1714           1834
     bibharvest        124          +3452          -2836            616
       bibindex         82          +2838          -2468            370
        bibrank        181          +2645          -2450            195
        webstat         33          +2546           -390           2156
       bibsched         44          +1542          -1188            354
     webcomment         68          +1139           -917            222
      webbasket         39           +985           -781            204
    bibclassify         21           +600           -343            257
       webalert         53           +541           -525             16
     bibconvert         21           +493           -340            153
      elmsubmit         40           +478           -531            -53
     webmessage         33           +457           -398             59
          OTHER         15           +244           -258            -14
       bibmatch          3            +71            -71              0
    ----------- ---------- -------------- -------------- --------------
          TOTAL       2986         +81962         -50204          31758

We can see that, indeed, WebSubmit received the most coding activity
in terms of lines added and/or modified too, as in the previous
episode; but now it is WebSearch that is the second most active
module, in spite of its lower net LOC change.

(Beginning of parenthesis.  If you read carefully, maybe you have
noticed a difference in the net LOC change numbers when compared to
the previous episode.  There, the LOC numbers came from
kwalitee-related stats, measured by including also Pythonic-like
bin/*.in files and by excluding Pythonic test suite files.  Here, I
include any *.py files, which is why the net numbers are slightly
different.  But they are mutually consistent, e.g. for WebSubmit: 7362
LOCs (kwalitee-based net change, episode 1) plus 210 LOCs
(thumbmaker.in in 0.92.1) plus 177 LOCs (tests in 0.99.0) minus 151
LOCs (tests in 0.92.1) equals 7598 LOCs, which is exactly the net
LOC change detected by the git log analysis tool here in episode 2.
End of parenthesis.)
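
The WebSubmit reconciliation in the parenthesis can be checked with a
few lines of arithmetic.  The variable names are chosen only for this
sketch; the numbers are the ones quoted above.

```python
kwalitee_net = 7362    # net change from episode 1 (kwalitee-based)
thumbmaker_in = 210    # thumbmaker.in LOCs in 0.92.1
tests_0_99_0 = 177     # test suite LOCs in 0.99.0
tests_0_92_1 = 151     # test suite LOCs in 0.92.1

# Add back the *.in file and the growth of the test suite.
reconciled = kwalitee_net + thumbmaker_in + tests_0_99_0 - tests_0_92_1
print(reconciled)  # 7598
```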

So, while the LOC numbers and the module order are not to be directly
compared to the ones cited in the previous episode (mostly due to the
exclusion of the test code), we clearly see that e.g. WebSearch
development was actually "more active" than BibEdit development, due
to higher "# LOCs added" (read: lines added and/or modified) in spite
of lower "# LOCs add-del" (read: net lines added), even after we
exclude the test code increase:

>>> Commits and LOC changes from 0.92.1 to 0.99.0, *_tests.py only:

         Module  # Commits   # LOCs added # LOCs deleted # LOCs add-del
    ----------- ---------- -------------- -------------- --------------
      bibupload         22          +2261           -676           1585
       miscutil         43          +1110           -485            625
      websearch         51           +845           -494            351
      bibformat         35           +478           -373            105
     websession         29           +390           -150            240
      webaccess         16           +246           -105            141
      webbasket          6           +226            -80            146
     bibharvest         19           +195           -120             75
        bibrank         31           +189           -162             27
        bibedit         15           +122           -116              6
     bibconvert          7           +121            -56             65
       bibindex         13           +105            -70             35
    bibclassify          5            +85            -22             63
      websubmit         10            +82            -56             26
     webcomment         10            +58            -48             10
       webstyle          6            +41            -28             13
      elmsubmit          3            +25            -28             -3
       webalert          3            +15            -15              0
     webmessage          3            +15            -15              0
    ----------- ---------- -------------- -------------- --------------
          TOTAL        327          +6609          -3099           3510

Here the same three modules come out on top with respect to both (i)
the net test suite LOC change (read: test code additions) and (ii) the
test suite LOC "dynamism" (read: test code additions and/or
modifications).

Finally, what about the most hacked-upon Python files?

>>> Commits and LOC changes per file from 0.92.1 to 0.99.0, *.py only:

Sorted by the number of lines added and/or modified:

                                        File  # Commits   # LOCs added # LOCs deleted # LOCs add-del
                                 ----------- ---------- -------------- -------------- --------------
                   bibedit/lib/refextract.py         38          +3413          -1914           1499
 bibupload/lib/bibupload_regression_tests.py         22          +2261           -676           1585
         webaccess/lib/webaccessadmin_lib.py         32          +2133          -1850            283
     websubmit/lib/websubmit_file_stamper.py          9          +2069           -700           1369
          webjournal/lib/webjournal_utils.py         32          +1925           -688           1237
                  bibupload/lib/bibupload.py         30          +1862          -1610            252
                      webstyle/lib/webdoc.py         35          +1840           -938            902
              websearch/lib/search_engine.py         82          +1665          -1166            499
                 websubmit/lib/bibdocfile.py         25          +1629           -238           1391
   webjournal/lib/webjournal_webinterface.py         28          +1621           -996            625
                  miscutil/lib/inveniocfg.py         47          +1621           -617           1004
        websubmit/lib/websubmit_templates.py         18          +1535           -510           1025
                  websubmit/web/publiline.py         24          +1373           -171           1202
       webaccess/lib/access_control_admin.py         23          +1321           -945            376
         bibharvest/lib/oaiarchive_engine.py         18          +1290          -1085            205
        websearch/lib/websearch_templates.py         49          +1255           -821            434
             bibindex/lib/bibindex_engine.py         29          +1196          -1370           -174
                     bibsched/lib/bibtask.py         19          +1110           -644            466
           websubmit/lib/websubmit_engine.py         34          +1106           -667            439
                                       [...]

Sorted by the number of commits:

                                          File  # Commits   # LOCs added # LOCs deleted # LOCs add-del
                                   ----------- ---------- -------------- -------------- --------------
                websearch/lib/search_engine.py         82          +1665          -1166            499
                     websession/lib/webuser.py         51           +860           -608            252
          websearch/lib/websearch_templates.py         49          +1255           -821            434
                    miscutil/lib/inveniocfg.py         47          +1621           -617           1004
        websession/lib/websession_templates.py         42           +809           -641            168
     websession/lib/websession_webinterface.py         39           +716           -475            241
                     bibedit/lib/refextract.py         38          +3413          -1914           1499
       websearch/lib/websearch_webinterface.py         37           +799           -430            369
   websearch/lib/websearch_regression_tests.py         35           +726           -390            336
                        webstyle/lib/webdoc.py         35          +1840           -938            902
             websubmit/lib/websubmit_engine.py         34          +1106           -667            439
        webaccess/lib/access_control_config.py         33           +245           -183             62
            webjournal/lib/webjournal_utils.py         32          +1925           -688           1237
           webaccess/lib/webaccessadmin_lib.py         32          +2133          -1850            283
            webstyle/lib/webstyle_templates.py         31           +420           -250            170
                    bibupload/lib/bibupload.py         30          +1862          -1610            252
            websearch/lib/websearch_webcoll.py         30           +540           -700           -160
               bibindex/lib/bibindex_engine.py         29          +1196          -1370           -174
          bibrank/lib/bibrank_record_sorter.py         29           +292           -358            -66
                                         [...]

In the next episode of this post-release fun series, we shall look at
some developer-related stats...

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
