#12342: things I don't like about the json <--> hg conversion code
----------------------------+-----------------------------------------------
   Reporter:  was           |          Owner:  tbd     
       Type:  defect        |         Status:  new     
   Priority:  critical      |      Milestone:  sage-5.0
  Component:  distribution  |       Keywords:          
Work_issues:                |       Upstream:  N/A     
   Reviewer:                |         Author:          
     Merged:                |   Dependencies:          
----------------------------+-----------------------------------------------

Comment(by kini):

 git has a beautiful command called `fast-export` which does exactly what
 we want, and is a lot better and more robust than my code.
 `git fast-export --all > filename` will produce a text file containing all
 commit data ("commits"), file data ("blobs"), filename data ("trees"), and
 pointers ("refs") in a human-readable format; only pointer movement
 history ("the reflog"), which is local to each clone, is left out.
 `git fast-import < filename` performed on an empty repository will
 reproduce the original repository.
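
 As a minimal sketch of that round trip (the function names `roundtrip`
 and `commit_ids` and all paths are hypothetical and not part of any Sage
 script; it assumes a git recent enough to support `-C`), something along
 these lines should do:

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# roundtrip SRC DUMP DST
#   SRC  existing git repository
#   DUMP text file to write the fast-export stream to
#   DST  directory in which a fresh repository is created
roundtrip() {
    src=$1
    dump=$2
    dst=$3
    # Serialize the history reachable from all refs into one text stream.
    git -C "$src" fast-export --all > "$dump" || return 1
    # fast-import needs an existing repository to import into.
    git init --quiet "$dst" || return 1
    # --quiet suppresses the statistics report printed after the import.
    git -C "$dst" fast-import --quiet < "$dump" || return 1
}

# commit_ids REPO: print the sorted ids of all commits reachable from any ref.
commit_ids() {
    git -C "$1" rev-list --all | sort
}
```

 For example, `roundtrip sagelib sagelib.fast_export sagelib2` followed by
 comparing `commit_ids sagelib` with `commit_ids sagelib2` is a cheap sanity
 check that the history survived the trip. Note that `fast-import` does not
 check out a working tree in the new repository.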

 `fast-export` is more thorough than my code, which only exports the first
 three of the five things I listed above, and loses the other two - not
 that we are using refs ("bookmarks" in Mercurial) or reflogs (nonexistent
 in Mercurial, afaik) to do anything in Sage at the moment. Also, while my
 code just attempts to make the deltas between changesets human-readable,
 `fast-export` dumps every separate version of every file in the
 repository, which is, I guess, more "readable" (though it also results in
 much larger output). This is the natural way to do things in git, because
 git's most abstract conception of the repository does contain every
 separate version of every file, whereas Mercurial's level of abstraction
 is lower and stops at deltas between changesets.

 If you plan on switching to git anytime soon, I recommend we just drop
 this ticket for now and work on getting `fast-export` into our build
 scripts instead. We already have a git-based Sage library repository, and
 I can make analogous ones for the scripts, extcode, and root repos. It's
 probably a good idea, though, to have a long hard think about how we can
 consolidate our repos as much as possible, as the switch from hg to git
 would be the ideal time to do massive history rewriting (which git is good
 at, by the way, using `git filter-branch`). The SPKG system in particular
 needs a rework, IMO. We should have one git repo containing all the
 patches, spkg-install scripts, SPKG.txt files, etc. for all the packages
 we offer, standard/optional/experimental/whatever, and then just `tar -cj`
 the src/ directories separately. I think we should take a look at what
 Burcin is doing with lmonade for clues.

 I timed `fast-export` and `fast-import` on
 [http://github.com/sagemath/sagelib the git repository I'm maintaining on
 github], as well as compression to and decompression from gz, bz2, xz, and
 lrz archives, and recorded the resulting file sizes:

 {{{
 fs@boone ~/src $ bash -x fast-export-test.sh
 + cd sagelib
 + git fast-export --all

 real    0m35.897s
 user    0m33.348s
 sys     0m1.494s
 + cd ..
 + du -h sagelib.fast_export
 1.8G    sagelib.fast_export
 + gzip sagelib.fast_export

 real    0m52.549s
 user    0m50.681s
 sys     0m0.779s
 + du -h sagelib.fast_export.gz
 392M    sagelib.fast_export.gz
 + gunzip sagelib.fast_export

 real    0m13.212s
 user    0m9.598s
 sys     0m0.882s
 + bzip2 sagelib.fast_export

 real    2m29.877s
 user    2m28.340s
 sys     0m0.858s
 + du -h sagelib.fast_export.bz2
 264M    sagelib.fast_export.bz2
 + bunzip2 sagelib.fast_export.bz2

 real    0m53.294s
 user    0m50.876s
 sys     0m1.372s
 + xz sagelib.fast_export

 real    10m49.975s
 user    10m46.808s
 sys     0m1.329s
 + du -h sagelib.fast_export.xz
 107M    sagelib.fast_export.xz
 + unxz sagelib.fast_export.xz

 real    0m10.790s
 user    0m8.476s
 sys     0m0.968s
 + lrzip -D sagelib.fast_export
 Output filename is: sagelib.fast_export.lrz
 sagelib.fast_export - Compression Ratio: 131.246. Average Compression
 Speed: 26.304MB/s.
 Total time: 00:01:09.06

 real    1m9.066s
 user    1m12.394s
 sys     0m0.750s
 + du -h sagelib.fast_export.lrz
 14M     sagelib.fast_export.lrz
 + lrunzip -D sagelib.fast_export.lrz
 Output filename is: sagelib.fast_export
 Decompressing...
 100%    1815.25 /   1815.25 MB
 Average DeCompression Speed: 181.500MB/s
 Output filename is: sagelib.fast_export: [OK] - 1903430998 bytes
 Total time: 00:00:13.12

 real    0m13.172s
 user    0m4.774s
 sys     0m1.624s
 + git init sagelib2
 Initialized empty Git repository in /home/fs/src/sagelib2/.git/
 + cd sagelib2
 + git fast-import
 git-fast-import statistics:
 ---------------------------------------------------------------------
 Alloc'd objects:     125000
 Total objects:       120380 (      9143 duplicates                  )
       blobs  :        41942 (         0 duplicates      32577 deltas of
 39920 attempts)
       trees  :        61752 (      9143 duplicates      55847 deltas of
 56256 attempts)
       commits:        16686 (         0 duplicates          0 deltas of
 0 attempts)
       tags   :            0 (         0 duplicates          0 deltas of
 0 attempts)
 Total branches:         459 (       494 loads     )
       marks:        1048576 (     58628 unique    )
       atoms:           3025
 Memory total:          8094 KiB
        pools:          2235 KiB
      objects:          5859 KiB
 ---------------------------------------------------------------------
 pack_report: getpagesize()            =       4096
 pack_report: core.packedGitWindowSize = 1073741824
 pack_report: core.packedGitLimit      = 8589934592
 pack_report: pack_used_ctr            =     322098
 pack_report: pack_mmap_calls          =      16687
 pack_report: pack_open_windows        =          1 /          1
 pack_report: pack_mapped              =  435565281 /  435565281
 ---------------------------------------------------------------------


 real    1m22.824s
 user    1m20.089s
 sys     0m1.329s
 }}}

 Tests performed on a !VirtualBox VM running on a 4-core i5-2500K @ 4.5 GHz
 with 8 GB of RAM.

 In summary: `fast-export` produces a 1.8 GB text file in about 35 seconds.
 This can be compressed into a 392 MB gz file in about a minute, a 264 MB
 bz2 file in about two and a half minutes, a 107 MB xz file in about eleven
 minutes, or a 14 MB lrz file in about a minute. These can be decompressed
 in about ten seconds, a minute, ten seconds, and ten seconds respectively.
 Importing the `fast-export` dump using `fast-import` takes about a minute
 and a half.
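
 To put those numbers side by side, here is a small sketch that derives
 approximate compression ratios and compression throughput from the log.
 The function name `compression_table` is made up for illustration. The
 sizes are the rounded `du -h` figures and the raw size is the 1815.25 MB
 reported by lrunzip, so the ratios are only approximate (lrzip itself
 reports 131.246 from the exact byte counts):

```shell
#!/bin/sh
# Hypothetical helper: print ratio and throughput per compressor.
# Input columns: format, compressed size as shown by du -h (M),
# and wall-clock compression time in seconds, all copied from the log.
compression_table() {
    awk -v raw=1815.25 '
        { printf "%-4s %6.1fx  %5.1f MB/s\n", $1, raw / $2, raw / $3 }
    ' <<'EOF'
gz   392  52.549
bz2  264  149.877
xz   107  649.975
lrz  14   69.066
EOF
}

compression_table
```

 Each output row shows the format, how many times smaller the archive is
 than the raw dump, and how fast the compressor consumed its input.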

 So obviously we should try to compress our "paranoid source archive" with
 [http://ck.kolivas.org/apps/lrzip/README lrzip] if at all possible (it
 actually compresses down to less than half the size of the git repository
 itself!). We don't need to ship lrzip with Sage unless you want all users
 to be able to produce paranoid source archives. Users who want to unpack
 the source archive will of course need to install lrzip themselves.

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/12342#comment:6>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
