#12342: things I don't like about the json <--> hg conversion code
----------------------------+-----------------------------------------------
Reporter: was | Owner: tbd
Type: defect | Status: new
Priority: critical | Milestone: sage-5.0
Component: distribution | Keywords:
Work_issues: | Upstream: N/A
Reviewer: | Author:
Merged: | Dependencies:
----------------------------+-----------------------------------------------
Comment(by kini):
git has a beautiful command called `fast-export` which does exactly what
we want, and is a lot better and more robust than my code.
`git fast-export --all > filename` will produce a text file containing all
commit data ("commits"), file data ("blobs"), filename data ("trees"), and
pointers ("refs") in a human-readable format. Pointer movement history
("the reflog") is local to each clone and is not part of the stream.
`git fast-import < filename` performed on an empty repository will
reproduce the original repository.
`fast-export` is more thorough than my code, which only exports the first
three of the things I listed above, and loses the refs - not that we are
using refs ("bookmarks" in Mercurial) to do anything in Sage at the
moment. Also, while my
code just attempts to make human-readable the deltas between changesets,
`fast-export` actually dumps every separate version of every file in the
repository, which is I guess more "readable" (though this also results in
much larger output). This is actually the natural way to do things in git,
because git's most abstract conception of the repository actually does
contain every separate version of every file, whereas Mercurial's level of
abstraction is lower and stops at deltas between changesets.
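The export/import round trip described above can be sketched roughly as follows. This is only an illustration: the function names `roundtrip` and `same_refs` and their arguments are made up here, and comparing branch tips is just one simple way to check that the rebuilt repository matches.

```shell
# roundtrip SRC_REPO DUMP_FILE DEST_REPO   (hypothetical helper)
# Dump SRC_REPO with fast-export and rebuild it in a fresh DEST_REPO.
roundtrip() {
    src=$1 dump=$2 dest=$3
    # Write the history reachable from every ref as a fast-import stream.
    git -C "$src" fast-export --all > "$dump" || return 1
    # fast-import needs an existing (here: freshly created) repository.
    git init -q "$dest" || return 1
    git -C "$dest" fast-import --quiet < "$dump" || return 1
}

# same_refs REPO_A REPO_B   (hypothetical helper)
# Succeed if both repositories have the same branches at the same commits.
same_refs() {
    a=$(git -C "$1" for-each-ref --format='%(objectname) %(refname)' refs/heads)
    b=$(git -C "$2" for-each-ref --format='%(objectname) %(refname)' refs/heads)
    [ "$a" = "$b" ]
}

# Example (paths are placeholders):
#   roundtrip sagelib sagelib.fast_export sagelib2 && same_refs sagelib sagelib2
```

Note that `fast-import` only writes objects and refs; the new repository's working tree stays empty until something is checked out.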
If you plan on switching to git anytime soon, I recommend we just drop
this ticket for now and work on getting `fast-export` into our build
scripts instead. We already have a git-based Sage library repository; I
can make analogous ones for the scripts, extcode, and root repos, though
it's probably a good idea to have a long hard think about how we can
consolidate our repos as much as possible, as the switch from hg to git
would be the ideal time to do massive history rewriting (which git is good
at, by the way, using `git filter-branch`). The SPKG system in particular
needs a rework, IMO. We should have one git repo containing all the
patches, spkg-install scripts, SPKG.txt files, etc. for all the packages
we offer, standard/optional/experimental/whatever, and then just `tar -cj`
the src/ directories separately. I think we should take a look at what
Burcin is doing with lmonade for clues.
I timed `fast-export` and `fast-import` of
[http://github.com/sagemath/sagelib the git repository I'm maintaining on
github], as well as compression and decompression into gz, bz2, xz, and
lrz archives (and the resulting filesizes):
{{{
fs@boone ~/src $ bash -x fast-export-test.sh
+ cd sagelib
+ git fast-export --all
real 0m35.897s
user 0m33.348s
sys 0m1.494s
+ cd ..
+ du -h sagelib.fast_export
1.8G sagelib.fast_export
+ gzip sagelib.fast_export
real 0m52.549s
user 0m50.681s
sys 0m0.779s
+ du -h sagelib.fast_export.gz
392M sagelib.fast_export.gz
+ gunzip sagelib.fast_export
real 0m13.212s
user 0m9.598s
sys 0m0.882s
+ bzip2 sagelib.fast_export
real 2m29.877s
user 2m28.340s
sys 0m0.858s
+ du -h sagelib.fast_export.bz2
264M sagelib.fast_export.bz2
+ bunzip2 sagelib.fast_export.bz2
real 0m53.294s
user 0m50.876s
sys 0m1.372s
+ xz sagelib.fast_export
real 10m49.975s
user 10m46.808s
sys 0m1.329s
+ du -h sagelib.fast_export.xz
107M sagelib.fast_export.xz
+ unxz sagelib.fast_export.xz
real 0m10.790s
user 0m8.476s
sys 0m0.968s
+ lrzip -D sagelib.fast_export
Output filename is: sagelib.fast_export.lrz
sagelib.fast_export - Compression Ratio: 131.246. Average Compression Speed: 26.304MB/s.
Total time: 00:01:09.06
real 1m9.066s
user 1m12.394s
sys 0m0.750s
+ du -h sagelib.fast_export.lrz
14M sagelib.fast_export.lrz
+ lrunzip -D sagelib.fast_export.lrz
Output filename is: sagelib.fast_export
Decompressing...
100% 1815.25 / 1815.25 MB
Average DeCompression Speed: 181.500MB/s
Output filename is: sagelib.fast_export: [OK] - 1903430998 bytes
Total time: 00:00:13.12
real 0m13.172s
user 0m4.774s
sys 0m1.624s
+ git init sagelib2
Initialized empty Git repository in /home/fs/src/sagelib2/.git/
+ cd sagelib2
+ git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 125000
Total objects: 120380 ( 9143 duplicates )
blobs : 41942 ( 0 duplicates 32577 deltas of 39920 attempts)
trees : 61752 ( 9143 duplicates 55847 deltas of 56256 attempts)
commits: 16686 ( 0 duplicates 0 deltas of 0 attempts)
tags : 0 ( 0 duplicates 0 deltas of 0 attempts)
Total branches: 459 ( 494 loads )
marks: 1048576 ( 58628 unique )
atoms: 3025
Memory total: 8094 KiB
pools: 2235 KiB
objects: 5859 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit = 8589934592
pack_report: pack_used_ctr = 322098
pack_report: pack_mmap_calls = 16687
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 435565281 / 435565281
---------------------------------------------------------------------
real 1m22.824s
user 1m20.089s
sys 0m1.329s
}}}
Tests performed on a !VirtualBox VM running on a 4-core i5-2500K @ 4.5 GHz
with 8 GB of RAM.
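The `fast-export-test.sh` script itself is not included here. As a rough sketch of how one compress/decompress measurement like those in the log could be scripted, the hypothetical `bench` helper below times a compressor and its matching decompressor on a file. It uses `date +%s`, so it only has whole-second resolution, unlike the `time` output shown in the log.

```shell
# bench COMPRESS DECOMPRESS FILE SUFFIX   (hypothetical helper)
# Compress FILE in place, report the archive size, then decompress it again.
bench() {
    comp=$1 decomp=$2 file=$3 suffix=$4
    t0=$(date +%s)
    $comp "$file" || return 1              # replaces FILE with FILE.SUFFIX
    t1=$(date +%s)
    size=$(du -h "$file.$suffix" | cut -f1)
    $decomp "$file.$suffix" || return 1    # restores FILE
    t2=$(date +%s)
    printf '%s: %s, compress %ss, decompress %ss\n' \
        "$suffix" "$size" "$((t1 - t0))" "$((t2 - t1))"
}

# Example invocations matching the tools used in the log:
#   bench gzip gunzip sagelib.fast_export gz
#   bench bzip2 bunzip2 sagelib.fast_export bz2
#   bench xz unxz sagelib.fast_export xz
#   bench "lrzip -D" "lrunzip -D" sagelib.fast_export lrz
```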
In summary: `fast-export` produces a 1.8 GB text file in about 36 seconds.
This can be compressed into a 392 MB gz file in about a minute, a 264 MB
bz2 file in about two and a half minutes, a 107 MB xz file in about eleven
minutes, or a 14 MB lrz file in about a minute. These can be decompressed
in about 13 seconds, 53 seconds, 11 seconds, and 13 seconds respectively.
Importing the `fast-export` dump using `fast-import` takes just under a
minute and a half.
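To put the sizes side by side, the compression ratios can be worked out from the figures in the log (1815.25 MB uncompressed, as reported by `lrunzip`, against the `du -h` sizes). The `ratio` helper is just for illustration. Because `du -h` rounds up to whole megabytes, the lrz figure comes out slightly below the 131.246 that lrzip itself reports.

```shell
# ratio ORIGINAL_MB COMPRESSED_MB: print the compression ratio to one decimal.
ratio() {
    LC_ALL=C awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", o / c }'
}

orig=1815.25    # MB, uncompressed size from the lrunzip output in the log
for entry in gz:392 bz2:264 xz:107 lrz:14; do
    printf '%s %s\n' "${entry%%:*}" "$(ratio "$orig" "${entry#*:}")"
done
# prints:
#   gz 4.6
#   bz2 6.9
#   xz 17.0
#   lrz 129.7
```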
So obviously we should try to compress our "paranoid source archive" with
[http://ck.kolivas.org/apps/lrzip/README lrzip] if at all possible (it
actually compresses down to less than half the size of the git repository
itself!). We don't need to ship lrzip with Sage unless you want all users
to be able to produce paranoid source archives. Users who want the source
archive will of course need to install lrzip themselves.
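Since lrzip would not ship with Sage, any script that builds the archive would have to check for it first. A minimal sketch of that, assuming hypothetical helper names `have_tool` and `paranoid_archive` (neither exists in Sage's build scripts):

```shell
# have_tool NAME: succeed if NAME can be found as a command.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# paranoid_archive REPO OUT   (hypothetical helper)
# Dump REPO with fast-export into OUT, then compress it to OUT.lrz.
paranoid_archive() {
    repo=$1 out=$2
    if ! have_tool lrzip; then
        echo "lrzip not found; install it to build the source archive" >&2
        return 1
    fi
    git -C "$repo" fast-export --all > "$out" || return 1
    # -D removes the uncompressed dump once OUT.lrz has been written,
    # as in the timing run above.
    lrzip -D "$out"
}

# Example (paths are placeholders):
#   paranoid_archive sagelib sagelib.fast_export
```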
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/12342#comment:6>