Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-07 Thread Felipe Contreras
On Tue, May 7, 2013 at 1:47 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 05/07/2013 06:47 AM, Felipe Contreras wrote:
 On Mon, May 6, 2013 at 10:27 PM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:

 You conjectured earlier that nobody uses blob marks, and I provided a
 counterexample.  Then you proposed a workaround that would require
 changes to the cvs2git documentation, and I even explained how your
 proposed workaround is not as flexible as the status quo.

 cvs2git does *not* need blob marks, it does not need marks at all.

 The use-case that you mentioned has nothing to do with cvs2git, in
 fact. I can be described as this:

 % ./generate-blobs  blobs
 % git fast-import --export-marks=marks  blobs
 % ./generate-commits  commits
 % git fast-import --import-marks=marks  commits

 In this example 'generate-commits' has no notion of marks at all, and
 'git fast-import' doesn't need marks to process both blobs and
 commits.

 The generate-blobs program generates a mark for each blob and
 remembers the marks in a database.  The generate-commits program reads
 the marks from the database and incorporates them in the definitions of
 the commits that it writes to its output.  So yes, the program pair
 *does* rely on marks for blobs being exported correctly.

Fine:

% ./generate-data  data
% ./split-blobs data  blobs
% ./split-commits data  commits

How exactly the files 'blobs' and 'commits' are generated is
irrelevant, 'git fast-import' will need them *once*, and never again,
in fact, doesn't even need to know they are two files. There's no need
for marks.

 I've tired of this discussion.  I am quite sure that your change will
 not be accepted, so I see no need to participate further.  Please do not
 interpret my silence as agreement with your quarrelsome arguments.

How convenient. I ask the though questions, and you suddenly get tired
of the discussion. So, let me answer for you:

* Which the *vastly* more common case? That blobs are needed, or that
they are not?

That they are not. To suggest otherwise would be ridiculous.

And of course this patch will not be accepted, because nobody is
interested in improving things in the most easy and sensible way.

Yours and everybody else's argument is one and the same: status quo,
which is of course, not an argument at all.

But it's pointless to try to convince you, because a) you are not
interested in changing your mind b) you are not interested in seeking
the most sensible solution c) you are only interested in doing what
you are used to do, which is to not change anything, ever d) you know
making this the default is only slightly risky, at best, and not only
the world is not going to end, but most likely nobody would even
notice, and the ones that would, are probably already participating in
this conversation, but you don't even want to do something slightly
risky, because it would show that others were right, and your concerns
don't actually affect any real users at all.

But as I said, drop the patch. Who needs progress?

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-07 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.

This is more or less off-topic but in the bigger picture it is more
interesting and important X-.

The way you describe how cvs2git handles the blobs is the more
efficient way, given that fast-import does not even attempt to
bother to create good deltas. The only thing it does is to see if
the current data deltifies against the last object.

IIRC, CVS's backend storage is mostly recorded in backward delta, so
if you are feeding the blob data from new to old, then the resulting
pack would follow Linus's law (the file generally grows over time)
and would generally give you a good deltified chain of objects.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-07 Thread Michael Haggerty
On 05/07/2013 09:12 AM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.
 
 This is more or less off-topic but in the bigger picture it is more
 interesting and important X-.
 
 The way you describe how cvs2git handles the blobs is the more
 efficient way, given that fast-import does not even attempt to
 bother to create good deltas. The only thing it does is to see if
 the current data deltifies against the last object.
 
 IIRC, CVS's backend storage is mostly recorded in backward delta, so
 if you are feeding the blob data from new to old, then the resulting
 pack would follow Linus's law (the file generally grows over time)
 and would generally give you a good deltified chain of objects.

Yes, you are correct about how CVS orders commits on the mainline.
Branches are stored in the opposite order--oldest to newest--but CVS
users don't tend to get carried away with branches anyway, and if the
changes are small deltafication should help a lot anyway.

Cool.  I knew that fast-import didn't do much in the way of compression,
but I didn't realize that it can compute deltas only between adjacent
blobs.  So cvs2git happily hits the sweet-spot of fast-import.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Michael Haggerty
On 05/03/2013 08:23 PM, Felipe Contreras wrote:
 On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:
 
 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?
 
 Actually I don't think there's any minority. If the client program
 doesn't store blobs, the blob marks are not used anyway. So there's no
 change.

I haven't been following this conversation in detail, but your proposed
change sounds like something that would break cvs2git [1].  Let me
explain what cvs2git does and why:

CVS stores all of the revisions of a single file in a single filename,v
file in rcsfile(5) format.  The revisions are stored as deltas ordered
so that a single revision can be reconstructed from a single serial read
of the file.

cvs2git reads each of these files once, reconstructing *all* of the
revisions for a file in a single go.  It then pours them into a
git-fast-import stream as blobs and sets a mark on each blob.

Only much later in the conversion does it have enough information to
reconstruct tree-wide commits.  At that time it outputs git-fast-import
data (to a second file) defining the git commits and their ancestry.
The contents are defined by referring to the marks of blobs from the
first git-fast-import stream file.

This strategy speeds up the conversion *enormously*.

So if I understand correctly that you are proposing to stop allowing
marks on blob objects to be set and/or referred to later, then I object
vociferously.

If I've misunderstood then I'll go back into my hole :-)

Michael

[1] http://cvs2svn.tigris.org/cvs2git.html

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Thomas Rast
Michael Haggerty mhag...@alum.mit.edu writes:

 On 05/03/2013 08:23 PM, Felipe Contreras wrote:
 On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:
 
 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?
 
 Actually I don't think there's any minority. If the client program
 doesn't store blobs, the blob marks are not used anyway. So there's no
 change.

 I haven't been following this conversation in detail, but your proposed
 change sounds like something that would break cvs2git [1].  Let me
 explain what cvs2git does and why:

 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.

 Only much later in the conversion does it have enough information to
 reconstruct tree-wide commits.  At that time it outputs git-fast-import
 data (to a second file) defining the git commits and their ancestry.
 The contents are defined by referring to the marks of blobs from the
 first git-fast-import stream file.

 This strategy speeds up the conversion *enormously*.

 So if I understand correctly that you are proposing to stop allowing
 marks on blob objects to be set and/or referred to later, then I object
 vociferously.

The proposed patch wants to stop writing marks (in --export-marks) for
anything but commits.  Does cvs2git depend on that?  I.e., are you using
two separate fast-import processes for the blob and tree/commit phases
you describe above?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Michael Haggerty
On 05/06/2013 12:32 PM, Thomas Rast wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 On 05/03/2013 08:23 PM, Felipe Contreras wrote:
 On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

 Actually I don't think there's any minority. If the client program
 doesn't store blobs, the blob marks are not used anyway. So there's no
 change.

 I haven't been following this conversation in detail, but your proposed
 change sounds like something that would break cvs2git [1].  Let me
 explain what cvs2git does and why:

 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.

 Only much later in the conversion does it have enough information to
 reconstruct tree-wide commits.  At that time it outputs git-fast-import
 data (to a second file) defining the git commits and their ancestry.
 The contents are defined by referring to the marks of blobs from the
 first git-fast-import stream file.

 This strategy speeds up the conversion *enormously*.

 So if I understand correctly that you are proposing to stop allowing
 marks on blob objects to be set and/or referred to later, then I object
 vociferously.
 
 The proposed patch wants to stop writing marks (in --export-marks) for
 anything but commits.  Does cvs2git depend on that?  I.e., are you using
 two separate fast-import processes for the blob and tree/commit phases
 you describe above?

Yes, it can be handy to start loading the first blobfile in parallel
with the later stages of the conversion, before the second dumpfile is
ready.  In that case the user needs to pass --export-marks to the first
fast-import process to export marks on blobs so that the marks can be
passed to the second fast-import via --import-marks.

So the proposed change would break a documented use of cvs2git.

Making the export of blob marks optional would of course be OK, as long
as the default is to export them.

Michael


-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Johannes Schindelin
Hi Thomas,

On Mon, 6 May 2013, Thomas Rast wrote:

 The proposed patch wants to stop writing marks (in --export-marks) for
 anything but commits.

Then it should not go in.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

 So the proposed change would break a documented use of cvs2git.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

Thanks for a concise summary.  Your use case fits exactly what
Felipe conjectured as the nonexistent minority.

An option that lets the caller say I only care about marks on these
types of objects to be written to (and read from) the exported marks
file would help Felipe's use case without harming your use case,
and would be a sane and safe way to go.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 05/06/2013 12:32 PM, Thomas Rast wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 On 05/03/2013 08:23 PM, Felipe Contreras wrote:
 On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

 Actually I don't think there's any minority. If the client program
 doesn't store blobs, the blob marks are not used anyway. So there's no
 change.

 I haven't been following this conversation in detail, but your proposed
 change sounds like something that would break cvs2git [1].  Let me
 explain what cvs2git does and why:

 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.

 Only much later in the conversion does it have enough information to
 reconstruct tree-wide commits.  At that time it outputs git-fast-import
 data (to a second file) defining the git commits and their ancestry.
 The contents are defined by referring to the marks of blobs from the
 first git-fast-import stream file.

 This strategy speeds up the conversion *enormously*.

 So if I understand correctly that you are proposing to stop allowing
 marks on blob objects to be set and/or referred to later, then I object
 vociferously.

 The proposed patch wants to stop writing marks (in --export-marks) for
 anything but commits.  Does cvs2git depend on that?  I.e., are you using
 two separate fast-import processes for the blob and tree/commit phases
 you describe above?

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

It can be used that way, but it doesn't have to be.

 So the proposed change would break a documented use of cvs2git.

It's documented as an alternative. How many people actually use this
form over the other? Is there really any advantage? It's possibly that
basically nobody is using this form, and there's no benefits.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

Nobody benefits from leaving the default as it is.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 7:20 AM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Thomas,

 On Mon, 6 May 2013, Thomas Rast wrote:

 The proposed patch wants to stop writing marks (in --export-marks) for
 anything but commits.

 Then it should not go in.

If that rationale was valid, no change in behavior should ever go in.
Of course, that is not the case, we want to move forward, so changes
in behavior do happen.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

 So the proposed change would break a documented use of cvs2git.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

 Thanks for a concise summary.  Your use case fits exactly what
 Felipe conjectured as the nonexistent minority.

Not true. cvs2git does *not* rely on the blobs being stored in a marks
file, because cvs2git does not rely on mark files at all.

 An option that lets the caller say I only care about marks on these
 types of objects to be written to (and read from) the exported marks
 file would help Felipe's use case without harming your use case,
 and would be a sane and safe way to go.

His case is not harmed at all. It's only the unfortunate command that
is mentioned in the documentation that didn't need to be mentioned at
all in the first place.

It should be the other way around, if it's only this documentation
that is affected, we could add a switch for that particular command,
and the documentation should be updated, but it's overkill to add a
switch for one odd command in some documentation somewhere, it would
be much better to update the odd command to avoid using marks at all,
which is what the more appropriate command does, right below in the
same documentation.

  cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import

Should the rest of the real world be punished because somebody added a
command in some documentation somewhere, which wasn't actually needed
in the first place?

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 4:19 PM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

 So the proposed change would break a documented use of cvs2git.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

 Thanks for a concise summary.  Your use case fits exactly what
 Felipe conjectured as the nonexistent minority.

 Not true. cvs2git does *not* rely on the blobs being stored in a marks
 file, because cvs2git does not rely on mark files at all.

 An option that lets the caller say I only care about marks on these
 types of objects to be written to (and read from) the exported marks
 file would help Felipe's use case without harming your use case,
 and would be a sane and safe way to go.

 His case is not harmed at all. It's only the unfortunate command that
 is mentioned in the documentation that didn't need to be mentioned at
 all in the first place.

 It should be the other way around, if it's only this documentation
 that is affected, we could add a switch for that particular command,
 and the documentation should be updated, but it's overkill to add a
 switch for one odd command in some documentation somewhere, it would
 be much better to update the odd command to avoid using marks at all,
 which is what the more appropriate command does, right below in the
 same documentation.

This would simplify the documentation, and obliterate the need to use
mark files at all:

diff -ur cvs2svn-2.4.0/www/cvs2git.html cvs2svn-2.4.0-mod/www/cvs2git.html
--- cvs2svn-2.4.0/www/cvs2git.html  2012-09-22 01:49:55.0 -0500
+++ cvs2svn-2.4.0-mod/www/cvs2git.html  2013-05-06 16:33:12.070189985 -0500
@@ -355,14 +355,13 @@
   fast-import/tt:/p

 pre
-git fast-import --export-marks=../cvs2svn-tmp/git-marks.dat lt;
../cvs2svn-tmp/git-blob.dat
-git fast-import --import-marks=../cvs2svn-tmp/git-marks.dat lt;
../cvs2svn-tmp/git-dump.dat
+cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 /pre

-pOn Linux/Unix this can be shortened to:/p
+pOn Windows you should use type instead:/p

 pre
-cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
+type ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 /pre

   /li
Only in cvs2svn-2.4.0-mod/www: .cvs2git.html.swp


-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Michael Haggerty
On 05/06/2013 11:19 PM, Felipe Contreras wrote:
 On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

 So the proposed change would break a documented use of cvs2git.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

 Thanks for a concise summary.  Your use case fits exactly what
 Felipe conjectured as the nonexistent minority.
 
 Not true. cvs2git does *not* rely on the blobs being stored in a marks
 file, because cvs2git does not rely on mark files at all.
 
 An option that lets the caller say I only care about marks on these
 types of objects to be written to (and read from) the exported marks
 file would help Felipe's use case without harming your use case,
 and would be a sane and safe way to go.
 
 His case is not harmed at all. It's only the unfortunate command that
 is mentioned in the documentation that didn't need to be mentioned at
 all in the first place.
 
 It should be the other way around, if it's only this documentation
 that is affected, we could add a switch for that particular command,
 and the documentation should be updated, but it's overkill to add a
 switch for one odd command in some documentation somewhere, it would
 be much better to update the odd command to avoid using marks at all,
 which is what the more appropriate command does, right below in the
 same documentation.
 
   cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git 
 fast-import
 
 Should the rest of the real world be punished because somebody added a
 command in some documentation somewhere, which wasn't actually needed
 in the first place?

Don't get too fixated on the documentation.  The documentation just
gives some examples of how cvs2git can be used.

The reason that cvs2git outputs two files is that the first file is
emitted at the very beginning of the conversion and the second at the
very end.  These conversions can take a long time ( 1 day for very big
repos), can be interrupted and restarted between passes, and passes
can even be re-run with changed configurations.

CVS write access has to be turned off before the start of the final
conversion, so no VCS is possible until the conversion is over.  So
users are very interested in keeping the downtime minimal.  The blobfile
can also be unwieldy (its size is approximately the sum of the sizes of
all revisions of all files in the project).  Being able to load the
blobfile into one fast-import process and the dumpfile into a different
process (which relies on the feature that you propose removing) opens up
a lot of possibilities:

* The first fast-import of the blobfile can be started as soon as the
blobfile is complete and run in parallel with the rest of the conversion.

* If the blobfile needs to be transferred over the network (e.g.,
because Git will be served from a different server than the one doing
the conversion) the network transfer can also be done in parallel with
the rest of the conversion.

* The blobfile could be written to a named pipe that is being read by a
git-fast-import process, to avoid having to write the blobfile to disk
in the first place.

* The user could run git repack between loading the blobfile and
loading the dumpfile.

These are just the ways that cvs2git does and/or could benefit from the
flexibility that is now in git-fast-import.  Other tools might also be
using git-fast-import in ways that would be broken by your proposed change.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Michael Haggerty
On 05/06/2013 11:36 PM, Felipe Contreras wrote:
 This would simplify the documentation, and obliterate the need to use
 mark files at all:

As explained in my other email, this documentation change does not
remove all of the reasons that users might want to use mark files.  I
would still like to show users how they can load the files into Git as
two separate steps.

 diff -ur cvs2svn-2.4.0/www/cvs2git.html cvs2svn-2.4.0-mod/www/cvs2git.html
 --- cvs2svn-2.4.0/www/cvs2git.html2012-09-22 01:49:55.0 -0500
 +++ cvs2svn-2.4.0-mod/www/cvs2git.html2013-05-06 16:33:12.070189985 
 -0500
 @@ -355,14 +355,13 @@
fast-import/tt:/p
 
  pre
 -git fast-import --export-marks=../cvs2svn-tmp/git-marks.dat lt;
 ../cvs2svn-tmp/git-blob.dat
 -git fast-import --import-marks=../cvs2svn-tmp/git-marks.dat lt;
 ../cvs2svn-tmp/git-dump.dat
 +cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
  /pre
 
 -pOn Linux/Unix this can be shortened to:/p
 +pOn Windows you should use type instead:/p
 
  pre
 -cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 +type ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git 
 fast-import
  /pre
 
/li
 Only in cvs2svn-2.4.0-mod/www: .cvs2git.html.swp
 
 

Nevertheless, it *would* be nice to add the Windows equivalent of the
cat pipeline.  I knew about the type command but I was under the
impression that it is intended for text files and can corrupt binary
files.  Are you sure that using type as you suggest is binary-clean?

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Michael Haggerty
On 05/06/2013 11:04 PM, Felipe Contreras wrote:
 On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 05/06/2013 12:32 PM, Thomas Rast wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 On 05/03/2013 08:23 PM, Felipe Contreras wrote:
 On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

 Actually I don't think there's any minority. If the client program
 doesn't store blobs, the blob marks are not used anyway. So there's no
 change.

 I haven't been following this conversation in detail, but your proposed
 change sounds like something that would break cvs2git [1].  Let me
 explain what cvs2git does and why:

 CVS stores all of the revisions of a single file in a single filename,v
 file in rcsfile(5) format.  The revisions are stored as deltas ordered
 so that a single revision can be reconstructed from a single serial read
 of the file.

 cvs2git reads each of these files once, reconstructing *all* of the
 revisions for a file in a single go.  It then pours them into a
 git-fast-import stream as blobs and sets a mark on each blob.

 Only much later in the conversion does it have enough information to
 reconstruct tree-wide commits.  At that time it outputs git-fast-import
 data (to a second file) defining the git commits and their ancestry.
 The contents are defined by referring to the marks of blobs from the
 first git-fast-import stream file.

 This strategy speeds up the conversion *enormously*.

 So if I understand correctly that you are proposing to stop allowing
 marks on blob objects to be set and/or referred to later, then I object
 vociferously.

 The proposed patch wants to stop writing marks (in --export-marks) for
 anything but commits.  Does cvs2git depend on that?  I.e., are you using
 two separate fast-import processes for the blob and tree/commit phases
 you describe above?

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.
 
 It can be used that way, but it doesn't have to be.

Please see my other email for an explanation of how the flexibility of
loading the two files separately can bring concrete benefits.

 So the proposed change would break a documented use of cvs2git.
 
 It's documented as an alternative. How many people actually use this
 form over the other? Is there really any advantage? It's possibly that
 basically nobody is using this form, and there's no benefits.

You conjectured earlier that nobody uses blob marks, and I provided a
counterexample.  Then you proposed a workaround that would require
changes to the cvs2git documentation, and I even explained how your
proposed workaround is not as flexible as the status quo.  Do you want
to go through the same argument with every possible user of git-fast-import?

It would be insanity to change the default behavior when a workaround on
the Git side (namely adding an option that tells git-fast-import *not*
to emit blob marks) would be quite straightforward to implement and have
little maintenance cost.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.
 
 Nobody benefits from leaving the default as it is.

Sure they do.  Any tool that *knows* that it doesn't need blob marks can
pass the new option and benefit.  Other tools benefit from not being
broken by your change.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 11:32 PM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Michael,

 On Tue, 7 May 2013, Michael Haggerty wrote:

 I knew about the type command but I was under the impression that it
 is intended for text files and can corrupt binary files.  Are you sure
 that using type as you suggest is binary-clean?

 type is not binary-clean. At least on some Windows versions, type also
 has a limit on file size.

copy /b file1+file2 destfile

Then.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 9:58 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 05/06/2013 11:19 PM, Felipe Contreras wrote:
 On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 Yes, it can be handy to start loading the first blobfile in parallel
 with the later stages of the conversion, before the second dumpfile is
 ready.  In that case the user needs to pass --export-marks to the first
 fast-import process to export marks on blobs so that the marks can be
 passed to the second fast-import via --import-marks.

 So the proposed change would break a documented use of cvs2git.

 Making the export of blob marks optional would of course be OK, as long
 as the default is to export them.

 Thanks for a concise summary.  Your use case fits exactly what
 Felipe conjectured as the nonexistent minority.

 Not true. cvs2git does *not* rely on the blobs being stored in a marks
 file, because cvs2git does not rely on mark files at all.

 An option that lets the caller say I only care about marks on these
 types of objects to be written to (and read from) the exported marks
 file would help Felipe's use case without harming your use case,
 and would be a sane and safe way to go.

 His case is not harmed at all. It's only the unfortunate command that
 is mentioned in the documentation that didn't need to be mentioned at
 all in the first place.

 It should be the other way around, if it's only this documentation
 that is affected, we could add a switch for that particular command,
 and the documentation should be updated, but it's overkill to add a
 switch for one odd command in some documentation somewhere, it would
 be much better to update the odd command to avoid using marks at all,
 which is what the more appropriate command does, right below in the
 same documentation.

   cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git 
 fast-import

 Should the rest of the real world be punished because somebody added a
 command in some documentation somewhere, which wasn't actually needed
 in the first place?

 Don't get too fixated on the documentation.  The documentation just
 gives some examples of how cvs2git can be used.

 The reason that cvs2git outputs two files is that the first file is
 emitted at the very beginning of the conversion and the second at the
 very end.  These conversions can take a long time ( 1 day for very big
 repos), can be interrupted and restarted between passes, and passes
 can even be re-run with changed configurations.

 CVS write access has to be turned off before the start of the final
 conversion, so no VCS is possible until the conversion is over.  So
 users are very interested in keeping the downtime minimal.

Of course VCS is possible before the conversion is over, you can do
all the cvs2git commands first, turn on CVS, and let 'git fast-import'
do it's thing afterwards, can you not?

 The blobfile
 can also be unwieldy (its size is approximately the sum of the sizes of
 all revisions of all files in the project).  Being able to load the
 blobfile into one fast-import process and the dumpfile into a different
 process (which relies on the feature that you propose removing) opens up
 a lot of possibilities:

 * The first fast-import of the blobfile can be started as soon as the
 blobfile is complete and run in parallel with the rest of the conversion.

Is that reason enough to punish everyone else?

 * If the blobfile needs to be transferred over the network (e.g.,
 because Git will be served from a different server than the one doing
 the conversion) the network transfer can also be done in parallel with
 the rest of the conversion.

Ditto.

 * The blobfile could be written to a named pipe that is being read by a
 git-fast-import process, to avoid having to write the blobfile to disk
 in the first place.

And it would still be possible.

{ cat file1; cat file2; }  pipe

What am I missing?

 * The user could run git repack between loading the blobfile and
 loading the dumpfile.

Is that reason enough to punish everyone else?

 These are just the ways that cvs2git does and/or could benefit from the
 flexibility that is now in git-fast-import.

Sure, all those things _migt_ be nice, but nothing would be broken,
would it? cvs2git can still achieve it's goal without it.

And if this feature is removed by *default*, it would be trivial to
enable an option to tell 'git fast-import' to store blobs from
cvs2git, would it not?

 Other tools might also be
 using git-fast-import in ways that would be broken by your proposed change.

Yeah, everything is possible, but the key word is *might*. Will
anything be *actually* broken by this change? My guess is, very few
things, if any.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Johannes Schindelin
Hi,

On Tue, 7 May 2013, Michael Haggerty wrote:

 On 05/06/2013 11:04 PM, Felipe Contreras wrote:
  On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu 
  wrote:
  On 05/06/2013 12:32 PM, Thomas Rast wrote:

  So the proposed change would break a documented use of cvs2git.
  
  It's documented as an alternative. How many people actually use this
  form over the other? Is there really any advantage? It's possibly that
  basically nobody is using this form, and there's no benefits.
 
 You conjectured earlier that nobody uses blob marks, and I provided a
 counterexample.  Then you proposed a workaround that would require
 changes to the cvs2git documentation, and I even explained how your
 proposed workaround is not as flexible as the status quo.  Do you want
 to go through the same argument with every possible user of
 git-fast-import?
 
 It would be insanity to change the default behavior when a workaround on
 the Git side (namely adding an option that tells git-fast-import *not*
 to emit blob marks) would be quite straightforward to implement and have
 little maintenance cost.

I really wonder how many more counterexamples are required to settle this
discussion.

There is no good reason to artificially limit Git's capabilities here,
especially when it has been demonstrated that supporting that capability
is not only possible, but also outright easy.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-06 Thread Felipe Contreras
On Mon, May 6, 2013 at 11:39 PM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi,

 On Tue, 7 May 2013, Michael Haggerty wrote:

 On 05/06/2013 11:04 PM, Felipe Contreras wrote:
  On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu 
  wrote:
  On 05/06/2013 12:32 PM, Thomas Rast wrote:

  So the proposed change would break a documented use of cvs2git.
 
  It's documented as an alternative. How many people actually use this
  form over the other? Is there really any advantage? It's possibly that
  basically nobody is using this form, and there's no benefits.

 You conjectured earlier that nobody uses blob marks, and I provided a
 counterexample.  Then you proposed a workaround that would require
 changes to the cvs2git documentation, and I even explained how your
 proposed workaround is not as flexible as the status quo.  Do you want
 to go through the same argument with every possible user of
 git-fast-import?

 It would be insanity to change the default behavior when a workaround on
 the Git side (namely adding an option that tells git-fast-import *not*
 to emit blob marks) would be quite straightforward to implement and have
 little maintenance cost.

 I really wonder how many more counterexamples are required to settle this
 discussion.

One. I haven't seen one use-case that *requires* blob marks, I haven't
seen one tool that utilizes blob marks. And no, cvs2git doesn't use
marks at all.

 There is no good reason to artificially limit Git's capabilities here,
 especially when it has been demonstrated that supporting that capability
 is not only possible, but also outright easy.

Strawman. Nobody is arguing that there shouldn't be an option to
enable blob exporting, the argument is that there's no point in making
that the *default*.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Thomas Rast
Felipe Contreras felipe.contre...@gmail.com writes:

 There's no point in storing blob, they would increase the time of
 loading the marks, and the vast majority of them will never be used
 again.

 This also makes fast-export and fast-import marks compatible.
[...]
 - if (m-data.marked[k])
 + if (m-data.marked[k]) {
 + struct object_entry *e;
 + e = m-data.marked[k];
 + if (e-type != OBJ_COMMIT)
 + continue;
   fprintf(f, :% PRIuMAX  %s\n, base + k,
 - 
 sha1_to_hex(m-data.marked[k]-idx.sha1));
 + sha1_to_hex(e-idx.sha1));
 + }

IIUC, you are unconditionally storing only marks to commit objects.

Are you allowed to do that at this point?  I notice that
git-fast-export(1) says

   --export-marks=file
   Dumps the internal marks table to file when complete. Marks are
   written one per line as :markid SHA-1. Only marks for revisions
   are dumped[...]

But git-fast-import(1) says nothing of the sort; I would even claim that

   --export-marks=file
   Dumps the internal marks table to file when complete.

means that the *full* marks table is dumped.

How do we know that this doesn't break any users of fast-import?  Your
comment isn't very reassuring:

 the vast majority of them will never be used again

So what's with the minority?

In any case, if this does go in, please update the documentation to
match, probably by copying the sentence from git-fast-export(1).

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Felipe Contreras
On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

Actually I don't think there's any minority. If the client program
doesn't store blobs, the blob marks are not used anyway. So there's no
change.

However, there's a chance the client programs do rely on blob objects,
in which case things would break. I would like to analyze some client
programs of fast-import out there, to see what would be the impact.

But I think this has to be done either way, the only question is how.
Having a warning would take a lot of effort, and it might be for
nothing. I think it might make sense to knowingly make the potentially
dangerous change, and see what breaks. Most likely nothing will, but
if something does, we revert immediately.

Otherwise we would be stuck in this non-ideal state forever.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Junio C Hamano
Thomas Rast tr...@inf.ethz.ch writes:

 IIUC, you are unconditionally storing only marks to commit objects.

 Are you allowed to do that at this point?  I notice that
 git-fast-export(1) says

--export-marks=file
Dumps the internal marks table to file when complete. Marks are
written one per line as :markid SHA-1. Only marks for revisions
are dumped[...]

 But git-fast-import(1) says nothing of the sort; I would even claim that

--export-marks=file
Dumps the internal marks table to file when complete.

 means that the *full* marks table is dumped.

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

 In any case, if this does go in, please update the documentation to
 match, probably by copying the sentence from git-fast-export(1).

A safe and sane approach may be to teach these an option to tell
them to omit non-commits or to emit all kinds, and make remote-bzr
use that to exclude non-commits.  If the defaults is matched to the
current behaviour, nobody will get hurt, and Felipe's Emacs import,
knowing that it does not need marks to blobs, can take advantage of
the new feature.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Felipe Contreras
On Fri, May 3, 2013 at 5:08 PM, Junio C Hamano gits...@pobox.com wrote:
 Thomas Rast tr...@inf.ethz.ch writes:

 IIUC, you are unconditionally storing only marks to commit objects.

 Are you allowed to do that at this point?  I notice that
 git-fast-export(1) says

--export-marks=file
Dumps the internal marks table to file when complete. Marks are
written one per line as :markid SHA-1. Only marks for revisions
are dumped[...]

 But git-fast-import(1) says nothing of the sort; I would even claim that

--export-marks=file
Dumps the internal marks table to file when complete.

 means that the *full* marks table is dumped.

 How do we know that this doesn't break any users of fast-import?  Your
 comment isn't very reassuring:

 the vast majority of them will never be used again

 So what's with the minority?

 In any case, if this does go in, please update the documentation to
 match, probably by copying the sentence from git-fast-export(1).

 A safe and sane approach may be to teach these an option to tell
 them to omit non-commits or to emit all kinds, and make remote-bzr
 use that to exclude non-commits.

This has nothing to do with remote-bzr, or any remote helper. These
objects are not useful, not even to 'git fast-export'.

 If the defaults is matched to the
 current behaviour, nobody will get hurt

Changing nothing always ensures that nobody will get hurt, but that
doesn't improve anything either.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Junio C Hamano
Felipe Contreras felipe.contre...@gmail.com writes:

 A safe and sane approach may be to teach these an option to tell
 them to omit non-commits or to emit all kinds, and make remote-bzr
 use that to exclude non-commits.

 This has nothing to do with remote-bzr, or any remote helper. These
 objects are not useful, not even to 'git fast-export'.

 If the defaults is matched to the
 current behaviour, nobody will get hurt

 Changing nothing always ensures that nobody will get hurt, but that
 doesn't improve anything either.

I do not quite follow.  Allowing existing code to depend on old
behaviour, while letting new code that knows it does not need
anything but commits, will get the best of both worlds. The new code
will definitely improve (otherwise you won't be writing these
patches), and the old code will not get hurt. Where is this doesn't
improve anything coming from?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] fast-import: only store commit objects

2013-05-03 Thread Felipe Contreras
On Fri, May 3, 2013 at 6:45 PM, Junio C Hamano gits...@pobox.com wrote:
 Felipe Contreras felipe.contre...@gmail.com writes:

 A safe and sane approach may be to teach these an option to tell
 them to omit non-commits or to emit all kinds, and make remote-bzr
 use that to exclude non-commits.

 This has nothing to do with remote-bzr, or any remote helper. These
 objects are not useful, not even to 'git fast-export'.

 If the defaults is matched to the
 current behaviour, nobody will get hurt

 Changing nothing always ensures that nobody will get hurt, but that
 doesn't improve anything either.

 I do not quite follow.  Allowing existing code to depend on old
 behaviour, while letting new code that knows it does not need
 anything but commits, will get the best of both worlds. The new code
 will definitely improve (otherwise you won't be writing these
 patches), and the old code will not get hurt. Where is this doesn't
 improve anything coming from?

So let's suppose we add a new 'feature no-blob-store' to 'git
fast-import', and some clients of 'git fast-import' enable it, and
benefit from it.

How does that benefit the rest of fast-import clients? You are
worrying about clients that most likely don't exist, and don't let
existing real clients benefit for free. This is like premature
optimization.

But whatever, let's keep writing and discarding these blobs, at least
the previous patches do make such operations fast.

You can drop this patch, because I'm not going to write code for
clients that don't exist. And it seems clear that even if I
investigate client apps of 'git fast-import' and I'm unable to find a
single one that utilizes blobs, you still wouldn't apply this patch.
So there's no point in investigating what are the *actual* users, if
all we are every going to do is worry about hypothetical ones.

My main interest is the previous patches, if anybody has any issues
with the way this patch is handled, they can either work on the patch
itself, or start a new thread in which I will not participate, I will
rather concentrate on issues that affect *real* users, and have real
fixes reachable today, and the previous patches in this series fit
that bill.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html