
Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-13 Thread kris

2009/6/7 Tom Lane t...@sss.pgh.pa.us:
 So there are a lot of good reasons to work backwards in patching.
 I don't believe that these would be outweighed by some advantage
 in the mechanics of applying an unchanging patch to multiple
 branches (especially since AFAICT the mechanical advantage would
 be pretty darn minimal anyhow).

As another data point, the stable branches of the Linux kernel are
actually maintained this way.  There is a policy that any patch for the
stable branches must already have been included (in some form) in HEAD.
There is no merging going on.  They aren't even using git cherry-pick, but
that's because all backpatching goes into a review list rather than happening
immediately.

The multiple branches and merging that is going on in the Linux kernel
is all about development of new features, not fixing of bugs.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-08 Thread Markus Wanner


Hi,

Quoting Mark Mielke m...@mark.mielke.cc:
> I am a theory person - I run things in my head. To me, the concept
> of having more context to make the right decision, and an algorithm
> that takes advantage of this context to make the right decision, is
> simple and compelling on its own. Knowing the algorithms that are in
> use, including how it selects the most recent common ancestor, gives
> me confidence.


That makes me wonder why you are speaking against merges, where
there are common ancestors. I'd argue that in theory (and generally) a
merge yields better results than cherry-picking (where there is no
common ancestor, thus less information). Especially for back-branches,
where there obviously is a common ancestor.


> No amount of discussions where others say "it works great" and you
> say "I don't believe you until you provide me with output" is going
> to get anywhere.


Well, I guess it can be frustrating for both sides. However, I think
these discussions are worthwhile (and necessary) nonetheless.

As not even those who highly appreciate merge algorithms (you and me,
for example) are in agreement on how to use them (cherry-picking vs.
merging), it doesn't surprise me that others are generally skeptical.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-08 Thread Ron Mayer

Robert Haas wrote:
 On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ... but I'm not at all excited about cluttering the
 long-term project history with a zillion micro-commits.  One of the
 things I find most annoying about reviewing the current commit history
 is that Bruce has taken a micro-commit approach to managing the TODO
 list --- I was seldom so happy as the day that disappeared from CVS,
 because of the ensuing reduction in noise level.

For better or worse, git also includes a command git-rebase that can
collapse such micro-commits into a larger one.

Quoting the git-rebase man page:
   A range of commits could also be removed with rebase. If we have the
   following situation:
   E---F---G---H---I---J  topicA
   then the command
   git-rebase --onto topicA~5 topicA~3 topicA
   would result in the removal of commits F and G:
   E---H'---I'---J'  topicA

While I wouldn't recommend using this for historical revisionism, I
imagine it could be useful during code-review time when the
micro-commits (from both the patch submitter and patch reviewer)
are interesting.  After the review, the commits could be collapsed
into meaningful-sized-chunks just before they're merged into the
official branches.
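
As a rough illustration of the man page example quoted above, the effect of
`git rebase --onto` on a strictly linear branch can be modelled in a few
lines of Python. This is only a sketch of the bookkeeping, not of git itself:
`rebase_onto` and `nth_ancestor` are hypothetical helpers, and commits are
reduced to bare names. Note that the quoted command drops the changes made by
F and G; it does not combine them into a larger commit.

```python
def nth_ancestor(history, n):
    """Name of the commit n steps before the tip, like topicA~n."""
    return history[-1 - n]


def rebase_onto(history, newbase, upstream):
    """Model `git rebase --onto newbase upstream branch` on a linear history.

    history lists commit names oldest first. Commits after `upstream` are
    replayed on top of `newbase`; a trailing ' marks the rewritten copies.
    """
    kept = history[: history.index(newbase) + 1]
    replayed = [name + "'" for name in history[history.index(upstream) + 1:]]
    return kept + replayed


# The branch from the man page example: E---F---G---H---I---J  topicA
topic_a = ["E", "F", "G", "H", "I", "J"]
result = rebase_onto(topic_a, nth_ancestor(topic_a, 5), nth_ancestor(topic_a, 3))
print("---".join(result))
```

Collapsing micro-commits into one commit while keeping their changes is a
different operation; git's interactive rebase is the usual tool for that.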




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-08 Thread Markus Wanner


Hi,

Quoting Nicolas Barbier nicolas.barb...@gmail.com:

> If I understand correctly, "nearby variable renaming" refers to
> changes to the few lines surrounding the changes-to-be-merged.

Hm.. I took that to mean changes on the same line. I now realize
that was an overly strict interpretation.

> There
> is certainly supposed to be an advantage relative to diff/patch here:
> as all changes leading to both versions are known (up to some common
> ancestor), git doesn't need context lines to recognize the position
> in the file that is supposed to receive the updates.


Yes, that's how I understand it as well. Your example seems fine  
(except that it does not make much sense to merge with an ancestor).


I'm not sure if git also works line by line (as does monotone).
However, IIRC kdiff3 uses some finer-grained comparison, so it can
even merge unrelated changes on the same line, i.e.:

ancestor: aaa bbb
left:     axa bbb  (modified a -> x)
right:    aaa byb  (modified b -> y)
merge:    axa byb  (contains both modifications)
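
The same-line example above can be reproduced with the basic three-way rule
applied character by character. This is a minimal sketch, not kdiff3's actual
algorithm: `merge_chars` is a hypothetical helper, and it assumes all three
versions have the same length (replacements only, no insertions or deletions).

```python
def merge_chars(ancestor, left, right):
    """Three-way merge of equal-length strings, one character at a time."""
    if not (len(ancestor) == len(left) == len(right)):
        raise ValueError("this sketch only handles same-length replacements")
    merged = []
    for base_ch, left_ch, right_ch in zip(ancestor, left, right):
        if left_ch == right_ch:
            merged.append(left_ch)    # both sides agree
        elif left_ch == base_ch:
            merged.append(right_ch)   # only the right side changed
        elif right_ch == base_ch:
            merged.append(left_ch)    # only the left side changed
        else:
            raise ValueError("conflict: both sides changed the same character")
    return "".join(merged)


print(merge_chars("aaa bbb", "axa bbb", "aaa byb"))
```

A tool that compares whole lines would see both sides changing the same line
and report a conflict; the finer granularity is what lets both edits through.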

Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-08 Thread Mark Mielke


Markus Wanner wrote:
> Quoting Mark Mielke m...@mark.mielke.cc:
>> I am a theory person - I run things in my head. To me, the concept of
>> having more context to make the right decision, and an algorithm that
>> takes advantage of this context to make the right decision, is simple
>> and compelling on its own. Knowing the algorithms that are in use,
>> including how it selects the most recent common ancestor, gives me
>> confidence.
>
> That makes me wonder why you are speaking against merges, where
> there are common ancestors. I'd argue that in theory (and generally) a
> merge yields better results than cherry-picking (where there is no
> common ancestor, thus less information). Especially for back-branches,
> where there obviously is a common ancestor.


Nope - definitely not speaking against merges. Automatic merges = best. 
Automatic cherry picking = second best if the work flow doesn't allow 
for merges. Doing things by hand = bad but sometimes necessary. 
Automatic merges or automatic cherry picking with some manual tweaking 
(hopefully possible from kdiff3) = necessary at times but still better 
than doing things by hand completely. I think you and I are in 
agreement. (Even Tom and I are in agreement on many things - I just 
didn't respond to his well thought out great posts, like the one that 
describes why back patching is often better than forward patching when 
having multiple parallel releases open at the same time)


>> No amount of discussions where others say "it works great" and you
>> say "I don't believe you until you provide me with output" is going
>> to get anywhere.
>
> Well, I guess it can be frustrating for both sides. However, I think
> these discussions are worthwhile (and necessary) nonetheless.
>
> As not even those who highly appreciate merge algorithms (you and me,
> for example) are in agreement on how to use them (cherry-picking vs.
> merging), it doesn't surprise me that others are generally skeptical.


We're in agreement on the merge algorithms I think. :-)

That said, it is a large domain, and there is room for disagreement even 
between those with experience, and you are right that it shouldn't be 
surprising that others are generally sceptical.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-08 Thread Markus Wanner


Hi,

Quoting Nicolas Barbier nicolas.barb...@gmail.com:

> ISTM that back-patching

I take this to mean back-patching by cherry picking.

> a change to a file that wasn't modified on the
> back-branch leads exactly to merging a change to a (file-wise)
> ancestor?

Regarding the file's contents - and therefore the immediately visible
result - that's correct. However, for a merge, the two ancestor
revisions are stored, whereas with cherry-picking this information is
lost (at least for git).


So, trying to merge on top of a cherry-pick, git must merge these  
changes again (which might or might not work). Merging on top of  
merging works just fine.
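
The difference described above can be pictured by computing the common
ancestor that a later merge would start from. The following is a sketch over
a hand-written commit graph, not git's implementation: the commit names, the
`parents` dictionaries, and the `ancestors` and `merge_bases` helpers are all
made up for illustration.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, todo = set(), [commit]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen


def merge_bases(parents, left, right):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# Hypothetical history: "fix" lives on one branch, "head1" on the other.
# With a real merge, the merge commit records both parents.
with_merge = {"base": [], "fix": ["base"], "head1": ["base"],
              "merge": ["head1", "fix"], "fix2": ["fix"]}
# With a cherry-pick, the copied commit only has its own branch as parent.
with_pick = {"base": [], "fix": ["base"], "head1": ["base"],
             "picked": ["head1"], "fix2": ["fix"]}

print(merge_bases(with_merge, "merge", "fix2"))
print(merge_bases(with_pick, "picked", "fix2"))
```

In the first history the next merge starts from "fix", so only "fix2" is new.
In the second it starts from "base", so the already-copied fix has to be
merged again, which is the case that might or might not work.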


Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-07 Thread Nicolas Barbier

2009/6/7 Markus Wanner mar...@bluegap.ch:

 However, there's no special whitespace treatment. Nor anything remotely
 as clever as nearby variable renaming. There's no such magic, the
 developer still needs to tell the tool what he wants.

If I understand correctly, nearby variable renaming refers to
changes to the few lines surrounding the changes-to-be-merged. There
is certainly supposed to be an advantage relative to diff/patch here:
as all changes leading to both versions are known (up to some common
ancestor), git doesn't need context lines to recognize the position
in the file that is supposed to receive the updates.

Example:

Original file:

a
b
c

Random other changes later (a and c are updated to incorporate nearby
variable renaming or somesuch):

extra line
a'
b
c'

(Note that the extra line is important, because if the line numbers
stay the same and the lines-to-update are exactly the same, patch
could just ignore the context lines.)

An update to line b yields:

extra line
a'
b'
c'

This change would not be diff/patch-mergeable to the original file,
because the context lines a' and c' wouldn't be found. Git is
smarter than this and doesn't need the context lines; rather it uses
the full history to determine that the change to line 3 becomes a
change to line 2 in the original file. It therefore merges this change
to yield:

a
b'
c

Disclaimer: I don't use git, but I assume that this is how all systems
that are smarter than diff/patch work.
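
The example above can be tried out with a deliberately naive line-based
three-way merge built on Python's difflib. This is a sketch under stated
assumptions, not what git does internally: `edits`, `merge3` and
`MergeConflict` are hypothetical names, and the version just before the fix
is taken as the base, the way a cherry-pick-style merge would. Real tools can
be more conservative, for instance about changes on directly adjacent lines.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when both sides edit the same region of the base differently."""


def edits(base, other):
    """Return (start, end, replacement) edits that turn base into other."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge3(base, ours, theirs):
    """Naive three-way merge of line lists, applying both sides' edits."""
    pending = sorted(edits(base, ours) + edits(base, theirs),
                     key=lambda e: (e[0], e[1]))
    merged, pos, previous = [], 0, None
    for edit in pending:
        start, end, replacement = edit
        if previous is not None and (start < pos or start == previous[0]):
            if edit == previous:
                continue  # both sides made the identical change
            raise MergeConflict(f"overlapping edits near base line {start + 1}")
        merged.extend(base[pos:start])  # unchanged lines before this edit
        merged.extend(replacement)
        pos, previous = end, edit
    merged.extend(base[pos:])
    return merged


base = ["extra line", "a'", "b", "c'"]     # newer version before the fix
theirs = ["extra line", "a'", "b'", "c'"]  # newer version after the fix
ours = ["a", "b", "c"]                     # the original file

# The context lines a patch would look for are absent from the original.
print([line for line in ("a'", "c'") if line in ours])
print(merge3(base, ours, theirs))
```

The first print shows that neither context line is present in the original
file, which is why a context-based patch has nothing to anchor on. The merge
still places the change because it knows how both versions relate to the base.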

Nicolas


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Markus Wanner

Hi,

Tom Lane wrote:
 I think it's already been made crystal clear that the people who
 actually do this work don't do it that way, and are uninterested in
 allowing their tools to force them to do it that way.

That's well understood.

 Patching from
 HEAD back works better for us for a number of reasons, the main one
 being that HEAD is the version of the code that's most swapped into
 our awareness.

Committing on the oldest back-branch first doesn't necessarily mean
having to develop the patch there.

 However, so long as we can have a separate working copy per branch,
 I see no problem with preparing all the versions of a patch and then
 committing them back-to-front.

That's what I think as well.

However, I bet git could help a lot with creating all the versions of a
patch in the first place. You don't *need* to use that feature, but
preserving the option could help.

 What I'm not clear about is the
 mechanics for doing that.

If you create each of the patches individually, there's not much magic
required from git. It should be trivial to commit those as merges.

 Would someone explain exactly what the
 steps should be to produce the nicest-looking git history?

I fear the cherry-picking approach creates the nicest-looking history
(especially to the CVS trained eye).

Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Markus Wanner

Hi,

Andrew Dunstan wrote:
 Yeah, a requirement to work from the back branch forward is quite
 unacceptable IMNSHO. It's also quite unreasonable.

The monotone page about daggy fixes does quite a good job in explaining
why it is helpful. I think it's how to make best use of these tools. And
it's obviously not the same as what worked well in practice with CVS.
Out of interest, and not necessarily related to Postgres: why do you
think it's unreasonable? Fixing the problem where it was introduced
sounds like the most reasonable place to fix it, IMO.

Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Tom Lane

Markus Wanner mar...@bluegap.ch writes:
 Out of interest, and not necessarily related to Postgres: why do you
 think it's unreasonable? Fixing the problem where it was introduced
 sounds like the most reasonable place to fix it, IMO.

There are a number of possible reasons, but here are a few that hold for me:

* I always prefer to isolate a bug in HEAD if possible.  It's the
version of the code that's most familiar at the moment, and there are
often new features available that make it easier to test a problem.
So that generally leads to formulating the fix in terms of the HEAD code
first.  After that you start to think about whether (some form of) the
bug exists in back branches and how to fix those branches.

* Experience has shown that later branches tend to have more places
affected by an issue than older ones; eg you might need to touch four
places to fix a bug now, but only three of those places exist in the
older branches.  ISTM you'd be far more likely to miss fixing the
fourth place if you do your initial investigation and fixing/testing
in the oldest affected branch.

* We want HEAD to have the cleanest, most maintainable version of the
fix.  It's not infrequently the case that the most natural way of fixing
a problem varies across branches --- for instance, there might be a
helpful subroutine available in later branches.  If you design the fix
in terms of what works in the oldest branch that has the problem,
you're more likely to come up with something that's suboptimal for later
branches.  For instance in the helpful-subroutine case, I'd be more
likely to decide to back-port the subroutine along with the fix if
I work from HEAD back than if I try to work the other way.

* We are often willing to adopt a fairly invasive fix for HEAD, if
that's what's needed to have a clean maintainable solution, and then
look for a less invasive but klugy solution for the back branches.
Approaching it the other way around would strongly encourage use of
the kluge solution as a permanent fix.


So there are a lot of good reasons to work backwards in patching.
I don't believe that these would be outweighed by some advantage
in the mechanics of applying an unchanging patch to multiple
branches (especially since AFAICT the mechanical advantage would
be pretty darn minimal anyhow).

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Andrew Dunstan




Markus Wanner wrote:
> Hi,
>
> Andrew Dunstan wrote:
>> Yeah, a requirement to work from the back branch forward is quite
>> unacceptable IMNSHO. It's also quite unreasonable.
>
> The monotone page about daggy fixes does quite a good job in explaining
> why it is helpful. I think it's how to make best use of these tools. And
> it's obviously not the same as what worked well in practice with CVS.
> Out of interest, and not necessarily related to Postgres: why do you
> think it's unreasonable? Fixing the problem where it was introduced
> sounds like the most reasonable place to fix it, IMO.

Half the trouble with this discussion is that it has not been related 
enough to how the Postgres project actually works IMNSHO.


One fact to keep in mind is that, unlike most other FOSS projects, we 
keep quite a large number of branches live. If we don't remove one (and 
so far there is no great reason to that I know of) that number will be 
seven when we release 8.4. There is a huge benefit from this to the user 
community. It means that they can deploy Postgres with confidence that 
they will not have to upgrade for quite a few years. In the corporate 
world, especially, that is a major issue. I occasionally have clients 
running 7.4 or even older versions. Anyway, the large number of branches 
alone means that our patterns are unlikely to match those of other 
projects.


The question we often face in backpatching is not "where did it first
occur?" but "how far back should we patch it?". Problems are almost
always discovered near the top of the version list, overwhelmingly on 
the HEAD or most recent stable branches. So the way we work is not to 
try to develop a fix where the problem first occurred (which might not 
even be on a supported branch at all) but as high up the list as the 
problem goes (usually HEAD) and then work out how far down the list to 
apply the fix. And the notion that a fix of any complexity at all is 
going to be simply applicable across six or seven branches simply defies 
our experience. It almost never does. Frequently it won't apply cleanly 
from *any* one branch to another. Even fairly trivial patches can suffer 
from this: the pretty small plperl fixes I applied yesterday and the day 
before, required adjustment going from one branch to the previous one in 
about three out of five back branch cases. Sometimes these adjustments 
are small, sometimes they are quite large. So the idea that we can just 
create a fix on say, the 7.4 branch, and then just merge it forward 
nicely, is just fanciful in most cases, as well as being contrary to our 
methods of work.


Most of this stuff is almost invisible to most of the community. But 
people like Tom work with it every day. And we want to keep Tom 
productive, right? ;-)


cheers

andrew


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 [ most of a good summary omitted ]
 ... Even fairly trivial patches can suffer 
 from this: the pretty small plperl fixes I applied yesterday and the day 
 before, required adjustment going from one branch to the previous one in 
 about three out of five back branch cases. Sometimes these adjustments 
 are small, sometimes they are quite large. So the idea that we can just 
 create a fix on say, the 7.4 branch, and then just merge it forward 
 nicely, is just fanciful in most cases, as well as being contrary to our 
 methods of work.

I have heard it claimed that git is more intelligent than plain
diff/patch and could successfully merge patches in cases that currently
require manual adjustment of the sort Andrew describes.  If that's
really true to any significant extent, then it could represent a benefit
large enough to persuade us to alter work flows (at least for simple
patches that don't require significant rethinking across branches).
However, I have yet to see any actual *evidence* in support of this
claim.  How robust is git about dealing with whitespace changes,
nearby variable renamings, and such?

Andrew's plperl patches would be an excellent small test case.  Anybody
want to try them against the experimental git repository and see if git
does any better than plain patch?

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Mark Mielke


Tom Lane wrote:

> I have heard it claimed that git is more intelligent than plain
> diff/patch and could successfully merge patches in cases that currently
> require manual adjustment of the sort Andrew describes.  If that's
> really true to any significant extent, then it could represent a benefit
> large enough to persuade us to alter work flows (at least for simple
> patches that don't require significant rethinking across branches).
> However, I have yet to see any actual *evidence* in support of this
> claim.  How robust is git about dealing with whitespace changes,
> nearby variable renamings, and such?
>
> Andrew's plperl patches would be an excellent small test case.  Anybody
> want to try them against the experimental git repository and see if git
> does any better than plain patch?


Any revision control system should be able to do better than diff/patch 
as these systems have more information available to them. Normal GIT 
uses the relatively common 3-way merge based upon the most recent common 
ancestor algorithm. Assuming there is a most recent common ancestor that 
isn't file creation, it will have a better chance of doing the right 
thing.


Systems such as ClearCase have had these capabilities for a long time. 
The difference with distributed version control systems is that they 
absolutely must work well, as every user has their own repository, and 
every repository represents a branch, therefore each user of the system 
is working on a different branch. The need for reliable merges goes up 
under a distributed version control system.


Not to say GIT is truly best-in-class here, but it definitely has 
motivation to be and benefit of being better than diff/patch.


These sorts of tools usually work with another tool such as kdiff3 so
that only the conflicts need to be resolved by hand. If you set it up
properly, you can have the automatic merges completely successful, and
kdiff3 or similar can present you a graphical interface that allows you
to identify and resolve the conflicts that require help. I've used these
sorts of tools long enough to completely take them for granted now, and
it feels painful to go back to anything more primitive.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Tom Lane

Mark Mielke m...@mark.mielke.cc writes:
 Tom Lane wrote:
 I have heard it claimed that git is more intelligent than plain
 diff/patch and could successfully merge patches in cases that currently
 require manual adjustment of the sort Andrew describes.
 ...
 However, I have yet to see any actual *evidence* in support of this
 claim.

 Any revision control system should be able to do better than diff/patch 
 as these systems have more information available to them. Normal GIT 
 uses the relatively common 3-way merge based upon the most recent common 
 ancestor algorithm. Assuming there is a most recent common ancestor that 
 isn't file creation, it will have a better chance of doing the right 
 thing.

And I still haven't seen any actual evidence.  Could we have fewer
undocumented assertions and more experimental evidence?  Take Andrew's
plperl patches and see if git does any better with them than plain patch
does.  (If it's not successful with that patch, it's pointless to try it
on any bigger cases, I fear.)

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Greg Stark

On Fri, Jun 5, 2009 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 However, given that we don't do any real development on the back
 branches, it might be that trying to be smart about this is a waste of
 time anyway.  Surely only the HEAD version of the patch is going to be
 something that other developers care about merging with.

For what it's worth that's certainly not true. Any user maintaining a
patched version of the source tree for production use will want to
merge in any patches for older releases. For example anyone using the
CONNECT BY patch with 8.3 will surely want to take any 8.3 patch
releases. Of course EDB in particular has to maintain sources based on
old patch releases as well as the current branch.

That said, I don't see that this really affects the decision here.
These developers will just merge in the patch as it was applied to the
back branch anyway.



-- 
greg


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Andrew Dunstan




Tom Lane wrote:
>> Any revision control system should be able to do better than diff/patch
>> as these systems have more information available to them. Normal GIT
>> uses the relatively common 3-way merge based upon the most recent common
>> ancestor algorithm. Assuming there is a most recent common ancestor that
>> isn't file creation, it will have a better chance of doing the right
>> thing.
>
> And I still haven't seen any actual evidence.  Could we have fewer
> undocumented assertions and more experimental evidence?  Take Andrew's
> plperl patches and see if git does any better with them than plain patch
> does.  (If it's not successful with that patch, it's pointless to try it
> on any bigger cases, I fear.)

The plperl stuff is actually a tough case. In 7.4 we didn't have 
provision for two interpreters, so PERL_SYS_INIT3 is called 
unconditionally, and we didn't have a Windows port either, so the 
comment is also different.


I guess that in itself illustrates the problems.

I also entirely agree with your point about us being more kludgey and 
less invasive on back branches.


cheers

andrew


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Markus Wanner

Hi,

Tom Lane wrote:
 There are a number of possible reasons, but here are a few that hold for me:

Thank you for this very good collection. I'm still wondering about
what's the best way to represent this in git (or others). Cherry-picking
is arguably the simplest variant. Maybe that can be combined with
merging to preserve merge capability. I'll try that...

 So there are a lot of good reasons to work backwards in patching.

Agreed and understood. However, there are good reasons for keeping merge
capability between branches intact as well. I still hope we can get both
somehow; if not, I certainly accept that backward patching is more
important.

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Markus Wanner

Hi,

Andrew Dunstan wrote:
 One fact to keep in mind is that, unlike most other FOSS projects, we
 keep quite a large number of branches live.

So far I thought exactly that would be a good reason for migrating to
something like git. Such tools claim to ease working on multiple branches
in parallel, and in my experience that works pretty well. I'd like to find
a good way to allow the Postgres project to make use of these features
to ease development.

 It means that they can deploy Postgres with confidence that
 they will not have to upgrade for quite a few years. In the corporate
 world, especially, that is a major issue. I occasionally have clients
 running 7.4 or even older versions.

I agree and appreciate that very much as well.

 The question we often face in backpatching is not "where did it first
 occur?" but "how far back should we patch it?".

Uh.. the difference here mostly being *when* the question comes up,
right? Because the possible answers "in 8.1" or "back to 8.1" are pretty
close.

From what I understand now, you are saying here that you work on the
patch and only after that question how far back to apply it. Note that
working on the patch doesn't necessarily mean having to commit it on
HEAD first. I seem to recall a script which has so far been used for CVS
to do the multi-branch commits pretty much at the same time. Is that
correct?

 the pretty small plperl fixes I applied yesterday and the day
 before, required adjustment going from one branch to the previous one in
 about three out of five back branch cases.

I'll give these a try with one of the touted merge algorithms. I'm
curious myself.

 Sometimes these adjustments
 are small, sometimes they are quite large. So the idea that we can just
 create a fix on say, the 7.4 branch, and then just merge it forward
 nicely, is just fanciful in most cases, as well as being contrary to our
 methods of work.

Well, my experience with the Postgres-R patch has been different.
However, that patch is probably not overly invasive.

 Most of this stuff is almost invisible to most of the community.

The daily work maybe, yes. But not the end result, which is known as
rock-solid. I certainly don't want to change that. ;-)

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Markus Wanner

Hi,

Tom Lane wrote:
 How robust is git about dealing with whitespace changes,
 nearby variable renamings, and such?

Monotone tracks changes line by line. I'm not sure about git. Kdiff3,
which is used to do the manual merge, if necessary, uses some finer
grained method, AFAIK.

However, there's no special whitespace treatment. Nor anything remotely
as clever as nearby variable renaming. There's no such magic, the
developer still needs to tell the tool what he wants.

However, I'd argue that monotone (as well as git) do an incredible job
at remembering these decisions and merges, so you never need to do a
manual merge twice. (Which I remember doing a lot with diff/patch, quilt
or subversion).

 Andrew's plperl patches would be an excellent small test case.  Anybody
 want to try them against the experimental git repository and see if git
 does any better than plain patch?

I've given that patch a try under monotone (just because I happen to
know that a lot better). The results should be the same as with git.

I've started with the patch against 7.4 (which I know doesn't resemble
the current workflow, but is sufficient for testing merging
capabilities). Merging that to 8.0 worked without any conflicts,
although the result then differed from Andrew's work in that the
variable dummy_perl_env is declared after the #ifdef WIN32 block as
opposed to before in 7.4. The addition in the comment ("notably on
Windows") of course also didn't appear automatically.

It merged from 8.0 to 8.1 without any conflicts, results were equal.

Merging from 8.1 to 8.2 resulted in one merge conflict, because of the
additional condition ('if (interp_state == INTERP_NONE)') that got added
between 8.1 and 8.2.

Merging from 8.2 to 8.3 and then to HEAD as well was conflict free
again. The results differ in whitespace changes exclusively.

So, three out of the five merges would have been equally perfect with
automatic merging, while requiring only one single command, which could
even be scripted, because it remains the same over time, i.e. for
monotone it was something similar to:

   mtn propagate REL8_0_STABLE REL8_1_STABLE
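
Scripted over the whole chain, that could look roughly like the sketch below. The branch list (including HEAD as a stand-in for the trunk branch name) and the DRY_RUN switch are assumptions made for illustration; by default the commands are only printed, so nothing is touched until the list has been checked.

```shell
#!/bin/sh
# Propagate a fix up the chain of stable branches, oldest first.
# The branch names are placeholders; use the real monotone branch names.
BRANCHES="REL7_4_STABLE REL8_0_STABLE REL8_1_STABLE REL8_2_STABLE REL8_3_STABLE HEAD"
DRY_RUN=${DRY_RUN:-1}

propagate_chain() {
    prev=""
    for branch in $BRANCHES; do
        if [ -n "$prev" ]; then
            if [ "$DRY_RUN" = 1 ]; then
                # Dry run: only show what would be executed.
                echo "mtn propagate $prev $branch"
            else
                # Stop at the first failing step so it can be resolved by hand.
                mtn propagate "$prev" "$branch" || return 1
            fi
        fi
        prev=$branch
    done
}

propagate_chain
```

Run with DRY_RUN=0 to actually execute the propagates; a step that needs a manual merge ends the loop at that branch pair.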

Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-06 Thread Mark Mielke


Tom Lane wrote:
Any revision control system should be able to do better than diff/patch 
as these systems have more information available to them. Normal GIT 
uses the relatively common 3-way merge based upon the most recent common 
ancestor algorithm. Assuming there is a most recent common ancestor that 
isn't file creation, it will have a better chance of doing the right 
thing.
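
A three-way merge of this kind can be tried on a single file without any repository, using git merge-file. This is only a minimal sketch: the file names and contents are invented, and it shows the non-overlapping case, not the plperl patches.

```shell
#!/bin/sh
# Three-way merge of one file, outside any repository.
set -e
work=$(mktemp -d)
cd "$work"

printf 'a\nb\nc\nd\ne\n' > base.txt     # most recent common ancestor
printf 'A\nb\nc\nd\ne\n' > ours.txt     # one branch changed the first line
printf 'a\nb\nc\nd\nE\n' > theirs.txt   # the other branch changed the last line

# -p writes the merge result to stdout instead of overwriting ours.txt.
# The exit status is the number of conflicts, so 0 means a clean merge.
git merge-file -p ours.txt base.txt theirs.txt > merged.txt
cat merged.txt
```

Because the ancestor is known, both edits survive in merged.txt. If both sides edit the same line, the command instead returns a non-zero status and writes conflict markers.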


And I still haven't seen any actual evidence.  Could we have fewer
undocumented assertions and more experimental evidence?  Take Andrew's
plperl patches and see if git does any better with them than plain patch
does.  (If it's not successful with that patch, it's pointless to try it
on any bigger cases, I fear.)
  


This comes to the theory vs profiling I suppose. I am a theory person - 
I run things in my head. To me, the concept of having more context to 
make the right decision, and an algorithm that takes advantage of this 
context to make the right decision, is simple and compelling on its own. 
Knowing the algorithms that are in use, including how it selects the 
most recent common ancestor gives me confidence. You have the 
capabilities to test things for yourself. If you have any questions, try 
it out. No amount of discussions where others say "it works great" and 
you say "I don't believe you until you provide me with output" is going 
to get anywhere. I could set up a few scenarios or grab actual patches 
and show you particular success cases and particular failure cases, but 
will you really believe it? Because you shouldn't. For all you know, I 
picked the cases I knew would work and put them up against the cases I 
knew would fail.


I've used ClearCase for around 10 years now, and with the exception of 
cherry picking, it has very strong and mature merge support. We rely 
on merges being safe while managing many projects much larger than 
PostgreSQL. Many of the projects have hundreds of users working on them 
at the same time. CVS is *unusable* in these environments. Recently, 
however, in spite of investments into ClearCase, we are looking at GIT 
as providing *stronger* merge capabilities than ClearCase, specifically 
with regard to propagating changes from one release to another. I'm not 
going to pull up the last ten years of history and make it available to you.


Nothing is going to prove this to you other than trying it out for 
yourself. People need to be burned by unreliable merge algorithms before 
they respect the value of a reliable merge algorithm. People need to 
experience reliable merging before they buy the product.


If the theory doesn't work for you, you really are going to have to try 
it out for yourself.


Or not.

It doesn't matter to me. :-)

In any case - you raised the question - I explained how it works - and 
you shot me down without any evidence of your own. I explained how it 
works. It's up to you to try it out for yourself and decide if you are a 
believer.


Cheers,
mark

P.S. I'm only a bit insulted by these threads. There are a lot of 
sceptical people in the crowd who until now have raised questions which 
only make it clear that these people have not ever worked with a capable 
SCM system on a major project before. I really shouldn't hold this 
against you, which is why I continue to try and provide the theory and 
background, so that when you do give it a chance, it will all start to 
make sense. You'll try it out - find it works great - and wonder "how 
does it do that?" Then, hopefully you can go back to my post (or the 
many others who have tried to help out) and read how it works and say 
"ah hah! excellent!"


--
Mark Mielke m...@mielke.cc

Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Markus Wanner


Hi,

Quoting Ron Mayer rm...@cheapcomplexdevices.com:

Seems what you'd want to do is create a new branch as close to the point
where the bug was introduced - and then merge that forward into each
of the branches.


Thank you for pointing this out. As a fan of monotone I certainly know  
and like that way. However, for people who are used to CVS, lots of  
branching and merging quickly sound dangerous and messy. So I'd like  
to keep things as simple as possible while still keeping possibilities  
open for the future.


Note that a requirement for daggy fixes is that the bug is fixed  
close to the point where it was introduced. So fixing it on the  
oldest stable branch that introduced a bug instead of fixing it on  
HEAD and then back-porting would certainly be a step into the right  
direction. And I think it would be sufficient in most cases. If not,  
we can still enhance that and use daggy fixes later on (as long as we  
have a conversion that allows merging, that is).


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Tom Lane

Markus Wanner mar...@bluegap.ch writes:
 Note that a requirement for daggy fixes is that the bug is fixed  
 close to the point where it was introduced. So fixing it on the  
 oldest stable branch that introduced a bug instead of fixing it on  
 HEAD and then back-porting would certainly be a step into the right  
 direction.

I think it's already been made crystal clear that the people who
actually do this work don't do it that way, and are uninterested in
allowing their tools to force them to do it that way.  Patching from
HEAD back works better for us for a number of reasons, the main one
being that HEAD is the version of the code that's most swapped into
our awareness.

However, so long as we can have a separate working copy per branch,
I see no problem with preparing all the versions of a patch and then
committing them back-to-front.  What I'm not clear about is the
mechanics for doing that.  Would someone explain exactly what the
steps should be to produce the nicest-looking git history?

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Andrew Dunstan




Tom Lane wrote:

Markus Wanner mar...@bluegap.ch writes:
  
Note that a requirement for daggy fixes is that the bug is fixed  
close to the point where it was introduced. So fixing it on the  
oldest stable branch that introduced a bug instead of fixing it on  
HEAD and then back-porting would certainly be a step into the right  
direction.



I think it's already been made crystal clear that the people who
actually do this work don't do it that way, and are uninterested in
allowing their tools to force them to do it that way.  Patching from
HEAD back works better for us for a number of reasons, the main one
being that HEAD is the version of the code that's most swapped into
our awareness.
  



Yeah, a requirement to work from the back branch forward is quite 
unacceptable IMNSHO. It's also quite unreasonable. The tool is there to 
help, not to force an unnatural work pattern on us.



cheers

andrew


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Robert Haas

On Fri, Jun 5, 2009 at 9:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Markus Wanner mar...@bluegap.ch writes:
 Note that a requirement for daggy fixes is that the bug is fixed
 close to the point where it was introduced. So fixing it on the
 oldest stable branch that introduced a bug instead of fixing it on
 HEAD and then back-porting would certainly be a step into the right
 direction.

 I think it's already been made crystal clear that the people who
 actually do this work don't do it that way, and are uninterested in
 allowing their tools to force them to do it that way.  Patching from
 HEAD back works better for us for a number of reasons, the main one
 being that HEAD is the version of the code that's most swapped into
 our awareness.

 However, so long as we can have a separate working copy per branch,
 I see no problem with preparing all the versions of a patch and then
 committing them back-to-front.  What I'm not clear about is the
 mechanics for doing that.  Would someone explain exactly what the
 steps should be to produce the nicest-looking git history?

I'm sure someone is going to come in here and again recommend merging,
but I'm going to again recommend not merging.  Cherry-picking is the
way to go here.  Or just commit to each branch completely separately
with the same commit message; cherry-pick at least IMO is just a
convenience to help you attempt to apply the patch to a different
branch.
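
As a sketch of what that looks like in practice, the script below builds a scratch repository, fixes something on the main branch first, and then cherry-picks the fix onto a back branch. The file name, branch name and commit messages are invented for the demo; the -x option is the only link git records between the two commits.

```shell
#!/bin/sh
# Scratch repository; branch and file names are made up for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

echo "original" > plperl.c
git add plperl.c
git commit -q -m "Initial import"
git branch REL8_3_STABLE            # the back branch forks off here

# Fix on the main branch first.
echo "fixed" > plperl.c
git commit -q -a -m "Fix plperl initialization"
fix=$(git rev-parse HEAD)

# Apply the same change to the back branch. -x appends
# "(cherry picked from commit <sha>)" to the new commit message.
git checkout -q REL8_3_STABLE
git cherry-pick -x "$fix" > /dev/null
git log -1 --format=%B
```

The result is two independent commits with the same subject line; the trailer added by -x is plain text in the message, not a parent link, so git's merge machinery does not treat the branches as merged.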

The way you're using commit messages to construct the release notes
really puts limits on what the history has to look like.  I think it
would be good to find a better way to generate release notes that
isn't quite so dependent on having a very tight history, but even if
we do that I think in this particular situation cherry-picking is
going to be less work for the committers than any of the other options
that have been proposed.

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 I'm sure someone is going to come in here and again recommend merging,
 but I'm going to again recommend not merging.  Cherry-picking is the
 way to go here.  Or just commit to each branch completely separately
 with the same commit message; cherry-pick at least IMO is just a
 convenience to help you attempt to apply the patch to a different
 branch.

Commit to each branch separately is surely the closest analog to what
we have done historically.  What I'm trying to understand is whether
there's an easy variant on that that'd expose the related-ness of the
patch versions in a way git understands, hopefully giving us more
ability to leverage git's capabilities in future.

However, given that we don't do any real development on the back
branches, it might be that trying to be smart about this is a waste of
time anyway.  Surely only the HEAD version of the patch is going to be
something that other developers care about merging with.

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Robert Haas

On Fri, Jun 5, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, given that we don't do any real development on the back
 branches, it might be that trying to be smart about this is a waste of
 time anyway.  Surely only the HEAD version of the patch is going to be
 something that other developers care about merging with.

I think that's about right.  I think there would be some benefit in
developing better tools - release notes seem to be the main issue -
so that, for example, if I develop a complex feature and you think my
code is great (ok, now I'm dreaming), you could actually merge my
commits rather than flattening them.  The EXPLAIN stuff I'm working on
right now is a good example where it's a lot easier to review the
changes piece by piece rather than as a big unit, but I know you won't
want to commit it that way because (1) with CVS, it would be a lot
more work to do that, and (2) it would suck a lot of extra commits
into the data you use to generate release notes, thereby making that
process more complex.

I'm actually going to the trouble of trying to make sure that each of
my commits does one and only one thing that can be separately checked,
tested, and either accepted (hopefully) or rejected (hopefully not).
Hopefully, that will still help with reviewing, but then if you commit
it, it'll probably go in as one stomping commit that changes the
world, or at most as two or three commits that are all still pretty
big.  There are certainly cases where big stomping commits are good (I
have them in my own projects, too, and branches with long histories of
little dumb commits regularly get squashed and rebased before merging)
but I think it would be nice to have other options.

(As a side benefit, if one of my little micro-commits turns out to
have a bug, you can easily revert *just that commit*, without having
to manually sort out exactly which pieces related to that change.)
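
A small sketch of that, in a scratch repository with three invented micro-commits that each touch their own file. The middle one is backed out with git revert while the later one stays in place.

```shell
#!/bin/sh
# Scratch repository with three small commits; names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

echo one > a.txt;   git add a.txt; git commit -q -m "Step 1: add a"
echo two > b.txt;   git add b.txt; git commit -q -m "Step 2: add b"
bad=$(git rev-parse HEAD)           # pretend this step turned out buggy
echo three > c.txt; git add c.txt; git commit -q -m "Step 3: add c"

# Create a new commit that undoes only step 2; steps 1 and 3 are untouched.
git revert --no-edit "$bad" > /dev/null
ls
```

This is the easy case where the steps do not overlap. When later commits depend on the reverted one, git revert stops with conflicts that still have to be sorted out by hand.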

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 [ about micro commits ]
 (As a side benefit, if one of my little micro-commits turns out to
 have a bug, you can easily revert *just that commit*, without having
 to manually sort out exactly which pieces related to that change.)

I don't actually have a lot of faith in such an approach.  My experience
is that bugs arise from unforeseen interactions of changes, and that
backing out just one isn't a useful thing to do, even if none of the
later parts of the patch directly depend on it.

So, yeah, presenting a patch as a series of edits can be useful for
review purposes, but I'm not at all excited about cluttering the
long-term project history with a zillion micro-commits.  One of the
things I find most annoying about reviewing the current commit history
is that Bruce has taken a micro-commit approach to managing the TODO
list --- I was seldom so happy as the day that disappeared from CVS,
because of the ensuing reduction in noise level.

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Robert Haas

On Fri, Jun 5, 2009 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 [ about micro commits ]
 (As a side benefit, if one of my little micro-commits turns out to
 have a bug, you can easily revert *just that commit*, without having
 to manually sort out exactly which pieces related to that change.)

 I don't actually have a lot of faith in such an approach.  My experience
 is that bugs arise from unforeseen interactions of changes, and that
 backing out just one isn't a useful thing to do, even if none of the
 later parts of the patch directly depend on it.

 So, yeah, presenting a patch as a series of edits can be useful for
 review purposes, but I'm not at all excited about cluttering the
 long-term project history with a zillion micro-commits.  One of the
 things I find most annoying about reviewing the current commit history
 is that Bruce has taken a micro-commit approach to managing the TODO
 list --- I was seldom so happy as the day that disappeared from CVS,
 because of the ensuing reduction in noise level.

I've never even noticed that noise, even when reviewing older history.
 The power of git log to get you exactly the commits you care about
is not to be underestimated.
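
For example, commits that only touch a TODO file can be filtered out with an exclude pathspec. The repository layout and commit messages below are invented for the demo, and the exclude syntax assumes a reasonably recent git.

```shell
#!/bin/sh
# Scratch repository with one noisy TODO-only commit; names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

mkdir -p doc src
echo "item" > doc/TODO
echo "int x;" > src/main.c
git add doc src
git commit -q -m "Initial import"
echo "item 2" >> doc/TODO
git commit -q -a -m "Update TODO"
echo "int y;" >> src/main.c
git commit -q -a -m "Add y"

# History of everything except doc/TODO: the TODO-only commit drops out.
git log --oneline -- . ':(exclude)doc/TODO'

# Only commits under src/ whose message matches a pattern.
git log --oneline --grep='Add' -- src
```

The first command lists "Initial import" and "Add y" but not "Update TODO", since that commit changes nothing outside the excluded path.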

With regard to micro-commits, I don't have hugely strong feelings on
the issue.  I like them in certain situations, and I think that git
makes it feasible to use them that way if you want to; but if you
don't want to, I don't think that's a disaster either.

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Bruce Momjian

Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  [ about micro commits ]
  (As a side benefit, if one of my little micro-commits turns out to
  have a bug, you can easily revert *just that commit*, without having
  to manually sort out exactly which pieces related to that change.)
 
 I don't actually have a lot of faith in such an approach.  My experience
 is that bugs arise from unforeseen interactions of changes, and that
 backing out just one isn't a useful thing to do, even if none of the
 later parts of the patch directly depend on it.
 
 So, yeah, presenting a patch as a series of edits can be useful for
 review purposes, but I'm not at all excited about cluttering the
 long-term project history with a zillion micro-commits.  One of the
 things I find most annoying about reviewing the current commit history
 is that Bruce has taken a micro-commit approach to managing the TODO
 list --- I was seldom so happy as the day that disappeared from CVS,
 because of the ensuing reduction in noise level.

Yea, that was a problem that is now fixed.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Aidan Van Dyk

* Andrew Dunstan and...@dunslane.net [090605 13:55]:

 Yeah, a requirement to work from the back branch forward is quite  
 unacceptable IMNSHO. It's also quite unreasonable. The tool is there to  
 help, not to force an unnatural work pattern on us.

Again, just to make it clear, git isn't going to *force* anyone to
drastically change their workflow.  For people who want to keep a
separate working directory per branch, and just work on them as
independently as they do with CVS, *nothing* is going to have to change,
except the possible git push step required to actually publish your
committed changes...  But, if you want, you could just also have a
post-commit hook that will do that push for you too, and you just don't
commit until you're sure (a-la-cvs-style):

cvs update === git stash save && git pull && git stash apply
cvs commit === git commit -a && git push

The git stash is because git won't pull/merge remote work into a
dirty workdir... This is the classic conflict CVS mess that git avoids,
and then allows you to use all its powerful merge machinery to merge
any of your stashed local changes back into what you've just pulled.
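
The post-commit hook mentioned above could be as small as the one in this sketch. It sets up a throwaway "central" repository and a clone so the effect can be seen; the paths, remote layout and commit content are assumptions for the demo only.

```shell
#!/bin/sh
# Scratch setup: a bare "central" repository and one working clone.
set -e
top=$(mktemp -d)
git init -q --bare "$top/central.git"
git clone -q "$top/central.git" "$top/work" 2>/dev/null
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

# The hook itself: publish the current branch after every commit.
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
git push -q origin HEAD
EOF
chmod +x .git/hooks/post-commit

echo "hello" > README
git add README
git commit -q -m "First commit"     # the hook pushes this right away
git ls-remote origin
```

A post-commit hook cannot veto the commit, so if the push fails (for instance because someone else pushed first) the commit simply stays local until the next pull and push.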

But

I have a feeling that as people (specifically the committers) get slowly
introduced and exposed to some of the more advanced things git lets you
do, and as you get comfortable with using it, people will *want* to
start altering how they do things, simply because they start to find out
that git really allows them to do what they really want, rather than
what they have thought they want because they've been so brainwashed
by CVS...

;-)


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Andrew Dunstan




Aidan Van Dyk wrote:

* Andrew Dunstan and...@dunslane.net [090605 13:55]:

  
Yeah, a requirement to work from the back branch forward is quite  
unacceptable IMNSHO. It's also quite unreasonable. The tool is there to  
help, not to force an unnatural work pattern on us.



Again, just to make it clear, git isn't going to *force* anyone to
drastically change their workflow.  


My reaction was against someone saying in effect "don't work that way, 
work this way."


So make your argument to that person ;-)

[...]

I have a feeling that as people (specifically the committers) get slowly
introduced and exposed to some of the more advanced things git lets you
do, and as you get comfortable with using it, people will *want* to
start altering how they do things, simply because they start to find out
that git really allows them to do what they really want, rather than
what they have thought they want because they've been so brainwashed
by CVS...


  



The whole point is that we want something better *that suits our work 
patterns*. Almost all the backpatching that gets done is by the 
committers. So we have a bunch of concerns that are not relevant to that 
vast majority of developers. In particular, it would be nice to be able 
to make a bunch of changes on different branches and then commit it all 
in one hit. If that's possible, then well and good. If it's not, that's 
a pity.



cheers

andrew


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-05 Thread Aidan Van Dyk

* Andrew Dunstan and...@dunslane.net [090605 14:41]:

 The whole point is that we want something better *that suits our work  
 patterns*. Almost all the backpatching that gets done is by the  
 committers. So we have a bunch of concerns that are not relevant to that  
 vast majority of developers. In particular, it would be nice to be able  
 to make a bunch of changes on different branches and then commit it all  
 in one hit. If that's possible, then well and good. If it's not, that's  
 a pity.

My only concern is that I am seeing 2 requirements emerge:
1) Everything has to work as it currently does with CVS
2) We want better information about how patches relate for possible
   future stuff

Unfortunately, those 2 requirements are conflicting...  If you (not
anyone personally, but the more general PostgreSQL committer) want the
repository to properly track the fixes and show their relationship
through all the branches, then you really do want to branch to fix and
merge the fix forward into all your STABLE/master branches, like the
daggy type thing mentioned elsewhere...  But notice, that is *very*
different from the current work patterns based on the CVS model where
everything is completely independent (save the commit message), and
it's a huge change to the way developers work.

If you want to stay with the current CVS style, then you aren't going to
get any closer than commit messages matching (or possibly a reference
to another commit as an extra line) that we currently have with CVS.

My suggestion is to keep it simple.  Just work independently, like you
currently do.  You don't want every committer to have to completely
learn the advanced features of a new tool just to use it...  You can use
it as you use the less feature-full tool as you learn all the
features...

But as people start to use the new tool, and start to use its more
advanced features, then it's natural that their results will start to be
reflected in the main repository.

But insisting that people currently comfortable and proficient in the
current work patterns *have* to learn completely new ones for a
flag-day type switch and start using them immediately is going to:
* Piss them off
* Create great ill-will against the tool
And neither of those will be the fault of the tool itself, but of the
way a new process was forced in conjunction with a new tool...

I don't want to see the PG project trying to *force* a radical change in
the way the development/branches currently work at the same time as a
switch to git.  Replace the tool, and allow the current processes and
work-flows to gradually improve.  The process and work-flow improvements
will be an iterative and collaborative process, just like the actual
code improvements, where huge radical patches are generally frowned
upon.

I've used git for a long time, on many different projects.  I do know
how radically it *can* change the process, and how much more efficient
and natural the improved processes can be.  But the change is not an
overnight change.  And it's not going to happen unless the people
needing to change *see* its benefits.  And that's going to take time
and experience with the new tool...

Anyways, I said previously that I was over with this thread, but now I
mean it ;-) If someone wants specific git information or help, I'm
available.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Markus Wanner


Hi,

Quoting Greg Stark st...@enterprisedb.com:

This is all completely irrelevant to the CVS import.


To the CVS import it is, yes. After all, CVS has no notion of renaming  
files. But my example is about renaming with git *after* the  
conversion. Git *does* support renaming (to some extent). However, it  
fails as explained if you feed it with corrupt data (the corruption  
being the missing link between the two added files - after a rename,  
git simply has no chance of knowing it should be the same file).
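
To illustrate the working case: in the sketch below a file is renamed on the main branch while a back branch still edits it under the old name, and a merge carries the edit over to the new name. The file and branch names are invented, and this only shows the situation where both versions descend from one common file, which is exactly the link a faulty import would lack.

```shell
#!/bin/sh
# Scratch repository; file and branch names are made up for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

printf 'line1\nline2\nline3\nline4\nline5\n' > old_name.c
git add old_name.c
git commit -q -m "Add file"
git branch REL_STABLE               # back branch shares this ancestor

# Main branch: rename the file.
git mv old_name.c new_name.c
git commit -q -m "Rename file"

# Back branch: fix something in the file under its old name.
git checkout -q REL_STABLE
sed 's/line3/line3 fixed/' old_name.c > old_name.c.new
mv old_name.c.new old_name.c
git commit -q -a -m "Fix line3"

# Merge the back branch forward; git matches old_name.c to new_name.c.
git checkout -q master
git merge -q -m "Merge fix from REL_STABLE" REL_STABLE > /dev/null
cat new_name.c
```

After the merge, the fix shows up in new_name.c and old_name.c is not resurrected. Had the two files been imported as unrelated additions on each branch, there would be no common ancestor version for git to pair them through.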



I don't think
we've ever renamed files because CVS can't handle it cleanly.


Yes, that applies to the past. But I think we *are* going to rename  
files *after* the switch, because git *can* handle it cleanly - given  
a correct import.


If that defect would only affect historic information, I'd not be half  
as pestering as I am. But it's such delayed effects which might  
surprise you years after the cause, which make me nervous.



It does sound to me like we really ought to have merge commits marking
the bug fixes in old releases as merged in the equivalent commits to
later branches based on Tom's commit messages.


Now, I don't know how you got to that conclusion, but I absolutely agree ;-)

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Greg Stark



On 4 Jun 2009, at 09:11, Markus Wanner mar...@bluegap.ch wrote:


Hi,

Quoting Greg Stark st...@enterprisedb.com:

This is all completely irrelevant to the CVS import.


To the CVS import it is, yes. After all, CVS has no notion of  
renaming files. But my example is about renaming with git *after*  
the conversion. Git *does* support renaming (to some extent).  
However, it fails as explained if you feed it with corrupt data  
(the corruption being the missing link between the two added files -  
after a rename, git simply has no chance of knowing it should be the  
same file).





Hmm. I see. I'm not sure we've ever added files to back branches  
either. I'm less sure of that though.





I don't think
we've ever renamed files because CVS can't handle it cleanly.


Yes, that applies to the past. But I think we *are* going to rename  
files *after* the switch, because git *can* handle it cleanly -  
given a correct import.


If that defect would only affect historic information, I'd not be  
half as pestering as I am. But it's such delayed effects which might  
surprise you years after the cause, which make me nervous.


It does sound to me like we really ought to have merge commits marking
the bug fixes in old releases as merged in the equivalent commits to
later branches based on Tom's commit messages.


Now, I don't know how you got to that conclusion, but I absolutely  
agree ;-)


Regards

Markus Wanner




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Markus Wanner


Hi,

Quoting Greg Stark greg.st...@enterprisedb.com:
Hmm. I see. I'm not sure we've ever added files to back branches  
either. I'm less sure of that though.


We did from time to time. Every merge commit in my current conversion  
contains at least one such file that got added as part of a back  
patch. The perl file mentioned in the example upstream is one of them.


Regards

Markus Wanner




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Markus Wanner


Hi,

Quoting Tom Lane t...@sss.pgh.pa.us:

BTW, Markus: you do realize thomas is not me but Tom Lockhart?


Uh.. thanks, that name has fallen through the cracks, before. I've  
added it now, it will be included in the next sample conversion.


Regards

Markus Wanner




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

 I'm not sure whether we should mark the old branches getting merges
 down or the new branches getting merged up. I suspect I'm missing
 something but I don't see any reason one is better than the other.


As pointed out by others, it doesn't make sense to merge (all commits  
since the last merge) from HEAD to the back branches. You'd have to  
cherry-pick only the commits which actually have to get back patched.


The new branches getting merged up could work. That is, applying the  
fix to the oldest back-branch which requires the fix first and then  
merge it to all newer ones, including HEAD. However, that would  
require some rethinking: instead of creating bugfix-patches for HEAD,  
then manually adjust patches for back-branches and then group  
committing, you'd have to create a bugfix-patch for the oldest branch  
first, commit that and then merge that to the newer branches.


I consider merging a cleaner and simpler operation than  
cherry-picking, because merging allows the VCS to keep track of what  
needs to be propagated, while with cherry-picking, you'd have to keep  
track of that manually (or with the help of other tools).


An example for that is the very same inability to properly track  
renames when cherry-picking, just like what I explained for the CVS  
conversion.



It seems to require noticeable development effort to get a importer
to a level it can do it.  Will this be a requirement for import?
Or just a good thing to have?  Also how to check if all such merges
are sensible?


If that's how you'd like to have the CVS repository represented in git  
(which I'd support as well), I'd give it a try. With all of the work  
I've done for mtn cvs_import I certainly have the necessary experience  
in CVS conversion and with the cvs2svn algorithm itself.



And note that such effort will affect only old imported history,
it will not make easier to handle back-branch fixes in the future...


Hm.. depends, if you want to merge from older branches to newer ones,  
instead of cherry-picking, it would certainly help to get the history  
clean.



Various scenarios with git cherry-pick and similar tools would still
result in duplicate commits, so we would need a git log post-processor
anyway if we want to somehow group them together for eg. weekly commit
summary.  And such post-processor would work on old history too.
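
A very rough sketch of such a post-processor, assuming duplicates can be recognised by an identical subject line (a real tool would probably want something sturdier). The scratch repository, file name and messages are invented, and group_commits is a hypothetical helper, not an existing git command.

```shell
#!/bin/sh
# Scratch repository where the same fix was committed separately on two branches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

echo base > f.c
git add f.c
git commit -q -m "Initial import"
git branch REL8_3_STABLE
echo head > f.c
git commit -q -a -m "Fix foo"
git checkout -q REL8_3_STABLE
echo back > f.c
git commit -q -a -m "Fix foo"

# Hypothetical helper: one line per distinct subject, with a count
# and the abbreviated hashes of all commits carrying that subject.
group_commits() {
    git log --all --no-merges --format='%s%x09%h' |
    sort |
    awk -F'\t' '
        { hashes[$1] = hashes[$1] " " $2; count[$1]++ }
        END { for (s in count) printf "%d\t%s\t%s\n", count[s], s, hashes[s] }'
}

group_commits
```

Here "Fix foo" is reported once with a count of 2 and both hashes. Since it only reads commit messages, the same grouping works on imported CVS history as well as on new commits.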


I think we should decide on either using merges or using duplicate  
commits we try to link somehow. But then, we should IMO use that  
scheme for the conversion as well as later on, so as not to get a  
messy history, as you put it.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-04 Thread Ron Mayer

Markus Wanner wrote:
 The new branches getting merged up could work. That is, applying the
 fix to the oldest back-branch which requires the fix first and then
 merge it to all newer ones, including HEAD. However, that would require
 some rethinking: instead of creating bugfix-patches for HEAD, then
 manually adjust patches for back-branches and then group committing,
 you'd have to create a bugfix-patch for the oldest branch first, commit
 that and then merge that to the newer branches.

That sounds a bit dangerous too, since I imagine there are some
changes in the old release branches you wouldn't want merged into
the newest releases (say, code affecting sections that got redesigned).

It seems what you'd want to do is create a new branch as close as possible
to the point where the bug was introduced - and then merge that forward
into each of the branches.  This concept was mentioned in a page linked
earlier in the thread[1] and seems to be the way monotone recommends people
use their system[2].   See that page for more reasons why they think
it's good.

[1]http://archives.postgresql.org/pgsql-hackers/2009-06/msg00153.php
[2]http://www.monotone.ca/wiki/DaggyFixes/
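
To make the idea concrete, here is a toy sketch in Python. It is not how
monotone or git implement anything; the commit names and the graph are
made up, and a commit is just a name mapped to its parents. The point it
tries to show: a fix branched off the buggy commit can be merged into
every branch without dragging one branch's other changes into another.

```python
# Toy commit graph: each commit maps to the list of its parents.
# All commit names are hypothetical.
PARENTS = {
    "root": [],
    "bug": ["root"],          # commit that introduced the bug
    "fork83": ["bug"],        # last commit shared by both branches
    "rel83_tip": ["fork83"],  # tip of the back branch
    "head_tip": ["fork83"],   # tip of the development branch
    "fix": ["bug"],           # daggy fix: its parent is the buggy commit
}


def ancestors(commit, parents):
    """Return the set of commits reachable from `commit`, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge(name, ours, theirs, parents):
    """Record a merge commit `name` with two parents and return its name."""
    parents[name] = [ours, theirs]
    return name


graph = dict(PARENTS)
new83 = merge("rel83_merge", "rel83_tip", "fix", graph)
newhead = merge("head_merge", "head_tip", "fix", graph)
```

After both merges, "fix" is an ancestor of each new tip, while neither
branch has picked up the other branch's tip.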


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

The example was not an actual case from Postgres CVS history,
but a hypothetical situation, without checking if it already works
with GIT.


Of course it is a simplified example, but it resembles what could
happen e.g. to the file doc/src/sgml/generate_history.pl, which got
added from a backported patch after forking off REL8_3_STABLE.


If you create separate commits during the conversion, rename that file  
on the master branch and then - for whatever reason - try to merge the  
two branches, you will end up having that file twice. That's what I'm  
warning about. Changes on either or both sides of the merge make the  
situation worse.
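
A rough way to see this, as a Python sketch that models a tree as nothing
more than a set of paths (no contents, no rename detection, and certainly
not git's real merge code). The new file name "history_gen.pl" is invented
for the example.

```python
def merge_paths(base, ours, theirs):
    """Three-way merge of path sets: keep what both sides have, plus
    whatever either side added relative to the common base."""
    return (ours & theirs) | (ours - base) | (theirs - base)


# Case 1: the file was added by separate commits on each branch, so the
# common ancestor does not contain it.  master then renamed its copy.
base_separate = {"Makefile"}
back_branch = {"Makefile", "generate_history.pl"}
master = {"Makefile", "history_gen.pl"}
duplicated = merge_paths(base_separate, master, back_branch)

# Case 2: the add is part of the shared history (recorded as a merge),
# so the common ancestor already has the file before the rename.
base_shared = {"Makefile", "generate_history.pl"}
clean = merge_paths(base_shared, master, back_branch)
```

In the first case the merged tree ends up with both names; in the second
the old name counts as deleted on master and only the new one survives.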



Merging between branches with GIT is a fine workflow in the future.


Do you consider the above scenario a fine merge?


My point is that we should avoid fake merges, to avoid obfuscating
history.


Understood. It looks like I'm pretty much the only one who cares more  
about merge capability than nice looking history :-(


Attached is my current options file for cvs2git, it includes requested  
changes by Alvaro and additional names and emails as given by Tom  
(thanks again). A current conversion with cvs2git (and with the  
merges) results in a repository with exactly 0 differences against any  
branch or tag symbol compared to cvs checkout -kk.
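
For anyone who wants to repeat that kind of check, a minimal sketch in
Python, assuming you already have the two trees checked out side by side
(for one branch or tag at a time). The function name and the ignore list
are my own choices for illustration, not part of cvs2git.

```python
import filecmp
import os


def tree_differences(left, right, ignore=("CVS", ".git")):
    """Return sorted relative paths that differ or exist on one side only."""
    diffs = []

    def walk(cmp, prefix):
        # Entries present on one side only, or of mismatched type.
        for name in cmp.left_only + cmp.right_only + cmp.common_funny:
            diffs.append(os.path.join(prefix, name))
        # Compare contents byte for byte instead of trusting stat() data.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in mismatch + errors:
            diffs.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(diffs)
```

An empty list for every branch and tag would correspond to the
"0 differences" result mentioned above.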


Regards

Markus Wanner
# (Be in -*- mode: python; coding: utf-8 -*- mode.)

import re

from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
    import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter
from cvs2svn_lib.property_setters import ExecutablePropertySetter
from cvs2svn_lib.property_setters import KeywordsPropertySetter
from cvs2svn_lib.property_setters import MimeMapper
from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter

Log().log_level = Log.NORMAL
ctx.revision_recorder = SimpleFulltextRevisionRecorderAdapter(
    CVSRevisionReader(cvs_executable=r'cvs'),
    GitRevisionRecorder('cvs2git-tmp/git-blob.dat'),
    )

ctx.revision_excluder = NullRevisionExcluder()

ctx.revision_reader = None

ctx.sort_executable = r'sort'

ctx.trunk_only = False

ctx.cvs_author_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_log_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )
ctx.cvs_filename_decoder = CVSTextDecoder(
    ['ascii', 'latin1'],
    )

ctx.initial_project_commit_message = (
    'Standard project directories initialized by cvs2git.'
    )

ctx.post_commit_message = (
    'This commit was generated by cvs2git to track changes on a CVS '
    'vendor branch.'
    )

ctx.symbol_commit_message = (
    "This commit was manufactured by cvs2git to create %(symbol_type)s "
    "'%(symbol_name)s'."
    )

ctx.decode_apple_single = False

Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Greg Stark

On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner mar...@bluegap.ch wrote:
 If you create separate commits during the conversion, rename that file on
 the master branch

This is all completely irrelevant to the CVS import. I don't think
we've ever renamed files because CVS can't handle it cleanly.

It does sound to me like we really ought to have merge commits marking
the bug fixes in old releases as merged in the equivalent commits to
later branches based on Tom's commit messages.

That would make the git history match Tom's same commit message
implicit CVS history that cvs2pcl was giving him. I find git-log's
output including merge commits kind of strange and annoying myself but
having them at least gives us a chance to have a tool that understands
them output something like cvs2pcl. Throwing away that information
because we don't like the clutter in the tool output seems like a
short-sighted plan.

That said, the commit log message isn't being lost. We could always
import the history linearly and add the merge commits later if we
decide having them would help some tool implement cvs2pcl summaries.

-- 
greg


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Andres Freund


Hi,

On 06/03/2009 02:08 PM, Greg Stark wrote:

On Wed, Jun 3, 2009 at 12:10 PM, Markus Wanner mar...@bluegap.ch wrote:
That would make the git history match Tom's same commit message
implicit CVS history that cvs2pcl was giving him. I find git-log's
output including merge commits kind of strange and annoying myself but
having them at least gives us a chance to have a tool that understands
them output something like cvs2pcl.

git log --no-merges hides the actual merge commits if that is what you
want.
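
In other words, the filter only looks at the number of parents. A small
Python sketch of the same rule, with a made-up Commit record standing in
for real git objects:

```python
from typing import NamedTuple, Tuple


class Commit(NamedTuple):
    sha: str                  # hypothetical short ids, not real hashes
    parents: Tuple[str, ...]
    subject: str


def without_merges(commits):
    """Keep commits with at most one parent, i.e. drop merge commits."""
    return [c for c in commits if len(c.parents) <= 1]


history = [
    Commit("c3", ("c2", "b1"), "Merge back-branch fix"),
    Commit("c2", ("c1",), "Add feature"),
    Commit("b1", ("c1",), "Fix bug"),
    Commit("c1", (), "Initial import"),
]
visible = [c.sha for c in without_merges(history)]
```

The merge commit "c3" disappears from the listing, while the commits it
ties together stay visible.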


Andres


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Greg Stark

On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
 git log --no-merges hides the actual merge commits if that is what you
 want.

Ooh! Life seems so much sweeter now!

Given that we don't have to see them then I'm all for marking bug fix
patches which were applied to multiple branches as merges. That seems
like it would make it easier for tools like gitk or to show useful
information analogous to the cvs2pcl info.

Given that Tom's been intentionally marking the commits with identical
commit messages we ought to be able to find *all* of them and mark
them properly. That would be way better than only finding patches that
are absolutely identical.

I'm not sure whether we should mark the old branches getting merges
down or the new branches getting merged up. I suspect I'm missing
something but I don't see any reason one is better than the other.

-- 
greg


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Magnus Hagander

Greg Stark wrote:
 On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
 git log --no-merges hides the actual merge commits if that is what you
 want.
 
 Ooh! Life seems so much sweeter now!
 
 Given that we don't have to see them then I'm all for marking bug fix
 patches which were applied to multiple branches as merges. That seems
 like it would make it easier for tools like gitk or to show useful
 information analogous to the cvs2pcl info.

Right, if it adds additional metadata that lets the tools do their magic
better, and it's still easy to filter out, I don't see a downside.


 Given that Tom's been intentionally marking the commits with identical
 commit messages we ought to be able to find *all* of them and mark
 them properly. That would be way better than only finding patches that
 are absolutely identical.

Just to be clear, not just Tom. All committers. I was told to do that
right after my first backpatch which *didn't* do it :-)

So it's an established project practice. That has other advantages as
well, of course..


 I'm not sure whether we should mark the old branches getting merges
 down or the new branches getting merged up. I suspect I'm missing
 something but I don't see any reason one is better than the other.

If you go from older to newer, the automatic merge algorithms have a
better chance of doing something smart since they can track previous
changes. At least I think that's how it works.

But I think for most of the changes it wouldn't make a huge difference,
though - manual merging would be needed anyway.

//Magnus


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Marko Kreen

On 6/3/09, Greg Stark st...@enterprisedb.com wrote:
 On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
   git log --no-merges hides the actual merge commits if that is what you
   want.


 Ooh! Life seems so much sweeter now!

  Given that we don't have to see them then I'm all for marking bug fix
  patches which were applied to multiple branches as merges. That seems
  like it would make it easier for tools like gitk or to show useful
  information analogous to the cvs2pcl info.

  Given that Tom's been intentionally marking the commits with identical
  commit messages we ought to be able to find *all* of them and mark
  them properly. That would be way better than only finding patches that
  are absolutely identical.

  I'm not sure whether we should mark the old branches getting merges
  down or the new branches getting merged up. I suspect I'm missing
  something but I don't see any reason one is better than the other.

Although marking Tom's back-branch fixes as merges makes much more
sense than marking new files as merges, it is quite a step up from
"do tags match official releases".

It seems to require noticeable development effort to get an importer
to a level where it can do that.  Will this be a requirement for import?
Or just a good thing to have?  Also, how do we check that all such merges
are sensible?

And note that such effort will affect only old imported history,
it will not make it easier to handle back-branch fixes in the future...

Various scenarios with git cherry-pick and similar tools would still
result in duplicate commits, so we would need a git log post-processor
anyway if we want to somehow group them together for e.g. a weekly commit
summary.  And such a post-processor would work on old history too.

Maybe that's a better direction to work on than to risk a
messy history in GIT?

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Robert Haas

On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net wrote:
 Greg Stark wrote:
 On Wed, Jun 3, 2009 at 1:19 PM, Andres Freund and...@anarazel.de wrote:
 git log --no-merges hides the actual merge commits if that is what you
 want.

 Ooh! Life seems so much sweeter now!

 Given that we don't have to see them then I'm all for marking bug fix
 patches which were applied to multiple branches as merges. That seems
 like it would make it easier for tools like gitk or to show useful
 information analogous to the cvs2pcl info.

 Right, if it adds additional metadata that lets the tools do their magic
 better, and it's still easy to filter out, I don't see a downside.

 I'm not sure whether we should mark the old branches getting merges
 down or the new branches getting merged up. I suspect I'm missing
 something but I don't see any reason one is better than the other.

 If you go from older to newer, the automatic merge algorithms have a
 better chance of doing something smart since they can track previous
 changes. At least I think that's how it works.

 But I think for most of the changes it wouldn't make a huge difference,
 though - manual merging would be needed anyway.

In practice, isn't it more likely that you would develop the change on
the newest branch and then try to back-port it?  However you do the
import, you're going to want to do subsequent things the same way.

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Marko Kreen

On 6/3/09, Magnus Hagander mag...@hagander.net wrote:
 Robert Haas wrote:
   On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net 
 wrote:

  I'm not sure whether we should mark the old branches getting merges
   down or the new branches getting merged up. I suspect I'm missing
   something but I don't see any reason one is better than the other.
   If you go from older to newer, the automatic merge algorithms have a
   better chance of doing something smart since they can track previous
   changes. At least I think that's how it works.
  
   But I think for most of the changes it wouldn't make a huge difference,
   though - manual merging would be needed anyway.
  
   In practice, isn't it more likely that you would develop the change on
   the newest branch and then try to back-port it?  However you do the
   import, you're going to want to do subsequent things the same way.


 That's definitely the order in which *I* work, and I think that's how
  most others do it as well.

That's true, but it's not representable in the VCS, unless you use
cherry-pick, which is just UI around patch transport.  But considering
separate local trees (which can optionally contain local per-fix branches),
it is possible to separate the fix development from the final representation.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Aidan Van Dyk

* Magnus Hagander mag...@hagander.net [090603 10:13]:
 
 Right, if it adds additional metadata that lets the tools do their magic
 better, and it's still easy to filter out, I don't see a downside.

Note that it could (and likely will) have a downside when you get to
doing real merge-based development... A merge means that *all* changes
in *both* parents have been combined in *this* commit.  And all merge
tools depend on this.  That's the directed part of the DAG in git.  So
if you want to be working in a way that the merge tools work, you
*don't* have master/HEAD merged into REL8_2_STABLE.  You can have
REL8_2_STABLE merged into master/HEAD.

I'll concede that in GIT, it's flexible (some say arbitrary) enough that
you can *construct* the DAG otherwise, but then you've done something in
such a fashion that the DAG has no bearing on real merging, and thus
you lose all the power of the DAG's merge tracking when working on new
real merges.
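
A toy illustration in Python of why the direction matters, using a made-up
four-commit graph rather than anything from the real repository. It
computes the merge base (the best common ancestor) the way a merge tool
would have to, before and after recording master as merged into the back
branch.

```python
def ancestors(commit, parents):
    """Return the set of commits reachable from `commit`, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}


# Hypothetical history: master has m1, m2; the back branch has r1.
graph = {"fork": [], "m1": ["fork"], "m2": ["m1"], "r1": ["fork"]}
before = merge_bases("m2", "r1", graph)

# Now record "master merged into the back branch" as commit r_merge.
graph["r_merge"] = ["r1", "m2"]
after = merge_bases("m2", "r_merge", graph)
```

Before the merge the base is the fork point. Afterwards it is the tip of
master itself, which tells every later merge that the back branch already
contains everything on master.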

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Aidan Van Dyk

* Marko Kreen mark...@gmail.com [090603 10:26]:
 
 Thats true, but it's not representable in VCS, unless you use cherry-pick,
 which is just UI around patch transport.  But considering separate
 local trees (with can optionally contain local per-fix branches),
 it is possible to separate the fix-developement from final representation.

I'll note that in git, cherry-pick is *more* than just patch
transport.  I would rather call it "patch commute".  It does actually
look at the history between the picked patch and the current
tree, any merge/fork points, and the differences on each path that lead
to the changes in the current tree and the picked patch.

a.
-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Magnus Hagander

Robert Haas wrote:
 On Wed, Jun 3, 2009 at 10:13 AM, Magnus Hagander mag...@hagander.net wrote:
 I'm not sure whether we should mark the old branches getting merges
 down or the new branches getting merged up. I suspect I'm missing
 something but I don't see any reason one is better than the other.
 If you go from older to newer, the automatic merge algorithms have a
 better chance of doing something smart since they can track previous
 changes. At least I think that's how it works.

 But I think for most of the changes it wouldn't make a huge difference,
 though - manual merging would be needed anyway.
 
 In practice, isn't it more likely that you would develop the change on
 the newest branch and then try to back-port it?  However you do the
 import, you're going to want to do subsequent things the same way.

That's definitely the order in which *I* work, and I think that's how
most others do it as well.


-- 
 Magnus Hagander
 Self: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Robert Haas

On Wed, Jun 3, 2009 at 10:20 AM, Marko Kreen mark...@gmail.com wrote:
 Various scenarios with git cherry-pick and similar tools would still
 result in duplicate commits, so we would need a git log post-processor
 anyway if we want to somehow group them together for eg. weekly commit
 summary.  And such post-processor would work on old history too.

 Maybe that's better direction to work on, than to potentially risk in
 messy history in GIT?

I think it is.  cherry-picking seems like a much better way of
back-patching than merging, so putting a lot of effort into making
merges work doesn't seem like a good expenditure of effort.

It seems pretty clear that searching through the histories of each
branch for duplicate commit messages and producing a unified report is
pretty straightforward if we assume that the commit messages are
byte-for-byte identical (or even modulo whitespace changes).  But I
wonder if it would make more sense to include some kind of metadata in
the commit message (or some other property of the commit?  does git
support that?) to make it not depend on that.  I suppose Tom et. al.
like the way they do it now, so maybe we should just stick with text
comparison, but it seems a bit awkward to me.
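
A sketch of what such a report generator could look like in Python,
assuming the log has already been extracted into (branch, sha, message)
records. The record layout, the sample ids and the helper names are
assumptions for illustration only.

```python
from collections import defaultdict


def normalize(message):
    """Collapse runs of whitespace so reflowed messages still compare equal."""
    return " ".join(message.split())


def group_backpatches(log):
    """Group (branch, sha, message) records by normalized commit message.

    Returns a dict mapping each message to its (branch, sha) pairs, keeping
    only messages that appear on more than one branch.
    """
    groups = defaultdict(list)
    for branch, sha, message in log:
        groups[normalize(message)].append((branch, sha))
    return {msg: hits for msg, hits in groups.items()
            if len({branch for branch, _ in hits}) > 1}


sample_log = [
    ("master", "a1", "Fix crash in planner.\n"),
    ("REL8_3_STABLE", "b7", "Fix crash  in planner."),
    ("master", "a2", "Add new feature."),
]
report = group_backpatches(sample_log)
```

Here the two planner fixes are grouped despite the whitespace difference,
and the master-only commit is left out of the report.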

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Marko Kreen

On 6/3/09, Aidan Van Dyk ai...@highrise.ca wrote:
 * Marko Kreen mark...@gmail.com [090603 10:26]:
   Thats true, but it's not representable in VCS, unless you use cherry-pick,
   which is just UI around patch transport.  But considering separate
   local trees (with can optionally contain local per-fix branches),
   it is possible to separate the fix-developement from final representation.


 I'll note that in git, cherry-pick is *more* than just patch
  transport.  I would more call it patch commute.  It does actually
  look at the history between the picked patch, and the current
  tree, any merge/fork points, and the differences on each path that lead
  to the changes in the current tree and the picked patch.

Well, that's good to know, but this also seems to mean it's a rather bad
tool for back-patching, as you risk including random unwanted commits
too that happened in the HEAD meantime.  But also, it's a very good
tool for forward-patching.

But my point was not about that - rather I was pointing out that
this patch-commute will result in duplicate commits that have
no ties in the DAG.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Aidan Van Dyk

* Marko Kreen mark...@gmail.com [090603 11:12]:
 
 Well, thats good to know, but this also seems to mean it's rather bad
 tool for back-patching, as you risk including random unwanted commits
 too that happened in the HEAD meantime.  But also, it's very good
 tool for forward-patching.

It doesn't pull in commits in the sense that darcs does... Rather,
it's more like "the patch changes $XXX in $file, but that $file was
really $old_file at the common point between the 2 commits, and
$old_file is still $old_file in the commit I'm trying to apply the patch
to".

It looks at the history of the changes to figure out why (or why
not) they apply, and see if they should still be applied to the same
file, or another file (in case of a rename/moved file in 1 branch), or
if the changed area has been moved drastically in the file in one
branch, and the change should be applied there instead.
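
Stripped of the rename and move handling, the core of it is a three-way
merge that uses the parent of the picked commit as the base. A
deliberately simplified Python sketch, working on whole files in a
{path: content} dict rather than on hunks, so it is only an illustration
and not git's algorithm:

```python
def three_way(base, ours, theirs):
    """Merge two {path: content} trees against a common base, per file.

    A side that left a path unchanged relative to base yields to the other
    side; if both changed it differently, the path is reported as a conflict
    and our version is kept.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            result = o
        elif o == b:
            result = t
        else:
            conflicts.append(path)
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts


def cherry_pick(picked_parent, picked, current):
    """Apply the change picked_parent -> picked on top of `current`."""
    return three_way(base=picked_parent, ours=current, theirs=picked)


# Hypothetical trees: HEAD changed a.c in the picked commit; b.c differs
# between HEAD and the back branch for unrelated, older reasons.
head_parent = {"a.c": "v1", "b.c": "new"}
head_commit = {"a.c": "v2", "b.c": "new"}
back_branch = {"a.c": "v1", "b.c": "old"}
result, conflicts = cherry_pick(head_parent, head_commit, back_branch)
```

Only the a.c change travels to the back branch; b.c keeps its old content
because the picked commit did not touch it.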

 But my point was not about that - rather I was pointing out that
 this patch-commute will result in duplicate commits, that have
 no ties in DAG.

Yes.  That's a cherry-pick, if you want a merge, you merge ;-)  But
merge carries the baggage of expectation that *all* changes in both
parents have been combined.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Marko Kreen

On 6/3/09, Aidan Van Dyk ai...@highrise.ca wrote:
 * Marko Kreen mark...@gmail.com [090603 11:12]:
   Well, thats good to know, but this also seems to mean it's rather bad
   tool for back-patching, as you risk including random unwanted commits
   too that happened in the HEAD meantime.  But also, it's very good
   tool for forward-patching.

 It doesn't pull in commits in the sense that darcs does... But rather,
  its more like the patch changes $XXX in $file, but that $file was
  really $old_file at the common point between the 2 commits, and
  $old_file is still $old file in the commit I'm trying to apply the patch
  to.

  It looks at the history of the changes to figure out why (or why
  not) they apply, and see if they should still be applied to the same
  file, or another file (in case of a rename/moved file in 1 branch), or
  if the changed area has been moved drastically in the file in one
  branch, and the change should be applied there instead.

I'm not certain, but I remember using cherry-pick and seeing
several commits as a result.  This seems to be a point that needs
to be checked.

   But my point was not about that - rather I was pointing out that
   this patch-commute will result in duplicate commits, that have
   no ties in DAG.


 Yes.  That's a cherry-pick, if you want a merge, you merge ;-)  But
  merge carries the baggage of expectation that *all* changes in both
  parents have been combined.

But in the forward-merge case that's true.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Aidan Van Dyk

* Marko Kreen mark...@gmail.com [090603 11:28]:
 
 I'm not certain, but I remember using cherry pick and seeing
 several commits in result.  This seems to be a point that needs
 to be checked.

I'm not sure what you're recalling, but git cherry-pick takes a single
commit, and applies it as a single commit (or, with -n, doesn't actually
commit it).  That's what it does... There are various *other* tools (like
rebase, am, cherry, etc) which operate on sets of commits.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-03 Thread Ron Mayer

Robert Haas wrote:
 But I
 wonder if it would make more sense to include some kind of metadata in
 the commit message (or some other property of the commit?  does git
 support that?) to make it not depend on that.

From elsewhere in this thread[1], 'The git cherry-pick ... -x flag adds
a note to the commit comment describing the relationship between the commits.'

If the commit on the main branch had this message
=
   added a line on the main branch
=
The commit on the cherry picked branch will have this comment
=
   added a line on the main branch
(cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)
=
where the big hex string identifies the commit on the other branch.
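
A post-processor could pick that note up with a regular expression. A
short Python sketch, assuming full 40-character hashes as in the example
above (the function name is mine):

```python
import re

# Matches the note that "git cherry-pick -x" appends to the message.
CHERRY_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE)


def cherry_pick_sources(message):
    """Return the commit ids a message says it was cherry-picked from."""
    return CHERRY_RE.findall(message)


message = (
    "   added a line on the main branch\n"
    "(cherry picked from commit 189ef03b4f4ed5078328f7965c7bfecce318490d)\n"
)
sources = cherry_pick_sources(message)
```

With that, grouping back-patches no longer depends on the commit messages
being textually identical.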


[1] http://archives.postgresql.org/pgsql-hackers/2009-06/msg00191.php




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
  a newish conversion with cvs2git is available to check here:

   git://www.bluegap.ch/

  (it's not incremental and will only stay for a few days)

+1 for the idea of replacing CVS usernames with full names.

The knowledge about CVS usernames will be increasingly obscure.

Also worth mentioning is that there is no need to assign absolutely
up-to-date email addresses, it's enough if they uniquely identify
a person.

  Aidan Van Dyk wrote:
   Yes, but the point is you want an exact replica of CVS right?  Your
   git repo should have $PostgreSQL$ and the cvs export/checkout (you do
   use -kk right) should also have $PostgreSQL$.


 No, I'm testing against cvs checkout, as that's what everybody is used to.


   But it's important, because on *some* files you *do* want expanded
   keywords (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
   to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
   de-couple them from other keywords that they didn't want munging on.


 I don't care half as much about the keyword expansion stuff - that's
  doomed to disappear anyway.

But this is one aspect we need to get right for the conversion.

So preferably we test it sooner not later.

I think Aidan got it right - expand $PostgreSQL$ and others that are
actually expanded on current repo, but not $OpenBSD$ and others
coming from external sources.

  What I'm much more interested in is correctness WRT historic contents,
  i.e. that git log, git blame, etc.. deliver correct results. That's
  certainly harder to check.

  In my experience, cvs2svn (or cvs2git) does a pretty decent job at that,
  even in case of some corruptions. Plus it offers lots of options to fine
  tune the conversion, see the attached configuration I've used.


   So, I wouldn't consider any conversion good unless it had all these:
  

  As well as stuff like:
 parsecvs-master:src/backend/access/index/genam.c: *   
 $PostgreSQL$


 I disagree here and find it more convenient for the git repository to
  keep the old RCS versions - as in the source tarballs that got (and
  still get) shipped. Just before switching over to git one can (and
  should, IMO) remove these tags to avoid confusion.

I'd prefer we immediately test full conversion and not leave some
steps to last moment.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Marko Kreen mark...@gmail.com wrote:
 On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
a newish conversion with cvs2git is available to check here:
  
 git://www.bluegap.ch/
  
(it's not incremental and will only stay for a few days)

Btw this conversion seems broken as it contains random merge commits.

parsecvs managed to do it without them.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

I don't care half as much about the keyword expansion stuff - that's
 doomed to disappear anyway.


But this is one aspect we need to get right for the conversion.


What's your definition of right? I personally prefer the keyword  
expansion to match a cvs checkout as closely as possible.



So preferably we test it sooner not later.


I actually *am* testing against that. As mentioned, the only  
differences are insignificant, IMO. For example having 1.1.1.1  
instead of 1.1 (or vice versa, I don't remember).



I think Aidan got it right - expand $PostgreSQL$ and others that are
actually expanded on current repo, but not $OpenBSD$ and others
coming from external sources.


AFAIU Aidan proposed the exact opposite.

I'm proposing to leave both expanded, as in a CVS checkout and as  
shipped in the source release tarballs.



I'd prefer we immediately test full conversion and not leave some
steps to last moment.


IMO that would amount to changing history, so that a checkout from git  
no longer matches a released tarball as closely as possible.


What you call leave(ing) some steps to last moment is IMO not part  
of the conversion. It's rather a conscious decision to drop these  
keywords as soon as we switch to git. This step should be represented  
in history as a separate commit, IMO.


What do others think?

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
   I don't care half as much about the keyword expansion stuff - that's
doomed to disappear anyway.
  
 
  But this is one aspect we need to get right for the conversion.
 

  What's your definition of right? I personally prefer the keyword
 expansion to match a cvs checkout as closely as possible.

This is Definitely Wrong (tm).  You seem to be thinking that comparing
GIT checkout to random parallel CVS checkout (eg. from .tgz.) is the
main use-case.  It is not.  Browsing history and looking at diffs
between versions is.  And expanded CVS keywords would be total PITA
for that.

  So preferably we test it sooner not later.
 

  I actually *am* testing against that. As mentioned, the only differences
 are insignificant, IMO. For example having 1.1.1.1 instead of 1.1 (or
 vice versa, I don't remember).

Why have those at all...

  I think Aidan got it right - expand $PostgreSQL$ and others that are
  actually expanded on current repo, but not $OpenBSD$ and others
  coming from external sources.
 

  AFAIU Aidan proposed the exact opposite.

Ah, sorry, my thinko.  s/expanded/stripped/.  Take Aidan's description
as authoritative.. :)

  I'm proposing to leave both expanded, as in a CVS checkout and as shipped
 in the source release tarballs.

No, the noise they add to history would seriously hurt usability.

  I'd prefer we immediately test full conversion and not leave some
  steps to last moment.
 

  IMO that would amount to changing history, so that a checkout from git
 no longer matches a released tarball as closely as possible.

We need to compare against tarballs only when checking the conversion.
And only then.  Writing few scripts for that should not be a problem.
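
A script of that kind could be as small as the following sketch. It assumes GNU diff (for -I), that both trees are already unpacked on disk, and that keyword differences always sit on lines containing $PostgreSQL. The function name compare_trees and the directory names are hypothetical.

```shell
# Hypothetical helper: compare an unpacked release tarball with a git checkout.
# Hunks in which every changed line contains a $PostgreSQL keyword are ignored
# (GNU diff -I); version-control metadata directories are skipped.
compare_trees() {
    diff -r -I '\$PostgreSQL' -x .git -x CVS "$1" "$2"
}

# Usage (directory names are placeholders):
#   compare_trees postgresql-tarball-dir pgsql-git-checkout && echo "trees match"
```

The exit status is 0 only when no difference other than keyword lines remains, so the check can be run over every release tag in a loop.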

  What you call leave(ing) some steps to last moment is IMO not part of the
 conversion. It's rather a conscious decision to drop these keywords as soon
 as we switch to git. This step should be represented in history as a
 separate commit, IMO.

The question is how they should appear in historical commits.

I have no strong opinion whether to edit them out or not in the future.
Doing it during the periodic reindent would be a good moment, though.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

Btw this conversion seems broken as it contains random merge commits.


Well, that's a feature, not a bug ;-)

When a commit adds a file to the master *and* then to the branch as  
well, cvs2git prefers to represent this as a merge from the master  
branch, instead of adding the file twice, once on the master and once  
on the branch.


This way the target VCS knows it's the *same* file, originating from  
one single commit. This may be important for later merges - otherwise  
you may suddenly end up with duplicated files after a merge, because  
the VCS doesn't know they are in fact the same.


(Okay, git assumes two files to have the same origin/history as long  
as they have the same filename. But just rename one of the two, and  
you have the same troubles again).


Also note that these situations occur rather frequently in the  
Postgres CVS repository. Every back-patch which adds files ends up as  
a merge. (One could even argue that in the perfect conversion *all*  
back-patches should be represented as merges, rather than as separate  
commits).



parsecvs managed to do it without them.


Now, I'm not calling it broken, but cvs2git's output is arguably  
better in that regard.


As you certainly see by now, conversion from CVS is neither simple nor  
unambiguous.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

This is Definitely Wrong (tm).  You seem to be thinking that comparing
GIT checkout to random parallel CVS checkout (eg. from .tgz.) is the
main use-case.  It is not.  Browsing history and looking at diffs
between versions is.  And expanded CVS keywords would be total PITA
for that.


That's an argument. Point taken. I'll check if cvs2git supports that as well.

Regards

Markus Wanner




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
  Btw this conversion seems broken as it contains random merge commits.
 

  Well, that's a feature, not a bug ;-)

  When a commit adds a file to the master *and* then to the branch as well,
 cvs2git prefers to represent this as a merge from the master branch, instead
 of adding the file twice, once on the master and once on the branch.

  This way the target VCS knows it's the *same* file, originating from one
 single commit. This may be important for later merges - otherwise you may
 suddenly end up with duplicated files after a merge, because the VCS doesn't
 know they are in fact the same.

  (Okay, git assumes two files to have the same origin/history as long as
 they have the same filename. But just rename one of the two, and you
 have the same troubles again).

Not a problem for git I think - it assumes they are same if they have
same contents...

  Also note that these situations occur rather frequently in the Postgres CVS
 repository. Every back-patch which adds files ends up as a merge. (One could
 even argue that in the perfect conversion *all* back-patches should be
 represented as merges, rather than as separate commits).

Well, such behaviour may be a feature for some repo with complex CVS
usage, but currently we should aim for simple and clear conversion.

The question is - do such merges make any sense to a human looking at
history - and the answer is no, as no VCS level merge was happening,
just some copying around (if your description is correct).  And
we don't need to add noise for the benefit of GIT as it works fine
without any fake merges.

Our target should be each branch having simple linear history,
without any fake merges.  This will result in minimal confusion
to both humans looking at history and also GIT itself.
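
Whether a converted repository meets that target can be checked mechanically. The sketch below counts merge commits reachable from a ref; the helper name count_merges is hypothetical, and it assumes a git new enough to support rev-list --count.

```shell
# Hypothetical helper: print how many merge commits are reachable from a ref.
# A strictly linear branch history yields 0.
count_merges() {
    git rev-list --merges --count "${1:-HEAD}"
}

# Usage inside a converted repository (the branch name is only an example):
#   count_merges REL8_3_STABLE
```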

So please turn the merge logic off.  If this cannot be turned off,
cvs2git is not usable for conversion.

  parsecvs managed to do it without them.
 

  Now, I'm not calling it broken, but cvs2git's output is arguably better in
 that regard.

Seems it contains more complex logic to handle more complex CVS usage
cases, but seems like overkill for us if it creates a mess of history.

  As you certainly see by now, conversion from CVS is neither simple nor
 unambiguous.

I know, that's why I'm discussing the tradeoffs.  Simple+clear vs.
complex+messy. :)

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk

* Markus Wanner mar...@bluegap.ch [090602 07:08]:
 Hi,

 Quoting Marko Kreen mark...@gmail.com:
 I don't care half as much about the keyword expansion stuff - that's
  doomed to disappear anyway.

 But this is one aspect we need to get right for the conversion.

 What's your definition of right? I personally prefer the keyword  
 expansion to match a cvs checkout as closely as possible.

 AFAIU Aidan proposed the exact opposite.

 I'm proposing to leave both expanded, as in a CVS checkout and as  
 shipped in the source release tarballs.

Well, since I have -kk set in my .cvsrc, mine matches exactly the CVS
checkout l-)

Basically, I want the git to be identical to the cvs checkout.  If you
use -kk, that means the PostgreSQL CVS repository keywords *aren't*
expanded.  If you like -kv, that means they are.

Pick your poison (after all, it's CVS), either way, I think the 2 of
*us* are going to disagree which is best here ;-)

But, which ever way (exact to -kk or exact to -kv), the conversion
should be exact, and there should be no reason to filter out
keyword-like stuff in the diffs.
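
To illustrate the difference between the two modes: a -kv checkout shows "$Name: value $", while -kk shows only "$Name$". The filter below is a sketch that collapses the former into the latter on a text stream. The function name and the keyword list are assumptions, not anything cvs2git or parsecvs provides.

```shell
# Hypothetical filter: collapse expanded CVS keywords ("$Name: value $") back to
# their unexpanded "$Name$" form, roughly what a -kk checkout would show.
# The keyword list is an assumption; external keywords such as $OpenBSD$ are
# deliberately left alone.
collapse_keywords() {
    sed -E 's/\$(PostgreSQL|Id|Header|Revision|Date|Author):[^$]*\$/$\1$/g'
}

# Usage: collapse_keywords < genam.c
```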

 What you call leave(ing) some steps to last moment is IMO not part of 
 the conversion. It's rather a conscious decision to drop these keywords 
 as soon as we switch to git. This step should be represented in history 
 as a separate commit, IMO.

 What do others think?

I'm assuming they will get removed from the source eventually too - but
that step is *outside* the conversion.  Somebody could do it now in CVS
before the conversion, or afterwards, but it's still outside the
conversion.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

Pick your poison (after all, it's CVS), either way, I think the 2 of
*us* are going to disagree which is best here ;-)


Marko already convinced me of -kk, I'm trying that with cvs2git.


But, which ever way (exact to -kk or exact to -kv), the conversion
should be exact, and there should be no reason to filter out
keyword-like stuff in the diffs.


I just really didn't want to care about keyword expansion. Besides  
lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
;-)


I'll let you know how cvs2git behaves WRT -kk.

Regards

Markus Wanner




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk

* Markus Wanner mar...@bluegap.ch [090602 09:37]:

 Marko already convinced me of -kk, I'm trying that with cvs2git.

Good ;-)

 I just really didn't want to care about keyword expansion. Besides  
 lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
 ;-)

Absolutely...  And one of the reasons I've had -kk in my .cvsrc for
years, even before I started with git.

 I'll let you know how cvs2git behaves WRT -kk.

Cool..

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

Not a problem for git I think


Knowing that git doesn't track files as hard as monotone, I  
certainly doubt that.



- it assumes they are same if they have
same contents...


Why do you assume they have the same contents? Obviously these are  
different branches, where files can (and will!) have different contents.



Well, such behaviour may be a feature for some repo with complex CVS
usage, but currently we should aim for simple and clear conversion.


First of all, we should aim for a correct one.


The question is - do such merges make any sense to a human looking at
history - and the answer is no, as no VCS level merge was happening,
just some copying around (if your description is correct).  And
we don't need to add noise for the benefit of GIT as it works fine
without any fake merges.


For low expectations of "it works", maybe yes. However if you don't  
tell git, it has no chance of knowing that two (different) files  
should actually be the same.


Try the following:

 git init
 echo "base" > basefile
 git add basefile
 git commit -m "base commit"
 git checkout -b branch
 echo "hello, world" > testfile
 git add testfile
 git commit testfile -m "addition on branch"
 git checkout master
 echo "hello world" > testfile
 git add testfile
 git commit testfile -m "addition on master"

 # here we are at a similar point as after a lacking conversion, having two
 # distinct, i.e. historically independent files called testfile

 git mv testfile movedfile
 git commit -m "file moved"
 git checkout branch
 git merge master
 ls

 # Bang, you suddenly have 'testfile' and 'movedfile', go figure!


I leave it as an exercise for the reader to try the same with a single  
historic origin of the file, as cvs2git does the conversion.
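
For readers who want to try that exercise, here is one possible sketch. It is an approximation of the cvs2git-style history, not its actual output: the file is added once on the main branch and reaches the back branch through a merge commit, so both sides share a single origin. The temporary directory, identity settings and file contents are illustrative assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Example"

echo "base" > basefile
git add basefile
git commit -q -m "base commit"
main=$(git rev-parse --abbrev-ref HEAD)    # "master" unless configured otherwise
git branch branch                          # back branch starts at the base commit

# The file is added exactly once, on the main branch ...
printf 'hello world\nsecond line\nthird line\n' > testfile
git add testfile
git commit -q -m "addition on master"

# ... and reaches the back branch via a merge, so git knows the common origin.
git checkout -q branch
git merge -q --no-ff -m "merge addition into branch" "$main"
echo "branch change" >> testfile
git commit -q -a -m "change on branch"

# Rename on the main branch, then merge into the back branch.
git checkout -q "$main"
git mv testfile movedfile
git commit -q -m "file moved"
git checkout -q branch
git merge -q -m "merge rename into branch" "$main"
ls
```

With a shared origin the rename is followed: ls should list only basefile and movedfile, and the change made on the branch ends up in movedfile.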



Our target should be each branch having simple linear history,
without any fake merges.  This will result in minimal confusion
to both humans looking at history and also GIT itself.


I don't consider the above a minimal confusion. And concerning  
humans... you get used to merge commits pretty quickly. I for one am  
more confused by a linear history which in fact is not.


As mentioned before, I'd personally favor *all* of the back-ports to  
actually be merges of some sort, because that's what they effectively  
are. However, that also brings up the question of how we are going to  
do back-patches in the future with git.



So please turn the merge logic off.  If this cannot be turned off,
cvs2git is not usable for conversion.


As far as I know, it cannot be turned off. Use parsecvs if you want to  
get silly side effects later on in history. ;-)



Seems it contains more complex logic to handle more complex CVS usage
cases, but seems like overkill for us if it creates a mess of history.


You consider it a mess, I consider it a better and more valid  
representation of the mess that CVS is.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk

* Markus Wanner mar...@bluegap.ch [090602 10:23]:

  # Bang, you suddenly have 'testfile' and 'movedfile', go figure!

 I leave it as an exercise for the reader to try the same with a single  
 historic origin of the file, as cvs2git does the conversion.

Sure, and we can all construct example where that move is both right and
wrong...  But the point is that in PostgreSQL, (and that may be mainly
because we're using CVS), merges *aren't* something that happens.
Patches are written against HEAD (master) and then back-patched...

If you want to turn PostgreSQL development on its head, then we can
switch this around, so that patches are always done on the oldest
branch, and fixes always merged forward...

I'm not going to be the one that pushes that though ;-)

 I don't consider the above a minimal confusion. And concerning  
 humans... you get used to merge commits pretty quickly. I for one am  
 more confused by a linear history which in fact is not.

But the fact is, everyone using CVS wants a linear history. All
they care about is cvs update...wait...cvs update ... time ... cvs
update .. Everything *was* linear to them.  Any merge type things
certainly weren't intentional in CVS...

 As mentioned before, I'd personally favor *all* of the back-ports to  
 actually be merges of some sort, because that's what they effectively  
 are. However, that also brings up the question of how we are going to do 
 back-patches in the future with git.

Well, if people get comfortable with it, I expect that backports don't
happen.. Bugs are fixed where they happen, and merged forward into
all affected later development based on the bugged area.
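
A minimal sketch of that merge-forward flow, under the assumption of one back branch and one development branch. Branch and file names are made up for the demo, and nothing here is an agreed PostgreSQL workflow.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Example"

printf 'line one\nline two\n' > file.c
git add file.c
git commit -q -m "initial"
main=$(git rev-parse --abbrev-ref HEAD)
git branch REL8_3_STABLE                   # hypothetical back branch

# Development continues on the main branch.
echo "new feature" > feature.c
git add feature.c
git commit -q -m "Add feature"

# The bug is fixed where it happens: on the oldest affected branch.
git checkout -q REL8_3_STABLE
printf 'line one\nline two, fixed\n' > file.c
git commit -q -a -m "Fix bug in file.c"

# The fix is then merged forward into later development.
git checkout -q "$main"
git merge -q -m "Merge fix forward from REL8_3_STABLE" REL8_3_STABLE
git log --oneline --graph
```

Note that merging a whole back branch forward carries along every commit on it, including branch-only changes. That is one reason the "daggy fixes" approach bases the fix on the commit that introduced the bug rather than on the branch tip.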

 As far as I know, it cannot be turned off. Use parsecvs if you want to  
 get silly side effects later on in history. ;-)

Ya, that's one of the reasons I considered parsecvs the leading
candidate...  And why I went through, and showed that, with the exception
of the one REL_8_0_0 tip, it *was* an exact copy of the current CVS
repository (minus the 1 messed up tag in the repository).

 You consider it a mess, I consider it a better and more valid  
 representation of the mess that CVS is.

So much better that it makes the history as useless as CVS... I think
one of the reasons people are wanting to move from CVS to git is that it
makes things *better*...  The exact history will *always* be
available, right in CVS if people need it.  I think the goal is to make
the git history as close to CVS as possible, such that it's useful.  I
mean, if we want it to be a more valid representation, then really, we
should be doing every file change in a single commit, and merging that
file commit into the branch *every* *single* *time*... I don't think
anybody wants our conversion to be that much better and more valid
representation of the mess that CVS is...

It's a balance...  We're moving because we want *better* tools and
access, not the same mess that CVS is.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Alvaro Herrera

Aidan Van Dyk escribió:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 
   # Bang, you suddenly have 'testfile' and 'movedfile', go figure!
 
  I leave it as an exercise for the reader to try the same with a single  
  historic origin of the file, as cvs2git does the conversion.
 
 Sure, and we can all construct example where that move is both right and
 wrong...  But the point is that in PostgreSQL, (and that may be mainly
 because we're using CVS), merges *aren't* something that happens.
 Patches are written against HEAD (master) and then back-patched...
 
 If you want to turn PostgreSQL development on its head, then we can
 switch this around, so that patches are always done on the oldest
 branch, and fixes always merged forward...

The Monotone folk call this daggy fixes and it seems a clean way to
handle things.

http://www.monotone.ca/wiki/DaggyFixes/

However,

 I'm not going to be the one that pushes that though ;-)

I'm not either.  Maybe someday we'll be familiar enough with the tools
to make things this way, but I think just after the migration we'll
mainly want to be able to press on with development and not waste too
much time learning the new toys.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Tom Lane

Aidan Van Dyk ai...@highrise.ca writes:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 You consider it a mess, I consider it a better and more valid  
 representation of the mess that CVS is.

 So much better that it makes the history as useless as CVS... I think
 one of the reasons people are wanting to move from CVS to git is that it
 makes things *better*...

FWIW, the tool that I customarily use (cvs2cl) considers commits on
different branches to be the same if they have the same commit message
and occur sufficiently close together (within a few minutes).  My
committing habits have been designed around that behavior for years,
and I believe other PG committers have been doing likewise.
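
The grouping rule described above can be sketched in a few lines. This is not cvs2cl's implementation, only an illustration of the idea. The input format (epoch seconds, branch and message separated by tabs), the function name and the 300-second default window are all assumptions.

```shell
# Hypothetical input: one commit per line as "epoch_seconds<TAB>branch<TAB>message".
# Commits with the same message whose timestamps lie within the window of the
# previous one are reported as a single logical commit spanning several branches.
group_commits() {
    window=${1:-300}   # assumed fuzz window in seconds, not cvs2cl's actual default
    sort -t "$(printf '\t')" -k3,3 -k1,1n |
    awk -F '\t' -v window="$window" '
        # Start a new group when the message changes or the time gap is too large.
        $3 != msg || $1 - last > window {
            if (NR > 1) printf "\n"
            printf "%s:", $3
            msg = $3
        }
        { printf " %s", $2; last = $1 }
        END { if (NR > 0) printf "\n" }
    '
}

# Usage: group_commits 300 < commits.tsv
```

Each output line is one logical commit followed by the branches it touched, which is roughly the view a committer gets when back-patching with an identical message.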

I would consider a git conversion to be less useful to me, not more,
if it insists on showing me such cases as separate commits --- and if
it then adds useless merge messages on top of that, I'd start to get
seriously annoyed.

What we want here is a readable equivalent of the CVS history, not
necessarily something that is theoretically an exact equivalent.

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

Sure, and we can all construct example where that move is both right and
wrong...


Huh? The problem is the file duplication. The move is an action of a  
committer - it's neither right nor wrong in this example.


I cannot see any use case for seemingly random files popping up out of  
nowhere, just because git doesn't know how to merge two files after a  
mv and a merge.



But the point is that in PostgreSQL, (and that may be mainly
because we're using CVS), merges *aren't* something that happens.
Patches are written against HEAD (master) and then back-patched...


..which can be (and is better) represented as a merge in git (for the  
sake of comfortable automated merging).



If you want to turn PostgreSQL development on its head, then we can
switch this around, so that patches are always done on the oldest
branch, and fixes always merged forward...


I'd consider that good use of tools, yes. However, I realize that this  
probably is pipe-dreaming...



But the fact is, everyone using CVS wants a linear history. All
they care about is cvs update...wait...cvs update ... time ... cvs
update .. Everything *was* linear to them.  Any merge type things
certainly weren't intentional in CVS...


..no, it just wasn't possible in CVS. Switching to git, people soon  
want merge type things. Heck, it's probably *the* reason for  
switching to git.



So much better that it makes the history as useless as CVS... I think
one of the reasons people are wanting to move from CVS to git is that it
makes things *better*...


Yes, especially merging. Please don't cripple that ability just  
because CVS once upon a time enforced a linear history.



The exact history will *always* be
available, right in CVS if people need it.


Agreed. Please note that I mostly talk about a more correct  
representation *of history*, as it happened. This has nothing to do  
with single commits per file.



It's a balance...  We're moving because we want *better* tools and
access, not the same mess that CVS is.


Agreed. And please cut as many of its burdens of the past as possible, like  
linearity. History is not linear and has never been. But I'm stopping  
now before getting overly philosophic...


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Greg Stark

On Tue, Jun 2, 2009 at 4:02 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:


 The Monotone folk call this daggy fixes and it seems a clean way to
 handle things.

 http://www.monotone.ca/wiki/DaggyFixes/

Is this like what git calls an octopus? I've been wondering what the
point of such things were.

Or maybe not. I thought an octopus was two patches with the same
parent -- ie, two patches that could independently be applied in any
order.

-- 
greg


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Aidan Van Dyk ai...@highrise.ca writes:
   * Markus Wanner mar...@bluegap.ch [090602 10:23]:

  You consider it a mess, I consider it a better and more valid
   representation of the mess that CVS is.

   So much better that it makes the history as useless as CVS... I think
   one of the reasons people are wanting to move from CVS to git is that it
   makes things *better*...


 FWIW, the tool that I customarily use (cvs2cl) considers commits on
  different branches to be the same if they have the same commit message
  and occur sufficiently close together (within a few minutes).  My
  committing habits have been designed around that behavior for years,
  and I believe other PG committers have been doing likewise.

  I would consider a git conversion to be less useful to me, not more,
  if it insists on showing me such cases as separate commits --- and if
  it then adds useless merge messages on top of that, I'd start to get
  seriously annoyed.

They cannot be same commits in GIT as the resulting tree is different.
You could tie them with some sort of merge commits, but doubt the
result would be worth the noise.

Also I doubt there is tool grokking such commits anyway, the merge
discussion above was for full files with exact contents appearing
in several branches.

  What we want here is a readable equivalent of the CVS history, not
  necessarily something that is theoretically an exact equivalent.

I suggest setting the goal to be simple and clear representation
of CVS history that we can make sense later, instead of revising
CVS history to look like we used some better VCS system...

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
 [academic nitpicking]

Sorry, not going there.  Just look at the state of VCS systems
that have prioritized academic issues instead of practicality...
(arch/darcs/monotone/etc..)

  So please turn the merge logic off.  If this cannot be turned off,
  cvs2git is not usable for conversion.
 

  As far as I know, it cannot be turned off. Use parsecvs if you want to get
 silly side effects later on in history. ;-)

--no-cross-branch-commits seems sort of that direction?

And what silly side effects are you talking about?  I see only cvs2git
doing silly things...

(I'm talking about only in context of Postgres CVS repo, not in general.)

  Seems it contains more complex logic to handle more complex CVS usage
  cases, but seems like overkill for us if it creates a mess of history.
 

  You consider it a mess, I consider it a better and more valid
 representation of the mess that CVS is.

Note that merge is not file-level but tree-level.  Also note we don't
use branches for feature development but for major version maintenance.

So how can a single file appearing in 2 branches mean a merge of 2 trees?
How can that be valid?

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Robert Haas

On Tue, Jun 2, 2009 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Aidan Van Dyk ai...@highrise.ca writes:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 You consider it a mess, I consider it a better and more valid
 representation of the mess that CVS is.

 So much better that it makes the history as useless as CVS... I think
 one of the reasons people are wanting to move from CVS to git is that it
 makes things *better*...

 FWIW, the tool that I customarily use (cvs2cl) considers commits on
 different branches to be the same if they have the same commit message
 and occur sufficiently close together (within a few minutes).  My
 committing habits have been designed around that behavior for years,
 and I believe other PG committers have been doing likewise.

Interesting.  I was wondering why all your commit messages always show
up simultaneously for all the back branches.

 I would consider a git conversion to be less useful to me, not more,
 if it insists on showing me such cases as separate commits --- and if
 it then adds useless merge messages on top of that, I'd start to get
 seriously annoyed.

There's no help for them being separate commits, but I agree that
useless merge commits are a bad thing.  There are plenty of ways to
avoid that, though; I've been using git cherry-pick a lot recently,
and I think git rebase --onto also has some potential.
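
As a sketch of the cherry-pick variant: the fix is committed on the main branch and then copied to a back branch as a separate commit, with no merge commit involved. Branch and file names are invented for the demo; the -x option only records the original commit id in the new commit message.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Example"

echo "v1" > file.c
git add file.c
git commit -q -m "initial"
git branch REL8_3_STABLE                   # hypothetical back branch

# Work on the main branch: an unrelated feature, then the bug fix.
echo "unrelated new feature" > feature.c
git add feature.c
git commit -q -m "Add feature"
echo "v1 fixed" > file.c
git commit -q -a -m "Fix bug in file.c"
fix=$(git rev-parse HEAD)

# Back-patch only the fix; the feature commit stays off the back branch.
git checkout -q REL8_3_STABLE
git cherry-pick -x "$fix"
git log --oneline
```

The back branch keeps a linear history and never sees feature.c, at the cost of git treating the two fix commits as unrelated.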

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Tom Lane t...@sss.pgh.pa.us:

FWIW, the tool that I customarily use (cvs2cl) considers commits on
different branches to be the same if they have the same commit message
and occur sufficiently close together (within a few minutes).  My
committing habits have been designed around that behavior for years,
and I believe other PG committers have been doing likewise.


Yeah, that's how I see things as well.


I would consider a git conversion to be less useful to me, not more,
if it insists on showing me such cases as separate commits --- and if
it then adds useless merge messages on top of that, I'd start to get
seriously annoyed.


Hm.. well, in git, there's no such thing as a commit that spans  
multiple branches. So it's impossible to fulfill both of your wishes  
here.


parsecvs creates multiple independent commits in such a case.

cvs2git creates a single commit and propagates this to the back  
branches with merge commits (however, only if new files are added,  
otherwise it does the same as parsecvs).



What we want here is a readable equivalent of the CVS history, not
necessarily something that is theoretically an exact equivalent.


Understood. However, readability depends on the user's habits. But  
failing to merge due to a lacking conversion potentially hurts  
everybody who wants to merge.


Having used merging (in combination with renaming) often enough, I'd  
certainly be pretty annoyed if merges suddenly begin to bring up  
spurious file duplicates.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner


Hi,

Quoting Marko Kreen mark...@gmail.com:

Sorry, not going there.  Just look at the state of VCS systems
that have prioritized academic issues instead of practicality...
(arch/darcs/monotone/etc..)


I already am there. And I don't want to go back, thanks. But my bias  
for monotone certainly shines through, yes ;-)



--no-cross-branch-commits seems sort of that direction?


Yes, that could lead to the same defect. Uhm.. thank you for pointing  
that out, I'm not gonna try it, sorry.



And what silly side effects are you talking about?


I'm talking about spurious file duplicates popping up after a rename  
and a merge, see my example in this thread.



 You consider it a mess, I consider it a better and more valid
representation of the mess that CVS is.


Note that merge is not file-level but tree-level.


Depends on your point of view. Each file gets merged pretty  
individually, but the result ends up in a single commit, yes.



Also note we don't
use branches for feature development but for major version maintenance.


So? You think you are never going to merge?


So how can a single file appearing in 2 branches mean a merge of 2 trees?
How can that be valid?


I'm not sure what you are questioning here.

I find it perfectly reasonable to build something on top of  
REL8_3_STABLE and later on wanting to merge to REL8_4_STABLE. And I  
don't want to manually merge my changes, just because of a rename in  
8.4 and a bad decision during the migration to git.


(And no, I don't think any of the other git tools will help with this,  
due to the academic-nitpick-reasons above).


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen

On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
  And what silly side effects are you talking about?
 

  I'm talking about spurious file duplicates popping up after a rename and a
 merge, see my example in this thread.

The example was not an actual case from Postgres CVS history,
but a hypothetical situation, given without checking whether it already
works with GIT.

  Also note we don't
  use branches for feature development but for major version maintenance.
 

  So? You think you are never going to merge?


  So how can a single file appearing in 2 branches mean a merge of 2 trees?
  How can that be valid?
 

  I'm not sure what you are questioning here.

  I find it perfectly reasonable to build something on top of REL8_3_STABLE
 and later on wanting to merge to REL8_4_STABLE. And I don't want to manually
 merge my changes, just because of a rename in 8.4 and a bad decision during
 the migration to git.

  (And no, I don't think any of the other git tools will help with this, due
 to the academic-nitpick-reasons above).

Merging between branches with GIT is a fine workflow for the future.

But we are currently discussing how to convert CVS history to GIT.
My point is that we should avoid fake merges, to avoid obfuscating
history.

-- 
marko


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Ron Mayer

Aidan Van Dyk wrote:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 As mentioned before, I'd personally favor *all* of the back-ports to  
 actually be merges of some sort, because that's what they effectively  
 are. However, that also brings up the question of how we are going to do 
 back-patches in the future with git.
 
 Well, if people get comfortable with it, I expect that backports don't
 happen. Bugs are fixed where they happen, and merged forward into
 all affected later development based on the bugged area.

I imagine the closest thing to existing practices would be that people
would use git-cherry-pick -x -n to backport only the commits they
wanted from the current branch into the back branches.

AFAICT, this doesn't record a merge in the GIT history, but looks a lot
like the linear history from CVS - with the exception that the comment
added by -x explicitly refers to the exact commit from the main branch.
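
A minimal sketch of how that -x comment could be read back later, assuming
the line git appends has the form "(cherry picked from commit <sha>)". The
function name and the example commit message are made up for illustration.

```python
import re

# `git cherry-pick -x` appends a line of this form to the commit message.
CHERRY_PICK_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)

def cherry_pick_sources(message):
    """Return the commit ids named by cherry-pick -x lines in a message."""
    return CHERRY_PICK_RE.findall(message)

# Hypothetical back-branch commit message, not taken from any real repository.
example_message = (
    "Fix off-by-one in hypothetical_function()\n"
    "\n"
    "(cherry picked from commit 0123456789abcdef0123456789abcdef01234567)\n"
)
print(cherry_pick_sources(example_message))
```

With something like this, a back-branch commit can be traced to the commit
it was picked from without any merge being recorded in the history.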




Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-01 Thread Markus Wanner

Hi,

a newish conversion with cvs2git is available to check here:

  git://www.bluegap.ch/

(it's not incremental and will only stay for a few days)


For everybody interested, please check the committer names and emails.
I'm missing the names and email addresses for these committers:

'barry' : ('barry??', ''),
'dennis' : ('Dennis??', ''),
'inoue' : ('inoue??', ''),
'jurka' : ('jurka??', ''),
'pjw' : ('pjw??', ''),

And I'm guessing that 'peter' is the same as 'petere':

'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'),


I've compared all branch heads and all tags with a cvs checkout. The
only differences are keyword expansion errors. Most commonly the RCS
version 1.1 is used in the resulting git repository, instead of
version 1.1.1.1. This also leads to getting dates wrong ($Date keyword).

I'm unsure on how to test Tom's requirement that every commit and its
log message is included in the resulting git repository. Feel free to
clone and inspect the mentioned git repository and propose improvements
on the cvs2git options used.

Aidan Van Dyk wrote:
 Yes, but the point is you want an exact replica of CVS right?  Your
 git repo should have $PostgreSQL$ and the cvs export/checkout (you do
 use -kk right) should also have $PostgreSQL$.

No, I'm testing against cvs checkout, as that's what everybody is used to.

 But it's important, because on *some* files you *do* want expanded
 keywords (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
 to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
 de-couple them from other keywords that they didn't want munging on.

I don't care half as much about the keyword expansion stuff - that's
doomed to disappear anyway.

What I'm much more interested in is correctness WRT historic contents,
i.e. that git log, git blame, etc.. deliver correct results. That's
certainly harder to check.

In my experience, cvs2svn (or cvs2git) does a pretty decent job at that,
even in case of some corruptions. Plus it offers lots of options to fine
tune the conversion, see the attached configuration I've used.

 So, I wouldn't consider any conversion good unless it had all these:
 
 As well as stuff like:
   parsecvs-master:src/backend/access/index/genam.c: *   $PostgreSQL$

I disagree here and find it more convenient for the git repository to
keep the old RCS versions - as in the source tarballs that got (and
still get) shipped. Just before switching over to git one can (and
should, IMO) remove these tags to avoid confusion.

Regards

Markus Wanner
# (Be in -*- mode: python; coding: utf-8 -*- mode.)

import re

from cvs2svn_lib import config
from cvs2svn_lib import changeset_database
from cvs2svn_lib.common import CVSTextDecoder
from cvs2svn_lib.log import Log
from cvs2svn_lib.project import Project
from cvs2svn_lib.git_revision_recorder import GitRevisionRecorder
from cvs2svn_lib.git_output_option import GitRevisionMarkWriter
from cvs2svn_lib.git_output_option import GitOutputOption
from cvs2svn_lib.revision_manager import NullRevisionRecorder
from cvs2svn_lib.revision_manager import NullRevisionExcluder
from cvs2svn_lib.fulltext_revision_recorder \
 import SimpleFulltextRevisionRecorderAdapter
from cvs2svn_lib.rcs_revision_manager import RCSRevisionReader
from cvs2svn_lib.cvs_revision_manager import CVSRevisionReader
from cvs2svn_lib.checkout_internal import InternalRevisionRecorder
from cvs2svn_lib.checkout_internal import InternalRevisionExcluder
from cvs2svn_lib.checkout_internal import InternalRevisionReader
from cvs2svn_lib.symbol_strategy import AllBranchRule
from cvs2svn_lib.symbol_strategy import AllTagRule
from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule
from cvs2svn_lib.symbol_strategy import ExcludeRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceBranchRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ForceTagRegexpStrategyRule
from cvs2svn_lib.symbol_strategy import ExcludeTrivialImportBranchRule
from cvs2svn_lib.symbol_strategy import ExcludeVendorBranchRule
from cvs2svn_lib.symbol_strategy import HeuristicStrategyRule
from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule
from cvs2svn_lib.symbol_strategy import HeuristicPreferredParentRule
from cvs2svn_lib.symbol_strategy import SymbolHintsFileRule
from cvs2svn_lib.symbol_transform import ReplaceSubstringsSymbolTransform
from cvs2svn_lib.symbol_transform import RegexpSymbolTransform
from cvs2svn_lib.symbol_transform import IgnoreSymbolTransform
from cvs2svn_lib.symbol_transform import NormalizePathsSymbolTransform
from cvs2svn_lib.property_setters import AutoPropsPropertySetter
from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter
from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter
from cvs2svn_lib.property_setters import CVSRevisionNumberSetter
from cvs2svn_lib.property_setters import DefaultEOLStyleSetter
from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter

Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-01 Thread Tom Lane

Markus Wanner mar...@bluegap.ch writes:
 I'm missing the names and email addresses for these committers:

 'barry' : ('barry??', ''),

Barry Lind, formerly one of the JDBC bunch, been inactive for awhile

 'dennis' : ('Dennis??', ''),

I suppose this must be Dennis Björklund, but I didn't realize he
used to be a committer.

 'inoue' : ('inoue??', ''),

Hiroshi Inoue, still active, but ODBC is not part of core anymore

 'jurka' : ('jurka??', ''),

Kris Jurka, still active, but JDBC is not part of core anymore

 'pjw' : ('pjw??', ''),

Philip Warner, inactive (still reads the lists though)

 And I'm guessing that 'peter' is the same as 'petere':

 'peter' : ('Peter Eisentraut (?)', 'pete...@gmx.net'),

No, that would be Peter Mount, also a retired JDBC hacker.

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-01 Thread Alvaro Herrera

Tom Lane wrote:
 Markus Wanner mar...@bluegap.ch writes:

  'dennis' : ('Dennis??', ''),
 
 I suppose this must be Dennis Björklund, but I didn't realize he
 used to be a committer.

IIRC he was given commit privs for translation files.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-01 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 I suppose this must be Dennis Björklund, but I didn't realize he
 used to be a committer.

 IIRC he was given commit privs for translation files.

Ah, right, that does ring a bell now.

BTW, Markus: you do realize thomas is not me but Tom Lockhart?

regards, tom lane


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Markus Wanner


Hi,

Quoting Robert Haas robertmh...@gmail.com:

That's not the best news I've had today...


Sorry :-(


To me they sound complex and inconvenient.  I guess I'm kind of
mystified by why we can't make this work reliably.  Other than the
broken tags issue we've discussed, it seems like the only real issue
should be how to group changes to different files into a single
commit.  Once you do that, you should be able to construct a
well-defined, total function f : (cvs-file, cvs-revision) -> git
commit which is surjective on the space of git commits.  In fact it
might be a good idea to explicitly construct this mapping and drop it
into a database table somewhere so that people can sanity check it as
much as they wish.  Why is this harder than I think it is?


Well, as CVS doesn't guarantee any consistency between files, you end  
up with silly situations more often than you think. One of the  
simplest possible examples is something like:


  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit  
2 (@1.2), but seen from fileB it's the exact opposite. The most  
promising approach to solve these problems seems to be based on Graph  
Theory, where you work with a graph of dependencies from fileA @ 1.1  
to fileA @ 1.2.


To resolve the above situation, you'd have to split a blob of  
single-file commits into two end-result commits (for monotone / git).  
In the above example, you'd have two options to resolve the conflict:


  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:

  commit 2a: fileB @ 1.1
  commit 1:  fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2

(Note that often enough, these have actually been separate commits in  
CVS as well, there's just no way to represent that. And no, timestamps  
are simply not reliable enough).
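
The cycle in this example can be checked mechanically. What follows is only
a sketch under simplifying assumptions (trunk revisions only, one revision
per file per blob), not the algorithm of cvs2git or of monotone's
cvs_import; all function names are made up. It derives "must come before"
edges from the per-file revision order and tests whether the blobs can be
put in a consistent order, first for the original two commits and then
with commit 1 split into its fileA part and its fileB part.

```python
from collections import defaultdict

def revision_key(rev):
    """Turn '1.10' into (1, 10) so revisions compare numerically."""
    return tuple(int(part) for part in rev.split("."))

def blob_dependencies(blobs):
    """blobs maps a blob name to {file: revision}.
    Returns edges (earlier, later): blob `earlier` holds an older
    revision of some file than blob `later` does."""
    by_file = defaultdict(list)
    for name, revisions in blobs.items():
        for path, rev in revisions.items():
            by_file[path].append((revision_key(rev), name))
    edges = set()
    for entries in by_file.values():
        entries.sort()
        for (_, earlier), (_, later) in zip(entries, entries[1:]):
            if earlier != later:
                edges.add((earlier, later))
    return edges

def has_cycle(nodes, edges):
    """Kahn's algorithm: there is a cycle if some node can never be ordered."""
    successors = defaultdict(set)
    indegree = {node: 0 for node in nodes}
    for earlier, later in edges:
        successors[earlier].add(later)
        indegree[later] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    ordered = 0
    while ready:
        node = ready.pop()
        ordered += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return ordered != len(indegree)

blobs = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}
split = {
    "commit 1a": {"fileA": "1.1"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
    "commit 1b": {"fileB": "1.2"},
}
print(has_cycle(blobs, blob_dependencies(blobs)))   # True
print(has_cycle(split, blob_dependencies(split)))   # False
```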


Now add tags, branches and cyclic dependencies involving many files  
and many hundreds of commits to the example above and you start to get an idea  
of the complexity of the problem in general.


See my description and diagrams of the steps used for cvs_import in  
monotone at [1] or follow descriptions of how cvs2svn works internally.


A few numbers about a conversion I'm trying for testing my algorithm  
and heuristics. It's converting a pretty recent snapshot of the  
Postgres repository:


 * running at 100% CPU time since: April, 17
 * Total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing  
on good results rather than speed of conversion.


Also note, that monotone uses SQLite, so it actually stores the  
results of this conversion in an SQL database, as you proposed.  
Recently, a git_export command has been added, so that's definitely  
worth a try for converting CVS to git. However, I fear cvs2git is more  
mature.
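
To make the proposed mapping table concrete, here is a rough sketch using
Python's sqlite3 module. The table name, column names and commit labels are
invented for illustration and are not monotone's actual schema. The primary
key makes the mapping a function (one commit per file revision), and the
helper lists commits that no CVS revision maps to, which is the
surjectivity check Robert asked about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cvs_to_git (
        cvs_file     TEXT NOT NULL,
        cvs_revision TEXT NOT NULL,
        git_commit   TEXT NOT NULL,
        PRIMARY KEY (cvs_file, cvs_revision)
    )
    """
)

# Hypothetical rows, reusing the fileA/fileB example from this message.
rows = [
    ("fileA", "1.1", "commit-1a"),
    ("fileA", "1.2", "commit-2"),
    ("fileB", "1.1", "commit-2"),
    ("fileB", "1.2", "commit-1b"),
]
conn.executemany("INSERT INTO cvs_to_git VALUES (?, ?, ?)", rows)

def commits_without_revisions(conn, all_commits):
    """Commits that no CVS revision maps to; empty means surjective."""
    mapped = {
        row[0]
        for row in conn.execute("SELECT DISTINCT git_commit FROM cvs_to_git")
    }
    return sorted(set(all_commits) - mapped)

print(commits_without_revisions(conn, ["commit-1a", "commit-2", "commit-1b"]))
```

Commits synthesized by the converter (for example to fix up a tag) would
show up in that list, so the check doubles as a way to find them.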


Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone:
http://www.monotone.ca/wiki/CvsImport/



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Peter Eisentraut

On Thursday 28 May 2009 20:03:38 Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Right.  Shall we try to spec out exactly what our conversion
  requirements are?  Here's a shot:

 [...]

  Comments?  Other considerations?

 Certainly sounds reasonable to me.  I'd be really surprised if that's
 really all that hard to accomplish.  I'd be happy to help with some
 testing too if we feel that the current git repo is in reasonable shape
 to do that testing against (or someone has another).

Sounds like writing a comprehensive test suite against Tom's spec would be the 
first step.  And then this test suite can be run against various conversion 
tools and configurations thereof.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Robert Haas

On Fri, May 29, 2009 at 2:41 AM, Markus Wanner mar...@bluegap.ch wrote:
 Hi,
 Quoting Robert Haas robertmh...@gmail.com:
 Why is this harder than I think it is?

 One of the simplest possible example is something like:

Thanks for the explanation, I understand it better now.  I'm still
dismayed, but at least I know why I'm dismayed.

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Markus Wanner


Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

Ok, so seeing the interest in having a good conversion, I took a stab at
parsecvs this afternoon, probably what I consider the leading static
conversion tool.


Here are some results from a conversion with cvs2git.


It takes about 10 minutes to run on my old xeon.


The conversion with cvs2git certainly took a bit longer, however, I  
don't think that matters at all. Everything below a day or two is good  
enough, IMO. What counts is the result.


The first step is running cvs2git itself:

cvs2svn Statistics:
------------------
Total CVS Files:          6873
Total CVS Revisions:    140191
Total CVS Branches:      36057
Total CVS Tags:         457515
Total Unique Tags:         171
Total Unique Branches:      21
CVS Repos Size in KB:   377337
Total SVN Commits:       32889
First Revision Date:    Tue Jul  9 08:21:07 1996
Last Revision Date:     Thu May 28 22:02:10 2009

(number of files matches pretty well with my own algorithm, however,  
total svn commits is a bit lower, compared to the ~ 40'000 blobs I got).


The output of cvs2git can then be imported with git fast-import:

git-fast-import statistics:
-
Alloc'd objects: 35
Total objects:   349405 ( 19563 duplicates  )
  blobs  :   132672 (  3255 duplicates 119032 deltas)
  trees  :   183967 ( 16308 duplicates 165582 deltas)
  commits:32766 ( 0 duplicates  0 deltas)
  tags   :0 ( 0 duplicates  0 deltas)
Total branches: 194 (   664 loads )
  marks: 1073741824 (168693 unique)
  atoms:   5280
Memory total: 16532 KiB
   pools:  2860 KiB
 objects: 13671 KiB
-
pack_report: getpagesize()=   4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit  = 8589934592
pack_report: pack_used_ctr= 124414
pack_report: pack_mmap_calls  =   3674
pack_report: pack_open_windows=  1 /  1
pack_report: pack_mapped  =  199500913 /  199500913
-


The resulting repository contains the following branches. The  
unlabeled ones contain only 1-2 files and seem rather irrelevant. In a  
next try, I'd disable their creation completely, just wanted to check.


  REL2_0B
  REL6_4
  REL6_5_PATCHES
  REL7_0_PATCHES
  REL7_1_STABLE
  REL7_2_STABLE
  REL7_3_STABLE
  REL7_4_STABLE
  REL8_0_0
  REL8_0_STABLE
  REL8_1_STABLE
  REL8_2_STABLE
  REL8_3_STABLE
  Release_1_0_3
  WIN32_DEV
  ecpg_big_bison
* master
  unlabeled-1.44.2   - from src/backend/commands/tablecmds.c
  unlabeled-1.51.2   - from src/test/regress/expected/alter_table.out
  unlabeled-1.59.2   - from src/backend/executor/execTuples.c
  unlabeled-1.87.2   - from src/backend/executor/nodeAgg.c
  unlabeled-1.90.2   - from src/backend/parser/parse_target.c and
 src/backend/access/common/tupdesc.c

Comparison of the head of each branch between git and CVS (modulo CVS  
keyword expansion, which I've filtered out):


ecpg_big_bison.diff:  0 files changed
master.diff:  0 files changed
REL2_0B.diff: 0 files changed
REL6_4.diff:  0 files changed
REL6_5_PATCHES.diff:  0 files changed
REL7_0_PATCHES.diff:  0 files changed
REL7_1_STABLE.diff:   0 files changed
REL7_2_STABLE.diff:   0 files changed
REL7_3_STABLE.diff:   0 files changed
REL7_4_STABLE.diff:   0 files changed
REL8_0_0.diff:0 files changed
REL8_0_STABLE.diff:   0 files changed
REL8_1_STABLE.diff:   0 files changed
REL8_2_STABLE.diff:   0 files changed
REL8_3_STABLE.diff:   0 files changed
Release_1_0_3.diff:   0 files changed
WIN32_DEV.diff:   0 files changed

I plan to compare the tags as well and test what branch they are in,  
but so far cvs2git seems to hold its promises. I'll report back again  
within the next few days.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Aidan Van Dyk

* Markus Wanner mar...@bluegap.ch [090529 11:06]:
 Hi,

 Comparison of the head of each branch between git and CVS (modulo CVS  
 keyword expansion, which I've filtered out):

How did you filter it out, and without the filtering out, how does it
do?

 I plan to compare the tags as well and test what branch they are in, but 
 so far cvs2git seems to hold its promises. I'll report back again within 
 the next few days.

It definitely seems to have figured out the REL8_0_0 confusion that
tripped up parsecvs.  If I'm stuck on another windows project some time
in the near future, I'll try and look into why parsecvs trips up on
those 3 files from REL8_0_0 branch ;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Markus Wanner


Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

* Markus Wanner mar...@bluegap.ch [090529 11:06]:

Comparison of the head of each branch between git and CVS (modulo CVS
keyword expansion, which I've filtered out):


How did you filter it out


With perl and some regexes.


and without the filtering out, how does it do?


Uh.. why is that of interest? With content hashing, these keywords do  
more harm than good.


I'd have to check again, but there certainly are differences here and there.

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Aidan Van Dyk

* Markus Wanner mar...@bluegap.ch [090529 11:18]:
 Hi,

 Quoting Aidan Van Dyk ai...@highrise.ca:
 * Markus Wanner mar...@bluegap.ch [090529 11:06]:
 Comparison of the head of each branch between git and CVS (modulo CVS
 keyword expansion, which I've filtered out):

 How did you filter it out

 With perl and some regexes.

 and without the filtering out, how does it do?

 Uh.. why is that of interest? With content hashing, these keywords do  
 more harm than good.

Yes, but the point is you want an exact replica of CVS right?  Your
git repo should have $PostgreSQL$ and the cvs export/checkout (you do
use -kk right) should also have $PostgreSQL$.

The 3 parsecvs errors were that it *didn't* recognize the strange
$PostgreSQL ... Exp $ expansion that cvs did.

But it's important, because on *some* files you *do* want expanded
keywords (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
de-couple them from other keywords that they didn't want munging on.

So, I wouldn't consider any conversion good unless it had all these:
parsecvs-master:contrib/pgcrypto/crypt-des.c: * $FreeBSD: src/secure/lib/libcrypt/crypt-des.c,v 1.12 1999/09/20 12:39:20 markm Exp $
parsecvs-master:contrib/pgcrypto/crypt-md5.c: * $FreeBSD: src/lib/libcrypt/crypt-md5.c,v 1.5 1999/12/17 20:21:45 peter Exp $
parsecvs-master:contrib/pgcrypto/md5.c:/*  $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */
parsecvs-master:contrib/pgcrypto/md5.h:/*  $KAME: md5.h,v 1.3 2000/02/22 14:01:18 itojun Exp $ */
parsecvs-master:contrib/pgcrypto/rijndael.c:/*  $OpenBSD: rijndael.c,v 1.6 2000/12/09 18:51:34 markus Exp $ */
parsecvs-master:contrib/pgcrypto/rijndael.h: *  $OpenBSD: rijndael.h,v 1.3 2001/05/09 23:01:32 markus Exp $ */
parsecvs-master:contrib/pgcrypto/sha1.c:/* $KAME: sha1.c,v 1.3 2000/02/22 14:01:18 itojun Exp $*/
parsecvs-master:contrib/pgcrypto/sha1.h:/* $KAME: sha1.h,v 1.4 2000/02/22 14:01:18 itojun Exp $*/
parsecvs-master:contrib/pgcrypto/sha2.c:/*  $OpenBSD: sha2.c,v 1.6 2004/05/03 02:57:36 millert Exp $*/
parsecvs-master:contrib/pgcrypto/sha2.h:/*  $OpenBSD: sha2.h,v 1.2 2004/04/28 23:11:57 millert Exp $*/
parsecvs-master:src/backend/port/darwin/system.c: * $FreeBSD: src/lib/libc/stdlib/system.c,v 1.6 2000/03/16 02:14:41 jasone Exp $
parsecvs-master:src/port/crypt.c:/* $NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $   */
parsecvs-master:src/port/crypt.c:__RCSID($NetBSD: crypt.c,v 1.18 2001/03/01 14:37:35 wiz Exp $);
parsecvs-master:src/port/qsort.c:/* $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $   */
parsecvs-master:src/port/qsort_arg.c:/* $NetBSD: qsort.c,v 1.13 2003/08/07 16:43:42 agc Exp $   */
parsecvs-master:src/port/strlcat.c: *   $OpenBSD: strlcat.c,v 1.13 2005/08/08 08:05:37 espie Exp $  */
parsecvs-master:src/port/strlcpy.c:/*   $OpenBSD: strlcpy.c,v 1.11 2006/05/05 15:27:38 millert Exp $*/

As well as stuff like:
parsecvs-master:src/backend/access/index/genam.c: *   $PostgreSQL$
parsecvs-master:src/backend/access/index/indexam.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/Makefile:#$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/README:$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtcompare.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtinsert.c: *  $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtpage.c: *$PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtree.c: * $PostgreSQL$
parsecvs-master:src/backend/access/nbtree/nbtsearch.c: *  $PostgreSQL$

Basically, identical to what a cvs export/checkout/update gives you with
-kk.

But I'm picky ;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Alvaro Herrera

Aidan Van Dyk wrote:

 Yes, but the point is you want an exact replica of CVS right?  Your
 git repo should have $PostgreSQL$ and the cvs export/checkout (you do
 use -kk right) should also have $PostgreSQL$.
 
 The 3 parsecvs errors were that it *didn't* recognize the strange
 $PostgreSQL ... Exp $ expansion that cvs did.

Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them
anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone
because, as Markus says, their expansion interferes with content hashing.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Alvaro Herrera

Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Tom Lane wrote:
  What was in the back of my mind was that we'd go around and mass-remove
  $PostgreSQL$ (and any other lurking tags), but only from HEAD and only
  after the repo conversion.  Although just before it would be okay too.
 
  You mean we would remove them from CVS?  I don't think that's
  necessarily a good idea; it'd be massive changes for no good reason.
 
 Uh, how is it different from any other mass edit, such as our annual
 copyright-year updates, or pgindent runs?

Well, the other mass edits have a purpose.  This one would be only to
help the migration.

  My idea was to remove them from the repository that would be used for the
  conversion (I think that means editing the ,v files),
 
 Ick ... I'm willing to tolerate a few small manual ,v edits if we have
 to do it to make tags consistent or something like that.  I don't think
 we should be doing massive edits of that kind.

Yeah, that idea wasn't all that great after all.

 But anyway, that's not the interesting point.  The interesting point is
 what about the historical aspect of it, not whether we want to dispense
 with the tags going forward.  Should our repo conversion try to
 represent the historical states of the files including the tag strings?

Since we're going to lose them functionally after the conversion, it
doesn't seem that they serve any purpose.  After all, they will not
represent anything on the new repository.

The problem is that they are a problem for the conversion.  Are they
expanded before or after the commit?  Because the very expansion causes
the file to change identity, files being identified by the SHA1 sum of
their contents.
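
To illustrate the identity problem: git names a file's contents by the
SHA-1 of a short "blob <size>" header, a NUL byte, and the raw bytes, so an
expanded keyword line yields a different object than the unexpanded one.
The sketch below computes that id by hand; the expanded keyword value is
made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # git hashes a "blob <size>" header, a NUL byte, then the file bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

unexpanded = b" * $PostgreSQL$\n"
# Hypothetical expansion; path, revision, date and author are invented.
expanded = (
    b" * $PostgreSQL: hypothetical/path.c,v 1.1 "
    b"2009/05/29 00:00:00 someone Exp $\n"
)

print(git_blob_id(unexpanded))
print(git_blob_id(expanded))
print(git_blob_id(unexpanded) == git_blob_id(expanded))   # False
```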

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-29 Thread Aidan Van Dyk

* Alvaro Herrera alvhe...@commandprompt.com [090529 11:45]:
 Aidan Van Dyk wrote:
 
  Yes, but the point is you want an exact replica of CVS right?  Your
  git repo should have $PostgreSQL$ and the cvs export/checkout (you do
  use -kk right) should also have $PostgreSQL$.
  
  The 3 parsecvs errors were that it *didn't* recognize the strange
  $PostgreSQL ... Exp $ expansion that cvs did.
 
 Huh, no -- I agree that $OpenBSD$ etc should remain (we don't munge them
 anyway), but $PostgreSQL$, $Id$, $Revision$ etc tags are best gone
 because, as Markus says, their expansion interferes with content hashing.

I *think* you're actually agreeing with me.  *Hiding* the diffs that
include munging of keywords is not what we want.  We want the
conversion to *not* munge keyword-like things.  (No, $OpenBSD$ is *not*
a keyword in the PostgreSQL CVS repository.  But $PostgreSQL$ *is*.)

So we want the conversion to be identical to:
 cvs export -kk -r $tag

That will have *keywords* be unexpanded; namely these specific ones:
Author
Date
Header
Id
Locker
Log
Name
RCSfile
Revision
Source
State
PostgreSQL
but *not* keyword-like entries, like:
$ NetBSD ... Exp $
$ FreeBSD ... Exp $
$ OpenBSD ... Exp $
$ KAME ... Exp $
which are *not* CVS keywords in the PostgreSQL repository.  

i.e. Just like I said, identical to cvs checkout/export -kk.
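
As a rough sketch of that rule (not what cvs itself does internally, and
ignoring the multi-line text that $Log$ inserts), a filter could collapse
only the keywords listed above and leave the keyword-like entries alone.
The function name and the sample lines are made up for illustration.

```python
import re

# Keyword names taken from the list above; everything else is left alone.
CVS_KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Log", "Name",
    "RCSfile", "Revision", "Source", "State", "PostgreSQL",
)

# Matches '$Keyword: some value $' on a single line.
EXPANDED_RE = re.compile(r"\$(%s):[^$\n]*\$" % "|".join(CVS_KEYWORDS))

def collapse_keywords(text):
    """Rewrite '$Keyword: value $' to '$Keyword$' for known keywords only."""
    return EXPANDED_RE.sub(r"$\1$", text)

# Hypothetical expanded line; the path, revision and author are invented.
ours = " * $PostgreSQL: hypothetical/path.c,v 1.2 2009/01/01 00:00:00 someone Exp $"
theirs = "/* $OpenBSD: strlcat.c,v 1.13 2005/08/08 08:05:37 espie Exp $ */"

print(collapse_keywords(ours))     # " * $PostgreSQL$"
print(collapse_keywords(theirs))   # unchanged
```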


Now, an interesting question: do you want the perfect conversion to
contain *other* keyword un-expansion possibilities that would have happened
on any commits on Nov 29/30 2003 when CVSROOT/options contained:
+tagexpand=iPostgreSQL
If you had checked out something on that day, even with a -kk, $Log$
would have been expanded, because for that day, $Log$ was *not* an
eligible keyword on the PostgreSQL CVS repository.

Whooee... Fun with CVS history

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Markus Wanner


Hi,

Quoting Marc G. Fournier scra...@hub.org:

Please repost ...


Peter referred to this message here:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01879.php

However, please be cautious before applying such a patch.

Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Markus Wanner


Hi,

Quoting Marc G. Fournier scra...@hub.org:
Actually, I have done that on at least one of the 8.x tags too, so  
if that is it, more than those two tags should be causing issues ...


Not *every* such issue causes problems. An example that's perfectly fine:

 cvs commit -m "first commit" fileA
 cvs tag TEST fileA
 cvs commit -m "second commit" fileB
 cvs tag TEST fileB

In such a situation, a converter can easily push-down the tag TEST  
to the second commit, because fileA is the same (in that revision) as  
after the first commit. After all, the results in the RCS files are  
exactly the same as if you did the following:


 cvs commit -m "first commit" fileA
 cvs commit -m "second commit" fileB
 cvs tag TEST fileA fileB

A converter can't possibly distinguish these two.

However, if both files get committed the second time, but only one  
gets tagged, it gets problematic (always assuming the commit actually  
changes the file):


 cvs commit -m "first commit" fileA
 cvs tag TEST fileA
 cvs commit -m "second commit" fileA fileB
 cvs tag TEST fileB

That's perfectly valid from CVS's point of view, unwanted for the  
Postgres repository and hard to handle for a converter to git (or  
mercurial, monotone, etc..), because the tag TEST is on the first  
commit for fileA but on the second for fileB, while both of fileA and  
fileB differ between the commits.
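
To make the distinction concrete, here is a minimal Python sketch of the check a converter effectively has to make. It is not how cvs2git, parsecvs, or any other real converter is implemented; the model (a commit is a dict of file -> new revision number, a tag is a dict of file -> tagged revision) and the names `snapshots` and `find_tag_commit` are assumptions made up for this illustration. Files the tag does not mention are simply ignored.

```python
from typing import Dict, List, Optional

Commit = Dict[str, int]  # file name -> revision number introduced by this commit
Tag = Dict[str, int]     # file name -> revision number the tag points at


def snapshots(commits: List[Commit]) -> List[Dict[str, int]]:
    """Return the tree state (file -> revision) after each commit."""
    state: Dict[str, int] = {}
    trees = []
    for commit in commits:
        state = {**state, **commit}  # later revisions replace earlier ones
        trees.append(state)
    return trees


def find_tag_commit(commits: List[Commit], tag: Tag) -> Optional[int]:
    """Return the index of the latest commit whose tree has every tagged
    file at exactly the tagged revision, or None if no such commit exists."""
    matches = [
        i
        for i, tree in enumerate(snapshots(commits))
        if all(tree.get(name) == rev for name, rev in tag.items())
    ]
    return matches[-1] if matches else None


# Harmless case: fileA is unchanged by the second commit, so the tag
# can be pushed down to it.
harmless = [{"fileA": 1}, {"fileB": 1}]

# Problematic case: the second commit changes fileA as well, but the tag
# still points at the first revision of fileA.
problematic = [{"fileA": 1}, {"fileA": 2, "fileB": 1}]

tag_test = {"fileA": 1, "fileB": 1}
```

With this model, `find_tag_commit(harmless, tag_test)` finds the second commit, while `find_tag_commit(problematic, tag_test)` finds nothing, which is the situation a converter has to work around.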


Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Markus Wanner


Hi,

Quoting Robert Haas robertmh...@gmail.com:

I think this is a semantic argument.  The problem isn't that we don't
understand how CVS behaves; it's that we find that behavior
undesirable


I fully agree with that and find it undesirable as well.


aka broken.


Well, for some it's a feature, for others a bug ;-)

My point was that other converters have better support for such  
(undesirable, but still existent) tags that span multiple commits. If  
that's unwanted anyway, it seems cleaner to fix the CVS repository,  
yes. Has that been done now? Or is somebody going to do it? (See the  
patch Peter just linked again upthread.)



If we really care about having a tag that
contains the exact files that are tagged in CVS, we can create a
branch from one of the commits involved, and then apply a commit to
that branch that places it in the state that matches the contents of
the CVS tag.


Exactly (with the difference that with the branch you preserve the  
history of changes, while the variant with the tag does not).
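
As a rough illustration of that extra commit, the following Python sketch computes what would have to change on such a branch. It is only a model under stated assumptions: trees are plain dicts of file -> revision number, `fixup_commit` is a hypothetical helper name, and `None` stands for "delete this file". A real converter would of course operate on file contents, not revision numbers.

```python
from typing import Dict, Optional


def fixup_commit(
    base_tree: Dict[str, int], tagged_tree: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Return the changes that turn base_tree into tagged_tree.

    A value of None means the file has to be removed on the branch.
    """
    # Files that are missing from the base or sit at a different revision.
    changes: Dict[str, Optional[int]] = {
        name: rev for name, rev in tagged_tree.items() if base_tree.get(name) != rev
    }
    # Files present in the base commit but not covered by the tag.
    for name in base_tree:
        if name not in tagged_tree:
            changes[name] = None
    return changes


# Tree after the second commit in the problematic example upthread.
base = {"fileA": 2, "fileB": 1}
# State recorded by the CVS tag TEST.
tagged = {"fileA": 1, "fileB": 1}
```

Here `fixup_commit(base, tagged)` only has to put fileA back to its first revision; the branch keeps the second commit as its parent, so the history of changes stays visible.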



AIUI, this is not very different from what you'd have to
do in Subversion, where a tag is a branch is a copy.


I think so, too. I'd even state that Subversion doesn't really support  
tagging; instead, it simulates tags with branches.


Regards

Markus Wanner


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Aidan Van Dyk

* Robert Haas robertmh...@gmail.com [090527 22:43]:
 On Wed, May 27, 2009 at 10:09 PM, Aidan Van Dyk ai...@highrise.ca wrote:
  * Robert Haas robertmh...@gmail.com [090527 21:30]:
 
   And actually looking at the history of the gpo repo, the branches are all
   messed up with merges and stuff that I'm not sure where they are coming
   from...  8.2, 8.3, and master(HEAD) are all the same as my gpo repo, but
   the back branches are very bad...
 
  This is really quite horrible.  What is the best way forward here?
 
  That depends entirely on what the project wants.
 
 I can't speak for anyone else, but what I want is for the git tree on
 git.postgresql.org to match CVS.

Well, sure, but I think the "way forward" part implied recognition that
the current tree at git.postgresql.org *doesn't* match CVS very closely
(for back branches), and that people currently rely on it and use it.

So, again, the answer to the question really does depend on what the
canonical VCS of the project is.  As of now, it's *still* CVS, and
those using either git repo can still develop and submit patches to CVS
easily.

When the project switches, there will probably need to be a more
canonical conversion, with one of the tools that doesn't support
incremental imports, and then people will have to adjust their current
repo with any of rebase/graft/filter-branch to adjust their work
history onto the official tree...

All that based on the assumption that when the project switches to git,
they actually want all the CVS history in their official tree.  It's
certainly not necessary, and possibly not even desirable...  PostgreSQL
could just as easily do a Linus-style switch when they switch to git,
and just import the latest release in each branch as the starting
point for each branch.  The git repository will have no history, and
people can choose which history they want to graft in...  CVSROOT can be
made available as a historical download.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Robert Haas

On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 All that based on the assumption that when the project switches to git,
 they actually want all the CVS history in their official tree.  It's
 certainly not necessary, and possibly not even desirable...  PostgreSQL
 could just as easily do a Linus-style switch when they switch to git,
 and just import the latest release in each branch as the starting
 point for each branch.  The git repository will have no history, and
 people can choose which history they want to graft in...  CVSROOT can be
 made available as a historical download.

That would suck for me.  I use git log a lot to see how things have
changed over time.

...Robert


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Aidan Van Dyk

* Robert Haas robertmh...@gmail.com [090528 09:49]:
 On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
  All that based on the assumption that when the project switches to git,
  they actually want all the CVS history in their official tree.  It's
  certainly not necessary, and possibly not even desirable...  PostgreSQL
  could just as easily do a Linus-style switch when they switch to git,
  and just import the latest release in each branch as the starting
  point for each branch.  The git repository will have no history, and
  people can choose which history they want to graft in...  CVSROOT can be
  made available as a historical download.
 
 That would suck for me.  I use git log a lot to see how things have
 changed over time.

No, the whole point is that you graft whatever history *you* want in...
So if PostgreSQL official git only starts when the official VCS was in
git, you graft on gpo, or git, or some personal one-time cvs2git or
parsecvs history you want in...

It would be the project's way of saying, basically, "None of the current
cvs imports are perfect and we recognize that.  So we're starting fresh;
use whatever historical cvs import *you* find best for your history and
graft it in."  Just as the Linux kernel has a few historical repos
available for people to graft into Linus's tree, which only started in
2.6.12. 
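
For readers who have not used grafts, the idea can be modelled in a few lines of Python. This is a conceptual sketch only, not git's implementation: the commit names, the `parents` map and the `history` function are all made up for illustration. A graft is treated as a local override of a commit's recorded parents, so that a history walk continues into whichever imported history you chose.

```python
from typing import Dict, List, Optional


def history(
    start: str,
    parents: Dict[str, List[str]],
    grafts: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Walk the ancestry of `start`; graft entries override recorded parents."""
    grafts = grafts or {}
    seen = set()
    order = []
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        # A graft replaces the parent list stored in the commit itself.
        stack.extend(grafts.get(commit, parents.get(commit, [])))
    return order


# Hypothetical fresh official tree: its root commit has no parents.
# A separately imported CVS history ends at "cvs-import-tip".
parents = {
    "official-2": ["official-1"],
    "official-1": [],
    "cvs-import-tip": ["cvs-import-root"],
    "cvs-import-root": [],
}

# Local choice: attach the official root onto the imported history.
grafts = {"official-1": ["cvs-import-tip"]}
```

Without the graft, `history("official-2", parents)` stops at the official root; with `grafts` passed in, the walk continues through the imported commits. Each developer could pick a different import to attach without changing the official tree.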

If you have work that requires the history of the current gpo repo, you
keep using it.  If you have work requiring the current git repo, you keep
using it.  If you have no work, but you're a stickler for perfect
imports, you start working on parsecvs and cvs2git, and make a new
history every time you find another quirk...

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Andrew Dunstan




Robert Haas wrote:

On Thu, May 28, 2009 at 8:59 AM, Aidan Van Dyk ai...@highrise.ca wrote:
  

All that based on the assumption that when the project switches to git,
they actually want all the CVS history in their official tree.  It's
certainly not necessary, and possibly not even desirable...  PostgreSQL
could just as easily do a Linus-style switch when they switch to git,
and just import the latest release in each branch as the starting
point for each branch.  The git repository will have no history, and
people can choose which history they want to graft in...  CVSROOT can be
made available as a historical download.



That would suck for me.  I use git log a lot to see how things have
changed over time.


  


Indeed. Losing the history is not an acceptable option.

cheers

andrew


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-05-28 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 That would suck for me.  I use git log a lot to see how things have
 changed over time.

 Indeed. Losing the history is not an acceptable option.

I think the same.  If git is not able to maintain our project history
then it is not mature enough to be considered as our official VCS.
This is not a negotiable requirement.

regards, tom lane

