Robert Haas <robertmh...@gmail.com> writes:
> 1. The new conversion seems to have stolen the apostrophe from "D'Arcy
> J.M. Cain <da...@druid.net>", rendering him "DArcy J.M. Cain
> <da...@druid.net>".

Yeah, I see that too. It's probably bad input rather than the converter's fault ;-)

> 2. Any non-ASCII characters in, for example, contributor's names show
> up differently in the two repos. Generally, the original repo is OK
> and the new repo is garbled; although I found one very old example
> that went the other way.

What it looks like to me is that a Latin1->UTF8 conversion has been
applied to the log text. That might be a good idea if it all *was*
Latin1, but a fair-sized percentage isn't. Applying this conversion to
UTF8 entries results in garbage, of course. Even if this could be done
reliably, I think it counts as editorializing on the historical record,
and it should be switched off if possible.

> There are also a number of commits that differ in order between the
> two repos, and an even larger number where commits are duplicated or
> merged in one repository relative to the other.

I suspect that this is an artifact of the converter trying to merge
nearby commits into one commit, which it more or less *has* to do for
sanity since CVS commits aren't atomic. I don't have a problem with the
concept, but I notice cases where the converted commit has a timestamp
some minutes later than what the cvs2cl output claims. I suspect this
is what the converter was using as a cutoff time. Would it be possible
to make sure that the converted commit is always timestamped with the
latest individual file update timestamp from the included CVS commits?

			regards, tom lane
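For illustration, here is a minimal sketch of the timestamping rule proposed above. It assumes the converter has already gathered per-file CVS revisions into changesets by matching author and log message within a time window. The `FileRevision` type, the `group_changesets`/`changeset_timestamp`/`build_commits` helpers and the 5-minute window are all hypothetical; the converter's real grouping logic is not shown in this thread. The point is only that each changeset is stamped with the latest actual file update, not with a grouping cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRevision:
    # Hypothetical record of one per-file CVS commit.
    path: str
    author: str
    log: str
    timestamp: datetime


def group_changesets(revs, window=timedelta(minutes=5)):
    """Group revisions with the same author and log message, chaining a
    revision onto the current group when it lands within `window` of the
    group's most recent revision (the window size is an assumption)."""
    groups = []
    for rev in sorted(revs, key=lambda r: (r.author, r.log, r.timestamp)):
        last = groups[-1] if groups else None
        if (last is not None
                and last[-1].author == rev.author
                and last[-1].log == rev.log
                and rev.timestamp - last[-1].timestamp <= window):
            last.append(rev)
        else:
            groups.append([rev])
    return groups


def changeset_timestamp(group):
    # Use the latest individual file update time, never a cutoff time.
    return max(r.timestamp for r in group)


def build_commits(revs):
    # Pair each changeset with its timestamp and order commits by it.
    commits = [(changeset_timestamp(g), g) for g in group_changesets(revs)]
    commits.sort(key=lambda c: c[0])
    return commits
```

With this rule, a changeset made of file updates at 12:00 and 12:02 is dated 12:02, matching the last entry cvs2cl would report for it, rather than some later cutoff.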