RE: binary files bad idea? why?
--- Forwarded mail from Greg Woods: [ On Tuesday, July 6, 2004 at 15:18:50 (-0700), Paul Sander wrote: ] Subject: RE: binary files bad idea? why? And BTW, you keep waving over RCS compatability with arguments about minimum reproducibility that simply do not wash. RCS compatabiltiy means _all_ of RCS' features, warts, and concepts. More about the reasons for this below. As long as the rcsfile(5) specification is met, then all of RCS' features, warts, and concepts will follow. That specification also allows specifically for extensions to be introduced in particular ways, and RCS is written to accomodate such extensions by ignoring them while allowing other tools that scan RCS to store their own semantics. There is absolutely no reason why such tools should not be written, and any tool that cannot work in such an environment is inherently broken because it violates the rcsfile specification. In the mean time, and in the real world, RCS (and thus CVS) and all the tools they use and are used with them work _best_ and primarily with text files. I.e. until someone provides working code that makes the diff/diff3 toolset used _internally_ in CVS (_and_ RCS) to be selectable (on a per-revision basis), there's no point to even pretending that non-text files can be handled generically by CVS. This is nuts. Differencing algorithms that rely on longest common substrings will remain the best algorithms for storing deltas for a long time to come, regardless of the type of data being stored. Applying a different algorithm for every revision just won't go. And BTW, the point about on a per-revision basis is/was supposed to be a strong clue to you to show just how hair-brained and nearly impossible to achieve your ideas are. It's also a _necessary_ requirement for both RCS and CVS, which manage _files_ and groups of _files_, but not the structure of the grouping. 
You keep making the mistake of claiming that the differencing algorithm that computes the deltas, and the differencing and merge tools that manipulate and present the content of the revisions are inextricably linked. They are not. However, I agree that in that context, the differencing and merge tools must be compatible with the content of every revision stored in an RCS file. The easiest way to do that is to make sure that every revision has the same kind of data. The problem is that CVS doesn't make such a guarantee at this time. For a very long time I have argued that CVS should record in the admin section of every RCS file the type of data stored in it, and that CVS should poll that data type to select the proper tools to apply to provide the diff and merge capabilities. This is one way to guarantee that all of the revisions, and therefore all of the combinations of data fed to the tools, are indeed compatible. The there's a price, and that is that no position in the filesystem can ever be reused for a different type of data than has ever been stored there before. That means e.g. that README files can't change formats between plain text, rich text, MS Word, or whatever formats. This is clearly an unacceptable condition as long as software developers wish to evolve their designs. It is possible to meet both requirements: Guarantee every revision in an RCS file to contain the same data type, and allow the users to make arbitrary changes to their source trees. And the way to do that is to change the way that CVS maps files in the user's sandbox to the RCS files in the repository, so that at any given time the working file maps to the correct RCS file but the correct RCS file may be different at different points in the project's lifetime. That same change also enables other things, like the ability to genuinely rename a file. 
Therefore, to accommodate multiple data types, it is in fact a _necessary_ requirement for CVS to track the file structure in addition to the content of each file.

> The main idea of change management is to capture and identify _changes_, not to record exact replicas of specific revisions across time. The latter comes from the former, not the other way around. Changes are best specified as the edit actions that were done to make them. Why do you think it is that deltas are stored as edit scripts in both RCS and SCCS files? I'll tell you for certain it wasn't just because there were already well-known algorithms (and ready-implemented tools) to create and make use of those edit scripts (though that was of course a big part of it).

First of all, the motivation for storing deltas was storage efficiency. This goes all the way back at least to Brooks (The Mythical Man-Month). Second, SCCS doesn't store edit strings; it stores interleaved deltas, which are more akin to #ifdef constructions. The deltas stored in RCS are not the same as the edit actions that the user took. In the case of RCS, they're approximations, but when examined in a context that understands the semantics of the content, e.g. a C
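To make the point about RCS deltas concrete, here is a minimal sketch of how a whole revision can be rebuilt from a line-based edit script. It assumes the `diff -n` style commands that rcsfile(5) describes (`aN M` appends M lines after original line N, `dN M` deletes M lines starting at original line N). The function name `apply_rcs_delta` and the sample data are hypothetical, and this is not the RCS implementation itself.

```python
def apply_rcs_delta(base_lines, delta_lines):
    """Apply an RCS-style edit script (diff -n format) to a list of lines.

    Line numbers in the script always refer to the ORIGINAL text, so the
    base is walked once, copying untouched lines as we go.
    """
    result = []
    pos = 0  # index of the next base line not yet copied or deleted
    i = 0
    while i < len(delta_lines):
        command = delta_lines[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            # dN M: drop M lines beginning at original line N (1-based).
            result.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # aN M: insert the next M script lines after original line N.
            result.extend(base_lines[pos:start])
            pos = start
            result.extend(delta_lines[i:i + count])
            i += count
        else:
            raise ValueError(f"unknown edit command: {command!r}")
    result.extend(base_lines[pos:])
    return result


# On an RCS trunk the head is stored in full and older revisions are
# reached by applying reverse deltas to it.
head = ["int main(void)", "{", "    return 0;", "}"]
reverse_delta = ["d3 1", "a3 1", "    return 1;"]
previous = apply_rcs_delta(head, reverse_delta)
```

The script only says "line 3 was replaced". It does not record whether the user retyped the line, pasted it, or changed one character, which is the sense in which the stored delta approximates the edit actions.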
Re: binary files bad idea? why?
--- Forwarded mail from [EMAIL PROTECTED]

> [ On Thursday, July 8, 2004 at 23:01:09 (-0700), Mark D. Baushke wrote: ]
> > Subject: Re: binary files bad idea? why?
> >
> > IF we assume that the 'cvs update' of a particular file in a user's sandbox needs to do a three-way merge (checked-out version, latest-version and locally modified version) AND we assume that there is a hint for the CVS server to use some program that looks just like diff3 as to arguments, but (possibly) interprets (say a canonical HTML structure ignoring whitespace) the file differently than the default diff3, AND the diff3-like program for the checked-out version and the latest-version specifies the same diff3-like program, THEN Paul's request for an extension seems reasonable to allow this kind of an extension.
>
> Except those assumptions in total are bogus (and unrealistic), and they do not leave one with a true RCS-compatible repository either.

Mark's first assumption is totally reasonable, and it matches perfectly the common usage model of CVS. The second assumption can be implemented by storing the hint in a newphrase in the RCS file. I challenge you to find a situation in which adding a hint in this way breaks RCS compatibility. Chances are that the breakage will be in some third-party tool that doesn't understand newphrases. In that event, it's the third-party tool that breaks RCS compatibility, and you can't lay the blame on CVS.

> Remember the whole point of RCS compatibility is to be compatible with other tools that understand and use the RCS ,v format. It's not just a convenient delta compression mechanism. However the particular form of delta compression used universally in the RCS ,v format is integral to everything I know of which would rely on RCS compatibility.

Okay, how does adding a newphrase break other tools that rigidly adhere to the rcsfile specification?
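As an illustration of what "understanding newphrases" could mean for a reader of ,v files, here is a rough sketch of a parser for the admin section that keeps the phrases it knows and sets aside any others instead of failing. It is a simplification that assumes the rcsfile(5) shape of `keyword word* ;` with @-delimited strings. The `mergetool` keyword and the `parse_admin` function are hypothetical names invented for this example, not anything CVS or RCS defines.

```python
import re

# A token is an @-delimited string (@@ is an escaped @), a ';' or ':',
# or a run of other non-space characters (ids and revision numbers).
TOKEN = re.compile(r"@(?:[^@]|@@)*@|[;:]|[^\s;:@]+")

# Keywords of the admin section as listed in rcsfile(5).
KNOWN_ADMIN = {"head", "branch", "access", "symbols", "locks",
               "strict", "comment", "expand"}


def parse_admin(text):
    """Split admin-section text into known phrases and newphrases."""
    known, extensions = {}, {}
    phrase = []
    for token in TOKEN.findall(text):
        if token != ";":
            phrase.append(token)
            continue
        if phrase:
            keyword, words = phrase[0], phrase[1:]
            # Unknown keywords are preserved, not treated as errors.
            target = known if keyword in KNOWN_ADMIN else extensions
            target[keyword] = words
        phrase = []
    return known, extensions


sample = """head 1.2;
access;
symbols rel-1:1.1;
locks; strict;
comment @# @;
mergetool @html-diff3@;
"""
known, extensions = parse_admin(sample)
```

A tool written this way can ignore `extensions` entirely and still read the revision data, while a tool that cares about the hint can look it up.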
Also within the architecture of CVS it's totally bogus, stupid, and very short-sighted, to go blindly off and invent yet another ugly brain- damaged hack that doesn't fully account for the fact that some signifiant number of files' internal structure type (for lack of a more succinct term) _will_ change over time in any sizable project. The reason that the internal structure of a file changes over time is because CVS makes no guarantee that every version of a file has the same type over its lifetime. There are ways to make such guarantees, but at present they're limited to adoption of crippling policies or changing the CVS implementation. BTW, it turns out that the specific change that makes this guarantee also fixes other problems in CVS. CVSwrappers is bad enough for this reason alone already (never mind the other brain-damage it implies) and luckily it's not used by many otherwise sane people. Any extension mechanism _MUST_ be per-delta, but of course that goes against the very nature of RCS (and there are already a vast number of attributes which are not per-revision but should be to get to this level of flexibility). It could be per-delta, but in my opinion that is a poor implementation. The guarantee of one data type per RCS files makes a whole lot of problems of this nature just disappear because all possible combinations of data applied to the extensions match. I believe that this is very much in line with the RCS way. Just take a look at keyword expansion if you want a built-in feature that suffers the same problem. It also only works if the data type is similar for every revision. --- End of forwarded message from [EMAIL PROTECTED] ___ Info-cvs mailing list [EMAIL PROTECTED] http://lists.gnu.org/mailman/listinfo/info-cvs
Re: binary files bad idea? why?
[ On Monday, July 12, 2004 at 17:10:46 (-0700), Mark D. Baushke wrote: ] Subject: Re: binary files bad idea? why? It is a pity that you didn't bother to read what I wrote and instead ranted on a question that was not asked. Have you been ill? If so, I am sorry to hear it, please get well soon. No, the real problem is you're beginning to get the same disease Paul has suffered from for so many years. You're not seeing the forest for the trees. You concocted a detailed micro-example to illustrate your point without any consideration to the higher level concepts involved or even any analysis of how your example might actually relate to higher level concepts and requirements. I don't think you're even paying proper and full attention to some of the underlying reasons for doing change control in the first place. You're clearly not yet grasping the full impact of what RCS compatability really means and why it is so important to CVS users (not CVS necessarily, but certainly CVS users whether they appreciate it yet or not). You seem to sometimes mime the idea of RCS compatability but you clearly haven't integrated it into your thoughts enough that you can see when and how a propsal wil interfere with, or even completely break, it. Perhaps all of this is my fault for not writing more lucid and detailed descriptions of the ideas I'm trying to get across, but there are only so many hours in the day and even fewer that I can use to enjoy in these pursuits of intellectual discussion in public forums such as this one. Your entire reply to my previous message did not address any of the points or topic of my message That's because what you (and Paul) are suggesting is what can only be called a hack (and in my mind an ugly one at that) which goes in entirely the wrong direction for all these underlying reasons of why any sufficiently aware person would choose to use CVS (or any other RCS-compatible) change control tool in the first place. 
I'm not going to get sucked into debating artificially concocted examples that ignore the bigger conceptual picture and which also ignore a great many other lower level details as well. I've been trying to _raise_ the level of discussion up to the concepts and requirements where it _must_ occur before anyone can do any sensible functional design or implementation (such as your contrived example attempted to do). If you are not going to even bother to read what I write and not try to read between the lines, you are going to hurt yourself and burst a blood vessel or something... If you are not going to even bother to try to read what _I_ have written and to try to grasp the basic fundamental concepts I'm trying to relate to CVS and to how CVS is, and could be, used, then we're not going to progress at all. You're focusing on the details necessary to prop up your argument and I'm just not going to descend to the level of discussing such details untill well after everyone's come to a consensus about how things can and should work at a conceptual level. Why don't you have a peek again at the discussion on effective ways to use CVS with (La)TeX (or troff or lout or texinfo or whatever) documentation. Embedded in that thread is a hint to just how important it is to work _with_ the ubiquitous nature of line-oriented diff/diff3 algorithms. Stepping back a tiny bit from that thread and considering all the unerlying reasons of why one does change control in such small increments as something like CVS encourages will hopefully also let you get at least a tiny hint of why it's important with any RCS-compatible change control tool that the delta format inside the RCS files be directly and intimately related to the format users see when they do diffs and merges. 
(hint: unnecessary re-filling of paragraphs, or changes to whitespace, makes diff (and thus RCS and thus CVS) treat any file in a more ``binary'' fashion than necessary, no matter how fundamentally text (line oriented) it appears to be -- I learned this way back in the very early 1980's and I thought this was now such a mantra amongst users of all version control tools that it didn't need saying any more, but obviously that's not yet true) Now if you (or Paul) personally don't want to use an RCS-compatible repository for whatever reason(s) then you don't have to -- as you full well know there are quite a good number of other tools out there already that use other database formats that are more effective for the purposes of pure delta compression (e.g. xdelta) and which always go to the trouble of re-creating all file revisions every time they're needed in order to do any and all presentation-level diffing and merging. One of those tools might already have the capability of using user-specified diff and merge tools for a specified file, file type, or revision set or whatever is appropriate for their model However keep in mind that CVS is today an RCS-compatible change control tool and that this relationship
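The hint about re-filled paragraphs is easy to demonstrate with any line-oriented comparison. The sketch below uses Python's standard `difflib` as a stand-in for diff (an assumption made for illustration, since RCS and CVS use their own diff code), and the helper `changed_lines` is a hypothetical name.

```python
import difflib


def changed_lines(old, new):
    """Count the lines a line-oriented comparison sees as removed or added."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


original = [
    "Line-oriented tools compare whole lines,",
    "so the way a paragraph is wrapped",
    "matters as much as the words it",
    "contains.",
]

# One word changed, line breaks left alone.
edited = [
    "Line-oriented tools compare entire lines,",
    "so the way a paragraph is wrapped",
    "matters as much as the words it",
    "contains.",
]

# The same one-word change, but the paragraph was also re-filled.
refilled = [
    "Line-oriented tools compare entire lines, so the",
    "way a paragraph is wrapped matters as much as the",
    "words it contains.",
]

print(changed_lines(original, edited))    # 2: one line out, one line in
print(changed_lines(original, refilled))  # 7: every line on both sides
```

The wording change is identical in both cases. Only the re-fill turns a one-line delta into a replacement of the whole paragraph.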
Re: binary files bad idea? why?
[ On Thursday, July 8, 2004 at 23:01:09 (-0700), Mark D. Baushke wrote: ] Subject: Re: binary files bad idea? why? IF we assume that the 'cvs update' of a particular file in a user's sandbox needs to do a three-way merge (checked-out version, latest-version and locally modified version) AND we assume that there is a hint for the CVS server to use some program that looks just like diff3 as to arguments, but (possibly) interprets (say a canonical HTML structure ignoring whitespace) the file differently than the default diff3, AND the diff3-like-progam for the checked-out version and the latest-version specifies the same diff3-like program, THEN Paul's request for an extension seems reasonable to allow this kind of an extension. Except those assumptions in total are bogus (and unrealistic), and they do not leave one with a true RCS-compatible repository either. Remember the whole point of RCS compatability is to be compatible with other tools that understand and use the RCS ,v format. It's not just a convenient delta compression mechanism. However the particular form of delta compression used universally in the RCS ,v format is integral to everything I know of which would rely on RCS compatability. If you want to just compress deltas efficiently regardless of the internal structure of the files being versioned then you should use xdelta and give up entirely on the notion of RCS compatability. That way lies ultimate flexibility, but of course it's also the path to ever more waste of computing resources necessary to reconstruct deltas in more sensible forms for both human _and_ computer consumption, wether that's the traditional diff/patch style or some other representation suitable for non-text files, _every_ time they're needed. 
Also within the architecture of CVS it's totally bogus, stupid, and very short-sighted, to go blindly off and invent yet another ugly brain- damaged hack that doesn't fully account for the fact that some signifiant number of files' internal structure type (for lack of a more succinct term) _will_ change over time in any sizable project. CVSwrappers is bad enough for this reason alone already (never mind the other brain-damage it implies) and luckily it's not used by many otherwise sane people. Any extension mechanism _MUST_ be per-delta, but of course that goes against the very nature of RCS (and there are already a vast number of attributes which are not per-revision but should be to get to this level of flexibility). The lack of support for a per-delta newphrase that tells some version of CVS to use this other diff3 equivalent would not impact RCS nor would it impact older versions of CVS. That's not the point -- why would anyone be bothering to use a change control tool in the first place if all they want are archives of revisions with delta compression for storage efficiency?!?!?!? The whole point of doing change control is to capture (and be able to reproduce, undo, copy, merge, etc.) the essense of the changes made, not just to archive various revisions. While one can pretend to do this by reconstructing them every time using the right tool and re-extracted copies of the right revisions, that's (politely speaking) not a very productive direction to go in when one is still using RCS in the back-end. As you full well know there are ample other available tools which are better suited to doing this already too (and one widely used delta compression algorithm to bind many of them together, conceptually at least, if not with repository compatability :-). IMVNSHO (and it has always been so) CVS could and should be making better and more efficient and more effective use of the deltas stored in RCS files, in their direct native format; not less. -- Greg A. 
Woods +1 416 218-0098 VE3TCPRoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED]
Re: binary files bad idea? why?
Hi Greg,

It is a pity that you didn't bother to read what I wrote and instead ranted on a question that was not asked. Have you been ill? If so, I am sorry to hear it, please get well soon.

Let me put this in simple terms via an example. I was asking about an automated FORM of the following commands (note that locking issues and file permissions and the like would also need to be handled properly; this is just pseudo-code as an example):

Given a checked-out tree where filename was originally checked out when base-version was the top of the branch, and for which the latest version is now head-version and for which local modifications have been made, run the commands:

    cvs update -p -rbase-version filename > filename.base-version
    mv filename .#filename.base-version
    cvs update filename
    mv filename filename.head-version
    diff3-equivalent \
        -E -am -L filename -L base-version -Lhead-version -- \
        .#filename.base-version filename.base-version \
        filename.head-version > filename.merged
    mv filename.merged filename

where diff3-equivalent is some program other than 'diff3', but which accepts identical arguments to diff3.

[The above is roughly equivalent to what run_diff3() does in cvs, other than that cvs hard-codes the use of the 'diff3' executable instead of allowing for the possibility of using a different executable. Feel free to read the code in rcscmds.c that pertains to this.]

The same basic method could be used either to do a simple 'cvs update' on a tree or to do a 'cvs update -jtag1 -jtag2' on a tree.

As you should be able to see, there is nothing in the above that is assuming anything about the internal delta formats or any of the other drivel you banged into your keyboard.
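For readers who prefer code to a command transcript, the merge step above can be sketched as a function that builds the argument list for whichever diff3-compatible program is selected. This is only an illustration of the idea and not how cvs is written. `build_merge_command`, `run_merge` and the tool name `html-diff3` are hypothetical, and the flags are simply the ones shown in the commands above.

```python
import subprocess


def build_merge_command(filename, base_rev, head_rev, tool="diff3"):
    """Return the argv for a diff3-compatible three-way merge of filename."""
    return [
        tool, "-E", "-am",
        "-L", filename, "-L", base_rev, "-L", head_rev,
        "--",
        f".#{filename}.{base_rev}",  # locally modified copy, moved aside
        f"{filename}.{base_rev}",    # pristine copy of the base revision
        f"{filename}.{head_rev}",    # latest revision from the repository
    ]


def run_merge(filename, base_rev, head_rev, tool="diff3"):
    """Run the merge, writing its output to filename.merged.

    Returns the tool's exit status; the caller decides what a non-zero
    status (for example, conflicts) should mean.
    """
    argv = build_merge_command(filename, base_rev, head_rev, tool)
    with open(f"{filename}.merged", "w") as merged:
        return subprocess.run(argv, stdout=merged).returncode


# The only thing that changes between file types is the program name.
default_cmd = build_merge_command("README", "1.4", "1.7")
html_cmd = build_merge_command("index.html", "1.4", "1.7", tool="html-diff3")
```

Nothing in the argument list depends on how the revisions are stored, since all three inputs are complete files that have already been checked out.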
It has been suggested that the 'diff3-equivalent' command name might be kept in an newphrase section of the delta for each version, but that is not even required for this example code and we could store it outside of the rcs file if there was an excellent reason to do it (even though all of the tools that manipulate rcs files I have been able to find would not have a problem with an extra keyword-value pair). Your entire reply to my previous message did not address any of the points or topic of my message and instead attacked a position I do not hold (by name-calling whoever would hold the positions that were not your own). Honestly, my original message was not intended as flame-bait. If you are not going to even bother to read what I write and not try to read between the lines, you are going to hurt yourself and burst a blood vessel or something... -- Mark Greg A. Woods [EMAIL PROTECTED] writes: [ On Thursday, July 8, 2004 at 23:01:09 (-0700), Mark D. Baushke wrote: ] Subject: Re: binary files bad idea? why? IF we assume that the 'cvs update' of a particular file in a user's sandbox needs to do a three-way merge (checked-out version, latest-version and locally modified version) AND we assume that there is a hint for the CVS server to use some program that looks just like diff3 as to arguments, but (possibly) interprets (say a canonical HTML structure ignoring whitespace) the file differently than the default diff3, AND the diff3-like-progam for the checked-out version and the latest-version specifies the same diff3-like program, THEN Paul's request for an extension seems reasonable to allow this kind of an extension. Except those assumptions in total are bogus (and unrealistic), and they do not leave one with a true RCS-compatible repository either. Your opinion, while strong, does not address the particulars. Remember the whole point of RCS compatability is to be compatible with other tools that understand and use the RCS ,v format. 
It's not just a convenient delta compression mechanism. However the particular form of delta compression used universally in the RCS ,v format is integral to everything I know of which would rely on RCS compatability. Well, if we were really RCS compatible, we would not have magic branches and CVSNT would not have mergepoint entries in deltas, so the 'whole' point is somewhat less than 'whole'... There are good reasons to follow the RCS format and not be egregiously incompatbile. Nothing that follows the RCS syntax format to add a few newphrase entries could be said to be a tools that is compatible with RCS format. If you want to just compress deltas efficiently regardless of the internal structure of the files being versioned then you should use xdelta and give up entirely on the notion of RCS compatability. That way lies ultimate flexibility, but of course it's also the path to ever more waste of computing resources necessary to reconstruct deltas in more sensible forms for both human _and_ computer consumption, wether that's the traditional diff/patch style or some other representation suitable for non-text files, _every_ time they're needed. I missed the leap
Re: binary files bad idea? why?
>>>>> "Yves" == Yves Martin [EMAIL PROTECTED] writes:

    >> I.e. it is not possible, by definition, to resolve merge conflicts in any ``binary'' file. Period.

    Yves> Close to the subject, I would like to know how a unicode file should be added in CVS? Is it OK to add it as a text file?

Unicode is just a character set. You still haven't specified which encoding you use to encode the Unicode text file. UTF-16? UTF-8? UTF-7?

--
Lee Sau Dan 李守敦 [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee
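Why the encoding matters can be seen by looking at the bytes. The short sketch below encodes the same text three ways with Python's built-in codecs. The sample string and the helper `has_nul_bytes` are made up for illustration, and the NUL-byte check is just a convenient way to show the difference, not a test that CVS performs.

```python
sample = "na\u00efve\n"  # "naive" with a diaeresis on the i, plus a newline

encoded = {name: sample.encode(name)
           for name in ("utf-8", "utf-16-le", "utf-7")}


def has_nul_bytes(data):
    """True if the byte string contains NUL bytes."""
    return b"\x00" in data


for name, data in encoded.items():
    print(f"{name:10} nul_bytes={has_nul_bytes(data)} bytes={data!r}")
```

In UTF-8 and UTF-7 the line still ends in a single newline byte, so byte-oriented, line-based tools see ordinary lines. In UTF-16 every ASCII character, including the newline, carries an extra zero byte, which is the kind of content that line-oriented text handling is likely to mangle.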
RE: binary files bad idea? why?
--- Forwarded mail from [EMAIL PROTECTED] [ On Friday, July 2, 2004 at 12:34:42 (-0700), Paul Sander wrote: ] Subject: RE: binary files bad idea? why? --- Forwarded mail from Greg Woods: It is literally _impossible_ to manually resolve (with any degree of correctness) any three way merge with conflicts in any ``binary'' file, regardless of whether it has been encoded as text or not. It IS possible, using a tools that understand the content of the file. I thought we had agreed a half dozen years ago ore more that the definition of binary file as the phrase is usually used in this forum means binary opaque file. I thought you'd at least account for this interpretation if I used double quotes, but clearly you'd rather debate meaningless nonsense regardless. I recall no such agreement. I do agree a fair amount of discussion that makes the distinction between mergeable content and non-mergeable content, where it was agreed that there was a high degree of correlation between binary files and non-mergeable content, but there was also wide acknowledgement that text-based content can be non-mergeable (e.g. uuencoded binaries), and that binary content can be mergeable (e.g. mark-up formats like MS Word). If you can come up with a time frame and subject thread in which such agreement was made, then I'll be happy to review the entire discussion and debate its merits. I.e. it is not possible, by definition, to resolve merge conflicts in any ``binary'' file. Period. If the ``binary'' file is truly opaque, which is to say that you have no knowledge of its structure and can't find a tool that does, then I agree that a merge conflicts are impossible to resolve. However, not all binary files have opaque structure, and I maintain that it is indeed possible to resolve merge conflicts if you know something about their content. While researching your claim of an agreement, I discovered the following: From: Greg A. 
Woods Subject: RE: cvs update; merge Date: Wed, 29 Aug 2001 16:55:01 -0400 (EDT) [...] Non-mergable files can NEVER be easily and generically handled by CVS as it stands today. Until someone provides working code that makes the diff/diff3 toolset used internally in CVS (and RCS) to be replacable (preferably on a per-revision basis), there's no point to even pretending that presently non-mergable files can be handled generically by CVS. (Of course with a selectable diff/diff3 algorithm you can treat truly binary-opaque files with a copy-to-merge mechanism, though even that's hardly 100% satisfactory.) As for binary from the fopen(3) perspective of stilly petty OS differences in EOL and EOF conventions, well since CVS cannot really properly support non-mergable binary files the only correct thing to do is to always treat all files as text files and to always apply the local libc fopen(foo, *t) conversions for the given client platform (i.e. to always normalize the text stored in the repository into the Unix/ANSI/ISO-C standard format where binary == text and EOL is '\n' and there is no EOF character). [...] Your first sentence in this quote says it all. The code you suggested in the second sentence was posted to this forum just three weeks later. Taking that paragraph in its entirety, it seems almost as if we're in violent agreement. As for the second paragraph, the requirement here is that RCS and CVS faithfully reproduce the version in a way that makes sense for the platform on which the merge tool runs. That means that for text files, the newline conventions must be matched; binary files must not be modified in any way. That problem is already solved, thanks to the RCS keyword expansion capability. After the versions are properly reproduced, the selected merge tool can open the file in any way it deems appropriate. 
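The requirement described here (match newline conventions for text, never touch binary data) can be sketched in a few lines. This is an illustration under assumed names, not the conversion code in CVS or RCS: `to_repository_form`, `to_working_form` and the `binary` flag are hypothetical.

```python
def to_repository_form(data: bytes, binary: bool = False) -> bytes:
    """Normalize line endings to LF for storage; leave binary data alone."""
    if binary:
        return data
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_working_form(stored: bytes, eol: bytes = b"\n",
                    binary: bool = False) -> bytes:
    """Reproduce a revision using the client platform's line ending."""
    if binary:
        return stored
    return stored.replace(b"\n", eol)


text = b"one\r\ntwo\r\n"
stored = to_repository_form(text)
restored = to_working_form(stored, eol=b"\r\n")

# The PNG signature deliberately contains CR LF and a bare LF, so treating
# such a file as text corrupts it.
png_signature = b"\x89PNG\r\n\x1a\n"
untouched = to_repository_form(png_signature, binary=True)
mangled = to_repository_form(png_signature)
```

Once each revision has been reproduced faithfully in this sense, the choice of merge tool is a separate question.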
BTW, the removed context of the quotation above has to do with cvswrappers, and the appropriateness of marshalling aggregate data structures in the filesystem in a CVS environment. While that is worthy of further discussion, it's not relevant to this thread. --- End of forwarded message from [EMAIL PROTECTED]
RE: binary files bad idea? why?
--- Forwarded mail from Greg Woods: [ On Friday, July 2, 2004 at 22:25:27 (-0400), Eric wrote: ] Subject: RE: binary files bad idea? why? At 2:11 PM -0400 7/2/04, Greg A. Woods wrote: Why is it so damn hard for everyone to keep this simple fact in mind? Because it is entirely possible to use CVS in a manner where this simply isn't an issue. Ah ha! So, we finally get back around to this issue! What a long trip it has been! ;-) Now if you remember what I stated at the beginning then you'll realize just exactly why what you've said above is the wrong answer. Don't put binary files into CVS and expect it to work 100%. CVS cannot detect and manage changes in any meaningful way in files that are not organized as lines of text, In CVS' current implementation, this is true. But it is possible to generalize it to support other data types if it can be taught something about the structure of the files it manages. One easy way to do this is to integrate type-specific diff and merge tools that are applied to reconstructed versions of sources. and many of the most common and most important delta management operations that CVS does involve three-way merges of deltas with the hope and expectation of avoiding conflicts in those merges. But this is false no matter how you look at it. Remember that a delta is a specific set of changes that derive one complete version from its immediate predecessor (or in the case of an RCS trunk, its immediate successor). Neither RCS nor CVS use deltas directly in their user-exposed diff and merge features. Instead, they reconstruct entire versions and apply the diff and merge tools to those. There's a reason for this, and the implementation is correct. (It so happens that RCS and CVS use a particular diff tool to create deltas, and they obviously know how to accumulate the deltas to reconstruct specific versions. But those algorithms are for all intents and purposes hidden from the user. 
At least, they are if the user doesn't review the contents of the repository directly, which is strongly discouraged anyway.) One seldom-changing binary file in a large project (e.g. thevery few found in all of the NetBSD source tree) isn't an issue provided the human management of the project contributors keeps a sharp eye out for problems with these files (e.g. through peer pressure in the NetBSD group, combined with the fact that most/all of those binary files are owned by one developer). However the more binary files your project has, the more times they are changed, the more diverse the working directory hosts, then the more problems binary files will cause if they are committed to a CVS repository. Putting binary files in CVS is a bad idea, always was, and always will be. I.e. unless you have extremely pressing reasons for including binary files in your repository (e.g. as in NetBSD they are very rare and extremely stable and owned by one developer and because NetBSD also strives to use CVS as a source distribution tool), then it's best to use other tools and procedures for managing your binary files outside of CVS. Again, this is all true if the contents of the binary files are opaque. The nature of the binary data that are part of the NetBSD sources appears to be of this nature. But you continue to ignore the situations where the binary files have structure and therefore can be differenced and merged with appropriate tools. Don't put binary files into CVS and expect it to work 100%. For unstructured binary files, this is true. For structured binary files that have effective differencing and merge tools, fix CVS so that it will work 100%. Use the most appropriate tools for the job. We are, but we want them to be better. CVS is not a complete software configuration management system. Nobody's asking it to be. It does version control nothing more and nothing less. 
Furthermore, CVS does provide some facilities to use it in a non-concurrent manner adding further protections. No, not really -- some of what's there doesn't work right and the rest is a bunch of half-baked add-on hacks that don't meld with the design or goals of CVS and which, as proven by this ever repeating cycle of discussion, causes more confusion and more headaches to naive users than could ever be made back in long-term benefits to anyone. I assume you're talking about cvswrappers here. True, it's a partial solution, but the marshalling capability it supplies for managing aggregate data types is appropriate. Unfortunately, no one has thought the problem through sufficiently to produce a good general solution. On the other hand, features like the modules database (and especially the Checkin.prog file capability built into it), the vendor branch, the history file, and certain other features are much more broken than support for binary files. --- End of forwarded message from [EMAIL PROTECTED
RE: binary files bad idea? why?
[ On Friday, July 2, 2004 at 12:34:42 (-0700), Paul Sander wrote: ] Subject: RE: binary files bad idea? why?

--- Forwarded mail from Greg Woods:

It is literally _impossible_ to manually resolve (with any degree of correctness) any three way merge with conflicts in any ``binary'' file, regardless of whether it has been encoded as text or not.

It IS possible, using tools that understand the content of the file.

Paul you sure like to split hairs and spread confusion to the masses, and far more than you admit to doing. I thought we had agreed a half dozen years ago or more that the definition of binary file as the phrase is usually used in this forum means binary opaque file. I thought you'd at least account for this interpretation if I used double quotes, but clearly you'd rather debate meaningless nonsense regardless. I.e. it is not possible, by definition, to resolve merge conflicts in any ``binary'' file. Period.

-- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED]
___ Info-cvs mailing list [EMAIL PROTECTED] http://lists.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
[ On Friday, July 2, 2004 at 22:25:27 (-0400), Eric wrote: ] Subject: RE: binary files bad idea? why?

At 2:11 PM -0400 7/2/04, Greg A. Woods wrote: Why is it so damn hard for everyone to keep this simple fact in mind?

Because it is entirely possible to use CVS in a manner where this simply isn't an issue.

Ah ha! So, we finally get back around to this issue! What a long trip it has been! ;-) Now if you remember what I stated at the beginning then you'll realize just exactly why what you've said above is the wrong answer.

Don't put binary files into CVS and expect it to work 100%.

CVS cannot detect and manage changes in any meaningful way in files that are not organized as lines of text, and many of the most common and most important delta management operations that CVS does involve three-way merges of deltas with the hope and expectation of avoiding conflicts in those merges.

One seldom-changing binary file in a large project (e.g. the very few found in all of the NetBSD source tree) isn't an issue provided the human management of the project contributors keeps a sharp eye out for problems with these files (e.g. through peer pressure in the NetBSD group, combined with the fact that most/all of those binary files are owned by one developer). However the more binary files your project has, the more times they are changed, the more diverse the working directory hosts, then the more problems binary files will cause if they are committed to a CVS repository.

Putting binary files in CVS is a bad idea, always was, and always will be. I.e. unless you have extremely pressing reasons for including binary files in your repository (e.g. as in NetBSD they are very rare and extremely stable and owned by one developer and because NetBSD also strives to use CVS as a source distribution tool), then it's best to use other tools and procedures for managing your binary files outside of CVS.

Don't put binary files into CVS and expect it to work 100%.
Use the most appropriate tools for the job. CVS is not a complete software configuration management system.

Furthermore, CVS does provide some facilities to use it in a non-concurrent manner adding further protections.

No, not really -- some of what's there doesn't work right and the rest is a bunch of half-baked add-on hacks that don't meld with the design or goals of CVS and which, as proven by this ever repeating cycle of discussion, cause more confusion and more headaches to naive users than could ever be made back in long-term benefits to anyone. Ultimately this is the core problem with true open-source software -- there are just way too many cooks in the pot. CVS could have been ever so much more for those using it in the way it was designed if it hadn't been for so many people trying to fiddle with things they obviously never really understood deeply enough.

If you know what you are doing and what to expect, clearly using CVS and binary files will never be a problem.

Well, that might be true for a computer, were there ever a full SCM system that used CVS internally, but it will never be true for humans, especially those that don't have a deep enough grasp of fundamental SCM principles (which means almost everyone since it seems they still don't teach SCM well enough at anywhere near the number of the places where computer programming is taught and most programmers still don't have the slightest clue about SCM, with some even having trouble with the most basic versioning schemes!).

-- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED]
RE: binary files bad idea? why?
[ On Wednesday, June 30, 2004 at 16:09:16 (-0400), Eric Gorr wrote: ] Subject: RE: binary files bad idea? why?

So, I took two very different binary files (well a mix of binary and text files in a special folder under MacOSX called a NIB) and binhexed them. I then did: diff -u filea.hqx fileb.hqx > difference.txt I then did: patch filea.hqx difference.txt and the resulting file was equivalent to fileb.hqx.

Well I should hope so -- you were working with plain text files and without encountering any conflicting changes.

So, I'm sorry...what can go wrong here?

You're not looking at the whole picture. It is literally _impossible_ to manually resolve (with any degree of correctness) any three way merge with conflicts in any ``binary'' file, regardless of whether it has been encoded as text or not. CVS is a _CONCURRENT_ Versioning System -- which in general means you _must_ expect conflicts in a number of scenarios. Why is it so damn hard for everyone to keep this simple fact in mind? (Even without concurrency there's still the issue of merging changes between branches.) (Yes conflicts can be ``resolved'' by choosing one or the other, but that's a special case, and a hack, not the general case.)

-- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED]
RE: binary files bad idea? why?
--- Forwarded mail from Greg Woods:

It is literally _impossible_ to manually resolve (with any degree of correctness) any three way merge with conflicts in any ``binary'' file, regardless of whether it has been encoded as text or not.

It IS possible, using tools that understand the content of the file. Data that contain record or linked structures can be merged, regardless of whether they live in memory or in files. The catch is that a specialized merge tool is needed, which doesn't happen to be based on the generic Unix diff tools.

Consider the following hypothetical case: A threaded data structure is marshalled into a line-based textual format. You could theoretically use diff3 to perform a 3-way merge on the data, but then the result must be inspected and the links properly reconnected by hand. After sufficient user outcry, the vendor supplies a validity checker that users can run after they complete their manual merges. Of course, the users complain that the checker isn't sufficient, so the vendor supplies a merge tool that produces usable output. Finally, the users are happy.

Now consider that same case, except that the file format isn't a line-based textual format, but is instead a binary format that is less expensive to produce and consume. There's no practical difference, despite the fact that one file is binary, and the other is not.

This situation occurs in real life. FrameMaker is my canonical example, because it truly does have semantically equivalent binary and text-based file formats. And there are others.

(Even without concurrency there's still the issue of merging changes between branches.) (Yes conflicts can be ``resolved'' by choosing one or the other, but that's a special case, and a hack, not the general case.)

Actually, it's the most general case (it will work with ANY form of data), but it's also the choice of last resort because it offers the least control over the result. (Splitting hairs here.)
--- End of forwarded message from [EMAIL PROTECTED]
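The content-aware merge described in the forwarded message can be illustrated with a small sketch. It is not FrameMaker's format or any real tool: the record layout (a dict keyed by field name) and the function name `merge3` are made up for illustration. The idea is only that a merge tool which understands records can combine two edits field by field, and reports a conflict only when both sides changed the same field in different ways.

```python
# Hypothetical record-level three-way merge; not part of CVS or RCS.
_MISSING = object()  # marks "field absent" so that None stays a legal value


def merge3(base, ours, theirs):
    """Merge two edited copies of a record against their common ancestor.

    Returns (merged, conflicts): merged is a dict, conflicts is a list of
    field names that both sides changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o              # both sides agree (or neither changed it)
        elif o == b:
            value = t              # only "theirs" changed this field
        elif t == b:
            value = o              # only "ours" changed this field
        else:
            conflicts.append(key)  # both changed it, in different ways
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"title": "Draft", "author": "pat", "pages": 10}
ours = dict(base, title="Final")   # one sandbox edits the title
theirs = dict(base, pages=12)      # the other edits the page count
merged, conflicts = merge3(base, ours, theirs)
print(merged, conflicts)
```

Whether the records were read from a text file or a binary file makes no difference to `merge3`, which is the argument being made. When both sides change the same field differently, the field name ends up in `conflicts` and a human still has to choose.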
RE: binary files bad idea? why?
At 04:09 PM 6/30/2004, Eric Gorr wrote: So, I'm sorry...what can go wrong here? Merging. Start with your test repository. Checkout two different sandboxes. In one, make a change to one part of the NIB, binhex, and commit. In another, make a change to another part of the NIB, binhex, and update. What happens? Fred ___ Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/ ___ Info-cvs mailing list [EMAIL PROTECTED] http://lists.gnu.org/mailman/listinfo/info-cvs
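The scenario above can be approximated without a CVS repository. The sketch below is a stand-in built on assumptions: it uses Python's base64 module instead of BinHex, an invented byte string instead of a NIB, and it compares encoded lines position by position instead of running a real three-way merge. It only shows that two edits to different parts of the binary can touch the same lines of the encoded text, which is the situation a line-based merge reports as a conflict.

```python
import base64
from itertools import zip_longest


def encoded_lines(data):
    """Encode bytes as base64 text wrapped into fixed-width lines."""
    return base64.encodebytes(data).splitlines()


def changed_line_numbers(old, new):
    """Indexes of lines that differ between two encodings, position by position."""
    return {i for i, (a, b) in enumerate(zip_longest(old, new)) if a != b}


base = bytes(range(256)) * 4                  # stand-in for the original binary
sandbox_a = base[:10] + b"\x00" + base[10:]   # one byte inserted near the start
sandbox_b = base[:-5] + b"\xff" + base[-4:]   # one byte changed near the end

changed_a = changed_line_numbers(encoded_lines(base), encoded_lines(sandbox_a))
changed_b = changed_line_numbers(encoded_lines(base), encoded_lines(sandbox_b))
overlap = changed_a & changed_b

print(len(encoded_lines(base)), len(changed_a), len(changed_b), sorted(overlap))
```

The insertion in sandbox A shifts every later byte, so every encoded line after it changes, including the line that sandbox B edited. A text merge has no way to know that the two edits were independent in the underlying data.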
RE: binary files bad idea? why?
I wanted to play with the diff and patch tools a bit myself just to see if I could see something go wrong with CVS and binary files if those files were run through something that would binhex them (similar to uuencoding) via a CVS wrapper. Everything seemed to work as I expected.

If I understand what CVS does, when you check in a file, it does a diff with the previous version and stores that diff. To do the comparison, it must also use the patch tool.

So, I took two very different binary files (well a mix of binary and text files in a special folder under MacOSX called a NIB) and binhexed them. I then did: diff -u filea.hqx fileb.hqx > difference.txt I then did: patch filea.hqx difference.txt and the resulting file was equivalent to fileb.hqx.

So, I'm sorry...what can go wrong here? If diff, patch and a binhex tool are the only tools which CVS requires when dealing with binary files, I don't see the problem as long as I never compare the differences between filea.hqx and fileb.hqx and select which ones to keep and which ones to throw away.
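The experiment above can be reproduced in a few lines, with caveats: this sketch uses Python's difflib and base64 in place of the diff, patch, and BinHex tools from the message, and the two byte strings are invented. `difflib.ndiff` produces a line-based delta and `difflib.restore` rebuilds the second file from that delta. As in the original experiment, only one line of development is involved, so nothing here exercises a merge.

```python
import base64
import difflib

file_a = bytes(range(0, 200))             # invented stand-in for the first binary
file_b = bytes(range(50, 250)) + b"end"   # invented stand-in for the second binary

# Text-encode both files, one list of text lines each.
a_lines = base64.encodebytes(file_a).decode("ascii").splitlines(keepends=True)
b_lines = base64.encodebytes(file_b).decode("ascii").splitlines(keepends=True)

# "diff": compute a line-based delta between the two encodings.
delta = list(difflib.ndiff(a_lines, b_lines))

# "patch": rebuild the second encoding from the delta alone.
rebuilt = "".join(difflib.restore(delta, 2))
round_trip_ok = base64.decodebytes(rebuilt.encode("ascii")) == file_b

print(round_trip_ok)
```

The round trip succeeds because a delta between two known revisions is always reproducible. It says nothing about what happens when two people change the same encoded file concurrently.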
RE: binary files bad idea? why?
At 10:28 PM 5/21/2004, Greg A. Woods wrote: If this isn't blatantly obvious to everyone who knows that CVS uses It's not blatantly obvious that everyone knows how CVS works. There's a lot of people here who are very smart people and very talented developers but view CVS as a black box. There's nothing wrong with that. But, because they are so smart, they ask questions. If I say Don't do that with CVS, then they ask Why?. If I can't explain it sufficiently, then they won't internalize my prescription and they will eventually do that anyway. We don't have a formal CM group here but I am the de facto expert. At least they are smart enough to ask me before doing something different with CVS. So, I always explain the reasons why one should treat the CVS repository or CVS commands in a certain way. Fred ___ Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/ ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
Re: binary files bad idea? why?
Hello, * On Thu, May 20, 2004 at 09:50:13AM -0400 Jim.Hyslop wrote: Spiro Trikaliotis wrote: This is no problem from my experience if the initial check-in was done from a Unix (LF-) based system, but it is a problem if it was done from a DOS (CR/LF-) based system. There is also a remote possibility that the binary file might _happen_ to contain what CVS thinks is a keyword, such as $Id$. Chances are pretty slim, but it _could_ happen. Well, since CVS handles these keywords on checkout, not on commit, this should not be a problem: Just change the file to binary (cvs admin -kb) and do an update (cvs update), and you have the original file again. Best regards, Spiro. -- Spiro R. Trikaliotis I'm subscribed to the mailing lists I'm posting, http://www.trikaliotis.net/ so please refrain from Cc:ing me. Thank you. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
[ On Friday, May 21, 2004 at 09:22:04 (-0400), Jim.Hyslop wrote: ] Subject: RE: binary files bad idea? why? Greg A. Woods wrote: There are also still many cases where surprises will crop up later, often nasty and very difficult to deal with surprises. Don't put binary files into CVS and expect it to work 100%. Can you elaborate, or point to references please? At this point, all you've done is spread FUD. Oh, come on now Jim! If you can't remember then read the friggin' list archives for goodness sake! Or, just use your imagination and run some tests on example binary files! It's pretty damn easy to come up with many scenarios where binary files cause difficulties with what would otherwise be very normal CVS activities. The more, and the larger, the binary files you have the more headaches you'll face, especially if you try to use any significant portion of the full functionality CVS provides. If this isn't blatantly obvious to everyone who knows that CVS uses RCS under the hood and that RCS uses the unix diff and patch tools to calculate and merge change deltas then those who don't get it need to go back and take comp-sci-101 over again. Yes it would be really nifty if we all had some fancy tools that could do syntactically correct delta detection and merging, regardless of the file syntax and if RCS and CVS used those tools, but that's not how RCS or CVS works and there's no point to pining over it. Note also that -kb doesn't really mean binary despite what everyone here keeps pretending -- it just means don't mess with my line separators or terminators and ignore whatever you think is an EOF char. It is certainly not a panacea for using CVS to manage binary files. There is no really good reason to put a few binary files in CVS when most of the rest of your files are normal. 
It's much easier and cleaner to use your external configuration management and build systems along with some other (perhaps even extremely simple) archiving tools to integrate the right versions of those binary files at build time. Use the right tool for the job and get on with it!

There is certainly no point to using CVS for all binary files since almost none of what makes CVS really great to use would be of any use to anyone with only binary files and there are more than enough alternative tools that deal much better with binary files (though of course anyone trying to deal with managing change control in binary files really needs to think harder about what they're doing and maybe also go back to comp-sci-101 again, especially when the file internal format is opaque to the change identification and tracking tools).

Has everyone forgotten or failed to pay attention to everything learned about change management and control in the past 25 years or more?!?!?!? Is everyone still stuck on the "I have a hammer so everything is a nail" problem?!?!?!?!? Is everyone so afraid of learning to use a new tool that they can't get past the tiny little bit that they know? Get over it guys!

Binary files and CVS do not, and will not, and cannot, ever mix _well_. Hacking around with one or a very few such files in a limited range of scenarios is OK, but giving generic advice to the public is quite another thing, especially when it _should_ be peppered with so many disclaimers and gotchas that any sane person would turn and run.

Don't put binary files in CVS and expect it, i.e. CVS, to work 100%. If you still don't understand why then play around with diff and patch for a while until you do.

-- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED]
RE: binary files bad idea? why?
Greg A. Woods wrote: There are also still many cases where surprises will crop up later, often nasty and very difficult to deal with surprises. Don't put binary files into CVS and expect it to work 100%. Can you elaborate, or point to references please? At this point, all you've done is spread FUD. -- Jim Hyslop Senior Software Designer Leitch Technology International Inc. (http://www.leitch.com) Columnist, C/C++ Users Journal (http://www.cuj.com/experts) ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
Spiro Trikaliotis wrote: This is no problem from my experience if the initial check-in was done from a Unix (LF-) based system, but it is a problem if it was done from a DOS (CR/LF-) based system. There is also a remote possibility that the binary file might _happen_ to contain what CVS thinks is a keyword, such as $Id$. Chances are pretty slim, but it _could_ happen. -- Jim Hyslop Senior Software Designer Leitch Technology International Inc. (http://www.leitch.com) Columnist, C/C++ Users Journal (http://www.cuj.com/experts) ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
[ On Wednesday, May 19, 2004 at 15:06:59 (-0400), Jim.Hyslop wrote: ] Subject: RE: binary files bad idea? why? CVS can easily handle binary files. It's just not necessarily as efficient at handling them as it is at handling text files. That is an outcome of the history of the utility - it was originally designed to handle source files. There are also still many cases where surprises will crop up later, often nasty and very difficult to deal with surprises. Don't put binary files into CVS and expect it to work 100%. -- Greg A. Woods +1 416 218-0098 VE3TCPRoboHack [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED] Secrets of the Weird [EMAIL PROTECTED] ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
Re: binary files bad idea? why?
Jim.Hyslop [EMAIL PROTECTED] wrote: Spiro Trikaliotis wrote: This is no problem from my experience if the initial check-in was done from a Unix (LF-) based system, but it is a problem if it was done from a DOS (CR/LF-) based system. There is also a remote possibility that the binary file might _happen_ to contain what CVS thinks is a keyword, such as $Id$. Chances are pretty slim, but it _could_ happen. But the $Id$ expansion occurs on checkout. The repository copy itself would still be intact, and problems with the file would be fixed by a cvs admin -kb. Working copies could be restored with cvs update. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
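A toy model of that argument, for readers who want to see it concretely. Nothing below is CVS code: the function name `checkout`, the regular expression, the file name, and the expanded string are simplified assumptions about how keyword substitution behaves. The point is only that expansion happens on the way out of the repository, so the stored bytes stay the same and a checkout without expansion (which is what -kb asks for) returns them unchanged.

```python
import re

# Simplified assumption: an unexpanded keyword appears as $Id$ in the stored data.
KEYWORD = re.compile(rb"\$Id\$")


def checkout(stored, expand_keywords):
    """Return working-copy bytes for a stored revision (toy model, not CVS)."""
    if not expand_keywords:
        return stored  # binary mode: bytes pass through untouched
    # Text mode: the keyword is replaced by a longer, made-up expansion.
    return KEYWORD.sub(b"$Id: logo.png,v 1.1 2004/05/20 jim Exp $", stored)


stored = b"\x89PNG\x00\x01$Id$\x02\x03"   # invented binary that contains "$Id$"
text_copy = checkout(stored, expand_keywords=True)
binary_copy = checkout(stored, expand_keywords=False)

print(len(stored), len(text_copy), binary_copy == stored)
```

In this model the damaged working copy is the only casualty, which matches the advice in the message: switch the file to -kb and update to get the original bytes back.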
binary files bad idea? why?
Hi, just read that handling of binary files with cvs is a bad idea (or something similar). Well, I guess one has to add that cvs might not be that suitable for the case of binary files, but, nevertheless, you won't run into trouble if you properly define them as of type 'binary' at initial import or add to a repository. Marko
RE: binary files bad idea? why?
Just use the -kb option when adding files. I don't think CVS handles diff between binary files well... -chris

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of marko Sent: Wednesday, May 19, 2004 4:12 AM To: [EMAIL PROTECTED] Subject: binary files bad idea? why?

Hi, just read that handling of binary files with cvs is a bad idea (or something similar). Well, I guess one has to add that cvs might not be that suitable for the case of binary files, but, nevertheless, you won't run into trouble if you properly define them as of type 'binary' at initial import or add to a repository. Marko
RE: binary files bad idea? why?
Just use the -kb option when adding files. That was my point, I just didn't mention that option, sorry. I don't think CVS handles diff between binary files well... Well, it doesn't, certainly. That can only be done with additional software, of course. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
marko wrote: just read that handling of binary files with cvs is a bad idea (or something similar). Hmmm... I would suggest that either you mis-interpreted what was written, or what you read was either inaccurate or incomplete. CVS can easily handle binary files. It's just not necessarily as efficient at handling them as it is at handling text files. That is an outcome of the history of the utility - it was originally designed to handle source files. -- Jim Hyslop Senior Software Designer Leitch Technology International Inc. (http://www.leitch.com) Columnist, C/C++ Users Journal (http://www.cuj.com/experts) ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: binary files bad idea? why?
Well, CVS does handle binary files without problems in a repository if they are checked in with -kb. cvs admin -kb file will set a file's mode as a binary file if it was NOT originally done so during check in. -chris

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spiro Trikaliotis Sent: Wednesday, May 19, 2004 11:49 AM To: [EMAIL PROTECTED] Subject: Re: binary files bad idea? why?

Hello Marko, * On Wed, May 19, 2004 at 10:11:32AM +0200 marko wrote: just read that handling of binary files with cvs is a bad idea (or something similar). Well, I guess one has to add that cvs might not be that suitable for the case of binary files, but, nevertheless, you won't run into trouble if you properly define them as of type 'binary' at initial import or add to a repository.

Well, CVS does handle binary files without problems in a repository if they are checked in with -kb. Anyway, if these binaries tend to change, CVS (and the RCS file format) cannot handle them in a good way, so that the diffs tend to get really large. In fact, you might end up storing the binary file completely in many cases. If you only check in the binaries once and forever, this is not a problem. In fact, that's what I'm doing myself. Regards, Spiro.

-- Spiro R. Trikaliotis I'm subscribed to the mailing lists I'm posting, http://www.trikaliotis.net/ so please refrain from Cc:ing me. Thank you.
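Why the deltas get large can be shown with a rough model. RCS stores line-based deltas; the sketch assumes a binary blob that happens to contain no newline byte, so the whole file counts as a single line. The helper `delta_size` is hypothetical and only counts the bytes of lines that are present in the new revision but not in the old one. It is not the RCS delta format.

```python
def delta_size(old, new):
    """Bytes of new-revision lines that do not appear in the old revision."""
    old_lines = set(old.split(b"\n"))
    return sum(len(line) for line in new.split(b"\n") if line not in old_lines)


# Text-like file: 1000 short lines, one of them edited.
text_old = b"\n".join(b"line %d" % i for i in range(1000))
text_new = text_old.replace(b"line 500", b"line five hundred")

# Binary-like file: no newline byte anywhere, one byte edited.
blob_old = bytes(b for b in range(256) if b != 0x0A) * 100
blob_new = b"\x00\x00" + blob_old[2:]

print(delta_size(text_old, text_new), len(text_new))
print(delta_size(blob_old, blob_new), len(blob_new))
```

For the text file the delta is the one edited line. For the blob the delta is as large as the file itself, which is the "storing the binary file completely" case mentioned above. Real binaries do contain newline bytes here and there, so actual results fall somewhere between these two extremes.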
RE: binary files bad idea? why?
[EMAIL PROTECTED] writes: cvs admin -kb file will set a file's mode as a binary file if it was NOT originally done so during check in. But if it wasn't marked as binary when it was checked in, the checked-in data may have been corrupted, so you have to check the file after changing the mode and redo the checkin if necessary. -Larry mail2web - Check your email from the web at http://mail2web.com/ . ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
Re: binary files bad idea? why?
Hello, * On Wed, May 19, 2004 at 06:16:00PM -0400 [EMAIL PROTECTED] wrote: But if it wasn't marked as binary when it was checked in, the checked-in data may have been corrupted, so you have to check the file after changing the mode and redo the checkin if necessary. This is no problem from my experience if the initial check-in was done from a Unix (LF-) based system, but it is a problem if it was done from a DOS (CR/LF-) based system. Regards, Spiro. -- Spiro R. Trikaliotis http://www.trikaliotis.net/ ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
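The asymmetry described above can be modeled roughly as follows. This is a simplified assumption about text-mode handling, not CVS source: on a CR/LF client every CR LF pair is stored as a bare LF at commit time, while on an LF client nothing is converted. The function names are made up.

```python
def commit_text_mode(data, crlf_client):
    """Bytes stored in the repository for a text-mode commit (toy model)."""
    return data.replace(b"\r\n", b"\n") if crlf_client else data


def checkout_binary_mode(stored):
    """After switching the file to -kb, stored bytes come back unconverted."""
    return stored


original = b"\x00\x10\r\n\x20\n\x30"   # invented binary with a CR LF and a lone LF

from_unix = checkout_binary_mode(commit_text_mode(original, crlf_client=False))
from_dos = checkout_binary_mode(commit_text_mode(original, crlf_client=True))

print(from_unix == original, from_dos == original)
```

In this model the commit from the LF system survives intact, while the commit from the CR/LF system has lost a byte. Once stored, a converted CR LF and an original lone LF look the same, so the original cannot be reconstructed and the file has to be checked in again.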