RE: make cvs text agnostic?
Okay, agreed, but let me put it this way: you are working on a project with files called *.doc, some containing text and some containing binary. People have complained they don't like adding the -kb and many get it wrong. do you

a) take the error-prone option of people setting -k sticky flags themselves? (yuck!) (then they go and throw weird variants on it with keyword conversion and what not and see what happens)
b) say well from now on don't call your text files *.doc, call them *.txt
c) invent a heuristic detector which understands 382 languages and 3483 filetypes

whatever little problems there may be, i really think (b) is the easiest. this really is a case of the shortest path. if files don't have meaningful extensions, the purpose of which is to convey a unique file type, then the responsibility lies _there_ to fix the problem. autodetection of types is a drastically appalling workaround for something that just doesn't need to be a big issue.

(and can't cvswrappers files be defined on a directory level? then, wrappers could be set up for each folder and the different types of documentation stored in each one of these)

-----Original Message-----
From: Paul Sander [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 29 August 2002 10:16
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: make cvs text agnostic?

--- Forwarded mail from [EMAIL PROTECTED]

> re this conversation of file types -- why autodetect them, isn't that the whole point of a file type, given in every file's extension? heuristic detection of binariness -- yuck!

That only works if you have a strict naming convention. The canonical counterexample is the .doc extension which can represent any one of dozens of data types, some of which are pure ASCII and some of which are not. Many shops have never standardized the tool they use to produce documentation (and therefore have a few), and several tools default to that specific extension.

> a mechanism already exists to deal with this problem -- why don't people just make a whopper of a cvswrappers file and then be done with it?

Because cvswrappers won't work with this counterexample.

--- End of forwarded message from [EMAIL PROTECTED]

___
Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
Re: make cvs text agnostic?
On Sat, Aug 31, 2002 at 09:40:00AM +1000, Matthew Herrmann wrote:
> you are working on a project with files called *.doc, some containing text and some containing binary. People have complained they don't like adding the -kb and many get it wrong. do you
> b) say well from now on don't call your text files *.doc, call them *.txt

[ugly options snipped]

That, *and* add a cvswrappers entry so that *.doc -- or better, *.[dD][oO][cC] -- will be binary by default, because that fails *much* safer than defaulting to text.

If you accidentally check in a text file as binary, the recovery's fairly easy -- use cvs admin to frob the -k setting, then, if necessary, fix the newlines and check in a new rev. But if (on a non-UNIX system) you accidentally check in a binary file as text, the version in the repo is very likely garbage -- and so might be the version in your sandbox, if the file happens to contain something that looks like a CVS keyword.

--
Eric Siegerman, Toronto, Ont. [EMAIL PROTECTED]
[...] despite reports to the contrary, it is the rare programmer who permanently loses his sanity while coding (permanently being the operative word). - Eric E. Allen
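The asymmetry described above can be illustrated with a small Python sketch. It is a deliberately simplified model of text-mode newline translation on a CRLF platform, not CVS's actual code: the function names and sample bytes are made up for illustration, and keyword expansion is left out entirely.

```python
def checkin_as_text(data: bytes) -> bytes:
    """Simplified model of a text-mode check-in from a CRLF platform."""
    return data.replace(b"\r\n", b"\n")


def checkout_as_text(data: bytes) -> bytes:
    """Simplified model of a text-mode checkout onto a CRLF platform."""
    return data.replace(b"\n", b"\r\n")


def round_trip(data: bytes) -> bytes:
    """Check a file in as text, then check it out again."""
    return checkout_as_text(checkin_as_text(data))


# A well-formed CRLF text file and a binary blob that happens to contain
# lone LF and CR bytes (hypothetical sample data).
text_file = b"line one\r\nline two\r\n"
binary_file = b"\x89PNG\r\n\x1a\n\x00\x0a\x0d"

print("text survives:  ", round_trip(text_file) == text_file)
print("binary survives:", round_trip(binary_file) == binary_file)
```

In this model the text file comes back byte for byte, while the binary blob does not, because every lone LF byte gains a CR on the way out. Going the other way (text stored as binary) loses nothing in the repository, which is why defaulting to binary is the safer failure.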
RE: make cvs text agnostic?
--- Forwarded mail from [EMAIL PROTECTED]

> Okay, agreed, but let me put it this way: you are working on a project with files called *.doc, some containing text and some containing binary. People have complained they don't like adding the -kb and many get it wrong.

I think the argument has more to do with how to correctly set the keyword expansion flag. Ideally this is done at the time the file is initially checked into the repository, but the current mechanism gets it wrong much of the time and a later correction is necessary. And the correction is often done after several users have populated their sandboxes with incorrect state.

> do you a) take the error-prone option of people setting -k sticky flags themselves? (yuck!) (then they go and throw weird variants on it with keyword conversion and what not and see what happens)

This is the least desirable mechanism, but it's also the most viable in the event that a correction is needed.

> b) say well from now on don't call your text files *.doc, call them *.txt
> c) invent a heuristic detector which understands 382 languages and 3483 filetypes
> whatever little problems there may be, i really think (b) is the easiest. this really is a case of the shortest path. if files don't have meaningful extensions, the purpose of which is to convey a unique file type, then the responsibility lies _there_ to fix the problem. autodetection of types is a drastically appalling workaround for something that just doesn't need to be a big issue.

Well, using file extensions really is a heuristic method to identify file types, so (b) is really a subset of (c). If you can count on your file extensions, then more power to you. Some of us need to do something a little more sophisticated, such as search the file for a keyword, e.g. if the file has no extension. No matter how it's done, it's possible to become arbitrarily close to 100% correctness on the heuristic detector.

> (and can't cvswrappers files be defined on a directory level? then, wrappers could be set up for each folder and the different types of documentation stored in each one of these)

Another way to do it, of course, is to review all of the files being exported and supply a 1-1 mapping between file names and type. This can be done either on the command line directly when adding files, or by writing a light wrapper around cvs add and supplying a table.

> -----Original Message-----
> From: Paul Sander [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 29 August 2002 10:16
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: make cvs text agnostic?
>
> --- Forwarded mail from [EMAIL PROTECTED]
>
> > re this conversation of file types -- why autodetect them, isn't that the whole point of a file type, given in every file's extension? heuristic detection of binariness -- yuck!
>
> That only works if you have a strict naming convention. The canonical counterexample is the .doc extension which can represent any one of dozens of data types, some of which are pure ASCII and some of which are not. Many shops have never standardized the tool they use to produce documentation (and therefore have a few), and several tools default to that specific extension.
>
> > a mechanism already exists to deal with this problem -- why don't people just make a whopper of a cvswrappers file and then be done with it?
>
> Because cvswrappers won't work with this counterexample.
>
> --- End of forwarded message from [EMAIL PROTECTED]

--- End of forwarded message from [EMAIL PROTECTED]
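A table-driven wrapper of the kind suggested above might look roughly like the following Python sketch. The table contents, file names, and the function name `cvs_add_command` are all hypothetical; the sketch only builds the `cvs add` command lines (using the `-kb` flag already discussed in this thread) and prints them rather than running anything.

```python
from typing import Dict, List

# Hypothetical 1-1 table: every file to be added is listed by name with its
# type. Nothing here is guessed from the extension or the contents.
FILE_TYPES: Dict[str, str] = {
    "README.doc": "text",
    "design/spec.doc": "binary",
    "src/main.c": "text",
}


def cvs_add_command(path: str, table: Dict[str, str]) -> List[str]:
    """Return the argv for adding `path`; refuse files the table does not list."""
    if path not in table:
        raise KeyError(path + ": no entry in the file-type table")
    if table[path] == "binary":
        return ["cvs", "add", "-kb", path]
    return ["cvs", "add", path]


for name in FILE_TYPES:
    # Print the command instead of executing it, so the sketch has no side effects.
    print(" ".join(cvs_add_command(name, FILE_TYPES)))
```

Refusing unlisted files is a design choice of this sketch: it forces someone to review each new file once, which is the point of the 1-1 mapping.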
Re: make cvs text agnostic?
At 09:58 PM 8/28/2002, Matthew Herrmann wrote:
> re this conversation of file types -- why autodetect them, isn't that the whole point of a file type, given in every file's extension? heuristic detection of binariness -- yuck!

Exactly!

> a mechanism already exists to deal with this problem -- why don't people just make a whopper of a cvswrappers file and then be done with it?

I assume that you are talking about filename extensions on a Microsoft operating system. That mechanism isn't very reliable. Quick, what kind of file is a .dat?

You may have local shop procedures that define the type of all files you deal with and give them unique filename extensions that are enforced in some way by your culture. That's fine. You can now make a whopping cvswrappers file and be done with it. Try taking your cvswrappers file to a different shop, especially one that uses a variant of Unix, and see what happens.

I'll avoid the rant about overloading filenames with semantic information about the contents of the file. That discussion doesn't belong here.

CVS is nice because it doesn't try to enforce that particular way of using filenames on shops that do not or cannot use it. It does, however, provide a mechanism (cvswrappers) to allow you to do it in your shop.

Fred
___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/
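To make the extension-based approach concrete, here is a rough Python sketch of how a list of wrapper patterns classifies file names. The pattern list is illustrative, modeled on cvswrappers lines of the form `*.gif -k 'b'`; the first-match-wins rule and the function name `kflag_for` are assumptions of this sketch, not a description of how CVS itself resolves patterns.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Illustrative (pattern, -k mode) pairs in the spirit of cvswrappers lines
# such as:  *.gif -k 'b'
WRAPPERS = [
    ("*.[dD][oO][cC]", "b"),
    ("*.gif", "b"),
    ("*.jpg", "b"),
]


def kflag_for(filename: str) -> Optional[str]:
    """Return the -k mode of the first matching pattern, or None if none match."""
    for pattern, mode in WRAPPERS:
        if fnmatchcase(filename, pattern):
            return mode
    return None


for name in ("Report.DOC", "logo.gif", "notes.txt", "data.dat"):
    print(name, "->", kflag_for(name))
```

A name like `data.dat` falls through with no answer at all, which is the weakness raised above: the table is only as good as the shop's naming discipline.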
Re: make cvs text agnostic?
Hi,

I would like to share my thoughts on "Having repository in one OS and development in other OSes".

Actually I am a Windows developer, working on C++, VC++, etc. Those applications I cannot develop on UNIX platforms, and at the same time there is no source management tool on Windows as good as CVS. So we have repositories on UNIX or Linux platforms, check out sources to Windows platforms, do our work, and commit to the repositories, and I feel that this works very well.

At the same time it is not very good to trust Windows systems (Linux and Unix machines are more stable than Windows machines). So it's better to have repositories on Unix or Linux servers irrespective of the work environment.

Thanks,
-Koti

Paul Sander wrote:
> --- Forwarded mail from [EMAIL PROTECTED]
>
> > Frederic Brehm wrote:
> > > The CVS clients already do this. The problem comes when people use a file system cross mounted on several different kinds of OS, checkout on one OS, and then edit and commit on another OS.
> >
> > I wonder why people do this? Anyway, it shouldn't matter, should it, even if what Jouni says is true (see below)
>
> The practice is common in shops that have policies against committing stuff that doesn't compile, and that require single sources that compile on all of their supported platforms. The developers tend to check out on their Unix boxes into a shared filesystem, debug and compile, then switch to Windows to debug and compile there, and finally commit the code from an arbitrary platform.
>
> > > > Autodetection of binary
> > > There are enough times when autodetection gets the wrong answer that many people on this list will vigorously oppose having CVS doing this automagically.
> >
> > Jouni Heikmniemi also mentioned that 8bit text would make it difficult. I still think that for the typical case (source code) the detection would be quite reliable. And reversible, except if you use double chars where one of the chars just happened to be the same as a \r or \n.
>
> Unfortunately, there are many types of source code. For programs written in, say, the C language, what you say is probably true. For message catalogs whose purpose is to match numbers with text in some arbitrary language, it's a bit harder.
>
> Also, there are times going the other way, when a binary file is erroneously recognized as text.
>
> It is possible to build a mechanism that is accurate to any arbitrary level, but certain vocal members of this group seem to think that if it can't be 100% reliable out of the box then it isn't worth implementing.
>
> --- End of forwarded message from [EMAIL PROTECTED]
Re: make cvs text agnostic?
hi all,

re this conversation of file types -- why autodetect them, isn't that the whole point of a file type, given in every file's extension? heuristic detection of binariness -- yuck!

a mechanism already exists to deal with this problem -- why don't people just make a whopper of a cvswrappers file and then be done with it? that's what we've got running in our shop, and i haven't typed a -kb since i don't know when...

Matthew Herrmann
--
Far Edge Technology
Level 11, 80 Mount St
North Sydney NSW 2060 Australia
Ph: 02 9955 3640
Mob: 0404 852 537
Re: make cvs text agnostic?
--- Forwarded mail from [EMAIL PROTECTED]

> re this conversation of file types -- why autodetect them, isn't that the whole point of a file type, given in every file's extension? heuristic detection of binariness -- yuck!

That only works if you have a strict naming convention. The canonical counterexample is the .doc extension which can represent any one of dozens of data types, some of which are pure ASCII and some of which are not. Many shops have never standardized the tool they use to produce documentation (and therefore have a few), and several tools default to that specific extension.

> a mechanism already exists to deal with this problem -- why don't people just make a whopper of a cvswrappers file and then be done with it?

Because cvswrappers won't work with this counterexample.

--- End of forwarded message from [EMAIL PROTECTED]
make cvs text agnostic?
I've been lurking on the list for a while, and have noticed that many of the questions seem to hinge on text files and storing and extracting them the right way for your platform.

Would it be a good enhancement to automagically always extract the file in the natural way for your platform? The actual format in the repository would not matter at all; cvs would just do the right thing.

Autodetection of binary, and perfect reversible conversion between text formats is quite feasible, so this scheme should fly.

Thoughts?

Matt
Re: make cvs text agnostic?
On Tue, 27 Aug 2002, Matthew Hannigan wrote:
> Would it be a good enhancement to automagically always extract the file in the natural way for your platform? The actual format in the repository would not matter at all; cvs would just do the right thing.

This is exactly how CVS works now, except that the format used in the repo is fixed to be the unixish text file format.

> Autodetection of binary, and perfect reversible conversion between text formats is quite feasible, so this scheme should fly

Autodetecting between binary and text is far from trivial. It's fairly easy when you're talking about English only material, but it gets much harder when languages with 8-bit characters are involved.

Jouni
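The 8-bit problem is easy to reproduce with a toy detector. The Python sketch below uses a made-up heuristic (any byte outside 7-bit ASCII means binary) purely to show the failure mode; it is not something CVS does, and the sample strings are illustrative.

```python
def naive_is_binary(data: bytes) -> bool:
    """Toy heuristic: call a file binary if any byte is outside 7-bit ASCII."""
    return any(byte >= 0x80 for byte in data)


# Hypothetical samples: English text, Finnish text in Latin-1, and the
# first bytes of a PNG image.
english = "Good day\n".encode("ascii")
finnish = "Hyvää päivää\n".encode("latin-1")
png_header = b"\x89PNG\r\n\x1a\n"

for label, sample in (("english", english), ("finnish", finnish), ("png", png_header)):
    print(label, "-> binary" if naive_is_binary(sample) else "-> text")
```

The English sample and the image are classified as expected, but the Finnish text is flagged as binary as well. Loosening the rule to admit high-bit bytes lets that text through and, with it, a good deal of genuinely binary data.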
Re: make cvs text agnostic?
At 06:59 AM 8/27/2002, Matthew Hannigan wrote:
> I've been lurking on the list for a while, and have noticed that many of the questions seem to hinge on text files and storing and extracting them the right way for your platform.

The CVS clients already do this. The problem comes when people use a file system cross mounted on several different kinds of OS, checkout on one OS, and then edit and commit on another OS.

> Autodetection of binary

There are enough times when autodetection gets the wrong answer that many people on this list will vigorously oppose having CVS doing this automagically.

You can always write a script for yourself that will autodetect and do a cvs add -kb or cvs add as appropriate. I personally would suggest a manual confirmation step for each file, unless you are really 100% sure that your script always gets the right answer.

Fred
___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/
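Such a script could be sketched in Python along these lines. The NUL-byte test is one common crude heuristic and is an assumption here, as are the function names and sample files; the commands are only built and printed, and a canned answer stands in for the human so the sketch runs unattended.

```python
from typing import Callable, List


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Crude heuristic: binary if the first `sample_size` bytes contain a NUL."""
    return b"\x00" in data[:sample_size]


def proposed_command(path: str, data: bytes) -> List[str]:
    """Build, but do not run, the cvs add command the heuristic suggests."""
    if looks_binary(data):
        return ["cvs", "add", "-kb", path]
    return ["cvs", "add", path]


def confirmed(command: List[str], ask: Callable[[str], str] = input) -> bool:
    """Manual confirmation step: only an explicit 'y' accepts the suggestion."""
    reply = ask("Run '" + " ".join(command) + "'? [y/N] ")
    return reply.strip().lower() == "y"


# Hypothetical in-memory samples standing in for files on disk.
samples = {
    "main.c": b"int main(void) { return 0; }\n",
    "logo.png": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
    "notes-utf16.txt": "hello\n".encode("utf-16"),
}

for path, data in samples.items():
    command = proposed_command(path, data)
    # A canned "n" replaces the interactive prompt in this demonstration.
    approved = confirmed(command, ask=lambda prompt: "n")
    print(" ".join(command), "(approved)" if approved else "(skipped)")
```

Note that the UTF-16 text file is proposed as binary because its encoding is full of NUL bytes. That is exactly the sort of wrong answer the confirmation step is there to catch.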
Re: make cvs text agnostic?
Frederic Brehm wrote:
> The CVS clients already do this. The problem comes when people use a file system cross mounted on several different kinds of OS, checkout on one OS, and then edit and commit on another OS.

I wonder why people do this? Anyway, it shouldn't matter, should it, even if what Jouni says is true (see below).

> > Autodetection of binary
> There are enough times when autodetection gets the wrong answer that many people on this list will vigorously oppose having CVS doing this automagically.

Jouni Heikmniemi also mentioned that 8bit text would make it difficult. I still think that for the typical case (source code) the detection would be quite reliable. And reversible, except if you use double chars where one of the chars just happened to be the same as a \r or \n.

JH:
> This is exactly how CVS works now, except that the format used in the repo is fixed to be the unixish text file format.

Even when the client and server are Windows? If so then CVS does do some automagic detection?

Matt
Re: make cvs text agnostic?
At 09:20 AM 8/27/2002, Matthew Hannigan wrote:
> JH:
> > This is exactly how CVS works now, except that the format used in the repo is fixed to be the unixish text file format.

Disclaimer about the CVS NT server: this may or may not be true... I don't know.

> Even when the client and server are Windows? If so then CVS does do some automagic detection?

No, it just defaults to text mode. If you want CVS to keep the bytes in a file exactly the same on all systems then you have to tell CVS that the file is binary.

Fred
___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/
Re: make cvs text agnostic?
On Tue, 27 Aug 2002, Matthew Hannigan wrote:
> Jouni Heikmniemi also mentioned that 8bit text would make it difficult. I still think that for the typical case (source code) the detection would be quite reliable.

Depends. But I think that writing a frontend which would ask for confirmation about all the files about to be marked as binary would be an acceptable solution in most cases.

> Even when the client and server are Windows? If so then CVS does do some automagic detection?

Even in Windows. CVSNT and CVS repositories are compatible, so you can just copy an NT repository over to a unix box. The same goes for servers: linefeed replacements operate equally regardless of server OS. The cross-mounting is the problem, as already stated here.

Jouni
Re: make cvs text agnostic?
On Tue, 27 Aug 2002, Matthew Hannigan wrote:
> I've been lurking on the list for a while, and have noticed that many of the questions seem to hinge on text files and storing and extracting them the right way for your platform. Would it be a good enhancement to automagically always extract the file in the natural way for your platform?

Doh, CVS already does this. The aforementioned questions and discussions are raised by the clueless.
Re: make cvs text agnostic?
> On Tue, 27 Aug 2002, Matthew Hannigan wrote:
> > Autodetection of binary, and perfect reversible conversion between text formats is quite feasible, so this scheme should fly
>
> Autodetecting between binary and text is far from trivial. It's fairly easy when you're talking about English only material, but it gets much harder when languages with 8-bit characters are involved.

I find that on systems that have file metadata (like MacOS) this can be done quite reliably. But most systems do not support this and, as others have pointed out, detecting this from the raw contents of the file is impossible in a number of situations.

- rmgw
http://www.trustedmedianetworks.com/
Richard Wesley, Trusted Media Networks, Inc.
You're confusing boredom with motivation. - Sherman in Sherman's Lagoon
Re: make cvs text agnostic?
--- Forwarded mail from [EMAIL PROTECTED]

> Frederic Brehm wrote:
> > The CVS clients already do this. The problem comes when people use a file system cross mounted on several different kinds of OS, checkout on one OS, and then edit and commit on another OS.
>
> I wonder why people do this? Anyway, it shouldn't matter, should it, even if what Jouni says is true (see below)

The practice is common in shops that have policies against committing stuff that doesn't compile, and that require single sources that compile on all of their supported platforms. The developers tend to check out on their Unix boxes into a shared filesystem, debug and compile, then switch to Windows to debug and compile there, and finally commit the code from an arbitrary platform.

> > > Autodetection of binary
> > There are enough times when autodetection gets the wrong answer that many people on this list will vigorously oppose having CVS doing this automagically.
>
> Jouni Heikmniemi also mentioned that 8bit text would make it difficult. I still think that for the typical case (source code) the detection would be quite reliable. And reversible, except if you use double chars where one of the chars just happened to be the same as a \r or \n.

Unfortunately, there are many types of source code. For programs written in, say, the C language, what you say is probably true. For message catalogs whose purpose is to match numbers with text in some arbitrary language, it's a bit harder.

Also, there are times going the other way, when a binary file is erroneously recognized as text.

It is possible to build a mechanism that is accurate to any arbitrary level, but certain vocal members of this group seem to think that if it can't be 100% reliable out of the box then it isn't worth implementing.

--- End of forwarded message from [EMAIL PROTECTED]