RE: make cvs text agnostic?

2002-08-30 Thread Matthew Herrmann

Okay, agreed, but let me put it this way:

you are working on a project with files called *.doc, some containing text
and some containing binary. People have complained they don't like adding
the -kb and many get it wrong.

do you
a) take the error-prone option of people setting -k sticky flags themselves?
(yuck!) (then they go and throw weird variants on it with keyword conversion
and what not and see what happens)
b) say well from now on don't call your text files *.doc, call them *.txt
c) invent a heuristic detector which understands 382 languages and 3483
filetypes

whatever little problems there may be, i really think (b) is the easiest.
this really is a case of the shortest path.

if files don't have meaningful extensions, the purpose of which is to convey
a unique file type, then the responsibility lies _there_ to fix the problem.
autodetection of types is a drastically appalling workaround for something
that just doesn't need to be a big issue.

(and can't cvswrappers files be defined on a directory level? then, wrappers
could be set up for each folder and the different types of documentation
stored in each one of these)

-Original Message-
From: Paul Sander [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 29 August 2002 10:16
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: make cvs text agnostic?


--- Forwarded mail from [EMAIL PROTECTED]

re this conversation of file types -- why autodetect them, isn't that the
whole point
of a file type, given in every file's extension? heuristic detection of
binariness -- yuck!

That only works if you have a strict naming convention.  The canonical
counterexample is the .doc extension which can represent any one of
dozens of data types, some of which are pure ASCII and some of which
are not.  Many shops have never standardized the tool they use to produce
documentation (and therefore have a few), and several tools default to
that specific extension.

a mechanism already exists to deal with this problem -- why don't people
just make a whopper of a cvswrappers file and then be done with it?

Because cvswrappers won't work with this counterexample.

--- End of forwarded message from [EMAIL PROTECTED]




___
Info-cvs mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/info-cvs



Re: make cvs text agnostic?

2002-08-30 Thread Eric Siegerman

On Sat, Aug 31, 2002 at 09:40:00AM +1000, Matthew Herrmann wrote:
 you are working on a project with files called *.doc, some containing text
 and some containing binary. People have complained they don't like adding
 the -kb and many get it wrong.
 
 do you
 b) say well from now on don't call your text files *.doc, call them *.txt
 [ugly options snipped]

That, *and* add a cvswrappers entry so that *.doc -- or better,
*.[dD][oO][cC] -- will be binary by default, because that fails
*much* safer than defaulting to text.  If you accidentally check
in a text file as binary, the recovery's fairly easy -- use cvs
admin to frob the -k setting, then, if necessary, fix the
newlines and check in a new rev.
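
A minimal sketch of generating such a case-insensitive cvswrappers entry, in Python. The helper names are made up for illustration; the only CVS-specific part is the `-k 'b'` option, which is how a CVSROOT/cvswrappers line marks matching files as binary.

```python
def case_insensitive_glob(ext: str) -> str:
    """Turn 'doc' into '*.[dD][oO][cC]' so DOC, Doc and doc all match."""
    parts = []
    for ch in ext:
        if ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            # Digits and punctuation have no case, so keep them literal.
            parts.append(ch)
    return "*." + "".join(parts)


def wrapper_line(ext: str) -> str:
    """Build one cvswrappers line that defaults the extension to binary."""
    return f"{case_insensitive_glob(ext)} -k 'b'"


if __name__ == "__main__":
    # The extension list is only an example; use whatever your shop needs.
    for ext in ("doc", "pdf", "jpg"):
        print(wrapper_line(ext))
```

The output lines can be pasted into CVSROOT/cvswrappers and committed like any other administrative file.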

But if (on a non-UNIX system) you accidentally check in a binary
file as text, the version in the repo is very likely garbage
-- and so might be the version in your sandbox, if the file
happens to contain something that looks like a CVS keyword.
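
To illustrate the keyword hazard, here is a rough Python imitation of default (-kkv) keyword expansion applied to a made-up binary payload. It is not CVS's actual code, the keyword list is only a subset, and the byte values are invented for the example.

```python
import re

# A subset of the RCS/CVS keywords; the real list is longer.
KEYWORDS = (b"Id", b"Revision", b"Date", b"Author")
PATTERN = re.compile(rb"\$(" + b"|".join(KEYWORDS) + rb")(:[^$\n]*)?\$")


def expand_keywords(data: bytes, values: dict) -> bytes:
    """Imitate -kkv expansion: $Revision$ becomes $Revision: 1.1 $."""
    def repl(match):
        name = match.group(1)
        value = values.get(name)
        if value is None:
            # No value known for this keyword: leave the bytes alone.
            return match.group(0)
        return b"$" + name + b": " + value + b" $"
    return PATTERN.sub(repl, data)


# Invented binary data that happens to contain a keyword-shaped byte run.
blob = b"\x89\x00\x12$Revision$\xff\xfe"
expanded = expand_keywords(blob, {b"Revision": b"1.1"})
print(len(blob), len(expanded))
```

The expanded copy is longer than the original, so any offsets or checksums inside a real binary format would no longer line up.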

--

|  | /\
|-_|/ Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED]
|  |  /
[...] despite reports to the contrary, it is the rare programmer who
permanently loses his sanity while coding (permanently being the
operative word).
- Eric E. Allen





RE: make cvs text agnostic?

2002-08-30 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

Okay, agreed, but let me put it this way:

you are working on a project with files called *.doc, some containing text
and some containing binary. People have complained they don't like adding
the -kb and many get it wrong.

I think the argument has more to do with how to correctly set the keyword
expansion flag.  Ideally this is done at the time the file is initially
checked into the repository, but the current mechanism gets it wrong
much of the time and a later correction is necessary.  And the correction
is often done after several users have populated their sandboxes with
incorrect state.

do you
a) take the error-prone option of people setting -k sticky flags themselves?
(yuck!) (then they go and throw weird variants on it with keyword conversion
and what not and see what happens)

This is the least desirable mechanism, but it's also the most viable in
the event that a correction is needed.

b) say well from now on don't call your text files *.doc, call them *.txt
c) invent a heuristic detector which understands 382 languages and 3483
filetypes

whatever little problems there may be, i really think (b) is the easiest.
this really is a case of the shortest path.

if files don't have meaningful extensions, the purpose of which is to convey
a unique file type, then the responsibility lies _there_ to fix the problem.
autodetection of types is a drastically appalling workaround for something
that just doesn't need to be a big issue.

Well, using file extensions really is a heuristic method to identify
file types, so (b) is really a subset of (c).  If you can count on your
file extensions, then more power to you.  Some of us need to do something
a little more sophisticated, such as search the file for a keyword, e.g. if
the file has no extension.

No matter how it's done, it's possible to become arbitrarily close to
100% correctness on the heuristic detector.
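
As one hedged illustration of looking inside the file rather than at its name, the Python sketch below tells an old-style binary Word .doc (an OLE2 compound file, which starts with a fixed 8-byte signature) from a plain-text .doc. The function name and the two rules are assumptions chosen for the example, not a complete detector.

```python
# Signature at the start of OLE2 compound files (e.g. binary MS Word .doc).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def guess_kflag(head: bytes) -> str:
    """Return '-kb' for likely-binary content, '' for likely text.

    `head` is the first few KB of the file. The rules are illustrative only.
    """
    if head.startswith(OLE2_MAGIC):
        return "-kb"
    if b"\x00" in head:
        # NUL bytes are rare in plain text in single-byte encodings.
        return "-kb"
    return ""


if __name__ == "__main__":
    print(repr(guess_kflag(OLE2_MAGIC + b"...rest of a Word file")))
    print(repr(guess_kflag(b"Release notes, plain ASCII\n")))
```

More signatures can be added to the same function as a shop discovers which tools actually produce its files.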

(and can't cvswrappers files be defined on a directory level? then, wrappers
could be set up for each folder and the different types of documentation
stored in each one of these)

Another way to do it, of course, is to review all of the files being
exported and supply a 1-1 mapping between file names and type.  This can
be done either on the command line directly when adding files, or by
writing a light wrapper about cvs add and supplying a table.
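
A sketch of the table-driven wrapper idea in Python. The table contents, the function names, and the decision to refuse files that are not listed are all assumptions for illustration; only `cvs add` and its `-kb` option come from CVS itself.

```python
import fnmatch

# Hypothetical per-project table: the first matching pattern wins.
TYPE_TABLE = [
    ("docs/spec.doc", "text"),
    ("docs/*.doc", "binary"),
    ("*.c", "text"),
]


def lookup_type(path, table=TYPE_TABLE):
    """Return 'text' or 'binary' for a path, or None if it is not listed."""
    for pattern, kind in table:
        if fnmatch.fnmatchcase(path, pattern):
            return kind
    return None


def add_args(path, table=TYPE_TABLE):
    """Build the argv for `cvs add`, refusing files the table does not cover."""
    kind = lookup_type(path, table)
    if kind is None:
        raise LookupError(f"no type recorded for {path}")
    flags = ["-kb"] if kind == "binary" else []
    return ["cvs", "add"] + flags + [path]


if __name__ == "__main__":
    for name in ("docs/spec.doc", "docs/manual.doc", "src/main.c"):
        print(" ".join(add_args(name)))
```

Refusing unlisted files forces someone to make the text-or-binary decision once, in the table, instead of at every `cvs add`.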

-Original Message-
From: Paul Sander [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 29 August 2002 10:16
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: make cvs text agnostic?


--- Forwarded mail from [EMAIL PROTECTED]

re this conversation of file types -- why autodetect them, isn't that the
whole point
of a file type, given in every file's extension? heuristic detection of
binariness -- yuck!

That only works if you have a strict naming convention.  The canonical
counterexample is the .doc extension which can represent any one of
dozens of data types, some of which are pure ASCII and some of which
are not.  Many shops have never standardized the tool they use to produce
documentation (and therefore have a few), and several tools default to
that specific extension.

a mechanism already exists to deal with this problem -- why don't people
just make a whopper of a cvswrappers file and then be done with it?

Because cvswrappers won't work with this counterexample.

--- End of forwarded message from [EMAIL PROTECTED]



--- End of forwarded message from [EMAIL PROTECTED]






Re: make cvs text agnostic?

2002-08-29 Thread Frederic Brehm

At 09:58 PM 8/28/2002, Matthew Herrmann wrote:
re this conversation of file types -- why autodetect them, isn't that the
whole point
of a file type, given in every file's extension? heuristic detection of
binariness -- yuck!

Exactly!



a mechanism already exists to deal with this problem -- why don't people
just make a whopper of a cvswrappers file and then be done with it?

I assume that you are talking about filename extensions on a Microsoft 
operating system. That mechanism isn't very reliable. Quick, what kind of 
file is a .dat?

You may have local shop procedures that define the type of all files you 
deal with and give them unique filename extensions that are enforced in 
some way by your culture. That's fine. You can now make a whopping 
cvswrappers file and be done with it.

Try taking your cvswrappers file to a different shop, especially one that 
uses a variant of Unix, and see what happens.

I'll avoid the rant about overloading filenames with semantic information 
about the contents of the file. That discussion doesn't belong here.

CVS is nice because it doesn't try to enforce that particular way of using 
filenames on shops that do not or cannot use it. It does, however, provide 
a mechanism (cvswrappers) to allow you to do it in your shop.

Fred

___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/







Re: make cvs text agnostic?

2002-08-28 Thread Koti



Hi,

I would like to share my thoughts on "Having repository in one OS and development
in other OSes".

Actually I am a Windows developer, working on C++, VC++, etc. I cannot develop
those applications on UNIX platforms, and at the same time there is no source
management tool on Windows as good as CVS. So we have repositories on UNIX
or Linux platforms, check out sources to Windows platforms, do our work,
and commit to the repositories, and I feel that this works very well.

At the same time, it is not a good idea to trust Windows systems (Linux and Unix
machines are more stable than Windows machines).
So it's better to have repositories on Unix or Linux servers irrespective
of the work environment.
of work env.

Thanks,
-Koti

Paul Sander wrote:

--- Forwarded mail from [EMAIL PROTECTED]

Frederic Brehm wrote:
The CVS clients already do this. The problem comes when people use a file
system cross mounted on several different kinds of OS, checkout on one OS,
and then edit and commit on another OS.

I wonder why people do this?  Anyway, it shouldn't matter, should it,
even if what Jouni says is true (see below)

The practice is common in shops that have policies against committing
stuff that doesn't compile, and that require single sources that compile
on all of their supported platforms.  The developers tend to check out
on their Unix boxes into a shared filesystem, debug and compile, then
switch to Windows to debug and compile there, and finally commit the code
from an arbitrary platform.

Autodetection of binary

There are enough times when autodetection gets the wrong answer that
many people on this list will vigorously oppose having CVS doing this
automagically.

Jouni Heikniemi also mentioned that 8bit text would make it difficult.
I still think that for the typical case (source code) the detection
would be quite reliable.  And reversible, except if you use double
chars where one of the chars just happened to be the same as a \r or \n.

Unfortunately, there are many types of source code.  For programs written
in, say, the C language, what you say is probably true.  For message
catalogs whose purpose is to match numbers with text in some arbitrary
language, it's a bit harder.

Also, there are times going the other way, when a binary file is
erroneously recognized as text.

It is possible to build a mechanism that is accurate to any arbitrary
level, but certain vocal members of this group seem to think that if
it can't be 100% reliable out of the box then it isn't worth implementing.

--- End of forwarded message from [EMAIL PROTECTED]







Re: make cvs text agnostic?

2002-08-28 Thread Matthew Herrmann

hi all,

re this conversation of file types -- why autodetect them, isn't that the
whole point
of a file type, given in every file's extension? heuristic detection of
binariness -- yuck!

a mechanism already exists to deal with this problem -- why don't people
just make a whopper of a cvswrappers file and then be done with it?

that's what we've got running in our shop, and i haven't typed a -kb since
i don't know when...


Matthew Herrmann
--
Far Edge Technology
Level 11, 80 Mount St
North Sydney NSW 2060
Australia

Ph: 02 9955 3640
Mob: 0404 852 537






Re: make cvs text agnostic?

2002-08-28 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

re this conversation of file types -- why autodetect them, isn't that the
whole point
of a file type, given in every file's extension? heuristic detection of
binariness -- yuck!

That only works if you have a strict naming convention.  The canonical
counterexample is the .doc extension which can represent any one of
dozens of data types, some of which are pure ASCII and some of which
are not.  Many shops have never standardized the tool they use to produce
documentation (and therefore have a few), and several tools default to
that specific extension.

a mechanism already exists to deal with this problem -- why don't people
just make a whopper of a cvswrappers file and then be done with it?

Because cvswrappers won't work with this counterexample.

--- End of forwarded message from [EMAIL PROTECTED]






make cvs text agnostic?

2002-08-27 Thread Matthew Hannigan


I've been lurking on the list for a while,
and have noticed that many of the questions
seem to hinge on text files and storing and
extracting them the right way for your platform.

Would it be a good enhancement to automagically
always extract the file in the natural way
for your platform?  The actual format in the
repository would not matter at all; cvs would
just do the right thing.

Autodetection of binary, and perfect reversible
conversion between text formats is quite feasible,
so this scheme should fly


Thoughts?

Matt






Re: make cvs text agnostic?

2002-08-27 Thread Jouni Heikniemi

On Tue, 27 Aug 2002, Matthew Hannigan wrote:

 Would it be a good enhancement to automagically
 always extract the file in the natural way
 for your platform?  The actual format in the
 repository would not matter at all; cvs would
 just do the right thing.

This is exactly how CVS works now, except that the format used in the repo
is fixed to be the unixish text file format.

 Autodetection of binary, and perfect reversible
 conversion between text formats is quite feasible,
 so this scheme should fly

Autodetecting between binary and text is far from trivial. It's fairly
easy when you're talking about English only material, but it gets much
harder when languages with 8-bit characters are involved. 
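
A small Python sketch of why 8-bit text trips up a simple detector. The rule used here (text means printable 7-bit ASCII plus tab, LF and CR) is an assumed naive heuristic, not anything CVS does, and the sample strings are only examples.

```python
def naive_is_text(data: bytes) -> bool:
    """Call data text only if every byte is printable ASCII or whitespace."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return all(b in allowed for b in data)


# English text passes; Finnish text in Latin-1 contains bytes above 0x7F.
english = "Good day\n".encode("ascii")
finnish = "Hyvää päivää\n".encode("latin-1")

print(naive_is_text(english))
print(naive_is_text(finnish))
```

Loosening the rule to accept bytes above 0x7F rescues the Finnish file but also lets through a great deal of genuinely binary data, which is the trade-off at issue.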


Jouni






Re: make cvs text agnostic?

2002-08-27 Thread Frederic Brehm

At 06:59 AM 8/27/2002, Matthew Hannigan wrote:

I've been lurking on the list for a while,
and have noticed that many of the questions
seem to hinge on text files and storing and
extracting them the right way for your platform.

The CVS clients already do this. The problem comes when people use a file 
system cross mounted on several different kinds of OS, checkout on one OS, 
and then edit and commit on another OS.

Autodetection of binary

There are enough times when autodetection gets the wrong answer that many 
people on this list will vigorously oppose having CVS doing this 
automagically. You can always write a script for yourself that will 
autodetect and do a cvs add -b or cvs add as appropriate. I personally 
would suggest a manual confirmation step for each file, unless you are 
really 100% sure that your script always gets the right answer.
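
A minimal Python sketch of such a script, with the confirmation step kept separate so it can be swapped out. The NUL-byte test and all function names are assumptions for illustration; the only CVS pieces are `cvs add` and `cvs add -kb`.

```python
def looks_binary(sample: bytes) -> bool:
    """Crude guess: a NUL byte almost never appears in plain text."""
    return b"\x00" in sample


def cvs_add_command(path: str, sample: bytes, confirm) -> list:
    """Build the argv for `cvs add`, letting `confirm` veto the guess."""
    guess = looks_binary(sample)
    is_binary = confirm(path, guess)
    cmd = ["cvs", "add"]
    if is_binary:
        cmd.append("-kb")
    cmd.append(path)
    return cmd


def ask_user(path: str, guess: bool) -> bool:
    """Interactive confirmation; pressing Enter accepts the guess."""
    label = "binary" if guess else "text"
    answer = input(f"{path}: looks {label}. Accept? [Y/n] ").strip().lower()
    return guess if answer in ("", "y", "yes") else not guess
```

To use it for real, read the first few KB of each file, pass `ask_user` as `confirm`, and hand the resulting list to `subprocess.run`.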

Fred

___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/







Re: make cvs text agnostic?

2002-08-27 Thread Matthew Hannigan

Frederic Brehm wrote:
 The CVS clients already do this. The problem comes when people use a 
 file system cross mounted on several different kinds of OS, checkout on 
 one OS, and then edit and commit on another OS.

I wonder why people do this?  Anyway, it shouldn't matter, should it,
even if what Jouni says is true (see below)

 Autodetection of binary
 
 There are enough times when autodetection gets the wrong answer that 
 many people on this list will vigorously oppose having CVS doing this 
 automagically.

Jouni Heikniemi also mentioned that 8bit text would make it difficult. 
  I still think that for the typical case (source code) the detection 
would be quite reliable.  And reversible, except if you use double
chars where one of the chars just happened to be the same as a \r or \n.
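
The double-byte exception can be shown with a short Python sketch. It models text-mode conversion as plain byte replacement, which is an assumption made for illustration rather than a description of CVS internals, and the character chosen is just one whose UTF-16 encoding contains a 0x0A byte.

```python
def to_repo(data: bytes) -> bytes:
    """Windows commit in text mode: CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def to_windows(data: bytes) -> bytes:
    """Windows checkout in text mode: LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


# U+010A encodes in UTF-16LE as the bytes 0x0A 0x01, so its low byte
# is indistinguishable from a line feed at the byte level.
original = "\u010a".encode("utf-16-le")
round_trip = to_windows(to_repo(original))

print(original.hex(), "->", round_trip.hex())
```

Ordinary single-byte text with CRLF line endings survives the same round trip unchanged; it is only the stray 0x0A inside a wider character that gets a 0x0D pushed in front of it.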

JH:
  This is exactly how CVS works now, except that the format
  used in the repo is fixed to be the unixish text file format.

Even when the client and server are Windows?
If so then CVS does do some automagic detection?

Matt







Re: make cvs text agnostic?

2002-08-27 Thread Frederic Brehm

At 09:20 AM 8/27/2002, Matthew Hannigan wrote:
JH:
  This is exactly how CVS works now, except that the format
  used in the repo is fixed to be the unixish text file format.

Disclaimer about the CVS NT server: this may or may not be true...I don't know.


Even when the client and server are Windows?
If so then CVS does do some automagic detection?

No, it just defaults to text mode. If you want CVS to keep the bytes in a 
file exactly the same on all systems then you have to tell CVS that the 
file is binary.

Fred

___
Frederic W. Brehm, Sarnoff Corporation, http://www.sarnoff.com/







Re: make cvs text agnostic?

2002-08-27 Thread Jouni Heikniemi

On Tue, 27 Aug 2002, Matthew Hannigan wrote:

 Jouni Heikniemi also mentioned that 8bit text would make it difficult. 
   I still think that for the typical case (source code) the detection 
 would be quite reliable.

Depends. But I think that writing a frontend which would ask for
confirmation of all the files about to be marked as binary would be an
acceptable solution in most cases. 

 Even when the client and server are Windows?
 If so then CVS does do some automagic detection?

Even in Windows. CVSNT and CVS repositories are compatible, so you can
just copy a NT repository over to a unix box. The same goes for
servers: linefeed replacements operate equally regardless of server
OS. The cross-mounting is the problem, as already stated here.


Jouni






Re: make cvs text agnostic?

2002-08-27 Thread Kaz Kylheku

On Tue, 27 Aug 2002, Matthew Hannigan wrote:

 I've been lurking on the list for a while,
 and have noticed that many of the questions
 seem to hinge on text files and storing and
 extracting them the right way for your platform.
 
 Would it be a good enhancement to automagically
 always extract the file in the natural way
 for your platform?

Doh, CVS already does this. The aforementioned questions and
discussions are raised by the clueless.






Re: make cvs text agnostic?

2002-08-27 Thread Richard Wesley

On Tue, 27 Aug 2002, Matthew Hannigan wrote:

   Autodetection of binary, and perfect reversible
  conversion between text formats is quite feasible,
  so this scheme should fly

Autodetecting between binary and text is far from trivial. It's fairly
easy when you're talking about English only material, but it gets much
harder when languages with 8-bit characters are involved.

I find that on systems that have file metadata (like MacOS) this can 
be done quite reliably.  But most systems do not support this, and, as 
others have pointed out, detecting this from the raw contents of the 
file is impossible in a number of situations.

- rmgw

http://www.trustedmedianetworks.com/


Richard Wesley  Trusted Media Networks, Inc.

You're confusing boredom with motivation.
   - Sherman in Sherman's Lagoon





Re: make cvs text agnostic?

2002-08-27 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

Frederic Brehm wrote:
 The CVS clients already do this. The problem comes when people use a 
 file system cross mounted on several different kinds of OS, checkout on 
 one OS, and then edit and commit on another OS.

I wonder why people do this?  Anyway, it shouldn't matter, should it,
even if what Jouni says is true (see below)

The practice is common in shops that have policies against committing
stuff that doesn't compile, and that require single sources that compile
on all of their supported platforms.  The developers tend to check out
on their Unix boxes into a shared filesystem, debug and compile, then
switch to Windows to debug and compile there, and finally commit the code
from an arbitrary platform.

 Autodetection of binary
 
 There are enough times when autodetection gets the wrong answer that 
 many people on this list will vigorously oppose having CVS doing this 
 automagically.

Jouni Heikniemi also mentioned that 8bit text would make it difficult. 
  I still think that for the typical case (source code) the detection 
would be quite reliable.  And reversible, except if you use double
chars where one of the chars just happened to be the same as a \r or \n.

Unfortunately, there are many types of source code.  For programs written
in, say, the C language, what you say is probably true.  For message
catalogs whose purpose is to match numbers with text in some arbitrary
language, it's a bit harder.

Also, there are times going the other way, when a binary file is
erroneously recognized as text.

It is possible to build a mechanism that is accurate to any arbitrary
level, but certain vocal members of this group seem to think that if
it can't be 100% reliable out of the box then it isn't worth implementing.

--- End of forwarded message from [EMAIL PROTECTED]


