Re: Line ending chaos in our codebase
It works on cygwin. Here's some sample output from my Windows machine:

$ file /cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/*
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/AWTStarter.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CVS: directory
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CommandLineOptions.java: ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CommandLineStarter.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Driver.java: ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/ErrorHandler.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FOInputHandler.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FOPException.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Fop.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FormattingResults.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/InputHandler.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Options.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/PageSequenceResults.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/PrintStarter.java: ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Starter.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/StreamRenderer.java: ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/TraxInputHandler.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Version.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/XSLTInputHandler.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/package.html: HTML document text

On Wed, 6 Nov 2002 00:09:10 -0700 Victor Mote wrote:
> Tim Landscheidt wrote:
>
> > Why not just use file(1)?
> >
> > | [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
> > | /var/tmp/test-dos.txt: ASCII text, with CRLF line terminators
> > | /var/tmp/test-unix.txt: ASCII text
>
> I thought of that too, but it doesn't work on my Linux box (which reports
> both as "ASCII text"), so it is at least somewhat implementation-dependent.
> It seems like our old SCO system could tell the difference, and my Linux is
> not the latest/greatest so perhaps our CVS server can handle it better. I
> don't trust "file" for anything critical, but, if it makes the distinction
> that you noted above, it is probably adequate for the task at hand.

Jeremias Maerki <[EMAIL PROTECTED]>

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]
Re: Line ending chaos in our codebase
Hi Peter

Yes, file works fine to identify the bad files. It's available on cygwin. But I think I'll give my Java stuff a chance anyway.

On Wed, 06 Nov 2002 12:55:05 +1000 Peter B. West wrote:
> The unix 'file' command, with a subsequent check for the word 'text', as
> mentioned by Victor earlier, is a very good start for file type checking.

Jeremias Maerki
RE: Line ending chaos in our codebase
Tim Landscheidt wrote:
> Why not just use file(1)?
>
> | [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
> | /var/tmp/test-dos.txt: ASCII text, with CRLF line terminators
> | /var/tmp/test-unix.txt: ASCII text

I thought of that too, but it doesn't work on my Linux box (which reports both as "ASCII text"), so it is at least somewhat implementation-dependent. It seems like our old SCO system could tell the difference, and my Linux is not the latest/greatest so perhaps our CVS server can handle it better. I don't trust "file" for anything critical, but, if it makes the distinction that you noted above, it is probably adequate for the task at hand.

Victor Mote
Re: Line ending chaos in our codebase
Bertrand Delacretaz wrote:
> AFAIK as long as the "binary file" flag is not set, CVS takes care of line
> endings by itself when a file is checked out
> (http://www.loria.fr/~molli/cvs/doc/cvs_9.html#SEC76), converting them to
> what's appropriate for the platform.

Bertrand,

This needs a bit more investigation. As I have said, I have recently seen spurious CRLF endings in checked-out files.

Peter
--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
Re: Line ending chaos in our codebase
Jeremias,

The unix 'file' command, with a subsequent check for the word 'text', as mentioned by Victor earlier, is a very good start for file type checking.

Peter

Jeremias Maerki wrote:
> Thanks for your input. Your suggestion below smells dangerous, though. In
> the meantime I've started a little check-program (in Java) that analyzes
> files on their line endings using regex-matching. I think, I'll expand that
> to a little project including command-line app, ant-task etc.

--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
Re: Line ending chaos in our codebase
Victor Mote <[EMAIL PROTECTED]> wrote:
> [...]
> I have never been able to get grep to detect them. The only way I know (and
> it falls into the category of "beat it to death") is to convert each file
> using tr, then compare it to the old one. Here is a script that I just ran
> on my box that works:
> [...]

Why not just use file(1)?

| [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
| /var/tmp/test-dos.txt: ASCII text, with CRLF line terminators
| /var/tmp/test-unix.txt: ASCII text

Tim
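A minimal sketch of how the file(1) output could be filtered down to just the offending file names. It assumes that the local file(1) prints the "with CRLF line terminators" wording shown above (older versions apparently do not) and that file names contain no colon. The function name crlf_names is hypothetical.

```shell
#!/bin/sh
# crlf_names: read file(1) output on stdin and print only the names of
# entries whose description mentions "line terminators" (CRLF, CR, or both).
# Assumption: one "name: description" entry per line, no colon in the
# description part.
crlf_names() {
    # For matching lines, strip the trailing ": description" and print
    # what is left, which is the file name.
    sed -n '/line terminators/s/: *[^:]*$//p'
}

# Example use (not run here):
#   file src/org/apache/fop/apps/* | crlf_names
```

Plain "ASCII text" entries and directories are dropped because their description never mentions line terminators.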
Re: Line ending chaos in our codebase
Victor Mote wrote:
> Also, if you want to clean up the files in the repository, I understand that
> running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the
> files as text files instead of binary, which is apparently the root of the
> problem. (I know, -k is for keywords, but cvs has keywords conversions &
> line-ending conversions in the same space). Make sure you're backed up & do
> some testing to make sure you got what you want.

AFAIK, CVS (how's that for the start of a sentence?) treats files as text unless '-kb' is in operation. '-kb' is '-ko' (leave keywords as they were in the original checkin of the file) plus binary file I/O.

I've noticed in this repository that CVS seems to get '-kb' right on, e.g., PNG files that I have added and on which I have forgotten to specify '-kb'. I suspect that the guardians of the Apache repository have done some work here.

I just had a look. /home/cvs/CVSROOT/cvswrappers contains the following:

*.gif -k 'b'
*.psd -k 'b'
*.jpg -k 'b'
*.jpeg -k 'b'
*.png -k 'b'
*.psd -k 'b'
*.eps -k 'b'
*.ai -k 'b'
*.jar -k 'b'
*.war -k 'b'
*.class -k 'b'
*.zip -k 'b'
*.ser -k 'b'
*.pdf -k 'b'
*.ico -k 'b'
*.ucs2 -k 'b'
*.ucs4 -k 'b'

'-kkv' is the default keyword expansion form, and is contra-indicated on binary files, while '-kb' is contra-indicated on text files, on which you definitely want expansion. These values, incidentally, come from RCS, and can be read about with 'man co'.

Unless something has changed recently, CRLF will happily go into the repository. A couple of months back (so it seems) I had occasion to strip CRs out of a file which had been committed late one night from a Windows system.

Peter
--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
Re: Line ending chaos in our codebase
Thanks for your input. Your suggestion below smells dangerous, though. In the meantime I've started a little check program (in Java) that analyzes files for their line endings using regex matching. I think I'll expand that into a little project including a command-line app, an Ant task etc.

On Tue, 5 Nov 2002 08:08:54 -0700 Victor Mote wrote:
> Jeremias Maerki wrote:
>
> > up a few patch submissions. While applying them I ran across several
> > files that had CRCRLF endings instead of CRLF when checked out using
> > WinCVS on a Windows box. I think I have successfully corrected those I
> > ran into. Does anyone have a good idea how to...
> > 1. identify files not having correct line endings without having
> > to open each and every file?
>
> I have never been able to get grep to detect them. The only way I know (and
> it falls into the category of "beat it to death") is to convert each file
> using tr, then compare it to the old one. Here is a script that I just ran
> on my box that works:
>
> cd /u/vic/fop-trunk
> for I in `find . -type f`
> do
>   cat $I | tr -d "\015" > /u/tmp/QQtest
>   DELTA=`diff $I /u/tmp/QQtest | wc -l`
>   if [ $DELTA -gt 0 ]
>   then
>     echo "$I has DOS line-endings"
>   fi
> done
> rm /u/tmp/QQtest
>
> It will include binary files in its output as well. If that is a problem,
> add a test to exclude those from consideration (probably using the "file"
> command and looking for the word "text").
>
> Since I have a hybrid Linux/Windows environment here, I feel like the
> apostles at the Last Supper ("Lord, is it I?").
>
> Also, if you want to clean up the files in the repository, I understand that
> running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the
> files as text files instead of binary, which is apparently the root of the
> problem. (I know, -k is for keywords, but cvs has keywords conversions &
> line-ending conversions in the same space). Make sure you're backed up & do
> some testing to make sure you got what you want.

Jeremias Maerki
RE: Line ending chaos in our codebase
Jeremias Maerki wrote:
> up a few patch submissions. While applying them I ran across several
> files that had CRCRLF endings instead of CRLF when checked out using
> WinCVS on a Windows box. I think I have successfully corrected those I
> ran into. Does anyone have a good idea how to...
> 1. identify files not having correct line endings without having
> to open each and every file?

I have never been able to get grep to detect them. The only way I know (and it falls into the category of "beat it to death") is to convert each file using tr, then compare it to the old one. Here is a script that I just ran on my box that works:

cd /u/vic/fop-trunk
for I in `find . -type f`
do
  cat $I | tr -d "\015" > /u/tmp/QQtest
  DELTA=`diff $I /u/tmp/QQtest | wc -l`
  if [ $DELTA -gt 0 ]
  then
    echo "$I has DOS line-endings"
  fi
done
rm /u/tmp/QQtest

It will include binary files in its output as well. If that is a problem, add a test to exclude those from consideration (probably using the "file" command and looking for the word "text").

Since I have a hybrid Linux/Windows environment here, I feel like the apostles at the Last Supper ("Lord, is it I?").

Also, if you want to clean up the files in the repository, I understand that running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the files as text files instead of binary, which is apparently the root of the problem. (I know, -k is for keywords, but cvs has keywords conversions & line-ending conversions in the same space). Make sure you're backed up & do some testing to make sure you got what you want.

Victor Mote
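The tr-and-diff loop above could probably be reduced to a direct test for CR bytes, which avoids the temporary file and copes with file names containing spaces. This is only a sketch under those assumptions, not a tested replacement; has_cr and list_cr_files are hypothetical names, and skipping CVS/ directories is an added assumption. Like the original script, it will also report binary files that happen to contain a CR byte.

```shell
#!/bin/sh
# has_cr FILE: succeed if FILE contains at least one CR byte (octal 015).
has_cr() {
    # tr -cd deletes every byte except CR; a non-empty result means
    # the file contained at least one CR.
    [ -n "$(tr -cd '\015' < "$1")" ]
}

# list_cr_files DIR: print every regular file under DIR that contains a CR,
# skipping CVS administrative directories.
list_cr_files() {
    find "$1" -type f ! -path '*/CVS/*' | while IFS= read -r f; do
        if has_cr "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example use (not run here):
#   list_cr_files /u/vic/fop-trunk
```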
Re: Line ending chaos in our codebase
Thanks, Peter and Bertrand, for your answers.

On Tue, 5 Nov 2002 07:21:54 +0100 Bertrand Delacretaz wrote:
> On Monday 04 November 2002 17:02, Jeremias Maerki wrote:
> >. . .Does anyone have a good idea how to...
> > 2. enforce correct line endings?
>
> Using the commitinfo administrative file, scripts can be configured in CVS to
> run when a file is committed, at which point you could detect the problem.

I've heard suggestions like this being discussed in other projects. But they were never installed. So, I guess it's really up to self-control.

> I'm not sure if it's worth the effort though. When such a problem is found,
> you could also study file revisions to find out who created the problem and
> tell people to fix their environment.

The person in question seems to have discovered the issue himself and fixed it. I'm into some work on the maintenance branch anyway, so I'll fix the files when I run into them.

Jeremias Maerki
Re: Line ending chaos in our codebase
On Monday 04 November 2002 17:02, Jeremias Maerki wrote:
>. . .Does anyone have a good idea how to...
> 2. enforce correct line endings?

Using the commitinfo administrative file, scripts can be configured in CVS to run when a file is committed, at which point you could detect the problem.

I'm not sure if it's worth the effort though. When such a problem is found, you could also study file revisions to find out who created the problem and tell people to fix their environment.

-Bertrand
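A rough sketch of what such a commitinfo check might look like. It assumes the classic commitinfo calling convention as described in the CVS manual (the script receives the repository directory followed by the names of the files being committed, runs in a directory holding copies of those files, and a non-zero exit status aborts the commit). The script path, the function name check_line_endings and the list of skipped binary extensions are all hypothetical; none of this has been tried against the Apache repository.

```shell
#!/bin/sh
# Hypothetical entry in CVSROOT/commitinfo (the path is an assumption):
#   ALL /usr/local/bin/check-line-endings.sh
#
# check_line_endings REPOSITORY FILE...
# Return non-zero if any of the named files contains a CR byte.
check_line_endings() {
    shift                       # drop the repository path argument
    rc=0
    for f in "$@"; do
        [ -f "$f" ] || continue # e.g. files being removed
        case "$f" in
            # illustrative subset of binary types; these are skipped
            *.gif|*.jpg|*.png|*.jar|*.zip|*.pdf) continue ;;
        esac
        if [ -n "$(tr -cd '\015' < "$f")" ]; then
            echo "$f: contains CR characters, commit rejected" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In the real hook script the last line would be:
#   check_line_endings "$@"
```

Rejecting every CR is deliberately strict; a repository that wants CRLF preserved in some text files would need a more careful rule.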
Re: Line ending chaos in our codebase
On Monday 04 November 2002 23:53, Peter B. West wrote:
>. . .I don't know the
> mechanism for handling line-end differences on entry into a CVS
> repository on a unix box.
>. . .

AFAIK as long as the "binary file" flag is not set, CVS takes care of line endings by itself when a file is checked out (http://www.loria.fr/~molli/cvs/doc/cvs_9.html#SEC76), converting them to what's appropriate for the platform.

Funny things can happen if people checkout files on a unix box and edit them from a windows box, but most windows editors handle this correctly.

-Bertrand
Re: Line ending chaos in our codebase
Jeremias,

Never having had the misfortune to work with Windows, I don't know the mechanism for handling line-end differences on entry into a CVS repository on a unix box. I assume you have to handle this on your end, because I occasionally see files with CRLF out of CVS repositories. Is this the burden of your question 2?

To clean up your files, you might try some perl.

perl -pi.bak -e 'BEGIN{undef $/};s/\r*\r\n/\n/g' file...

should work from a unix command line. How you achieve this in Windows I don't know. Once it's working, you can just use `-pi' instead of `-pi.bak'. The 'i' works on files in-place; any trailing characters are appended to the file name to create a backup file. Handy for testing.

Obviously, the line above gives me LF-only line endings.

Peter

Jeremias Maerki wrote:
> Hi there
>
> Before I'm going to work on the multi-threading issues I wanted to clear
> up a few patch submissions. While applying them I ran across several
> files that had CRCRLF endings instead of CRLF when checked out using
> WinCVS on a Windows box. I think I have successfully corrected those I
> ran into. Does anyone have a good idea how to...
> 1. identify files not having correct line endings without having
> to open each and every file?
> 2. enforce correct line endings?

--
Peter B. West [EMAIL PROTECTED] http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
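For boxes without perl, roughly the same cleanup can be sketched with awk. This is an approximation rather than an exact equivalent of the perl line above: it strips any run of CRs at the end of each line (so both CRLF and CRCRLF become LF), it also strips CRs at the very end of a file that lacks a final newline, and awk will add a final newline if one was missing. The function name strip_cr_eol is hypothetical; the .bak backup mirrors the -pi.bak behaviour.

```shell
#!/bin/sh
# strip_cr_eol FILE: rewrite FILE with LF-only line endings,
# keeping the original as FILE.bak.
strip_cr_eol() {
    cp -p "$1" "$1.bak" || return 1
    # Remove one or more CRs at the end of each record, then print the
    # record followed by a plain LF.
    awk '{ sub(/\r+$/, ""); print }' "$1.bak" > "$1"
}

# Example use (not run here):
#   strip_cr_eol src/org/apache/fop/apps/Driver.java
```

A CR in the middle of a line is left alone, so a file with bare-CR line ends (as in the "CRLF, CR" cases file(1) can report) would still need a separate look.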
Line ending chaos in our codebase
Hi there

Before I'm going to work on the multi-threading issues I wanted to clear up a few patch submissions. While applying them I ran across several files that had CRCRLF endings instead of CRLF when checked out using WinCVS on a Windows box. I think I have successfully corrected those I ran into. Does anyone have a good idea how to...

1. identify files not having correct line endings without having to open each and every file?
2. enforce correct line endings?

This is probably an old story, but anyway...

Jeremias Maerki