A problem we've been unable to resolve for some time is that LF-delimited files are corrupted when they are written to a Samba/VMS share: every record ends up double-spaced. I have raised this at least once before (most recently, according to my records, in August 2005) but have no record of receiving an answer.

A trivial Ruby script demonstrates the problem. (C++ and Perl test programs can reproduce it too; see the Perl example later in this message.) The client in this test case is a Linux system, which considers a bare LF to be a newline:

$ ruby -e 'puts "a\nb"' >bg1.tmp

A:BG> dir/full bg1.tmp
...
Record format:      Stream, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg1.tmp
...
Record number 1 (00000001), 2 (0002) bytes, RFA(0001,0000,0000)
0A61 a............................... 000000
Record number 2 (00000002), 2 (0002) bytes, RFA(0001,0000,0002)
0A62 b............................... 000000
A:BG> dump bg1.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0A620A61 a.b............................. 000000
...
A:BG> type bg1.tmp
a

b

A:BG>

You can see why "type" prints the file double-spaced. The file now consists of two records, each of which contains two characters, *including* the LF line delimiter. Stream is apparently a very "forgiving" file format: it does consider records to end at LF characters, as well as at a variety of other possible delimiters (form feed, CR+LF, vertical tab, and so on). However, of all these possible delimiters, it seems that only the CR+LF pair is excluded from the record itself.
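The behaviour described above can be imitated with a small Ruby sketch. This is only a simplified model built from the dumps in this message, not actual RMS logic: it assumes CR+LF ends a record and is dropped, while a bare LF ends a record but stays inside it. The names stream_records and type_output are made up for illustration, and the other delimiters (form feed, vertical tab) are ignored.

```ruby
# Simplified model of the "Stream" record splitting observed in the dumps.
# Assumption: CR+LF terminates a record and is excluded from it;
# a bare LF terminates a record but remains part of it.
def stream_records(data)
  records = []
  current = +""
  i = 0
  while i < data.length
    if data[i] == "\r" && data[i + 1] == "\n"
      records << current
      current = +""
      i += 2
    elsif data[i] == "\n"
      records << (current + "\n")
      current = +""
      i += 1
    else
      current << data[i]
      i += 1
    end
  end
  records << current unless current.empty?
  records
end

# Imitates TYPE: each record is printed followed by a newline.
def type_output(data)
  stream_records(data).map { |record| record + "\n" }.join
end

p stream_records("a\nb\n")      # ["a\n", "b\n"]  (two 2-byte records)
p stream_records("a\r\nb\r\n")  # ["a", "b"]      (two 1-byte records)
print type_output("a\nb\n")     # "a", blank line, "b", blank line
```

Under this model the LF that stays inside each record is printed in addition to the newline that follows every record, which is where the blank lines come from.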

The difficulty is that I know of no way for the client application or system to convey that the file is LF-delimited and must remain LF-delimited, and therefore should be written as Stream_LF. The end result is that any LF-delimited file written to the Samba share is corrupted: as far as RMS is concerned, it has been converted into a double-spaced file. The corruption gets worse as further reads and writes occur on the file from both systems, double-double-spacing the file, then double-double-double-spacing it, and so on.

A Windows client system running natively compiled Ruby, which considers newlines to consist of CR+LF, does not exhibit the same problem behaviour.

Here are the results when the file is created on the Windows system with the same Ruby test:

$ ruby -e 'puts "a\nb"' >bg2.tmp

A:BG> dir/full bg2.tmp
...
Record format:      Stream, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg2.tmp
...
Record number 1 (00000001), 1 (0001) byte, RFA(0001,0000,0000)
61 a............................... 000000
Record number 2 (00000002), 1 (0001) byte, RFA(0001,0000,0003)
62 b............................... 000000
A:BG> dump bg2.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000A0D 620A0D61 a..b............................ 000000
...
A:BG> type bg2.tmp
a
b
A:BG>

As you can see, we now get the expected results, a single-spaced file containing CR+LF-delimited lines in our "Stream" type file. RMS sees the CR+LF delimiters as terminating each record, and does not consider them to be a part of the record itself.
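To relate the two block dumps to the bytes each client wrote, here is a rough Ruby sketch. The translation of "\n" to CR+LF is done explicitly so the result does not depend on the platform the sketch runs on. The helper names hex_bytes and longwords are hypothetical, and the longword layout (four bytes per group, printed in reverse order, zero-padded) is my reading of the DUMP output in this message rather than a documented specification.

```ruby
# Bytes in file order, as space-separated hex.
def hex_bytes(data)
  data.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Groups bytes into 4-byte longwords and prints each with its bytes
# reversed, which is how the DUMP output above appears to display them.
def longwords(data)
  data.bytes.each_slice(4).map do |slice|
    padded = slice + [0] * (4 - slice.length)
    padded.reverse.map { |b| format("%02X", b) }.join
  end
end

lf_text   = "a\nb\n"                   # what the Linux client wrote
crlf_text = lf_text.gsub("\n", "\r\n") # what the Windows client wrote

puts hex_bytes(lf_text)     # 61 0A 62 0A
puts hex_bytes(crlf_text)   # 61 0D 0A 62 0D 0A
p longwords(lf_text)        # ["0A620A61"]
p longwords(crlf_text)      # ["620A0D61", "00000A0D"]
```

The longword strings match the first words shown in the bg1.tmp and bg2.tmp block dumps, which confirms that the only difference between the two files is the CR before each LF.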

So if all applications on all three platforms could agree to use CR+LF as the "canonical" text file format, we wouldn't have this problem. However, even Ruby on VMS, and also Perl on VMS (the more popular and widely used of the two) are examples of applications that insist on writing Stream_LF files by default. For example:

A:BG> perl -e "print ""a\nb\n"";" >bg3.tmp
A:BG> dir/full bg3.tmp
...
Record format:      Stream_LF, maximum 0 bytes, longest 32767 bytes
...
A:BG> dump/rec bg3.tmp
...
Record number 1 (00000001), 1 (0001) byte, RFA(0001,0000,0000)
61 a............................... 000000
Record number 2 (00000002), 1 (0001) byte, RFA(0001,0000,0002)
62 b............................... 000000
A:BG> dump bg3.tmp
...
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0A620A61 a.b............................. 000000
...
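Note that bg3.tmp holds exactly the same bytes as bg1.tmp (0A620A61), yet its records are one byte long. The difference is only the record format attribute. A minimal Ruby sketch of the Stream_LF interpretation, under the assumption that LF is the sole terminator and is excluded from the record (stream_lf_records is a made-up name, not a real API):

```ruby
# Simplified Stream_LF model: LF alone ends a record and is not part of it.
# Assumption drawn from the bg3.tmp dump above, not from RMS documentation.
def stream_lf_records(data)
  data.each_line("\n").map { |line| line.delete_suffix("\n") }
end

p stream_lf_records("a\nb\n")      # ["a", "b"]
p stream_lf_records("a\r\nb\r\n")  # ["a\r", "b\r"]  (CR stays in the record)
```

So the same byte stream reads correctly or incorrectly depending on which attribute the file was created with, and that attribute is the piece of information the client currently cannot supply.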

Furthermore, not every Windows application writes CR+LF line terminators in text files. For instance, Vim for Windows (the text editor we use) can read both "unix" (LF-delimited) and "dos" (CR+LF-delimited) text files, and preserves the original line terminator type when the file is written out again. With an increasing number of open source applications being ported to Windows, all of which must operate correctly in a mixed-platform environment, LF-delimited text files written from a Windows system are now a fact of life that cannot easily be worked around.
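As an illustration of the kind of detection an editor can do, here is a small Ruby sketch that classifies a buffer by its line terminators. It is a simple heuristic of my own for illustration, not Vim's actual algorithm; the name line_ending_style and the returned symbols are assumptions.

```ruby
# Classifies text by the line terminators it contains.
# Illustrative heuristic only; it is not how Vim decides 'fileformat'.
def line_ending_style(data)
  crlf = data.scan("\r\n").length    # number of CR+LF pairs
  lf   = data.count("\n") - crlf     # LFs not preceded by CR
  return :none  if crlf.zero? && lf.zero?
  return :dos   if lf.zero?
  return :unix  if crlf.zero?
  :mixed
end

p line_ending_style("a\nb\n")      # :unix
p line_ending_style("a\r\nb\r\n")  # :dos
p line_ending_style("a\r\nb\n")    # :mixed
```

A program that preserves whatever style it detects will write LF-delimited files back to the share regardless of the client platform.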

In conclusion, it is not practical to insist that all text files be written as CR+LF-delimited. Samba/VMS *must* accommodate LF-delimited text files somehow. Without a solution to this problem, the product's usefulness in a production cross-platform environment is seriously limited. If anyone has any idea how we can solve the problem effectively, please share it!

Ben

