On Apr 21, 2010, at 5:29 PM, martin.zin...@deutsche-boerse.com wrote:

If you open a text file with carriage return carriage control for output (based on an existing file) and populate the new file with longer records, at some point gratuitous line breaks are added to the file.

Finally getting back to this after six months. And I think I have a solution. To review, what happens when you use the Perl "open" operator is that it calls into its own buffered I/O layer named "perlio" which sits on top of another layer called "unixio" which is implemented in terms of the CRTL read/write functions. This arrangement was new in about 5.6 but became the default in 5.10, and that's where we started seeing the problem Martin describes on VMS.

The problem is that while the perlio layer is buffered, the unixio layer is not. When the buffer in the perlio layer fills up, it triggers a flush to the lower layer; that flush becomes a write() in the unixio layer, which goes all the way to disk. If you're writing to a record-oriented file, that write will likely introduce an extra record boundary unless you had the extreme good fortune to hit the end of a line at the same moment you hit the end of the buffer. Part of the problem is that the buffer in the perlio layer is hard-wired to 4K; with a larger buffer you would typically see fewer extra records, but you would still see them.

It turns out the perlio layer has some knobs and switches on it, and one of them is a "line buffering" option. If this option is enabled, then the flush to the lower layer happens whenever a newline character appears in the data. As long as your lines are shorter than the length of the buffer, you write them out whole, which empties the buffer in the upper layer making room for more data, and everything is peachy.

So, where and how to enable this line buffering? Here's my proposed patch:

--- perlio.c;-0 2010-10-21 07:58:15 -0500
+++ perlio.c    2010-11-02 21:32:41 -0500
@@ -3758,6 +3758,22 @@ PerlIOBuf_open(pTHX_ PerlIO_funcs *self,
                 */
                PerlLIO_setmode(fd, O_BINARY);
 #endif
+#ifdef VMS
+#include <rms.h>
+               /* Enable line buffering with record-oriented regular files
+                * so we don't introduce an extraneous record boundary when
+                * the buffer fills up.
+                */
+               if (PerlIOBase(f)->flags & PERLIO_F_CANWRITE) {
+                   Stat_t st;
+                   if (PerlLIO_fstat(fd, &st) == 0
+                       && S_ISREG(st.st_mode)
+                       && (st.st_fab_rfm == FAB$C_VAR
+                           || st.st_fab_rfm == FAB$C_VFC)) {
+                       PerlIOBase(f)->flags |= PERLIO_F_LINEBUF;
+                   }
+               }
+#endif
            }
        }
     }

[end]


This is right after the perlio layer has called down to the unixio layer to get the file open. We have an fd, so we can do an fstat() on that and retrieve the record format from the VMS-specific bits of the stat structure. Then I check to see if it's a regular file (not a device like a mailbox that may need to carry binary data) and that the record format is either variable or variable with fixed control. If these conditions are met, I enable the line buffering option on that filehandle.

I have tested this and it works for situations similar to Martin's original report, and it does not introduce any new test failures in the test suite. But what situations, if any, does this break? I'm assuming that if the record format is FAB$C_VAR or FAB$C_VFC, the records will never contain binary data with embedded newlines. Is that true? What other assumptions am I making that I shouldn't?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more
 difficult than getting in."
                 Brad Leithauser
