On Apr 21, 2010, at 5:29 PM, martin.zin...@deutsche-boerse.com wrote:

If you open a text file with carriage return carriage control for output (based on an existing file) and populate the new file with longer records, at some point gratuitous line breaks are added to the file.

Finally getting back to this after six months. And I think I have a solution. To review, what happens when you use the Perl "open" operator is that it calls into its own buffered I/O layer named "perlio" which sits on top of another layer called "unixio" which is implemented in terms of the CRTL read/write functions. This arrangement was new in about 5.6 but became the default in 5.10, and that's where we started seeing the problem Martin describes on VMS.

The problem is that while the perlio layer is buffered, the unixio layer is not. When the buffer in the perlio layer fills up, it triggers a flush to the lower layer; that flush becomes a write() in the unixio layer, which goes all the way to disk. If you're writing to a record-oriented file, that write will likely introduce an extra record boundary unless you had the extreme good fortune to hit the end of a line at the same moment you hit the end of the buffer. Part of the problem is that the buffer in the perlio layer is hard-wired to 4K; with a larger buffer you would typically see fewer extra records, but you would still see them.

It turns out the perlio layer has some knobs and switches on it, and one of them is a "line buffering" option. If this option is enabled, then the flush to the lower layer happens whenever a newline character appears in the data. As long as your lines are shorter than the length of the buffer, you write them out whole, which empties the buffer in the upper layer making room for more data, and everything is peachy.

So, where and how to enable this line buffering? Here's my proposed patch:

--- perlio.c;-0 2010-10-21 07:58:15 -0500
+++ perlio.c    2010-11-02 21:32:41 -0500
@@ -3758,6 +3758,22 @@ PerlIOBuf_open(pTHX_ PerlIO_funcs *self,
                 */
                PerlLIO_setmode(fd, O_BINARY);
 #endif
+#ifdef VMS
+#include <rms.h>
+               /* Enable line buffering with record-oriented regular files
+                * so we don't introduce an extraneous record boundary when
+                * the buffer fills up.
+                */
+               if (PerlIOBase(f)->flags & PERLIO_F_CANWRITE) {
+                   Stat_t st;
+                   if (PerlLIO_fstat(fd, &st) == 0
+                       && S_ISREG(st.st_mode)
+                       && (st.st_fab_rfm == FAB$C_VAR
+                           || st.st_fab_rfm == FAB$C_VFC)) {
+                       PerlIOBase(f)->flags |= PERLIO_F_LINEBUF;
+                   }
+               }
+#endif
            }
        }
     }

[end]


This is right after the perlio layer has called down to the unixio layer to get the file open. We have an fd, so we can do an fstat() on that and retrieve the record format from the VMS-specific bits of the stat structure. Then I check to see if it's a regular file (not a device like a mailbox that may need to carry binary data) and that the record format is either variable or variable with fixed control. If these conditions are met, I enable the line buffering option on that filehandle.

I have tested this and it works for situations similar to Martin's original report, and it does not introduce any new test failures in the test suite. But what situations, if any, does this break? I'm assuming that if the record format is FAB$C_VAR or FAB$C_VFC, the records will never contain binary data with embedded newlines. Is that true? What other assumptions am I making that I shouldn't?

________________________________________
Craig A. Berry
mailto:craigbe...@mac.com

"... getting out of a sonnet is much more
 difficult than getting in."
                 Brad Leithauser
