Rémi Dewitte <[email protected]> writes:

> On Thu, Jan 29, 2009 at 06:40, Ben Pfaff <[email protected]> wrote:
>
> > Rémi Dewitte <[email protected]> writes:
> >
> > > Working with pspp-0.6.1 I am glad it works fine. Nevertheless I
> > > encountered two minor issues for which I have patched a bit pspp.
> > >
> > > First one is the ability to import CSV file with DOS endlines.
> > > I don't know whether it is the right place to trim the '\r'.
> >
> > This is not the right place to do this.
>
> You might give me some clues...

Sure, I was just in a hurry when I wrote that.  Here is my
suggested substitute fix.  I have not had a chance to test that
it works.  Will you test it and report your results?  Reviews
welcome from everyone else too, of course.

Thanks!

commit f4cc711051121873dd2e11436b10dd829094bdb9
Author: Ben Pfaff <[email protected]>
Date:   Thu Jan 29 22:01:27 2009 -0800

    Accept LF, CR LF, and CR as new-line sequences in data files.

    Until now, PSPP has used the host operating system's idea of the
    new-line sequence when reading data files and other text files.
    This means that, when a file with CR LF line ends is read on an OS
    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
    appears to have a CR at the end.  This commit fixes the problem,
    by normalizing the new-line sequence at time of reading.

    This commit eliminates a performance optimization from
    ds_read_line(), because the getdelim() function that it used
    cannot be made to stop reading at one of two different delimiters.
    If this causes a real performance regression, then the getndelim2
    function from gnulib could be used to restore the optimization.

    Thanks to Rémi Dewitte <[email protected]> for pointing out the
    problem and providing an initial patch.

commit 70a46fb66ae0de5e312c4fc007bddf65e8ea5ac9
Author: Ben Pfaff <[email protected]>
Date:   Thu Jan 29 22:08:43 2009 -0800

    Accept LF, CR LF, and CR as new-line sequences in data files.

    Until now, PSPP has used the host operating system's idea of the
    new-line sequence when reading data files and other text files.
    This means that, when a file with CR LF line ends is read on an OS
    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
    appears to have a CR at the end.  This commit fixes the problem,
    by normalizing the new-line sequence at time of reading.

    This commit eliminates a performance optimization from
    ds_read_line(), because the getdelim() function that it used
    cannot be made to stop reading at one of two different delimiters.
    If this causes a real performance regression, then the getndelim2
    function from gnulib could be used to restore the optimization.

    Thanks to Rémi Dewitte <[email protected]> for pointing out the
    problem and providing an initial patch.

diff --git a/src/libpspp/str.c b/src/libpspp/str.c
index d082672..f054c9e 100644
*** a/src/libpspp/str.c
--- b/src/libpspp/str.c
***************
*** 1,5 ****
  /* PSPP - a program for statistical analysis.
!    Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.

     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
--- 1,5 ----
  /* PSPP - a program for statistical analysis.
!    Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.

     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
***************
*** 1190,1237 ****
    return st->ss.string;
  }

! /* Appends to ST a newline-terminated line read from STREAM, but
!    no more than MAX_LENGTH characters.
!    Newline is the last character of ST on return, if encountering
!    a newline was the reason for terminating.
!    Returns true if at least one character was read from STREAM
!    and appended to ST, false if no characters at all were read
!    before an I/O error or end of file was encountered (or
!    MAX_LENGTH was 0). */
  bool
  ds_read_line (struct string *st, FILE *stream, size_t max_length)
  {
!   if (!st->ss.length && max_length == SIZE_MAX)
!     {
!       size_t capacity = st->capacity ? st->capacity + 1 : 0;
!       ssize_t n = getline (&st->ss.string, &capacity, stream);
!       if (capacity)
!         st->capacity = capacity - 1;
!       if (n > 0)
!         {
!           st->ss.length = n;
!           return true;
!         }
!       else
!         return false;
!     }
!   else
      {
!       size_t length;
!       for (length = 0; length < max_length; length++)
          {
!           int c = getc (stream);
!           if (c == EOF)
!             break;
!
!           ds_put_char (st, c);
!           if (c == '\n')
!             return true;
          }
!
!       return length > 0;
      }
  }

  /* Removes a comment introduced by `#' from ST,
--- 1190,1231 ----
    return st->ss.string;
  }

! /* Reads characters from STREAM and appends them to ST, stopping
!    after MAX_LENGTH characters, after appending a newline, or
!    after an I/O error or end of file was encountered, whichever
!    comes first.  Returns true if at least one character was added
!    to ST, false if no characters were read before an I/O error or
!    end of file (or if MAX_LENGTH was 0).
!
!    This function accepts LF, CR LF, and CR sequences as new-line,
!    and translates each of them to a single '\n' new-line
!    character in ST. */
  bool
  ds_read_line (struct string *st, FILE *stream, size_t max_length)
  {
!   size_t length;
!
!   for (length = 0; length < max_length; length++)
      {
!       int c = getc (stream);
!       if (c == EOF)
!         break;
!       if (c == '\r')
          {
!           c = getc (stream);
!           if (c != '\n')
!             {
!               ungetc (c, stream);
!               c = '\n';
!             }
          }
!       ds_put_char (st, c);
!       if (c == '\n')
!         return true;
      }
+
+   return length > 0;
  }

  /* Removes a comment introduced by `#' from ST,

--
Ben Pfaff
http://benpfaff.org

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
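
For anyone who wants to try the new-line handling outside the PSPP tree, the following is a minimal standalone sketch of the same idea.  It is not the patch itself: read_line_normalized() is a hypothetical name, and it fills a fixed-size char buffer instead of PSPP's struct string, so the truncation behavior is only an approximation of MAX_LENGTH.  It also skips the ungetc() call at end of file, which should be equivalent since ungetc() of EOF leaves the stream unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for ds_read_line(): reads one line from STREAM
   into BUF (capacity CAP, counting the terminating null byte),
   translating CR LF and a lone CR into a single '\n'.  Returns true if
   at least one character was stored, false if end of file or an I/O
   error was hit before anything was read. */
bool
read_line_normalized (char *buf, size_t cap, FILE *stream)
{
  size_t length = 0;

  assert (cap > 0);
  while (length + 1 < cap)
    {
      int c = getc (stream);
      if (c == EOF)
        break;

      if (c == '\r')
        {
          /* Look one character ahead: CR LF collapses to one new-line,
             and a lone CR is itself treated as a new-line. */
          int next = getc (stream);
          if (next != '\n' && next != EOF)
            ungetc (next, stream);
          c = '\n';
        }

      buf[length++] = (char) c;
      if (c == '\n')
        break;
    }

  buf[length] = '\0';
  return length > 0;
}
```

Calling it repeatedly on a stream opened in binary mode that contains "a\r\nb\rc\nd" should yield "a\n", "b\n", "c\n", and "d" in turn, then false at end of file.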
