Ben,

I'll do some testing and let you know the result. I guess I have to check out the latest development version of PSPP to do so.
Thanks,
Rémi

On Fri, Jan 30, 2009 at 07:09, Ben Pfaff <[email protected]> wrote:
> Rémi Dewitte <[email protected]> writes:
>
> > On Thu, Jan 29, 2009 at 06:40, Ben Pfaff <[email protected]> wrote:
> >
> > Rémi Dewitte <[email protected]> writes:
> >
> > > Working with pspp-0.6.1 I am glad it works fine. Nevertheless I encountered
> > > two minor issues for which I have patched a bit pspp.
> > >
> > > First one is the ability to import CSV file with DOS endlines. I don't know
> > > whether it is the right place to trim the '\r'.
> >
> > This is not the right place to do this.
> >
> > You might give me some clues...
>
> Sure, I was just in a hurry when I wrote that.
>
> Here is my suggested substitute fix.  I have not had a chance to
> test that it works.  Will you test it and report your results?
>
> Reviews welcome from everyone else too, of course.
>
> Thanks!
>
> commit f4cc711051121873dd2e11436b10dd829094bdb9
> Author: Ben Pfaff <[email protected]>
> Date:   Thu Jan 29 22:01:27 2009 -0800
>
>     Accept LF, CR LF, and CR as new-line sequences in data files.
>
>     Until now, PSPP has used the host operating system's idea of the
>     new-line sequence when reading data files and other text files.
>     This means that, when a file with CR LF line ends is read on an OS
>     that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
>     appears to have a CR at the end.  This commit fixes the
>     problem, by normalizing the new-line sequence at time of reading.
>
>     This commit eliminates a performance optimization from
>     ds_read_line(), because the getdelim() function that it used cannot
>     be made to stop reading at one of two different delimiters.  If
>     this causes a real performance regression, then the getndelim2
>     function from gnulib could be used to restore the optimization.
>
>     Thanks to Rémi Dewitte <[email protected]> for pointing out the problem
>     and providing an initial patch.
>
> commit 70a46fb66ae0de5e312c4fc007bddf65e8ea5ac9
> Author: Ben Pfaff <[email protected]>
> Date:   Thu Jan 29 22:08:43 2009 -0800
>
>     Accept LF, CR LF, and CR as new-line sequences in data files.
>
>     Until now, PSPP has used the host operating system's idea of the
>     new-line sequence when reading data files and other text files.
>     This means that, when a file with CR LF line ends is read on an OS
>     that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
>     appears to have a CR at the end.  This commit fixes the
>     problem, by normalizing the new-line sequence at time of reading.
>
>     This commit eliminates a performance optimization from
>     ds_read_line(), because the getdelim() function that it used cannot
>     be made to stop reading at one of two different delimiters.  If
>     this causes a real performance regression, then the getndelim2
>     function from gnulib could be used to restore the optimization.
>
>     Thanks to Rémi Dewitte <[email protected]> for pointing out the problem
>     and providing an initial patch.
>
> diff --git a/src/libpspp/str.c b/src/libpspp/str.c
> index d082672..f054c9e 100644
> *** a/src/libpspp/str.c
> --- b/src/libpspp/str.c
> ***************
> *** 1,5 ****
>   /* PSPP - a program for statistical analysis.
> !    Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
>
>      This program is free software: you can redistribute it and/or modify
>      it under the terms of the GNU General Public License as published by
> --- 1,5 ----
>   /* PSPP - a program for statistical analysis.
> !    Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.
>
>      This program is free software: you can redistribute it and/or modify
>      it under the terms of the GNU General Public License as published by
> ***************
> *** 1190,1237 ****
>     return st->ss.string;
>   }
>
> ! /* Appends to ST a newline-terminated line read from STREAM, but
> !    no more than MAX_LENGTH characters.
> !    Newline is the last character of ST on return, if encountering
> !    a newline was the reason for terminating.
> !    Returns true if at least one character was read from STREAM
> !    and appended to ST, false if no characters at all were read
> !    before an I/O error or end of file was encountered (or
> !    MAX_LENGTH was 0). */
>   bool
>   ds_read_line (struct string *st, FILE *stream, size_t max_length)
>   {
> !   if (!st->ss.length && max_length == SIZE_MAX)
> !     {
> !       size_t capacity = st->capacity ? st->capacity + 1 : 0;
> !       ssize_t n = getline (&st->ss.string, &capacity, stream);
> !       if (capacity)
> !         st->capacity = capacity - 1;
> !       if (n > 0)
> !         {
> !           st->ss.length = n;
> !           return true;
> !         }
> !       else
> !         return false;
> !     }
> !   else
>       {
> !       size_t length;
>
> !       for (length = 0; length < max_length; length++)
>           {
> !           int c = getc (stream);
> !           if (c == EOF)
> !             break;
> !
> !           ds_put_char (st, c);
> !           if (c == '\n')
> !             return true;
>           }
> !
> !       return length > 0;
>       }
>   }
>
>   /* Removes a comment introduced by `#' from ST,
> --- 1190,1231 ----
>     return st->ss.string;
>   }
>
> ! /* Reads characters from STREAM and appends them to ST, stopping
> !    after MAX_LENGTH characters, after appending a newline, or
> !    after an I/O error or end of file was encountered, whichever
> !    comes first.  Returns true if at least one character was added
> !    to ST, false if no characters were read before an I/O error or
> !    end of file (or if MAX_LENGTH was 0).
> !
> !    This function accepts LF, CR LF, and CR sequences as new-line,
> !    and translates each of them to a single '\n' new-line
> !    character in ST. */
>   bool
>   ds_read_line (struct string *st, FILE *stream, size_t max_length)
>   {
> !   size_t length;
> !
> !   for (length = 0; length < max_length; length++)
>       {
> !       int c = getc (stream);
> !       if (c == EOF)
> !         break;
>
> !       if (c == '\r')
>           {
> !           c = getc (stream);
> !           if (c != '\n')
> !             {
> !               ungetc (c, stream);
> !               c = '\n';
> !             }
>           }
> !       ds_put_char (st, c);
> !       if (c == '\n')
> !         return true;
>       }
> +
> +   return length > 0;
>   }
>
>   /* Removes a comment introduced by `#' from ST,
>
> --
> Ben Pfaff
> http://benpfaff.org
>
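
For testing outside of PSPP, the new-line normalization described in the commit message can be sketched as a stand-alone function. This is only a minimal sketch under stated assumptions, not the PSPP code: `read_line_normalized` and `make_stream` are hypothetical names that do not exist in PSPP, and a fixed-size character buffer stands in for `struct string`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-alone helper (not part of PSPP).  Reads one line
   from STREAM into BUF, whose capacity CAP includes the terminating
   NUL.  LF, CR LF, and a lone CR are each stored as a single '\n'.
   Returns true if at least one character was stored. */
static bool
read_line_normalized (FILE *stream, char *buf, size_t cap)
{
  size_t length = 0;

  assert (buf != NULL && cap > 0);
  while (length + 1 < cap)
    {
      int c = getc (stream);
      if (c == EOF)
        break;

      if (c == '\r')
        {
          /* Peek at the next character: swallow the LF of a CR LF
             pair, otherwise push it back so it starts the next line. */
          int next = getc (stream);
          if (next != '\n' && next != EOF)
            ungetc (next, stream);
          c = '\n';
        }

      buf[length++] = (char) c;
      if (c == '\n')
        break;
    }
  buf[length] = '\0';
  return length > 0;
}

/* Hypothetical test helper: writes DATA to a temporary binary file
   and rewinds it, so that no OS-level new-line translation occurs. */
static FILE *
make_stream (const char *data)
{
  FILE *f = tmpfile ();
  if (f != NULL)
    {
      fputs (data, f);
      rewind (f);
    }
  return f;
}
```

Feeding it a stream that contains "a\r\nb\rc\nd" should yield the lines "a\n", "b\n", "c\n", and finally "d" without a trailing new-line, which covers the mixed line endings that a CSV file with DOS end-of-lines would exercise.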
_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
