Ben,

I'll do some testing and let you know the results. I guess I have to check out
the latest development version of PSPP to do so.

Thanks,
Rémi

On Fri, Jan 30, 2009 at 07:09, Ben Pfaff <[email protected]> wrote:

> Rémi Dewitte <[email protected]> writes:
>
> > On Thu, Jan 29, 2009 at 06:40, Ben Pfaff <[email protected]> wrote:
> >
> >     Rémi Dewitte <[email protected]> writes:
> >
> >     > Working with pspp-0.6.1, I am glad to say it works fine. Nevertheless,
> >     > I encountered two minor issues, for which I have patched pspp a bit.
> >     >
> >     > The first one is the ability to import CSV files with DOS line
> >     > endings. I don't know whether it is the right place to trim the '\r'.
> >
> >     This is not the right place to do this.
> >
> > Could you give me some clues?
>
> Sure, I was just in a hurry when I wrote that.
>
> Here is my suggested substitute fix.  I have not had a chance to
> test that it works.  Will you test it and report your results?
>
> Reviews welcome from everyone else too, of course.
>
> Thanks!
>
> commit f4cc711051121873dd2e11436b10dd829094bdb9
> Author: Ben Pfaff <[email protected]>
> Date:   Thu Jan 29 22:01:27 2009 -0800
>
>    Accept LF, CR LF, and CR as new-line sequences in data files.
>
>    Until now, PSPP has used the host operating system's idea of the
>    new-line sequence when reading data files and other text files.
>    This means that, when a file with CR LF line ends is read on an OS
>    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
>    appears to have a CR at the end.  This commit fixes the
>    problem, by normalizing the new-line sequence at time of reading.
>
>    This commit eliminates a performance optimization from
>    ds_read_line(), because the getdelim() function that it used cannot
>    be made to stop reading at one of two different delimiters.  If
>    this causes a real performance regression, then the getndelim2
>    function from gnulib could be used to restore the optimization.
>
>    Thanks to Rémi Dewitte <[email protected]> for pointing out the problem
>    and providing an initial patch.
>
> commit 70a46fb66ae0de5e312c4fc007bddf65e8ea5ac9
> Author: Ben Pfaff <[email protected]>
> Date:   Thu Jan 29 22:08:43 2009 -0800
>
>    Accept LF, CR LF, and CR as new-line sequences in data files.
>
>    Until now, PSPP has used the host operating system's idea of the
>    new-line sequence when reading data files and other text files.
>    This means that, when a file with CR LF line ends is read on an OS
>    that uses LF as new-line (e.g. an MS-DOS file on Unix), each line
>    appears to have a CR at the end.  This commit fixes the
>    problem, by normalizing the new-line sequence at time of reading.
>
>    This commit eliminates a performance optimization from
>    ds_read_line(), because the getdelim() function that it used cannot
>    be made to stop reading at one of two different delimiters.  If
>    this causes a real performance regression, then the getndelim2
>    function from gnulib could be used to restore the optimization.
>
>    Thanks to Rémi Dewitte <[email protected]> for pointing out the problem
>    and providing an initial patch.
>
> diff --git a/src/libpspp/str.c b/src/libpspp/str.c
> index d082672..f054c9e 100644
> *** a/src/libpspp/str.c
> --- b/src/libpspp/str.c
> ***************
> *** 1,5 ****
>  /* PSPP - a program for statistical analysis.
> !    Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
>
>     This program is free software: you can redistribute it and/or modify
>     it under the terms of the GNU General Public License as published by
> --- 1,5 ----
>  /* PSPP - a program for statistical analysis.
> !    Copyright (C) 1997-9, 2000, 2006, 2009 Free Software Foundation, Inc.
>
>     This program is free software: you can redistribute it and/or modify
>     it under the terms of the GNU General Public License as published by
> ***************
> *** 1190,1237 ****
>    return st->ss.string;
>  }
>
> ! /* Appends to ST a newline-terminated line read from STREAM, but
> !    no more than MAX_LENGTH characters.
> !    Newline is the last character of ST on return, if encountering
> !    a newline was the reason for terminating.
> !    Returns true if at least one character was read from STREAM
> !    and appended to ST, false if no characters at all were read
> !    before an I/O error or end of file was encountered (or
> !    MAX_LENGTH was 0). */
>  bool
>  ds_read_line (struct string *st, FILE *stream, size_t max_length)
>  {
> !   if (!st->ss.length && max_length == SIZE_MAX)
> !     {
> !       size_t capacity = st->capacity ? st->capacity + 1 : 0;
> !       ssize_t n = getline (&st->ss.string, &capacity, stream);
> !       if (capacity)
> !         st->capacity = capacity - 1;
> !       if (n > 0)
> !         {
> !           st->ss.length = n;
> !           return true;
> !         }
> !       else
> !         return false;
> !     }
> !   else
>      {
> !       size_t length;
>
> !       for (length = 0; length < max_length; length++)
>          {
> !           int c = getc (stream);
> !           if (c == EOF)
> !             break;
> !
> !           ds_put_char (st, c);
> !           if (c == '\n')
> !             return true;
>          }
> !
> !       return length > 0;
>      }
>  }
>
>  /* Removes a comment introduced by `#' from ST,
> --- 1190,1231 ----
>    return st->ss.string;
>  }
>
> ! /* Reads characters from STREAM and appends them to ST, stopping
> !    after MAX_LENGTH characters, after appending a newline, or
> !    after an I/O error or end of file was encountered, whichever
> !    comes first.  Returns true if at least one character was added
> !    to ST, false if no characters were read before an I/O error or
> !    end of file (or if MAX_LENGTH was 0).
> !
> !    This function accepts LF, CR LF, and CR sequences as new-line,
> !    and translates each of them to a single '\n' new-line
> !    character in ST. */
>  bool
>  ds_read_line (struct string *st, FILE *stream, size_t max_length)
>  {
> !   size_t length;
> !
> !   for (length = 0; length < max_length; length++)
>      {
> !       int c = getc (stream);
> !       if (c == EOF)
> !         break;
>
> !       if (c == '\r')
>          {
> !           c = getc (stream);
> !           if (c != '\n')
> !             {
> !               ungetc (c, stream);
> !               c = '\n';
> !             }
>          }
> !       ds_put_char (st, c);
> !       if (c == '\n')
> !         return true;
>      }
> +
> +   return length > 0;
>  }
>
>  /* Removes a comment introduced by `#' from ST,
>
> --
> Ben Pfaff
> http://benpfaff.org
>
_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev