Re: [GRASS-dev] Right way to use G_getl2
Glynn Clements wrote: > > FWIW, I suggest changing the existing behaviour [of G_getl2()] so that it > always > reads an entire line, even if it can't store it all. IOW, read until > EOL or EOF, stop storing characters once the buffer is full. +1 > > Unlike with fgets(), which stores the line terminator in the buffer, > there is no way for the caller of G_getl2() to determine that an > incomplete line was read. If the buffer is too small, you lose either > way; discarding the trailing characters is likely to be more robust > than returning them as if they were a subsequent line (the current > behaviour). The caller could know if a line was only partially read if G_getl2() would return -1 (anything but 0, 1) in case the buffer is too small. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] Right way to use G_getl2
Vaclav Petras wrote: > So, finally: What is the right usage of G_getl2? Should the caller use > buffer size equal to maximum number of expected characters on line plus end > of line character (plus 2 on MS Windows) plus terminating character? Or > should he just pass the n parameter increased by one (two on MS Windows) > since in fact nothing will be stored to buffer? Or should he just allocate > really huge buffer and than read chars from buffer if he wants to store > them? The short answer is that the buffer needs to be more than large enough. As it turns out, it needs at least two more characters: one for the NUL and one to ensure that the line terminator is read even though it isn't stored. FWIW, I suggest changing the existing behaviour so that it always reads an entire line, even if it can't store it all. IOW, read until EOL or EOF, stop storing characters once the buffer is full. Unlike with fgets(), which stores the line terminator in the buffer, there is no way for the caller of G_getl2() to determine that an incomplete line was read. If the buffer is too small, you lose either way; discarding the trailing characters is likely to be more robust than returning them as if they were a subsequent line (the current behaviour). -- Glynn Clements ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] Right way to use G_getl2
Markus Metz wrote: > I would suggest to modify G_getl2() to read at most n (not n - 1) > characters and include a check if the buffer is long enough: mmmph, to me G_get2() does the right thing, as long as people follow the given instructions all is fine. So I'd vote to keep it as-is. FWIW, "at most one less than size characters from stream" follows the behaviour of fgets() and (programmer)fails on the side of not overflowing the given output buffer. The \n and \r chars are not stored in the output string, and are not left in the input stream for the next line. (if this wasn't the case it's pretty hard to imagine that we wouldn't have noticed corrupted ascii file input over the last 4 years) is there a ticket for the 'v.in.ascii skip=' overflow problem? if not can someone open one with a problematic input file attached? I suspect the recent patch is perhaps fixing a problem actually rooted somewhere else.. thanks, Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] Right way to use G_getl2
I think the confusion arises because G_getl2(char *buf, int n, FILE * fd) reads in at most n - 1 characters, not n characters. The next character is always set to '\0' and guaranteed to be at most the n-th character. That also means that G_getl() does not check if the buffer is large enough to hold the line to be read, which it should, IMHO. To the example with v.in.ascii: The module first finds the longest line, the length including the terminating character. This number is then passed to G_getl2() as n, which means that the end of the line is not read for the longest line, because the end of the line is the n-th character. I would suggest to modify G_getl2() to read at most n (not n - 1) characters and include a check if the buffer is long enough: ---> Index: getl.c === --- getl.c(revision 57537) +++ getl.c(working copy) @@ -67,7 +67,7 @@ int c; int ret = 1; -while (i < n - 1) { +while (i < n) { c = fgetc(fd); if (c == EOF) { @@ -88,6 +88,11 @@ } break; } + +if (i == n - 1) { +G_warning("G_getl2(): buffer too small"); +break; +} buf[i] = c; <--- Markus M On Fri, Sep 13, 2013 at 4:58 AM, Vaclav Petras wrote: > Hi, > > this relates to the recent fix for r.in.ascii (r57649). The skip parameter > [2] required one line more because when reading lines using G_getl2 [3] > function and counting these lines it counted one line more. > > The documentation of the function says that it reads max n-1 bytes. In other > words, n is the size of the buffer, so it is clear that it can n-1 useful > characters plus string terminating character '\0'. Quite standard way but... > > The problem in v.in.ascii was that it the caller passed the previously > determined maximum number of characters on line in file plus one (space for > the terminating character). So this number is the same for all lines and it > works well for all lines except for the longest one. > > The reason is that G_getl2 stops reading characters for the longest line and > leaves end line character in the stream. So, in fact, the longest line is in > buffer and can be processed. However, the next line is than only the > remaining end line character. This is not what the caller expected. There is > even no way to find out that the line was not read completely. > > So, finally: What is the right usage of G_getl2? Should the caller use > buffer size equal to maximum number of expected characters on line plus end > of line character (plus 2 on MS Windows) plus terminating character? Or > should he just pass the n parameter increased by one (two on MS Windows) > since in fact nothing will be stored to buffer? Or should he just allocate > really huge buffer and than read chars from buffer if he wants to store > them? > > Thanks, > Vaclav > > [1] https://trac.osgeo.org/grass/changeset/57649 > [2] http://grass.osgeo.org/grass70/manuals/v.in.ascii.html > [3] > http://grass.osgeo.org/programming7/getl_8c.html#a4bf86e666fda18059c42b0f5eca6f9bd > > Note 1: Don't be confused from documentation since the n parameter > description for G_getl2 is wrong. > > Note 2: In description I was taking only about Unix line endings. On Windows > we need two bytes more to store it as noted in questions. The case of old > Mac OS is not cover because it is even more tricky in G_getl2 function. > > > ___ > grass-dev mailing list > grass-dev@lists.osgeo.org > http://lists.osgeo.org/mailman/listinfo/grass-dev ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
[GRASS-dev] Right way to use G_getl2
Hi, this relates to the recent fix for r.in.ascii (r57649). The skip parameter [2] required one line more because when reading lines using G_getl2 [3] function and counting these lines it counted one line more. The documentation of the function says that it reads max n-1 bytes. In other words, n is the size of the buffer, so it is clear that it can n-1 useful characters plus string terminating character '\0'. Quite standard way but... The problem in v.in.ascii was that it the caller passed the previously determined maximum number of characters on line in file plus one (space for the terminating character). So this number is the same for all lines and it works well for all lines except for the longest one. The reason is that G_getl2 stops reading characters for the longest line and leaves end line character in the stream. So, in fact, the longest line is in buffer and can be processed. However, the next line is than only the remaining end line character. This is not what the caller expected. There is even no way to find out that the line was not read completely. So, finally: What is the right usage of G_getl2? Should the caller use buffer size equal to maximum number of expected characters on line plus end of line character (plus 2 on MS Windows) plus terminating character? Or should he just pass the n parameter increased by one (two on MS Windows) since in fact nothing will be stored to buffer? Or should he just allocate really huge buffer and than read chars from buffer if he wants to store them? Thanks, Vaclav [1] https://trac.osgeo.org/grass/changeset/57649 [2] http://grass.osgeo.org/grass70/manuals/v.in.ascii.html [3] http://grass.osgeo.org/programming7/getl_8c.html#a4bf86e666fda18059c42b0f5eca6f9bd Note 1: Don't be confused from documentation since the n parameter description for G_getl2 is wrong. Note 2: In description I was taking only about Unix line endings. On Windows we need two bytes more to store it as noted in questions. The case of old Mac OS is not cover because it is even more tricky in G_getl2 function. ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev