Re: [GRASS-dev] Right way to use G_getl2

2013-09-14 Thread Markus Metz
Glynn Clements wrote:
>
> FWIW, I suggest changing the existing behaviour [of G_getl2()] so that it 
> always
> reads an entire line, even if it can't store it all. IOW, read until
> EOL or EOF, stop storing characters once the buffer is full.
+1

>
> Unlike with fgets(), which stores the line terminator in the buffer,
> there is no way for the caller of G_getl2() to determine that an
> incomplete line was read. If the buffer is too small, you lose either
> way; discarding the trailing characters is likely to be more robust
> than returning them as if they were a subsequent line (the current
> behaviour).

The caller could know if a line was only partially read if G_getl2()
would return -1 (anything but 0, 1) in case the buffer is too small.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] Right way to use G_getl2

2013-09-13 Thread Glynn Clements

Vaclav Petras wrote:

> So, finally: What is the right usage of G_getl2? Should the caller use
> buffer size equal to maximum number of expected characters on line plus end
> of line character (plus 2 on MS Windows) plus terminating character? Or
> should he just pass the n parameter increased by one (two on MS Windows)
> since in fact nothing will be stored to buffer? Or should he just allocate
> really huge buffer and than read chars from buffer if he wants to store
> them?

The short answer is that the buffer needs to be more than large
enough. As it turns out, it needs at least two more characters: one
for the NUL and one to ensure that the line terminator is read even
though it isn't stored.

FWIW, I suggest changing the existing behaviour so that it always
reads an entire line, even if it can't store it all. IOW, read until
EOL or EOF, stop storing characters once the buffer is full.

Unlike with fgets(), which stores the line terminator in the buffer,
there is no way for the caller of G_getl2() to determine that an
incomplete line was read. If the buffer is too small, you lose either
way; discarding the trailing characters is likely to be more robust
than returning them as if they were a subsequent line (the current
behaviour).

-- 
Glynn Clements 
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] Right way to use G_getl2

2013-09-13 Thread Hamish
Markus Metz wrote:

> I would suggest to modify G_getl2() to read at most n (not n - 1)
> characters and include a check if the buffer is long enough:

mmmph, to me G_get2() does the right thing, as long as people follow
the given instructions all is fine. So I'd vote to keep it as-is.

FWIW, "at most one less than size characters from stream" follows
the behaviour of fgets() and (programmer)fails on the side of not
overflowing the given output buffer.


The \n and \r chars are not stored in the output string, and are not
left in the input stream for the next line. (if this wasn't the case
it's pretty hard to imagine that we wouldn't have noticed corrupted
ascii file input over the last 4 years)

is there a ticket for the 'v.in.ascii skip=' overflow problem? if not
can someone open one with a problematic input file attached? I suspect
the recent patch is perhaps fixing a problem actually rooted somewhere
else..


thanks,
Hamish

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] Right way to use G_getl2

2013-09-13 Thread Markus Metz
I think the confusion arises because G_getl2(char *buf, int n, FILE *
fd) reads in at most n - 1 characters, not n characters. The next
character is always set to '\0' and guaranteed to be at most the n-th
character. That also means that G_getl() does not check if the buffer
is large enough to hold the line to be read, which it should, IMHO.

To the example with v.in.ascii:
The module first finds the longest line, the length including the
terminating character. This number is then passed to G_getl2() as n,
which means that the end of the line is not read for the longest line,
because the end of the line is the n-th character.

I would suggest to modify G_getl2() to read at most n (not n - 1)
characters and include a check if the buffer is long enough:

--->
Index: getl.c
===
--- getl.c(revision 57537)
+++ getl.c(working copy)
@@ -67,7 +67,7 @@
 int c;
 int ret = 1;

-while (i < n - 1) {
+while (i < n) {
 c = fgetc(fd);

 if (c == EOF) {
@@ -88,6 +88,11 @@
 }
 break;
 }
+
+if (i == n - 1) {
+G_warning("G_getl2(): buffer too small");
+break;
+}

 buf[i] = c;
<---

Markus M


On Fri, Sep 13, 2013 at 4:58 AM, Vaclav Petras  wrote:
> Hi,
>
> this relates to the recent fix for r.in.ascii (r57649). The skip parameter
> [2] required one line more because when reading lines using G_getl2 [3]
> function and counting these lines it counted one line more.
>
> The documentation of the function says that it reads max n-1 bytes. In other
> words, n is the size of the buffer, so it is clear that it can n-1 useful
> characters plus string terminating character '\0'. Quite standard way but...
>
> The problem in v.in.ascii was that it the caller passed the previously
> determined maximum number of characters on line in file plus one (space for
> the terminating character). So this number is the same for all lines and it
> works well for all lines except for the longest one.
>
> The reason is that G_getl2 stops reading characters for the longest line and
> leaves end line character in the stream. So, in fact, the longest line is in
> buffer and can be processed. However, the next line is than only the
> remaining end line character. This is not what the caller expected. There is
> even no way to find out that the line was not read completely.
>
> So, finally: What is the right usage of G_getl2? Should the caller use
> buffer size equal to maximum number of expected characters on line plus end
> of line character (plus 2 on MS Windows) plus terminating character? Or
> should he just pass the n parameter increased by one (two on MS Windows)
> since in fact nothing will be stored to buffer? Or should he just allocate
> really huge buffer and than read chars from buffer if he wants to store
> them?
>
> Thanks,
> Vaclav
>
> [1] https://trac.osgeo.org/grass/changeset/57649
> [2] http://grass.osgeo.org/grass70/manuals/v.in.ascii.html
> [3]
> http://grass.osgeo.org/programming7/getl_8c.html#a4bf86e666fda18059c42b0f5eca6f9bd
>
> Note 1: Don't be confused from documentation since the n parameter
> description for G_getl2 is wrong.
>
> Note 2: In description I was taking only about Unix line endings. On Windows
> we need two bytes more to store it as noted in questions. The case of old
> Mac OS is not cover because it is even more tricky in G_getl2 function.
>
>
> ___
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


[GRASS-dev] Right way to use G_getl2

2013-09-12 Thread Vaclav Petras
Hi,

this relates to the recent fix for r.in.ascii (r57649). The skip parameter
[2] required one line more because when reading lines using G_getl2 [3]
function and counting these lines it counted one line more.

The documentation of the function says that it reads max n-1 bytes. In
other words, n is the size of the buffer, so it is clear that it can n-1
useful characters plus string terminating character '\0'. Quite standard
way but...

The problem in v.in.ascii was that it the caller passed the previously
determined maximum number of characters on line in file plus one (space for
the terminating character). So this number is the same for all lines and it
works well for all lines except for the longest one.

The reason is that G_getl2 stops reading characters for the longest line
and leaves end line character in the stream. So, in fact, the longest line
is in buffer and can be processed. However, the next line is than only the
remaining end line character. This is not what the caller expected. There
is even no way to find out that the line was not read completely.

So, finally: What is the right usage of G_getl2? Should the caller use
buffer size equal to maximum number of expected characters on line plus end
of line character (plus 2 on MS Windows) plus terminating character? Or
should he just pass the n parameter increased by one (two on MS Windows)
since in fact nothing will be stored to buffer? Or should he just allocate
really huge buffer and than read chars from buffer if he wants to store
them?

Thanks,
Vaclav

[1] https://trac.osgeo.org/grass/changeset/57649
[2] http://grass.osgeo.org/grass70/manuals/v.in.ascii.html
[3]
http://grass.osgeo.org/programming7/getl_8c.html#a4bf86e666fda18059c42b0f5eca6f9bd

Note 1: Don't be confused from documentation since the n parameter
description for G_getl2 is wrong.

Note 2: In description I was taking only about Unix line endings. On
Windows we need two bytes more to store it as noted in questions. The case
of old Mac OS is not cover because it is even more tricky in G_getl2
function.
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev