Re: [GRASS-dev] Segment library zero filling

2019-05-21 Thread Markus Metz
On Mon, May 20, 2019 at 8:43 PM Vaclav Petras wrote:
>
>
>
> On Mon, May 20, 2019 at 1:46 PM Markus Metz wrote:
>>
>>
>> On Mon, May 20, 2019 at 7:18 PM Vaclav Petras wrote:
>> >
>> > On Mon, May 20, 2019 at 11:57 AM Markus Metz
>> > <markus.metz.gisw...@gmail.com> wrote:
>> >>
>> >> On Mon, May 20, 2019 at 5:39 PM Vaclav Petras wrote:
>> >> >
>> >> > Hi MarkusM and all,
>> >> >
>> >> > I'm trying to understand whether the Segment_open() [1] function fills
>> >> > with zeros or not. I don't think it does, since it calls G_malloc
>> >> > (malloc) or Segment_format_nofill(). However, it is not completely clear
>> >> > to me what it is supposed to be doing, because the documentation still
>> >> > says it calls Segment_format(). I also don't understand the context of
>> >> > the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
>> >> > clear to me from format.c [4].
>> >>
>> >> Segment_open() uses Segment_format_nofill() [1] if it cannot use the
>> >> all-in-memory cache. The documentation has not been updated accordingly
>> >> (yet).
>> >
>> >
>> > Thank you. Just to be sure: now, if you want to ensure that it is
>> > zero-filled across platforms, you need to do it yourself using
>> > Segment_put(), right?
>>
>> All modules that use the segment lib use Segment_put() anyway to load data
>
>
> I'm asking about the case when I'm writing a new raster map. Let's say
> that with point binning (r.in.lidar/r.in.pdal) or with a simulation, I
> want the values to be initialized as zeros. I can use Segment_put(), but
> it seems that this is what Segment_format() is for. Hence the question
> about Segment_open_zero_fill() (as something more costly than
> Segment_open() but cheaper than Segment_open()+Segment_put()).

In this case I would rather initialize with NULL values. IIUC,
Segment_get() will return all null bytes ('\0') if lseek has been used and
no data have been written out yet. Note that the all-in-memory cache uses
malloc, not calloc (which it probably should), so in that case the memory
is not initialized with zero bytes.
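A minimal sketch of the two initialization options mentioned above, in plain C without the GRASS headers. It fills a row buffer with a null pattern before that row would be written with Segment_put(), and it allocates an in-memory cache with calloc() so that it starts zeroed. The all-bits-set pattern is an assumption about how GRASS represents DCELL nulls; inside a module, Rast_set_d_null_value() is the call to use. The helper names set_row_null(), is_null() and alloc_zeroed_cache() are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: mark every cell of a double row as null using an
 * all-bits-set pattern (assumed DCELL null; it is a NaN in IEEE 754). */
static void set_row_null(double *row, size_t ncols)
{
    assert(row != NULL || ncols == 0);
    memset(row, 0xFF, ncols * sizeof *row);
}

/* Any NaN counts as null here; a GRASS module would use its own test. */
static int is_null(double v)
{
    return isnan(v);
}

/* Hypothetical all-in-memory cache that starts zero-filled.
 * calloc() zeroes the block; malloc() leaves its contents indeterminate. */
static void *alloc_zeroed_cache(size_t nrows, size_t ncols, size_t len)
{
    size_t row_bytes;

    if (len != 0 && ncols > SIZE_MAX / len)
        return NULL;                    /* ncols * len would overflow */
    row_bytes = ncols * len;
    if (row_bytes != 0 && nrows > SIZE_MAX / row_bytes)
        return NULL;                    /* nrows * row_bytes would overflow */
    return calloc(nrows, row_bytes);
}
```

Note that all-zero bytes are an ordinary value (0 or 0.0), so a zero-filled cache and a null-initialized map are two different starting states.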

>>
>> > The exact role of lseek() here is still unclear to me (the hole and \0
>> > bytes).
>> man lseek:
>>    lseek() allows the file offset to be set beyond the end of the file
>>    (but this does not change the size of the file). If data is later
>>    written at this point, subsequent reads of the data in the gap (a
>>    "hole") return null bytes ('\0') until data is actually written into
>>    the gap.
>>
>> More info: disk space for the gap is allocated only when actual data are
>> written into it; the file size increases once data are written at the
>> new offset.
>
>
> Thanks. I have seen this, but the segment library is not using the null
> bytes as zeros, or is it?

The segment lib does not interpret the bytes; it returns them as is
(through a void pointer). The module then casts them to whatever type they
are supposed to be.
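Because the segment library only copies bytes, what an unwritten hole reads back as depends on the type the module assumes. This small sketch (plain C; bytes_as_cell() and bytes_as_dcell() are hypothetical names) shows that all-zero bytes decode to the integer 0 and, with IEEE 754 doubles, to 0.0. Those are ordinary values, not GRASS nulls. memcpy() is used instead of a pointer cast to avoid alignment and strict-aliasing problems.

```c
#include <assert.h>
#include <string.h>

/* Decode the bytes of one cell as an int (the C type of a GRASS CELL). */
static int bytes_as_cell(const unsigned char *buf)
{
    int v;

    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Decode the bytes of one cell as a double (the C type of a GRASS DCELL). */
static double bytes_as_dcell(const unsigned char *buf)
{
    double v;

    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}
```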
>
> > ...
> >> The advantage of no fill (only lseek) is that it is faster; the
> >> disadvantage is that any "no space left on device" error will be
> >> encountered only later on, so you always need to check the return code
> >> of Segment_put().
> > ...
>
> It seems that not all modules are doing this (e.g. r.cost). How big a
> problem is that? I guess the result is a truncated raster without a
> warning in case of "no space left on device".

Yes, the result would be either a truncated raster or, earlier, an error on
reading. r.cost and other affected modules need to be updated to check the
return value of Segment_[get|put]() and exit with a fatal error on failure.
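The fix suggested here is to stop at the first failed read or write. Below is a self-contained sketch of that pattern. struct fake_seg and fake_put() are hypothetical stand-ins that mimic the return convention assumed for Segment_put() (1 on success, -1 on a write error). load_rows() returns the failing row instead of calling G_fatal_error(), so the sketch can be tested outside GRASS.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a segment: one int per cell, but the "disk"
 * fills up after `capacity` cells. */
struct fake_seg {
    int *cells;
    long ncols;
    long capacity;
};

/* Mimics the assumed Segment_put() convention: 1 = ok, -1 = write error. */
static int fake_put(struct fake_seg *seg, const int *value, long row, long col)
{
    long idx = row * seg->ncols + col;

    if (idx >= seg->capacity) {
        errno = ENOSPC;                 /* simulate "no space left on device" */
        return -1;
    }
    seg->cells[idx] = *value;
    return 1;
}

/* Write `value` to every cell, checking each put. Returns -1 if all rows
 * were written, otherwise the row that failed; a GRASS module would call
 * G_fatal_error() at that point instead of returning. */
static long load_rows(struct fake_seg *seg, long nrows, int value)
{
    assert(seg != NULL);
    for (long row = 0; row < nrows; row++) {
        for (long col = 0; col < seg->ncols; col++) {
            if (fake_put(seg, &value, row, col) != 1) {
                fprintf(stderr, "Unable to write row %ld: %s\n", row,
                        strerror(errno));
                return row;
            }
        }
    }
    return -1;
}
```

Without the check in load_rows(), the loop would carry on past the full disk and leave a silently truncated result, which is the situation described for r.cost.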

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] Segment library zero filling

2019-05-20 Thread Vaclav Petras
On Mon, May 20, 2019 at 1:46 PM Markus Metz wrote:

>
> On Mon, May 20, 2019 at 7:18 PM Vaclav Petras wrote:
> >
> > On Mon, May 20, 2019 at 11:57 AM Markus Metz
> > <markus.metz.gisw...@gmail.com> wrote:
> >>
> >> On Mon, May 20, 2019 at 5:39 PM Vaclav Petras wrote:
> >> >
> >> > Hi MarkusM and all,
> >> >
> >> > I'm trying to understand whether the Segment_open() [1] function fills
> >> > with zeros or not. I don't think it does, since it calls G_malloc
> >> > (malloc) or Segment_format_nofill(). However, it is not completely clear
> >> > to me what it is supposed to be doing, because the documentation still
> >> > says it calls Segment_format(). I also don't understand the context of
> >> > the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
> >> > clear to me from format.c [4].
> >>
> >> Segment_open() uses Segment_format_nofill() [1] if it cannot use the
> >> all-in-memory cache. The documentation has not been updated accordingly
> >> (yet).
> >
> >
> > Thank you. Just to be sure: now, if you want to ensure that it is
> > zero-filled across platforms, you need to do it yourself using
> > Segment_put(), right?
>
> All modules that use the segment lib use Segment_put() anyway to load data
>

I'm asking about the case when I'm writing a new raster map. Let's say that
with point binning (r.in.lidar/r.in.pdal) or with a simulation, I want the
values to be initialized as zeros. I can use Segment_put(), but it seems
that this is what Segment_format() is for. Hence the question about
Segment_open_zero_fill() (as something more costly than Segment_open() but
cheaper than Segment_open()+Segment_put()).


> > (Assuming you want to use Segment_open().)
>
> or Segment_format_nofill() + Segment_init()
>

I don't want to use Segment_init(); it is just too low-level compared to
your Segment_open() (plus, it does not have the all-in-memory mode/code).
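For reference, the all-in-memory decision can be sketched as follows. The sketch assumes that Segment_open() takes that path when nseg is at least the total number of segments covering the map, with partial segments at the edges counted as whole ones. It illustrates that assumed rule and is not the library code; segments_total() and uses_memory_cache() are hypothetical names.

```c
#include <assert.h>

/* Number of srows x scols segments needed to tile an nrows x ncols map;
 * partial segments at the right and bottom edges count as whole ones. */
static long long segments_total(long long nrows, long long ncols,
                                int srows, int scols)
{
    assert(srows > 0 && scols > 0);
    return ((nrows + srows - 1) / srows) * ((ncols + scols - 1) / scols);
}

/* Assumed rule: if every segment may be held in memory at once, keep the
 * whole map in memory and skip creating a temporary file. */
static int uses_memory_cache(long long nrows, long long ncols,
                             int srows, int scols, int nseg)
{
    return nseg >= segments_total(nrows, ncols, srows, scols);
}
```

For example, a 100 x 100 map in 64 x 64 segments needs four segments, so under this assumption nseg = 4 would select the memory path and nseg = 3 would fall back to a file.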


>
> > The exact role of lseek() here is still unclear to me (the hole and \0
> > bytes).
> man lseek:
>    lseek() allows the file offset to be set beyond the end of the file
>    (but this does not change the size of the file). If data is later
>    written at this point, subsequent reads of the data in the gap (a
>    "hole") return null bytes ('\0') until data is actually written into
>    the gap.
>
> More info: disk space for the gap is allocated only when actual data are
> written into it; the file size increases once data are written at the new
> offset.
>

Thanks. I have seen this, but the segment library is not using the null
bytes as zeros, or is it?

> ...
>> The advantage of no fill (only lseek) is that it is faster; the
>> disadvantage is that any "no space left on device" error will be
>> encountered only later on, so you always need to check the return code of
>> Segment_put().
> ...

It seems that not all modules are doing this (e.g. r.cost). How big a
problem is that? I guess the result is a truncated raster without a warning
in case of "no space left on device".


Re: [GRASS-dev] Segment library zero filling

2019-05-20 Thread Markus Metz
On Mon, May 20, 2019 at 7:18 PM Vaclav Petras wrote:
>
>
>
> On Mon, May 20, 2019 at 11:57 AM Markus Metz
> <markus.metz.gisw...@gmail.com> wrote:
>>
>>
>>
>> On Mon, May 20, 2019 at 5:39 PM Vaclav Petras wrote:
>> >
>> > Hi MarkusM and all,
>> >
>> > I'm trying to understand whether the Segment_open() [1] function fills
>> > with zeros or not. I don't think it does, since it calls G_malloc
>> > (malloc) or Segment_format_nofill(). However, it is not completely clear
>> > to me what it is supposed to be doing, because the documentation still
>> > says it calls Segment_format(). I also don't understand the context of
>> > the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
>> > clear to me from format.c [4].
>>
>> Segment_open() uses Segment_format_nofill() [1] if it cannot use the
>> all-in-memory cache. The documentation has not been updated accordingly
>> (yet).
>
>
> Thank you. Just to be sure: now, if you want to ensure that it is
> zero-filled across platforms, you need to do it yourself using
> Segment_put(), right?

All modules that use the segment lib use Segment_put() anyway to load data

> (Assuming you want to use Segment_open().)

or Segment_format_nofill() + Segment_init()

> The exact role of lseek() here is still unclear to me (the hole and \0
> bytes).
man lseek:
   lseek() allows the file offset to be set beyond the end of the file
   (but this does not change the size of the file). If data is later
   written at this point, subsequent reads of the data in the gap (a
   "hole") return null bytes ('\0') until data is actually written into
   the gap.

More info: disk space for the gap is allocated only when actual data are
written into it; the file size increases once data are written at the new
offset.
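The behaviour described in the man page can be checked with a few POSIX calls. The sketch below seeks past the end of an empty file, shows that the size does not change, writes one byte, and reads back a '\0' from inside the gap. hole_demo() is a hypothetical name, and the sketch assumes a POSIX system with a writable /tmp. Whether the gap actually occupies disk blocks depends on the file system, so that part is not checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Seek `gap` bytes past the end of a new, empty temporary file, record the
 * size before and after writing one byte there, and read the first byte of
 * the resulting hole. Returns 0 on success, -1 on any I/O error. */
static int hole_demo(off_t gap, off_t *size_before, off_t *size_after,
                     unsigned char *first_byte)
{
    char path[] = "/tmp/holeXXXXXX";
    struct stat st;
    int fd;

    assert(gap > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* file disappears once fd is closed */

    /* lseek() beyond EOF: allowed, but the file size stays unchanged. */
    if (lseek(fd, gap, SEEK_SET) != gap || fstat(fd, &st) != 0)
        goto fail;
    *size_before = st.st_size;

    /* Writing at the new offset extends the file and leaves a hole. */
    if (write(fd, "x", 1) != 1 || fstat(fd, &st) != 0)
        goto fail;
    *size_after = st.st_size;

    /* Reads inside the hole return null bytes. */
    if (pread(fd, first_byte, 1, 0) != 1)
        goto fail;
    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}
```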

>
> Would Segment_open_zero_fill() make sense, i.e., do you know if
> Segment_fill() is faster than Segment_put()?

Segment_put() is used anyway to load the actual data. Therefore the
question whether a new Segment_open_zero_fill() is an alternative to
Segment_put() does not arise: you need to use Segment_put() in any case to
populate the Segment structure with meaningful data.

Markus M
>
>>
>> The advantage of no fill (only lseek) is that it is faster; the
>> disadvantage is that any "no space left on device" error will be
>> encountered only later on, so you always need to check the return code of
>> Segment_put().
>
>
> Makes sense. I'll document that as well.
>
>>
>>
>> HTH,
>>
>> Markus M
>>
>> [1] https://github.com/OSGeo/grass/blob/master/lib/segment/open.c#L89
>>
>> >
>> > Markus, can you please clarify that for me? I will then update the
>> > documentation with whatever is needed.
>> >
>> > Thanks,
>> > Vaclav
>> >
>> > [1] https://grass.osgeo.org/programming7/segment_2open_8c.html#ae24d2e794c66c0512b67d7cea8b2ba9a
>> > [2] https://github.com/OSGeo/grass/commit/7a0d8d749537acd6d5c4baea11dbb6167fdef916
>> > [3] https://trac.osgeo.org/grass/changeset/73268
>> > [4] https://github.com/OSGeo/grass/blob/master/lib/segment/format.c

Re: [GRASS-dev] Segment library zero filling

2019-05-20 Thread Vaclav Petras
On Mon, May 20, 2019 at 11:57 AM Markus Metz wrote:

>
>
> On Mon, May 20, 2019 at 5:39 PM Vaclav Petras wrote:
> >
> > Hi MarkusM and all,
> >
> > I'm trying to understand whether the Segment_open() [1] function fills
> > with zeros or not. I don't think it does, since it calls G_malloc
> > (malloc) or Segment_format_nofill(). However, it is not completely clear
> > to me what it is supposed to be doing, because the documentation still
> > says it calls Segment_format(). I also don't understand the context of
> > the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
> > clear to me from format.c [4].
>
> Segment_open() uses Segment_format_nofill() [1] if it cannot use the
> all-in-memory cache. The documentation has not been updated accordingly
> (yet).
>

Thank you. Just to be sure: now, if you want to ensure that it is
zero-filled across platforms, you need to do it yourself using
Segment_put(), right? (Assuming you want to use Segment_open().) The exact
role of lseek() here is still unclear to me (the hole and \0 bytes).

Would Segment_open_zero_fill() make sense, i.e., do you know if
Segment_fill() is faster than Segment_put()?


> The advantage of no fill (only lseek) is that it is faster; the
> disadvantage is that any "no space left on device" error will be
> encountered only later on, so you always need to check the return code of
> Segment_put().
>

Makes sense. I'll document that as well.


>
> HTH,
>
> Markus M
>
> [1] https://github.com/OSGeo/grass/blob/master/lib/segment/open.c#L89
>
> >
> > Markus, can you please clarify that for me? I will then update the
> documentation with whatever is needed.
> >
> > Thanks,
> > Vaclav
> >
> > [1] https://grass.osgeo.org/programming7/segment_2open_8c.html#ae24d2e794c66c0512b67d7cea8b2ba9a
> > [2] https://github.com/OSGeo/grass/commit/7a0d8d749537acd6d5c4baea11dbb6167fdef916
> > [3] https://trac.osgeo.org/grass/changeset/73268
> > [4] https://github.com/OSGeo/grass/blob/master/lib/segment/format.c
>

Re: [GRASS-dev] Segment library zero filling

2019-05-20 Thread Markus Metz
On Mon, May 20, 2019 at 5:39 PM Vaclav Petras wrote:
>
> Hi MarkusM and all,
>
> I'm trying to understand whether the Segment_open() [1] function fills
> with zeros or not. I don't think it does, since it calls G_malloc
> (malloc) or Segment_format_nofill(). However, it is not completely clear
> to me what it is supposed to be doing, because the documentation still
> says it calls Segment_format(). I also don't understand the context of
> the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
> clear to me from format.c [4].

Segment_open() uses Segment_format_nofill() [1] if it cannot use the
all-in-memory cache. The documentation has not been updated accordingly
(yet). The advantage of no fill (only lseek) is that it is faster; the
disadvantage is that any "no space left on device" error will be
encountered only later on, so you always need to check the return code of
Segment_put().
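The trade-off can be illustrated with a sketch of two ways to prepare a backing file. format_file() and open_temp() are hypothetical helpers, not the code in format.c. With fill, format_file() writes zeros over the whole file, so a full disk is reported immediately. Without fill, it only seeks to the last byte and writes that byte. This is fast, but it leaves most of the file as a hole whose blocks may only be allocated, and fail with ENOSPC, later when segments are actually written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Open an anonymous temporary file (unlinked right away). */
static int open_temp(void)
{
    char path[] = "/tmp/segfmtXXXXXX";
    int fd = mkstemp(path);

    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Make the file `nbytes` long. fill != 0 writes zero blocks (slow, errors
 * such as ENOSPC show up now); fill == 0 seeks to the last byte and writes
 * only that byte (fast, the rest stays a hole). Returns 0 or -1. */
static int format_file(int fd, off_t nbytes, int fill)
{
    char zeros[4096];

    assert(nbytes > 0);
    if (!fill) {
        if (lseek(fd, nbytes - 1, SEEK_SET) != nbytes - 1)
            return -1;
        return write(fd, "", 1) == 1 ? 0 : -1;  /* one '\0' byte */
    }

    memset(zeros, 0, sizeof zeros);
    if (lseek(fd, 0, SEEK_SET) != 0)
        return -1;
    for (off_t left = nbytes; left > 0;) {
        size_t n = left < (off_t)sizeof zeros ? (size_t)left : sizeof zeros;
        ssize_t w = write(fd, zeros, n);

        if (w <= 0)
            return -1;              /* e.g. ENOSPC, reported immediately */
        left -= w;
    }
    return 0;
}
```

Both variants produce a file of the same size, and both read back as zero bytes. They differ only in when disk space is claimed and therefore in when a full disk is noticed.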

HTH,

Markus M

[1] https://github.com/OSGeo/grass/blob/master/lib/segment/open.c#L89

>
> Markus, can you please clarify that for me? I will then update the
> documentation with whatever is needed.
>
> Thanks,
> Vaclav
>
> [1] https://grass.osgeo.org/programming7/segment_2open_8c.html#ae24d2e794c66c0512b67d7cea8b2ba9a
> [2] https://github.com/OSGeo/grass/commit/7a0d8d749537acd6d5c4baea11dbb6167fdef916
> [3] https://trac.osgeo.org/grass/changeset/73268
> [4] https://github.com/OSGeo/grass/blob/master/lib/segment/format.c

[GRASS-dev] Segment library zero filling

2019-05-20 Thread Vaclav Petras
Hi MarkusM and all,

I'm trying to understand whether the Segment_open() [1] function fills
with zeros or not. I don't think it does, since it calls G_malloc
(malloc) or Segment_format_nofill(). However, it is not completely clear
to me what it is supposed to be doing, because the documentation still
says it calls Segment_format(). I also don't understand the context of
the related commit [2, 3], and the usage of lseek and USE_LSEEK is not
clear to me from format.c [4].

Markus, can you please clarify that for me? I will then update the
documentation with whatever is needed.

Thanks,
Vaclav

[1] https://grass.osgeo.org/programming7/segment_2open_8c.html#ae24d2e794c66c0512b67d7cea8b2ba9a
[2] https://github.com/OSGeo/grass/commit/7a0d8d749537acd6d5c4baea11dbb6167fdef916
[3] https://trac.osgeo.org/grass/changeset/73268
[4] https://github.com/OSGeo/grass/blob/master/lib/segment/format.c