Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

2013-12-05 Thread Blumentrath, Stefan
Hi again

I have now run some tests on the GeoTIFF approach with regard to disk space and 
performance.
For the disk-space test I exported a MODIS dataset covering Scandinavia (7629 x 
9387 pixels) from the GRASS native format (type CELL, 27M on disk, compressed) 
to GeoTIFF with two different data types (Int16 and Float64), both 
LZW-compressed and uncompressed.

Results for the Int16 dataset:
MODIS_sizetest_compressed.tif (LZW-compressed, Predictor 2): 14M
MODIS_sizetest_uncompressed.tif (uncompressed): 137M

Results for the Float64 dataset:
MODIS_sizetest_compressed_Float64.tif (LZW-compressed): 29M
MODIS_sizetest_uncompressed_Float64.tif (uncompressed): 547M

So, disk capacity seems to be a factor one should consider, as uncompressed 
data is in this case roughly 10 to 20 times larger than compressed data...
Maybe we will have to accept that the raw data is kept in a less interoperable 
(GRASS native) format, since it is mainly the processed results that are of 
interest (and I guess a visual file browser, e.g. in Arc software, will 
nevertheless almost freeze when opening a folder with hundreds or thousands of 
files anyway, or does anyone know?).
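For reference, the size ratios implied by the reported megabyte figures work out as follows (a trivial back-of-the-envelope check, nothing more):

```python
# Compression ratios computed from the file sizes reported above (in MB).
sizes = {
    "Int16": {"compressed": 14, "uncompressed": 137},
    "Float64": {"compressed": 29, "uncompressed": 547},
}

for dtype, s in sizes.items():
    ratio = s["uncompressed"] / s["compressed"]
    print(f"{dtype}: uncompressed is {ratio:.1f}x larger than LZW-compressed")
# Int16: uncompressed is 9.8x larger than LZW-compressed
# Float64: uncompressed is 18.9x larger than LZW-compressed
```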

I also tried to run performance tests, using the time command and r.mapcalc 
with external and native formats both for input and output (all four 
combinations). The results were not reliable; such benchmarks seem to be a bit 
tricky, according to Glynn's post here:
http://osdir.com/ml/grass-development-gis/2010-09/msg00225.html
Does anyone have a suggestion for how the tests Glynn describes could be run 
in practice (by non-C-developers) without too much effort?
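One way to get more stable numbers without writing C is to repeat each run several times and report the minimum and median wall-clock times, so that a single cache-cold (or cache-warm) run does not dominate. A generic harness along these lines (this is only a sketch and makes no claim to match Glynn's exact methodology; the command string is a placeholder to be replaced by the actual r.mapcalc invocation):

```python
import statistics
import subprocess
import time

def bench(cmd, repeats=5):
    """Run a shell command several times; return per-run wall-clock times."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True)
        times.append(time.perf_counter() - start)
    return times

# Placeholder command; substitute the r.mapcalc call to be measured,
# e.g. 'r.mapcalc "out = in * 2" --overwrite'.
runs = bench("true")
print(f"min={min(runs):.3f}s  median={statistics.median(runs):.3f}s")
```

Comparing the minimum across the four input/output combinations is usually more robust than comparing single runs.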

Cheers
Stefan


Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

2013-12-04 Thread Sören Gebbert
Hi Stefan,

2013/12/3 Blumentrath, Stefan stefan.blumentr...@nina.no:
 Dear all,



 On our Ubuntu server we are about to reorganize our GIS data in order to
 develop a more efficient and consistent solution for data storage in a mixed
 GIS environment.

 By “mixed GIS environment” I mean that we have people working with GRASS,
 QGIS and PostGIS, but also many people using R, and maybe the largest fraction
 using ESRI products; furthermore we have people using ENVI, ERDAS and some
 others. Only a few people (like me) actually work directly on the server…

 Until now I have stored “my” data mainly in GRASS (6/7) native format, which
 I was very happy with. But I guess our ESRI and PostGIS people would not
 accept that as a standard…



 However, especially for time series data we cannot have several copies in
 different formats (tailor-made for each and every software).



 So I started thinking: what would be the most efficient and convenient
 solution for storing a large amount of data (e.g. high resolution raster and
 vector data with national extent plus time series data) in a way that it is
 accessible for all (at least most) remote users (with different GIS
 software). As I am very fond of the temporal framework in GRASS 7 it would
 be a precondition that I can use these tools on the data without
 unreasonable performance loss. Another precondition would be that users at
 remote computers in our (MS Windows) network can have access to the data.



 In general, four options come to mind:

 a)  Stick to GRASS native format and have one copy in another format

 b)  Use the native formats the data come in (e.g. temperature and
 precipitation comes in zipped ascii-grid format)

 c)   Use PostGIS as a backend for data storage (raster / vector), linked
 via r.external / v.external.*

 d)  Use another GDAL/OGR format for data storage (raster / vector),
 linked via r.external / v.external.*



 My question(s) are:

 What solutions could you recommend or what solution did you choose?

I would suggest using r.external and uncompressed GeoTIFF files for
raster data. But you have to make sure that external software does not
modify these files, or, if it does, that the temporal framework is
triggered to update dependent space-time raster datasets.

For vector data, I would suggest using the native GRASS format; hence
vector data needs to be copied. But maybe PostgreSQL with topology
support would be a solution? I think Martin Landa may have an
opinion here.


 Who has experience with this kind of data management challenge?

No experience here from my side.

 How do externally linked data series perform compared to GRASS native?

It will be slower than the native format for sure, but I don't know
how much slower.



 I searched a bit the mailing list and found this:
 (http://osgeo-org.1560.x6.nabble.com/GRASS7-temporal-GIS-database-questions-td5054920.html)
 where Sören recommended “postgresql as temporal database backend”. However I
 am not sure if that was meant only for the temporal metadata and not the
 rasters themselves…

My recommendation was related to the temporal metadata only. The
SQLite database will not scale very well for select requests if you
have more than 30,000 maps registered in your temporal database.
PostgreSQL will be much faster for select requests, but PostgreSQL
performs very badly at managing (insert, update, delete) many maps. I
am not sure what the reason for this is, but in my experience
PostgreSQL has a scaling problem with many tables. Hence, if you do not
modify your data often, PostgreSQL is your temporal database backend of
choice; otherwise I would recommend SQLite, even if it is slower for
select requests.
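Sören's point about select performance on the temporal metadata can be probed with a small SQLite sketch. Note that the table below is a hypothetical stand-in, not the actual TGRASS metadata schema:

```python
import sqlite3
import time

# Hypothetical registration table -- NOT the real TGRASS schema, just a
# sketch of how select performance with many registered maps can be probed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maps (id TEXT PRIMARY KEY, start_time TEXT)")
con.executemany(
    "INSERT INTO maps VALUES (?, ?)",
    ((f"map_{i:06d}", f"2013-{i % 12 + 1:02d}-01") for i in range(30000)),
)
# An index on the time column is what keeps selects fast as the map
# count grows; without it every query scans all 30,000 rows.
con.execute("CREATE INDEX idx_start ON maps(start_time)")
con.commit()

t0 = time.perf_counter()
rows = con.execute(
    "SELECT id FROM maps WHERE start_time = '2013-06-01'"
).fetchall()
print(f"{len(rows)} maps selected in {time.perf_counter() - t0:.4f}s")
```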

 Furthermore in the idea collection for the Temporal framework
 (http://grasswiki.osgeo.org/wiki/Time_series_development, Open issues

This discussion is pretty old and does not reflect the current
temporal framework implementation. Please have a look at the new
TGRASS paper:
https://www.sciencedirect.com/science/article/pii/S136481521300282X?np=y
and the Geostat workshop:
http://geostat-course.org/Topic_Gebbert

 section) limitations were mentioned regarding the number of files in a
 folder, which could possibly be a problem for file-based storage. The
 ext2 file system had a “soft upper limit of about 10-15k files in a single
 directory”, but theoretically many more were possible. Other file systems
 may allow for more, I guess… Will usage of such big directories (> 10,000
 files) lead to performance problems?

Modern file systems should not have problems with many files. I am
using ext4 and the temporal framework with 100,000 maps without
noticeable performance issues.
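The many-files concern is also easy to probe directly on a given file system with a throwaway sketch (the file names below are made up; adjust n to the number of maps expected):

```python
import os
import tempfile
import time

# Create many empty files in one directory and time a full listing,
# to probe the "> 10,000 files per directory" concern on the local FS.
n = 20000
with tempfile.TemporaryDirectory() as d:
    for i in range(n):
        open(os.path.join(d, f"map_{i:06d}.tif"), "w").close()
    t0 = time.perf_counter()
    entries = os.listdir(d)
    print(f"listed {len(entries)} files in {time.perf_counter() - t0:.4f}s")
```

On ext4 (and most modern file systems) the listing should complete in a small fraction of a second.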


 The “Working with external data in GRASS 7” wiki entry
 (http://grasswiki.osgeo.org/wiki/Working_with_external_data_in_GRASS_7)
 covers the technical part (and to some degree performance issues) very well.
 Would it be worth adding a part on the strategic considerations / pros and
 cons of using external data? Or is that too user- and format-dependent?

Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

2013-12-04 Thread Sören Gebbert
Hi Stefan,
there is a FOSS4G presentation online as well:
http://elogeo.nottingham.ac.uk/xmlui/handle/url/288

Best regards
Soeren


Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

2013-12-04 Thread Blumentrath, Stefan
Hi Sören,

First of all, thank you very much for the excellent temporal framework! It is 
really great work!
Thank you also for your answers; they are already very helpful!

I will test the solution with external GeoTIFFs.

Updates of the GeoTIFFs by external software are to be expected (possibly via 
cron jobs), so I have to think about a strategy for updating all downstream 
space-time datasets that depend on an updated file (decade, year, month, 
whatever...).

I'll report back after some first tests...

Best regards
Stefan



[GRASS-user] Organizing spatial (time series) data for mixed GIS environments

2013-12-03 Thread Blumentrath, Stefan
Dear all,

On our Ubuntu server we are about to reorganize our GIS data in order to 
develop a more efficient and consistent solution for data storage in a mixed 
GIS environment.
By “mixed GIS environment” I mean that we have people working with GRASS, QGIS 
and PostGIS, but also many people using R, and maybe the largest fraction using 
ESRI products; furthermore we have people using ENVI, ERDAS and some others. 
Only a few people (like me) actually work directly on the server...
Until now I have stored “my” data mainly in GRASS (6/7) native format, which I 
was very happy with. But I guess our ESRI and PostGIS people would not accept 
that as a standard...

However, especially for time series data we cannot have several copies in 
different formats (tailor-made for each and every software).

So I started thinking: what would be the most efficient and convenient solution 
for storing a large amount of data (e.g. high resolution raster and vector data 
with national extent plus time series data) in a way that it is accessible for 
all (at least most) remote users (with different GIS software). As I am very 
fond of the temporal framework in GRASS 7 it would be a precondition that I can 
use these tools on the data without unreasonable performance loss. Another 
precondition would be that users at remote computers in our (MS Windows) 
network can have access to the data.

In general, four options come to mind:

a)  Stick to GRASS native format and have one copy in another format

b)  Use the native formats the data come in (e.g. temperature and 
precipitation comes in zipped ascii-grid format)

c)   Use PostGIS as a backend for data storage (raster / vector), linked via 
r.external / v.external.*

d)  Use another GDAL/OGR format for data storage (raster / vector), linked 
via r.external / v.external.*

My question(s) are:
What solutions could you recommend, or what solution did you choose?
Who has experience with this kind of data management challenge?
How do externally linked data series perform compared to GRASS native?

I searched the mailing list a bit and found this: 
(http://osgeo-org.1560.x6.nabble.com/GRASS7-temporal-GIS-database-questions-td5054920.html)
 where Sören recommended “postgresql as temporal database backend”. However I 
am not sure if that was meant only for the temporal metadata and not the 
rasters themselves...
Furthermore, in the idea collection for the temporal framework 
(http://grasswiki.osgeo.org/wiki/Time_series_development, Open issues section) 
limitations were mentioned regarding the number of files in a folder, which 
could possibly be a problem for file-based storage. The ext2 file system had a 
“soft upper limit of about 10-15k files in a single directory”, but 
theoretically many more were possible. Other file systems may allow for more, I 
guess... Will usage of such big directories (> 10,000 files) lead to 
performance problems?

The “Working with external data in GRASS 7” wiki entry 
(http://grasswiki.osgeo.org/wiki/Working_with_external_data_in_GRASS_7) covers 
the technical part (and to some degree performance issues) very well. Would it 
be worth adding a part on the strategic considerations / pros and cons of using 
external data? Or is that too user- and format-dependent?

Thanks for any feedback or thoughts on this topic...

Cheers
Stefan



___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user