Thank you, Regina, I apologize, I have shared the older version of my code, I have tried ST_pixelascentroid and ST_value and the issue was I messed up the parameters of centroids and that caused the errors now I corrected that as well. I think the code below is a good representation of my approach, I would like to iterate for every raster in a table, for every band and for every lat and lon and extract st value, and insert it to a table,
psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/extract_table_queryyyy.sql:66: NOTICE: Invalid band index (must use 1-based). Returning empty set for all the coordinates and bands. What could be the issue. Is this because my data is packed? the readme file of the data has this information: Details about the data packing: The data has been packed into short integers (2 bytes instead of 4 byte reals) to save space. You must unpack that data to get the correct floating point representation of the data. Each netCDF variable that has been packed has an add_offset and scale_factor attribute associated with it. Some software automatically unpacks the data when it is read. The formula to unpack the data is: unpacked value = add_offset + ( (packed value) * scale_factor ) For more information see here:https://www.unidata.ucar.edu/software/netcdf/workshops/2010/bestpractices/Packing.html There's also another attribute called "missing_value". In this case all the -32768 values you see are missing. Only the grid points outside the downscaling domain is given the missing data value. The packing saves a lot of space, that is why the data is packed. -- Create the precipitation_temperature_data table DROP TABLE IF EXISTS extracted_data_bbox; CREATE TABLE extracted_data_bbox ( id SERIAL PRIMARY KEY, year integer, year_day integer, lat double precision, lon double precision, prcp double precision, tmax double precision, tmin double precision, observation_time timestamp ); -- Loop through all records in gfdl_03_bbox DO $$ DECLARE raster_record RECORD; year_from_filename integer; observation_time timestamp; centroid_point geometry; variable_name text; variable_value double precision; band_number integer; band_numbers integer[]; pixel_data RECORD; BEGIN FOR raster_record IN (SELECT * FROM gfdl_03_bbox) LOOP SELECT regexp_replace(raster_record.filename, '.*_(\d{4})[^0-9]+', '\1')::integer INTO year_from_filename; -- Determine the variable name based on the filename IF raster_record.filename ~ 'prcp' THEN variable_name := 'prcp'; ELSIF raster_record.filename ~ 'tmax' THEN variable_name := 'tmax'; ELSIF raster_record.filename ~ 'tmin' THEN variable_name := 'tmin'; ELSE RAISE EXCEPTION 'Unknown variable in filename: %', raster_record.filename; END IF; band_numbers := ARRAY(SELECT generate_series(1, 366)); -- Loop through all bands within the file FOREACH band_number IN ARRAY band_numbers LOOP -- Print the band number to the PostgreSQL log RAISE NOTICE 'Band Number: %', band_number; observation_time := MAKE_DATE(year_from_filename, 1, 1) + (band_number - 1) * interval '1 day'; SELECT * FROM ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) INTO pixel_data; -- Extract the centroid point from pixel_data centroid_point := pixel_data.geom; -- Extract the variable value based on the variable name EXECUTE format('SELECT ST_Value($1, $2, $3, true)', raster_record.clipped_raster, band_number, centroid_point) INTO variable_value USING raster_record.clipped_raster, band_number, centroid_point; -- Insert the extracted data into the extracted_data_bbox table INSERT INTO extracted_data_bbox (year, year_day, lat, lon, prcp, tmax, tmin, observation_time) VALUES (year_from_filename, band_number, ST_Y(centroid_point), ST_X(centroid_point), CASE WHEN variable_name = 'prcp' THEN variable_value ELSE NULL END, CASE WHEN variable_name = 'tmax' THEN variable_value ELSE NULL END, CASE WHEN variable_name = 'tmin' THEN variable_value ELSE NULL END, observation_time); END LOOP; END LOOP; END; $$; I have tried the following query to check values for a raster band : the output is again null psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/test_query_fff.sql:22: NOTICE: Band: 20, X: 1, Y: 135, Value: <NULL> : DO $$ DECLARE band_number integer := 20; -- Replace with the desired band number x_coord integer; y_coord integer; pixel_value double precision; srid integer := 4326; -- Replace with the correct SRID for your data BEGIN -- Loop through all x and y coordinates in the raster band FOR x_coord IN (SELECT generate_series(1, ST_Width(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE rast.filename = 'prcp_03_2034.nc') LOOP FOR y_coord IN (SELECT generate_series(1, ST_Height(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE rast.filename = 'prcp_03_2034.nc') LOOP -- Get the pixel value at the current x and y coordinates SELECT ST_Value(ST_SetSRID(rast.clipped_raster, srid), band_number, ST_SetSRID(ST_Point(x_coord, y_coord), srid)) INTO pixel_value FROM gfdl_03_bbox AS rast WHERE rast.filename = 'prcp_03_2034.nc'; -- Print or use the pixel value as needed RAISE NOTICE 'Band: %, X: %, Y: %, Value: %', band_number, x_coord, y_coord, pixel_value; END LOOP; END LOOP; END; Thank you, Manaswini On Wed, 15 Nov 2023 at 20:42, Regina Obe <[email protected]> wrote: > I didn’t realize that netCDF also has a vector component. > > > > https://gdal.org/drivers/vector/netcdf.html#vector-netcdf > > > > To read the vector component, you’d use the ogr_fdw extension to read that > rather than postgis_raster extension. > > > > Details here - https://github.com/pramsey/pgsql-ogr-fdw > > > > So perhaps that is what your python is doing reading the vector > component. I don’t know if netcdf mixes those in one data set or not since > I have no experience using netcdf. > > > > I recall you said you are using the OSGeo Live 16 distribution. I just > checked the OSGeoLive 16 does include ogr_fdw > > > > So do > > > > CREATE EXTENSION ogr_fdw; > > > > > > The list of supported formats you can see with this query: > > > > SELECT name FROM unnest(ogr_fdw_drivers()) AS f(name) ORDER BY name; > > > > Which for osgeolive 16, I see netCDF listed > > > > > > > > > > > > *From:* Regina Obe <[email protected]> > *Sent:* Wednesday, November 15, 2023 6:19 PM > *To:* 'PostGIS Users Discussion' <[email protected]> > *Cc:* 'Manaswini Ganjam' <[email protected]> > *Subject:* RE: [postgis-users] Extracting variable information from > netcdf, imported as raster to a table > > > > Just confirming some stuff, since I’m not completely following: > > > > Raster_record.rast column is of type raster correct? IF so ST_X and ST_Y > won’t work since those are for geometry types. > > > > Also ST_Value(raster_record.rast, band_number), won’t work either since > that expects as input a geometry or x,y on the raster you want the value > you. > > I would think you would have gotten an error with that, which makes me > feel I’m missing something critical. > > > > If you want to extract all the pixels in a raster, you’d do something like > https://postgis.net/docs/RT_ST_PixelAsPoints.html > > > > SELECT pp.x, pp.y, pp.val, ST_X(pp.geom) AS lon, ST_Y(pp.geom) AS lat > > FROM raster_record, > > ST_PixelAsPoints(raster_record.rast, 1) AS pp > > > > > > > > > *From:* postgis-users <[email protected]> *On Behalf > Of *Manaswini Ganjam via postgis-users > *Sent:* Wednesday, November 15, 2023 2:01 PM > *To:* [email protected] > *Cc:* Manaswini Ganjam <[email protected]> > *Subject:* [postgis-users] Extracting variable information from netcdf, > imported as raster to a table > > > > Hi, > > I have been trying to download s3 cloud stored gridded climate data and > generate tables with variables, lat, lon and timestamp (year, yearday). To > achieve this I used raster2pgsql and imported multiple netcdf files into a > database table. > > > > Question: How to achieve the extraction of variables using postgis? I > tried using ST_value, ST_pixelaspoints but I was getting errors, mainly due > to the format in which netcdfs are stored in the database (the error says > can't load some characters like 00E30100082...), I even tried changing > the datatype to float but still did not work. I mean it is probably not > simple like selecting a variable from the netcdf. I have enclosed my sql > query below: > > > > -- Iterate through all raster files in the table > FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP > -- Determine the year from the raster file name, assuming the > format is 'prcp_03_1950.nc' > SELECT > regexp_replace(raster_record.filename, '.*_(\d{4})\.nc', > '\1')::integer > INTO > year; > > -- Calculate the start date of the year > year_start := (year || '-01-01')::date; > > -- Determine if the year is a leap year > is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1 > year')) = 7; > > -- Set the number of bands for the year (365 for non-leap years, > 366 for leap years) > FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365 > END) LOOP > -- Calculate the observation_time using the year and band > number > observation_time := year_start + (band_number - 1) * interval > '1 day'; > > -- Extract X (lon) and Y (lat) coordinates from the raster > SELECT > ST_X(raster_record.rast) AS lon, > ST_Y(raster_record.rast) AS lat > INTO > lon, > lat; > > -- Insert the lat, lon, prcp, and observation_time into the > extracted_values table > INSERT INTO extracted_values (lat, lon, prcp, observation_time) > VALUES > (lat, lon, ST_Value(raster_record.rast, band_number), > observation_time); > > -- Increment the counter > counter := counter + 1; > > -- Commit the transaction periodically in batches > IF counter % batch_size = 0 THEN > COMMIT; > END IF; > END LOOP; > END LOOP; > > > > > The metadata for the two files is as follows: > > > > File from database: > > {'NC_GLOBAL#Conventions': 'CF-1.5', > > 'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17', > > 'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy( > not_clipped_prcp.nc, ... )'} > > File before loading into the database: > > {'lat#units': 'degrees_north', > > 'lon#units': 'degrees_east', > > 'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the United > States and southern Canada east of the Rocky Mountains, version 1.0, > realization 1, 0.1x0.1 degree spatial resolution.', > > 'NETCDF_DIM_EXTRA': '{time}', > > 'NETCDF_DIM_time_DEF': '{366,4}', > > 'NETCDF_DIM_time_VALUES': > '{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}', > > 'prcp#add_offset': '819.17499', > > 'prcp#long_name': 'daily precipitation accumulation', > > 'prcp#missing_value': '-32768', > > 'prcp#scale_factor': '0.025', > > 'prcp#units': 'mm', > > 'prcp#_FillValue': '-32768', > > 'time#units': 'days since 1952-1-1 0:0:0.0'} > > > > In case this information is useful: Previously I used python to extract > variable information and generate a csv or table using this variable > information, and the code is enclosed below. In the code I extracted variable > values using lon = dataset.variables['lon'][:] and iterated for loops to > write them all in csv. > > Python code: > > import netCDF4 as nc > > # Step 1: Read the NetCDF file > filename = "/home/manaswini/prcp_03_1950.nc" > dataset = nc.Dataset(filename) > dataset.set_auto_mask(False) > dataset.set_auto_scale(True) > > > > lon = dataset.variables['lon'][:] > lat = dataset.variables['lat'][:] > time = dataset.variables['time'][:] > prcp = dataset.variables['prcp'][:] > > > > import numpy as np > import csv > > # csv_buffer > csv_buffer = open('output.csv', 'w', newline='') > csv_writer = csv.writer(csv_buffer) > > # Iterate through grid points and write to CSV buffer > for i in enumerate(lon): > for j in enumerate(lat): > for k in enumerate(time): > csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k], > yearday[k]]) > > > # Close the CSV buffer > csv_buffer.close() > > > > Thank you, > > Manaswini Ganjam > -- Manaswini Ganjam
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
