Hello
We have a process in R which reads raw statistical data from a table and
computes time series values from it.
The time series values are stored in an hstore field, with the date as the key
and the observation as the value.
The process writes the computed values into a temporary table and locks the
corresponding row(s) of the target table for update.
Finally, the row(s) are updated if they already exist, or inserted if they do
not.
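For reference, the structures involved look roughly like this (a simplified
sketch; the real timeseries_main lives in a schema and has further columns):

    -- simplified sketch of the assumed target table layout
    CREATE TABLE timeseries_main (
        ts_key       varchar,   -- series identifier, effectively the key
        ts_data      hstore,    -- date (text key) => observation (text value)
        ts_frequency integer
    );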
This process runs nightly and handles many GB of data without generating any
errors. Normally these are low-frequency time series
(monthly or quarterly data).
Now we have a daily time series with about 46'000 key/value pairs. In the near
future this will grow to 86'000 once data from previous years is included.
When R processes the daily time series we get a "stack size exceeded" error,
followed by the hint to increase max_stack_depth. My colleague, who wrote the
R function and performed the test, read the docs and, in line with ulimit -s,
increased max_stack_depth to 7 MB.
Here are the details of the OS and PG:
OS: OS X 10.10.5
PG: 9.3.3
ulimit -s: 8192
The change did take effect, as "show max_stack_depth;" confirmed. After this
change, however, the query fails with the same error as before, just reporting
the new limit of 7 MB.
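For completeness, this is the verification in psql after raising the setting
(the OS limit was checked separately with ulimit -s):

    -- checked in psql after the change
    SHOW max_stack_depth;   -- now reports 7MB (the default would be 2MB)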
The query itself was written to a file in order to verify its size. It turned
out to be 1.7 MB, i.e. below even the conservative default limit of 2 MB, let
alone the 7 MB we set.
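The size check was done along these lines (a sketch; the exact code may differ
slightly, and query_dump.sql is just a throwaway file name):

    # dump the generated SQL to a file and look at its size on disk
    writeLines(sql_query_data, "query_dump.sql")
    file.info("query_dump.sql")$size / 1024^2   # ~1.7 MB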
Apart from the fact that we could consider a different strategy for storing
time series, we would like to understand what is causing the problem.
Here is the query as it appears in the R code:
sql_query_data <- sprintf("BEGIN;
    CREATE TEMPORARY TABLE ts_updates(ts_key varchar, ts_data hstore,
                                      ts_frequency integer) ON COMMIT DROP;
    INSERT INTO ts_updates(ts_key, ts_data) VALUES %s;
    LOCK TABLE %s.timeseries_main IN EXCLUSIVE MODE;

    UPDATE %s.timeseries_main
    SET ts_data = ts_updates.ts_data
    FROM ts_updates
    WHERE ts_updates.ts_key = %s.timeseries_main.ts_key;

    INSERT INTO %s.timeseries_main
    SELECT ts_updates.ts_key, ts_updates.ts_data, ts_updates.ts_frequency
    FROM ts_updates
    LEFT OUTER JOIN %s.timeseries_main
           ON (%s.timeseries_main.ts_key = ts_updates.ts_key)
    WHERE %s.timeseries_main.ts_key IS NULL;
    COMMIT;",
    values, schema, schema, schema, schema, schema, schema, schema)
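The whole string is then sent to the server in one call. Just for
illustration, with RPostgreSQL (connection details made up) that looks like:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "ourdb", host = "localhost")
    # the complete multi-statement string is shipped as a single query
    dbSendQuery(con, sql_query_data)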
And here is how it looks in the end:
INSERT INTO ts_updates(ts_key, ts_data, ts_frequency) VALUES
('somekey',hstore('1900-01-01','-0.395131869823009')||
hstore('1900-01-02','-0.595131869823009')||
hstore('1900-01-03','-0.395131869823009')||
[... ~46'000 such hstore(...) pairs in total ...]
hstore('1900-01-04','-0.395131869823009'),NULL);
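The VALUES fragment is assembled in R roughly like this (an illustrative
sketch with made-up data, not our actual function):

    # build one row of the VALUES list: key, concatenated hstore pairs, frequency
    dates  <- format(seq(as.Date("1900-01-01"), by = "day", length.out = 46000))
    vals   <- rnorm(length(dates))
    pairs  <- sprintf("hstore('%s','%s')", dates, vals)
    values <- sprintf("('somekey',%s,NULL)", paste(pairs, collapse = "||"))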
The test was run on my colleague's local machine; there are no other
concurrent users.
We would be grateful for any hints on what the problem may be and/or how to
investigate it further.
Please reply to all, as my colleague is not yet subscribed to the mailing list.
Regards,
Charles and Matthias