[GENERAL] Abusing Postgres in fun ways.
I'm creating a data queue on top of Postgres and I'm wondering if I've made an incorrect assumption about isolation or synchronization or some similar issue. Every item in the queue is given a unique ID from a sequence.

CREATE TABLE data_queue (
    sequence_num BIGINT PRIMARY KEY,
    sender_key   BIGINT NOT NULL,
    datablob     bytea
);

I read from the queue by passing the last _highest_seen_sequence_num to a stored procedure:

SELECT * FROM data_queue WHERE sequence_num > _highest_seen_sequence_num ORDER BY sequence_num ASC

Obviously, with readers and writers racing, I need some sort of synchronization. I've found the advisory locks, and those seem to be my best bet. I used explicit locking for a while but ran into an issue with our daily backups and ACCESS EXCLUSIVE (which I might be able to revisit). I'm also trying to create a setup where there is basically no blocking: writers can always write, and readers are not blocked by writers (though there may be a delay in what is visible to the reader).

Before I dump a bunch of SQL on the list, my plan in short is to stage writes to a similar table, stage_data_queue, and then copy them all into a table visible to readers:

1. Writers take a shared advisory lock, get the next sequence_num, insert one row, then release the shared advisory lock (all in one stored procedure).

2. At some point there is a 'tick' and another thread takes the corresponding exclusive advisory lock (letting all in-flight writes finish), copies all rows into another table visible to the readers, truncates the staging table, and releases the exclusive lock (all in one stored procedure).

My fear is that there is still a race here, because the writer in (1) calls unlock at the end of the stored procedure, so there is a window before the row is committed during which (2) may end up truncating that data... I think I could fix this by leaving the (1) shared lock held past the end of the stored procedure and calling back later to unlock it. I might also be able to fix this with explicit locks, because I assume those only get released once the insert is truly committed.

Am I on the wrong track here?

-JD
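For the record, one variant I'm considering rather than what I have today: if transaction-scoped advisory locks are available (pg_advisory_xact_lock_shared and friends, which I believe need a newer server release than we're running), the shared lock is only released at commit, which would close exactly the window I'm worried about. The sequence and function names below are placeholders:

-- Writer: stage one row; the shared advisory lock (key 42 is arbitrary)
-- is held automatically until this transaction commits or rolls back.
CREATE OR REPLACE FUNCTION enqueue_staged( _sender_key BIGINT, _datablob bytea )
RETURNS BIGINT AS $$
DECLARE
    _seq BIGINT;
BEGIN
    PERFORM pg_advisory_xact_lock_shared( 42 );
    _seq := nextval( 'data_queue_seq' );
    INSERT INTO stage_data_queue( sequence_num, sender_key, datablob )
         VALUES ( _seq, _sender_key, _datablob );
    RETURN _seq;
END;
$$ LANGUAGE plpgsql;

-- Tick: the exclusive form of the same lock waits for every in-flight
-- writer to commit before the copy and truncate happen.
CREATE OR REPLACE FUNCTION publish_staged() RETURNS void AS $$
BEGIN
    PERFORM pg_advisory_xact_lock( 42 );
    INSERT INTO data_queue SELECT * FROM stage_data_queue;
    TRUNCATE stage_data_queue;
END;
$$ LANGUAGE plpgsql;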
[GENERAL] Time Series on Postgres (HOWTO?)
I've been googling but haven't found a good answer to what I should do if I want to store time series in Postgres. My current solution is to store serialized (compressed) blobs of data: for example, one day's worth of 1-minute samples (~1440 samples) stored as one row in a bytea, plus metadata. It would be nice if I could use one sample per column (because updating individual columns/samples is clear to me), but Postgres doesn't compress the row, which is bad because of the high amount of repetitive data (easily 10X bigger). I've been considering a Double[] array, which would get compressed, but before I start down that path (I suppose I need to make some stored procs to update individual samples), has anyone built anything like this? Any open source projects I should look at? Thanks.
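Something like this is what I have in mind; the table and column names are just placeholders:

CREATE TABLE timeseries_day (
    series_id  BIGINT    NOT NULL,
    day_start  TIMESTAMP NOT NULL,
    samples    DOUBLE PRECISION[] NOT NULL,   -- ~1440 one-minute samples
    PRIMARY KEY ( series_id, day_start )
);

-- updating a single sample in place (array subscripts are 1-based)
UPDATE timeseries_day
   SET samples[123] = 42.0
 WHERE series_id = 1 AND day_start = '2010-01-01 00:00:00';

My understanding is that once the array value is big enough it gets TOASTed and compressed, which is the part I'm counting on.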
[GENERAL] missing chunk number (Bug 5507)
I found this link: http://postgresql.1045698.n5.nabble.com/BUG-5507-missing-chunk-number-0-for-toast-value-X-in-pg-toast-X-td2126674.html and am also experiencing the same issue... More anecdotal evidence that this is a bug: we recently (a couple of days ago) completely recreated the DB and started inserting, so it's not data rot. After a couple of days of inserts, we could no longer read due to the TOAST error. We had WAL shipping to a warm standby (we are using 9.0), and it also had the same issue after replaying the WAL.
[GENERAL] Anyone use PG with kvm/virtio? Any gotchas or recommended settings?
Looks like the recommended settings are using the virtio interface, cache=none, and raw partitions (not qcow2). Anyone else run into any problems with kvm or virtio? We currently have a setup using qcow2, virtio, and the default cache settings, and experienced some data corruption (not preceded by crashes or restarts of postgres), and we are wondering if that might be the culprit.
[GENERAL] Is TimeZone applied with TIMESTAMP WITHOUT TIME ZONE and Extract( EPOCH ...)?
I'm noticing some interesting behavior around timestamp and extract epoch, and it appears that I'm getting a timezone applied somewhere. Specifically, if I do:

select EXTRACT( EPOCH FROM '2010-01-31 00:00:00'::TIMESTAMP WITHOUT TIME ZONE );  == 1264924800
select EXTRACT( EPOCH FROM '2010-04-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE );  == 1270105200

Now if I do something similar in Java, using a GregorianCalendar with the "GMT" TimeZone, I get:

Hello:2010-01-31 00:00:00.000 (UTC)
Hello:1264896000

Hello:2010-04-01 00:00:00.000 (UTC)
Hello:1270080000

which gives a difference of 8 and 7 hours respectively, so both a timezone and a DST shift are at work here. Is this the expected behavior of extract epoch, and is there a way to get it to always be in GMT?
[GENERAL] Re: Is TimeZone applied with TIMESTAMP WITHOUT TIME ZONE and Extract( EPOCH ...)?
Looks like a quick search says I need to specify the timezone...
[GENERAL] Re: Is TimeZone applied with TIMESTAMP WITHOUT TIME ZONE and Extract( EPOCH ...)?
No... still confused. I assume it's storing everything in UTC... did I need to specify a timezone when I inserted?
[GENERAL] Re: Is TimeZone applied with TIMESTAMP WITHOUT TIME ZONE and Extract( EPOCH ...)?
OK, got it:

select EXTRACT( EPOCH FROM '2010-04-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE at time zone 'utc' );
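Double-checking that the result no longer depends on the session TimeZone (I believe extract(epoch) from a timestamp with time zone is absolute, so this should hold; the SET lines are just for the test):

SET TIME ZONE 'US/Pacific';
select EXTRACT( EPOCH FROM '2010-01-31 00:00:00'::TIMESTAMP WITHOUT TIME ZONE at time zone 'utc' );  -- 1264896000
SET TIME ZONE 'UTC';
select EXTRACT( EPOCH FROM '2010-01-31 00:00:00'::TIMESTAMP WITHOUT TIME ZONE at time zone 'utc' );  -- 1264896000 again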
[GENERAL] why does extract ( HOUR FROM TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) == 16?
Is this the correct behavior? It seems like if I specify the UTC offset it should be 0, not 16. It seems to be the opposite behavior from extract epoch.

select extract ( HOUR FROM TIMESTAMP '2010-01-01 00:00:00' ) as defhour,
       extract ( HOUR FROM TIMESTAMP '2010-01-01 00:00:00' at time zone 'PST' ) as psthour,
       extract ( HOUR FROM TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) as utchour,
       extract ( epoch FROM TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) as utcepoch;

0,0,16,1262304000

@Test
public void testFoo() {
    TimeZone tz = TimeZone.getTimeZone("GMT");
    GregorianCalendar cal = new GregorianCalendar(tz);
    cal.set(2010, 0, 1, 0, 0, 0);
    cal.set(GregorianCalendar.MILLISECOND, 0);
    System.out.println("" + cal.getTimeInMillis());
    System.out.println("" + String.format("%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS.%1$tL", cal));
    System.out.println("" + cal.get(GregorianCalendar.HOUR_OF_DAY));
}

In Java:
1262304000000
2010-01-01 00:00:00.000 (UTC)
0
[GENERAL] Re: why does extract ( HOUR FROM TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) == 16?
I found a workaround... not sure why this is the behavior.

select extract ( HOUR FROM (TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) at time zone 'utc' )

gives what I expect would be the correct answer, BUT

select extract ( EPOCH FROM (TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc' ) at time zone 'utc' )

does not... Can anyone explain this?
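My current understanding, which may well be off, is that AT TIME ZONE flips the type back and forth, that extracting a field like HOUR from a timestamptz happens in the session's TimeZone, and that extracting EPOCH from a timestamptz is absolute:

-- session TimeZone assumed to be 'US/Pacific' (UTC-8 in January) for the comments below
SELECT TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc';
-- timestamp -> timestamptz: "treat this wall-clock time as UTC".
-- That instant displays as 2009-12-31 16:00:00-08 in my session, so
-- extract(HOUR ...) gives 16, while extract(EPOCH ...) gives the
-- absolute 1262304000.

SELECT (TIMESTAMP '2010-01-01 00:00:00' at time zone 'utc') at time zone 'utc';
-- timestamptz -> timestamp: "show that instant as UTC wall-clock time",
-- i.e. 2010-01-01 00:00:00 with no zone, so extract(HOUR ...) is 0 again.
-- But extracting EPOCH from a plain timestamp appears (at least on my
-- server) to get the session offset applied again, which matches what I
-- saw in my first mail, and is why the EPOCH variant still isn't what I
-- expected.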
[GENERAL] JDBC Binary transfer of Arrays
Hello! In my current application I am sending a lot of data to/from the DB with JDBC, specifically arrays of Double (or even Double[][]). Alas, my current implementation converts everything into a string representation, which blows my memory out of the water and drops my transaction rate way, way down while I spend most of my time working with a StringBuffer.

I note that there is this web page: http://wiki.postgresql.org/wiki/JDBC-BinaryTransfer

which makes me think I might be saved. Has anyone used binary transfer for Double[] (or other) array data? Any pointers to some implementation would be great... Right now I have my own implementation that extends java.sql.Array, and contains the string, a typeName, and a num.

Regards,
-JD
Re: [GENERAL] JDBC Binary transfer of Arrays
After trying out the JDBC4 driver in DBCP, I see that Connection.createArrayOf(...) still just creates a big string under the covers. Is that the expected behavior? Am I doing it wrong?
[GENERAL] Transport Compression (whatever became of that discussion?)
In this discussion there was a lot of talk of transport compression in Postgres (I'm also wondering about JDBC specifically). Did anything ever come of that discussion?

http://postgresql.1045698.n5.nabble.com/Compression-on-SSL-links-td2261205.html
[GENERAL] pg_dump on Hot standby : clarification on how to
I would just like to get some clarification from the list on how to do a pg_dump on the slave in the face of "canceling statement due to conflict with recovery". The following link seems to indicate that if I start an idle transaction on the master I should be able to do the pg_dump, but I tried this in psql on the master ("start transaction") and was still unable to do a pg_dump on the slave at the same time. Is there something special about using dblink that would make this all work?

> http://postgresql.1045698.n5.nabble.com/Hot-Standby-ERROR-canceling-statement-due-to-conflict-with-recovery-td3402417.html
> One solution is to begin idle transactions on the master by using e.g. dblink
> from the *standby* to the master before you start *pg_dump* on the *standby*
> and end them after *pg_dump* (or whatever) is finished.
Re: [GENERAL] pg_dump on Hot standby : clarification on how to
What I mean is: if I do pg_dump on the slave I get "ERROR: canceling statement due to conflict with recovery". So I googled and tried the solution listed in the linked thread. I did a "start transaction" via psql on the master, but I continued to get the error. I wondered if there was more to it than that.

On Thu, May 12, 2011 at 5:08 PM, Andrew Sullivan wrote:
> Could you define what you mean by "unable to do pg_dump on the slave"?
>
> I don't see why dblink would be the special thing. I think what you
> want is to hold a transaction open on the master so that the WAL can't
> get recycled. At least, that's what I understood from the post. I
> haven't actually tried it yet, but to me it sounded like it ought to
> work.
>
> Andrew Sullivan
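If I understand the suggestion correctly (and this is me guessing, not something I've verified), a bare "start transaction" doesn't actually take a snapshot until the first statement runs, so something like this on the master may be what's needed while pg_dump runs on the standby:

BEGIN;
SELECT txid_current();   -- force a real xid/snapshot so cleanup is held back on the master
-- ... leave this session idle until pg_dump on the standby finishes ...
COMMIT;

The other knobs I've seen mentioned are the standby-side settings max_standby_streaming_delay / max_standby_archive_delay (-1 makes recovery wait instead of cancelling queries), though that obviously delays replay while the dump runs.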
[GENERAL] General Postgres performance tips when using ARRAY
So, what are the gotchas around manipulating arrays in stored procs? It seems reasonable that an array_cat etc. would cause the creation of a new array, but does mutating an existing array also create a copy?
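To be concrete, the kind of thing I'm trying to avoid, assuming I'm right that each concatenation builds a brand new array (toy example, not my real code):

CREATE OR REPLACE FUNCTION slow_build( n int ) RETURNS double precision[] AS $$
DECLARE
    result double precision[] := '{}';
BEGIN
    FOR i IN 1..n LOOP
        result := result || i::double precision;  -- copies the whole array each time?
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

-- versus building it in one set-based step:
SELECT array_agg( i::double precision ) FROM generate_series( 1, 10000 ) AS g(i);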
[GENERAL] Why are IDLE connections using cpu according to TOP.
I have an overloaded DB and I see several IDLE connections that are using significant CPU (not "idle in transaction"). Why would an idle process be eating so much CPU? Or is it not actually idle? Here is an example from pg_top:

last pid: 11821;  load avg: 6.11, 6.32, 7.64;  up 1+21:05:31
50 processes: 3 running, 42 sleeping, 5 uninterruptable
CPU states: 21.7% user, 0.0% nice, 7.8% system, 46.9% idle, 23.6% iowait
Memory: 29G used, 149M free, 13M buffers, 27G cached
Swap:

  PID USERNAME PRI NICE  SIZE   RES STATE  TIME  WCPU    CPU    COMMAND
 4779 postgres  20    0 4383M  573M disk   3:16  4.79%  39.42%  postgres: gpup gpup 10.202.99.5(46391) UPDATE
11591 postgres  20    0 4383M  108M sleep  0:12  2.08%  19.61%  postgres: gpup gpup 10.202.99.6(52459) idle
 4191 postgres  20    0 4384M  709M sleep  4:33  2.50%  19.41%  postgres: gpup gpup 10.202.99.6(42288) idle
10942 postgres  20    0 4383M  242M sleep  0:42  5.08%  16.86%  postgres: gpup gpup 10.202.99.5(58373) idle
10930 postgres  20    0 4390M  281M sleep  0:43  1.62%  15.30%  postgres: gpup gpup 10.202.99.6(52273) idle
11571 postgres  20    0 4390M  210M run    0:25  4.32%  14.51%  postgres: gpup gpup 10.202.99.6(52455) SELECT
11533 postgres  20    0 4383M  109M run    0:14  2.31%  12.75%  postgres: gpup gpup 10.202.99.6(52453) SELECT
 7494 postgres  20    0 4384M 1611M disk   2:31  2.44%  12.35%  postgres: gpup gpup 10.202.99.6(53620) SELECT
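For what it's worth, the cross-check I can run from inside the database to see whether those backends are genuinely idle or just between statements (pg_stat_activity column names as of 9.0):

SELECT procpid, usename, waiting, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query = '<IDLE>'
 ORDER BY query_start;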
[GENERAL] Are check constraints always evaluated on UPDATE?
Are there any optimizations around check constraints such that they will not be evaluated if constituent columns are not updated? Regards, -JD
[GENERAL] Odd performance difference in check constraint : SQL(slow) vs plpgsql(fast)
This is the reverse of what I thought I would find. In short, my check constraint extracts the epoch from a start timestamp and an end timestamp to get the number of seconds difference. It then uses this number to check the array_upper() of an array to make sure it's the proper size. The SQL version uses a CASE statement, and the plpgsql version uses IF/ELSEIF. In a particular insert test, the plpgsql version adds 1 second over the no-constraints case; the SQL version adds 10 seconds over the no-constraints case. Why would this be?

--->

CREATE OR REPLACE FUNCTION check_end_time_foo( _start_time TIMESTAMP, _end_time TIMESTAMP, _granularity SMALLINT, _values DOUBLE PRECISION[] )
RETURNS boolean AS $$
BEGIN
    IF ( _granularity = 5 ) THEN
        RETURN( EXTRACT( EPOCH FROM _end_time at time zone 'utc' )::BIGINT
              - EXTRACT( EPOCH FROM _start_time at time zone 'utc' )::BIGINT
              = 60 * array_upper( _values, 1 ) );
    ELSEIF ( _granularity = 7 ) THEN
        RETURN( EXTRACT( EPOCH FROM _end_time at time zone 'utc' )::BIGINT
              - EXTRACT( EPOCH FROM _start_time at time zone 'utc' )::BIGINT
              = 900 * array_upper( _values, 1 ) );
    ELSEIF ( _granularity = 9 ) THEN
        RETURN( EXTRACT( EPOCH FROM _end_time at time zone 'utc' )::BIGINT
              - EXTRACT( EPOCH FROM _start_time at time zone 'utc' )::BIGINT
              = 3600 * array_upper( _values, 1 ) );
    ELSEIF ( _granularity = 12 ) THEN
        RETURN( ( (EXTRACT( YEAR FROM (_end_time at time zone 'utc') at time zone 'utc' )::INT * 12)
                +  EXTRACT( MONTH FROM (_end_time at time zone 'utc') at time zone 'utc' )::INT )
              - ( (EXTRACT( YEAR FROM (_start_time at time zone 'utc') at time zone 'utc' )::INT * 12)
                +  EXTRACT( MONTH FROM (_start_time at time zone 'utc') at time zone 'utc' )::INT )
              = array_upper( _values, 1 ) );
    END IF;
END;
$$ language plpgsql IMMUTABLE;

alter table timeseries add CONSTRAINT timeseries_valid_end_time
    CHECK( check_end_time_foo( series_start_time, series_end_time, granularity, data_value ) );

-vs-

alter table timeseries add CONSTRAINT timeseries_valid_end_time CHECK(
    CASE
        WHEN granularity = 5 THEN
            EXTRACT( EPOCH FROM series_end_time at time zone 'utc' )::BIGINT
          - EXTRACT( EPOCH FROM series_start_time at time zone 'utc' )::BIGINT
          = 60 * array_upper( data_value, 1 )
        WHEN granularity = 7 THEN
            EXTRACT( EPOCH FROM series_end_time at time zone 'utc' )::BIGINT
          - EXTRACT( EPOCH FROM series_start_time at time zone 'utc' )::BIGINT
          = 900 * array_upper( data_value, 1 )
        WHEN granularity = 9 THEN
            EXTRACT( EPOCH FROM series_end_time at time zone 'utc' )::BIGINT
          - EXTRACT( EPOCH FROM series_start_time at time zone 'utc' )::BIGINT
          = 3600 * array_upper( data_value, 1 )
        WHEN granularity = 12 THEN
            ( (EXTRACT( YEAR FROM (series_end_time at time zone 'utc') at time zone 'utc' )::INT * 12)
            +  EXTRACT( MONTH FROM (series_end_time at time zone 'utc') at time zone 'utc' )::INT )
          - ( (EXTRACT( YEAR FROM (series_start_time at time zone 'utc') at time zone 'utc' )::INT * 12)
            +  EXTRACT( MONTH FROM (series_start_time at time zone 'utc') at time zone 'utc' )::INT )
          = array_upper( data_value, 1 )
        ELSE false
    END );
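For reference, the shape of the insert test I'm timing (simplified; the real table has more columns, so treat the column list as a sketch):

\timing on
-- granularity 5 = one-minute samples, so 60 samples must span exactly 3600 seconds
INSERT INTO timeseries ( series_start_time, series_end_time, granularity, data_value )
SELECT g.ts,
       g.ts + interval '1 hour',
       5::SMALLINT,
       ARRAY( SELECT 0.0::double precision FROM generate_series( 1, 60 ) )
  FROM generate_series( TIMESTAMP '2010-01-01', TIMESTAMP '2010-01-31', interval '1 minute' ) AS g(ts);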
[GENERAL] pg_largeobject vs pg_toast_XXXX
After some changes on my DB I notice that where I used to have a large pg_toast_X table, I now have a large pg_largeobject table. Can't find an explanation of the difference between the two, could someone enlighten me?
Re: [GENERAL] pg_largeobject vs pg_toast_XXXX
No takers? Some background: I've changed my column storage (TOAST) type from EXTENDED to MAIN.
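For completeness, the sort of query I'm using to see which relation is actually taking the space (standard size functions, nothing exotic as far as I know):

SELECT c.relname,
       pg_size_pretty( pg_total_relation_size( c.oid ) ) AS total_size
  FROM pg_class c
 WHERE c.relname = 'pg_largeobject'
    OR c.relname LIKE 'pg_toast_%'
 ORDER BY pg_total_relation_size( c.oid ) DESC
 LIMIT 10;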