Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
It's worth noting that JDBC's behavior when you switch back to autocommit is to immediately commit the open transaction. Personally, I think committing immediately and erroring are both unsurprising behaviors. The current behavior is surprising and obviously wrong. Rolling back without an error would be very surprising (no other database API I know of does that), and debugging issues around that behavior would take forever. And committing after the next statement is definitely the most surprising behavior suggested. IMHO, committing immediately and erroring are both valid. I'd prefer the error in principle, and per the law of bad code I'm sure that, although no one ever intended to use this behavior, there is probably some code out there relying on it for "correctness". I think a hard failure that makes the dev add an explicit commit is least likely to cause people serious issues. As for the other options, consider me opposed. - Matt K.
Re: [HACKERS] Cluster on NAS and data center.
As someone who has been bitten by index corruption due to collation changes between the glibc versions that shipped with CentOS 6 and CentOS 7: don't even try to do this with anything other than the C collation. The default collation _will_ deterministically leave you with a silently corrupt database if you store anything other than ASCII text. Windows and Linux are going to implement en_US.utf-8 slightly differently, and Postgres currently relies on the OS to provide collation implementations. Go search for my mailing list post about the dangers of running across versions of glibc for more info. I'm going to echo everyone else's sentiment, though, and assert that what you are trying to do is a really insane idea. You might be able to make it appear to work, but as a DBA I would have absolutely no confidence in using that server for disaster recovery. If your company is saving money by not buying Windows licenses for your DR environment, you are far better off saving one more license and making both your production and DR servers Linux builds. - Matt K.
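To illustrate the point above with a sketch (not Postgres code): bytewise "C" collation order is defined purely by code points, so it is identical on every OS and libc version, while locale-aware orders come from OS collation tables, which is exactly what differs between glibc versions and Windows. Here a case-insensitive sort stands in for a locale-aware collation; the point is only that the two orders disagree, so a btree built under one ordering is silently invalid under the other.

```python
words = ["apple", "Banana", "cherry"]

# Plain str comparison in Python is code-point order, like the C collation;
# uppercase sorts before lowercase, identically on every platform:
c_order = sorted(words)
print(c_order)  # ['Banana', 'apple', 'cherry']

# A case-insensitive sort stands in here for a locale-aware collation:
locale_like_order = sorted(words, key=str.casefold)
print(locale_like_order)  # ['apple', 'Banana', 'cherry']
```

An index stores entries in a fixed order; if the comparison function changes underneath it, lookups descend the tree to the wrong place and rows silently go missing.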
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
> > On Fri, 2016-03-11 at 17:24 +0100, Michael Paquier wrote:
> > On Fri, Mar 11, 2016 at 5:19 PM, Anastasia Lubennikova wrote:
> > > BTW, if you know a good way to corrupt index (and do it reproducible) I'd be very glad to see it.
> > You can use for example dd in non-truncate mode to corrupt on-disk page data, say for example:
> >     dd if=/dev/random bs=8192 count=1 \
> >         seek=$BLOCK_ID of=base/$DBOID/$RELFILENODE \
> >         conv=notrunc
> I guess /dev/random is not very compatible with the "reproducible" requirement. I mean, it will reproducibly break the page, but pretty much completely, which is mostly what checksums are for.

You can actually pretty easily produce a test case by setting up streaming replication between servers running two different versions of glibc. I actually wrote a tool that spins up a pair of VMs using Vagrant and then sets them up as streaming replicas using Ansible. It provides a nice one-liner to get a streaming replica test environment going, and it will easily provide the cross-glibc test case. Technically, though, it belongs to Trip because I wrote it on company time. Let me see if I can open source a version of it later this week so that you can use it for testing. - Matt K.
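For a reproducible variant of the dd recipe quoted above, one option is to overwrite the page with a fixed byte pattern instead of /dev/random, so the corruption is identical on every run. A sketch, using a scratch file rather than a real relation file (paths and block number are invented for illustration):

```shell
# Build one 8 kB page of a fixed, repeating pattern -- deterministic,
# unlike /dev/random -- then splat it over block 3 of a scratch file.
BLOCK_ID=3
printf 'DEADBEEF%.0s' $(seq 1 1024) > /tmp/junkpage   # 1024 * 8 bytes = 8192
dd if=/tmp/junkpage of=/tmp/scratch_relation bs=8192 \
   seek=$BLOCK_ID conv=notrunc 2>/dev/null
wc -c < /tmp/scratch_relation   # (BLOCK_ID + 1) * 8192 = 32768 bytes
```

Against a real cluster you would point of= at base/$DBOID/$RELFILENODE with the server shut down, as in the quoted recipe.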
Re: [HACKERS] logical column ordering
Even if it does fit in memory, I suspect memory bandwidth is more important than clock cycles. http://people.freebsd.org/~lstewart/articles/cpumemory.pdf This paper is old, but the ratios should still be pretty accurate. Main memory is 240 clock cycles away and L1d is only 3. If the experiments in this paper still hold true, loading the 8K block into L1d is far more expensive than the CPU processing done once the block is in cache. When one adds NUMA contention on this shared resource, it's not that hard for a 40-core machine to starve for memory bandwidth, and for cores to sit idle waiting on main memory. Eliminating wasted space seems far more important even when everything already fits in memory.
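A rough back-of-envelope shows why wasted width matters even for cached data: shaving tuple width directly multiplies the tuples that fit in each 8 kB page, and therefore the useful work done per byte pulled across the memory bus. The overhead numbers below are assumptions for illustration, not exact Postgres figures.

```python
PAGE = 8192          # Postgres heap page size in bytes
PAGE_HEADER = 24     # rough page header size (assumption)
TUPLE_OVERHEAD = 27  # rough per-tuple header + item pointer (assumption)

for payload in (32, 64, 128):
    per_tuple = TUPLE_OVERHEAD + payload
    tuples = (PAGE - PAGE_HEADER) // per_tuple
    print(f"{payload:4d}-byte rows -> {tuples} tuples/page")
```

Halving the payload roughly doubles tuples per page, i.e. half the main-memory traffic for the same logical scan.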
Re: [HACKERS] Exposing the stats snapshot timestamp to SQL
> Yeah. The only use-case that's been suggested is detecting an unresponsive stats collector, and the main timestamp should be plenty for that.

Lately, I've spent most of my time investigating ways to increase qps. It turned out we were able to triple our throughput by monitoring experiments at highly granular time steps (1 to 2 seconds). Effects that were invisible with 30-second polls of the stats were obvious with 2-second polls. The problem with doing highly granular snapshots is that the Postgres counters are monotonically increasing, but only when stats are published. Currently you have no option except to divide by the delta of now() between the polling intervals. If you poll every 2 seconds, the max error is about 0.5/2, or 25%, which makes reading those numbers a bit noisy. Using (snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in that calculation should help smooth out that noise and show a clearer picture. However, I'm happy with the committed version. Thanks Tom. - Matt K.

On Thu, Feb 19, 2015 at 9:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Matt Kelly mkell...@gmail.com writes: Attached is the fixed version. (hopefully with the right mime-type and wrong extension. Alas, gmail doesn't let you set mime-types; time to find a new email client...)
Committed with a couple of changes:
* I changed the function name from pg_stat_snapshot_timestamp to pg_stat_get_snapshot_timestamp, which seemed to me to be the style in general use in the stats stuff for inquiry functions.
* The function should be marked stable not volatile, since its value doesn't change any faster than all the other stats inquiry functions.
regards, tom lane
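The smoothing trick described above can be sketched as arithmetic (all numbers invented): divide the counter delta by the delta of the stats snapshot timestamps rather than by the wall-clock polling interval, so publication lag stops distorting the rate.

```python
# Two polls of pg_stat_database, 2.5 s apart on our wall clock, but the
# collector only published 2.0 s worth of new stats between them.
old = {"xact_commit": 10_000, "snap_ts": 100.0}
new = {"xact_commit": 10_400, "snap_ts": 102.0}

commits = new["xact_commit"] - old["xact_commit"]
tps_wall = commits / 2.5                                # naive: 160 tps
tps_snap = commits / (new["snap_ts"] - old["snap_ts"])  # smoothed: 200 tps
print(tps_wall, tps_snap)
```

The snapshot-based denominator attributes the 400 commits to the interval in which they were actually counted, recovering the true 200 tps.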
Re: [HACKERS] Exposing the stats snapshot timestamp to SQL
Robert, I'll add it to the commitfest.

Jim, I'm not sure I understand what you mean? This new function follows the same conventions as everything else in the file. TimestampTz is just a typedef for int64. Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct. - Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby jim.na...@bluetreble.com wrote:
On 1/28/15 11:18 PM, Matt Kelly wrote:
In a previous thread Tom Lane said: (I'm also wondering if it'd make sense to expose the stats timestamp as a callable function, so that the case could be dealt with programmatically as well. But that's future-feature territory.) (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us) It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+     PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Exposing the stats snapshot timestamp to SQL
This is now: https://commitfest.postgresql.org/4/128/

On Thu, Jan 29, 2015 at 7:01 PM, Matt Kelly mkell...@gmail.com wrote:
Robert, I'll add it to the commitfest.
Jim, I'm not sure I understand what you mean? This new function follows the same conventions as everything else in the file. TimestampTz is just a typedef for int64. Functions like pg_stat_get_buf_alloc follow the exact same pattern on the int64 fields of the global stats struct. - Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby jim.na...@bluetreble.com wrote:
On 1/28/15 11:18 PM, Matt Kelly wrote:
In a previous thread Tom Lane said: (I'm also wondering if it'd make sense to expose the stats timestamp as a callable function, so that the case could be dealt with programmatically as well. But that's future-feature territory.) (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us) It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

I've reviewed the patch (though haven't tested it myself) and it looks good. The only thing I'm not sure of is this:

+ /* Get the timestamp of the current statistics snapshot */
+ Datum
+ pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
+ {
+     PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
+ }

Is the community OK with referencing stats_timestamp that way?
-- Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] Exposing the stats snapshot timestamp to SQL
In a previous thread Tom Lane said: (I'm also wondering if it'd make sense to expose the stats timestamp as a callable function, so that the case could be dealt with programmatically as well. But that's future-feature territory.) (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us) It seemed the appropriate scope for my first submission, and that feature has been on my wish list for a while, so I thought I'd grab it.

The main purpose of this patch is to expose the timestamp of the current stats snapshot so that it can be leveraged by monitoring code. The obvious case is alerting if the stats collector becomes unresponsive. The common use case is smoothing out graphs that are built from high-frequency monitoring of the stats collector. The timestamp is already available as part of PgStat_GlobalStats. This patch is just the boilerplate (plus docs and tests) needed to expose that value to SQL. It shouldn't impact anything else in the server. I'm not particularly attached to the function name, but I didn't have a better idea. The patch should apply cleanly to master. - Matt K

pg_stat_snapshot_timestamp_v1.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
> Sure, but nobody who is not a developer is going to care about that. A typical user who sees pgstat wait timeout, or doesn't, isn't going to be able to make anything at all out of that.

As a user, I wholeheartedly disagree. That warning helped me massively in diagnosing an unhealthy database server in the past at TripAdvisor (i.e. a high-end server-class box, not a Raspberry Pi).

I have realtime monitoring that looks at pg_stat_database at regular intervals, particularly the velocity of change of the xact_commit and xact_rollback columns, similar to how check_postgres does it: https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl#L4234

When one of those servers was unhealthy, it stopped reporting statistics for 30+ seconds at a time. My dashboard, which polled far more frequently than that, indicated the server was normally processing 0 tps with intermittent spikes. I went directly onto the server and sampled pg_stat_database. That warning was the only thing that directly indicated that the statistics collector was not to be trusted. The collector was obviously a victim of what was going on in the server, but it's pretty important to know when your methods for measuring server health are lying to you. The spiky TPS at first glance looks like some sort of live lock, not just an overloaded server. Now I know: 0 change in stats = collector broken.

Rereading the docs: "Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server)." Without context this merely reads: "We sleep for 500ms, plus the time to write the file, plus whenever the OS decides to enforce the timer interrupt... so like 550-650ms." It doesn't read: "When the server is unhealthy, but _still_ serving queries, the stats collector might not be able to keep up and will just stop reporting stats altogether."

I think the warning is incredibly valuable.
Along those lines, I'd also love to see a pg_stat_snapshot_timestamp() for monitoring code to use to determine if it's using a stale snapshot, as well as to help smooth graphs of statistics built from highly granular snapshotting. - Matt Kelly
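The "0 change in stats = collector broken" rule above can be sketched as a trivial check (the function name and threshold are invented for illustration, not part of check_postgres):

```python
def collector_stalled(xact_commit_samples, min_polls=3):
    """Flag a stalled stats collector: a server that is serving queries
    but whose commit counter has not moved across consecutive polls."""
    recent = xact_commit_samples[-min_polls:]
    return len(recent) == min_polls and len(set(recent)) == 1

print(collector_stalled([100, 150, 150, 150]))  # True: counter frozen
print(collector_stalled([100, 150, 200, 260]))  # False: stats flowing
```

On a genuinely idle server this check false-positives, which is exactly why a snapshot timestamp is the better signal: a frozen counter with an advancing timestamp means idle, a frozen timestamp means the collector is behind.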
Re: [HACKERS] Async execution of postgres_fdw.
I'm trying to compare v5 and v6 on my laptop right now. Apparently my laptop is quite a bit faster than your machine, because the tests complete in roughly 3.3 seconds. I added more data and didn't see anything other than noise. (Then again, the queries were dominated by the disk sort, so I should retry with larger work_mem.) I'll try it again when I have more time to play with it. I suspect the benefits would be clearer over a network.

Larger than default work_mem, yes, but I think one of the prime use cases for the fdw is more warehouse-style situations (PostgresXL-style use cases). In those cases, work_mem might reasonably be set to 1GB. Then even with 10KB rows you can fetch a hundred thousand rows and still stay under work_mem. A simpler change would be to vary the fetch size with respect to work_mem. Half-baked idea: I know it's the wrong time in the execution phase, but if you are using remote estimates for cost, there should also be a row width estimate, which I believe is based on pg_statistic and its mean column width.

It's actually a pity that there is no way to set fetch sizes based on "give me as many tuples as will fit in less than x amount of memory", because that is almost always exactly what you want. Even when writing application code, I've never actually wanted precisely 10,000 rows; I've always wanted a reasonable-size chunk that could fit into memory and then backed my way into how many rows that is. If we were to extend FETCH to support syntax like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to estimate the value on the fly.

The async stuff, however, is a huge improvement over the last time I played with the fdw. The two active postgres processes were easily consuming a core and a half of CPU. I think it's not worth tying these two things together; they are probably worth making two separate discussions and separate patches.
- Matt Kelly

*Just sanity-checking myself: shutting down the server, applying the different patch, running 'make clean install' in postgres_fdw, and then restarting the server should obviously be sufficient to make sure it's running the new code, because that is all linked at runtime, right?
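The memory-budget idea above reduces to simple arithmetic (the numbers are the ones from the example, not real estimates): derive a fetch size from a byte budget and the estimated mean row width.

```python
work_mem_kb = 1_048_576     # 1 GB work_mem, the warehouse-style setting
mean_row_width = 10_240     # ~10 kB rows, per the example above

budget_bytes = work_mem_kb * 1024
fetch_size = budget_bytes // mean_row_width
print(fetch_size)  # 104857 rows per fetch still fits under the budget
```

This is the calculation a hypothetical FETCH FORWARD '10MB' would do internally, with the row width known to the executor instead of guessed by the client.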
Re: [HACKERS] Async execution of postgres_fdw.
I think it's telling that varying the fetch size doubled the performance, even on localhost. If you were to repeat this test across a network, the performance difference would be far more drastic. I understand the desire to keep the fetch size small by default, but I think your results demonstrate how important this value is. At the very least, it is worth reconsidering this arbitrary value. However, I think the real solution is to make it configurable. It should probably be a new option on the foreign server or table, but an argument could be made for it to be global across the server, just like work_mem. Obviously this shouldn't block your current patch, but it's worth revisiting. - Matt Kelly