Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-06 Thread Matt Kelly
It's worth noting that the JDBC driver's behavior when you switch back to
autocommit is to immediately commit the open transaction.

Personally, I think committing immediately or erroring are both unsurprising
behaviors.  The current behavior is surprising and obviously wrong.
Rolling back without an error would be very surprising (no other database
API I know of does that), and issues around that behavior would take
forever to debug.  And committing after the next statement is definitely
the most surprising behavior suggested.

IMHO, committing immediately and erroring are both valid.  I'd prefer the
error in principle, and per the law of bad code I'm sure that, although no
one has ever intended to use this behavior, there is probably some code out
there relying on it for "correctness".  I think a hard failure that makes
the dev add an explicit commit is least likely to cause people serious
issues.  As for the other options, consider me opposed.
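To make the options concrete, here's a toy Python model of the "commit
immediately" semantics described above (mirroring the JDBC driver).  The
class and method names are mine, invented for illustration; this is not
psql's actual code:

```python
class Session:
    """Toy model of "commit the open transaction when autocommit is
    switched back on" -- a sketch of the proposal, not real psql code."""
    def __init__(self):
        self.autocommit = False
        self.in_tx = False
        self.commits = 0

    def execute(self, sql):
        # With autocommit off, the first statement implicitly opens a
        # transaction, just like psql does today.
        if not self.autocommit and not self.in_tx:
            self.in_tx = True

    def set_autocommit(self, on):
        if on and self.in_tx:
            self.commits += 1   # commit the open transaction right away
            self.in_tx = False
        self.autocommit = on

s = Session()
s.execute("INSERT INTO t VALUES (1)")
s.set_autocommit(True)
print(s.commits, s.in_tx)  # 1 False
```

The erroring alternative would simply raise in set_autocommit() instead of
bumping the commit counter.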

- Matt K.


Re: [HACKERS] Cluster on NAS and data center.

2016-07-04 Thread Matt Kelly
As someone who has been bitten by index corruption due to collation changes
between the glibc versions that shipped with CentOS 6 and CentOS 7, don't
even try to do this with anything other than the C collation. The default
collation _will_ deterministically leave you with a silently corrupt
database if you store anything other than ASCII text. Windows and Linux are
going to implement en_US.utf-8 slightly differently, and Postgres currently
relies on the OS to provide collation implementations. Go search for my
mailing list post about the dangers of running across versions of glibc for
more info.
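For anyone wondering why a collation mismatch silently breaks an index
rather than failing loudly, here's a toy Python sketch: the "index" is a
list kept sorted under one collation's rules and then probed under
another's.  The strings and the key function are invented for illustration;
real btree descent fails the same way when strcoll() changes under it:

```python
import bisect

# Build the "index" sorted case-insensitively (stand-in for one glibc's
# en_US rules).
index = sorted(["apple", "Banana", "cherry"], key=str.lower)

# Probe it with plain byte-wise comparison (stand-in for a different
# collation implementation).  bisect descends the wrong way and misses
# a key that is actually present.
pos = bisect.bisect_left(index, "Banana")
found = pos < len(index) and index[pos] == "Banana"
print(found)  # False, even though "Banana" is in the index
```

The row isn't gone, but every lookup that trusts the index ordering will
say it is, which is exactly the silent-corruption failure mode.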

I'm going to echo everyone else's sentiment though, and assert that what
you are trying to do is really an insane idea. You might be able to make it
appear like it's working, but as a DBA I would have absolutely no confidence
in using that server for disaster recovery.

If your company is saving money by not getting Windows licenses for your DR
environment, you are far better off just saving one more license and making
both your production and DR server be Linux builds.

- Matt K.


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-12 Thread Matt Kelly
>
> On Fri, 2016-03-11 at 17:24 +0100, Michael Paquier wrote:
> > On Fri, Mar 11, 2016 at 5:19 PM, Anastasia Lubennikova wrote:
> > >
> > > BTW, if you know a good way to corrupt index (and do it
> > > reproducible) I'd be
> > > very glad to see it.
> > You can use for example dd in non-truncate mode to corrupt on-disk
> > page data, say that for example:
> > dd if=/dev/random bs=8192 count=1 \
> > seek=$BLOCK_ID of=base/$DBOID/$RELFILENODE \
> > conv=notrunc
> I guess /dev/random is not very compatible with the "reproducible"
> requirement. I mean, it will reproducibly break the page, but pretty
> much completely, which is mostly what checksums are for.


You can actually pretty easily produce a test case by setting up streaming
replication between servers running two different versions of glibc.

I actually wrote a tool that spins up a pair of VMs using vagrant and then
sets them up as streaming replicas using ansible.  It provides a nice
one-liner to get a streaming replica test environment going, and it will
easily provide the cross-glibc test case.  Technically, though, it belongs
to Trip because I wrote it on company time.  Let me see if I can open
source a version of it later this week so that you can use it for testing.
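In the meantime, here's a minimal Python sketch of the same deterministic
corruption idea as the dd command quoted above: a fixed byte pattern
instead of /dev/random, so the breakage is byte-for-byte reproducible.  The
scratch file stands in for a real base/$DBOID/$RELFILENODE:

```python
import os

BLOCK = 8192
path = "scratch_relfile"  # stand-in for a real relation file path

# Build a 4-block dummy file in place of a real relation file.
with open(path, "wb") as f:
    f.write(b"\x00" * (BLOCK * 4))

# Deterministically clobber block 2 with a fixed pattern; re-running
# this always produces identical on-disk bytes.
with open(path, "r+b") as f:
    f.seek(2 * BLOCK)
    f.write(b"\xde\xad" * (BLOCK // 2))

size = os.path.getsize(path)  # conv=notrunc equivalent: size unchanged
print(size)  # 32768
```

Pointing this at an actual index's relfilenode (with the server shut down)
gives a corruption test case you can replay exactly.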

- Matt K.


Re: [HACKERS] logical column ordering

2015-02-27 Thread Matt Kelly

 Even if it does fit in memory I suspect memory bandwidth is more important
 than clock cycles.


http://people.freebsd.org/~lstewart/articles/cpumemory.pdf

This paper is old but the ratios should still be pretty accurate.  Main
memory is 240 clock cycles away and L1d is only 3.  If the experiments in
this paper still hold true, loading the 8K block into L1d is far more
expensive than the CPU processing done once the block is in cache.

When one adds NUMA to the contention on this shared resource, it's not
that hard for a 40-core machine to starve for memory bandwidth, and for
cores to sit idle waiting for main memory.  Eliminating wasted space seems
far more important even when everything could fit in memory already.
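To put rough numbers on that, a back-of-envelope sketch using the latency
figures cited above (assumed values from the paper; real hardware varies):

```python
# Latency figures from Drepper's paper, in CPU cycles (assumed constants).
L1D_LATENCY = 3
MAIN_MEM_LATENCY = 240
CACHE_LINE = 64       # bytes per cache line
PAGE = 8192           # one postgres block

lines = PAGE // CACHE_LINE             # 128 cache lines per block
worst_case = lines * MAIN_MEM_LATENCY  # every line misses: 30720 cycles
best_case = lines * L1D_LATENCY        # block already in L1d: 384 cycles
print(worst_case // best_case)         # 80
```

Even ignoring prefetching, that's two orders of magnitude between touching
a cached block and pulling one from main memory, which is why shaving bytes
off the tuple layout pays even when the working set "fits in RAM".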


Re: [HACKERS] Exposing the stats snapshot timestamp to SQL

2015-02-19 Thread Matt Kelly

 Yeah.  The only use-case that's been suggested is detecting an
 unresponsive stats collector, and the main timestamp should be plenty for
 that.

Lately, I've spent most of my time investigating how to increase qps.  It
turned out we were able to triple our throughput by monitoring experiments
at highly granular time steps (1 to 2 seconds).  Effects that were
invisible with 30-second polls of the stats were obvious with 2-second
polls.

The problem with doing highly granular snapshots is that the postgres
counters are monotonically increasing, but only when stats are published.
Currently you have no option except to divide by the delta of now() between
the polling intervals. If you poll every 2 seconds, the max error is about
0.5/2, or 25%, which makes reading those numbers a bit noisy.  Using
(snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in
that calculation should help to smooth out that noise and show a clearer
picture.
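To illustrate the denominator point, a small Python sketch (the counter
values and timestamps are invented; the point is what you divide by):

```python
from datetime import datetime, timedelta

def rate(old_count, new_count, old_ts, new_ts):
    """Counter delta divided by the stats-snapshot timestamp delta,
    not by the wall-clock polling interval."""
    return (new_count - old_count) / (new_ts - old_ts).total_seconds()

t0 = datetime(2015, 2, 19, 12, 0, 0)
# We polled 2 seconds apart, but the collector only published snapshots
# 1.5 seconds apart.
t1 = t0 + timedelta(seconds=1.5)

print(rate(1000, 1600, t0, t1))  # 400.0 tps
# Dividing the same 600-commit delta by the 2s wall clock would report
# 300 tps -- a 25% undercount, exactly the noise described above.
```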

However, I'm happy with the committed version. Thanks Tom.

- Matt K.


On Thu, Feb 19, 2015 at 9:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Matt Kelly mkell...@gmail.com writes:
  Attached is the fixed version. (hopefully with the right mime-type and
  wrong extension.  Alas, gmail doesn't let you set mime-types; time to
 find
  a new email client...)

 Committed with a couple of changes:

 * I changed the function name from pg_stat_snapshot_timestamp to
 pg_stat_get_snapshot_timestamp, which seemed to me to be the style
 in general use in the stats stuff for inquiry functions.

 * The function should be marked stable not volatile, since its value
 doesn't change any faster than all the other stats inquiry functions.

 regards, tom lane



Re: [HACKERS] Exposing the stats snapshot timestamp to SQL

2015-01-29 Thread Matt Kelly
Robert, I'll add it to the commitfest.

Jim, I'm not sure I understand what you mean.  This new function follows
the same conventions as everything else in the file.  TimestampTz is just a
typedef for int64.  Functions like pg_stat_get_buf_alloc follow the exact
same pattern on the int64 fields of the global stats struct.

- Matt K.

On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 1/28/15 11:18 PM, Matt Kelly wrote:

 In a previous thread Tom Lane said:

 (I'm also wondering if it'd make sense to expose the stats timestamp
 as a callable function, so that the case could be dealt with
 programmatically as well.  But that's future-feature territory.)

 (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us)

 It seemed the appropriate scope for my first submission, and that feature
 has been on my wish list for a while, so I thought I'd grab it.


 I've reviewed the patch (though haven't tested it myself) and it looks
 good. The only thing I'm not sure of is this:

 + /* Get the timestamp of the current statistics snapshot */
 + Datum
 + pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
 + {
 +   PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
 + }

 Is the community OK with referencing stats_timestamp that way?
 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] Exposing the stats snapshot timestamp to SQL

2015-01-29 Thread Matt Kelly
This is now: https://commitfest.postgresql.org/4/128/

On Thu, Jan 29, 2015 at 7:01 PM, Matt Kelly mkell...@gmail.com wrote:

 Robert, I'll add it to the commitfest.

 Jim, I'm not sure I understand what you mean?  This new function follows
 the same conventions as everything else in the file.  TimestampTz is just a
 typedef for int64.  Functions like pg_stat_get_buf_alloc follow the exact
 same pattern on the int64 fields of the global stats struct.

 - Matt K.

 On Thu, Jan 29, 2015 at 6:49 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 1/28/15 11:18 PM, Matt Kelly wrote:

 In a previous thread Tom Lane said:

 (I'm also wondering if it'd make sense to expose the stats timestamp
 as a callable function, so that the case could be dealt with
 programmatically as well.  But that's future-feature territory.)

 (http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us)

 It seemed the appropriate scope for my first submission, and that
 feature has been on my wish list for a while, so I thought I'd grab it.


 I've reviewed the patch (though haven't tested it myself) and it looks
 good. The only thing I'm not sure of is this:

 + /* Get the timestamp of the current statistics snapshot */
 + Datum
 + pg_stat_snapshot_timestamp(PG_FUNCTION_ARGS)
 + {
 +   PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
 + }

 Is the community OK with referencing stats_timestamp that way?
 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com





[HACKERS] Exposing the stats snapshot timestamp to SQL

2015-01-28 Thread Matt Kelly
In a previous thread Tom Lane said:

(I'm also wondering if it'd make sense to expose the stats timestamp
 as a callable function, so that the case could be dealt with
 programmatically as well.  But that's future-feature territory.)


(http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us)

It seemed the appropriate scope for my first submission, and that feature
has been on my wish list for a while, so I thought I'd grab it.

Main purpose of this patch is to expose the timestamp of the current stats
snapshot so that it can be leveraged by monitoring code.  The obvious case
is alerting if the stats collector becomes unresponsive.  The common use
case is to smooth out graphs which are built from high frequency monitoring
of the stats collector.

The timestamp is already available as part of PgStat_GlobalStats.  This
patch is just the boilerplate (+docs & tests) needed to expose that value
to SQL.  It shouldn't impact anything else in the server.

I'm not particularly attached to the function name, but I didn't have a
better idea.

The patch should apply cleanly to master.

- Matt K


pg_stat_snapshot_timestamp_v1.patch
Description: Binary data



Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-21 Thread Matt Kelly

 Sure, but nobody who is not a developer is going to care about that.
 A typical user who sees pgstat wait timeout, or doesn't, isn't going
 to be able to make anything at all out of that.


As a user, I wholeheartedly disagree.

That warning helped me massively in diagnosing an unhealthy database server
in the past at TripAdvisor (i.e. a high-end server-class box, not a
Raspberry Pi).  I have realtime monitoring that looks at pg_stat_database
at regular intervals, particularly at the velocity of change of the
xact_commit and xact_rollback columns, similar to how check_postgres does
it.
https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl#L4234

When one of those servers was unhealthy, it stopped reporting statistics
for 30+ seconds at a time.  My dashboard, which polled far more frequently
than that, indicated the server was normally processing 0 tps with
intermittent spikes. I went directly onto the server and sampled
pg_stat_database.  That warning was the only thing that directly indicated
that the statistics collector was not to be trusted.  The collector was
obviously a victim of what was going on in the server, but it's pretty
important to know when your methods for measuring server health are lying
to you.  The spiky TPS at first glance looks like some sort of livelock,
not just an overloaded server.

Now, I know: 0 change in stats = collector broken.  Rereading the docs,

 Also, the collector itself emits a new report at most once per
 PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building
 the server).


Without context this merely reads: "We sleep for 500ms, plus the time to
write the file, plus whenever the OS decides to enforce the timer
interrupt... so around 550-650ms."  It doesn't read: "When the server is
unhealthy, but _still_ serving queries, the stats collector might not be
able to keep up and will just stop reporting stats altogether."

I think the warning is incredibly valuable.  Along those lines I'd also
love to see a pg_stat_snapshot_timestamp() for monitoring code to use to
determine if it's using a stale snapshot, as well as to help smooth
graphs of the statistics that are based on highly granular snapshotting.
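A staleness check built on such a function could be as simple as this
Python sketch (the helper name, threshold, and timestamps are all mine,
invented for illustration):

```python
from datetime import datetime, timedelta

def stats_are_stale(snapshot_ts, now, max_age=timedelta(seconds=10)):
    # Flag the collector as untrustworthy when its snapshot timestamp
    # lags wall clock by more than max_age -- the "0 tps but actually
    # broken collector" case described above.
    return now - snapshot_ts > max_age

now = datetime(2015, 1, 21, 12, 0, 30)
print(stats_are_stale(datetime(2015, 1, 21, 12, 0, 0), now))   # True: 30s behind
print(stats_are_stale(datetime(2015, 1, 21, 12, 0, 29), now))  # False: 1s behind
```

With that in the dashboard, the spiky-0-tps graph would have carried its
own "don't trust me" flag.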

- Matt Kelly


Re: [HACKERS] Async execution of postgres_fdw.

2015-01-20 Thread Matt Kelly
I'm trying to compare v5 and v6 in my laptop right now.  Apparently my
laptop is quite a bit faster than your machine because the tests complete
in roughly 3.3 seconds.

I added more data and didn't see anything other than noise.  (Then again,
the queries were dominated by the disk sort, so I should retry with larger
work_mem.)  I'll try it again when I have more time to play with it.  I
suspect the benefits would be clearer over a network.

Larger than the default work_mem, yes, but I think one of the prime use
cases for the fdw is more warehouse-style situations (PostgresXL-style use
cases).  In those cases, work_mem might reasonably be set to 1GB.  Then
even if you have 10KB rows you can fetch a million rows and still use
less than work_mem.  A simpler change would be to vary the fetch size with
respect to work_mem.

Half-baked idea: I know it's the wrong time in the execution phase, but if
you are using remote estimates for cost, there should also be a row width
estimate, which I believe is based on pg_statistic and its mean column
width.

It's actually a pity that there is no way to set fetch sizes in terms of
"give me as many tuples as will fit in less than x amount of memory",
because that is almost always exactly what you want.  Even when writing
application code, I've never actually wanted precisely 10,000 rows; I've
always wanted a reasonable-size chunk that could fit into memory and then
backed my way into how many rows that is.  If we were to extend FETCH to
support syntax like FETCH FORWARD '10MB' FROM ...; then we would eliminate
the need to estimate the value on the fly.
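Backing into a row count from a memory budget is a one-liner; here's a
rough Python sketch (the heuristic, names, and cap are invented, and the
mean row width would come from something like pg_statistic):

```python
def rows_per_fetch(budget_bytes, avg_row_width, cap=1_000_000):
    # Fetch as many rows as fit in the memory budget, with a sanity cap
    # and a floor of one row.
    return max(1, min(cap, budget_bytes // avg_row_width))

# 1GB budget, 10KB mean row width -> roughly the million-row fetch
# described above.
print(rows_per_fetch(1 << 30, 10_240))  # 104857

# Narrow rows hit the cap instead of producing an absurd fetch size.
print(rows_per_fetch(1 << 30, 100))  # 1000000
```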

The async stuff, however, is a huge improvement over the last time I played
with the fdw.  The two active postgres processes were easily consuming a
core and a half of CPU.  I think it's not worth tying these two things
together; it's probably worth making them two separate discussions and
separate patches.

- Matt Kelly

*Just sanity-checking myself: shutting down the server, applying the
different patch, running 'make clean install' in postgres_fdw, and then
restarting the server should obviously be sufficient to make sure it's
running the new code, because that is all linked at runtime, right?


Re: [HACKERS] Async execution of postgres_fdw.

2015-01-19 Thread Matt Kelly
I think it's telling that varying the fetch size doubled the performance,
even on localhost.  If you were to repeat this test across a network, the
performance difference would be far more drastic.

I understand the desire to keep the fetch size small by default, but I
think your results demonstrate how important the value is.  At the very
least, it is worth reconsidering this arbitrary value.  However, I think
the real solution is to make this configurable.  It probably should be a
new option on the foreign server or table, but an argument could be made
for it to be global across the server just like work_mem.

Obviously, this shouldn't block your current patch, but it's worth
revisiting.

- Matt Kelly