[PERFORM] Impact of track_activity_query_size on high traffic OLTP system

2017-04-13 Thread Jeremy Finzel
I have found some examples of people tweaking the track_activity_query_size
parameter to various settings such as 4000, 10000, and 15000, but little
discussion of the performance impact on memory usage.  What I don't have a
good sense of is how significant this would be for a high traffic system
with rapid connection creation/destruction, say 1000s per second.  In such a
case, would there be a reason to hesitate raising it to 10000 from 1024?  Is
10k of memory insignificant?  Any direction here is much appreciated,
including a good way to benchmark this kind of thing.
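
For rough context, my understanding is that this buffer is allocated per
backend slot in shared memory at server start, so a back-of-envelope sizing
sketch (assuming ~1000 allowed connections; not an exact accounting) looks
like:

-- Rough sizing sketch (assumption: roughly track_activity_query_size bytes
-- per allowed backend, allocated at startup):
--   1024 bytes  * 1000 backends ~= 1 MB   (default)
--   10000 bytes * 1000 backends ~= 10 MB  (proposed)
-- The parameter only takes effect after a restart:
ALTER SYSTEM SET track_activity_query_size = 10000;
-- ... restart the server, then verify:
SHOW track_activity_query_size;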

Thanks!


Re: [PERFORM] index of only not null, use function index?

2017-06-08 Thread Jeremy Finzel
Normally, I find that in these situations, it makes sense to index the
primary key of the table WHERE col is not null, because it will usually
cover the largest number of cases, and is much better than a two-value
boolean index, for example.
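
A minimal sketch of that pattern (table and column names here are
hypothetical, just to illustrate):

-- Index the primary key, restricted to rows where col is populated:
CREATE INDEX orders_pk_col_not_null_idx
    ON orders (order_id)
    WHERE col IS NOT NULL;

-- which can serve queries like:
SELECT order_id FROM orders WHERE col IS NOT NULL;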

On Thu, Jun 8, 2017 at 9:58 AM, Tom Lane  wrote:

> Merlin Moncure  writes:
> > On Mon, May 22, 2017 at 10:17 AM, Ariel  wrote:
> >> Should I do:
> >>
> >> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL
> >>
> >> or:
> >>
> >> CREATE INDEX ON table (col) WHERE col IS NOT NULL
> >>
> >> I'm thinking the first index will make a smaller, simpler, index since I
> >> don't actually need to index the value of the column. But are there any
> >> drawbacks I may not be aware of? Or perhaps there are no actual
> benefits?
>
> > You are correct.  I don't see any downside to converting to bool; this
> > will be more efficient especially if 'col' is large at the small cost
> > of some generality.
>
> Depends on the datatype really.  Because of alignment considerations,
> the index tuples will be the same size for any column value <= 4 bytes,
> or <= 8 bytes on 64-bit hardware.  So if this is an integer column,
> or even bigint on 64-bit, you won't save any space with the first
> index definition.  If it's a text column with an average width larger
> than what I just mentioned, you could save some space that way.
>
> In general, indexes on expressions are a tad more expensive to maintain
> than indexes on plain column values.  And the second index at least has
> the potential to be useful for other queries than the one you're thinking
> about.  So personally I'd go with the second definition unless you can
> show that there's a really meaningful space savings with the first one.
>
> > Having said that, what I typically do in such
> > cases (this comes a lot in database driven work queues) something like
> > this:
> > CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL;
>
> Right, you can frequently get a lot of mileage out of indexing something
> that's unrelated to the predicate condition, but is also needed by the
> query you want to optimize.
>
> regards, tom lane
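
Putting the advice from this thread together, a sketch of the work-queue
variant Merlin describes (names are hypothetical): index an ordering column
rather than the predicate column, restricted to the rows of interest.

-- Partial index on the ordering column, limited to rows still needing work:
CREATE INDEX work_queue_order_col_idx
    ON work_queue (order_col)
    WHERE col IS NOT NULL;

-- Typical query it serves:
SELECT id
FROM work_queue
WHERE col IS NOT NULL
ORDER BY order_col
LIMIT 1;

-- And a quick way to compare the on-disk size of candidate indexes:
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname = 'work_queue_order_col_idx';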


[PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-14 Thread Jeremy Finzel
This particular db is on 9.3.15.  Recently we had a serious performance
degradation related to a batch job that creates 4-5 temp tables and 5
indexes.  It is a really badly written job but what really confuses us is
that this job has been running for years with no issue remotely approaching
this one.  We are also using pgpool.

The job would kick off with 20-30 similar queries running at once.  Each run
normally takes only 30ms or so - it only operates on one customer at a time
(yes, it's horribly written).  All of a sudden the cluster started thrashing
and performance seriously degraded.  We tried a number of things with no
success:

   - Analyzed the whole database
   - Turned off full logging
   - Turned off synchronous commit
   - Vacuumed several of the catalog tables
   - Checked if we had an abnormally high amount of traffic this time - we
   didn't
   - No abnormal disk/network issues (we would have seen much larger issues
   if that had been the case)
   - Tried turning down the number of app nodes running

What ended up completely resolving the issue was converting the query to
use CTEs instead of temp tables.  That meant we avoided the disk writes, the
catalog churn, and the useless indexes.  However, we are baffled as to why
this could make such a big difference when we had no issue like this before,
and we have seen no systematic performance degradation in our system.
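
For illustration, a minimal sketch of the kind of rewrite (hypothetical
table and column names; the real job is more involved):

-- Before (sketch): per-run temp table plus index, which writes to disk and
-- churns the catalogs on every execution.
CREATE TEMP TABLE customer_batch AS
    SELECT order_id, total FROM orders WHERE customer_id = 42;
CREATE INDEX ON customer_batch (order_id);
SELECT sum(total) FROM customer_batch;
DROP TABLE customer_batch;

-- After (sketch): the same intermediate result as a CTE - nothing is
-- created in the catalogs and small results stay in memory.
WITH customer_batch AS (
    SELECT order_id, total FROM orders WHERE customer_id = 42
)
SELECT sum(total) FROM customer_batch;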

Any insights would be greatly appreciated, as we are concerned about not
knowing the root cause.

Thanks,
Jeremy


Re: [PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-14 Thread Jeremy Finzel
On Mon, Aug 14, 2017 at 3:01 PM, Scott Marlowe 
wrote:

> On Mon, Aug 14, 2017 at 1:53 PM, Jeremy Finzel  wrote:
> > This particular db is on 9.3.15.  Recently we had a serious performance
> > degradation related to a batch job that creates 4-5 temp tables and 5
> > indexes.  It is a really badly written job but what really confuses us is
> > that this job has been running for years with no issue remotely
> approaching
> > this one.  We are also using pgpool.
> >
> > The job would kick off with 20-30 of similar queries running at once.
> The
> > thing normally takes only 30ms or so to run - it only operates on 1
> customer
> > at a time (yes, it's horribly written).  All of a sudden the cluster
> started
> > thrashing and performance seriously degraded.  We tried a number of
> things
> > with no success:
> >
> > Analyzed the whole database
> > Turned off full logging
> > Turned off synchronous commit
> > Vacuumed several of the catalog tables
> > Checked if we had an abnormal high amount of traffic this time - we
> didn't
> > No abnormal disk/network issues (we would have seen much larger issues if
> > that had been the case)
> > Tried turning down the number of app nodes running
> >
> > What ended up completely resolving the issue was converting the query to
> use
> > ctes instead of temp tables.  That means we avoided the disk writing and
> the
> > catalog churn, and useless indexes.  However, we are baffled as to why
> this
> > could make such a big difference when we had no issue like this before,
> and
> > we have seen no systematic performance degradation in our system.
> >
> > Any insights would be greatly appreciated, as we are concerned not
> knowing
> > the root cause.
>
> How are your disks setup? One big drive with everything on it?
> Separate disks for pg_xlog and pg's data dir and the OS logging? IO
> contention is one of the big killers of db performance.


It's one SAN volume (SSD) for the data and WAL files.  But logging, memory
spilling, and archived xlogs go to a local SSD disk.


> Logging likely isn't your problem, but yeah you don't need to log
> ERRYTHANG to see the problem either. Log long running queries temp
> usage, buffer usage, query plans on slow queries, stuff like that.
>
> You've likely hit a "tipping point" in terms of data size. Either it's
> caused the query planner to make a bad decision, or you're spilling to
> disk a lot more than you used to.

> Be sure to log temporary stuff with log_temp_files = 0 in your
> postgresql.conf and then look for temporary file in your logs. I bet
> you've started spilling into the same place as your temp tables are
> going, and by default that's your data directory. Adding another drive
> and moving pgsql's temp table space to it might help.
>

We would not have competition between disk spilling and temp tables because
of what I described above - they are going to two different places.  Also, I
neglected to mention that we turned on auto_explain during this crisis, and
found the query plan was good; it was just taking forever due to thrashing
within seconds of kicking off the batches.  I did NOT turn on log_analyze
and timing, but it was enough to see there was no apparent query plan
regression.  Also, we had no change in the performance/plan after
re-analyzing all tables.
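
For reference, a sketch of one way to enable auto_explain for a spot check
(the thresholds shown are illustrative, not our exact settings):

-- Load the module in the current session (it can also go in
-- shared_preload_libraries) and log plans for anything slower than 100ms:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';
SET auto_explain.log_analyze = off;   -- plan only, no per-node timings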


> Also increasing work_mem (but don't go crazy, it's per sort, so can
> multiply fast on a busy server)
>

We are already up at 400MB, and this query was using memory in the low KB
levels because it is very small (1 - 20 rows of data per temp table, and no
expensive selects with missing indexes or anything).
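
As a cross-check on whether anything is spilling past work_mem, the
cumulative temp-file counters in pg_stat_database are a quick signal (a
sketch; the database name is hypothetical):

-- Temp file usage per database since the last stats reset:
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname = 'mydb';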


> Also log your query plans or run explain / explain analyze on the slow
> queries to see what they're doing that's so expensive.
>

Yes, we did do that, and there was nothing remarkable about the plans when
we ran them in production.  All we saw was that, over time, the actual
execution time (along with everything else on the entire system) slowed down
more and more as thrashing increased.  But we found no evidence of a plan
regression.

Thank you!  Any more feedback is much appreciated.


Re: [PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-15 Thread Jeremy Finzel
>
> > Not so.
> >
> > This system has no defined temp_tablespace however spillage due to
> > sorting/hashing that exceeds work_mem goes to base/pgsql_tmp which we
> > have symlinked out to a local SSD drive.
>
> Which is also where temp tables are created.
>

This isn't true, at least in our environment.  Just as proof, I have
created a couple of temp tables, and querying the relfilenodes, they only
show up under base/<database oid>/t4_<relfilenode>:

test=# CREATE TEMP TABLE foo(id int);
CREATE TABLE
test=# INSERT INTO foo SELECT * FROM generate_series(1,100);
INSERT 0 100
test=# CREATE TEMP TABLE bar();
CREATE TABLE
test=# SELECT relfilenode FROM pg_class WHERE relname IN('foo','bar');
 relfilenode
-------------
       20941
       20944
(2 rows)

postgres@foo:/san//pgdata/base$ ls -l
total 44
drwx------ 2 postgres postgres  4096 Jul  7 15:19 1
drwx------ 2 postgres postgres  4096 Nov 29  2016 12408
drwx------ 2 postgres postgres  4096 Jul 14 14:00 12409
drwx------ 2 postgres postgres 12288 Jul  7 15:19 18289
drwx------ 2 postgres postgres 12288 Jul  7 15:19 18803
drwx------ 2 postgres postgres  4096 Jul  7 15:19 20613
drwx------ 2 postgres postgres  4096 Aug 15 08:06 20886
lrwxrwxrwx 1 postgres postgres    30 Jul  7 15:15 pgsql_tmp ->
/local/pgsql_tmp/9.6/

postgres@pgsnap05:/san//pgdata/base$ ls -l 20886 | grep
'20941\|20944'
-rw------- 1 postgres postgres 8192 Aug 15 10:55 t4_20941
-rw------- 1 postgres postgres    0 Aug 15 10:55 t4_20944
postgres@pgsnap05:/san/dba_dev_d/pgdata/base$ cd pgsql_tmp
postgres@pgsnap05:/san/dba_dev_d/pgdata/base/pgsql_tmp$ ls -l
total 0
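
The same check can be done from within psql (a sketch):

-- pg_relation_filepath() reports a relation's file location relative to the
-- data directory; for these temp tables it points under base/, not pgsql_tmp.
SELECT relname, pg_relation_filepath(oid)
FROM pg_class
WHERE relname IN ('foo', 'bar');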


Re: [PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-18 Thread Jeremy Finzel
On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe 
wrote:

> So do iostat or iotop show you if / where your disks are working
> hardest? Or is this CPU overhead that's killing performance?
>

Sorry for the delayed reply.  I took a more detailed look at the query
plans from our problem query during this incident.  There are actually 6
plans, because there were 6 unique queries.  I traced one query through our
logs and found something really interesting: the first 5 queries all create
temp tables, and each of them took upwards of 500ms to run.  The final
query, however, is a simple select from the last temp table, and that query
took 0.035ms!  This really confirms that somehow the issue had to do with
*writing* to the SAN, I think.  Of course this doesn't answer a whole lot,
because we had no other apparent issues with write performance at all.
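
As an aside, another way to pull per-statement timings like these without
full statement logging is pg_stat_statements (a sketch, assuming the
extension is installed):

-- Statements ranked by cumulative time; total_time is in milliseconds:
SELECT query, calls, round((total_time / calls)::numeric, 2) AS avg_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;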

I also provide some graphs below.

7pm-3am on 8/10 (first incidents were around 10:30pm, other incidents ~1am,
2am):

Local Disk IO:

[image: Screen Shot 2017-08-18 at 8.20.06 AM.png]

SAN IO:

[image: Screen Shot 2017-08-18 at 8.16.59 AM.png]

CPU:

[image: Screen Shot 2017-08-18 at 8.20.58 AM.png]

7-9pm on 8/10 (controlled attempts starting a little after 7):

CPU:

[image: Screen Shot 2017-08-18 at 8.43.35 AM.png]

Write IO on SAN:

[image: Screen Shot 2017-08-18 at 8.44.32 AM.png]

Read IO on Local disk:

[image: Screen Shot 2017-08-18 at 8.46.27 AM.png]

Write IO on Local disk:

[image: Screen Shot 2017-08-18 at 8.46.58 AM.png]


[PERFORM] Indexing an array of two separate columns

2017-09-15 Thread Jeremy Finzel
I have a user who is trying to match overlapping duplicate phone info but
for different customer_ids.

The intended conditional could be expressed:
IF the intersection of the sets
{c.main_phone, c.secondary_phone}
and
{c1.main_phone, c1.secondary_phone}
is not empty
THEN join
EXCEPT where the intersection of the sets =
{'00'}

He wants a join like this:

FROM customers c
INNER JOIN customers c1 on (array[c.main_phone, c.secondary_phone] &&
 array[nullif(c1.main_phone, '00') , nullif(c1.secondary_phone,
'00')])
WHERE c.customer_id = 1;

I want to index this part:
array[nullif(c1.main_phone, '00') , nullif(c1.secondary_phone,
'00')]

First of all I see I can't create a btree index on an array.  And with
btree_gin, this index is not being used:

CREATE INDEX ON customers USING gin ((NULLIF(main_phone,
'00'::text)), (NULLIF(secondary_phone, '00'::text)));

What am I missing here?  Is there a way to support a condition like this?

Thank you!
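
For what it's worth, one approach that might line up with the && operator is
a GIN index on the whole array expression, rather than btree_gin indexes on
the two scalar expressions separately (a sketch; whether the planner uses it
will depend on the query and data):

-- GIN over the composed array expression, matching the c1 side of the join
-- condition as written:
CREATE INDEX customers_phones_gin_idx ON customers
    USING gin ((array[nullif(main_phone, '00'),
                      nullif(secondary_phone, '00')]));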