Re: [PERFORM] Slow query in JDBC

2017-09-29 Thread Jeff Janes
On Thu, Sep 28, 2017 at 2:59 AM, Subramaniam C 
wrote:

> The first output shows the plan when the query is executed from the sql
> command line. The second output shows the plan when it is executed from the
> application. As per the output it is clear that when the query is executed
> through JDBC it is not using the index (health_index) and is instead doing a
> sequential scan. Please let us know how this issue can be resolved from JDBC?
>
> 1.)
>
>
>  ->  Index Only Scan using health_index on health_timeseries_table  (cost=0.56..421644.56 rows=1558800 width=24)
>        Index Cond: (("timestamp" >= '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))
>
>

> 2.)
>
>
>  ->  Seq Scan on health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>        Filter: (("timestamp" >= '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>


Those are different queries, so it is not terribly surprising it might
choose a different plan.

For this type of comparison, you need to compare identical queries,
including parameters.
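
For example, one way to approximate the parameterized form a JDBC driver
sends is a prepared statement in psql.  This is only a sketch, and the column
list is a guess; only the table and "timestamp" column appear in the plans
above:

PREPARE ts_probe(bigint, bigint) AS
    SELECT "timestamp"
    FROM health_timeseries_table
    WHERE "timestamp" >= $1 AND "timestamp" <= $2;

EXPLAIN EXECUTE ts_probe(1505989186834, 1505990086834);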

Cheers,

Jeff


Re: [PERFORM] repeated subplan execution

2017-09-20 Thread Jeff Janes
On Tue, Sep 19, 2017 at 7:31 PM, monika yadav 
wrote:

> Hi All,
>
> I didn't understand why same sub plan for the sub query executed two
> times? As per the query it should have been executed only once.
>
> Can someone please explain this behaviour of query execution ?
>


The sum_bid at the end of the query is an alias for the entire subselect,
so it is not entirely surprising that it gets interpolated twice.  It is
just kind of unfortunate from a performance perspective.

The query I originally gave is equivalent to this query:


select
    aid,
    (select sum(bid) from pgbench_branches
      where bbalance between -1-abalance and 1+abalance
    ) as sum_bid
from pgbench_accounts
where aid between 1 and 1000
group by aid
having (select sum(bid) from pgbench_branches
         where bbalance between -1-abalance and 1+abalance) > 0;


In my original query I just wrapped the whole thing in another select, so
that I could use the alias rather than having to mechanically repeat the
entire subquery in the HAVING clause.  They give identical plans.

Cheers,

Jeff


[PERFORM] repeated subplan execution

2017-09-19 Thread Jeff Janes
I have a complicated query which runs the exact same subplan more than once.

Here is a greatly simplified (and rather pointless) query to replicate the
issue:

select aid, sum_bid from
(select
aid,
(select sum(bid) from pgbench_branches
where bbalance between -1-abalance and 1+abalance
) as sum_bid
from pgbench_accounts
where aid between 1 and 1000
group by aid
) asdfsadf
where sum_bid >0;

                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=0.44..375841.29 rows=931 width=12) (actual time=1.233..691.200 rows=679 loops=1)
   Group Key: pgbench_accounts.aid
   Filter: ((SubPlan 2) > 0)
   Rows Removed by Filter: 321
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.44..634.32 rows=931 width=8) (actual time=0.040..1.783 rows=1000 loops=1)
         Index Cond: ((aid >= 1) AND (aid <= 1000))
   SubPlan 2
     ->  Aggregate  (cost=403.00..403.01 rows=1 width=8) (actual time=0.406..0.407 rows=1 loops=1000)
           ->  Seq Scan on pgbench_branches pgbench_branches_1  (cost=0.00..403.00 rows=1 width=4) (actual time=0.392..0.402 rows=1 loops=1000)
                 Filter: ((bbalance >= ('-1'::integer - pgbench_accounts.abalance)) AND (bbalance <= (1 + pgbench_accounts.abalance)))
                 Rows Removed by Filter: 199
   SubPlan 1
     ->  Aggregate  (cost=403.00..403.01 rows=1 width=8) (actual time=0.407..0.407 rows=1 loops=679)
           ->  Seq Scan on pgbench_branches  (cost=0.00..403.00 rows=1 width=4) (actual time=0.388..0.402 rows=1 loops=679)
                 Filter: ((bbalance >= ('-1'::integer - pgbench_accounts.abalance)) AND (bbalance <= (1 + pgbench_accounts.abalance)))
                 Rows Removed by Filter: 199
 Planning time: 0.534 ms
 Execution time: 691.784 ms


https://explain.depesz.com/s/Xaib


The subplan is not so fast that I wish it to be executed again for every row
which passes the filter.

I can prevent this dual execution using a CTE, but that creates other
problems.  Is there a way to get rid of it without resorting to that?
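
For reference, this is the kind of CTE I mean; since CTEs are materialized,
the filter on sum_bid is applied to the already-computed result rather than
being interpolated as a second subplan:

with per_aid as (
    select aid,
           (select sum(bid) from pgbench_branches
             where bbalance between -1-abalance and 1+abalance
           ) as sum_bid
    from pgbench_accounts
    where aid between 1 and 1000
    group by aid
)
select aid, sum_bid from per_aid where sum_bid > 0;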

Maybe it is also a question for -bugs and/or -hackers: why should I need to
do anything special to avoid dual execution?

Cheers,

Jeff


Re: [PERFORM] Handling small inserts from many connections.

2017-09-04 Thread Jeff Janes
On Mon, Sep 4, 2017 at 1:14 AM, 우성민  wrote:

> Hi team,
>
> I'm trying to configure postgres and pgbouncer to handle many inserts from
> many connections.
>
> Here are some details about what I want to achieve:
>
>   We have more than 3000 client connections, and my server program forks a
> backend process for each client connection.
>   When a backend process sends a request to its connected client, the client
> sends some text data (about 3000 bytes) to the backend process and waits for
>   the next request.
>   The backend process inserts the text data using PQexec from the libpq
> library; once PQexec is done, the backend process sends a request to the
>   client again.
>
>   All the inserts use one and the same table.
>
> The problem is that clients wait too long because the insert process is too
> slow.  It seems to work fine at first, but it slows down after a couple of
> hours; each insert query takes 3000+ ms and keeps growing.
>

If it takes a couple hours for it to slow down, then it sounds like you
have a leak somewhere in your code.

Run "top" and see who is using the CPU time (or the io wait time, if that
is what it is, and the memory)

Cheers,

Jeff


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Jeff Janes
On Sat, Aug 19, 2017 at 10:37 AM, anand086  wrote:

Your email is very hard to read; the formatting and line wrapping are
heavily mangled.  You might want to attach the plans as file attachments
instead of, or in addition to, putting them in the body.



>  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..2.03 rows=1 width=8)
>        Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))
>
>
It looks like the statistics for your table are desperately out of date, as
a later query showed there are 762599 rows (unless login is null for all of
them) but the above is estimating there is only one.   When was the table
last analyzed?
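
A quick way to check that (assuming the table really is named test_tab, as in
the quoted plans):

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'test_tab';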

Cheers,

Jeff

On Sat, Aug 19, 2017 at 10:37 AM, anand086  wrote:

> I am a Postgres newbie and trying to learn :) We have a scenario wherein
> one of our SQL statements shows a different execution plan depending on the
> input value for import_num. As an example, with import_num = '4520440' the
> execution plan uses a Nested Loop and takes ~12 secs. With import_num =
> '4520460' the execution plan used a "Materialize" node and never completed.
> After I set enable_material to off, the execution plan changed to a Hash
> Semi Join and completes in less than 3 secs.
>
> SELECT count(*) FROM test_tab WHERE login IN (SELECT DISTINCT login FROM
> test_tab WHERE import_num = '4520440' AND login IS NOT NULL EXCEPT SELECT
> DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS NOT NULL)
> AND import_num = '4520440';
>
>   count
>  --------
>   746982
>  (1 row)
>
> Time: 12054.274 ms
>
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>    ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
>          ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
>                Group Key: ("ANY_subquery".login)::text
>                ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
>                      ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
>                            ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
>                                  Sort Key: "*SELECT* 1".login
>                                  ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
>                                        ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
>                                              ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
>                                                    ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
>                                                          Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
>                                        ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
>                                              ->  Unique  (cost=0.56..2.03 rows=1 width=8)
>                                                    ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
>                                                          Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
>          ->  Index Only Scan using

Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Jeff Janes
On Tue, Aug 15, 2017 at 3:06 AM, Mariel Cherkassky <
mariel.cherkas...@gmail.com> wrote:

> Hi,
> So I ran the checks that Jeff mentioned:
> \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
> and 35 minutes
> \copy local_postresql_table from /tmp/tmp with binary - Didn't run because
> the remote Oracle database is currently under maintenance work.
>

The "\copy...from" doesn't depend on oracle, it would be only depend on
local file system (/tmp/tmp), provided that the "\copy...to" finished.
Anyway, given the length of time it took, I think you can conclude the
bottleneck is in oracle_fdw itself, or in Oracle, or the network.

Cheers,

Jeff


Re: [PERFORM] performance problem on big tables

2017-08-14 Thread Jeff Janes
On Mon, Aug 14, 2017 at 6:24 AM, Mariel Cherkassky <
mariel.cherkas...@gmail.com> wrote:

> I have performance issues with two big tables. Those tables are located on
> a remote Oracle database. I'm running the query: insert into
> local_postgresql_table select * from oracle_remote_table.
>
> The first table has 45M records and its size is 23G. The import of the
> data from the oracle remote database is taking 1 hour and 38 minutes.
>
To investigate this, I'd decouple the two steps and see how long each one
takes:

\copy (select * from oracle_remote_table) to /tmp/tmp with binary
\copy local_postresql_table from /tmp/tmp with binary

Cheers,

Jeff


Re: [PERFORM] Unlogged tables

2017-08-09 Thread Jeff Janes
On Tue, Aug 8, 2017 at 8:20 PM, l...@laurent-hasson.com <
l...@laurent-hasson.com> wrote:

> Hello,
>
>
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement
> for the loading. This was all fantastic until the server crashed and we
> were surprised to see during a follow-up demo that the data had
> disappeared... Of course, it's all our fault for not understanding the
> implications of UNLOGGED properly.
>
>
> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only
> read from them afterwards. As such, they could not possibly be corrupted
> post-load (I think) during a server crash (short of physical disk
> defects...).
>

Yes, this is a feature many people have wanted.   You'd have to somehow
mark the unlogged table as immutable and then do a checkpoint, after which
it would no longer need to be truncated after a crash.  Alternatively, it
could be done automatically, where the system would somehow know which
unlogged tables were possibly touched since the last successful checkpoint
and truncate only those.  But no one has implemented such a thing.

>
> I'd like to have the performance improvement during a initial batch
> insert, and then make sure the table remains after "unclean" shutdowns,
> which, as you might have it, includes a regular Windows server shut down
> during patching for example.
>

Why doesn't the Windows scheduled shutdown signal postgres to shutdown
cleanly and wait for it to do so?  That is what is supposed to happen.


> So unlogged tables in practice are pretty flimsy. I tried to ALTER ... SET
> LOGGED, but that takes a VERY long time and pretty much negates the initial
> performance boost of loading into an unlogged table.
>

Are you using streaming or wal logging?
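
For what it's worth, a minimal sketch of the load-then-SET LOGGED pattern you
describe; the table, columns, and file path are invented for illustration:

CREATE UNLOGGED TABLE bulk_facts (id bigint, payload text);
COPY bulk_facts FROM '/tmp/facts.csv' WITH (FORMAT csv);
-- SET LOGGED (9.5+) writes the whole table into the WAL, which is why it
-- largely gives back the time saved during the unlogged load.
ALTER TABLE bulk_facts SET LOGGED;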

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Jeff Janes
On Wed, Jul 12, 2017 at 3:04 AM, Charles Nadeau 
wrote:

> Jeff,
>
> Here are the 2 EXPLAINs for one of my simplest query:
>


It looks like dstexterne and flowcompact are both views over flow.  Can you
share the definition of those views?

I think the iowait > 12.5% is due to the parallel query execution.  But
then the question is, why is it only 25% when you have 10 fold parallelism?

It certainly looks like you are doing more than 4MB/s there, so maybe
something is wrong with the instrumentation, or how you are interpreting
it.

Although it is still less than perhaps it could do.  To put a baseline on
what you can expect out of parallel seq scans, can you do something like:

explain (analyze, buffers) select avg(doctets) from flow;

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Jeff Janes
On Tue, Jul 11, 2017 at 4:42 PM, Joshua D. Drake 
wrote:

> On 07/11/2017 04:15 PM, Merlin Moncure wrote:
>
>> On Mon, Jul 10, 2017 at 9:03 AM, Charles Nadeau
>>  wrote:
>>
>>> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
>>> Hardware is:
>>>
>>> *2x Intel Xeon E5550
>>>
>>> *72GB RAM
>>>
>>> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
>>> read/20% write) for Postgresql data only:
>>>
>>> The problem I have is very poor read. When I benchmark my array with fio
>>> I
>>> get random reads of about 200MB/s and 1100IOPS and sequential reads of
>>> about
>>> 286MB/s and 21000IPS. But when I watch my queries using pg_activity, I
>>> get
>>> at best 4MB/s. Also using dstat I can see that iowait time is at about
>>> 25%.
>>> This problem is not query-dependent.
>>>
>>
>> Stop right there. 1100 iops * 8kb = ~8mb/sec raw which might
>> reasonably translate to 4mb/sec to the client. 200mb/sec random
>> read/sec on spinning media is simply not plausible;
>>
>
> Sure it is, if he had more than 4 disks ;)


Or more to the point here, if each random read is 4MB long.  Which makes it
more like sequential reads, randomly-piecewise, rather than random reads.


> but he also isn't going to get 1100 IOPS from 4 10k disks. The average 10k
> disk is going to get around 130 IOPS. If he only has 4 then there is no
> way he is getting 1100 IOPS.
>

I wouldn't be sure.  He is using an iodepth of 256 in his benchmark.  It
wouldn't be all that outrageous for a disk to be able to find 3 or 4
sectors per revolution it can read, when it has that many to choose from.

 Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Jeff Janes
On Tue, Jul 11, 2017 at 4:02 AM, Charles Nadeau 
wrote:

> Jeff,
>
> I used fio in a quick benchmarking script inspired by https://smcleod.net/
> benchmarking-io/:
>
> #!/bin/bash
> #Random throughput
> echo "Random throughput"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
> --readwrite=randread --ramp_time=4
> #Random IOPS
> echo "Random IOPS"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
> --readwrite=randread --ramp_time=4
> #Sequential throughput
> echo "Sequential throughput"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
> --readwrite=read --ramp_time=4
> #Sequential IOPS
> echo "Sequential IOPS"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
> --readwrite=read --ramp_time=4
>
>
I don't think any of those are directly relevant to PostgreSQL, as it
doesn't use direct IO, doesn't use libaio, and is rarely going to get
anywhere near 256 iodepth.  So the best they can do is put a theoretical
ceiling on the performance.  Also, random IO with a 4MB stride doesn't make
any sense from a PostgreSQL perspective.



>
> Performing the test you suggested, I get 128.5MB/s. Monitoring the test, I
> find that the throughput is constant from start to finish and that the
> iowait is also constant at 5%:
>

I would have expected it to do better than that.  Maybe you should increase
the kernel readahead setting; I've found the default to be much too small.
But it doesn't make much difference to you, as you appear to be doing
random IO in your queries, not sequential.


> Could you suggest another way to benchmark random reads?
>

Your 1100 IOPS times the 8kB block size gives about 8MB/s of throughput, which
is close to what you report.  So I think I would instead focus on tuning
your actual queries.  You say the problem is not query-dependent, but I
think that just means all the queries you looked at are similar.  If
you looked at a query that can't use indexes, like count(unindexed_column)
from biggest_table, you would find it doing much more IO than 4MB/s.

Can you pick the simplest query you actually care about, and post both an
"explain (analyze, timing off)" and an "explain (analyze, buffers)" for it?
 (Preferably turning "track_io_timing" on first).
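
For example, using the avg(doctets) query suggested elsewhere in this thread
as a stand-in for "the simplest query you care about" (note that changing
track_io_timing needs superuser, or ALTER ROLE/ALTER SYSTEM):

SET track_io_timing = on;
EXPLAIN (ANALYZE, TIMING OFF) SELECT avg(doctets) FROM flow;
EXPLAIN (ANALYZE, BUFFERS)    SELECT avg(doctets) FROM flow;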

One other question I had: you said you had "2x Intel Xeon E5550", which
should be 8 CPUs (or 16, if the hyperthreads are reported as separate CPUs).
But you also said: "Also using dstat I can see that iowait time is at about
25%".  Usually if there is only one thing going on on the server, then
IOWAIT won't be more than the reciprocal of the number of CPUs.  Is the
server busy doing other stuff at the same time you are benchmarking it?

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Jeff Janes
On Mon, Jul 10, 2017 at 7:03 AM, Charles Nadeau 
wrote:

>
> The problem I have is very poor read. When I benchmark my array with fio I
> get random reads of about 200MB/s and 1100IOPS and sequential reads of
> about 286MB/s and 21000IPS.
>


That doesn't seem right.  Sequential is only 43% faster?  What job file are
you giving to fio?

What do you get if you do something simpler, like:

time cat ~/$PGDATA/base/16402/*|wc -c

replacing 16402 with whatever your biggest database is.

Cheers,

Jeff


Re: [PERFORM]

2017-06-30 Thread Jeff Janes
On Thu, Jun 29, 2017 at 12:11 PM, Yevhenii Kurtov  wrote:

> Hi Jeff,
>
> That is just a sample data, we are going live in Jun and I don't have
> anything real so far. Right now it's 9.6 and it will be a latest stable
> available release on the date that we go live.
>


You need to use your knowledge of the application to come up with some
plausible sample data.

What happens when something succeeds?  Does it get deleted from the table,
or does it get retained but with a certain value of the status column?  If
it is retained, what happens to the priority and times_failed fields?

The performance of your queuing table will critically depend on that.

If you need to keep it once it succeeds, you should probably do that by
deleting it from the queuing table and inserting it into a history table.
It is much easier to keep performance up with that kind of design.
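
A sketch of that delete-and-archive pattern; the status value and the history
table here are hypothetical (only campaign_jobs and its status column appear
in the query quoted elsewhere in this thread):

WITH done AS (
    DELETE FROM campaign_jobs
    WHERE status = 'succeeded'      -- hypothetical "succeeded" marker
    RETURNING *
)
INSERT INTO campaign_jobs_history   -- hypothetical archive table
SELECT * FROM done;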

Cheers,

Jeff


Re: [PERFORM]

2017-06-29 Thread Jeff Janes
On Tue, Jun 27, 2017 at 11:47 PM, Yevhenii Kurtov  wrote:

> Hello,
>
> We have a query that is run almost each second and it's very important to
> squeeze every other ms out of it. The query is:
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
> OR ((c0."status" = $5) AND (c0."started_at" < $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED
>
>


>
> I see that query still went through the Seq Scan instead of Index Scan. Is
> it due to poorly crafted index or because of query structure? Is it
> possible to make this query faster?
>

An index on (priority desc, times_failed) should speed this up massively.
Might want to include status at the end as well. However, your example data
is not terribly realistic.
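
Spelled out as DDL (the index name is arbitrary; the column list is the one
suggested above, with status at the end):

CREATE INDEX campaign_jobs_priority_failed_idx
    ON campaign_jobs (priority DESC, times_failed, status);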

What version of PostgreSQL are you using?

Cheers,

Jeff


Re: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Jeff Janes
On Fri, Jun 23, 2017 at 1:09 PM, Chris Wilson 
wrote:

>
> The records can already be read in order from idx_metric_value. If this
> was selected as the primary table, and metric_pos was joined to it, then
> the output would also be in order, and no sort would be needed.
>
> We should be able to use a merge join to metric_pos, because it can be
> read in order of id_metric (its primary key, and the first column in
> idx_metric_value...). If not, a hash join should be faster than a nested
> loop, if we only have to hash ~100 records.
>

Hash joins do not preserve order.  They could preserve the order of their
"first" input, but only if the hash join is all run in one batch and
doesn't spill to disk.  But the hash join code is never prepared to make a
guarantee that it won't spill to disk, and so never considers it to
preserve order.  It thinks it only needs to hash 100 rows, but it is never
absolutely certain of that, until it actually executes.

If I set enable_sort to false, then I do get the merge join you want (but
with asset_pos joined by nested loop index scan, not a hash join, for the
reason just stated above) but that is slower than the plan with the sort in
it, just like PostgreSQL thinks it will be.

If I vacuum your fact table, then it can switch to use index only scans.  I
then get a different plan, still using a sort, which runs in 1.6 seconds.
Sorting is not the slow step you think it is.

Be warned that "explain (analyze)" can substantially slow down and distort
this type of query, especially when sorting.  You should run "explain
(analyze, timing off)" first, and then only trust "explain (analyze)" if
the overall execution times between them are similar.



> If I remove one of the joins (asset_pos) then I get a merge join between
> two indexes, as expected, but it has a materialize just before it which
> makes no sense to me. Why do we need to materialize here? And why
> materialise 100 rows into 1.5 million rows? (explain.depesz.com)
>


>    ->  Materialize  (cost=0.14..4.89 rows=100 width=8) (actual time=0.018..228.265 rows=1504801 loops=1)
>          Buffers: shared hit=2
>          ->  Index Only Scan using idx_metric_pos_id_pos on metric_pos  (cost=0.14..4.64 rows=100 width=8) (actual time=0.013..0.133 rows=100 loops=1)
>                Heap Fetches: 100
>                Buffers: shared hit=2
>
>
It doesn't need to materialize; it does it simply because it thinks it will
be faster (which it is, slightly).  You can prevent it from doing so by
setting enable_material to off.  The reason it is faster is that with the
materialize, it can check all the visibility filters at once, rather than
having to do it repeatedly.  It is only materializing 100 rows; the 1504801
comes from the number of rows projected out of the materialize node
(one for each row in the other side of the join, in this case), rather than
the number of rows contained within it.

And again, vacuum your tables.  Heap fetches aren't cheap.


> The size of the result set is approximately 91 MB (measured with psql -c |
> wc -c). Why does it take 4 seconds to transfer this much data over a UNIX
> socket on the same box?
>

It has to convert the data to a format used for the wire protocol (hardware
independent, and able to support user defined and composite types), and
then back again.

> work_mem = 100MB

Can you give it more than that?  How many simultaneous connections do you
expect?

Cheers,

Jeff


Re: [PERFORM] [BUGS] Invalid WAL segment size. Allowed values are 1,2,4,8,16,32,64

2017-06-12 Thread Jeff Janes
On Fri, Jun 9, 2017 at 3:43 PM, Michael Paquier 
wrote:

> On Fri, Jun 9, 2017 at 10:55 PM, Cocco Gianfranco
>  wrote:
> > Is there a way to fix “wal_segsize” to about 1 GB in the 9.2 version, and
> > “rebuild” the PostgreSQL server?
>
> As long as you are able to compile your own version of Postgres and
> your distribution does not allow that, there is nothing preventing you
> from doing so.
>

But there is something preventing it.  wal_segsize cannot exceed 64MB in
9.2. v10 will be the first version which will allow sizes above 64MB.

Cheers,

Jeff


Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 6:04 AM, Frits Jalvingh  wrote:

>
> I already changed the following config parameters:
> work_mem 512MB
> synchronous_commit off
>

Since you are already batching up commits into large chunks, this setting
is not very useful, but does risk you losing supposedly-committed data upon
a crash.  I would not do it.


> shared_buffers 512mb
>

You might try increasing wal_buffers, but the default for this size of
shared_buffers is 16MB, which is usually big enough.

One thing you are missing is max_wal_size.  The default value of that is
probably too small for what you are doing.

But if you are not using COPY, then maybe none of this matters as the
bottleneck will be elsewhere.
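
As a rough illustration only (the values below are assumptions on my part,
not recommendations; ALTER SYSTEM needs 9.4 or later):

ALTER SYSTEM SET max_wal_size = '8GB';   -- default is 1GB on recent versions
ALTER SYSTEM SET wal_buffers = '32MB';   -- postmaster setting: needs a restart
SELECT pg_reload_conf();                 -- picks up max_wal_size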

Cheers,

Jeff


Re: [PERFORM] Different plan between 9.6 and 9.4 when using "Group by"

2017-05-29 Thread Jeff Janes
On Sat, May 27, 2017 at 1:40 AM, 梁海安(Killua Leung) <
lianghaian...@pingan.com.cn> wrote:

> Hi team:
>
>    The following SQL is very slow in 9.6.1 because the plan has a “sort”
> node.
>


The difference is only a factor of 2.  I wouldn't call it "very" slow.

Your explain plans are unreadable; please try posting them as
un-line-wrapped text files, or use something like
https://explain.depesz.com/ to share them in a readable way.  (Also,
VERBOSE probably isn't doing us much good here, and makes it much less
readable.)

Writing your CTEs as inline subqueries might help the planner make some
better choices here.  Also, the estimate for CTE n is so bad that I'm
guessing there is a high functional dependency on:

a.mapping_code = b.mapping_code AND a.channel=b.channel

While the planner is assuming they are independent.  You might be able to
get better estimates there by doing something like:

a.mapping_code+0 = b.mapping_code+0 AND a.channel=b.channel

(or using ||'' rather than +0 if the types are textual rather than
numerical).  I doubt it would be enough of a difference to change the
plan, but it is an easy thing to try.

Cheers,

Jeff


Re: [PERFORM] select subquery versus join subquery

2017-05-23 Thread Jeff Janes
On Tue, May 23, 2017 at 4:03 AM, Gunnar "Nick" Bluth <
gunnar.bluth.ext...@elster.de> wrote:

> Am 05/22/2017 um 09:57 PM schrieb Jeff Janes:
> >
> > create view view2 as select id,
> >   (
> >  select md5 from thing_alias where thing_id=id
> > order by priority desc limit 1
> >   ) as md5,
> >   cutoff from thing;
> >
> > Cheers,
> >
> > Jeff
>
> Hi Jeff,
>
> how does something like
>
> CREATE OR REPLACE VIEW public.view3 AS
>  SELECT thing.id,
> foo.md5,
> thing.cutoff
>FROM thing,
> LATERAL ( SELECT DISTINCT ON (thing_alias.thing_id)
> thing_alias.thing_id,
> thing_alias.md5
>FROM thing_alias
>   WHERE thing_alias.thing_id = thing.id
>   ORDER BY thing_alias.thing_id, thing_alias.priority DESC) foo
>
> work for you? At least that's always using an index scan here, as
> opposed to view1, which (for me) defaults to a SeqScan on thing_alias at
> a low cutoff.
>

Unfortunately that always uses the index scan, even at a high cutoff where
aggregation on the seq scan and then hash joining is more appropriate.  So
it is very similar to view2, except that it doesn't return the rows from
"thing" which have zero corresponding rows in thing_alias.

> Note btw. that both view1 and view2 don't return any md5 values for me,
> while view3 does!
>

Because of the way I constructed the data, using the power transform of the
uniform random distribution, the early rows of the view (if sorted by
thing_id) are mostly null in the md5 column, so if you only look at the
first few screen-fulls you might not see any md5.  But your view does
effectively an inner join rather than a left join, so your view gets rid of
the rows with a NULL md5.  Most things don't have aliases; of the things
that do, most have 1; and some have several.



Cheers,

Jeff


[PERFORM] select subquery versus join subquery

2017-05-22 Thread Jeff Janes
I need to make a view which decorates rows from a parent table with
aggregated values from a child table.  I can think of two ways to write it:
one which aggregates the child table to make a new query table and joins
the parent to that, as shown in "view1" below, or one which does a subselect
in the select list to aggregate just the currently matching rows and returns
that value, as in "view2" below.

While these two are semantically equivalent, the planner doesn't understand
that, and always executes them pretty much the way you would naively do it
based on the text of the query.

But view1 is pretty slow if the WHERE clause is highly selective (like
"WHERE cutoff<0.1") because it has to summarize the entire child table
just to pull out a few rows.  But view2 is pretty slow if the entire view
or most of it (like "WHERE cutoff<0.9") is being returned.

Is there some 3rd way to write the query which allows the planner to switch
between strategies (summarize whole table vs summarize values on demand)
depending on the known selectivity of the where clause?

In this case, the planner is getting the relative cost estimates roughly
correct.  It is not a problem of mis-estimation.

I can always create two views, view_small and view_large, and swap between
them based on my own knowledge of how restrictive a query is likely to be,
but that is rather annoying.  Especially in the real-world situation, which
is quite a bit more complex than this.

create table thing as select x as id, random() as cutoff from
generate_series(1,200) f(x);

create table thing_alias as select
floor(power(random()*power(200,5),0.2))::int thing_id, md5(x::text),
random() as priority from generate_series(1,15) f(x);

create index on thing_alias (thing_id );

create index on thing (cutoff );

vacuum; analyze;

create view view1 as select id, md5,cutoff from thing left join
  (
 select distinct on (thing_id) thing_id, md5 from thing_alias
 order by thing_id, priority desc
  ) as foo
  on (thing_id=id);

create view view2 as select id,
  (
 select md5 from thing_alias where thing_id=id
order by priority desc limit 1
  ) as md5,
  cutoff from thing;

Cheers,

Jeff


[PERFORM] pg_stat_statements with fetch

2017-05-19 Thread Jeff Janes
I'm spoiled by using pg_stat_statements to find the hotspot queries which
could use some attention.

But with some recent work, all of the hotspots are of the form "FETCH 1000
FROM c3".  The vast majority of the queries return less than 1000 rows, so
only one fetch is issued per execution.
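
The kind of listing I mean, where the top entries come back as FETCH
statements rather than the queries that opened the cursors:

SELECT calls, total_time, rows, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;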

Is there an automated way to trace these back to the parent query, without
having to strong-arm the driving application into changing its cursor-using
ways?

pg_stat_statements v1.4 and postgresql v9.6 (or 10beta1, if it makes a
difference)

Sometimes you can catch the DECLARE also being in pg_stat_statements, but
it is not a sure thing and there is some risk the name got freed and reused.

log_min_duration_statement has the same issue.

Cheers,

Jeff


Re: [PERFORM] postgres_fdw and column casting shippability

2017-05-16 Thread Jeff Janes
On Mon, May 15, 2017 at 3:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Jeff Janes <jeff.ja...@gmail.com> writes:
> > I've tried versions 9.6.3 and 10dev, and neither do what I expected.  It
> > doesn't seem to be a planning problem where it thinks the fast plan is
> > slower, it just doesn't seem to consider the faster plans as being
> options
> > at all.  Is there some setting to make it realize the cast is shippable?
>
> AFAICS, postgres_fdw doesn't have any knowledge of CoerceViaIO parse
> nodes, so it's never going to consider this type of brute-force cast
> as shippable.  Normal casts would presumably be shippable if the
> underlying function is considered safe.
>

So then, the secret is to write it like this:

explain analyze select data from remote2 join remote1
    on (int8in(textout(remote2.id)) = remote1.id)
    where cutoff > 0.;

This works to have the join pushed to the foreign side in 9.6, but not
before that.

Thanks,

Jeff


[PERFORM] postgres_fdw and column casting shippability

2017-05-15 Thread Jeff Janes
I need to do a join between two foreign tables using columns of different
types.

select data from remote2 join remote1 on ((remote2.id)::bigint=remote1.id)
where cutoff > 0.;

For demonstration purposes, I use a loop-back foreign server, set up in the
attached sql file.

If I do the join directly on the "foreign" server by specifying the
schemaname where the physical tables live, it uses a sensible join plan,
using an index on cutoff column to get a handful of rows, then casting the
id column and using in index on remote1.id to get each row there.

explain analyze select data from remote.remote2 join remote.remote1
    on ((remote2.id)::bigint=remote1.id) where cutoff > 0.;

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.56..1100.48 rows=100 width=8) (actual time=0.303..5.598 rows=119 loops=1)
   ->  Bitmap Heap Scan on remote2  (cost=5.13..334.85 rows=91 width=7) (actual time=0.112..0.899 rows=105 loops=1)
         Recheck Cond: (cutoff > '0.'::double precision)
         Heap Blocks: exact=105
         ->  Bitmap Index Scan on remote2_cutoff_idx  (cost=0.00..5.11 rows=91 width=0) (actual time=0.062..0.062 rows=105 loops=1)
               Index Cond: (cutoff > '0.'::double precision)
   ->  Index Scan using remote1_id_idx on remote1  (cost=0.43..8.40 rows=1 width=16) (actual time=0.038..0.041 rows=1 loops=105)
         Index Cond: (id = (remote2.id)::bigint)


But if I go through the foreign machinery, it doesn't use a good plan:

explain analyze select data from remote2 join remote1
    on ((remote2.id)::bigint=remote1.id) where cutoff > 0.;

                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=537.81..76743.81 rows=455000 width=4) (actual time=75.019..4659.802 rows=119 loops=1)
   Hash Cond: (remote1.id = (remote2.id)::bigint)
   ->  Foreign Scan on remote1  (cost=100.00..35506.00 rows=100 width=16) (actual time=1.110..4143.655 rows=100 loops=1)
   ->  Hash  (cost=436.67..436.67 rows=91 width=7) (actual time=2.754..2.754 rows=105 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 13kB
         ->  Foreign Scan on remote2  (cost=105.13..436.67 rows=91 width=7) (actual time=1.567..2.646 rows=105 loops=1)
 Planning time: 29.629 ms
 Execution time: 4660.433 ms

I thought it would either push the entire join to the foreign side, or at
least do a foreign index scan on remote2_cutoff_idx, then loop over each
row and do a foreign index scans against remote1_id_idx.

I've tried versions 9.6.3 and 10dev, and neither do what I expected.  It
doesn't seem to be a planning problem where it thinks the fast plan is
slower, it just doesn't seem to consider the faster plans as being options
at all.  Is there some setting to make it realize the cast is shippable?
Is any of the work being done on postgres_fdw for V11 working towards
fixing this?

Cheers,

Jeff
drop database foobar;
create database foobar;
\c foobar

CREATE EXTENSION IF NOT EXISTS postgres_fdw WITH SCHEMA public;

CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw options (dbname 'foobar');
CREATE USER MAPPING FOR postgres SERVER remote_server OPTIONS (
"user" 'postgres'
);

CREATE SCHEMA remote;
SET search_path = remote, pg_catalog;

CREATE TABLE remote1 as select floor(random()*1000)::bigint as id, 
floor(random()*1000)::bigint as data from generate_series(1,100);
create index on remote1 (id);
CREATE TABLE remote2 as select id::text as id, random() as cutoff from remote1;
create index on remote2 (cutoff);
vacuum analyze;

reset search_path;

CREATE FOREIGN TABLE remote1 (
   id integer,
   data integer
)
SERVER remote_server
OPTIONS (
schema_name 'remote',
table_name 'remote1',
use_remote_estimate 'true'
);

CREATE FOREIGN TABLE remote2 (
   id text,
   cutoff double precision 
)
SERVER remote_server
OPTIONS (
schema_name 'remote',
table_name 'remote2',
use_remote_estimate 'true'
);

analyze;

explain analyze select data from remote2 join remote1 on 
((remote2.id)::bigint=remote1.id) where cutoff > 0.;



Re: [PERFORM] Please help with a slow query: there are millions of records, what can we do?

2017-03-08 Thread Jeff Janes
On Tue, Mar 7, 2017 at 6:26 PM, Pat Maddox  wrote:

> Hi there,
>
> I’ve been asked to help with a project dealing with slow queries. I’m
> brand new to the project, so I have very little context. I’ve gathered as
> much information as I can.
>
> I’ve put the schema, query, and explain info in gists to maintain their
> formatting.
>
> We are stumped with this slow query right now. I could really use some
> help looking for ways to speed it up.
>
> If you need any more information, please let me know.
>


You could try a partial index on:

(account_id, completed_at desc, taskable_name, position,
assigned_to_user_id) where "tasks"."archived" != 't' AND "tasks"."complete"
= 't'
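
Written out as DDL, assuming archived and complete are boolean columns on
tasks (the index name is arbitrary):

CREATE INDEX tasks_complete_partial_idx
    ON tasks (account_id, completed_at DESC, taskable_name, "position", assigned_to_user_id)
    WHERE archived != 't' AND complete = 't';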

Also, the poor estimate of the number of rows on your scan of
index_permissions_on_user_id_and_object_id_and_object_type suggests that
you are not analyzing (and so probably also not vacuuming) often enough.

Cheers,

Jeff


Re: [PERFORM] Huge difference between ASC and DESC ordering

2017-03-06 Thread Jeff Janes
On Mon, Mar 6, 2017 at 8:46 AM, twoflower  wrote:

> Thank you Jeff.
>
> There are 7 million rows satisfying fk_id_client = 20045. There is an
> index on fk_id_client, now I added a composite (fk_id_client, id) index but
> that did not help.
>

With 7 million rows, you shouldn't expect any magic here.  But still 7
million is less than 18 million, and you may be able to get that 7 million
with more sequential-like IO.

Did you force PostgreSQL to stop using the index on s.id?  If not, do
that.  If so, please post the EXPLAIN (analyze) of the plan it does switch
to.



> I see the point of what you are saying, but still don't understand how
> these two situations (*asc* vs. *desc*) are not symmetrical.


They return different data.  How could they be symmetrical?  You are
getting a different 50 rows depending on which way you order the data in
the query.  You are **not** getting the same 50 rows, just in a different
order from among the 50.



> I mean, there *is* an ascending index on *JOB_MEMORY.id*, so why does it
> matter which end I am picking the data from?
>


The query stops as soon as it finds 50 rows which meet fk_id_client =
20045.  When you order one way, it needs to cover 18883917 rows to find those
50.  When you order the other way, it takes 6610 rows to find those 50.  This
fact does not depend on whether the index is ASC or DESC.  If you traverse
a DESC index backwards, it has exactly the same issue as if you traverse a
ASC index forward.  Either way, once it decides to use that index to obtain
the ordering of the query, it has to inspect 18883917 tuples before it
satisfies the LIMIT.


>
> The thing is, even when I force Postgres to use the ascending index on
> *id*, it's still orders of magnitude slower than the *desc* version (even
> when that one goes through the index backwards).


Right.  PostgreSQL has to return the rows commanded by your query.  It
can't just decide to return a different set of rows because doing so would
be faster.  If that is what you want, wrap the whole query into a subselect
and move the ORDER BY into the outer query, like "select * from (SELECT ...
LIMIT 50) foo order by foo.id"

Changing the ordering direction of the index doesn't change which rows get
returned, while changing the ordering direction of the query does.

Cheers,

Jeff


Re: [PERFORM] Huge difference between ASC and DESC ordering

2017-03-06 Thread Jeff Janes
On Mon, Mar 6, 2017 at 6:22 AM, twoflower  wrote:

> I have the following query
>
> select *
> from "JOB_MEMORY_STORAGE" st
>   inner join "JOB_MEMORY" s on s.fk_id_storage = st.id
> where st.fk_id_client = 20045
> order by s.id asc limit 50
>
>
The query stops as soon as it finds 50 rows which meet fk_id_client =
20045.  When you order one way, it needs to cover 18883917 rows to find those
50.  When you order the other way, it takes 6610 rows to find those 50.   So the
problem is that the tuples which satisfy st.fk_id_client = 20045 all lie
towards one end of the s.id range, but PostgreSQL doesn't know that. This
is a hard type of problem to solve at a fundamental level.  The best you
can do is work around it.  Do you really need the order to be on s.id?  If
so, you can get PostgreSQL to stop trying to use the index for ordering
purposes by writing that as "order by s.id+0 asc limit 50", or by using a
CTE which does the join and having the ORDER BY and LIMIT outside the CTE.
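
Concretely, the first workaround applied to the quoted query:

select *
from "JOB_MEMORY_STORAGE" st
  inner join "JOB_MEMORY" s on s.fk_id_storage = st.id
where st.fk_id_client = 20045
order by s.id + 0 asc
limit 50;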

Do you have an index on fk_id_client?  Or perhaps better, (fk_id_client,
id)?  How many rows satisfy fk_id_client = 20045?


> How can I help Postgres execute the query with *asc* ordering as fast as
> the one with *desc*?
>

You probably can't.  Your data is well suited to one, and ill suited to
the other.  You can probably make it faster than it currently is, but not
as fast as the DESC version.

Cheers,

Jeff


Re: [PERFORM] Speeding up JSON + TSQUERY + GIN

2017-03-05 Thread Jeff Janes
On Thu, Mar 2, 2017 at 1:19 PM, Sven R. Kunze <srku...@mail.de> wrote:

> On 01.03.2017 18:04, Jeff Janes wrote:
>
> On Wed, Mar 1, 2017 at 6:02 AM, Sven R. Kunze <srku...@mail.de> wrote:
>
>> On 28.02.2017 17:49, Jeff Janes wrote:
>>
>> Oh.  In my hands, it works very well.  I get 70 seconds to do the {age:
>> 20} query from pure cold caches, versus 1.4 seconds from cold caches which
>> was followed by pg_prewarm('docs','prefetch').
>>
>> How much RAM do you have?  Maybe you don't have enough to hold the table
>> in RAM.  What kind of IO system?  And what OS?
>>
>>
>> On my test system:
>>
>> RAM: 4GB
>> IO: SSD (random_page_cost = 1.0)
>> OS: Ubuntu 16.04
>>
>
>
> 4GB is not much RAM to be trying to pre-warm this amount of data into.
> Towards the end of the pg_prewarm, it is probably evicting data read in by
> the earlier part of it.
>
> What is shared_buffers?
>
>
> 942MB.
>
> But I see where you are coming from. How come that these queries need a
> Recheck Cond? I gather that this would require reading not only the index
> data but also the table itself which could be huge, right?
>

Bitmaps can overflow and drop the row-level information, tracking only the
blocks which need to be inspected.  So it has to have a recheck in case
that happens (although in your case it is not actually overflowing--but it
still needs to be prepared for that).  Also, I think that jsonb_path_ops
indexes the hashes of the paths, so it can deliver false positives which
need to be rechecked.  And you are selecting `id`, which is not in the
index so it would have to consult the table anyway to retrieve that.  Even
if it could get all the data from the index itself, I don't think GIN
indexes support that feature.

Cheers,

Jeff


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Jeff Janes
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Dear Nur,
>
>
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
>
>
>
...


>  ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
>        Index Cond: (domain_class_id = 11)
>

Why wouldn't this be using a bitmap scan rather than a regular index scan?
It seems like it should prefer the bitmap scan, unless the table is well
clustered on domain_class_id.  In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan
= off to see what that gives.  If it gives a seq scan, then repeat with
enable_seqscan also turned off.  Or if it gives the bitmap scan, then
repeat with enable_bitmapscan turned off.
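
For example, in a throwaway session, using the query quoted above:

SET enable_indexscan = off;
EXPLAIN ANALYZE
SELECT DISTINCT feature_id
FROM evidence.point p
INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND (p.modification_time > '2015-05-10 00:06:56.056 IST'
       OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
ORDER BY feature_id;
-- then, depending on what it picks: SET enable_seqscan = off;
-- or: SET enable_bitmapscan = off;  and run the EXPLAIN ANALYZE again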

How many rows are in point, and how big is it?

The best bet for making this better might be to have an index on
(domain_class_id, modification_time) and hope for an index-only scan.
Except that you are on 9.1, so first you would have to upgrade.  That
would allow you to use BUFFERS in the explain analyze, as well as
track_io_timing, both of which would also be pretty nice to see.  Using
9.1 is like having one hand tied behind your back.

Also, any idea why this execution of the query is 15 times faster than the
execution you found in the log file?  Was the top output you showed in the
first email happening at the time the really slow query was running, or was
that from a different period?

Cheers,

Jeff


Re: [PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly

2017-03-01 Thread Jeff Janes
On Wed, Mar 1, 2017 at 2:12 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Mar 1, 2017 at 3:00 PM, Stefan Andreatta 
> wrote:
>
>> explain analyze
>>  select tmp_san_1.id
>>  from tmp_san_1
>>left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text
>>  where tmp_san_2.id is null;
>>
>> Does it help if you check for "tmp_san_2.text is null"?
>
>
>
Yes.  And if you swap it so that the left join is on the integer while IS
NULL is on the text, that also gets poorly estimated.  Also, if you make
both columns of both tables integers, same thing--you get bad estimates
when the join condition refers to one column and the where refers to the
other.  I don't know why the estimate is poor, but it is not related to the
types of the columns, but rather the identities of them.

Cheers,

Jeff


Re: [PERFORM] Speeding up JSON + TSQUERY + GIN

2017-03-01 Thread Jeff Janes
On Wed, Mar 1, 2017 at 6:02 AM, Sven R. Kunze <srku...@mail.de> wrote:

> On 28.02.2017 17:49, Jeff Janes wrote:
>
> Oh.  In my hands, it works very well.  I get 70 seconds to do the {age:
> 20} query from pure cold caches, versus 1.4 seconds from cold caches which
> was followed by pg_prewarm('docs','prefetch').
>
> How much RAM do you have?  Maybe you don't have enough to hold the table
> in RAM.  What kind of IO system?  And what OS?
>
>
> On my test system:
>
> RAM: 4GB
> IO: SSD (random_page_cost = 1.0)
> OS: Ubuntu 16.04
>


4GB is not much RAM to be trying to pre-warm this amount of data into.
Towards the end of the pg_prewarm, it is probably evicting data read in by
the earlier part of it.

What is shared_buffers?

Cheers,

Jeff


Re: [PERFORM] Speeding up JSON + TSQUERY + GIN

2017-02-28 Thread Jeff Janes
On Tue, Feb 28, 2017 at 12:27 AM, Sven R. Kunze <srku...@mail.de> wrote:

> On 27.02.2017 19:22, Jeff Janes wrote:
>
> If by 'permanently', you mean even when you intentionally break things,
> then no.  You will always be able to intentionally break things.  There is
> on-going discussion of an auto-prewarm feature.  But that doesn't yet
> exist; and once it does, a super user will always be able to break it.
>
> Presumably you have a use-case in mind other than intentional sabotage of
> your caches by root.  But, what is it?  If you reboot the server
> frequently, maybe you can just throw 'select pg_prewarm...' into an init
> script?
>
>
> I didn't express myself well enough. pg_prewarm doesn't help to speed up
> those queries at all.
>


Oh.  In my hands, it works very well.  I get 70 seconds to do the {age: 20}
query from pure cold caches, versus 1.4 seconds from cold caches which was
followed by pg_prewarm('docs','prefetch').

How much RAM do you have?  Maybe you don't have enough to hold the table in
RAM.  What kind of IO system?  And what OS?


Cheers,

Jeff


Re: [PERFORM] Speeding up JSON + TSQUERY + GIN

2017-02-27 Thread Jeff Janes
On Sun, Feb 26, 2017 at 5:28 AM, Sven R. Kunze  wrote:

>
>
> Using "select pg_prewarm('docs');" and on any of the indexes doesn't help
> either.
> After a "systemctl stop postgresql.service && sync && echo 3 >
> /proc/sys/vm/drop_caches && systemctl start postgresql.service" the age=20,
> 30 or name=john queries are slow again.
>
>
> Is there a way to speed up or to warm up things permanently?
>


If by 'permanently', you mean even when you intentionally break things,
then no.  You will always be able to intentionally break things.  There is
on-going discussion of an auto-prewarm feature.  But that doesn't yet
exist; and once it does, a super user will always be able to break it.

Presumably you have a use-case in mind other than intentional sabotage of
your caches by root.  But, what is it?  If you reboot the server
frequently, maybe you can just throw 'select pg_prewarm...' into an init
script?

Cheers,

Jeff


Re: [PERFORM] PSA: upgrade your extensions

2017-02-01 Thread Jeff Janes
On Wed, Feb 1, 2017 at 4:38 AM, Merlin Moncure  wrote:

> I was just troubleshooting a strange performance issue with pg_trgm
> (greatest extension ever) that ran great in testing but poor in
> production following a 9.6 in place upgrade from 9.2. By poor I mean
> 7x slower.  Problem was resolved by ALTER EXTENSION UPDATE followed by
> a REINDEX on the impacted table.  Hope this helps somebody at some
> point :-).
>

It was probably the implementation of the triconsistent function for
pg_trgm (or I would like to think so, anyway).

But if so, the REINDEX should not have been necessary, just the ALTER
EXTENSION UPDATE should do the trick.  Rebuilding a large gin index can be
pretty slow.
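
In other words, after the server upgrade, this alone should have been enough
to pick up the new code:

ALTER EXTENSION pg_trgm UPDATE;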

Cheers,

Jeff


Re: [PERFORM] Backup taking long time !!!

2017-01-24 Thread Jeff Janes
On Mon, Jan 23, 2017 at 9:43 AM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 23 January 2017 at 17:12, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> >> Just to make sure anyone reading the mailing list archives isn't
> >> confused, running pg_start_backup does *not* make PG stop writing to
> >> BASEDIR (or DATADIR, or anything, really).  PG *will* continue to write
> >> data into BASEDIR after pg_start_backup has been called.
> >
> >
> >
> > Correct.  Unfortunately it is a very common myth that it does cause
> > PostgreSQL to stop writing to the base dir.
>
> Never heard that one before. Wow. Who's been saying that?
>
> It's taken me years to hunt down all invalid backup memes and terminate
> them.
>
> Never fails to surprise me how many people don't read the docs.
>

I've seen it on stackexchange, and a few times on the pgsql mailing lists,
and talking to people in person.  I've never traced it back to some
"authoritative" source who is making the claim; I think many people just
independently think up "How would I implement pg_start_backup if I were
doing it" and then come up with the same false conclusion, and then all
reinforce each other.

I don't think the docs are particularly clear on this.  There is the comment
"Some file system backup tools emit warnings or errors if the files they
are trying to copy change while the copy proceeds. When taking a base
backup of an active database, this situation is normal and not an error",
but the reader could think that comment applies to any of the files in
the data directory (in particular, pg_xlog), and could think that it doesn't
apply to the files under base in particular.  In other words, once they form
the wrong understanding, the docs (if read) don't force them to change it, as
they can interpret the docs in ways that are consistent with it.

Of course the docs aren't a textbook and aren't trying to fully describe
the theory of operation; they just give people a recipe they can follow.  But
people will make inferences from that recipe anyway.  I don't know if it is
worth trying to preemptively dispel these mistakes in the docs.

Cheers,

Jeff


Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread Jeff Janes
On Mon, Jan 23, 2017 at 7:28 AM, Jim Nasby  wrote:

> On 1/22/17 11:32 AM, Stephen Frost wrote:
>
>> The 1-second window concern is regarding the validity of a subsequent
>> incremental backup.
>>
>
> BTW, there's a simpler scenario here:
>
> Postgres touches file.
> rsync notices file has different timestamp, starts copying.
> Postgres touches file again.
>
> If those 3 steps happen in the same second, you now have an invalid
> backup. There's probably other scenarios as well.
>

To be clear, you don't have an invalid backup *now*, as replay of the WAL
will fix it up.  You will have an invalid backup next time you take a
backup, using a copy of the backup you just took now as the rsync
destination of that future backup.

If you were to actually fire up a copy of the backup and go through
recovery, then shut it down, and then use that post-recovery copy as the
destination of the rsync, would that eliminate the risk (barring clock skew
between systems)?


> In short, if you're using rsync, it's *critical* that you give it the
> --checksum option, which tells rsync to ignore file size and timestamp.


Which unfortunately obliterates much of the point of using rsync for many
people.  You can still save on bandwidth, but not on local IO on each end.

Cheers,

Jeff


Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread Jeff Janes
On Sun, Jan 22, 2017 at 6:57 AM, Stephen Frost  wrote:

> Greetings,
>
> * julyanto SUTANDANG (julya...@equnix.co.id) wrote:
> > CORRECTION:
> >
> > "you might you pg_start_backup to tell the server not to write into the
> > DATADIR"
> >
> > become
> >
> > "you might *use* pg_start_backup to tell the server not to write into the
> > *BASEDIR*, actually server still writes but only to XLOGDIR "
>
> Just to make sure anyone reading the mailing list archives isn't
> confused, running pg_start_backup does *not* make PG stop writing to
> BASEDIR (or DATADIR, or anything, really).  PG *will* continue to write
> data into BASEDIR after pg_start_backup has been called.
>


Correct.  Unfortunately it is a very common myth that it does cause
PostgreSQL to stop writing to the base dir.


>
> The only thing that pg_start_backup does is identify an entry in the WAL
> stream, from which point all WAL must be replayed when restoring the
> backup.  All WAL generated from that point (pg_start_backup point) until
> the pg_stop_backup point *must* be replayed when restoring the backup or
> the database will not be consistent.
>

pg_start_backup also forces full_page_writes to be effectively 'on' for the
duration of the backup, if it is not already explicitly on (which it
usually will already be).  This affects pg_xlog, of course, not base.  But
it is an essential step for people who run with full_page_writes=off, as it
ensures that anything in base which got changed mid-copy will be fixed up
during replay of the WAL.
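
For reference, a minimal sketch of the exclusive low-level API being
discussed (the label is arbitrary; the copy step is whatever file-level tool
you already use):

SELECT pg_start_backup('base backup');
-- ... copy the data directory here ...
SELECT pg_stop_backup();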


Cheers,

Jeff


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-04 Thread Jeff Janes
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';


> I wonder why bitmap heap scan adds such a big amount of time on top of the
> plain bitmap index scan.
> It seems to me, that the recheck is active although all blocks are exact
> [1] and that pg is loading the jsonb for the recheck.
>
> Is this an expected behavior?
>


Yes, this is expected.  The gin index is lossy.  It knows that all the
elements are present (except when it doesn't--large elements might get
hashed down and suffer hash collisions), but it doesn't know what the
recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them
get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops.  In your given
example, it won't make a difference, because you are actually counting half
the table and so half the table needs to be rechecked.  But in my example,
jsonb_path_ops successfully rejects all the rows at the index stage.

Cheers,

Jeff


Re: [PERFORM] Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment

2016-11-15 Thread Jeff Janes
On Mon, Nov 14, 2016 at 3:45 AM, Pietro Pugni 
wrote:

>
> The first issue I faced was about maintenance_work_mem because I set it to
> 16GB and the server silently crashed during a VACUUM because I didn’t
> consider that it could take up to autovacuum_max_workers *
> maintenance_work_mem (roughly 48GB).
>

I don't think that this is the true cause of the problem. In current
versions of PostgreSQL, VACUUM cannot make use of more than 1GB of
process-local memory, even if maintenance_work_mem is set to a far greater
value.

Cheers,

Jeff


Re: [PERFORM] Query planner chooses index scan backward instead of better index option

2016-11-14 Thread Jeff Janes
On Mon, Nov 14, 2016 at 4:01 AM, Seckin Pulatkan 
wrote:

> Hi,
>
> On our production environment (PostgreSQL 9.4.5 on
> x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
> 4.8.5-4), 64-bit), one of our queries runs very slow, about 5 minutes . We
> noticed that it does not use an index that we anticapited it would.
>
> The query is
>
> select booking0_.*
> from booking booking0_
> where booking0_.customer_id in (
>   select customer1_.id
>  from customer customer1_
>where lower((customer1_.first_name||'
> '||customer1_.last_name)) like '%gatef%'
>   )
> order by booking0_.id desc
> limit 30;
>


It thinks it is going to find 30 rows which meet your condition very
quickly, so by walking the index backwards it can avoid needing to do a
sort.  But the rows which meet your sub-select conditions are biased
towards the front of the index, so in fact it has to walk backwards through
most of your index before finding 30 eligible rows.

Your best bet is probably to force it into the plan you want by using a CTE:

with t as (
  select booking0_.*
  from booking booking0_
  where booking0_.customer_id in (
    select customer1_.id
    from customer customer1_
    where lower((customer1_.first_name||' '||customer1_.last_name)) like '%gatef%'
  )
)
select * from t order by id desc limit 30;

Cheers,

Jeff


Re: [PERFORM] Any advice tuning this query ?

2016-11-12 Thread Jeff Janes
On Fri, Nov 11, 2016 at 7:19 AM, Henrik Ekenberg  wrote:

> Hi,
>
> I have a select moving around a lot of data and takes times
> Any advice tuning this query ?
>
> EXPLAIN (ANALYZE ON, BUFFERS ON)
>

When accessing lots of data, sometimes the act of collecting timing on all
of the actions makes the query take more than twice as long, and it distorts
the timings it collects.

Try running the same query like:

EXPLAIN (ANALYZE ON, BUFFERS ON, timing off)

If the Execution times are very similar either way, then you don't have
this problem.  But if they differ, then you can't depend on the results of
the timings reported when timing is turned on.  Large sorts are
particularly subject to this problem.

More than half the time (if the times are believable) goes to scanning the
index activations_books_date.  You might be better off with a sort rather
than an index scan.  You can test this by doing:

begin;
drop index activations_books_date;
-- rerun the EXPLAIN (ANALYZE ON, BUFFERS ON) query here, while the index is gone
rollback;

Don't do that on production server, as it will block other access to the
table for the duration.


You might also benefit from hash joins/aggregates, but you would have to
set work_mem to a very large value to get them.  I'd start by setting work_mem
in your session to 1TB and seeing if that changes the explain plan (just
explain, not explain analyze!).  If that produces the hash
joins/aggregates, then keep lowering work_mem until you find the minimum
that still supports the hash plans.  Then ponder whether it is safe to use
that much work_mem "for real" given your RAM and level of concurrent access.

Cheers,

Jeff


Re: [PERFORM] Inlining of functions (doing LIKE on an array)

2016-11-11 Thread Jeff Janes
On Thu, Nov 10, 2016 at 10:54 PM, l...@laurent-hasson.com <
l...@laurent-hasson.com> wrote:

> Hello,
>
>
>
> I am trying to implement an efficient “like” over a text[]. I see a lot of
> people have tried before me and I learnt a lot through the forums.
>

Have you looked at parray_gin?

https://github.com/theirix/parray_gin

(Also on PGXN, but I don't know how up-to-date it is there)

Or you could create an regular pg_trgm index on the expression:

array_to_string("ICD9_DGNS_CD",'')

If you can find a safe delimiter to use (one that can't be part of the
text[]).

The performance of these options will depend on both the nature of your
data and the nature of your queries.
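
A sketch of the expression-index option, assuming the column lives in a table
called claims and that '|' is a safe delimiter for these codes.  Note that
array_to_string() is only STABLE, so it needs a small IMMUTABLE wrapper before
it can appear in an index expression:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE FUNCTION imm_array_to_string(text[], text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$ SELECT array_to_string($1, $2) $$;

CREATE INDEX ON claims
USING gin (imm_array_to_string("ICD9_DGNS_CD", '|') gin_trgm_ops);

-- queries must use the same expression so the index can be matched:
SELECT * FROM claims
WHERE imm_array_to_string("ICD9_DGNS_CD", '|') LIKE '%250.00%';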

Cheers,

Jeff


Re: [PERFORM] archive_command too slow.

2016-11-04 Thread Jeff Janes
On Wed, Nov 2, 2016 at 12:06 PM, Joao Junior  wrote:

> Hi friends,
>
> I am running 2 Linux machines, kernel  3.13.0-45-generic #74-Ubuntu SMP.
> Postgresql version 9.4 in both machine, in a Hot Standby cenario.
>
> Master-Slave using WAL files, not streaming replication.
>
> The archive_command from master is:
>
> archive_command = '/usr/bin/rsync -a -e "ssh" "%p"
> slave:/data2/postgres/standby/main/incoming/"%f"' #
>


How long does it take just to set up the ssh tunnel?

$ time ssh slave hostname

In my hands, this takes about 0.5, every time.  If you need to archive 26
segments per minute, that much overhead is going to consume a substantial
fraction of your time budget.

How much network bandwidth do you have?  If you scp a big chunk of files in
one command over to the slave (not into a production directory of it,of
course) how fast does that go?

$ time rsync datadir/pg_xlog/0001C960004? slave:/tmp/foo/


...



>
> It seems that archive_command is very slowly compared with the amount of
> WAL segments generated.
> Any suggestions??? Should I use another strategy to increase the
> archive_command process speed???
>


If network throughput is the problem, use compression, or get a faster
network.

If setting up the ssh tunnel is the problem, you could assess whether you
really need that security, or compile a custom postgresql with larger WAL
file sizes, or write a fancy archive_command which first archives the files
to a local directory, and then transfers them in chunks to the slave.  Or
maybe use streaming rather than file shipping.
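
For example, one hedged variant of the original command that adds rsync's own
compression (just the -z flag; paths are the ones from the post above):

archive_command = '/usr/bin/rsync -az -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"'

If the per-file ssh handshake turns out to be the real cost, ControlMaster and
ControlPersist entries in ~/.ssh/config let every rsync reuse one long-lived
connection instead of opening a new one per WAL segment.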


Cheers,

Jeff


Re: [PERFORM] Millions of tables

2016-10-01 Thread Jeff Janes
On Thu, Sep 29, 2016 at 4:11 AM, Alex Ignatov (postgrespro) <
a.igna...@postgrespro.ru> wrote:

>
>
> Thank you Terry.  You get the gold star.  :)   I was waiting for that to
> come up.
>
>
>
> Success means handling this condition.  A whole database vacuum and
> dump-restore is out of the question.  Can a properly tuned autovacuum
> prevent the situation?
>
>
>
> -Greg
>
>
>
> Hi!
>
> With millions of tables you have to set autovacuum_max_workers
> sky-high =). We have had situations where, with thousands of tables, autovacuum
> can’t vacuum all the tables that need it. It simply vacuums some of the most
> modified tables and never reaches the others.
>

Any autovacuum worker should vacuum all tables in its assigned database
which it perceives need vacuuming, as long as it can get the lock.  Unless
the worker is interrupted, for example by frequent database shutdowns, it
should reach all tables in that database before it exits.  Unless there is
a bug, or you are constantly restarting the database before autovacuum can
finish or doing something else to kill them off, what you describe should
not happen.

If it is a bug, we should fix it.  Can you give more details?

There is a known bug when you have multiple active databases in the same
cluster.  Once one database reaches the age where anti-wraparound vacuums
kick in, then all future autovacuum workers are directed to that one
database, starving all other databases of auto-vacuuming.  But that doesn't
sound like what you are describing.

Cheers,

Jeff


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-29 Thread Jeff Janes
On Thu, Sep 29, 2016 at 11:12 AM, Pavel Stehule 
wrote:

>
>
> 2016-09-29 14:20 GMT+02:00 Sven R. Kunze :
>
>> On 23.09.2016 11:00, Pavel Stehule wrote:
>>
>> 2016-09-23 8:35 GMT+02:00 Sven R. Kunze :
>>
>>> I was wondering: would it be possible for PostgreSQL to rewrite the
>>> query to generate the UNION (or subquery plan if it's also fast) on it's
>>> own?
>>>
>>
>> It depends on real data. On your specific data the UNION variant is
>> pretty fast, on different set, the UNION can be pretty slow. It is related
>> to difficult OR predicate estimation.
>>
>>
>> I figure that the UNION is fast if the sub-results are small (which they
>> are in our case). On the contrary, when they are huge, the OUTER JOIN
>> variant might be preferable.
>>
>>
>> Is there something I can do to help here?
>>
>> Or do you think it's naturally application-dependent and thus should be
>> solved with application logic just as we did?
>>
>
> In ideal world then plan should be independent on used form. The most
> difficult is safe estimation of OR predicates. With correct estimation the
> transformation to UNION form should not be necessary I am think.
>

I don't think it is an estimation issue.  If it were, the planner would
always choose the same inefficient plan (providing the join collapse
limits, etc. don't come into play, which I don't think they do here) for
all the different ways of writing the query.

Since that is not happening, the planner must not be able to prove that the
different queries are semantically identical to each other, which means
that it can't pick the other plan no matter how good the estimates look.

Cheers,

Jeff


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-29 Thread Jeff Janes
On Thu, Sep 29, 2016 at 11:48 AM, Sven R. Kunze <srku...@mail.de> wrote:

> On 29.09.2016 20:03, Jeff Janes wrote:
>
> Perhaps some future version of PostgreSQL could do so, but my gut feeling
> is that that is not very likely.  It would take a lot of work, would risk
> breaking or slowing down other things, and is probably too much of a niche
> issue to attract a lot of interest.
>
>
> I don't hope so; in business and reports/stats applications there is a lot
> of room for this.
>
> Why do you think that OR-ing several tables is a niche issue? I can at
> least name 3 different projects (from 3 different domains) where combining
> 3 or more tables with OR is relevant and should be reasonably fast.
>

Well, I don't recall seeing this issue on this list before (or a few other
forums I read) while I see several other issues over and over again.  So
that is why I think it is a niche issue.  Perhaps I've have seen it before
and just forgotten, or have not recognized it as being the same issue each
time.



> This multitude of solution also shows that applications developers might
> be overwhelmed by choosing the most appropriate AND most long-lasting one.
> Because what I take from the discussion is that a UNION might be
> appropriate right now but that could change in the future even for the very
> same use-case at hand.
>

I'm not sure what would cause it to change.  Do you mean if you suddenly
start selecting a much larger portion of the table?  I don't know that the
union would be particularly bad in that case, either.

I'm not saying it wouldn't be nice to fix it.  I just don't think it is
particularly likely to happen soon.  I could be wrong (especially if you
can write the code to make it happen).

Cheers,

Jeff


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-29 Thread Jeff Janes
On Thu, Sep 22, 2016 at 11:35 PM, Sven R. Kunze  wrote:

> Thanks a lot Madusudanan, Igor, Lutz and Jeff for your suggestions.
>
> What I can confirm is that the UNION ideas runs extremely fast (don't have
> access to the db right now to test the subquery idea, but will check next
> week as I travel right now). Thanks again! :)
>
>
> I was wondering: would it be possible for PostgreSQL to rewrite the query
> to generate the UNION (or subquery plan if it's also fast) on it's own?
>

I don't know what the subquery plan is, I don't see references to that in
the email chain.

I don't believe that current versions of PostgreSQL are capable of
rewriting the plan in the style of a union.  It is not just a matter of
tweaking the cost estimates, it simply never considers such a plan in the
first place given the text of your query.

Perhaps some future version of PostgreSQL could do so, but my gut feeling
is that that is not very likely.  It would take a lot of work, would risk
breaking or slowing down other things, and is probably too much of a niche
issue to attract a lot of interest.

Why not just use the union?  Are you using a framework which generates the
query automatically and you have no control over it?  Or do you just think
it is ugly or fragile for some other reason?

Perhaps moving the union from the outside to the inside would be more
suitable?  That way the select list is only specified once, and if you AND
more clauses into the WHERE condition they also only need to be specified
once.

SELECT * FROM big_table
WHERE id IN (
  SELECT big_table_id FROM table_a WHERE "table_a"."item_id" IN (...)
  UNION
  SELECT big_table_id FROM table_b WHERE "table_b"."item_id" IN (...)
);


Cheers,

Jeff


Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Mon, Sep 26, 2016 at 5:53 AM, Greg Spiegelberg 
wrote:

>
>>
>> I may need to understand autovacuum better.  My impression was it
> consulted statistics and performed vacuums one table at a time based on the
> vacuum threshold formula on  https://www.postgresql.org/
> docs/9.5/static/routine-vacuuming.html.
>

A problem is that those statistics are stored in one file (per database; it
used to be one file per cluster).  With 8 million tables, that is going to
be a pretty big file.  But the code pretty much assumes the file is going
to be pretty small, and so it has no compunction about commanding that it
be read and written, in its entirety, quite often.

Cheers,

Jeff


Re: [PERFORM] Millions of tables

2016-09-26 Thread Jeff Janes
On Sun, Sep 25, 2016 at 7:50 PM, Greg Spiegelberg 
wrote:

> Hey all,
>
> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time
> has said not to have millions of tables.  I too have long believed it until
> recently.
>
> AWS d2.8xlarge instance with 9.5 is my test rig using XFS on EBS (io1) for
> PGDATA.  Over the weekend, I created 8M tables with 16M indexes on those
> tables.  Table creation initially took 0.018031 secs, average 0.027467 and
> after tossing out outliers (qty 5) the maximum creation time found was
> 0.66139 seconds.  Total time 30 hours, 31 minutes and 8.435049 seconds.
> Tables were created by a single process.  Do note that table creation is
> done via plpgsql function as there are other housekeeping tasks necessary
> though minimal.
>
> No system tuning but here is a list of PostgreSQL knobs and switches:
> shared_buffers = 2GB
> work_mem = 48 MB
> max_stack_depth = 4 MB
> synchronous_commit = off
> effective_cache_size = 200 GB
> pg_xlog is on it's own file system
>
> There are some still obvious problems.  General DBA functions such as
> VACUUM and ANALYZE should not be done.  Each will run forever and cause
> much grief.
>

Why would the auto versions of those cause less grief than the manual
versions?


>   Backups are problematic in the traditional pg_dump and PITR space.
>

Is there a third option to those two spaces?  File-system snapshots?


> Large JOIN's by VIEW, SELECT or via table inheritance (I am abusing it in
> my test case) are no-no's.  A system or database crash could take
> potentially hours to days to recover.
>

Isn't that a show-stopper?


> There are likely other issues ahead.
>
> You may wonder, "why is Greg attempting such a thing?"  I looked at
> DynamoDB, BigTable, and Cassandra.  I like Greenplum but, let's face it,
> it's antiquated and don't get me started on "Hadoop".  I looked at many
> others and ultimately the recommended use of each vendor was to have one
> table for all data.  That overcomes the millions of tables problem, right?
>
> Problem with the "one big table" solution is I anticipate 1,200 trillion
> records.  Random access is expected and the customer expects <30ms reads
> for a single record fetch.
>

Sorry, I don't really follow.  Whether you have 1 table or millions,
eventually someone has to go get the data off the disk. Why would the
number of tables make much of a difference to that fundamental?

Also, how many tablespaces do you anticipate having?  Can you get 120
petabytes of storage all mounted to one machine?


> No data is loaded... yet  Table and index creation only.  I am interested
> in the opinions of all including tests I may perform.  If you had this
> setup, what would you capture / analyze?  I have a job running preparing
> data.  I did this on a much smaller scale (50k tables) and data load via
> function allowed close to 6,000 records/second.  The schema has been
> simplified since and last test reach just over 20,000 records/second with
> 300k tables.
>
> I'm not looking for alternatives yet but input to my test.  Takers?
>

Go through and put one row (or 8kB worth of rows) into each of the 8 million
tables.  The stats collector and the autovacuum process will start going
nuts.  Now, maybe you can deal with it.  But maybe not.  That is the first
non-obvious thing I'd look at.

Cheers,

Jeff


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Jeff Janes
On Thu, Sep 22, 2016 at 6:37 AM, Madusudanan.B.N 
wrote:

> > However, this results in an awful slow plan (requiring to scan the
> complete big_table which obviously isn't optimal)
>
> You mean to say there is a sequential scan ? An explain would be helpful.
> Are there indexes on the provided where clauses.
>
> Postgres can do a Bitmap heap scan to combine indexes, there is no need to
> fire two separate queries.
>

It can't combine bitmap scans that come from different tables.

But he can just combine the two queries into one, with a UNION.

Cheers,

Jeff


Re: [PERFORM] How to reduce IOWAIT and CPU idle time?

2016-09-10 Thread Jeff Janes
On Sat, Sep 10, 2016 at 3:49 AM, Samir Magar  wrote:

> Hello,
>
>
> My Application has normally 25 to 30 connections and it is doing lot of
> insert/update/delete operation.
> The database size is 100GB.
> iowait  is at 40% to 45 % and CPU idle time is at 45% to 50%
> TOTAL RAM = 8 GB   TOTAL CPU = 4
>
> postgresql.conf parametre:
>
> shared_buffers = 2GB
> work_mem = 100MB
> effective_cache_size = 2GB
> maintenance_work_mem = 500MB
> autovacuum = off
> wal_buffers = 64MB
>
>
> How can i reduce iowait and CPU idle time. It is slowing all the queries.
> The queries that used to take 1 sec,it is taking 12-15 seconds.
>

What changed between the 1 sec regime and the 12-15 second regime?  Just
growth in the database size?

Index-update-intensive databases will often undergo a collapse in
performance once the portion of the indexes which are being rapidly dirtied
exceeds shared_buffers + (some kernel specific factor related
to dirty_background_bytes and kin)

If you think this is the problem, you could try violating the conventional
wisdom by setting shared_buffers 80% to 90% of available RAM, rather than
20% to 25%.
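
On the 8GB box described above, that experiment would look roughly like this
in postgresql.conf (the exact number is a judgment call and a restart is
required; leave headroom for the OS and the 25-30 backends):

shared_buffers = 6500MB    # roughly 80% of the 8GB of RAM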

Cheers,

Jeff


Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Jeff Janes
On Fri, Aug 26, 2016 at 6:17 AM, Tommi K  wrote:

> Hello,
> thanks for the response. I did not get the response to my email even
> though I am subscribed to the pgsql-performance mail list. Let's hope that
> I get the next one :)
>
> Increasing work_mem did not have great impact on the performance. But I
> will try to update the PostgreSQL version to see if it speeds up things.
>
> However is there way to keep query time constant as the database size
> grows.
>

Not likely.  If the number of rows you are aggregating grows, it will take
more work to do those aggregations.

If the number of rows being aggregated doesn't grow, because all the growth
occurs outside of the measurement_time range, even then the new data will
still make it harder to keep the stuff you want cached in memory.  If you
really want more-constant query time, you could approach that by giving the
machine as little RAM as possible.  This works not by making the large
database case faster, but by making the small database case slower.  That
usually is not what people want.



> Should I use partitioning or partial indexes?
>

Partitioning the Feature and Point tables on measurement_time (or
measurement_start_time,
you are not consistent on what it is called) might be helpful.  However,
measurement_time does not exist in those tables, so you would first have to
de-normalize by introducing it into them.

More likely to be helpful would be precomputing the aggregates and storing
them in a materialized view (not available in 9.2).   Also, more RAM and
better hard-drives can't hurt.
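
A minimal sketch of the precomputed-aggregate idea, with made-up names since
the real schema isn't shown in this part of the thread (materialized views
arrive in 9.3):

CREATE MATERIALIZED VIEW daily_feature_stats AS
SELECT date_trunc('day', measurement_start_time) AS day,
       count(*)   AS n_rows,
       avg(value) AS avg_value
FROM measurement
GROUP BY 1;

-- re-run after each batch load:
REFRESH MATERIALIZED VIEW daily_feature_stats;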

Cheers,

Jeff


Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Jeff Janes
On Sat, Aug 27, 2016 at 7:13 AM, Craig James  wrote:

> On Fri, Aug 26, 2016 at 9:11 PM, Jim Nasby 
> wrote:
>
>> On 8/26/16 3:26 PM, Mike Sofen wrote:
>>
>>> Is there way to keep query time constant as the database size grows.
>>>
>>
>> No. More data == more time. Unless you find a way to break the laws of
>> physics.
>>
>
> Straight hash-table indexes (which Postgres doesn't use) have O(1) access
> time.
>

But he isn't doing single-row lookups, he is doing large aggregations.  If
you have to aggregate N rows, doing a O(1) operation on different N
occasions is still O(N).

Not that big-O is useful here anyway.  It assumes that either everything
fits in RAM (and is already there), or that nothing fits in RAM and it all
has to be fetched from disk, even the index root pages, every time it is
needed.  Tommi is not operating under an environment where the first
assumption holds, and no one operates in an environment where the second
assumption holds.

As N increases beyond available RAM, your actual time for a single look-up
is going to be a weighted average of two different constant-time
operations, one with a small constant and one with a large constant.  Big-O
notation ignores this nicety and assumes all operations are at the slower
speed, because that is what the limit of the weighted average will be as N
gets very large. But real world systems do not operate at the infinite
limit.

So his run time could easily be proportional to N^2, if he aggregates more
rows and each one of them is less likely to be a cache hit.

Cheers,

Jeff


Re: [PERFORM] Estimates on partial index

2016-08-18 Thread Jeff Janes
On Thu, Aug 18, 2016 at 11:55 AM, Victor Yegorov <vyego...@gmail.com> wrote:
> 2016-08-18 18:59 GMT+03:00 Jeff Janes <jeff.ja...@gmail.com>:
>>
>> Both plans touch the same pages.  The index scan just touches some of
>> those pages over and over again.  A large setting of
>> effective_cache_size would tell it that the page will most likely
>> still be in cache when it comes back to touch it again, meaning the
>> cost of doing so will be small, basically free.
>>
>> > and in a typical situation those are cold.
>>
>> But they won't be, because it is heating them up itself, and
>> effective_cache_size says that they stay hot for the duration of the
>> query.
>
>
> But IndexScan means, that not only index, table is also accessed.
> And although index is small get's hot quite quickly (yes, e_c_s is 96GB on
> this dedicated box),
> table is not.

Both types of scans have to touch the same set of pages.  The bitmap
hits all of the needed index pages first and memorizes the relevant
results, then hits all the needed table pages.  The regular index scan
keeps jumping back and forth from index to table. But they are the
same set of pages either way.

With a regular index scan, if the same table page is pointed to from
40 different places in the index, then it will be touched 40 different
times.  But at least 39 of those times it is going to already be in
memory.  The bitmap scan will touch the page just once and deal with
all 40 entries.


>  And this clearly adds up to the total time.

That isn't clear at all from the info you gave.  You would have to set
track_io_timing=on in order to show something like that.  And we don't
know if you ran each query once in the order shown, and posted what
you got (with one warming the cache for the other); or if you ran
each repeatedly and posted representative examples with a pre-warmed
cache.
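
If you can rerun the queries interactively (and have superuser rights, since
track_io_timing is superuser-settable), the evidence would look like this;
the query is the one from the start of the thread:

SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT la.loan_id, la.due_date, la.is_current
FROM loan_agreements la
WHERE la.is_current AND '2016-08-11' > la.due_date;

-- each node then reports "I/O Timings: read=..." next to its buffer counts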


> I am wondering, if heap page accesses are also accounted for during
> planning.

It does account for them, but perhaps not perfectly.  See "[PERFORM]
index fragmentation on insert-only table with non-unique column" for
some arguments on that which might be relevant to you.

If you can come up with a data generator which creates data that
others can use to reproduce this situation, we can then investigate it
in more detail.

Cheers,

Jeff




Re: [PERFORM] Estimates on partial index

2016-08-18 Thread Jeff Janes
On Thu, Aug 18, 2016 at 6:52 AM, Victor Yegorov  wrote:
> Greetings.
>
> I have a question on why planner chooses `IndexScan` for the following
> query:
>
> SELECT la.loan_id, la.due_date, la.is_current
>   FROM loan_agreements la WHERE la.is_current AND '2016-08-11' >
> la.due_date;
>
...
>
> Planner chooses the following plan:
>
> QUERY PLAN
>
> 
> Index Scan using idx_loan_agreemnets_loan_id_cond_is_current_true on
> loan_agreements la  (cost=0.42..16986.53 rows=226145 width=13) (actual
> time=0.054..462.394 rows=216530 loops=1)
>   Filter: ('2016-08-11'::date > due_date)
>   Rows Removed by Filter: 21304
>   Buffers: shared hit=208343 read=18399
> Planning time: 0.168 ms
> Execution time: 479.773 ms
>
> If I disable IndexScans, plan changes likes this:
>
> QUERY PLAN
>
> --
> Bitmap Heap Scan on loan_agreements la  (cost=2884.01..23974.88
> rows=226145 width=13) (actual time=38.893..200.376 rows=216530 loops=1)
>   Recheck Cond: is_current
>   Filter: ('2016-08-11'::date > due_date)
>   Rows Removed by Filter: 21304
>   Heap Blocks: exact=18117
>   Buffers: shared hit=18212 read=557
>   ->  Bitmap Index Scan on
> idx_loan_agreemnets_loan_id_cond_is_current_true  (cost=0.00..2827.47
> rows=237910 width=0) (actual time=35.166..35.166 rows=237853 loops=1)
> Buffers: shared hit=119 read=533
> Planning time: 0.171 ms
> Execution time: 214.341 ms
>
> Question is — why IndexScan over partial index is estimated less than
> BitmapHeap + BitmapIndex scan. And how can I tell Planner, that IndexScan
> over 1/3 of table is not a good thing — IndexScan is touching 10x more pages

Both plans touch the same pages.  The index scan just touches some of
those pages over and over again.  A large setting of
effective_cache_size would tell it that the page will most likely
still be in cache when it comes back to touch it again, meaning the
cost of doing so will be small, basically free.

> and in a typical situation those are cold.

But they won't be, because it is heating them up itself, and
effective_cache_size says that they stay hot for the duration of the
query.

Also, with a random_page_cost of 2.5, you are telling it that even
cold pages are not all that cold.

What are the correlations of the is_current column to the ctid order,
and of the loan_id column to the ctid order?
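
Those numbers are easy to pull out of pg_stats, using the names from the
plans above:

SELECT attname, correlation, n_distinct
FROM pg_stats
WHERE tablename = 'loan_agreements'
  AND attname IN ('is_current', 'loan_id');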

Cheers,

Jeff




Re: [PERFORM] Random slow queries

2016-06-29 Thread Jeff Janes
On Tue, Jun 28, 2016 at 6:24 PM,   wrote:
>
>
> PostgreSQL version:
> PostgreSQL 9.3.4, compiled by Visual C++ build 1600, 64-bit

The current minor version of that branch is 9.3.13, so you are 9 bug
fix releases behind.

I don't know if this matters, because I see that my first guess of
your problem was fixed in commit 4162a55c77cbb54acb4ac442e, which was
already included in 9.3.4.  (Yes, you did say you also observed the
problem in 9.5.3, but still, why intentionally run something that far
behind?)


> Things I tried:
> - Upgrading to PostgreSQL 9.5.3, compiled by Visual C++ build 1800, 64-bit
> This did not solve the problem, queries still take 122 seconds from time to
> time

Could you try 9.6beta2?

In particular, I am wondering if your problem was solved by

commit 8a7d0701814a4e293efad22091d6f6fb441bbe1c
Author: Tom Lane 
Date:   Wed Aug 26 18:18:57 2015 -0400

Speed up HeapTupleSatisfiesMVCC() by replacing the XID-in-progress test.


I am not entirely sure why this (as opposed to the previous-mentioned
4162a55c77cbb54) would fix a problem occurring during BIND, though.

Cheers,

Jeff




Re: [PERFORM] pg_xlog dir not getting swept

2016-06-29 Thread Jeff Janes
On Wed, Jun 29, 2016 at 3:00 AM, Niels Kristian Schjødt
 wrote:
> About a day ago, there seems to have been some trouble in the network of my
> database (postgresql 9.3).
>
> I’m running my db with a streaming replication setup with wall shipping.
>
> I sync wal logs to a mounted network drive using archive_command = 'rsync -a
> %p /mnt/wal_drive/wals/%f'. The archiving failed for a while during the network
> trouble, leading to my pg_xlog dir building up (590Gb). I rebooted the server, and
> the archiving command seems to succeed now - however - After about an hour
> of running, the pg_xlog drive has not decreased in size - I would have
> expect that! I can see that lot’s of files get’s synced to the
> /mnt/wal_drive/wals dir, but somehow the pg_xlog dir is not swept (yet)?
> Will this happen automatically eventually, or do I need to do something
> manually?

Successfully archived files are only removed by the checkpointer.  The
logic is quite complex and it can be very frustrating trying to
predict exactly when any given file will get removed.  You might want
to run a few manual checkpoints to see if that cleans it up.  But turn
on log_checkpoints and reload the configuration first.
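
On 9.3 that sequence is roughly the following (ALTER SYSTEM only arrives in
9.4, so edit postgresql.conf by hand):

-- postgresql.conf:  log_checkpoints = on
SELECT pg_reload_conf();   -- pick up the change without a restart
CHECKPOINT;                -- manual checkpoint; repeat a couple of times
-- then watch the server log: the "checkpoint complete" lines report how many
-- WAL files were removed or recycled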

Cheers,

Jeff




Re: [PERFORM] can't explain commit performance win7 vs linux : 8000/s vs 419/s

2016-06-25 Thread Jeff Janes
On Sat, Jun 25, 2016 at 9:19 AM, t.dalpo...@gmail.com
 wrote:
> Hi,
> I've postgres 9.5.3 installed on win7 64 bit, and ubuntu 16.04tls 64 bit,
> same SSD (evo 850 pro) , two different partitions. Laptop is 3.8Ghz.
> I've in each partition a simple database with one table called data256 with
> one column of 256 char.
> I wrote a program using libpq which:
> 1 connects to 127.0.0.1 to the server
> 2 drops and recreates the table;
> 3 executes 2000 times the exec() function with the command  "INSERT INTO
> data256 VALUES ('AA.. 250 times')"
> I want to commit after every insert of course.
> The program is the same both in win and linux; in ansi c, so it's portable.
>
> Performance:
> Win7: 8000 write/sec
> Linux: 419 write/sec
>
> I don't figure out why such a difference. Also what should I expect? Which
> one is reasonable?

The Win7 numbers seem suspiciously high to me, even for SSD.  Have you
tried holding the power button until it hard-resets the computer in
the middle of a run (preferably several runs going in parallel), and
seeing if it comes back up without corruption and contains consistent data?
And then repeating that several times?


> I compared the two postgresql.conf, they're identical (except obvious
> things), they're the default ones, I didn't touch them.

We don't know which things are obvious to you.

>
> Then, in order to test write / flush without postgres, I made another C
> program, to open a file in writing, and for 1000 times : write 256 bytes and
> flush them (using fsync in linux and FlushFileBuffers in win).
> Win7: 200 write/sec
> Linux: 100 write/sec

Rather than rolling your own program, can you run pg_test_fsync on each?
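
For reference, a minimal invocation on each platform; the test file must live
on the partition you actually want to measure:

$ pg_test_fsync -f /path/on/that/partition/pg_test_fsync.tmp -s 5

It reports operations per second for each sync method, which makes the
Windows-vs-Linux comparison far more apples-to-apples than a hand-rolled loop.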

Cheers,

Jeff




Re: [PERFORM] Can't get two index scans

2016-06-23 Thread Jeff Janes
On Wed, Jun 22, 2016 at 9:36 PM, Craig James <cja...@emolecules.com> wrote:
> On Wed, Jun 22, 2016 at 11:36 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

>> You might be able to build a multiple column index on (smiles,
>> version_id) and have it do the right thing automatically. Whether that
>> is possible, and if so how effective it will actually be, would depend
>> on the implementation details of |>|. My gut feeling is that it would
>> not work well.
>
>
> No, because it's not a normal exact-match query. The analogy would be that
> you can build a multi-column index for an '=' operation on a string, but it
> wouldn't help if you were doing an '~' or 'LIKE' operation.

That restriction only applies to BTREE indexes.  GiST and GIN indexes
work differently, and don't have that particular limitation.  They can
use the second column of the index even if the first column is not
used, or (in the case of GiST at least) the first column is used with
an operator other than equality.

The main problems I've run into with GiST indexes is that they
sometimes take absurdly long times to build; and that the
split-picking algorithm might arrive at buckets ill-suited to your
queries so that the consultation of the index "works" in the sense
that it discards most of the non-matching rows without inspecting
them, but isn't actually faster. Unfortunately, both of these problems
seem hard to predict.  You pretty much have to try it (on a full-size
data set, as scaling up from toy data sets is also hard to predict)
and see how it does.

But, JChem's cartridge is apparently not using a GiST index, which is
what my first guess was.  I can't really figure out what PostgreSQL
API it is tapping into, so whatever it is very well might not support
multi-column indexes at all.

>> You could partition your data on version_id.  Then it would keep a
>> separate smiles index on each partition, and would only consult those
>> indexes which can possibly contain (according to the CHECK
>> constraints) the version_ids of interest in the query.
>
>
> I actually struck on this solution today and it works well. Instead
> partitioning on the version_id, I added a column "p" ("partition") and used
> 20 partitions where p is a random number from 0..19. This has the advantage
> that as new compounds are added, they are distributed throughout the
> partitions, so each partition remains a 5% sample of the whole.
>
> It's pretty cool. A full-table scan of all partitions is slightly slower,
> but if I want to do a sample and limit the run time, I can query with p = 0.
>
> It also has another huge benefit for a web site: I can give the user a
> progress-bar widget by querying the partitions one-by-one and updating the
> progress in 5% increments. This is really critical for long-running queries.

That does sound pretty useful.  You could potentially get the same
benefit with the multicolumn GiST index, without needing to partition
the table.  In a vague hand-wavy way, building an index "USING GIST
(p, smiles jchem_op_class)" is like using p to automatically partition
the index so it acts like individual indexes over smiles for each
value of p.  But it is unlikely to ever be as efficient as
well-crafted explicit partitions, and once you have gone to the effort
of setting them up there would probably be no point in trying to
change over.


>> Also, if you tune your system using benzene, you will be probably
>> arrive at a place not optimal for more realistic queries.
>
>
> No, it's actually very useful. I'm not interested in optimizing typical
> queries, but rather in limiting worst-case queries. This is a public web
> site, and you never know what molecule someone will draw. In fact, it's
> quite common for visitors to draw silly molecules like benzene or methane
> that would result in a heavy load if left to run to completion.

My benefit in having a non-public web site, is that I can just walk
over to their desk and yell at the people who do things like that to
my database.

(And I promise to stop searching for methane on your web site.)

Cheers,

Jeff




Re: [PERFORM] Can't get two index scans

2016-06-22 Thread Jeff Janes
On Wed, Jun 22, 2016 at 9:03 AM, Craig James  wrote:
> I'm working with a third-party plugin that does chemistry.


Out of personal/professional curiosity, which one are you using, if
that can be disclosed?




> Notice that it doesn't use the i_version_smiles index at all, but instead
> applies the very expensive filter |>| to all 1% of the database.

You have to tell the database that |>| is very expensive, by setting
the COST of the function which it invokes.  You can get the name of
the function with:

select oprcode from pg_operator where oprname ='|>|' ;

(taking care for schema and overloading, etc.)

I would set the COST to at least 1000, probably more.
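
The follow-up step is then an ALTER FUNCTION; the function and type names
below are placeholders for whatever the query above returns:

ALTER FUNCTION matches_substructure(mol, mol) COST 1000;

The argument types have to match the real signature, and only newly planned
queries will see the new cost.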

> So instead
> of getting a 100x speedup, we only get a 3x speedup, about 30x worse that
> what is theoretically possible.
>
> The production database is about 50x larger than this test database.
>
> Maybe I misunderstand what's possible with indexes, but it seems to me that
> it could first do the pk_version index scan, and then use the results of
> that to do a limited index-scan search using the i_version_smiles index. Is
> that not possible?

I don't think it can do that.  What it can do is run each index scan
to completion as a bitmap index scan, and then AND the bitmaps
together.

You might be able to build a multiple column index on (smiles,
version_id) and have it do the right thing automatically. Whether that
is possible, and if so how effective it will actually be, would depend
on the implementation details of |>|. My gut feeling is that it would
not work well.

You could partition your data on version_id.  Then it would keep a
separate smiles index on each partition, and would only consult those
indexes which can possibly contain (according to the CHECK
constraints) the version_ids of interest in the query.

Also, if you tune your system using benzene, you will be probably
arrive at a place not optimal for more realistic queries.

Cheers,

Jeff




Re: [PERFORM] Savepoint and Releasepoint in Logs

2016-06-19 Thread Jeff Janes
On Fri, Jun 17, 2016 at 8:19 AM,   wrote:
> Hi ,
>
> I am connecting to PostgreSQL 9.4 via an ODBC driver on Windows machine from
> MS VBA application. I am facing huge performance issues while inserting data
> continuously. On analysing the logs , there were around 9 statements
> related to Save Points and Release Points.
>
>
>
> duration: 2.000 ms
>
> 2016-06-17 12:45:02 BST LOG:  statement: RELEASE _EXEC_SVP_1018CCF8
>
> 2016-06-17 12:45:02 BST LOG:  duration: 1.000 ms
>
> 2016-06-17 12:45:05 BST LOG:  statement: SAVEPOINT _EXEC_SVP_186EB5C8
>
> 2016-06-17 12:45:05 BST LOG:  duration: 0.000 ms
>
>
>
> I am guessing these statements are causing an overhead while inserting
> records in to the table.


The fact that there are 3 seconds between the release of one savepoint
and the start of the next suggests that your client, not the server, is
the dominant bottleneck.

Cheers,

Jeff




Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Jeff Janes
On Wed, Jun 15, 2016 at 6:16 AM, k...@rice.edu  wrote:
> On Wed, Jun 15, 2016 at 03:09:04PM +0200, Ivan Voras wrote:
>> On 15 June 2016 at 15:03, k...@rice.edu  wrote:
>>
>>
>> I don't suppose there's an effort in progress to make hash indexes use WAL?
>> :D
>
> Hi Ivan,
>
> Several people have looked at it but it has not made it to the top of anyone's
> to-do list.

I don't know if it is the top of his todo list, but Amit seems pretty
serious about it:

https://www.postgresql.org/message-id/caa4ek1lfzczyxloxs874ad0+s-zm60u9bwcyiuzx9mhz-kc...@mail.gmail.com

I hope to give him some help if I get a chance.

Cheers,

Jeff




Re: [PERFORM] Performance of LIKE/NOT LIKE when used in single query

2016-06-09 Thread Jeff Janes
On Tue, Jun 7, 2016 at 9:57 PM, Ed Felstein  wrote:
> Hello,
> First time poster here.  Bear with me.
> Using PostgreSQL 9.5
> I have a situation where I have a LIKE and a NOT LIKE in the same query to
> identify strings in a varchar field.  Since I am using wildcards, I have
> created a GIN index on the field in question, which makes LIKE '%%'
> searches run very fast.  The problem is the NOT LIKE phrases, which (as
> would be expected) force a sequential scan.  Being that we're talking about
> millions of records, this is not desirable.
> Here's the question...
> Is there a way, using a single query, to emulate the process of running the
> LIKE part first, then running the NOT LIKE just on those results?

Just do it.  In my hands, the planner is smart enough to figure it out
for itself.

explain analyze select * from stuff where synonym like '%BAT%' and
synonym not like '%col not like%' ;

   QUERY PLAN
-
 Bitmap Heap Scan on stuff  (cost=16.10..63.08 rows=13 width=14)
(actual time=9.465..10.642 rows=23 loops=1)
   Recheck Cond: (synonym ~~ '%BAT%'::text)
   Rows Removed by Index Recheck: 76
   Filter: (synonym !~~ '%col not like%'::text)
   Heap Blocks: exact=57
   ->  Bitmap Index Scan on integrity_synonym_synonym_idx
(cost=0.00..16.10 rows=13 width=0) (actual time=8.847..8.847 rows=99
loops=1)
 Index Cond: (synonym ~~ '%BAT%'::text)
 Planning time: 18.261 ms
 Execution time: 10.932 ms


So it is using the index for the positive match, and filtering those
results for the negative match, just as you wanted.

Cheers,

Jeff




Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-05 Thread Jeff Janes
On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane  wrote:
> Claudio Freire  writes:
>> So correlated index scans look extra favourable vs bitmap index scans
>> because bitmap heap scans consider random page costs sans correlation
>> effects (even though correlation applies to bitmap heap scans as
>> well).
>
> Really?  How?  The index ordering has nothing to do with the order in
> which heap tuples will be visited.


It is not the order itself, but the density.

If the index is read in a range scan (as opposed to =ANY scan), and
the index lead column is correlated with the table ordering, then the
parts of the table that need to be visited will be much denser than if
there were no correlation.  But Claudio is saying that this is not
being accounted for.


Cheers,

Jeff




Re: [PERFORM] similarity and operator '%'

2016-05-30 Thread Jeff Janes
On Mon, May 30, 2016 at 10:53 AM, Volker Boehm  wrote:

> The reason for using the similarity function in place of the '%'-operator is
> that I want to use different similarity values in one query:
>
> select name, street, zip, city
> from addresses
> where name % $1
> and street % $2
> and (zip % $3 or city % $4)
> or similarity(name, $1) > 0.8

I think the best you can do through query writing is to use the
most-lenient setting in all places, and then refilter to get the less
lenient cutoff:

 select name, street, zip, city
 from addresses
 where name % $1
 and street % $2
 and (zip % $3 or city % $4)
 or (name % $1 and similarity(name, $1) > 0.8)

If it were really important to me to get maximum performance, what I
would do is alter/fork the pg_trgm extension so that it had another
operator, say %%%, with a hard-coded cutoff which paid no attention to
the set_limit().  I'm not really sure how the planner would deal with
that, though.
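
For reference, this is the single knob that every % in a session consults,
which is exactly why one query can't mix thresholds today:

SELECT show_limit();      -- pg_trgm's current similarity threshold (default 0.3)
SELECT set_limit(0.5);    -- raises it for every later use of % in this session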

Cheers,

Jeff




Re: [PERFORM] Re: Planner chooses slow index heap scan despite accurate row estimates

2016-05-30 Thread Jeff Janes
On Sat, May 28, 2016 at 5:38 PM, Jake Magner  wrote:
> Tom Lane-2 wrote
>> Jake Magner 
>
>> jakemagner90@
>
>>  writes:
>>> I tried without doing an INSERT at all, just running the SELECT queries
>>> and
>>> the result is the same. Nested loop is chosen but is much slower.
>>
>> FWIW, I just noticed that the comparisons you're using are plain equality
>> of the arrays.  While a GIN array index supports that, it's not exactly
>> its strong suit: the sort of questions that index type supports well are
>> more like "which arrays contain value X?".  I wonder if it'd be worth
>> creating btree indexes on the array column.
>
> I added btree indexes and now the nested loop uses those and is a bit faster
> than the hash join. So the planner just misestimates the cost of doing the
> equality comparisons?

I wonder how it would do in 9.4?  Either in them actually being
faster, or the planner doing
a better job of realizing they won't be fast.

> I'd prefer not to add more indexes, the hash join
> performance is fast enough if it would just choose that but I'm reluctant to
> turn off nested loops in case the table gets a lot bigger.

A large hash join just needs to divide it up into batches.  It should
still be faster than the nested loop (as currently implemented),
until you run out of temp space.

But, you already have a solution in hand.  I agree you shouldn't add
more indexes without reason, but you do have a reason.

Cheers,

Jeff




Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Jeff Janes
On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby  wrote:
> Summary: Non-unique btree indices are returning CTIDs for rows with same
> value of indexed column not in logical order, imposing a high performance
> penalty.
>
> Running PG 9.5.3 now, we have a time-based partitions of append-only tables
> with data loaded from other sources.  The tables are partitioned by time, and
> timestamp column has an non-unique, not-null btree index.
>
> The child tables are each ~75GB and expected to keep growing.  For a child
> table with a week's worth of data:
> relpages  | 11255802
> reltuples | 5.90502e+07
>
> The data is loaded shortly after it's available, so have high correlation in
> pg_statistic:
> [pryzbyj@viaero ~]$ psql ts -c "SELECT tablename, correlation, n_distinct 
> FROM pg_stats s JOIN pg_class c ON (c.relname=s.tablename) WHERE tablename 
> LIKE 'cdrs_huawei_pgwrecord%' AND attname='recordopeningtime' ORDER BY 1" 
> |head
> tablename | correlation | n_distinct
> --+-+
>  cdrs_huawei_pgwrecord|0.97 | 102892
>  cdrs_huawei_pgwrecord_2016_02_15 |0.999658 |  96145
>  cdrs_huawei_pgwrecord_2016_02_22 |0.43 |  91916
>  cdrs_huawei_pgwrecord_2016_02_29 |0.997219 |  50341
>  cdrs_huawei_pgwrecord_2016_03_01 |0.47 |  97485
>
> But note the non-uniqueness of the index column:
> ts=# SELECT recordopeningtime, COUNT(1) FROM cdrs_huawei_pgwrecord WHERE 
> recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22' GROUP BY 1 
> ORDER BY 2 DESC;
>   recordopeningtime  | count
> -+---
>  2016-05-21 12:17:29 |   176
>  2016-05-21 12:17:25 |   171
>  2016-05-21 13:11:33 |   170
>  2016-05-21 10:20:02 |   169
>  2016-05-21 11:30:02 |   167
> [...]

That is not that much duplication.  You aren't going to have dozens or
hundreds of leaf pages all with equal values.   (and you only showed
the most highly duplicated ones, presumably the average is much less)


> We have an daily analytic query which processes the previous day's data.  For
> new child tables, with only 1 days data loaded, this runs in ~30min, and for
> child tables with an entire week's worth of data loaded, takes several hours
> (even though both queries process the same amount of data).

For an append only table, why would the first day of a new partition
be any less fragmented than that same day would be a week from now?
Are you sure it isn't just that your week-old data has all been aged
out of the cache?


> First, I found I was able to get 30-50min query results on full week's table 
> by
> prefering a seq scan to an index scan.  The row estimates seemed fine, and the
> only condition is the timestamp, so the planner's use of index scan is as
> expected.

Can you show us the query?  I would expect a bitmap scan of the index
(which would do what you want, but even more so), instead.

>
> AFAICT what's happening is that the index scan was returning pages
> nonsequentially.  strace-ing the backend showed alternating lseek()s and
> read()s, with the offsets not consistently increasing (nor consistently
> decreasing):
> % sudo strace -p 25588 2>&1 |grep -m9 'lseek(773'
> lseek(773, 1059766272, SEEK_SET)= 1059766272
> lseek(773, 824926208, SEEK_SET) = 824926208
> lseek(773, 990027776, SEEK_SET) = 990027776
> lseek(773, 990330880, SEEK_SET) = 990330880
> lseek(773, 1038942208, SEEK_SET)= 1038942208
> lseek(773, 1059856384, SEEK_SET)= 1059856384
> lseek(773, 977305600, SEEK_SET) = 977305600
> lseek(773, 990347264, SEEK_SET) = 990347264
> lseek(773, 871096320, SEEK_SET) = 871096320
>
> .. and consecutive read()s being rare:
> read(802, "g"..., 8192) = 8192
> lseek(802, 918003712, SEEK_SET) = 918003712
> read(802, "c"..., 8192) = 8192
> lseek(802, 859136000, SEEK_SET) = 859136000
> read(802, "a"..., 8192) = 8192
> lseek(802, 919601152, SEEK_SET) = 919601152
> read(802, "d"..., 8192) = 8192
> lseek(802, 905101312, SEEK_SET) = 905101312
> read(802, "c"..., 8192) = 8192
> lseek(801, 507863040, SEEK_SET) = 507863040
> read(801, "p"..., 8192) = 8192
> lseek(802, 914235392, SEEK_SET) = 914235392
> read(802, "c"..., 8192) = 8192


Which of those are the table, and which the index?

Something doesn't add up here.  How could an index of an append-only
table possibly become that fragmented, when the highest amount of key
duplication is about 170?

Cheers,

Jeff




Re: [PERFORM] LIKE pattern

2016-05-12 Thread Jeff Janes
On Thu, May 12, 2016 at 8:13 AM, Robert Klemme
 wrote:
> On Mon, May 9, 2016 at 11:41 PM, SoDupuDupu wrote:
>> Владимир-3 wrote
>>> It seems my quite complex query runs 10 times faster on "some_column
>>> LIKE '%test_1' " vs "some_column LIKE 'test_1' "
>>> So I just add "%"  to the pattern...
>>
>> Keep in mind then LIKE '%test_1' and LIKE 'test_1' are not equivalent, using
>> the % as a prefix to the argument means that the scan only has to confirm
>> that the value ends in 'test_1' where forgoing the % entirely means that you
>> are essentially saying some_column='test_1'.
>
> Yes, but wouldn't the latter test be more efficient usually since it
> tests against a prefix - at least with a regular index?

In theory.  But the planner is imperfect, and they will have different
estimated selectivities which could easily tip the planner into making
a poor choice for the more selective case.  Without seeing the plans,
it is hard to say much more.

Cheers,

Jeff




Re: [PERFORM] Performant queries on table with many boolean columns

2016-04-21 Thread Jeff Janes
On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev  wrote:
>>
>> The obvious thing seems to make a table with ~100 columns, with 1 column
>> for each boolean property. Though, what type of indexing strategy would
>> one use on that table? Doesn't make sense to do BTREE. Is there a better
>> way to structure it?
>>
> looks like a deal for contrib/bloom index in upcoming 9.6 release

Not without doing a custom compilation with an increased INDEX_MAX_KEYS:

ERROR:  cannot use more than 32 columns in an index

But even so, I'm skeptical this would do better than a full scan.  It
would be interesting to test that.

Cheers,

Jeff




Re: [PERFORM] Performant queries on table with many boolean columns

2016-04-21 Thread Jeff Janes
On Wed, Apr 20, 2016 at 11:41 AM, Rob Imig  wrote:
> Hey all,
>
> New to the lists so please let me know if this isn't the right place for
> this question.
>
> I am trying to understand how to structure a table to allow for optimal
> performance on retrieval. The data will not change frequently so you can
> basically think of it as static and only concerned about optimizing reads
> from basic SELECT...WHERE queries.
>
> The data:
>
> ~20 million records
> Each record has 1 id and ~100 boolean properties
> Each boolean property has ~85% of the records as true
>
>
> The retrieval will always be something like "SELECT id FROM  WHERE
> .
>
>  will be some arbitrary set of the ~100 boolean columns and you
> want the ids that match all of the conditions (true for each boolean
> column). Example:
> WHERE prop1 AND prop18 AND prop24


Is 3 a typical number of conditions to have?

85%^3 is 61.4%, so you are fetching most of the table.  At that point,
I think I would give up on indexes and just expect to do a full table
scan each time.  Which means a single-column
bit-string data type might be the way to go, although the construction
of the queries would then be more cumbersome, especially if you build
them by hand; a sketch follows below.
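
A minimal sketch of what that could look like (hypothetical schema; bit 0
stands in for prop1, bit 17 for prop18, bit 23 for prop24):

CREATE TABLE record_flags (
    id    bigint PRIMARY KEY,
    props bit(100)          -- one bit per former boolean column
);

-- "WHERE prop1 AND prop18 AND prop24" becomes three bit tests:
SELECT id
FROM record_flags
WHERE get_bit(props, 0)  = 1
  AND get_bit(props, 17) = 1
  AND get_bit(props, 23) = 1;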

I think the only way to know for sure is to write a few scripts to benchmark it.

Cheers,

Jeff




Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-04-02 Thread Jeff Janes
On Tue, Mar 22, 2016 at 9:41 AM, Oleg Bartunov <obartu...@gmail.com> wrote:
>
>
> On Sat, Mar 19, 2016 at 5:44 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>
>>
>> I don't see why it would not be possible to create a new execution node
>> type that does an index scan to obtain order (or just to satisfy an equality
>> or range expression), and takes a bitmap (as produced by the FTS/GIN) to
>> apply as a filter.  But, I don't know of anyone planning on doing that.
>
>
> Please, find bitmap filtering patch, which we developed several months ago,
> but failed  to find good use case :( Teodor is here now, so he could answer
> the questions.

I can't find any benefit because I can't get the new node to ever execute.

I set up this:

create table foo as select md5(random()::text), random() as y from
generate_series(1,1000);
create index on foo using gin (md5 gin_trgm_ops);
create index on foo (y);
vacuum ANALYZE foo ;

Then when I run this:

explain (analyze,buffers) select y from foo where md5 like '%abcde%'
order by y limit 1

The function "cost_filtered_index(newpath)" never fires.  So the
planner is never even considering this feature.

It seems to be getting short-circuited here:

if (ipath->indexorderbys == NIL && ipath->indexorderbycols == NIL)
continue;



I don't know enough about the planner to know where to start on this.

Cheers,

Jeff




Re: [PERFORM] Query not using Index

2016-03-27 Thread Jeff Janes
On Sun, Mar 27, 2016 at 9:12 AM, Wei Shan  wrote:
> Hi Andreas,
>
> The tablespace is not on SSD although I intend to do it within the next
> week. I actually tried reducing the random_page_cost to 0.2 but it doesn't
> help.

Setting random_page_cost to less than seq_page_cost is nonsensical.

You could try to increase cpu_tuple_cost to 0.015 or 0.02


Cheers,

Jeff




Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Jeff Janes
On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh 
wrote:

> På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane :
>
> Andreas Joseph Krogh  writes:
> > 1. Why isnt' folder_id part of the index-cond?
>
> Because a GIN index is useless for sorting.
>
> > 2. Is there a way to make it use the (same) index to sort by
> > received_timestamp?
>
> No.
>
> > 3. Using a GIN-index, is there a way to use the index at all for sorting?
>
> No.
>
> > 4. It doesn't seem like ts_rank uses the index for sorting either.
>
> Same reason.
>
> regards, tom lane
>
>
> So it's basically impossible to use FTS/GIN with sorting on large datasets?
> Are there any plans to improve this situation?
>

I don't see why it would not be possible to create a new execution node
type that does an index scan to obtain order (or just to satisfy an
equality or range expression), and takes a bitmap (as produced by the
FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on
doing that.

Cheers,

Jeff


Re: [PERFORM] Why Postgres use a little memory on Windows.

2016-02-20 Thread Jeff Janes
On Sat, Feb 20, 2016 at 8:46 AM, tuanhoanganh  wrote:
> Hello
>
> I have a bad query on PostgreSQL 9.0.23 - 64bit - Windows 2012 R2 - 48GB Ram

9.0 is no longer supported.  You should work toward upgrading to a
newer version.  It might not solve this problem, but it would give you
better tools for diagnosing the problem.  Which is a pretty good step
toward solving it.

> When I check taskmanager, I found postgres process is user 4-5MB

Other people have explained the details of how the query is being run
and why it is being run that way.  But I would like to take a step
back from that, and tell you that the reason that PostgreSQL is not
using more memory, is that it doesn't think that using more memory
would help.

Cheers,

Jeff




Re: [PERFORM] Running lots of inserts from selects on 9.4.5

2016-02-09 Thread Jeff Janes
On Tue, Feb 9, 2016 at 4:09 PM, Dan Langille  wrote:
> I have a wee database server which regularly tries to insert 1.5 million or
> even 15 million new rows into a 400 million row table.  Sometimes these
> inserts take hours.
>
> The actual query to produces the join is fast.  It's the insert which is
> slow.
>
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
> DeltaSeq)
>   SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
> Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
> batch_testing.DeltaSeq
> FROM batch_testing JOIN Path ON (batch_testing.Path = Path.Path)
>JOIN Filename ON (batch_testing.Name =
> Filename.Name);
>
> This is part of the plan: http://img.ly/images/9374145/full  created via
> http://tatiyants.com/pev/#/plans
>
> This gist contains postgresql.conf, zfs settings, slog, disk partitions.
>
>https://gist.github.com/dlangille/1a8c8cc62fa13b9f

The table you are inserting into has 7 indexes, all of which have to
be maintained.  The index on the sequence column should be efficient
to maintain.  But for the rest, if the inserted rows are not naturally
ordered by any of the indexed columns then it would end up reading 6
random scattered leaf pages in order to insert row pointers.  If none of
those pages are in memory, that is going to be slow to read off from
hdd in single file.  Also, you are going to dirty all of those scattered
pages, and they will be slow to write back to hdd because there
probably won't be much opportunity for write-combining.

Do you really need all of those indexes?

Won't the index on (jobid, pathid, filenameid) service any query that
(jobid) does, so you can get rid of the latter?

And unless you have range queries on fileindex, like "where jobid = 12
and fileindex between 4 and 24" then you should be able to replace
(jobid, fileindex) with (fileindex,jobid) and then get rid of the
stand-alone index on (fileindex).

If you add an "order by" to the select statement which order by the
fields of one of the remaining indexes, than you could make the
maintenance of that index become much cheaper.
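
A sketch of that, sorted to match an index on (JobId, PathId, FilenameId)
(adjust the ORDER BY to whichever index you decide to keep):

    INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
      SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
             Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
             batch_testing.DeltaSeq
        FROM batch_testing
        JOIN Path ON batch_testing.Path = Path.Path
        JOIN Filename ON batch_testing.Name = Filename.Name
       ORDER BY batch_testing.JobId, Path.PathId, Filename.FilenameId;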

Could you move the indexes for this table to SSD?

SSD is probably wasted on your WAL.  If your main concern is bulk
insertions, then WAL is going to written sequentially with few fsyncs.
That is ideal for HDD.  Even if you also have smaller transactions,
WAL is still sequentially written as long as you have a non-volatile
cache on your RAID controller which can absorb fsyncs efficiently.

Cheers,

Jeff




Re: [PERFORM] Bitmap and-ing between btree and gin?

2016-02-08 Thread Jeff Janes
On Thu, Feb 4, 2016 at 9:19 AM, Jordi  wrote:

The custom here is to respond in line, not to top-post.  Thanks.

>
> So basically you're saying it's hard to do sorting in any way when a gin
> index is involved? Neither with a complete multi-column btree_gin index
> because it doesn't support sorting per definition, nor with a seperate gin
> and btree because there would be an extra post-sorting step involved over
> the FULL resultset (because of the LIMIT).

In principle there is no reason (that I can think of) that a normal
btree index range scan couldn't accept a bitmap as an optional input,
and then use that as a filter which would allow it to walk the index
in order while throwing out tuples that can't match the other
conditions.  You are not the first person who would benefit from such
a feature.  But it would certainly not be trivial to implement.  It is
not on anyone's to-do list as far as I know.

From your earlier email:

> BUT: when I remove the ORDER BY statement, the query runs really fast. It 
> uses the 2 indexes seperately and bitmap-ands them together, resulting in a 
> fast executing query.

When you removed the ORDER BY, did you also remove the LIMIT?  If you
removed the ORDER BY and kept the LIMIT, that is pretty much a
meaningless comparison.  You are asking a much easier question at that
point.

> Then would you have any hint on how to implement pagination when doing full
> text search?
> Cause in theory, if I gave it a id>100 LIMIT 100, it might just as well
> return me results 150 to 250, instead of 100 to 200...

Can you use a method that maintains state (cursor with fetching, or
temporary storage) so that it doesn't have to recalculate the query
for each page?
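
As a sketch of the cursor approach (the table, column, and tsquery here
are made up, not taken from your schema):

    BEGIN;
    DECLARE search_cur CURSOR FOR
        SELECT id
          FROM documents
         WHERE fts @@ to_tsquery('english', 'something')
         ORDER BY id;
    FETCH 100 FROM search_cur;   -- page 1
    FETCH 100 FROM search_cur;   -- page 2 continues where page 1 stopped
    CLOSE search_cur;
    COMMIT;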

Cheers,

Jeff




Re: [PERFORM] gin performance issue.

2016-02-08 Thread Jeff Janes
On Mon, Feb 8, 2016 at 2:21 AM, Marc Mamin  wrote:
>
> - auto vacuum will not run as these are insert only tables
> - according to this post, auto analyze would also do the job:
>   
> http://postgresql.nabble.com/Performance-problem-with-gin-index-td5867870.html
>   It seems that this information is missing in the doc
>
>   but it sadly neither triggers in our case as we have manual analyzes called 
> during the dataprocesssing just following the imports.
>   Manual vacuum is just too expensive here.
>
>   Hence disabling fast update seems to be our only option.

Does disabling fast update cause problems?  I always start with
fastupdate disabled, and only turn it on if I have a demonstrable
problem with it being off.
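
For example (the index, table, and column names here are made up):

    -- for an existing index:
    ALTER INDEX my_gin_idx SET (fastupdate = off);
    -- or when creating one from scratch:
    -- CREATE INDEX my_gin_idx ON my_table USING gin (my_col) WITH (fastupdate = off);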

I would think "off" is likely to be better for you.  You say each
distinct key only appears in 2.7 rows.  So you won't get much benefit
from aggregating together all the new rows for each key before
updating the index for that key, as there is very little to aggregate.

Also, you say the inserts come in bulk.  It is generally a good thing
to slow down bulk operations by making them clean up their own messes,
for the sake of everyone else.


>   I hope this problem will help push up the 9.5 upgrade on our todo list :)
>
>   Ideally, we would then like to flush the pending list inconditionally after 
> the imports.
>   I guess we could achieve something approaching while modifying the analyze 
> scale factor  and gin_pending_list_limit
>   before/after the (bulk) imports, but having the possibility to flush it per 
> SQL would be better.
>   Is this a reasonable feature wish?

That feature has already been committed for the 9.6 branch.
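
In 9.6 that should look something like this (going from memory of the new
function's name):

    SELECT gin_clean_pending_list('my_gin_idx'::regclass);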

>   And a last question: how does the index update work with bulk (COPY) 
> inserts:
>   without pending list: is it like a per row trigger or will the index be 
> cared of afterwards ?

Done for each row.

>   with small pending lists : is there a concurrency problem, or can both 
> tasks cleanly work in parallel ?

I don't understand the question.  What are the two tasks you are
referring to? Do you have multiple COPY running at the same time in
different processes?

Cheers,

Jeff




Re: [PERFORM] insert performance

2016-01-11 Thread Jeff Janes
On Sat, Jan 9, 2016 at 9:57 PM, Jinhua Luo  wrote:
>
> To make a clean test env, I clone a new table, removing the indexes (keeping
> the primary key) and triggers, and use pgbench to test insert statement
> purely.

Can you share the pgbench command line, and the sql file you feed to
it (and whatever is needed to set up the schema)?


Thanks,

Jeff




Re: [PERFORM] Advise needed for a join query with a where conditional

2015-12-12 Thread Jeff Janes
On Thu, Dec 10, 2015 at 8:38 PM, ankur_adwyze  wrote:
> Hi Folks,
>
> I am a newbie to this mailing list. Tried searching the forum but didn't
> find something similar to the problem I am facing.
>
> Background:
> I have a Rails app with Postgres db. For certain reports, I have to join
> multiple tables. However, certain join queries are dog slow and I am
> wondering if I am missing any index.

Are you vacuuming and analyzing your database appropriately?  What
non-default config settings do you have?

Something certainly seems suspicious about custom_tags_fb_ad_groups
and its index.


->  Index Only Scan using custom_tags_fb_ad_groups_index on
custom_tags_fb_ad_groups custom_tags_fb_ad_groups_1
(cost=0.42..1728.30 rows=1 width=8) (actual time=1.352..3.815 rows=1
loops=32934)
 Index Cond: (fb_ad_group_id = fb_ad_group_reports.fb_ad_group_id)
Heap Fetches: 32934

Doing a single-value look up into an index should have an estimated
cost of around 9, unless you did something screwy with your cost_*
parameter settings.  Why does it think it is 1728.30 instead?  Is the
index insanely bloated?  And it actually is slow to do those look ups,
which is where almost all of your time is going.

And, why isn't it just using a hash join on that table, since you are
reading so much of it?

I'd do a VACUUM FULL of that table, then a regular VACUUM on it (or
the entire database), then ANALYZE it (or your entire database), and
see if that took care of the problem.
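
That is, for the table in question:

    VACUUM FULL custom_tags_fb_ad_groups;
    VACUUM custom_tags_fb_ad_groups;
    ANALYZE custom_tags_fb_ad_groups;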




Re: [PERFORM] Index scan cost calculation

2015-11-30 Thread Jeff Janes
On Mon, Nov 30, 2015 at 6:03 AM, Glyn Astill  wrote:
>
>
>
>
> If I create the index show+best+block+row+seat then the planner appears to 
> favour that, and all is well.  Despite the startup cost estimate being the 
> same, and total cost being 0.01 higher.  This is something I fail to 
> understand fully.

I think index scans that are estimated to be within 1% of each
other are usually considered to be identical.  Which one gets chosen then
depends on what order they are considered in, which I think is an
implementation-dependent detail. Usually it is the most recently
created one, which would explain why you got the plan switch with the
new index.


> Tom stated the index choice is due to a selectivity underestimate.  I think 
> this may be because there is actually a correlation between "best"+"block" 
> and "type", but from Toms reply my understanding was that total selectivity 
> for the query is calculated as the product of the individual selectivities in 
> the where clause.

I think the problem here is not with the total query selectivity estimate,
but rather with the selectivity estimates of the indexes.

It thinks the combination of (show, type, best, block)  is enough to
get down to a single row.  One index adds "flag" to that (which is not
useful to the query) and the other adds "row" to that, which is useful
but the planner doesn't think it is because once you are down to a
single tuple additional selectivity doesn't help.


> Are particular equality clauses ever excluded from the calculation as a 
> result of available indexes or otherwise?

Clauses that can't be used in an "indexable" way are excluded from the
index selectivity, but not from the total query selectivity.

> Or is it just likely that the selection of the new index is just by chance?

Bingo.

Cheers,

Jeff




Re: [PERFORM] Index scan cost calculation

2015-11-28 Thread Jeff Janes
On Thu, Nov 26, 2015 at 8:11 AM, Glyn Astill  wrote:
> Hi All,
>
> Using pg 9.4.5 I'm looking at a table set up by a 3rd party application and 
> trying to figure out why a particular index is being chosen over another for 
> updates/deletes.
>
> From what I can see the reason is that plans using either index have the same 
> exactly the same cost.  So rather I'm asking if there's something glaringly 
> obvious I'm missing, or is there anything I can to to get better estimates.
>
> The table is as follows and has  ~ 50M rows, ~ 4.5GB in size:
>
> CREATE TABLE tickets.seats
> (
>   recnum serial NOT NULL,
>   show numeric(8,0) NOT NULL,
>   type numeric(4,0) NOT NULL,
>   block character varying(8) NOT NULL,
>   "row" numeric(14,0) NOT NULL,
>   seat numeric(8,0) NOT NULL,
>   flag character varying(15) NOT NULL,
>   transno numeric(8,0) NOT NULL,
>   best numeric(4,0) NOT NULL,
>   "user" character varying(15) NOT NULL,
>   "time" numeric(10,0) NOT NULL,
>   date date NOT NULL,
>   date_reserved timestamp NOT NULL
> );
>
> Indexes:
>   "seats_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat)   
>// (1094 MB)
>   "seats_index00" UNIQUE, btree (recnum)  
> // (2423 MB)
>   "seats_index02" UNIQUE, btree (show, type, best, block, flag, "row", seat, 
> recnum)  // (2908 MB)


Why does the index seats_index02 exist in the first place?  It looks
like an index designed for the benefit of a single query.  In which
case, could the flag column be moved up front?  That should prevent it
from looking falsely enticing.

A column named "flag" is not usually the type of thing you expect to
see a range query on, so moving it leftward in the index should not be
a problem.
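
Something like this, as a sketch of the reordering (built alongside the
old index so the dependent query can be checked before dropping anything):

    CREATE UNIQUE INDEX seats_index02_new ON tickets.seats
        (flag, show, type, best, block, "row", seat, recnum);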

Cheers,

Jeff




Re: [PERFORM] HASH

2015-11-05 Thread Jeff Janes
On Thu, Nov 5, 2015 at 1:11 AM, Artem Tomyuk  wrote:
> Hi all.
>
> Is the speed of hash operations stands on the performance of CPU?

Yes, but the variation is probably not as much as the raw timing in
your example indicates.

> Below you can see part from output of explain analyze command
>
> Intel(R) Xeon(R) CPU   E7520  @ 1.87GHz
>
> "  ->  Hash  (cost=337389.43..337389.43 rows=3224443 width=34)
> (actual time=15046.382..15046.382 rows=3225191 loops=1)"
> "Buckets: 524288  Batches: 1  Memory Usage: 207874kB"

A lot of that time was probably spent reading the data off of disk so
that it could hash it.

You should turn track_io_timing on, run "explain (analyze, buffers)
..." and then show the entire explain output, or at least also show
the entries downstream of the Hash node.

Cheers,

Jeff




Re: [PERFORM] GIN index always doing Re-check condition, postgres 9.1

2015-11-02 Thread Jeff Janes
On Mon, Nov 2, 2015 at 12:19 AM, Andrey Osenenko
 wrote:
>
> It also looks like if there was a way to create a table with just primary
> key and add an index to it that indexes data from another table, it would
> work much, much faster since there would be very little to read from disk
> after index lookup. But looks like there isn't.

There is a way to do this, but it is pretty gross.

You can define function which takes the primary key as input and
returns the data to index.  Mark the function as immutable, even
though it isn't.  Something like:

CREATE OR REPLACE FUNCTION public.filler_by_aid(integer)
 RETURNS text
 LANGUAGE sql
 IMMUTABLE
AS $function$ select filler::text from pgbench_accounts where aid=$1 $function$

Then create a table which has just the primary key, and create a
functional index on that table

create table foobar as select aid from pgbench_accounts;
create index on foobar (filler_by_aid(aid));

Now you can query the skinny table by reference to the data in the wide table:

explain analyze select count(*) from foobar where filler_by_aid(aid)='zebra';

Since you fibbed to PostgreSQL about the function's immutability, it is
pretty easy to get a corrupt index here.  Every time the parent is
updated, you have to be sure to delete and reinsert the primary key in
the corresponding skinny table, otherwise it will not reflect the
updated value.

What you gain in the skinny table you could very well lose with the
triggers needed to maintain it.  Not to mention the fragility.


It would be simpler if you could just force the wide data to always be
toasted, even if it is not wide enough to trigger the default toast
threshold.  You could get a skinnier table (although not quite as
skinny as one with only a single column), without having to do
unsupported hacks.  (I am assuming here, without real evidence other
than intuition, that most of your news articles are in fact coming in
under the toast threshold).


> So am I correct in assumption that as the amount of rows grows, query times
> for rows that are not in memory (and considering how many of them there are,
> most won't be) will grow linearly?

Yep.

What you really need are index only scans.  But those are not
supported by gin indexes, and given the gin index structure it seems
unlikely they will ever support index-only scans, at least not in a
way that would help you.

What are you going to do with these 23,000 primary keys once you get
them, anyway?  Perhaps you can push that analysis into the database
and gain some efficiencies there.

Or you could change your data structure.  If all you are doing is
searching for one tsvector token at a time, you could unnest ts_vector
and store it in a table like (ts_token text, id_iu bigint).  Then
build a regular btree index on (ts_token, id_iu) and get index-only
scans (once you upgrade from 9.1 to something newer)
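
A sketch of that restructuring, assuming a newer server where
tsvector_to_array() is available (the source table and column names here
are invented, since I don't know your schema):

    CREATE TABLE article_tokens AS
        SELECT tok AS ts_token, a.id_iu
          FROM articles a,
               unnest(tsvector_to_array(to_tsvector('russian', a.plaintext))) AS tok;

    CREATE INDEX ON article_tokens (ts_token, id_iu);

    -- a single-token search (on the stemmed lexeme) then becomes an
    -- index-only scan candidate:
    SELECT id_iu FROM article_tokens WHERE ts_token = 'москв';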

Cheers,

Jeff




Re: [PERFORM] GIN index always doing Re-check condition, postgres 9.1

2015-11-01 Thread Jeff Janes
On Sun, Nov 1, 2015 at 10:52 PM, Andrey Osenenko
 wrote:
> I have a table with roughly 300 000 rows, each row containing two large
> strings - one is news article HTML, the other is news article plaintext. The
> table has a bigint primary key.
>
> A GIN index is used to do fulltext search on the plaintext part. All I want
> to retrieve when I do fulltext search is values of primary key column.
>
> With a popular word, the amount of results from fulltext search query can be
> pretty high - a query can return 23 000 rows and some can more, and will
> return more as the database continues to grow.
>
> The problem I have is that postgres always does Re-check condition step for
> my request. That query with 23k rows takes 20 seconds to execute, and
> EXPLAIN shows that almost all of that time is spent
> re-checking condition.

Explain does not address the issue of how much time was spent doing
rechecks.  You are misinterpreting something.

> The second time I run the same query, I get results
> immediately.

That suggests the time is spent reading data from disk the first time,
not spent doing rechecks.  Rechecks do not get faster by repeated
execution, except indirectly to the extent the data has already been
pulled into memory.  But other things get faster due to that, as well.

Now those are not mutually exclusive, as doing a recheck might lead to
reading toast tables that don't need to get read at all in the absence
of rechecks.  So rechecks can lead to IO problems.  But there is no
evidence that that is the case for you.

>
> 1st run:
> Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8)
> (actual time=135.727..19499.667 rows=23132 loops=1)
>   Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@
> '''москв'''::tsquery)

This tells you what condition will be applied to the recheck, in case
a recheck is needed due to bitmap memory overflow.  It doesn't tell
how many times, if any, that was actually done, or how much time was
spent doing it.

As far as I know, there is no way to distinguish a "lossy index"
recheck from a "lossy bitmap" recheck in version 9.1.

>   Buffers: shared hit=115 read=13000

That you needed only 13115 blocks to deliver 23069 rows tells me that there
is little if any recheck-driven toast table access going on.  That the
second execution was very fast tells me that there is little
rechecking at all going on, because actual rechecking is CPU
intensive.

I don't think your problem has anything to do with rechecking.  You
simply have too much data that is not in memory.  You need more
memory, or some way to keep your memory pinned with what you need.  If
you are on a RAID, you could also increase effective_io_concurrency,
which lets the bitmap scan prefetch table blocks.
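
For example, if the table sits on a multi-spindle array:

    SET effective_io_concurrency = 8;   -- roughly the number of drives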

Cheers,

Jeff




Re: [PERFORM] GroupAggregate and Integer Arrays

2015-10-24 Thread Jeff Janes
On Fri, Oct 23, 2015 at 9:26 AM, David Osborne  wrote:
> Ah yes sorry:
>
> I think these cover it...
>
> CREATE AGGREGATE sum (
>   sfunc = array_add,
>   basetype = INTEGER[],
>   stype = INTEGER[],
>   initcond = '{}'
>);
>
> CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS $$
>-- Add two arrays.
>select
>   ARRAY (
>  SELECT coalesce($1[i],0) + coalesce($2[i],0)
>  FROM (
> select generate_series(least(array_lower($1, 1),array_lower($2,
> 1)), greatest(array_upper($1, 1),array_upper($2, 1)), 1) AS i
>  ) sub
>GROUP BY i
>ORDER BY i
>);
> $$ LANGUAGE sql STRICT IMMUTABLE;

You are paying a lot for the convenience of using a sql language
function here.  If you want much better performance, you would
probably have to rewrite it into C.  But that would be a drag, and I
would try just throwing more CPU at it first.

Cheers,

Jeff




Re: [PERFORM] GroupAggregate and Integer Arrays

2015-10-23 Thread Jeff Janes
On Fri, Oct 23, 2015 at 7:29 AM, David Osborne  wrote:


> Hi,
>
> Wondering if anyone could suggest how we could improve the performance of
> this type of query?
> The intensive part is the summing of integer arrays as far as I can see.
>


Postgres does not ship with any 'sum' function which takes array arguments.

> select sum('{1,2,3,4,5,6}'::int[]);

ERROR:  function sum(integer[]) does not exist

Are you using a user defined function?  If so, how did you define it?

Cheers,

Jeff


Re: [PERFORM] Having some problems with concurrent COPY commands

2015-10-12 Thread Jeff Janes
On Mon, Oct 12, 2015 at 11:17 AM, Shaun Thomas  wrote:

> Hi guys,
>
> I've been doing some design investigation and ran into an interesting snag
> I didn't expect to find on 9.4 (and earlier). I wrote a quick python script
> to fork multiple simultaneous COPY commands to several separate tables and
> found that performance apparently degrades based on how many COPY commands
> are running.
>
> For instance, in the logs with one COPY, I see about one second to import
> 100k rows. At two processes, it's 2 seconds. At four processes, it's 4
> seconds. This is for each process. Thus loading 400k rows takes 16 seconds
> cumulatively. To me, it looked like some kind of locking issue, but
> pg_locks showed no waits during the load. In trying to figure this out, I
> ran across this discussion:
>
>
> http://www.postgresql.org/message-id/cab7npqqjeasxdr0rt9cjiaf9onfjojstyk18iw+oxi-obo4...@mail.gmail.com
>
> Which came after this:
>
> http://forums.enterprisedb.com/posts/list/4048.page
>
> It would appear I'm running into whatever issue the xloginsert_slots patch
> tried to address, but not much discussion exists afterwards. It's like the
> patch just kinda vanished into the ether even though it (apparently)
> massively improves PG's ability to scale data import.
>
> I should note that setting wal_level to minimal, or doing the load on
> unlogged tables completely resolves this issue. However, those are not
> acceptable settings in a production environment. Is there any other way to
> get normal parallel COPY performance, or is that just currently impossible?
>
> I also know 9.5 underwent a lot of locking improvements, so it might not
> be relevant. I just haven't gotten a chance to repeat my tests with 9.5
> just yet.
>


Can you provide the test script?  Also, have you tuned your database for
high io throughput?  What is your storage system like?


Re: [PERFORM] Performance problem with gin index

2015-09-29 Thread Jeff Janes
On Tue, Sep 29, 2015 at 8:45 AM, Bertrand Paquet <
bertrand.paq...@doctolib.fr> wrote:

> Hi,
>
> We have got big slow down on our production plateform (PG 9.4.4).
>

What is it slow compared to?  Did your version change, or your
workload/usage change?


>
> After analyzing wals with pg_xlogdump, we see lot of writing in Gin
> Indexes.
>
> We suspect slow down are related to the write of pending update on the
> index.
>
> So, is there any method to see
> - what is the current config of gin_pending_list_limit on a given index ?
>

gin_pending_list_limit will be introduced in 9.5.  In 9.4 and before, there
is no such parameter.  Instead, the limit is tied to the setting of
work_mem in those versions.


> - the current size of pending list on a given index ?
>

You can use this from the pgstattuple contrib module:

SELECT * FROM pgstatginindex('test_gin_index');
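
For example (the extension has to be installed in the database first;
substitute your index name):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT version, pending_pages, pending_tuples
      FROM pgstatginindex('your_gin_index'::regclass);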

Your best bet may be to turn off fastupdate.  It will slow down most
inserts/updates, but you will not have the huge latency spikes you get with
fastupdate turned on.

Also, you might (or might not) have a higher overall throughput with
fastupdate turned off, depending on a lot of things like the size of the
index, the size of ram and shared_buffers, the number of spindles in your
RAID, the amount of parallelization in your insert/update activity, and the
distribution of "keys" among the data you are inserting/updating.

Cheers,

Jeff


Re: [PERFORM] Gist indexing performance with cidr types

2015-08-26 Thread Jeff Janes
On Wed, Aug 26, 2015 at 4:29 AM, Henrik Thostrup Jensen h...@nordu.net
wrote:

 On Wed, 26 Aug 2015, Emre Hasegeli wrote:

 Are the coverage operatons just that expensive?


 They shouldn't be.  A similar query like yours works in 0.5 second on my
 laptop:

 [snip]

 I get the same from your testcase.

 Maybe, something we haven't expected about your dataset causes a
 performance regression on the index.  Did you see anything relevant on
 the server logs on index creation time?


 I tried dropping and re-creating the index. The only log entry was for the
 drop statement.

 The distribution of the data is not uniform like the data set you produce.
 Though I find it hard to believe that it would affect this as much.

 select masklen(route), count(*) from routes group by masklen(route);


Any chance you can share the actual underlying data?  I noticed it wasn't
on github, but is that because it is proprietary, or just because you don't
think it is interesting?


 irrexplorer= explain analyze select routes.route from routes join hmm on
 routes.route  hmm.route;
QUERY PLAN

 
  Nested Loop  (cost=0.41..511914.27 rows=2558 width=7) (actual time=8.096..17209.778 rows=8127 loops=1)
    ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual time=0.010..0.609 rows=732 loops=1)
    ->  Index Only Scan using route_gist on routes  (cost=0.41..470.32 rows=22900 width=7) (actual time=4.823..23.502 rows=11 loops=732)
          Index Cond: (route  (hmm.route)::inet)
          Heap Fetches: 0
  Planning time: 0.971 ms
  Execution time: 17210.627 ms
 (7 rows)


If you loop over the 732 rows yourself, issuing the simple query against
each retrieved constant value:

explain (analyze,buffers) select routes.route from routes where route  $1

Does each one take about the same amount of time, or are there some outlier
values which take much more time than the others?

Cheers,

Jeff


Re: [PERFORM] query not using GIN index

2015-08-24 Thread Jeff Janes
On Mon, Aug 24, 2015 at 8:18 AM, Guo, Yun y...@cvent.com wrote:



 From: Jeff Janes jeff.ja...@gmail.com
 Date: Friday, August 21, 2015 at 10:44 PM
 To: Yun y...@cvent.com
 Subject: Re: [PERFORM] query not using GIN index

 On Fri, Aug 21, 2015 at 6:55 PM, Guo, Yun y...@cvent.com wrote:

 Hi,

 We have a query on a column with GIN index, but query plan chooses not
 using the index but do an seq scan whichi is must slower

 CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
 gin (access_tokens);

 explain analyze SELECT access_grants.* FROM access_grants  WHERE
 (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
 QUERY PLAN

 --
  Limit  (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
    ->  Seq Scan on access_grants  (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
          Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
          Rows Removed by Filter: 796818
  Total runtime: 260.408 ms



 What version are you running?  What are your non-default configuration
 settings (particularly for the *_cost parameters)?

  select name,setting from pg_settings where name like '%cost';
  name | setting
 --+-
  cpu_index_tuple_cost | 0.005
  cpu_operator_cost| 0.0025
  cpu_tuple_cost   | 0.01
  random_page_cost | 4
  seq_page_cost| 1



OK, thanks.  I had overlooked the LIMIT in the first plan you posted, and
so thought you must have some pretty weird settings.  But noticing the
LIMIT, it makes more sense with the normal settings, like the ones you show.



 Can you turn track_io_timing on and then report a explain (analyze,
 buffers) of the same query?

 I didn’t try this as our prod instance is on AWS and setting this would
 require a reboot.


OK, but you can still do an explain (analyze,buffers).  It is less useful
than with track_io_timing on, but it is still more useful than just
explain analyze.



 Then do a set enable_seqscan=off and repeat.

 This is the life saver!  After applying this, it’s able to use the index.
 But should we consider it as the permanent solution?


No, probably not a permanent solution.  Or at least, I only do things like
that in production as a last resort.  I suggested doing that so you can
force it to use the index and so see what the explain (analyze,buffers)
look like when it does use the index.  Sorry for not being more clear.

The seq scan thinks it is going to find a matching row pretty early in the
scan and can stop at the first one, but based on Rows Removed by Filter:
796818 it isn't actually finding a match until the end.  There probably
isn't much you can do about this, other than not using a LIMIT.

The reason it thinks it will find a row soon is that it thinks 0.5% of the
rows meet your criteria.  That is default selectivity estimate it uses when
it has nothing better to use.  Raising the statistics target on the column
might help.  But I doubt it, because access tokens are probably nearly
unique, and so even the highest possible setting for statistics target is
not going get it to record MCE statistics.  See
https://commitfest.postgresql.org/6/323/ for a possible solution, but any
fix for that won't be released to production for a long time.
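
If you do want to try it, it is cheap to test (though for the reason
above I would not expect much):

    ALTER TABLE access_grants ALTER COLUMN access_tokens SET STATISTICS 10000;
    ANALYZE access_grants;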


If your gin index has a large pending list, that will make the index scan
look very expensive.  Vacuuming the table will clear that up.  Setting
fastupdate off for the index will prevent it growing again.  Based on your
description of most lists having 0 or 1 element in them, and my assumption
that a table named access_grants isn't getting updated hundreds of times
a second, I don't think fastupdate being off is going to cause any
problems at all.
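
Concretely, something like:

    ALTER INDEX idx_access_grants_on_access_tokens SET (fastupdate = off);
    VACUUM access_grants;   -- flushes whatever is already in the pending list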

Cheers,

Jeff


Re: [PERFORM] Strange query stalls on replica in 9.3.9

2015-08-18 Thread Jeff Janes
On Fri, Aug 14, 2015 at 9:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus j...@agliodbs.com wrote:


 On 08/13/2015 01:59 PM, Jeff Janes wrote:

  Once the commit of the whole-table update has replayed, the problem
  should go away instantly because at that point each backend doing the
  seqscan will find the the transaction has committed and so will set the
  hint bit that means all of the other seqscan backends that come after it
  can skip the proc array scan for that tuple.

 Yes ... and given that the commit on the master took  3 seconds, it's
 not likely to take 30 seconds on the replica.  That aside, the pattern
 of behavior does look similar to the planner issue.


 Another thought.  Who actually sets the hint bits on a replica?

Do the read-only processes on the replica which discover a tuple to have
 been securely committed set the hint bits?

 My benchmarking suggests not.


The hint bits only get set if the commit LSN of the transaction behind the
tuple being hinted (*not* the page LSN) is already believed to have been
flushed to WAL. On the master, the transaction commit record usually would
have already flushed its own WAL, or if using async commit then the wal writer
is going to take care of this fairly soon if nothing else gets to it first.

On the standby, it looks like the only thing that updates the
thinks-it-has-been-flushed-to marker (which is stored in the control file,
rather than memory) is either the eviction of a dirty buffer, or the
completion of a restartpoint.  I could easily be wrong on that, though.

In any case, you empirically can have committed but unhintable tuples
hanging around for prolonged amounts of time on the standby.  Perhaps
standbys need a wal writer process.

Cheers,

Jeff


Re: [PERFORM] Strange query stalls on replica in 9.3.9

2015-08-14 Thread Jeff Janes
On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus j...@agliodbs.com wrote:


 On 08/13/2015 01:59 PM, Jeff Janes wrote:

  Once the commit of the whole-table update has replayed, the problem
  should go away instantly because at that point each backend doing the
  seqscan will find the the transaction has committed and so will set the
  hint bit that means all of the other seqscan backends that come after it
  can skip the proc array scan for that tuple.

 Yes ... and given that the commit on the master took  3 seconds, it's
 not likely to take 30 seconds on the replica.  That aside, the pattern
 of behavior does look similar to the planner issue.


Another thought.  Who actually sets the hint bits on a replica?

Do the read-only processes on the replica which discover a tuple to have
been securely committed set the hint bits?

My benchmarking suggests not.

Or does it wait for the hint bits to get set on master, and then for a
checkpoint to occur on the master, and then for that page to get changed
again and FPW to the log, and then for the log to get replayed?  If so,
that explains why the issue doesn't clear up on the replica immediately
after the commit gets replayed.




  So perhaps the commit of the whole-table update is delayed because the
  startup process is also getting bogged down on the same contended lock?
  I don't know how hard WAL replay hits the proc array lock.

 I don't know; we don't have any visibility into the replay process, and
 no way to tell if replay is waiting on some kind of lock.  A regular
 UPDATE should not block against any select activity on the replay, though.

 Also, why would this affect *only* the query which does seq scans?  Is
 there some difference between seqscan and index scan here, or is it
 simply because they take longer, and since this issue is timing-based,
 they're more likely to be hit?


An index scan only has to check the commit status of rows which meet the
index quals, which is presumably a small fraction of the rows.

A seq scan checks the visibility of every row first, before checking the
where clause.

Cheers,

Jeff


Re: [PERFORM] Strange query stalls on replica in 9.3.9

2015-08-13 Thread Jeff Janes
On Thu, Aug 13, 2015 at 10:09 AM, Josh Berkus j...@agliodbs.com wrote:

 Setup:

 * PostgreSQL 9.3.9
 * 1 master, 1 replica
 * Tiny database, under 0.5GB, completely cached in shared_buffers
 * 90% read query traffic, which is handled by replica
 * Traffic in the 1000's QPS.

 The wierdness:

 Periodically the master runs an update all rows query on the main
 table in the database.  When this update hits the replica via
 replication stream, *some* (about 5%) of the queries which do seq scans
 will stall for 22 to 32 seconds (these queries normally take about
 75ms).  Queries which do index scans seem not to be affected.

 Thing is, the update all rows only takes 2.5 seconds to execute on the
 master. So even if the update is blocking the seq scans on the replica
 (and I can't see why it would), it should only block them for  3 seconds.

 Anyone seen anything like this?


Sounds like another manifestation of this:  [PERFORM] Planner performance
extremely affected by an hanging transaction (20-30 times)?

http://www.postgresql.org/message-id/CAMkU=1yy-YEQVvqj2xJitT1EFkyuFk7uTV_hrOMGyGMxpU=n...@mail.gmail.com


For each tuple that a seqscanning backend scans whose visibility is not yet
resolved (which near the end of the bulk update is going to be nearly equal
to 2*reltuples, as every tuple has both an old and a new version, so the
xmax of one and the xmin of the other must both be checked), it has to take
the proc array lock and scan the proc array to see if the tuple-inserting
transaction has committed yet. This creates profound contention on that lock:
every scanning backend is looping over every other backend for every tuple.

Once the commit of the whole-table update has replayed, the problem should
go away instantly because at that point each backend doing the seqscan will
find the the transaction has committed and so will set the hint bit that
means all of the other seqscan backends that come after it can skip the
proc array scan for that tuple.

So perhaps the commit of the whole-table update is delayed because the
startup process is also getting bogged down on the same contended lock?  I
don't know how hard WAL replay hits the proc array lock.

Cheers,

Jeff


Re: [PERFORM] Are many idle connections bad?

2015-07-25 Thread Jeff Janes
On Sat, Jul 25, 2015 at 7:50 AM, Craig James cja...@emolecules.com wrote:

 The canonical advice here is to avoid more connections than you have CPUs,
 and to use something like pg_pooler to achieve that under heavy load.

 We are considering using the Apache mod_perl fast-CGI system and perl's
 Apache::DBI module, which caches persistent connections in order to improve
 performance for lightweight web requests. Due to the way our customers are
 organized (a separate schema per client company),


And presumably with a different PostgreSQL user to go with each schema?


 it's possible that there would be (for example) 32 fast-CGI processes,
 each of which had hundreds of cached connections open at any given time.
 This would result in a thousand or so Postgres connections on a machine
 with 32 CPUs.


Why would it need so many cached connections per fast-CGI process?  Could
you set up affinity so that the same client (or at least the same web
session) usually ends up at the same fast-CGI process (when it is
available), so the other fast-CGI processes don't need to cache DBI
connections for every DB user, but just for the ones they habitually serve?



 But, Apache's fast-CGI mechanism allows you to specify the maximum number
 of fast-CGI processes that can run at one time; requests are queue by the
 Apache server if the load exceeds this maximum. That means that there would
 never be more than a configured maximum number of active connections; the
 rest would be idle.

 So we'd have a situation where there there could be thousands of
 connections, but the actual workload would be throttled to any limit we
 like. We'd almost certainly limit it to match the number of CPUs.

 So the question is: do idle connections impact performance?


In my hands, truly idle connections are very very cheap, other than the
general overhead of having a process in the process table and some local
memory.  Where people usually run into trouble are:

1) that the idle connections are only idle normally, and as soon as the
system runs into trouble the app starts trying to use all of those
usually-idle connections.  So you get increased use at the exact moment
when you can't deal with it--when the system is already under stress.  It
sounds like you have that base covered.

2) That the idle connections are idle in transaction, not truly idle, and
this causes a variety of troubles, like vacuum not working effectively and
hint bits that are permanently unsettable.

2b) A special case of 2 is where a transaction has inserted a bunch of
uncommitted tuples and then gone idle (or is just doing some other time
consuming things) before either committing them or rolling them back.  This
can create an enormous amount of contention on the proclock, as every process
which stumbles across such a tuple then has to ask every other active process
"Is this your tuple?  Are you done with it?".  This could be particularly
problematic if, for example, you are bulk loading a vendor catalog in a
single transaction and therefore have a bunch of uncommitted tuples that
are hanging around for a long time.

If you have a reasonably good load generator, it is pretty easy to spin up a
bunch of idle connections and see what happens on your own hardware with
your own workload and your own version of PostgreSQL.

Cheers,

Jeff


Re: [PERFORM] bitmap heap scan recheck for gin/fts with no lossy blocks

2015-07-24 Thread Jeff Janes
On Fri, Jul 24, 2015 at 2:40 PM, Laurent Debacker deback...@gmail.com
wrote:


The Recheck Cond line is a plan-time piece of info, not a run-time piece.
 It only tells you what condition is going to be rechecked if a recheck is
 found to be necessary.


Thanks Jeff! That makes sense indeed.

 I'm a bit surprised a COUNT(1) would need a bitmap heap scan since we know
 the row count from the index, but okay.


Gin indexes do not (yet) implement index only scans.  The scan has to visit the
block to check the visibility of the rows, as visibility data is not stored
in the index.

Cheers,

Jeff


Re: [PERFORM] bitmap heap scan recheck for gin/fts with no lossy blocks

2015-07-23 Thread Jeff Janes
On Thu, Jul 23, 2015 at 9:58 AM, Laurent Debacker deback...@gmail.com
wrote:

 Hi,

 I have read that GIN indexes don't require a recheck cond for full text
 search as long as work_mem is big enough, otherwise you get lossy blocks,
 and the recheck cond.

 In my case, I have no lossy blocks (from what I could tell), but I do have
 a recheck...

 EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(1) FROM enterprises WHERE fts @@
 'activ'::tsquery

 Aggregate  (cost=264555.07..264555.08 rows=1 width=0) (actual time=25813.920..25813.921 rows=1 loops=1)
   Buffers: shared hit=1 read=178192
   ->  Bitmap Heap Scan on enterprises  (cost=5004.86..263202.54 rows=541014 width=0) (actual time=170.546..25663.048 rows=528376 loops=1)
         Recheck Cond: (fts @@ '''activ'''::tsquery)
         Heap Blocks: exact=178096
         Buffers: shared hit=1 read=178192
         ->  Bitmap Index Scan on enterprises_fts_idx  (cost=0.00..4869.61 rows=541014 width=0) (actual time=120.214..120.214 rows=528376 loops=1)
               Index Cond: (fts @@ '''activ'''::tsquery)
               Buffers: shared hit=1 read=96
 Planning time: 2.383 ms
 Execution time: 25824.476 ms

 Any advice would be greatly appreciated. I'm running PostgreSQL 9.4.1.


The Recheck Cond line is a plan-time piece of info, not a run-time piece.
It only tells you what condition is going to be rechecked if a recheck is
found to be necessary.

It doesn't indicate how many times it was actually found necessary to do
the recheck.  Presumably that number was zero.

Cheers,

Jeff


Re: [PERFORM] How to find the culprit in server load spikes?

2015-07-22 Thread Jeff Janes
On Wed, Jul 22, 2015 at 5:50 AM, Moreno Andreo moreno.and...@evolu-s.it
wrote:

 Hi everyone,
 I host a Postgresql server on Ubuntu 12.04 and I am facing server load
 spikes (if I run top, it goes up to 25-30 on a 4-core system)...
 In some cases, I have to restart potgresql service because users call us
 complaining of the slowness, but in some cases I can leave things on their
 way and I see that after a bunch of minutes (about 5-10) the situations
 drops to the normality (0.50-2 load).
 The problem is, as in the most cases, the I/O, but I need a small hand to
 know some methods or tools that can help me to investigate who or what is
 causing me these spikes.


I always run systems starting out with logging cranked up to at least these
settings:

log_checkpoints = on                     # logs each checkpoint, incl. write/sync times
log_lock_waits = on                      # logs waits longer than deadlock_timeout
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = '10s'    # logs the plan of anything running over 10s
track_io_timing = on                     # adds I/O timings to EXPLAIN (ANALYZE, BUFFERS) and stats
log_autovacuum_min_duration = 1000       # logs autovacuum runs over 1 second
log_min_duration_statement = 1000 ## or less

In particular, you would want to see what the reported sync time is for
the checkpoint, and whether the slowness (as shown by the frequency of
statement min duration log events) is occurring in a pattern around the
beginning and end of a checkpoint.

I'd also set up vmstat to run continuously capturing output to a logfile
with a timestamp, which can later be correlated to the postgres log file
entries.

Cheers,

Jeff


Re: [PERFORM] QUERY PLANNER - Indexe mono column VS composite Index

2015-07-10 Thread Jeff Janes
On Fri, Jul 10, 2015 at 2:34 AM, Nicolas Paris nipari...@gmail.com wrote:



 Case 3: without a constraint on tval_char => seq scan


  EXPLAIN ANALYSE select  f.patient_num
 from i2b2data_multi_nomi.observation_fact f
 where
 f.concept_cd IN (select concept_cd from
  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE
 '\\i2b2\\cim10\\A00-B99\\%')
   AND  (  modifier_cd = '@'  AND  valtype_cd = 'T'  )
 group by  f.patient_num ;


   QUERY PLAN


 ---
 HashAggregate  (cost=1305637.84..1305688.23 rows=5039 width=4) (actual
 time=22689.279..22694.583 rows=16865 loops=1)
Group Key: f.patient_num
-  Hash Join  (cost=4760.13..1297561.67 rows=3230468 width=4) (actual
 time=12368.418..22674.145 rows=33835 loops=1)
  Hash Cond: ((f.concept_cd)::text =
 (concept_dimension.concept_cd)::text)
  -  Seq Scan on observation_fact f  (cost=0.00..1280362.92
 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)
Filter: (((modifier_cd)::text = '@'::text) AND
 ((valtype_cd)::text = 'T'::text))
Rows Removed by Filter: 41423695
  -  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual
 time=46.833..46.833 rows=925 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 39kB
-  HashAggregate  (cost=4739.45..4748.64 rows=919
 width=10) (actual time=46.196..46.515 rows=925 loops=1)
  Group Key: (concept_dimension.concept_cd)::text
  -  Seq Scan on concept_dimension
  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800
 rows=925 loops=1)
Filter: ((concept_path)::text ~~
 '\\i2b2\\cim10\\A00-B99\\%'::text)
Rows Removed by Filter: 186413
  Planning time: 1.940 ms
  Execution time: 22695.913 ms

 What I would like is for the planner to always hit of_idx_modifier


What does the above explain analyze query give when you have an index on
just modifier_cd, or maybe on both (modifier_cd, valtype_cd)?
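
A sketch of the composite index being asked about (the name is invented;
of_idx_modifier on (modifier_cd) presumably already exists):

    CREATE INDEX of_idx_modifier_valtype
        ON i2b2data_multi_nomi.observation_fact (modifier_cd, valtype_cd);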

Your original email said it uses the index in that case, but we would need
to see the numbers in the query plan in order to figure out why it is doing
that.

It seems like the tval_char IN ('DP') part of the restriction is
very selective, while the other two restrictions are not.

Cheers,

Jeff


Re: [PERFORM] pg_stat_all_indexes understand

2015-07-09 Thread Jeff Janes
On Thu, Jul 9, 2015 at 5:20 AM, Nicolas Paris nipari...@gmail.com wrote:

 Hello,

 I wonder how pg_stat_all_indexes works.

 When I run an explain, some indexes are not used, but
 pg_stat_all_indexes.idx_scan is incremented for those indexes.


When the planner considers using a merge join on an indexed column, it uses
an index to check the endpoints of the column (the min and the max) to make
sure it has the latest values to get the most accurate estimate.  This
causes the usage counts to get incremented.  Even when it doesn't end up
using the merge join.


 Does this mean idx_scan is incremented each time the planner checks if an
 index could be used, even when it won't use it?


Not in general, only in a few peculiar cases.

Cheers,

Jeff


Re: [PERFORM] Techniques to Avoid Temp Files

2015-06-19 Thread Jeff Janes
On Thu, Jun 18, 2015 at 12:38 PM, Duane Murphy duane.mur...@gmail.com
wrote:

 We are trying to improve performance by avoiding the temp file creation.

 LOG:  temporary file: path base/pgsql_tmp/pgsql_tmp8068.125071, size
 58988604
 STATEMENT: SELECT iiid.installed_item__id, item_detail.id,
 item_detail.model_id, item_detail.type
 FROM installed_item__item_detail AS iiid
 INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
 INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
 INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

 Our hypothesis is that the temp file creation is caused by the high row
 count of the
 installed_item__item_detail table.

 installed_item__item_detail: 72916824 rows (27 GB)
 item_detail: 59212436 rows (40 GB)

 The other two tables, item and model, are temporary tables created during
 this particular process. Unfortunately, I don't have those table sizes.


Those temporary tables aren't providing any output to the query, so their
only role must be to restrict the rows returned by the permanent tables.
If they restrict that by a lot, then it could do a nested loop over the
temp tables, doing indexed queries against the permanent tables assuming
you have the right indexes.

Temporary tables do not get analyzed automatically, so you should probably
run ANALYZE on them explicitly before this big query.
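
For example, right before the big query:

    ANALYZE item;
    ANALYZE model;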



 What additional information can I gather in order have a better
 understanding of how to improve this query?


What indexes do the tables have?  What is the output of EXPLAIN, or better
yet EXPLAIN (ANALYZE,BUFFERS), for the query?

Cheers,

Jeff


Re: [PERFORM] unlogged tables

2015-04-14 Thread Jeff Janes
On Tue, Apr 14, 2015 at 8:41 AM, Yves Dorfsman y...@zioup.com wrote:

 On 2015-04-13 17:49, Jeff Janes wrote:
 
  One way would be to lock dirty buffers from unlogged relations into
  shared_buffers (which hardly seems like a good thing) until the start of
 a
  super-checkpoint and then write them all out as fast as possible
 (which kind
  of defeats checkpoint_completion_target).  And then if the crash happened
  during a super-checkpoint, the data would still be inconsistent and need
 to be
  truncated.
 

 What do you call a super-checkpoint?


A hypothetical checkpoint which includes writing and flushing pages of
unlogged tables.

Presumably you wouldn't want every checkpoint to do this, because if done
the way I described the super-checkpoint is a vulnerable period.  Crashes
that happen during it would result in truncation of the unlogged relation.
Since that is the very thing we want to avoid, you would want to make these
vulnerable periods rare.

Cheers,

Jeff

