Re: Performance degradation after upgrading from 9.5 to 14

2024-04-17 Thread Tomas Vondra
On 4/17/24 19:13, Johnathan Tiamoh wrote:
> Hello,
> 
> 
> I performed an  upgrade from postgresql-9.5 to postgresql-14 and the
> performance has degraded drastically.
> 
> Please, is there any advice on getting performance back?
> 

There's very little practical advice we can provide based on this
report, because it's missing any useful details. There are a number of
things that might have caused this, but we'd have to speculate.

For example:

1) How did you upgrade? pg_dump or pg_upgrade?

2) Did you run ANALYZE to collect statistics after the upgrade?

3) Did you transfer the configuration, or did you just create a new
cluster with the default values?

4) What exactly is slower? Queries? Inserts?

5) Can you quantify the impact? Is it 2x slower? 100x slower?


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?

2024-02-22 Thread Tomas Vondra
On 2/22/24 02:22, Siddharth Jain wrote:
> Hi All,
> 
> I understand the storage layer in databases goes to great lengths to ensure:
> - a row does not cross a block boundary
> - read/writes/allocation happen in units of blocks
> etc. The motivation is that at the OS level, it reads and writes pages
> (blocks), not individual bytes. I am only concerned about SSDs but I think
> the principle applies to HDD as well.
> 
> but how can we do all this when we are not even guaranteed that the
> beginning of a file will be aligned with a block boundary? refer this
> <https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s>
> .
> 
> Further, I don't see any APIs exposing I/O operations in terms of blocks.
> All File I/O APIs I see expose a file as a randomly accessible contiguous
> byte buffer. Would it not have been easier if there were APIs that exposed
> I/O operations in terms of blocks?
> 
> can someone explain this to me?
> 

The short answer is that this is well outside our control. We do the
best we can - split our data files into "our" 8kB pages - and hope that
the OS / filesystem will do the right thing to map this to blocks at the
storage level.

The filesystems do the same thing, to some extent - they align stuff
with respect to the beginning of the partition, but if the partition
itself is not properly aligned, that won't really work.

As for the APIs, we work with what we have in POSIX - I don't think
there are any APIs working with blocks, and it's not clear to me how it
would fundamentally differ from the APIs we have now. Moreover, it's
not really clear which of the "blocks" would matter. The postgres 8kB
page? The filesystem page? The storage block/sector size?

FWIW I think for SSDs this matters way more than for HDD, because SSDs
have to erase the space before a rewrite, which makes it much more
expensive. But that's not just about the alignment, but about the page
size (with smaller pages being better).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Intermittent Issue with WAL Segment Removal in Logical Replication

2023-12-29 Thread Tomas Vondra
On 12/29/23 22:28, Kaushik Iska wrote:
> I am unfortunately not really familiar with Google Cloud SQL internals
> as well. But we have seen this happen on Amazon RDS as well.
> 

Do you have a reproducer for regular Postgres?

> Could it be possible that we are requesting a future WAL segment, say
> WAL upto X is written and we are asking for X + 1? It could be that the
> error message is misleading.
> 

I don't think that should be possible. The LSN in the START_REPLICATION
comes from the replica, where it's tracked as the last LSN received from
the upstream. So that shouldn't be in the future. And it doesn't seem
to be suspiciously close to a segment boundary either.

In fact, the LSN in the message is 6/5AE67D79, but the "failed" segment
is 000100060059, which is the *preceding* one. So it can't
be in the future.

> I do not have the information from pg_replication_slots as I have
> terminated the test. I am fairly certain that I can reproduce this
> again. I will gather both the restart_lsn and contents of pg_wal for the
> failed segment. Is there any other information that would help debug
> this further?
> 

Hard to say. The best thing would be to have a reproducer script, ofc.
If that's not possible, the information already requested seems like a
good start.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Intermittent Issue with WAL Segment Removal in Logical Replication

2023-12-29 Thread Tomas Vondra
On 12/27/23 16:31, Kaushik Iska wrote:
> Hi all,
> 
> I'm including additional details, as I am able to reproduce this issue a
> little more reliably.
> 
> Postgres Version: POSTGRES_14_9.R20230830.01_07
> Vendor: Google Cloud SQL
> Logical Replication Protocol version 1
> 

I don't know much about Google Cloud SQL internals. Is it relatively
close to Postgres (as e.g. RDS) or are the internals very different /
modified for cloud environments?

> Here are the logs of attempt succeeding right after it fails:
> 
> 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2') <- FAILS
> 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres
> ERROR:  requested WAL segment 000100060059 has already been
> removed
> 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2')  <- SUCCEEDS
> 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres
> LOG:  logical decoding found consistent point at 6/5A31F050
> 
> Happy to include any additional details of my setup.
> 

I personally don't see how this could fail and then succeed, unless
Google does something smart with the WAL segments under the hood. Surely
we try to open the same WAL segment (given the LSN is the same), so how
could it not exist and then exist?

As Ron already suggested, it might be useful to see information for the
replication slot peerflow_slot_wal_testing_2 (especially the restart_lsn
value). Also, maybe show the contents of pg_wal (especially for the
segment referenced in the error message).
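
Something like this should show the relevant bits (a managed environment
may restrict some of it, e.g. reading pg_wal):

  SELECT slot_name, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
   WHERE slot_name = 'peerflow_slot_wal_testing_2';

  -- WAL segments currently present in pg_wal
  SELECT name, size, modification FROM pg_ls_waldir() ORDER BY name;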

Can you reproduce this outside Google cloud environment?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: How to grant read only functions execute permission to read only user

2023-07-17 Thread Tomas Vondra



On 7/17/23 16:11, Yi Sun wrote:
> Hello guys,
> 
> Our read only user is okay to be granted read only permission of tables.
> 
> How to grant read only functions execute permission to read only user,
> is there a simple way to do it please? If not, how to get the list of
> read only functions please? Then can grant one by one based on the list,
> thanks

What is a read-only function? I don't think Postgres has anything like
that. Functions inherit the privileges of the user that executes them by
default. So if the user is read-only (i.e. has just SELECT privilege),
then the function can't do any writes either.
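
A minimal sketch (made-up role/table/function names) showing both the
grant and the fact that a default (SECURITY INVOKER) function runs with
the caller's privileges:

  CREATE ROLE readonly_user LOGIN;
  GRANT SELECT ON t TO readonly_user;

  -- functions are executable by PUBLIC by default, but an explicit
  -- blanket grant does no harm:
  GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO readonly_user;

  CREATE FUNCTION f_write() RETURNS void
    LANGUAGE sql AS $$ INSERT INTO t VALUES (1) $$;

  -- run as readonly_user, this fails with "permission denied for table t"
  -- because the function runs with the caller's privileges:
  SELECT f_write();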


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Query take a long time and use no index

2023-07-17 Thread Tomas Vondra
On 7/17/23 13:20, Tomas Vondra wrote:
> ...
> 
> It's always going to be slow with the COUNT(DISTINCT), I'm afraid.
> 
> Not sure how much you can modify the query / database, and how accurate
> results you need. If you're OK with estimates, you can try the postgres-hll
> extension [1] which estimates count(distinct). For exact results, I
> wrote the count_distinct extension [2] that uses a hashtable. Might be worth a
> try, I guess.
> 

Damn, I forgot to add the links:

[1] https://github.com/citusdata/postgresql-hll

[2] https://github.com/tvondra/count_distinct

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Query take a long time and use no index

2023-07-17 Thread Tomas Vondra
137..2121.695 rows=5962263 loops=3)
>  Planning Time: 0.247 ms
>  JIT:
>    Functions: 11
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>    Timing: Generation 2.665 ms, Inlining 0.000 ms, Optimization 1.612
> ms, Emission 48.580 ms, Total 52.857 ms
>  Execution Time: 3525.656 ms
> (12 rows)
> 
> Why are the indexes not used ?

It's a common misconception that indexes necessarily improve performance
and that slow queries are slow because of missing indexes. In practice
indexes are most useful for selective queries, i.e. queries that scan
only a fraction of the table - say, a couple percent.

But this query scans ~17M rows from the table, and my guess is that's a
significant part of the table. In which case the index is useless.

Also, notice the seqscan took ~4s. Even if that drops to ~1s with an
index, the overall query will still take 12s.

> Is it a Problem of the DATE_TRUNC ?

My guess is it's more about the DISTINCT, which forces a sort before the
aggregate function. Which also means the query can't be parallel. You
can try simplifying the query step by step - remove the DISTINCT first,
then the DATE_TRUNC.

> How can I optimize?
> 

It's always going to be slow with the COUNT(DISTINCT), I'm afraid.

Not sure how much you can modify the query / database, and how accurate
results you need. If you're OK with estimates, you can try the postgres-hll
extension [1] which estimates count(distinct). For exact results, I
wrote the count_distinct extension [2] that uses a hashtable. Might be worth a
try, I guess.

Another option is to precalculate stuff, and do queries on that. That
is, you're not going to add data with old timestamps, so you can
pre-aggregate that.
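
A rough sketch of that idea (made-up table/column names, assuming an
"events" table with "created_at" and "user_id" columns):

  CREATE MATERIALIZED VIEW daily_distinct_users AS
  SELECT date_trunc('day', created_at) AS day,
         count(DISTINCT user_id) AS distinct_users
    FROM events
   GROUP BY 1;

  -- refresh after loading new data
  REFRESH MATERIALIZED VIEW daily_distinct_users;

  SELECT * FROM daily_distinct_users WHERE day >= '2023-07-01';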


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Beginner Question]What is query jumbling?

2023-07-17 Thread Tomas Vondra



On 7/17/23 05:23, Wen Yi wrote:
> Hi community,
> I am reading the 'Generate code for query jumbling through
> gen_node_support.pl'
> (https://www.postgresql.org/message-id/E1pMk51-000puf-55%40gemulon.postgresql.org)
>  to understand the principle of the postgres's node, but I really don't know 
> the meaning of the word 'query jumbling'.
> 
> Can someone provide me some advice?
> Thanks in advance!
> 

Stuff like this is usually explained somewhere in the code base, so "git
grep" is your friend. In this case it's probably good to read the
comment in src/backend/nodes/queryjumblefuncs.c which says:

 * Normalization is implemented by fingerprinting queries, selectively
 * serializing those fields of each query tree's nodes that are judged
 * to be essential to the query. This is referred to as a query jumble.

So "jumbling" is a process to generate a "normalized" query.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: move databases from a MySQL server to Postgresql.

2023-06-23 Thread Tomas Vondra
On 6/23/23 13:45, Alfredo Alcala wrote:
> Hello
> 
> I need to move some databases from a MySQL server to Postgresql.
> 
> Can someone tell me the migration procedure, tools and recommendations?
> 

I'm not an expert on this, but migrations tend to be somewhat
application-specific. I'd suggest you take a look at this wiki page:

https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL

and maybe try some of the tools mentioned there (pgloader, mysql2pgsql,
and so on).

You'll have to give it a try on your databases, and then ask questions
about practical issues you run into.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pb with join plan

2023-06-21 Thread Tomas Vondra
On 6/21/23 00:26, Marc Millas wrote:
> 
> 
> On Tue, Jun 20, 2023 at 11:19 PM David Rowley wrote:
> 
> > On Wed, 21 Jun 2023 at 08:34, Marc Millas wrote:
> >
> > On Tue, Jun 20, 2023 at 10:14 PM David Rowley wrote:
> >>
> >> On Wed, 21 Jun 2023 at 07:42, Marc Millas wrote:
> >> > But if I do the same with clause one OR clause 2, I have to 
> kill the request after an hour, seeing the filesystem showing more
> than 140 Mb of increased usage.
> >>
> >>
> > link to the anonymized plan of the req with one clause :
> https://explain.depesz.com/s/TWp4
> 
> link to the plan with the second
> clause alone: https://explain.depesz.com/s/byW5
> link to the plan with both clauses ORed (the one not
> finishing) https://explain.depesz.com/s/jHO2
> 
> 
> 
> It's quite difficult to know what the problem is you want to fix here.
> Your initial post indicated it was the query with the OR condition
> that was causing you the problems, but the plan you've posted has no
> OR condition?!
> 
> You're more likely to get help here if you take time to properly
> explain the situation and post the information that's actually
> relevant to the problem you're having, or state the problem more
> clearly, as there's a mismatch somewhere.
> 
> It might also be worth having a look at
> https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not
> going to tell us what part of the query is slow. I'll let the wiki
> page guide you into what to do instead.
> 
>  
> I know that page. obviously, as I have to kill the request, I cannot
> provide a explain analyze... 
> 

It's a bit weird that the "victor" table is joined seemingly without any join
conditions, leading to a cross join (which massively inflates the cost
for joins above it). Maybe the anonymized plan mangles it somehow.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Question regarding specifics of GIN and pg_trgm performance and potential use of show_trgm to improve it

2023-05-25 Thread Tomas Vondra
ndexing the number of trigrams like this can help, and
I'm not sure how to improve this (at least for the built-in GIN). It
seem similarity searches are bound to be proportional to the most
frequent trigram in the query.

I wonder if the "newer" GIN variants like RUM [1] could improve this,
but I don't think it has trgm opclasses.


regards

[1] https://github.com/postgrespro/rum

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra
On 8/7/22 19:28, Tom Lane wrote:
> Tomas Vondra  writes:
>> Yeah, this seems like a clear bug - we should not fail queries like
>> this. It's a sign statext_is_compatible_clause() and the MCV code
>> disagree about which clauses are compatible.
> 
> Indeed.  I attempted to fix that at e33ae53dd.
> 

Thanks! That probably explains why I've been unable to reproduce it - I
hadn't realized there's a fix already.



-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra



On 8/5/22 11:50, Danny Shemesh wrote:
> Hey all !
> 
> I'm on a quest to help the planner (on pg14) use the best of several
> partial, expressional indices we have on some large tables (few TBs in
> size, billions of records).
> 
> As we know, stats for expressions in partial indices aren't gathered by
> default - so I'm tinkering with expressional extended stats to cover for
> those.
> 
> I've tackled two interesting points there:
> 1. Seems like expressional stats involving the equality operator are
> skipped or mismatched (fiddle
> <https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/5379>)
> Let's take the following naive example:
> create table t1 (x integer[]);
> insert into t1 select array[1]::integer[] from generate_series(1,
> 10, 1);
> create statistics s1 on (x[1] = 1) from t1;
> analyze t1;
> 
> explain analyze select * from t1 where x[1] = 1;
> > Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual
> time=0.009..36.035 rows=10 loops=1)
> 
> Now, of course one can just create the stat on x[1] directly in this
> case, but I have a more complex use case where an equality operator is
> beneficial; 
> should the above case be supported ? feels like I'm just missing
> something fundamental.
> 

Hmmm. The problem here is that the expression may be interpreted either
as an operator clause (A=B), or as a boolean expression ((A=B)=TRUE). In
principle we might check which option matches extended statistics, but
that's not for free :-(

So what the current code does is try to match the more specific cases
first, leaving the "bool expression" as a last resort. That seems like a
reasonable trade-off to me, but I'll think about this a bit more.

There are probably other ways to match expressions, and we can't
possibly explore all of them. For example you may create statistics on
(x=1 AND y=2) and I doubt we'll match that, because we'll try matching
individual clauses not some arbitrary combinations of clauses. (Maybe we
shouldn't even allow creating such statistics ...)

> 2. Less important, just a minor note - feel free to ignore - although
> the eq. operator above seems to be skipped when matching the ext. stats,
> I can work around this by using a CASE expression (fiddle
> <https://www.db-fiddle.com/f/wJZNH1rNwJSo3D5aByQiWX/1>);
> Building on the above example, we can:
> create statistics s2 on (case x[1] when 1 then true else false end)
> from t1;
> explain analyze select * from t1 where (case x[1] when 1 then true else
> false end
> >  Seq Scan on t1 (cost=0.00..1986.00 rows=10 width=25) (actual
> time=0.011..33.721 rows=10 loops=1)
> 

Yes, because this ends up not being matched as an opclause, and therefore
goes all the way to boolvarsel() in clause_selectivity_ext.

> 
> What's a bit problematic here, though, is that if we mix other dependent
> columns to the extended stat, and specifically if we create an mcv, 
> queries involving the CASE expression throw with `error: unknown clause
> type 130`, where clause type == T_CaseExpr.
> 
> The second point for me would be that I've found it a bit non intuitive
> that creating an extended statistic can fail queries at query time; it
> makes sense that the mcv wouldn't work for case expressions, but it
> might've been a bit clearer to:
> 
> a. Fail this at statistic creation time, potentially, or 
> b. Convert the type numeric in the above error to its text
> representation, if we can extract it out at runtime somehow - 
> I couldn't find a mapping of clause type numerics to their names, and as
> the node tags are generated at compile time, it could be build-dependent
> and a bit hard to track down if one doesn't control the build flags
> 

Yeah, this seems like a clear bug - we should not fail queries like
this. It's a sign statext_is_compatible_clause() and the MCV code
disagree about which clauses are compatible.

Can you share an example triggering this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Tomas Vondra



On 7/14/22 21:25, Aleš Zelený wrote:
> 
> On Wed, Jul 13, 2022 at 2:20 AM, Michael Paquier wrote:
> 
> On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> > So far, it has happened three times (during a single week) from
> the 14.3 ->
> > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> >
> > Questions:
> > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
> instance
> > and reinstalling 14.3 PG packages (to prove, that the issue
> disappear)?
> > 2) What is the best way to diagnose what is the root cause?
> 
> Hmm.  14.4 has nothing in its release notes that would point to a
> change in the vacuum or autovacuum's code paths:
> https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
> 
> There is nothing specific after a look at the changes as of, and I am
> not grabbing anything that would imply a change in memory context
> handling either:
> `git log --stat REL_14_3..REL_14_4`
> `git diff REL_14_3..REL_14_4 -- *.c`
> 
> Saying that, you should be able to downgrade safely as there are no
> changes in WAL format or such that would break things.  Saying that,
> the corruption issue caused by CONCURRENTLY is something you'd still
> have to face.
> 
> 
> Thanks, good to know that, we can use it for a test case, since we
> already hit the CONCURRENTLY bug on 14.3.
> 
> > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
> ERROR:  out of
> > memory
> > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
> DETAIL:  Failed
> > on request of size 152094068 in memory context
> "TopTransactionContext".
> > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
> >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
> 
> This is the interesting part.  Do you happen to use logical
> replication in a custom C++ plugin?
> 
> 
> We are using logical replication to other instances (pg_output) and
> decoderbufs
> https://github.com/debezium/postgres-decoderbufs for other applications.
>  

This is probably just a red herring - std::bad_alloc is what the process
that runs into the overcommit limit gets. But the real issue (e.g.
memory leak) is likely somewhere else - different part of the code,
different process ...

> ...
> 
> Checking the RssAnon from proc/pid/status I've found some points where
> RssAnon memory usage grew very steep for a minute, but no "suspicious"
> queries/arguments were found in the instance logfile.
>  
> Any hint, on how to get the root cause would be appreciated since so far
> I've failed to isolate the issue reproducible testcase.
> At least I hope that looking for the RssAnon process memory is an
> appropriate metric, if not, let me know and I'll try to update the
> monitoring to get the root cause.
> 
> I can imagine a workaround with client application regular reconnect...,
> but u to 14.3 it works, so I'd like to fix the issue either on our
> application side or at PG side if it is a PG problem.
> 

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   (gdb) call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us which memory context is bloated, and that might
point us to a particular part of the code.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-15 Thread Tomas Vondra




On 2/15/22 01:06, A Shaposhnikov wrote:


Interestingly I have a second PG 14.2 database, with identical table 
definitions, but about 10% smaller row counts, and the exact same query 
works fast there without the 2nd condition:




Are you sure about the 10%? Because in the plans from the first machine 
I see this:


>
> -> Index Scan using team_pkey on team t (cost=0.57..11382381.88
> rows=78693167 width=175) (actual time=0.016..0.695 rows=854 loops=1)
>

while the second machine does this:



-> Index Scan using team_pkey on team t (cost=0.57..2366113.83 
rows=2807531 width=160) (actual time=0.031..0.801 rows=888 loops=1)




That's 2.8M vs. 78M, quite far from "10% difference". Not sure about 
the team_aliases table, that's impossible to say from the plans.


This may matter a lot, because we use effective_cache_size to calculate
the cache hit ratio for the query, with relation sizes as an input. So
smaller relations (or a larger effective_cache_size) mean cheaper random
I/O, hence a preference for a nested loop join.
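
If you want to experiment with this, the GUC can be changed per session,
so it's easy to compare the plans with different values (the value below
is just an example):

   SHOW effective_cache_size;
   SET effective_cache_size = '4GB';  -- affects only this session
   -- then re-run EXPLAIN on the query and compare the join choice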


The other thing is data distribution - that may matter too.


IMO it's pointless to investigate this further - we know what's causing 
the issue. The optimizer is oblivious to the fact that the merge join will
have to skip a large part of the second input, due to the implicit
condition. Notice that adding the condition changes the cost from:


 Limit (cost=81.33..331.82 rows=1000 width=183) ...

to

 Limit (cost=81.33..720.48 rows=1000 width=183) ...

So it seems *more* expensive than the first plan. Taken to the extreme 
the planner could theoretically have chosen to use the first plan (and 
delay the condition until after the join).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-04 Thread Tomas Vondra

On 2/4/22 05:21, A Shaposhnikov wrote:

Tomas,

thank you! The query:

select 1 from data as d, data_class as dc
 where dc.data_id = d.id and d.id > 205284974
   and dc.data_id > 205284974 -- new condition
order by d.id
limit 1000;

totally solved it - it is now fast under all conditions! I thought
that the optimizer would be able to infer it itself.



Unfortunately, the optimizer is not that smart - we can do that for 
equality conditions, but not for other operators. There was actually a 
thread [1] exploring a possibility to extend this to inequalities, but 
it went nowhere so far. It also explains why it's done only for equality 
operators. In short, it's fairly expensive, makes costing of joins more 
difficult, and most queries can't benefit from it (because conditions on 
join keys are not that common).


BTW what does the final query plan look like? Is it using the merge sort
or the nested loop? I wonder if this might be formulated as a costing issue,
pushing the planner to use the nested loop.



[1] 
https://www.postgresql.org/message-id/flat/CAFQUnFhqkWuPCwQ1NmHYrisHJhYx4DoJak-dV%2BFcjyY6scooYA%40mail.gmail.com



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: increasing effective_cache_size slows down join queries by a factor of 4000x

2022-02-02 Thread Tomas Vondra

On 2/2/22 22:10, Artyom Shaposhnikov wrote:

the row estimate became ~1000x smaller with the stat in place, so it
looks like it grossly miscalculates the query plans without the stats
for large tables representing M:M relations.



Well, if the estimates are significantly off (and 3 orders of magnitude 
certainly qualifies), then all bets are off. There's no magical option 
that'd fix planning in such conditions.


Ultimately, fixing the estimates (e.g. by creating extended statistics) 
is the right "fix" as it gives the optimizer the information needed to 
pick the right plan.
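
For the record, the syntax is something like this (the statistics name and
the column list below are just an illustration - it needs to cover the
columns that are actually correlated):

   CREATE STATISTICS data_stats (ndistinct, dependencies, mcv)
       ON data_id, class_id FROM data_class;
   ANALYZE data_class;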



On Wed, Feb 2, 2022 at 11:47 AM Michael Lewis  wrote:


What does the row estimate look like on the scan of data table with that 
statistic in place? Anytime the stats give a mis-estimate this far off, I 
wouldn't expect that plans would be optimal except by luck.

Index Scan using data_pkey on data t (cost=0.57..21427806.53 rows=58785023 
width=131) (actual time=0.024..0.482 rows=854 loops=1)
Index Cond: (id > 205284974)




It'd be interesting to see the plans without the LIMIT, as that makes 
the "actual" values low simply by terminating early.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Create and access a dictionary type

2022-01-05 Thread Tomas Vondra

On 1/5/22 14:36, Shaozhong SHI wrote:
I just checked operators.  I could not find any operator to set a new 
value given a key.




I'd say jsonb_set is what you're looking for.

Postgres does not have a thing that'd match the Oracle feature exactly, 
you'll have to adjust your application to use what's available.
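
For example (made-up table/column names):

  -- add or overwrite the key 'color' in the "attrs" jsonb column
  UPDATE items
     SET attrs = jsonb_set(attrs, '{color}', '"red"', true)
   WHERE id = 1;

  -- and read it back
  SELECT attrs->>'color' FROM items WHERE id = 1;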


regards


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Create and access a dictionary type

2022-01-05 Thread Tomas Vondra

On 1/5/22 17:24, Ron wrote:

On 1/5/22 7:22 AM, Tomas Vondra wrote:

On 1/5/22 14:17, Shaozhong SHI wrote:
Any examples in Postgres to create a dictionary type to store and 
access key value pairs?




I'd say JSONB can be used as an associative array, and e.g. in Python 
can map to dict data type.


The question confuses me.  Why isn't "create a table with Primary Key" 
the answer to "Any examples in Postgres to create a dictionary type to 
store and access key value pairs?"




Well, yes - you can always decompose the dict and store it in an EAV
table, but sometimes it's more convenient to just use a "nested"
dictionary at the row level. Which is what a JSONB column does.
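
To illustrate the difference (made-up names):

  -- EAV: one row per key
  CREATE TABLE item_attrs (item_id int, key text, value text,
                           PRIMARY KEY (item_id, key));

  -- "dictionary per row" using a jsonb column
  CREATE TABLE items (id int PRIMARY KEY, attrs jsonb);
  INSERT INTO items VALUES (1, '{"color": "red", "size": 10}');
  SELECT attrs->>'color' FROM items WHERE id = 1;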


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Create and access a dictionary type

2022-01-05 Thread Tomas Vondra

On 1/5/22 14:17, Shaozhong SHI wrote:
Any examples in Postgres to create a dictionary type to store and access 
key value pairs?




I'd say JSONB can be used as an associative array, and e.g. in Python 
can map to dict data type.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Postgis - geography function

2022-01-05 Thread Tomas Vondra

On 1/5/22 09:48, Sameer TWAYANA wrote:

Hi Sir,

I need some help. I have been using the postgis extension for many 
years. Now in postgis version 3.2 using geom data in geography with 
ST_Length and ST_Distance function returns some weird results. Before I 
used 3.1.4, all was good. But I got different results in version 3.2. So 
can you suggest why I am getting different results in postgis version 3.2.


I really appreciate for your help. Thank you.



You haven't shown a lot of detail about the issue, so it's hard to say 
what's going on, unfortunately :-(


You'll need to show us some queries with expected/actual results (the 
simpler the better, of course). And provide info about the Postgres 
version, etc.


BTW if it's a problem related to PostGIS, maybe using a mailing list for 
that specific project [1] would yield better results.


[1] https://lists.osgeo.org/mailman/listinfo/postgis-users


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Plans with bad estimates/paths

2021-11-16 Thread Tomas Vondra
On 11/16/21 9:22 PM, Joe Wildish wrote:
> Hello,
> 
> I am struggling with a problem that appears planning related. I'm
> hoping folk here may be able to advise on how best to tackle the
> issue.
> 
> We have a system that ingests JSON messages containing order data.
> The data from these messages is inserted into a normalised table
> structure; "order", "order_item", "order_discount", etc.  Orders can
> have multiple items, items can have multiple discounts and so forth.
> Referential constraints exist between the tables in the expected way.
> Fields in the tables are all declared NOT NULL.  If the schema of the
> JSON is such that a field is optional, we reflect that optionality in
> another "order_" table, and make "order_" be a subset of
> "order".  Moreover, we ingest messages for different
> clients/customers.  Therefore each message-related table carries with
> it a client identifier which forms part of the primary key on the
> table.  For example, "order" has a key of "(client_id, order-id)".
> 
> We have written transformations that calculate various facts about
> orders.  For example, one of the transforms emits order item data
> where we have calculated the overall discount for the item, and have
> "filled in" some of the optional fields with defaults, and have
> categorised the order item on the basis of some aspect of the order
> (e.g. "this is an e-commerce order, this is retail order").  These
> transforms are typically per client (e.g. `WHERE client_id = 123`)
> although in some cases we transform over multiple clients (e.g.
> `WHERE client_id = ANY (SELECT client_id FROM clients WHERE ...)`).
> 
> The issue is that for some clients, or combination of clients, the
> planner is choosing a path that takes substantially longer to
> evaluate than the plan it typically chooses for other clients.  The
> number of tables being joined is in the region of 15.  There is an
> extended statistic object in place to help the one aggregation that
> occurs (defined on the basis of the `GROUP BY` columns) to try and
> get a better estimate of the likely number of rows emitted.  However,
> what I am often seeing in the explain plan is that the estimated rows
> is small and the actuals are significantly larger e.g.
> 
> Merge Join  (cost=1.14..253250.32 rows=1099 width=69) (actual
> time=1268.587..2400.353 rows=4282355 loops=1)
> 

It sure smells like a case of correlated data for some clients but not
others, but who knows ... Hard to say without seeing the nodes below the
join. If the lower nodes are estimated accurately, then it's the join
selectivity that is estimated poorly, and there's not much we can do
about it :-(

Do the "good" plans have the same underestimate? Maybe there's just much
less data for those clients, and the "poor" plan ends up being fast anyway?


> I am assuming this underestimation is the source of the planner
> choosing the "wrong" path; in production, we have had to resort to
> setting the join and from collapse limits to 1 to force a naive plan
> to be generated.  This is giving us execution times in the 10/20
> second range vs. >45m in some cases.
> 

That may be happening, yes. So is it the join order that ends up being
wrong, or the join methods? Have you tried increasing the collapse limit
instead? Although, if it works for some queries but not others, that's
likely not going to help.
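
That is, rather than forcing the naive plan by setting the limits to 1,
let the planner consider more join orders (the default for both is 8;
16 below is just an example):

   SET join_collapse_limit = 16;
   SET from_collapse_limit = 16;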

> (a) Do you have any suggestions on a general approach to tackling the
> problem? For example, one option might be to pre-compute some of the
> subqueries that are occurring in the transforms, write the results
> into their own tables, and substitute those tables in place of the
> subqueries in the main transform. Is this something people typically
> do in this situation?
> 

The first thing I'd do is reduce the query size as much as possible. In
this case I'd try removing as many joins as possible until the issue
disappears. The simpler the query, the easier it is to investigate.

And yes, replacing parts of a query with a temporary table is a common
solution, because it's possible to collect statistics on it, build
indexes etc. That usually solves estimation issues in multi-tenancy.
Sometimes even a CTE with materialization is enough.
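
Roughly like this (made-up names, just to show the pattern):

   -- precompute one subquery, so the planner gets real statistics for it
   CREATE TEMP TABLE order_items_c123 AS
   SELECT oi.order_id, oi.item_id, oi.amount
     FROM order_item oi
    WHERE oi.client_id = 123;

   ANALYZE order_items_c123;

   -- then join the main transform against the temp table instead;
   -- or, on PG 12+, keep it in the query as a materialized CTE:
   --   WITH oi AS MATERIALIZED (SELECT ...) SELECT ... FROM oi JOIN ...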

> (b) Do I need to provide a schema and explain plans to get any
> concrete advice on how to proceed?
> 

That would be helpful, particularly after making the query as small as
possible.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: GIN index

2021-10-13 Thread Tomas Vondra

Hi,

Why exactly are you starting a new thread again, instead of continuing 
in the thread you started a couple days ago?


A couple recommendations:

1) I find it unlikely you'll get a lot of help unless you provide a way 
to reproduce the issue easily. That is, something people can build and 
test on their own. Posting incomplete chunks of code means we'd have to 
spend time filling in the gaps, etc. Sorry, not going to happen.


Create an extension that we can build/install easily, include a SQL 
script reproducing the issue.



2) The code is incomplete, e.g. hex2char/char2hex are missing. And the 
calls seems to be the other way around. I'd expect geomgrid_in to 
transform hex string to binary, but it calls Char2Hex.



3) AFAIK there's nothing like CREATE FUNCTION ... NOT FENCED ..., so it 
seems you're using some sort of fork. In that case you should probably 
ask them.



4) Don't post backtraces as screenshots. Copy them as text, attach them 
as a text file.



5) Continue in the thread you already started, otherwise it's impossible 
to follow the discussion.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Avg/max size of these JSON docs in Postgres

2021-10-12 Thread Tomas Vondra

On 10/12/21 21:21, Michael Lewis wrote:
On Tue, Oct 12, 2021 at 12:51 PM Simon Riggs wrote:


On Tue, 12 Oct 2021 at 18:53, Michael Lewis wrote:
 >
 > On Tue, Oct 12, 2021 at 1:43 AM Simon Riggs wrote:
 >>
 >> On Tue, 12 Oct 2021 at 08:14, Ram Pratap Maurya wrote:
 >>
 >> > Confirm what is Avg/max size of these JSON docs in Postgres.
 >>
 >> JSON and JSONB datatypes can both be max 1GB in size.
 >
 > That is per row.

No, that is per column.


Yes, sorry. My attempt at clarification only muddled things. Each column 
within each row can be up to 1GB in size is how I understand the limit.


But you're kinda right, actually. It very much is per-tuple, because in 
various places we form tuples with all the data inline. Consider for 
example this:


  create table t (a text, b text);
  alter table t alter column a set storage extended;
  alter table t alter column b set storage extended;

  insert into t select repeat(md5('a'), 512*1024*1024/32),
   repeat(md5('b'), 512*1024*1024/32);
  ERROR:  invalid memory alloc request size 1073741880

Clearly, both values are only 512MB (no compression). Yet it fails, 
simply because tts_virtual_copy_heap_tuple calls heap_form_tuple to form 
a tuple with all values and also the tuple header.


But this succeeds, because the values are 64B shorter, leaving enough 
space for the tuple header etc.


  insert into t select repeat(md5('a'), 512*1024*1024/32 - 2),
   repeat(md5('b'), 512*1024*1024/32);

And you can even select the data:

  select * from t;

You can even do this:

  update t set a = repeat(md5('a'), 512*1024*1024/32);

which works, so now you have a row with two 512MB values. But then 
you'll face this:


  select * from t;
  ERROR:  out of memory
  DETAIL:  Cannot enlarge string buffer containing 536870922 bytes by
   536870912 more bytes.

because printtup() builds a huge string with all the data (and if the 
columns had compression, this would be decompressed, because it goes to 
the client).


So yeah, there's an explicit 1GB limit per value, but having rows close 
to the 1GB limit is going to cause all sorts of unpredictable and rather 
painful issues :-(



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: create a new GIN index for my own type

2021-10-04 Thread Tomas Vondra

On 10/4/21 3:32 PM, huangning...@yahoo.com wrote:
I have already debugged the program according to this step, but I found 
that in the DataCopy function, the variable typlen should be 8, but it 
is -1,




Well, if you have debugged this, it'd be nice if you could share more 
information (e.g. backtraces, etc.) otherwise others can just guess what 
you saw. And that makes it much harder to help you.


I see you defined the data type as PASSEDBYVALUE, but you haven't 
specified INTERNALLENGTH, so it's -1 (i.e. variable length). Obviously, 
that can't be passed by value - not sure if this is intentional or just 
a case of CREATE TYPE not checking it.
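
If the type really is a fixed 8-byte value, the definition needs to say so
explicitly - roughly like this (just a sketch; only geomgrid_in was
mentioned in your code, the other names are assumptions):

   CREATE TYPE geomgrid (
       INPUT = geomgrid_in,
       OUTPUT = geomgrid_out,
       INTERNALLENGTH = 8,
       PASSEDBYVALUE,
       ALIGNMENT = double
   );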



BTW it's customary not to top post - inline replies are much easier to 
follow, as it makes clearer which parts you respond to. And please make 
sure that you're responding to the mailing list, not just directly to 
the other person.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Problem in pg_basebackup of 1.7 TB database while replicating in PostgreSQL 9.4.( Master and slave both are in postgreSQL 9.4)

2021-10-04 Thread Tomas Vondra

On 10/4/21 9:28 AM, Amal Chakravarty wrote:
Hi all. I am working with a database of 1.7 TB size which is in 
PostgreSQL 9.4. Kindly suggest how to take the pg_basebackup of such a 
huge data while replicating from master to slave.




And what exactly is the issue? Does it fail in some way or are you 
looking for a faster / more efficient way to transfer the data?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: create a new GIN index for my own type

2021-10-04 Thread Tomas Vondra

On 10/4/21 8:30 AM, huangning...@yahoo.com wrote:

Hi:
I created a new data type, and then I wanted to create a GIN index for 
it, but when I created the index, the program would crash.

The version of postgresql is 9.6.

The following is part of the code, and I also refer to the code of intarray.



I doubt anyone is going to investigate this unless you provide a more 
complete example - something like an extension where people can do "make 
install" without having to fill in various pieces of code.


To investigate the crash, you need to attach a debugger to the backend 
and run the CREATE INDEX (or whatever triggers the crash). The debugger 
should catch the segfault and you'll be able to identify where exactly 
it crashes and why (and investigate).


1) first get PID of the backend

  SELECT pg_backend_pid();

2) then attach a debugger to the backend

  gdb -p $PID
  (gdb) c

3) run the CREATE INDEX query

4) get backtrace from the debugger

  (gdb) bt


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ERROR: unrecognized node type

2021-09-30 Thread Tomas Vondra

On 9/30/21 8:38 AM, Amine Tengilimoglu wrote:

...
Remarkable thing  the related error occurs when executing sql
statements containing where.  The sqls that do not contain a where
are not getting an error. Location information as below;

ERROR:  XX000: unrecognized node type: 223
*LOCATION:  exprType, nodeFuncs.c:263*
STATEMENT:  SELECT n.nspname as "Schema",
           c.relname as "Name",
           CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view'
WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S'
THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign
table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned
index' END as "Type",
           pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
         FROM pg_catalog.pg_class c
              LEFT JOIN pg_catalog.pg_namespace n ON n.oid =
c.relnamespace
         WHERE c.relkind IN ('r','p','v','m','S','f','')
               AND n.nspname <> 'pg_catalog'
               AND n.nspname <> 'information_schema'
               AND n.nspname !~ '^pg_toast'
           AND pg_catalog.pg_table_is_visible(c.oid)
         ORDER BY 1,2;



I'm unable to reproduce the issue, so it probably depends on what tables 
are created etc. But if you say it only happens with WHERE clause, 
that's interesting. It suggests the failure probably happens somewhere 
in transformWhereClause, but we can only speculate why and the query 
conditions look entirely reasonable.


I suggest you do this:

1) start a session, identify the PID of the backend

   select pg_backend_pid();

2) attach a debugger (e.g. gdb) to the pid

   gdb -p $PID

3) set breakpoint to the location in the error message

   (gdb) break nodeFuncs.c:263
   (gdb) continue

4) run the query, the breakpoint should be triggered

5) extract full backtrace

   (gdb) bt full

6) print the expression

   (gdb) p nodeToString(expr)


That should give us some hints about what might be wrong ...


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ERROR: unrecognized node type

2021-09-29 Thread Tomas Vondra




On 9/29/21 8:37 PM, Tom Lane wrote:

Adrian Klaver  writes:

Unless folks are using plain text only email readers. That is why
screenshots are generally a bad idea for textual information. For the
record the Postgres version is 12.8.


Hmm, if it's v12 then NodeTag 223 is T_List, which is ... a bit
surprising.  That's a common enough node type that "oversight"
doesn't seem to fly as an explanation.

I don't think we'll be able to make progress on this without a
lot more detail.  A self-contained example that triggers it
would be very useful; or maybe you could get a backtrace from
the point of the error?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend



A handy trick for similar cases (not mentioned in the wiki) is to set

SET log_error_verbosity = verbose;

and then trigger the error. This logs the exact location (file:line)
where the error is printed, so you can then set a breakpoint at that place.
That makes it easier to generate the backtrace.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: How to hash a large amount of data within Postgres?

2021-06-23 Thread Tomas Vondra
On 6/23/21 7:39 PM, Peter J. Holzer wrote:
> On 2021-06-21 15:53:09 +0200, Thorsten Schöning wrote:
>> Some years ago I implemented some SQL to read all files, build a table
>> of SHA256 hashes and tell me how much data is redundant. The goal was
>> to have a look at which files share the same hash with different LOIDs
>> and optionally change that, so that all those files are only stored
>> once on the end.
>>
>> While the approach was pretty naive, because it simply read all files
>> into memory to calculate the hashes, I'm somewhat sure it worked in
>> the past with Postgres 9.6. The executing server had enough free RAM
>> available as well to process the at most ~4 GiB large files one after
>> another.
>>
>> I tried that SQL today with Postgres 11 on UB 18.04 and it failed:
>>
>>> [Code: 0, SQL State: XX000]  FEHLER: invalid memory alloc request size 
>>> 1898107949
>>>   Wobei: PL/pgSQL-Funktion loid_hash_calc(oid,text)[...]
> [...]
>> I searched regaridng that issue and only found two relevant results:
>> Corrupted rows for some reason and simply size restrictions when
>> allocating memory. The latter is more likely than the former in my
>> case, as the restrictions seems to be 1 GiB and I do have larger
>> files.
> 
> 1 GB is the maximum size of quite a few data structures in PostgreSQL. I
> don't use PL/pgSQL, but I wouldn't be surprised if it was the maximum
> size of whatever loread() returns (a bytea?). I would be surprised if
> this limit was higher in version 9.6 than it is in version 11, however.
> 

Well, it's actually a bit worse than that - the maximum allocation size
is (1GB - 1B), as it's defined like this:

#define MaxAllocSize   ((Size) 0x3fffffff) /* 1 gigabyte - 1 */

And this includes both the "user data" and a small "header" used for the
bytea value. Depending on what format you use to output the values there
may be additional limits (e.g. 'hex' requires 2 characters per byte, so
doubling the amount of memory needed).

For large objects this is not an issue, because we store them in small
chunks, not as one large bytea value.

> 
>> I'm doing the following simply currently, because I didn't find any
>> interfaces allowing to forward blocks of data, LOIDs, file descriptors
>> or anything like that working with smaller buffers or alike.
>>
>>> fd  := lo_open( loid,  INV_READ);
>>> size:= lo_lseek(fd, 0, SEEK_END);
>>> PERFORMlo_lseek(fd, 0, SEEK_SET);
>>
>>> hashBin := digest(loread(fd, size), algorithm);
>>> hashHex := encode(hashBin,  'hex');
>>
>> So, is there any way to work around the problem I have currently?
> 
> Normally, hash libararies have a way to feed chunks of data into a hash
> computations to avoid having to keep the whole thing in RAM.
> The pgcrypto extension seems to be lacking such functionality, however.
> 
> I would build something similar to a Merkle tree:
> 
> Choose a convenient chunk size (a few MB is probably ok), read the large
> object in chunks of this size, computing the hash for each. Concatenate
> all the hashes and compute the hash of that. Add intermediate levels if
> the the concatenated hashes are still too large to fit in memory.
> 

Not sure where you searched, but there definitely are interfaces to read
chunks of data from large objects - see this:

1) lo_get (loid, offset, length)
   https://www.postgresql.org/docs/13/lo-funcs.html

2) lo_seek() + lo_read()
   https://www.postgresql.org/docs/13/lo-interfaces.html

Obviously, you can't do "loread(fd, size)" because that's going to
attempt building one large bytea, failing because of the alloc limit.
You have to stream the data into the hash.

Doing that in plpgsql is possible, although possibly somewhat slow.
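
A rough sketch of the chunked approach Peter described (so the result is a
hash of the concatenated chunk hashes, *not* the plain SHA-256 of the whole
file; assumes the pgcrypto extension is installed):

   CREATE OR REPLACE FUNCTION loid_hash_chunked(p_loid oid,
                                                p_algo text DEFAULT 'sha256')
   RETURNS text LANGUAGE plpgsql AS $$
   DECLARE
       v_chunk_size int    := 8 * 1024 * 1024;  -- 8MB chunks
       v_offset     bigint := 0;
       v_chunk      bytea;
       v_hashes     bytea  := ''::bytea;
   BEGIN
       LOOP
           -- read the next chunk of the large object
           v_chunk := lo_get(p_loid, v_offset, v_chunk_size);
           EXIT WHEN v_chunk IS NULL OR length(v_chunk) = 0;
           -- hash the chunk, append the hash to the running buffer
           v_hashes := v_hashes || digest(v_chunk, p_algo);
           v_offset := v_offset + v_chunk_size;
       END LOOP;
       -- hash the concatenated chunk hashes
       RETURN encode(digest(v_hashes, p_algo), 'hex');
   END;
   $$;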


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Strange behavior of function date_trunc

2021-05-05 Thread Tomas Vondra




On 5/5/21 3:23 PM, Pavel Luzanov wrote:

Hello,

It is very likely that the date_trunc function in the following example 
is executed for each line of the query. Although it marked as a STABLE 
and could only be called once.




It could, but that's just an option - the database may do that, but it's 
not required to do it. In this case it might be beneficial, but it'd 
make the planner more complex etc.




EXPLAIN (ANALYZE)
SELECT * FROM generate_series('2021-01-01', '2021-06-01', '1 
s'::interval) AS g(x) WHERE g.x >= date_trunc('day', 
'2021-05-05'::timestamptz);

   QUERY PLAN
--- 

  Function Scan on generate_series g  (cost=0.00..15.00 rows=333 
width=8) (actual time=2801.884..3263.328 rows=2332801 loops=1)
    Filter: (x >= date_trunc('day'::text, '2021-05-05 
00:00:00+03'::timestamp with time zone))

    Rows Removed by Filter: 10713600
  Planning Time: 0.040 ms
  Execution Time: 3336.657 ms

When replacing date_trunc with now, the query is much faster:

EXPLAIN (ANALYZE)
SELECT * FROM generate_series('2021-01-01', '2021-06-01', '1 
s'::interval) AS g(x)

WHERE g.x >= now();
   QUERY PLAN
--- 

  Function Scan on generate_series g  (cost=0.00..15.00 rows=333 
width=8) (actual time=1648.777..1845.430 rows=2275325 loops=1)

    Filter: (x >= now())
    Rows Removed by Filter: 10771076
  Planning Time: 0.039 ms
  Execution Time: 1918.767 ms

The variant with now works almost as fast as with the constant. This 
suggests me that perhaps date_trunc is being executed for every line of 
the query:


EXPLAIN (ANALYZE)
SELECT * FROM generate_series('2021-01-01', '2021-06-01', '1 
s'::interval) AS g(x)

WHERE g.x >= '2021-05-05'::timestamptz;
   QUERY PLAN
--- 

  Function Scan on generate_series g  (cost=0.00..12.50 rows=333 
width=8) (actual time=1628.743..1826.841 rows=2332801 loops=1)

    Filter: (x >= '2021-05-05 00:00:00+03'::timestamp with time zone)
    Rows Removed by Filter: 10713600
  Planning Time: 0.033 ms
  Execution Time: 1901.680 ms

In this regard, I have two questions:
1. How can I find out exactly how many times the date_trunc function has 
been executed? So far, these are just my assumptions.

2. If date_trunc is indeed called multiple times, why is this happening?



Well, it's not like date_trunc is executed for each row while now() is 
executed only once. The functions are executed for each row in both 
cases, but now() is simply much cheaper - it just returns a value that 
is already calculated, while date_trunc has to parse and truncate the 
value, etc.


You can use CTE to execute it just once, I think:

  with x as (select date_trunc('day', '2021-04-01'::timestamptz) as x)
  select * from t where a > (select x from x);


regards
Tomas




Re: Extended stats - value not in MCV list

2021-05-04 Thread Tomas Vondra

Hi,

this topic would probably be a better fit for pgsql-hackers, as it's 
about the internal implementation, but let's continue the discussion 
here for now.


On 5/2/21 6:52 AM, Pedro Luis Guzmán Hernández wrote:

Hi there,

I've just started using extended stats cause the planner was giving me 
terrible estimates for a certain table. MCV extended stats solved my 
problem when values are in the extended MCV list, but estimates are 
still terrible when they are not in the MCV list. Limited as my 
knowledge of database internals is, I dug into the source code and found 
an important difference on how these not-MCV values are handled in the 
single-column and multi-column cases.




Hmm, so which PostgreSQL version are you testing this on? I'm not 
sure which code branch you're looking at.


Can you share a reproducer, i.e. a simple example demonstrating the 
issue? That'd be very helpful.



For single columns, the estimate is calculated as follows:
selectivity = (1 - sum(MCV_frequencies)) / (distinct_values - count(MCV))
Which seems to assume a uniform distribution of non-MCV values and looks 
like a sensible guess, at least to my amateur eyes.


For multi-column statistics it seems to me that the estimate is 
calculated instead as:

selectivity = 1 - sum(MCV_frequencies)
Which instead seems to assume that the value could potentially be 
present in all the rows not covered by the MCV. This seems like an 
adequate upper bound, but is also quite conservative compared to the 
single-column estimate. In my specific case this yields a selectivity 
even higher than some of the least frequent MCV values, which is a 
condition that is actually checked for and compensated in the 
single-column estimate as an additional check. I have MCV and 
distinct extended stats, so I know the distinct_values  stats is available.




It shouldn't behave like that, and some quick experiments suggest it 
does not (at least on master). I can't rule out a bug, of course. A 
reproducer would be helpful.


So I hope my question is clear from the above. How come the estimates 
are calculated with such different approaches? I insist I have no 
experience with database/query planner development, so excuse me if I am 
overlooking some obvious conceptual difference between single-column and 
multi-column stats. The single-column estimate is actually described in 
the documentation, but the multi-column estimate is not. If there is 
indeed a good reason for this difference, I think it should be documented.




As for the ndistinct estimates and multi-column MCV lists, it's not all 
that simple - there may be conditions on only some of the columns, in 
which case we don't know how many groups we actually matched, etc.


TBH I'm not sure how much of those implementation details we want to put 
into the user docs - it may be a bit too much, and we may need to change 
some of it.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN index on timestamptz

2021-04-23 Thread Tomas Vondra




On 4/23/21 10:31 AM, Mohan Radhakrishnan wrote:

Hi,
          I am planning to use as I search based on timestamptz fields. 
There are millions of records.I refer 
> https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits


I execute this on the AWS RDS instance. Is there something in the plan I 
should pay attention to ? I notice the Execution Time.


Thanks,
Mohan

INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP + 
( g || 'minute' ) :: interval, random() * 6, md5(g::text) FROM 
generate_series(1,800) as g;


Two things:

a) You need to do ANALYZE, otherwise there are no statistics the 
optimizer could use (which is why the row estimates in the plans are 
entirely bogus).


b) BRIN indexes don't work on random data, because the whole idea is 
about eliminating large blocks of data (e.g. 1MB). But with random data 
that's not going to happen, because each such range will match anything. 
Which is why seqscan is a bit faster than when using BRIN index.
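
A quick way to check whether BRIN has a chance on a column is the
"correlation" statistic after ANALYZE - values close to 1.0 (or -1.0) mean
the physical order matches the logical order, which is what BRIN needs:

   ANALYZE testtab;

   SELECT attname, correlation
     FROM pg_stats
    WHERE tablename = 'testtab' AND attname IN ('date', 'level');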



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is there a way to dump schema to files on disk and keep them in sync

2020-12-13 Thread Tomas Vondra
On 12/13/20 6:34 PM, Adrian Klaver wrote:
> On 12/12/20 6:58 PM, Tim Uckun wrote:
>> I want to dump my postgres schema to disk in neat directories like
>> pgadmin presents. Then I want to be able to edit the files and sync
>> changes to the database and ideally if changes were made in the
>> database to sync them back to the disk.
>>
>> Is there a tool that does this? Is there a tool that will dump the
>> schema into separate directories and files like pgadmin does?
> 
> pgAdmin does not create directories, it just organizes the contents of
> the system catalogs into GUI elements.
> 
> For schema management I would suggest a tool like the one I use
> Sqitch(https://sqitch.org/). It will organize the process of schema
> creation and management.
> 

Yeah, that was my thought too. Or maybe look at the other schema
versioning tools available - we have a list on the wiki:

https://wiki.postgresql.org/wiki/Change_management_tools_and_techniques

I'm sure it's incomplete, but it's helpful nevertheless.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Potential BRIN Index Corruption

2020-12-09 Thread Tomas Vondra
On 12/9/20 12:07 AM, Huan Ruan wrote:
> HI Alvaro
> 
> Unfortunately those records were written a while ago and we no longer keep
> their WAL logs. Thanks for your help anyway.
> 

Can you estimate when roughly the records were written? E.g. by using a
rough estimate of WAL or XIDs generated per day, or something like that.
Maybe there was some I/O issue in that time period and a couple writes
got lost, or something like that.

Are there any other corrupted indexes on the table?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extended statistics for correlated columns, row estimates when values are not in MCVs list

2020-12-07 Thread Tomas Vondra



On 12/4/20 11:53 PM, Michael Lewis wrote:
> On version 12.5, I have a query similar to the below where I am getting
> rather unfortunate row estimates and a sub-optimal index choice as a result.
> 
> SELECT
>   id
> FROM
>   messages
> WHERE
>   client_id = 1234
>   and site_id = 889977
>   and message_type_id in ( 59, 62, 102, 162 )
>   and sent_on > NOW() - INTERVAL '3 days';
> 
> I created extended statistics (all types) on client_id and site_id,
> analyzed the columns, and in pg_stats_ext.dependencies (4 = client_id, 7 =
> site_id) have:
> {"4 => 7": 0.002997, "7 => 4": 0.896230}
> 
> When I check row estimates like below, I get results that indicate the
> columns are expected to be independent still.
> explain SELECT id FROM messages; --889594304
> explain SELECT id FROM messages where client_id = 1234; --133439
> explain SELECT id FROM messages where site_id = 889977; --28800
> explain SELECT id FROM messages where client_id = 1234 and site_id =
> 889977; --4
> 

I was a bit confused about this initially, because why would it still be
estimated as independent clauses, with the extended stats defined? But I
think the explanation is fairly simple - the combination of values is
simply rare enough not to be included in the MCV list, so it probably
gets estimated using the "regular" logic as if independent. But we still
mark the clauses as estimated, so the functional dependencies are not
really considered.

(I wonder if the code in PG14 would work better, but that's of little
use of course.)

> However, I pick a client & site ID pair which show up in the MCVs list,
> then I get the same estimate when querying for that site_id with or without
> including the client_id. That is great.
> 

I think this is consistent with the above explanation - in this case the
MCV actually kicks in, significantly improving the estimate.

> Is it reasonable to expect that if the correlation between two columns is
> rather high, then the optimizer might figure the columns are not
> independent and perhaps would give less weight to the value derived from
> independent column statistics? With table statistics, it is possible to set
> a static value or ratio for something like ndistinct. Any chance for
> something similar on dependency someday?
> 
> Perhaps I am expecting too much or have a poor understanding of what
> extended statistics can or someday might do. I deal with under estimates
> from correlations between client_id and sites or other similar
> dependent objects a fair bit and am hopeful to resolve some of those
> planning problems with extended stats, without the maintenance overhead of
> migrating everything to a single client per database to get more specific
> statistics.
> 

What you might try is defining the statistics with only the functional
dependencies. That should consider the column-level correlation even
when the combination of values is not in the MCV. It might make the
"good" estimate worse, but that's an inherent trade-off.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Check constraints do not seem to be working!!!

2020-11-11 Thread Tomas Vondra


On 11/11/20 10:06 AM, Jitendra Loyal wrote:
> Thanks Nikolay
> 
> I read that but is there a way to meet the above requirement. And I will
> like to add that IS NULL and IS NOT NULL should evaluate to true/false.
> These operators are made for this and should not be returning NULL.
> 

This has nothing to do with IS [NOT] NULL, it's the first part of the
expression (b = TRUE) causing trouble. Essentially, the constraint

(b = true) and (c is not null)

is evaluated in two steps. First we evaluate the two parts individually,
and for (null, true) the results would look like this:

(b = true) => null
(c is not null) => true

and then we combine those results using 'AND'

null AND true => null

which is considered as if the constraint matches. If you want to handle
NULL for the first expression, you may do this, for example:

(b is not null and b = true) and (c is not null)

Or something like that.
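
A quick way to see the behavior, sketched on a throwaway table (not part
of your original definition):

   -- NULL AND true evaluates to NULL, and CHECK treats NULL as "passes"
   SELECT (NULL::boolean = true) AND (1 IS NOT NULL);

   CREATE TABLE demo (
       b boolean,
       c int,
       CHECK ((b IS NOT NULL AND b = true) AND (c IS NOT NULL))
   );

   -- now this fails instead of slipping through
   INSERT INTO demo VALUES (NULL, 1);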


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: DB running out of memory issues after upgrade

2020-02-18 Thread Tomas Vondra

On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:

after upgrading Postgres to v9.6.11 from v9.6.9 the DB is running into out-of-memory issues; 
no workload has changed before and after the upgrade. 

spec: RAM 16gb,4vCore
Any bug reported like this or suggestions on how to fix this issue? I 
appreciate the response..!! 



This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory was the backend (killed by
OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

 (gdb) p MemoryContextStats(TopMemoryContext)
 (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: 1 Status of vertical clustered index - 2 Join using (fk_constraint) suggestion - 3 Status of pgsql's parser autonomization

2020-02-16 Thread Tomas Vondra

On Sun, Feb 16, 2020 at 10:38:29PM +0100, maxzor wrote:

Hello,

1. I was told that M$ SQLServer provides huge performance deltas over 
PostgreSQL when dealing with index-unaligned queries :

create index i on t (a,b, c);
select * from t where b=... and c=...;


Perhaps index-only scans might help here, but that generally does not
work for "SELECT *" queries.

Columnar storage has been tried by various companies, CitusData, 
EnterpriseDB, 2ndQuadrant, Fujitsu Enterprise Postgres. It has been 
discussed quite a lot, last thread that I was able to find being in 
2017, https://www.postgresql.org/message-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN%2BhCy8xOLAh2doYhVg5d6HsAA%40mail.gmail.com 
where Fujitsu's patch made it quite far.

What is the status on such a storage manager extension interface ?



I think you're looking for threads about zheap and (especially)
zedstore. Those are two "storage manager" implementations various people
are currently working on. Neither of those is likely to make it into
pg13, though :-(

2. What do you think of adding a new syntax : 'from t join t2 using 
(fk_constraint)' ? And further graph algorithms to make automatic 
joins ?
Both 'natural join' and 'using (column_name)' are useless when the 
columns are not the same in source and destination.
Plus it is often the case that the fk_constraints are over numerous 
columns, even though this is usually advised against. But when this 
case happens there will be a significant writing speedup.


I'm not really sure what's the point / benefit here. Initially it seemed
you simply propose a syntax saying "do a join using the columns in the
FK constraint" but it's unclear to me how this implies any writing
speedup? 

I have been bothered by this to the point that I developed a 
graphical-query-builder plugin for pgModeler,
https://github.com/maxzor/plugins/tree/master/graphicalquerybuilder#automatic-join-mode 
,

but I believe such a syntax would be much better in the core!



Hm, maybe.

3. What is the status of making the internal parser of PostgreSQL less 
coupled to the core, and easier to cherry-pick from outside?
It would be great to incorporate it into companion projects : 
pgAdmin4, pgModeler, pgFormatter...




I have no idea what you mean by "less coupled" here. What are the
requirements / use cases you're thinking about?


FWIW I think it's pretty bad idea to post questions about three very
different topics into a single pgsql-hackers thread. That'll just lead
to a lot of confusion.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Avoiding out of date statistics / planner

2020-02-12 Thread Tomas Vondra

On Wed, Feb 12, 2020 at 10:23:22AM -0700, Michael Lewis wrote:

It may also be worth noting that it is possible to make autovacuum/analyze
more aggressive, perhaps only on the tables that see large changes in data
that might result in a statistics issue. If you could share a query,
explain analyze output, and pseudo code or at least description of what
sort of bulk operations are being done, then more insight could be offered.


Another thing you can do is deploy auto_explain, and log explain plan
for long-runnning queries. That won't fix the root cause, but it will
help you with confirming the root cause - you'll see the query plan,
which should give you enough context.
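
A minimal sketch of enabling it just for one session (for a permanent
setup it belongs into postgresql.conf via session_preload_libraries or
shared_preload_libraries; the threshold is an arbitrary example value):

   LOAD 'auto_explain';
   SET auto_explain.log_min_duration = '500ms';  -- log plans of slower queries
   SET auto_explain.log_analyze = on;            -- include actual rows/timings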

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2020-01-17 Thread Tomas Vondra

On Wed, Jan 08, 2020 at 05:24:05PM +1030, Andrew Dunstan wrote:

On Wed, Jan 8, 2020 at 7:08 AM Pavel Stehule  wrote:


Hi

On Mon, Jan 6, 2020 at 22:34 Andrew Dunstan 
 wrote:



Updated version including docco and better error message.

cheers

andrew



I think so my objections are solved. I have small objection

+ errdetail("exception raised due to \"null_value_treatment := 
'raise_exception'\""),
+ errhint("to avoid, either change the null_value_treatment argument or ensure that 
an SQL NULL is not used")));

"null_value_treatment := 'raise_exception'\""

it use proprietary PostgreSQL syntax for named parameters. Better to use 
ANSI/SQL syntax

"null_value_treatment => 'raise_exception'\""

It is fixed in attached patch

source compilation without warnings,
compilation docs without warnings
check-world passed without any problems

I'll mark this patch as ready for commiter

Thank you for your work




Thanks for the review. I propose to commit this shortly.



Now that this was committed, I've updated the patch status accordingly.

Thanks!

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: BigSQL pgc alternative

2019-12-21 Thread Tomas Vondra

On Fri, Dec 20, 2019 at 12:15:06PM -0300, Samuel Teixeira Santos wrote:

Hi all.

BigSQL still allow to install Postgres and others resources as like a
portable install.

But today, it's only offer your tool (pgc) for newer Postgresql versions.

I would like to install as portable option because it's more easy to config
and use in my own user in my development station.

What you recommend to do to replace pgc tool as alternative to install
postgresql 10 and the respective postgis version as like portable option?



I'm not familiar with BigSQL, so I'm not sure what exactly you mean when
you say "portable option". Can you explain?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: How to import Apache parquet files?

2019-11-05 Thread Tomas Vondra

On Tue, Nov 05, 2019 at 04:21:45PM +0100, Softwarelimits wrote:

Hi Imre, thanks for the quick response - yes, I found that, but I was not
sure if it is already production ready - also I would like to use the data
with the timescale extension, that is why I need a full import.



Well, we're not in the position to decide if parquet_fdw is production
ready, that's something you need to ask author of the extension (and
then also judge yourself).

That being said, I think FDW is probably the best way to do this. It's
explicitly designed to work with foreign data, so using it to access
parquet files seems somewhat natural.

The alternative is probably transforming the data into COPY format, and
then load it into Postgres using COPY (either as a file, or stdin).

Which of these options is the right one depends on your requirements.
FDW is more convenient, but row-based and probably significantly less
efficient than COPY. So if you have a lot of these parquet files, I'd
probably use the COPY. But maybe the ability to query the parquet files
directly (with FDW) is useful for you.
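
To illustrate the COPY route, a sketch that assumes a hypothetical
parquet_to_csv converter writing CSV to stdout (the converter and the
table definition are made up, not something shipped with Postgres):

   COPY measurements (ts, sensor_id, value)
   FROM PROGRAM 'parquet_to_csv /data/measurements-2019-11.parquet'
   WITH (FORMAT csv, HEADER true);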

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Barman

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 05:29:34PM +, Daulat Ram wrote:

Hi All,

We have configured postgres 11.2 in streaming replication (primary &
Standby)  on docker and I am looking to initiate the Postgres backup
using barman. As I know there are few options for taking backup using
barman.

RSYNC backup
Incremental Backups
Streaming Backup with continuous WAL streaming
Centralized and Catalogued Backups

Which is the best option for backup using barman? So that we can keep
the database safe in case of disaster? I feel the Incremental Backups
are most useful to perform the PITR but I want to know the experts
suggestions.



You're mixing a number of topics, here. Firstly, all backups done by
barman are centralized and catalogued, that's pretty much one of the
main purposes of barman.

When it comes to backup methods, there are two basic methods. rsync and
postgres (which means pg_basebackup). This is about creating the initial
base backup. Both methods then can replicate WAL by either streaming or
archive_command.

So first you need to decide whether to use rsync and pg_basebackup,
where rsync allows advanced features like incremental backup, parallel
backup and deduplication.

Then you need to decide whether to use archive_command or streaming
(i.e. pg_receivexlog).

The "right" backup method very much depends on the size of your
database, activity, and so on. By default you should probably go with
the default option, described as "scenario 1" in the barman docs, i.e.
pg_basebackup (backup_method = postgres) and WAL streaming.
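
For reference, a minimal sketch of a barman server section matching that
"scenario 1" setup (host names, user names and the server label are just
placeholders):

   [pg]
   description = "streaming-only backups of the primary"
   conninfo = host=pgprimary user=barman dbname=postgres
   streaming_conninfo = host=pgprimary user=streaming_barman
   # base backups via pg_basebackup, WAL via streaming
   backup_method = postgres
   streaming_archiver = on
   slot_name = barman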

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Tomas Vondra

On Mon, Oct 21, 2019 at 08:06:46AM -0700, Adrian Klaver wrote:

On 10/20/19 11:07 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:




True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).



Can you explain the above to me as I thought there are exception 
blocks in stored functions and now sub-transactions in stored 
procedures.




Sorry for the confusion - I've not been particularly careful when
writing that response.

Let me illustrate the issue with this example:

   CREATE TABLE t (a int);

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 -- SAVEPOINT s1;
 INSERT INTO t VALUES (1);
 -- COMMIT;
   EXCEPTION
 WHEN others THEN
   msg := SUBSTR(SQLERRM, 1, 100);
   RAISE NOTICE 'error: %', msg;
   END; $$;

   CALL test();

If you uncomment the SAVEPOINT, you get

   NOTICE:  error: unsupported transaction command in PL/pgSQL

because savepoints are not allowed in stored procedures. Fine.

If you uncomment the COMMIT, you get

   NOTICE:  error: cannot commit while a subtransaction is active

which happens because the EXCEPTION block creates a subtransaction, and
we can't commit when it's active.

But we can commit outside the exception block:

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 BEGIN
   INSERT INTO t VALUES (1);
 EXCEPTION
   WHEN others THEN
 msg := SUBSTR(SQLERRM, 1, 100);
 RAISE NOTICE 'error: %', msg;
  END;
  COMMIT;
   END; $$;


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Tomas Vondra

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 4:18 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 03:48:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 1:14 PM, David G. Johnston wrote:

On Sun, Oct 20, 2019 at 5:31 AM Andrew Dunstan
mailto:andrew.duns...@2ndquadrant.com>> wrote:

    And yet another is to
    raise an exception, which is easy to write but really punts the
issue
    back to the application programmer who will have to decide how to
    ensure
    they never pass in a NULL parameter.


That's kinda the point - if they never pass NULL they won't encounter
any problems but as soon as the data and their application don't see
eye-to-eye the application developer has to decide what they want to
do about it.  We are in no position to decide for them and making it
obvious they have a decision to make and implement here doesn't seem
like a improper position to take.



The app dev can avoid this problem today by making sure they don't pass
a NULL as the value. Or they can use a wrapper function which does that
for them. So frankly this doesn't seem like much of an advance. And, as
has been noted, it's not consistent with what either MySQL or MSSQL do.
In general I'm not that keen on raising an exception for cases like
this.



I think the general premise of this thread is that the application
developer does not realize that may be necessary, because it's a bit
surprising behavior, particularly when having more experience with other
databases that behave differently. It's also pretty easy to not notice
this issue for a long time, resulting in significant data loss.

Let's say you're used to the MSSQL or MySQL behavior, you migrate your
application to PostgreSQL or whatever - how do you find out about this
behavior? Users are likely to visit

   https://www.postgresql.org/docs/12/functions-json.html

but that says nothing about how jsonb_set works with NULL values :-(




We should certainly fix that. I accept some responsibility for the omission.



+1





You're right raising an exception may not be the "right behavior" for
whatever definition of "right". But I kinda agree with David that it's
somewhat reasonable when we don't know what the "universally correct"
thing is (or when there's no such thing). IMHO that's better than
silently discarding some of the data.



I'm not arguing against the idea of improving the situation. But I am
arguing against a minimal fix that will not provide much of value to a
careful app developer. i.e. I want to do more to support app devs.
Ideally they would not need to use wrapper functions. There will be
plenty of situations where it is mighty inconvenient to catch an
exception thrown by jsonb_set(). And catching exceptions can be
expensive. You want to avoid that if possible in your
performance-critical plpgsql code.



True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).





FWIW I think the JSON/JSONB part of our code base is amazing, and the
fact that various other databases adopted something very similar over
the last couple of years just confirms that. And if this is the only
speck of dust in the API, I think that's pretty amazing.



TY. When I first saw the SQL/JSON spec I thought I should send a request
to the SQL standards committee for a royalty payment, since it looked so
familiar ;-)



;-)





I'm not sure how significant this issue actually is - it's true we got a
couple of complaints over the years (judging by a quick search for
jsonb_set and NULL in the archives), but I'm not sure that's enough to
justify any changes in backbranches. I'd say no, but I have no idea how
many people are affected by this but don't know about it ...




No, no backpatching. As I said upthread, this isn't a bug, but it is
arguably a POLA violation, which is why we should do something for
release 13.



WFM


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: CPU SPIKE

2019-10-21 Thread Tomas Vondra

On Mon, Oct 21, 2019 at 10:45:33AM +0530, Pawan Sharma wrote:

Hi All,

Having a really high CPU issue (95-98%) with SELECT statements; the select
queries contain multiple AND operators - will it cause any CPU spike?

apps team is using sub-partition, PG11, CPU:24, Mem: 16GB



Any query execution is using CPU, so yes - it can cause a spike in CPU
usage. The question is whether the query plan is efficient, or whether
it's using the CPU unenecessarily. You need to show us EXPLAIN ANALYZE
and perhaps explain why you think it shouldn't use this much CPU.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-20 Thread Tomas Vondra

On Sun, Oct 20, 2019 at 03:48:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 1:14 PM, David G. Johnston wrote:

On Sun, Oct 20, 2019 at 5:31 AM Andrew Dunstan
mailto:andrew.duns...@2ndquadrant.com>> wrote:

And yet another is to
raise an exception, which is easy to write but really punts the issue
back to the application programmer who will have to decide how to
ensure
they never pass in a NULL parameter.


That's kinda the point - if they never pass NULL they won't encounter
any problems but as soon as the data and their application don't see
eye-to-eye the application developer has to decide what they want to
do about it.  We are in no position to decide for them and making it
obvious they have a decision to make and implement here doesn't seem
like a improper position to take.



The app dev can avoid this problem today by making sure they don't pass
a NULL as the value. Or they can use a wrapper function which does that
for them. So frankly this doesn't seem like much of an advance. And, as
has been noted, it's not consistent with what either MySQL or MSSQL do.
In general I'm not that keen on raising an exception for cases like this.



I think the general premise of this thread is that the application
developer does not realize that may be necessary, because it's a bit
surprising behavior, particularly when having more experience with other
databases that behave differently. It's also pretty easy to not notice
this issue for a long time, resulting in significant data loss.

Let's say you're used to the MSSQL or MySQL behavior, you migrate your
application to PostgreSQL or whatever - how do you find out about this
behavior? Users are likely to visit

   https://www.postgresql.org/docs/12/functions-json.html

but that says nothing about how jsonb_set works with NULL values :-(

You're right raising an exception may not be the "right behavior" for
whatever definition of "right". But I kinda agree with David that it's
somewhat reasonable when we don't know what the "universally correct"
thing is (or when there's no such thing). IMHO that's better than
silently discarding some of the data.

FWIW I think the JSON/JSONB part of our code base is amazing, and the
fact that various other databases adopted something very similar over
the last couple of years just confirms that. And if this is the only
speck of dust in the API, I think that's pretty amazing.

I'm not sure how significant this issue actually is - it's true we got a
couple of complaints over the years (judging by a quick search for
jsonb_set and NULL in the archives), but I'm not sure that's enough to
justify any changes in backbranches. I'd say no, but I have no idea how
many people are affected by this but don't know about it ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: releasing space

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 04:34:32PM -0700, Adrian Klaver wrote:

On 10/19/19 4:17 PM, Julie Nishimura wrote:
Thank you, Thomas. Do you know if it is safe to replicate 9.6.2 
(smaller) db to 9.6.15 (larger capacity) using pg_basebackup? Would 
it be considered as an upgrade?


pg_basebackup backups an entire Postgres cluster which will be many 
databases. So when you say db do mean a Postgres cluster or an 
individual database?




My understanding is Julie wants to create a copy of a 9.6.2 cluster
using pg_basebackup and then run 9.6.15 on it. That's OK, it's
essentially a minor version upgrade.

FWIW Julie, please don't top post - it just makes it harder to follow
the discussion. Also, this seems like a completely separate question,
unrelated to the DROP DATABLASE one. It might be better to start a new
thread instead of repurposing an existing one.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 12:47:39PM -0400, Andrew Dunstan wrote:


On 10/19/19 12:32 PM, David G. Johnston wrote:

On Sat, Oct 19, 2019 at 9:19 AM Tomas Vondra
mailto:tomas.von...@2ndquadrant.com>>
wrote:

>
>We invented jsonb_set() (credit to Dmitry Dolgov). And we've had it
>since 9.5. That's five releases ago.  So it's a bit late to be
coming to
>us telling us it's not safe (according to your preconceptions of
what it
>should be doing).
>


There have been numerous complaints and questions about this behavior
in those five years; and none of the responses to those defenses has
actually made the current behavior sound beneficial but rather have
simply said "this is how it works, deal with it".



I haven't seen a patch, which for most possible solutions should be
fairly simple to code. This is open source. Code speaks louder than
complaints.



IMHO that might be a bit too harsh - I'm not surprised no one sent a
patch when we're repeatedly telling people "you're holding it wrong".
Without a clear consensus what the "correct" behavior is, I wouldn't
send a patch either.





>
>We could change it prospectively (i.e. from release 13 on) if we
choose.
>But absent an actual bug (i.e. acting contrary to documented
behaviour)
>we do not normally backpatch such changes, especially when there is a
>simple workaround for the perceived problem. And it's that policy
that
>is in large measure responsible for Postgres' deserved reputation for
>stability.
>

Yeah.


Agreed, this is v13 material if enough people come on board to support
making a change.




We have changed such things in the past. But maybe a new function might
be a better way to go. I haven't given it enough thought yet.



I think the #1 thing we should certainly do is explaining the behavior
in the docs.






>And if we were to change it I'm not at all sure that we should do
it the
>way that's suggested here, which strikes me as no more intuitive than
>the current behaviour. Rather I think we should possibly fill in
a json
>null in the indicated place.
>

Not sure, but that seems rather confusing to me, because it's
mixing SQL
NULL and JSON null, i.e. it's not clear to me why

[...]

But I admit it's quite subjective.


Providing SQL NULL to this function and asking it to do something with
that is indeed subjective - with no obvious reasonable default, and I
agree that "return a NULL" while possible consistent is probably the
least useful behavior that could have been chosen.  We should never
have allowed an SQL NULL to be an acceptable argument in the first
place, and can reasonably safely and effectively prevent it going
forward.  Then people will have to explicitly code what they want to
do if their data and queries present this invalid unknown data to the
function.




How exactly do we prevent a NULL being passed as an argument? The only
thing we could do would be to raise an exception, I think. That seems
like a fairly ugly thing to do, I'd need a heck of a lot of convincing.



I don't know, but if we don't know what the "right" behavior with NULL
is, is raising an exception really that ugly?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 11:21:26AM -0400, Stephen Frost wrote:

Greetings,

* Dmitry Dolgov (9erthali...@gmail.com) wrote:

If we want to change it, the question is where to stop? Essentially we have:

update table set data = some_func(data, some_args_with_null);

where some_func happened to be jsonb_set, but could be any strict function.


I don't think it makes any sense to try and extrapolate this out to
other strict functions.  Functions should be strict when it makes sense
for them to be- in this case, it sounds like it doesn't really make
sense for jsonb_set to be strict, and that's where we stop it.



Yeah. I think the issue here is (partially) that other databases adopted
similar functions after us, but decided to use a different behavior. It
might be more natural for the users, but that does not mean we should
change the other strict functions.

Plus I'm not sure if SQL standard says anything about strict functions
(I found nothing, but I looked only very quickly), but I'm pretty sure
we can't change how basic operators behave, and we translate them to
function calls (e.g. 1+2 is int4pl(1,2)).


I wonder if in this case it makes sense to think about an alternative? For
example, there is generic type subscripting patch, that allows to update a
jsonb in the following way:

update table set jsonb_data[key] = 'value';

It doesn't look like a function, so it's not a big deal if it will handle NULL
values differently. And at the same time one can argue, that people, who are
not aware about this caveat with jsonb_set and NULL values, will most likely
use it due to a bit simpler syntax (more similar to some popular programming
languages).


This seems like an entirely independent thing ...



Right. Useful, but entirely separate feature.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 11:26:50AM -0400, Andrew Dunstan wrote:


...

The hyperbole here is misplaced. There is a difference between a bug and
a POLA violation. This might be the latter, but it isn't the former. So
please tone it down a bit. It's not the function that's unsafe, but the
ill-informed use of it.


We invented jsonb_set() (credit to Dmitry Dolgov). And we've had it
since 9.5. That's five releases ago.  So it's a bit late to be coming to
us telling us it's not safe (according to your preconceptions of what it
should be doing).


We could change it prospectively (i.e. from release 13 on) if we choose.
But absent an actual bug (i.e. acting contrary to documented behaviour)
we do not normally backpatch such changes, especially when there is a
simple workaround for the perceived problem. And it's that policy that
is in large measure responsible for Postgres' deserved reputation for
stability.



Yeah.



Incidentally, why is your function written in plpgsql? Wouldn't a simple
SQL wrapper be better?


   create or replace function safe_jsonb_set
       (target jsonb, path text[], new_value jsonb,
        create_missing boolean default true)
   returns jsonb as
   $func$
       select case when new_value is null then target
                   else jsonb_set(target, path, new_value, create_missing)
              end
   $func$ language sql;


And if we were to change it I'm not at all sure that we should do it the
way that's suggested here, which strikes me as no more intuitive than
the current behaviour. Rather I think we should possibly fill in a json
null in the indicated place.



Not sure, but that seems rather confusing to me, because it's mixing SQL
NULL and JSON null, i.e. it's not clear to me why

   jsonb_set(..., "...", NULL)

should do the same thing as

   jsonb_set(..., "...", 'null'::jsonb)

I'm not entirely surprised it's what MySQL does ;-) but I'd say treating
it as a deletion of the key (just like MSSQL) is somewhat more sensible.
But I admit it's quite subjective.
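
The difference is easy to demonstrate (a quick sketch, just to make the
two cases explicit):

   -- SQL NULL: jsonb_set is strict, so the whole result is NULL
   SELECT jsonb_set('{"a": 1}', '{b}', NULL);

   -- JSON null: the key is added with a json null value
   SELECT jsonb_set('{"a": 1}', '{b}', 'null'::jsonb);
   -- => {"a": 1, "b": null}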

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Has there been any discussion of custom dictionaries being defined in the database?

2019-10-19 Thread Tomas Vondra

On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:

Morris de Oryx  writes:

Given that Amazon is bragging this week about turning off Oracle, it seems
like they could kick some resources towards contributing something to the
Postgres project. With that in mind, is the idea of defining dictionaries
within a table somehow meritless, or unexpectedly difficult?


Well, it'd just be totally different.  I don't think anybody cares to
provide two separate definitions of common dictionaries (which'd have to
somehow be kept in sync).

As for why we did it with external text files in the first place ---
for at least some of the dictionary types, the point is that you can
drop in data files that are available from upstream sources, without any
modification.  Getting the same info into a table would require some
nonzero amount of data transformation.



IMHO being able to load dictionaries from a table would be quite
useful, and not just because of RDS. For example, it's not entirely true
we're just using the upstream dictionaries verbatim - it's quite common
to add new words, particularly in specialized fields. That's way easier
when you can do that through a table and not through a file.
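
For context, this is roughly what the file-based approach looks like
today (a sketch following the docs example; the en_us dict/affix files
have to be copied into $SHAREDIR/tsearch_data first):

   CREATE TEXT SEARCH DICTIONARY english_ispell (
       TEMPLATE  = ispell,
       DictFile  = en_us,
       AffFile   = en_us,
       StopWords = english
   );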


Having said that ... in the end a dictionary is really just a set of
functions implementing the dictionary API; where they get their data
from is their business.  So in theory you could roll your own
dictionary that gets its data out of a table.  But the dictionary API
would be pretty hard to implement except in C, and I bet RDS doesn't
let you install your own C functions either :-(



Not sure. Of course, if we expect the dictionary to work just like the
ispell one, with preprocessing the dictionary into shmem, then that
requires C. I don't think that's entirely necessary, though - we could
use the table directly. Yes, that would be slower, but maybe it'd be
sufficient.

But I think the idea is ultimately that we'd implement a new dict type
in core, and people would just specify which table to load data from.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: releasing space

2019-10-19 Thread Tomas Vondra

On Thu, Oct 17, 2019 at 05:20:09PM +, Julie Nishimura wrote:

Hello everybody, We are running PostgreSQL 9.6.2 cluster master ->
standby (streaming replication). 22 tb of space (constantly struggling
with the space, pruning the old data, but not fast enough). The biggest
db takes 16 tb. So, we've copied it to another server, and now we would
like to delete it from our original source, to free up the space. What
would be the right approach for this?  Just issue drop database command
(16tb). How long it might take? Should we do it gradually (drop biggest
tables first)? Any suggestions? Caveats?



Generally speaking, DROP DATABASE simply recursively drops all the
various objects - indexes, tables, etc. It's mostly just deleting the
files, which should not be very expensive (we certainly don't need to
delete all the data or anything), but there's a certain amount of I/O
involved. But it does depend on the OS / filesystem / hardware if that's
an issue.

So if you want to be on the safe side, you can drop the objects one by
one, with a bit of delay between them, to throttle the I/O a bit.

FWIW the latest minor release for 9.6 is 9.6.15, you're 13 minor
versions (~30 months) of fixes behind. You might want to consider
upgrading ...


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Replication of Replication

2019-10-19 Thread Tomas Vondra

On Fri, Oct 18, 2019 at 05:54:34PM -0400, Edilmar Alves wrote:
I have 3 servers running CentOS+PG 11 
(postgresql11-server-11.5-1PGDG.rhel7.x86_64):

- s1: main db + publication
- s2: subscription of the main db
all works fine until here...
Now, I tried to config this:
- s2: publication of the same db that was replicated from s1
- s3: subscription of the db from s2
When I did this, replication s1=>s2 stopped, and replication s2=>s3 
never worked.

Is this not possible?


You'll have to share more details - error messages from the server log,
how you check that the replication stopped, etc.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: timescaleDB & WAL replication

2019-10-15 Thread Tomas Vondra

On Tue, Oct 15, 2019 at 09:28:27AM +0200, basti wrote:

Hello,
I have a Master / Slave Postgres setup with WAL Replication. Now I want
to add timescaleDB.

I found this todo: https://docs.timescale.com/latest/tutorials/replication

As I understand that in the right way I just need to add the timescaleDB
extention on the master side?



AFAIK timescale is "just" a regular extension, in which case yes, all
you need to do is installing it on the master side.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: temporary files

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 01:52:41PM -0700, dangal wrote:

Hello everyone, I have a question to see if you can help me, I have set
work_mem in 100 MB but I have the following in the pgbadger

Queries generating the most temporary files (N)

Count Total size   Min size  Max sizeAvg size
   58   3.24 GiB 57.15 MiB  57.15 MiB 57.15 MiB

How could it be that if you require 57 MB, 100 MB will not reach ?
thanks a lot



The on-disk and in-memory representations are not the same, and the
on-disk one is often significantly more efficient. Data that needs 57MB
temporary file may need ~150MB memory to perform an in-memory sort, for
example.

Yet another reason may be that e.g. hash join splits the memory into
batches, and each one has to fit into work_mem. And whenever we hit the
limit, we double the number of batches, i.e. we cut the batch size in
half. Thus the size is somewhere between 50 and 100MB, with ~75MB on
average. Furthermore, we also estimate the batch size before running the
query, so we may start with unnecessarily many batches, pushing the
average batch size down even more.


FWIW it's somewhat suspicious you have 58 temp files with almost no
variability in size. That seems as if a single query was executed
repeatedly. Maybe try looking into what query that is.
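
One way to find it (a sketch, assuming pg_stat_statements is installed -
otherwise just the logging part applies):

   -- log every temporary file; the value is a size threshold in kB, 0 = all
   SET log_temp_files = 0;

   -- queries that spilled the most data to temporary files
   SELECT query, temp_blks_written
     FROM pg_stat_statements
    ORDER BY temp_blks_written DESC
    LIMIT 10;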


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgcrypto question

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 02:51:30PM -0400, Erik Aronesty wrote:

Good idea for "psycopg".  It would be easy for a POC, but I think the
only meaningful layer to operate at would be a libpq drop-in
replacement that intercepts PQgetvalue, PQprepare, PQexecParams,
PQexecPrepared ... etc.   That way odbc, python, node, etc would "just
work" as long as you used LD_PRELOAD appropriately.



It's not clear to me how would that know which columns are encrypted,
with what key, etc. Because those encrypted columns are essentially just
regular bytea columns, so there's no easy way to distinguish them.

I'm no psycopg2 expert, but it does have some infrastructure for casting
PostgreSQL types to Python types, and I guess that could be used for the
encryption.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgcrypto question

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 12:05:16PM -0400, Erik Aronesty wrote:

Currently, it is my understanding that the pgcrypto library requires
the user to send a password or private key up to the server for
decryption.



Correct. In the naive case the key is included in each SQL query, which
does have various issues. Bruce Momjian has a nice extension that allows
you to fix that by loading the key into backend memory:

 http://momjian.us/download/pgcryptokey/


Is there a notion of a client-side encrypt/decrypt plugin when doing a
postgres query?

For example, a user could query postgres, get back data of type
"encrypted", and a "libpq" plugin could decode/decrypt those columns
that are of data type "encrypted" in a manner transparent to the
user of the client

Obviously I could write this by intercepting the relevant libpq calls
using LD_PRELOAD or Microsoft's "Detours" ... but it there a better
way to do that?



AFAIK that's usually done at the application level, i.e. the application
is sending/receiving encrypted data, and the database simply sees bytea
columns. I'm not aware of a driver doing that transparently, but it
seems like an interesting idea - I wonder if it could be done e.g. in
psycopg as an extension, or something like that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 12 not yet for mac

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 11:11:43AM -0400, Ravi Krishna wrote:

https://postgresapp.com/downloads.html

The link which says PG 12 is actually PG 11.



Not sure if the link is correct or not (it seems to point to
Postgres-2.3-12.dmg, which seems like it might be v12). But more
importantly, that's not a project/page managed by the PostgreSQL
community, you need to report the issues to the authors (most likely
through github issues).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Install postgres on rhel 7

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 01:00:32AM -0400, Mageshwaran Janarthanam wrote:

Hi Team,

I have question on the postgres install process. Tried to read the archives
but couldn't find it.

I wanted to install postgres server via rpm and wanted to store the
binaries under non default location like /pgbin.



The rpm packages (at least those available at yum.postgresql.org) are
not relocatable:

 rpm -qpi postgresql12-12.0-1PGDG.rhel7.x86_64.rpm  | grep Relocations
 Relocations : (not relocatable)

So unfortunately, you'll have to build your own packages, or something
like that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: COPY command returns "ERROR: invalid XML content"

2019-10-07 Thread Tomas Vondra

On Sun, Oct 06, 2019 at 08:45:40PM -0700, Konstantin Izmailov wrote:

Hi,
I'm using libpq (v10) to import lots of xml files into a PG10 table. I
noticed if number of records imported exceeds 2100 then the following error
is returned:
ERROR:  invalid XML content
DETAIL:  line 1: Couldn't find end of Start Tag
timeBasedFileNamingAndTriggerin line 1
logFile.%d{-MM-dd}.%i.html

My guess is this is an issue/limitation in libxml2, which we use to
parse and process XML. What libxml2 version do you have installed? Can you
share an example of a XML document to reproduce the issue?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: integrate Postgres Users Authentication with our own LDAP Server

2019-05-09 Thread Tomas Vondra

On Thu, May 09, 2019 at 07:11:24AM +, M Tarkeshwar Rao wrote:

We  want to setup ldap authentication in pg_hba.conf, for Postgresql
users(other than  postgres super user).

We are getting issue with special characters by following steps given in
postgres documentation.  It is not accepting any special characters as
special characters are mandatory in our use case.

Can you please help us or have you any steps by which we can configure
any postgres with LDAP?


Please don't cross-post - this is a fairly generic question, it has
nothing to do with performance or development, so the right thing is to
send it to pgsql-general. Likewise, it makes little sense to send
questions to the "owner". I've removed the other lists from CC.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tomas Vondra
On 11/6/18 10:54 PM, Andres Freund wrote:
> On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
>> Ondřej Bouda  writes:
>>>> Ondřej, as a short-term workaround you could prevent the crash
>>>> by setting that index's recheck_on_update property to false.
>>
>>> Thanks for the tip. I am unsuccessful using it, though:
>>> # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
>>> FALSE);
>>> ERROR:  unrecognized parameter "recheck_on_update"
>>
>> Oh, for crying out loud.  That's yet a different bug.
>> I'm not sure that it's the fault of the recheck_on_update
>> feature proper though; it might be a pre-existing bug in
>> the reloptions code.  Looks like somebody forgot to list
>> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
>> fault of commit c203d6cf8 or was it busted before?
> 
> Looks new:
> +   RELOPT_KIND_INDEX = 
> RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> 
> there aren't any other "for all indexes" type options, so the whole
> category didn't exist before.
> 
> It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> wouldn't have been omitted: It breaks index am extensibility.
> 

Does it? The RELOPT_KIND_* stuff is hard-coded in reloptions.h anyway,
so I'm not sure how this particular thing makes it less extensible?

That being said, we also have RELOPT_KIND_BRIN, and that seems to be
missing from RELOPT_KIND_INDEX too (and AFAICS the optimization works
for all index types).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why the index is not used ?

2018-10-08 Thread Tomas Vondra
Hi,

On 10/08/2018 04:10 PM, ROS Didier wrote:
> Hi Tomas
>    
>     Thank you for your answer and recommendation which is very
> interesting. I'm going to study the PCI DSS document right now.
> 
>   * Here are my answer to your question :
> 
> >> What is your threat model?
> << we want to prevent access to sensitive data for everyone except those
> who have the encryption key.
> in case of files theft, backups theft, dumps theft, we do not want
> anyone to access sensitive data.
>    

The thing is - encryption is not a panacea. The interesting question is
whether this improves security compared to simply using FDE and regular
access rights (which are grantable at the column level).

Using those two pieces properly may very well be a better defense than
not well designed encryption scheme - and based on this discussion, it
does not seem very polished / resilient.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why the index is not used ?

2018-10-07 Thread Tomas Vondra
Hi,

On 10/07/2018 08:32 PM, ROS Didier wrote:
> Hi Francisco
> 
>   Thank you for your remark. 
> You're right, but it's the only procedure I found to make search on
> encrypted fields with good response times (using index) !
> 

Unfortunately, that kinda invalidates the whole purpose of in-database
encryption - you'll have encrypted on-disk data in one place, and then
plaintext right next to it. If you're dealing with credit card numbers,
then you presumably care about PCI DSS, and this is likely a direct
violation of that.

> Regarding access to the file system, our servers are in protected
network areas. few people can connect to it.
> 

Then why do you need encryption at all? If you assume access to the
filesystem / storage is protected, why do you bother with encryption?
What is your threat model?

> it's not the best solution, but we have data encryption needs and
> good performance needs too. I do not know how to do it except the
> specified procedure..
>
> if anyone has any proposals to put this in place, I'm interested.
> 

One thing you could do is hashing the value and then searching by the
hash. So aside from having the encrypted column you'll also have a short
hash, and you may use it in the query *together* with the original
condition. It does not need to be unique (in fact it should not be to
make it impossible to reverse the hash), but it needs to have enough
distinct values to make the index efficient. Say, 10k values should be
enough, because that means 0.01% selectivity.

So the function might look like this, for example:

  CREATE FUNCTION cchash(text) RETURNS int AS $$
SELECT abs(hashtext($1)) % 1;
  $$ LANGUAGE sql;

and then be used like this:

  CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cchash(cc));

and in the query

  SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit
   WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'
 AND cchash(cc) = cchash('test value 32');

Obviously, this does not really solve the issues with having to pass the
password to the query, making it visible in pg_stat_activity, various
logs etc.

Which is why people generally use FDE for the whole disk, which is
transparent and provides the same level of protection.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Help to understand Actual Rows vs Plan Rows from the query planner output

2018-09-23 Thread Tomas Vondra
                             
>                                   +
>              Plan Width: 0                                              
>                                   +
>              Actual Startup Time: 0.048                                 
>                                   +
>              Actual Total Time: 0.048                                   
>                                   +
>              Actual Rows: 1                                             
>                                   +
>              Actual Loops: 1                                            
>                                   +
>              Index Cond: "(lower((city)::text) = 'melanyfort'::text)"   
>                                   +
>            - Node Type: "Bitmap Index Scan"                             
>                                   +
>              Parent Relationship: "Member"                              
>                                   +
>              Parallel Aware: false                                      
>                                   +
>              Index Name: "lower_state_users_idx"                        
>                                   +
>              Startup Cost: 0.00                                         
>                                   +
>              Total Cost: 5.87                                           
>                                   +
>              Plan Rows: 211                                             
>                                   +
>              Plan Width: 0                                              
>                                   +
>              Actual Startup Time: 0.102                                 
>                                   +
>              Actual Total Time: 0.102                                   
>                                   +
>              Actual Rows: 211                                           
>                                   +
>              Actual Loops: 1                                            
>                                   +
>              Index Cond: "(lower((state)::text) = 'ohio'::text)"        
>                                   +
>    Planning Time: 0.260                                                 
>                                   +
>    Triggers:                                                            
>                                   +
>    Execution Time: 0.249
> (1 row)
> 
> aruprakshit=# 
> 
> 
> --
> 
> In the first node type of "Bitmap Index Scan” on “users_lower_idx”, I
> see the plan rows are 50, but actual rows it got 1. In the second node
> type of "Bitmap Index Scan” on “ lower_state_users_idx”, I see the plan
> rows are 211, and actual rows 211. Both are same. Based on what
> conditions planner estimated the planned and actual rows count? 
>
> 
> In node type “BitmapAnd”, I see again the actual rows 1, then why on the
> final plan i.e. Node Type: "Bitmap Heap Scan” again planner estimated
> rows 1? How does it counts these? What does the Loops count says us
> about the query?  
>

It's not very clear to me whether you're asking how the planner computes
estimates in general, or how it computed these particular estimates (or
what issues you see there).

Perhaps this would give you at least some answers:

https://www.postgresql.org/docs/11/static/row-estimation-examples.html

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Code of Conduct

2018-09-18 Thread Tomas Vondra

On 09/18/2018 01:47 PM, James Keener wrote:

 > following a long consultation process

It's not a consultation if any dissenting voice is simply ignored.
Don't sugar-coat or politicize it like this -- it was rammed down
everyone's throats. That is core's right, but don't act as everyone's
opinions and concerns were taken into consideration.


I respectfully disagree.

I'm not sure which dissenting voices you think were ignored, but from 
what I've observed in the various CoC threads the core team took the 
time to respond to all comments. That does not necessarily mean the 
resulting CoC makes everyone happy, but unfortunately that's not quite 
possible. And it does not mean it was not an honest consultation.


IMO the core team did a good job in listening to comments, tweaking the 
wording and/or explaining the reasoning. Kudos to them.



There are a good number of folks who are concerned that this CoC is
overreaching and is ripe for abuse. Those concerns were always
simply, plainly, and purposely ignored.
No, they were not. There were multiple long discussions about exactly 
these dangers, You may dislike the outcome, but it was not ignored.


 > Please take time to read and understand the CoC, which is intended to 
ensure that PostgreSQL remains an open and enjoyable project for anyone 
to join and participate in.


I sincerely hope so, and that it doesn't become a tool to enforce social 
ideology like in other groups I've been part of. Especially since this 
is the main place to come to get help for PostgreSQL and not a social club.




Ultimately, it's a matter of trust that the CoC committee and core team 
apply the CoC in a careful and cautious way. Based on my personal 
experience with most of the people involved in both groups I'm not 
worried about this part.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra

On 08/13/2018 04:24 PM, Ashu Pachauri wrote:

+ pgsql-general

Thanks and Regards,
Ashu Pachauri


-- Forwarded message --
From: Ashu Pachauri <ashu210...@gmail.com>
Date: Mon, Aug 13, 2018 at 7:53 PM
Subject: Re: is there any adverse effect on DB if I set autovacuum scale 
factor to zero?

To: <raghavendra...@gmail.com>


The way I see autovacuum_vacuum_scale_factor is not in terms of 
absolute number but as the percentage of any table that can consist of 
updated / deleted tuples to make it eligible for vacuuming. A factor of 
0.1 ensures that your tables would be eligible for vacuuming if more 
than 10% of the tuples are deleted/updated.
1. If you think that 10% is too high for you in terms of storage cost, 
you can decrease the number or set it to zero. But, I would advise to 
increase the value of autovacuum_vacuum_threshold to something 
reasonable if you do that, otherwise you pay the CPU cost of frequent 
vacuuming across all tables.
2. However, if your issue is not the fixed 10% overhead but the lack of 
throughput i.e. you see the number of deleted/updated tuples keeps 
increasing in an unbounded fashion, the right way to deal with it is a) 
Having a higher value of autovacuum_max_workers b) having a lower value 
for autovacuum_naptime.




Increasing autovacuum_max_workers is unlikely to solve the issue with 
throughput, because all the workers are throttled together - there's a 
limit on the amount of work that can be done per second. Increasing the 
number of workers is akin to allowing more cars onto a highway while 
also lowering the speed limit.


You need to increase the limit on the amount of work, and lowering the 
naptime is one way to do that.
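
For example, something along these lines (a sketch only - ALTER SYSTEM
needs 9.4+, and the values are just placeholders to be tuned for your
workload, not copied blindly):

  ALTER SYSTEM SET autovacuum_naptime = '15s';           -- default is 1min
  ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;  -- default -1, i.e. vacuum_cost_limit (200)
  SELECT pg_reload_conf();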



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra




On 08/13/2018 03:41 PM, Raghavendra Rao J S V wrote:


Hi Tomas,

Thank you very much for your response.

As we  know table becomes a candidate for autovacuum  process based on 
below formula.



Autovacuum VACUUM threshold for a table = autovacuum_vacuum_scale_factor
* number of tuples + autovacuum_vacuum_threshold




*Current settings in my database are as follows.*


autovacuum_vacuum_scale_factor = 0.1

autovacuum_vacuum_threshold = 40



Due to above formula the dead tuples are accumulating based on the 
number of live tuples as show below picture.




select relname,n_live_tup,n_dead_tup,(n_live_tup*.1+40) 
expected_to_autovacuum,* from pg_stat_user_tables

where  n_dead_tup>800
order by n_live_tup desc
limit 100;




In order to avoid the dead tuples accumulation I wold like to change the 
auto vacuum  settings in *"postgresql.conf"* as below.


autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_threshold = 100


OK, so the tables apparently have enough dead tuples to trigger vacuum. 
That means the autovacuum throughput is insufficient to do all the 
cleanup. If you lower the scale factor, the amount of cleanup will 
*increase* (more tables being eligible for cleanup) making it less 
likely autovacuum can keep up.


You need to increase the throughput, e.g. by increasing vacuum_cost_limit or 
something like that.
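
To illustrate why the cost-based throttling is the bottleneck, a
back-of-the-envelope calculation with the old defaults (check your
actual settings, they may differ):

  SHOW vacuum_cost_limit;             -- 200 by default
  SHOW autovacuum_vacuum_cost_delay;  -- 20ms by default on pre-12 releases
  -- 200 credits per 20ms = 10000 credits/second
  -- a page read from disk costs vacuum_cost_page_miss = 10 credits
  -- => at most ~1000 pages/s, i.e. roughly 8 MB/s, shared by all workers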



Kindly guide me your views. Does it cause any adverse effect on the DB?


Well, it forces the database to do more stuff / more often, so it may 
have an adverse impact, of course. It's hard to say if it's going to be 
a win overall, because we don't know how serious the bloat is.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra




On 08/13/2018 11:07 AM, Raghavendra Rao J S V wrote:

Hi All,

We are using postgres 9.2 on the CentOS operating system. We have 
around 1300+ tables. We have the following autovacuum settings enabled. 
Still, a few of the tables (84 tables) which are always busy are not 
vacuumed. Dead tuples in those tables are more than 5000. Due to that, 
the tables are bloating and we observed performance degradation in a 
few areas.




You don't say how large the tables are, so it's impossible to say 
whether 5000 dead tuples is excessive or not. IMHO it's a negligible 
amount and should not lead to excessive bloat or issues.


A certain amount of waste is expected - it's a trade-off between 
immediate and delayed cleanup. If you delay the cleanup a bit, it's 
going to be more efficient overall.


It's also unclear why the tables are not vacuumed - it may easily be due 
to all the autovacuum workers being constantly busy, unable to clean up 
all tables in a timely manner. In that case lowering the threshold is 
not going to help - on the contrary.
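
A quick way to put those numbers into context is to look at the dead
tuple ratio and the last (auto)vacuum timestamps - a sketch using the
standard statistics views:

  SELECT relname, n_live_tup, n_dead_tup,
         round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
         last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 20;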


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Shared buffers increased but cache hit ratio is still 85%

2018-07-18 Thread Tomas Vondra
On 07/18/2018 10:43 AM, Andreas Kretschmer wrote:
> 
> 
> Am 18.07.2018 um 10:26 schrieb Hans Schou:
>> Am I doing something wrong or should some history be cleared?
> 
> Reset the stats for that database. You can check the date of last reset
> with:
> 
> select stats_reset from pg_stat_database where datname = 'database_name';
> 
> and reset it with:
> 
> pg_stat_reset()
> 
> Reset all statistics counters for the current database to zero
> (requires superuser privileges by default, but EXECUTE for this function
> can be granted to others.)
> 

It might be better to note current values of the counters somewhere, and
compute a delta later (and use that to compute the cache hit ratio). The
issue is that pg_stat_reset() throws away all sorts of interesting and
important stats, including those driving autovacuum/autoanalyze.
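
A minimal sketch of that approach (the snapshot table name is made up):

  -- take a snapshot of the counters instead of resetting them
  CREATE TABLE stats_snapshot AS
    SELECT now() AS ts, datname, blks_hit, blks_read
    FROM pg_stat_database;

  -- later, compute the cache hit ratio just for the interval since the snapshot
  SELECT d.datname,
         round(100.0 * (d.blks_hit - s.blks_hit)
               / nullif((d.blks_hit - s.blks_hit) + (d.blks_read - s.blks_read), 0), 2) AS hit_pct
  FROM pg_stat_database d
  JOIN stats_snapshot s USING (datname);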


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: found multixact from before relminmxid

2018-04-09 Thread Tomas Vondra


On 04/09/2018 01:49 PM, Alexandre Arruda wrote:
> 
> 
> 2018-04-06 13:11 GMT-03:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> 
> 
> 
> On 04/06/2018 04:29 PM, Alexandre Arruda wrote:
> > 2018-04-06 9:39 GMT-03:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> >
> >
> >
> >     On 04/06/2018 02:09 AM, Alexandre Arruda wrote:
> >     > Hi,
> >     >
> >     > Some time ago, I had this errors frequently showed in logs
> after some
> >     > autovacuum in some tables(pg 9.6). VACUUM FULL or CLUSTER in
> this tables
> >     > show the same and not complete the tasks (showed by some
> table bloat
> >     > select).
> >     > Then, I did a full dump/restore into a new version (10.2)
> and everything
> >     > is ok for a couple of months. Now, I have errors like this
> again:
> >     >
> >     > db1=# cluster pc44t;
> >     >
> >     > ERROR:  found multixact 134100944 from before relminmxid
> 192042633
> >     >
> >     >  
> >     >
> >     > Like before, the only way to make the errors to desapear is by
> >     > dump/reload the whole table.
> >     >
> >     >
> >     > Thanks for any help.
> >     >
> >
> >     That's going to be hard, unless you still have the cluster around.
> >
> >     This surely seems like some sort of data corruption issue, but
> without
> >     being able to inspect the data pages it's nearly impossible to
> determine
> >     what went wrong.
> >
> >     We'd also need more information about what happened to the
> hardware and
> >     cluster before the issues started to appear - crashes,
> hardware issues.
> >     And so on.
> >
> >     regards
> >
> >     --
> >     Tomas Vondra                  http://www.2ndQuadrant.com
> >     PostgreSQL Development, 24x7 Support, Remote DBA, Training &
> Services
> >
> >
> > Hi Tomas,
> > The old cluster are gone, unfortunatly.
> >
> > This server is a 128GB ECC RAM with a dual redundant hba fiber channel
> > connect to a sotorage with Raid 6 and I don't have (apparently) any
> > errors reported.
> > Yesterday I did a test with one table:  some sum aggragates, count(*),
> > etc, then dump/reload and repeat the tests the results (of querys) are
> > the same, regarding the vacuum problem
> > thats disapeared.
> >
> 
> I'm not sure I understand correctly. So you can reproduce the issue? If
> yes, how can you share the scripts/data you use (and how large is it)?
> If we could reproduce it locally, it would make the investigation much
> easier.
> 
> BTW you mentioned you're using PostgreSQL 9.6 - which minor version,
> specifically?
> 
> 
> regards
> 
> 
> Hi Tomas,
> 
> No, I can't reproduce. What I did is a simple way to "validate" the
> current table data to see if a dump/reload
> preserve them. Old postgresql was 9.6.5. The problem returns now in new
> 10.3 installation.
> 

I'm confused. First you say you can't reproduce the issue, then you say
it came back on a new installation. So which is it?

> There is a way to correct this tables without a dump/reload ?
> 
> I'm thinking to reinstall cluster doing a initdb --data-checksums,
> but I'm affraid about a severe performance impact.
>  

The performance impact should be non-measurable for most workloads. It
might be a couple of percent on certain workloads/hardware, but that's
probably not your case.

The bigger question is whether this can actually detect the issue. If
it's due to a storage issue, then perhaps yes. But if you only see
multixact issues consistently and nothing else, it might easily be a
PostgreSQL bug (in which case the checksum will be correct).


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Seems like there is an issue with reltuples showing twice the number of rows

2018-04-03 Thread Tomas Vondra
On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:
> I am relying on reltuples on my web app to get fast row counts.
> 
> This was recommended by this article to get fast approx row
> counts: https://wiki.postgresql.org/wiki/Count_estimate
> 
> 
> However for some table I am getting twice as many values when I try to
> do this. I did some more research and came up with this query.
> 
> select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname =
> 'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';
> 
> 
> it returns
> 
> 
>  reltuples  | n_live_tup | n_dead_tup
> ------------+------------+------------
> 2.7209e+06 |    1360448 |    1360448
> 
> 
> If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d
> 
> and I run the same query again,
> 
>   reltuples  | n_live_tup | n_dead_tup
> ------------+------------+------------
>  1.36045e+06 |    1360448 |    1360448
> 
> But after some time the value goes back to being double the value. This
> is causing me a lot of problems since this inaccuracy does not make any
> sense to me.
> 

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. That is, if you run VACUUM and then
ANALYZE, they may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

Hard to say if this is a case of that, but judging by the number of dead
tuples chances are it is.

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.
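
For example, a simple estimate for the table from your query would be:

  SELECT n_live_tup AS approximate_row_count
  FROM pg_stat_user_tables
  WHERE relname = 'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';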

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-28 Thread Tomas Vondra
Hi,

the perf profiles look really weird - clearly, there's a lot of lock
contention, because the top item is this

13.49%  13.40%  postmaster   postgres  [.]   LWLockAcquire
 |

 ---LWLockAcquire



That's a sign of lock contention - not sure which one, though. And then
there's heap_hot_search_buffer

12.65% 7.92%  postmaster   postgres [.] heap_hot_search_buffer
 |

 ---heap_hot_search_buffer


So either there's a table with many HOT-updated rows (repeatedly), or we
search the HOT chains very frequently for some reason. Considering it
only affects the non-LIMIT case, I'd guess the latter.

There also seems to be quite a few page faults, for some reason.

Trying to reproduce this without the actual data is rather futile. We
need some sort of reproducer (sample data to test the query on).

regards

On 03/28/2018 10:30 AM, Alessandro Aste wrote:
> Hello,  any news ?
> 
> Thank you,
> 
> Alessandro.
> 
> On Fri, Mar 23, 2018 at 8:22 PM, Alessandro Aste
> <alessandro.a...@gmail.com> wrote:
> 
> PS , in the meanwhile I discovered a 2nd workaround(beside disabling
> parallel processing) . I added offset  0 to the subquery , and,
> according to the documentation, “OFFSET 0 is the same as omitting
> the OFFSET clause”
> - https://www.postgresql.org/docs/current/static/queries-limit.html
> 
> cmd3dev=# show max_parallel_workers_per_gather ;
> 
>  max_parallel_workers_per_gather
> ---------------------------------
>  8
> 
> (1 row)
> 
>  
> 
> cmd3dev=# \timing
> 
> Timing is on.
> 
> cmd3dev=#  SELECT * FROM (SELECT seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id, CASE WHEN vendor_gtt_pop.push_to_gaa = 1
> THEN 'Y' END as gaa, pop.gii_circuitid AS pop_name, cst.label AS
> pop_status, seg.a_company_name AS pop_location, seg.vendor_id AS
> pop_provider_id, seg.vendor_name AS pop_provider, cs.address1 AS
> pop_street, cs.city AS pop_city, cs.postal_code AS pop_postal_code,
> cs.state AS pop_state, csc.code AS pop_country, cs.country_id AS
> pop_country_id FROM ( SELECT c.gii_circuitid, max(so.id) AS
> service_order_id FROM service_order so join
> circuit c on c.product_id=so.product_id join master_service_order
> mso on mso.id=so.master_service_order_id WHERE NOT
> (so.ordertype_id = 2 AND so.status <> 999) AND NOT (so.ordertype_id
> = 3 AND so.status <> 999) AND c.status >= 20 AND c.status not in
> (160,999) AND mso.client_id=11615 AND c.service_description=28 AND
> c.status!=160 GROUP BY c.gii_circuitid ) pop JOIN service_order so
> ON so.id = pop.service_order_id left JOIN client_site
> cs on cs.id=so.a_site_id left JOIN country csc on
> csc.id=cs.country_id JOIN circuit c ON
> so.product_id=c.product_id JOIN circuit_status cst ON cst.id=c.status
> JOIN ( SELECT c.id AS circuit_id, sg.id AS segment_id,
> c.pop_support_vendor_id AS vendor_id, v.name AS vendor_name,
> sg.a_company_name FROM segment sg JOIN circuit_layout
> cl ON cl.segment_id = sg.id AND cl.ordinal = 1 JOIN
> circuit c ON c.id = cl.circuit_id JOIN vendor v ON
> v.id = c.pop_support_vendor_id ) seg ON seg.circuit_id
> = c.id JOIN vendor_gtt_pop on
> vendor_gtt_pop.gtt_pop_id = seg.circuit_id offset 0) foo where
> vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id desc LIMIT 10;
> 
>    id   | vendor_id | gaa |   pop_name    | pop_status |
> pop_location | pop_provider_id | pop_provider | pop_street 
> | pop_city | pop
> 
> _postal_code | pop_state | pop_country | pop_country_id
> 
> 
> +---+-+---++--+-+--+-+--+
> 
> -+---+-+
> 
> 684807 | 12346 | Y   | GTT/POP/LON1T | Active | LON1T   
> |   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
> 
> 2AA |   | GB  |    219
> 
> (1 row)
> 
>  
> 
> *Time: 2245.073 ms (00:02.245)*
> 
> 
> 
> 
> On Fri, Mar 23, 2018 at 9:31 AM, Alessandro Aste
> <alessandro.a...@gmail.com>

Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra
On 03/22/2018 11:29 PM, Alessandro Aste wrote:
> Thanks Tomas. We're currently building postgres from source. In order to
> enable symbols, you want me to re-configure postres with  --enable-debug
> then run perf?
> 

Yes.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra


On 03/22/2018 11:33 AM, Alessandro Aste wrote:
> Tomas, thank you. This machine  is abare metal server running only a
> staging postgresql 10.3  instance. Nobody is using it beside me. 
> 
> I'm attaching 4 files.  
> 
> every_30_seconds_top_stats_during_query.txt    - this is a caputure of
> the top command every 30 seconds(more or less) for 10+ minutes while I'm
> running the query. Let me know if this helps to answere your question.
> EXPLAIN_WITH_LIMIT_AND_max_parallel_workers_per_gather_SET_TO_0.txt   - 
> query plan  with full query and  max_parallel_workers_per_gather  force
> to 0. Full output.
> EXPLAIN_WITH_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.txt  - 
>  query plan with full query and  default parellel processing settings. 
> Full output.
> EXPLAIN_WITHOUT_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.tx  -
> query plan of the query omitting the LIMIT clause and default parellel
> processing settings. Full output.
> 

OK. Looking at the top output, I see this:

   PID USER   VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
104880 postgres  30.8g 1.9g 1.9g R 92.0  1.5  15:15.60 postmaster
111732 postgres  30.8g 476m 473m R 88.2  0.4   0:00.47 postmaster
111730 postgres  30.8g 473m 470m R 86.4  0.4   0:00.46 postmaster
111731 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111733 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111734 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111728 root  15824 1912  828 R  3.8  0.0   0:00.04 top

That means it certainly is not stuck, it's simply doing a lot of work on
CPU. The question is why and what it's doing ...

Can you collect some CPU profiles using perf? There's a howto here:

https://wiki.postgresql.org/wiki/Profiling_with_perf

But in short - install perf, install debuginfo packages for postgres,
and then do

perf record -a -g -s sleep 60

while running the query. Once the profile data is collected, do

perf report > report.txt

and share the report.txt with us (well, if it's very large you may need
to include only the first 1000 lines or so).

That should tell us in which functions most of the time is spent. That
will give us some hints, hopefully.

> 
> For what concerns the  self-contained test case - I'll do my best to
> prepare it.
> 

Yeah, that would be helpful.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra

On 03/21/2018 08:44 PM, Alessandro Aste wrote:
> Thanks for your reply Tomas.  The query just got stuck for forever.  I
> observed no CPU spikes, it is currently running and I see 89 of the CPU
> idle. I'm using a relative powerfull machine (120 GB ram, 56 CPU).  
> 

That doesn't really answer the question, I'm afraid. I suppose "89 of
CPU" means that 89% idle in total, but 11% with 56 CPUs still means
about 6 cores 100% busy. But maybe you meant something else?

Is there something else running on the machine? If you look at "top" are
the processes (the one you're connected to and the parallel workers)
doing something on the CPU?

> 
> Plain analyze as requested. : 
> 

I don't see anything obviously broken with the query plan, and it's
difficult to compare with the other plans because they are quite different.

But there's one part of the plan interesting:

 Limit  (cost=253523.56..253523.57 rows=1 width=176)
   ->  Sort  (cost=253523.56..253523.57 rows=1 width=176)
 Sort Key: c_2.gii_circuitid, c_1.id
 ->  Nested Loop  (cost=33190.89..253523.55 rows=1 width=176)
   Join Filter: (c_1.id = c.id)
   ->  Nested Loop  (cost=31724.87..31736.29 rows=1 ...)
 ...
   ->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
 Workers Planned: 5
 ->  Hash Join  (cost=466.02..220786.93 rows=1 ...)
 ...

That is, there's a Gather on the inner side of a Nested Loop. I wonder
if that might cause issues in case of under-estimate (in which case we'd
be restarting the Gather many times) ...


BTW one of the plans you sent earlier is incomplete, because it ends
like this:

->  Nested Loop  (cost=42469.41..42480.82 rows=1 width=85) (...)
Join Filter: (c.status = cst.id)
Time: 3016.688 ms (00:03.017)

That is, it's missing the part below the join.


That being said, I'm not sure what's the issue here. Can you prepare a
self-contained test case that we might use to reproduce the issue? For
example by dumping the relevant part of the tables?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra


On 03/21/2018 05:09 PM, Alessandro Aste wrote:
> Hi there, we are using postgresql 10.3 and we're facing an issue with a
> query. The query (full query below)  completes only  when: 
> 
> 1 - LIMIT 10 is removed
> or
> 2 -  show max_parallel_workers_per_gather  is set to 0, so parallel
> processing is disabled.
> 
> With  max_parallel_workers_per_gather   set to the default value (8) I'm
> not even able to get the query plan.
> 
> Notes: 
> 
>   * We're experiencing the issue in any server of ours but I've
> reproduced the issue in a fresh restored database with full
> vacuum/reindex of the tables.
>   * We didn't touch any parameter concering the parallel processing,
> we're running the defaults: 
> 
> 
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  8
> (1 row)
> 
> cmdstaging=# show max_worker_processes ;
>  max_worker_processes
> --
>  8
> (1 row)
> 
> cmdstaging=# show max_parallel_workers;
>  max_parallel_workers
> --
>  8
> (1 row)
> 
> 
> 
> 
> The query completes only omitting the LIMIT clause or when I disable
> parallel processing: 
> 
>  id   | vendor_id | gaa |   pop_name    | pop_status | pop_location |
> pop_provider_id | pop_provider |     pop_street      | pop_city | pop
> _postal_code | pop_state | pop_country | pop_country_id
> +---+-+---++--+-+--+-+--+
> -+---+-+
>  684807 |     12346 |     | GTT/POP/LON1T | Active     | LON1T        | 
>          12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
>  2AA         |           | GB          |            219
> (1 row)
> 
> Time: 4374.759 ms (00:04.375)
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  0
> (1 row)
> 
> Time: 0.097 ms
> 
> 
> Otherwise it just keep running for forever. 
> 

When you say "running forever", is it actually using CPU, or does it get
stuck on something?

> 
> This is the full query: 
> 
> 
> SELECT * FROM (
> SELECT
> seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id,
> CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y' END as gaa,
> pop.gii_circuitid AS pop_name,
> cst.label AS pop_status,
> seg.a_company_name AS pop_location,
> seg.vendor_id AS pop_provider_id,
> seg.vendor_name AS pop_provider,
> cs.address1 AS pop_street,
> cs.city AS pop_city,
> cs.postal_code AS pop_postal_code,
> cs.state AS pop_state,
> csc.code AS pop_country,
> cs.country_id AS pop_country_id
> FROM (
> SELECT c.gii_circuitid, max(so.id) AS service_order_id
> FROM service_order so
> join circuit c on c.product_id=so.product_id
> join master_service_order mso on mso.id=so.master_service_order_id
> WHERE NOT (so.ordertype_id = 2 AND so.status <> 999) AND
> NOT (so.ordertype_id = 3 AND so.status <> 999) AND
> c.status >= 20 AND
> c.status not in (160,999) AND
> mso.client_id=11615 AND
> c.service_description=28 AND
> c.status!=160
> GROUP BY c.gii_circuitid
> ) pop
> JOIN service_order so ON so.id = pop.service_order_id
> left JOIN client_site cs on cs.id=so.a_site_id
> left JOIN country csc on csc.id=cs.country_id
> JOIN circuit c ON so.product_id=c.product_id
> JOIN circuit_status cst ON cst.id=c.status
> JOIN (
> SELECT c.id AS circuit_id, sg.id AS segment_id,
> c.pop_support_vendor_id AS vendor_id,
> v.name AS vendor_name, sg.a_company_name
> FROM segment sg
> JOIN circuit_layout cl ON cl.segment_id = sg.id AND cl.ordinal = 1
> JOIN circuit c ON c.id = cl.circuit_id
> JOIN vendor v ON v.id = c.pop_support_vendor_id
> ) seg ON seg.circuit_id = c.id
> JOIN vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id
> ) foo where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id LIMIT 10
> 
> 
> 
> Execution plan with  max_parallel_workers_per_gather =0 , 
> max_parallel_workers_per_gather =8 and no LIMIT clause : 
> 
> 
We really need to see the execution plan that causes issues, i.e.
max_parallel_workers_per_gather=8 with LIMIT clause. Plain explain
(without analyze), at least.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: dirty_ratio & dirty_background_ratio settings with huge memory

2018-03-06 Thread Tomas Vondra


On 03/06/2018 01:16 PM, Laurenz Albe wrote:
> pinker wrote:
>> I've got cutomer with really huge RAM, now it's:
>>  total   used   free sharedbuffers cached
>> Mem:   31021113052596  49515 2088019922961185
>> -/+ buffers/cache:  904183011693
>> Swap: 8191  1   8190
>>
>> (free -m)
>>
>> and before it was twice more (6TB).
>>
>> and trying to figure out how to set dirty_ratio & dirty_background_ratio
>> parameters. Even for a normal-sized server those default sizes are too high,
>> but now would be ridiculously huge, respectively 1,24TB and 300GB. 
>> I'm thinking about 1 percent for dirty_ratio and not using
>> dirty_background_ratio because it's not possible to set it below 1% but to
>> set dirty_background_bytes instead to about 1-2GB.
>> Maybe someone has got other experience with RAM of this size and those
>> settings?
> 
> Yes, you should set vm.dirty_background_bytes and vm.dirty_bytes
> and not use the *_ratio settings.
> 
> 2 GB for vm.dirty_background_bytes and 1 GB for vm.dirty_bytes sounds fine.
> 

It should be the other way around: dirty_background_bytes < dirty_bytes

regards
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

2018-02-21 Thread Tomas Vondra

On 02/21/2018 05:00 PM, Bill Moran wrote:
> On Wed, 21 Feb 2018 13:33:18 +0100
> Alexander Farber <alexander.far...@gmail.com> wrote:
> 
>> Hi Martin -
>>
>> On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.mo...@avbrief.com>
>> wrote:
>>
>>> I’m no expert but I’d think it unlikely an index would be considered for a
>>> table with only 100 rows in. Also I’m pretty sure only one index per table
>>> is used, so you’d want to put state1 and state2 in one index.
>>
>> I hope to have more records in the words_games table later when my game is
>> launched (currently in soft launch/beta).
> 
> To elaborate+clarify Martin's comments.
> 
> The index will not be used while the table is small because it's actually 
> slower
> to process an index than it is to just read the entire table. However, as the
> table gets more rows, these timings will reverse and Postgres will start using
> the indexes. It's probably best to just create them even though the table is
> small. The performance improvement you'll get when the table grows will be
> well worth it, and it avoids the problem of trying to remember to create it 
> later.
> 
> However, Martin's other comment about only using a single index is incorrect.
> Postgres can use multiple indexes per query, so it's often good practice to
> put indexes on every column that might ever be used in a WHERE clause.
> 

I call this practice "shotgun" and generally discourage people from
using it. It seems attractive, but not every WHERE condition can be
evaluated using an index, and every index has maintenance overhead.

There are cases when it's somewhat reasonable (e.g. when you don't know
which columns will be referenced by WHERE conditions, and data ingestion
has lower priority than queries). But that doesn't seem to be the case
here - you know the WHERE conditions, and people are likely sending a
lot of inserts (and expecting low latency responses).
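
If you do end up with many indexes and want to check which of them
actually get used (and how much space they occupy), something like this
may help - a sketch based on the standard statistics views:

  SELECT relname AS table_name, indexrelname AS index_name, idx_scan,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
  ORDER BY idx_scan, pg_relation_size(indexrelid) DESC;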

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: query performance

2018-02-18 Thread Tomas Vondra

On 02/18/2018 06:37 AM, David Rowley wrote:
> On 18 February 2018 at 12:35, hmidi slim <hmidi.sl...@gmail.com> wrote:
>> Is there an other optimized solution to make a query such this:
>> select * from (
>> select e.name, e1.name, e.id
>> from establishment as e, establishment as e1
>> where e.id <> e1.id
>> and e1.id = 1
>> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
>> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id
> 
> You really should state what is wrong with the performance of the
> above version and also include the EXPLAIN  (ANALYZE, BUFFERS) of that
> query.
> 
> The version of PostgreSQL that you're running it on is also a good
> thing to share.
> 
> Details of the indexes which you've defined on the tables are also
> useful information. It would especially be good to know if you've
> added an index on product (establishment_id), for example.
> 

There's actually a wiki page with overview of what should be included in
"slow query" questions:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: shared_buffers 8GB maximum

2018-02-18 Thread Tomas Vondra

On 02/18/2018 02:41 PM, Vitaliy Garnashevich wrote:
> 
>> I certainly wouldn't recommend using 1/2 of RAM right away. There's
>> a good chance it would be a waste of memory - for example due to
>> double buffering, which effectively reduces "total" cache hit
>> ratio.
> 
> Double buffering is often mentioned in context of tuning shared
> buffers. Is there a tool to actually measure the amount of double
> buffering happening in the system?
> 

I'm not aware of such a tool. But I suppose it could be done by
integrating information from pg_buffercache and pgfincore [1].

[1] https://github.com/klando/pgfincore
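
A very rough sketch of what that might look like for a single table,
assuming both extensions are installed ('my_table' is a placeholder;
also note OS pages are usually 4kB while PostgreSQL blocks are 8kB, so
the two counts are not directly comparable):

  SELECT
    (SELECT count(*)
       FROM pg_buffercache b
       JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
      WHERE c.relname = 'my_table')  AS blocks_in_shared_buffers,
    (SELECT sum(pages_mem)
       FROM pgfincore('my_table'))   AS os_pages_in_page_cache;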

>> Those evictions are performed either by backends or bgwriter, both
>> of which are less efficient than checkpointer. Not only can
>> checkpointer perform various optimizations (e.g. sorting buffers to
>> make the writes more sequential), but it also writes each dirty
>> buffer just once. With smaller shared_buffers the page may have be
>> written multiple times.
> 
> In the case when shared_buffers cover most of RAM, most of writes
> should happen by checkpointer, and cache hit ratio should be high. So
> a hypothetical question: Could shared_buffers=200GB on a 250 GB RAM
> server ever be a reasonable setting? (assuming there are no other
> applications running except postgres, and 50GB is enough for
> allocating work_mem/maintenance_work_mem and for serving queries)
> 

It depends on how large the active part of the data set is. If it
fits into 200GB but not into smaller shared_buffers (say, 100GB), then
using 200GB may be a win.

If the active set is much larger than RAM, smaller shared_buffer values
work better in my experience.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: shared_buffers 8GB maximum

2018-02-16 Thread Tomas Vondra


On 02/17/2018 02:56 AM, George Neuner wrote:
> On Sat, 17 Feb 2018 00:36:57 +0200, Vitaliy Garnashevich
> <vgarnashev...@gmail.com> wrote:
>
...
>
>> Could that be a reason for increasing the value of shared_buffers?
>>
>> - Could shared_buffers=128GB or more on a 250 GB RAM server be a 
>> reasonable setting? What downsides could there be?
> 
> It depends. 8GB is pretty small for such a large server, but taking 
> 1/2 the RAM is not necessarily the right thing either.
> 

I certainly wouldn't recommend using 1/2 of RAM right away. There's a
good chance it would be a waste of memory - for example due to double
buffering, which effectively reduces "total" cache hit ratio.

Start with lower value, increment it gradually and monitor behavior of
the server.

> The size of shared buffers affects log size and the time to complete 
> checkpoints. If a large(ish) percentage of your workload is writes, 
> having a very large shared space could be bad for performance, or
> bad for space on the log device.
> 

The size of shared_buffers has pretty much no impact on the size of the
WAL - that's flat out wrong.

It also does not affect the time needed to perform a checkpoint. It may
mean that the checkpoint has to write more dirty buffers, but that is
actually a good thing because a checkpoint is about the most efficient
way to do writes. By using smaller shared buffers you're making it more
likely the database has to evict (dirty) buffers from shared buffers to
make space for other buffers needed by queries/vacuum/whatever.

Those evictions are performed either by backends or bgwriter, both of
which are less efficient than checkpointer. Not only can checkpointer
perform various optimizations (e.g. sorting buffers to make the writes
more sequential), but it also writes each dirty buffer just once. With
smaller shared_buffers the page may have to be written multiple times.

What actually *does* matter is the active part of the data set, i.e. the
part of the data that is actually accessed regularly. In other words,
your goal is to achieve a good cache hit ratio - say, 95% or more. This
also helps reducing the number of syscalls (when reading data from page
cache).

What is the right shared_buffers size? I have no idea, as it's very
dependent on the application. It might be 1GB or 100GB, hard to say.

The best thing you can do is set shared buffers to some conservative
value (say, 4-8GB), let the system run for a day or two, compute the
cache hit ratio using metrics in pg_stat_database, and then decide if
you need to resize shared buffers.


>> PS. Some background. We had shared_buffers=8GB initially. In 
>> pg_stat_bgwriter we saw that dirty buffers were written to disk more 
>> frequently by backends than during checkpoints (buffers_clean > 
>> buffers_checkpoint, and buffers_backend > buffers_checkpoint). According 
>> to pg_buffercache extension, there was very small percentage of dirty 
>> pages in shared buffers. The percentage of pages with usagecount >= 3 
>> was also low. Some of our more frequently used tables and indexes are 
>> more than 10 GB in size. This all suggested that probably the bigger 
>> tables and indexes, whenever scanned, are constantly flushing pages from 
>> the shared buffers area. After increasing shared_buffers to 32GB, the 
>> picture started looking healthier. There were 1GB+ of dirty pages in 
>> shared buffers (vs 40-200MB before), 30-60% of pages with usagecount >= 
>> 3 (vs 10-40% before), buffers_checkpoint started to grow faster than 
>> buffers_clean or buffers_backend. There is still not all frequently used 
>> data fits in shared_buffers, so we're considering to increase the 
>> parameter more. I wanted to have some idea about how big it could 
>> reasonably be.
> 
> So now you know that 32GB is better for your workload than 8GB.  But
> that is not necessarily a reason immediately to go crazy with it.  Try
> increasing it gradually - e.g., adding 16GB at a time - and see if the
> additional shared space provides any real benefit.
> 

Gradual increases are a good approach in general. And yes, having

    buffers_checkpoint > buffers_clean > buffers_backend

is a good idea too. Together with the cache hit ratio it's probably a
more sensible metric than looking at usagecount directly.
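
Those counters are easy to keep an eye on, e.g. with a query like this
(a sketch - it's the trend over time that matters, not a single snapshot):

  SELECT checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;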

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra


On 02/10/2018 04:30 PM, Nicolas Paris wrote:
>>> I d'found useful to be able to import/export from postgres to those modern 
>>> data
>>> formats:
>>> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
>>> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
>>> - orc (all writers=https://github.com/apache/orc)
>>>
>>> Something like :
>>> COPY table TO STDOUT ORC;
>>>
>>> Would be lovely.
>>>
>>> This would greatly enhance how postgres integrates in big-data ecosystem.
>>>
>>> Any thought ?
>>
>> https://www.postgresql.org/docs/10/static/sql-copy.html
>>
>> "PROGRAM
>>
>> A command to execute. In COPY FROM, the input is read from standard
>> output of the command, and in COPY TO, the output is written to the standard
>> input of the command.
>>
>> Note that the command is invoked by the shell, so if you need to pass
>> any arguments to shell command that come from an untrusted source, you must
>> be careful to strip or escape any special characters that might have a
>> special meaning for the shell. For security reasons, it is best to use a
>> fixed command string, or at least avoid passing any user input in it.
>> "
>>
> 
> PROGRAM would involve overhead of transforming data from CSV or
> BINARY to AVRO for example.
> 
> Here, I am talking about native format exports/imports for
> performance considerations.
> 

That is true, but the question is how significant the overhead is. If
it's 50% then reducing it would make perfect sense. If it's 1% then no
one is going to be bothered by it.

Without these numbers it's hard to make any judgments.
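
One way to get at least a ballpark number is to time a plain COPY
against one piped through a converter, e.g. in psql ('csv2avro' below
is a stand-in for whatever conversion tool you'd actually use, not a
real utility; both forms write on the server side, so they need the
appropriate privileges):

  \timing on
  COPY mytable TO '/tmp/plain.csv' (FORMAT csv);
  COPY mytable TO PROGRAM 'csv2avro > /tmp/out.avro' (FORMAT csv);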

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New Copy Formats - avro/orc/parquet

2018-02-10 Thread Tomas Vondra

On 02/10/2018 04:38 PM, David G. Johnston wrote:
> On Saturday, February 10, 2018, Nicolas Paris <nipari...@gmail.com> wrote:
> 
> Hello
> 
> I d'found useful to be able to import/export from postgres to those
> modern data
> formats:
> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
> - orc (all writers=https://github.com/apache/orc)
> 
> Something like :
> COPY table TO STDOUT ORC;
> 
> Would be lovely.
> 
> This would greatly enhance how postgres integrates in big-data
> ecosystem.
> 
>  
> It would be written "... with (format 'orc')" and your best bet would be
> to create an extension.  I don't think that having such code in core (or
> contrib) is desirable.
> 

I don't think we can make this extensible via an extension right now
(the formats supported by COPY are hard-coded in core). But I agree that
if we are to add multiple new formats, it'd be nice to allow doing that
in an extension.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: invalid memory alloc request size 1073741824

2018-01-31 Thread Tomas Vondra


On 01/31/2018 09:51 PM, Jan Wieck wrote:
> 
> 
> On Wed, Jan 31, 2018 at 12:32 PM, Stefan Blanke
> <stefan.bla...@framestore.com> wrote:
> 
> >
> > I'll bet you it's not that. It's quite unlikely that would fail with
> > exactly 1GB request size. It seems much more like a buffer that we keep
> > to be power of 2. The question is which one.
> 
> I had dismissed corruption before writing in. It's exactly 1GB every
> time this has happened - and we can dump the full dataset
> periodically without issue.
> 
> >> I have my money on a corrupted TOAST entry. Is this happening on
> >> trustworthy hardware or beige box with no ECC or RAID?
> 
> It's good quality commercial hardware in our colo - no exactly sure
> what.
> 
> 
> If it is a sporadic issue and you can dump the full dataset, then I just
> lost my money (Tomas, you coming to PGConf in Jersey City?).
> 

Unfortunately no, but I'm sure there will be other opportunities to buy
me a beer ;-) Like pgcon, for example.

>
> But then, if this is a plain COPY to stdout ... I am wondering what
> could possibly be in that path that wants to allocate 1GB. Or is this
> not so plain but rather a COPY ... SELECT ...?
> 

That's what I've been guessing, and why I was asking for a query plan.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Issue with WAL logs temporary not replaying

2018-01-31 Thread Tomas Vondra
 to tables, and yet
> keep silence? There was no indication WAL is not being applied,
> postgres log would be silent (even though it kept writing other
> commodity messages, such as "incomplete startup packet" and
> malformed query attempts).
> 
> Due to this, we noticed the issue only hours after postgres
> supposedly stopped applying the WAL logs to tables.
> 

It wouldn't. But it's unclear how you came to that conclusion.

>   * Is there any other log (like some lower level system log?) where I
> could see the status of writing / reading / applying WAL segments?
> 

There's only one log for PostgreSQL, so it would have to be in that file
depending on the log level.

>   * What measures could be done to prevent this in future, or at least
> make the situation easier to recover from (e.g. without data loss)?
> 

The jury is still out on the root cause, so it's too soon to say.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: EXPLAIN BUFFERS: dirtied

2018-01-29 Thread Tomas Vondra


On 01/29/2018 08:21 PM, Vitaliy Garnashevich wrote:
> I've read this article: https://wiki.postgresql.org/wiki/Hint_Bits
> 
> It says:
> 
>> A plain SELECT, count(*), or VACUUM on the entire table will check
>> every tuple for visibility and set its hint bits. 
> 
> Suppose, a new page was created using many INSERTs, and then was written
> to disk during a checkpoint. There were no SELECTs or VACUUM on the page
> or table yet. Will the following SELECT of one tuple from the page
> update hint bits for ALL tuples on the page? Is that correct?
> 

Possibly, if there are no old transactions running.

> When a page is initially created and then is being written to disk
> during a checkpoint, does checkpoint writer update the hint bits before
> writing the page, or the following SELECT/VACUUM will have to do that
> (possibly loading/updating/writing the page again)?
> 

Checkpoint only deals with 8kB chunks of data. Hint bits are not set on
a page, but on individual items (rows), so it's not up to the checkpoint
process to tweak that - that's up to queries accessing the data.
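
If you want to observe that directly, the pageinspect extension (it
requires superuser) can show the hint bits on individual tuples - a
sketch, where 'my_table' is a placeholder and 256/512 are the
HEAP_XMIN_COMMITTED/HEAP_XMIN_INVALID infomask bits:

  CREATE EXTENSION IF NOT EXISTS pageinspect;

  SELECT lp, t_xmin, t_xmax,
         (t_infomask & 256) > 0 AS xmin_committed_hint,
         (t_infomask & 512) > 0 AS xmin_invalid_hint
  FROM heap_page_items(get_raw_page('my_table', 0));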

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Hardware advice

2018-01-23 Thread Tomas Vondra
Hi,

On 01/22/2018 09:46 PM, Alban Hertroys wrote:
> Hi all,
> 
> At work we are in the process of setting up a data-warehouse using PG
> 10. I'm looking for a suitable server, but I hardly know anything
> about server-grade hardware.
> 
> Hence, we're looking for some advice, preferably with some
> opportunity to discuss our situation and possibly things we did not
> take into account etc. A face to talk to would be appreciated. Who
> provides that in or near the eastern border of the Netherlands?
> 

Coincidentally, there's a big conference (FOSDEM) in that area next
week, with a smaller PostgreSQL conference (FOSDEM PgDay) the day
before. Might be a good opportunity to talk to PostgreSQL people.

If you're looking for actual consulting, there are multiple companies
that might help you with this (including ours).

> More details:
> 
> We're planning to deploy on bare-metal hardware, with a fallback
> server with similar or lesser specs for emergencies and upgrades and
> perhaps some (read-only) load balancing of different kinds of loads.
> 
> The server will be accessed for reporting and ETL (or ELT) mostly.
> Both reporting servers (test/devel and production) are configured for
> at most 40 agents, so that's max 40 connections each to the warehouse
> for now. So far, we haven't reached that number in real loads, but
> reports are requested ~40,000 times a month (we measure HTTP requests
> minus static content).
> 
> We will also be doing ETL of (very) remote (SAP) tables to the
> warehouse server; in what we got so far in our limited test
> environment we have tables of over 30GB, most of which is from the
> last 4 to 5 years.
> 

That's nice, but it does not really tell us how much work that means for
the database :-( Those queries might be touching a tiny subset of the
data, or they might touch the whole data set. That will have significant
impact on the hardware requirements.

> The biggy though is that we also plan to store factory process
> measurements on this server (temperatures, pressures, etc. at 5s
> intervals).

So, time series data. I wonder if timescale [1] would be appropriate
here (never used it, but seems to be designed for exactly this use
case). And built on PostgreSQL.

[1] http://www.timescale.com/
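
The core idea there is automatic time-based partitioning. Just to
illustrate what that means in plain PostgreSQL 10 terms (table and
column names are made up, and this is merely an illustration, not a
recommendation over timescale):

  CREATE TABLE measurements (
      sensor_id  int          NOT NULL,
      ts         timestamptz  NOT NULL,
      value      double precision
  ) PARTITION BY RANGE (ts);

  CREATE TABLE measurements_2018_01 PARTITION OF measurements
      FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');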

> Part of one factory has already been writing that data to
> a different server, but that's already 4.3 billion records (140GB)
> for about a year of measuring and that's not even half of the
> factory. We will be required to retain 10-15 years of data from
> several factories (on the short term, at least 2). The expectancy is
> that this will grow to ~15TB for our factory alone.
> 

Storing this amount of data is not that difficult - the DL360 machines
can handle 40TB+ for example. The question is how intensive the data
access and processing will be, so that you can pick the right storage
configuration, size the amount of RAM etc.

FWIW it makes no sense to size this for 10-15 years from the get go,
because (a) you won't get it right anyway (things change over time), and
(b) you're unlikely to keep the same hardware for 10+ years.

Get smaller but more powerful hardware, and plan to replace it in a
couple of years with newer machines.

> We also want to keep our options for growth of this data warehouse
> open. There are some lab databases, for example, that currently exist
> as two separate brand database servers (with different major versions
> of the lab software, so there are design differences as well), that
> aren't exactly small either.
> 
> I have been drooling over those shiny new AMD Epyc processors, which
> look certainly adequate with a truckload of memory and a good RAID-10
> array and some SSD(s) for the WAL, but it's really hard to figure out
> how many cores and memory we need. Sure, 2 7601's at 64 cores and 4TB
> of memory (if that's even for sale) would probably do the trick, but
> even I think that might be going a little overboard ;)
> 

Well, you can either find out what your actual needs are (by looking at
the current system and extrapolating in some way) and size the hardware
accordingly. Or you can squeeze as much money out of management as
possible and buy the shiniest stuff you can.

> Oh yeah, apparently we're married to HP or something… At least, IT
> management told me to look at their offerings.
> 

I'd say Proliant machines are pretty solid.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgresql 9.5 has ocuuered OOM

2017-12-20 Thread Tomas Vondra
On 12/20/2017 04:08 PM, mark wrote:
> I have set shared_buffers is 1/4 of memory.
> work_mem is 2% of memory.
> max_connections is 50.

That means if you have all 50 connections active, they may easily
consume 100% of memory, because 50 * 2 is 100. It's even easier if the
connections are executing complex queries, because each query may use
multiple work_mem buffers. So 2% seems a bit too high.
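
A back-of-the-envelope check with the numbers you posted (16GB RAM,
work_mem at 2%, 50 connections):

  SELECT pg_size_pretty((0.02 * 16 * 1024^3)::bigint)      AS work_mem_per_operation,
         pg_size_pretty((50 * 0.02 * 16 * 1024^3)::bigint) AS worst_case_total;
  -- => roughly 328 MB per sort/hash, and ~16 GB in total, i.e. all the RAM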

> memory size is 16GB.
> postgresql process used over 70% of memory and an OOM occurred.

So, did a single process use 70% of memory, or all postgres processes
combined?

If just a single process, it might be a poor plan choice (e.g. hash
aggregate may easily cause that).

If all processes combined, then perhaps it's due to work_mem being too high.

> what should I do to deal with this problem?
> 

Hard to say, until you provide enough information.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Size of pg_multixact/members increases 11355

2017-12-13 Thread Tomas Vondra


On 12/13/2017 01:42 PM, Yogesh Sharma wrote:
> Dear Thomas ,
> 
> Thanks for sharing information.
> 
> Is it possible to remove mentioned folder files in some time intervals
> by some DB command?
> Currently i can not upgrade to 3.6.20.
> So please share if any solution is available.
> 

There are no explicit commands to remove them. When the database decides
those files are not needed, it'll remove them automatically.

It's strongly discouraged to mess with those files directly, as it may
easily lead to data loss or data corruption, particularly when it's
unclear why the directory got so large.

You could try setting autovacuum_multixact_freeze_max_age=0 and
vacuuming the database, but it's hard to say if that will help.
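
If you decide to try that, a minimal sketch could look like this (note
that autovacuum_multixact_freeze_max_age can only be changed with a
server restart, and this really should be tested on a copy of the
system first):

  -- in postgresql.conf, then restart:
  --   autovacuum_multixact_freeze_max_age = 0
  -- then, in every database of the cluster:
  VACUUM (FREEZE, VERBOSE);
  -- the old files under pg_multixact are only removed once the minimum
  -- multixact age has advanced in all databases of the cluster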

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: large numbers of inserts out of memory strategy

2017-11-28 Thread Tomas Vondra

On 11/28/2017 07:26 PM, Ted Toth wrote:
> On Tue, Nov 28, 2017 at 12:01 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>>
>> ...
>>
>> That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL
>> procedure (500MB). How do you do the load? What libraries/drivers?
>>
> 
> I'm doing the load with 'psql -f'. I using 9.6 el6 rpms on a Centos VM
> I downloaded from the postgres repo.
> 

So what does the script actually do? Because psql certainly is not
running pl/pgsql procedures on its own. We need to understand why
you're getting OOM in the first place - just inserts alone should not
cause failures like that. Please show us more detailed explanation of
what the load actually does, so that we can try reproducing it.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: large numbers of inserts out of memory strategy

2017-11-28 Thread Tomas Vondra
Hi,

On 11/28/2017 06:17 PM, Ted Toth wrote:
> I'm writing a migration utility to move data from non-rdbms data
> source to a postgres db. Currently I'm generating SQL INSERT
> statements involving 6 related tables for each 'thing'. With 100k or
> more 'things' to migrate I'm generating a lot of statements and when I
> try to import using psql postgres fails with 'out of memory' when
> running on a Linux VM with 4G of memory. If I break into smaller
> chunks say ~50K statements then thde import succeeds. I can change my
> migration utility to generate multiple files each with a limited
> number of INSERTs to get around this issue but maybe there's
> another/better way?
> 

The question is what exactly runs out of memory, and how did you modify
the configuration (particularly related to memory).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: query causes connection termination

2017-11-22 Thread Tomas Vondra

On 11/23/2017 01:25 AM, Neto pr wrote:
> Another fact is that when executing the query without the command
> EXPLAIN ANALYZE, the result is usually returned after a few minutes.
> I do not understand, because when using the EXPLAIN ANALYZE command the
> dbms closes the connection. 
> Anyone have any tips on why this occurs?
> 

Attach gdb to the backend, run the query and when it fails get us the
backtrace. So something like

1) select pg_backend_pid()

2) gdb -p $PID

3) run the EXPLAIN ANALYZE again

4) watch the gdb session, when it fails do 'bt'

You need to install debuginfo first, so that the backtrace makes sense.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services