Re: [PERFORM] postgresql tuning with perf
On 10/23/2017 09:19 PM, Purav Chovatia wrote:
> Hello Experts,
>
> We are trying to tune our postgresql DB using perf.

Can you share some of the perf reports, then?

> We are running a C program that connects to postgres DB and calls very simple StoredProcs, one each for SELECT, INSERT & UPDATE.
>
> The SPs are very simple.
>
> *SELECT_SP*:
> CREATE OR REPLACE PROCEDURE query_dept_new(p1 IN numeric, p2 OUT numeric, p3 OUT numeric, ..., p205 OUT numeric) AS
> BEGIN
>   SELECT c2, c3, ..., c205
>     INTO p2, p3, ..., p205
>     FROM dept_new
>    WHERE c1 = p1;
> END;
>
> *UPDATE_SP*:
> CREATE OR REPLACE PROCEDURE query_dept_update(p1 IN numeric, p2 IN numeric, ..., p205 IN numeric) AS
> BEGIN
>   update dept_new set c2 = p2, c3 = p3, ..., c205 = p205
>    WHERE c1 = p1;
>   commit;
> END;
>
> *INSERT_SP*:
> CREATE OR REPLACE PROCEDURE query_dept_insert(p1 IN numeric, p2 IN numeric, ..., p205 IN numeric) AS
> BEGIN
>   insert into dept_new values(p1, p2, ..., p205);
>   commit;
> END;
>
> As shown above, it's all on a single table. Before every test, the table is truncated and loaded with 1m rows. WAL is on a separate disk.

It'd be nice if you could share more details about the structure of the table, hardware and observed metrics (throughput, ...). Otherwise we can't try reproducing it, for example.

> It's about 3x slower as compared to Oracle and the major events are WAL related. With fsync=off or sync_commit=off it gets 10% better but still far from Oracle. Vacuuming the table does not help. Checkpoint too is not an issue.

So how do you know the major events are WAL related? Can you share how you measure that and the measurements?

> Since we don't see any other way to find out what is slowing it down, we gathered data using the perf tool. Can somebody pls help on how do we go about reading the perf report.

Well, that's hard to do when you haven't shared the report.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Row level security policy policy versus SQL constraints. Any performance difference?
Hi,

On 10/17/2017 10:44 PM, Joe Carlson wrote:
> Hello.
>
> I have not used row level security policies in the past but am considering using them for a project in which I would like to restrict the set returned in a query based on specific fields. This is more of a convenience issue (for me) rather than a security issue.
>
> What I was wondering is what is the performance difference between a row level security implementation:
>
> CREATE POLICY <policy> ON <table> TO <role> USING (<column> = ANY(<values>));
>
> DROP POLICY <policy>
>
> and an implementation where I add on the constraints as part of each select statement:
>
> SELECT <columns> FROM <table> WHERE <conditions> AND <column> = ANY(<values>)
>
> In my (admittedly small) number of EXPLAINs I've looked at, it appears that the policy logic is added to the SELECT statement as a constraint. So I would not expect any fundamental performance difference in the 2 different forms.
>
> Is this true? Or is there some extra behind-the-scenes thing to be aware of? Can there be excessive overhead from the CREATE/DROP POLICY statements?

The main point of RLS is enforcing an order in which the conditions are evaluated. That is, the "security" quals (coming from RLS policies) have to be evaluated first, before any quals that might leak information about the values (imagine a custom PL/pgSQL function inserting the data somewhere, or perhaps just printing debug messages).

(Many built-in operators are however exempt from that, as we consider them leak-proof. This allows us to use non-RLS conditions for index scans etc., which might be impossible otherwise.)

Otherwise yes - it's pretty much the same as if you combine the conditions using AND. It's "just" a much more convenient approach.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] blocking index creation
On 10/11/2017 04:11 PM, Neto pr wrote:
> 2017-10-11 10:46 GMT-03:00 Laurenz Albe <laurenz.a...@cybertec.at>:
>
>> Neto pr wrote:
>>> When creating an index on a table of approximately 10GB of data, the DBMS hangs (I think), because even after waiting 10 hours there was no return of the command. It happened when creating Hash indexes and B+tree indexes. However, for some columns it completed successfully (L_RETURNFLAG, L_PARTKEY). If someone has a hint how to speed up index creation so that it completes successfully.
>>
>> Look if CREATE INDEX is running or waiting for a lock (check the "pg_locks" table, see if the backend consumes CPU time).
>
> At this moment there is an index being created on the Lineitem table (+- 10 GB), and apparently it is locked, since it started 7 hours ago. I've looked at the pg_locks table and here is the result - it's with "ShareLock" lock mode. Is this lock correct? Or should it be another type?

Yes, CREATE INDEX acquires a SHARE lock, see
https://www.postgresql.org/docs/9.1/static/explicit-locking.html

> Before creating the index, should I set the type of transaction lock?

Not sure I understand. The command acquires all necessary locks automatically.

> ---
> SELECT l.mode, c.relname, locktype, l.granted, l.transactionid, virtualtransaction
>   FROM pg_locks l, pg_class c
>  WHERE c.oid = l.relation
>
> -- RESULT
> AccessShareLock  pg_class_tblspc_relfilenode_index  relation  TRUE  (null)  3/71
> AccessShareLock  pg_class_relname_nsp_index         relation  TRUE  (null)  3/71
> AccessShareLock  pg_class_oid_index                 relation  TRUE  (null)  3/71
> AccessShareLock  pg_class                           relation  TRUE  (null)  3/71
> AccessShareLock  pg_locks                           relation  TRUE  (null)  3/71
> ShareLock        lineitem                           relation  TRUE  (null)  21/3769

Well, we see something is holding a SHARE lock on the "lineitem" table, but we don't really know what that session is doing. There's a PID in the pg_locks table, you can use it to look up the session in pg_stat_activity, which includes the query (and also a "state" column that will tell you if it's active or waiting for a lock) - see the query sketched below.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
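A minimal sketch of that pg_locks / pg_stat_activity lookup, assuming PostgreSQL 9.2 or newer (where pg_stat_activity has a "pid" column; on older releases the column is "procpid"):

  -- which session holds or waits for locks on the table, and what is it doing?
  SELECT l.pid,
         l.mode,
         l.granted,
         a.state,
         a.query
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE c.relname = 'lineitem';   -- table name from this thread

If "state" shows 'active' and the backend keeps burning CPU, the CREATE INDEX is actually working (sorting); if it shows waiting on a lock, the blocking PID can be chased the same way.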
Re: [PERFORM] https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgressql-server
On 09/19/2017 02:49 AM, 園田祥平 wrote:
> Hi experts,
>
> For an academic experiment I need to *restrict the total amount of memory that is available for a pgSQL server* to compute a given set of queries.
>
> I know that I can do this through the postgresql.conf file, where I can adjust some parameters related to Resource Management.
>
> The problem is that it's not clear to me - given the several parameters available in the config file - which is the parameter that I should change.
>
> When I first opened the config file I was expecting something like this: max_server_memmory. Instead I found a lot of: shared_buffers, temp_buffers, work_mem, and so on...
>
> Given that, I've consulted the pgSQL docs on Resource Consumption
> <http://www.postgresql.org/docs/9.3/static/runtime-config-resource.html>
> and I came up with shared_buffers as the best candidate for what I'm looking for: *the parameter that restricts the total amount of memory that a pgSQL server can use to perform its computation*. But I'm not completely sure about this.
>
> Can you guys give me some insight about which parameters I should adjust to restrict the pgSQL server's memory, please?

The short answer is "You can't do that from within PostgreSQL alone." You can define the size of some memory buffers, but not a hard total limit. One reason is that queries may use multiple work_mem buffers, we don't know how much memory the other queries are consuming, etc. We also don't have any control over the page cache, for example.

If you really need to do that, you'll need to do it at the OS level, e.g. by specifying the "mem=X" kernel parameter, at the VM level, or something like that. A sketch of the kernel-parameter approach is below.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
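For illustration only, a hedged sketch of the "mem=" approach on a GRUB-based Linux distribution. The file path and the 4G value are assumptions, and note this limits memory for the whole OS (PostgreSQL, page cache, everything), not just the database:

  # /etc/default/grub  (Debian/Ubuntu-style location; adjust for your distro)
  # cap usable RAM at 4 GB for the experiment
  GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=4G"

  # regenerate the grub config and reboot for the parameter to take effect
  sudo update-grub
  sudo reboot

A VM or container memory limit achieves the same thing without touching the boot configuration, which is usually more convenient for repeated experiments.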
Re: [PERFORM] query of partitioned object doesnt use index in qa
On 09/16/2017 12:05 AM, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
>> I'm wondering if we should extend the new CREATE STATISTICS framework to be able to do that without requiring an index.
>
> I think that's already on the roadmap --- it's one of the reasons we ended up with a SELECT-like syntax for CREATE STATISTICS. But it didn't get done for v10.

Right. It's one of the things I'd like to be working on after getting in the more complex statistics types (MCV & histograms).

> If we do look at that as a substitute for "make an expression index just so you get some stats", it would be good to have a way to specify that you only want the standard ANALYZE stats on that value and not the extended ones.

Not sure I understand what you mean by "extended" - the statistics we collect for expression indexes, or the CREATE STATISTICS stuff? I assume the former, because if you don't want the latter then you just don't create the statistics. Or am I missing something?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Execution plan analysis
Hi,

So looking at the plans, essentially the only part that is different is the scan node at the very bottom - in one case it's a sequential scan, in the other case (the slow one) it's the bitmap index scan. Essentially it's this:

  -> Seq Scan on lineitem (cost=0.00..2624738.17 ...) (actual time=0.839..74391.087 ...)

vs. this:

  -> Bitmap Heap Scan on lineitem (cost=336295.10..1970056.39 ...) (actual time=419620.817..509685.421 ...)
     -> Bitmap Index Scan on idx_l_shipmodelineitem000 (cost=0.00..336227.49 ...) (actual time=419437.172..419437.172 ...)

All the other nodes are the same and perform about the same in both cases, so you can ignore them. This difference is the root cause you need to investigate. The question is why the sequential scan is so much faster than the bitmap index scan.

Ideally, the bitmap heap scan should scan the index (in a mostly sequential way), build a bitmap, and then read just the matching part of the table (sequentially, by skipping some of the pages). Now, there are a few reasons why this might not work that well.

Perhaps the table fits into RAM, but table + index does not. That would make the sequential scan much faster than the index path. Not sure if this is the case, as you haven't mentioned which TPC-H scale you are testing, but you only have 4GB of RAM, which is fairly low.

Another bit is prefetching - with sequential scans, the OS is able to prefetch the next bit of data automatically (read-ahead). With bitmap index scans that's not the case, producing a lot of individual synchronous I/O requests. See if increasing effective_io_concurrency (from the default 1 to 16 or 32) helps.

Try generating the plans with EXPLAIN (ANALYZE, BUFFERS), that should tell us more about how many blocks are found in shared buffers, etc.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
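A sketch of the suggested experiment; the SET is per-session, effective_io_concurrency only affects bitmap heap scans, and the query text is just a stand-in for the real TPC-H query from this thread:

  -- enable prefetching for bitmap heap scans (32 is a starting point, not a tuned value)
  SET effective_io_concurrency = 32;

  -- collect the plan with buffer statistics, so we can see how many blocks
  -- came from shared buffers and how many had to be read
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT ...;   -- the slow query goes here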
Re: [PERFORM] query runs for more than 24 hours!
On 08/22/2017 04:23 PM, Mariel Cherkassky wrote:
> Hi, I have a query that I run in my postgresql 9.6 database and it runs for more than 24 hours and doesn't finish. My select consists of a few joins:

I'm sorry, but the query and plans are completely broken (wrapped in funny ways, missing important bits, ...). I don't know what client you use or how that happened, but I recommend attaching the information as text files instead of pasting it into the message directly.

Regarding the query analysis - we can't really help you much without seeing an explain analyze (that is, not just the plan and estimates, but actual performance and row counts). That usually identifies the query operations (scans, joins, ...) causing issues.

Of course, if the query is already running for 24h and you don't know how much longer it will take to complete, running EXPLAIN ANALYZE on it is not very practical. The best thing you can do is break the query into smaller parts and debug that - start with one table, and then add tables/conditions until the performance gets bad. Hopefully the explain analyze on that will complete in reasonable time. (A rough sketch of that approach is below.)

Of course, you haven't told us anything about what's happening on the machine. Is it reading a lot of data from the disks? Random or sequential? Is it writing a lot of data into temporary files? Is it consuming a lot of CPU? And so on.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
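A rough sketch of the incremental approach; the table and column names here are made up, substitute the ones from the problematic query:

  -- step 1: just the driving table, with its local filters
  EXPLAIN ANALYZE
  SELECT count(*) FROM orders o WHERE o.created_at > '2017-01-01';

  -- step 2: add one join at a time, re-checking plan, runtime and row estimates
  EXPLAIN ANALYZE
  SELECT count(*)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
   WHERE o.created_at > '2017-01-01';

  -- keep adding joins/conditions until the runtime (or the estimate vs. actual
  -- row counts) suddenly gets much worse - that is the part to focus on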
Re: [PERFORM] Simple SQL too slow
On 07/02/2017 03:26 AM, Daviramos Roussenq Fortunato wrote:
> I am using pgAdmin for SQL tests.
>
> Linux: EXPLAIN ANALYZE select * from lancamentosteste
> "Seq Scan on lancamentosteste (cost=0.00..49289.74 rows=1883774 width=92) (actual time=0.016..1194.453 rows=1883699 loops=1)"
> "Total runtime: 2139.067 ms"
>
> Windows:
> "Seq Scan on lancamentosteste (cost=0.00..49288.67 rows=1883967 width=92) (actual time=0.036..745.409 rows=1883699 loops=1)"
> "Total runtime: 797.159 ms"

I'm really, really confused. In the first message you claimed the queries take 7 and 3 minutes, yet here we see the queries taking just a few seconds.

> I did some tests reading the disk and monitored with iotop.
>
> # hdparm -t /dev/sdc
> /dev/sdc:
>  Timing buffered disk reads: 730 MB in 3.01 seconds = 242.65 MB/sec
>
> # hdparm -T /dev/sdc
> /dev/sdc:
>  Timing cached reads: 9392 MB in 2.00 seconds = 4706.06 MB/sec
>
> # time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile
> 250000+0 registros de entrada
> 250000+0 registros de saída
> 2048000000 bytes (2,0 GB) copiados, 5,84926 s, 350 MB/s
> real 0m9.488s
> user 0m0.068s
> sys  0m5.488s
>
> In the tests monitoring the disk with iotop, the read rate stayed between 100MB/s and 350MB/s.
>
> Doing the same monitoring with iotop while running the SELECT, the disk reads do not exceed 100kB/s. I have the impression that some configuration of Linux or Postgres is limiting the use of the total capacity of the disk. Does anyone know if there is any setting for this?

There is no such setting. But it's possible that the network is very slow, so transferring the results from the server to the client takes very long. Or that formatting the results in the client takes a lot of time (I'm not sure why there'd be a difference between Windows and Linux though).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Simple SQL too slow
Hello,

On 07/01/2017 10:39 PM, Daviramos Roussenq Fortunato wrote:
> What tests could I do? The rigid Linux disk is much faster than Windows, I should get much better performance on this Linux. What test battery do you recommend I do?

I'm not sure what you mean by "rigid disk" or "test battery", but I agree with Andreas that clearly there's something wrong at the system level. It's hard to guess what exactly, but a sequential scan on a 250MB table (computed from the relpages value) should only take a few seconds on any decent hardware, and not 3 or 7 minutes.

The first thing I would do is run basic system-level tests, for example benchmarking storage using fio (a sketch is below). After that, you need to determine what the bottleneck is. Perhaps the resources are saturated by something else running on the system - other queries, maybe something else running next to PostgreSQL. Look at top and iotop while running the queries, and at other system tools.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
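A hedged example of a basic fio run - the parameters are only a reasonable starting point, not tuned for this particular system, and the test file is created in the current directory:

  # random 8kB reads (roughly the pattern of PostgreSQL index access),
  # 2GB test file, 60 seconds, bypassing the page cache
  fio --name=pgrandread --rw=randread --bs=8k --size=2G \
      --ioengine=libaio --direct=1 --runtime=60 --time_based \
      --numjobs=4 --group_reporting

  # sequential reads, closer to a seq scan workload
  fio --name=pgseqread --rw=read --bs=1M --size=2G \
      --ioengine=libaio --direct=1 --runtime=60 --time_based \
      --group_reporting

Comparing these numbers between the Linux and Windows machines should show quickly whether the storage itself is the problem.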
Re: FW: Re: [PERFORM] Query is running very slow......
On 5/25/17 2:26 PM, Dinesh Chandra 12108 wrote:
> Hi Tomas,
>
> Please find below the input for the slow query.
>
> (a) something about the hardware it's running on
>     RAM: 64 GB, CPU: 40 cores
>
> (b) amounts of data in the tables / databases
>     Database size: 32GB
>     Table sizes:
>       workflow.project:          8194 bytes
>       workflow.tool_performance: 175 MB
>       workflow.evidence_to_do:   580 MB
>
> (c) EXPLAIN or even better EXPLAIN ANALYZE of the query
>
> GroupAggregate  (cost=16583736169.63..18157894828.18 rows=5920110 width=69)
>   ->  Sort  (cost=16583736169.63..16714893857.43 rows=52463075120 width=69)
>         Sort Key: tool_performance.project_id, project.project_name, tool_performance.step_id, (date_trunc('day'::text, tool_performance.insert_time)), tool_performance.user_id
>         ->  Nested Loop  (cost=2.42..787115179.07 rows=52463075120 width=69)
>               ->  Seq Scan on evidence_to_do  (cost=0.00..119443.95 rows=558296 width=0)
>                     Filter: (status_id = ANY ('{15100,15150,15200,15300,15400,15500}'::bigint[]))
>               ->  Materialize  (cost=2.42..49843.24 rows=93970 width=69)
>                     ->  Hash Join  (cost=2.42..49373.39 rows=93970 width=69)
>                           Hash Cond: (tool_performance.project_id = project.project_id)
>                           ->  Seq Scan on tool_performance  (cost=0.00..48078.88 rows=93970 width=39)
>                                 Filter: ((insert_time > '2017-05-01 00:00:00+05:30'::timestamp with time zone) AND (insert_time < '2017-05-02 00:00:00+05:30'::timestamp with time zone))
>                           ->  Hash  (cost=1.63..1.63 rows=63 width=38)
>                                 ->  Seq Scan on project  (cost=0.00..1.63 rows=63 width=38)

Are you sure this is the same query? The query you posted includes these two join conditions:

  evidence_to_do.project_id = tool_performance.project_id
  evidence_to_do.project_id = project.project_id

But the plan only seems to enforce the equality between 'project' and 'tool_performance'. So when joining evidence_to_do, it performs a cartesian product, producing ~52B rows (estimated). That can't be fast.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
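For illustration, a sketch of what the join part presumably should look like so the planner can apply both equalities - the select list, aggregate and GROUP BY are assumptions reconstructed from the sort key in the plan, not the original query:

  SELECT tp.project_id,
         p.project_name,
         tp.step_id,
         date_trunc('day', tp.insert_time) AS day,
         tp.user_id,
         count(*)
    FROM workflow.tool_performance tp
    JOIN workflow.project p        ON p.project_id = tp.project_id
    JOIN workflow.evidence_to_do e ON e.project_id = tp.project_id  -- the plan suggests this equality is not being applied
   WHERE e.status_id IN (15100, 15150, 15200, 15300, 15400, 15500)
     AND tp.insert_time > '2017-05-01 00:00:00+05:30'
     AND tp.insert_time < '2017-05-02 00:00:00+05:30'
   GROUP BY 1, 2, 3, 4, 5;

If the executed query really contained both join conditions, the EXPLAIN of exactly that query text would show them in the plan - hence the question whether the posted query and the posted plan actually match.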
Re: [PERFORM] Query is running very slow......
Hi,

On 5/24/17 7:04 PM, Dinesh Chandra 12108 wrote:
> Dear Expert,
>
> While executing the below query, it's taking too long to fetch output. Could you please help to fine-tune the same?

You'll have to provide far more details - the query alone is certainly not enough for anyone to guess why it's slow. Perhaps look at this:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

In particular, you'll have to tell us

(a) something about the hardware it's running on
(b) amounts of data in the tables / databases
(c) EXPLAIN or even better EXPLAIN ANALYZE of the query
(d) configuration of the database (work_mem, shared_buffers etc.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/28/2017 01:34 AM, Andres Freund wrote:
> On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote:
>> I can confirm this observation. I bought the Intel 750 NVMe SSD last year, the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I do, because of WALWriteLock.
>
> Hm, interesting. Even if you up wal_buffers to 128MB, use synchronous_commit = off, and play with wal_writer_delay/flush_after?

I think I've tried things like that, but let me do some proper testing. I'll report the numbers in a few days.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 07:35 PM, Andres Freund wrote:
> On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote:
>> On 04/27/2017 09:34 AM, Andres Freund wrote:
>>> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>>>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>>> I would agree it isn't yet a widespread issue.
>>>
>>> I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists.
>>
>> That would probably be true. I was thinking of it more as "most new users are in the cloud", and in the cloud it is going to be rare that a user is able to hit that level of writes (at least not without spending LOTS of money).
>
> You can get pretty decent NVMe SSD drives on several cloud providers these days, without immediately bankrupting you. Sure, it's instance storage, but with a decent replication and archival setup, that's not necessarily an issue. It's not that hard to get to the point where postgres can't keep up with storage, at least for some workloads.

I can confirm this observation. I bought the Intel 750 NVMe SSD last year, the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I do, because of WALWriteLock.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 06:34 PM, Andres Freund wrote:
> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>> Ok, based on the, few, answers I've got so far, my experience is indeed skewed. A number of the PG users I interacted with over the last couple years had WAL write rates somewhere in the range of 500MB/s to 2.2GB/s (max I've seen). At that point WAL insertion became a major bottleneck, even if storage was more than fast enough to keep up. To address these we'd need some changes, but the feedback so far suggests that it's not yet a widespread issue...
>>
>> I would agree it isn't yet a widespread issue.
>
> I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists.

To a certain extent, this is a self-fulfilling prophecy. If you know you'll have such a busy system, you probably do some research and testing first, before choosing the database. If we don't perform well enough, you pick something else. Which removes the data point.

Obviously, there are systems that start small and get busier and busier over time. And those are the ones we see.

cheers

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi,

On 04/25/2017 06:17 AM, Andres Freund wrote:
> Hi,
>
> I've lately seen more and more installations where the generation of write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious whether that's primarily a "sampling error" of mine, or whether that's indeed more common.

I see those cases too. To some degree it's a sampling bias. People generally don't call us to look at the 99% of systems that perform fine, so we tend to see the 1% of systems under pressure. That doesn't make the observation irrelevant, though. Those demanding systems are one of the things that push us forward.

> The primary reason I'm curious is that I'm pondering a few potential optimizations, and would like to have some guidance which are more and which are less important.

I think any optimization you do will improve at least some of those busy systems.

> Questions (answer as many as you can comfortably answer):
>
> - How many MB/s, segments/s do you see on busier servers?

That depends on the cause (see the next point).

> - What generates the bulk of WAL on your servers (9.5+ can use pg_xlogdump --stats to compute that)?

a) systems doing large batches
   - bulk loads/updates/deletes, one or a few sessions doing a lot
   - easily high hundreds of MB/s (on a separate device)

b) OLTP systems doing a lot of tiny/small transactions
   - many concurrent sessions
   - often end up much more limited by WAL, due to locking etc.
   - often the trouble is random updates all over the place, causing amplification due to FPIs (a PK on UUID is a great way to cause this unnecessarily, even in apps with a naturally tiny working set)

> - Are you seeing WAL writes being a bottleneck?

On the write-intensive systems, yes. Often the CPUs are waiting for WAL I/O to complete during COMMIT.

> - What kind of backup methods are you using and is the WAL volume a problem?

The large and busy systems can easily produce so much WAL that the basebackup is not the largest part of the backup. That is somewhat solvable by using other means of obtaining the basebackup snapshot (e.g. by taking some sort of filesystem / SAN / ... snapshot). That reduces the amount of WAL needed to make the basebackup consistent, but it doesn't solve the WAL archiving issue.

> - What kind of replication are you using and is the WAL volume a problem?

Generally streaming replication, and yes, the amount of WAL may be an issue, partially because the standby is a single-process thing. And as it has to process something generated by N sessions on the primary, that can't end well. Interestingly enough, FPIs can actually make it way faster, because the standby does not need to read the data from disk during recovery.

> - What are your settings for wal_compression, max_wal_size (9.5+) / checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

I'd say the usual practice is to tune for timed checkpoints, say 30+ minutes apart (or more). wal_compression is typically 'off' (i.e. the default value).

> - Could you quickly describe your workload?

Pretty much a little bit of everything, depending on the customer.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
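For reference, a hedged example of the pg_xlogdump invocation mentioned above (9.5/9.6; the data directory path and segment file names are placeholders, and on PostgreSQL 10+ the tool is pg_waldump and the directory is pg_wal):

  # summarize WAL volume per record type (heap, btree, FPIs, ...)
  # over a range of WAL segments
  pg_xlogdump --stats=record -p /var/lib/postgresql/9.6/main/pg_xlog \
      000000010000002A00000010 000000010000002A00000020

The per-record-type breakdown makes it fairly easy to tell whether the volume comes from full-page images, index maintenance, or plain heap inserts/updates.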
Re: [PERFORM] Delete, foreign key, index usage
On 04/25/2017 08:28 AM, Johann Spies wrote:
> On 24 April 2017 at 15:17, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>> On 04/24/2017 08:48 AM, Johann Spies wrote:
>>> Why would the planner prefer to use the gin index and not the btree index in this case?
>>
>> You'll need to show what queries you are running - that's quite an important piece of information, and I don't see it anywhere in this thread. Seeing explain plans would also be helpful.
>
> It is a simple "delete from wos_2017_1.article;" which causes a domino effect of deletes due to foreign keys. In the case of one table with more than 50 million records, where the primary key was also the foreign key, the process only started to use the index when we built a gin index. In the case of the "belongs_to" table (shown in my first email) we first built a btree index on the foreign key - and it was ignored. Only after the gin index was created did it use the index.
>
> Regards.
> Johann

Wouldn't it be easier to simply show the queries (with the exact condition) and the associated explain plans? I understand you're doing your best to explain what's happening, but the explain plans contain a lot of information that you might have missed.

I suppose you actually did explain analyze to verify the query was not using the btree index, and then started using the gin index. Or how did you verify that?

Also, which PostgreSQL version have you observed this on? I see you've mentioned 9.6 when talking about parallel scans, but I suppose the issue was originally observed on some older version.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Delete, foreign key, index usage
On 04/24/2017 08:48 AM, Johann Spies wrote:
> Why would the planner prefer to use the gin index and not the btree index in this case?

You'll need to show what queries you are running - that's quite an important piece of information, and I don't see it anywhere in this thread. Seeing explain plans would also be helpful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Number of characters in column preventing index usage
On 02/17/2017 11:42 PM, David G. Johnston wrote:
> On Fri, Feb 17, 2017 at 3:19 PM, Hustler DBA <hustler...@gmail.com> wrote:
>> my_db=# create index tab_idx1 on tab(ID);
>> CREATE INDEX
>>
>> my_db=# explain (analyze, buffers) select count(*) from tab where ID = '01625cfa-2bf8-45cf';
>>
>>  Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048 rows=1 loops=1)
>>    Buffers: shared read=2
>>    ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1 width=0) (actual time=0.043..0.043 rows=0 loops=1)
>>          Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
>>
>>    ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual time=0.031..0.108 rows=5 loops=1)
>>          Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
>>          Rows Removed by Filter: 218
>>          Buffers: shared hit=12
>>  Planning time: 0.122 ms
>>  Execution time: 0.180 ms
>> (8 rows)
>
> IIRC the only reason the first query cares to use the index is because it can perform an Index Only Scan and thus avoid touching the heap at all. If it cannot avoid touching the heap, the planner is going to just use a sequential scan to retrieve the records directly from the heap and save the index lookup step.

I don't follow - the queries are exactly the same in both cases, except for the parameter value. So both cases are eligible for an index-only scan.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Number of characters in column preventing index usage
Hi,

On 02/17/2017 11:19 PM, Hustler DBA wrote:
> I am seeing this strange behavior, I don't know if this is by design by Postgres. I have an index on a column which is defined as "character varying(255)". When the value I am searching for is of a certain length, the optimizer uses the index, but when the value is long, the optimizer doesn't use the index but does a seq scan on the table. Is this by design? How can I make the optimizer use the index no matter what the size/length of the value being searched for?

AFAIK there are no such checks, i.e. the optimizer does not consider the length of the value when deciding between scan types.

> PostgreSQL version: 9.4

That's good to know, but we also need information about the table involved in your queries. I'd bet the table is tiny (it seems to be just 12 pages, so ~100kB), making the indexes rather useless.

> my_db=# explain (analyze, buffers) select count(*) from tab where ID = '01625cfa-2bf8-45cf';
>
>  Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.048..0.048 rows=1 loops=1)
>    Buffers: shared read=2
>    ->  Index Only Scan using tab_idx1 on tab  (cost=0.27..8.29 rows=1 width=0) (actual time=0.043..0.043 rows=0 loops=1)
>          Index Cond: (ID = '01625cfa-2bf8-45cf'::text)
>          Heap Fetches: 0
>          Buffers: shared read=2
>  Planning time: 0.250 ms
>  Execution time: 0.096 ms
> (8 rows)
>
> my_db=# explain (analyze, buffers) select count(*) from tab where ID = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea';
>
>  Aggregate  (cost=14.80..14.81 rows=1 width=0) (actual time=0.115..0.115 rows=1 loops=1)
>    Buffers: shared hit=12
>    ->  Seq Scan on tab  (cost=0.00..14.79 rows=5 width=0) (actual time=0.031..0.108 rows=5 loops=1)
>          Filter: ((ID)::text = '01625cfa-2bf8-45cf-bf4c-aa5f3c6fa8ea'::text)
>          Rows Removed by Filter: 218
>          Buffers: shared hit=12
>  Planning time: 0.122 ms
>  Execution time: 0.180 ms
> (8 rows)

The only difference I see is that for the long value the planner expects 5 rows, while for the short one it expects 1 row. That may seem a bit strange, but I'd bet it finds the short value in some statistic (MCV, histogram) and so can provide a very accurate estimate. While for the longer one, it ends up using some default (0.5% for equality IIRC) or a value deduced from ndistinct. Or something like that - checking the column statistics would confirm it (see the query sketched below).

The differences between the two plans are rather negligible, both in terms of costs (8.3 vs. 14.81) and runtime (0.1 vs. 0.2 ms). The choice of a sequential scan seems perfectly reasonable for such tiny tables.

FWIW it's impossible to draw conclusions based on two EXPLAIN ANALYZE executions. The timing instrumentation from EXPLAIN ANALYZE may have a significant impact (different for each plan!). You also need to test with more values and longer runs, not just a single execution (there are caching effects etc.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
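A minimal sketch of that check, using the table and column names from this thread:

  -- does the searched value appear among the most common values,
  -- and what ndistinct does the planner work with?
  SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename = 'tab'
     AND attname = 'id';

If the short value shows up in most_common_vals, its estimate comes straight from the MCV list; values not in the list get an estimate derived from n_distinct and the remaining frequency mass, which explains the 1-row vs. 5-row difference.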
Re: [PERFORM] Querying with multicolumn index
Hi,

On 12/10/2016 12:51 AM, Tom Lane wrote:
> Eric Jiang <e...@doublemap.com> writes:
>> I have a query that I *think* should use a multicolumn index, but sometimes isn't, resulting in slow queries.
>
> I tried to duplicate this behavior, without success. Are you running with nondefault planner parameters?

My guess is this is a case of LIMIT, with the planner assuming the matching rows are uniformly distributed in the input data. The planner likely concludes that for a driver with a lot of data we'll find the first row using ix_updates_time very quickly, and that it will be cheaper than inspecting the larger multi-column index. But imagine a driver with lots of data a long time ago. That breaks the uniformity assumption, and the LIMIT plan degrades fairly quickly.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Perf decreased although server is better
On 11/02/2016 02:26 PM, Benjamin Toueg wrote:
> Hi everyone,
>
> I'm facing a performance decrease after switching to a more performant VPS:
> http://serverfault.com/questions/812702/posgres-perf-decreased-although-server-is-better

Well, changing so many things at once (CPU, RAM, storage, Ubuntu version, probably kernel version, PostgreSQL version) is a bad idea, exactly because it makes investigating regressions more complicated.

> My questions are:
>
> 1. What benchmark should I perform before switching to a new server?

Three types of benchmarks, in this order:

1) system-level benchmarks to test various resources (fio to test disks, etc.)
2) general-purpose PostgreSQL benchmarks (regular pgbench, ...)
3) application-specific benchmarks, or at least pgbench with templates that match your workload somewhat better than the default one

Start with (1), compare results between machines, and if it's OK continue with (2) and so on. (A sketch of a basic pgbench baseline is below.)

> 2. What's your rule of thumb regarding my specific issue? What should be investigated first?

There's a bottleneck somewhere. You need to identify which resource it is and why, until then it's just wild guessing. Try measuring how long the requests take at different points - at the app server, at the database, etc. That will tell you whether it's a database issue, a network issue etc. If the queries take longer on the database, use something like perf to profile the system.

ISTM it's not a disk issue (at least the chart shows minimal usage). But you're doing ~400tps, returning ~5M rows per second.

Also, if it turns out to be a database issue, more info about the config and data set would be useful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
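For step (2), a hedged sketch of a simple pgbench baseline to run on both the old and the new server - the scale factor, client counts and durations are arbitrary starting points, and the "bench" database name is made up:

  # create a dedicated benchmark database and initialize it
  # (scale factor 100 is roughly 1.5GB of data)
  createdb bench
  pgbench -i -s 100 bench

  # read-write run: 16 clients, 4 worker threads, 5 minutes
  pgbench -c 16 -j 4 -T 300 bench

  # read-only run, to separate WAL/fsync behaviour from pure read performance
  pgbench -c 16 -j 4 -T 300 -S bench

Comparing tps and latency between the two machines for both runs usually shows quickly whether the regression is on the write path (WAL, fsync) or somewhere else.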
Re: [PERFORM] Tuning Checkpoints
On 10/31/2016 08:19 PM, Andre Henry wrote:
> My PG 9.4.5 server runs on Amazon RDS. At some times of the day we have a lot of checkpoints really close together (less than 1 minute apart, see logs below) and we are trying to tune the DB to minimize the impact of the checkpoints or reduce their number.
>
> Server Stats
> - Instance Type db.r3.4xl, 16 vCPUs, 122GB of RAM
> - PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
>
> Some PG Stats
> - Shared Buffers = 31427608kB
> - Checkpoint Segments = 64
> - Checkpoint completion target = .9
> - Rest of the configuration is below
>
> Things we are doing
> - We have a huge table where each row is over 1kB and it's very busy. We are splitting that into multiple tables, especially the one json field that is making it large.
>
> Questions
> - Each checkpoint log writes out the following:
>   checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s

OK, each checkpoint has to write out all data dirtied since the previous checkpoint. You have ~170k buffers worth of dirty data, i.e. ~1.3GB.

> - What does buffers mean? How do I find out how much RAM that is equivalent to?

A buffer holds 8kB of data, which is the "chunk" the data files are divided into. So 166481 buffers is roughly 1.3GB.

> - Based on my RDS stats I don't think IOPs will help, because I don't see any flat lines on my write operations / second graph. Is this a good assumption?

Not sure what you mean by this. Also, maybe you should talk to AWS if you're on RDS.

> - What else can we tune to spread out checkpoints?

Based on the logs, your checkpoints are triggered by filling WAL. I see your checkpoints happen every 30-40 seconds, and you only have 64 segments. To get checkpoints triggered by timeout instead (which I assume is 5 minutes, because you have not mentioned checkpoint_timeout), you need to increase checkpoint_segments enough to hold 5 minutes worth of WAL. That means 300/30 * 64, i.e. roughly 640 segments (likely an overestimate, due to full page writes, but well).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
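A hedged sketch of the corresponding settings (9.4 syntax, since the server runs 9.4.5; on RDS these go into the parameter group rather than postgresql.conf, and the exact segment count should be derived from the observed WAL volume as computed above):

  # allow ~5 minutes of WAL between checkpoints at the observed write rate
  # (640 segments of 16MB each, ~10GB of WAL space)
  checkpoint_segments = 640
  checkpoint_timeout = 5min
  checkpoint_completion_target = 0.9

  # log every checkpoint, so it stays visible whether they are triggered
  # by time or by xlog
  log_checkpoints = on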
Re: [PERFORM] query slowdown after 9.0 -> 9.4 migration
On 10/26/2016 03:48 PM, Filip Rembiałkowski wrote:
> Hi.
>
> Query run time degraded after migration from Pg 9.0 + PostGIS 1.5 to Pg 9.4 + PostGIS 2.2.
>
> 1 ms versus 7 ms.
>
> Same query, same data, same schema, similar hardware. Data is small and fits in cache.
>
> EXPLAIN shows a heap scan cost increase. What can be the reason for a 40-fold increase in page scans needed to run a Bitmap Heap Scan with Filter and Recheck?

On 9.0 the scan accesses only 8 buffers:

  Buffers: shared hit=8

while on 9.4 it has to inspect 316 of them:

  Buffers: shared hit=316

Perhaps the table is organized / sorted differently, or something like that. How did you do the upgrade?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] PostgreSQL on ZFS: performance tuning
On 09/27/2016 06:00 PM, Torsten Zuehlsdorff wrote:
> On 29.07.2016 08:30, Tomas Vondra wrote:
>> On 07/29/2016 08:04 AM, trafdev wrote:
>>> Hi.
>>>
>>> I have an OLAP-oriented DB (light occasional bulk writes and heavy aggregated selects over large periods of data) based on Postgres 9.5.3.
>>>
>>> Server is FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS, mirror).
>>>
>>> The largest table is 13GB (with a 4GB index on it), other tables are 4, 2 and less than 1GB.
>>>
>>> After reading a lot of articles and "howto-s" I've collected the following set of tweaks and hints:
>>>
>>> ZFS pools creation:
>>> zfs create zroot/ara/sqldb
>>> zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql
>>>
>>> zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
>>> NAME                   PROPERTY      VALUE    SOURCE
>>> zroot/ara/sqldb/pgsql  primarycache  all      local
>>> zroot/ara/sqldb/pgsql  recordsize    8K       local
>>> zroot/ara/sqldb/pgsql  logbias       latency  local
>>> zroot/ara/sqldb/pgsql  compression   lz4      inherited from zroot
>>>
>>> L2ARC is disabled
>>> VDEV cache is disabled
>>>
>>> pgsql -c "mkdir /ara/sqldb/pgsql/data_ix"
>>> pgsql -c "initdb --locale=en_US.UTF-8 -E UTF-8 -D /ara/sqldb/pgsql/data"
>>>
>>> /etc/sysctl.conf
>>> vfs.zfs.metaslab.lba_weighting_enabled=0
>>>
>>> postgresql.conf:
>>> listen_addresses = '*'
>>> max_connections = 100
>>> shared_buffers = 16GB
>>> effective_cache_size = 48GB
>>
>> It may not be a problem for your workload, but this effective_cache_size value is far too high.
>
> May I ask why? ZFS by default caches your size of RAM minus 1 GB. Subtracting the shared buffers from the 64 GB of RAM, I would assume 47 GB would be a better value. But that would not be far too high. So please can you explain this?

Because it's not a global value, but an estimate of how much RAM is available as a cache for a single query. So if you're running 10 queries at the same time, they'll have to share the memory.

It's a bit trickier than that, as there's often a fair amount of cross-backend sharing (backends accessing the same data, so it's likely one backend loads data into cache, and then other backends access it too).

It also ignores that memory may get allocated for other reasons - some queries may allocate quite a bit of memory for sorts/aggregations. So effective_cache_size = RAM - shared_buffers is excessive not only because it ignores the per-query nature, but also because it neglects these other allocations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] PostgreSQL on ZFS: performance tuning
> [ZFS ARC / zfs-stats output trimmed - hit ratios, streams and prefetch reset counters]
>
> Notes/concerns:
>
> - primarycache=metadata (recommended in most articles) produces a significant performance degradation (in SELECT queries);

Those articles are wrong. PostgreSQL relies on the filesystem cache, so it needs primarycache=all.

> - from what I can see, Postgres uses memory too carefully. I would like somehow to force it to keep accessed data in memory as long as possible. Instead I often see that even frequently accessed data is pushed out of the memory cache for no apparent reason.

This is probably a consequence of the primarycache misconfiguration.

> Do I miss something important in my configs? Are there any double writes/reads somewhere because of OS/ZFS/Postgres caches? How to avoid them?
>
> Please share your experience/tips. Thanks.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] using stale statistics instead of current ones because stats collector is not responding
Hi,

On Tue, 2016-03-08 at 16:18 -0800, Tory M Blue wrote:
> No hits on the intratubes on this.
>
> Any idea? We are doing some massive deletes so was curious as to what would cause this error. The DB is up, not overburdened, just some big deletes and slon replication processes.
>
> CentOS 6.6
> Postgres 9.4.5
>
> First time I've ever seen this alert/error, just curious about it.
>
> 2016-03-08 16:17:29 PST 11877 2016-03-08 16:17:29.850 PST LOG:
> using stale statistics instead of current ones because stats collector is not responding

PostgreSQL tracks 'runtime statistics' (number of scans of a table, tuples fetched from index/table etc.) in a file, maintained by a separate process (the stats collector). When a backend process requests some of the stats (e.g. when a monitoring tool selects from pg_stat_all_tables) it requests a recent snapshot of the file from the collector.

The log message you see means that the collector did not handle such requests fast enough, and the backend decided to read an older snapshot instead. So you may see some stale data in monitoring, for example.

This may easily happen if the I/O system is overloaded, for example. The simplest solution is to move the statistics file to a RAM disk (e.g. a tmpfs mount on Linux) using stats_temp_directory in postgresql.conf. The space needed depends on the number of objects (databases, tables, indexes), and usually it's a megabyte in total or so. (A sketch of that setup is below.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
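A hedged sketch of that setup on Linux - the directory location is an assumption (many systems already mount /dev/shm as tmpfs), and the directory has to exist and be writable by the postgres OS user whenever the server starts (e.g. recreated by an init script after reboot):

  # create a directory on an existing tmpfs mount for the stats files
  sudo mkdir /dev/shm/pgsql_stats_tmp
  sudo chown postgres:postgres /dev/shm/pgsql_stats_tmp

  # postgresql.conf - point the statistics files at the tmpfs directory;
  # this setting only needs a reload (pg_ctl reload), not a restart
  stats_temp_directory = '/dev/shm/pgsql_stats_tmp'

The permanent copy of the stats is still written to pg_stat at shutdown, so nothing is lost by keeping the temporary snapshots in RAM.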
Re: [PERFORM] Hash join gets slower as work_mem increases?
On 02/01/2016 10:38 AM, Albe Laurenz wrote:
> Tomas Vondra wrote:
>> ...
>
> I didn't post the whole plan since it is awfully long, I'll include hyperlinks to the whole plans.
>
> work_mem = '100MB' (http://explain.depesz.com/s/7b6a):
>
>   ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=4296.986..106087.683 rows=187222 loops=1)
>         Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>         Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230
>   [...]
>         ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual time=4206.892..4206.892 rows=3096362 loops=1)
>               Buckets: 1024  Batches: 2 (originally 1)  Memory Usage: 102401kB
>               Buffers: shared hit=1134522 dirtied=1, temp written=5296
>
> work_mem = '500MB' (http://explain.depesz.com/s/Cgkl):
>
>   ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=3802.849..245970.049 rows=187222 loops=1)
>         Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>         Buffers: shared hit=1181175 dirtied=111
>   [...]
>         ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual time=3709.584..3709.584 rows=3096360 loops=1)
>               Buckets: 1024  Batches: 1  Memory Usage: 120952kB
>               Buffers: shared hit=1134520 dirtied=111
>
> Does that support your theory?
>
> There is clearly an underestimate here, caused by correlated attributes, but is that the cause for the bad performance with increased work_mem?

Yes, that's clearly the culprit here. In both cases the estimate is only ~4000 tuples in the hash, and 9.3 sizes the hash table to have at most ~10 tuples per bucket (in a linked list).

However we actually get ~3M rows, so there will be ~3000 tuples per bucket, and that's extremely expensive to walk. The reason why 100MB is faster is that it's using 2 batches, thus making the lists "just" ~1500 tuples long.

This is pretty much exactly the reason why I reworked hash joins in 9.5. I'd bet it's going to be ~20x faster on that version.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] checkpoints, proper config
On 12/10/2015 11:45 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> Also, I don't think it makes much sense to set (checkpoint_warning > checkpoint_timeout) as it kinda defeats the whole purpose of the warning.
>
> I agree, but actually, what is the sense of checkpoint_warning? I think it was useful back when we didn't have log_checkpoints, but now that we have detailed checkpoint logging I think it's pretty much useless noise.

Not entirely. The WARNING only triggers when checkpoints happen less than 30s apart (or whatever value is set in the config) and explicitly warns you about doing checkpoints too often. log_checkpoints=on logs all checkpoints, and you have to do further analysis on the data yourself (and it's just LOG).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] checkpoints, proper config
On 12/10/2015 06:20 PM, Joshua D. Drake wrote:
> On 12/10/2015 01:12 AM, Tory M Blue wrote:
>> checkpoint_timeout = 5min
>> checkpoint_completion_target = 0.9
>
> The above is your problem. Make checkpoint_timeout = 1h. Also, consider turning synchronous_commit off.

I doubt that. The report mentioned that the checkpoints happen 0.027... minutes apart (assuming the minutes_between_checkpoints is computed in a sane way). That's way below 5 minutes, so the checkpoints have to be triggered by something else - probably by running out of segments, but we don't know the value before Tory increased it to 150.

Also, recommending synchronous_commit=off is a bit silly, because not only does it introduce data loss issues, it'll likely cause even more frequent checkpoints.

Tory, please enable logging of checkpoints (log_checkpoints=on). Also, I don't think it makes much sense to set (checkpoint_warning > checkpoint_timeout) as it kinda defeats the whole purpose of the warning.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Server slowing down over time
Hi,

On 08/27/2015 07:21 PM, Jean Cavallo wrote:
> Hi,
>
> I am currently working on a data migration for a client. The general plan is:
>
>   - Read data from a postgresql database
>   - Convert it to the new application
>   - Insert it into another database (same postgresql instance)
>
> The source database is rather big (~40GB, w/o indexes), and the conversion process takes some time. It is done by multiple workers on a separate Linux environment, piece by piece.
>
> When we start the migration, at first it looks good. Performance is good, and it runs smoothly. After a few hours, we noticed that things started to slow down. Some queries seemed to be stuck, so we waited for them to end, and restarted the server. After that it went well for some time (~10 minutes), then it slowed down again. We tried again (a few times), and the pattern repeats.

If you're moving a lot of data (especially if the destination database is empty), one possible problem is statistics. This generally is not a problem in regular operation, because the data growth is gradual and autovacuum analyzes the tables regularly, but in batch processes this is often a big issue.

The usual scenario is that there's an empty (or very small) table, where indexes are inefficient so PostgreSQL plans the queries with sequential scans. The table suddenly grows, which would make indexes efficient, but the planner has no idea about that until autovacuum kicks in. But before that happens, the batch process executes queries on that table.

Try adding ANALYZE after steps that add a lot of data (a sketch is below).

> My postgresql-specific problem is that it looks like the server gets stuck. CPU usage is <10%, RAM usage is under 50% max, there is no noticeable disk usage. But there are some (<10) active queries, some of which may take several hours to complete. Those queries work properly (i.e. < 1min) right after the server restarts.

That's a bit strange. Essentially what you're saying is that the workload is neither CPU nor I/O bound. To make it CPU bound, at least one CPU would have to be 100% utilized, and with 4 CPUs that's 25%, but you're saying there's only 10% used. But you're saying I/O is not the bottleneck either.

> So my question is: What could slow the queries from ~1min to 2 hours which does not involve CPU, memory, or disk usage, and which would "reset" when restarting the server?

A lot of things, unfortunately, and the fact that this is a migration moving data between two databases makes it even more complicated. The virtualization does not make it less complex either.

For example, are you sure it's not stuck on the other database? I assume you're running some long queries, so maybe it's stuck there and the destination database is just waiting for data? That'd be consistent with the low CPU and I/O usage you observe.

Locking is another possibility, although it probably is not the only cause - it'd be utilizing at least one CPU otherwise.

> For information, the number of processes does not seem to be the problem, there are ~20 connections with max_connections set to 100.
>
> We noticed at some point that the hard drive holding the target database was heavily fragmented (100%...), but defrag did not seem to change anything.

If it was a problem, you'd see high I/O usage. And that's not the case.

> Also, the queries that appear to get stuck are "heavy" queries, though after a fresh restart they execute in a reasonable time.

Does the plan change? If not, check waiting locks in pg_locks.

> Finally, whatever causes the database to wait also causes the Windows instance to slow down. But restarting PostgreSQL fixes this as well.

That's a bit strange, I guess. If CPU and I/O usage really are light, the instance should not be slow, unless there's something else going on - possibly at the virtualization level (e.g. another busy instance on the same hardware, some sort of accounting that limits the resources after a time, etc.)

> Configuration:
> The PostgreSQL server runs on a Windows virtual machine under VMware. The VM has dedicated resources, and the only other VM on the host is the application server (which runs idle while waiting for the database). There is nothing else running on the server except postgresql (well, there were other things, but we stopped everything to no avail).
>
> PostgreSQL 9.3.5, compiled by Visual C++ build 1600, 64-bit

You're 4 minor versions behind.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
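A minimal sketch of the ANALYZE suggestion - the table name is made up, and this would go into the migration workers right after each bulk-load step:

  -- bulk-load a converted batch into the target table (INSERT or COPY)
  INSERT INTO target_table SELECT ... ;

  -- refresh planner statistics immediately, instead of waiting for autovacuum,
  -- so subsequent queries against the freshly-loaded table can pick index scans
  ANALYZE target_table;

ANALYZE on a single table is cheap (it samples a fixed number of pages), so running it after every sizable batch is usually negligible compared to the load itself.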
Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison
On 08/31/2015 06:09 PM, twoflower wrote:
> I have the following three tables:
>
> ...
>
> This is the query plan:
> <http://postgresql.nabble.com/file/n5864045/qp1.png>
>
> If I remove the WHERE condition, it returns immediately. Am I doing something obviously wrong?

Please share explain plans for both the slow and the fast query. That makes it easier to spot the difference, and possibly identify the cause.

Also, what PostgreSQL version is this, and what are the "basic" config parameters (shared_buffers, work_mem)?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PERFORM] Index creation running now for 14 hours
Hi,

On 08/26/2015 11:53 PM, Tory M Blue wrote: On Wed, Aug 26, 2015 at 2:45 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote: On Wed, Aug 26, 2015 at 1:26 PM, Tory M Blue tmb...@gmail.com wrote: Right now the 100% cpu process which is this index is only using 3.5GB and has been for the last 15 hours If 100% cpu, you can do 'sudo perf top' to see what the CPU is busy about. Regards, Qingqing I appreciate the attempted help, but I know what it's doing, it's creating indexes for the last 14+ hours.

Sure, but what exactly was it doing? 'perf top' might give us a hint which function is consuming most of the time, for example.

I've killed it now, as it was about to run my machine out of disk space, stopped it at 97% full, could not go any longer.

Which suggests it's using a lot of temp files. Indexes are built by reading all the necessary data from the table (just the columns), sorting it, and then building the index from the sorted data (because that can be done very efficiently - much faster than simply inserting the tuples into the btree index). The fact that you ran out of disk space probably means that you don't have enough space for the sort (it clearly does not fit into maintenance_work_mem), and there's no way around that - you need enough disk space.

I will now clean up the table a bit but will still have 500million rows with 6 indexes on it. I will create the indexes after the data is laid down vs during, so it doesn't block my other table replications. I will then fire off my index creations in parallel for my other tables so I can actually use the hardware the DB is sitting on.

That's a very bad idea, because each of the index builds will require disk space for the sort, and you're even more likely to run out of disk space.

But I guess the answer is, no real way to tell what the box is doing when it's creating an index. Yes there was a lock, no I could not find a way to see how it's progressing so there was no way for me to gauge when it would be done.

Had it been waiting on a lock, it wouldn't consume 100% of CPU.

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation running now for 14 hours
On 08/26/2015 10:26 PM, Tory M Blue wrote: the table is 90GB without indexes, 285GB with indexes and bloat, The row count is not actually completing.. 125Million rows over 13 months, this table is probably close to 600million rows.

You don't need to do SELECT COUNT(*) if you only need an approximate number. You can look at pg_class.reltuples:

    SELECT reltuples FROM pg_class WHERE relname = 'impressions';

That should be a sufficiently accurate estimate.

The above is when it had finished copying the table and started on the index.. Well as I said I'm running out of storage as the index is creating some serious data on the filesystem, I'll have to kill it, try to massage the data a bit and increase the maintenance_work_mem to use some of my 256GB of ram to try to get through this. Right now the 100% cpu process which is this index is only using 3.5GB and has been for the last 15 hours

Please post details on the configuration (shared_buffers, work_mem, maintenance_work_mem and such). BTW while the CREATE INDEX is reporting 3.5GB, it most likely wrote a lot of data into on-disk chunks when sorting the data. So it's actually using the memory through the page cache (i.e. don't increase maintenance_work_mem too much, you don't want to force the data to disk needlessly).

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
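If the maintenance_work_mem increase is attempted anyway, it only needs to be raised for the session running the build; a minimal sketch (the 2GB value, index name and column name are only illustrative):

    SET maintenance_work_mem = '2GB';
    CREATE INDEX impressions_created_idx ON impressions (created_dt);
    RESET maintenance_work_mem;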
Re: [PERFORM] query not using GIN index
Hi,

On 08/22/2015 03:55 AM, Guo, Yun wrote: Hi, We have a query on a column with a GIN index, but the query plan chooses not to use the index and does a seq scan, which is much slower.

    CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING gin (access_tokens);

    explain analyze SELECT access_grants.* FROM access_grants WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']);

    QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------
     Limit (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377 rows=1 loops=1)
       ->  Seq Scan on access_grants (cost=0.00..29718.03 rows=3985 width=157) (actual time=260.373..260.373 rows=1 loops=1)
             Filter: (access_tokens @> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
             Rows Removed by Filter: 796818
     Total runtime: 260.408 ms

I find it very likely that the explain output actually comes from a slightly different query, including a LIMIT 1 clause. That might easily be the problem here, because the optimizer expects the 3985 matches to be uniformly distributed in the table, so it thinks it'll scan just a tiny fraction of the table (1/3985) until the first match. But it's quite possible all the matching rows are at the end of the table, and the executor has to actually scan the whole table. It's difficult to say without further details of the table and how the data are generated.

We tested on a smaller table in the development region and it chooses to use the index there. However, on the production size table it decides to ignore the index for unknown reasons.

Please provide explain output from that table. It's difficult to say what's different without seeing the details. Also please provide important details about the system (e.g. which PostgreSQL version, how much RAM, what work_mem/shared_buffers and such stuff).

Is the large number of tuples skewing the query planner's decision, or is the index itself larger than the table, therefore it would decide to do a table scan?

What large number of tuples? The indexes are supposed to be more efficient the larger the table is.

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
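One quick way to check whether the planner is able to use the GIN index at all (rather than merely preferring the seq scan) is to disable sequential scans for a single session and re-run the explain - a standard diagnostic trick, not a production setting:

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT access_grants.* FROM access_grants
     WHERE access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734'];
    RESET enable_seqscan;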
Re: [PERFORM] Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?
On 06/15/15 05:44, Kaijiang Chen wrote: I've checked the source codes in postgresql 9.2.4. In function static bool grow_memtuples(Tuplesortstate *state) the codes:

    /*
     * On a 64-bit machine, allowedMem could be high enough to get us into
     * trouble with MaxAllocSize, too.
     */
    if ((Size) (state->memtupsize * 2) >= MaxAllocSize / sizeof(SortTuple))
        return false;

Note that MaxAllocSize == 1GB - 1. That means, at least for sorting, it uses at most 1GB work_mem! And setting larger work_mem has no use at all...

That's not true. This only limits the size of the 'memtuples' array, which only stores a pointer to the actual tuple, and some additional data. The tuple itself is not counted against MaxAllocSize directly. The SortTuple structure has ~24B, which means you can track 33M tuples in that array, and the tuples may take a lot more space.

regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query - lots of temporary files.
On 06/10/15 15:42, Johann Spies wrote: On 10 June 2015 at 15:02, Claudio Freire klaussfre...@gmail.com wrote: The joins are different on both versions, and the most likely culprit is the join against D. It's probably wrong, and the first query is building a cartesian product. Without more information about the schema it's difficult to be sure though. Thanks for your reply. I will experiment further with different joins.

I don't know what you mean by experimenting with joins - that should be determined by the schema. The problematic piece of the explain plan is this:

    ->  Merge Join (cost=4384310.92..21202716.78 rows=6664163593 width=390)
          Output: a.ut, c.gt, b.go, b.gn, d.au
          Merge Cond: ((c.ut)::text = (d.rart_id)::text)

That is, the planner expects ~6.7 billion rows, each ~390B wide. That's ~2.5TB of data that needs to be stored to disk (so that the sort can process it).

The way the schema is designed might be one of the issues - ISTM the 'ut' column is somehow universal, mixing values referencing different columns in multiple tables. Not only is that utterly misleading for the planner (and may easily cause issues with huge intermediate results), but it also makes formulating the queries very difficult. And of course, the casting between text and int is not very good either. Fix the schema to follow relational best practices - separate the values into multiple columns, and most of this will go away.

regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/04/15 02:58, Scott Marlowe wrote: On Wed, Jun 3, 2015 at 6:53 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Jun 3, 2015 at 4:29 PM, Joshua D. Drake j...@commandprompt.com wrote: On 06/03/2015 03:16 PM, Tomas Vondra wrote: Cache is not free memory - it's there for a purpose and usually plays a significant role in performance. Sure, it may be freed and used for other purposes, but that has consequences - e.g. it impacts performance of other queries etc. You generally don't want to do that on production. Exactly. If your cache is reduced your performance is reduced because less things are in cache. It is not free memory. Also the command free is not useful in this scenario. It is almost always better to use sar so you can see where the data points are that free is using. But if that WAS happening he wouldn't still HAVE 60G of cache! That's my whole point. He's NOT running out of memory. He's not even having to dump cache right now.

No one claimed he's running out of memory ... What I claimed is that considering page cache equal to free memory is not really appropriate, because it is used for caching data, which plays a significant role.

Regarding the free output, we have no clue when the free command was executed. It might have been executed while the query was running, right after it failed, or long after that. That has significant impact on interpretation of the output. Also, we have no clue what happens on the machine, so it's possible there are other queries competing for the page cache, quickly reusing free memory (used for the large query moments ago) for page cache. And finally, we have no clue what plan the query is using, so we don't know how much memory it's using before it starts spilling to disk. For example it might easily be a single sort node, taking only 384MB (the work_mem) of RAM before it starts spilling to disk.

Further if he started using a few gig here for this one it wouldn't have a big impact on cache (60G-1G etc) but might make it much faster, as spilling to disk is a lot less intrusive when you've got a bigger chunk of ram to work in. OTOH doing something like setting work_mem to 60G would likely be fatal.

It'd certainly be fatal, because this query is spilling 95G to disk, and keeping that in memory would easily require 2-3x more space.

But he's not down to 3GB of memory by any kind of imagination. Any working machine will slowly, certainly fill its caches since it's not using the memory for anything else. That's normal. As long as you're not blowing out the cache you're fine.

Once again, what about the 15GB shared buffers? Not that it'd change anything, really.

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/03/15 17:09, Scott Marlowe wrote: On Wed, Jun 3, 2015 at 8:56 AM, Tomas Vondra I don't see why you think you have less than 3GB used. The output you posted clearly shows there's only ~300MB memory free - there's 15GB shared buffers and ~45GB of page cache (file system cache). Because you subtract cached from used to see how much real spare memory you have. The kernel will dump cached mem as needed to free up space for memory usage. So 64141-61761=2380MB used. Well, except that 15GB of that is shared_buffers, and I wouldn't call that 'free'. Also, I don't see page cache as entirely free - you probably want at least some caching at this level. In any case, even if all 64GB were free, this would not be enough for the query that needs 95GB for temp files. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/03/15 23:18, Scott Marlowe wrote: On Wed, Jun 3, 2015 at 1:24 PM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 06/03/15 17:09, Scott Marlowe wrote: On Wed, Jun 3, 2015 at 8:56 AM, Tomas Vondra Well, except that 15GB of that is shared_buffers, and I wouldn't call that 'free'. Also, I don't see page cache as entirely free - you probably want at least some caching at this level. In any case, even if all 64GB were free, this would not be enough for the query that needs 95GB for temp files.

You can argue all you want, but this machine has plenty of free memory right now, and unless the OP goes crazy and cranks up work_mem to some much higher level it'll stay that way, which is good. There's far far more than 300MB free here. At the drop of a hat there can be ~60G freed up as needed, either for shared_buffers or work_mem or other things to happen. Cache doesn't count as used in terms of real memory pressure. IE you're not gonna start getting swapping because you need more memory, it'll just come from the cache.

Please, could you explain how you free 60GB 'as needed' when 15GB of that is actually used for shared buffers? Also, we don't know how much of that cache is 'dirty', which makes it more difficult to free.

What is more important, though, is the amount of memory. The OP reported the query writes ~95GB of temp files (and dies because of a full disk, so there may be more). The on-disk format is usually more compact than the in-memory representation - for example an on-disk sort often needs 3x less space than an in-memory qsort. So we can assume the query needs 95GB of data. Can you explain how that's going to fit into the 64GB RAM?

Cache is free memory. If you think of it any other way when you're looking at memory usage and pressure on things like swap you're gonna make some bad decisions.

Cache is not free memory - it's there for a purpose and usually plays a significant role in performance. Sure, it may be freed and used for other purposes, but that has consequences - e.g. it impacts performance of other queries etc. You generally don't want to do that on production.

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/04/15 01:54, Yves Dorfsman wrote: On 2015-06-03 16:29, Joshua D. Drake wrote: On 06/03/2015 03:16 PM, Tomas Vondra wrote: What is more important, though, is the amount of memory. The OP reported the query writes ~95GB of temp files (and dies because of a full disk, so there may be more). The on-disk format is usually more compact than the in-memory representation - for example an on-disk sort often needs 3x less space than an in-memory qsort. So we can assume the query needs 95GB of data. Can you explain how that's going to fit into the 64GB RAM? Cache is free memory. If you think of it any other way when you're looking at memory usage and pressure on things like swap you're gonna make some bad decisions. Cache is not free memory - it's there for a purpose and usually plays a significant role in performance. Sure, it may be freed and used for other purposes, but that has consequences - e.g. it impacts performance of other queries etc. You generally don't want to do that on production. Exactly. If your cache is reduced your performance is reduced because less things are in cache. It is not free memory. Also the command free is not useful in this scenario. It is almost always better to use sar so you can see where the data points are that free is using.

It's one thing to consciously keep free memory for the OS cache, but you should not take the free column from the first line of output of the program free as meaning that's all there is left, or that you need all that memory.

No one suggested using the 'free' column this way, so I'm not sure what you're responding to?

You should look at used from the second line (-/+ buffers/cache). That value is what the kernel and all the apps are using on your machine. Add whatever you want to have for OS cache, and this is the total amount of memory you want in your machine.

Except that the second line is not particularly helpful either, because it does not account for the shared buffers clearly, nor does it show what part of the page cache is dirty etc.

Note that for a machine that has run long enough, and done enough I/O ops, free from the first line will always be close to 0, because the OS tries to use as much memory as possible for caching; do enough I/O and you'll fill that up.

That's generally true, but the assumption is that on a 300GB database the page cache has a significant benefit for performance. What however makes this approach utterly futile is the fact that the OP has only 64GB of RAM (and only ~45GB of that in page cache), and the query writes 95GB temp files on disk (and then fails). So even if you drop the whole page cache, the query will fail anyway.

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/03/15 15:27, chiru r wrote: Hi Benjamin, It looks like you are facing a disk space issue for queries. In order to avoid the disk space issue you can do the following. 1) Increase the work_mem parameter at session level before executing the queries. 2) If you observe the disk space issue with particular user queries, increase the work_mem parameter at user level.

The suggestion to increase work_mem is a bit naive, IMHO. The query is writing ~95GB to disk, and it usually takes more space to keep the same data in memory. They only have 64GB of RAM ... In the good case, it will crash just like now. In the worse case, the OOM killer will intervene, possibly crashing the whole database.

3) Check with the developer to tune the query.

That's a better possibility. Sadly, we don't know what the query is doing, so we can't judge how much it can be optimized.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)
On 06/03/15 16:06, ben.play wrote: The query is (unfortunately) generated by Doctrine 2 (Symfony 2). We can't change the query easily.

Well, then you'll probably have to buy more RAM, apparently.

This is my config:

    max_connections = 80
    shared_buffers = 15GB
    work_mem = 384MB
    maintenance_work_mem = 1GB
    #temp_buffers = 8MB
    #temp_file_limit = -1
    effective_cache_size = 44GB

If I put a temp_file_limit … will all my queries (which have to write on disk) crash? As you can see… I have 64 GB of RAM, but less than 3 GB is used!

    ben@bdd:/home/benjamin# free -m
                 total       used       free     shared    buffers     cached
    Mem:         64456      64141        315      15726         53      61761
    -/+ buffers/cache:       2326      62130
    Swap:         1021         63        958

Thanks guys for your help :)

I don't see why you think you have less than 3GB used. The output you posted clearly shows there's only ~300MB memory free - there's 15GB shared buffers and ~45GB of page cache (file system cache).

But you still haven't shown us the query (the EXPLAIN ANALYZE of it), so we can't really give you advice.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
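Regarding the temp_file_limit question above: the limit applies per session, and a query whose temporary files exceed it is cancelled with an error - the server itself keeps running and other sessions are unaffected. A minimal sketch, with an arbitrary value:

    SET temp_file_limit = '50GB';  -- per-session cap on temp file space; the value here is only an example
    -- a query exceeding the limit fails with an error instead of filling the disk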
Re: [PERFORM] Connection time when using SSL
Hi,

On 06/01/15 22:51, Marco Di Cesare wrote: I am connecting to a Postgres instance using SSL and seeing fairly slow connect times. I would expect there would be some overhead but it's more than I anticipated. The connection is happening over a network. I am using a wildcard SSL certificate on the server side only.

Using one of these JDBC SSL connect strings takes on average 1060 ms to connect to the database:

    jdbc:postgresql://db01-dev.pointclickcare.com:5432/testdb?ssl=true&sslmode=require&sslfactory=org.postgresql.ssl.jdbc4.LibPQFactory
    - or -
    jdbc:postgresql://db01-dev.pointclickcare.com:5432/testdb?ssl=true&sslmode=require&sslfactory=org.postgresql.ssl.NonValidatingFactory

Using this JDBC non-SSL connect string takes on average 190 ms to connect to the database:

    jdbc:postgresql://db01-dev.pointclickcare.com:5432/testdb

Does this sound like a reasonable overhead that SSL would add to the connection time or does this seem high? (~870ms / ~443% slower using SSL)

What is the network latency (ping) between the two hosts? SSL requires a handshake, exchanging a number of messages between the two hosts, and if each roundtrip takes a significant amount of time ...

The 190ms seems quite high. On my rather slow workstation, a local connection without SSL takes ~30ms, with SSL ~70ms. So I wouldn't be surprised by ~100ms roundtrips in your case, and that is going to slow down the SSL handshake significantly.

There's very little you can do with the roundtrip time, usually, but you can keep the connections open in a pool. That'll amortize the costs.

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different plan for very similar queries
On 05/31/15 18:22, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: On 05/31/15 13:00, Peter J. Holzer wrote: (There was no analyze on facttable_stat_fta4 (automatic or manual) on facttable_stat_fta4 between those two tests, so the statistics on facttable_stat_fta4 shouldn't have changed - only those for term.) So maybe there was autoanalyze, because otherwise it really should be the same in both plans ... No, because that's the inside of a nestloop with significantly different outer-side rowcount estimates. The first case gets a benefit from the expectation that it will be re-executed many times (see the impact of loop_count on cost_index). Meh, I got confused by the plan a bit - I thought there's a problem in the outer path (e.g. change of row count). But actually this is the path scanning the 'term' table, so the change is expected there. The fact that the index scan cost 'suddenly' grows from 386k to 2M is confusing at first, but yeah - it's caused by the 'averaging' in cost_index() depending on loop_count. But I think this does not really change the problem with eliminating inner paths solely on the basis of total cost - in fact it probably makes it slightly worse, because the cost also depends on estimates in the outer path (while the bitmapscan does not). -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different plan for very similar queries
On 05/31/15 13:00, Peter J. Holzer wrote: [I've seen in -hackers that you already seem to have a fix] On 2015-05-30 15:04:34 -0400, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: Why exactly the second query uses a much slower plan I'm not sure. I believe I've found an issue in planning semi joins (reported to pgsql-hackers a few minutes ago), but I may be wrong and the code is OK. I think you are probably right that there's a bug there: the planner is vastly overestimating the cost of the nestloop-with-inner-indexscan plan. However, the reason why the mergejoin plan gets chosen in some cases seems to be that an additional estimation error is needed to make that happen; otherwise the nestloop still comes out looking cheaper. The undesirable case looks like:

    Merge Semi Join (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
      Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
      ->  Index Scan using term_term_idx on term t (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
            Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))

Just noticed that this is a bit strange, too: This scans the whole index term_term_idx and for every row found it checks the table for the filter condition. So it has to read the whole index and the whole table, right? But the planner estimates that it will return only 636 rows (out of 6.1E6), so using term_facttablename_columnname_idx to extract those 636 and then sorting them should be quite a bit faster (even just a plain full table scan and then sorting should be faster).

That seems a bit strange, yes. I don't see why a simple index scan (with Index Cond), expected to produce 636, should be more expensive than scanning the whole index (with a Filter). Even if there's an additional Sort node, sorting those 636 rows. But I've been unable to reproduce that (both on 9.1 and HEAD) without significant 'SET enable_*' gymnastics, so I'm not sure why that happens. Don't you have some 'enable_sort=off' or something like that? A test case with a data set would help a lot, in this case.

Another thought: For the merge semi join postgresql doesn't actually have to scan the whole inner index. It can skip from the first 'm' entry to the first 'n' entry reading only a few non-leaf blocks, skipping many leaf blocks in the process. The times (7703.917..30948.271) indicate that it doesn't actually do this, but maybe the planner assumes it does?

How would it know how far to skip? I mean, assume you're on the first 'n' entry - how do you know where the first 'm' entry is? If you only really need to check existence, a nested loop with an inner index scan is probably the right thing anyway, especially if the number of outer rows (and thus loops performed) is quite low. This is clearly demonstrated by the first plan in this thread:

    QUERY PLAN
    ----------------------------------------------------------------
    ... Nested Loop Semi Join (cost=0.00..384860.48 rows=1 width=81 ...
      ->  Index Scan using term_facttablename_columnname_idx on ...
            Index Cond: (((facttablename)::text = 'facttable_sta ...
      ->  Index Scan using facttable_stat_fta4_einheit_idx on fa ...
            Index Cond: ((einheit)::text = (t.term)::text)
    Total runtime: 0.173 ms (6 rows)

This is probably the best plan you can get in cases like this ...

I also suspected that the culprit is the columnname column.
That one has a very skewed distribution:

    wdsah=> select columnname, count(*) from term group by columnname order by count(*);
         columnname      |  count
    ---------------------+---------
     warenstrom          |       3
     einheit             |       3
     berechnungsart      |       3
     og                  |      26
     berichtsregion      |     242
     partnerregion       |     246
     sitcr4              |    4719
     kurzbezeichnung     | 1221319
     macrobondtimeseries | 1221320
                         | 3661206
    (10 rows)

So random variation in the sample could throw off the estimated frequencies of the least frequent columnnames by quite a bit. But given that both plans estimated the number of rows returned by the outer index scan as 636, that was probably a red herring. But there does seem to be a connection to this column: In one case pg_stats contained n_distinct=7 and only the two most common values. Then the plan looked like this:

    wdsah=> explain analyze
            select facttablename, columnname, term, concept_id, t.hidden, language, register
              from term t
             where facttablename='facttable_stat_fta4' and columnname='warenstrom'
               and exists (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term);
    QUERY PLAN
Re: [PERFORM] Postmaster eating up all my cpu
Hi, On 05/28/15 13:25, birimblongas wrote: Hi, My app was working just fine. A month ago postmaster started to eat all my cpu sometimes (not all day, but a lot of times and everyday) and then my app gets really slow and sometimes don't even complete the requests. What could it be? A lot of things. The first step should be looking at pg_stat_activity, what is the process eating the CPU doing. We also need much more information about your system - what PostgreSQL version are you using, what kind of OS, configuration etc. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
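A minimal sketch of that first step - on 9.2 and newer the relevant columns are pid, state and query (older releases use procpid and current_query instead):

    SELECT pid, state, now() - query_start AS runtime, query
      FROM pg_stat_activity
     WHERE state <> 'idle'
     ORDER BY runtime DESC;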
Re: [PERFORM] Postgres is using 100% CPU
Hi,

On 05/30/15 15:46, Ashik S L wrote: We are using postgres SQL version 8.4.17..

FYI 8.4 is already unsupported for ~1 year, so you should consider upgrading to a newer release. Also, the newest version in that branch is 8.4.22, so with 8.4.17 you're missing ~1 year of patches.

Postgres DB size is 900 MB and we are inserting 273 rows at once, and each row is of 60 bytes. Every time we insert 16380 bytes of data.

So you insert 273 rows and it gets stuck for 40 minutes? That's really strange, and I find it rather unlikely even with a badly misconfigured instance. It should easily insert thousands of rows per second. Can you elaborate more about the database structure, or at least the table(s) you're inserting into. Are there any foreign keys (in either direction), indexes or triggers?

I tried to make some config changes using the above link. But I did not see any improvement. I made the following changes in the postgresql.conf file: shared_buffers = 512MB // It was 32MB work_mem = 30MB effective_cache_size = 512MB // I tried with 128MB and 256MB also. Please let me know any config changes that I can try out.

I don't think this has anything to do with configuration. This seems like an issue at the application level, or maybe a poorly designed schema. You mentioned you have multiple machines, and only some of them are having this issue. What are the differences between the machines? Are all the machines using the same schema? I assume each has a different amount of data.

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
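To answer the question about foreign keys, something like this can be run against the table being inserted into (the table name is only a placeholder; this works on 8.4 as well):

    SELECT conname,
           conrelid::regclass  AS referencing_table,
           confrelid::regclass AS referenced_table
      FROM pg_constraint
     WHERE contype = 'f'
       AND (conrelid = 'my_table'::regclass OR confrelid = 'my_table'::regclass);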
Re: [PERFORM] Different plan for very similar queries
Hi,

On 05/29/15 11:51, Peter J. Holzer wrote: A couple of additional observations: The total cost of both queries is quite similar, so random variations might push it into one direction or the other. Indeed, after dropping and recreating indexes (I tried GIN indexes as suggested by Heikki on [1]) and calling analyze after each change, I have now reached a state where both queries use the fast plan.

I don't think GIN indexes are a particularly good match for this use case. The queries need to check the existence of a few records, and btree indexes are great for that - the first plan is very fast.

Why exactly the second query uses a much slower plan I'm not sure. I believe I've found an issue in planning semi joins (reported to pgsql-hackers a few minutes ago), but I may be wrong and the code is OK. Can you try forcing the same plan for the second query, using enable flags? E.g.

    SET enable_mergejoin = off;

will disable the merge join, and push the optimizer towards a different join type. You may have to disable a few more node types until you get the same plan as for the first query, i.e. nestloop semi join -> index scan -> index scan. See this for more info: http://www.postgresql.org/docs/9.1/static/runtime-config-query.html

Also, have you tuned the PostgreSQL configuration? How? Can you provide the dataset? Not necessarily all the columns, it should be sufficient to provide the columns used in the join/where clauses: term - facttablename, columnname, term; facttable_stat_fta4 - einheit, berechnungsart. That'd make reproducing the problem much easier.

In the first case the query planner seems to add the cost of the two index scans to get the total cost, despite the fact that for a semi join the second index scan can be aborted after the first hit (so either the cost of the second scan should be a lot less than 384457.80 or it needs to be divided by a large factor for the semi join). In the second case the cost of the second index scan (2545748.85) is either completely ignored or divided by a large factor: it doesn't seem to contribute much to the total cost.

I believe this is a consequence of the semi join semantics, because the explain plan contains total costs and row counts, as if the whole relation was scanned (in this case all the 43M rows), but the optimizer only propagates a fraction of the cost estimate (depending on how much of the relation it expects to scan). In this case it expects to scan a tiny part of the index scan, so the impact on the total cost is small. A bit confusing, yeah.

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres is using 100% CPU
On 05/29/15 20:10, Ashik S L wrote: Hi All, I am using postgresDB on redhat machine which is having 4GB RAM machine. As soon as it starts to Inserting rows into the postgres DB it will reach 100%cpu. It will comedown to normal after 40 minutes. I tried perform some tuning on the postgres DB, But result was same.I am not postgres DB expert. Even we are not seeing in all machine. Only few machines we are seeing this issue. Any help on this would be appreciated. Ashik, before pointing you to this list, I asked for some basic information that are needed when diagnosing issues like this - database size, postgres version etc. We can't really help you without this info, because right now we only know you're doing some inserts (while before you mentioned updates), and it's slow. Also, can you please provide info about the configuration and what changes have you done when tuning it? Have you seen this? https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] extract(year from date) doesn't use index but maybe could?
On 04/19/15 22:10, Jon Dufresne wrote: On Sun, Apr 19, 2015 at 10:42 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: Or you might try creating an expression index ... CREATE INDEX date_year_idx ON dates((extract(year from d))); Certainly, but won't this add additional overhead in the form of two indexes; one for the column and one for the expression? It will, but it probably will be more efficient than poorly performing queries. Another option is to use the first type of queries with explicit date ranges, thus making it possible to use a single index. My point is, why force the user to take these extra steps or add overhead when the the two queries (or two indexes) are functionally equivalent. Shouldn't this is an optimization handled by the database so the user doesn't need to hand optimize these differences? Theoretically yes. But currently the extract function call is pretty much a black box for the planner, just like any other function - it has no idea what happens inside, what fields are extracted and so on. It certainly is unable to infer the date range as you propose. It's possible that in the future someone will implement an optimization like this, but I'm not aware of anyone working on that and I wouldn't hold my breath. Until then you either have to create an expression index, or use queries with explicit date ranges (without extract calls). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
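The 'explicit date ranges' rewrite mentioned above would look like this - it returns the same rows as the extract() form for the year 1900 and can use the plain index on d (table and column names come from the thread):

    SELECT * FROM dates
     WHERE d >= '1900-01-01' AND d < '1901-01-01';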
Re: [PERFORM] extract(year from date) doesn't use index but maybe could?
On 04/19/15 19:16, Jon Dufresne wrote: Given the table:

    CREATE TABLE dates (id SERIAL, d DATE NOT NULL, t TEXT NOT NULL)

With an *index* on field d. The following two queries are functionally equivalent:

    1. SELECT * FROM dates WHERE d >= '1900-01-01' AND d <= '1900-12-31'
    2. SELECT * FROM dates WHERE EXTRACT(year from d) = 1900

By functionally equivalent, they will return the same result set. Query 2 does not use the index, adding a performance cost. It seems there is an opportunity for optimization to handle these two queries equivalently to take advantage of the index.

Or you might try creating an expression index ...

    CREATE INDEX date_year_idx ON dates((extract(year from d)));

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows
On 26.3.2015 08:48, Jeff Janes wrote: OK, this is starting to look like a long-standing bug to me. If it only sees 3 distinct values, and all three are present at least twice, it throws all of them into the MCV list. But if one of those 3 were present just once, then it tests them to see if they qualify. The test for inclusion is that it has to be present more than once, and that it must be over-represented by 25%. Let's say it sampled 30,000 rows and found 29,900 of one value, 99 of another, and 1 of a third. But that turns into the second one needing to be present 12,500 times. The average value is present 10,000 times (30,000 samples with 3 distinct values) and 25% more than that is 12,500. So it is excluded. It seems to me that a more reasonable criterion is that it must be over-represented 25% compared to the average of all the remaining values not yet accepted into the MCV list. I.e. all the greater ones should be subtracted out before computing the over-representation threshold.

That might work IMO, but maybe we should increase the coefficient a bit (say, from 1.25 to 2), not to produce needlessly long MCV lists.

It is also grossly inconsistent with the other behavior. If they are 29900; 98; 2 then all three go into the MCV.

Isn't the mincount still 12500? How could all three get into the MCV?

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows
On 26.3.2015 17:35, Jeff Janes wrote: On Thu, Mar 26, 2015 at 5:44 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: That might work IMO, but maybe we should increase the coefficient a bit (say, from 1.25 to 2), not to produce needlessly long MCV lists. That wouldn't work here, because at the point of decision the value present 99 times contributes half the average, so the average is 50, and of course it can't possibly be twice of that.

Oh, right. How could I miss that? ;-)

I have a patch, but is there a way to determine how it affects a wide variety of situations? I guess run `make installcheck`, then analyze, then dump pg_stats, with the patch and without the patch, and then compare the dumps?

I doubt there's such a way. I'd argue that if you can show this always generates longer MCV lists, we can assume the stats are probably more accurate, and thus the plans should be better. Of course, there's always the possibility that the plan was good by luck, and improving the estimates will result in a worse plan. But I don't think we can really fix that :-(

It is also grossly inconsistent with the other behavior. If they are 29900; 98; 2 then all three go into the MCV. Isn't the mincount still 12500? How could all three get into the MCV? If all observed values are observed at least twice, it takes a different path through the code. It just keeps them all in the MCV list. That is what is causing the instability for the OP. If the 3rd most common is seen twice, then all three are kept. If it is seen once, then only the most common is kept. See the if statements at lines 2494 and 2585:

    else if (toowide_cnt == 0 && nmultiple == ndistinct)

    if (track_cnt == ndistinct

Aha, I think I see it now. I've been concentrating on this code:

    avgcount = (double) samplerows / ndistinct;
    /* set minimum threshold count to store a value */
    mincount = avgcount * 1.25;
    if (mincount < 2)
        mincount = 2;

but this is actually too late, because first we do this:

    else if (toowide_cnt == 0 && nmultiple == ndistinct)
    {
        stats->stadistinct = ndistinct;
    }

and that only happens if each item is observed at least 2x in the sample (and the actual Haas and Stokes estimator is not used). And then we do this:

    if (track_cnt == ndistinct && toowide_cnt == 0 &&
        stats->stadistinct > 0 &&
        track_cnt <= num_mcv)
    {
        num_mcv = track_cnt;
    }

so that we track everything. If at least one value is seen only 1x, it works differently, and we use the code with the (1.25*avgcount) threshold. I wonder where the 1.25x threshold comes from - whether it's something we came up with, or if it comes from some paper. I guess the former.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows
On 25.3.2015 13:04, Feike Steenbergen wrote: ... When analyzing pg_stats we sometimes have the following: (Note: 'NOT_YET_PRINTED' has not been found during this analyze, these are real values)

    attname           | status
    inherited         | f
    null_frac         | 0
    avg_width         | 4
    n_distinct        | 3
    most_common_vals  | {PRINTED}
    most_common_freqs | {0.996567}
    histogram_bounds  | {PREPARED,ERROR}
    correlation       | 0.980644

A question about this specific entry, which some of you may be able to shed some light on: most_common_vals contains only 1 entry, why is this? I would expect to see 3 entries, as it has n_distinct=3

To be included in the MCV list, the value has to actually appear in the random sample at least twice, IIRC. If the values are very rare (e.g. if you only have such 10 rows out of 3.5M), that may not happen. You may try increasing the statistics target for this column, which should make the sample larger and the stats more detailed (the maximum is 10000, which should use a sample of ~3M rows, i.e. almost the whole table).

When looking at http://www.postgresql.org/docs/current/static/row-estimation-examples.html we can see that an estimate > 5000 is what is to be expected for these statistics:

    # select ( (1 - 0.996567)/2 * 3500000 )::int;
     int4
    ------
     6008
    (1 row)

But why does it not record the frequency of 'PREPARED' and 'ERROR' in most_common_*?

Can you post results for this query?

    SELECT status, COUNT(*) FROM print_list GROUP BY 1;

I'd like to know how frequent the other values are.

Our current strategy in mitigating this problem is decreasing the autovacuum_*_scale_factor for this specific table, therefore triggering more analyses and vacuums.

I'm not sure this is a good solution. The problem is elsewhere, IMHO.

This is helping somewhat, as if the problem occurs it is often solved automatically when autoanalyze analyzes this table; it is analyzed many times an hour currently. We can also increase the 'Stats target' for this table, which will cause the statistics to contain information about 'NOT_YET_PRINTED' more often, but even then, it may not find any of these records, as they sometimes do not exist.

This is a better solution, IMHO.

Could you help us to find a strategy to troubleshoot this issue further?

You might also make the index scans cheaper, so that the switch to sequential scan happens later (when more rows are estimated). Try decreasing random_page_cost from 4 (default) to 1.5 or something like that. It may hurt other queries, though, depending on the dataset size etc.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
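Raising the statistics target for just that column (rather than globally) might look like this - 10000 is the maximum, and a smaller value may already be enough; table and column names are taken from the thread:

    ALTER TABLE print_list ALTER COLUMN status SET STATISTICS 10000;
    ANALYZE print_list;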
Re: [PERFORM] Performance issues
On 22.3.2015 22:50, Vivekanand Joshi wrote: Any documentation regarding how to configure postgresql.conf file as per individual user?

That can't be done in postgresql.conf, but by ALTER ROLE commands.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
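For example (the role name and values are only illustrative; settings attached to a role take effect for new sessions of that role):

    ALTER ROLE report_user SET work_mem = '256MB';
    ALTER ROLE report_user SET enable_nestloop = off;
    -- ALTER ROLE report_user RESET work_mem;   -- removes the per-role override again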
Re: [PERFORM] Performance issues
Hi,

On 18.3.2015 18:31, Vivekanand Joshi wrote: So, here is the first taste of success, which gives me the confidence that if properly worked out with good hardware and proper tuning, PostgreSQL could be a good replacement. Out of the 9 reports which need to be migrated to PostgreSQL, 3 are now running. Report 4 was giving an issue and I will see it tomorrow. Just to inform you guys, the thing that helped most was setting enable_nestloops to false.

Plans are now not miscalculated.

The estimates are still miscalculated, but you're forcing the database not to use the nested loop. The problem is the nested loop may be appropriate in some cases (maybe only in a few places of the plan), so this is really a coarse-grained solution. But this is not a production-suitable setting.

So what do you think - how do we work around this?

(a) Try to identify why the queries are poorly estimated, and rephrase them somehow. This is the best solution, but takes time, expertise and may not be feasible in some cases.

(b) Tweak the database structure, possibly introducing intermediate tables, materialized views (or tables maintained by triggers - this might work for the 'latest record' subquery), etc.

(c) Try to tweak the cost parameters, to make the nested loops more expensive (and thus less likely to be selected), but in a more gradual way than enable_nestloops=false.

regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware Configuration and other Stuff
Hi,

On 18.3.2015 20:07, Vivekanand Joshi wrote: Hi Team, I don't know under which section this question comes, so I am posting this question to both the Admin and performance mailing lists. Apologies in advance.

Let's keep this in pgsql-performance.

Objective: We are planning to use PostgreSQL instead of Netezza for our data warehouse as well as database solutions. Right now, we have all our clients in one Netezza box. What we are thinking of is migrating our clients to a dedicated PostgreSQL server for each of them. We will start with one of the clients. If it works successfully, we will be migrating all the clients one by one. The objective is to get better performance than our existing solution. We are hopeful of that mainly because of two reasons. Firstly, we will have a dedicated server for each of the clients with good hardware instead of having one server with all the clients on it. Secondly, we can spend on hardware much more easily than spending on a proprietary appliance.

OK.

I am hoping this community can help us figure out what would be a good infrastructure/hardware that can help us in achieving our goal. Here are a few of the statistics which might act as a starting point. Availability: High (24*7). User Data: 700 GB which will increase to 1.5 TB in the next 2-3 years.

How do you measure the amount of data? Is that the amount of data before loading, the size of the database, or what? Also, is this a single client (thus placed on a single box), or multiple clients?

Number of User Databases: 2 (One is the main database, the other is used only for working tables, where tables get deleted every 48 hours)

You mentioned 700GB of data - is that just the main database, or both databases?

Number of tables: 200 (in the main database), (2000-3000 in the working database) Size of top 5 biggest tables: 20-40 GB No of users concurrently accessing the system: 5-6 with write access, 10 with read access. No of User Queries running on the system in a day: ~80K Read-only Queries (Select): ~60K Write queries: ~20K Data Import Queries: ~1K Typical Business Day: 18-20 hours.

So is this a typical batch environment where you do the loads at night, with no user queries running during the loads? That might be possible with clients on dedicated boxes and would allow various optimizations.

I can pass on a few complex queries to let you guys know what we are doing. Here are a few questions: 1.) I don't need a load balancing solution. It must be a high-availability server and I can work with asynchronous replication. The most important thing here would be recovery, which should be as fast as possible. What approach would you recommend?

Streaming replication. I would probably start with sync replication.

2.) Recommendations on indexes, WAL, table spaces. I am not asking about on which key I need to make indexes, but a high-level approach about how to keep them? This might come out as a weird question to many but please excuse me for being a novice.

Not sure what exactly you are looking for - there's a lot of things, and many of them depend on what hardware you plan to use. The simplest indexing strategy is to design them along with the schema, and evaluate them on queries (collect slow queries - create suitable indexes - repeat).

3.) What would be the ideal hardware configuration for this requirement? I know there is not a one-stop answer for this, but let's take it as a starting point. We can come to a proper conclusion after a discussion.

This is a very complex question, to be honest.
I assume you're looking for regular servers; in that case a good server with not that many CPUs (say, 32 cores seems to be enough for your workload), plenty of RAM and a good disk system to handle the load would be a good start.

What are the best on-line resources/books which can tell us about the hardware requirements?

I'd say these two books would be helpful: (1) https://www.packtpub.com/big-data-and-business-intelligence/postgresql-9-high-availability-cookbook - explains capacity planning etc. (2) https://www.packtpub.com/big-data-and-business-intelligence/postgresql-90-high-performance - a good book about PostgreSQL performance tuning

regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
On 17.3.2015 16:10, Vivekanand Joshi wrote: The confusion for me here is that: I am getting results from the views in around 3 seconds (S_V_D_CAMPAIGN_HIERARCHY) and 25 seconds (S_V_F_PROMOTION_HISTORY_EMAIL). But when I am using these two views as the joining tables in the query, it doesn't give any result. As per my understanding, the planner is making a new plan and that is costly, instead of using the output from the views, which is actually understandable.

In general, yes. The problem is that the plan is constructed based on the estimates, and those are very inaccurate in this case. The planner may do various changes, but let's assume that does not happen and the plans are executed and the results are joined. For example what might happen is this:

    for each row in 's_v_d_campaign_hierarchy' (1 row expected):
        execute s_v_f_promotion_history_email join (11644 rows exp.)

But then it gets 45k rows from s_v_d_campaign_hierarchy, and ~400x more rows from s_v_f_promotion_history_email (I'm neglecting the join condition here, but that's not really significant). Kaboom! In reality, the plan is reorganized (e.g. different join order), but the misestimates are still lurking there.

Is there a way we can do anything about it?

Rephrasing the query so that the planner can estimate it more accurately.

-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
Hi,

On 17.3.2015 08:41, Vivekanand Joshi wrote: Hi Guys, The next level of query is the following: If this works, I guess 90% of the problem will be solved.

    SELECT COUNT(DISTINCT TARGET_ID)
      FROM S_V_F_PROMOTION_HISTORY_EMAIL PH
     INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
        ON PH.TOUCHPOINT_EXECUTION_ID = CH.TOUCHPOINT_EXECUTION_ID
     WHERE 1=1
       AND SEND_DT >= '2014-03-13'
       AND SEND_DT <= '2015-03-14'

In this query, I am joining two views which were made earlier with CTEs. I have replaced the CTEs with subqueries. The views were giving me output in around 5-10 minutes and now I am getting the same result in around 3-4 seconds. But when I executed the query written above, I am again stuck. I am attaching the query plan as well as the link. http://explain.depesz.com/s/REeu I can see most of the time is spent inside a nested loop and the total cost comes out to be cost=338203.81..338203.82.

Most of that cost comes from this:

    Seq Scan on s_f_promotion_history base (cost=0.00..283,333.66 rows=1 width=32)
      Filter: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time zone) AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time

That's a bit weird, I guess. If you analyze this part of the query separately, i.e.

    EXPLAIN ANALYZE SELECT * FROM s_f_promotion_history
     WHERE (send_dt >= '2014-03-13 00:00:00') AND (send_dt <= '2015-03-14 00:00:00')

what do you get? I suspect it's used in EXISTS, i.e. something like this:

    ... WHERE EXISTS (SELECT * FROM s_f_promotion_history
                       WHERE ... send_dt conditions ...
                         AND touchpoint_execution_id = s_f_touchpoint_execution_status_history_1.touchpoint_execution_id)

and this is transformed into a nested loop join. If there's a misestimate, this may be quite expensive - try to create an index on s_f_promotion_history (touchpoint_execution_id, send_dt).

regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
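The index suggested at the end of the reply above, spelled out (the index name is arbitrary):

    CREATE INDEX s_f_promotion_history_tpe_send_dt_idx
        ON s_f_promotion_history (touchpoint_execution_id, send_dt);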
Re: [PERFORM] Performance issues
On 17.3.2015 12:07, Vivekanand Joshi wrote: EXPLAIN ANALYZE didn't give a result even after three hours.

In that case the only thing you can do is 'slice' the query into smaller parts (representing subtrees of the plan), and analyze those first. Look for misestimates (significant differences between estimated and actual row counts) and very expensive parts.

We can't do that, because we don't have your data or queries, and without the explain analyze it's difficult to give advice.

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
Just as I feared, the attached explain analyze results show significant misestimates, like this for example:

  Nested Loop (cost=32782.19..34504.16 rows=1 width=16) (actual time=337.484..884.438 rows=46454 loops=1)

  Nested Loop (cost=18484.94..20366.29 rows=1 width=776) (actual time=2445.487..3741.049 rows=45360 loops=1)

  Hash Left Join (cost=34679.90..37396.37 rows=11644 width=148) (actual time=609.472..9070.675 rows=4559289 loops=1)

There's plenty of nested loop joins - the optimizer believes there will be only a few rows in the outer relation, but gets an order of magnitude more tuples. And nested loops are terrible in that case.

In case of the first view, it seems to be caused by this:

  Merge Cond: ((s_f_touchpoint_execution_status_history.touchpoint_execution_id = s_f_touchpoint_execution_status_history_1.touchpoint_execution_id) AND (s_f_touchpoint_execution_status_history.creation_dt = (max(s_f_touchpoint_execution_status_history_1.creation_dt))))

especially the ':id = max(:id)' condition is probably giving the optimizer a hard time. This is a conceptually difficult problem (i.e. fixing this at the optimizer level is unlikely to happen any time soon, because it effectively means you have to predict the statistical properties of the aggregation).

You may try increasing the statistics target, which makes the stats a bit more detailed (the default on 9.4 is 100):

  SET default_statistics_target = 10000;
  ANALYZE;

But I don't really believe this will fix the problem. But maybe it's possible to rewrite the query somehow?

Let's experiment a bit - remove the aggregation, i.e. join directly to s_f_touchpoint_execution_status_history. It'll return wrong results, but the estimates should be better, so let's see what happens.

You may also try disabling nested loops - the other join algorithms usually perform better with large row counts.

  SET enable_nestloop = false;

This is not a production-suitable solution, but for experimenting that's OK.

ISTM what the aggregation (or the whole mergejoin) does is selecting the last s_f_touchpoint_execution_status_history record for each touchpoint_execution_id. There are better ways to determine that, IMHO. For example:

(1) adding an 'is_last' flag to s_f_touchpoint_execution_status_history

This however requires maintaining that flag somehow, but the join would not be needed at all. The last IDs might be maintained in a separate table - the join would be still necessary, but it might be less intrusive and cheaper to maintain.

(2) using window functions, e.g. like this:

  SELECT * FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
                                ORDER BY max_creation_dt) AS rn
        FROM s_f_touchpoint_execution_status_history
  ) foo WHERE rn = 1

But estimating this is also rather difficult ...

(3) Using a temporary table / MV - this really depends on your requirements, load schedule, how you run the queries etc. It would however fix the estimation errors (probably).

The 2nd view seems to suffer because of the same issue (underestimates leading to choice of nested loops), but caused by something else:

  -> Hash Join (cost=1954.13..6249.67 rows=13 width=108) (actual time=31.777..210.346 rows=72670 loops=1)
       Hash Cond: ((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id = tp.wave_id))

Estimating cardinality of joins with multi-column conditions is difficult, no idea how to fix that at the moment.
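A rough sketch of what (3) might look like - a small side table (here a matview, the name is made up) with the last status timestamp per execution, refreshed as part of the data load; the column names are taken from the plan above, so treat this as an untested illustration:

  CREATE MATERIALIZED VIEW last_tp_exec_status AS
  SELECT touchpoint_execution_id,
         max(creation_dt) AS max_creation_dt
    FROM s_f_touchpoint_execution_status_history
   GROUP BY touchpoint_execution_id;

  CREATE UNIQUE INDEX ON last_tp_exec_status (touchpoint_execution_id);
  ANALYZE last_tp_exec_status;

  -- REFRESH MATERIALIZED VIEW last_tp_exec_status;   -- after each load

Joining the views against this table instead of the max() subquery should give the planner a plain equality join on an indexed column, which it can estimate much more sanely.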
-- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
On 17.3.2015 15:19, Thomas Kellerer wrote: Tomas Vondra schrieb am 17.03.2015 um 14:55:

(2) using window functions, e.g. like this:

  SELECT * FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
                                ORDER BY max_creation_dt) AS rn
        FROM s_f_touchpoint_execution_status_history
  ) foo WHERE rn = 1

But estimating this is also rather difficult ...

From my experience rewriting something like the above using DISTINCT ON is usually faster.

How do you get the last record (with respect to a timestamp column) using a DISTINCT ON?

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
On 17.3.2015 16:24, Thomas Kellerer wrote: Tomas Vondra schrieb am 17.03.2015 um 15:43: On 17.3.2015 15:19, Thomas Kellerer wrote: Tomas Vondra schrieb am 17.03.2015 um 14:55:

(2) using window functions, e.g. like this:

  SELECT * FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
                                ORDER BY max_creation_dt) AS rn
        FROM s_f_touchpoint_execution_status_history
  ) foo WHERE rn = 1

But estimating this is also rather difficult ...

From my experience rewriting something like the above using DISTINCT ON is usually faster.

How do you get the last record (with respect to a timestamp column) using a DISTINCT ON?

You need to use order by ... desc. See here: http://sqlfiddle.com/#!15/d4846/2

Nice, thanks!

Btw: your row_number() usage wouldn't return the latest row either. It would return the oldest row.

Oh, right. I forgot the DESC in the window.

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
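For the archives, the DISTINCT ON pattern from the sqlfiddle above, applied to the table in question, would look roughly like this (untested, column names taken from the earlier plans):

  SELECT DISTINCT ON (touchpoint_execution_id) *
    FROM s_f_touchpoint_execution_status_history
   ORDER BY touchpoint_execution_id, creation_dt DESC;

  -- and the window-function variant with the missing DESC added:
  SELECT * FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
                                   ORDER BY creation_dt DESC) AS rn
        FROM s_f_touchpoint_execution_status_history
  ) foo WHERE rn = 1;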
Re: [PERFORM] Performance issues
On 16.3.2015 20:43, Jim Nasby wrote: On 3/13/15 7:12 PM, Tomas Vondra wrote: (4) I suspect many of the relations referenced in the views are not actually needed in the query, i.e. the join is performed but then it's just discarded because those columns are not used. Try to simplify the views as much as possible - remove all the tables that are not really necessary to run the query. If two queries need different tables, maybe defining two views is a better approach.

A better alternative with multi-purpose views is to use an outer join instead of an inner join. With an outer join, if you ultimately don't refer to any of the columns in a particular table, Postgres will remove the table from the query completely.

Really? Because a quick test suggests otherwise:

  db=# create table test_a (id int);
  CREATE TABLE
  db=# create table test_b (id int);
  CREATE TABLE
  db=# explain select test_a.* from test_a left join test_b using (id);
                                QUERY PLAN
  ------------------------------------------------------------------------
   Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
     Merge Cond: (test_a.id = test_b.id)
     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
           Sort Key: test_a.id
           ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
           Sort Key: test_b.id
           ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
  (8 rows)

Also, how would that work with duplicate rows in the referenced table?

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
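For completeness: the join removal Jim describes only kicks in when the inner side is provably unique on the join columns, which also answers the duplicate-rows question. A quick sketch of the case where it should work (untested, same toy tables but with a primary key):

  db=# create table test_c (id int primary key);
  db=# explain select test_a.* from test_a left join test_c using (id);
  -- with the unique index on test_c(id) the planner can prove the left join
  -- can neither duplicate nor remove test_a rows, so test_c should be dropped
  -- from the plan entirely, leaving just a Seq Scan on test_a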
Re: [PERFORM] Performance issues
On 16.3.2015 18:49, Marc Mamin wrote: Hi Team, This is the EXPLAIN ANALYZE for one of the views: S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan: http://explain.depesz.com/s/nbB

In the future, please do two things:

(1) Attach the plan as a text file, because the mail clients tend to screw things up (wrapping long lines). Unless the plan is trivial, of course - but pgsql-performance usually deals with complex stuff.

(2) Putting the plan on explain.depesz.com helps too, because it's considerably more readable (but always do 1, because resources placed somewhere else tend to disappear, and the posts then make very little sense, which is bad when searching in the archives).

(3) Same for stuff pasted somewhere else - always attach it to the message. For example I'd like to give you more accurate advice, but I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

Rows Removed by Join Filter: 3577676116 That's quite a lot. You're possibly missing a clause in a join, resulting in a cross join. It is also helpful to put your result here: http://explain.depesz.com/ regards,

IMHO this is merely a consequence of using the CTE, which produces 52997 rows and is scanned 67508x as the inner relation of a nested loop. That gives you 3577721476 tuples in total, and only 45360 are kept (hence 3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries, but may actually cause serious trouble (see the small self-contained illustration below).

There are other issues (e.g. the row count estimate of the CTE is seriously off, most likely because of the HashAggregate in the outer branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with a subselect in the FROM part of the query, so instead of this:

  WITH valid_executions AS (...)
  SELECT ... FROM ... JOIN valid_executions ON (...)

you'll have something like this:

  SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

This way the subselect will be optimized properly.

(2) Replace the CTE with a materialized view, or a temporary table. This has both advantages and disadvantages - the main advantage is that you can create indexes and collect statistics. The disadvantage is you have to refresh the MV, fill the temporary table etc.

I expect (1) to improve the performance significantly, and (2) might improve it even further by fixing the misestimates.

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
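To illustrate why the CTE behaves differently from a plain subselect - a self-contained toy example (the table name is made up; this is the 9.x behaviour, where a CTE is always materialized and acts as an optimization fence):

  CREATE TABLE t (id int PRIMARY KEY, val int);
  INSERT INTO t SELECT i, i FROM generate_series(1, 100000) i;
  ANALYZE t;

  -- subselect: the condition gets pushed down, so this can use the index
  EXPLAIN SELECT * FROM (SELECT * FROM t) AS s WHERE id = 53;

  -- CTE: the whole CTE is materialized first and only then filtered,
  -- typically a seq scan over all 100k rows
  EXPLAIN WITH s AS (SELECT * FROM t) SELECT * FROM s WHERE id = 53;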
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 13:07, Robert Kaye wrote: If the load problem really is being caused by swapping when things really shouldn't be swapping, it could be a matter of adjusting your swappiness - what does cat /proc/sys/vm/swappiness show on your server? 0 We adjusted that too, but no effect. (I’ve updated the blog post with these two comments) IMHO setting swappiness to 0 is way too aggressive. Just set it to something like 10 - 20, that works better in my experience. There are other linux memory management things that can cause postgres and the server running it to throw fits like THP and zone reclaim. I don't have enough info about your system to say they are the cause either, but check out the many postings here and other places on the detrimental effect that those settings *can* have. That would at least give you another angle to investigate. If there are specific things you’d like to know, I’ve be happy to be a human proxy. :) I'd start with vm.* configuration, so the output from this: # sysctl -a | grep '^vm.*' and possibly /proc/meminfo. I'm especially interested in the overcommit settings, because per the free output you provided there's ~16GB of free RAM. BTW what amounts of data are we talking about? How large is the database and how large is the active set? I also noticed you use kernel 3.2 - that's not the best kernel version for PostgreSQL - see [1] or [2] for example. [1] https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4 [2] http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
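Besides the full sysctl dump, a few specific things worth checking right away - a rough checklist, and note the THP path differs between distributions (e.g. on RHEL/CentOS 6 it lives under redhat_transparent_hugepage):

  sysctl vm.swappiness vm.overcommit_memory vm.overcommit_ratio vm.zone_reclaim_mode
  cat /sys/kernel/mm/transparent_hugepage/enabled
  cat /sys/kernel/mm/transparent_hugepage/defrag
  grep -i -e commit -e swap /proc/meminfo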
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 23:47, Andres Freund wrote: On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote: Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :) IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed. I second this opinion. As was already pointed out, the 500 connections is rather insane (assuming the machine does not have hundreds of cores). If there are memory pressure issues, it's likely because many queries are performing memory-expensive operations at the same time (might even be a bad estimate causing hashagg to use much more than work_mem). But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO. You can check what's actually swapped out using: grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB' For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell. I've already asked for vmstat logs, so let's wait. In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part as there's no locality at all), a smaller shared buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared buffers setting. I've met many systems with max_connections values this high, and it was mostly idle connections because of separate connection pools on each application server. So mostly idle (90% of the time), but at peak time all the application servers want to od stuff at the same time. And it all goes KABOOOM! just like here. -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
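To be explicit about what to look for in the vmstat output (just how I'd read it, nothing specific to this system):

  vmstat 1
  # the 'si' / 'so' columns are pages swapped in / out per second - sustained
  # nonzero 'si' during the busy period is the harmful case; a static 'swpd'
  # value with si/so at 0 just means something idle was pushed out long ago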
Re: [PERFORM] MusicBrainz postgres performance issues
On 16.3.2015 00:55, mich...@sqlexec.com wrote: Why is 500 connections insane? We got 32 CPU with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem.

If all the connections are active at the same time (i.e. running queries), they have to share the 32 cores somehow. Or I/O, if that's the bottleneck. In other words, you're not improving the throughput of the system, you're merely increasing latencies. And it may easily happen that the latency increase is not linear, but grows faster - because of locking, context switches and other process-related management.

Imagine you have a query taking 1 second of CPU time. If you have 64 such queries running concurrently on 32 cores, each gets only 1/2 a CPU and so takes >= 2 seconds. With 500 queries, it's >= 15 seconds per query, etc. If those queries are acquiring the same locks (e.g. updating the same rows, or so), you can imagine what happens ...

Also, if part of the query required a certain amount of memory for part of the plan, it now holds that memory for much longer too. That only increases the chance of OOM issues.

It may work fine when most of the connections are idle, but it makes storms like this possible.

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MusicBrainz postgres performance issues
On 15.3.2015 18:54, Ilya Kosmodemiansky wrote: On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund and...@2ndquadrant.com wrote: That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. My point was, that having no proper raid controller (today bbu surely needed for the controller to be a proper one) + heavy writes of any kind, it is absolutely impossible to live with large shared_buffers and without io problems. That is not really true, IMHO. The benefit of the write cache is that it can absorb certain amount of writes, equal to the size of the cache (nowadays usually 512MB or 1GB), without forcing them to disks. It however still has to flush the dirty data to the drives later, but that side usually has much lower throughput - e.g. while you can easily write several GB/s to the controller, the drives usually handle only ~1MB/s of random writes each (I assume rotational drives here). But if you do a lot of random writes (which is likely the case for write-heavy databases), you'll fill the write cache pretty soon and will be bounded by the drives anyway. The controller really can't help with sequential writes much, because the drives already handle that quite well. And SSDs are a completely different story of course. That does not mean the write cache is useless - it can absorb short bursts of random writes, fix the write hole with RAID5, the controller may compute the parity computation etc. Whenever someone asks me whether they should buy a RAID controller with write cache for their database server, my answer is absolutely yes in 95.23% cases ... ... but really it's not something that magically changes the limits for write-heavy databases - the main limit are still the drives. regards Tomas -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
On 13.3.2015 21:46, Vivekanand Joshi wrote: Since I was doing it only for the testing purposes and on a development server which has only 8 GB of RAM, I used only 10m rows. But the original table has 1.5 billion rows. We will obviously be using a server with very high capacity, but I am not satisfied with the performance at all. This might be only a start, so I might get a better performance later. OK, understood. Yes, the view is complex and almost is created by using 10 tables. Same goes with other views as well but this is what we are using in Netezza as well. And we are getting results of the full report in less than 5 seconds. And add to that, this is only a very little part of the whole query used in a report. Well, in the very first message you asked Is the query written correctly as per the PostgreSQL? - how can we decide that when most of the query is hidden in some unknown view? I will post the result of whole query with Explain analyze tomorrow. Please also collect some information about the system using iostat, vmstat and such, so that we know what is the bottleneck. We might even consider taking experts advice on how to tune queries and server, but if postgres is going to behave like this, I am not sure we would be able to continue with it. That's probably a good idea. -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
Hi, On 13.3.2015 20:59, Vivekanand Joshi wrote: I am really worried about the performance of PostgreSQL as we have almost 1.5 billion records in promotion history table. Do you guys In the previous message you claimed the post table has 10M rows ... really think PostgreSQL can handle this much load. We have fact tables which are more than 15 GB in size and we have to make joins with those tables in almost every query. That depends on what performance you're looking for. You'll have to provide considerably more information until we can help you. You might want to check this: https://wiki.postgresql.org/wiki/Slow_Query_Questions You have not provided the full query, just a query apparently referencing views, so that the actual query is way more complicated. Also, plain EXPLAIN is not really sufficient, we need EXPLAIN ANALYZE. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issues
On 14.3.2015 00:28, Vivekanand Joshi wrote: Hi Guys, So here is the full information attached as well as in the link provided below: http://pgsql.privatepaste.com/41207bea45 I can provide new information as well. Thanks. We still don't have EXPLAIN ANALYZE - how long was the query running (I assume it got killed at some point)? It's really difficult to give you any advices because we don't know where the problem is. If EXPLAIN ANALYZE really takes too long (say, it does not complete after an hour / over night), you'll have to break the query into parts and first tweak those independently. For example in the first message you mentioned that select from the S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that. Give us EXPLAIN ANALYZE for that query. Few more comments: (1) You're using CTEs - be aware that CTEs are not just aliases, but impact planning / optimization, and in some cases may prevent proper optimization. Try replacing them with plain views. (2) Varadharajan Mukundan already recommended you to create index on s_f_promotion_history.send_dt. Have you tried that? You may also try creating an index on all the columns needed by the query, so that Index Only Scan is possible. (3) There are probably additional indexes that might be useful here. What I'd try is adding indexes on all columns that are either a foreign key or used in a WHERE condition. This might be an overkill in some cases, but let's see. (4) I suspect many of the relations referenced in the views are not actually needed in the query, i.e. the join is performed but then it's just discarded because those columns are not used. Try to simplify the views as much has possible - remove all the tables that are not really necessary to run the query. If two queries need different tables, maybe defining two views is a better approach. (5) The vmstat / iostat data are pretty useless - what you provided are averages since the machine was started, but we need a few samples collected when the query is running. I.e. start the query, and then give us a few samples from these commands: iostat -x -k 1 vmstat 1 Would like to see if queries of these type can actually run in postgres server? Why not? We're running DWH applications on tens/hundreds of GBs. If yes, what would be the minimum requirements for hardware? We would like to migrate our whole solution on PostgreSQL as we can spend on hardware as much as we can but working on a proprietary appliance is becoming very difficult for us. That's difficult to say, because we really don't know where the problem is and how much the queries can be optimized. -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
On 11.3.2015 18:30, Jeff Janes wrote: On Sat, Mar 7, 2015 at 7:44 AM, Tomas Vondra tomas.von...@2ndquadrant.com mailto:tomas.von...@2ndquadrant.com wrote: On 7.3.2015 03:26, Jeff Janes wrote: On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us wrote: But the actual query is using a seq scan, and so it would hint the table in efficient sequential order, rather than hinting the table haphazardly in index order like probing the endpoint does. I think this has nothing to do with the plan itself, but with the estimation in optimizer - that looks up the range from the index in some cases, and that may generate random I/O to the table. Right. Tom was saying that the work needs to be done anyway, but it is just that some ways of doing the work are far more efficient than others. It just happens that the executed plan in this case would do it more efficiently, (but in general you aren't going to get any less efficient than having the planner do it in index order). Oh! Now I see what you meant. I parsed is as if you're suggesting that the theory does not match the symptoms because the plan contains sequential scan yet there's a lot of random I/O. But now I see that's not what you claimed, so sorry for the noise. -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
On 7.3.2015 03:26, Jeff Janes wrote: On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us wrote: But the actual query is using a seq scan, and so it would hint the table in efficient sequential order, rather than hinting the table haphazardly in index order like probing the endpoint does. I think this has nothing to do with the plan itself, but with the estimation in optimizer - that looks up the range from the index in some cases, and that may generate random I/O to the table. Also, it's less than clear why only this particular query is showing any stress. Dead rows should be a hazard for anything, especially if there are enough of them to require hours to re-hint. And why wouldn't autovacuum get to them first? Say the timing of this query is such that 10% of the parent turns over between invocations of this query, and that this 10% is all at the end of some index but random over the table heap. If autovac kicks in at 20% turn over, then half the time autovac does get to them first, and half the time it doesn't. It would be interesting to know if this query is bad every time it is planner, or just sometimes. Yeah, this might be the reason. Another possibility is that this is part of some large batch, and autovacuum simply did not have change to do the work. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
On 6.3.2015 01:44, Tom Lane wrote: Tomas Vondra tomas.von...@2ndquadrant.com writes: On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote: - postgres version is 9.1.13 The only thing I can think of is some sort of memory exhaustion, resulting in swapping out large amounts of memory. I'm wondering about the issue addressed by commit fccebe421 (Use SnapshotDirty rather than an active snapshot to probe index endpoints). Now, that was allegedly fixed in 9.1.13 ... but if the OP were confused and this server were running, say, 9.1.12, that could be a viable explanation. Another possibly viable explanation for seeing the issue in 9.1.13 would be if I fat-fingered the back-patch somehow :-(. How would fccebe421 explain the large amount of random writes (~4MB/s for more than an hour), reported in the initial post? And why would that only affect the EXPLAIN and not the bare query? I guess there might be two sessions, one keeping uncommitted changes (thus invisible tuples), and the other one doing the explain. And the actual query might be executed after the first session does a commit. But the random writes don't really match in this scenario ... regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
Hi, On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote: Hi, thanks for your follow-up questions. - postgres version is 9.1.13 - the number of rows (in this latest instance) is 28,474,842 - I've clustered and vacuum-full-ed and analyzed this table frequently, attempting to troubleshoot this. (Running vacuum full on the whole catalog seems a little excessive, and unlikely to help.) - no other processes are likely to be interfering; nothing other than PostgreSQL runs on this machine (except for normal OS processes and New Relic server monitoring service); concurrent activity in PostgreSQL is low-level and unrelated, and this effect is observed systematically whenever this kind of operation is performed on this table - no override for this table; the system default_statistics_target is 100 (the default) - yes, an ANALYZE is performed on the temp table after the COPY and before the INSERT - no index on the temp table (but I'm scanning the whole thing anyway). There are indexes on f_foo as detailed in my original post, and I expect the PK to make the WHERE NOT EXISTS filtering efficient (as it filters on exactly all columns of the PK) ... but even if it didn't, I would expect that to only slow down the actual insert execution, not the EXPLAIN. The only thing I can think of is some sort of memory exhaustion, resulting in swapping out large amounts of memory. That'd explain the I/O load. Can you run something like vmstat to see if this really is swap? The fact that plain INSERT does not do that contradicts that, as it should be able to plan either both queries (plain and EXPLAIN), or none of them. Can you prepare a self-contained test case? I.e. a script that demonstrates the issue? I tried to reproduce the issue using the information provided so far, but unsuccessfully :-( Even if you could reproduce the problem on another machine (because of keeping the data internal) on a server with debug symbols and see where most of the time is spent (e.g. using 'perf top'), that'd be useful. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] working around JSONB's lack of stats?
On 29.1.2015 00:03, Josh Berkus wrote: On 01/28/2015 11:48 AM, Tomas Vondra wrote: On 27.1.2015 08:06, Josh Berkus wrote: Folks, ... On a normal column, I'd raise n_distinct to reflect the higher selectivity of the search terms. However, since @> uses contsel, n_distinct is ignored. Anyone know a clever workaround I don't currently see?

I don't see any reasonable workaround :-( ISTM we'll have to invent a way to collect useful stats about contents of JSON/JSONB documents. JSONB is cool, but at the moment we're mostly relying on defaults that may be reasonable, but still misfire in many cases.

Do we have any ideas of how that might work? We're already collecting stats about contents of arrays, and maybe we could do something similar for JSONB? The nested nature of JSON makes that rather incompatible with the flat MCV/histogram stats, though.

Well, I was thinking about this. We already have most_common_elem (MCE) for arrays and tsearch. What if we put JSONB's most common top-level keys (or array elements, depending) in the MCE array? Then we could still apply a simple rule for any path criteria below the top-level keys, say assuming that any sub-key criteria would match 10% of the time. While it wouldn't be perfect, it would be better than what we have now.

So how would that work with your 'tags' example? ISTM most of your documents have 'tags' as a top-level key, so that would end up in the MCV list. But there's no info about the elements of the 'tags' array (thus the 10% default, which may work in this particular case, but it's hardly a general solution and I doubt it's somehow superior to the defaults we're using right now).

I think a 'proper' solution to JSONB stats needs to somehow reflect the nested structure. What I was thinking about is tracking MCV for complete paths, i.e. for a document:

  { keyA : { keyB : x, keyC : z }, keyD : [1, 2, 3, 4] }

we'd extract three paths

  keyA.keyB
  keyA.keyC
  keyD

and aggregate that over all the documents to select the MCV paths. And then, for each of those MCV paths, track the most common values.

ISTM this would allow better estimations, but it has issues too: Firstly, it does not match the MCV structure, because it requires storing (a) MCV paths and (b) MCV values for those paths. Moreover, (b) probably stores different data types (some values are strings, some integers, etc.). Arrays might be handled just like regular arrays, i.e. tracking stats of elements, but it's still mixed data types.

Secondly, I think it's based on the assumption of independence (i.e. that the occurrence of one path does not depend on the occurrence of a different path in the same document). Same for values x paths. Which may or may not be true - it's essentially the same as the assumption of independence for predicates on multiple columns. While I do have ideas on how to approach this in the multi-column case, handling this for JSONB is going to be much more complex I think.

But the first question (what stats to collect and how to store them) is the most important at this point, I guess.

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
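In the meantime it's at least possible to inspect the key distribution manually and tweak queries based on that - a trivial sketch, where docs(doc jsonb) is a made-up table standing in for the real one:

  SELECT key, count(*) AS docs_with_key
    FROM docs, jsonb_object_keys(doc) AS key
   GROUP BY key
   ORDER BY count(*) DESC
   LIMIT 20;

  -- and similarly for elements of a particular array, e.g. 'tags':
  SELECT tag, count(*)
    FROM docs, jsonb_array_elements_text(doc -> 'tags') AS tag
   GROUP BY tag
   ORDER BY count(*) DESC
   LIMIT 20;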
Re: [PERFORM] working around JSONB's lack of stats?
On 27.1.2015 08:06, Josh Berkus wrote: Folks, ... On a normal column, I'd raise n_distinct to reflect the higher selectivity of the search terms. However, since @> uses contsel, n_distinct is ignored. Anyone know a clever workaround I don't currently see?

I don't see any reasonable workaround :-( ISTM we'll have to invent a way to collect useful stats about contents of JSON/JSONB documents. JSONB is cool, but at the moment we're mostly relying on defaults that may be reasonable, but still misfire in many cases.

Do we have any ideas of how that might work? We're already collecting stats about contents of arrays, and maybe we could do something similar for JSONB? The nested nature of JSON makes that rather incompatible with the flat MCV/histogram stats, though.

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is PostgreSQL not using my index?
Hi,

On 26.1.2015 17:32, Christian Roche wrote:

  select * from mixpanel_events_201409 mp inner join mixpanel_event_list ev on ( ev.id = mp.event_id ) where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

  Hash Join  (cost=20.73..2892183.32 rows=487288 width=1000)
    Hash Cond: (mp.event_id = ev.id)
    ->  Seq Scan on mixpanel_events_201409 mp  (cost=0.00..2809276.70 rows=20803470 width=949)
    ->  Hash  (cost=20.57..20.57 rows=13 width=51)
          ->  Seq Scan on mixpanel_event_list ev  (cost=0.00..20.57 rows=13 width=51)
                Filter: (id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

Both tables have been vacuum analyzed.

Can we get EXPLAIN ANALYZE please, and maybe some timings for the two plans? Otherwise we have no clue how accurate those estimates really are, making it difficult to judge the plan choice.

You might also use enable_hashjoin=off to force a different join algorithm (it may not switch to nested loop immediately, so maybe try the other enable_* options).

The estimated row counts are quite near each other (410k vs. 487k), but the costs are not. I'm pretty sure that's because while the first query has the WHERE condition directly on the event_id column, the second one moves the condition to the 'list' table, forcing this particular plan. But as the condition is on the join column, you may try moving it back:

  select * from mixpanel_events_201409 mp inner join mixpanel_event_list ev on ( ev.id = mp.event_id ) where mp.event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

Of course, this only works on this particular column - it won't work for other columns in the 'list' table.

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
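If you want to try the enable_hashjoin experiment without affecting anything else, wrapping it in a transaction with SET LOCAL keeps it contained - just a sketch of the procedure, reusing the query from above:

  BEGIN;
  SET LOCAL enable_hashjoin = off;
  EXPLAIN ANALYZE
  SELECT * FROM mixpanel_events_201409 mp
    JOIN mixpanel_event_list ev ON (ev.id = mp.event_id)
   WHERE ev.id IN (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);
  ROLLBACK;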
Re: [PERFORM] Query performance
Hi,

On 25.1.2015 07:38, Joe Van Dyk wrote: Here's one that's not quite as well: http://explain.depesz.com/s/SgT

As Pavel already pointed out, the first problem is this part of the plan:

  Seq Scan on events e2 (cost=0.00..120,179.60 rows=4,450,241 width=51) (actual time=0.014..33,773.370 rows=4,450,865 loops=1)
    Filter: (product_id <> '81716'::citext)

Consuming ~33 seconds of the runtime. If you can make this faster somehow (e.g. by getting rid of the citext cast), that'd be nice.

Another issue is that the hashjoin is batched:

  Buckets: 65536 Batches: 8 Memory Usage: 46085kB

The hash preparation takes ~40 seconds, so maybe try to give it a bit more memory - I assume you have work_mem=64MB, so try doubling that (ISTM 512MB should work with a single batch). Maybe this won't really improve the performance, though. It still has to process ~4.5M rows.

Increasing the work_mem could also result in switching to a hash aggregate, making the sort (~30 seconds) unnecessary.

Anyway, ISTM this works as expected, i.e.

(a) with rare product_id values the queries are fast
(b) with common product_id values the queries are slow

That's expected, because (b) needs to process much more data. I don't think you can magically make it run as fast as (a). The best solution might be to keep pre-aggregated results - I don't think you really need exact answers when recommending similar products.

I also wonder if you really need to join the tables? I mean, what if you do something like this:

  CREATE TABLE events_aggregated AS
  SELECT site_id,
         array_agg(product_id) AS product_ids,
         count(nullif(e2.type='viewed', false)) view_count,
         count(nullif(e2.type='purchased', false)) purchase_count
    FROM events e2
   GROUP BY 1;

and then using intarray with GIN indexes to query this table? Something like this:

  CREATE INDEX products_agg_idx ON events_aggregated USING GIN (product_ids gin__int_ops);

  SELECT * FROM events_aggregated WHERE product_ids @> ARRAY['82503'];

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
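For the work_mem experiment, it's enough to bump it for the session - a sketch, where the 512MB value is just a guess based on the batch count above:

  SET work_mem = '512MB';   -- affects only this session, don't put it in postgresql.conf blindly
  -- re-run the EXPLAIN ANALYZE and check whether the hash join still reports
  -- "Batches: 8" and whether the sort got replaced by a HashAggregate
  RESET work_mem;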
Re: [PERFORM] How to tell ANALYZE to collect statistics from the whole table?
Hi,

On 25.1.2015 00:33, AlexK987 wrote: The documentation states that "The extent of analysis can be controlled by adjusting the default_statistics_target configuration variable". It looks like I can tell Postgres to create more histograms with more bins, and more distinct values. This implicitly means that Postgres will use a larger random subset to calculate statistics. However, this is not what I want. My data may be quite skewed, and I want full control over the size of the sample. I want to explicitly tell Postgres to analyze the whole table. How can I accomplish that?

I don't think there's an official way to do that - at least I can't think of one. The only thing you can do is increase the statistics target (either globally by setting default_statistics_target, or per column using ALTER TABLE ... SET STATISTICS).

As you noticed, this however controls two things - the sample size and how detailed the statistics (MCV list / histogram) will be. The statistics target is used as an upper bound for the number of MCV items / histogram bins, and the number of sampled rows is (300 * statistics_target). With default_statistics_target = 10000 (which is the max allowed value since 9.0), this produces very detailed stats and uses a sample of ~3M rows.

It's a bit more complicated though, because there's an algorithm that decides how many MCV items / histogram buckets to actually create, based on the data. So you may not get more detailed stats, even when using a larger sample.

That being said, I really doubt increasing the statistics target to 10000 (or even sampling the whole table) will help you in practice. Might be worth showing an example of a bad estimate with your data, or maybe a test case to play with.

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
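In practice the per-column form tends to be the more sensible knob, since it only affects the columns that actually need it - a sketch, with made-up table/column names:

  ALTER TABLE some_table ALTER COLUMN skewed_col SET STATISTICS 10000;
  ANALYZE some_table;

  -- or, for a one-off experiment, session-wide:
  SET default_statistics_target = 10000;
  ANALYZE some_table;
  RESET default_statistics_target;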
Re: [PERFORM] How to tell ANALYZE to collect statistics from the whole table?
On 25.1.2015 02:04, AlexK987 wrote: Tomas, Thank you for a very useful reply. Right now I do not have a case of poor performance caused by strong data skew which is not properly reflected in statistics. I was being defensive, trying to prevent every possible thing that might go wrong. OK. My recommendation is not to mess with default_statistics unless you actually have to (e.g. increasing the value on all tables, withouth a query where the current value causes trouble). It increases time to plan the queries, collect statistics (ANALYZE / autovacuum) etc. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Initial insert
it to 1 ms, I don't see how this would make a difference in a batch-style job.

If you're doing many such queries (with different id_article values), you may do something like this

  select caracteris1_.id_article, sum(quantitest0_.quantite_valeur) as col_0_0_
    from dm5_quantitestock quantitest0_,
         dm5_caracteristiquearticlestock caracteris1_
   where quantitest0_.id_caracteristiquearticlestock = caracteris1_.id
   group by caracteris1_.id_article

and then query this (supposedly much smaller) table.

regards

-- Tomas Vondra                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
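To spell out the "and then query this table" part - one way is to materialize the aggregate once per batch run. A sketch, reusing the names from the query above (the matview name is made up; matviews need 9.3+, otherwise CREATE TABLE AS works the same way):

  CREATE MATERIALIZED VIEW stock_per_article AS
  SELECT caracteris1_.id_article,
         sum(quantitest0_.quantite_valeur) AS quantite
    FROM dm5_quantitestock quantitest0_
    JOIN dm5_caracteristiquearticlestock caracteris1_
      ON quantitest0_.id_caracteristiquearticlestock = caracteris1_.id
   GROUP BY caracteris1_.id_article;

  CREATE INDEX ON stock_per_article (id_article);
  ANALYZE stock_per_article;

  -- per-article lookups then become a simple indexed query:
  SELECT quantite FROM stock_per_article WHERE id_article = 12345;

  -- REFRESH MATERIALIZED VIEW stock_per_article;   -- at the start of each batch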
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 21.11.2014 19:38, Jeff Janes wrote: When I run this patch on the regression database, I get a case where the current method is exact but the adaptive one is off: WARNING: ndistinct estimate current=676.00 adaptive=906.00 select count(distinct stringu1) from onek; 676 It should be seeing every single row, so I don't know why the adaptive method is off. Seems like a bug. Thanks for noticing this. I wouldn't call it a bug, but there's clearly room for improvement. The estimator, as described in the original paper, does not expect the sampling to be done our way (using fixed number of rows) but assumes to get a fixed percentage of rows. Thus it does not expect the number of sampled rows to get so close (or equal) to the total number of rows. I think the only way to fix this is by checking if samplerows is close to totalrows, and use a straightforward estimate in that case (instead of a more sophisticated one). Something along these lines: if (samplerows = 0.95 * totalrows) stats-stadistinct = (d + d/0.95) / 2; which means if we sampled = 95% of the table, use the number of observed distinct values directly. I have modified the estimator to do the adaptive estimation, and then do this correction too (and print the values). And with that in place I get these results WARNING: ndistinct estimate current=676.00 adaptive=996.00 WARNING: corrected ndistinct estimate current=676.00 adaptive=693.79 So it gets fairly close to the original estimate (and exact value). In the end, this check should be performed before calling the adaptive estimator at all (and not calling it in case we sampled most of the rows). I also discovered an actual bug in the optimize_estimate() function, using 'f_max' instead of the number of sampled rows. Attached is a patch fixing the bug, and implementing the sample size check. regards Tomas diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 732ab22..3975fb6 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -110,6 +110,8 @@ static void update_attstats(Oid relid, bool inh, static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); +static double optimize_estimate(int total_rows, int sample_rows, int ndistinct, +int *f, int f_max); /* * analyze_rel() -- analyze one relation @@ -2369,6 +2371,11 @@ compute_scalar_stats(VacAttrStatsP stats, int slot_idx = 0; CompareScalarsContext cxt; + /* f values for the estimator - messy and we likely need much + * less memory, but who cares */ + int f_max = 0; /* max number of duplicates */ + int *f_count = (int*)palloc0(sizeof(int)*values_cnt); + /* Sort the collected values */ cxt.ssup = ssup; cxt.tupnoLink = tupnoLink; @@ -2410,6 +2417,7 @@ compute_scalar_stats(VacAttrStatsP stats, ndistinct++; if (dups_cnt 1) { + nmultiple++; if (track_cnt num_mcv || dups_cnt track[track_cnt - 1].count) @@ -2435,6 +2443,12 @@ compute_scalar_stats(VacAttrStatsP stats, track[j].first = i + 1 - dups_cnt; } } + +/* increment the number of values with this number of + * repetitions and the largest number of repetitions */ +f_count[dups_cnt] += 1; +f_max = (f_max dups_cnt) ? 
dups_cnt : f_max; + dups_cnt = 0; } } @@ -2481,6 +2495,7 @@ compute_scalar_stats(VacAttrStatsP stats, double numer, denom, stadistinct; + double adaptdistinct; /* adaptive estimate */ numer = (double) samplerows *(double) d; @@ -2494,6 +2509,20 @@ compute_scalar_stats(VacAttrStatsP stats, if (stadistinct totalrows) stadistinct = totalrows; stats-stadistinct = floor(stadistinct + 0.5); + + /* compute the adaptive estimate */ + adaptdistinct = optimize_estimate(totalrows, samplerows, d, f_count, f_max); + + elog(WARNING, ndistinct estimate current=%.2f adaptive=%.2f, stadistinct, adaptdistinct); + + /* if we've seen 'almost' all rows, use the estimate instead */ + if (samplerows = 0.95 * totalrows) + { +adaptdistinct = (d + d/0.95)/2; +elog(WARNING, corrected ndistinct estimate current=%.2f adaptive=%.2f, + stadistinct, adaptdistinct); + } + } /* @@ -2819,3 +2848,84 @@ compare_mcvs(const void *a, const void *b) return da - db; } + + +/* + * We need to minimize this equality (find m solving it) + * + * m - f1 - f2 = f1 * (A + A(m)) / (B + B(m)) + * + * where A, B are effectively constants (not depending on m), and A(m) + * and B(m) are functions. This is equal to solving + * + * 0 = f1 * (A + A(m)) / (B + B(m)) - (m - f1 - f2) + * + * Instead of looking for the exact solution to this equation (which + * might be fractional), we'll look for a natural number minimizing + * the absolute difference. Number of (distinct) elements is a natural + * number, and we don't mind if the
Re: [PERFORM] Increased shared_buffer setting = lower hit ratio ?
Hi,

On 14.11.2014 00:16, CS DBA wrote: This is on a CentOS 6.5 box running PostgreSQL 9.2 On 11/13/14 4:09 PM, CS DBA wrote: All; We have a large db server with 128GB of ram running complex functions. With the server set to have the following we were seeing a somewhat low hit ratio and lots of temp buffers

  shared_buffers = 18GB
  work_mem = 75MB
  effective_cache_size = 105GB
  checkpoint_segments = 128

when we increased the values to these not only did the hit ratio drop but query times are now longer as well:

  shared_buffers = 28GB
  work_mem = 150MB
  effective_cache_size = 105GB
  checkpoint_segments = 256

This does not seem to make sense to me, anyone have any thoughts on why more memory resources would cause worse performance?

What exactly do you mean by hit ratio - is that the page cache hit ratio (filesystem cache), or the shared buffers hit ratio (measured e.g. using pg_buffercache)?

Regarding the unexpected decrease of performance after increasing shared_buffers - that's actually quite common behavior. First, the management of shared buffers is not free, and the more pieces you need to manage the more expensive it is. Also, by using larger shared buffers you make that memory unusable for the page cache etc. There are also other negative consequences - double buffering, accumulating more changes for a checkpoint etc.

The common wisdom (which some claim to be obsolete) is not to set shared buffers over ~10GB of RAM. It's however very workload-dependent so your mileage may vary.

To get some basic idea of the shared_buffers utilization, it's possible to compute stats using pg_buffercache. Also pg_stat_bgwriter contains useful data.

BTW, it's difficult to say why a query is slow - can you post explain analyze of the query with both shared_buffers settings?

And just to check - what kind of hardware/kernel version is this? Do you have NUMA / transparent huge pages or similar trouble-inducing issues?

Tomas

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
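For reference, this is the kind of pg_buffercache query I mean - essentially the example from the docs, it just needs the extension installed in the database:

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  SELECT c.relname,
         count(*) AS buffers,
         sum(CASE WHEN b.usagecount > 3 THEN 1 ELSE 0 END) AS hot_buffers
    FROM pg_buffercache b
    JOIN pg_class c
      ON b.relfilenode = pg_relation_filenode(c.oid)
     AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                WHERE datname = current_database()))
   GROUP BY c.relname
   ORDER BY 2 DESC
   LIMIT 20;

  SELECT * FROM pg_stat_bgwriter;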
Re: [PERFORM] 9.3 performance issues, lots of bind and parse log entries
On 5.11.2014 20:16, Tory M Blue wrote: Thanks Thomas, On 4.11.2014 21:07, Tory M Blue wrote: Well after fighting this all day and dealing with a really sluggish db where even my slon processes were taking several seconds, I reduced my shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to 2GB. i actually undid all my changes, including dropping my effective_cache back to 7GB and restarted. Have you been using the same parameter values on 9.2, or have you bumped them up only on the new 9.3? I'm wondering whether 9.2 was performing better with the values? Things seem to have been running better on 9.2 at this point I'm using the same config file from 9.2 and I'm still experiencing slowness under heavier write access. And my disk subsystem has not changed. Hardware has not changed, heck i'm even running the old version of slony (have not upgraded it yet). So with shared_buffers=10GB and work_mem=7.5GB you saw significant slowdown both for read and write queries, and after reverting to lower values the read queries are OK but writes still take much longer? But since the upgrade to 9.3 even calls to my sl_log tables which are tiny can take: 2014-11-04 02:58:40 PST clsdb postgres 10.13.200.242(52022) 21642 2014-11-04 02:58:40.515 PSTLOG: duration: 1627.019 ms statement: fetch 500 from LOG; (log had 145K items). I have 300 connections configured, we will use around 87 normally with some spikes, but I'm wondering if the 10GB shared memory caused me some grief, I don't believe it was the work_mem and don't believe it was the effective cache, but something caused my DB to run into issues with basic queries, same queries after restart are finishing in milliseconds instead of 2-3 seconds. No disk issues seen,. I assume only some of the connections will be active (running queries) at the same time. If you expect 32 active queries at the same time, you're only increasing latency. Based on your description I assume you're CPU bound (otherwise the machine would not get hotter, and planning is not about I/O). I'm not sure if this is a production machine or how much you can experiment with it, but it'd be helpful if you could provide some profiling information $ iostat -x -k 1 $ vmstat 1 and such data. A perf profile would be even better, but to get the most useful info it may be necessary to recompile the postgres with debug info and '-fno-omit-frame-pointer'. Then this should do the trick: perf record -a -g (for a few seconds, then Ctrl-C) perf report or just perf top to see what functions are at the top. This is a production server, but it was not really CPU bound with 9.2 so something is odd and I'm starting to stress, because it is a production environment :) Yeah, I was talking about the 9.3 - that's clearly CPU bound. Connections correct, I have less than 20 or so active requests at a time, but i would say active queries are in the handful. I was was not seeing IO, but was seeing load increase as queries started taking longer, but nothing in iostat or vmstat/free showed any contention. Heck even Top while showed some cores as busy, nothing was sitting at over 60% utilized. And we are talking a load of 12-14 here on a 32 core system, when it's normally asleep! Right. That's consistent with being CPU bound. This is my master slon insert server, so I can run commands, tweak configs but any type of rebuild or restart of postgres is a scheduled affair. OK, understood. That however mostly rules out recompiling with debug info and frame pointers, as that might make it significantly slower. 
That's not something you'd like to do on production. These work loads that seem to be creating the issues run between midnight and now almost 6am, prior to 9.3 it was taking maybe 4 hours, now it's taking 6. So tomorrow AM , I'll grab some stats when I see that it's struggling. But even now with almost no connections or really any major access i'm seeing the sl_log grab 500 rows take 1-3 seconds, which is just plain silly (but it's not a constant, so I may see 1 of these alerts every hour) Is that plain SELECT * FROM sl_log or something more complex? When you do explain analyze on the query, what you see? Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 9.3 performance issues, lots of bind and parse log entries
Hi Tory, On 4.11.2014 21:07, Tory M Blue wrote: Well after fighting this all day and dealing with a really sluggish db where even my slon processes were taking several seconds, I reduced my shared_buffers back to 2GB from 10GB and my work_mem from 7.5GB to 2GB. i actually undid all my changes, including dropping my effective_cache back to 7GB and restarted. Have you been using the same parameter values on 9.2, or have you bumped them up only on the new 9.3? I'm wondering whether 9.2 was performing better with the values? I have 300 connections configured, we will use around 87 normally with some spikes, but I'm wondering if the 10GB shared memory caused me some grief, I don't believe it was the work_mem and don't believe it was the effective cache, but something caused my DB to run into issues with basic queries, same queries after restart are finishing in milliseconds instead of 2-3 seconds. No disk issues seen,. I assume only some of the connections will be active (running queries) at the same time. If you expect 32 active queries at the same time, you're only increasing latency. Based on your description I assume you're CPU bound (otherwise the machine would not get hotter, and planning is not about I/O). I'm not sure if this is a production machine or how much you can experiment with it, but it'd be helpful if you could provide some profiling information $ iostat -x -k 1 $ vmstat 1 and such data. A perf profile would be even better, but to get the most useful info it may be necessary to recompile the postgres with debug info and '-fno-omit-frame-pointer'. Then this should do the trick: perf record -a -g (for a few seconds, then Ctrl-C) perf report or just perf top to see what functions are at the top. So if this is not a 9.3 issue, it's an issue with me upping my config params to a level I thought would give a nice bump.. CentOS 6.x Postgres: 9.3.4 256GB Mem 32Core What kernel version are you using? I assume 6.x means 6.5, or are you using an older CentOS version? Are you using transparent huge pages, NUMA or similar features? Althought, that'd probably impact 9.2 too. Also, what package is this? Is it coming from the CentOS repository, yum.postgresql.org or some other repository? regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Incredibly slow restore times after 9.0 -> 9.2 upgrade
On 29.10.2014 16:12, jmcdonagh wrote: Hi Tomas- thank you for your thoughtful response! Tomas Vondra wrote On 28.10.2014 21:55, jmcdonagh wrote: Hi, we have a nightly job that restores current production data to the development databases in a 'warm spare' database so that if the developers need fresh data, it's ready during the day. When we moved from 9.0 to 9.2 suddenly the restores began to take from a few hours to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS volumes, warmed them up, threw IOPS at them, pretty much all the standard stuff to get more disk performance. So, if I understand it correctly, you've been restoring into 9.0, then you switched to 9.2 and it's much slower? Yes- but since the move was done utilizing snapshots so the move involves new volumes, but I have created new volumes since then to rule out a single bad volume. My advice would be to do some basic low-level performance tests to rule this out. Use dd or (better) fio to test basic I/O performance, it's much easier to spot issues that way. Tomas Vondra wrote Is the 9.2 configured equally to 9.0? If you do something like this SELECT name, setting FROM pg_settings WHERE source = 'configuration file'; on both versions, what do you get? I no longer have the 9.0 box up but we do track configuration via puppet and git. The only configuration change made for 9.2 is: -#standard_conforming_strings = off +standard_conforming_strings = off Compared to 9.0, I suppose? Anyway, post the non-default config values at least for 9.2, please. Cause we have an old app that needs this setting on otherwise we'd spend a lot of time trying to fix it. I doubt standard_conforming_strings has anything to do with the issues. Tomas Vondra wrote Here's the thing, the disk isn't saturated. The behavior I'm seeing seems very odd to me; I'm seeing the source disk which holds the dump saturated by reads, which is great, but then I just see nothing being written to the postgres volume. Just nothing happening, then a small burst. There is no write queue backup on the destination disk either. if I look at pg_stat_activity I'll see something like: COPY salesforce_reconciliation (salesforce_id, email, advisor_salesforce_id, processed) FROM stdin and even for small tables, that seems to take a very long time even though the destination disk is almost at 0 utilization. So, where's the bottleneck? Clearly, there's one, so is it a CPU, a disk or something else? Or maybe network, because you're using EBS? What do you mean by 'utilization'? How do you measure that? The bottleneck is I/O somehow. I say somehow, because I see iowait averaging about 50% between two CPUs, but there is just no writes to the destination EBS volume really happening, just reads from the disk where the source dump is located, then bursts of writes to the destination volume every so often. It's kind of puzzling. This is happening on multiple database servers, in multiple availability zones. Driving me bonkers. What I mean by utilization is util% from iostat -m -x 1. I find this rather contradictory. At one moment you say the disk isn't saturated, the next moment you say you're I/O bound. Also, iowait (as reported e.g. by 'top') is tricky to interpret correctly, especially on multi-cpu systems (nice intro to the complexity [1]). It's really difficult to interpret the 50% iowait without more info about what's happening on the machine. 
IMHO, the utilization (as reported by iotop) is much easier to interpret, because it means '% of time the device was servicing requests'. It has issues too, because 100% does not mean 'saturated' (especially on RAID arrays that can service multiple requests in parallel), but it's better than iowait. If I had to guess based on your info, I'd bet you're CPU bound, so there's very little idle time and about 50% of it is spent waiting for I/O requests (hence the 50% iowait). But in total the amount of I/O is very small, so %util is ~0. Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top' and 'vmstat 1' would be handy too. regards Tomas [1] http://veithen.blogspot.cz/2013/11/iowait-linux.html -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Incredibly slow restore times after 9.0 -> 9.2 upgrade
On 28.10.2014 21:55, jmcdonagh wrote: Hi, we have a nightly job that restores current production data to the development databases in a 'warm spare' database so that if the developers need fresh data, it's ready during the day. When we moved from 9.0 to 9.2 suddenly the restores began to take from a few hours to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS volumes, warmed them up, threw IOPS at them, pretty much all the standard stuff to get more disk performance. So, if I understand it correctly, you've been restoring into 9.0, then you switched to 9.2 and it's much slower? Is the 9.2 configured equally to 9.0? If you do something like this SELECT name, setting FROM pg_settings WHERE source = 'configuration file'; on both versions, what do you get? Here's the thing, the disk isn't saturated. The behavior I'm seeing seems very odd to me; I'm seeing the source disk which holds the dump saturated by reads, which is great, but then I just see nothing being written to the postgres volume. Just nothing happening, then a small burst. There is no write queue backup on the destination disk either. if I look at pg_stat_activity I'll see something like: COPY salesforce_reconciliation (salesforce_id, email, advisor_salesforce_id, processed) FROM stdin and even for small tables, that seems to take a very long time even though the destination disk is almost at 0 utilization. So, where's the bottleneck? Clearly, there's one, so is it a CPU, a disk or something else? Or maybe network, because you're using EBS? What do you mean by 'utilization'? How do you measure that? The dumps are created with pg_dump -Fc and restored with pg_restore -d db -j 2 -O -U postgres PostgreSQL-db.sql. OK Is it possible that some default settings were changed from 9.0 to 9.2 that would cause this kind of behavior? I'm stumped here. Thanks in advance for any consideration here. I doubt that. There probably were some changes (after all, we're talking about 2 major versions), but we generally don't change it in a way that'd hurt performance. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
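To see what the restore is actually waiting on, one rough approach is to watch the COPY backends in pg_stat_activity while it runs - a sketch only, using the column names as they are in 9.2 (nothing schema-specific):

    SELECT pid, now() - xact_start AS xact_runtime, waiting,
           left(query, 60) AS query
      FROM pg_stat_activity
     WHERE query LIKE 'COPY%';

If 'waiting' is true the COPY is blocked on a lock; if it's false while the destination disk sits idle, the time is more likely going to CPU or to EBS round trips.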
Re: [PERFORM] Sanity checking big select performance
On 28.10.2014 22:15, Jeff Chen wrote: Hi friends! I'd love to get a sanity check on whether a fat select query I'm doing makes sense given the infrastructure that we have. We have 3 big tables that we typically join together for certain queries: a ~40 million row photos table, a ~20 million row users table, and a ~50 million row photo_to_album table that maps photos to albums.

So how much data is it? Does it fit within RAM (after loading into DB, with all the indexes)?

We like to display real time analytics, which often results in a query like: select (random aggregations) from photo_to_album join photos on photos.id = photo_to_album.photo_id join users on users.id = photos.user_id where photo_to_album.album_id = something and photos.created_at between some dates and other junk. We have indexes on all of the joins, and the where clauses.

Can we get EXPLAIN (and ideally EXPLAIN ANALYZE) for such queries?

One of these queries that should be targeting something like 300K photos takes 38 seconds to run (with an aggregate/nested loop taking effectively all of that time), and then upon second execution with a warm cache, 4 seconds.

Well, if you're hitting disk, it's going to be slow. As you observed, after loading it into page cache, it's much faster.

Also worryingly, it spikes read IOPS to almost 1500/sec during the time and write IOPS 200/sec. When not running the query, steady level read iops is basically nil, write hovers around 50-100. This also increases the queue depth from basically 0 up to 6. Keeping the queue depth high seems to cause timeouts in other queries. The CPU is barely if at all affected, hovering around 20%. Memory also barely affected.

20% is ~2 CPU cores (as you have 8 of them).

We have an RDS Postgres database, m3.2xlarge with 2000 Provisioned IOPS and 400GB storage. This translates to 8 virtual CPUs, 30GiB memory, and all SSD drives.

AFAIK there are two PostgreSQL major versions supported on RDS - 9.1 and 9.3. Which one are you using? Also, can you list values for some basic parameters (shared_buffers, work_mem)? We don't know what the default values on RDS are, nor whether you modified them.

Several questions here: 1) Is that level of IOPS normal?

Um, why wouldn't it be? Each IO request works with 16 KB (on EBS), and you're reading/writing a certain amount of data.

2) Is it bad that the level of iops can queue requests that screw up the whole database even if it's just select queries? Especially when the CPU and Memory are still plentiful?

You're saturating a particular resource. If you hit an I/O wall, you can't use the CPU / memory. The fact that it slows down your queries is somewhat expected. Is it bad? Well, if you need to minimize impact on other queries, then probably yes.

3) What is up with the huge difference between cold and warm cache?

I don't understand why you're surprised by this. With EBS-Optimized networking on m3.2xlarge (i.e. 1 Gbit dedicated to EBS) you get up to ~120 MB/s, except that you set 2000 IOPS, which is ~32 MB/s. Memory is orders of magnitude faster, hence the difference.
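To make that EXPLAIN request concrete, something along these lines is what would help - only a sketch, where the aggregate, album id and date range are placeholders standing in for the real query:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*), avg(p.some_metric)   -- stand-ins for the real aggregates
      FROM photo_to_album pa
      JOIN photos p ON p.id = pa.photo_id
      JOIN users  u ON u.id = p.user_id
     WHERE pa.album_id = 12345
       AND p.created_at BETWEEN '2014-10-01' AND '2014-10-28';

The BUFFERS counters show how many pages were served from shared buffers versus read from the OS, which is exactly the cold vs. warm cache difference discussed above.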
Re: [PERFORM] ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine
On 22 October 2014, 0:25, Montana Low wrote: I'm running postgres-9.3 on a 30GB ec2 xen instance w/ linux kernel 3.16.3. I receive numerous Error: out of memory messages in the log, which are aborting client requests, even though there appears to be 23GB available in the OS cache. There is no swap on the box. Postgres is behind pgbouncer to protect from the 200 real clients, which limits connections to 32, although there are rarely more than 20 active connections, even though postgres max_connections is set very high for historic reasons. There is also a 4GB java process running on the box.

relevant postgresql.conf:
max_connections = 1000 # (change requires restart)
shared_buffers = 7GB # min 128kB
work_mem = 40MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
effective_cache_size = 20GB

sysctl.conf:
vm.swappiness = 0
vm.overcommit_memory = 2

This means you have 'no overcommit', so the amount of memory is limited by overcommit_ratio + swap. The default value for overcommit_ratio is 50% of RAM, and as you have no swap that effectively means only 50% of the RAM is available to the system. If you want to verify this, check /proc/meminfo - see the lines CommitLimit (the current limit) and Committed_AS (committed address space). Once Committed_AS reaches the limit, it's game over.

There are different ways to fix this, or at least improve it:

(1) increasing the overcommit_ratio (clearly, 50% is way too low - something like 90% might be more appropriate on 30GB RAM without swap)

(2) adding swap (say a small ephemeral drive, with swappiness=10 or something like that)

Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
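To put rough numbers on it, using the figures above (the exact commit accounting depends on the kernel, so this is only an approximation):

    CommitLimit = swap + RAM * overcommit_ratio / 100
                = 0 + 30 GB * 50 / 100
                ~ 15 GB

Against that ~15GB ceiling you already commit ~7GB of shared buffers and ~4GB for the java process, and each of the ~20 active backends may commit several work_mem- or maintenance_work_mem-sized allocations while executing. Hitting the limit is therefore quite plausible even though 23GB still sits in the page cache.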
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 17.10.2014 19:25, Greg Stark wrote: On Wed, Oct 15, 2014 at 7:02 PM, Tomas Vondra t...@fuzzy.cz wrote: If you know the title of the article, it's usually available elsewhere on the web - either at the university site, or elsewhere. I found these two articles about block-based sampling: http://ranger.uta.edu/~gdas/websitepages/preprints-papers/p287-chaudhuri.pdf

There are a series of papers with Chaudhuri as lead author which I agree sound like what Josh is talking about. Note that he's Microsoft Research's database group lead and it would be a pretty safe bet anything published from there is going to be covered by patents from here till next Tuesday (and seventeen years beyond).

Hmmm. I have 0 experience with handling patents and related issues. Any idea how to address that?

I think this is all putting the cart before the horse however. If we could fix our current sampling to use the data more efficiently that would be a good start before we start trying to read even more data. We currently read just one row from each block on average instead of using the whole block. That's what would be needed in the worst case if the blocks were a very biased sample (which indeed they probably are in most databases due to the way Postgres handles updates). But we could at least give users the option to use more than one row per block when they know it's ok (such as data that hasn't been updated) or detect when it's ok (such as by carefully thinking about how Postgres's storage mechanism would bias the data).

I think this will be very tricky, and in fact it may easily make the estimates much worse, because all the algorithms assume random sampling. For example the ndistinct estimator uses the low-frequency values (that were observed only once or twice in the sample). By using multiple rows from each block, you'll significantly influence this probability for columns with values correlated to the block, which is quite common. Take for example fact tables in data warehouses - those are usually denormalized, mostly append-only. Say each row has date_id which is a sequential number of a day, with 0 sometime in the past. Daily increments are usually stored on many consecutive blocks, so on each block there's usually a single date_id value. By sampling all rows on a block you gain exactly nothing, and in fact it results in observing no low-frequency values, making the estimator absolutely useless.

I can imagine fixing this (although I don't see how exactly), but the thing is we need to fix *all* the estimators we have, not just ndistinct. And that's going to be tough. I don't think adding a knob to tune the number of tuples sampled per block is a good approach. Either we can solve the issues I described (and in that case it's unnecessary), or we can't solve them and it turns into a massive foot gun.

But I looked into this and ran into a problem. I think our algorithm for calculating the most frequent values list is bunk. The more rows I picked from each block the more biased that list was towards values seen earlier in the table. What's worse, when that list was biased it threw off the histogram since we exclude the most frequent values from the histogram, so all estimates were thrown off.

I think the 'minimal' stats (when we have just '=' for the type) behave like this, but fixing it by switching to a two-pass approach should not be that difficult (but would cost a few more CPU cycles). Or do you suggest that even the scalar MCV algorithm has this bias issue?
I doubt that, because the MCV works with an array sorted by the number of occurrences, so the position within the table is irrelevant.

If we could fix the most frequent values collection to not be biased when it sees values in a clumpy way then I think we would be okay to set the row sample size in Vitter's algorithm to a factor of N larger than the block sample size where N is somewhat smaller than the average number of rows per block. In fact even if we used all the rows in the block I think I've convinced myself that the results would be accurate in most circumstances.

I don't expect fixing the MCV to be overly difficult (although it will need a few more CPU cycles). But making it work with the block sampling will be much harder, because of the bias. The phrase 'in most circumstances' doesn't sound really convincing to me ...

I think to calculate the most frequent values more accurately it would take a two-pass approach. Scan some random sample of blocks with a counting bloom filter, then do a second pass (possibly for the same sample?) keeping counts only for values that the counting bloom filter said hashed to the most common hash values. That might not be exactly the most common values but should be at least a representative sample of the most common values.

I don't see why the counting bloom filter would be necessary in a two-pass approach?

regards Tomas -- Sent via pgsql
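To make the "values correlated with blocks" case concrete, a small synthetic sketch (names are made up). With the current one-row-per-block sampling ANALYZE handles this as usual; the point is that a sampler keeping every row from each sampled block would see essentially a single date_id per block, and hence almost no low-frequency values:

    -- 10M rows, value changes every 10000 rows, i.e. long runs of identical
    -- values stored on consecutive blocks
    CREATE TABLE fact_demo AS
    SELECT (i / 10000)::int AS date_id
      FROM generate_series(1, 10000000) AS s(i);

    ANALYZE fact_demo;

    -- estimated vs. actual number of distinct values
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'fact_demo' AND attname = 'date_id';
    SELECT count(DISTINCT date_id) FROM fact_demo;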
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 15.10.2014 19:20, Josh Berkus wrote: On 10/10/2014 04:16 AM, Greg Stark wrote: On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote: Yes, it's only intractable if you're wedded to the idea of a tiny, fixed-size sample. If we're allowed to sample, say, 1% of the table, we can get a MUCH more accurate n_distinct estimate using multiple algorithms, of which HLL is one. While n_distinct will still have some variance, it'll be over a much smaller range.

I've gone looking for papers on this topic but from what I read this isn't so. To get any noticeable improvement you need to read 10-50% of the table and that's effectively the same as reading the entire table -- and it still had pretty poor results. All the research I could find went into how to analyze the whole table while using a reasonable amount of scratch space and how to do it incrementally.

So, right now our estimation is off on large tables by -10X to -1X. First, the fact that it's *always* low is an indication we're using the wrong algorithm. Second, we can most certainly do better than a median of -1000X.

A few days ago I posted an experimental patch with the adaptive estimator, described in [1]. Not perfect, but based on the testing I did I believe it's a superior algorithm to the one we use now. Would be nice to identify a few datasets where the current estimate is way off.

[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

One interesting set of algorithms is block-based sampling. That is, you read 5% of the physical table in random blocks, reading every row in the block. The block size is determined by your storage block size, so you're not actually reading any more physically than you are logically; it really is just 5% of the table, especially on SSD. Then you apply algorithms which first estimate the correlation of common values in the block (i.e. how likely is it that the table is completely sorted?), and then estimates of how many values there might be total based on the correlation estimate.

I think we might also use a different approach - instead of sampling the data when ANALYZE kicks in, we might collect a requested sample of rows on the fly. Say we want a 1% sample - whenever you insert a new row, you do [random() < 0.01] and if it happens to be true you keep a copy of the row aside. Then, when you need the sample, you simply read the sample and you're done - no random access to the main table, no problems with estimates being off due to block-level sampling, etc. Not sure how to track deletions/updates, though. Maybe rebuilding the sample if the number of deletions exceeds some threshold, but that contradicts the whole idea a bit.

I no longer have my ACM membership, so I can't link this, but researchers were able to get +/- 3X accuracy for a TPCH workload using this approach. A real database would be more variable, of course, but even so we should be able to achieve +/- 50X, which would be an order of magnitude better than we're doing now.

If you know the title of the article, it's usually available elsewhere on the web - either at the university site, or elsewhere. I found these two articles about block-based sampling: http://ranger.uta.edu/~gdas/websitepages/preprints-papers/p287-chaudhuri.pdf https://www.stat.washington.edu/research/reports/1999/tr355.pdf Maybe there are more, but most of the other links were about how Oracle does this in 11g.
Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
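The "collect the sample on the fly" idea can be prototyped today with a plain trigger - a rough sketch, with hypothetical table names, keeping roughly 1% of newly inserted rows and (as noted above) not handling updates or deletes at all:

    CREATE TABLE target_sample (LIKE target);

    CREATE OR REPLACE FUNCTION target_sample_fn() RETURNS trigger AS $$
    BEGIN
        -- keep ~1% of newly inserted rows aside
        IF random() < 0.01 THEN
            INSERT INTO target_sample VALUES (NEW.*);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER target_sample_trg
        AFTER INSERT ON target
        FOR EACH ROW EXECUTE PROCEDURE target_sample_fn();

Reading target_sample is then cheap and sequential; the open questions are exactly the ones raised above - keeping the sample representative under UPDATE/DELETE, and bounding its size.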
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 10 October 2014, 13:16, Greg Stark wrote: On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote: Yes, it's only intractable if you're wedded to the idea of a tiny, fixed-size sample. If we're allowed to sample, say, 1% of the table, we can get a MUCH more accurate n_distinct estimate using multiple algorithms, of which HLL is one. While n_distinct will still have some variance, it'll be over a much smaller range.

I've gone looking for papers on this topic but from what I read this isn't so. To get any noticeable improvement you need to read 10-50% of the table and that's effectively the same as reading the entire table -- and it still had pretty poor results. All the research I could find went into how to analyze the whole table while using a reasonable amount of scratch space and how to do it incrementally.

I think it's really difficult to discuss the estimation without some basic agreement on what the goals are. Naturally, we can't get a perfect estimator with small samples (especially when the sample size is fixed and not scaling with the table). But maybe we can improve the estimates without scanning most of the table?

FWIW I've been playing with the adaptive estimator described in [1] and the results look really interesting, IMHO. So far I was testing it on synthetic datasets outside the database, but I plan to use it instead of our estimator, and do some more tests.

Would be helpful to get a collection of test cases that currently perform poorly. I have collected a few from the archives, but if those who follow this thread can provide additional test cases, or point to threads describing related cases, that'd be great.

It certainly won't be perfect, but if it considerably improves the estimates then I believe it's a step forward. Ultimately, it's impossible to improve the estimates without increasing the sample size.

[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
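For anyone who wants to contribute such a test case, a quick way to quantify how far off the current estimate is for a given column - a generic sketch, with t/col as placeholders and schemas ignored:

    ANALYZE t;

    SELECT s.n_distinct,
           CASE WHEN s.n_distinct < 0
                THEN (-s.n_distinct * c.reltuples)::bigint
                ELSE s.n_distinct::bigint
           END AS estimated_ndistinct,
           (SELECT count(DISTINCT col) FROM t) AS actual_ndistinct
      FROM pg_stats s
      JOIN pg_class c ON c.relname = s.tablename
     WHERE s.tablename = 't' AND s.attname = 'col';

(Negative n_distinct values mean "fraction of the row count", hence the multiplication by reltuples.)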
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 10.10.2014 16:21, Craig James wrote: On Fri, Oct 10, 2014 at 5:10 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: I've gone looking for papers on this topic but from what I read this isn't so. To get any noticeable improvement you need to read 10-50% of the table and that's effectively the same as reading the entire table -- and it still had pretty poor results. All the research I could find went into how to analyze the whole table while using a reasonable amount of scratch space and how to do it incrementally. I think it's really difficult to discuss the estimation without some basic agreement on what are the goals. Naturally, we can't get a perfect estimator with small samples (especially when the sample size is fixed and not scaling with the table). But maybe we can improve the estimates without scanning most of the table? FWIW I've been playing with the adaptive estimator described in [1] and the results looks really interesting, IMHO. So far I was testing it on synthetic datasets outside the database, but I plan to use it instead of our estimator, and do some more tests. We've solved this problem using an external (non-Postgres) dynamically optimizing index. In addition to the early abort, we also require an efficient late start, the equivalent of offset 100 limit 10. It's a common problem for web sites that let users page through data with just a tiny amount of state information (a cookie). Yeah, paging is a known example, both for the inefficiency once you get to pages far away, and because of the planning challenges. I think there are known solutions to this problem (http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way), although those are not applicable to all cases. But I'm not sure how that's related to the ndistinct estimation problem, discussed in this thread (or rather in this subthread)? Our index is for chemical structures. Chemicals are indexed on chemical fragments http://emolecules.com/info/molecular-informatics. A search typically starts with 50-200 indexed columns (chemical fragments). The query is always flat, A and B and ... and Z. The indexed fragments are both correlated (the existence of one strongly raises the chances of another) and anti-correlated (certain combinations are very rare). Maybe I don't understand the problem well enough, but isn't this a perfect match for GIN indexes? I mean, you essentially need to do queries like WHERE substance @@ ('A B !C') etc. Which is exactly what GIN does, because it keeps pointers to tuples for each fragment. Static planners simply can't handle the early abort condition, even with good statistics. Many have pointed out that data are lumpy rather than well distributed. A more subtle problem is that you can have evenly distributed data, but badly distributed correlations. Agnes and Bob may be names that are distributed well in a real-estate database, but it might happen that all of the information about homes whose owners' names are Agnes and Bob occurs at the very end of all of your data because they just got married and bought a house. The end result is that even with perfect statistics on each column, you're still screwed. The combinatorial explosion of possible correlations between indexes is intractable. Static planners clearly have limitations, but we don't have dynamic planning in PostgreSQL, so we have to live with them. And if we could improve the quality of estimates - lowering the probability of poorly performing plans, it's probably good to do that. 
It won't be perfect, but until we have dynamic planning it's better than nothing. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
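For what it's worth, a rough sketch of what the GIN variant could look like if the fragments were stored as an integer array per substance - hypothetical names, and not a claim that it would beat the external index:

    CREATE TABLE substances (
        id        bigserial PRIMARY KEY,
        fragments int[] NOT NULL          -- ids of the indexed chemical fragments
    );

    CREATE INDEX substances_fragments_gin
        ON substances USING gin (fragments);

    -- "A and B and ... and Z" becomes an array containment test
    SELECT id
      FROM substances
     WHERE fragments @> ARRAY[17, 42, 123];  -- the required fragment ids

The flat AND-only queries map directly onto the @> operator, which GIN evaluates from the index; the pagination / early-abort issue discussed above is a separate problem, though.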
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 10.10.2014 14:10, Tomas Vondra wrote: On 10 October 2014, 13:16, Greg Stark wrote: On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote: Yes, it's only intractable if you're wedded to the idea of a tiny, fixed-size sample. If we're allowed to sample, say, 1% of the table, we can get a MUCH more accurate n_distinct estimate using multiple algorithms, of which HLL is one. While n_distinct will still have some variance, it'll be over a much smaller range.

I've gone looking for papers on this topic but from what I read this isn't so. To get any noticeable improvement you need to read 10-50% of the table and that's effectively the same as reading the entire table -- and it still had pretty poor results. All the research I could find went into how to analyze the whole table while using a reasonable amount of scratch space and how to do it incrementally.

I think it's really difficult to discuss the estimation without some basic agreement on what the goals are. Naturally, we can't get a perfect estimator with small samples (especially when the sample size is fixed and not scaling with the table). But maybe we can improve the estimates without scanning most of the table? FWIW I've been playing with the adaptive estimator described in [1] and the results look really interesting, IMHO. So far I was testing it on synthetic datasets outside the database, but I plan to use it instead of our estimator, and do some more tests.

Attached is an experimental patch implementing the adaptive estimator. It was fairly simple (although it's a bit messy). It only computes the estimates for the scalar case (i.e. data types that we can sort). Implementing this for the minimal case is possible, but requires a bit more work. It only computes the estimate and prints a WARNING with both the current and new estimate; the old estimate is what gets stored.

I also attach a few examples of synthetic datasets with distributions stored in various ways, that I used for testing. In all cases there's a single table with 10M rows and a single INT column. There are three kinds of skew:

1) smooth skew
   - N distinct values (100, 10.000 and 100.000 values)
   - average moves to 0 as 'k' increases ('k' between 1 and 9)
   - smooth distribution of frequencies

   INSERT INTO test SELECT pow(random(),k) * 1 FROM generate_series(1,1000);

2) step skew
   - a few very frequent values, many rare values
   - for example this generates 5 very frequent and ~10k rare values

   INSERT INTO test
   SELECT (CASE WHEN (v 9) THEN MOD(v,5) ELSE v END)
     FROM (SELECT (random()*10)::int AS v FROM generate_series(1,1000)) foo;

Results
===

I tested this with various statistics target settings (10, 100, 1000), which translates to different sample sizes.
statistics target 100 (default, 30k rows, 0.3% sample)
======================================================

a) smooth skew, 101 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    101        102
   3    101        102
   5    101        102
   7    101        102
   9    101        102

b) smooth skew, 10.001 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    9986       10542
   3    8902       10883
   5    7579       10824
   7    6639       10188
   9    5947       10013

c) step skew (different numbers of values)

   values    current    adaptive
   ------------------------------
   106       106        107
   106       35         104
   1006      259        1262
   10006     2823       11047

statistics target 10 (3k rows, 0.03% sample)
============================================

a) smooth skew, 101 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    101        102
   3    101        102
   5    101        102
   7    101        102
   9    101        102

b) smooth skew, 10.001 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    9846       10014
   3    4399       7190
   5    2532       5477
   7    1938       4932
   9    1623       1623

c) step skew (different numbers of values)

   values    current    adaptive
   ------------------------------
   106       100        114
   106       5          5
   1006      37         532
   10006     323        20970

statistics target 1000 (300k rows, 3% sample)
=============================================

a) smooth skew, 101 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    101        102
   3    101        102
   5    101        102
   7    101        102
   9    101        102

b) smooth skew, 10.001 values, different skew ('k')

   k    current    adaptive
   -------------------------
   1    10001      10002
   3    1          1
   5    9998       10011
   7    9973       10045
   9    9939       10114

c) step skew (different numbers of values)

   values    current    adaptive
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 10.10.2014 19:59, Craig James wrote: On Fri, Oct 10, 2014 at 9:53 AM, Tomas Vondra t...@fuzzy.cz wrote: On 10.10.2014 16:21, Craig James wrote: Our index is for chemical structures. Chemicals are indexed on chemical fragments (http://emolecules.com/info/molecular-informatics). A search typically starts with 50-200 indexed columns (chemical fragments). The query is always flat, A and B and ... and Z. The indexed fragments are both correlated (the existence of one strongly raises the chances of another) and anti-correlated (certain combinations are very rare).

Maybe I don't understand the problem well enough, but isn't this a perfect match for GIN indexes? I mean, you essentially need to do queries like WHERE substance @@ ('A B !C') etc. Which is exactly what GIN does, because it keeps pointers to tuples for each fragment.

On the day our web site opened we were using tsearch. Before the end of the day we realized it was a bad idea, for the very reasons discussed here. The early-abort/late-start problem (offset N limit M) could take minutes to return the requested page. With the external dynamically-optimized index, we can almost always get answers in less than a couple seconds, often in 0.1 seconds.

In the early days of tsearch, it did not support GIN indexes, and AFAIK GiST is not nearly as fast for such queries. Also, the GIN fastscan implemented by Alexander Korotkov in 9.4 makes a huge difference for queries combining frequent and rare terms. Maybe it'd be interesting to try this on 9.4. I'm not saying it will make it faster than the optimized index, but it might be an interesting comparison.

Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 3.10.2014 21:58, Jeff Janes wrote: On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote: Yes, it's only intractable if you're wedded to the idea of a tiny, fixed-size sample. If we're allowed to sample, say, 1% of the table, we can get a MUCH more accurate n_distinct estimate using multiple algorithms, of which HLL is one. While n_distinct will still have some variance, it'll be over a much smaller range.

In my hands, the problems with poor n_distinct were not due to the insufficient size of the sample, but the insufficient randomness of it. Increasing default_statistics_target did help but not because it increases the number of rows sampled, but rather because it increases the number of blocks sampled. Once substantially all of the blocks are part of the block sampling, the bias is eliminated even though it was still sampling a small fraction of the rows (roughly one per block).

I don't think that's entirely accurate. According to [1], there's a lower bound on the ratio error, depending on the number of sampled rows. Say there's a table with 10M rows and we sample 30k rows (which is the default). Then with probability 5% we'll get a ratio error over 20 - that is, we may estimate as little as 5% or as much as 20x of the actual ndistinct value. Combined with our arbitrary 10% limit that we use to decide whether ndistinct scales with the number of rows, this sometimes explodes. By increasing the statistics target, you get a much larger sample and thus a lower probability of such error. But nevertheless, it breaks from time to time, and the fact that the statistics target is static (and not scaling with the size of the table to get an appropriate sample size) is not really helping IMHO. A static sample size may work for histograms, for ndistinct not so much.

[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf

So one idea would be to get rid of the 2-stage sampling algorithm (sample blocks, sample rows from the chosen blocks) and just read the whole table and sample rows from it unbiased, at least under some conditions.

Some low level benchmarking on my favorite server showed that reading 1% of a system's blocks (in block number order within each file) was no faster than reading all of them from an IO perspective. But that is a virtualized server that wasn't really specced out to be an IO intensive database server in the first place. It would be interesting to see what people get on real hardware that they actually designed for the task.

I think there was a discussion about the sampling on pgsql-hackers a while ago ... and yes, here it is [2]. However it seems there was no clear conclusion on how to change it at that time ...

[2] http://www.postgresql.org/message-id/ca+tgmozaqygsual2v+yfvsx06dqdqh-pev0nobgpws-dnwa...@mail.gmail.com

regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
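As a practical aside, the sample can be enlarged for just the problematic column instead of globally - a generic sketch with placeholder names (ANALYZE samples roughly 300 x statistics_target rows, so 1000 corresponds to the ~300k row sample mentioned earlier in this thread):

    ALTER TABLE t ALTER COLUMN col SET STATISTICS 1000;
    ANALYZE t;

    SELECT n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 't' AND attname = 'col';

This only grows the sample, of course - it does nothing about the bias issues discussed above.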
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
On 3.10.2014 02:54, Peter Geoghegan wrote: On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote: Yes, it's only intractable if you're wedded to the idea of a tiny, fixed-size sample. If we're allowed to sample, say, 1% of the table, we can get a MUCH more accurate n_distinct estimate using multiple algorithms, of which HLL is one. While n_distinct will still have some variance, it'll be over a much smaller range.

I think that HyperLogLog, as a streaming algorithm, will always require that the entire set be streamed. This doesn't need to be a big deal - in the case of my abbreviated key patch, it appears to basically be free because of the fact that we were streaming everything anyway. It's a very cool algorithm, with fixed overhead and constant memory usage. It makes very useful guarantees around accuracy.

I think you're mixing two things here - estimating the number of distinct values in a sample (which can be done very efficiently using HLL) and estimating the number of distinct values in the whole table. For that, HLL is not usable unless you process all the data.

Sadly HLL is rather incompatible with the usual estimators, because the ones I'm aware of need to know the number of occurrences for the distinct values etc.

But couldn't we just piggyback this on autovacuum? One of the nice HLL features is that it's additive - you can build partial counters for ranges of blocks (say, a few MBs per range), and then merge them when needed. By keeping the parts it's possible to rebuild them separately. But maybe this idea is way too crazy ...

regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 60 core performance with 9.3
On 30 July 2014, 3:44, Mark Kirkwood wrote: While these numbers look great in the middle range (12-96 clients), the benefit looks to be tailing off as client numbers increase. Also running with no stats (and hence no auto vacuum or analyze) is way too scary!

I assume you've disabled the statistics collector, which has nothing to do with vacuum or analyze. There are two kinds of statistics in PostgreSQL - data distribution statistics (which are collected by ANALYZE and stored in actual tables within the database) and runtime statistics (which are collected by the stats collector and stored in a file somewhere on the disk).

By disabling the statistics collector you lose the runtime counters - number of sequential/index scans on a table, tuples read from a relation, etc. But it does not influence VACUUM or planning at all. Also, it's mostly async (send over UDP and you're done) and shouldn't make much difference unless you have a large number of objects. There are ways to improve this (e.g. by placing the stat files into a tmpfs).

Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
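A minimal sketch of the tmpfs approach (the mount point below is just an example - the only PostgreSQL-side piece is the stats_temp_directory setting):

    SHOW stats_temp_directory;  -- default is 'pg_stat_tmp' inside the data directory

Pointing stats_temp_directory in postgresql.conf at a small tmpfs mount (say, /var/run/pg_stat_tmp) and reloading moves those frequently rewritten files off the data disk; on 9.3 the permanent copy is still written under pg_stat at shutdown, so nothing is lost across restarts.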