[HACKERS] master and sync-replica diverging
AMD FX 8120 / centos 6.2 / latest source (git head)

It seems to be quite easy to force a 'sync' replica to not be equal to the master by recreating+loading a table in a while loop.

For this test I compiled+checked+installed three separate instances on the same machine. The replica application_names are 'wal_receiver_$copy' where $copy is 01, resp. 02.

$ ./sync_state.sh
  pid  | application_name |   state   | sync_state
-------+------------------+-----------+------------
 19520 | wal_receiver_01  | streaming | sync
 19567 | wal_receiver_02  | streaming | async
(2 rows)

 port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
 6564 | on                 | wal_receiver_01
(1 row)

 port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
 6565 | off                |
(1 row)

 port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------
 6566 | off                |
(1 row)

The test consists of creating a table, loading tab-separated data from file with COPY, and then taking the rowcount of that table (13 MB, almost 200k rows) in all three instances:

# wget http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz

slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz

zcat $slurp_file \
  | grep -v '^#' \
  | grep -Ev '^[[:space:]]*$' \
  | psql -c "
    drop table if exists $table cascade;
    create table $table (
       gene_symbol              text
      ,primary_fbgn             text
      ,secondary_fbgns          text
      ,annotation_id            text
      ,secondary_annotation_ids text
    );
    copy $table from stdin csv delimiter E'\t';
  ";

# count on master:
echo "select current_setting('port') port, count(*) from $table" | psql -qtXp 6564

# count on wal_receiver_01 (sync replica):
echo "select current_setting('port') port, count(*) from $table" | psql -qtXp 6565

# count on wal_receiver_02 (async replica):
echo "select current_setting('port') port, count(*) from $table" | psql -qtXp 6566

I expected the rowcounts from master and sync replica to always be the same.

Initially this seemed to be the case, but when I run the above sequence in a while loop for a few minutes, about 10% of the rowcounts from the sync replica are not equal to the master's.

Perhaps not a likely scenario, but surely such a deviating rowcount on a sync replica should not be possible?

thank you,

Erik Rijkers
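A minimal sketch of such a driver loop, assuming the load step above has been saved as load_table.sh (a hypothetical name) and that $table is set in the environment:

#!/bin/sh
# Sketch: repeat the recreate+COPY+count cycle and flag any divergence.
# Ports as in the setup above: 6564 = master, 6565 = sync replica.
for i in $(seq 1 100); do
  ./load_table.sh >/dev/null
  m=$(echo "select count(*) from $table" | psql -qtXp 6564)
  s=$(echo "select count(*) from $table" | psql -qtXp 6565)
  [ "$m" != "$s" ] && echo "run $i: master=$m sync_replica=$s DIVERGED"
done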
Re: [HACKERS] Why is indexonlyscan so darned slow?
On Thu, May 17, 2012 at 6:08 AM, Joshua Berkus wrote:
> As you can see, the indexonlyscan version of the query spends 5% as much time
> reading the data as the seq scan version, and doesn't have to read the heap
> at all. Yet it spends 20 seconds doing ... what, exactly?
>
> BTW, kudos on the new explain analyze reporting ... works great!

Looks like timing overhead. Timing is called twice per tuple, which gives around 950ns per timing call for your index-only result. This is around what is expected of hpet-based timing. If you are on Linux you can check what clocksource you are using by running

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

You can verify that it is due to timing overhead by adding "timing off" to the explain clause. Or use the pg_test_timing utility to check the timing overhead on your system. With hpet-based timing I'm seeing 660ns timing overhead and 26.5s execution for your query; with timing off, execution time falls to 2.1s. For reference, tsc-based timing gives 19.2ns overhead and 2.3s execution time with timing.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
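For example (a sketch; t stands in for the queried table, and TIMING is the 9.2 EXPLAIN option referred to above):

-- Per-node timing off: row counts are still reported, but no per-tuple
-- clock reads are made.
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM t;

-- Default: timing on, i.e. two clock reads per tuple returned.
EXPLAIN (ANALYZE) SELECT count(*) FROM t;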
Re: [HACKERS] Why is indexonlyscan so darned slow?
Ants,

Well, that's somewhat better, but again hardly the gain in performance I'd expect to see ... especially since these are ideal circumstances for index-only scan.

bench2=# select count(*) from pgbench_accounts;
 count
-------
  2000
(1 row)

Time: 3827.508 ms

bench2=# set enable_indexonlyscan=off;
SET
Time: 0.241 ms

bench2=# select count(*) from pgbench_accounts;
 count
-------
  2000
(1 row)

Time: 16012.444 ms

For some reason counting tuples in an index takes 5X as long (per tuple) as counting them in a table. Why?

- Original Message -
> Looks like timing overhead. Timing is called twice per tuple which
> gives around 950ns per timing call for your index only result.
> [...]
> With hpet based timing I'm seeing 660ns timing overhead and 26.5s
> execution for your query, with timing off execution time falls to
> 2.1s. For reference, tsc based timing gives 19.2ns overhead and 2.3s
> execution time with timing.
>
> Ants Aasma
Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master
On Wed, May 16, 2012 at 11:38 PM, Alvaro Herrera wrote:
> Well, that is not surprising in itself -- InitTempTableNamespace calls
> RemoveTempRelations to cleanup from a possibly crashed previous backend
> with the same ID. So that part of the backtrace looks normal to me
> (unless there is something weird going on, which might very well be the
> case).

Right, I guess the stack trace is okay, but some state was obviously wrong. I was able to clean that up now by some catalog hacking, but I'm definitely going to dump and reload soon.

I found out that it was certain backend ids which couldn't create temporary tables: when I did a "create temp table" in these few backend ids (about 4-5, all with low id numbers, which is why I hit them quite often), it would give me this "could not open relation with OID x" error. I also couldn't drop the temp schema in these backends:

# drop schema pg_temp_4;
ERROR: cache lookup failed for relation 1990987636

# select oid, * from pg_namespace;
(got oid 4664506 for "pg_temp_4")

# select * from pg_class where oid = 1990987636;
(no rows returned)

# delete from pg_namespace where oid = 4664506;
DELETE 1

# create temp table myfoo(a int);
CREATE TABLE

Later on I also found some leftover pg_type entries from temporary tables that no longer exist. I'm quite certain I shouldn't see these anymore... And I also found a few entries in pg_class with relistemp='t' whose oid is considerably older than anything recent.

This kinda suggests that there might be something weird going on when you have temp tables in flight and fail over; at least, that's the only explanation I have for how this could have happened.
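Queries along these lines can surface such leftovers (a sketch against the pre-9.1 catalogs, where pg_class still has relistemp, as in the report above):

-- Temp relations whose namespace row no longer exists.
SELECT c.oid, c.relname, c.relnamespace
FROM pg_class c
WHERE c.relistemp
  AND NOT EXISTS (SELECT 1 FROM pg_namespace n WHERE n.oid = c.relnamespace);

-- pg_type rows still pointing at relations that are gone.
SELECT t.oid, t.typname, t.typrelid
FROM pg_type t
WHERE t.typrelid <> 0
  AND NOT EXISTS (SELECT 1 FROM pg_class c WHERE c.oid = t.typrelid);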
Re: [HACKERS] counting pallocs
On Thu, May 17, 2012 at 2:28 AM, Heikki Linnakangas wrote:
> What percentage of total CPU usage is the palloc() overhead in these tests?
> If we could totally eliminate the palloc() overhead, how much faster would
> the test run?

AllocSetAlloc is often the top CPU consumer in profiling results, but it's typically only in the single-digit percentages. However, there's also some distributed overhead that's more difficult to measure. For example, the fact that OpExpr uses a List instead of directly pointing to its arguments costs us three pallocs - plus three more if we ever copy it - but it also means that accessing the first element of an OpExpr requires three pointer dereferences instead of one, and accessing the second one requires four pointer dereferences instead of one. There's no real way to isolate the overhead of that, but it's got to cost at least something.

The reality - I'm not sure whether it's a happy reality or a sad reality - is that most CPU profiles of PostgreSQL are pretty flat. The nails that stick up have, for the most part, long since been pounded down. If we want to make further improvements to our parse and plan time, and I do, because I think we lag our competitors, then I think this is the kind of stuff we need to look at.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
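To illustrate the dereference counts, a simplified sketch (these are not the real PostgreSQL struct definitions; field and type names are made up):

typedef struct ListCell { void *ptr_value; struct ListCell *next; } ListCell;
typedef struct List     { int length; ListCell *head; } List;

typedef struct OpExprList   { int opno; List *args; } OpExprList;  /* today */
typedef struct OpExprDirect { int opno; void *arg1; void *arg2; } OpExprDirect;

/* List-based: e->args->head->ptr_value is three dereferences, and the List
 * header plus two cells were three separate pallocs. */
static void *first_arg_list(OpExprList *e)  { return e->args->head->ptr_value; }

/* Four dereferences for the second argument. */
static void *second_arg_list(OpExprList *e) { return e->args->head->next->ptr_value; }

/* Direct: one dereference each, no extra allocations. */
static void *first_arg_direct(OpExprDirect *e)  { return e->arg1; }
static void *second_arg_direct(OpExprDirect *e) { return e->arg2; }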
Re: [HACKERS] master and sync-replica diverging
Erik,

Are you taking the counts *while* the table is loading? In sync replication, it's possible for the counts to differ for a short time due to one of three things:

* the transaction has been saved to the replica, but the confirm message hasn't reached the master yet
* the replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet applied it to the tables in memory (see the sketch after this message)
* the master is being updated with synchronous_commit = local

- Original Message -
> AMD FX 8120 / centos 6.2 / latest source (git head)
>
> It seems to be quite easy to force a 'sync' replica to not be equal
> to master by recreating+loading a table in a while loop.
> [...]
> Perhaps not a likely scenario, but surely such a deviating rowcount
> on a sync replica should not be possible?
>
> Erik Rijkers
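For the second case, the standby's monitoring functions make the receive/replay gap visible directly (a sketch; these functions exist under these names in 9.1/9.2):

-- Run on the standby: "received" is what has been synced to the standby's
-- WAL, "replayed" is what is actually visible to queries. A gap between
-- the two is the second case above.
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed;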
Re: [HACKERS] Strange issues with 9.2 pg_basebackup & replication
Jim, Fujii,

Even more fun:

1) Set up a server as a cascading replica (e.g. max_wal_senders = 3, standby_mode = on).

2) Connect the server to *itself* as a replica (see the sketch after this message).

3) This will work and report success, up until you do your first write.

4) Then ... segfault!

- Original Message -
> On 5/16/12 10:53 AM, Fujii Masao wrote:
> > On Wed, May 16, 2012 at 3:43 AM, Joshua Berkus wrote:
> >>
> >>> Before restarting it, you need to do pg_basebackup and make a base
> >>> backup onto the standby again. Since you started the standby without
> >>> recovery.conf, a series of WAL in the standby has gotten inconsistent
> >>> with that in the master. So you need a fresh backup to restart the
> >>> standby.
> >>
> >> You're not understanding the bug. The problem is that the standby
> >> came up and reported that it was replicating OK, when clearly it
> >> wasn't.
> >
> >> 8. Got this fatal error on the standby server:
> >>
> >> LOG: record with incorrect prev-link 0/7B8 at 0/7E0
> >> LOG: record with incorrect prev-link 0/7B8 at 0/7E0
> >>
> >> ... this error message repeated every 5s.
> >
> > According to your first report, ISTM you got error messages.
>
> Only *after* it was correctly setup.
>
> Josh's point is that if you flub the configuration, you should get an
> error, which is not what's happening now. Right now it just comes up
> and acts as if nothing's wrong.
> --
> Jim C. Nasby, Database Architect
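For concreteness, step 2 amounts to something like this (a sketch; the data directory path and port number are made up):

# The standby's recovery.conf points back at the standby's own port.
cat >> /data/standby/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5433'
EOF
pg_ctl -D /data/standby restart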
Re: [HACKERS] counting pallocs
Robert Haas writes:
> One piece of reasonably low-hanging fruit appears to be OpExpr. It
> seems like it would be better all around to put Node *arg1 and Node
> *arg2 in there instead of a list... aside from saving pallocs, it
> seems like it would generally simplify the code.

Obviously, Stephen Frost's list-allocation patch would affect your results here ... but I wonder how much the above change would affect *his* results. Specifically, the observation that most lists are 1 or 2 elements long would presumably become less true, but I wonder by how much exactly.

regards, tom lane
Re: [HACKERS] read() returns ERANGE in Mac OS X
On May 16, 2012, at 15:51, Tom Lane wrote:
> Alvaro Herrera writes:
>> We just came across a situation where a corrupted HFS+ filesystem
>> appears to return ERANGE on a customer machine. Our first reaction was
>> to turn zero_damaged_pages on to allow taking a pg_dump backup of the
>> database, but surprisingly this does not work. A quick glance at the
>> code shows the reason:
>> ...
>> Note that zero_damaged_pages only enters the picture if it's a short
>> read, not if the read actually fails completely.
>
>> Is this by design, or is this just an oversight?
>
> It is by design, in that the only contemplated case was truncated-away
> pages. I'm pretty hesitant to consider allowing arbitrary kernel errors
> to be ignored here …

Maybe we should have zero_missing_pages, which would only zero on short reads, and zero_damaged_pages, which would zero on all IO errors? Or we could have zero_damaged_pages zero only if read() reports EIO, and then add any platform-specific additional error codes as we learn about them. ERANGE on darwin would be the first such addition.

In any case, it seems to me that at least EIO should trigger zeroing, since that is presumably what you'd get on a filesystem with integrated checksums like ZFS.

best regards,
Florian Pflug
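A sketch of the proposed branching, for illustration only (this is not the actual md.c code, and the error-code list is exactly the open question):

#include <errno.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Sketch: which read() outcomes page-zeroing might cover. */
static int
read_block(int fd, char *buffer, int zero_damaged_pages)
{
    ssize_t nbytes = read(fd, buffer, BLCKSZ);

    if (nbytes == BLCKSZ)
        return 0;                      /* normal case */

    if (nbytes >= 0 && zero_damaged_pages)
    {
        memset(buffer, 0, BLCKSZ);     /* short read: the only case zeroed today */
        return 0;
    }

    if (nbytes < 0 && zero_damaged_pages &&
        (errno == EIO || errno == ERANGE))  /* proposed: hard IO errors too */
    {
        memset(buffer, 0, BLCKSZ);
        return 0;
    }

    return -1;                         /* report the error as before */
}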
Re: [HACKERS] Strange issues with 9.2 pg_basebackup & replication
On Thu, May 17, 2012 at 3:42 PM, Joshua Berkus wrote:
> Even more fun:
>
> 1) Set up a server as a cascading replica (e.g. max_wal_senders = 3,
> standby_mode = on )
>
> 2) Connect the server to *itself* as a replica.
>
> 3) This will work and report success, up until you do your first write.
>
> 4) Then ... segfault!

I cannot reproduce this. Attached is the script that I use for cascade replication testing. With it I can see the replica connecting to itself, but no segfault.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

[Attachment: test_cascading.sh, a Bourne shell script]
Re: [HACKERS] master and sync-replica diverging
On Thu, May 17, 2012 14:32, Joshua Berkus wrote:
> Erik,
>
> Are you taking the counts *while* the table is loading? In sync replication,
> it's possible for the counts to differ for a short time due to one of three
> things:
>
> * transaction has been saved to the replica and confirm message hasn't
> reached the master yet

The count(*) was done in the way that I showed, i.e. *after* psql had exited. My understanding is that, with synchronous replication 'on' and configured properly, psql could only return *after* the sync replica had the data safely on disk. Either my understanding is not correct or there is a bug in postgres sync-rep.

> * replica has synched the transaction to the WAL log, but due to wal_delay
> settings hasn't yet applied it to the tables in memory.

Settings are untouched.

> * updating the master with synchronous_commit = local.

During my test: synchronous_commit = on

Thanks,

Erik Rijkers
Re: [HACKERS] psql bug
Will investigate that.

Tom Lane wrote:
> Teodor Sigaev writes:
>> After editing query with external editor psql exits on Ctrl-C:
>
> FWIW, I failed to reproduce that on any of my machines. Maybe your
> editor is leaving the tty in a funny state?
>
> regards, tom lane

--
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
Re: [HACKERS] master and sync-replica diverging
On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers wrote:
> The count(*) was done in the way that I showed, i.e. *after* psql had
> exited. My understanding is that, with synchronous replication 'on' and
> configured properly, psql could only return *after* the sync-replica had
> the data safely on disk. Either my understanding is not correct or there
> is a bug in postgres sync-rep.

Commit can only return when the sync replica has the data safely on disk, but this doesn't mean that it's visible yet.

The sequence of events, in dot notation:

commit_command -> master_wal_sync -> replica_wal_sync -> master_commit_visible -> commit_response
replica_wal_sync -> replica_replay_wal -> replica_commit_visible

If you issue a select on the replica after getting a commit response from the master, you can see that the query getting a snapshot races with replay of the commit record.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
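One way to make Erik's test deterministic is to wait for replay to catch up before counting. A sketch, with ports as in his setup (note that comparing WAL locations as strings is a simplification; 9.2's pg_xlog_location_diff() would do this properly):

# After the COPY commits on the master (6564), wait until the sync
# replica (6565) has replayed up to the master's current WAL position.
target=$(echo "select pg_current_xlog_location()" | psql -qtXp 6564)
while :; do
  replayed=$(echo "select pg_last_xlog_replay_location()" | psql -qtXp 6565)
  [ "$replayed" = "$target" ] || [ "$replayed" \> "$target" ] && break
  sleep 0.1
done
echo "select count(*) from $table" | psql -qtXp 6565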
Re: [HACKERS] master and sync-replica diverging
On Thu, May 17, 2012 16:10, Ants Aasma wrote:
> Commit can only return when the sync replica has the data safely on disk,
> but this doesn't mean that it's visible yet.
> [...]
> If you issue a select on the replica after getting a commit response
> from the master, you can see that the query getting a snapshot races
> with replay of the commit record.

Ah yes, that makes sense. I hadn't thought of that. Thank you for that explanation.
Re: [HACKERS] Missing optimization when filters are applied after window functions
Hitoshi Harada schrieb:
> On Wed, May 16, 2012 at 12:50 AM, Volker Grabsch wrote:
> > I propose the following general optimization: If all window
> > functions are partitioned by the same first field (here: id),
> > then any filter on that field should be executed before
> > WindowAgg. So a query like this:
>
> I think that's possible. Currently the planner doesn't think any
> qualification from the upper query can be pushed down to a query that
> has a window function. It would be possible to let it push down if
> the expression matches the PARTITION BY expression.

Sounds great!

> However, the
> challenge is that a query may have a number of window functions that
> have different PARTITION BY expressions. At the time of pushing down
> in the planning, it is not obvious which window function comes first.

I don't really understand what you mean by "which window function comes first", because to my understanding, all window functions of a query belong to the same level in the query hierarchy. But then, my knowledge of PostgreSQL internals isn't very deep, either.

> One idea is to restrict such optimization to only the case of a single
> window function; the other is to generalize it and cover most cases.

From a practical point of view, the restriction to a single window function wouldn't be that bad, although I'd prefer to think about the number of different windows rather than the number of window functions. In other words, every optimization that is correct for a single window function is also correct for multiple window functions if those all use the same window.

> That said, our planner on window functions has a lot of improvement to
> be done. Every kind of optimization I see is what I raised above;
> they can be done easily by hacking in a small case, or they can be
> done by generalizing for most cases. My understanding is our
> project tends to like the latter; it takes a little more time but
> covers more use cases.

I'd also prefer to see a general solution, as this provides less room for unpleasant surprises (e.g. "This query is only slightly different from the previous one. Why does it take so much longer?"). On the other hand, any small improvement is a big step forward regarding window functions.

Unfortunately, I can't volunteer on that, as it is currently impossible for me to allocate enough time for this. However, any pointer to where to look in the source (or in the manual) would be great. Maybe I'll find at least enough time to provide a rough proposal, or to improve existing attempts to solve this issue.

Also, is there any chance to include a (simple) attempt at such an optimization into the PostgreSQL 9.2 beta, or is this only a possible topic for 9.3 and later?

Regards,
Volker

--
Volker Grabsch
---<<(())>>---
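To recap the proposal with a concrete example (table and column names are made up for illustration):

-- Today, the outer filter runs only after WindowAgg has processed every row:
SELECT *
FROM (SELECT id,
             sum(amount) OVER (PARTITION BY id) AS id_total
      FROM payments) AS sub
WHERE id = 42;

-- Because every window is partitioned by id, filtering on id first is
-- semantically equivalent; this is the rewrite one does by hand today,
-- and what the proposed optimization would do automatically:
SELECT id,
       sum(amount) OVER (PARTITION BY id) AS id_total
FROM payments
WHERE id = 42;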
Re: [HACKERS] Pre-alloc ListCell's optimization
* Robert Haas (robertmh...@gmail.com) wrote:
> So I guess the first question here is - does it improve performance?
>
> Because if it does, then it's worth pursuing ... if not, that's the
> first thing to fix.

Alright, so I've done some pgbench runs using an all-default config with just a straight-up './configure' and pgbench -S -T 300, 3 runs each, then averaged:

llist_opt: 9289 tps
HEAD:      9286 tps

I realize we see tons of palloc() calls happening, but now I'm wondering if they really contribute all that much time, overall. Also, I'm wondering if all the benefit from removing the palloc()s is being sucked up by the regression in list_concat(). A few folks have mentioned just going whole-hog and doing all list allocations in blocks of 8, which would actually allow us to go back to an O(1) list_concat, though we wouldn't be able to free the 2nd list passed to list_concat in that case, which may or may not be acceptable, based on how necessary those couple of pfree()s we had previously are.

Thanks,
Stephen
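A sketch of the block-allocation idea under discussion (illustrative only; these are not the actual pg_list.h structures or the patch itself):

#include <stdlib.h>

#define CELLS_PER_BLOCK 8

typedef struct CellBlock
{
    void             *cells[CELLS_PER_BLOCK];
    int               nused;
    struct CellBlock *next;
} CellBlock;

typedef struct BlockList
{
    int        length;
    CellBlock *head;
    CellBlock *tail;
} BlockList;

/* Appends only allocate once per 8 cells. */
static void
block_list_append(BlockList *list, void *datum)
{
    if (list->tail == NULL || list->tail->nused == CELLS_PER_BLOCK)
    {
        CellBlock *blk = calloc(1, sizeof(CellBlock));  /* palloc0 in PG */

        if (list->tail)
            list->tail->next = blk;
        else
            list->head = blk;
        list->tail = blk;
    }
    list->tail->cells[list->tail->nused++] = datum;
    list->length++;
}

/* O(1) concat: splice list2's chain onto list1. The downside raised in the
 * thread: list2's blocks are now shared, so they cannot be pfree'd. */
static void
block_list_concat(BlockList *list1, BlockList *list2)
{
    if (list2->head == NULL)
        return;
    if (list1->tail)
        list1->tail->next = list2->head;
    else
        list1->head = list2->head;
    list1->tail = list2->tail;
    list1->length += list2->length;
}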
Re: [HACKERS] Why is indexonlyscan so darned slow?
On Thu, May 17, 2012 at 5:22 AM, Joshua Berkus wrote:
> Ants,
>
> Well, that's somewhat better, but again hardly the gain in performance
> I'd expect to see ... especially since these are ideal circumstances
> for index-only scan.
> [...]
> For some reason counting tuples in an index takes 5X as long (per
> tuple) as counting them in a table. Why?

It looks like the IOS is taking 4x less time, not more time.

Anyway, the IOS follows the index's logical structure, not the physical structure, so if the index is not in RAM it will really be hurt by the lack of sequential reads.

Cheers,
Jeff
Re: [HACKERS] Strange issues with 9.2 pg_basebackup & replication
On Thu, May 17, 2012 at 12:01 PM, Joshua Berkus wrote:
>> > And: if we still have to ship logs, what's the point in even having
>> > cascading replication?
>>
>> At least cascading replication (1) allows you to adopt more flexible
>> configuration of servers,
>
> I'm just pretty shocked. The last time we talked about this, at the end of
> the 9.1 development cycle, you almost had remastering using streaming-only
> replication working, you just ran out of time. Now it appears that you've
> abandoned working on that completely. What's going on?

You mean that "remastering" is, after promoting one of the standby servers, making the remaining standbys reconnect to the new master and resolve the timeline gap without a shared archive? Yep, that's one of my TODO items, but I'm not sure I'll have enough time to implement that for 9.3.

Regards,

--
Fujii Masao
Re: [HACKERS] Strange issues with 9.2 pg_basebackup & replication
On Thu, May 17, 2012 at 10:42 PM, Ants Aasma wrote:
> On Thu, May 17, 2012 at 3:42 PM, Joshua Berkus wrote:
>> Even more fun:
>>
>> 1) Set up a server as a cascading replica (e.g. max_wal_senders = 3,
>> standby_mode = on )
>>
>> 2) Connect the server to *itself* as a replica.
>>
>> 3) This will work and report success, up until you do your first write.
>>
>> 4) Then ... segfault!
>
> I cannot reproduce this.

Me neither.

Josh, could you show me the more detailed procedure to reproduce the problem?

Regards,

--
Fujii Masao
Re: [HACKERS] Missing optimization when filters are applied after window functions
2012/5/17 Volker Grabsch:
> Also, is there any chance to include a (simple) attempt at
> such an optimization into the PostgreSQL 9.2 beta, or is this
> only a possible topic for 9.3 and later?

For 9.2, you're about 4 months late :-). The last commitfest was in January:
https://commitfest.postgresql.org/action/commitfest_view?id=13

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: [HACKERS] Why is indexonlyscan so darned slow?
Jeff,

That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better. And I meant 5X per byte rather than 5X per tuple.

I talked this over with Haas, and his opinion is that we have a LOT of overhead in the way we traverse indexes, especially lookups which happen once per leaf node instead of in bulk. Certainly the performance I'm seeing would be consistent with that idea.

I'll try some multi-column covering indexes next to see how it looks.

- Original Message -
> It looks like the IOS is taking 4x less time, not more time.
>
> Anyway, the IOS follows the index logical structure, not the physical
> structure, so if the index is not in RAM it will really be hurt by the
> lack of sequential reads.
>
> Cheers,
>
> Jeff
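For reference, a "covering" index here is just a multi-column index whose columns satisfy the whole query; a sketch against the standard pgbench schema:

-- An index covering a balance lookup, so an index-only scan can answer it
-- without touching the heap (assuming the visibility map is current,
-- e.g. right after a VACUUM).
CREATE INDEX pgbench_accounts_aid_abalance
    ON pgbench_accounts (aid, abalance);

VACUUM pgbench_accounts;

EXPLAIN (ANALYZE)
SELECT aid, abalance FROM pgbench_accounts WHERE aid BETWEEN 1 AND 100000;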
Re: [HACKERS] Why is indexonlyscan so darned slow?
On Thu, May 17, 2012 at 11:35 AM, Joshua Berkus wrote:
> Jeff,
>
> That's in-RAM speed ... I ran the query twice to make sure the index was
> cached, and it didn't get any better. And I meant 5X per byte rather than
> 5X per tuple.

Ah, OK, that makes more sense.

I played around with this, specifically count(*), quite a bit when IOS first came out, and I attributed a large part of the time to the code that forms a tuple out of raw bytes, and the code that advances the aggregate. The first one is probably more a per-tuple cost than per-byte, and the second definitely is a per-tuple cost.

I can't find my detailed notes from this work, so this is just from memory.

Cheers,
Jeff
Re: [HACKERS] Strange issues with 9.2 pg_basebackup & replication
Yeah, I don't know how I produced the crash in the first place, because of course the self-replica should block all writes, and retesting it I can't get it to accept a write.

So the bug is just that you can connect a server to itself as its own replica. Since I can't think of any good reason to do this, we should simply error out on startup if someone sets things up that way. How can we detect that we've connected streaming replication to the same server?

- Original Message -
> On Thu, May 17, 2012 at 10:42 PM, Ants Aasma wrote:
> > I cannot reproduce this.
>
> Me neither.
>
> Josh, could you show me the more detailed procedure to reproduce the
> problem?
>
> Regards,
>
> --
> Fujii Masao
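As a stopgap, the condition can at least be approximated from outside the server (a naive sketch; the path is made up, and a real in-server check would have to handle hostnames, multiple interfaces, and so on):

PGDATA=/data/standby
my_port=$(psql -Atc "show port")
if grep -E "primary_conninfo.*(localhost|127\.0\.0\.1).*port=$my_port" \
        "$PGDATA/recovery.conf" >/dev/null
then
    echo "WARNING: primary_conninfo appears to point at this server itself"
fi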
Re: [HACKERS] psql bug
>> FWIW, I failed to reproduce that on any of my machines. Maybe your
>> editor is leaving the tty in a funny state?

It seems the system() call cleans up the sigaction state on FreeBSD. I modified

void
setup_cancel_handler(void)
{
    fprintf(stderr, "%p -> %p\n", pqsignal(SIGINT, handle_sigint), handle_sigint);
}

and added calls to it around the system() call. Next:

% export EDITOR=echo
% psql postgres
0x0 -> 0x409620
SET
Timing is on.
psql (9.2beta1)
Type "help" for help.

postgres=# \e
0x409620 -> 0x409620
/tmp/psql.edit.7997.sql
0x0 -> 0x409620
postgres=#

--
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
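If that is what's happening, the fix would presumably be to re-install the handler after system() returns. A sketch (illustrative, not the actual psql patch; psql would use pqsignal() rather than signal()):

#include <signal.h>
#include <stdlib.h>

extern void handle_sigint(int signum);   /* psql's existing handler, assumed */

static int
run_editor(const char *command)
{
    int result = system(command);

    /* Defensive: restore our handler in case system() clobbered it. */
    signal(SIGINT, handle_sigint);
    return result;
}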