[HACKERS] Queuing all tables for analyze after recovery
Hi.

Some (maybe all) row statistics are lost after the database has recovered from a failover, so it's recommended to ANALYZE all databases in a cluster after recovery. Amazon's AWS RDS (their managed SQL database service) even sends an email saying "consider running analyze if your database is slow" after a failover of so-called MultiAZ databases (with fast automatic failover, for double the price). Funnily enough, they send it for both PostgreSQL and Oracle databases, which, I suppose, confuses Oracle DBAs greatly.

And in AWS RDS MultiAZ a failover is pretty common. Minor version upgrade - failover. A storage hiccup - failover. Out of memory - failover.

Shouldn't this analyze be queued, and all tables analyzed automatically after failover by the autovacuum daemon, with up to autovacuum_max_workers in parallel? It might save some DBAs a couple of lost hours of sleep for sure.

What do you think? A GUC option? On by default? Maybe even backported, but off by default in released versions?

--
Tomasz "Tometzky" Ostrowski
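[For completeness, the manual workaround today is to analyze each database by hand after the failover; vacuumdb --all --analyze-only --jobs=N does the same from the shell with N parallel workers. The in-database equivalent, as a minimal sketch, is a single statement run once per database:]

-- rough single-threaded stand-in for the proposed autovacuum behaviour;
-- with no table name, ANALYZE covers every table the current user may analyze
ANALYZE VERBOSE;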
Re: [HACKERS] Queuing all tables for analyze after recovery
On 10/19/2017 10:54 PM, Tom Lane wrote:
> Uh ... recommended by whom? pg_statistic has exactly the same reliability
> guarantees as the rest of the system catalogs.

Actually I'm not exactly sure what is lost and what is preserved. I'm pretty sure that pg_stat_all_tables and similar views turn out empty after a failover. I also have some experience with databases performing badly after a failover, which went back to normal performance after a whole-cluster analyze. The email from AWS suggests that it's not only me.

> I don't deny that there might be cases where this is worth doing, but it
> does not seem so likely that it should be part of one's standard checklist.
> Much less something that we should expend a great deal of effort to automate.

I assumed that the effort here shouldn't be that large. I imagined a simple check for missing statistics when autovacuum considers tables for analyze. But I'm not a programmer, so I might misestimate this effort badly.

--
Regards,
Tomasz "Tometzky" Ostrowski
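[Not part of the original message, but a rough SQL sketch of the missing-statistics check described above; autovacuum would of course do this internally rather than via a query:]

-- user tables with no rows at all in pg_statistic, i.e. never analyzed
-- (or whose statistics were lost)
SELECT c.oid::regclass AS table_name
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relnamespace NOT IN ('pg_catalog'::regnamespace,
                             'information_schema'::regnamespace)
  AND NOT EXISTS (SELECT 1 FROM pg_statistic s WHERE s.starelid = c.oid);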
Re: [HACKERS] Multicolumn hash indexes
On 09/27/2017 05:57 PM, Tom Lane wrote:
> If we follow GIST's lead that the leading column is "most important", the
> idea could be to require a search constraint on the first column, which
> produces the hash that determines the bucket assignment. Hashes for
> additional columns would just be payload data in the index entry. If you
> have search constraint(s) on low-order column(s), you can check for hash
> matches before visiting the heap, but they don't reduce how much of the
> index you have to search. Even btree works that way for many combinations
> of incomplete index constraints.

I feel that this would eliminate a large amount of the potential gains from such an index. It would be usable only when a sufficiently variable column exists, in which case a simple hash index on that column alone wouldn't be much worse.

But I have an idea. What if there was a requirement for the search criteria to use tuple equality comparison:

  where (a,b,c) = (?,?,?)

or

  where (a,b,c) in ((?,?,?),(?,?,?),(?,?,?),(?,?,?))

Wouldn't that eliminate the problem of disappearing conditions?

Actually, maybe this could be implemented just like a functional index. It would then implement reasonably something that would otherwise be a terribly hackish and slow solution like:

create or replace function hashhack(a bytea, b bytea)
returns bigint language sql immutable as $$
  -- uses '\x1e' (record separator)
  -- to ensure hashhack('a','') != hashhack('','a')
  select ('x' || substr(md5($1 || '\x1e'::bytea || $2), 1, 16))::bit(64)::bigint;
$$;

create index t_hashhack_a_b_idx on t (hashhack(a::bytea, b::bytea));

select * from t
 where a = 'a' and b = 'b'
   and hashhack(a::bytea, b::bytea) = hashhack('a'::bytea, 'b'::bytea);

If it were automatic, one could avoid the overhead of converting data to bytea/string, concatenating, truncating, converting back to bigint, rechecking the condition etc., which makes this kind of hack not very sane.

Even providing a specially crafted function or operator for queries specifically targeted at the index would be quite sufficient:

  where pg_equal( (a,b,c), (?,?,?) );

--
Tomasz "Tometzky" Ostrowski
[HACKERS] Multicolumn hash indexes
Hi.

I've noticed that hash indexes currently (in PG10) can't be multicolumn. Are they technically hard to implement, or has nobody simply taken up such a feature?

I think multicolumn hash indexes should help pretty significantly with queries like:
- where username=? and user_post_id=?
- where client_id=? and period=? and invoice_number=?
etc.

I imagine that calculating a multicolumn hash should be pretty straightforward to implement - after hashing the bytes of the first column, just keep going and update the hash state with the bytes of the second and subsequent columns. And it should allow for faster (O(1), less IO) and much smaller (better cached, less IO again) multicolumn indexes.

Also, in PG10 hash indexes are WAL-logged and therefore much easier to work with.

What do you think?

--
Tomasz "Tometzky" Ostrowski
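[For illustration only - a sketch of the usage such a feature would presumably enable; the index statement is not valid today, since PG10's USING hash accepts exactly one column, and the table and column names here are hypothetical:]

create table posts (username text, user_post_id int, body text);

-- hypothetical: multicolumn hash index (rejected by PG10)
create index posts_user_post_hash_idx
    on posts using hash (username, user_post_id);

-- the kind of equality-only lookup it would serve
select * from posts where username = 'alice' and user_post_id = 42;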
[HACKERS] pg_restore foreign keys NOT VALID, or [assume] VALID; VALIDATE CONSTRAINT CONCURRENTLY
Hi.

A lot of the time during pg_restore of a large database is spent on validating all the foreign keys. In contrast to importing data and creating indexes, this operation does not parallelize well, so a large percentage of parallel restore time ends up using a single worker to validate foreign keys for the largest table.

If we had an option to restore a table without validating foreign keys, leaving them in the NOT VALID state, the downtime needed for a restore would decrease significantly.

If we also had an option to avoid blocking updates on the table during the (potentially long) validation, for example:

  ALTER TABLE distributors VALIDATE CONSTRAINT CONCURRENTLY distfk;

then we could postpone it and do it during normal operation of the database, out of precious disaster recovery time.

Alternatively, maybe it should be allowed to do, for example:

  ALTER TABLE distributors ADD CONSTRAINT distfk
    FOREIGN KEY (address) REFERENCES addresses (address) VALID;

It would mean that the database should assume that this constraint is valid. It should be possible to turn this on using some pg_restore option (or pg_dump option when dumping to text format), though maybe only when restoring a whole database, not a single table. But a partially failed restore could then leave the database in an inconsistent state, so I'd rather prefer the first option (NOT VALID + VALIDATE CONCURRENTLY).

Any comments on this? Does it look like a good idea? It shouldn't be hard to implement.

--
Tomasz "Tometzky" Ostrowski
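[For reference, a sketch of the two-step pattern that already exists in SQL today, using the distributors/addresses example from above; what the proposal adds is having pg_dump/pg_restore emit it automatically and a CONCURRENTLY variant of the second step:]

-- step 1: cheap, no scan of existing rows
ALTER TABLE distributors
  ADD CONSTRAINT distfk FOREIGN KEY (address)
  REFERENCES addresses (address) NOT VALID;

-- step 2: later, outside the restore window; scans the table to verify the FK
ALTER TABLE distributors VALIDATE CONSTRAINT distfk;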
Re: [HACKERS] [BUGS] Routine analyze of single column prevents standard autoanalyze from running at all
On 2016-06-08 05:04, Tom Lane wrote:
> Jim Nasby writes:
>> Is there any significant advantage to not analyzing all columns? Only case
>> I can think of is if you have a fair number of columns that have been
>> toasted; otherwise I'd think IO would completely swamp any other
>> considerations.
>
> Yeah, my guess is that the OP's example where analyzing just one column was
> significantly cheaper boiled down to some of the other columns being mostly
> toasted data. Otherwise it's hard to see how there's much more expense in
> analyzing them all.

Actually no - this volatile column has smaller "statistics" than most of the table, so analyzing it is much faster when its data is not in RAM. Here is a small, exaggerated example showing the difference:

$ psql
tometzky=> create table test (id serial, data text);
tometzky=> insert into test(data)
             select 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '
                    || generate_series(0,1000)::text;
tometzky=> alter table test alter column id set statistics 10;
tometzky=> alter table test alter column data set statistics 1000;
tometzky=> \q

# Drop OS page cache and restart postgres
# so the table data won't be in RAM anymore:
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql

# Test single-column analyze:
$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test(id);
INFO:  analyzing "public.test"
INFO:  "test": scanned 3000 of 123457 pages, containing 243000 live rows and 0 dead rows; 3000 rows in sample, 2712238 estimated total rows
ANALYZE
Time: 422,521 ms
tometzky=> \q

# Drop OS page cache and restart postgres again:
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql

# Test whole-table analyze:
$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test;
INFO:  analyzing "public.test"
INFO:  "test": scanned 123457 of 123457 pages, containing 1001 live rows and 0 dead rows; 30 rows in sample, 1001 estimated total rows
ANALYZE
Time: 9447,519 ms

--
Tomasz "Tometzky" Ostrowski
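[As an aside, not part of the original thread: the per-column targets set in the example can be verified from pg_attribute; attstattarget is -1 for columns that fall back to default_statistics_target:]

select attname, attstattarget
  from pg_attribute
 where attrelid = 'test'::regclass
   and attnum > 0
   and not attisdropped;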
Re: [HACKERS] Implementing Sorting Refinements
On Tue, 08 Jan 2008, [EMAIL PROTECTED] wrote:
> Well, sorry for hijacking... ummm how did I do that?

You replied to a post instead of creating a new, unrelated e-mail. It is different. Just try to use the threaded mode of your e-mail client and you'll get the idea.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
Winnie the Pooh
Re: [HACKERS] Spoofing as the postmaster
On Sun, 23 Dec 2007, Tom Lane wrote:
> ISTM we have these action items:
> 1. Improve the code so that SSL authentication can be used across a
> Unix-socket connection (we can disable encryption though).

I've just realised that there's a problem with disabling SSL encryption to save CPU on unix socket / localhost connections. Any local user using this attack would be able to eavesdrop on everything coming through the socket.

If an attacker just acts as a tunnel, hijacking the unix socket and talking to the server over any other interface (or the other way around), then he would not be able to modify the information flow, but he would be able to read and save everything going to and from the server. Again this is not obvious, as normally local connections are not susceptible to eavesdropping. And it could go unnoticed for a long time, as everything would just keep working normally.

So I think no CPU-saving by turning off encryption should be done. And this would all not help against a denial-of-service attack.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
Winnie the Pooh
Re: [HACKERS] Spoofing as the postmaster
On Sun, 23 Dec 2007, Magnus Hagander wrote:
> I'm just surprised that people are actually surprised by this. To me, it's
> just a natural fact that happens to pretty much all systems. And a good
> reason not to let arbitrary users run processes that can bind to something
> on your server.

Not everybody works for an enterprise where price does not matter. I cannot afford dedicated servers for database, DNS, e-mail, antispam, firewall, file, WWW etc. Even the administrative overhead would be too much for a one-person IT staff. I have to run all of this and much more on one machine, so I'm interested in limiting the rights of, for example, the user running WWW, so that if it is, god forbid, compromised, the damage is limited.

I am also not able to run sophisticated security frameworks that limit every user's rights to just what they need, as maintaining that would require a full-time security person.

So I'm not very fond of this "insecure by default, it's your problem to make it secure" attitude. I'm the one who reported this.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
Winnie the Pooh
Re: [HACKERS] Spoofing as the postmaster
On Sun, 23 Dec 2007, Tom Lane wrote:
> IIRC, you started out your argument by also saying that we had to move the
> TCP socket to the reserved range, so as to prevent the equivalent problem
> in the TCP case.
>
> 1. Postmaster must be started as root, thereby introducing security risks
> of its own (ie, after breaking into the DB, an attacker might be able to
> re-acquire root privileges).

Not at all, as it won't run as root - it'll just start as root and then give up all root privileges. The only thing it would keep from having been root is an open socket.

> 2. Can only have one postmaster per machine (ICANN is certainly not going
> to give us dozens of reserved addresses).

I don't think ICANN would prevent anybody from using a different port. I'm running httpd on port 81, sshd on 222 etc. It's just the default that should be made official through ICANN.

> 3. Massive confusion and breakage as various people transition to the new
> standard at different times.

As with any major version.

> 4. Potential to create, rather than remove, spoofing opportunities anyplace
> there is confusion about which port the postmaster is really listening on.

I agree. But because it would simply not work, it would be easy to notice and correct. And once corrected there would be no more confusion.

> Fundamentally these are man-in-the-middle attacks, and the only real
> solution is mutual authentication.

The problem is that not many people expect a man-in-the-middle attack on a secure LAN, localhost or a local socket connection, so they won't try to prevent it.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were...
Winnie the Pooh