[HACKERS] Queuing all tables for analyze after recovery

2017-10-19 Thread Tomasz Ostrowski

Hi.

Some (maybe all) row statistics are lost after the database has 
recovered from a failover. So it's recommended to ANALYZE all databases 
in a cluster after recovery.
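
Today that means doing it by hand, with something like this (vacuumdb 
can parallelize the work since 9.5; the job count here is just an 
example):

$ vacuumdb --all --analyze-only --jobs=4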


Amazon's AWS RDS (their managed SQL database service) even sends an 
email "consider running analyze if your database is slow" after a 
failover of so-called MultiAZ databases (with fast automatic failover 
for double the price). Funny that they send it for both PostgreSQL and 
Oracle databases, which, I suppose, confuses Oracle DBAs greatly.


And in AWS RDS MultiAZ a failover is pretty common. Minor version 
upgrade - failover. A storage hiccup - failover. Out of memory - failover.


Shouldn't this analyze be queued and all tables analyzed automatically 
after failover by the autovacuum daemon? With up to 
autovacuum_max_workers in parallel?


It might save some DBAs a couple of hours of lost sleep for sure. 
What do you think? A GUC option? On by default? Maybe even backported, 
but off by default in released versions?


--
Tomasz "Tometzky" Ostrowski




Re: [HACKERS] Queuing all tables for analyze after recovery

2017-10-19 Thread Tomasz Ostrowski

On 10/19/2017 10:54 PM, Tom Lane wrote:

Uh ... recommended by whom?  pg_statistic has exactly the same reliability
guarantees as the rest of the system catalogs.


Actually I'm not exactly sure what is lost and what is preserved. I'm 
pretty sure that pg_stat_all_tables and similar views end up with no 
data after a failover.
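
One can check this with something like the following (just a sketch 
against the standard statistics views):

select schemaname, relname
from pg_stat_user_tables
where last_analyze is null and last_autoanalyze is null;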


Also I have some experience with badly performing databases after a 
failover, which went back to normal performance after a whole-cluster 
ANALYZE. This email from AWS suggests that it's not only me.



I don't deny that there might be cases where this is worth doing, but
it does not seem so likely that it should be part of one's standard
checklist.  Much less something that we should expend a great deal
of effort to automate.


I assumed that the effort here shouldn't be that large. I imagined a 
simple check for missing statistics when autovacuum considers tables 
for analyze. But I'm not a programmer, so I might badly misestimate 
this effort.
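
I mean something in the spirit of this query (a rough sketch; note 
that pg_statistic is normally readable only by superusers):

select c.oid::regclass as table_missing_stats
from pg_class c
where c.relkind = 'r'
  and c.relnamespace <> 'pg_catalog'::regnamespace
  and c.reltuples > 0
  and not exists
    (select 1 from pg_statistic s where s.starelid = c.oid);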


--
Regards,
Tomasz "Tometzky" Ostrowski




Re: [HACKERS] Multicolumn hash indexes

2017-09-27 Thread Tomasz Ostrowski

On 09/27/2017 05:57 PM, Tom Lane wrote:

If we follow GIST's lead that the leading column is "most important",
the idea could be to require a search constraint on the first column,
which produces the hash that determines the bucket assignment.  Hashes
for additional columns would just be payload data in the index entry.
If you have search constraint(s) on low-order column(s), you can check
for hash matches before visiting the heap, but they don't reduce how
much of the index you have to search.  Even btree works that way for
many combinations of incomplete index constraints.


I feel that this would eliminate a large part of the potential gains 
from such an index. It would be usable only when a sufficiently 
variable leading column exists, in which case a simple hash index on 
that column alone wouldn't be much worse.


But I have an idea. What if there were a requirement for the search 
criteria to use a tuple equality comparison:

where (a,b,c)=(?,?,?)
or
where (a,b,c) in ((?,?,?),(?,?,?),(?,?,?),(?,?,?))

Wouldn't it eliminate problems with disappearing conditions?

Actually, maybe this could be implemented just like a functional 
index. It would then be a reasonable implementation of something that 
would otherwise require a terribly hackish and slow solution like:


create or replace function hashhack(a bytea, b bytea)
returns bigint
language sql
immutable
as $$
-- uses '\x1e' (record separator)
-- to ensure hashhack('a','')!=hashhack('','a')
select (
  'x'
  ||
  substr(
   md5($1||'\x1e'::bytea||$2),
   1,
   16
  )
)::bit(64)::bigint;
$$;

create index t_hashhack_a_b_idx
  on t( hashhack(a::bytea,b::bytea) );

select * from t
  where a='a' and b='b' 
  and
hashhack(a::bytea, b::bytea)
=
hashhack('a'::bytea,'b'::bytea);

If it was automatic, one could avoid the overhead of converting data 
to bytea/string, concatenating, truncating, converting back to bigint, 
rechecking the condition, etc., which makes this kind of hack not very 
sane.


Even providing a specially crafted function or operator for queries 
specifically targeted at the index would be quite sufficient:

where pg_equal( (a,b,c), (?,?,?) );

--
Tomasz "Tometzky" Ostrowski




[HACKERS] Multicolumn hash indexes

2017-09-26 Thread Tomasz Ostrowski

Hi.

I've noticed that hash indexes currently (in PG10) can't be 
multicolumn. Are they technically hard to implement, or has just 
nobody taken up such a feature?


I think multicolumn hash indexes should help pretty significantly with 
queries like:

- where username=? and user_post_id=?
- where client_id=? and period=? and invoice_number=?
etc.

I imagine that calculating a multicolumn hash should be pretty 
straightforward to implement - after hashing the bytes of the first 
column, just keep going and update the hash state with the bytes of 
the second and subsequent columns. It should allow for faster (O(1), 
less IO) and much smaller (better cached, less IO again) multicolumn 
indexes. Also, in PG10 hash indexes are WAL-logged and therefore much 
easier to work with. What do you think?
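
In the meantime something similar can be emulated with an expression 
index - a rough sketch, assuming text columns and the built-in 
hashtext() function:

-- E'\x1e' (record separator) keeps ('ab','') distinct from ('a','b')
create index t_a_b_hash_idx
  on t using hash ((hashtext(a || E'\x1e' || b)));

select * from t
  where a='a' and b='b'
  and hashtext(a || E'\x1e' || b) = hashtext('a' || E'\x1e' || 'b');

But a native multicolumn hash would avoid the concatenation and the 
extra recheck.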


--
Tomasz "Tometzky" Ostrowski




[HACKERS] pg_restore foreign keys NOT VALID, or [assume] VALID; VALIDATE CONSTRAINT CONCURRENTLY

2017-02-04 Thread Tomasz Ostrowski

Hi.

A lot of time during pg_restore of a large database is spent on 
validating all the foreign keys. In contrast to importing data and 
creating indexes, this operation does not parallelize well. So a large 
percentage of parallel restore time ends up with a single worker 
validating foreign keys for the largest table.
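
For example (dump file and database names made up):

$ pg_restore --jobs=8 --dbname=mydb mydb.dump

runs the data and index steps in parallel, but near the end it 
degrades to a single worker checking foreign keys on the biggest 
table.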


If we had an option to restore the table without validating foreign 
keys, leaving them in the NOT VALID state, the downtime needed for us 
to restore would decrease significantly.


If we also had an option to avoid blocking updates on the table 
during the (potentially long) validation, for example:

  ALTER TABLE distributors VALIDATE CONSTRAINT CONCURRENTLY distfk;

then we could postpone it and do it during normal operation of the 
database, outside of precious disaster recovery time.
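
For comparison, this is roughly what can already be done by hand today 
(NOT VALID foreign keys exist since 9.1, and, if I'm not mistaken, 
since 9.4 VALIDATE CONSTRAINT takes a lock that does not block reads 
and writes on the table):

  ALTER TABLE distributors ADD CONSTRAINT distfk
    FOREIGN KEY (address) REFERENCES addresses (address) NOT VALID;

  -- later, during normal operation:
  ALTER TABLE distributors VALIDATE CONSTRAINT distfk;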


Alternatively, maybe it should be allowed to do, for example:

  ALTER TABLE distributors ADD CONSTRAINT distfk
    FOREIGN KEY (address) REFERENCES addresses (address) VALID;

It would mean that the database should assume that this constraint is 
valid. It should be possible to turn this on using some pg_restore 
option (or pg_dump option when dumping to text format), though maybe 
only when restoring a whole database, not a single table. But there's 
a possibility that a partially failed restore could leave the database 
in an inconsistent state. So I'd rather prefer the above option (NOT 
VALID + VALIDATE CONCURRENTLY).


Any comments on this? Does it look like a good idea? It shouldn't be 
hard to implement.


--
Tomasz "Tometzky" Ostrowski




Re: [HACKERS] [BUGS] Routine analyze of single column prevents standard autoanalyze from running at all

2016-06-08 Thread Tomasz Ostrowski

On 2016-06-08 at 05:04, Tom Lane wrote:

Jim Nasby  writes:

Is there any significant advantage to not analyzing all columns? Only
case I can think of is if you have a fair number of columns that have
been toasted; otherwise I'd think IO would completely swamp any other
considerations.


Yeah, my guess is that the OP's example where analyzing just one column
was significantly cheaper boiled down to some of the other columns being
mostly toasted data.  Otherwise it's hard to see how there's much more
expense in analyzing them all.


Actually no - this volatile column has a smaller "statistics" setting 
than most of the table, so analyzing it is much faster when its data 
is not in RAM. Here is a small, exaggerated example showing the 
difference:


$ psql
tometzky=> create table test (id serial, data text);
tometzky=> insert into test(data)
 select 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '
 ||generate_series(0,10000000)::text;
tometzky=> alter table test alter column id set statistics 10;
tometzky=> alter table test alter column data set statistics 1000;
tometzky=> \q

# Drop OS page cache and restart postgres
# so the table data won't be in RAM anymore:
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql;

# Test single column analyze:
$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test(id);
INFO:  analyzing "public.test"
INFO:  "test": scanned 3000 of 123457 pages, containing 243000 live rows 
and 0 dead rows; 3000 rows in sample, 2712238 estimated total rows

ANALYZE
Time: 422,521 ms
tometzky=> \q

# Drop OS page cache and restart postgres again
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql;

$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test;
INFO:  analyzing "public.test"
INFO:  "test": scanned 123457 of 123457 pages, containing 10000001 live 
rows and 0 dead rows; 300000 rows in sample, 10000001 estimated total rows

ANALYZE
Time: 9447,519 ms

--
Tomasz "Tometzky" Ostrowski




Re: [HACKERS] Implementing Sorting Refinements

2008-01-07 Thread Tomasz Ostrowski
On Tue, 08 Jan 2008, [EMAIL PROTECTED] wrote:

 Well, sorry for hijacking... ummm how did I do that?

You replied to a post instead of creating a new, unrelated e-mail. It
is different.

Just try to use threaded mode of your e-mail client and you'll get
the idea.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh



Re: [HACKERS] Spoofing as the postmaster

2007-12-27 Thread Tomasz Ostrowski
On Sun, 23 Dec 2007, Tom Lane wrote:

 ISTM we have these action items:
 1. Improve the code so that SSL authentication can be used across a
 Unix-socket connection (we can disable encryption though).

I've just realised that there's a problem with SSL with disabled 
encryption on unix-socket / localhost connections for CPU saving. 
Any local user using this attack would be able to eavesdrop on 
everything coming through a socket.

If an attacker just acts as a tunnel, hijacking a unix socket and 
talking to the server using any other interface (or the other way 
around), then he would not be able to modify the information flow, but 
he would be able to read and save everything going to and from the 
server. It is, again, not obvious, as normally local connections are 
not susceptible to eavesdropping. And it could go unnoticed for a long 
time, as everything would just work normally.

So I think no CPU saving by turning off encryption should be done.

And this would all not help for a denial-of-service attack.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh



Re: [HACKERS] Spoofing as the postmaster

2007-12-23 Thread Tomasz Ostrowski
On Sun, 23 Dec 2007, Magnus Hagander wrote:

 I'm just surprised that people are actually surprised by this. To me,
 it's just a natural fact that happens to pretty much all systems. And a
 good reason not to let arbitrary users run processes that can bind to
 something on your server.

Not everybody works for an enterprise where price does not matter. I 
cannot afford dedicated servers for database, DNS, e-mail, antispam, 
firewall, file, WWW, etc. Even the administrative overhead would be 
too much for a one-person IT staff. I have to run all of this and much 
more on one machine, so I'm interested in limiting the rights of, for 
example, the user running WWW, so that when it is, god forbid, 
compromised, the damage is limited.

I am also not able to run sophisticated security frameworks, limiting 
every user's rights to just what they need, as maintaining them would 
require a full-time security person.

So I'm not very fond of this "insecure by default, it's your problem 
to make it secure" attitude. I'm the one who reported this.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh



Re: [HACKERS] Spoofing as the postmaster

2007-12-23 Thread Tomasz Ostrowski
On Sun, 23 Dec 2007, Tom Lane wrote:

 IIRC, you started out your argument by also saying that we had to move
 the TCP socket to the reserved range, so as to prevent the equivalent
 problem in the TCP case.
 
 1. Postmaster must be started as root, thereby introducing security
 risks of its own (ie, after breaking into the DB, an attacker might be
 able to re-acquire root privileges).

Not at all, as it won't run as root; it'll just start as root and 
then give up all root privileges. The only thing it would keep from 
being root is an open socket.

 2. Can only have one postmaster per machine (ICANN is certainly not
 going to give us dozens of reserved addresses).

I don't think ICANN would prevent anybody from using a different port. 
I'm running httpd on port 81, sshd on 222, etc. It's just the default 
that should be made official through ICANN.

 3. Massive confusion and breakage as various people transition to the
 new standard at different times.

As with any major version.

 4. Potential to create, rather than remove, spoofing opportunities
 anyplace there is confusion about which port the postmaster is really
 listening on.

I agree. But because it would simply not work, it would be easy to 
notice and correct. And once corrected, there would be no more 
confusion.

 Fundamentally these are man-in-the-middle attacks, and the only real
 solution is mutual authentication.

The problem is that not many people expect a man-in-the-middle attack 
on a secure LAN, localhost or a local socket connection, so they won't 
try to prevent it.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh
