Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-06 Thread Fujii Masao
Hi,

On Fri, Dec 5, 2008 at 7:09 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Fri, 2008-12-05 at 12:09 +0900, Fujii Masao wrote:

 On Thu, Dec 4, 2008 at 6:29 PM, Simon Riggs [EMAIL PROTECTED] wrote:
  The only sensible settings are
  synchronous_commit = on, synchronous_replication = on
  synchronous_commit = on, synchronous_replication = off
  synchronous_commit = off, synchronous_replication = off
 
  This doesn't make any sense: (does it??)
  synchronous_commit = off, synchronous_replication = on

 If the standby replies before writing the WAL, that strategy can improve
 the performance with moderate reliability, and sounds sensible.

 Do you think it likely that your replication time is consistently and
 noticeably less than your time-to-disk?

It depends on a system environment.
- How many miles two servers? same rack? separate continent?
- Does system have high-end storage? cheap one?
... etc


 On a related thought: presumably we force a sync rep if forceSyncCommit
 is set?

Yes!
Please see RecordTransactionCommit() in xact.c (in my patch).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-06 Thread Fujii Masao
Greetings!

On Fri, Dec 5, 2008 at 6:59 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Fri, 2008-12-05 at 16:00 +0900, Fujii Masao wrote:

 On Fri, Dec 5, 2008 at 12:09 PM, Fujii Masao [EMAIL PROTECTED] wrote:
  I was expecting you to have walreceiver and startup share an end of WAL
  address via shared memory, so that startup never tries to read past end.
  That way we would be able to begin reading a WAL file *before* it was
  filled. Waiting until a file fills means we still have to have
  archive_timeout set to ensure we switch regularly.
 
  You mean that not pg_standby but startup process waits for the next
  WAL available? If so, I agree with you in the future. That is, I just think
  that this is next TODO because there are many problems which we
  should resolve carefully to achieve it. But, if it's essential for 8.4, I 
  will
  tackle it. What is your opinion? I'd like to clear up the goal for 8.4.

 Umm.. on second thought, this feature (continuous recovery without
 pg_standby) seems to be essential for 8.4. So, I will try it.

 Sounds good. Perhaps you can share what changed your mind in those 4
 hours...

Yeah, it's my imagination about the real situation after 8.4 release,
especially I considered about the future conjugal life of Synch Rep and
Hot Standby ;) Waiting to redo until the file fills might lead to marital
breakdown.


 Could we start with pictures and some descriptions first, so we know
 we're on the right track? I foresee no coding issues.

 My understanding is that we start with a normal log shipping
 architecture, then we switch into continuous recovery mode. So we do use
 pg_standby at beginning, but then it gets turned off.

Yes, I also understand so. Updated sequence pictures are on wiki
as per usual. Please see P3, 4.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Detailed_Design


 Let's look at all of the corner cases also:
 * standby keeps pace with primary (desired state)
 * standby falls behind primary
 * standby restarts to change shmmem settings
 etc

Yes, I will examine such cases!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-06 Thread David Lee Lambert
On Thursday 04 December 2008 15:09, Gregory Stark wrote:
 tmp [EMAIL PROTECTED] writes:

  Also, it is my impression that many people use LIMIT to minimize the
  evaluation time of sub queries from which the outer query only needs a
  small subset of the sub query output.

 I've seen lots of queries which only pull a subset of the results too --
 but it's always a specific subset. So that means using ORDER BY or a WHERE
 clause to control it.

I use ORDER BY random() LIMIT :some_small_number frequently to get a feel 
for data.  That always builds the unrandomized relation and then sorts it.  I 
guess an alternate path for single-table queries would be to randomly choose 
a block number and then a tuple number;  but that would be biased toward long 
rows (of which fewer can appear in a block).

-- 
David Lee Lambert ... Software Developer
Cell phone: +1 586-873-8813 ; alt. email [EMAIL PROTECTED] or 
[EMAIL PROTECTED]
GPG key at http://www.lmert.com/keyring.txt

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] visibility map - what do i miss?

2008-12-06 Thread hubert depesz lubaczewski
--- repost to hackers as suggested by RhodiumToad ---

hi,
i tried to test new visibility map feature.

to do so i:
1. fetched postgresql sources from cvs
2. compiled
3. turned autovacuum off
4. started pg
5. ran this queries:
  - CREATE TABLE test_1 (i INT4);
  - CREATE TABLE test_2 (i INT4);
  - CREATE TABLE test_3 (i INT4);
  - CREATE TABLE test_4 (i INT4);
  - INSERT INTO test_1 SELECT generate_series(1, 1);
  - INSERT INTO test_2 SELECT generate_series(1, 1);
  - INSERT INTO test_3 SELECT generate_series(1, 1);
  - INSERT INTO test_4 SELECT generate_series(1, 1);
  - UPDATE test_2 SET i = i + 1 WHERE i  1000;
  - UPDATE test_3 SET i = i + 1 WHERE i  5000;
  - UPDATE test_4 SET i = i + 1 WHERE i  9000;
  - VACUUM test_1;
  - VACUUM test_2;
  - VACUUM test_3;
  - VACUUM test_4;

I did it 2 times, first with sources of pg from 1st of november, and
second - with head from yesterday evening (warsaw, poland time).

results puzzled me.

First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 ms

Second run - on head:

Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 ms

Now - as I understand the change - visilibity maps should make second run much 
faster?

Tests were performed on laptop. During first test I used it to browse the web,
read mail. During second test - nobody used the laptop.

Relation forms seem to exist:
# select oid from pg_database where datname = 'depesz';
  oid
---
 16389
(1 row)
# select relfilenode from pg_class where relname ~ 'test_';
 relfilenode
-
   26756
   26759
   26762
   26765
(4 rows)

= ls -l {26756,26759,26762,26765}*
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:31 26756
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:33 26756.1
-rw--- 1 pgdba pgdba 1065066496 2008-12-06 01:34 26756.2
-rw--- 1 pgdba pgdba 811008 2008-12-06 01:34 26756_fsm
-rw--- 1 pgdba pgdba  57344 2008-12-06 01:34 26756_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:35 26759
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:36 26759.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:37 26759.2
-rw--- 1 pgdba pgdba  312582144 2008-12-06 01:39 26759.3
-rw--- 1 pgdba pgdba 892928 2008-12-06 01:39 26759_fsm
-rw--- 1 pgdba pgdba  57344 2008-12-06 01:39 26759_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:39 26762
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:49 26762.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:41 26762.2
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:42 26762.3
-rw--- 1 pgdba pgdba  523862016 2008-12-06 01:43 26762.4
-rw--- 1 pgdba pgdba1204224 2008-12-06 01:43 26762_fsm
-rw--- 1 pgdba pgdba  81920 2008-12-06 01:53 26762_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:01 26765
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:08 26765.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:18 26765.2
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:50 26765.3
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:51 26765.4
-rw--- 1 pgdba pgdba  735141888 2008-12-06 02:00 26765.5
-rw--- 1 pgdba pgdba1523712 2008-12-06 02:00 26765_fsm
-rw--- 1 pgdba pgdba  98304 2008-12-06 02:18 26765_vm

What do I miss?

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map - what do i miss?

2008-12-06 Thread Guillaume Smet
On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
[EMAIL PROTECTED] wrote:
 First run - without visibility maps, timing of vacuums:
 Time: 267844.822 ms
 Time: 138854.592 ms
 Time: 305467.950 ms
 Time: 487133.179 ms

 Second run - on head:

 Time: 252218.609 ms
 Time: 234388.763 ms
 Time: 334016.413 ms
 Time: 575698.750 ms

 Now - as I understand the change - visilibity maps should make second run 
 much faster?

If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed me on IRC you run it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.

The second VACUUM should then be faster.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map - what do i miss?

2008-12-06 Thread Heikki Linnakangas

Guillaume Smet wrote:

On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
[EMAIL PROTECTED] wrote:

First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 ms

Second run - on head:

Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 ms

Now - as I understand the change - visilibity maps should make second run much 
faster?


If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed me on IRC you run it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.

The second VACUUM should then be faster.


That diagnosis is not quite right, but the prognosis is correct. The 
first VACUUM after the UPDATEs doesn't set the PD_ALL_VISIBLE flags, 
because there's still dead tuples on the pages. The dead tuples are 
removed in the 2nd pass of the first vacuum, but it doesn't try to set 
the PD_ALL_VISIBLE flags; that's only done in the first phase.


The second vacuum is just as slow as the first one, because the 
visibility map doesn't have any bits set yet. The second vacuum will set 
the bits, though, so the *third* vacuum should go faster.


So setting the PD_ALL_VISIBLE flags doesn't slow things down. That 
should be just a tiny bit of extra CPU work per vacuumed page, not 
something that would show up in performance tests.


This is the 1st issue I mentioned in this mail:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

There was some suggestions in that thread, but none has been implemented.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility maps

2008-12-06 Thread Heikki Linnakangas

Pavan Deolasee wrote:

/*
 * We don't need to lock the page, as we're only looking at a single
bit.
 */
result = (map[mapByte]  (1  mapBit)) ? true : false;


Isn't this a dangerous assumption to make ? I am not so sure that even a bit
can be read atomically on all platforms. 


Umm, what non-atomic state could the bit be in? Half-set, half-cleared? 
Or do you think that if some other bit in proximity is changed, the 
other bit would temporarily flip 0-1-0, or something like that? I 
don't think that should happen.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility maps and heap_prune

2008-12-06 Thread Heikki Linnakangas

Pavan Deolasee wrote:

ISTM that the PD_ALL_VISIBLE flag and the visibility map bit can be set at
the end of pruning operation if we know that there are only tuples visible
to all transactions left in the page.


Right.


The way pruning is done, I think it
would be straight forward to get this information.


Is it? I thought about that a bit while writing the patch, but didn't 
see any obvious way to do it. Except by adding a loop through all tuples 
on the page, but that's extra overhead. I think we're looping through 
all tuples in the pruning, but it's not quite obvious.


If you see a straightforward way, please submit a patch!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility maps

2008-12-06 Thread Pavan Deolasee
On Sat, Dec 6, 2008 at 7:57 PM, Heikki Linnakangas 
[EMAIL PROTECTED] wrote:


 Umm, what non-atomic state could the bit be in? Half-set, half-cleared? Or
 do you think that if some other bit in proximity is changed, the other bit
 would temporarily flip 0-1-0, or something like that? I don't think that
 should happen.


Since the lock is not held, the bit can be flipped while we are reading,
isn't it ? IOW, the test is not reliable is what I fear.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] visibility maps

2008-12-06 Thread Heikki Linnakangas

Pavan Deolasee wrote:

On Sat, Dec 6, 2008 at 7:57 PM, Heikki Linnakangas 
[EMAIL PROTECTED] wrote:


Umm, what non-atomic state could the bit be in? Half-set, half-cleared? Or
do you think that if some other bit in proximity is changed, the other bit
would temporarily flip 0-1-0, or something like that? I don't think that
should happen.


Since the lock is not held, the bit can be flipped while we are reading,
isn't it ? IOW, the test is not reliable is what I fear.


If someone is changing the bit at the same time, it doesn't matter 
whether we read it as 1 or 0. Locking the page wouldn't change the 
situation: you would still read the old value if you got the lock before 
the concurrent updater, or the new value if you got the lock after.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility maps

2008-12-06 Thread Heikki Linnakangas

Pavan Deolasee wrote:

/*
 * Size of the bitmap on each visibility map page, in bytes. There's no
 * extra headers, so the whole page minus except for the standard page
header
 * is used for the bitmap.
 */
#define MAPSIZE (BLCKSZ - SizeOfPageHeaderData)


ISTM that we should MAXALIGN the SizeOfPageHeaderData to compute MAPSIZE.
PageGetContents() works that way and I believe that's the right thing to do.


Yep, you're right. Thanks, fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-06 Thread Grzegorz Jaskiewicz


On 2008-12-06, at 11:29, David Lee Lambert wrote:




I use ORDER BY random() LIMIT :some_small_number frequently to get  
a feel
for data.  That always builds the unrandomized relation and then  
sorts it.  I
guess an alternate path for single-table queries would be to  
randomly choose
a block number and then a tuple number;  but that would be biased  
toward long

rows (of which fewer can appear in a block).


but that's going to be extremely slow, due to speed of random()  
function.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] benchmarking the query planner (was Re: Simple postgresql.conf wizard)

2008-12-06 Thread Robert Haas
Sorry for top posting but we are getting a bit far afield from the
original topic.  I followed up the tests I did last night:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00369.php

I benchmarked 100 iterations of EXPLAIN on the query Greg Stark put
together as a synthetic benchmark for default_statistics_target with
various values for SET STATISTICS n.  Testing was done on CVS HEAD
on my laptop with no configure options other than --prefix.  Then I
did this, to disable compression on pg_statistic.

alter table pg_statistic alter column stanumbers1 set storage external;
alter table pg_statistic alter column stanumbers2 set storage external;
alter table pg_statistic alter column stanumbers3 set storage external;
alter table pg_statistic alter column stanumbers4 set storage external;
alter table pg_statistic alter column stavalues1 set storage external;
alter table pg_statistic alter column stavalues2 set storage external;
alter table pg_statistic alter column stavalues3 set storage external;
alter table pg_statistic alter column stavalues4 set storage external;

(Note that you'll need to put allow_system_table_mods=true in your
postgresql.conf file if you want this to work.)  Then I reran the
tests.  The results were pretty dramatic.  In the table below, the
first column is value of SET STATISTICS n that was performed the
table column prior to analyzing it.  The second column is the time
required to plan the query 100x AFTER disabling compression on
pg_statistic, and the third column is the time required to plan the
query 100x BEFORE disabling compression on pg_statistic.

10  0.8292020.8249
20  1.0599761.06957
30  1.1687271.143803
40  1.2871891.263252
50  1.3701671.363951
60  1.4865891.460464
70  1.6038991.571107
80  1.69402 1.689651
90  1.79068 1.804454
100 1.9308772.803941
150 2.4464714.833002
200 2.95323 6.217708
250 3.4367417.507919
300 3.9835688.895015
350 4.49747510.201713
400 5.07247111.576961
450 5.61527212.933128
500 6.28635814.408157
550 6.89595115.745378
600 7.40013417.192916
650 8.03815918.568616
700 8.60670420.025952
750 9.15488921.45775
800 9.80953 22.74635
850 10.363471   24.057379
900 11.022348   25.559911
950 11.6973227.021034
100012.266699   28.711027

As you can see, for default_statistics_target  90, this is a HUGE win.

After doing this test, I rebuilt with --enable-profiling and profiled
EXPLAIN 10x with SET STATISTICS 10, 70, 100, 200 with a vanilla
configuration, and then 200 again with compression turned off as
described above.  The, ahem, ridiculously small limit on attachment
size prevents me from attaching the full results, so please see the
attached results which are truncated after the first section.  10x
doesn't seem to be quite enough to get the exact picture of where the
bottlenecks are, but the overall picture is clear enough:
decompression introduces a huge overhead.

Looking specifically at the 200-decompress output, the next biggest
hit is AllocSetAlloc(), which, from the detailed results that I
unfortunately can't include, is being called mostly by datumCopy()
which is being called mostly by get_attstatsslot().  There are 4000
calls to get_attstatsslot() which result 701,500 calls to datumCopy().

I'm not too sure what any of this means in terms of optimizatiion,
other than that changing the storage type of pg_statistic columns to
external looks like a huge win.  Perhaps someone more knowledgeable
than I has some thoughts.

...Robert


gmon-summary.tbz
Description: application/bzip-compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] user-based query white list

2008-12-06 Thread Andrew Chernow
Looking for a way to limited a user to a specific set of queries.  I don't think 
this can be done right now ... or can it?  Has this feature request surfaced in 
the past?


I currently need this as an extra security measure for a libpq client app (want 
to block arbitrary queries from malicious attackers).  The easiest way I found 
was to add some query_string checks into backend/tcop/postgres.c for the 'Q' and 
'P' commands in PostgresMain().  Seems to work just fine.  If it doesn't match, 
I issue an ereport FATAL since that is seen as a malicious query execution 
attempt.


I think it is something rather simple to design/implement (probably use a table 
of user allowed queries, support regex matches, etc.. loaded at session startup 
and SIGHUP).


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-based query white list

2008-12-06 Thread Grzegorz Jaskiewicz


On 2008-12-06, at 18:21, Andrew Chernow wrote:

Looking for a way to limited a user to a specific set of queries.  I  
don't think this can be done right now ... or can it?  Has this  
feature request surfaced in the past?


I currently need this as an extra security measure for a libpq  
client app (want to block arbitrary queries from malicious  
attackers).  The easiest way I found was to add some query_string  
checks into backend/tcop/postgres.c for the 'Q' and 'P' commands in  
PostgresMain().  Seems to work just fine.  If it doesn't match, I  
issue an ereport FATAL since that is seen as a malicious query  
execution attempt.


I think it is something rather simple to design/implement (probably  
use a table of user allowed queries, support regex matches, etc..  
loaded at session startup and SIGHUP).


Can it be done with views, and adjusting permissions so user is only  
allowed to use few views ??



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-based query white list

2008-12-06 Thread Andrew Chernow

Grzegorz Jaskiewicz wrote:


On 2008-12-06, at 18:21, Andrew Chernow wrote:

Looking for a way to limited a user to a specific set of queries.  I 
don't think this can be done right now ... or can it?  Has this 
feature request surfaced in the past?


I currently need this as an extra security measure for a libpq client 
app (want to block arbitrary queries from malicious attackers).  The 
easiest way I found was to add some query_string checks into 
backend/tcop/postgres.c for the 'Q' and 'P' commands in 
PostgresMain().  Seems to work just fine.  If it doesn't match, I 
issue an ereport FATAL since that is seen as a malicious query 
execution attempt.


I think it is something rather simple to design/implement (probably 
use a table of user allowed queries, support regex matches, etc.. 
loaded at session startup and SIGHUP).


Can it be done with views, and adjusting permissions so user is only 
allowed to use few views ??





Not sure.  The client I am working on only calls functions, small API to 
interact with (no knowledge of views or tables).  Even if that were not the 
case, would views stop a client from sending in other queries, like SELECT 1+1 
or something that could bog down the server?


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-based query white list

2008-12-06 Thread Grzegorz Jaskiewicz


On 2008-12-06, at 18:30, Andrew Chernow wrote:


Grzegorz Jaskiewicz wrote:

On 2008-12-06, at 18:21, Andrew Chernow wrote:
Looking for a way to limited a user to a specific set of queries.   
I don't think this can be done right now ... or can it?  Has this  
feature request surfaced in the past?


I currently need this as an extra security measure for a libpq  
client app (want to block arbitrary queries from malicious  
attackers).  The easiest way I found was to add some query_string  
checks into backend/tcop/postgres.c for the 'Q' and 'P' commands  
in PostgresMain().  Seems to work just fine.  If it doesn't match,  
I issue an ereport FATAL since that is seen as a malicious query  
execution attempt.


I think it is something rather simple to design/implement  
(probably use a table of user allowed queries, support regex  
matches, etc.. loaded at session startup and SIGHUP).
Can it be done with views, and adjusting permissions so user is  
only allowed to use few views ??


Not sure.  The client I am working on only calls functions, small  
API to interact with (no knowledge of views or tables).  Even if  
that were not the case, would views stop a client from sending in  
other queries, like SELECT 1+1 or something that could bog down  
the server?



I use views to simplify code. Say you have a simple join, with one  
WHERE. You omit the WHERE in view, and leave it like that. Than just  
select foo1, foo2 from VIEW WHERE boo1=foo1 and foo3  '123';
Postgresql is smart enough, to run it as one query (as oppose to  
mysql), so the code is simpler, everybody's happy.


If you want to continue on that discussion, I suggest we move it to pg- 
general.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-06 Thread Greg Stark
It's slow because there's no way around running through the entire  
input. The optimization tmp is talking about wouldn't be relevant  
becase there is an order by clause which was precisely why I I said it  
was a fairly narrow use case. Most people who use limit want a  
specific subset even if that specific subset is random. Without the  
order by the subset is entirely arbitrary but not useully random.


Incidentally order by ... limit is amenable to an optimization which  
avoids having to *sort* the whole input even though it still has to  
read the whole input. We implemented that in 8.3.



greg

On 6 Dec 2008, at 06:08 PM, Grzegorz Jaskiewicz [EMAIL PROTECTED]  
wrote:




On 2008-12-06, at 11:29, David Lee Lambert wrote:




I use ORDER BY random() LIMIT :some_small_number frequently to  
get a feel
for data.  That always builds the unrandomized relation and then  
sorts it.  I
guess an alternate path for single-table queries would be to  
randomly choose
a block number and then a tuple number;  but that would be biased  
toward long

rows (of which fewer can appear in a block).


but that's going to be extremely slow, due to speed of random()  
function.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-based query white list

2008-12-06 Thread Andrew Chernow

Grzegorz Jaskiewicz wrote:


On 2008-12-06, at 18:30, Andrew Chernow wrote:


Grzegorz Jaskiewicz wrote:

On 2008-12-06, at 18:21, Andrew Chernow wrote:
Looking for a way to limited a user to a specific set of queries.  I 
don't think this can be done right now ... or can it?  Has this 
feature request surfaced in the past?


I currently need this as an extra security measure for a libpq 
client app (want to block arbitrary queries from malicious 
attackers).  The easiest way I found was to add some query_string 
checks into backend/tcop/postgres.c for the 'Q' and 'P' commands in 
PostgresMain().  Seems to work just fine.  If it doesn't match, I 
issue an ereport FATAL since that is seen as a malicious query 
execution attempt.


I think it is something rather simple to design/implement (probably 
use a table of user allowed queries, support regex matches, etc.. 
loaded at session startup and SIGHUP).
Can it be done with views, and adjusting permissions so user is only 
allowed to use few views ??


Not sure.  The client I am working on only calls functions, small API 
to interact with (no knowledge of views or tables).  Even if that were 
not the case, would views stop a client from sending in other queries, 
like SELECT 1+1 or something that could bog down the server?



I use views to simplify code. Say you have a simple join, with one 
WHERE. You omit the WHERE in view, and leave it like that. Than just 
select foo1, foo2 from VIEW WHERE boo1=foo1 and foo3  '123';
Postgresql is smart enough, to run it as one query (as oppose to mysql), 
so the code is simpler, everybody's happy.


If you want to continue on that discussion, I suggest we move it to 
pg-general.





I don't think view-based security solves my problem.  I need to limit a user to 
20 fixed queries, for example.  That means the user cannot execute SELECT 
NOW() or SELECT 'hello world'.  The user can only execute a pre-defined list 
of queries.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] benchmarking the query planner (was Re: Simple postgresql.conf wizard)

2008-12-06 Thread Greg Stark
That might only be the case when the pg_statistic record is in shared  
buffers.


Also I wonder if eqjoinsel and company might need to be made more  
toast-aware by detoasring all the things it needs once rather than  
every time it accesses them.


greg

On 6 Dec 2008, at 06:19 PM, Robert Haas [EMAIL PROTECTED] wrote:


Sorry for top posting but we are getting a bit far afield from the
original topic.  I followed up the tests I did last night:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00369.php

I benchmarked 100 iterations of EXPLAIN on the query Greg Stark put
together as a synthetic benchmark for default_statistics_target with
various values for SET STATISTICS n.  Testing was done on CVS HEAD
on my laptop with no configure options other than --prefix.  Then I
did this, to disable compression on pg_statistic.

alter table pg_statistic alter column stanumbers1 set storage  
external;
alter table pg_statistic alter column stanumbers2 set storage  
external;
alter table pg_statistic alter column stanumbers3 set storage  
external;
alter table pg_statistic alter column stanumbers4 set storage  
external;

alter table pg_statistic alter column stavalues1 set storage external;
alter table pg_statistic alter column stavalues2 set storage external;
alter table pg_statistic alter column stavalues3 set storage external;
alter table pg_statistic alter column stavalues4 set storage external;

(Note that you'll need to put allow_system_table_mods=true in your
postgresql.conf file if you want this to work.)  Then I reran the
tests.  The results were pretty dramatic.  In the table below, the
first column is value of SET STATISTICS n that was performed the
table column prior to analyzing it.  The second column is the time
required to plan the query 100x AFTER disabling compression on
pg_statistic, and the third column is the time required to plan the
query 100x BEFORE disabling compression on pg_statistic.

100.8292020.8249
201.0599761.06957
301.1687271.143803
401.2871891.263252
501.3701671.363951
601.4865891.460464
701.6038991.571107
801.694021.689651
901.790681.804454
1001.9308772.803941
1502.4464714.833002
2002.953236.217708
2503.4367417.507919
3003.9835688.895015
3504.49747510.201713
4005.07247111.576961
4505.61527212.933128
5006.28635814.408157
5506.89595115.745378
6007.40013417.192916
6508.03815918.568616
7008.60670420.025952
7509.15488921.45775
8009.8095322.74635
85010.36347124.057379
90011.02234825.559911
95011.6973227.021034
100012.26669928.711027

As you can see, for default_statistics_target  90, this is a HUGE  
win.


After doing this test, I rebuilt with --enable-profiling and profiled
EXPLAIN 10x with SET STATISTICS 10, 70, 100, 200 with a vanilla
configuration, and then 200 again with compression turned off as
described above.  The, ahem, ridiculously small limit on attachment
size prevents me from attaching the full results, so please see the
attached results which are truncated after the first section.  10x
doesn't seem to be quite enough to get the exact picture of where the
bottlenecks are, but the overall picture is clear enough:
decompression introduces a huge overhead.

Looking specifically at the 200-decompress output, the next biggest
hit is AllocSetAlloc(), which, from the detailed results that I
unfortunately can't include, is being called mostly by datumCopy()
which is being called mostly by get_attstatsslot().  There are 4000
calls to get_attstatsslot() which result 701,500 calls to datumCopy().

I'm not too sure what any of this means in terms of optimizatiion,
other than that changing the storage type of pg_statistic columns to
external looks like a huge win.  Perhaps someone more knowledgeable
than I has some thoughts.

...Robert
gmon-summary.tbz


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mostly Harmless: Welcoming our C++ friends

2008-12-06 Thread Bruce Momjian
Greg Smith wrote:
 On Fri, 5 Dec 2008, Robert Treat wrote:
 
  Might I suggest you collect all of these various arguments (both for and
  against) and patches into a wiki page on the developers wiki?
 
 I'm getting the feeling this is going to take a while to sort out too. 
 Page with most of the relevant stuff Kurt has posted so far is now listed 
 under Development Projects on the wiki: 
 http://wiki.postgresql.org/wiki/C%2B%2B_Compatibility

Is this a TODO?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mostly Harmless: Welcoming our C++ friends

2008-12-06 Thread James Mansion

Kurt Harriman wrote:

The foremost opposing argument seems to have been that there
should be no attempt to alleviate the existing reserved word
problem without automatic enforcement to guarantee that never
in the future can new occurrences be introduced.

Is there anything in the source that would necessarily preclude using the
C++ compiler to build *all* the code?

I'd guess that this would be quite a big patch to do this in any places
where we have implicit conversions from void* to char* etc, but
the result is valid as C and C++ and arguably better documented.

C++ is picky about a few things you can do in C, but most of them
are things I'd rather not do anyway.

Run such a build on the build farm each night, and it will be obvious as
soon as C++-unfriendly code sneaks in.

And who know, maybe eventually we could use C++ properly in the
code.

James


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-based query white list

2008-12-06 Thread Andrew Dunstan



Andrew Chernow wrote:


I don't think view-based security solves my problem.  I need to limit 
a user to 20 fixed queries, for example.  That means the user cannot 
execute SELECT NOW() or SELECT 'hello world'.  The user can only 
execute a pre-defined list of queries.




Put your queries in security definer functions and put those in a schema 
that is the only one your user has access to. That should just about do 
the trick, although s/he might still be able to do select 'foo';


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-06 Thread Bruce Momjian
KaiGai Kohei wrote:
 Bruce Momjian wrote:
  KaiGai Kohei wrote:
  I don't oppose to elimination of --disable-row-acl options, however,
  it is not clear for me whether it should be unavoidable selection in
  the future, or not.
  Look at the existing configure options;  we don't remove features via
  configure unless it is for some platform-specific reason.  Please remove
  the configure option and make it always enabled.
  OK, I'll update it in the next patch set.
  
  Good.  I assume the SQL-row security patch is not testable alone with
  out the rest of the patches, right?
 
 The minimum requirements are the 1st and 2nd patches.
 The first provides security hooks to PostgreSQL server program, and
 the other provides ones to pg_dump command.
 The 3rd, 4th and 5th are not necessary for the test purpose.

First, let me say you have done an amazing job of producing patches for
us, and your code quality is very high, especially considering the
complexity of this code and your newness to our development process. My
compliments to NEC, your employer.

Also, I personally am excited about this code and what it will add to
Postgres 8.4.

I hate to ask for something else from you, but I am trying to figure out
how we can proceed in reviewing and applying your additions.  I am
wondering if you can produce a patch that has the SE-Linux part separate
so I can review the non-SE-Linux parts of the patch alone --- right now
I am not 100% clear on what parts are always active as row-level SQL
security and what needs SE-Linux to operate.  I know this is an
additional burden on you and if it is too much to ask, please tell me.

Thanks.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add columns via CREATE OR REPLACE VIEW

2008-12-06 Thread Bruce Momjian

Patch applied, thanks.

---

Robert Haas wrote:
  I had a deeper look at this now. The patch looks clean and applies without
  any problems, regression tests passes. However, ATRewriteTables() has a
  problem when adding columns with domains and constraints. Consider this
  small test case:
 
  CREATE TABLE bar (id INTEGER);
  CREATE OR REPLACE VIEW vbar AS SELECT * FROM bar;
  CREATE DOMAIN person AS TEXT CHECK(value IN ('haas', 'helmle'));
  ALTER TABLE bar ADD COLUMN name person;
  CREATE OR REPLACE VIEW vbar AS SELECT * FROM bar;
 
  The last command confuses ATRewriteTable(), which wants to scan the relation
  leading to this error:
  ERROR:  could not open relation base/16384/16476:
 
  I see that ATRewriteTable() errors out on heap_beginscan(), since needscan
  is set to TRUE. One solution would be to teach ATRewriteTable(s) to handle
  view alteration differently in this case.
 
 After looking at this, I think the root cause of this problem is that
 ATPrepAddColumn isn't smart enough to know that when the underlying
 relation is a view, there's no point in asking for a table rewrite.
 Please find an updated patch that addresses this problem.
 
 Thanks again for the review - let me know what you think of this version!
 
 ...Robert

[ Attachment, skipping... ]

 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] allow has_table_privilege(..., 'usage') on sequences

2008-12-06 Thread Bruce Momjian
Abhijit Menon-Sen wrote:
 At 2008-09-22 12:54:34 -0500, [EMAIL PROTECTED] wrote:
 
  can we tell there is consensus in create a new has_sequence_privilege()?
  Abhijit will you make it? if not i can make a try...
 
 Yes, I'll do it.

This hasn't been done so I added it to the TODO list.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgis-devel] CLUSTER in 8.3 on GIST indexes break

2008-12-06 Thread Robert W. Burgholzer
FWIW,
I have experienced some oddities in performing SELECT statements after
restarting on an 8.2 system, whereby I occasionally would get a ton of
duplicate records when I would do a select statement (my assumption is that
they are deleted tuples being returned).  If I executed the same select
statement again, I would get the correct number of records.

Thanks for exploring this issue ladies and gents.

r.b.


Quoting Mark Cave-Ayland [EMAIL PROTECTED]:

 Gregory Stark wrote:

  Uhm. That rather sucks. I was able to reproduce it too.
 
  It seems to happen after I pause for a bit, and not when I run the script
 in
  fast succession.

 Thanks for the verification Greg. I'm wondering if the GiST part is a
 red herring, and in fact it is related to some bizarre interaction
 between CLUSTER/VACUUM/autovacuum?


 ATB,

 Mark.

 --
 Mark Cave-Ayland
 Sirius Corporation - The Open Source Experts
 http://www.siriusit.co.uk
 T: +44 870 608 0063
 ___
 postgis-devel mailing list
 [EMAIL PROTECTED]
 http://postgis.refractions.net/mailman/listinfo/postgis-devel



--
Robert W. Burgholzer
--
Finding the occasional straw of truth awash in a great ocean of confusion and
bamboozle requires intelligence, vigilance, dedication and courage.  But if we
don't practice these tough habits of thought, we cannot hope to solve the truly
serious problems that face us -- and we risk becoming a nation of suckers, up
for grabs by the next charlatan who comes along.
-- Carl Sagan, The Fine Art of Baloney Detection, Parade, February 1, 1987

Web Hydrology Objects - Online Collaborative Modeling:
http://sourceforge.net/projects/npsource/
Home Page:
http://soulswimmer.dynalias.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Quick patch: Display sequence owner

2008-12-06 Thread Josh Williams
Hi folks,

Was recently poked and reminded that this patch may be of interest to
the community.  It was mostly done as an academic exercise, just to see
how it works, and so it has a rather hackish feel.  The patch adds the
sequence owner, if available, to psql's \d sequence output, as
suggested in a recent thread:

http://archives.postgresql.org/pgsql-general/2008-11/msg01300.php

The patch adds a query against pg_depend, then fakes an extra column
owned_by in the output:

# \d tablename_columnname_seq
 Sequence public.tablename_columnname_seq
Column |   Type   |  Value   
---+--+--
 sequence_name | name | tablename_columnname_seq
 last_value| bigint   | 1
 start_value   | bigint   | 1
 increment_by  | bigint   | 1
 max_value | bigint   | 9223372036854775807
 min_value | bigint   | 1
 cache_value   | bigint   | 1
 log_cnt   | bigint   | 1
 is_cycled | boolean  | f
 is_called | boolean  | f
 owned_by  | regclass | tablename


Now for the snags and additional thoughts:

The query against pg_depend looks for relations for which the sequence
is auto-dependent.  It wouldn't make any sense, but is it at all
possible for a sequence to auto-depend on something else?

An earlier version of the patch pulled the owning table and schema names
directly, rather than casting to regclass, so the schema name was always
shown.  Would this be preferable, in case there's some ambiguity in
similarly named tables between schemas?

I'd pondered briefly whether there should be a real attribute to
represent the sequence owner, just for display purposes.  But I'm
assuming that would present a big concurrency issue, as other
transactions would see the change on the sequence immediately while
pg_depend wouldn't be seen to change until committed.  That, and
ROLLBACK wouldn't work at all...

The column info query is getting messy.  Could probably clean that up a
bit if anyone thinks it'd be worth it?

- Josh Williams

Index: src/bin/psql/describe.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.188
diff -r1.188 describe.c
917c917
 		seq_values = pg_malloc_zero((SEQ_NUM_COLS+1) * sizeof(*seq_values));
---
 		seq_values = pg_malloc_zero((SEQ_NUM_COLS+2) * sizeof(*seq_values));
922a923,939
 
 		printfPQExpBuffer(buf,
 		  SELECT d.refobjid::regclass\n
 		  FROM pg_catalog.pg_depend d\n
 		  WHERE d.deptype = 'a' AND d.objid = '%s',
 		  oid);
 
 		result = PSQLexec(buf.data, false);
 		if (!result)
 			goto error_return;
 
 		if (PQntuples(result))
 			seq_values[10] = pg_strdup(PQgetvalue(result, 0, 0));
 		else 
 			seq_values[10] = ;
 
 		PQclear(result);
940c957,966
 	appendPQExpBuffer(buf, \nORDER BY a.attnum);
---
 	/* For sequences we'll 'fake' an additional column to show the owning relation */
 	if (tableinfo.relkind == 'S')
 	{
 		appendPQExpBuffer(buf, \nUNION SELECT 'owned_by', 'regclass', NULL, true, 11);
 		if (verbose)
 			appendPQExpBuffer(buf, , 'p', 'Owning relation');
 		appendPQExpBuffer(buf, \nORDER BY attnum);
 	}
 	else
 		appendPQExpBuffer(buf, \nORDER BY a.attnum);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers