Re: [HACKERS] (WIP) VACUUM REWRITE - CLUSTER by ctid

2009-10-22 Thread Heikki Linnakangas
Itagaki Takahiro wrote:
> I'm working on alternative version of VACUUM FULL, which is
> like CLUSTER but sort tuples in ctid order without index.
> The original discussion is here:
> [HACKERS] Feedback on getting rid of VACUUM FULL
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php
> 
> WIP patch attached. I have some questions over the development:
> 
>  1. Syntax: I choose "CLUSTER tbl WITHOUT INDEX" for the syntax,
> but it is debatable. What syntax is the best?
>   VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL?

I got the impression that replacing VACUUM FULL is the most popular
opinion. I like VACUUM REWRITE myself, except that it would require
making REWRITE a reserved keyword. I don't like tacking this onto
CLUSTER, particularly not with "ORDER BY ctid". ctids are an
implementation detail most users don't care about, and ORDER BY sounds
like it's sorting something, but it's not.

>  2. Superclass of HeapScanDesc and IndexScanDesc:
> We don't have an abstraction layer of HeapScanDesc and IndexScanDesc,
> but the layer is useful for this purpose. Is it reasonable?
> It is partially implemented as genam_beginscan() family in the patch.

I don't think it's really necessary. You might as well put a few "if
(indexOid)" clauses directly into copy_heap_data.

>  3. Should we allow "ctid" as a default clustered index?
> We could assume "ctid" as a virtual index. The syntax for it
> might be "ALTER TABLE tbl CLUSTER ON COLUMN ctid" or so.

Isn't that the same as having no clustering index? We already have
"ALTER TABLE tbl SET WITHOUT CLUSTER".

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pre-proposal: type interfaces

2009-10-22 Thread Jeff Davis
I am starting to plan a few features that are important for temporal
data, and one prerequisite for several of them is the ability to find an
operator that fills a certain role.

For instance, one feature that I'm considering now is a "temporal join"
which is a join on "overlaps" rather than "equals", e.g.:

  SELECT * FROM a, b WHERE a.x && b.x;

I might try to provide a modified merge join algorithm to implement that
more efficiently in some cases. I don't mean to discuss the feature in
detail here (I will make a separate proposal) but the algorithm requires
that I find the "strictly left of" operator. 

So, after I recognize that a temporal join is required, I need to be
able to identify the << operator. But how? In other languages, it would
probably be done with something like an "interface", but we don't have
that concept here. The internals generally use operators attached to the
default btree opclass, but I don't think that works very well here.

The way I see it, we have two approaches:
 1. Try to make the current system work by standardizing the strategy
numbers for GiST somehow, and then use the default GiST operator class,
if available.
 2. Invent a new system, perhaps interfaces, perhaps something else.
 3. Use extra flags in CREATE OPERATOR somehow

Thoughts?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-22 Thread Peter Eisentraut
On tor, 2009-10-22 at 10:59 -0300, Alvaro Herrera wrote:
> What I am wondering right now is whether we could have two separate
> attributes, one SUSET defining what goes to the log, and another one
> USERSET defining what's sent to the client.

Note that only glibc supports switching the language at run time.  And
doing it is probably very expensive if you want to do it twice per
message.

I think you could probably get much of the use case out of this if you
concentrate on making two switches for the client and the log, which can
be set to a language or "untranslated", and if you choose a language it
has to be the same for both.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 11:00 PM, João Eugenio Marynowski
 wrote:
> I don't be able to select the data from table with pg_dump or select, both
> occur error...
> how i can identified the register of a table independent the contends of the
> register, only control of the postgres?
> I eliminate the index pk but now I don't be able to create begin because
> show the error:
> ERROR:  index row requires 21776744 bytes, maximum size is 8191
> How increase the index row?

Well, that's not really the problem.  Your data is corrupted -
increasing the index row size is not going to fix it.

I'm not really knowledgeable enough about the guts of the database to
know whether there are lower-level tools that could be used to rescue
your data.  I wonder if you'd have any luck selecting data a few rows
at a time (LIMIT 100, say, without ORDER BY).  That might at least
enable you to get some of the data out of there, if there are some
pages that are undamaged.  But I'm grasping at straws here.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] (WIP) VACUUM REWRITE - CLUSTER by ctid

2009-10-22 Thread Itagaki Takahiro
I'm working on alternative version of VACUUM FULL, which is
like CLUSTER but sort tuples in ctid order without index.
The original discussion is here:
[HACKERS] Feedback on getting rid of VACUUM FULL
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php

WIP patch attached. I have some questions over the development:

 1. Syntax: I choose "CLUSTER tbl WITHOUT INDEX" for the syntax,
but it is debatable. What syntax is the best?
  VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL?

 2. Superclass of HeapScanDesc and IndexScanDesc:
We don't have an abstraction layer of HeapScanDesc and IndexScanDesc,
but the layer is useful for this purpose. Is it reasonable?
It is partially implemented as genam_beginscan() family in the patch.

 3. Should we allow "ctid" as a default clustered index?
We could assume "ctid" as a virtual index. The syntax for it
might be "ALTER TABLE tbl CLUSTER ON COLUMN ctid" or so.

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


cluster-without-index_20091023.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-22 Thread João Eugenio Marynowski
I don't be able to select the data from table with pg_dump or select, both
occur error...
how i can identified the register of a table independent the contends of the
register, only control of the postgres?
I eliminate the index pk but now I don't be able to create begin because
show the error:
ERROR:  index row requires 21776744 bytes, maximum size is 8191
How increase the index row?

--
João Eugenio Marynowski
(41) 9624-3933


2009/10/23 Robert Haas 

> On Thu, Oct 22, 2009 at 10:34 PM, João Eugenio Marynowski
>  wrote:
> > The hardware is ok because we have migrated to other machine and the
> errors
> > is the same
> > zero_damaged_pages is on and the vaccum verbose don't show any error...
>
> You need to dump (or otherwise extract) your data and load it into a
> new cluster.  The one you have sounds like it's in bad shape.
>
> ...Robert
>


Re: [HACKERS] table corrupted

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 10:34 PM, João Eugenio Marynowski
 wrote:
> The hardware is ok because we have migrated to other machine and the errors
> is the same
> zero_damaged_pages is on and the vaccum verbose don't show any error...

You need to dump (or otherwise extract) your data and load it into a
new cluster.  The one you have sounds like it's in bad shape.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-22 Thread João Eugenio Marynowski
The hardware is ok because we have migrated to other machine and the errors
is the same
zero_damaged_pages is on and the vaccum verbose don't show any error...

2009/10/22 Joshua D. Drake 

> On Thu, 2009-10-22 at 14:28 -0200, João Eugenio Marynowski wrote:
> > Hi
> >
> Repair? Not likely. Get past? Maybe.
>
> set zero_damaged_pages to on;
> vacuum verbose;
>
> I would do a hardware check too.
>
> Joshua D. Drake
>
>
> > Can someone help me how to repair the problem below, I'm using
> > Postgres 8.2.5:
> > - after appeared the erros below in selects, vacuum and dump in one
> > table:
> > 2009-10-16 16:07:06 BRT 192.168.0.87 ERROR:  could not access status
> > of transaction 29024764
> > 2009-10-16 16:07:06 BRT 192.168.0.87 DETAIL:  Could not open file
> > "pg_clog/001B": No such file or directory.
> > 2009-10-16 16:07:06 BRT 192.168.0.87 STATEMENT:  select ...
> > 2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in
> > block 462821 of relation "..."
> > 2009-10-16 16:11:47 BRT 192.168.0.29 STATEMENT:  select 
> > I created the file pg_clog/001B with 256kB of /dev/zero
> > That resolve the problem with vacuum but began other error in selects
> > and dump to the same table ended all connections and stay up after
> > showing the error:
> > 2009-10-19 13:50:03 BRT  LOG:  server process (PID 1544) was
> > terminated by signal 11
> > 2009-10-19 13:50:03 BRT  LOG:  terminating any other active server
> > processes
> > 2009-10-19 13:50:03 BRT 192.168.0.253 WARNING:  terminating connection
> > because of crash of another server process
> > 2009-10-19 13:50:03 BRT 192.168.0.253 DETAIL:  The postmaster has
> > commanded this server process to roll back the current transaction and
> > exit, because another server process exited abnormally and possibly
> > corrupted shared memory.
> > 2009-10-19 13:50:03 BRT 192.168.0.253 HINT:  In a moment you should be
> > able to reconnect to the database and repeat your command.
> > Was habilited the zero_damage_pages option then executed selects,
> > vacuums, and dumps but not changed...
> > Was identified 2 register that if refered cause error.
> > The BD was restored in backup server with 8.2.7 and executed vacuums
> > ok but select and reindex crashed...
> >
> > Instaled 8.3.8 version and used pg_dump but error
> > And then the select below show the problem where the codentrega from
> > where clause differ from select answer:
> >
> > LOGIST=# select codentrega from entregas where codentrega='9879622';
> >  codentrega
> > 
> >  z879622
> > (1 registro)
> >
> > Any idea?
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
> Consulting, Training, Support, Custom Development, Engineering
> If the world pushes look it in the eye and GRR. Then push back harder. -
> Salamander
>
>


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Alvaro Herrera
Greg Stark escribió:

> There is another use case which perhaps needs to be addressed: if the
> user has some queries which are very latency sensitive and others
> which are not latency sensitive. In that case it might be very
> important to keep the pages of data used by the high priority queries
> in the cache. That's something we should have a high level abstract
> interface for, not depend on low level system features.

Yeah.

I wonder if the right thing for this is to mark the objects (pages), or
the queries, as needing special attention.  If you mark the queries,
then perhaps they could behave slightly differently like adding +2 or so
to buffer usage count instead of +1, so that they take longer than a
normal buffer in getting evicted.  This way you don't force the admin to
figure out what's the right size ratio for different named caches, etc.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-22 Thread Alvaro Herrera
Magnus Hagander wrote:
> 2009/10/22 Alvaro Herrera :

> > What I am wondering right now is whether we could have two separate
> > attributes, one SUSET defining what goes to the log, and another one
> > USERSET defining what's sent to the client.

> This would allow me to have the client messages in one language and
> the log in English, right? If so, +100 or sometihng like that, if it's
> not too hard to do.

Yes.

> Meaning it's not just encoding, but actual translation?

Yeah, they would both have to be in server encoding I guess.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-22 Thread Magnus Hagander
2009/10/22 Alvaro Herrera :
> So right now we have a single GUC determining the language that log
> messages are in, and it is PGC_SUSET to avoid a user from sending
> messages to the log that the DBA cannot read.
>
> However, this means that the client cannot get the messages in the
> language of his choice.
>
> What I am wondering right now is whether we could have two separate
> attributes, one SUSET defining what goes to the log, and another one
> USERSET defining what's sent to the client.
>
> This would require a bit of shuffling, because right now we translate
> the message pretty early in the call chain -- way before we try to send
> it to client or server.  This would need to be postponed, or
> alternatively, we would need to store a second, untranslated copy so
> that another translation can take place later, if necessary.
>
> Thoughts?

This would allow me to have the client messages in one language and
the log in English, right? If so, +100 or sometihng like that, if it's
not too hard to do. Meaning it's not just encoding, but actual
translation?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-22 Thread Joshua D. Drake
On Thu, 2009-10-22 at 14:28 -0200, João Eugenio Marynowski wrote:
> Hi
> 
Repair? Not likely. Get past? Maybe.

set zero_damaged_pages to on;
vacuum verbose;

I would do a hardware check too.

Joshua D. Drake


> Can someone help me how to repair the problem below, I'm using
> Postgres 8.2.5:
> - after appeared the erros below in selects, vacuum and dump in one
> table:
> 2009-10-16 16:07:06 BRT 192.168.0.87 ERROR:  could not access status
> of transaction 29024764
> 2009-10-16 16:07:06 BRT 192.168.0.87 DETAIL:  Could not open file
> "pg_clog/001B": No such file or directory.
> 2009-10-16 16:07:06 BRT 192.168.0.87 STATEMENT:  select ...
> 2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in
> block 462821 of relation "..."
> 2009-10-16 16:11:47 BRT 192.168.0.29 STATEMENT:  select 
> I created the file pg_clog/001B with 256kB of /dev/zero 
> That resolve the problem with vacuum but began other error in selects
> and dump to the same table ended all connections and stay up after
> showing the error:
> 2009-10-19 13:50:03 BRT  LOG:  server process (PID 1544) was
> terminated by signal 11
> 2009-10-19 13:50:03 BRT  LOG:  terminating any other active server
> processes
> 2009-10-19 13:50:03 BRT 192.168.0.253 WARNING:  terminating connection
> because of crash of another server process
> 2009-10-19 13:50:03 BRT 192.168.0.253 DETAIL:  The postmaster has
> commanded this server process to roll back the current transaction and
> exit, because another server process exited abnormally and possibly
> corrupted shared memory.
> 2009-10-19 13:50:03 BRT 192.168.0.253 HINT:  In a moment you should be
> able to reconnect to the database and repeat your command.
> Was habilited the zero_damage_pages option then executed selects,
> vacuums, and dumps but not changed...
> Was identified 2 register that if refered cause error.
> The BD was restored in backup server with 8.2.7 and executed vacuums
> ok but select and reindex crashed...
> 
> Instaled 8.3.8 version and used pg_dump but error
> And then the select below show the problem where the codentrega from
> where clause differ from select answer:
> 
> LOGIST=# select codentrega from entregas where codentrega='9879622';
>  codentrega
> 
>  z879622
> (1 registro)
> 
> Any idea?


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] table corrupted

2009-10-22 Thread João Eugenio Marynowski
Hi

Can someone help me how to repair the problem below, I'm using Postgres
8.2.5:
- after appeared the erros below in selects, vacuum and dump in one table:
2009-10-16 16:07:06 BRT 192.168.0.87 ERROR:  could not access status of
transaction 29024764
2009-10-16 16:07:06 BRT 192.168.0.87 DETAIL:  Could not open file
"pg_clog/001B": No such file or directory.
2009-10-16 16:07:06 BRT 192.168.0.87 STATEMENT:  select ...
2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in block
462821 of relation "..."
2009-10-16 16:11:47 BRT 192.168.0.29 STATEMENT:  select 
I created the file pg_clog/001B with 256kB of /dev/zero
That resolve the problem with vacuum but began other error in selects and
dump to the same table ended all connections and stay up after showing the
error:
2009-10-19 13:50:03 BRT  LOG:  server process (PID 1544) was terminated by
signal 11
2009-10-19 13:50:03 BRT  LOG:  terminating any other active server processes
2009-10-19 13:50:03 BRT 192.168.0.253 WARNING:  terminating connection
because of crash of another server process
2009-10-19 13:50:03 BRT 192.168.0.253 DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2009-10-19 13:50:03 BRT 192.168.0.253 HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
Was habilited the zero_damage_pages option then executed selects, vacuums,
and dumps but not changed...
Was identified 2 register that if refered cause error.
The BD was restored in backup server with 8.2.7 and executed vacuums ok but
select and reindex crashed...
Instaled 8.3.8 version and used pg_dump but error
And then the select below show the problem where the codentrega from where
clause differ from select answer:
 LOGIST=# select codentrega from entregas where codentrega='9879622';
 codentrega

 z879622
(1 registro)

Any idea?


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-22 Thread Tom Lane
Dimitri Fontaine  writes:
> But it will set GET DIAGNOSTIC ... = ROW_COUNT, am I being told on IRC.

This has been discussed before, please read archives.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Joshua D. Drake
On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
> On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
>  wrote:
> > Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> >> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank  wrote:
> >> > Currently random_page_cost is a GUC. I propose that this could be set
> >> > per-table.
> >>
> >> Or per-tablespace.
> >>
> >> Yes, I think there are a class of GUCs which describe the physical
> >> attributes of the storage system which should be per-table or
> >> per-tablespace. random_page_cost, sequential_page_cost,
> >> effective_io_concurrency come to mind.
> >
> > and, perhaps effective_cache_size.
> >
> > You can have situation where you don't want some tables go to OS memory (you
> > can disabled that at filesystem level, ... l'd like to be able to do that at
> > postgres level but it is another point)
> >
> > So you put those tables in a separate tablespace, and tell postgresql that 
> > the
> > effective_cache_size is 0 (for this tablespace), up to postgres to do the 
> > right
> > thing with that ;)
> 
> Why would you ever want to set effective_cache_size to 0?

I think this is a misunderstanding of how effective_cache_size works. I
can't think of any reason to do that. I could see a reason to tell the
OS to not throw a relation into cache but that is a different thing.

Joshua D. Drake


> 
> ...Robert
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Writeable CTEs, again

2009-10-22 Thread Marko Tiikkaja

Hi,

Attached is a WIP patch which implements writeable CTEs.  This patch has
some defects I'll be discussing below.  Also, I haven't implemented the
grammar changes for using WITH ( .. RETURNING ) in non-SELECT queries
yet.

What's not obvious from the patch:
  - estate->es_result_relation_info is currently only set during
EvalPlanQual().  ModifyTable nodes have an array of
ResultRelInfos they will be operating on.  That array is part of
estate->es_result_relations.
  - I removed resultRelations from PlannerInfo completely because I
didn't find use for it any more.  That list is now stored first
in ModifyTable nodes, and then added to PlannerGlobal's
new resultRelations list during set_plan_refs().

Currently, we don't allow DO ALSO SELECT .. rules for SELECT queries.
But with this patch you could have a top-level SELECT which results in
multiple SELECTs when the DML operations inside CTEs are rewritten.
Consider this example:

=> CREATE RULE additional_select AS ON INSERT TO foo DO ALSO SELECT *
FROM bar;

=> WITH t AS (INSERT INTO foo VALUES(0) RETURNING *) SELECT * FROM t;

INSERT INTO foo VALUES(0) is ran first, but the results of that are
ignored.  What you actually see is the output of SELECT * FROM bar which
is certainly surprising.  What do you think should happen here?
INSERT/UPDATE/DELETE works as expected; both queries are ran but you get
the output of SELECT * FROM t;

Currently we also only allow cursors for simple SELECT queries.  IMHO we
should also allow cursor for SELECT queries like the one above; the
INSERT is run to completion first, but then the user could use a cursor
to scan through the RETURNING tuples.  I haven't looked into this very
thoroughly yet, but I don't see any obvious problems.

I'd appreciate any input.

Regards,
Marko Tiikkaja

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index b2741bc..111ed6a 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1499,7 +1499,7 @@ SELECT 3, 'three';
 
 SELECT select_list FROM table_expression
 
-   and can appear anywhere a SELECT can.  For example, you can
+   and can appear anywhere a SELECT can.  For example, you can
use it as part of a UNION, or attach a
sort_specification (ORDER BY,
LIMIT, and/or OFFSET) to it.  VALUES
@@ -1529,10 +1529,11 @@ SELECT select_list FROM table_expression
   
 
   
-   WITH provides a way to write subqueries for use in a larger
-   SELECT query.  The subqueries can be thought of as defining
-   temporary tables that exist just for this query.  One use of this feature
-   is to break down complicated queries into simpler parts.  An example is:
+   WITH provides a way to write subqueries for use in a
+   larger query.  The subqueries can be thought of as defining
+   temporary tables that exist just for this query.  One use of this
+   feature is to break down complicated queries into simpler parts.
+   An example is:
 
 
 WITH regional_sales AS (
@@ -1560,6 +1561,28 @@ GROUP BY region, product;
   
 
   
+  WITH clauses can also have a
+  INSERT, UPDATE,
+  DELETE(each optionally with a
+  RETURNING clause) in them.  The example below
+  moves rows from the main table, foo_log into a partition,
+  foo_log_200910.
+
+
+WITH t AS (
+DELETE FROM foo_log
+WHERE
+foo_date >= '2009-10-01' AND
+foo_date <  '2009-11-01'
+RETURNING *
+)
+INSERT INTO foo_log_200910
+SELECT * FROM t;
+
+
+  
+
+  
The optional RECURSIVE modifier changes WITH
from a mere syntactic convenience into a feature that accomplishes
things not otherwise possible in standard SQL.  Using
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 05d90dc..29cc9db 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -58,7 +58,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionand with_query is:
 
-with_query_name [ ( column_name [, ...] ) ] AS ( select )
+with_query_name [ ( column_name [, ...] ) ] AS ( select | (insert | update | delete [ RETURNING...]))
 
 TABLE { [ ONLY ] table_name [ * ] | with_query_name }
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9100dd9..78d2344 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2160,7 +2160,8 @@ CopyFrom(CopyState cstate)
 			heap_insert(cstate->rel, tuple, mycid, hi_options, bistate);
 
 			if (resultRelInfo->ri_NumIndices > 0)
-recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
+recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+	   slot, &(tuple->t_self),
 	   estate, false);
 
 			/* AFTER ROW INSERT Triggers */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 756c65c..2446c6f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2159,7 +2159,7 @@ ltrmark:;
 	TupleTableSlot *epqslot;
 
 	epqslot = EvalPlanQual(estate,

[HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-22 Thread Dimitri Fontaine

But it will set GET DIAGNOSTIC ... = ROW_COUNT, am I being told on IRC.

I was really suprised FOUND is not set by EXECUTE in 8.3 when doing a
partitioning UPDATE trigger (we partition a summary table and have to
see about doing UPSERT).

As I wouldn't have figured GET DIAGNOSTIC was the way to go, I had to
resort to unclean method, namely EXECUTE ... RETURNING ... INTO then
checking the variable against NULL.

Why isn't that a bug again, that GET DIAGNOSTIC and FOUND are not
exposing the same reality?
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Greg Stark
Well I think we need sone way to accomplish the same high level goal  
of guaranteeing response times for latency-critical queries.


However my point is that cache policy is an internal implementation  
detail we don't want to expose in a user interface.


--
Greg

On 2009-10-22, at 11:41 AM, "Kevin Grittner" > wrote:



Greg Stark  wrote:


There is another use case which perhaps needs to be addressed: if
the user has some queries which are very latency sensitive and
others which are not latency sensitive.


Yes.  Some products allow you to create a named cache and bind
particular objects to it.  This can be used both to keep a large
object with a low cache hit rate from pushing other things out of the
cache or to create a pseudo "memory resident" set of objects by
binding them to a cache which is sized a little bigger than those
objects.  I don't know if you have any other suggestions for this
problem, but the named cache idea didn't go over well last time it was
suggested.

In all fairness, PostgreSQL does a good enough job in general that I
haven't missed this feature nearly as much as I thought I would; and
its absence means one less thing to worry about keeping properly
tuned.

-Kevin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-22 Thread Dimitri Fontaine
Andrew Dunstan  writes:
> I don't see why it feels any more foreign than, say, #pragma in C.
>
> And it's something we already have, albeit undocumented.
>
> Let's not get too hung up on syntax.

Ok just wanted to have this syntax part explicitely talked about, I
don't have strong opinions about it. I sure don't find it nice, but that
doesn't change the fact that it's there and cheap to use ;)

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 2:28 PM, Josh Berkus  wrote:
> All,
>
> Wouldn't per *tablespace* costs make more sense?
>
> --Josh

Yes, we already had several votes in favor of that approach.  See upthread.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Kevin Grittner
Greg Stark  wrote:
 
> There is another use case which perhaps needs to be addressed: if
> the user has some queries which are very latency sensitive and
> others which are not latency sensitive.
 
Yes.  Some products allow you to create a named cache and bind
particular objects to it.  This can be used both to keep a large
object with a low cache hit rate from pushing other things out of the
cache or to create a pseudo "memory resident" set of objects by
binding them to a cache which is sized a little bigger than those
objects.  I don't know if you have any other suggestions for this
problem, but the named cache idea didn't go over well last time it was
suggested.
 
In all fairness, PostgreSQL does a good enough job in general that I
haven't missed this feature nearly as much as I thought I would; and
its absence means one less thing to worry about keeping properly
tuned.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Josh Berkus
All,

Wouldn't per *tablespace* costs make more sense?

--Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: Reversing flow of WAL shipping

2009-10-22 Thread Greg Stark
[moving to pgsql-hackers]

On Thu, Oct 22, 2009 at 10:28 AM, David Jantzen  wrote:
> Thanks for the quick response Joshua, much appreciated.  Is there any
> way to avoid or minimize a period without a warm standby when I switch
> to Server B for production?  What about rsyncing the data directory
> from Server B to Server C after B goes live?

This is definitely something that needs to be addressed in future
versions of Postgres, especially once we have Hot
Standby/Read-Only-Slaves and Sync-Replication. People will want to
have dozens of active slaves and rebuilding them all every time a
failover happens will be unbearable. Not to mention that in the
meantime they have to live with no redundancy.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EvalPlanQual seems a tad broken

2009-10-22 Thread Tom Lane
I wrote:
> [ EvalPlanQual does not work well ]

I'm planning to go back to work on this now that we're out of
CommitFest.

> We could improve that by feeding successfully locked rows into the EPQ
> machinery as well as ones that were found to be outdated.  But that
> would still leave us with two failure cases:

> 1. if some of the tables being joined are not selected FOR UPDATE.

> 2. if the select involves any set-returning functions in the targetlist.

> I think we could get around #1 by having *all* tables in the query
> marked FOR UPDATE at least in a dummy form, ie give them entries in
> the rowMarks list and create junk tlist entries to report their current
> ctid.  Then we'd feed all the relevant rows into the EPQ machinery.
> We'd just not lock the ones we weren't asked to lock.

On further review it seems that a better way to do this is to make
things happen inside the EPQ machinery.  We need to "freeze" the rows
returned by *all* scan nodes, not only the ones referencing real tables
--- for example, a join against a VALUES scan node would still be a
problem if we don't lock down the VALUES output, since we could end up
getting multiple join rows out.  This means we can't assume that there
is a CTID associated with every scan node that EPQ needs to lock down.
What looks like it would work instead is to pass through the current
scan tuple for every scan plan node, not only the ones that are FOR
UPDATE targets.  I'm tempted to try to move the responsibility for this
into execScan.c instead of having all the individual scan plan types
know about it.

> I do not see any very good way around #2.  I'm tempted to propose
> that we just forbid SRFs in the targetlist of a FOR UPDATE query.
> This could be justified on the same grounds that we forbid aggregate
> functions there, ie, they destroy the one-to-one correspondence between
> table rows and SELECT output rows.  If you really had to have it you
> could do something like
>   select srf(...) from (select ... for update) ss;

This still seems like a necessary restriction unfortunately :-(.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Greg Stark
On Thu, Oct 22, 2009 at 8:16 AM, Cédric Villemain
 wrote:
> You can have situation where you don't want some tables go to OS memory

I don't think this is a configuration we want to cater for. The
sysadmin shouldn't be required to understand the i/o pattern of
postgres. He or she cannot know whether the database will want to
access the same blocks twice for internal algorithms that isn't
visible from the user point of view.

The scenarios where you might want to do this would be if you know
there are tables which are accessed very randomly with no locality and
very low cache hit rates. I think the direction we want to head is
towards making sure the cache manager is automatically resistant to
such data.

There is another use case which perhaps needs to be addressed: if the
user has some queries which are very latency sensitive and others
which are not latency sensitive. In that case it might be very
important to keep the pages of data used by the high priority queries
in the cache. That's something we should have a high level abstract
interface for, not depend on low level system features.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-tree leaf node structure

2009-10-22 Thread Jeff Davis
On Wed, 2009-10-21 at 23:55 -0700, edwardyf wrote:
> If the index is on an attribute with duplicate values. will it be:
> 1) one index tuple for each row, though with the same value, or
> 2) one index tuple for each value, containing a list of row ids.

As Tom already pointed out, #1 is the answer.

However, I'd like to add that there's a feature that never quite made it
called Grouped Index Tuples (GIT) that might still be viable:

http://community.enterprisedb.com/git/

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Marc Munro
Just to intoduce myself, I'm Marc Munro the developer of Veil, a
postgres add-in that allows you to implement virtual private databases
using views.

The problem we are discussing here is the existence of covert or
side-channels being available from functions that can leak information
about the rows they see, even though the end-user may not see those
rows.  These can be built-ins such as set_config() (thanks, Heikki) or
user-defined.

I assert that any attempt to use a secured-view's where-clause
conditions before any other conditions are applied will lead to poor
performance.

Here is a typical veil secured view definition:

create view parties as
SELECT party_id, client_id, party_type_id, username, party_name
FROM parties.parties
WHERE api.user_has_client_or_personal_privilege(client_id,
party_id, 'select parties') 
OR api.user_has_client_privilege(party_id, 'select clients');

A typical query against this would be:

select * from parties where party_id = 42;

The conditions in the view's where clause cannot generally be indexed.
Applying those conditions before the user-supplied conditions would mean
that a full-table scan would be required and performance would suck.  In
fact, this very suckiness also exposes a covert channel in that now we
can use the performance of the query to estimate the number of party
records.

The most acceptable solution I have heard so far for this issue, is to
identify those functions which can leak information as 'insecure', and 
those views which are for security purpose as 'secured'.  Then it is
simply (hah!) a matter of planning the query of secured views so that
all insecure functions are called after all secure functions.  In this
way, they will only be able to leak what the user is entitled to see,
and performance will only be as badly affected as is strictly necessary.

__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] table corrupted

2009-10-22 Thread João Eugenio Marynowski
Hi everybody,
Can someone help me how to repair the problem below, I'm using Postgres
8.2.5:
- after appeared the erros below in selects, vacuum and dump in one table:
2009-10-16 16:07:06 BRT 192.168.0.87 ERROR:  could not access status of
transaction 29024764
2009-10-16 16:07:06 BRT 192.168.0.87 DETAIL:  Could not open file
"pg_clog/001B": No such file or directory.
2009-10-16 16:07:06 BRT 192.168.0.87 STATEMENT:  select ...
2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in block
462821 of relation "..."
2009-10-16 16:11:47 BRT 192.168.0.29 STATEMENT:  select 
I created the file pg_clog/001B with 256kB of /dev/zero
That resolve the problem with vacuum but began other error in selects and
dump to the same table ended all connections and stay up after showing the
error:
2009-10-19 13:50:03 BRT  LOG:  server process (PID 1544) was terminated by
signal 11
2009-10-19 13:50:03 BRT  LOG:  terminating any other active server processes
2009-10-19 13:50:03 BRT 192.168.0.253 WARNING:  terminating connection
because of crash of another server process
2009-10-19 13:50:03 BRT 192.168.0.253 DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2009-10-19 13:50:03 BRT 192.168.0.253 HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
Was habilited the zero_damage_pages option then executed selects, vacuums,
and dumps but not changed...
Was identified 2 register that if refered cause error.
The BD was restored in backup server with 8.2.7 and executed vacuums ok but
select and reindex crashed...
Instaled 8.3.8 version and used pg_dump but error
And then the select below show the problem where the codentrega from where
clause differ from select answer:
 LOGIST=# select codentrega from entregas where codentrega='9879622';
 codentrega

 z879622
(1 registro)

Any idea?


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Pavel Stehule
2009/10/22 Tom Lane :
> Pavel Stehule  writes:
>> 2009/10/22 Tom Lane :
>>> It is not --- it's the recommended solution for emulating computed
>>> columns.  The column-as-f(x) direction maybe isn't terribly exciting,
>>> but the function-as-x.col direction definitely is.
>
>> is it documented?
>
> Yes, see "SQL Functions on Composite Types" (34.4.2 as of CVS HEAD),
> down near the end of that subsection.
>

ok, thank you
Pavel

>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Tom Lane
Pavel Stehule  writes:
> 2009/10/22 Tom Lane :
>> It is not --- it's the recommended solution for emulating computed
>> columns.  The column-as-f(x) direction maybe isn't terribly exciting,
>> but the function-as-x.col direction definitely is.

> is it documented?

Yes, see "SQL Functions on Composite Types" (34.4.2 as of CVS HEAD),
down near the end of that subsection.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
 wrote:
> Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
>> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank  wrote:
>> > Currently random_page_cost is a GUC. I propose that this could be set
>> > per-table.
>>
>> Or per-tablespace.
>>
>> Yes, I think there are a class of GUCs which describe the physical
>> attributes of the storage system which should be per-table or
>> per-tablespace. random_page_cost, sequential_page_cost,
>> effective_io_concurrency come to mind.
>
> and, perhaps effective_cache_size.
>
> You can have situation where you don't want some tables go to OS memory (you
> can disabled that at filesystem level, ... l'd like to be able to do that at
> postgres level but it is another point)
>
> So you put those tables in a separate tablespace, and tell postgresql that the
> effective_cache_size is 0 (for this tablespace), up to postgres to do the 
> right
> thing with that ;)

Why would you ever want to set effective_cache_size to 0?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Pavel Stehule
2009/10/22 Tom Lane :
> Pavel Stehule  writes:
>> 2009/10/22 Tom Lane :
>>> Pavel Stehule  writes:
 this is syntax column(table) necessary still?
>
>>> There is no reason to remove that.
>
>> do you know somebody who use it? It is dead code.
>
> It is not --- it's the recommended solution for emulating computed
> columns.  The column-as-f(x) direction maybe isn't terribly exciting,
> but the function-as-x.col direction definitely is.
>

is it documented?

regards
Pavel Stehule

>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Cédric Villemain
Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank  wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
> 
> Or per-tablespace.
> 
> Yes, I think there are a class of GUCs which describe the physical
> attributes of the storage system which should be per-table or
> per-tablespace. random_page_cost, sequential_page_cost,
> effective_io_concurrency come to mind.

and, perhaps effective_cache_size. 

You can have situation where you don't want some tables go to OS memory (you 
can disabled that at filesystem level, ... l'd like to be able to do that at 
postgres level but it is another point) 

So you put those tables in a separate tablespace, and tell postgresql that the 
effective_cache_size is 0 (for this tablespace), up to postgres to do the right 
thing with that ;)


> 
> While this isn't a simple flag to change it does seem like a bit of a
> SMOP. The GUC infrastructure stores these values in global variables
> which the planner and other systems consult directly. They would
> instead have to be made storage parameters which the planner and other
> systems check on the appropriate table and default to the global GUC
> if they're not set.
> 



-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] B-tree leaf node structure

2009-10-22 Thread Tom Lane
edwardyf  writes:
> If the index is on an attribute with duplicate values. will it be:
> 1) one index tuple for each row, though with the same value, or
> 2) one index tuple for each value, containing a list of row ids.

1.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Cédric Villemain
Le lundi 19 octobre 2009 23:14:40, Robert Haas a écrit :
> On Mon, Oct 19, 2009 at 5:08 PM, marcin mank  wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
> >
> > I think this is a good idea for widely-wanted planner hints. This way
> > You can say "I do NOT want this table to be index-scanned, because I
> > know it is not cached" by setting it`s random_page_cost to a large
> > value (an obviously You can do the other way around, when setting the
> > random_page_cost to 1 You say "I don`t care how You fetch the pages,
> > they are all in cache")
> >
> > The value for the per-table setting could be inferred from
> > pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> > values.
> >
> > We could call it something like cached_percentage (and have the cost
> > of a random tuple fetch be inferred from the global random_page_cost,
> > seq_tuple_cost and the per-table cached_percentage). Then we could set
> > the global random_page_cost to a sane value like 200. Now one can
> > wonder why the planner works while having such blantantly unrealistic
> > values for random_page_cost :)
> >
> > What do You think?
> 
> I've been thinking about this a bit, too.  I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.

At least settings by TABLESPACE should exists. I totaly agree with that.

> 
> ...Robert
> 

-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 10:22 AM, Pavel Stehule  wrote:
> 2009/10/22 Tom Lane :
>> Pavel Stehule  writes:
>>> this is syntax column(table) necessary still?
>>
>> There is no reason to remove that.
>>
>
> do you know somebody who use it? It is dead code. Processing of call
> functions should be simpler.

"Dead code" means code that can't actually be reached, not code that
does something you don't like.  :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-22 Thread Cédric Villemain
Le mardi 20 octobre 2009 06:30:26, Greg Smith a écrit :
> On Mon, 19 Oct 2009, Jeff Davis wrote:
> > On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> >> I'd bet accounts receivable applications often hit that.
> >> (Most payments on recent billings; a sprinkling on older ones.)
> >> I'm sure there are others.
> >
> > You worded the examples in terms of writes (I think), and we're talking
> > about read caching, so I still don't entirely understand.
> 
> No, that part was fair.  The unfortunate reality of accounts receivable is
> that reports run to list people who owe one money happen much more often
> than posting payments into the system does.
> 
> > Also, the example sounds like you'd like to optimize across queries.
> > There's no mechanism for the planner to remember some query executed a
> > while ago, and match it up to some new query that it's trying to plan.
> 
> Some of the use-cases here involve situations where you know most of a
> relation is likely to be in cache just because there's not much going on
> that might evict it.  In any case, something that attempts to model some
> average percentage you can expect a relation to be in cache is in effect
> serving as a memory of past queries.
> 
> > I'm not clear on the scenario that we're trying to improve.
> 
> Duh, that would be the situation where someone wants optimizer hints but
> can't call them that because then the idea would be reflexively rejected!
> 
> Looks like I should dust off the much more complicated proposal for
> tracking and using in-cache hit percentages I keep not having time to
> finish writing up.  Allowing a user-set value for that is a lot more
> reasonable if the system computes a reasonable one itself under normal
> circumstances.  That's what I think people really want, even if it's not
> what they're asking for.

Have you already some work in a git or somewhere ? 


> 
> --
> * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
> 

-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Tom Lane
Pavel Stehule  writes:
> 2009/10/22 Tom Lane :
>> Pavel Stehule  writes:
>>> this is syntax column(table) necessary still?

>> There is no reason to remove that.

> do you know somebody who use it? It is dead code.

It is not --- it's the recommended solution for emulating computed
columns.  The column-as-f(x) direction maybe isn't terribly exciting,
but the function-as-x.col direction definitely is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Pavel Stehule
2009/10/22 Tom Lane :
> Pavel Stehule  writes:
>> this is syntax column(table) necessary still?
>
> There is no reason to remove that.
>

do you know somebody who use it? It is dead code. Processing of call
functions should be simpler.

Pavel

>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 10:12 AM, Andrew Dunstan  wrote:
> Let's not get too hung up on syntax.

+1.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-22 Thread Andrew Dunstan



Dimitri Fontaine wrote:

I know I don't like #option because it looks and feels "foreign", so t
might just boils down to syntax issue for others too.


  


I don't see why it feels any more foreign than, say, #pragma in C.

And it's something we already have, albeit undocumented.

Let's not get too hung up on syntax.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] some possible parser cleaning: drop support column(table) syntax

2009-10-22 Thread Tom Lane
Pavel Stehule  writes:
> this is syntax column(table) necessary still?

There is no reason to remove that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] client_lc_messages

2009-10-22 Thread Alvaro Herrera
So right now we have a single GUC determining the language that log
messages are in, and it is PGC_SUSET to avoid a user from sending
messages to the log that the DBA cannot read.

However, this means that the client cannot get the messages in the
language of his choice.

What I am wondering right now is whether we could have two separate
attributes, one SUSET defining what goes to the log, and another one
USERSET defining what's sent to the client.

This would require a bit of shuffling, because right now we translate
the message pretty early in the call chain -- way before we try to send
it to client or server.  This would need to be postponed, or
alternatively, we would need to store a second, untranslated copy so
that another translation can take place later, if necessary.

Thoughts?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] B-tree leaf node structure

2009-10-22 Thread edwardyf

If the index is on an attribute with duplicate values. will it be:
1) one index tuple for each row, though with the same value, or
2) one index tuple for each value, containing a list of row ids.

thanx
-- 
View this message in context: 
http://www.nabble.com/B-tree-leaf-node-structure-tp26004939p26004939.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Rod Taylor
>
> \c - secretary
>
> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
> RETURNS bool AS $$
> begin
>  RAISE NOTICE 'person: % number: %', person, phone;
>  RETURN true;
> END; $$ LANGUAGE plpgsql COST 0.01;
>
> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
> NOTICE:  person: public person number: 12345
> NOTICE:  person: secret person number: 67890
>    person     | phone
> ---+---
>  public person | 12345
> (1 row)
>

Given RAISE is easily replaced with INSERT into a logging table or
another recording mechanism, it needs to be something to push back
execution of user based parameters OR something to push forward
security clauses.


Is there any way of exposing the information using standard SQL or is
a procedure required?



If a procedure is required, then we simply need a way of ensuring the
SECURITY clauses or functions run before all of the things which an
expose information (procedures at the moment).



How about some kind of a marker on which allows security based
constraints to be pushed forward rather than the entire view?

CREATE VIEW phone_number AS
SELECT person, phone FROM phone_data WHERE SECURITY(phone NOT LIKE '6%');


This still allows complex views and queries to be mostly optimized
with a few filters that run very early and in the order they are
defined in.

Perhaps we go one step further and encourage security filters to be
applied to the table directly where possible:

CREATE VIEW phone_number AS
SELECT person, phone
FROM phone_data USING SECURITY FILTER(phone NOT LIKE '6%');


This still allow many optimizations to be applied in complex cases. The planner

CREATE VIEW phone_number AS
SELECT person, phone, company
FROM phone_data USING SECURITY FILTER(phone NOT LIKE '6%')
   JOIN person USING (person_id)
   JOIN company USING (company_id)
AND person.active AND company.active;

\c - secretary
SELECT * FROM  phone_number WHERE company = 'Frankies Co.';


This still allows a query against phone_number to use the company data
first, find the single person (Frankie) within that company, then get
his phone number out.

The scan against phone_data would be an index scan for person_id BUT
applies the SECURITY FILTER as the node immediately around the index
scan as a Recheck Condition, similar to how bitmap scans ensure they
got the correct and only the correct information.

person.active and company.active, and the joins can still be optimized
in standard ways.


More complex SECURITY FILTER clauses might be applied in the where clause. I.e.

CREATE VIEW phone_number AS
SELECT person, phone, company
FROM phone_data USING SECURITY CLAUSE (phone NOT LIKE '6%')
   JOIN person USING (person_id)
   JOIN company USING (company_id)
 WHERE SECURITY CLAUSE (person.status = company.status)
AND person.active AND company.active;


This would result in the security check (person.status =
company.status) occurring as a filter tied to the join node for person
and company which cannot be moved around.


Layering is tricky, using the above view:

\c - secretary
CREATE VIEW company_number AS
 SELECT * FROM phone_number SECURITY CLAUSE (expose_person(person, phone));

SELECT * FROM company_number;


The security clauses are bound to run in the order they are found in
the node closes to the data they use.

phone_data is immediately run through a Recheck Cond. person/company
join node is checked immediately after. Finally, the expose_person()
function is run against the now clean data.



Oh, This all has the nice side effect of knowing what to hide in
explain analyze as well since the specific clauses are marked up.  If
the user running the query is super user or owner of the view, they
see the security clause filters. If they are not, then they get a line
like this:

SELECT * FROM phone_number WHERE phone = '555-555-';

 Bitmap Heap Scan on phone_data  (cost=14.25..61.47 rows=258 width=185)
   Security Cond: ** Hidden due to permissions **
   ->  Bitmap Index Scan on phone_data_index  (cost=0.00..14.19
rows=258 width=0)
 Index Cond: (phone = '555-555-')

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Application name patch - v3

2009-10-22 Thread Dave Page
Updated patch attached. Per discussion, this:

- Changes the envvar name to PGAPPNAME
- Removes support for setting application_name in the startup packet,
and instead sends an explicit SET command as part of the connection
setup in PQconnectPoll. In order to avoid adding to the
application-visible connection states, this is overloaded on the
CONNECTION_SETENV state which is only used in the v2 protocol at
present and seems like an ideal fit for such a similar use.

Other features are as per the last version.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
PGDay.EU 2009 Conference: http://2009.pgday.eu/start


appname-v3.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Robert Haas
On Thu, Oct 22, 2009 at 6:03 AM, Heikki Linnakangas
 wrote:
> In chapter "36.4 Rules and Privileges" we show an example of using a
> view to expose part of a table to other users, keeping other rows private:
>
>> For example: A user has a list of phone numbers where some of them are
> private, the others are of interest for the secretary of the office. He
> can construct the following:
>>
>> CREATE TABLE phone_data (person text, phone text, private boolean);
>> CREATE VIEW phone_number AS
>>     SELECT person, phone FROM phone_data WHERE NOT private;
>> GRANT SELECT ON phone_number TO secretary;
>
> While it works for this example, if the WHERE clause in the view is more
> complex, it is possible for the secretary to circumvent the protection
> by filtering rows in a function used in the WHERE clause. If the
> function has a lower cost than the restriction in the view, it will be
> executed first and will see all the rows, even though they won't be
> present in the final result set.
>
> For example:
>
> CREATE TABLE phone_data (person text, phone text);
> CREATE VIEW phone_number AS
>    SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
> GRANT SELECT ON phone_number TO secretary;
>
> -- secretary should only see the first row
> INSERT INTO phone_data VALUES ('public person', '12345');
> INSERT INTO phone_data VALUES ('secret person', '67890');
>
> \c - secretary
>
> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
> RETURNS bool AS $$
> begin
>  RAISE NOTICE 'person: % number: %', person, phone;
>  RETURN true;
> END; $$ LANGUAGE plpgsql COST 0.01;
>
> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
> NOTICE:  person: public person number: 12345
> NOTICE:  person: secret person number: 67890
>    person     | phone
> ---+---
>  public person | 12345
> (1 row)
>
>
> Using views for access control like this is what we've been suggesting
> to implement row-level access control for a long time. For example, Veil
> depends heavily on this.
>
>
> A related problem is that EXPLAIN ANALYZE too can reveal information
> about the underlying tables that the view doesn't reveal, even if
> there's no functions involved.
>
>
> I posted this to secur...@postgresql.org first, and the conclusion was
> to take the discussion to pgsql-hackers since there's no easy, robust
> and back-patchable solution in sight. Options discussed include:
>
> 1. Change the planner so that conditions (and join!) in the view are
> always enforced first, before executing any quals from the user-supplied
>  query. Unfortunately that would have a catastrophic effect on performance.
>
> 2. As an optimization, we could keep the current behavior if the user
> has access to all the underlying tables anyway, but that's nontrivial
> because permission checks are supposed to be executed at runtime, not
> plan time.
>
> 3. Label every function as safe or unsafe, depending on whether it can
> leak information about the arguments. Classifying functions correctly
> can be a bit tricky; e.g functions that throw an error on some input
> values could be exploited. And it's not clear how a user could label
> user-defined functions in a secure way. We'd still need the
> infrastructure of point 1 to delay evaluation of non-safe quals, so this
> is really just another optimization of it.
>
> 4. Make the behavior user-controllable, something along the lines of
> "CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
> are not used for access control.

Well, I think #4 is a good start (though I don't like CREATE x VIEW -
I think that keyword should appear somewhere lower down in the
syntax), but I'm not sure where to go with it after that.  I'm not
sure you're going to be able to make it secure short of leaving the
view as an unflattened subquery.  "Catastrophic for performance" is a
charitable description...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Richard Huxton
Pavel Stehule wrote:
> 
> postgres=# create or replace function vv(int, int) returns bool as
> $$begin raise notice '% %', $1, $2; return true; end$$ language
> plpgsql COST 0.01;
> CREATE FUNCTION
> postgres=# select * from v where vv(a,b);NOTICE:  10 20
>  a │ b
> ───┼───
> (0 rows)
> 
> still I have not bad result, but, yes, I see what I could not to see.

Ah - that's the problem. It's not possible to get the "hidden" values
into the result set, but it is possible to see them. It only matters if
you are using the view to prevent access to certain rows.

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Pavel Stehule
2009/10/22 Richard Huxton :
> Pavel Stehule wrote:
>> 2009/10/22 Heikki Linnakangas :
>>> That example I ran on CVS HEAD, but it's a generic problem on all versions.
>> postgres=# select version();
>>                                                    version
>> 
>>  PostgreSQL 8.5devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
>> 4.4.1 20090725
>> (1 row)
>>
>> postgres=# select * from x;
>>  a  │ b
>> ┼
>>  10 │ 20
>> (1 row)
>>
>> postgres=# create view v as select * from x where b <> 20;
>                                                    ^^^
> This is the expression that needs to be expensive. Then the exposing
> function needs to be cheap. That makes the planner run the exposing
> function first.
>

postgres=# create or replace function vv(int, int) returns bool as
$$begin raise notice '% %', $1, $2; return true; end$$ language
plpgsql COST 0.01;
CREATE FUNCTION
postgres=# select * from v where vv(a,b);NOTICE:  10 20
 a │ b
───┼───
(0 rows)

still I have not bad result, but, yes, I see what I could not to see.

Pavel

> --
>  Richard Huxton
>  Archonet Ltd
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Richard Huxton
Richard Huxton wrote:
> Heikki Linnakangas wrote:
>> CREATE VIEW phone_number AS
>> SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
> 
>> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
>> RETURNS bool AS $$
>> begin
>>   RAISE NOTICE 'person: % number: %', person, phone;
>>   RETURN true;
>> END; $$ LANGUAGE plpgsql COST 0.01;
>>
>> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
>> NOTICE:  person: public person number: 12345
>> NOTICE:  person: secret person number: 67890
>> person | phone
>> ---+---
>>  public person | 12345

Hmm - just using SQL (but with an expensive view filtering function):

SELECT * FROM phone_number WHERE (CASE WHEN phone = '67890' THEN
person::int ELSE 2 END)=2;
ERROR:  invalid input syntax for integer: "secret person"

You could get a related problem where a view exposes a text column full
of valid dates which the user then tries to cast to date. If the
underlying table contains non-dates you could still get an error.
Arguably the view should have handled the cast in this case though.

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Richard Huxton
Pavel Stehule wrote:
> 2009/10/22 Heikki Linnakangas :
>> That example I ran on CVS HEAD, but it's a generic problem on all versions.
> postgres=# select version();
>version
> 
>  PostgreSQL 8.5devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.1 20090725
> (1 row)
> 
> postgres=# select * from x;
>  a  │ b
> ┼
>  10 │ 20
> (1 row)
> 
> postgres=# create view v as select * from x where b <> 20;
^^^
This is the expression that needs to be expensive. Then the exposing
function needs to be cheap. That makes the planner run the exposing
function first.

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Pavel Stehule
2009/10/22 Heikki Linnakangas :
> That example I ran on CVS HEAD, but it's a generic problem on all versions.
postgres=# select version();
   version

 PostgreSQL 8.5devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.4.1 20090725
(1 row)

postgres=# select * from x;
 a  │ b
┼
 10 │ 20
(1 row)

postgres=# create view v as select * from x where b <> 20;
CREATE VIEW
postgres=# create function vv(int, int) returns bool as $$begin raise
notice '% %', $1, $2; return true; end$$ language plpgsql;
CREATE FUNCTION
postgres=# select * from v where vv(a,b);
 a │ b
───┼───
(0 rows)

postgres=# create or replace function vv(int, int) returns bool as
$$begin raise notice '% %', $1, $2; return true; end$$ language
plpgsql COST 999;
CREATE FUNCTION

postgres=# select * from v where vv(a,b); a │ b
───┼───
(0 rows)

it is ok
Pavel


>
> Pavel Stehule wrote:
>> What version do you have?
>>
>> I am cannot repeat it.
>>
>> Regards
>> Pavel Stehule
>>
>> 2009/10/22 Richard Huxton :
>>> Heikki Linnakangas wrote:
 CREATE VIEW phone_number AS
     SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
 CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
 RETURNS bool AS $$
 begin
   RAISE NOTICE 'person: % number: %', person, phone;
   RETURN true;
 END; $$ LANGUAGE plpgsql COST 0.01;

 postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
 NOTICE:  person: public person number: 12345
 NOTICE:  person: secret person number: 67890
     person     | phone
 ---+---
  public person | 12345
>>> Ouch!
>>>
 1. Change the planner so that conditions (and join!) in the view are
 always enforced first, before executing any quals from the user-supplied
  query. Unfortunately that would have a catastrophic effect on performance.
>>> I have the horrible feeling that you're going to end up doing this
>>> (possibly in conjunction with #4). Once you've executed a user-defined
>>> function on a "hidden" row I think the game is lost. That might even
>>> apply to non-trivial expressions too.
>>>
 2. As an optimization, we could keep the current behavior if the user
 has access to all the underlying tables anyway, but that's nontrivial
 because permission checks are supposed to be executed at runtime, not
 plan time.

 3. Label every function as safe or unsafe, depending on whether it can
 leak information about the arguments. Classifying functions correctly
 can be a bit tricky; e.g functions that throw an error on some input
 values could be exploited.
>>> [snip]
>>>
>>> I'm sure there's a way to generate an error on-demand for rows with
>>> specific numbers. That opens you up to fishing for hidden rows.
>>>
>>> It might be possible to label a subset of operators etc as safe. I'd
>>> guess that would exclude any casts in it, and perhaps CASE. Hmm - you
>>> could probably generate a divide-by-zero or overflow error or some such
>>> for any targetted numeric value though.
>>>
 4. Make the behavior user-controllable, something along the lines of
 "CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
 are not used for access control.
>>> Not pretty, but solves the problem.
>>>
>>> --
>>>  Richard Huxton
>>>  Archonet Ltd
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Richard Huxton
Pavel Stehule wrote:
> What version do you have?
> 
> I am cannot repeat it.

It will depend on the relative cost of the clauses (though 0.0001 should
have been enough to force it). Try:

CREATE OR REPLACE FUNCTION row_hidden (phone text) RETURNS bool AS $$
BEGIN
RETURN phone LIKE '6%';
END;
$$ LANGUAGE plpgsql COST 999;

CREATE VIEW phone_number AS
SELECT person, phone FROM phone_data WHERE NOT row_hidden(phone);


-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Heikki Linnakangas
That example I ran on CVS HEAD, but it's a generic problem on all versions.

Pavel Stehule wrote:
> What version do you have?
> 
> I am cannot repeat it.
> 
> Regards
> Pavel Stehule
> 
> 2009/10/22 Richard Huxton :
>> Heikki Linnakangas wrote:
>>> CREATE VIEW phone_number AS
>>> SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
>>> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
>>> RETURNS bool AS $$
>>> begin
>>>   RAISE NOTICE 'person: % number: %', person, phone;
>>>   RETURN true;
>>> END; $$ LANGUAGE plpgsql COST 0.01;
>>>
>>> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
>>> NOTICE:  person: public person number: 12345
>>> NOTICE:  person: secret person number: 67890
>>> person | phone
>>> ---+---
>>>  public person | 12345
>> Ouch!
>>
>>> 1. Change the planner so that conditions (and join!) in the view are
>>> always enforced first, before executing any quals from the user-supplied
>>>  query. Unfortunately that would have a catastrophic effect on performance.
>> I have the horrible feeling that you're going to end up doing this
>> (possibly in conjunction with #4). Once you've executed a user-defined
>> function on a "hidden" row I think the game is lost. That might even
>> apply to non-trivial expressions too.
>>
>>> 2. As an optimization, we could keep the current behavior if the user
>>> has access to all the underlying tables anyway, but that's nontrivial
>>> because permission checks are supposed to be executed at runtime, not
>>> plan time.
>>>
>>> 3. Label every function as safe or unsafe, depending on whether it can
>>> leak information about the arguments. Classifying functions correctly
>>> can be a bit tricky; e.g functions that throw an error on some input
>>> values could be exploited.
>> [snip]
>>
>> I'm sure there's a way to generate an error on-demand for rows with
>> specific numbers. That opens you up to fishing for hidden rows.
>>
>> It might be possible to label a subset of operators etc as safe. I'd
>> guess that would exclude any casts in it, and perhaps CASE. Hmm - you
>> could probably generate a divide-by-zero or overflow error or some such
>> for any targetted numeric value though.
>>
>>> 4. Make the behavior user-controllable, something along the lines of
>>> "CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
>>> are not used for access control.
>> Not pretty, but solves the problem.
>>
>> --
>>  Richard Huxton
>>  Archonet Ltd
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>


-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Pavel Stehule
What version do you have?

I am cannot repeat it.

Regards
Pavel Stehule

2009/10/22 Richard Huxton :
> Heikki Linnakangas wrote:
>> CREATE VIEW phone_number AS
>>     SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
>
>> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
>> RETURNS bool AS $$
>> begin
>>   RAISE NOTICE 'person: % number: %', person, phone;
>>   RETURN true;
>> END; $$ LANGUAGE plpgsql COST 0.01;
>>
>> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
>> NOTICE:  person: public person number: 12345
>> NOTICE:  person: secret person number: 67890
>>     person     | phone
>> ---+---
>>  public person | 12345
>
> Ouch!
>
>> 1. Change the planner so that conditions (and join!) in the view are
>> always enforced first, before executing any quals from the user-supplied
>>  query. Unfortunately that would have a catastrophic effect on performance.
>
> I have the horrible feeling that you're going to end up doing this
> (possibly in conjunction with #4). Once you've executed a user-defined
> function on a "hidden" row I think the game is lost. That might even
> apply to non-trivial expressions too.
>
>> 2. As an optimization, we could keep the current behavior if the user
>> has access to all the underlying tables anyway, but that's nontrivial
>> because permission checks are supposed to be executed at runtime, not
>> plan time.
>>
>> 3. Label every function as safe or unsafe, depending on whether it can
>> leak information about the arguments. Classifying functions correctly
>> can be a bit tricky; e.g functions that throw an error on some input
>> values could be exploited.
> [snip]
>
> I'm sure there's a way to generate an error on-demand for rows with
> specific numbers. That opens you up to fishing for hidden rows.
>
> It might be possible to label a subset of operators etc as safe. I'd
> guess that would exclude any casts in it, and perhaps CASE. Hmm - you
> could probably generate a divide-by-zero or overflow error or some such
> for any targetted numeric value though.
>
>> 4. Make the behavior user-controllable, something along the lines of
>> "CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
>> are not used for access control.
>
> Not pretty, but solves the problem.
>
> --
>  Richard Huxton
>  Archonet Ltd
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Richard Huxton
Heikki Linnakangas wrote:
> CREATE VIEW phone_number AS
> SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';

> CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
> RETURNS bool AS $$
> begin
>   RAISE NOTICE 'person: % number: %', person, phone;
>   RETURN true;
> END; $$ LANGUAGE plpgsql COST 0.01;
> 
> postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
> NOTICE:  person: public person number: 12345
> NOTICE:  person: secret person number: 67890
> person | phone
> ---+---
>  public person | 12345

Ouch!

> 1. Change the planner so that conditions (and join!) in the view are
> always enforced first, before executing any quals from the user-supplied
>  query. Unfortunately that would have a catastrophic effect on performance.

I have the horrible feeling that you're going to end up doing this
(possibly in conjunction with #4). Once you've executed a user-defined
function on a "hidden" row I think the game is lost. That might even
apply to non-trivial expressions too.

> 2. As an optimization, we could keep the current behavior if the user
> has access to all the underlying tables anyway, but that's nontrivial
> because permission checks are supposed to be executed at runtime, not
> plan time.
> 
> 3. Label every function as safe or unsafe, depending on whether it can
> leak information about the arguments. Classifying functions correctly
> can be a bit tricky; e.g functions that throw an error on some input
> values could be exploited. 
[snip]

I'm sure there's a way to generate an error on-demand for rows with
specific numbers. That opens you up to fishing for hidden rows.

It might be possible to label a subset of operators etc as safe. I'd
guess that would exclude any casts in it, and perhaps CASE. Hmm - you
could probably generate a divide-by-zero or overflow error or some such
for any targetted numeric value though.

> 4. Make the behavior user-controllable, something along the lines of
> "CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
> are not used for access control.

Not pretty, but solves the problem.

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-22 Thread Pavel Stehule
2009/10/22 Dimitri Fontaine :
> Tom Lane  writes:
>> be seen as one.)  And the Oracle-compatible option will be attractive
>> to people coming in from that side.  Reviewing megabytes of pl/sql
>> code for this kind of gotcha is not fun, and the "error" default would
>> only help a bit.
>
> What about having a new pl language called plsql (or mabe plosql) where
> it behaves like Oracle. The handler could maybe set the environment then
> call the plpgsql interpreter. Is it technically sound?

-1

without significant refactoring you will be far to plsql. And you
don't solve problem of plpgsql. Minimally plpgsql needs better
solution of ambiguous identifiers, and have to have some back
compatibility possibility.

I am thinking about new language based on SQL/PSM syntax - but I am
sure, so I don't use current plpgsql interpret. I thing, so there are
some possibilities for simplification - but it should to have some
incompatibilities (so I would not to back port it to plpgsql).

>
> If it is, it's just another way to spell this unfriendly #option syntax
> that people do not like. Yet another idea would be to keep the #option
> mecanism but spell it differently, in a more plpgsql like way:
>
>  create funcion ... language plpgsql
>  as $$
>  OPTIONS
>   lexer_priority = 'table, variable';
>  DECLARE
>   v_foo integer;
>  BEGIN
>  END;
>  $$;
>
> I know I don't like #option because it looks and feels "foreign", so t
> might just boils down to syntax issue for others too.
>

sorry I don't see it cleaner then just #option

CREATE FUNCTION foo()
RETURNS int AS $$
#option sqlprecedence
DECLARE ...
..

This mechanism is available now, and don't need to add some new code.

Regards
Pavel

> Regards,
> --
> dim
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-22 Thread Dimitri Fontaine
Tom Lane  writes:
> be seen as one.)  And the Oracle-compatible option will be attractive
> to people coming in from that side.  Reviewing megabytes of pl/sql
> code for this kind of gotcha is not fun, and the "error" default would
> only help a bit.

What about having a new pl language called plsql (or mabe plosql) where
it behaves like Oracle. The handler could maybe set the environment then
call the plpgsql interpreter. Is it technically sound?

If it is, it's just another way to spell this unfriendly #option syntax
that people do not like. Yet another idea would be to keep the #option
mecanism but spell it differently, in a more plpgsql like way:

  create funcion ... language plpgsql
  as $$
  OPTIONS
   lexer_priority = 'table, variable';
  DECLARE
   v_foo integer;
  BEGIN
  END;
  $$;

I know I don't like #option because it looks and feels "foreign", so t
might just boils down to syntax issue for others too.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Heikki Linnakangas
In chapter "36.4 Rules and Privileges" we show an example of using a
view to expose part of a table to other users, keeping other rows private:

> For example: A user has a list of phone numbers where some of them are
private, the others are of interest for the secretary of the office. He
can construct the following:
>
> CREATE TABLE phone_data (person text, phone text, private boolean);
> CREATE VIEW phone_number AS
> SELECT person, phone FROM phone_data WHERE NOT private;
> GRANT SELECT ON phone_number TO secretary;

While it works for this example, if the WHERE clause in the view is more
complex, it is possible for the secretary to circumvent the protection
by filtering rows in a function used in the WHERE clause. If the
function has a lower cost than the restriction in the view, it will be
executed first and will see all the rows, even though they won't be
present in the final result set.

For example:

CREATE TABLE phone_data (person text, phone text);
CREATE VIEW phone_number AS
SELECT person, phone FROM phone_data WHERE phone NOT LIKE '6%';
GRANT SELECT ON phone_number TO secretary;

-- secretary should only see the first row
INSERT INTO phone_data VALUES ('public person', '12345');
INSERT INTO phone_data VALUES ('secret person', '67890');

\c - secretary

CREATE OR REPLACE FUNCTION expose_person (person text, phone text)
RETURNS bool AS $$
begin
  RAISE NOTICE 'person: % number: %', person, phone;
  RETURN true;
END; $$ LANGUAGE plpgsql COST 0.01;

postgres=>  SELECT * FROM phone_number WHERE expose_person(person, phone);
NOTICE:  person: public person number: 12345
NOTICE:  person: secret person number: 67890
person | phone
---+---
 public person | 12345
(1 row)


Using views for access control like this is what we've been suggesting
to implement row-level access control for a long time. For example, Veil
depends heavily on this.


A related problem is that EXPLAIN ANALYZE too can reveal information
about the underlying tables that the view doesn't reveal, even if
there's no functions involved.


I posted this to secur...@postgresql.org first, and the conclusion was
to take the discussion to pgsql-hackers since there's no easy, robust
and back-patchable solution in sight. Options discussed include:

1. Change the planner so that conditions (and join!) in the view are
always enforced first, before executing any quals from the user-supplied
 query. Unfortunately that would have a catastrophic effect on performance.

2. As an optimization, we could keep the current behavior if the user
has access to all the underlying tables anyway, but that's nontrivial
because permission checks are supposed to be executed at runtime, not
plan time.

3. Label every function as safe or unsafe, depending on whether it can
leak information about the arguments. Classifying functions correctly
can be a bit tricky; e.g functions that throw an error on some input
values could be exploited. And it's not clear how a user could label
user-defined functions in a secure way. We'd still need the
infrastructure of point 1 to delay evaluation of non-safe quals, so this
is really just another optimization of it.

4. Make the behavior user-controllable, something along the lines of
"CREATE RESTRICTED VIEW ...", to avoid the performance impact when views
are not used for access control.


Thoughts?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby, prepared xacts, locks

2009-10-22 Thread Simon Riggs
On Thu, 2009-10-22 at 09:41 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-10-22 at 07:55 +0300, Heikki Linnakangas wrote:
> >> Making some effort to transfer locks instead of acquiring+releasing
> >> would eliminate the need for having extra lock space available when
> >> switching from hot standby mode to normal operation.
> > 
> > This isn't very clear. You started by saying you were quite eager to
> > always grant and then release; this sounds like you don't want that now,
> > but you now again like the approach I had already attempted to take.
> 
> Yeah, I haven't made up my mind. What's in there now is certainly
> broken, so we need to do something. 

Agreed

> The simplest approach 

is the best

> would be to
> revert the changes in lock_twophase_recover(), while transfering the
> locks with something like AtPrepare_Locks() would be more robust in the
> face of shared memory shortage.

Will look into it

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers