Re: [HACKERS] (WIP) VACUUM REWRITE - CLUSTER by ctid

2009-10-23 Thread Heikki Linnakangas
Itagaki Takahiro wrote:
 I'm working on alternative version of VACUUM FULL, which is
 like CLUSTER but sort tuples in ctid order without index.
 The original discussion is here:
 [HACKERS] Feedback on getting rid of VACUUM FULL
 http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php
 
 WIP patch attached. I have some questions over the development:
 
  1. Syntax: I choose CLUSTER tbl WITHOUT INDEX for the syntax,
 but it is debatable. What syntax is the best?
   VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL?

I got the impression that replacing VACUUM FULL is the most popular
opinion. I like VACUUM REWRITE myself, except that it would require
making REWRITE a reserved keyword. I don't like tacking this onto
CLUSTER, particularly not with ORDER BY ctid. ctids are an
implementation detail most users don't care about, and ORDER BY sounds
like it's sorting something, but it's not.

  2. Superclass of HeapScanDesc and IndexScanDesc:
 We don't have an abstraction layer of HeapScanDesc and IndexScanDesc,
 but the layer is useful for this purpose. Is it reasonable?
 It is partially implemented as genam_beginscan() family in the patch.

I don't think it's really necessary. You might as well put a few if
(indexOid) clauses directly into copy_heap_data.

  3. Should we allow ctid as a default clustered index?
 We could assume ctid as a virtual index. The syntax for it
 might be ALTER TABLE tbl CLUSTER ON COLUMN ctid or so.

Isn't that the same as having no clustering index? We already have
ALTER TABLE tbl SET WITHOUT CLUSTER.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:

 Dimitri Fontaine dfonta...@hi-media.com writes:
 But it will set GET DIAGNOSTIC ... = ROW_COUNT, am I being told on IRC.

 This has been discussed before, please read archives.

The thread I found is pretty content free as far as answering my
question goes.
  http://archives.postgresql.org/pgsql-general/2009-06/msg00208.php

I'll go search for more, meantime I'll just add the main goal of this
new thread is to have -hackers know there is a real user demand for
having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Controlling changes in plpgsql variable resolution

2009-10-23 Thread Peter Eisentraut
On Tue, 2009-10-20 at 10:32 -0400, Tom Lane wrote:
 That's only sane if you are 100% certain that there could not be a
 security issue arising from the change of behavior.  Otherwise someone
 could for instance subvert a security-definer function by running it
 under the setting it wasn't written for.  Personally I am not 100%
 certain of that.

This is pretty much equivalent to the issue of setting search_path for
security definer functions.  Such settings that affect the semantics of
code should be overridden to a know safe value by such functions.

Still, making the setting SUSET will be very impractical.  The superuser
will have to ensure that all current and future functions on the
installation conform to the setting that is chosen.  This is pretty much
impossible.  The other choice is to uglify every function with an
explicit setting.  So in practice the only convenient and robust choice
is to leave it as is.

The correct solution, IMO, is that security definer functions need to
save the set of deemed-security-critical parameters automatically at
function creation time.  Since we don't do that currently, I'd guess
that 50% of security definer functions out there are still unsafe about
search_path.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-23 Thread Dimitri Fontaine
João Eugenio Marynowski joa...@gmail.com writes:
 2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in
 block 462821 of relation ...

It could be that the following will prove helpful:
  http://archives.postgresql.org/pgsql-general/2007-07/msg00506.php
  http://pgsql.tapoueh.org/site/html/news/20080410.badblock.html

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Heikki Linnakangas
Marc Munro wrote:
 Here is a typical veil secured view definition:
 
 create view parties as
 SELECT party_id, client_id, party_type_id, username, party_name
 FROM parties.parties
 WHERE api.user_has_client_or_personal_privilege(client_id,
 party_id, 'select parties') 
 OR api.user_has_client_privilege(party_id, 'select clients');
 
 A typical query against this would be:
 
 select * from parties where party_id = 42;
 
 The conditions in the view's where clause cannot generally be indexed.
 Applying those conditions before the user-supplied conditions would mean
 that a full-table scan would be required and performance would suck.  In
 fact, this very suckiness also exposes a covert channel in that now we
 can use the performance of the query to estimate the number of party
 records.

I'm not too worried about that as anyone can do SELECT reltuples FROM
pg_class where relname='parties'. An index-scan, however, would let you
estimate the number of records with party_id=42, whether or not you have
privileges on them, by timing how long the query takes. I don't think we
can avoid that.

 The most acceptable solution I have heard so far for this issue, is to
 identify those functions which can leak information as 'insecure', and 
 those views which are for security purpose as 'secured'.  Then it is
 simply (hah!) a matter of planning the query of secured views so that
 all insecure functions are called after all secure functions.  In this
 way, they will only be able to leak what the user is entitled to see,
 and performance will only be as badly affected as is strictly necessary.

The most user-friendly and backwards-compatible (though not necessarily
back-patchable) approach I can see is:

1. If the user has read access to all the underlying tables, plan it
like we do today.

2. If the view refers only one table (as a typical Veil view does), plan
it like we do today but enforce that view conditions are evaluated first
in the Filter. Notably, allow using any user-supplied conditions as
index quals if there's a matching index.

3. Otherwise fully materialize the view.

I expect 1  2 to catch most real-life scenarios. 1 ensures that we
don't degrade performance of (potentially very complex) views that are
not used for access control, and 2 should apply to most views used for
row-level access control. Note that if you have an access-control view
that has a join, you can often split it into two access-control views
over the underlying tables with another view that joins the two views.
Such a construct would be quite well

Tom didn't like the idea of performing permission checks at plan time,
because they're currently always done at execution time. If we plan a
query on the assumption that you have access to the underlying table, I
think we could perform permission checks on the underlying tables at
runtime to check that the user still has access. It would mean that a
previously prepared query might start to fail with permission denied
errors if someone revokes your privileges on the tables, but I think we
could live with that. Plan invalidation would cover the case where
permissions on the tables are changed, but would not cover indirect
cases like removing the user from a group that had the required permissions.


Whatever we do, we probably should add a caveat in the manual that
EXPLAIN or timing attacks can let you deduce some (statistical)
information you won't have direct access to.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Simon Riggs
On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote:

 The most user-friendly and backwards-compatible (though not necessarily
 back-patchable) approach I can see is:
 
 1. If the user has read access to all the underlying tables, plan it
 like we do today.

For me, it would make most sense to explicitly mark Views as being
security views. That way planning need only change when we are
optimizing a query that accesses a view with plan security enabled.

ALTER VIEW foo ENABLE PLAN SECURITY;

That is much clearer and easily to verify/audit, so more secure.

Also, we should presume that any function created with SECURITY DEFINER
and created by a superuser would have plan security, so we don't need to
annotate lots of old code to work securely. Annotating the built-in
functions is a lot easier.

 2. If the view refers only one table (as a typical Veil view does), plan
 it like we do today but enforce that view conditions are evaluated first
 in the Filter. Notably, allow using any user-supplied conditions as
 index quals if there's a matching index.
 
 3. Otherwise fully materialize the view.

So if we join a normal table or a view to a secure view then only the
secure view part would be materialized? Or do you mean the whole query
would be materialized?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Pavel Stehule
2009/10/23 Simon Riggs si...@2ndquadrant.com:
 On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote:

 The most user-friendly and backwards-compatible (though not necessarily
 back-patchable) approach I can see is:

 1. If the user has read access to all the underlying tables, plan it
 like we do today.

 For me, it would make most sense to explicitly mark Views as being
 security views. That way planning need only change when we are
 optimizing a query that accesses a view with plan security enabled.

 ALTER VIEW foo ENABLE PLAN SECURITY;

+1

Pavel


 That is much clearer and easily to verify/audit, so more secure.

 Also, we should presume that any function created with SECURITY DEFINER
 and created by a superuser would have plan security, so we don't need to
 annotate lots of old code to work securely. Annotating the built-in
 functions is a lot easier.

 2. If the view refers only one table (as a typical Veil view does), plan
 it like we do today but enforce that view conditions are evaluated first
 in the Filter. Notably, allow using any user-supplied conditions as
 index quals if there's a matching index.

 3. Otherwise fully materialize the view.

 So if we join a normal table or a view to a secure view then only the
 secure view part would be materialized? Or do you mean the whole query
 would be materialized?

 --
  Simon Riggs           www.2ndQuadrant.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-23 Thread Cédric Villemain
Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit :
 On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
  On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
 
  cedric.villem...@dalibo.com wrote:
   Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
   On Mon, Oct 19, 2009 at 2:08 PM, marcin mank marcin.m...@gmail.com 
wrote:
Currently random_page_cost is a GUC. I propose that this could be
set per-table.
  
   Or per-tablespace.
  
   Yes, I think there are a class of GUCs which describe the physical
   attributes of the storage system which should be per-table or
   per-tablespace. random_page_cost, sequential_page_cost,
   effective_io_concurrency come to mind.
  
   and, perhaps effective_cache_size.
  
   You can have situation where you don't want some tables go to OS memory
   (you can disabled that at filesystem level, ... l'd like to be able to
   do that at postgres level but it is another point)
  
   So you put those tables in a separate tablespace, and tell postgresql
   that the effective_cache_size is 0 (for this tablespace), up to
   postgres to do the right thing with that ;)
 
  Why would you ever want to set effective_cache_size to 0?
 
 I think this is a misunderstanding of how effective_cache_size works. I
 can't think of any reason to do that. I could see a reason to tell the
 OS to not throw a relation into cache but that is a different thing.

Well the effective_cache_size in this context is OS cache memory (0 in my case) 
+ estimation of shared_buffer.. ah so DBA should estimate the amount in the 
shared_buffer only, ok.

So consider effective_cache_size = 0 + what pg_buffer_cache will tell.

My case is a table containing 29 GB of bytea in a database of 52 GB. Every row 
on the 29GB table is grab only few times. And it will just renew OS cache 
memory  every time (the server have only 8GB of ram).
So when I remove this table (not the index) from the OS cache memory,  I keep 
more interesting blocks in the OS cache memory.

And disk + raid are quick enought to bypass the OS cache memory for this 
tablespace.


Are things a bit clearer and usage not so silly ?



 
 Joshua D. Drake
 
  ...Robert
 

-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread KaiGai Kohei
Simon Riggs wrote:
 On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote:
 
 The most user-friendly and backwards-compatible (though not necessarily
 back-patchable) approach I can see is:

 1. If the user has read access to all the underlying tables, plan it
 like we do today.
 
 For me, it would make most sense to explicitly mark Views as being
 security views. That way planning need only change when we are
 optimizing a query that accesses a view with plan security enabled.
 
 ALTER VIEW foo ENABLE PLAN SECURITY;
 
 That is much clearer and easily to verify/audit, so more secure.

The point is that only owner knows intention of the view correctly.
It seems to me reasonable to explicitly mark whether the major purpose
of view is security, or not.

 Also, we should presume that any function created with SECURITY DEFINER
 and created by a superuser would have plan security, so we don't need to
 annotate lots of old code to work securely. Annotating the built-in
 functions is a lot easier.

Sorry, what is happen if function is marked as plan security?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Simon Riggs
On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote:
  Also, we should presume that any function created with SECURITY DEFINER
  and created by a superuser would have plan security, so we don't need to
  annotate lots of old code to work securely. Annotating the built-in
  functions is a lot easier.
 
 Sorry, what is happen if function is marked as plan security?

I was suggesting an intelligent default by which we could determine
function marking implicitly, if it was not explicitly stated on the
CREATE FUNCTION.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread KaiGai Kohei
Simon Riggs wrote:
 On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote:
 Also, we should presume that any function created with SECURITY DEFINER
 and created by a superuser would have plan security, so we don't need to
 annotate lots of old code to work securely. Annotating the built-in
 functions is a lot easier.
 Sorry, what is happen if function is marked as plan security?
 
 I was suggesting an intelligent default by which we could determine
 function marking implicitly, if it was not explicitly stated on the
 CREATE FUNCTION.

How to handle a (corner) case when the function owner was changed to non
privileged user and its definition is replaced later?

Even if someone malicious gives leakage condition on the view, possible
leakable infotmation is restricted to where the owner of view can access.
So, it seems to me the security mark on views by owner are sufficient.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))

2009-10-23 Thread KaiGai Kohei
KaiGai Kohei wrote:
 After the long trial-and-errors, we learned a few approaches which
 use common entry points for both of DAC and MAC were rocky-path more
 than what we initially imagined.
 So, we came back to the original design. It deploys MAC hooks on
 the strategic points of core routines. On the other hand, people
 complained about this approach without clear documentation, because
 most of people are not familiar to both of SELinux and PgSQL.
 Heikki suggested that a clear developer documentation should be
 provided to understand pgsql-hackers this new concept.
 (And, Peter has also suggested before a developer documentation will
 be a good source of user documentations.)
 
 I plan to submit SE-PgSQL/lite patch with developer documentations
 on the next commit-fest.
 I can understand what I want to develop and the purpose of codes.
 However, it may not match with what you want to know.
 
 So, I'd like to ask what should be included within the developer
 documentation at first prior to making a documentation.
 
 I plans the developer documentation should be put as a REAME file,
 not a SGML documentation or a certain wiki page.
 And I think it should contain the following items.
 
 * overview
  - general overview of SE-PgSQL
  - introduction of SELinux specific terms (such as security context)
 
 * internal architecture
  - the purpose of sub-components (such as management of security
context, caches of access control decision and so on)
  - differences from similar permissions in DAC
 
 * object classes and permissions defined in SELinux model
  - list of them and when/where they should be checked.
 
 * specification of SE-PgSQL hooks
   (It should be put on the source code comments for easy maintenance.)
  - what this hook does, what arguments are required, what result will
be returned.
 
 * code examples
  - a few examples to add MAC checks within 3rd party modules.
 
 Do you have any comments? What should be added to? or removed from?

I guess it was too abstract to suggest anything.
Anyway, I'll begin to describe the developer documentation base on the chapters.
If necessary, we can fix it up later.

In addition, I determined the initial patch only covers access controls on
the four object classes (database, schema, table and column) to reduce burdens
of reviewing.
We also can add support for other object classes (such as procedure) later.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Heikki Linnakangas
Simon Riggs wrote:
 Also, we should presume that any function created with SECURITY DEFINER
 and created by a superuser would have plan security, so we don't need to
 annotate lots of old code to work securely. Annotating the built-in
 functions is a lot easier.

SECURITY DEFINER is an orthogonal aspect. Consider something as
innocent-looking as this:

CREATE FUNCTION secdeffunc(text) RETURNS boolean AS $$ SELECT
$1::integer  10; $$ LANGUAGE SQL SECURITY DEFINER;

The text-to-integer cast throws an error and reveals the argument as
someone pointed out earlier in this thread. Creating such a function
shouldn't open the door to information leaks in views elsewhere.

The most useful automatic annotation I can see is to treat functions
implementing B-tree operators as safe. I *think* that's safe, anyway.

 2. If the view refers only one table (as a typical Veil view does), plan
 it like we do today but enforce that view conditions are evaluated first
 in the Filter. Notably, allow using any user-supplied conditions as
 index quals if there's a matching index.

 3. Otherwise fully materialize the view.
 
 So if we join a normal table or a view to a secure view then only the
 secure view part would be materialized? Or do you mean the whole query
 would be materialized?

Just the secure view. Materializing the result of the overall query
wouldn't help.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Heikki Linnakangas
Rod Taylor wrote:
 This still allow many optimizations to be applied in complex cases. The 
 planner
 
 CREATE VIEW phone_number AS
 SELECT person, phone, company
 FROM phone_data USING SECURITY FILTER(phone NOT LIKE '6%')
JOIN person USING (person_id)
JOIN company USING (company_id)
 AND person.active AND company.active;

Well, you can also achieve that by creating two views, one to hide the
sensitive data and another to do the join:

CREATE VIEW not6_numbers AS
  SELECT phone FROM phone_data WHERE phone NOT LIKE '6%';

CREATE VIEW phone_number AS
  SELECT person, phone, company FROM not6_numbers
  JOIN person USING (person_id)
  JOIN company USING (company_id)
  WHERE person.active AND company.active;

So I don't think we should invent new syntax for that. The 1st view
would be marked with SECURE if we end up using that explicit annotation
in CREATE VIEW.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Richard Huxton
Heikki Linnakangas wrote:
 The most useful automatic annotation I can see is to treat functions
 implementing B-tree operators as safe. I *think* that's safe, anyway.

Index lookups and single-type comparisons were the only things I could
come up with as safe. Unless there is some way to generate an error from
geometric ops (overflow or some such).

Anything involving a type-cast can obviously be finessed. If you allow
arithmetic then you could trigger an overflow or divide-by-zero error.

Hmm - you can probably do something evil with non-UTF8 characters if you
allow string operations. Would string comparisons be safe (because a
literal would be caught before the view gets evaluated)?

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-23 Thread Robert Haas
On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain
cedric.villem...@dalibo.com wrote:
 Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit :
 On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
  On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
 
  cedric.villem...@dalibo.com wrote:
   Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
   On Mon, Oct 19, 2009 at 2:08 PM, marcin mank marcin.m...@gmail.com
 wrote:
Currently random_page_cost is a GUC. I propose that this could be
set per-table.
  
   Or per-tablespace.
  
   Yes, I think there are a class of GUCs which describe the physical
   attributes of the storage system which should be per-table or
   per-tablespace. random_page_cost, sequential_page_cost,
   effective_io_concurrency come to mind.
  
   and, perhaps effective_cache_size.
  
   You can have situation where you don't want some tables go to OS memory
   (you can disabled that at filesystem level, ... l'd like to be able to
   do that at postgres level but it is another point)
  
   So you put those tables in a separate tablespace, and tell postgresql
   that the effective_cache_size is 0 (for this tablespace), up to
   postgres to do the right thing with that ;)
 
  Why would you ever want to set effective_cache_size to 0?

 I think this is a misunderstanding of how effective_cache_size works. I
 can't think of any reason to do that. I could see a reason to tell the
 OS to not throw a relation into cache but that is a different thing.

 Well the effective_cache_size in this context is OS cache memory (0 in my 
 case)
 + estimation of shared_buffer.. ah so DBA should estimate the amount in the
 shared_buffer only, ok.

 So consider effective_cache_size = 0 + what pg_buffer_cache will tell.

 My case is a table containing 29 GB of bytea in a database of 52 GB. Every row
 on the 29GB table is grab only few times. And it will just renew OS cache
 memory  every time (the server have only 8GB of ram).
 So when I remove this table (not the index) from the OS cache memory,  I keep
 more interesting blocks in the OS cache memory.

 And disk + raid are quick enought to bypass the OS cache memory for this
 tablespace.


 Are things a bit clearer and usage not so silly ?

Well, I think you're vastly overestimating the power of
effective_cache_size.  effective_cache_size changes the planner's
estimation of how likely a repeated partial index scan is to find the
same block in cache.  So it only affects
nested-loop-with-inner-indexscan plans, and if effective_cache_size is
set to a value larger than the size of the index (or maybe the
relation, I'm too lazy to go reread the code right now), one value is
as good as another.  For a typical user, I think you could set
effective_cache_size to, say, a terabyte, and it wouldn't make a bit
of difference.  Heck, why not 2TB.

As far as I can see, the only possible value of setting this knob to a
value other than positive-infinity is that if you have a huge dataset
that's not close to fitting in memory, this might cause the planner to
pick a merge join over a nested loop with inner indexscan, which might
be better if it makes the I/O sequential rather than random.  Anyone
think I'm a pessimist?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-23 Thread Cédric Villemain
Le vendredi 23 octobre 2009 14:23:09, Robert Haas a écrit :
 On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain
 
 cedric.villem...@dalibo.com wrote:
  Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit :
  On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
   On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
  
   cedric.villem...@dalibo.com wrote:
Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
On Mon, Oct 19, 2009 at 2:08 PM, marcin mank
marcin.m...@gmail.com
 
  wrote:
 Currently random_page_cost is a GUC. I propose that this could be
 set per-table.
   
Or per-tablespace.
   
Yes, I think there are a class of GUCs which describe the physical
attributes of the storage system which should be per-table or
per-tablespace. random_page_cost, sequential_page_cost,
effective_io_concurrency come to mind.
   
and, perhaps effective_cache_size.
   
You can have situation where you don't want some tables go to OS
memory (you can disabled that at filesystem level, ... l'd like to
be able to do that at postgres level but it is another point)
   
So you put those tables in a separate tablespace, and tell
postgresql that the effective_cache_size is 0 (for this tablespace),
up to postgres to do the right thing with that ;)
  
   Why would you ever want to set effective_cache_size to 0?
 
  I think this is a misunderstanding of how effective_cache_size works. I
  can't think of any reason to do that. I could see a reason to tell the
  OS to not throw a relation into cache but that is a different thing.
 
  Well the effective_cache_size in this context is OS cache memory (0 in my
  case) + estimation of shared_buffer.. ah so DBA should estimate the
  amount in the shared_buffer only, ok.
 
  So consider effective_cache_size = 0 + what pg_buffer_cache will tell.
 
  My case is a table containing 29 GB of bytea in a database of 52 GB.
  Every row on the 29GB table is grab only few times. And it will just
  renew OS cache memory  every time (the server have only 8GB of ram).
  So when I remove this table (not the index) from the OS cache memory,  I
  keep more interesting blocks in the OS cache memory.
 
  And disk + raid are quick enought to bypass the OS cache memory for this
  tablespace.
 
 
  Are things a bit clearer and usage not so silly ?
 
 Well, I think you're vastly overestimating the power of
 effective_cache_size.  effective_cache_size changes the planner's
 estimation of how likely a repeated partial index scan is to find the
 same block in cache.  So it only affects
 nested-loop-with-inner-indexscan plans, and if effective_cache_size is
 set to a value larger than the size of the index (or maybe the
 relation, I'm too lazy to go reread the code right now), one value is
 as good as another.  For a typical user, I think you could set
 effective_cache_size to, say, a terabyte, and it wouldn't make a bit
 of difference.  Heck, why not 2TB.

Ok. I don't care too much on this parameter so. 
As we were talking about some parameters that can be tablespace specific, I 
thought this one can have different values too.

 
 As far as I can see, the only possible value of setting this knob to a
 value other than positive-infinity is that if you have a huge dataset
 that's not close to fitting in memory, this might cause the planner to
 pick a merge join over a nested loop with inner indexscan, which might
 be better if it makes the I/O sequential rather than random.  Anyone
 think I'm a pessimist?
 
 ...Robert
 

-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] EvalPlanQual seems a tad broken

2009-10-23 Thread Tom Lane
I wrote:
 On further review it seems that a better way to do this is to make
 things happen inside the EPQ machinery.  We need to freeze the rows
 returned by *all* scan nodes, not only the ones referencing real tables
 --- for example, a join against a VALUES scan node would still be a
 problem if we don't lock down the VALUES output, since we could end up
 getting multiple join rows out.  This means we can't assume that there
 is a CTID associated with every scan node that EPQ needs to lock down.
 What looks like it would work instead is to pass through the current
 scan tuple for every scan plan node, not only the ones that are FOR
 UPDATE targets.  I'm tempted to try to move the responsibility for this
 into execScan.c instead of having all the individual scan plan types
 know about it.

What I had been thinking of when I wrote that was to pass down the
ScanTupleSlots from the outer query's scan nodes.  That codes up
nicely but doesn't work at all :-(.  As is obvious in hindsight,
the scan nodes are not necessarily still returning the same tuples
that contribute to the current join tuple --- for instance if you
have a sort-and-mergejoin type of plan, all the scans will be at
EOF by the time the top level sees any tuples.

So we need to be able to recover the original scan tuples from the
join tuple, even for scans that are not to be locked.  For real
tables this isn't hard, we can pass up the CTID as a junk column
the same as we do for tables that are to be locked.  It's harder
for non-table scans though (VALUES, functions, etc).  I can see
two conceivably workable alternatives:

1. Pass up the entire row value as a junk whole-row Var.

2. Invent some equivalent to CTID that would allow the row to be
recovered later --- for instance, the row number in a tuplestore.
One problem with this is that not all those scan types use a tuplestore
now, so we'd be adding significant overhead.  Also, I'm not entirely
sure that it can work for scan nodes that get reset and rescanned
repeatedly.  Some of them clear and refill the tuplestore when they
do that, and the refill isn't necessarily the same row values.

Fortunately, this case probably doesn't arise that much in practice,
so while it needs to work I doubt that performance is critical.

I'm planning to try alternative #1 next, but I wonder if anyone
has a better idea?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Tom Lane
Dimitri Fontaine dfonta...@hi-media.com writes:
 I'll go search for more, meantime I'll just add the main goal of this
 new thread is to have -hackers know there is a real user demand for
 having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC.

[shrug...]  There is also real user demand for not silently breaking
code that works now, which is what we risk anytime we change the set
of statements that can set FOUND.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote:
 Sorry, what is happen if function is marked as plan security?

 I was suggesting an intelligent default by which we could determine
 function marking implicitly, if it was not explicitly stated on the
 CREATE FUNCTION.

The thought that's been in the back of my mind is that you could solve
99% of the performance problem if you trusted all builtin functions and
nothing else.  This avoids the question of who gets to mark functions
as trustable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Robert Haas
On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dfonta...@hi-media.com writes:
 I'll go search for more, meantime I'll just add the main goal of this
 new thread is to have -hackers know there is a real user demand for
 having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC.

 [shrug...]  There is also real user demand for not silently breaking
 code that works now, which is what we risk anytime we change the set
 of statements that can set FOUND.

We've had this discussion before and I'm still unpersuaded by your
position.  I *never* write IF FOUND THEN except immediately after
the statement where I expect that variable to be set, and I submit
that anyone who who does write code that relies on certain statements
not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
have a policy of making future PostgreSQL releases bug-compatible with
previous releases.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread David Fetter
On Fri, Oct 23, 2009 at 10:04:29AM -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote:
  Sorry, what is happen if function is marked as plan security?
 
  I was suggesting an intelligent default by which we could
  determine function marking implicitly, if it was not explicitly
  stated on the CREATE FUNCTION.
 
 The thought that's been in the back of my mind is that you could
 solve 99% of the performance problem if you trusted all builtin
 functions and nothing else.  This avoids the question of who gets to
 mark functions as trustable.

Great idea!

One of the things the security community has learned is that the only
way it's even possible to get an information leak rate of zero is to
have a system which does nothing at all.  It's a fact we need to bear
in mind when addressing this or any other issue of access control.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Andrew Dunstan



Robert Haas wrote:

On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  

Dimitri Fontaine dfonta...@hi-media.com writes:


I'll go search for more, meantime I'll just add the main goal of this
new thread is to have -hackers know there is a real user demand for
having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC.
  

[shrug...]  There is also real user demand for not silently breaking
code that works now, which is what we risk anytime we change the set
of statements that can set FOUND.



We've had this discussion before and I'm still unpersuaded by your
position.  I *never* write IF FOUND THEN except immediately after
the statement where I expect that variable to be set, and I submit
that anyone who who does write code that relies on certain statements
not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
have a policy of making future PostgreSQL releases bug-compatible with
previous releases.


  


I agree that  doing that is bad practice. I disagree that it's a bug. 
And if it is, and we change it, then locating all the places where the 
bug might occur will be a nightmare. In effect it means you'll need to 
review every single use of FOUND in your code (possibly hundreds of 
thousands or millions of lines) to make sure you're not accidentally 
relying on the behaviour. No thanks.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 For instance, one feature that I'm considering now is a temporal join
 which is a join on overlaps rather than equals, e.g.:

   SELECT * FROM a, b WHERE a.x  b.x;

 I might try to provide a modified merge join algorithm to implement that
 more efficiently in some cases. I don't mean to discuss the feature in
 detail here (I will make a separate proposal) but the algorithm requires
 that I find the strictly left of operator. 

Well, actually you need *two* things.  The first prerequisite is to know
that the operator named  has the semantics of overlaps in some
generalized sense.  The second prerequisite is to identify the
associated strictly left of operator.

As you say, we've done this in the past using operator classes.  That's
worked reasonably well because what the existing code needs to know
about is equality, ordering, and hashing, and that all matches up nicely
with btree and hash opclasses.

 The way I see it, we have two approaches:
  1. Try to make the current system work by standardizing the strategy
 numbers for GiST somehow, and then use the default GiST operator class,
 if available.

This proposal is a non-starter, because one of the whole points of GIST
is that it can support multiple sets of semantics.  There is no reason
at all to assume that every GIST opclass must include operators having
the semantics of overlaps and to-left-of.

  2. Invent a new system, perhaps interfaces, perhaps something else.
  3. Use extra flags in CREATE OPERATOR somehow

The thing that would require the least amount of invention is to use the
operator class/family machinery with a new index type.  Perhaps a dummy
entry in pg_am would be acceptable.  (You would probably create just
operator families, with no contained classes, since the only point of a
class is to identify the minimum support set for an index and there
won't be any indexes.)  An alternative is to somehow mark those GIST
opclasses that do meet the expectation about operator semantics.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Heikki Linnakangas
Tom Lane wrote:
 The thought that's been in the back of my mind is that you could solve
 99% of the performance problem if you trusted all builtin functions and
 nothing else.  This avoids the question of who gets to mark functions
 as trustable.

Except that all builtin functions are not trustworthy. set_config and
int-text cast are two examples mentioned this far, and I'm sure there's
a boatload of others.

Trusting only index operators seems more and more attractive to me. That
won't limit us to built-in datatypes, requires no explicit user action
to categorize functions. They're also the most significant functions
from a performance point-of-view, allowing use of indexes instead of
forcing a seqscan of all tables.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [shrug...]  There is also real user demand for not silently breaking
 code that works now, which is what we risk anytime we change the set
 of statements that can set FOUND.

 We've had this discussion before and I'm still unpersuaded by your
 position.  I *never* write IF FOUND THEN except immediately after
 the statement where I expect that variable to be set, and I submit
 that anyone who who does write code that relies on certain statements
 not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
 have a policy of making future PostgreSQL releases bug-compatible with
 previous releases.

This position is nonsense for two reasons:

1. It can hardly be considered a bug that FOUND is set only by the
statements that the documentation specifically states are the only ones
it is set by.

2. In order to use FOUND *at all* you must assume that it has got some
amount of stability.  IF FOUND is already assuming that the IF
statement didn't reset the flag before evaluating the expression.
Lots of other perfectly reasonable constructions assume that FOUND
will stay stable across no op statements.

Any change here is *not* a bug fix, it is a change of clearly documented
and not-obviously-unreasonable behavior.  We have to take seriously the
likelihood that it will break existing code.  If there were a way to
flag such breakage I would be happier about changing it; but as Andrew
already noted, there doesn't seem to be any way to find affected code
except painful manual review.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Any change here is *not* a bug fix, it is a change of clearly
 documented and not-obviously-unreasonable behavior.  We have to take
 seriously the likelihood that it will break existing code.
 
Perhaps plpgsql could support tests of SQLSTATE, and recognize '02000'
(the standard value for zero rows affected) to support the desired
new semantics?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Kevin Grittner
David Fetter da...@fetter.org wrote:
 
 One of the things the security community has learned is that the
 only way it's even possible to get an information leak rate of zero
 is to have a system which does nothing at all.  It's a fact we need
 to bear in mind when addressing this or any other issue of access
 control.
 
And to get all old-school about it, I tend to go with the position put
forward by Admiral Grace Hopper[1] when I heard her speak at an ACM
meeting here.  She said that *any* security could be broken, and that
the goal should be to put the cost of creating the breach higher for
the perpetrators than the benefits which would accrue to them.  That
informs my perspective, anyway.
 
So, one of the first questions I ask about an information leak is
what good would it do someone to know that?  So I don't worry too
much about someone knowing the size of my database or the number of
rows in a table, or for that matter whether county 12 has a
2009GN000317 case or how many party records have a Social Security
Number stored.  I care very much that the SSN associated with a person
or a document flagged as confidential doesn't leak to unauthorized
viewers, because that information could benefit someone who obtains it
and harm others.
 
Perspective is more important that purity here.
 
-Kevin
 
[1] http://en.wikipedia.org/wiki/Grace_Hopper

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-23 Thread Alvaro Herrera
Peter Eisentraut wrote:

 Note that only glibc supports switching the language at run time.  And
 doing it is probably very expensive if you want to do it twice per
 message.

Ouch :-(

 I think you could probably get much of the use case out of this if you
 concentrate on making two switches for the client and the log, which can
 be set to a language or untranslated, and if you choose a language it
 has to be the same for both.

So we'd go with a single setting to define language, which would be the
current lc_messages, and two new settings, say translate_log_messages
and translate_client_messages, the latter being USERSET.

Does that sound reasonable?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Merlin Moncure
On Fri, Oct 23, 2009 at 11:07 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 Any change here is *not* a bug fix, it is a change of clearly
 documented and not-obviously-unreasonable behavior.  We have to take
 seriously the likelihood that it will break existing code.

 Perhaps plpgsql could support tests of SQLSTATE, and recognize '02000'
 (the standard value for zero rows affected) to support the desired
 new semantics?

+1

I rarely use found because it's dangerous ...would be nice to have a
more rigorous test...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Robert Haas
On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [shrug...]  There is also real user demand for not silently breaking
 code that works now, which is what we risk anytime we change the set
 of statements that can set FOUND.

 We've had this discussion before and I'm still unpersuaded by your
 position.  I *never* write IF FOUND THEN except immediately after
 the statement where I expect that variable to be set, and I submit
 that anyone who who does write code that relies on certain statements
 not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
 have a policy of making future PostgreSQL releases bug-compatible with
 previous releases.

 This position is nonsense for two reasons:

 1. It can hardly be considered a bug that FOUND is set only by the
 statements that the documentation specifically states are the only ones
 it is set by.

OK, it's not a bug: it's a misfeature.  :-)

 2. In order to use FOUND *at all* you must assume that it has got some
 amount of stability.  IF FOUND is already assuming that the IF
 statement didn't reset the flag before evaluating the expression.
 Lots of other perfectly reasonable constructions assume that FOUND
 will stay stable across no op statements.

Sure.  I think there's a big difference between assuming that the word
IF (or the intervening semicolon and/or whitespace) did not reset
FOUND and assuming that it will not be reset by the execution of a
dynamic SQL query.  The former is necessary for there to be any
conceivable way of using FOUND; the latter is assuming that for some
reason we want to treat dynamic SQL queries differently than static
ones.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-23 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 So we'd go with a single setting to define language, which would be the
 current lc_messages, and two new settings, say translate_log_messages
 and translate_client_messages, the latter being USERSET.

 Does that sound reasonable?

How do we get to the point where individual users can choose their
message language, though?  If LC_MESSAGES stays as SUSET then you
haven't really made matters better for anybody.

With the above infrastructure, we could get there if there were a way to
say LC_MESSAGES is USERSET if translate_log_messages is OFF, but there
isn't and I doubt it would be a good idea to try to make it work like
that.

Maybe we should stick to the original design and just document that
you'll take a big performance hit if the settings are different and
both not C.  And of course make sure we avoid the performance hit
otherwise.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-23 Thread Simon Riggs
On Fri, 2009-10-23 at 10:04 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote:
  Sorry, what is happen if function is marked as plan security?
 
  I was suggesting an intelligent default by which we could determine
  function marking implicitly, if it was not explicitly stated on the
  CREATE FUNCTION.
 
 The thought that's been in the back of my mind is that you could solve
 99% of the performance problem if you trusted all builtin functions and
 nothing else.  This avoids the question of who gets to mark functions
 as trustable.

That is a very good default. My experience is that those 1% of cases are
responsible for 99% of wasted time, so the ability to specify things for
user functions is critical. If we make user extensibility second rate we
will force solutions to be second rate also. (e.g. where would PostGIS
be without type-specific analyze functions?).

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Pavel Stehule
2009/10/23 Robert Haas robertmh...@gmail.com:
 On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [shrug...]  There is also real user demand for not silently breaking
 code that works now, which is what we risk anytime we change the set
 of statements that can set FOUND.

 We've had this discussion before and I'm still unpersuaded by your
 position.  I *never* write IF FOUND THEN except immediately after
 the statement where I expect that variable to be set, and I submit
 that anyone who who does write code that relies on certain statements
 not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
 have a policy of making future PostgreSQL releases bug-compatible with
 previous releases.

 This position is nonsense for two reasons:

 1. It can hardly be considered a bug that FOUND is set only by the
 statements that the documentation specifically states are the only ones
 it is set by.

 OK, it's not a bug: it's a misfeature.  :-)

Isn't this behave shared with PL/SQL? In some environments the dynamic
queries are external - so there wasn't possibility to get return
state. I afraid so somewhere this feature was extensively used - I
dislike this feature too, but I agree with Tom - this is small
problem, and it is better do nothing.

What about to add new flag to EXECUTE?

or create execute function, that returns found

like

execute('SELECT ' INTO ... USING ...)?

it's obscure too.

Regards
Pavel Stehule


 2. In order to use FOUND *at all* you must assume that it has got some
 amount of stability.  IF FOUND is already assuming that the IF
 statement didn't reset the flag before evaluating the expression.
 Lots of other perfectly reasonable constructions assume that FOUND
 will stay stable across no op statements.

 Sure.  I think there's a big difference between assuming that the word
 IF (or the intervening semicolon and/or whitespace) did not reset
 FOUND and assuming that it will not be reset by the execution of a
 dynamic SQL query.  The former is necessary for there to be any
 conceivable way of using FOUND; the latter is assuming that for some
 reason we want to treat dynamic SQL queries differently than static
 ones.

 ...Robert

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql EXECUTE will not set FOUND

2009-10-23 Thread Robert Haas
On Fri, Oct 23, 2009 at 12:05 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2009/10/23 Robert Haas robertmh...@gmail.com:
 On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [shrug...]  There is also real user demand for not silently breaking
 code that works now, which is what we risk anytime we change the set
 of statements that can set FOUND.

 We've had this discussion before and I'm still unpersuaded by your
 position.  I *never* write IF FOUND THEN except immediately after
 the statement where I expect that variable to be set, and I submit
 that anyone who who does write code that relies on certain statements
 not setting FOUND is, IMO, depending on a bug.  We don't and shouldn't
 have a policy of making future PostgreSQL releases bug-compatible with
 previous releases.

 This position is nonsense for two reasons:

 1. It can hardly be considered a bug that FOUND is set only by the
 statements that the documentation specifically states are the only ones
 it is set by.

 OK, it's not a bug: it's a misfeature.  :-)

 Isn't this behave shared with PL/SQL? In some environments the dynamic
 queries are external - so there wasn't possibility to get return
 state. I afraid so somewhere this feature was extensively used - I
 dislike this feature too, but I agree with Tom - this is small
 problem, and it is better do nothing.

 What about to add new flag to EXECUTE?

 or create execute function, that returns found

 like

 execute('SELECT ' INTO ... USING ...)?

 it's obscure too.

Yeah, I mean, if the consensus is that we shouldn't change this, then
people will just have to work around it using some other method, like
GET DIAGNOSTICS.  It's not really worth adding a whole separate way of
doing this just to set FOUND.  However, it would be worth documenting
the workaround, because I can see where the OP was left scratching his
head.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] client_lc_messages

2009-10-23 Thread Magnus Hagander
2009/10/23 Alvaro Herrera alvhe...@commandprompt.com:
 Peter Eisentraut wrote:

 Note that only glibc supports switching the language at run time.  And
 doing it is probably very expensive if you want to do it twice per
 message.

 Ouch :-(

 I think you could probably get much of the use case out of this if you
 concentrate on making two switches for the client and the log, which can
 be set to a language or untranslated, and if you choose a language it
 has to be the same for both.

 So we'd go with a single setting to define language, which would be the
 current lc_messages, and two new settings, say translate_log_messages
 and translate_client_messages, the latter being USERSET.

 Does that sound reasonable?

Does to me. It would certainly cover my use-case, which is that unless
your logs are in english, it's really hard to google on anything :-)
(I guess it may work for the big languages like spanish, french and
german, but that's about it. And even then, you're missing out on a
lot of resources)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Greg Stark
On Fri, Oct 23, 2009 at 7:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The way I see it, we have two approaches:
  1. Try to make the current system work by standardizing the strategy
 numbers for GiST somehow, and then use the default GiST operator class,
 if available.

 This proposal is a non-starter, because one of the whole points of GIST
 is that it can support multiple sets of semantics.  There is no reason
 at all to assume that every GIST opclass must include operators having
 the semantics of overlaps and to-left-of.


I always thought it was strange that the GIST strategy numbers were
completely meaningless. It does seem like assigning meaning to
strategy numbers gradually as we learn new interrelated indexable
strategies. We would still have a range of values for new non-standard
semantics, but at least the common ones would be nailed down.

I'm not sure that solves the problem because the default gist
operator class isn't necessarily going to be the one with the
strategies this needs, though I suppose normally people would want to
make the default operator class for each data type one which supports
the most strategies.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 I always thought it was strange that the GIST strategy numbers were
 completely meaningless. It does seem like assigning meaning to
 strategy numbers gradually as we learn new interrelated indexable
 strategies. We would still have a range of values for new non-standard
 semantics, but at least the common ones would be nailed down.

Well, the problem with that is that GIST strategy numbers are
historically an internal implementation detail for any particular
opclass, and so trying to standardize them now is going to mean lots
of incompatibility with third-party opclasses.

I'd feel more comfortable with being able to add some flags to an
opclass (or more likely an opfamily) that assert that its strategy
numbers agree with some convention or other.  Maybe a bitmap so
that there's room for multiple future conventions of this kind?
For instance it's certainly conceivable that a GIST class could
support both btree-compatible and overlaps operators.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-23 Thread João Eugenio Marynowski
2009/10/23 Robert Haas robertmh...@gmail.com

  Well, that's not really the problem.  Your data is corrupted -
 increasing the index row size is not going to fix it.

 I'm not really knowledgeable enough about the guts of the database to
 know whether there are lower-level tools that could be used to rescue
 your data.  I wonder if you'd have any luck selecting data a few rows
 at a time (LIMIT 100, say, without ORDER BY).  That might at least
 enable you to get some of the data out of there, if there are some
 pages that are undamaged.  But I'm grasping at straws here.

 ...Robert

I ask about the index row size because I can't re-index the database and
I've a server for tests and in this I removed the pk and can't recreate the
index because it showing error about size row limit indices.
And, only occurs erros when you run a query involving the records damaged.
I'm trying to identify them (less of 1% of the total registers).


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 I'm not sure that solves the problem because the default gist
 operator class isn't necessarily going to be the one with the
 strategies this needs,

Forgot to mention: I do not think default-ness of opclasses enters
into it at all.  The meaning of the query is fully defined by the
operator that is in it.  All we need to know is what are the
semantics of that operator.  If we can find it in the overlaps
position of *any* opclass, we are entitled to suppose that it
behaves like overlaps and the associated left-of operator can be
used to optimize it.  Conceivably we could get different left-of
operators out of different opclasses, but if they don't behave
effectively the same, the user has messed up the opclasses.

As an example, suppose we are trying to implement DISTINCT via
sort-and-unique, and we've chosen an appropriate '=' operator that
defines what distinct-ness means.  That operator might appear in more
than one opclass having more than one sort order, but it doesn't matter
which one we choose.  As long as the opclasses are self-consistent,
we should get correct answers with any one.

The case where default-ness of opclasses matters is where we are
trying to assign specific meaning to some generic construct like
DISTINCT or ORDER BY.  For instance, it makes sense to require
that ORDER BY be interpreted by reference to a default opclass,
because otherwise you don't have a way to know which sort ordering
the user wants.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote:
 Forgot to mention: I do not think default-ness of opclasses enters
 into it at all.  The meaning of the query is fully defined by the
 operator that is in it.  All we need to know is what are the
 semantics of that operator.  If we can find it in the overlaps
 position of *any* opclass, we are entitled to suppose that it
 behaves like overlaps and the associated left-of operator can be
 used to optimize it.

Interesting, that sounds we've got a good approach to the problem now.
This thread has been useful.

 Conceivably we could get different left-of
 operators out of different opclasses, but if they don't behave
 effectively the same, the user has messed up the opclasses.

It would probably be worthwhile to make an attempt to throw a useful
error there, but I agree it's not really a problem.

 The case where default-ness of opclasses matters is where we are
 trying to assign specific meaning to some generic construct like
 DISTINCT or ORDER BY.  For instance, it makes sense to require
 that ORDER BY be interpreted by reference to a default opclass,
 because otherwise you don't have a way to know which sort ordering
 the user wants.

That makes sense.

Thanks,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pre-proposal: type interfaces

2009-10-23 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote:
 Conceivably we could get different left-of
 operators out of different opclasses, but if they don't behave
 effectively the same, the user has messed up the opclasses.

 It would probably be worthwhile to make an attempt to throw a useful
 error there, but I agree it's not really a problem.

Sure, right after we solve the halting problem ;-).  The point I
was trying to make is that getting different operators is not wrong.
It's only wrong if their behavior isn't consistent with what the
opclass asserts, and there's no practical way to determine that.
We have to trust the opclass maker.  (This is one of the main
reasons why CREATE OPERATOR CLASS is superuser-only.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per table random-page-cost?

2009-10-23 Thread Josh Berkus
Cedric,

 ase is a table containing 29 GB of bytea in a database of 52 GB. Every row 
 on the 29GB table is grab only few times. And it will just renew OS cache 
 memory  every time (the server have only 8GB of ram).
 So when I remove this table (not the index) from the OS cache memory,  I keep 
 more interesting blocks in the OS cache memory.

effective_cache_size doesn't control what gets cached, it just tells the
planner about it.

Now, if we had an OS which could be convinced to handle caching
differently for different physical devices, then I could see wanting
this setting to be per-tablespace.  For example, it would make a lot of
sense not to FS-cache any data which is on a ramdisk or superfast SSD
array.  The same with archive data which you expected to be slow and
infrequently accessed on a NAS device.  If your OS can do that, while
caching data from other sources, then it would make sense.

However, I don't know any current OS which allows for this.  Does anyone
else?

--Josh Berkus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Statement Level Deferred Triggers

2009-10-23 Thread Andres Freund
Hi all,

I would like to hear some opinions before starting to take a stab at 
implementing $subject.
My current use case is updating materialized views at the end of the 
transaction so that they appear consistent to the outside.
Updating them on every row changed is far too expensive - so every change is 
logged and depending on the size of the changeset the views are recomputed 
completely or incrementally.
Currently this is hacked up by using deferred constraint triggers - which are 
row level only... (i.e. the first run trigger deletes the whole changelog so 
that all following triggers have nearly nothing to do).
This is neither nice from an architectural point nor from an performance 
angle.

I am sure most of you can think of other use cases.

Opinions? 

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] table corrupted

2009-10-23 Thread João Eugenio Marynowski
2009/10/23 Dimitri Fontaine dfonta...@hi-media.com

 João Eugenio Marynowski joa...@gmail.com writes:
  2009-10-16 16:11:47 BRT 192.168.0.29 ERROR:  invalid page header in
  block 462821 of relation ...

 It could be that the following will prove helpful:
  http://archives.postgresql.org/pgsql-general/2007-07/msg00506.php
  http://pgsql.tapoueh.org/site/html/news/20080410.badblock.html

 Regards,
 --
 dim

I thought this would solve my problems but not yet ...
the block that I found is not presenting error...
 find_bad_block

 (514708,16)
(1 registro)
blockId / blocks_per_chunk = 514708 / 131072 = 3,9269104
base/21058/21271.3
chunk_block_id = block_id % blocks_per_chunk = 514708 % 131072 = 121492
is attached the result of pg_filedump -if -R 121492
/var/lib/postgresql/8.2/main/base/21058/21271.3
Even then I filled the block with zero
But when I run again the find_bad_block it show one block before, ctid
514707, and then, 514706 and I stop...
Any suggestion and explication?

--
João Eugenio


pg_filedump-ctid-514708
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement Level Deferred Triggers

2009-10-23 Thread Robert Haas
On Fri, Oct 23, 2009 at 7:29 PM, Andres Freund and...@anarazel.de wrote:
 Hi all,

 I would like to hear some opinions before starting to take a stab at
 implementing $subject.
 My current use case is updating materialized views at the end of the
 transaction so that they appear consistent to the outside.
 Updating them on every row changed is far too expensive - so every change is
 logged and depending on the size of the changeset the views are recomputed
 completely or incrementally.
 Currently this is hacked up by using deferred constraint triggers - which are
 row level only... (i.e. the first run trigger deletes the whole changelog so
 that all following triggers have nearly nothing to do).
 This is neither nice from an architectural point nor from an performance
 angle.

 I am sure most of you can think of other use cases.

 Opinions?

This same use case has arisen for me in the past, with slightly
different details.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] misleading comments in pgbench

2009-10-23 Thread Jeff Janes
From contrib/pgbench/pgbench.c starting in revision 1.77

 * Note: TPC-B requires at least 100 bytes per row, and the filler
 * fields in these table declarations were intended to comply with that.
 * But because they default to NULLs, they don't actually take any
 * space.  We could fix that by giving them non-null default values.
 * However, that would completely break comparability of pgbench
 * results with prior versions.  Since pgbench has never pretended
 * to be fully TPC-B compliant anyway, we stick with the historical
 * behavior.

The statement about NULLs applies to the other 3 tables, but
pgbench_accounts.filler is loaded (via COPY) as empty strings, not as
NULLs.  When stored into char(84), the empty string takes the full
specified storage.  So on my system, rows in pgbench_accounts take up
about 130 bytes, rather than than the about 40 they would with a NULL.

The behavior probably won't be changed, but the code comment probably
should be.  Sorry if this seems like picayune thing, but it led to a
bit of head scratching until I figured it out.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers