Re: [HACKERS] (WIP) VACUUM REWRITE - CLUSTER by ctid
Itagaki Takahiro wrote: I'm working on alternative version of VACUUM FULL, which is like CLUSTER but sort tuples in ctid order without index. The original discussion is here: [HACKERS] Feedback on getting rid of VACUUM FULL http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php WIP patch attached. I have some questions over the development: 1. Syntax: I choose CLUSTER tbl WITHOUT INDEX for the syntax, but it is debatable. What syntax is the best? VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL? I got the impression that replacing VACUUM FULL is the most popular opinion. I like VACUUM REWRITE myself, except that it would require making REWRITE a reserved keyword. I don't like tacking this onto CLUSTER, particularly not with ORDER BY ctid. ctids are an implementation detail most users don't care about, and ORDER BY sounds like it's sorting something, but it's not. 2. Superclass of HeapScanDesc and IndexScanDesc: We don't have an abstraction layer of HeapScanDesc and IndexScanDesc, but the layer is useful for this purpose. Is it reasonable? It is partially implemented as genam_beginscan() family in the patch. I don't think it's really necessary. You might as well put a few if (indexOid) clauses directly into copy_heap_data. 3. Should we allow ctid as a default clustered index? We could assume ctid as a virtual index. The syntax for it might be ALTER TABLE tbl CLUSTER ON COLUMN ctid or so. Isn't that the same as having no clustering index? We already have ALTER TABLE tbl SET WITHOUT CLUSTER. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
Tom Lane t...@sss.pgh.pa.us writes: Dimitri Fontaine dfonta...@hi-media.com writes: But it will set GET DIAGNOSTIC ... = ROW_COUNT, am I being told on IRC. This has been discussed before, please read archives. The thread I found is pretty content free as far as answering my question goes. http://archives.postgresql.org/pgsql-general/2009-06/msg00208.php I'll go search for more, meantime I'll just add the main goal of this new thread is to have -hackers know there is a real user demand for having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Controlling changes in plpgsql variable resolution
On Tue, 2009-10-20 at 10:32 -0400, Tom Lane wrote: That's only sane if you are 100% certain that there could not be a security issue arising from the change of behavior. Otherwise someone could for instance subvert a security-definer function by running it under the setting it wasn't written for. Personally I am not 100% certain of that. This is pretty much equivalent to the issue of setting search_path for security definer functions. Such settings that affect the semantics of code should be overridden to a know safe value by such functions. Still, making the setting SUSET will be very impractical. The superuser will have to ensure that all current and future functions on the installation conform to the setting that is chosen. This is pretty much impossible. The other choice is to uglify every function with an explicit setting. So in practice the only convenient and robust choice is to leave it as is. The correct solution, IMO, is that security definer functions need to save the set of deemed-security-critical parameters automatically at function creation time. Since we don't do that currently, I'd guess that 50% of security definer functions out there are still unsafe about search_path. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] table corrupted
João Eugenio Marynowski joa...@gmail.com writes: 2009-10-16 16:11:47 BRT 192.168.0.29 ERROR: invalid page header in block 462821 of relation ... It could be that the following will prove helpful: http://archives.postgresql.org/pgsql-general/2007-07/msg00506.php http://pgsql.tapoueh.org/site/html/news/20080410.badblock.html Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Marc Munro wrote: Here is a typical veil secured view definition: create view parties as SELECT party_id, client_id, party_type_id, username, party_name FROM parties.parties WHERE api.user_has_client_or_personal_privilege(client_id, party_id, 'select parties') OR api.user_has_client_privilege(party_id, 'select clients'); A typical query against this would be: select * from parties where party_id = 42; The conditions in the view's where clause cannot generally be indexed. Applying those conditions before the user-supplied conditions would mean that a full-table scan would be required and performance would suck. In fact, this very suckiness also exposes a covert channel in that now we can use the performance of the query to estimate the number of party records. I'm not too worried about that as anyone can do SELECT reltuples FROM pg_class where relname='parties'. An index-scan, however, would let you estimate the number of records with party_id=42, whether or not you have privileges on them, by timing how long the query takes. I don't think we can avoid that. The most acceptable solution I have heard so far for this issue, is to identify those functions which can leak information as 'insecure', and those views which are for security purpose as 'secured'. Then it is simply (hah!) a matter of planning the query of secured views so that all insecure functions are called after all secure functions. In this way, they will only be able to leak what the user is entitled to see, and performance will only be as badly affected as is strictly necessary. The most user-friendly and backwards-compatible (though not necessarily back-patchable) approach I can see is: 1. If the user has read access to all the underlying tables, plan it like we do today. 2. If the view refers only one table (as a typical Veil view does), plan it like we do today but enforce that view conditions are evaluated first in the Filter. Notably, allow using any user-supplied conditions as index quals if there's a matching index. 3. Otherwise fully materialize the view. I expect 1 2 to catch most real-life scenarios. 1 ensures that we don't degrade performance of (potentially very complex) views that are not used for access control, and 2 should apply to most views used for row-level access control. Note that if you have an access-control view that has a join, you can often split it into two access-control views over the underlying tables with another view that joins the two views. Such a construct would be quite well Tom didn't like the idea of performing permission checks at plan time, because they're currently always done at execution time. If we plan a query on the assumption that you have access to the underlying table, I think we could perform permission checks on the underlying tables at runtime to check that the user still has access. It would mean that a previously prepared query might start to fail with permission denied errors if someone revokes your privileges on the tables, but I think we could live with that. Plan invalidation would cover the case where permissions on the tables are changed, but would not cover indirect cases like removing the user from a group that had the required permissions. Whatever we do, we probably should add a caveat in the manual that EXPLAIN or timing attacks can let you deduce some (statistical) information you won't have direct access to. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote: The most user-friendly and backwards-compatible (though not necessarily back-patchable) approach I can see is: 1. If the user has read access to all the underlying tables, plan it like we do today. For me, it would make most sense to explicitly mark Views as being security views. That way planning need only change when we are optimizing a query that accesses a view with plan security enabled. ALTER VIEW foo ENABLE PLAN SECURITY; That is much clearer and easily to verify/audit, so more secure. Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. 2. If the view refers only one table (as a typical Veil view does), plan it like we do today but enforce that view conditions are evaluated first in the Filter. Notably, allow using any user-supplied conditions as index quals if there's a matching index. 3. Otherwise fully materialize the view. So if we join a normal table or a view to a secure view then only the secure view part would be materialized? Or do you mean the whole query would be materialized? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
2009/10/23 Simon Riggs si...@2ndquadrant.com: On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote: The most user-friendly and backwards-compatible (though not necessarily back-patchable) approach I can see is: 1. If the user has read access to all the underlying tables, plan it like we do today. For me, it would make most sense to explicitly mark Views as being security views. That way planning need only change when we are optimizing a query that accesses a view with plan security enabled. ALTER VIEW foo ENABLE PLAN SECURITY; +1 Pavel That is much clearer and easily to verify/audit, so more secure. Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. 2. If the view refers only one table (as a typical Veil view does), plan it like we do today but enforce that view conditions are evaluated first in the Filter. Notably, allow using any user-supplied conditions as index quals if there's a matching index. 3. Otherwise fully materialize the view. So if we join a normal table or a view to a secure view then only the secure view part would be materialized? Or do you mean the whole query would be materialized? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] per table random-page-cost?
Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit : On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote: On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain cedric.villem...@dalibo.com wrote: Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit : On Mon, Oct 19, 2009 at 2:08 PM, marcin mank marcin.m...@gmail.com wrote: Currently random_page_cost is a GUC. I propose that this could be set per-table. Or per-tablespace. Yes, I think there are a class of GUCs which describe the physical attributes of the storage system which should be per-table or per-tablespace. random_page_cost, sequential_page_cost, effective_io_concurrency come to mind. and, perhaps effective_cache_size. You can have situation where you don't want some tables go to OS memory (you can disabled that at filesystem level, ... l'd like to be able to do that at postgres level but it is another point) So you put those tables in a separate tablespace, and tell postgresql that the effective_cache_size is 0 (for this tablespace), up to postgres to do the right thing with that ;) Why would you ever want to set effective_cache_size to 0? I think this is a misunderstanding of how effective_cache_size works. I can't think of any reason to do that. I could see a reason to tell the OS to not throw a relation into cache but that is a different thing. Well the effective_cache_size in this context is OS cache memory (0 in my case) + estimation of shared_buffer.. ah so DBA should estimate the amount in the shared_buffer only, ok. So consider effective_cache_size = 0 + what pg_buffer_cache will tell. My case is a table containing 29 GB of bytea in a database of 52 GB. Every row on the 29GB table is grab only few times. And it will just renew OS cache memory every time (the server have only 8GB of ram). So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. And disk + raid are quick enought to bypass the OS cache memory for this tablespace. Are things a bit clearer and usage not so silly ? Joshua D. Drake ...Robert -- Cédric Villemain Administrateur de Base de Données Cel: +33 (0)6 74 15 56 53 http://dalibo.com - http://dalibo.org signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] Using views for row-level access control is leaky
Simon Riggs wrote: On Fri, 2009-10-23 at 11:30 +0300, Heikki Linnakangas wrote: The most user-friendly and backwards-compatible (though not necessarily back-patchable) approach I can see is: 1. If the user has read access to all the underlying tables, plan it like we do today. For me, it would make most sense to explicitly mark Views as being security views. That way planning need only change when we are optimizing a query that accesses a view with plan security enabled. ALTER VIEW foo ENABLE PLAN SECURITY; That is much clearer and easily to verify/audit, so more secure. The point is that only owner knows intention of the view correctly. It seems to me reasonable to explicitly mark whether the major purpose of view is security, or not. Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. Sorry, what is happen if function is marked as plan security? Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote: Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. Sorry, what is happen if function is marked as plan security? I was suggesting an intelligent default by which we could determine function marking implicitly, if it was not explicitly stated on the CREATE FUNCTION. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Simon Riggs wrote: On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote: Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. Sorry, what is happen if function is marked as plan security? I was suggesting an intelligent default by which we could determine function marking implicitly, if it was not explicitly stated on the CREATE FUNCTION. How to handle a (corner) case when the function owner was changed to non privileged user and its definition is replaced later? Even if someone malicious gives leakage condition on the view, possible leakable infotmation is restricted to where the owner of view can access. So, it seems to me the security mark on views by owner are sufficient. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SE-PgSQL developer documentation (Re: Reworks for Access Control facilities (r2363))
KaiGai Kohei wrote: After the long trial-and-errors, we learned a few approaches which use common entry points for both of DAC and MAC were rocky-path more than what we initially imagined. So, we came back to the original design. It deploys MAC hooks on the strategic points of core routines. On the other hand, people complained about this approach without clear documentation, because most of people are not familiar to both of SELinux and PgSQL. Heikki suggested that a clear developer documentation should be provided to understand pgsql-hackers this new concept. (And, Peter has also suggested before a developer documentation will be a good source of user documentations.) I plan to submit SE-PgSQL/lite patch with developer documentations on the next commit-fest. I can understand what I want to develop and the purpose of codes. However, it may not match with what you want to know. So, I'd like to ask what should be included within the developer documentation at first prior to making a documentation. I plans the developer documentation should be put as a REAME file, not a SGML documentation or a certain wiki page. And I think it should contain the following items. * overview - general overview of SE-PgSQL - introduction of SELinux specific terms (such as security context) * internal architecture - the purpose of sub-components (such as management of security context, caches of access control decision and so on) - differences from similar permissions in DAC * object classes and permissions defined in SELinux model - list of them and when/where they should be checked. * specification of SE-PgSQL hooks (It should be put on the source code comments for easy maintenance.) - what this hook does, what arguments are required, what result will be returned. * code examples - a few examples to add MAC checks within 3rd party modules. Do you have any comments? What should be added to? or removed from? I guess it was too abstract to suggest anything. Anyway, I'll begin to describe the developer documentation base on the chapters. If necessary, we can fix it up later. In addition, I determined the initial patch only covers access controls on the four object classes (database, schema, table and column) to reduce burdens of reviewing. We also can add support for other object classes (such as procedure) later. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Simon Riggs wrote: Also, we should presume that any function created with SECURITY DEFINER and created by a superuser would have plan security, so we don't need to annotate lots of old code to work securely. Annotating the built-in functions is a lot easier. SECURITY DEFINER is an orthogonal aspect. Consider something as innocent-looking as this: CREATE FUNCTION secdeffunc(text) RETURNS boolean AS $$ SELECT $1::integer 10; $$ LANGUAGE SQL SECURITY DEFINER; The text-to-integer cast throws an error and reveals the argument as someone pointed out earlier in this thread. Creating such a function shouldn't open the door to information leaks in views elsewhere. The most useful automatic annotation I can see is to treat functions implementing B-tree operators as safe. I *think* that's safe, anyway. 2. If the view refers only one table (as a typical Veil view does), plan it like we do today but enforce that view conditions are evaluated first in the Filter. Notably, allow using any user-supplied conditions as index quals if there's a matching index. 3. Otherwise fully materialize the view. So if we join a normal table or a view to a secure view then only the secure view part would be materialized? Or do you mean the whole query would be materialized? Just the secure view. Materializing the result of the overall query wouldn't help. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Rod Taylor wrote: This still allow many optimizations to be applied in complex cases. The planner CREATE VIEW phone_number AS SELECT person, phone, company FROM phone_data USING SECURITY FILTER(phone NOT LIKE '6%') JOIN person USING (person_id) JOIN company USING (company_id) AND person.active AND company.active; Well, you can also achieve that by creating two views, one to hide the sensitive data and another to do the join: CREATE VIEW not6_numbers AS SELECT phone FROM phone_data WHERE phone NOT LIKE '6%'; CREATE VIEW phone_number AS SELECT person, phone, company FROM not6_numbers JOIN person USING (person_id) JOIN company USING (company_id) WHERE person.active AND company.active; So I don't think we should invent new syntax for that. The 1st view would be marked with SECURE if we end up using that explicit annotation in CREATE VIEW. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Heikki Linnakangas wrote: The most useful automatic annotation I can see is to treat functions implementing B-tree operators as safe. I *think* that's safe, anyway. Index lookups and single-type comparisons were the only things I could come up with as safe. Unless there is some way to generate an error from geometric ops (overflow or some such). Anything involving a type-cast can obviously be finessed. If you allow arithmetic then you could trigger an overflow or divide-by-zero error. Hmm - you can probably do something evil with non-UTF8 characters if you allow string operations. Would string comparisons be safe (because a literal would be caught before the view gets evaluated)? -- Richard Huxton Archonet Ltd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] per table random-page-cost?
On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain cedric.villem...@dalibo.com wrote: Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit : On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote: On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain cedric.villem...@dalibo.com wrote: Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit : On Mon, Oct 19, 2009 at 2:08 PM, marcin mank marcin.m...@gmail.com wrote: Currently random_page_cost is a GUC. I propose that this could be set per-table. Or per-tablespace. Yes, I think there are a class of GUCs which describe the physical attributes of the storage system which should be per-table or per-tablespace. random_page_cost, sequential_page_cost, effective_io_concurrency come to mind. and, perhaps effective_cache_size. You can have situation where you don't want some tables go to OS memory (you can disabled that at filesystem level, ... l'd like to be able to do that at postgres level but it is another point) So you put those tables in a separate tablespace, and tell postgresql that the effective_cache_size is 0 (for this tablespace), up to postgres to do the right thing with that ;) Why would you ever want to set effective_cache_size to 0? I think this is a misunderstanding of how effective_cache_size works. I can't think of any reason to do that. I could see a reason to tell the OS to not throw a relation into cache but that is a different thing. Well the effective_cache_size in this context is OS cache memory (0 in my case) + estimation of shared_buffer.. ah so DBA should estimate the amount in the shared_buffer only, ok. So consider effective_cache_size = 0 + what pg_buffer_cache will tell. My case is a table containing 29 GB of bytea in a database of 52 GB. Every row on the 29GB table is grab only few times. And it will just renew OS cache memory every time (the server have only 8GB of ram). So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. And disk + raid are quick enought to bypass the OS cache memory for this tablespace. Are things a bit clearer and usage not so silly ? Well, I think you're vastly overestimating the power of effective_cache_size. effective_cache_size changes the planner's estimation of how likely a repeated partial index scan is to find the same block in cache. So it only affects nested-loop-with-inner-indexscan plans, and if effective_cache_size is set to a value larger than the size of the index (or maybe the relation, I'm too lazy to go reread the code right now), one value is as good as another. For a typical user, I think you could set effective_cache_size to, say, a terabyte, and it wouldn't make a bit of difference. Heck, why not 2TB. As far as I can see, the only possible value of setting this knob to a value other than positive-infinity is that if you have a huge dataset that's not close to fitting in memory, this might cause the planner to pick a merge join over a nested loop with inner indexscan, which might be better if it makes the I/O sequential rather than random. Anyone think I'm a pessimist? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] per table random-page-cost?
Le vendredi 23 octobre 2009 14:23:09, Robert Haas a écrit : On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain cedric.villem...@dalibo.com wrote: Le vendredi 23 octobre 2009 01:08:15, Joshua D. Drake a écrit : On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote: On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain cedric.villem...@dalibo.com wrote: Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit : On Mon, Oct 19, 2009 at 2:08 PM, marcin mank marcin.m...@gmail.com wrote: Currently random_page_cost is a GUC. I propose that this could be set per-table. Or per-tablespace. Yes, I think there are a class of GUCs which describe the physical attributes of the storage system which should be per-table or per-tablespace. random_page_cost, sequential_page_cost, effective_io_concurrency come to mind. and, perhaps effective_cache_size. You can have situation where you don't want some tables go to OS memory (you can disabled that at filesystem level, ... l'd like to be able to do that at postgres level but it is another point) So you put those tables in a separate tablespace, and tell postgresql that the effective_cache_size is 0 (for this tablespace), up to postgres to do the right thing with that ;) Why would you ever want to set effective_cache_size to 0? I think this is a misunderstanding of how effective_cache_size works. I can't think of any reason to do that. I could see a reason to tell the OS to not throw a relation into cache but that is a different thing. Well the effective_cache_size in this context is OS cache memory (0 in my case) + estimation of shared_buffer.. ah so DBA should estimate the amount in the shared_buffer only, ok. So consider effective_cache_size = 0 + what pg_buffer_cache will tell. My case is a table containing 29 GB of bytea in a database of 52 GB. Every row on the 29GB table is grab only few times. And it will just renew OS cache memory every time (the server have only 8GB of ram). So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. And disk + raid are quick enought to bypass the OS cache memory for this tablespace. Are things a bit clearer and usage not so silly ? Well, I think you're vastly overestimating the power of effective_cache_size. effective_cache_size changes the planner's estimation of how likely a repeated partial index scan is to find the same block in cache. So it only affects nested-loop-with-inner-indexscan plans, and if effective_cache_size is set to a value larger than the size of the index (or maybe the relation, I'm too lazy to go reread the code right now), one value is as good as another. For a typical user, I think you could set effective_cache_size to, say, a terabyte, and it wouldn't make a bit of difference. Heck, why not 2TB. Ok. I don't care too much on this parameter so. As we were talking about some parameters that can be tablespace specific, I thought this one can have different values too. As far as I can see, the only possible value of setting this knob to a value other than positive-infinity is that if you have a huge dataset that's not close to fitting in memory, this might cause the planner to pick a merge join over a nested loop with inner indexscan, which might be better if it makes the I/O sequential rather than random. Anyone think I'm a pessimist? ...Robert -- Cédric Villemain Administrateur de Base de Données Cel: +33 (0)6 74 15 56 53 http://dalibo.com - http://dalibo.org signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] EvalPlanQual seems a tad broken
I wrote: On further review it seems that a better way to do this is to make things happen inside the EPQ machinery. We need to freeze the rows returned by *all* scan nodes, not only the ones referencing real tables --- for example, a join against a VALUES scan node would still be a problem if we don't lock down the VALUES output, since we could end up getting multiple join rows out. This means we can't assume that there is a CTID associated with every scan node that EPQ needs to lock down. What looks like it would work instead is to pass through the current scan tuple for every scan plan node, not only the ones that are FOR UPDATE targets. I'm tempted to try to move the responsibility for this into execScan.c instead of having all the individual scan plan types know about it. What I had been thinking of when I wrote that was to pass down the ScanTupleSlots from the outer query's scan nodes. That codes up nicely but doesn't work at all :-(. As is obvious in hindsight, the scan nodes are not necessarily still returning the same tuples that contribute to the current join tuple --- for instance if you have a sort-and-mergejoin type of plan, all the scans will be at EOF by the time the top level sees any tuples. So we need to be able to recover the original scan tuples from the join tuple, even for scans that are not to be locked. For real tables this isn't hard, we can pass up the CTID as a junk column the same as we do for tables that are to be locked. It's harder for non-table scans though (VALUES, functions, etc). I can see two conceivably workable alternatives: 1. Pass up the entire row value as a junk whole-row Var. 2. Invent some equivalent to CTID that would allow the row to be recovered later --- for instance, the row number in a tuplestore. One problem with this is that not all those scan types use a tuplestore now, so we'd be adding significant overhead. Also, I'm not entirely sure that it can work for scan nodes that get reset and rescanned repeatedly. Some of them clear and refill the tuplestore when they do that, and the refill isn't necessarily the same row values. Fortunately, this case probably doesn't arise that much in practice, so while it needs to work I doubt that performance is critical. I'm planning to try alternative #1 next, but I wonder if anyone has a better idea? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
Dimitri Fontaine dfonta...@hi-media.com writes: I'll go search for more, meantime I'll just add the main goal of this new thread is to have -hackers know there is a real user demand for having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC. [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote: Sorry, what is happen if function is marked as plan security? I was suggesting an intelligent default by which we could determine function marking implicitly, if it was not explicitly stated on the CREATE FUNCTION. The thought that's been in the back of my mind is that you could solve 99% of the performance problem if you trusted all builtin functions and nothing else. This avoids the question of who gets to mark functions as trustable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dfonta...@hi-media.com writes: I'll go search for more, meantime I'll just add the main goal of this new thread is to have -hackers know there is a real user demand for having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC. [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
On Fri, Oct 23, 2009 at 10:04:29AM -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote: Sorry, what is happen if function is marked as plan security? I was suggesting an intelligent default by which we could determine function marking implicitly, if it was not explicitly stated on the CREATE FUNCTION. The thought that's been in the back of my mind is that you could solve 99% of the performance problem if you trusted all builtin functions and nothing else. This avoids the question of who gets to mark functions as trustable. Great idea! One of the things the security community has learned is that the only way it's even possible to get an information leak rate of zero is to have a system which does nothing at all. It's a fact we need to bear in mind when addressing this or any other issue of access control. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
Robert Haas wrote: On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dfonta...@hi-media.com writes: I'll go search for more, meantime I'll just add the main goal of this new thread is to have -hackers know there is a real user demand for having EXECUTE set FOUND the same way it sets GET DIAGNOSTIC. [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. I agree that doing that is bad practice. I disagree that it's a bug. And if it is, and we change it, then locating all the places where the bug might occur will be a nightmare. In effect it means you'll need to review every single use of FOUND in your code (possibly hundreds of thousands or millions of lines) to make sure you're not accidentally relying on the behaviour. No thanks. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-proposal: type interfaces
Jeff Davis pg...@j-davis.com writes: For instance, one feature that I'm considering now is a temporal join which is a join on overlaps rather than equals, e.g.: SELECT * FROM a, b WHERE a.x b.x; I might try to provide a modified merge join algorithm to implement that more efficiently in some cases. I don't mean to discuss the feature in detail here (I will make a separate proposal) but the algorithm requires that I find the strictly left of operator. Well, actually you need *two* things. The first prerequisite is to know that the operator named has the semantics of overlaps in some generalized sense. The second prerequisite is to identify the associated strictly left of operator. As you say, we've done this in the past using operator classes. That's worked reasonably well because what the existing code needs to know about is equality, ordering, and hashing, and that all matches up nicely with btree and hash opclasses. The way I see it, we have two approaches: 1. Try to make the current system work by standardizing the strategy numbers for GiST somehow, and then use the default GiST operator class, if available. This proposal is a non-starter, because one of the whole points of GIST is that it can support multiple sets of semantics. There is no reason at all to assume that every GIST opclass must include operators having the semantics of overlaps and to-left-of. 2. Invent a new system, perhaps interfaces, perhaps something else. 3. Use extra flags in CREATE OPERATOR somehow The thing that would require the least amount of invention is to use the operator class/family machinery with a new index type. Perhaps a dummy entry in pg_am would be acceptable. (You would probably create just operator families, with no contained classes, since the only point of a class is to identify the minimum support set for an index and there won't be any indexes.) An alternative is to somehow mark those GIST opclasses that do meet the expectation about operator semantics. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
Tom Lane wrote: The thought that's been in the back of my mind is that you could solve 99% of the performance problem if you trusted all builtin functions and nothing else. This avoids the question of who gets to mark functions as trustable. Except that all builtin functions are not trustworthy. set_config and int-text cast are two examples mentioned this far, and I'm sure there's a boatload of others. Trusting only index operators seems more and more attractive to me. That won't limit us to built-in datatypes, requires no explicit user action to categorize functions. They're also the most significant functions from a performance point-of-view, allowing use of indexes instead of forcing a seqscan of all tables. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. This position is nonsense for two reasons: 1. It can hardly be considered a bug that FOUND is set only by the statements that the documentation specifically states are the only ones it is set by. 2. In order to use FOUND *at all* you must assume that it has got some amount of stability. IF FOUND is already assuming that the IF statement didn't reset the flag before evaluating the expression. Lots of other perfectly reasonable constructions assume that FOUND will stay stable across no op statements. Any change here is *not* a bug fix, it is a change of clearly documented and not-obviously-unreasonable behavior. We have to take seriously the likelihood that it will break existing code. If there were a way to flag such breakage I would be happier about changing it; but as Andrew already noted, there doesn't seem to be any way to find affected code except painful manual review. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
Tom Lane t...@sss.pgh.pa.us wrote: Any change here is *not* a bug fix, it is a change of clearly documented and not-obviously-unreasonable behavior. We have to take seriously the likelihood that it will break existing code. Perhaps plpgsql could support tests of SQLSTATE, and recognize '02000' (the standard value for zero rows affected) to support the desired new semantics? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
David Fetter da...@fetter.org wrote: One of the things the security community has learned is that the only way it's even possible to get an information leak rate of zero is to have a system which does nothing at all. It's a fact we need to bear in mind when addressing this or any other issue of access control. And to get all old-school about it, I tend to go with the position put forward by Admiral Grace Hopper[1] when I heard her speak at an ACM meeting here. She said that *any* security could be broken, and that the goal should be to put the cost of creating the breach higher for the perpetrators than the benefits which would accrue to them. That informs my perspective, anyway. So, one of the first questions I ask about an information leak is what good would it do someone to know that? So I don't worry too much about someone knowing the size of my database or the number of rows in a table, or for that matter whether county 12 has a 2009GN000317 case or how many party records have a Social Security Number stored. I care very much that the SSN associated with a person or a document flagged as confidential doesn't leak to unauthorized viewers, because that information could benefit someone who obtains it and harm others. Perspective is more important that purity here. -Kevin [1] http://en.wikipedia.org/wiki/Grace_Hopper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] client_lc_messages
Peter Eisentraut wrote: Note that only glibc supports switching the language at run time. And doing it is probably very expensive if you want to do it twice per message. Ouch :-( I think you could probably get much of the use case out of this if you concentrate on making two switches for the client and the log, which can be set to a language or untranslated, and if you choose a language it has to be the same for both. So we'd go with a single setting to define language, which would be the current lc_messages, and two new settings, say translate_log_messages and translate_client_messages, the latter being USERSET. Does that sound reasonable? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
On Fri, Oct 23, 2009 at 11:07 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: Any change here is *not* a bug fix, it is a change of clearly documented and not-obviously-unreasonable behavior. We have to take seriously the likelihood that it will break existing code. Perhaps plpgsql could support tests of SQLSTATE, and recognize '02000' (the standard value for zero rows affected) to support the desired new semantics? +1 I rarely use found because it's dangerous ...would be nice to have a more rigorous test... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. This position is nonsense for two reasons: 1. It can hardly be considered a bug that FOUND is set only by the statements that the documentation specifically states are the only ones it is set by. OK, it's not a bug: it's a misfeature. :-) 2. In order to use FOUND *at all* you must assume that it has got some amount of stability. IF FOUND is already assuming that the IF statement didn't reset the flag before evaluating the expression. Lots of other perfectly reasonable constructions assume that FOUND will stay stable across no op statements. Sure. I think there's a big difference between assuming that the word IF (or the intervening semicolon and/or whitespace) did not reset FOUND and assuming that it will not be reset by the execution of a dynamic SQL query. The former is necessary for there to be any conceivable way of using FOUND; the latter is assuming that for some reason we want to treat dynamic SQL queries differently than static ones. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] client_lc_messages
Alvaro Herrera alvhe...@commandprompt.com writes: So we'd go with a single setting to define language, which would be the current lc_messages, and two new settings, say translate_log_messages and translate_client_messages, the latter being USERSET. Does that sound reasonable? How do we get to the point where individual users can choose their message language, though? If LC_MESSAGES stays as SUSET then you haven't really made matters better for anybody. With the above infrastructure, we could get there if there were a way to say LC_MESSAGES is USERSET if translate_log_messages is OFF, but there isn't and I doubt it would be a good idea to try to make it work like that. Maybe we should stick to the original design and just document that you'll take a big performance hit if the settings are different and both not C. And of course make sure we avoid the performance hit otherwise. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using views for row-level access control is leaky
On Fri, 2009-10-23 at 10:04 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-10-23 at 19:38 +0900, KaiGai Kohei wrote: Sorry, what is happen if function is marked as plan security? I was suggesting an intelligent default by which we could determine function marking implicitly, if it was not explicitly stated on the CREATE FUNCTION. The thought that's been in the back of my mind is that you could solve 99% of the performance problem if you trusted all builtin functions and nothing else. This avoids the question of who gets to mark functions as trustable. That is a very good default. My experience is that those 1% of cases are responsible for 99% of wasted time, so the ability to specify things for user functions is critical. If we make user extensibility second rate we will force solutions to be second rate also. (e.g. where would PostGIS be without type-specific analyze functions?). -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
2009/10/23 Robert Haas robertmh...@gmail.com: On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. This position is nonsense for two reasons: 1. It can hardly be considered a bug that FOUND is set only by the statements that the documentation specifically states are the only ones it is set by. OK, it's not a bug: it's a misfeature. :-) Isn't this behave shared with PL/SQL? In some environments the dynamic queries are external - so there wasn't possibility to get return state. I afraid so somewhere this feature was extensively used - I dislike this feature too, but I agree with Tom - this is small problem, and it is better do nothing. What about to add new flag to EXECUTE? or create execute function, that returns found like execute('SELECT ' INTO ... USING ...)? it's obscure too. Regards Pavel Stehule 2. In order to use FOUND *at all* you must assume that it has got some amount of stability. IF FOUND is already assuming that the IF statement didn't reset the flag before evaluating the expression. Lots of other perfectly reasonable constructions assume that FOUND will stay stable across no op statements. Sure. I think there's a big difference between assuming that the word IF (or the intervening semicolon and/or whitespace) did not reset FOUND and assuming that it will not be reset by the execution of a dynamic SQL query. The former is necessary for there to be any conceivable way of using FOUND; the latter is assuming that for some reason we want to treat dynamic SQL queries differently than static ones. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql EXECUTE will not set FOUND
On Fri, Oct 23, 2009 at 12:05 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2009/10/23 Robert Haas robertmh...@gmail.com: On Fri, Oct 23, 2009 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 23, 2009 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: [shrug...] There is also real user demand for not silently breaking code that works now, which is what we risk anytime we change the set of statements that can set FOUND. We've had this discussion before and I'm still unpersuaded by your position. I *never* write IF FOUND THEN except immediately after the statement where I expect that variable to be set, and I submit that anyone who who does write code that relies on certain statements not setting FOUND is, IMO, depending on a bug. We don't and shouldn't have a policy of making future PostgreSQL releases bug-compatible with previous releases. This position is nonsense for two reasons: 1. It can hardly be considered a bug that FOUND is set only by the statements that the documentation specifically states are the only ones it is set by. OK, it's not a bug: it's a misfeature. :-) Isn't this behave shared with PL/SQL? In some environments the dynamic queries are external - so there wasn't possibility to get return state. I afraid so somewhere this feature was extensively used - I dislike this feature too, but I agree with Tom - this is small problem, and it is better do nothing. What about to add new flag to EXECUTE? or create execute function, that returns found like execute('SELECT ' INTO ... USING ...)? it's obscure too. Yeah, I mean, if the consensus is that we shouldn't change this, then people will just have to work around it using some other method, like GET DIAGNOSTICS. It's not really worth adding a whole separate way of doing this just to set FOUND. However, it would be worth documenting the workaround, because I can see where the OP was left scratching his head. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] client_lc_messages
2009/10/23 Alvaro Herrera alvhe...@commandprompt.com: Peter Eisentraut wrote: Note that only glibc supports switching the language at run time. And doing it is probably very expensive if you want to do it twice per message. Ouch :-( I think you could probably get much of the use case out of this if you concentrate on making two switches for the client and the log, which can be set to a language or untranslated, and if you choose a language it has to be the same for both. So we'd go with a single setting to define language, which would be the current lc_messages, and two new settings, say translate_log_messages and translate_client_messages, the latter being USERSET. Does that sound reasonable? Does to me. It would certainly cover my use-case, which is that unless your logs are in english, it's really hard to google on anything :-) (I guess it may work for the big languages like spanish, french and german, but that's about it. And even then, you're missing out on a lot of resources) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-proposal: type interfaces
On Fri, Oct 23, 2009 at 7:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: The way I see it, we have two approaches: 1. Try to make the current system work by standardizing the strategy numbers for GiST somehow, and then use the default GiST operator class, if available. This proposal is a non-starter, because one of the whole points of GIST is that it can support multiple sets of semantics. There is no reason at all to assume that every GIST opclass must include operators having the semantics of overlaps and to-left-of. I always thought it was strange that the GIST strategy numbers were completely meaningless. It does seem like assigning meaning to strategy numbers gradually as we learn new interrelated indexable strategies. We would still have a range of values for new non-standard semantics, but at least the common ones would be nailed down. I'm not sure that solves the problem because the default gist operator class isn't necessarily going to be the one with the strategies this needs, though I suppose normally people would want to make the default operator class for each data type one which supports the most strategies. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-proposal: type interfaces
Greg Stark gsst...@mit.edu writes: I always thought it was strange that the GIST strategy numbers were completely meaningless. It does seem like assigning meaning to strategy numbers gradually as we learn new interrelated indexable strategies. We would still have a range of values for new non-standard semantics, but at least the common ones would be nailed down. Well, the problem with that is that GIST strategy numbers are historically an internal implementation detail for any particular opclass, and so trying to standardize them now is going to mean lots of incompatibility with third-party opclasses. I'd feel more comfortable with being able to add some flags to an opclass (or more likely an opfamily) that assert that its strategy numbers agree with some convention or other. Maybe a bitmap so that there's room for multiple future conventions of this kind? For instance it's certainly conceivable that a GIST class could support both btree-compatible and overlaps operators. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] table corrupted
2009/10/23 Robert Haas robertmh...@gmail.com Well, that's not really the problem. Your data is corrupted - increasing the index row size is not going to fix it. I'm not really knowledgeable enough about the guts of the database to know whether there are lower-level tools that could be used to rescue your data. I wonder if you'd have any luck selecting data a few rows at a time (LIMIT 100, say, without ORDER BY). That might at least enable you to get some of the data out of there, if there are some pages that are undamaged. But I'm grasping at straws here. ...Robert I ask about the index row size because I can't re-index the database and I've a server for tests and in this I removed the pk and can't recreate the index because it showing error about size row limit indices. And, only occurs erros when you run a query involving the records damaged. I'm trying to identify them (less of 1% of the total registers).
Re: [HACKERS] pre-proposal: type interfaces
Greg Stark gsst...@mit.edu writes: I'm not sure that solves the problem because the default gist operator class isn't necessarily going to be the one with the strategies this needs, Forgot to mention: I do not think default-ness of opclasses enters into it at all. The meaning of the query is fully defined by the operator that is in it. All we need to know is what are the semantics of that operator. If we can find it in the overlaps position of *any* opclass, we are entitled to suppose that it behaves like overlaps and the associated left-of operator can be used to optimize it. Conceivably we could get different left-of operators out of different opclasses, but if they don't behave effectively the same, the user has messed up the opclasses. As an example, suppose we are trying to implement DISTINCT via sort-and-unique, and we've chosen an appropriate '=' operator that defines what distinct-ness means. That operator might appear in more than one opclass having more than one sort order, but it doesn't matter which one we choose. As long as the opclasses are self-consistent, we should get correct answers with any one. The case where default-ness of opclasses matters is where we are trying to assign specific meaning to some generic construct like DISTINCT or ORDER BY. For instance, it makes sense to require that ORDER BY be interpreted by reference to a default opclass, because otherwise you don't have a way to know which sort ordering the user wants. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-proposal: type interfaces
On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote: Forgot to mention: I do not think default-ness of opclasses enters into it at all. The meaning of the query is fully defined by the operator that is in it. All we need to know is what are the semantics of that operator. If we can find it in the overlaps position of *any* opclass, we are entitled to suppose that it behaves like overlaps and the associated left-of operator can be used to optimize it. Interesting, that sounds we've got a good approach to the problem now. This thread has been useful. Conceivably we could get different left-of operators out of different opclasses, but if they don't behave effectively the same, the user has messed up the opclasses. It would probably be worthwhile to make an attempt to throw a useful error there, but I agree it's not really a problem. The case where default-ness of opclasses matters is where we are trying to assign specific meaning to some generic construct like DISTINCT or ORDER BY. For instance, it makes sense to require that ORDER BY be interpreted by reference to a default opclass, because otherwise you don't have a way to know which sort ordering the user wants. That makes sense. Thanks, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-proposal: type interfaces
Jeff Davis pg...@j-davis.com writes: On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote: Conceivably we could get different left-of operators out of different opclasses, but if they don't behave effectively the same, the user has messed up the opclasses. It would probably be worthwhile to make an attempt to throw a useful error there, but I agree it's not really a problem. Sure, right after we solve the halting problem ;-). The point I was trying to make is that getting different operators is not wrong. It's only wrong if their behavior isn't consistent with what the opclass asserts, and there's no practical way to determine that. We have to trust the opclass maker. (This is one of the main reasons why CREATE OPERATOR CLASS is superuser-only.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] per table random-page-cost?
Cedric, ase is a table containing 29 GB of bytea in a database of 52 GB. Every row on the 29GB table is grab only few times. And it will just renew OS cache memory every time (the server have only 8GB of ram). So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. effective_cache_size doesn't control what gets cached, it just tells the planner about it. Now, if we had an OS which could be convinced to handle caching differently for different physical devices, then I could see wanting this setting to be per-tablespace. For example, it would make a lot of sense not to FS-cache any data which is on a ramdisk or superfast SSD array. The same with archive data which you expected to be slow and infrequently accessed on a NAS device. If your OS can do that, while caching data from other sources, then it would make sense. However, I don't know any current OS which allows for this. Does anyone else? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Statement Level Deferred Triggers
Hi all, I would like to hear some opinions before starting to take a stab at implementing $subject. My current use case is updating materialized views at the end of the transaction so that they appear consistent to the outside. Updating them on every row changed is far too expensive - so every change is logged and depending on the size of the changeset the views are recomputed completely or incrementally. Currently this is hacked up by using deferred constraint triggers - which are row level only... (i.e. the first run trigger deletes the whole changelog so that all following triggers have nearly nothing to do). This is neither nice from an architectural point nor from an performance angle. I am sure most of you can think of other use cases. Opinions? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] table corrupted
2009/10/23 Dimitri Fontaine dfonta...@hi-media.com João Eugenio Marynowski joa...@gmail.com writes: 2009-10-16 16:11:47 BRT 192.168.0.29 ERROR: invalid page header in block 462821 of relation ... It could be that the following will prove helpful: http://archives.postgresql.org/pgsql-general/2007-07/msg00506.php http://pgsql.tapoueh.org/site/html/news/20080410.badblock.html Regards, -- dim I thought this would solve my problems but not yet ... the block that I found is not presenting error... find_bad_block (514708,16) (1 registro) blockId / blocks_per_chunk = 514708 / 131072 = 3,9269104 base/21058/21271.3 chunk_block_id = block_id % blocks_per_chunk = 514708 % 131072 = 121492 is attached the result of pg_filedump -if -R 121492 /var/lib/postgresql/8.2/main/base/21058/21271.3 Even then I filled the block with zero But when I run again the find_bad_block it show one block before, ctid 514707, and then, 514706 and I stop... Any suggestion and explication? -- João Eugenio pg_filedump-ctid-514708 Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement Level Deferred Triggers
On Fri, Oct 23, 2009 at 7:29 PM, Andres Freund and...@anarazel.de wrote: Hi all, I would like to hear some opinions before starting to take a stab at implementing $subject. My current use case is updating materialized views at the end of the transaction so that they appear consistent to the outside. Updating them on every row changed is far too expensive - so every change is logged and depending on the size of the changeset the views are recomputed completely or incrementally. Currently this is hacked up by using deferred constraint triggers - which are row level only... (i.e. the first run trigger deletes the whole changelog so that all following triggers have nearly nothing to do). This is neither nice from an architectural point nor from an performance angle. I am sure most of you can think of other use cases. Opinions? This same use case has arisen for me in the past, with slightly different details. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] misleading comments in pgbench
From contrib/pgbench/pgbench.c starting in revision 1.77 * Note: TPC-B requires at least 100 bytes per row, and the filler * fields in these table declarations were intended to comply with that. * But because they default to NULLs, they don't actually take any * space. We could fix that by giving them non-null default values. * However, that would completely break comparability of pgbench * results with prior versions. Since pgbench has never pretended * to be fully TPC-B compliant anyway, we stick with the historical * behavior. The statement about NULLs applies to the other 3 tables, but pgbench_accounts.filler is loaded (via COPY) as empty strings, not as NULLs. When stored into char(84), the empty string takes the full specified storage. So on my system, rows in pgbench_accounts take up about 130 bytes, rather than than the about 40 they would with a NULL. The behavior probably won't be changed, but the code comment probably should be. Sorry if this seems like picayune thing, but it led to a bit of head scratching until I figured it out. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers