[GENERAL] Re: Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)
Chris Travers, 23.08.2012 05:22:
> The fact that this allows you to create essentially derived values from
> groups of re-used columns is itself remarkable and can be used to
> implement path traversal etc. which is not directly supported in
> PostgreSQL in the sense that it is in Oracle or DB2.

What exactly do you mean by path traversal here? If you are talking about Oracle's CONNECT BY, then this is fully supported in PostgreSQL using a recursive common table expression.

Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
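[For readers unfamiliar with the equivalence Thomas mentions, here is a minimal sketch of CONNECT BY-style traversal using WITH RECURSIVE. The table and column names are illustrative only, not from the thread:]

```sql
-- Hypothetical adjacency-list table:
CREATE TABLE parts (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES parts(id),
    name      text NOT NULL
);

-- Roughly equivalent to Oracle's
--   SELECT ... START WITH parent_id IS NULL CONNECT BY PRIOR id = parent_id
WITH RECURSIVE tree AS (
    SELECT id, parent_id, name, 1 AS depth,
           name AS path
    FROM parts
    WHERE parent_id IS NULL
  UNION ALL
    SELECT p.id, p.parent_id, p.name, t.depth + 1,
           t.path || ' > ' || p.name
    FROM parts p
    JOIN tree t ON p.parent_id = t.id
)
SELECT * FROM tree ORDER BY path;
```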
Re: [GENERAL] What text format is this and can I import it into Postgres?
Mike Christensen, 23.08.2012 02:41:
> Oh, also if anyone knows of a way to export an Access database to
> Postgres, that might be helpful. I don't have a copy of Access.

If you have a Windows box, you can try SQL Workbench/J. Even though it is a Java application, it can connect to an Access database using the JDBC/ODBC bridge. As every Windows box has an ODBC driver for Access built in, you don't need any additional drivers (you do need a Java runtime, obviously). An example connection URL is shown in the manual:

http://www.sql-workbench.net/manual/profiles.html#odbc

On a 64-bit system, setting up the ODBC connection can be tricky, though, as both 32-bit and 64-bit drivers are registered, I believe. The Java runtime has to match the bitness of the ODBC driver in order to work. I have no idea if using JDBC/ODBC would work on a Unix/Linux box.

SQL Workbench can export any database it can connect to, to various output formats (http://www.sql-workbench.net/manual/command-export.html). The text/csv exports can be imported into Postgres.

Regards,
Thomas (I'm the author of SQL Workbench/J)
[GENERAL] At what point does a big table start becoming too big?
I have a table with 40 million rows and haven't had any performance issues yet. Are there any rules of thumb as to when a table starts getting too big? For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?
On Wed, Aug 22, 2012 at 05:56:27PM -0700, Jeff Janes wrote:
> It's wide-ish, too, 98 columns.

How many of the columns are NULL for any given row? Or perhaps better, what is the distribution of values for any given column? For a given column, is there some magic value (NULL, 0, 1, -1, '') which most of the rows have?

In particular, if the data is sparse, as in lots of NULLs, and you don't need to search on those, you might consider partial indexes. If you create partial indexes for only the non-NULL entries, postgres is smart enough to use them when you query for something that is not NULL. Example:

db=# create temp table foo (a int4, b int4);
CREATE TABLE
db=# insert into foo (a) select generate_series(1,10);
INSERT 0 10
db=# update foo set b=1 where a=1;
UPDATE 1
db=# create index bar on foo(b) where b is not null;
CREATE INDEX
db=# explain select * from foo where b=1;
                            QUERY PLAN
------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=4.38..424.59 rows=500 width=8)
   Recheck Cond: (b = 1)
   ->  Bitmap Index Scan on bar  (cost=0.00..4.26 rows=500 width=0)
         Index Cond: (b = 1)
(4 rows)

In this case a row update will only update indexes with non-NULL rows, which may cut the overhead considerably.

Have a nice day,
--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
Re: [GENERAL] At what point does a big table start becoming too big?
> I have a table with 40 million rows and haven't had any performance
> issues yet. Are there any rules of thumb as to when a table starts
> getting too big? For example, maybe if the index size is 6x the amount
> of ram, if the table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level. If the database and server are delivering consistent and acceptable performance levels despite an index being 6x RAM or a table consuming 10% of disk, then I tend to leave it be until an issue is raised.

Cheers
Martin

=
Romax Technology Limited
Rutherford House
Nottingham Science & Technology Park
Nottingham, NG7 2PZ
England
Telephone numbers: +44 (0)115 951 88 00 (main)
For other office locations see: http://www.romaxtech.com/Contact
E-mail: i...@romaxtech.com
Website: www.romaxtech.com
Confidentiality Statement: This transmission is for the addressee only and contains information that is confidential and privileged. Unless you are the named addressee, or authorised to receive it on behalf of the addressee, you may not copy or use it, or disclose it to anyone else. If you have received this transmission in error please delete from your system and contact the sender. Thank you for your cooperation.
=
[GENERAL] Result from Having count
Hello ppl,

I try to make a query and see how many ids have more than one row. A few records:

 ids | val | some
-----+-----+------
 a   | 1   | x
 a   | 1   | v
 b   | 1   | x
 b   | 2   | c

I focus on ids and val with:

SELECT ids, val FROM table WHERE ids = 'a' GROUP BY ids, val HAVING COUNT(ids) > 1;

and the result is:

 ids | val
-----+-----
 a   | 1

Well, in this condition pgsql should not return me a positive result, because in the documentation I read that HAVING COUNT works on the GROUP clause, and when I group these two records based on ids = 'a' they become one row, and my condition is whether the result after grouping is greater than 1. I use postgresql 9.1.4 x64.

Can anyone tell me what I miss?

Regards,
Hristo Simeonov
Re: [GENERAL] Result from Having count
Am 23.08.2012 09:52, schrieb Condor:
> Hello ppl, I try to make a query and see how many ids have more than one
> row. A few records:
>
>  ids | val | some
>  a   | 1   | x
>  a   | 1   | v
>  b   | 1   | x
>  b   | 2   | c
>
> I focus on ids and val with:
>
> SELECT ids, val FROM table WHERE ids = 'a' GROUP BY ids, val HAVING COUNT(ids) > 1;
>
> and the result is:
>
>  ids | val
>  a   | 1
>
> Well, in this condition pgsql should not return me a positive result,
> because in the documentation I read that HAVING COUNT works on the GROUP
> clause, and when I group these two records based on ids = 'a' they
> become one row, and my condition is whether the result after grouping is
> greater than 1. I use postgresql 9.1.4 x64. Can anyone tell me what I
> miss?

Not sure I understand you correctly, but maybe count() will work for you. Maybe you would need some primary key for good values.

cheers,
Frank
Re: [GENERAL] Result from Having count
On , Frank Lanitz wrote:
> Am 23.08.2012 09:52, schrieb Condor:
>> Hello ppl, I try to make a query and see how many ids have more than
>> one row. A few records:
>>
>>  ids | val | some
>>  a   | 1   | x
>>  a   | 1   | v
>>  b   | 1   | x
>>  b   | 2   | c
>>
>> I focus on ids and val with:
>>
>> SELECT ids, val FROM table WHERE ids = 'a' GROUP BY ids, val HAVING COUNT(ids) > 1;
>>
>> and the result is:
>>
>>  ids | val
>>  a   | 1
>>
>> Well, in this condition pgsql should not return me a positive result,
>> because in the documentation I read that HAVING COUNT works on the
>> GROUP clause, and when I group these two records based on ids = 'a'
>> they become one row, and my condition is whether the result after
>> grouping is greater than 1. I use postgresql 9.1.4 x64. Can anyone
>> tell me what I miss?
>
> Not sure I understand you correctly, but maybe count() will work for
> you. Maybe you would need some primary key for good values.
>
> cheers,
> Frank

Sorry for my email; after some thinking I understood my error and changed the query to:

SELECT COUNT(DISTINCT val), ids FROM table WHERE ids = 'a' GROUP BY ids HAVING COUNT(DISTINCT val) > 1;

and it works. Thanks.

Hristo C.
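[For readers following along, here is a worked sketch of why the original query behaved as it did and what the corrected one returns, using the sample rows from the thread. The table name `t` is illustrative:]

```sql
CREATE TABLE t (ids text, val int, some text);
INSERT INTO t VALUES
    ('a', 1, 'x'),
    ('a', 1, 'v'),
    ('b', 1, 'x'),
    ('b', 2, 'c');

-- COUNT(ids) counts *rows* per (ids, val) group. The group ('a', 1)
-- contains two rows, so it passes HAVING COUNT(ids) > 1 even though
-- 'a' has only one distinct val:
SELECT ids, val FROM t GROUP BY ids, val HAVING COUNT(ids) > 1;
-- ids | val
--  a  |  1

-- Counting distinct vals per ids answers the actual question
-- ("which ids have more than one value?"):
SELECT ids, COUNT(DISTINCT val) FROM t GROUP BY ids HAVING COUNT(DISTINCT val) > 1;
-- only 'b' qualifies (vals 1 and 2)
```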
Re: [GENERAL] Result from Having count
Am 23.08.2012 10:45, schrieb Condor:
> On , Frank Lanitz wrote:
>> Am 23.08.2012 09:52, schrieb Condor:
>>> Hello ppl, I try to make a query and see how many ids have more than
>>> one row. A few records:
>>>
>>>  ids | val | some
>>>  a   | 1   | x
>>>  a   | 1   | v
>>>  b   | 1   | x
>>>  b   | 2   | c
>>>
>>> I focus on ids and val with:
>>>
>>> SELECT ids, val FROM table WHERE ids = 'a' GROUP BY ids, val HAVING COUNT(ids) > 1;
>>>
>>> and the result is:
>>>
>>>  ids | val
>>>  a   | 1
>>>
>>> Well, in this condition pgsql should not return me a positive result,
>>> because in the documentation I read that HAVING COUNT works on the
>>> GROUP clause, and when I group these two records based on ids = 'a'
>>> they become one row, and my condition is whether the result after
>>> grouping is greater than 1. I use postgresql 9.1.4 x64. Can anyone
>>> tell me what I miss?
>>
>> Not sure I understand you correctly, but maybe count() will work for
>> you. Maybe you would need some primary key for good values.
>>
>> cheers,
>> Frank
>
> Sorry for my email; after some thinking I understood my error and
> changed the query to:
>
> SELECT COUNT(DISTINCT val), ids FROM table WHERE ids = 'a' GROUP BY ids HAVING COUNT(DISTINCT val) > 1;
>
> and it works.

At least I was wrong in understanding your request. ;) But glad you found a solution.

Cheers,
Frank
Re: [GENERAL] At what point does a big table start becoming too big?
On Wed, Aug 22, 2012 at 4:06 PM, Nick nboutel...@gmail.com wrote:
> I have a table with 40 million rows and haven't had any performance
> issues yet. Are there any rules of thumb as to when a table starts
> getting too big? For example, maybe if the index size is 6x the amount
> of ram, if the table is 10% of total disk space, etc?

Performance on a big table is going to depend very heavily on the sorts of queries executed against it. I don't think you can come up with a rule of thumb of that sort.

Best Wishes,
Chris Travers
[GENERAL] Rules, Windows and ORDER BY
Hello List,

I have a simple table of keys and values which periodically receives updated values. It's desirable to keep older values but, most of the time, we query only for the latest value of a particular key.

CREATE TABLE kv (
    k     bytea NOT NULL,
    at    timestamptz NOT NULL,
    realm bytea NOT NULL,
    v     bytea NOT NULL
);
CREATE INDEX ON kv USING hash(k);
CREATE INDEX ON kv (at);
CREATE INDEX ON kv USING hash(realm);

SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;

It would be nice to encapsulate this common query with a VIEW; for example:

CREATE VIEW kv_new AS
SELECT * FROM kv WHERE at = (
    SELECT at FROM kv AS _
     WHERE _.k = kv.k AND _.realm = kv.realm
     ORDER BY at DESC LIMIT 1
);

I tried window functions, at first, but they were really very slow. This view is pretty sprightly but has a more complicated plan than the original query, which only has a sort followed by an index scan, and is consequently not as fast. Please find the plans below my signature.

Ideally, I'd be able to create a rule where the ORDER BY and LIMIT were simply appended to whatever SELECT was given; but I am at a loss as to how to do that. Creating a VIEW with the order and limit just gives me a table with one row in it (of course).

Is there something better than a sub-select here? I tried using one with max(at) but it's not noticeably faster. I would be interested to see how others have approached this kind of log-structured storage in Postgres. The window functions make, alas, no use of indexes.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

EXPLAIN (COSTS FALSE, FORMAT YAML) SELECT * FROM kv WHERE k = ... AND realm = ... ORDER BY at LIMIT 1;

-[ RECORD 1 ]-
QUERY PLAN:
- Plan:
    Node Type: Limit
    Plans:
      - Node Type: Sort
        Parent Relationship: Outer
        Sort Key:
          - at
        Plans:
          - Node Type: Index Scan
            Parent Relationship: Outer
            Scan Direction: NoMovement
            Index Name: kv_k_idx
            Relation Name: kv
            Alias: kv
            Index Cond: (k = ...)
            Filter: (realm = ...)

EXPLAIN (COSTS FALSE, FORMAT YAML) SELECT * FROM kv_new WHERE k = ... AND realm = ...;

-[ RECORD 1 ]-
QUERY PLAN:
- Plan:
    Node Type: Index Scan
    Scan Direction: NoMovement
    Index Name: kv_k_idx
    Relation Name: kv
    Alias: kv
    Index Cond: (k = ...)
    Filter: ((realm = ...) AND (at = (SubPlan 1)))
    Plans:
      - Node Type: Limit
        Parent Relationship: SubPlan
        Subplan Name: SubPlan 1
        Plans:
          - Node Type: Sort
            Parent Relationship: Outer
            Sort Key:
              - _.at
            Plans:
              - Node Type: Index Scan
                Parent Relationship: Outer
                Scan Direction: NoMovement
                Index Name: kv_k_idx
                Relation Name: kv
                Alias: _
                Index Cond: (k = kv.k)
                Filter: (realm = kv.realm)
Re: [GENERAL] Amazon High I/O instances
Le mercredi 22 août 2012 à 13:30 -0400, Sébastien Lorion a écrit :
> Vincent, I would appreciate that you stop assuming things based on zero
> information about what I am doing. I understand that you are trying to
> be helpful, but I can assure you that going bare-metal only does not
> make any sense in my context.

Dude, I would appreciate you realize that approaching a newsgroup while providing zero information about what you are doing (in your own words) is not the best way to get relevant responses to your question. Ignoring repeated requests for information does not help; castigating people trying to help for not having said information at least shows a certain consistency on your part.

Lest we ridicule ourselves publicly, I suggest we leave the discussion at that and wish you luck in your endeavor.

Vincent Veyron
Re: [GENERAL] Amazon High I/O instances
On Thu, Aug 23, 2012 at 7:39 AM, Vincent Veyron vv.li...@wanadoo.fr wrote:
> Lest we ridicule ourselves publicly, I suggest we leave the discussion
> at that and wish you luck in your endeavor.

If anyone has an answer to his question, I'd appreciate hearing it, despite any faux pas that the OP has committed.
Re: [GENERAL] At what point does a big table start becoming too big?
On 2012-08-22, Nick nboutel...@gmail.com wrote:
> I have a table with 40 million rows and haven't had any performance
> issues yet. Are there any rules of thumb as to when a table starts
> getting too big?

When you need to run a query that needs to fetch too many rows.

> For example, maybe if the index size is 6x the amount of ram, if the
> table is 10% of total disk space, etc?

If you only need one row at a time and you have the indices for it, no size is too big; the larger they are, the more impressive indices are. O(log(n)) beats O(n) more and more as n grows.

--
⚂⚃ 100% natural
Re: [GENERAL] At what point does a big table start becoming too big?
In response to Martin French martin.fre...@romaxtech.com:
>> I have a table with 40 million rows and haven't had any performance
>> issues yet. Are there any rules of thumb as to when a table starts
>> getting too big? For example, maybe if the index size is 6x the amount
>> of ram, if the table is 10% of total disk space, etc?
>
> My rule here is that a table is too big when performance starts
> degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things.

And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, an MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem. Other tricks work as well, such as having the application send an email any time a process takes more than 50% of the allowable maximum time.

The key is to have visibility into what's going on so your guesses are at least informed. People will often point out that no monitoring or trend tracking is 100% accurate, but if it allows you to predict and plan for 90% of the future issues, you'll have that much more time available to deal with the 10% that you don't expect.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Re: [GENERAL] Amazon High I/O instances
Vincent,

The original question can be summed up by: how is the general performance of PostgreSQL on Amazon IOPS? I fail to understand why that would require me to explain the specifics of my application and/or my market. The only one asking for that information is you, while others have provided useful answers, for which I am very grateful.

p.s. My name is not dude or seb; we have not raised the pigs together ...

Sébastien

On Thu, Aug 23, 2012 at 7:39 AM, Vincent Veyron vv.li...@wanadoo.fr wrote:
> Le mercredi 22 août 2012 à 13:30 -0400, Sébastien Lorion a écrit :
>> Vincent, I would appreciate that you stop assuming things based on zero
>> information about what I am doing. I understand that you are trying to
>> be helpful, but I can assure you that going bare-metal only does not
>> make any sense in my context.
>
> Dude, I would appreciate you realize that approaching a newsgroup while
> providing zero information about what you are doing (in your own words)
> is not the best way to get relevant responses to your question. Ignoring
> repeated requests for information does not help; castigating people
> trying to help for not having said information at least shows a certain
> consistency on your part.
>
> Lest we ridicule ourselves publicly, I suggest we leave the discussion
> at that and wish you luck in your endeavor.
>
> Vincent Veyron
Re: [GENERAL] Can column name aliases be supported?
Thanks for all the replies!

The real problem has nothing to do with names. I just used that as a vehicle for articulating the problem.

The view approach is workable, at least for queries. Thanks for the input on that and the idea to replicate the various aliases in the view! I take it that duplication of the aliases as formal table columns, synched up with a before trigger, is a bad idea? Is there any fruit on the end of that branch? (After looking at RULES, I expect only a single dried raisin :-))

Regarding the RULES approach, I've never used RULES before, so please bear with me a bit :-) My stab at this using the base example...

create rule lastnamealias1 on update PEOPLE.lname do instead update PEOPLE.last_name;
create rule lastnamealias2 on update PEOPLE.surname do instead update PEOPLE.last_name;

Does that look correct (given that last_name is the actual column name)?

I too would like to shoot the clients. But as they say, the customer is always right!

Thanks again everyone!

-----Original Message-----
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Chris Angelico
Sent: Thursday, August 23, 2012 1:23 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Can column name aliases be supported?

On Thu, Aug 23, 2012 at 2:19 PM, Craig Ringer ring...@ringerc.id.au wrote:
> On 08/23/2012 11:56 AM, Chris Angelico wrote:
>> Here's an out-of-the-box suggestion. Drop the column altogether and
>> have a single column name. Trying to divide names up never works
>> properly. Does surname mean family name? Not all cultures put the
>> family name last. Is last_name simply the part of the name after the
>> last space?
>
> +1 to that, and it gets way worse:
> http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Yes, that link was posted on python-list a little while back, and that's what I had in mind as I was writing that up. Couldn't remember the actual link though. Thanks!

> And while you're at it, read this:
> http://www.joelonsoftware.com/articles/Unicode.html

Definitely. I disagree with Joel Spolsky on many things, but I agree with that post. These days, Unicode is an absolute necessity. Our PHP-based web site has a number of issues with Unicode input, but at least everything that goes through the database (we use Postgres for everything) is safe.

ChrisA
Re: [GENERAL] At what point does a big table start becoming too big?
On Wed, Aug 22, 2012 at 6:06 PM, Nick nboutel...@gmail.com wrote:
> I have a table with 40 million rows and haven't had any performance
> issues yet. Are there any rules of thumb as to when a table starts
> getting too big? For example, maybe if the index size is 6x the amount
> of ram, if the table is 10% of total disk space, etc?

Well, that begs the question: ...and do what? I guess you probably mean partitioning. Partitioning doesn't reduce index size -- it makes total index size *bigger* since you have to duplicate higher nodes in the index -- unless you can exploit the table structure around the partition so that fewer fields have to be indexed.

Where partitioning helps is by speeding certain classes of bulk operations, like deleting a bunch of rows -- maybe you can set it up so that a partition can be dropped instead for a huge efficiency win. Partitioning also helps by breaking up administrative operations such as vacuum, analyze, cluster, create index, reindex, etc. So I'd argue that it's time to start thinking about plan 'b' when you find yourself getting concerned about performance of those operations.

Partitioning aside, the way to reduce the number of rows you're dealing with is to explore reorganizing your data: classic normalization or use of arrays are a couple of examples of things you can try.

merlin
Re: [GENERAL] Amazon High I/O instances
On 08/23/2012 07:39 PM, Vincent Veyron wrote:
> Le mercredi 22 août 2012 à 13:30 -0400, Sébastien Lorion a écrit :
>> Vincent, I would appreciate that you stop assuming things based on zero
>> information about what I am doing. I understand that you are trying to
>> be helpful, but I can assure you that going bare-metal only does not
>> make any sense in my context.
>
> Dude, I would appreciate you realize that approaching a newsgroup while
> providing zero information about what you are doing

In this case, what he's doing is seeking generalized performance measurements. I don't think details were particularly necessary until it got pulled off-track.

I'll be interested to hear if you have any results Sébastien, or if anyone else does. It's good to have data on the increasingly popular cloud platforms out there.

--
Craig Ringer
Re: [GENERAL] At what point does a big table start becoming too big?
On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure mmonc...@gmail.com wrote:
> On Wed, Aug 22, 2012 at 6:06 PM, Nick nboutel...@gmail.com wrote:
>> I have a table with 40 million rows and haven't had any performance
>> issues yet. Are there any rules of thumb as to when a table starts
>> getting too big? For example, maybe if the index size is 6x the amount
>> of ram, if the table is 10% of total disk space, etc?
>
> Well, that begs the question: ...and do what? I guess you probably mean
> partitioning. Partitioning doesn't reduce index size -- it makes total
> index size *bigger* since you have to duplicate higher nodes in the
> index -- unless you can exploit the table structure around the partition
> so that fewer fields have to be indexed.

Depending on the operation, it may make the effective index size bigger or smaller. For example, if querying only one child table, your effective index size is much smaller. However, if you are worried about that, partial indexes rock :-D

Best Wishes,
Chris Travers
Re: [GENERAL] At what point does a big table start becoming too big?
Chris Travers chris.trav...@gmail.com writes:
> On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure mmonc...@gmail.com wrote:
>> Partitioning doesn't reduce index size -- it makes total index size
>> *bigger* since you have to duplicate higher nodes in the index --
>> unless you can exploit the table structure around the partition so that
>> fewer fields have to be indexed.
>
> Depending on the operation it may make the effective index size bigger
> or smaller. For example if querying only one child table your effective
> index size is much smaller.

I tend to think of it like this: partitioning means *manually* replacing the first level of index search. As such, it is almost never a win for either complexity or performance of simple searches and updates. As Merlin said, pretty much the only compelling reason to do it is if you can match up the partition boundaries with bulk tasks that are common in your application, such as dropping a month's worth of data at a time.

			regards, tom lane
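[The drop-a-month pattern discussed above can be sketched with the inheritance-based partitioning available in the 9.1-era releases this thread concerns. Table and column names are illustrative, not from the thread:]

```sql
CREATE TABLE measurements (
    logged_at timestamptz NOT NULL,
    value     numeric
);

-- One child table per month, with a CHECK constraint so that
-- constraint exclusion lets the planner skip irrelevant partitions:
CREATE TABLE measurements_2012_08 (
    CHECK (logged_at >= '2012-08-01' AND logged_at < '2012-09-01')
) INHERITS (measurements);

-- Removing August's data is then a cheap metadata operation,
-- instead of a bulk DELETE followed by a costly VACUUM:
DROP TABLE measurements_2012_08;
```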
Re: [GENERAL] Can column name aliases be supported?
Updateable views. This is great. I didn't know about these. Absolutely delicious! I found a great example here...

http://vibhorkumar.wordpress.com/2011/10/28/instead-of-trigger/

The problem of a user updating one alias remains, but I have no problem bouncing them if they attempt that. Maybe return a "-E- You cannot update multiple aliases of the same base value" (something like that). I'd just have to detect this circumstance, raise the message and bounce (return OLD). Does this sound doable?

-----Original Message-----
From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, August 23, 2012 9:52 AM
To: Gauthier, Dave
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Can column name aliases be supported?

On 08/23/2012 09:32 PM, Gauthier, Dave wrote:
> The view approach for queries is workable, at least for queries. Thanks
> for the input on that and the idea to replicate the various aliases in
> the view!

The key issue with all your approaches is whether the client can ever `UPDATE` the view. If they can, and you duplicate the column under multiple aliases, what happens when they `UPDATE` two of the aliases to different values?

> I too would like to shoot the clients. But as they say, the customer is
> always right!

Heh, I wouldn't shoot them. Fire the client, I mean: http://sixrevisions.com/project-management/how-to-fire-bad-clients/

--
Craig Ringer
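[The detect-and-bounce approach described above can be sketched with a 9.1 INSTEAD OF trigger. This is one possible shape, not the thread's actual code; the table, view, and function names are all hypothetical, and a real version would want a primary key rather than matching on the name itself:]

```sql
-- Hypothetical base table with one real column and a view exposing aliases:
CREATE TABLE people (last_name text);

CREATE VIEW people_v AS
    SELECT last_name,
           last_name AS lname,
           last_name AS surname
    FROM people;

CREATE FUNCTION people_v_update() RETURNS trigger AS $$
BEGIN
    -- Bounce any update where the aliases disagree (i.e. the client
    -- changed one alias but not the others, or set them to conflicting
    -- values), as Dave proposes above:
    IF NEW.last_name IS DISTINCT FROM NEW.lname
       OR NEW.lname IS DISTINCT FROM NEW.surname THEN
        RAISE EXCEPTION '-E- You cannot update multiple aliases of the same base value';
    END IF;
    UPDATE people SET last_name = NEW.last_name
     WHERE last_name = OLD.last_name;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER people_v_upd
    INSTEAD OF UPDATE ON people_v
    FOR EACH ROW EXECUTE PROCEDURE people_v_update();
```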
Re: [GENERAL] Can column name aliases be supported?
On 08/23/2012 09:32 PM, Gauthier, Dave wrote:
> The view approach for queries is workable, at least for queries. Thanks
> for the input on that and the idea to replicate the various aliases in
> the view!

The key issue with all your approaches is whether the client can ever `UPDATE` the view. If they can, and you duplicate the column under multiple aliases, what happens when they `UPDATE` two of the aliases to different values?

> I too would like to shoot the clients. But as they say, the customer is
> always right!

Heh, I wouldn't shoot them. Fire the client, I mean: http://sixrevisions.com/project-management/how-to-fire-bad-clients/

--
Craig Ringer
[GENERAL] Cannot Run EnterpriseDB Postgresql Installation
Hi,

I just formatted my machine and installed a fresh Win7 x64. I also installed VS2012, since I do .NET development. In the backend I use PostgreSQL, so I downloaded the latest postgresql 9.1.5 installation. But I am not able to install it. Upon executing the file postgresql-9.1.5-1-windows.exe, it asks for UAC and I say yes; then I get a black window and another window named "console" with "(Downloads) 1 %" written in it. I have installed older versions before, and also on many clients, but this is the first time I am encountering this issue. There are also no logs anywhere. Please help, as my development is fully stranded because of this issue.
Re: [GENERAL] Rules, Windows and ORDER BY
Jason Dusek jason.du...@gmail.com writes:
> I have a simple table of keys and values which periodically receives
> updated values. It's desirable to keep older values but, most of the
> time, we query only for the latest value of a particular key.
>
> CREATE TABLE kv (
>     k     bytea NOT NULL,
>     at    timestamptz NOT NULL,
>     realm bytea NOT NULL,
>     v     bytea NOT NULL
> );
> CREATE INDEX ON kv USING hash(k);
> CREATE INDEX ON kv (at);
> CREATE INDEX ON kv USING hash(realm);
>
> SELECT * FROM kv WHERE k = $1 AND realm = $2 ORDER BY at DESC LIMIT 1;

If you want to make that fast, an index on (k, realm, at) would help. Those indexes that you did create are next to useless for this, and furthermore hash indexes are quite unsafe for production.

			regards, tom lane
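[Tom's suggestion can be sketched concretely. The composite index follows his advice directly; the DISTINCT ON view is a commonly used alternative formulation for latest-row-per-key views, added here as an assumption rather than something proposed in the thread:]

```sql
-- Composite b-tree matching the query's equality columns plus the sort column:
CREATE INDEX ON kv (k, realm, at DESC);

-- The latest-value lookup can then be satisfied from this single index:
SELECT * FROM kv
 WHERE k = $1 AND realm = $2
 ORDER BY at DESC
 LIMIT 1;

-- A view over all keys using DISTINCT ON, which keeps the first row of
-- each (k, realm) group under the given ordering, i.e. the newest one:
CREATE VIEW kv_latest AS
    SELECT DISTINCT ON (k, realm) *
    FROM kv
    ORDER BY k, realm, at DESC;
```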
Re: [GENERAL] NULL value comparison
On Aug 22, 2012, at 9:38 PM, Craig Ringer wrote:
> On 08/23/2012 10:32 AM, Michael Sacket wrote:
>> The good news is I now have the proper constraints in place and the app
>> and its 130 tables are working with PostgreSQL in less than a day.
>
> Wow, that's cool, especially without SQL changes. What was the previous
> database? I'm curious now.

The database was OpenBase. I did an ascii backup and used some scripts I created last summer to make some small changes (mostly changing types on the create statements)... then ran it through psql. On the WebObjects side I just had to update the external types in the EOModel to match the PostgreSQL ones. Turned out to be a fairly straightforward process.

In any case, we've been using PostgreSQL for all new development for about 1.5 years and have found it to be quite reliable, quick, and elegant. OpenBase was quick and reliable for us as well; however, due to its limited adoption outside of MacOS X and WebObjects, we were always having to write our own tools and frameworks to work with it.
Re: [GENERAL] At what point does a big table start becoming too big?
I have a table with 40 million rows and haven't had any performance issues yet. Are there any rules of thumb as to when a table starts getting too big? For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc? My rule here is that a table is too big when performance starts degrading beyond an acceptable level. The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things. Apologies, I could/should have phrased that better... My rule here is that a table is too big when performance starts degrading beyond a MEASURABLE level. :) Cheers Martin
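The rule-of-thumb ratios mentioned above (index size vs. RAM, table vs. disk) are easy to keep an eye on with the built-in size functions; a minimal sketch (table name hypothetical; pg_table_size()/pg_indexes_size() need PostgreSQL 9.0 or later):

```sql
-- compare heap and index sizes against available RAM / disk
SELECT relname,
       pg_size_pretty(pg_table_size(oid))   AS heap_size,
       pg_size_pretty(pg_indexes_size(oid)) AS index_size
  FROM pg_class
 WHERE relname = 'my_big_table';
```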
Re: [GENERAL] Performance implications of numeric?
On Tue, Aug 21, 2012 at 11:27 PM, Wells Oliver wellsoli...@gmail.com wrote: We have a lot of tables which store numeric data. These tables all use the numeric type, where the values are 95% integer values. We used numeric because it eliminated the need for casting during division to yield a floating point value. I'm curious as to whether this would have performance and/or disk size implications. Would converting these columns to integer (or double precision on the handful that require the precision) and forcing developers to use explicit casting be worth the time? Thanks for any clarification. Calculations against numeric are several orders of magnitude slower than native binary operations. Fortunately the time the database spends doing these types of calculations is often a tiny fraction of overall execution time and I advise giving numeric a whirl unless you measure a big performance hit. Let's put it this way: native binary types are a performance hack that come with all kinds of weird baggage that percolate up and uglify your code: your example given is a classic case in point. Database integer types are not in fact integers but a physically constrained approximation of them. Floating point types are even worse. Another example: I just found out for the first time (after many years of programming professionally) that -2147483648 / -1 raises a hardware exception: this is exactly the kind of thing that makes me think that rote use of hardware integer types is a terribly bad practice. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
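Merlin's overflow example can be reproduced directly in SQL; the numeric version simply widens past the int4 range, while the int4 version errors out (PostgreSQL traps the condition and raises an error rather than crashing on the hardware exception):

```sql
-- numeric has no fixed width, so the quotient is representable
SELECT (-2147483648)::numeric / -1;

-- int4 cannot represent +2147483648, so this raises
-- "ERROR: integer out of range"
SELECT (-2147483648)::int4 / (-1)::int4;
```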
Re: [GENERAL] Amazon High I/O instances
On Wed, Aug 22, 2012 at 4:12 PM, Alan Hodgson ahodg...@simkin.ca wrote: On Wednesday, August 22, 2012 04:10:01 PM Andrew Hannon wrote: Just looking into High IO instances for a DB deployment. In order to get past 1TB, we are looking at RAID-0. I have heard (http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't supported. Does anyone know if it is and has anyone used RAID-0 on these instances? (Linux of course…) Just use LVM striping. If it turns out to be an issue; that seems to be mostly conjecture. I note that the SSDs are only instance storage. The data will be gone when the instance goes away. I have used instance storage in replicated setups but it always feels rather fragile unless your data really is transient or you can maintain 2 replicas. Their other new service, provisioned IOPS for EBS, might be more useful for a persistent database. Although not nearly SSD speeds, of course. Yeah -- I should have mentioned that: you absolutely must run hs/sr or some other strategy that maintains your data. I guess you might as well turn off fsync, right? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Cannot Run EnterpriseDB Postgresql Installation
Hello, Can we get a screenshot? Also you can check for any partial installation logs in your %TEMP% as install-postgresql.log or bitrock_installer_.log. Check the %TEMP% of the Administrator as well (if you don't see any logs in the %TEMP% of the logged-in user). On Thu, Aug 23, 2012 at 8:01 PM, javad M mjavad...@gmail.com wrote: Hi, i just formatted my machine and installed fresh win7 x64. Also installed VS2012 since i do .net development. In backend i use postgresql so downloaded latest postgresql 9.1.5 installation. But, i am not able to install. Upon executing file postgresql-9.1.5-1-windows.exe it asks for UAC and i say yes, then i get a black window and another window named console with (Downloads) 1 % written in it. I have installed old versions before and also on many clients but this is the first time i am encountering this issue. There are also no logs anywhere. Please help, as my development is fully stranded because of this issue. -- Regards, Sachin Srivastava EnterpriseDB, India
Re: [GENERAL] Amazon High I/O instances
On 08/23/12 6:49 AM, Craig Ringer wrote: In this case, what he's doing is seeking generalized performance measurements. I don't think details were particularly necessary until it got pulled off-track. 42. Performance measurements without a very narrow definition of 'performance' are useless. Depending on the nature of the application workload, postgres can stress completely different aspects of the system (cpu vs read IO performance vs write IO performance being the big three). -- john r pierce  N 37, W 122  santa cruz ca  mid-left coast
[GENERAL] Should I drop / create extensions after upgraging?
Hello all! I'm on Ubuntu 11.10, using 9.1.4. There was a new package for 9.1.5, and I just finished my upgrade. I make use of the uuid-ossp and tablefunc extensions. What is the expected upgrade procedure for these extensions? A drop followed by a create? Or is there something else less invasive? I can't find any references to upgrades on http://www.postgresql.org/docs/9.1/static/sql-createextension.html, nor on http://wiki.postgresql.org/wiki/ExtensionPackaging. Where should I have looked to find a reference to upgrading? Thanks! François Beausoleil -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Should I drop / create extensions after upgraging?
On 23 Srpen 2012, 18:19, François Beausoleil wrote: Hello all! I'm on Ubuntu 11.10, using 9.1.4. There was a new package for 9.1.5, and I just finished my upgrade. I make use of the uuid-ossp and tablefunc extensions. What is the expected upgrade procedure for these extensions? A drop followed by a create? Or is there something else less invasive? I can't find any references to upgrades on http://www.postgresql.org/docs/9.1/static/sql-createextension.html, nor on http://wiki.postgresql.org/wiki/ExtensionPackaging. Where should I have looked to find a reference to upgrading? The minor updates (e.g. from 9.1.4 to 9.1.5) are drop-in binary replacements. You don't need to do anything special about the extensions; it just works (unless there's something really broken). Tomas
Re: [GENERAL] Should I drop / create extensions after upgraging?
Tomas Vondra t...@fuzzy.cz writes: On 23 Srpen 2012, 18:19, François Beausoleil wrote: I'm on Ubuntu 11.10, using 9.1.4. There was a new package for 9.1.5, and I just finished my upgrade. I make use of the uuid-ossp and tablefunc extensions. What is the expected upgrade procedure for these extensions? A drop followed by a create? Or is there something else less invasive? The minor updates (e.g. from 9.1.4 to 9.1.5) are a drop-in binary replacements. You don't need to do anything special about the extensions, it just works (unless there's something really broken). Or perhaps more to the point: if there were something special you had to do, the release notes for 9.1.5 would have told you about it. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Cannot Run EnterpriseDB Postgresql Installation
One more thing i forgot to add The console - help - about window reads Tcl for Windows Tcl 8.5.9 Tk 8.5.9 So i searched and tried downloading and running the latest Tcl available from http://www.activestate.com/activetcl And what do you know, the same thing as in the screenshot running that tcl setup Maybe some issue with the tcl in the postgresql setup I tried older postgresql setup version like 9.0.9, 9.1.4, x64 setups but same issue. For some reason the tcl component is creating some issue, Javad On Thu, Aug 23, 2012 at 11:32 PM, javad M mjavad...@gmail.com wrote: i have attached the screenshot. I searched everywhere for any log file. but nothing. On Thu, Aug 23, 2012 at 9:10 PM, Sachin Srivastava sachin.srivast...@enterprisedb.com wrote: Hello, Can we get a screenshot? Also you can check for any partial installation logs in your %TEMP% as install-postgresql.log or bitrock_installer_.log. Check the %TEMP% of the Administrator as well (If you dont see any logs in the %TEMP% of the logged in user) On Thu, Aug 23, 2012 at 8:01 PM, javad M mjavad...@gmail.com wrote: Hi, i just formatted my machine and installed fresh win7 x64. Also installed VS2012 since i do .net developement. In backend i use postgresql so downloaded latest postgresql 9.1.5 installation. But, i am not able to install. Upon executing file postgresql-9.1.5-1-windows.exe it asks for UAC and i say yes then i get a black window and another window named console with (Downloads) 1 % written in it. I have installed old version before and also on many clients but this is first time i am encountering this issue. There are also no logs anywhere. Please help, as my development is fully stranded because of this issue. -- Regards, Sachin Srivastava EnterpriseDB, India
[GENERAL] Confirming \timing output
With \timing set on, I run an update statement and it reports Time: 0.524 ms Is that really 0.524 ms? As in 524 nanoseconds? Also, is this wallclock time or some sort of indication of how much cpu it took? Thanks for any answers !
Re: [GENERAL] Amazon High I/O instances
I think both kinds of tests (general and app specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expense of doing the specific ones? Setting up a meaningful application test can take a lot of time and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks allow you to isolate the layers somewhat and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial. Sébastien On Thu, Aug 23, 2012 at 11:51 AM, John R Pierce pie...@hogranch.com wrote: On 08/23/12 6:49 AM, Craig Ringer wrote: In this case, what he's doing is seeking generalized performance measurements. I don't think details were particularly necessary until it got pulled off-track. 42. Performance measurements without a very narrow definition of 'performance' are useless. Depending on the nature of the application workload, postgres can stress completely different aspects of the system (cpu vs read IO performance vs write IO performance being the big three). -- john r pierce  N 37, W 122  santa cruz ca  mid-left coast
Re: [GENERAL] Confirming \timing output
On Aug 23, 2012, at 11:13 AM, Gauthier, Dave dave.gauth...@intel.com wrote: With \timing set on, I run an update statement and it reports Time: 0.524 ms Is that really 0.524 ms? As in 524 nanoseconds? 0.524ms = 524000ns Perhaps you meant microseconds? 0.524ms = 524us If all your data happens to be in RAM cache, simple queries can execute very fast! Unless you have a reason to believe it's wrong, I would trust it to be accurate :-) Also, is this wallclock time or some sort of indication of how much cpu it took? Thanks for any answers ! \timing measures wall time. There's a more detailed discussion of the difference between this and e.g. EXPLAIN ANALYZE here: http://postgresql.1045698.n5.nabble.com/What-does-timing-measure-td4289329.html -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Amazon High I/O instances
On 08/23/12 11:24 AM, Sébastien Lorion wrote: I think both kinds of tests (general and app specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expense of doing the specific ones? Setting up a meaningful application test can take a lot of time and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks allow you to isolate the layers somewhat and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial. I don't use AWS at all. But it shouldn't take more than a couple hours to spin up an instance, populate a pgbench database and run a series of pgbench runs against it, and do the same against any other sort of system you wish to use as your reference. I like to test with a database about twice the size of the available memory if I'm testing IO. I've found that with pgbench -i -s <scale>, a scale of 10000 generates a 1 billion row table and uses about 150GB (and an hour or so to initialize on fast IO hardware). I then run pgbench with -c of about 2-4X the cpu/thread count, -j of about -c/16, and a -t of at least 10000 (so each client connection runs 10000 transactions). On a modest but decent 2U class 2-socket dedicated server with a decent raid card and raid10 across enough spindles, I can see numbers as high as 5000 transactions/second with 15krpm rust, and 7000-8000 with a couple MLC SSDs striped. Trying to raid10 a bunch of SATA 7200 disks gives numbers more like 1000. Using host based raid, without a write-back cache in the raid card, gives numbers about 1/2 the above. The IOPS during these tests hit around 12000 or 15000 small writes/second. Doing this level of IO on a midsized SAN can often cause the SAN CPU to run at 80%+ so if there's other activity on the SAN from other hosts, good luck.
In a heavily virtualized shared-everything environment, I'm guessing your numbers will be all over the place and consistency will be difficult to achieve. -- john r pierce  N 37, W 122  santa cruz ca  mid-left coast
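John's recipe, condensed into commands (the scale and client counts are illustrative, sized for a machine where ~150GB is roughly twice RAM; the database name "bench" is a placeholder):

```shell
# initialize: scale 10000 ~= 1 billion pgbench_accounts rows, ~150GB on disk
pgbench -i -s 10000 bench

# run: -c at 2-4x the cpu/thread count, -j about -c/16, and -t large
# enough that each client connection runs a meaningful number of transactions
pgbench -c 64 -j 4 -t 10000 bench
```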
Re: [GENERAL] Confirming \timing output
On 08/23/12 11:13 AM, Gauthier, Dave wrote: Time: 0.524 ms Is that really 0.524 ms? As in 524 nanoseconds? 0.524 MILLIseconds, as in 524 microseconds. Microseconds is commonly abbreviated us. AFAIK, it's elapsed time, not CPU time. -- john r pierce  N 37, W 122  santa cruz ca  mid-left coast
[GENERAL] Windows SIngle Sign On - LINUX Server
Hi All, We are currently running PostgreSQL 8.4 on Windows Server 2003 and are planning to move the instance to Ubuntu 10.4 - yay! At the same time we will also upgrade to 9.1. One nice feature that we leverage from the windows configuration is the ability for windows clients to use AD SSO, i.e. SSPI. This was really easy to set up and configure. If we move to Linux I was wondering if anyone could provide a howto reference or some tips on how to set up auth configuration to provide SSO to windows clients? I've read the GSSAPI/Kerberos authentication section of the docs, but it's still unclear to me the exact process to follow. In particular, what things would I need to configure on the Ubuntu server and which tasks will need to be done on the Windows domain controller (which is managed by a third party service provider who knows little about PostgreSQL or Ubuntu). We are using a Windows Server 2008 for the domain control. However I know little about its setup or configuration; I only know it's our windows domain realm. Regards, Jeremy
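For what it's worth, the moving parts usually look something like the sketch below. The realm, paths and address range are placeholders, and the keytab has to be produced on the AD side (e.g. by creating a service account and exporting its keytab with ktpass), so treat this as an outline rather than a recipe:

```
# On the Windows DC: create an AD account for the service, map the SPN
#   POSTGRES/server.example.com@EXAMPLE.COM
# onto it, and export a keytab for the Ubuntu server.

# postgresql.conf on the Ubuntu server:
krb_server_keyfile = '/etc/postgresql/9.1/main/postgres.keytab'

# pg_hba.conf: authenticate domain clients via GSSAPI
host  all  all  10.0.0.0/8  gss  include_realm=0  krb_realm=EXAMPLE.COM
```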
Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL
On Tue, Aug 21, 2012 at 2:56 AM, Craig Ringer ring...@ringerc.id.au wrote: On 08/21/2012 03:01 PM, Martijn van Oosterhout wrote: Well, Postgres in principle supports arrays of records, so I've wondered if a relationship join could stuff all the objects in a single field of the response using an aggregate. I think what's always prevented this from working is that client would have to parse the resulting output text output, which is practically impossible in the face of custom types. That's where the new JSON support is interesting; it provides a much more commonly understood and easier to parse structured form for results, so trees (but not more general graphs) can be returned. I'd go beyond 'interesting' and call it a complete game changer if you are consuming data in a language that has good support for JSON (especially javascript naturally). Another good option if you're consuming structured data in C is libpqtypes. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
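As a sketch of why the 9.2 JSON support matters here (table and column names hypothetical): a parent row with its detail rows can come back as one well-defined value any JSON library can parse, instead of a text-rendered array of custom record types the client would have to deparse itself.

```sql
-- one row per order, detail lines nested as a JSON array
SELECT row_to_json(o) AS order_doc
  FROM (
    SELECT ord.id,
           ord.customer,
           (SELECT array_agg(l)
              FROM order_lines l
             WHERE l.order_id = ord.id) AS lines
      FROM orders ord
  ) o;
```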
Re: [GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)
On Wed, Aug 22, 2012 at 10:22 PM, Chris Travers chris.trav...@gmail.com wrote: I have now been working with table inheritance for a while and after starting to grapple with many of the use cases it has have become increasingly impressed with this feature. I also think that some of the apparent limitations fundamentally follow from the support for multiple inheritance, and multiple inheritance itself is so useful I would not want to see this go away. Inheritance really starts to come to its own once you start using table methods, and some features that are useful in some sorts of inheritance modelling are useless in others. The problem with postgres table inheritance is that it doesn't really solve the problem that people wanted solved: to be able to define an set of specific extra attributes for each row depending on some characteristic of that row. The feature only tantalizingly masquerades as such. Until it found use in table partitioning, I found the inheritance feature to be basically useless. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problems with timestamp with time zone and old dates?
HI Scott. Thanks a lot for the feedback. I ended up setting the client time zone to GMT on my connections, and that has fixed the problem for us. It's actually an awesome solution, we can now expect all timestamps to be returned in a consistent fashion. Thanks for prodding me on that and sending me down that road! Michael. On Wed, Aug 22, 2012 at 12:19 AM, Scott Marlowe scott.marl...@gmail.com wrote: On Tue, Aug 21, 2012 at 10:12 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Tue, Aug 21, 2012 at 10:08 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Tue, Aug 21, 2012 at 3:29 PM, Michael Clark codingni...@gmail.com wrote: For example, if I insert like so: INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00'); What's the reason for you inserting with an offset instead of letting the client timezone set it for you? Just wondering. Note that if you just want to get out what you're putting in (GMT) you can do this: select startdate at time zone 'GMT' from sometable ; timezone - 1750-08-21 21:17:00 Or you could just use plain timestamps, not timestamps with time zone.
Re: [GENERAL] Problems with timestamp with time zone and old dates?
On Wed, Aug 22, 2012 at 5:00 AM, hubert depesz lubaczewski dep...@depesz.com wrote: On Tue, Aug 21, 2012 at 05:29:14PM -0400, Michael Clark wrote: For example, if I insert like so: INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00'); I get the following when I select: SELECT startdate FROM sometable; startdate -- 1750-08-21 15:59:28-05:17:32 (1 row) It's impossible to tell without knowing what is your time zone, but I don't see anything particularly strange about it. Non-integer offsets do happen, and in the past there were more commonly used. Based on what I can see in timezone data, it looks that time zone America/Toronto used this offset until 1895. Ah, I see. That makes sense. Thanks for the info! It's appreciated! Michael.
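The thread's numbers can be checked directly: the odd -05:17:32 is America/Toronto's pre-1895 local mean time, and it disappears once the client time zone is something unambiguous:

```sql
SET timezone = 'America/Toronto';
SELECT timestamptz '1750-08-21 21:17:00+00';
-- 1750-08-21 15:59:28-05:17:32   (as in the original report)

SET timezone = 'GMT';
SELECT timestamptz '1750-08-21 21:17:00+00';
-- 1750-08-21 21:17:00+00
```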
[GENERAL] need substring based on delimiter
Hi: I want to create a string from the first 3 elements of a csv (for example). The csv is longer than 3 elements. Example... aaa,bbb,ccc,ddd,eee,fff,ggg I want the string aaa,bbb,ccc. Tried splitting this to an array (precursor to appending elements 1,2,3), but failed to be able to reference the elements returned... stdb_a0=# select string_to_array('aaa,bbb,ccc,ddd,eee,fff,ggg',','); string_to_array --- {aaa,bbb,ccc,ddd,eee,fff,ggg} (1 row) stdb_a0=# select string_to_array('aaa,bbb,ccc,ddd,eee,fff,ggg',',')[1]; ERROR: syntax error at or near [ If there were a way position could return the pos of the 3rd comma, I could use substring. But I don't think it can :-(
Re: [GENERAL] Statistical aggregates with intervals
Thomas, On Wed, Aug 22, 2012 at 12:25 PM, Thomas Munro mu...@ip9.org wrote: I noticed that 'avg' works on 'interval', but 'stddev' and 'variance' don't: I don't know why, but you could convert 'interval' into something else where all the functions work: CREATE OR REPLACE FUNCTION interval_to_seconds(interval) RETURNS double precision AS $$ SELECT (extract(days from $1) * 86400) + (extract(hours from $1) * 3600) + (extract(minutes from $1) * 60) + extract(seconds from $1); $$ LANGUAGE SQL; Cheers, Chris -- Christopher Swingley Fairbanks, Alaska http://swingleydev.com/ cswin...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] need substring based on delimiter
On 08/23/2012 02:30 PM, Gauthier, Dave wrote: I want to create a string from the first 3 elements of a csv (for example). The csv is longer than 3 elements. Example... aaa,bbb,ccc,ddd,eee,fff,ggg I want the string aaa,bbb,ccc. select array_to_string ( ( string_to_array ( 'aaa,bbb,ccc,ddd,eee,fff,ggg',',' ) )[1:3],',' ); array_to_string - aaa,bbb,ccc (1 row) HTH, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Statistical aggregates with intervals
Hi, On 24 August 2012 07:39, Christopher Swingley cswin...@gmail.com wrote: I don't know why, but you could convert 'interval' into something else where all the functions work: CREATE OR REPLACE FUNCTION interval_to_seconds(interval) RETURNS double precision AS $$ SELECT (extract(days from $1) * 86400) + (extract(hours from $1) * 3600) + (extract(minutes from $1) * 60) + extract(seconds from $1); $$ LANGUAGE SQL; Looks complicated. You can extract 'epoch': db=# select now() - (now() - interval '1 day'); ?column? -- 1 day (1 row) db=# select extract(epoch from (now() - (now() - interval '1 day'))); date_part --- 86400 (1 row) -- Ondrej Ivanic (ondrej.iva...@gmail.com) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
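Either conversion makes the missing aggregates usable; for example (values illustrative), taking stddev over epoch seconds and multiplying back into an interval:

```sql
-- stddev/variance don't accept interval directly, but do accept the
-- epoch-seconds representation; interval * double turns it back
SELECT stddev(extract(epoch FROM d)) * interval '1 second' AS stddev_interval
  FROM (VALUES (interval '1 hour'),
               (interval '2 hours'),
               (interval '4 hours')) t(d);
```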
Re: [GENERAL] At what point does a big table start becoming too big?
On Wed, Aug 22, 2012 at 4:06 PM, Nick nboutel...@gmail.com wrote: I have a table with 40 million rows and haven't had any performance issues yet. Are there any rules of thumb as to when a table starts getting too big? No. Assuming you decided it were too big, what could you do about it? If there are chunks of data that you don't need anymore, why wait for the table to become too big before removing it? And partitioning very often isn't the answer, either. There are very few problems that ill-conceived partitioning won't make worse. And there are very many problems which even the best-conceived partitioning will fail to improve. If you have one of the cases where partitioning is a good solution, don't wait for the table to become 'too big'. Just go do it. For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc? If you need to be able to rapidly insert new rows in bulk, and you can't sort them before inserting because there are multiple indexes with completely different sort order and they cover the entire key range of at least some of the indexes, then your performance will collapse long before you get to 6x the amount of RAM. But, what can you do about it? Maybe partitioning will fix this, maybe it won't. If it will, why wait for a rule of thumb to be met? If it won't, what do you actually do once the rule of thumb is met? I guess one rule of thumb I would have is, if for some reason I had to cluster or reindex the table, how long would it take to do so? If that is much longer than I can reasonably schedule as a maintenance window, I would be worried. Cheers, Jeff
Re: [GENERAL] At what point does a big table start becoming too big?
On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran wmo...@potentialtech.com wrote: In response to Martin French martin.fre...@romaxtech.com: I have a table with 40 million rows and haven't had any performance issues yet. Are there any rules of thumb as to when a table starts getting too big? For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc? My rule here is that a table is too big when performance starts degrading beyond an acceptable level. The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things. Yes, I agree with that. And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, a MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem. This seems inconsistent with your previous advice. By the time your experimental query shows a problem, you no longer have any maintenance windows left large enough to fix it. Unless your experimental query was a reindex or something non-production like that, in which case running it on a production server, even off-hours, doesn't seem like a good idea. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] FETCH in subqueries or CTEs
Hi all I've noticed that FETCH doesn't seem to be supported in subqueries or in CTEs. Is there a specific reason for that, beyond nobody's needed it and implemented it? I'm not complaining at all, merely curious. A toy example: DECLARE somecursor CURSOR FOR SELECT generate_series(1,1000) SELECT * FROM ( FETCH ALL FROM somecursor ) x; produces: ERROR: syntax error at or near FETCH LINE 1: SELECT * FROM ( FETCH ALL FROM somecursor ) x; ^ Same with a CTE: WITH x AS ( FETCH ALL FROM somecursor ) SELECT * FROM x; ERROR: syntax error at or near FETCH LINE 1: WITH x AS ( FETCH ALL FROM somecursor ) SELECT * FROM x; -- Craig Ringer -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
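Not an answer to the "why", but the usual workaround is to drain the cursor inside a set-returning PL/pgSQL function, which can appear in FROM; a minimal sketch for an integer-producing cursor (the function name is mine):

```sql
CREATE OR REPLACE FUNCTION drain_cursor(cur refcursor)
RETURNS SETOF integer AS $$
DECLARE
    v integer;
BEGIN
    LOOP
        FETCH cur INTO v;        -- FETCH is fine inside PL/pgSQL
        EXIT WHEN NOT FOUND;
        RETURN NEXT v;
    END LOOP;
END
$$ LANGUAGE plpgsql;

-- usable where a bare FETCH is not:
-- SELECT * FROM drain_cursor('somecursor') x;
```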
Re: [GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)
On Thu, Aug 23, 2012 at 12:36 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Aug 22, 2012 at 10:22 PM, Chris Travers chris.trav...@gmail.com wrote: I have now been working with table inheritance for a while and after starting to grapple with many of the use cases it has have become increasingly impressed with this feature. I also think that some of the apparent limitations fundamentally follow from the support for multiple inheritance, and multiple inheritance itself is so useful I would not want to see this go away. Inheritance really starts to come to its own once you start using table methods, and some features that are useful in some sorts of inheritance modelling are useless in others. The problem with postgres table inheritance is that it doesn't really solve the problem that people wanted solved: to be able to define an set of specific extra attributes for each row depending on some characteristic of that row. The feature only tantalizingly masquerades as such. This is true to some extent. I may have found a solution there, which is to build your inheritance solutions on top of the solutions used for table partitioning. It's one reason why I say you should start working with it on table partitioning before you try to do set/subset modelling elsewhere. Interestingly one appealing solution doesn't really work (which is to put a check constraint which checks the tableoid column, presumably because this isn't set on insert until after the check constraint fires). One thing I have found looking through Oracle and DB2 docs is that their table inheritance seems to have all the same problems as ours and their solutions to these problems seem rather broken from a pure relational perspective. For example, Oracle and DB2 make extensive use of OID's here (which must be recorded in some sort of system catalog somewhere given what they do with them), and they have functions to take a reference to a row and operators to dereference the row. 
This all sounds well and good until you come across the IS DANGLING operator, which returns true when the reference is no longer there... In other words, as far as I can see, nobody else has come up with a sane foreign key solution for inherited tables either.

Until it found use in table partitioning, I found the inheritance feature to be basically useless. I think one can actually borrow techniques from table partitioning to solve the problems associated with inheritance. However, here's what turned me around on table inheritance:

1) First, in LedgerSMB, we started using it to create consistent interfaces to sets of storage tables. The storage tables would behave differently, but would essentially inherit interfaces from their parents. In this regard, you can think of an inheritance tree as a partitioned table set, but where the partitioning is necessary because foreign key fields reference different tables in different children. We use this, for example, to avoid having to have a global notes table or global file attachments table, and it gives us clear control over where these can be attached, along with central maintenance of data structures. In cases like file attachments, where foreign keys to inheritance trees ended up being needed, we started out with a more complex but workable solution, but I think we are going to a simpler one. This is a good thing. In essence, what we did was use inheritance to give us variable target tables for a foreign key column. I would still like to see inheritable foreign key constraints because that would make some things a lot easier, but the fact that foreign keys are not, by default, copied in means that you can override the destination in the child table. It isn't the documented use, but it actually works very well.

2) Secondly, I re-read Stonebraker's Object-Relational Database: The Next Wave and had a rather sudden epiphany.
Relational databases are about modelling your data so you can ensure consistency and gain as many answers as you can. Object-relational modelling adds interfaces (possibly written in arbitrary programming languages) to derive additional information from stored information. The example he gives could be summarized in English as "Give me all pictures of sunsets taken within 20 miles of Sacramento", where whether a picture is of a sunset is determined by analyzing the graphic itself. Thus you have to add features that allow you to plug into the query to answer that question, and you have to have a planner capable of optimizing such a query. I also read some other papers which discussed table inheritance and what sort of modelling problems it was designed to solve (the main one is actually part/whole modelling, where a row may be a whole in itself and also a part of another whole --- for example, we might sell timing belts, but they might also come included in an engine assembly).

3) I was talking with Matt Trout regarding object-oriented programming in Perl, and
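A minimal sketch of the LedgerSMB-style pattern from point 1 above (all table names here are invented for illustration): the parent supplies the shared "interface" columns, and each child supplies its own foreign key target, which works precisely because FOREIGN KEY constraints are not copied to children by default.

```sql
-- Shared interface for notes; each child decides what ref_key points at.
CREATE TABLE note (
    id      serial PRIMARY KEY,
    subject text NOT NULL,
    body    text NOT NULL,
    ref_key integer NOT NULL
);

-- Hypothetical invoice/customer tables are assumed to exist.
CREATE TABLE invoice_note (
    FOREIGN KEY (ref_key) REFERENCES invoice (id)
) INHERITS (note);

CREATE TABLE customer_note (
    FOREIGN KEY (ref_key) REFERENCES customer (id)
) INHERITS (note);

-- Querying the parent still sees the rows of every child:
-- SELECT subject, body FROM note;
```

Each child enforces its own referential integrity while the parent gives one central place to maintain the data structure and to query all notes at once.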
Re: [GENERAL] FETCH in subqueries or CTEs
On Fri, 2012-08-24 at 09:35 +0800, Craig Ringer wrote:

Hi all. I've noticed that FETCH doesn't seem to be supported in subqueries or in CTEs. Is there a specific reason for that, beyond nobody's needed it and implemented it? I'm not complaining at all, merely curious.

1. Cursors have their own snapshot, so it would be kind of like looking at two snapshots of data at the same time. That would be a little strange.

2. For regular subqueries, it would also be potentially non-deterministic, because the FETCH operation has the side effect of advancing the cursor. So, if you had something like SELECT * FROM (FETCH 1 FROM mycursor) x WHERE FALSE, it's not clear whether the FETCH would execute or not. After the query, it may or may not have advanced the cursor, depending on whether the optimizer decided it didn't need to compute the subquery.

3. Cursors are really meant for more effective interaction with the client; a cursor is not really meant as an operator (and it doesn't change the results, anyway). You can already do LIMIT/OFFSET in a subquery if you need that kind of thing.

All that being said, there may be some use case for something like what you are describing, if you get creative.

Regards, Jeff Davis

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
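The LIMIT/OFFSET alternative Jeff mentions in point 3 might look like this minimal sketch:

```sql
-- Grab a slice of rows inside a subquery instead of FETCHing from a cursor:
SELECT x.n
FROM (
    SELECT n
    FROM generate_series(1, 1000) AS s(n)
    ORDER BY n
    LIMIT 10 OFFSET 20   -- rows 21 through 30
) AS x;
```

Unlike a cursor, this is fully planned as part of the outer query and has no side effects, so the optimizer is free to skip it entirely.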
Re: [GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)
One other thing that seems worth mentioning is that as soon as you jump from relational to object-relational modelling, the latter is richer and hence more complex than the former. Because object-relational modelling is a much expanded semantic superset of relational modelling, the antipatterns are a much expanded superset there too. Additionally, because the patterns are not yet well understood, the whole area needs to be seen as somewhat immature. That doesn't mean that the features are primarily useful as foot-guns, but it does mean that productive use of features like this involves careful documentation of patterns and antipatterns.

Best Wishes, Chris Travers
Re: [GENERAL] Can column name aliases be supported?
On 08/23/2012 10:19 PM, Gauthier, Dave wrote:

Updateable views. This is great. I didn't know about these. Absolutely delicious! I found a great example here... http://vibhorkumar.wordpress.com/2011/10/28/instead-of-trigger/ The problem of a user updating more than one alias remains, but I have no problem bouncing them if they attempt that. Maybe return a "-E- You cannot update multiple aliases of the same base value" (something like that). I'd just have to detect this circumstance, raise the message and bounce (return OLD). Does this sound doable?

With a RULE, probably not. With a `DO INSTEAD` view trigger - available in Pg 9.1 and above - yes. I'd recommend using a view trigger instead of a rule if at all possible. Rules are tricksy things and sooner or later they'll bite you.

-- Craig Ringer
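A rough sketch of what Craig is suggesting, for Pg 9.1 and above (the table, view and column names here are hypothetical, invented for illustration):

```sql
-- Base table and a view exposing the same value under two aliases:
CREATE TABLE widget (id int PRIMARY KEY, val text);
CREATE VIEW widget_alias AS
    SELECT id, val AS alias_a, val AS alias_b FROM widget;

CREATE FUNCTION widget_alias_upd() RETURNS trigger AS $$
BEGIN
    -- Bounce updates that change both aliases of the same base value.
    IF NEW.alias_a IS DISTINCT FROM OLD.alias_a
       AND NEW.alias_b IS DISTINCT FROM OLD.alias_b THEN
        RAISE EXCEPTION 'You cannot update multiple aliases of the same base value';
    END IF;
    -- Otherwise route the single changed alias back to the base column.
    UPDATE widget
       SET val = CASE WHEN NEW.alias_a IS DISTINCT FROM OLD.alias_a
                      THEN NEW.alias_a ELSE NEW.alias_b END
     WHERE id = OLD.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_alias_instead
    INSTEAD OF UPDATE ON widget_alias
    FOR EACH ROW EXECUTE PROCEDURE widget_alias_upd();
```

An UPDATE changing only one alias is rewritten against the base table; one changing both raises the error and is bounced.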
Re: [GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)
Hi,

On 24 August 2012 11:44, Chris Travers chris.trav...@gmail.com wrote: One thing I have found looking through Oracle and DB2 docs is that their table inheritance seems to have all the same problems as ours and their solutions to these problems seem rather broken from a pure relational perspective.

I can second that. Additionally, some vendors tried to fix partitioning (which uses table inheritance) issues by creating all sorts of extensions like CREATE TABLE ... PARTITION BY and ALTER TABLE ... ALTER PARTITION ..., which create all sorts of issues that are not documented at all; you just get responses like "yes, we know about this bug; a fix is not yet available." Many people have asked for SQL wrappers for table partitioning, but that's not easy to do. I would be happy to have out-of-the-box auto-routing for insert/update/copy statements.

-- Ondrej Ivanic (ondrej.iva...@gmail.com)
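Lacking that auto-routing, the usual workaround is a hand-written routing trigger along the lines of the partitioning example in the PostgreSQL documentation (the parent table, partition name and date range below are made up for illustration):

```sql
-- Route inserts on a hypothetical "measurement" parent to a monthly child.
CREATE FUNCTION measurement_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2012-08-01' AND NEW.logdate < DATE '2012-09-01' THEN
        INSERT INTO measurement_2012_08 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- row already routed; keep it out of the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_route
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert();
```

Unlike a rule-based approach, a BEFORE INSERT trigger like this also routes rows loaded with COPY, though UPDATEs that move a row across partition boundaries still need separate handling.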
[GENERAL] Postgresql 9.1 on VMWare ESXi 5.0
Dear friends,

Has anyone experienced extreme slowness running PostgreSQL 9.1.4 on virtualized CentOS 5.8 on VMware ESXi 5.0 (with all paravirtualized drivers)? By extreme slowness, consider that a query that brings back one record from a table with 5000 records (using the PK as the criterion) takes 200ms... for comparison, it takes 0ms on my local server (not virtualized). I can see that the disks have almost the same performance as physical disks: 110Mb/s.

Your help will be greatly appreciated!

Regards, Edson
Re: [GENERAL] At what point does a big table start becoming too big?
On Thu, 23 Aug 2012 17:56:37 -0700 Jeff Janes jeff.ja...@gmail.com wrote:

On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran wmo...@potentialtech.com wrote:

In response to Martin French martin.fre...@romaxtech.com: "I have a table with 40 million rows and haven't had any performance issues yet. Are there any rules of thumb as to when a table starts getting too big? For example, maybe if the index size is 6x the amount of RAM, if the table is 10% of total disk space, etc?" My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things.

Yes, I agree with that.

And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, an MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem.

This seems inconsistent with your previous advice. By the time your experimental query shows a problem, you no longer have any maintenance windows left large enough to fix it. Unless your experimental query was a reindex or something non-production like that, in which case running it on a production server, even off-hours, doesn't seem like a good idea.

Perhaps I didn't explain the approach sufficiently.
If you can establish something like, "This specific SELECT has to run in under 5 minutes to meet the client's expectations", you can then time how long that query takes each time it's run (by capturing that information in the application, for example ... or by running it in some automated fashion ... possibly other methods as well). If you capture that runtime on a regular basis and put the results on a graph in concert with other relevant data, such as the number of rows in the related tables, size of the data, etc., you quickly get a good picture of how fast things are growing, and frequently you can project the line out into the future and say things like "if we don't come up with a better way to do this by Sept of next year, we're going to exceed our allowed run time." You can then take that very detailed information to business planners and point out that they need to schedule developer time _before_ then if they don't want the application to slow down below the allowable level.

Unless you work for somewhere that has unlimited resources, your time is always split between feature requests, day-to-day operations, firefighting, etc. In my experience, keeping things like this under control is often a matter of having enough information to justify why your optimization project is more important than whizbang feature X that marketing wants so badly. Of course, if you work somewhere with unlimited resources, you should let me know so I can send in my resume.

And none of what I'm suggesting is intended to belittle the other suggestions either -- if you know of a way to optimize the data better, why not do it now? If you can be purging old data, why wait until performance is a problem to start purging, etc. It's just another trick to have in your bag.

-- Bill Moran wmo...@potentialtech.com
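If the daily timing runs are captured in a table, Postgres's built-in linear-regression aggregates can do the projection directly (the table and column names here are hypothetical; 300000 ms is the five-minute budget from the example, and the sketch assumes growth stays roughly linear):

```sql
-- perf_log(run_date date, row_count bigint, runtime_ms numeric),
-- one row per daily timing run of the monitored query.
SELECT regr_slope(runtime_ms, row_count)     AS ms_per_row,
       regr_intercept(runtime_ms, row_count) AS fixed_ms,
       (300000 - regr_intercept(runtime_ms, row_count))
           / regr_slope(runtime_ms, row_count) AS rows_at_limit
FROM perf_log;
```

Comparing rows_at_limit against the table's growth rate tells you roughly when the query will blow its budget, which is exactly the kind of number business planners can act on.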
Re: [GENERAL] Confirming \timing output
On 08/24/2012 02:30 AM, John R Pierce wrote:

On 08/23/12 11:13 AM, Gauthier, Dave wrote: Time: 0.524 ms. Is that really 0.524 ms? As in 524 nanoseconds?

0.524 MILLIseconds, as in 524 microseconds. Microseconds is commonly abbreviated "us".

They should be µs (micro: µ, seconds: s). Sadly, many setups still can't type anything outside 7-bit ASCII, even in 2012 :-(

-- Craig Ringer
Re: [GENERAL] At what point does a big table start becoming too big?
Hi,

On 23 August 2012 23:37, Bill Moran wmo...@potentialtech.com wrote: And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, an MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem. Other tricks work as well, such as having the application send an email any time a process takes more than 50% of the allowable maximum time.

I like to use APDEX (http://apdex.org/specs.html). You can configure your database to log timing for all statements and then calculate the APDEX score based on the last N log entries (N > 10). The APDEX score is a weighted score based on the number of datapoints within three zones:

0 ... T: Satisfied Zone
T ... F (= 4*T): Tolerating Zone
4T ...: Frustrated Zone

You can choose T (or F; then T = F/4), i.e. under normal circumstances all queries should finish under 20ms (T = 20ms, F = 4T = 80ms). The APDEX score is:

score = (Satisfied count + Tolerating count / 2) / Total samples

You can get this number, for example, every minute and plot it using Ganglia / MRTG / ...

-- Ondrej Ivanic (ondrej.iva...@gmail.com)
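With T = 20 ms, the score formula above can be computed in a single SQL query over a hypothetical samples(ms) table holding the last N response times:

```sql
-- Satisfied: ms <= T (20 ms); Tolerating: T < ms <= 4T (80 ms);
-- everything slower counts as Frustrated (weight 0).
SELECT (sum(CASE WHEN ms <= 20 THEN 1 ELSE 0 END)
        + sum(CASE WHEN ms > 20 AND ms <= 80 THEN 1 ELSE 0 END) / 2.0)
       / count(*) AS apdex_score
FROM samples;
```

For example, with samples of 10, 15, 5, 30, 25 and 90 ms, that is (3 + 2/2) / 6 ≈ 0.67, which would land in the "tolerating" band of most APDEX rating scales.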
Re: [GENERAL] Confirming \timing output
On 08/23/12 7:31 PM, Craig Ringer wrote: 0.524 MILLIseconds, as in 524 microseconds. Microseconds is commonly abbreviated us. They should be µs (micro: µ, seconds: s). Sadly, many setups still can't type anything outside 7-bit ASCII even in 2012.

yeah, I know. I could enter the alt+xyz sequence, except this laptop keyboard doesn't have a number pad, and I was way, way too lazy to find and copy/paste one, or to use charmap.

-- john r pierce                N 37, W 122
   santa cruz ca            mid-left coast
Re: [GENERAL] Postgresql 9.1 on VMWare ESXi 5.0
On 08/24/2012 10:18 AM, Edson Richter wrote: Dear friends, has anyone experienced extreme slowness running PostgreSQL 9.1.4 on virtualized CentOS 5.8 on VMware ESXi 5.0 (with all paravirtualized drivers)?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Examine `EXPLAIN ANALYZE` output for virtual and physical, and compare. If it's complicated, post both to explain.depesz.com. Examine `iostat`, `vmstat` and `top` to see where the bottlenecks lie. Etc.

-- Craig Ringer
Re: [GENERAL] Amazon High I/O instances
I will be setting up an instance in the coming days and will post the results here. While reading on the subject, I found this interesting discussion on YCombinator: http://news.ycombinator.com/item?id=4264754

Sébastien

On Thu, Aug 23, 2012 at 2:41 PM, John R Pierce pie...@hogranch.com wrote:

On 08/23/12 11:24 AM, Sébastien Lorion wrote: I think both kinds of tests (general and app-specific) are complementary and useful in their own way. At a minimum, if the general ones fail, why go to the expense of doing the specific ones? Setting up a meaningful application test can take a lot of time and it can be hard to pinpoint exactly where in the stack the performance drops occur. The way I see it, synthetic benchmarks allow you to isolate the layers somewhat and serve as a base to validate application tests done later on. It surprises me that asking for the general perf behavior of a platform is controversial.

I don't use AWS at all. But it shouldn't take more than a couple of hours to spin up an instance, populate a pgbench database and run a series of pgbench runs against it, and do the same against any other sort of system you wish to use as your reference.

I like to test with a database about twice the size of the available memory if I'm testing IO. I've found that pgbench -i -s <scale>, for <scale>=10000, generates a 1 billion row table and uses about 150GB (and an hour or so to initialize on fast IO hardware). I then run pgbench with -c of about 2-4X the cpu/thread count, -j of about -c/16, and a -t of at least 10000 (so each client connection runs 10000 transactions).

On a modest but decent 2U-class 2-socket dedicated server with a decent raid card and raid10 across enough spindles, I can see numbers as high as 5000 transactions/second with 15krpm rust, and 7000-8000 with a couple of MLC SSDs striped. Trying to raid10 a bunch of SATA 7200 disks gives numbers more like 1000. Using host-based raid, without a write-back cache in the raid card, gives numbers about half the above.
The IOPS during these tests hit around 12000 to 15000 small writes/second. Doing this level of IO on a midsized SAN can often cause the SAN CPU to run at 80%+, so if there's other activity on the SAN from other hosts, good luck. In a heavily virtualized shared-everything environment, I'm guessing your numbers will be all over the place and consistency will be difficult to achieve.

-- john r pierce                N 37, W 122
   santa cruz ca            mid-left coast
Re: [GENERAL] FETCH in subqueries or CTEs
Hello

2012/8/24 Craig Ringer ring...@ringerc.id.au:

Hi all. I've noticed that FETCH doesn't seem to be supported in subqueries or in CTEs. Is there a specific reason for that, beyond nobody's needed it and implemented it? I'm not complaining at all, merely curious. A toy example:

DECLARE somecursor CURSOR FOR SELECT generate_series(1,1000);
SELECT * FROM ( FETCH ALL FROM somecursor ) x;

produces:

ERROR:  syntax error at or near "FETCH"
LINE 1: SELECT * FROM ( FETCH ALL FROM somecursor ) x;
                        ^

Same with a CTE:

WITH x AS ( FETCH ALL FROM somecursor ) SELECT * FROM x;

ERROR:  syntax error at or near "FETCH"
LINE 1: WITH x AS ( FETCH ALL FROM somecursor ) SELECT * FROM x;

You can't mix planned and unplanned statements together - think about stored plans every time.

Regards

Pavel

-- Craig Ringer