[HACKERS] pg_dump permissions
Suppose I have a database with a table in it. I try to dump that database. The user I dump as is not the owner of that table nor does it have the SELECT permission on that table. Why can't I do a schema-only dump of that table? I find that I need SELECT permission on the table, even though my user can happily query the catalogs. PostgreSQL version: 7.4 Chris ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Segv in pg_autovacuum in 8.0.x
Apologies if this is old news, but pg_autovacuum in 8.0.x has the bad habit of SEGVing and exiting when a table gets dropped out from under it. This creates problems if you rely on pg_autovacuum for the bulk of your vacuuming as it forgets it's statistics when it is restarted and so will skip some desireable vacuums. I have a patch for this that should apply against any 8.0.x if there is any interest. I looked at the new autovacuum in 8.1 and it appears from casual inspection not to have the same problem. -dg -- David Gould [EMAIL PROTECTED] If simplicity worked, the world would be overrun with insects. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 2nd try @NetBSD/2.0 Alpha
On Tue, Oct 18, 2005 at 10:07:26PM -0500, Larry Rosenman wrote: ...or attach with a debugger like gdb. d'oh. I go stupid occasionally :) If someone wants me to, I can try that. Yes, actually. See, its dying in the seg test already with: -- Open intervals SELECT '0..'::seg AS seg; ! ERROR: floating-point exception ! DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, s uch as division by zero. SELECT '0...'::seg AS seg; ! ERROR: floating-point exception ! DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, s uch as division by zero. You need to attach a debugger to find out where that error is actually happenening. Just startup the backend, connect to it and connect gdb to the newly spawned backend and just run that query by hand. Then you should get the backtrace at SIGFPE. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgp437RUDPiCT.pgp Description: PGP signature
[HACKERS] collector/autovacuum after crash (8.1beta3)
Hi All, At system crash or poweroff the autovacuum statistics will be lost, because this statistics only stored in RAM and saved/restored at service shutdown/startup. I think it should be saved periodically and not to be deleted after crash. -Sygma ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Question about Ctrl-C and less
On 2005-10-19, Kevin Brown [EMAIL PROTECTED] wrote: Making assumptions about what the pager will do upon receipt of SIGINT is folly as well. Setting up SIGINT to be ignored may be the right answer (I don't believe it is -- see below), but if so then it needs to be done properly. If it gets ignored prior to the popen(), then the child will also end up ignoring it by default, because signal disposition is inherited by child processes. If we ignore SIGINT, it should be after the popen(), not before. I do not believe it is possible to do the signal disposition correctly and still use popen() to run the pager. (You would need to reimplement popen using raw syscalls.) So I think the right answer here is for psql to handle SIGINT internally by doing a pclose() first The chances that psql can do this safely approach zero. pclose() is not a signal-safe function, so it can only be called from a signal handler if you _know_ that the signal did not interrupt any non-signal-safe function. (Nor can the signal handler longjmp out in such a case, unless the code is never again going to call any unsafe function.) -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 2nd try @NetBSD/2.0 Alpha
Martijn van Oosterhout wrote: On Tue, Oct 18, 2005 at 10:07:26PM -0500, Larry Rosenman wrote: ...or attach with a debugger like gdb. d'oh. I go stupid occasionally :) If someone wants me to, I can try that. Yes, actually. See, its dying in the seg test already with: -- Open intervals SELECT '0..'::seg AS seg; ! ERROR: floating-point exception ! DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, s uch as division by zero. SELECT '0...'::seg AS seg; ! ERROR: floating-point exception ! DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, s uch as division by zero. You need to attach a debugger to find out where that error is actually happenening. Just startup the backend, connect to it and connect gdb to the newly spawned backend and just run that query by hand. Then you should get the backtrace at SIGFPE. I don't have the time today (need to do some paying work). However, If someone wants, I can pass an ID/PW along so that they May find it. (or it can wait till the weekend). LER -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 3535 Gaspar Drive, Dallas, TX 75220-3611 US ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Segv in pg_autovacuum in 8.0.x
daveg wrote: Apologies if this is old news, but pg_autovacuum in 8.0.x has the bad habit of SEGVing and exiting when a table gets dropped out from under it. This creates problems if you rely on pg_autovacuum for the bulk of your vacuuming as it forgets it's statistics when it is restarted and so will skip some desireable vacuums. I have a patch for this that should apply against any 8.0.x if there is any interest. Please post the patch, it might get committed into 8.0.x. I personally haven't heard any other reports of it dieing when tables are dropped, but I don't doubt there are some lingering issues like this. I looked at the new autovacuum in 8.1 and it appears from casual inspection not to have the same problem. The autovacuum in 8.1 is a very different critter. While its design is based on the contrib autovacuum, it is mostly a total rewrite, and probably a vast improvement :-) Matt ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] A costing analysis tool
If we stored the actual queries and the EXPLAIN ANALYZE results (when generated) in the database, what would be the purpose of the node_name, db_object, and condition_detail columns? They don't seem like they would be useful for statistical analysis, and it seems like the information would be more useful in context. Are these column really needed? For a given node_type, are there mutiple valid condition_type values? If so, I need to modify my python script to capture this. If not, I don't see a need to store it. -Kevin Josh Berkus josh@agliodbs.com [snip] so, for example, the query step: - Seq Scan on detail0009 (cost=0.00..20500.11 rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1) Filter: ((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~* '%www.%'::text)) Could be displayed as: query_instance 12008 step_id 14701 parent_step 14698 node_name Seq Scan on detail0009 node_type Seq Scan cost_start 0 cost_end20500.11 est_rows26 time_start 453.0 time_end5983.0 actual_rows 53588 loops 1 db_object detail009 condition_type Filter condition_detail((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~* '%www.%'::text)) By collecting all of this data, you make it possible to perform other sorts of analysis on the cost estimates. For example, statistical analysis might tell us that 3-or-more-condition filters take significantly longer to execute than single-condition filters, which would be important to know for the cost model. Limiting it to collecting only 3 of the 13 bits of node data produced by EA would very much limit the usefulness of the tool and the reliability of its statistics. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] A costing analysis tool
Kevin, If we stored the actual queries and the EXPLAIN ANALYZE results (when generated) in the database, what would be the purpose of the node_name, db_object, and condition_detail columns? They don't seem like they would be useful for statistical analysis, and it seems like the information would be more useful in context. Are these column really needed? Yes. For example, the only way you're going analyze index costing by type of index is if the index name is stored somewhere (db_object) so that it can be matched to its characteristics. For condition_detail, again we could determine that (for example) we have costing problems when filters involve more than 2 columns or complex expressions. Node_name is as actually duplicative of some of the other columns, so I suppose it could be dropped. For a given node_type, are there mutiple valid condition_type values? If so, I need to modify my python script to capture this. If not, I don't see a need to store it. I'm not sure. Even if there aren't now, there could be in the future. I'm more focused on supporting cross-node-type conditions. For example, Filter conditions can apply to a variety of node types (Index Scan, Merge Join, Subquery Scan, Seq Scan, aggregates). If we were costing Filters, we'd want to be able to aggregate their stats regardless of the node in which they occurred. I'm also really unclear on why you're so focused on storing less information rather than more. In an investigation tool like this, it's important to collect as much data as possible because we don't know what's going to be valuable until we analyze it. You seem to be starting out with the idea that you *already* know exactly where the problems are located, in which case why develop a tool at all? Just fix the problem. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 8.04 and RedHat/CentOS init script issue
Hi, On Tue, 18 Oct 2005, Tony Caduto wrote: I installed 8.04 via RPM on Centos 4.2 which is the same as RedHat 4.2 and while booting the init script reports that the daemon [FAILED], but after I logon it shows the postmaster running and I am able to connect from any client remotely. I made not modifcations to the script and there is nothing out of the ordinary in the log. Hmm. In 8.0.4 RPM init scripts, we were using a 1 second of sleep time (see sleep 1 line in the init script). On some cases where the system is slow, you are prompted about the startup failure; however this is not the real case. In 8.1 RPMs, the sleep time was increased to 2 seconds; which we believe that won't have the problem you've reported: http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgsqlrpms/patches/8.1/postgresql.init?rev=1.2content-type=text/x-cvsweb-markup So please increase this sleep time and give another try. Regards, -- Devrim GUNDUZ Kivi Bilişim Teknolojileri - http://www.kivi.com.tr devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] A costing analysis tool
Summary of schema I'm considering. Comments welcome. When it gets downt to the detail, it may make sense to combine or split some of these. For example, runtime_options should probably not have a column for each currently known option, but a child table which maps to all non-default option values. submitter Identifies who submitted test results. Name, optional info such as email address and organization runtime_environment Identifies the overall test environment. OS with distribution version, CPU number, type speed, RAM, background load, static configuration, etc. This provides context for a series of tests, to see how the numbers look in a given environment dataset_characteristics Identifies data metrics which may affect costing accuracy Table counts, row counts, column counts, disk space used, level of fragmentation. Maybe some of the standard tests will share common dataset characteristics across multiple environments. cache_state Identifies the level of initial caching for a test, and the degree to which the referenced data can be cached during execution runtime_options Identifies the runtime choices in effect for a query run. The state of EXPLAIN, ANALYZE, enable_xxx, and dynamic configuration settings, such as random_page_cost. query Identifies a test query, possibly run by many people in many environments against various datasets with different cache states and runtime options test_result_summary Ties a query to details about a run, with a summary of results. Run time from the client perspective, rows returned. test_result_step_detail Shows EXPLAIN ANALYZE information (if any) for each step. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] A costing analysis tool
Kevin, When it gets downt to the detail, it may make sense to combine or split some of these. For example, runtime_options should probably not have a column for each currently known option, but a child table which maps to all non-default option values. I'm a little cautious about storing only non-defaults; the defaults have changed from version to version, after all. If we did that, we'd need to have a defaults table in the db as a reference list. Also, we'll need to store runtime options both on the machine level and on the query level, in order to allow testing of changing an enable_* or other query cost option at runtime. Not sure how to capture this, though. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] A costing analysis tool
I'm not interested in storing less information. I'm trying to make sure that all redundant information is justified. Since I plan to store the actual query text and the full EXPLAIN ANALYZE output, every column I pull out and put in another table is redundant data. The questions are, why do we have it, is it going to be used heavily enough in such a way that it should be stored redundantly? If the goal is to be able to get at the number of filters, etc., perhaps another table which digests this to a form which can be used in summary queries is what's needed. If this is a rare type of query which few people will run, maybe they can go to the EXPLAIN ANALYZE, using the line number info, and grab it from there. -Kevin Josh Berkus josh@agliodbs.com I'm also really unclear on why you're so focused on storing less information rather than more. In an investigation tool like this, it's important to collect as much data as possible because we don't know what's going to be valuable until we analyze it. You seem to be starting out with the idea that you *already* know exactly where the problems are located, in which case why develop a tool at all? Just fix the problem. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] A costing analysis tool
On Tuesday 18 October 2005 18:05, Kevin Grittner wrote: Regarding the idea of a site where results could be posted and loaded into a database which would be available for public access -- I agree that would be great; however, my client is not willing to take that on. If anyone wants to volunteer, that wuold be fantastic. Josh, does the setup on the foundry homepages give you enough to do this? -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] A costing analysis tool
Maybe we could associate a set of defaults to runtime_environment, and you would associate any overrides with the runtime_options. Does this address both your concerns? Josh Berkus josh@agliodbs.com Kevin, When it gets downt to the detail, it may make sense to combine or split some of these. For example, runtime_options should probably not have a column for each currently known option, but a child table which maps to all non-default option values. I'm a little cautious about storing only non-defaults; the defaults have changed from version to version, after all. If we did that, we'd need to have a defaults table in the db as a reference list. Also, we'll need to store runtime options both on the machine level and on the query level, in order to allow testing of changing an enable_* or other query cost option at runtime. Not sure how to capture this, though. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] A costing analysis tool
Robert Treat wrote: On Tuesday 18 October 2005 18:05, Kevin Grittner wrote: Regarding the idea of a site where results could be posted and loaded into a database which would be available for public access -- I agree that would be great; however, my client is not willing to take that on. If anyone wants to volunteer, that wuold be fantastic. Josh, does the setup on the foundry homepages give you enough to do this? pgFoundry currently does not provide either a database or web programs for groups, for security reasons. All you get is the ability to make static pages. The new machine will provide both, we hope. cheers andrew ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys Innobase)
I'm CC'ng this over to -hackers ... Tom? Comments? On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of Marc G. Fournier Sent: Wednesday, October 19, 2005 11:41 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote: I was referring to trailing blanks, but did not explicitly say it, though showed it in the examples. I am pretty sure that the SQL standard says that trailing whitespace is insignificant in string comparison. Then we are broken too :) # select 'a ' = 'a '; ?column? -- f (1 row) Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [pgsql-advocacy] [GENERAL] Oracle buys Innobase
On Wed, 19 Oct 2005, Dann Corbit wrote: -Original Message- From: Stephan Szabo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 12:39 PM To: Dann Corbit Cc: Marc G. Fournier; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED] Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 Doesn't that depend on the collating sequence in use, or is a NO PAD collating sequence not allowed here? If the implementation defines constants as NO PAD and the implementation defined pad character is something other than space, then they could compare unequal. I would find that implementation disturbing. But I am easily bent out of shape. The attached HTML file in my earlier post is the official quote from the SQL 99 standard. That is the formal and correct definition, far superior to my off the cuff approximations. 'k, if I'm reading the right section (you say its bolded, but I'm using pine which doesn't seem to do a good job of reading HTML): === d) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When any of the operations MAX, MIN, and DISTINCT reference a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent. === I think the key part of that 'clause' is two strings *may* compare as equal ... sounds implementation dependent to me, depending on how the implementor interprets it ... or am I reading the wrong section? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [pgsql-advocacy] [GENERAL] Oracle buys Innobase
Given this part of that same rule applied to the strings: b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a space. I do not see how two strings which are otherwise equal (except for length or blank padding) can possibly compare unequal unless the NO PAD property is applied and the implementation defined pad character is also something other than a space. Is that the case for PostgreSQL? Even if it is, is seems truly bizarre that the NO PAD attribute would be applied to string constants. -Original Message- From: Marc G. Fournier [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 12:53 PM To: Dann Corbit Cc: Stephan Szabo; Marc G. Fournier; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED]; pgsql-general@postgresql.org Subject: RE: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, Dann Corbit wrote: -Original Message- From: Stephan Szabo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 12:39 PM To: Dann Corbit Cc: Marc G. Fournier; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED] Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 Doesn't that depend on the collating sequence in use, or is a NO PAD collating sequence not allowed here? If the implementation defines constants as NO PAD and the implementation defined pad character is something other than space, then they could compare unequal. I would find that implementation disturbing. But I am easily bent out of shape. The attached HTML file in my earlier post is the official quote from the SQL 99 standard. That is the formal and correct definition, far superior to my off the cuff approximations. 'k, if I'm reading the right section (you say its bolded, but I'm using pine which doesn't seem to do a good job of reading HTML): === d) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When any of the operations MAX, MIN, and DISTINCT reference a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent. === I think the key part of that 'clause' is two strings *may* compare as equal ... sounds implementation dependent to me, depending on how the implementor interprets it ... or am I reading the wrong section? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] A costing analysis tool
Kevin, This would require capture of information beyond what I was thinking about in terms of schema. Do you think we need to capture just index type, or something more? Do you propose that we capture the pg_* metadata related to every object referenced in the plan, every object related to every table in the query, or everything in the entire database? Not right away. However, having collected the info in the first place would give us the option of duping a copy of the system tables so that we could mine them for more information. Also, for our test harness, we will have pre-programmed object information for the objects we use. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys
Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier: I'm CC'ng this over to -hackers ... Tom? Comments? On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of Marc G. Fournier Sent: Wednesday, October 19, 2005 11:41 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote: I was referring to trailing blanks, but did not explicitly say it, though showed it in the examples. I am pretty sure that the SQL standard says that trailing whitespace is insignificant in string comparison. Then we are broken too :) # select 'a ' = 'a '; ?column? -- f (1 row) experiment=# SELECT 'a '::char = 'a '::char; ?column? -- t ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? Suppose, for instance, that you have an employee table in your database. You have another table called benefits. Perhaps it is even in another database. Conceivably even in a database other than PostgreSQL. Anyway, you want to match information between the two systems so you join on some redundant columns like a.First_Name = b.fname AND a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city AND a.Street_name = b.street If the columns are not the same length (or one is fixed length and the other variable), then 'Danniel' won't match 'Danniel' unless you trim it. If you trim it, then the indexes go out the window. If the indexes go out the window, then we table scan. I don't like that behavior. Perhaps others who know more than me can say why not blank padding comparisons is a good idea. Clearly, one could argue that having redundant data is bad and that every attribute in a database intended to match should be exactly the same type. But there are lots of database systems badly designed. And of well designed systems, it is not uncommon to have more than one database in your organization, and a need to perform federated joins as well because of it. -Original Message- From: Tino Wildenhain [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 1:05 PM To: Marc G. Fournier Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql- [EMAIL PROTECTED]; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier: I'm CC'ng this over to -hackers ... Tom? Comments? On Wed, 19 Oct 2005, Dann Corbit wrote: Yes, clearly that is the wrong result according to the SQL standard. Here is a SQL*Server query: select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a ' It returns (correctly): 1 -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of Marc G. Fournier Sent: Wednesday, October 19, 2005 11:41 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote: I was referring to trailing blanks, but did not explicitly say it, though showed it in the examples. I am pretty sure that the SQL standard says that trailing whitespace is insignificant in string comparison. Then we are broken too :) # select 'a ' = 'a '; ?column? -- f (1 row) experiment=# SELECT 'a '::char = 'a '::char; ?column? -- t ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys
Guy Rouillier [EMAIL PROTECTED] writes: Tino Wildenhain wrote: experiment=# SELECT 'a '::char = 'a '::char; ?column? -- t This does't show anything useful, because the ::char casting simply takes the first char of any string: select 'abc'::char = 'axy'::char Also results in 'true'. Hopefully no one in this discussion would want those two strings to be equal. A better experiment: db= select 'a '::char(3) = 'a '::char(3); ?column? -- t (1 row) -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Terry Fielder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:05 PM To: Dann Corbit Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? And even if people think it is a good idea, it seems to be at odds with the way that I read the standard (though I am easily confused by simple enough language at times). It could well be that I am jaded from years of doing it the wrong way (I expect two character strings with all leading non-blanks in agreement to compare equal). Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? [snip] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
On Wed, 19 Oct 2005, Dann Corbit wrote: -Original Message- From: Terry Fielder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:05 PM To: Dann Corbit Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? It buys you the ability to store things where trailing spaces are signficant (for example passwords) within the existing limitations of not having a full set of the collation behavior. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote: When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? Well, looking from the point of view of using indexes, indexes can only really match on things that are equal. Which means the system is going to have to trim them anyway. I'm of the opinion that strings are strings and spaces are no different from other characters. That bit of the standard quoted earlier, if you read the PAD character that is different from any other character as being the NUL character, then 'aspacenul' is clearly different from 'aspacespace'. This whacky space behaviour is something I associate with the char(N) and is the main reason I never use it. Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? The way I understood it: char(N) is blank padding varchar(N) is not If you make varchar(n) do blank padding, then what's the difference between the two types? You may as well get rid of one... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpl2oFftSN1C.pgp Description: PGP signature
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Stephan Szabo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:34 PM To: Dann Corbit Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) On Wed, 19 Oct 2005, Dann Corbit wrote: -Original Message- From: Terry Fielder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:05 PM To: Dann Corbit Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) Dann Corbit wrote: Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as varchar(30)); ?column? -- (0 rows) I see how you can interpret the SQL Standard to make the above response a correct one. But is it the response that you would like? When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? It buys you the ability to store things where trailing spaces are signficant (for example passwords) within the existing limitations of not having a full set of the collation behavior. I suppose that there may be some value in having 'J^% ' be different from 'J^% '. I would expect to insert a password with trailing blanks to be the same as inserting a password with no trailing blanks. I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
-Original Message- From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 2:46 PM To: Dann Corbit Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql- [EMAIL PROTECTED] Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote: When the compared datatypes are VARCHAR: YES What is the value of doing that? I can see plenty of harm and absolutely no return. We are talking about blank padding before comparison. Do you really want 'Danniel ' considered distinct from 'Danniel ' in a comparison? In real life, what does that buy you? Well, looking from the point of view of using indexes, indexes can only really match on things that are equal. Which means the system is going to have to trim them anyway. I'm of the opinion that strings are strings and spaces are no different from other characters. That bit of the standard quoted earlier, if you read the PAD character that is different from any other character as being the NUL character, then 'aspacenul' is clearly different from 'aspacespace'. This whacky space behaviour is something I associate with the char(N) and is the main reason I never use it. Perhaps this is old hat to the long-timers around here and there is a good explanation as to why varchar should have non-blank padding when comparisons are performed. Can someone point me to documentation that explains it? The way I understood it: char(N) is blank padding varchar(N) is not If you make varchar(n) do blank padding, then what's the difference between the two types? Storage. The blank padding is only for comparison purposes. You may as well get rid of one... All the other database systems seem to handle it in the way that I expect. Which is not to say that it is the right way or that it agrees with the standard. But it is how it appears to me, so far. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 'a' == 'a '
Dann, I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong. AFAIT, the standard says implementation-specific. So we're standard. The main cost for comparing trimmed values is performance; factoring an rtrim into every comparison will add significant overhead to the already CPU-locked process of, for example, creating indexes. We're looking for ways to make the comparison operators lighter-weight, not heavier. My general perspective on this is that if trailing blanks are a significant hazard for your application, then trim them on data input. That requires a *lot* less peformance overhead than doing it every time you compare something. Changing the behaviour would break backwards compatibility for some users. For that matter, I've been subscribed to 8 PostgreSQL mailing lists since 1999, and this is the first time I can recall someone complaining about this comparison behavior. So it's obviously not a widespread issue. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 'a' == 'a '
If there is a significant performance benefit to not expanding text columns in comparison operations, then it seems it should be OK. I probably read the standard wrong, but it seems to me that varchar, char, and bpchar columns should all behave the same (e.g. if you do not expand with blank or the PAD character (whatever that is) then all char type columns should behave the same. I guess that there could be different default collations for different column types though (that is clearly allowed in the standard). Perhaps it just needs to be documented in such a way that even a blockhead like me can comprehend it easily. -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005 5:06 PM To: pgsql-hackers@postgresql.org Cc: Dann Corbit; Stephan Szabo; Terry Fielder; Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED]; pgsql-general@postgresql.org Subject: Re: [HACKERS] 'a' == 'a ' Dann, I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong. AFAIT, the standard says implementation-specific. So we're standard. The main cost for comparing trimmed values is performance; factoring an rtrim into every comparison will add significant overhead to the already CPU-locked process of, for example, creating indexes. We're looking for ways to make the comparison operators lighter-weight, not heavier. My general perspective on this is that if trailing blanks are a significant hazard for your application, then trim them on data input. That requires a *lot* less peformance overhead than doing it every time you compare something. Changing the behaviour would break backwards compatibility for some users. For that matter, I've been subscribed to 8 PostgreSQL mailing lists since 1999, and this is the first time I can recall someone complaining about this comparison behavior. So it's obviously not a widespread issue. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] RSS feeds of CVS revision logs
Hello, Some of you may have noted the new project I am playing with the PostgreSQL Source browser. It is a Subversion-Trac interface to the PostgreSQL CVS repository. The entire repository is represented. It is currently updated daily but I am hoping to get this down to hourly in the near future. The site can be viewed here: http://www-new.commandprompt.com/projects/public/pgsql The URL will change in the future, probably to pgcode.org or maybe just projects.commandprompt.com. I was speaking with JoshB and he mentioned that one particular facet of this may be interesting to you folks, which is the RSS feeds. JoshB was thinking this would be useful for replacing pgsql-patches@ for some people as their mailboxes get filled up. You have the ability using any RSS reader to subscribe to any portion of the revision log. For example, for the docs project, they can subscribe to Trunk-docs and thus only see the revision logs within their feed for the docs project. At your leisure you may either choose to see only a brief synopsis for the complete commit message. Here is an example of the short feed: http://www-new.commandprompt.com/projects/public/pgsql/log/trunk?limit=100mode=stop_on_copystop_rev=format=rss Here is an example of the long feed: http://www-new.commandprompt.com/projects/public/pgsql/log/trunk?verbose=onformat=rssstop_rev=limit=100mode=stop_on_copy Notice the only real difference is the verbose=on The power of this at least from my perspective, is that I use Thunderbird as my RSS reader. I can view a current revision, click forward, and send a question to the list about a specific change. Sincerely, Joshua D. Drake Command Prompt, Inc. -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] [HACKERS] 'a' == 'a '
Josh Berkus wrote: Dann, I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong. AFAIT, the standard says implementation-specific. So we're standard. The main cost for comparing trimmed values is performance; factoring an rtrim into every comparison will add significant overhead to the already CPU-locked process of, for example, creating indexes. We're looking for ways to make the comparison operators lighter-weight, not heavier. If I understand the spec correctly, it seems to indicate that this is specific to the locale/character set. Assuming that the standard doesn't have anything to do with any character sets, it should be possible to make this available for those who want it as an initdb option. Whether or not this is important enough to offer or not is another matter. Personally my questions are: 1) How many people have been bitten by this badly? 2) How many people have been bitten by joins that depend on padding? Personally, unlike case folding, this seems to be an area where a bit of documentation (i.e. all collation sets have are assumed to have the NO PAD option in the SQL standard) would be sufficient to answer to questions of standards-compliance. My general perspective on this is that if trailing blanks are a significant hazard for your application, then trim them on data input. That requires a *lot* less peformance overhead than doing it every time you compare something. In general I agree. But I am not willing to jump to the conclusion that it will never be warranted to add this as an initdb option. I am more interested in what cases people see where this would be required. But I agree that the bar is much higher than it is in many other cases. Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] Call for translators
Lyubomir Rusanov wrote: Hi,br I am also interested in helping building Bulgarian translation for PostgreSQL. br I think that we will not have enough time for 8.1 but maybe for 8.2.br The best you could do is submit your incremental improvements for 8.1, so there is at least _some_ translation. Then you complete it for 8.2. I advise you start with the client-side tools (which you can reasonably expect to finish for 8.1), and leave the backend messages for last, because of the size. -- Alvaro Herrerahttp://www.PlanetPostgreSQL.org Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos / con todos los humanos acabaré (Bender) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] [HACKERS] 'a' == 'a '
Chris Travers [EMAIL PROTECTED] writes: If I understand the spec correctly, it seems to indicate that this is specific to the locale/character set. The spec associates padding behavior with collations, which per spec are separate from the datatypes --- that is, you should be able to able to specify a collation for each string-type table column (whether char(N) or varchar(N)) and even for each literal string constant. We do not currently have that capability, and accordingly fall back to binding PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N). AFAICS this choice is allowed by the spec since the default collation is implementation-defined. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org