Re: [HACKERS] CONNECT BY and WITH ...
On Tue, Jul 08, 2008 at 07:28:36AM +0200, Hans-Juergen Schoenig wrote: good morning everybody, i know that this is really a hot potato on the mailing list but i think it is useful to discuss this issue. in the past few months we have been working with a customer to improve Evgen's CONNECT BY patch. as we have a nice and promising WITH RECURSIVE patch, the original CONNECT BY code is pretty obsolete. however, in the past few weeks I have been asked more than once if it is possible to take the current WITH patch and add parser support for CONNECT BY to it. so, people would have the choice of going with CONNECT BY syntax (to be Oracle compliant, which is important) or the ANSI SQL compliant syntax (which is important as well). how are the feelings towards an improvement like that? it would ease the pain of many people for sure. This has been discussed to death several times. We're only going to do the WITH RECURSIVE (i.e. the SQL standard) tree handling. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
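The syntax difference under discussion can be sketched concretely. Below is an illustrative (not from the thread) hierarchical query in the SQL-standard WITH RECURSIVE form that an Oracle-style CONNECT BY query maps to; SQLite is used as a stand-in engine here since it implements the same standard syntax, and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, boss INTEGER, name TEXT);
    INSERT INTO emp VALUES (1, NULL, 'ceo'), (2, 1, 'cto'), (3, 2, 'dev');
""")
# Oracle equivalent: SELECT name, LEVEL FROM emp
#                    START WITH boss IS NULL CONNECT BY PRIOR id = boss;
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, level) AS (
        SELECT id, name, 1 FROM emp WHERE boss IS NULL
        UNION ALL
        SELECT e.id, e.name, t.level + 1
        FROM emp e JOIN tree t ON e.boss = t.id
    )
    SELECT name, level FROM tree ORDER BY level
""").fetchall()
print(rows)  # [('ceo', 1), ('cto', 2), ('dev', 3)]
```

The START WITH clause corresponds to the non-recursive term, CONNECT BY PRIOR to the join in the recursive term, and LEVEL to an explicitly carried counter column.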
Re: [HACKERS] Adding variables for segment_size, wal_segment_size and block sizes
On Thu, 2008-07-03 at 16:36 +0200, Bernd Helmle wrote: --On Monday, June 30, 2008 18:47:33 -0400 Bruce Momjian [EMAIL PROTECTED] wrote: I'd like to implement them if we agree on them Bernd, have you made any progress on this? Here's a patch for this. I'll add it to the commit fest wiki page if that's okay with you. I'm not sure why you've included access/xlog_internal.h. All the #defines come from pg_config.h. Maybe that changed from when you started thinking about this? Other than that, no other comments. Looks good. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] PATCH: CITEXT 2.0 v2
On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone pulled up the COLLATE patch from a couple of years ago but it didn't get much feedback either. (I can't find the link right now). It's disappointing that the discussions get hung up on the ICU library when it's not required or even needed for COLLATE support. My original patch never even mentioned it. I note that Firebird added COLLATE using ICU a few years back now. I think PostgreSQL is the only large DBMS to not support it. * There is no guarantee that it will be finished in that time or, indeed, ever (no disrespect to Radek Strnad, it's just that there are always unforeseen issues). I think that with concerted coding effort it could be done in 2-3 months (judging by how long it took to write the first version). The problem is it needs some planner kung-fu which not many people have. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] PATCH: CITEXT 2.0 v2
Martijn van Oosterhout napsal(a): On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone pulled up the COLLATE patch from a couple of years ago but it didn't get much feedback either. (I can't find the link right now). I know about it. I have your proposal printed out on my desk. I think it is linked from the TODO list. It's disappointing that the discussions get hung up on the ICU library when it's not required or even needed for COLLATE support. My original patch never even mentioned it. I note that Firebird added COLLATE using ICU a few years back now. I think PostgreSQL is the only large DBMS to not support it. Completely agree. Missing collation support is a big problem for many users. * There is no guarantee that it will be finished in that time or, indeed, ever (no disrespect to Radek Strnad, it's just that there are always unforeseen issues). I think that with concerted coding effort it could be done in 2-3 months (judging by how long it took to write the first version). The problem is it needs some planner kung-fu which not many people have. I agree that 2-3 months of fulltime work is a good estimate; the problem is that you need a kung-fu master who has time to do it :(. What we currently have is a student who works on it in his free time. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
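To make concrete what the missing COLLATE support costs users: a collation changes only the comparison rule, not the data. The toy sketch below (not PostgreSQL code; the key function is a made-up stand-in for a real collation such as one provided by ICU) shows how a byte-wise C-locale sort and a Czech-style sort, where the digraph "ch" orders after "h", disagree on the same words.

```python
words = ["cena", "chata", "dar", "hrad"]

bytewise = sorted(words)  # C-locale behaviour: plain byte order

def czech_key(word):
    # Toy collation key: treat "ch" as one letter sorting after the
    # other "h" entries. Real collations handle far more than this.
    out, i = [], 0
    while i < len(word):
        if word[i:i+2] == "ch":
            out.append("h~")      # '~' sorts above all ASCII letters
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

czech = sorted(words, key=czech_key)
print(bytewise)  # ['cena', 'chata', 'dar', 'hrad']
print(czech)     # ['cena', 'dar', 'hrad', 'chata']
```

Without COLLATE, an index built under one of these orders is simply unusable for queries expecting the other.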
Re: [HACKERS] CommitFest rules
On Mon, Jul 7, 2008 at 6:36 PM, Josh Berkus [EMAIL PROTECTED] wrote: Dave Page wrote: it concerns me that despite it being day 3 of the July commit fest, people are still being advised to add new items to the wiki page. To make the idea work, we need to stick to the rules we defined when we came up with the concept - specifically, no new patches once the fest begins! Where are these rules posted? Personally, *I'm* not clear on them. And I'm supposed to be the coordinator. http://archives.postgresql.org/pgsql-hackers/2008-02/msg00193.php Though they're more in the form of a description of how the fest system will work, than bullet-pointed 'rules'. You'll recall I sent that after much discussion on -core. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Re: [HACKERS] [WIP] patch - Collation at database level
Tom Lane napsal(a): Gregory Stark [EMAIL PROTECTED] writes: Out of curiosity, what is a user-defined collation? Are there SQL statements to go around declaring what order code points should be sorted in? That seems like it would be... quite tedious! snip We might be best off to treat collations like index access methods, ie, they're theoretically add-able but there's no infrastructure for managing them, and what's expected is that all the ones you need are created by initdb. I thought more about it and I discussed it with Radek yesterday. The problem is that a collation must be created before a user wants to use the CREATE DATABASE ... COLLATE ... command. That inclines towards having pg_collation as a global catalog, but ANSI specifies using a schema name in the collation specification, and schemas are database specific ... It means that pg_collation has to be a non-shared catalog and a new database only inherits collations from the template db. And CREATE DATABASE has to check the list of collations in the template database :(. My conclusion is that CREATE COLLATION does not make much sense. I see two possible solutions: 1) have global and local catalogs for collations, with modified variants of the create command: CREATE COLLATION ... GLOBAL|LOCAL. CREATE DATABASE will use only collations from the global catalog. The local catalog will be useful when full collation support becomes available, mostly for specifying case sensitivity of a collation. 2) Use Tom's suggested approach: create the list of collations in the initdb phase. But there is the problem of how to obtain the list of supported collations on the server. I think all we can do is use the default locale to create the default collation for the template1 database. Any suggestion? thanks Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP
Here are the patches he made against CVS HEAD (as of today). According to him the following are fixed with the patches: - fix crash with DISTINCT - fix creating VIEW - fix the case when a recursion plan has another recursion plan under it - fix WITH RECURSIVE ...(..) SELECT ...WHERE.. returning wrong results - fix infinite recursion with OUTER JOIN Not yet fixed: - detect certain queries that are not valid according to the standard - sort query names according to the dependency - planner always estimates 0 cost for recursion plans -- Tatsuo Ishii SRA OSS, Inc. Japan [The same list in the original Japanese: raise errors for queries not permitted by SQL:2008; a mechanism to evaluate in dependency order; the planner always estimates cost 0.] On Mon, Jul 07, 2008 at 04:22:21PM +0900, Yoshiyuki Asaba wrote: Hi, test=# explain select count(*) test-# from ( WITH RECURSIVE t(n) AS ( SELECT 1 UNION ALL SELECT DISTINCT n+1 FROM t ) test(# SELECT * FROM t WHERE n < 50) as t test-# WHERE n < 100; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. ! \q this one will kill the planner :( removing the (totally stupid) distinct avoids the core dump. Thanks. I've fixed it in the local repository. Asaba-san, do you have a patch against CVS HEAD or against the previous one? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate Attachment: recursive_query-8.patch.bz2
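For reference, a bounded variant of the crashing query behaves as expected once the redundant DISTINCT is dropped and a stop condition is placed inside the recursive term. This is only an illustrative sketch (run here against SQLite, which implements the same SQL-standard WITH RECURSIVE syntax), not the patched PostgreSQL code; the bound of 49 is chosen so the recursion terminates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
count = conn.execute("""
    SELECT count(*) FROM (
        WITH RECURSIVE t(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM t WHERE n < 49   -- bound inside the CTE
        )
        SELECT * FROM t
    ) AS t WHERE n < 100
""").fetchone()[0]
print(count)  # 49 (rows n = 1 .. 49)
```

Note that in the original report the recursive term had no bound at all, so even without the planner crash the query could not terminate; the WHERE clauses outside the CTE do not stop the recursion.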
Re: [HACKERS] Proposal of SE-PostgreSQL patches [try#2]
KaiGai Kohei wrote: Might want to change the option name --enable-selinux to something like --security-context. In general, we might want to not name things selinux_* but instead sepostgresql_* or security_* or security_context_*. Or maybe PGACE? The pgace_* scheme is an attractive idea, although the server process has to provide a few more hints (like the name of the security system column and the kind of objects exported with a security attribute) to pg_dump, to support various kinds of security features with the smallest implementation. It might not be necessary to provide all the hints for pg_dump to make queries. The minimum required information is which security feature is running on the server process, or nothing. And pg_dump can add a security system column within its queries to get the security attribute, if required. Now, I'm considering adding pgaceDumpSOMETHING() functions within pg_dump for better modularity. What do you think? Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] [patch] plproxy v2
On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote: I mentioned that I planned to remove SELECT/CONNECT too. Now I've thought about it more and it seems to me that it's better to keep them, as they give additional flexibility. I very much like PL/Proxy and support your vision. Including the features of PL/Proxy in core seems like a great idea to me. If we have just a couple of commands, would it be easier to include those features by some additional attributes on pg_proc? That way we could include the features in a more native way, similar to the way we have integrated text search, without needing a plugin language at all. CREATE CLUSTER foo ... CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ... If we did that, we might also include a similar proxy feature for tables, making the feature exciting for more users than just those who can implement all their logic through functions. It would also remove the need for a specific SELECT command in PL/Proxy. CREATE TABLE bar CLUSTER foo RUN ON ANY ... If we're running a SELECT and all tables accessed run on the same cluster, we ship the whole SQL statement according to the RUN ON clause. It would effectively bring some parts of dblink into core. If the tables are not all on the same cluster we throw an error in this release, but in later releases we might introduce distributed join features and full distributed DML support. Having the PL/Proxy features available via the catalog will allow a clear picture of what runs where without parsing the function text. It will also allow things like a pg_dump of all objects relating to a cluster. Adding this feature for tables would be interesting with Hot Standby, since it would allow you to offload SELECT statements onto the standby automatically. This would be considerably easier to integrate than text search was.
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
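The routing idea behind the RUN ON clause can be sketched in a few lines. This is illustrative only; the class and method names below are made up and do not reflect the PL/Proxy API, which does the equivalent in C with a power-of-two partition count and the low bits of a hash (a simple modulo is used here).

```python
import random

class Cluster:
    def __init__(self, partitions):
        self.partitions = partitions      # e.g. DSNs of the part databases

    def run_on_hash(self, key):
        # RUN ON <expr>: hash the partitioning key to pick one partition,
        # so the same key always lands on the same node.
        return self.partitions[hash(key) % len(self.partitions)]

    def run_on_any(self):
        # RUN ON ANY: any node can answer, pick one arbitrarily.
        return random.choice(self.partitions)

foo = Cluster(["part0", "part1", "part2", "part3"])
target = foo.run_on_hash("user_42")
print(target == foo.run_on_hash("user_42"))  # True: routing is deterministic
```

A proxied table with a CLUSTER attribute would do exactly this lookup before shipping the statement.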
Re: [HACKERS] [WIP] patch - Collation at database level
Zdenek Kotala napsal(a): I thought more about it and I discussed it with Radek yesterday. The problem is that a collation must be created before a user wants to use the CREATE DATABASE ... COLLATE ... command. That inclines towards having pg_collation as a global catalog, but ANSI specifies using a schema name in the collation specification, and schemas are database specific ... It means that pg_collation has to be a non-shared catalog and a new database only inherits collations from the template db. And CREATE DATABASE has to check the list of collations in the template database :(. thinking more ... It must be a shared catalog because pg_database will depend on it. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] [WIP] patch - Collation at database level
On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote: Zdenek Kotala napsal(a): I thought more about it and I discussed it with Radek yesterday. The problem is that a collation must be created before a user wants to use the CREATE DATABASE ... COLLATE ... command. That inclines towards having pg_collation as a global catalog, but ANSI specifies using a schema name in the collation specification, and schemas are database specific ... It means that pg_collation has to be a non-shared catalog and a new database only inherits collations from the template db. And CREATE DATABASE has to check the list of collations in the template database :(. thinking more ... It must be a shared catalog because pg_database will depend on it. Not necessarily. pg_class is not shared, yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] [WIP] patch - Collation at database level
Martijn van Oosterhout napsal(a): On Tue, Jul 08, 2008 at 11:27:35AM +0200, Zdenek Kotala wrote: Not necessarily. pg_class is not shared, yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. I think the collation situation is different, because pg_database will contain a collation name column. pg_class (and all the bootstrap catalogs) only contain a row which specifies that the shared table exists, and the content is cloned to the new database from the template database. In a corner case you could get a context-specific dependency, for example if the Czech collation has oid=10 in database test01 and the Swedish collation has oid=10 in database test02. How to handle CREATE DATABASE and connecting to a database then? OK, it shouldn't happen in a normal situation, but ... Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] [WIP] patch - Collation at database level
On Tue, Jul 08, 2008 at 12:00:34PM +0200, Zdenek Kotala wrote: Not necessarily. pg_class is not shared, yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. I think the collation situation is different, because pg_database will contain a collation name column. pg_class (and all the bootstrap catalogs) only contain a row which specifies that the shared table exists, and the content is cloned to the new database from the template database. In a corner case you could get a context-specific dependency, for example if the Czech collation has oid=10 in database test01 and the Swedish collation has oid=10 in database test02. How to handle CREATE DATABASE and connecting to a database then? OK, it shouldn't happen in a normal situation, but ... Oh I see, you're referring to the storage of the default collation for a database. I was jumping ahead to the per-column collation state, where the collation default is attached to columns, types and domains, and not at the database level. So there the problem does not exist. To be honest, I'd suggest storing the collation in pg_database as a string, rather than as an identifier. This sidesteps the problem entirely. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] [WIP] patch - Collation at database level
Martijn van Oosterhout napsal(a): Oh I see, you're referring to the storage of the default collation for a database. I was jumping ahead to the per-column collation state, where the collation default is attached to columns, types and domains, and not at the database level. So there the problem does not exist. Yeah, but you still need one source/one collation list for database, schema, table and column. And of course shared tables also need collations for their indexes. To be honest, I'd suggest storing the collation in pg_database as a string, rather than as an identifier. This sidesteps the problem entirely. I don't think a string is a good idea. You need to use the same approach on all levels, in my opinion. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements
On Mon, 2008-07-07 at 10:51 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-07-07 at 11:03 +0900, ITAGAKI Takahiro wrote: One issue is the tag field. The type is now uint32. It's enough in my plugin, but if some people need to add more complex structures in PlannedStmt, a Node type would be better rather than uint32. Which is better? I was imagining that tag was just an index to another data structure, but probably better if it's a pointer. I don't want the tag there at all, much less converted to a pointer. What would the semantics be of copying the node, and why? Please justify why you must have this and can't do what you want some other way. Agreed. If we have plugins for planner and executor we should be able to pass information around in the background. We have mechanisms for two plugins to rendezvous, so we can use that if they're completely separate plugins. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
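The hook-chaining convention that lets separate plugins coexist without a shared tag field can be sketched as follows. This is a Python illustration of the pattern only; the real mechanism is C function pointers (e.g. ExecutorRun_hook), and the names below are made up.

```python
executor_hook = None            # stands in for the global hook variable
calls = []

def standard_executor(plan):
    calls.append(("standard", plan))

def load_plugin(name):
    # Each plugin, at load time, remembers whoever was installed before
    # it and chains to them, so plugins compose without knowing about
    # each other.
    global executor_hook
    prev = executor_hook
    def hook(plan):
        calls.append((name, plan))
        if prev is not None:
            prev(plan)                 # chain to the previous plugin
        else:
            standard_executor(plan)    # bottom of the chain
    executor_hook = hook

load_plugin("pg_stat_statements")
load_plugin("my_profiler")
(executor_hook or standard_executor)("SELECT 1")
print(calls)
# [('my_profiler', 'SELECT 1'), ('pg_stat_statements', 'SELECT 1'),
#  ('standard', 'SELECT 1')]
```

Each plugin keeps whatever private state it needs keyed off the plan it sees, which is the "pass information around in the background" approach, rather than stashing a tag inside PlannedStmt.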
Re: [HACKERS] 8.1 index corruption woes
On Jul 7, 2008, at 7:39 PM, Tom Lane wrote: Another point to keep in mind, if you are trying to analyze files belonging to a live database, is that what you can see in the filesystem may not be the current contents of every page. For typical access patterns it'd be unsurprising for the visible index pages to lag behind those of the heap, since they'd be hotter and tend to stay in shared buffers longer. None of the tests were done on the production database. Most of the checks were not done on a PITR restore; they were done on a SAN-level snapshot that had been run through the recovery process (start the postmaster on the snapshot, let it recover, shut down). I hadn't thought about checkpointing; I'll make sure to do that next time we take a snapshot. We also analyzed a single table from a completely different (much larger) database. In that case the analysis was done on a PITR-recovered slave that was up and running, but nothing should have been writing to the table at all, and it would have been up long enough that it would have checkpointed after exiting PITR recovery (though IIRC there's a manual checkpoint done at exit of PITR recovery anyway). That check didn't show as many questionable index pointers, but there were some (again, the bulk of them were index pointers that were using the first line pointer slot in the index page). -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] CommitFest rules
On Mon, 2008-07-07 at 11:03 -0700, Josh Berkus wrote: Brendan, What I'm saying is, I've got the opportunity and the inclination to make the wiki more usable. If you guys want better navigation, there's a suggestion on the table to improve that. I'm not too worried about the wiki eventually being ditched for something better -- in fact I'm looking forward to it. Oh, please go ahead. I just don't want people to get *too* comfortable with the wiki ;-) I can tell you from the perspective of CF coordinator, it's a PITA. I felt so too at first, but I'm comfortable with it now. Writing a working PHP app will take much longer and will be a PITA also. Not sure if I've done everything correctly though :-) It would be very useful if people would add a comment like "started my review of X now". Otherwise people don't know whether the reviewer is actually reviewing it at length, or just intends to at some point. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: New relkind (was Re: [HACKERS] Exposing quals)
Simon Riggs wrote: On Mon, 2008-07-07 at 16:26 -0700, David Fetter wrote: On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote: For the record, I agree with Jan's suggestion of passing a pointer to the parse tree, and offline gave David a suggestion verbally as to how this could be handled for PL/PerlU. I don't think we should be tied too closely to a string representation, although possibly the first and simplest callback function would simply stringify the quals. As I understand Jan's plan, the idea is to create a new relkind with an exit to user code at leaf nodes in the plan tree. This would require an API design for both user C code and for each PL to use, but would then allow PostgreSQL's optimizer to work on JOINs, etc. Jan, have I got that right so far? Do you have something in the way of a rough patch, docs, etc. for this? It sounds like we can make it happen as text for other DBMSs and as plan nodes for PostgreSQL, which is the best solution all round. Personally not too worried which way we do this - as long as we do it for 8.4 :-) It's obviously happening in the background, so I'll leave it alone. I think the concept involving the plan tree is gold. Hannu Krosing mentioned some idea like that recently as well. If the function had the chance to tell the planner how it is gonna operate (e.g. produces sorted output, etc.) it would be perfect. The golden thing here would be if we could teach a function whether it is STREAMABLE | NOT STREAMABLE. STREAMABLE would make sure that we don't have to materialize the output of a set returning function. this would allow google-like analysis in postgresql easily, by letting us fetch any amount of data from any data source.
best regards, hans -- Cybertec Schönig & Schönig GmbH PostgreSQL Solutions and Support Gröhrmühlgasse 26, A-2700 Wiener Neustadt Tel: +43/1/205 10 35 / 340 www.postgresql-support.de, www.postgresql-support.com
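The STREAMABLE vs. NOT STREAMABLE distinction (hypothetical syntax aside) is exactly the difference between a function that materializes its result set and one that produces rows on demand; Python generators illustrate the semantics directly.

```python
import itertools

def materialized_srf(n):
    # NOT STREAMABLE: every row exists in memory before the caller
    # sees the first one.
    return [i * i for i in range(n)]

def streamable_srf(n):
    # STREAMABLE: one row at a time, produced only when the consumer
    # asks for it.
    for i in range(n):
        yield i * i

# A LIMIT-style consumer of a streamable function never touches the
# rest of the (here, absurdly large) result set:
first_three = list(itertools.islice(streamable_srf(10**9), 3))
print(first_three)  # [0, 1, 4]
```

Calling materialized_srf(10**9) the same way would first try to build a billion-row list, which is the materialization cost the proposal wants to avoid for set-returning functions over external data sources.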
Re: [HACKERS] [PATCHES] Extending grant insert on tables to sequences
Jaime Casanova wrote: On Thu, May 22, 2008 at 1:18 PM, Jaime Casanova [EMAIL PROTECTED] wrote: Hi, The idea of this patch is to avoid the need to make explicit grants on sequences owned by tables. I've noted that the patch i attached is an older version that doesn't compile because of a typo... Re-attaching the right patch, with fixed documentation to indicate the new behaviour... I had a look at this patch and it looks good. The only thing that's not clear to me is whether we have agreed we want this to be the default behavior? A quibble: + foreach(cell, istmt.objects) + { + [...] + + istmt_seq.objects = getOwnedSequences(lfirst_oid(cell)); + if (istmt_seq.objects != NIL) + { + if (istmt.privileges & (ACL_INSERT)) + istmt_seq.privileges |= ACL_USAGE; + else if (istmt.privileges & (ACL_UPDATE)) + istmt_seq.privileges |= ACL_UPDATE; + else if (istmt.privileges & (ACL_SELECT)) + istmt_seq.privileges |= ACL_SELECT; + + ExecGrantStmt_oids(istmt_seq); + } Wouldn't it be clearer to build a list with all the sequences owned by the tables in istmt.objects, and then call ExecGrantStmt_oids() a single time with the big list? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
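The suggested restructuring can be sketched like this. This is not the C patch, just a Python illustration of the quibble: gather the owned sequences of every table first, then make one batched grant call with the whole list. One deliberate deviation, flagged here: the patch's if/else chain maps only the first matching privilege, while this sketch ORs in every applicable bit; the ACL constants and the table-to-sequence map are made up for the example.

```python
ACL_INSERT, ACL_SELECT, ACL_UPDATE, ACL_USAGE = 1, 2, 4, 8

owned_sequences = {            # hypothetical table-oid -> sequence-oids map
    1001: [2001],
    1002: [2002, 2003],
}

def sequence_privs(table_privs):
    # Map table privileges onto the privileges a serial sequence needs:
    # INSERT -> USAGE (nextval), UPDATE -> UPDATE, SELECT -> SELECT.
    privs = 0
    if table_privs & ACL_INSERT:
        privs |= ACL_USAGE
    if table_privs & ACL_UPDATE:
        privs |= ACL_UPDATE
    if table_privs & ACL_SELECT:
        privs |= ACL_SELECT
    return privs

def grant_on_tables(table_oids, table_privs):
    # Build one big list, then a single "ExecGrantStmt_oids"-style call,
    # instead of one call per table.
    seqs = [s for t in table_oids for s in owned_sequences.get(t, [])]
    if seqs:
        return ("grant", sequence_privs(table_privs), seqs)
    return None

result = grant_on_tables([1001, 1002], ACL_INSERT | ACL_SELECT)
print(result)  # ('grant', 10, [2001, 2002, 2003])  -- USAGE|SELECT = 8|2
```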
Re: [HACKERS] [WIP] patch - Collation at database level
Zdenek Kotala [EMAIL PROTECTED] writes: Martijn van Oosterhout napsal(a): Not necessarily. pg_class is not shared yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. I think, Collation situation is different, All the argument here is based on the premise that we should have database-level collation specifications, which AFAICS is not required nor suggested by the SQL spec. I wonder why we are allowing a nonstandard half-measure to drive our thinking, rather than solving the real problem which is column-level collations. regards, tom lane
Re: [HACKERS] CommitFest rules
Simon Riggs [EMAIL PROTECTED] writes: It would be very useful if people would add a comment like started my review of X now. Otherwise people don't know whether the reviewer is actually reviewing it at length, or just intend to at some point. That's a good idea, imho, I'll try to do that -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Re: [HACKERS] [WIP] patch - Collation at database level
Zdenek Kotala [EMAIL PROTECTED] writes: ... And of course shared tables need also collation for their indexes. No, they don't, because the only textual indexes on shared catalogs are on name columns, which are intentionally not locale aware, and wouldn't be collation aware either. regards, tom lane
Re: [HACKERS] Exposing quals
Simon Riggs wrote: The notes say Heikki doesn't think this is a long term solution, but in the following discussion it was the *only* way of doing this that will work with non-PostgreSQL databases. So it seems like the way we would want to go, yes? How did you come to the conclusion that this is the only way that will work with non-PostgreSQL databases? I don't see any limitations like that in any of the proposed approaches. I guess I should clarify my position on this: We should start moving towards a full SQL:MED solution that will ultimately support pushing down joins, aggregates etc. to the remote database. Including support for transaction control, using 2PC, and cost estimation and intelligent planning. This should be done in an extensible way, so that people can write their own plugins to connect to different RDBMSs, as well as simple data sources like flat files. The plugin needs to be able to control which parts of a plan tree can be pushed down to the remote source, estimate the cost of remote execution, and map remote data types to local ones. And it then needs to be able to construct and execute the remote parts of a plan. We're obviously not going to get all that overnight, but whatever we implement now should be the first step towards that, rather than something that we need to deprecate and replace in the future. Unfortunately I don't see a way to extend the proposed exposing quals to functions patch to do more than just that. The list of functionality a full-blown plugin will need is quite long. I don't think there's any hope of supporting all that without reaching into some PostgreSQL internal data structures, particularly the planner structures like RelOptInfo, Path and Plan. The plugins will be more tightly integrated into the system than say user defined data types. They will need to be written in C, and they will be somewhat version dependent. 
Simpler plugins, like one to read CSV files, with no pushing down and no update support, will need less access to internals, and thus will be less version dependent, so pgfoundry projects like that will be feasible. Note that the dependency on internal data structures doesn't go away by saying that they're passed as text; the text representation of our data structures is version dependent as well. So what would the plugin API look like? To hook into the planner, I'm envisioning the plugin would define these functions:

/*
 * Generate a remote plan for executing a whole subquery remotely. For
 * example, if the query is an aggregate, we might be able to execute
 * the whole aggregate in the remote database. This will be called
 * from grouping_planner(), like optimize_minmax_aggregates().
 * Returns NULL if remote execution is not possible. (A dummy
 * implementation can always return NULL.)
 */
Plan *generate_remote_plan(PlannerInfo *, List *tlist);

/*
 * Generate a path for executing one relation in the remote
 * database. The relation can be a base (non-join) remote relation,
 * or a join involving a remote relation. Can return NULL for join
 * relations if the join can't be executed remotely.
 */
Path *generate_remote_path(PlannerInfo *, RelOptInfo *);

/*
 * Create a Plan node from a Path. Called from create_plan, when
 * the planner chooses to use a remote path. A typical implementation
 * would create the SQL string to be executed in the remote database,
 * and return a RemotePlan node with that SQL string in it.
 */
Plan *create_remote_plan(PlannerInfo *, RemotePath *);

On the execution side, the plugin needs to be able to execute a previously generated RemotePlan. There would be a new executor node type, a RemoteScan, that would be similar to a seq scan or index scan, but delegates the actual execution to the plugin.
The execution part of the plugin API would reflect the API of executor nodes, something like:

void *scan_open(RemotePlan *);
HeapTuple scan_getnext(void *scanstate);
void scan_close(void *scanstate);

The presumption here is that you would define remote tables with the appropriate SQL:MED statements beforehand (CREATE FOREIGN TABLE). However, it is flexible enough that you could implement the "exposing quals to functions" functionality with this as well: generate_remote_path() would need to recognize the function scans that it can handle, and return a RemotePath struct with all the same information as create_functionscan_path does (the cost estimates could be adjusted for the pushed-down quals at this point as well). create_remote_plan() would return a FunctionScan node, but with the extra qualifiers passed into the function as arguments. In the case of dblink, it could just add extra WHERE clauses to the query that's being passed as an argument. I'm not proposing that we do the stuff described in this paragraph, just using it as an example of the flexibility. BTW, I think the exposing quals to functions functionality could be
Re: [HACKERS] [WIP] patch - Collation at database level
Tom Lane wrote: All the argument here is based on the premise that we should have database-level collation specifications, which AFAICS is not required nor suggested by the SQL spec. I wonder why we are allowing a nonstandard half-measure to drive our thinking, rather than solving the real problem which is column-level collations. Agreed. Are we even sure that we want per-database collations as a half-way house? Unless we can be sure that we want all the required catalog changes for the full requirement, it seems to me a rather messy way of getting to where we want to go. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
On Tue, Jul 08, 2008 at 10:54:28AM -0400, Andrew Dunstan wrote: Agreed. Are we even sure that we want per-database collations as a half-way house? Unless we can be sure that we want all the required catalog changes for the full requirement, it seems to me a rather messy way of getting to where we want to go. Given that the current projected timeframe for full COLLATE support is something like 2-3 major releases, I don't really see the problem with doing this now. Who knows, it might prompt people to do something sooner. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] Exposing quals
On Tue, 2008-07-08 at 17:51 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: The notes say Heikki doesn't think this is a long term solution, but in the following discussion it was the *only* way of doing this that will work with non-PostgreSQL databases. So it seems like the way we would want to go, yes? How did you come to the conclusion that this is the only way that will work with non-PostgreSQL databases? SQL, in text form, is the interface to other databases. You can't pass half a plan tree to Oracle, especially not a PostgreSQL plan tree. It has to be text if you wish to send a query to another RDBMS, or another version of PostgreSQL. We should start moving towards a full SQL:MED solution that will ultimately support pushing down joins, aggregates etc. to the remote database. Including support for transaction control, using 2PC, and cost estimation and intelligent planning. This should be done in an extensible way, so that people can write their own plugins to connect to different RDBMSs, as well as simple data sources like flat files. The plugin needs to be able to control which parts of a plan tree can be pushed down to the remote source, estimate the cost of remote execution, and map remote data types to local ones. And it then needs to be able to construct and execute the remote parts of a plan. So if I understand you, you want to pass the partial plan tree and then have a plugin construct the SQL text. Sounds like a great approach. Maybe you thought I meant internal interfaces should be in text? No, that would be bizarre. I meant we should not attempt to pass partial plan trees outside of the database, since that would limit the feature to only working with the same version of PostgreSQL database. I support your wish to have something that can work with other types of database. 
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing quals
Simon Riggs wrote: SQL, in text form, is the interface to other databases. You can't pass half a plan tree to Oracle, especially not a PostgreSQL plan tree. It has to be text if you wish to send a query to another RDBMS, or another version of PostgreSQL. Oh, I see. Agreed. Though note that there are big differences in SQL dialects, so a one-size-fits-all approach to generating SQL to be executed in the remote database won't work. (Not that I think anyone has suggested that.) So if I understand you, you want to pass the partial plan tree and then have a plugin construct the SQL text. Exactly. Maybe you thought I meant internal interfaces should be in text? Yeah, that's exactly what I thought you meant. No, that would be bizarre. I meant we should not attempt to pass partial plan trees outside of the database, since that would limit the feature to only working with the same version of PostgreSQL database. Agreed. I'm glad we're on the same page now. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote: On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote: I very much like PL/Proxy and support your vision. Including the features of PL/Proxy in core seems like a great idea to me. Adding this feature for tables would be interesting with Hot Standby, since it would allow you to offload SELECT statements onto the standby automatically. This would be considerably easier to integrate than text search was. First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/ should we ever get around to that). Sincerely, Joshua D. Drake -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On 7/8/08, Simon Riggs [EMAIL PROTECTED] wrote: On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote: I mentioned that I planned to remove SELECT/CONNECT too. Now I've thought about it more and it seems to me that it's better to keep them, as they give additional flexibility. I very much like PL/Proxy and support your vision. Including the features of PL/Proxy in core seems like a great idea to me. If we have just a couple of commands, would it be easier to include those features by some additional attributes on pg_proc? That way we could include the features in a more native way, similar to the way we have integrated text search, without needing a plugin language at all.

CREATE CLUSTER foo ...
CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...

If we did that, we might also include a similar proxy feature for tables, making the feature exciting for more users than just those who can accept implementing all logic through functions. It would also remove the need for a specific SELECT command in PL/Proxy.

CREATE TABLE bar CLUSTER foo RUN ON ANY ...

If we're running a SELECT and all tables accessed run on the same cluster, we ship the whole SQL statement according to the RUN ON clause. It would effectively bring some parts of dblink into core. If all tables are not on the same cluster, we throw an error in this release, but in later releases we might introduce distributed join features and full distributed DML support. Having the PL/Proxy features available via the catalog will allow a clear picture of what runs where without parsing the function text. It will also allow things like a pg_dump of all objects relating to a cluster. Adding this feature for tables would be interesting with Hot Standby, since it would allow you to offload SELECT statements onto the standby automatically. This would be considerably easier to integrate than text search was. Interesting proposal. First I want to say: we can forget the SELECT/CONNECT statements when discussing this approach.
They are in because they were easy to add and gave some additional flexibility. But they are not important. If they don't fit some new approach, there is no problem dropping them. So that leaves functions in the form:

CLUSTER expr;
RUN ON expr;

and potentially SPREAD BY as discussed in: http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/93.html which sends different arguments to different partitions. I'm not yet sure it's a worthwhile addition, but I work mostly on OLTP databases and that feature would target OLAP ones. So I'll let others decide. Now a few technical points about your proposal:

- One feature that the current function-based configuration approach gives us is that we can manage cluster configuration centrally and replicate it to the actual proxy databases. And this is something I would like to keep. This could also be backed by a plain table or functions behind the scenes.

- How about CREATE REMOTE FUNCTION / TABLE .. ; for syntax?

- Currently both hash and cluster selection expressions can be quite free-form, so parsing them out into some pg_proc field would not actually be much help.

And some philosophical points:

- PL/Proxy's main use case is complex read-write transactions in an OLTP setting. But remote tables/views target simple read-only transactions with free-form queries.

- PL/Proxy has a concrete argument list and free-form cluster and partition selection. Remote tables have free-form arguments; maybe they want more rigid cluster/partition selection?

If the syntax and backend implementation can be merged, that's good, but it should not be forced. So before we start adding syntax to core, maybe it would be good to have a concrete idea of how the remote tables will look and what representation they want for a cluster? Especially if you want to do stuff like distributed joins. OTOH, if you say that the current PL/Proxy approach fits remote tables as well, I'm not against doing it at the SQL level.
-- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing quals
On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote: Simon Riggs wrote: SQL, in text form, is the interface to other databases. You can't pass half a plan tree to Oracle, especially not a PostgreSQL plan tree. It has to be text if you wish to send a query to another RDBMS, or another version of PostgreSQL. Oh, I see. Agreed. Though note that there's big differences in SQL dialects, so a one-size-fits-all approach to generating SQL to be executed in the remote database won't work. (not that I think anyone has suggested that) So if I understand you, you want to pass the partial plan tree and then have a plugin construct the SQL text. Exactly. Maybe you thought I meant internal interfaces should be in text? Yeah, that's exactly what I thought you meant. No, that would be bizarre. I meant we should not attempt to pass partial plan trees outside of the database, since that would limit the feature to only working with the same version of PostgreSQL database. Agreed. I'm glad we're on the same page now. Everybody's weighed in on this thread except the guy who's actually doing the work. Jan? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On 7/8/08, Joshua D. Drake [EMAIL PROTECTED] wrote: On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote: On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote: I very much like PL/Proxy and support your vision. Including the features of PL/Proxy in core seems like a great idea to me. Adding this feature for tables would be interesting with Hot Standby, since it would allow you to offload SELECT statements onto the standby automatically. This would be considerably easier to integrate than text search was. First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/ should we ever get around to that). I'm not against contrib/, considering that the docs are now nicely integrated, but then, what's the difference between src/pl/ and contrib/? OTOH, if you argue LANGUAGE plproxy vs. CREATE REMOTE FUNCTION, then that's a different matter. It seems to me we should do REMOTE FUNCTION after, not before, REMOTE TABLE, as the table/view implementation needs to dictate the actual details. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote: and potentially SPREAD BY as discussed in: http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/93.html That *sounds* cool, but it's just the first part of the implementation of a massively parallel executor. You'll quickly end up wanting to do something else as well. Redistributing data is the hard part of a hard problem. I'd steer clear of that. Skytools are good because they do simple things well. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote: On 7/8/08, Joshua D. Drake [EMAIL PROTECTED] wrote: First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/ should we ever get around to that). I'm not against contrib/, considering that the docs are now nicely integrated, but then, what's the difference between src/pl/ and contrib/? I am actually against adding to the grammar, which is what Simon was suggesting. If it wants to go into src/pl I wouldn't have a problem with that. Sorry for not being more clear. Joshua D. Drake -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On Tue, 2008-07-08 at 08:58 -0700, Joshua D. Drake wrote: On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote: On 7/8/08, Joshua D. Drake [EMAIL PROTECTED] wrote: First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/ should we ever get around to that). I'm not against contrib/ considering that the docs are now nicely integrated, but then, whats the difference between src/pl/ and contrib/? I am actually against adding to the grammar Why? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] Extending grant insert on tables to sequences
On 7/8/08, Alvaro Herrera [EMAIL PROTECTED] wrote: Jaime Casanova wrote: On Thu, May 22, 2008 at 1:18 PM, Jaime Casanova [EMAIL PROTECTED] wrote: Hi, The idea of this patch is to avoid the need to make explicit grants on sequences owned by tables. I've noted that the patch I attached is an older version that doesn't compile because of a typo... Re-attaching the right patch, with the documentation fixed to indicate the new behaviour... I had a look at this patch and it looks good. The only thing that's not clear to me is whether we have agreed we want this to be the default behavior? mmm... I don't remember where I took the equivalences from... I will review whether there is any consensus on that... anyway, now is when people should speak up about it... Wouldn't it be clearer to build a list with all the sequences owned by the tables in istmt.objects, and then call ExecGrantStmt_oids() a single time with the big list? I will look at the code for this tonight... -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Guayaquil - Ecuador Cel. (593) 87171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On 7/8/08, Simon Riggs [EMAIL PROTECTED] wrote: On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote: and potentially SPREAD BY as discussed in: http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/93.html That *sounds* cool, but it's just the first part of the implementation of a massively parallel executor. You'll quickly end up wanting to do something else as well. Redistributing data is the hard part of a hard problem. I'd steer clear of that. Skytools are good because they do simple things well. Well, for PL/Proxy it would be the _last_ part. Yes, the user can now build a parallel OLAP executor, but all of this will be up to the user. PL/Proxy itself will stay dumb and simple. It would not need to do any guesswork; all the data will be provided by the user. The amount of code needed to make SPREAD work would be minimal; mostly refactoring of existing code is needed. So it fits the current design. The point is: PL/Proxy already executes a single query with the same arguments in parallel. With the SPREAD feature it could execute a single query with different arguments in parallel. And the next step, executing different queries in parallel, does not make sense for PL/Proxy, as its main concept is function calls, not queries. But of course we can decide we don't want to go that way, and that's OK also. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] plproxy v2
On 7/8/08, Joshua D. Drake [EMAIL PROTECTED] wrote: On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote: On 7/8/08, Joshua D. Drake [EMAIL PROTECTED] wrote: First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/ should we ever get around to that). I'm not against contrib/ considering that the docs are now nicely integrated, but then, whats the difference between src/pl/ and contrib/? I am actually against adding to the grammar which is what Simon was suggesting. If it wants to go into src/pl I wouldn't have a problem with that. Sorry for not being more clear. Current patch is for src/pl/. Diffstat for patch v2:

 doc/src/sgml/filelist.sgml                         |    1
 doc/src/sgml/plproxy.sgml                          |  221 ++
 doc/src/sgml/postgres.sgml                         |    1
 src/include/catalog/pg_pltemplate.h                |    1
 src/pl/Makefile                                    |    2
 src/pl/plproxy/Makefile                            |   89 ++
 src/pl/plproxy/cluster.c                           |  469 +
 src/pl/plproxy/execute.c                           |  724 +
 src/pl/plproxy/expected/plproxy_clustermap.out     |   71 ++
 src/pl/plproxy/expected/plproxy_dynamic_record.out |   51 +
 src/pl/plproxy/expected/plproxy_errors.out         |   66 +
 src/pl/plproxy/expected/plproxy_init.out           |    2
 src/pl/plproxy/expected/plproxy_many.out           |  116 +++
 src/pl/plproxy/expected/plproxy_select.out         |   37 +
 src/pl/plproxy/expected/plproxy_test.out           |  312 +
 src/pl/plproxy/function.c                          |  479 +
 src/pl/plproxy/main.c                              |  214 ++
 src/pl/plproxy/parser.y                            |  203 +
 src/pl/plproxy/plproxy.h                           |  301
 src/pl/plproxy/poll_compat.c                       |  140
 src/pl/plproxy/poll_compat.h                       |   58 +
 src/pl/plproxy/query.c                             |  316 +
 src/pl/plproxy/result.c                            |  222 ++
 src/pl/plproxy/rowstamp.h                          |   27
 src/pl/plproxy/scanner.l                           |  320 +
 src/pl/plproxy/sql/plproxy_clustermap.sql          |   56 +
 src/pl/plproxy/sql/plproxy_dynamic_record.sql      |   43 +
 src/pl/plproxy/sql/plproxy_errors.sql              |   63 +
 src/pl/plproxy/sql/plproxy_init.sql                |   57 +
 src/pl/plproxy/sql/plproxy_many.sql                |   66 +
 src/pl/plproxy/sql/plproxy_select.sql              |   37 +
 src/pl/plproxy/sql/plproxy_test.sql                |  200 +
 src/pl/plproxy/type.c                              |  336 +
 33 files changed, 5299 insertions(+), 2 modifications(!)

-- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Andrew, Tom, Agreed. Are we even sure that we want per-database collations as a half-way house? Unless we can be sure that we want all the required catalog changes for the full requirement, it seems to me a rather messy way of getting to where we want to go. Given that we don't have a delivery date for table- or column-level collations, we don't want to turn down database-level collations. If nothing else, Radek's work will expose which areas of our code are collation-dependent and hopefully make the work of more granular collations easier. And if it takes us 3 years to get more granular collations, at least people can use database-level ones in the meantime so that they don't need to have separate PostgreSQL binaries for every language they want to support fully. Also ... this is a Summer of Code project, which we accepted, which at least in Google's and the student's eyes means we're not going to discard the entire premise of the patch. I'm not exaggerating when I say doing something like that could get PostgreSQL permanently banned from Google SoC. Tom, I think you need to be on the SoC committee in the future, just to raise objections. Some 15+ PostgreSQL contributors on the SoC committee approved Radek's project. -- Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Tom Lane wrote: Zdenek Kotala [EMAIL PROTECTED] writes: Martijn van Oosterhout napsal(a): Not necessarily. pg_class is not shared yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. I think, Collation situation is different, All the argument here is based on the premise that we should have database-level collation specifications, which AFAICS is not required nor suggested by the SQL spec. I wonder why we are allowing a nonstandard half-measure to drive our thinking, rather than solving the real problem which is column-level collations. Wouldn't you still need per-database and per-table default collations? At least MySQL does have such a concept. Best Regards Michael Paesold -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets
Tom, Indeed. If the Solaris folk feel that getupeercred() is insecure, they had better explain why their kernel is that broken. This is entirely unrelated to the known shortcomings of the ident IP protocol. The Solaris security kernel folks do, actually. However, there's no question that TRUST is inherently insecure, and that's what people are going to use if they can't get IDENT to work. -- Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal of SE-PostgreSQL patches [try#2]
All, The pgace_* scheme is an attractive idea, although the server process has to provide a few more hints to pg_dump (like the name of the security system column and the kind of objects exported with a security attribute) to support various kinds of security features with the smallest implementation. If we have a choice, it should be pg_ace_*. We've told developers that they can expect system stuff to be named pg_*; let's stick to that whenever we can. -- Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Tom Lane wrote: Zdenek Kotala [EMAIL PROTECTED] writes: ... And of course shared tables also need collation for their indexes. No, they don't, because the only textual indexes on shared catalogs are on name columns, which are intentionally not locale-aware, and wouldn't be collation-aware either. Yeah, name uses strcmp, which is not locale-aware, but from the ANSI perspective there is the SQL_IDENTIFIER collation for it, which is fortunately implementation-defined. What I now see as the problem is that we also need to know the correct collation for ORDER BY, for example: select * from pg_shdescription order by description; ...thinking... but it should be solved by per-column collation, which will work well with the pg_attribute cloning for a new database that Martijn mentioned. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Identifier case folding notes
I have had some idle thoughts on the issue of identifier case folding. Maybe we can collect our ideas and inch closer to a solution sometime. Or we determine that it's useless and impossible, but then I can at least collect that result in the wiki and point future users to it.

Background: PostgreSQL folds unquoted identifiers to lower case. SQL specifies folding to upper case, and most other SQL DBMSs do it that way. If an application mixes quoted and unquoted versions of an identifier when referring to an object, incompatibilities arise. We have always stressed to users that one should refer to an object always unquoted or always quoted. While that remains a good suggestion for a number of reasons, we have seen occasional complaints that some closed-source applications violate this rule and therefore cannot be run on PostgreSQL.

A number of solutions have been proposed over time, which I summarize here:

1. Change the lexer to fold to upper case, as it is supposed to do according to the SQL standard. This will break almost everything, because almost all built-in objects have lower-case names and thus couldn't be referred to anymore except by quoting. Changing the names of all the internal objects to upper-case names would involve vast code changes, probably break just as much, and make everything uglier. So this approach is unworkable.

2. Fold to upper case, but not when referring to built-in objects. The lexer generally doesn't know what a name will refer to, so this is not possible to implement, at least without resorting to lots of hard-coding or horrid kludges. Also, a behavior like this would probably create all kinds of weird inconsistencies, resulting from putting catalog knowledge in the lexer.

Options 3 and 4 are two variants of ignoring case altogether:

3. Fold all identifiers to lower case, even quoted ones.
This would probably in fact fix the breakage of many of the above-mentioned problem applications, and it would in general be very easy for a user to understand. And it could be implemented in about three lines. One disadvantage is that one could no longer have objects whose names differ only by case, but that is probably rare and incredibly stupid and can be neglected. The main disadvantage is that the case of identifiers and in particular column labels is lost. So applications and programming interfaces that look up result columns in a case-sensitive manner would fail. And things like SELECT expr AS "Nice Heading" won't work properly anymore.

4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have to be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time.

5. One additional approach I thought of is that you swap the case of identifiers as you lex them (upper to lower, lower to upper), and then swap them back when you send them to the client. This needs a small change in the lexer, one for sending the RowDescription, and support in pg_dump and a few other places if desired. There will, however, be a number of weird, albeit self-imposed, side effects. I have implemented a little test patch for this. It's weird, but it works in basic ways.

Obviously, no solution will ever work completely. And we probably don't want such a solution, because it would create two different and incompatible PostgreSQL universes.
If we are aiming for a solution that would allow most affected applications to hobble along, we would probably serve most users. Implementing some or all of 3, 4, and 5 would probably achieve that. Comments? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets
Josh Berkus wrote: Tom, Indeed. If the Solaris folk feel that getupeercred() is insecure, they had better explain why their kernel is that broken. This is entirely unrelated to the known shortcomings of the ident IP protocol. The Solaris security kernel folks do, actually. However, there's no question that TRUST is inherently insecure, and that's what people are going to use if they can't get IDENT to work. I think I'd pose a slightly different question from Tom. Do the Solaris devs think that their getupeercred() is more insecure than the more or less equivalent calls that we are doing on Linux and *BSD for example? I suspect they probably don't ;-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
On Tue, 2008-07-08 at 19:25 +0200, Peter Eisentraut wrote: 4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time. That sounds the most workable, given your descriptions. If objects are never different solely by case alone, then you will have the same index ordering as if you had sent them all to lower case. Surely it is possible to mix the two approaches somehow? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: I have had some idle thoughts on the issue of identifier case folding. ... Comments? IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. What I think would perhaps be worth investigating is a compile-time (or at latest initdb-time) option that flips the case folding behavior to SQL-spec-compliant and also changes all the built-in catalog entries to upper case. We would then have a solution we could offer to people who really need to run apps that depend on SQL-spec case folding ... and if the upper case hurts their eyes, or breaks some other apps that they wish they could run in the same DB, well it's their problem. Of course there would be large amounts of work to try to make psql, pg_dump, etc behave as nicely as possible with either case-folding rule, but it doesn't strike me as being so obviously unworkable as to be dismissed at once. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Josh Berkus [EMAIL PROTECTED] writes: Andrew, Tom, Agreed. Are we even sure that we want per-database collations as a half-way house? Unless we can be sure that we want all the required catalog changes for the full requirement, it seems to me a rather messy way of getting to where we want to go. Given that we don't have a delivery date for table or column level collations, we don't want to turn down database-level collations. I am one hundred percent prepared to turn them down, if they end up contorting the design in a way that we will have to undo (with consequent backwards-compatibility problems) to get to the full feature. If it's a partial implementation of the full feature, that's fine, but I'm not getting good vibes about that from the discussions so far. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
On Tuesday, 8 July 2008, Tom Lane wrote: IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. Well, no. As I have mentioned, there have actually been occasional complaints from people who can't run code generated by closed-source applications, because those applications handle identifier case differently. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: On Tuesday, 8 July 2008, Tom Lane wrote: IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. Well, no. As I have mentioned, there have actually been occasional complaints from people who can't run code generated by closed-source applications, because those applications handle the case differently. Sure, otherwise we wouldn't really be worrying about this. But if someone comes to us and says "this closed source app requires some weird non-spec-compliant case folding rule, please make Postgres do that", we're going to say no. Their argument only has weight if they say their app expects the SQL-spec behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing quals
On 7/8/2008 11:38 AM, David Fetter wrote: On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote: Simon Riggs wrote: SQL, in text form, is the interface to other databases. You can't pass half a plan tree to Oracle, especially not a PostgreSQL plan tree. It has to be text if you wish to send a query to another RDBMS, or another version of PostgreSQL. Oh, I see. Agreed. Though note that there's big differences in SQL dialects, so a one-size-fits-all approach to generating SQL to be executed in the remote database won't work. (not that I think anyone has suggested that) So if I understand you, you want to pass the partial plan tree and then have a plugin construct the SQL text. Exactly. Maybe you thought I meant internal interfaces should be in text? Yeah, that's exactly what I thought you meant. No, that would be bizarre. I meant we should not attempt to pass partial plan trees outside of the database, since that would limit the feature to only working with the same version of PostgreSQL database. Agreed. I'm glad we're on the same page now. Everybody's weighed in on this thread except the guy who's actually doing the work. Jan? Here, I talked to my supervisor here in Toronto (that's where I am this week) and Afilias actually sees enough value in this for me to go and spend time officially on it. The ideas I have so far are as follows: Down in the exec nodes like SeqScan or IndexScan, there are several parts available that are important. - Scanned relation - Targetlist - Filter (for SeqScan) - IndexQual (for IndexScan) These pieces are available at least in the scans Init function and actually can be converted back into some SQL statement that effectively represents this one single table scan. However, parsing it back at that point is nonsense, as we cannot expect everything out there to actually be an SQL database. Also, both the qualification as well as the targetlist can contain things like user defined function calls. 
We neither want to deny nor require that this sort of construct is actually handed over to the external data source, so the interface needs to be more flexible. Therefore it is best to divide the functionality into several user exit functions. The several functions that implement a scan type inside of the executor very much resemble opening a cursor for a single table query, fetching rows from it, eventually (in the case of a nested loop, for example) closing and reopening the cursor with different key values from the outer tuple, and finally closing the cursor. So it makes total sense to actually require an implementation of an external data source to provide functions to open a cursor, fetch rows, and close the cursor. There will be some connection and transaction handling around all this that I have in mind, but I think it would distract from the problem to be solved right here, so more about that another time. The C implementation for open cursor would be called with a scan handle, containing the connection, the relname, the targetlist and the qualification subtrees. These are modified from the real ones in the scan node so that all Var's have varno=1 and all OUTER Var's have been replaced with a Const that reflects the current outer tuple's values. From here there are several support functions available to dumb down each of those to whatever the external data source may support. In case of the targetlist, this could mean filtering it down to a unique list of Var nodes only, removing all expressions from it. In case of the qualification, this could mean removing everything that isn't a standard operator (=, <, >, ...), or removing everything that isn't a Postgres builtin. Finally, there is a support function that will build a SQL statement according to what's left inside that scan handle.
The scan handle would track which modifications have been done to the various pieces so that the outer support framework knows if it gets back the originally requested targetlist, or if it has to run the projection on the returned unique list of Var's, and if it has to recheck the returned tuples against the qualification, because some of the quals had been removed. In order to allow the user exits to be written in PLs, I can think of making a complex data type containing the scan handle. The subtrees could be accessed by the PL via support functions that return them in nodeToString() or other formats. I'll try to write up a more complete proposal by the end of next week. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
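As a rough sketch of the "dumb down, then deparse" idea described above, here is a minimal Python illustration. The `OpQual` representation, the operator whitelist, and the function names are hypothetical stand-ins for the real executor structures, not actual PostgreSQL code:

```python
from dataclasses import dataclass

@dataclass
class OpQual:
    # One simple qualification clause: column <op> constant.
    column: str
    op: str
    value: object

# Assumed set of operators the remote side is known to support.
STANDARD_OPS = {"=", "<", ">", "<=", ">=", "<>"}

def dumb_down_quals(quals):
    """Split quals into a remote-safe list and a list that the outer
    framework must recheck locally after the rows come back."""
    pushable = [q for q in quals if q.op in STANDARD_OPS]
    recheck = [q for q in quals if q.op not in STANDARD_OPS]
    return pushable, recheck

def build_sql(relname, targetlist, pushable):
    """Build the SQL text for whatever survived the dumbing-down."""
    cols = ", ".join(targetlist) or "*"
    sql = f"SELECT {cols} FROM {relname}"
    if pushable:
        preds = " AND ".join(f"{q.column} {q.op} {q.value!r}" for q in pushable)
        sql += f" WHERE {preds}"
    return sql
```

With quals `id = 42` and `name ~~ 'a%'` and the whitelist above, the `~~` clause is held back for local recheck and the generated text is `SELECT id, name FROM remote_tab WHERE id = 42`.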
Re: [HACKERS] [PATCHES] GIN improvements
I looked this over and it looks good in general. May I take it that the patch has passed review, and commit it? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] GIN improvements
Teodor Sigaev [EMAIL PROTECTED] writes: I looked this over and it looks good in general. May I think that patch passed review and commit it? I'd still like to take a look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Tom Lane wrote: Zdenek Kotala [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: Not necessarily. pg_class is not shared, yet without it you can't even find pg_database. Same deal with pg_type. All it means is that pg_collation in template1 must contain all the collations used in template1, which shouldn't be hard to arrange. I think the collation situation is different. All the argument here is based on the premise that we should have database-level collation specifications, which AFAICS is not required nor suggested by the SQL spec. Yeah, it is not required, but in my opinion it should be derived from the CREATE SCHEMA statement. There is the following item: --- SQL ANSI 2003 page 520 --- 5) If <schema character set specification> is not specified, then a <schema character set specification> that specifies an implementation-defined character set that contains at least every character that is in <SQL language character> is implicit. It is not about collation directly, but if I understand it correctly, when you create a schema the default character set is inherited from the parent instance, which is the database (catalog). The following sentence specifies that pg_collation should be database-specific. --- SQL ANSI 2003 page 15 --- Character sets defined by standards or by SQL-implementations reside in the Information Schema (named INFORMATION_SCHEMA) in each catalog, as do collations defined by standards and collations, transliterations, and transcodings defined by SQL-implementations. I wonder why we are allowing a nonstandard half-measure to drive our thinking, rather than solving the real problem which is column-level collations. I am trying to determine how to implement collation itself - the collation catalog structure and content, and how to create a new collation. Column-level collation is nice, but until we have the basic infrastructure we cannot start implementing it.
Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets
Josh Berkus wrote: Tom, Indeed. If the Solaris folk feel that getupeercred() is insecure, they had better explain why their kernel is that broken. This is entirely unrelated to the known shortcomings of the ident IP protocol. The Solaris security kernel folks do, actually. However, there's no question that TRUST is inherently insecure, and that's what people are going to use if they can't get IDENT to work. I'd be *very* interested in how they come to that assessment. I'd have thought that the only alternative to getpeereid/getupeercred is password-based or certificate-based authentication - which seems *less* secure because a) it also relies on the client having the correct uid or gid (to read the password/private key), plus b) the risk of the password/private key getting into the wrong hands. How is that sort of authentication handled by services shipping with Solaris? regards, Florian Pflug, hoping to be enlightened beyond his limited posix-ish view of the world... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] patch - Collation at database level
Andrew Dunstan wrote: Tom Lane wrote: All the argument here is based on the premise that we should have database-level collation specifications, which AFAICS is not required nor suggested by the SQL spec. I wonder why we are allowing a nonstandard half-measure to drive our thinking, rather than solving the real problem which is column-level collations. Agreed. Are we even sure that we want per-database collations as a half-way house? Unless we can be sure that we want all the required catalog changes for the full requirement, it seems to me a rather messy way of getting to where we want to go. Andrew, I would also like to see a full collation implementation rather than collation per database. But from my point of view, splitting collation into small parts is much better. Radek's work is mostly about creating infrastructure for full collation support. When it is finished, only changes in the parser, executor, and so on will be necessary to complete the job. If you look at most discussions about collation, they fell into the ICU yes/no problem without any real decision on how to implement the feature. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Tom, IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavior rather than trying to work around it. So it's not just a standards issue; this problem really *is* hurting us in adoption. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
On Tuesday 08 July 2008 23:04:51 Josh Berkus wrote: Tom, IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavior rather than trying to work around it. So it's not just a standards issue; this problem really *is* hurting us in adoption. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco Right. From a user's perspective 4) sounds best. I often run into problems having keywords as column-names:
andreak=# create table test(user varchar);
ERROR: syntax error at or near "user"
LINE 1: create table test(user varchar);
                          ^
andreak=# create table test("user" varchar);
CREATE TABLE
andreak=# insert into test("USER") values('testuser');
ERROR: column "USER" of relation "test" does not exist
LINE 1: insert into test("USER") values('testuser');
                         ^
andreak=# insert into test(user) values('testuser');
ERROR: syntax error at or near "user"
LINE 1: insert into test(user) values('testuser');
                         ^
andreak=# insert into test("user") values('testuser');
INSERT 0 1
As you know, the only way of referring to the user-column is to quote it in lowercase, which many apps and tools don't do. -- Andreas Joseph Krogh [EMAIL PROTECTED] Senior Software Developer / Manager -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Josh Berkus [EMAIL PROTECTED] wrote: Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavoir rather than trying to work around it. Do these tools expect an unquoted identifier to be treated according to the standard? As I read it, an unquoted identifier should be treated identically to the same identifier folded to uppercase and wrapped in quotes, except that it will be guaranteed to be considered an identifier, rather than possibly considered as a reserved word, etc. From our perspective, we're OK with the status quo since we always quote all identifiers. I don't think any of the suggestions would bite us (if implemented bug-free) because we also forbid names which differ only in capitalization. We help out our programmers by letting them ignore quoting (except identifiers which are reserved words) and capitalization when they write queries in our tool; we correct the capitalization and wrap the identifiers in quotes as we generate the Java query classes. Doing something like that in psql autocompletion and in other PostgreSQL tools would be a nice feature, if practicable. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Tom Lane wrote: What I think would perhaps be worth investigating is a compile-time (or at latest initdb-time) option that flips the case folding behavior to SQL-spec-compliant and also changes all the built-in catalog entries to upper case. We would then have a solution we could offer to people who really need to run apps that depend on SQL-spec case folding ... and if the upper case hurts their eyes, or breaks some other apps that they wish they could run in the same DB, well it's their problem. +1 for a compile-time option for spec-compliant behavior. Even where the spec is stupid (timestamp with time zone literals) it'd be nice to have the option; both for feature completeness checklists and for teachers who want to teach targeting the spec. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Ron Mayer [EMAIL PROTECTED] wrote: +1 for a compile-time option for spec-compliant behavior. Even where the spec is stupid (timestamp with time zone literals) it'd be nice to have the option; both for feature completeness checklists and for teachers who want to teach targeting the spec. In my world it would be even more important for feature completeness itself, and for production applications written to the spec for portability. But, agreed: +1 -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Kevin Grittner [EMAIL PROTECTED] writes: Josh Berkus [EMAIL PROTECTED] wrote: Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. Do these tools expect an unquoted identifier to be treated according to the standard? Unfortunately, they almost certainly don't. I'd bet long odds that what they expect is mysql's traditional behavior, which is not even within hailing distance of being spec compliant. (In a quick test, it looks like mysql 5.0's default behavior is never to fold case at all; and then there's the problem that they use the wrong kind of quotes ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
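To make the folding rules under discussion concrete, here is a tiny hedged sketch; `fold_ident` is a hypothetical helper written for illustration, not code from PostgreSQL's lexer. PostgreSQL folds unquoted identifiers to lower case, the SQL spec says upper case, and quoted identifiers are left alone in both regimes:

```python
def fold_ident(ident: str, spec_compliant: bool = False) -> str:
    """Fold an SQL identifier the PostgreSQL way (lower case) or the
    SQL-standard way (upper case). A double-quoted identifier is never
    folded: the quotes are stripped and doubled quotes collapsed."""
    if ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1].replace('""', '"')
    return ident.upper() if spec_compliant else ident.lower()
```

This is also why mixing quoted and unquoted references breaks: unquoted `Foo` reaches the catalog as `foo` on PostgreSQL but as `FOO` on a spec-compliant system, while `"Foo"` stays `Foo` on both.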
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Jan Urbański wrote: If you think the Lossy Counting method has potential, I could test it somehow. Using my current work I could extract a stream of lexemes as ANALYZE sees it and run it through a python implementation of the algorithm to see if the result makes sense. I hacked together a simplistic python implementation and ran it on a table with 244901 tsvectors, 45624891 lexemes total. I was comparing results from my current approach with the results I'd get from a Lossy Counting algorithm. I experimented with statistics_target set to 10 and 100, and ran pruning in the LC algorithm every 3, 10 or 100 tsvectors. The sample size with statistics_target set to 100 was 3 rows and that's what the input to the script was - lexemes from these 3 tsvectors. I found out that with pruning happening every 10 tsvectors I got precisely the same results as with the original algorithm (same most common lexemes, same frequencies). When I tried pruning after every 100 tsvectors the results changed very slightly (they were a tiny bit more distant from the ones from the original algorithm, and I think a tiny bit more precise, but I didn't give it much attention). Bottom line seems to be: the Lossy Counting algorithm gives roughly the same results as the algorithm used currently and is also possibly faster (and more scalable wrt. statistics_target). This should probably get more testing than just running some script 5 times over a fixed set of data, but I had trouble already sucking ~300 MB of tsvectors from one of my production sites, putting it on my laptop and so on. Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how that compares? 
Anyway, I can share the python script if someone would like to do some more tests (I suppose no-one would, 'cause you first need to apply my ts_typanalyze patch and then change it some more to extract lexemes from the sample). Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
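For readers who want to experiment, here is a compact version of the Lossy Counting algorithm along the lines described above. This is a generic textbook-style sketch in Python, not Jan's actual script; the parameter names and the bucket-boundary pruning schedule are assumptions:

```python
import math

def lossy_count(stream, epsilon=0.001, threshold=0.01):
    """Lossy Counting over a stream of hashable items.

    Returns items whose estimated frequency exceeds `threshold`;
    counts are underestimated by at most epsilon * N."""
    w = math.ceil(1.0 / epsilon)          # bucket width
    counts, deltas = {}, {}               # item -> count, item -> max error
    n = 0                                 # elements seen so far
    bucket = 1                            # current bucket id
    for item in stream:
        n += 1
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1     # it may have been pruned before
        if n % w == 0:                    # bucket boundary: prune rare items
            for k in [k for k, c in counts.items() if c + deltas[k] <= bucket]:
                del counts[k], deltas[k]
            bucket += 1
    # Keep items that could plausibly exceed the frequency threshold.
    return {k: c for k, c in counts.items() if c >= (threshold - epsilon) * n}
```

The memory bound is O((1/epsilon) * log(epsilon * N)) entries regardless of stream length, which is what makes it attractive for ANALYZE: the table of tracked lexemes stays small no matter how many tsvectors the sample contains.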
Re: [HACKERS] Identifier case folding notes
Andreas Joseph Krogh [EMAIL PROTECTED] writes: Right. From a user's perspective 4) sounds best. I often run into problems having keywords as column-names: None of the proposals on the table will remove the need to use quotes in that case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Jan Urbański [EMAIL PROTECTED] writes: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how that compares? Very possibly. I repeat that the current implementation of compute_minimal_stats is very ad-hoc code and wasn't written with an eye to high performance. Replacing it with an algorithm that someone actually thought about might well be worth doing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Summary of some postgres portability issues
In trying to port postgres to 64-bit Windows, I've encountered a number of issues which may (or may not) affect other compilers. If you followed the other thread, this is mostly a summary with a bit more details so feel free to ignore it. Some of these may have some minor effects on other platforms, so they may be of interest (but I doubt it, since no one has noticed/complained about them yet). This post contains a small taxonomy of the problems, as well as some discussion about the work that needs to be done in order to make postgres portable to LLP64 data model compilers (in case someone else is interested). I use the standard terms for discussing different compiler data models, which are explained here: http://www.unix.org/version2/whatsnew/lp64_wp.html Throughout this post I will assume sizeof(size_t) == sizeof(void*), because I doubt you want to support a system where this is not the case. When I try to compile postgres, I get 396 warnings. These come from several different places: 1.) Most of the code involving strings requires an ILP32 or ILP64 model to not generate a warning. This means sizeof(int) == sizeof(size_t), with both 32 or 64 bits wide respectively. Something as simple as: int len; len = strlen(str); violates this on LP32, LP64, and LLP64. AFAIK, there really are no LP32 compilers around anymore, but LP64 is common (LLP64 is MSVC). None of these warnings are actually problems, since they involve strings and realistically the problems which could happen never will. Unfortunately, these are actually portability problems, since you never want to disable narrowing-cast warnings when supporting different architectures, because some of the warnings could be important. If these aren't disabled, they will be very annoying and make it hard to spot real problems (and tempt people to turn off all such warnings).
If they are changed, almost 300 lines will need to be committed, all of which have the not very exciting form: int len; len = (int)strlen(str); the alternative is changing int to size_t everywhere, which several have objected to because of bloat. This bloat will only affect LP64 and LLP64, which do not seem to have been the target machines in the first place. I'd be willing to make the changes to either form, but I don't know if anyone would be willing to commit them :) 2.) There is a lot of other code involving memory index and offset calculations being int. On ILP64, these will be able to work with buffers > 2 GB. On LP64 or LLP64, they will not. On ILP64, sizeof(int) == sizeof(size_t), but on the other two sizeof(int) < sizeof(size_t). Either c.h or postgres.h (I forgot which) defines an Offset and Index typedef to aid in portability, but they are only rarely used. Most of the unchecked conversions from size_t to int are of the string variety (1), but there are a fair amount of these as well. None of these warnings are actually problems even on LP64 or LLP64, unless the buffers involved are > 2 GB. Buffers > 2 GB will work with no changes on ILP64 only. Whether the problem domain specifies that they _can't_ (or probably never should) be > 2 GB either way must be examined on a case by case basis, and I haven't examined that yet. Thoughts on 1 & 2 == I was surprised to see this in the code. The reason is that both of these issues affect LP64. Problems with LLP64 are expected, because LLP64 basically means Microsoft and therefore support is not usually a concern of the OSS community. LP64 on the other hand is any x64 machine using gcc, or at least it was several years ago. Has that changed? Can gcc now be configured to use ILP64 instead? 3.) Some of the code assigns size_t to uint16.
These should elicit warnings on all compilers, but are almost certainly guaranteed to never be errors, because the use of uint16 implies that the developer clearly knows that this is the maximum size needed in all situations and the precision loss never matters (here, the size_t variables being cast were probably not really supposed to be size_t in the first place, but Size was used in the RHS with no cast, carelessly). 4.) Some of the code assigns size_t to uint32. It's unclear if these cases are like (2) or like (3), and would need to be examined on a case by case basis. Problems for LLP64 compilers == Almost everywhere the keyword long appears is a problem for LLP64, whether it's a warning or not. Unfortunately this happens in a very large number of places. Size is supposed to be used for memory resident objects, but rarely is. Often (signed) long is used, and it's not immediately clear without fully reading through the program whether it actually needs to be signed or not (i.e., whether it can safely be changed to Size or not). Sometimes it does need to be signed, as allocations can apparently be negative, and this is required for correct program operation, i.e., there are a few: while !(something_which_should_semantically_never_be_less_than_0 > 0) The types of
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut wrote: I have had some idle thoughts on the issue of identifier case folding. Maybe we can collect our ideas and inch closer to a solution sometime. Or we determine that it's useless and impossible, but then I can at least collect that result in the wiki and point future users to it. Background: PostgreSQL folds unquoted identifiers to lower case. SQL specifies folding to upper case, and most other SQL DBMS do it that way. If an application mixes referring to an object using quoted and unquoted versions of an identifier, then incompatibilities arise. We have always stressed to users that one should refer to an object always unquoted or always quoted. While that remains a good suggestion for a number of reasons, we have seen occasional complaints that some closed source applications violate this rule and therefore cannot be run on PostgreSQL. A number of solutions have been proposed over time, which I summarize here: 1. Change the lexer to fold to upper case, as it is supposed to do according to the SQL standard. This will break almost everything, because almost all built-in objects have lower-case names and thus couldn't be referred to anymore except by quoting. Changing the names of all the internal objects to upper-case names would involve vast code changes, probably break just as much, and make everything uglier. So this approach is unworkable. 2. Fold to upper case, but not when referring to built-in objects. The lexer generally doesn't know what a name will refer to, so this is not possible to implement, at least without resorting to lots of hard-coding or horrid kludges. Also, a behavior like this will probably create all kinds of weird inconsistencies, resulting from putting catalog knowledge in the lexer. 3 and 4 -- Two variants of ignoring case altogether: 3. Fold all identifiers to lower case, even quoted ones.
This would probably in fact fix the breakage of many of the above-mentioned problem applications, and it would in general be very easy to understand for a user. And it could be implemented in about three lines. One disadvantage is that one could no longer have objects that have names different only by case, but that is probably rare and incredibly stupid and can be neglected. The main disadvantage is that the case of identifiers and in particular column labels is lost. So applications and programming interfaces that look up result columns in a case-sensitive manner would fail. And things like SELECT expr AS "Nice Heading" won't work properly anymore. 4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have to be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time. 5. One additional approach I thought of is that you swap the case of identifiers as you lex them (upper to lower, lower to upper), and then swap them back when you send them to the client. This needs a small change in the lexer, one for sending the RowDescription, and support in pg_dump and a few other places if desired. There will, however, be a number of weird, albeit self-imposed, side-effects. I have implemented a little test patch for this. It's weird, but it works in basic ways. Obviously, no solution will ever work completely. And we probably don't want such a solution, because it would create two different and incompatible PostgreSQL universes.
If we are aiming for a solution that would allow most affected applications to hobble along, we would probably serve most users. Implementing some or all of 3, 4, and 5 would probably achieve that.

I'm not sure if you've read all the archive history on this. Here are the pointers from the TODO list:

http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php
http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php

The fact is that we have substantial groups of users who want different things:

. current users who want no change, so there is no breakage in existing apps
. users on other DBs who want spec compliance
. users on yet other DBs who want case preservation

The last group should not be lightly dismissed - case preservation is quite common behaviour on MSSQL as well as on MySQL, so we have some incentive to make it possible in order to encourage migration. I'm strongly of the opinion, therefore, that this should be behaviour determined at initdb time (it can't be made later because of shared catalogs). I suspect that we won't be able to do all this
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: One disadvantage is that one could no longer have objects that have names different only by case, but that is probably rare and incredibly stupid and can be neglected.

Certainly not if you hope to claim being within a mile of spec -- which seems like the only point of fiddling with this. Breaking this would take us further from spec compliance than we are today.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
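The folding difference underlying this whole thread is easy to sketch. The following is illustrative Python, not PostgreSQL code, and fold_identifier is a hypothetical helper: under PostgreSQL's rule, an application can mix Foo and "foo" freely; under the spec rule, those are two different names.

```python
def fold_identifier(ident: str, spec_compliant: bool = False) -> str:
    """Fold an SQL identifier the way a lexer would.

    PostgreSQL folds unquoted identifiers to lower case; the SQL
    standard says to fold them to upper case.  Quoted identifiers
    are never folded under either rule."""
    if ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1]          # quoted: strip quotes, keep case
    return ident.upper() if spec_compliant else ident.lower()

# PostgreSQL behaviour: Foo and "foo" name the same object...
assert fold_identifier('Foo') == fold_identifier('"foo"')
# ...but under the SQL standard they do not:
assert fold_identifier('Foo', spec_compliant=True) != \
       fold_identifier('"foo"', spec_compliant=True)
```

An application that mixes the two spellings therefore works on one system and breaks on the other, which is exactly the portability complaint described above.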
Re: [HACKERS] Identifier case folding notes
Andrew Dunstan wrote: I'm not sure if you've read all the archive history on this. Here are the pointers from the TODO list:

http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php
http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php

The fact is that we have substantial groups of users who want different things:

. current users who want no change, so there is no breakage in existing apps
. users on other DBs who want spec compliance
. users on yet other DBs who want case preservation

The last group should not be lightly dismissed - case preservation is quite common behaviour on MSSQL as well as on MySQL, so we have some incentive to make it possible in order to encourage migration. I'm strongly of the opinion, therefore, that this should be behaviour determined at initdb time (it can't be made later because of shared catalogs). I suspect that we won't be able to do all this by simple transformations in the lexer, unlike what we do now. But I do think it's worth doing.

cheers

andrew

Hi, as part of the http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php thread, I did a reasonable amount of discovery work on making the three options Andrew presents a reality. As I'm not skilled enough, I never got far enough to make them all work at once. I did, however, get lower case and case preservation working. To make those two work, the catalogs need no changes. Some of the regression tests expect case folding, so they did need changing to operate correctly. I was unable to make the input files to initdb correctly fold the case of the system catalogs for the upper-case version; I'm sure somebody with more experience would not find it as difficult as I did. Function names tended to be where all the gotchas were: Count() vs count() vs COUNT(), for example. Once the db was up and running, the issue becomes all the supporting tools.
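The case-preserving behaviour described here (names keep the case they were created with, while lookups such as Count() vs count() vs COUNT() ignore case) can be sketched as follows. The Catalog class is a hypothetical illustration, not part of the patch:

```python
class Catalog:
    """Case-preserving catalog: names keep the case they were created
    with, but lookups ignore case, so Count(), count() and COUNT()
    all resolve to the same entry."""

    def __init__(self):
        self._objects = {}          # case-folded key -> name as written

    def create(self, name: str) -> None:
        key = name.lower()
        # The usual restriction: no two objects may differ only in case.
        if key in self._objects:
            raise ValueError('object %r already exists' % self._objects[key])
        self._objects[key] = name

    def lookup(self, name: str) -> str:
        """Return the name as originally written, whatever case is used."""
        return self._objects[name.lower()]

cat = Catalog()
cat.create('MyTable')
# Any spelling resolves, and the original case is preserved:
assert cat.lookup('mytable') == 'MyTable'
assert cat.lookup('MYTABLE') == 'MyTable'
```

This also shows why case preservation shares option 4's restriction: the folded key space cannot hold both MyTable and mytable at once.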
psql was made to autocomplete with case preservation, and I was going to make pg_dump just quote everything. I then got to the point of adding a fixed GUC, like LC_LOCALE, that allows psql to read the case folding situation and act accordingly. That is where my progress ended. Attached is what I had worked on. It's a patch against 8.3.1. I know it's not CVS HEAD, but it is what I was using at the time to experiment.

Regards

Russell

=== modified file 'src/backend/access/transam/xlog.c'
--- src/backend/access/transam/xlog.c	2008-03-27 12:10:18 +0000
+++ src/backend/access/transam/xlog.c	2008-03-27 14:15:13 +0000
@@ -4040,6 +4040,9 @@
 					PGC_INTERNAL, PGC_S_OVERRIDE);
 	SetConfigOption("lc_ctype", ControlFile->lc_ctype,
 					PGC_INTERNAL, PGC_S_OVERRIDE);
+	/* Make the fixed case folding visible as GUC variables, too */
+	SetConfigOption("identifier_case_folding", ControlFile->identifierCaseFolding,
+					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4290,6 +4293,10 @@
 	ControlFile->time = checkPoint.time;
 	ControlFile->checkPoint = checkPoint.redo;
 	ControlFile->checkPointCopy = checkPoint;
+
+	/* Set the case folding option */
+	strncpy(ControlFile->identifierCaseFolding, "preserved", 9);
+
 	/* some additional ControlFile fields are set in WriteControlFile() */
 	WriteControlFile();

=== modified file 'src/backend/catalog/information_schema.sql'
--- src/backend/catalog/information_schema.sql	2008-03-27 12:10:18 +0000
+++ src/backend/catalog/information_schema.sql	2008-03-27 12:12:15 +0000
@@ -23,7 +23,7 @@
  */
 
 CREATE SCHEMA information_schema;
-GRANT USAGE ON SCHEMA information_schema TO PUBLIC;
+GRANT usage ON SCHEMA information_schema TO public;
 
 SET search_path TO information_schema, public;
@@ -33,7 +33,7 @@
 
 /* Expand any 1-D array into a set with integers 1..N */
 CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int)
-    RETURNS SETOF RECORD
+    RETURNS SETOF record
     LANGUAGE sql STRICT IMMUTABLE
     AS 'select $1[s], s - pg_catalog.array_lower($1,1) + 1
        from pg_catalog.generate_series(pg_catalog.array_lower($1,1),
@@ -214,7 +214,7 @@
 CREATE VIEW information_schema_catalog_name AS
     SELECT CAST(current_database() AS sql_identifier) AS catalog_name;
-GRANT SELECT ON information_schema_catalog_name TO PUBLIC;
+GRANT SELECT ON information_schema_catalog_name TO public;
 
 /*
@@ -241,9 +241,9 @@
     FROM pg_auth_members m
          JOIN pg_authid a ON (m.member = a.oid)
          JOIN pg_authid b ON (m.roleid = b.oid)
-    WHERE pg_has_role(a.oid, 'USAGE');
+    WHERE pg_has_role(a.oid, 'usage');
 
-GRANT SELECT ON applicable_roles TO PUBLIC;
+GRANT SELECT ON applicable_roles TO public;
 
 /*
@@ -256,7 +256,7 @@
     FROM applicable_roles
     WHERE is_grantable = 'YES';
 
-GRANT SELECT ON administrable_role_authorizations TO PUBLIC;
+GRANT SELECT ON administrable_role_authorizations TO public;
 
 /*
@@ -353,7 +353,7 @@
           AND a.attnum > 0 AND NOT a.attisdropped
           AND c.relkind in ('c');
 
-GRANT SELECT ON attributes
Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP
On Tue, Jul 08, 2008 at 06:01:05PM +0900, Tatsuo Ishii wrote: Here is the patches he made against CVS HEAD (as of today).

The git repository should now match this :)

http://git.postgresql.org/?p=~davidfetter/postgresql/.git;a=summary

Apparently, it's easiest to clone via the following URL:

http://git.postgresql.org/git/~davidfetter/postgresql/.git

Is there some git repository I can pull from to make this a little less manual?

Cheers, David.
-- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP
David Fetter wrote: On Tue, Jul 08, 2008 at 06:01:05PM +0900, Tatsuo Ishii wrote: Here is the patches he made against CVS HEAD (as of today). The git repository should now match this :) http://git.postgresql.org/?p=~davidfetter/postgresql/.git;a=summary Apparently, it's easiest to clone via the following URL: http://git.postgresql.org/git/~davidfetter/postgresql/.git Is there some git repository I can pull from to make this a little less manual?

In fact, I fail to see the point of you providing the repo if the upstream guys are apparently not using it ...

-- Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.