[HACKERS] PGDay.it collation discussion notes
Radek, Zdenek, and Heikki had an extended discussion here at PGDay.it on collation support. I was volunteered to be the note-taker (!). Here is the plan we came up with:

Firstly, the ANSI standard makes collations schema-qualified names, which pretty much forces us to have the collation catalog table.

The ANSI standard syntax, where the COLLATION keyword can follow just about any string value in an expression and then bubbles up the expression until an operation needs to pick a collation, seemed very weird to us. Hooking that into the parser was pretty intimidating, but we thought we could postpone that till later and solve the meat of the problem first.

Instead, at least for now, we'll have a GUC variable which decides the current collation. Eventually that would just be the default if the query (or column or whatever) doesn't override it with an explicitly selected collation. (This would mean we need to treat the default collation like search_path for things like plpgsql cached plans etc. Perhaps we could have a flag to indicate whether plans actually need to be invalidated or are collation-independent.)

In a query the current collation would be grabbed by the planner when it's deciding on operators. The planner would grab that collation oid and track it in the sort key along with the nulls first/last and asc/desc properties. It's important that it do this immediately in the planner rather than just reference the GUC later, both because eventually it needs to be possible to override it at this point, and so that changing the GUC during query execution (i.e., in a function) would not affect the execution of the current query.

pg_index would need an additional column to go along with indkey and indoptions. It would be an oidvector of collation oids.

When planning a query or building an index we need to determine whether the opclass is collation-aware or not.
It seemed best that we not hard-code which data types are collation-aware, and instead look at either the operator, or the pg_proc entry for the function implementing the sort operator, or perhaps the btproc for the opclass, to see if it takes a third argument. Or perhaps we would have a boolean column in pg_opclass which asserts that they do and that the third argument should be the collation. If so, then the collation is stored in the sort key and the indcollation slots and is passed to the operator or the btproc function as the third argument.

We think that even without the parser changes this would be useful enough to take on its own. It would allow having indexes built in different collations and letting different sessions use different collations. You could even hack queries which combine collations by defining a function to handle subparts of the query and attaching the GUC to it the same way we attach search_path to security definer functions.

To finish the feature, the catalog needs to add a default collation to every column and some other database objects. Then the parser needs to check those objects first before falling back to the session variable. Then it needs to bubble that value up as it builds expressions so that it's available at the comparison operator or sort node. We didn't read the part of the spec which covers the rules for this, but it does have rules, which should turn up if you search for "collation derivation".

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
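[Editor's sketch] The sort-key bookkeeping described in the notes above -- the planner snapshotting the session's collation into each sort key, so a GUC change mid-query cannot affect a running plan -- can be illustrated with a toy model. None of these names are actual PostgreSQL structures; they are purely illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SortKey:
    attno: int          # column being sorted
    opclass_oid: int    # btree operator class
    collation_oid: int  # snapshotted from the GUC at plan time
    nulls_first: bool
    descending: bool

current_collation_guc = 100  # stand-in oid for, say, "en_US"

def make_sort_key(attno, opclass_oid, nulls_first=False, descending=False):
    # Grab the collation *now*, at planning time, rather than
    # re-reading the GUC at execution time.
    return SortKey(attno, opclass_oid, current_collation_guc,
                   nulls_first, descending)

key = make_sort_key(1, 1994)
current_collation_guc = 200  # a function changes the GUC mid-query...
assert key.collation_oid == 100  # ...but the planned key is unaffected
```

The same snapshot would be stored per index column in the proposed pg_index oidvector.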
Re: [HACKERS] Hot Standby: First integrated patch
On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
> On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:
>> First integrated patch for Hot Standby, allowing queries to be
>> executed while in recovery mode. The patch tests successfully with the
>> enclosed files:
>> * primary_setup_test.sql - run it on primary node
>> * standby_allowed.sql - run on standby - should all succeed
>> * standby_disallowed.sql - run on standby - should all fail
>> plus other manual testing.
>>
>> This is still WIP - it's good enough to release for comments, though I
>> am not yet confident enough to claim it bug free. What this doesn't do YET:
>> * cope fully with subxid cache overflows (some parts still to add)
>> * cope with prepared transactions on master
>> * work correctly when running queries AND replaying WAL
>> * work correctly with regard to AccessExclusiveLocks, which should
>>   prevent access to tables
>> These last four points are what I'm working on over the next two
>> weeks, plus any other holes people point out along the way. I have
>> worked out designs for most of these aspects and will discuss them on
>> -hackers, though most design notes are in the Wiki. I'm still looking
>> into prepared transactions. Comments appreciated.
>
> It appears to be working, at least in some fashion. The supplied tests
> all pass.

Cool. Thanks for testing so far.

> At first glance it seems like I have to force changes to the standby
> with pg_switch_xlog().

hmm. You'll have to explain some more. Normally files don't get sent until they are full, so yes, you would need to do a pg_switch_xlog(). This is not streaming replication. Others are working on that.

> This probably isn't right:
>
> postgres=# \d
> No relations found.
> postgres=# select count(*) from foo;
>  count
> -------
>    100
> (1 row)
>
> I created a table, ran pg_switch_xlog(), and queried it several times;
> then I dropped the table and ran pg_switch_xlog() again. The table is
> "gone", but still returns data; after exiting and re-entering the
> session, it's gone. Sometimes I have to exit/enter the session to get
> an up-to-date standby. These are just first impressions...

Replaying and queries don't mix yet, so that is expected. I'm working on this in phases. This patch is phase 1 - it is not the finished patch.

Phase 2: working on correct block locking to allow concurrent DML changes to occur while we run queries.

Phase 3: working on correct relation locking/relcache to allow concurrent DDL changes to occur while we run queries.

I have designs for the above and expect to complete them in the next two weeks.

The reason for the above behaviour is that DDL changes need to fire relcache invalidation messages so that the query backend sees the change. The reason the table is still there is because the files haven't been dropped yet. So everything you have seen is expected, by me.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Reducing some DDL Locks to ShareLock
On Tue, 2008-10-07 at 10:35 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> On Tue, 2008-10-07 at 10:05 -0400, Robert Haas wrote:
>>> 3. The patch introduces a slight weirdness: if you create two FKs on
>>> the same column at the same time you end up with two constraints with
>>> identical names.
>>
>> Drop constraint then removes them both, though in other respects they
>> are both valid, just not uniquely. CREATE INDEX avoids this by way of
>> the unique index on relname. The equivalent index on pg_constraint is
>> not unique, though *cannot* be made unique without breaking some
>> corner cases of table inheritance.
>
>>> Urk... this seems pretty undesirable.
>>
>> OK, but please say what behaviour you would like in its place.
>
> I wonder whether this could be helped if we refactored pg_constraint.
> The lack of a useful pkey for it has been annoying me for awhile, and I
> think it stems from a misguided choice to put table and domain
> constraints into the same catalog. Suppose that
>
> * table constraints go into pg_relation_constraint, with a unique key
>   on (conrelid, conname)
> * domain constraints go into pg_domain_constraint, with a unique key
>   on (contypid, conname)
> * pg_constraint can still exist as a union view, for client
>   compatibility
>
> Then the unique key would prevent concurrent creation of
> identically-named constraints for the same relation.

I'm planning to squeeze this refactoring work in now also, before freeze, unless somebody else wants to pick this up?

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
[HACKERS] adding collation to a SQL database
Normally I would lurk on this list for a month or two before I considered posting, but I saw a note on collation and thought some of you might be interested in my experience implementing collation for the ANTs Data Server.

In ADS, we treated the collation internally as part of the type for string types. In other words, char, varchar, and clob expressions and columns had not only a length and a can-be-null flag, but also a collation. This let us do a lot less work to add collations, because we didn't have to munge the expression code very much. We just adapted the code that does type assignment, and then the information was carried through the compiler in the type info with no further work (except where the compiler actually had to know about collation). We didn't need to mess with index selection, for example, because it was handled correctly just based on types. And it let us add collated indexes very easily, just by adding the new type information. We treated those weird collation expressions like type casts, which made them easier to implement as well.

Collated compares are very slow, and it's worth quite a bit of effort to avoid them when possible (for example by putting them last in the list of predicates to evaluate). ICU has a facility for preprocessing search strings so they can be byte-compared (which is much faster than normal collated compares). You can construct a collated index by using preprocessed search strings as the keys, so that index searches are faster. We didn't do that in ADS for several reasons: first, the preprocessed strings are bigger (about 30% as I recall), so you can fit fewer keys in a node. Second, you don't really do a lot of comparisons in an index lookup. Third, ADS had an optimization where it didn't read the data files if the index contained all of the columns needed for the query, and that optimization would not work with these sorts of collation indexes.

Still, the parameters of PostgreSQL are quite a bit different, so it might be worth considering.

Partly for backward compatibility and partly because collated compares are so slow, ADS had a default collation that was just a normal ASCII compare. Since strings were all encoded in UTF-8, the collation order was incorrect if there were any multi-byte characters, but for English-only columns, or columns where they just wanted a reproducible ordering and didn't care that much about language rules, it was much faster than the ICU compare.

regards,
David Gudeman
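[Editor's sketch] The ICU trick described above -- preprocessing strings once so that later comparisons are cheap byte compares -- has a stdlib analogue in Python's locale.strxfrm, which is enough to illustrate the trade-off (transform once, compare many times, at the cost of larger keys). The "C" locale is used here only for portability; real locale names like "en_US.UTF-8" are platform-dependent:

```python
import locale

# strxfrm preprocesses a string into a form whose ordinary comparison
# matches the locale's collation order.  In a collated index, these
# transformed strings would be the stored keys.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["pear", "apple", "banana"]

# Transform each string once, then sort with the fast built-in compare.
keys = {w: locale.strxfrm(w) for w in words}
by_key = sorted(words, key=keys.__getitem__)

# In the C locale this coincides with plain codepoint ordering.
assert by_key == sorted(words)
```

With a language-aware locale installed, the same code would yield the locale's collation order instead.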
Re: [HACKERS] PGDay.it collation discussion notes
On Sat, Oct 18, 2008 at 07:41:12AM +0100, Gregory Stark wrote:
> The ANSI standard syntax where the COLLATION keyword can follow just
> about any string value in an expression and then bubbles up the
> expression until an operation needs to pick a collation seemed very
> weird to us. Hooking that into the parser was pretty intimidating but
> we thought we could postpone that till later and solve the meat of the
> problem first.

It's quite straightforward. Every expression has a collation; the COLLATE keyword just overrides it. And the collation is a parameter of the operators/functions that want to use it. Implementation is also straightforward: add

    expr: expr COLLATE ident

to the bison code and a collation oid to the Expr node. The code to bubble up the collation ids would be in the patch I posted a few years back.

> pg_index would need an additional column to go along with indkey and
> indoptions. It would be an oidvector of collation oids.

Have you decided what the collation oid will refer to? If the index is a descending index, does that affect the collation oid?

> When planning a query or building an index we need to determine whether
> the opclass is collation-aware or not. It seemed best that we not hard
> code which data types are collation aware and instead look at either
> the operator or the pg_proc entry for the function implementing the
> sort operator or perhaps the btproc for the opclass to see if it takes
> a third argument. Or perhaps we would have a boolean column in
> pg_opclass which asserts they do and that it should be the collation.
> If so then the collation is stored in the sort key and the indcollation
> slots and is passed to the operator or the btproc function as the third
> argument.

The way I approached it was to just mark the opclass (perhaps opfamily would be better now; it didn't exist at the time). Additionally you will need to mark the individual operators/functions as to whether they're sensitive to the collation or not. This is needed because an Error state in the collation oid is only an error if applied to an operator that cares. This might not be needed in the first implementation, though.

> We think even without the parser changes this would be useful enough to
> take on its own. It would allow having indexes built in different
> collations and have different sessions use different collations. You
> could even hack queries which combine collations by defining a function
> to handle subparts of the query and attach the guc the same way we
> attach the search_path to security definer functions.

Ugh. I think just implementing the parser changes will be less work.

> To finish the feature the catalog needs to add a default collation to
> every column and some other database objects. Then the parser needs to
> check those objects first before falling back to the session variable.
> Then it needs to bubble that value up as it builds expressions so that
> it's available at the comparison operator or sort node. We didn't read
> the part of the spec which covered the rules for this but it does have
> rules which should turn up if you search for collation derivation.

Once you've added the default collation to every object, the session variable becomes redundant. Even unknown text strings will have a collation oid as soon as they are cast to text, since the text type will have a default collation (just like it has a default operator class).

Nice to see some progress being made here.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
[HACKERS] ALTER TABLE ... SET DATA TYPE (SQL:2008)
Here is a patch that allows the new SQL:2008 syntax (also used by IBM)

    ALTER TABLE tab ALTER COLUMN col SET DATA TYPE typ

alongside our current syntax

    ALTER TABLE tab ALTER COLUMN col TYPE typ

I verified that we implement a superset of what the standard says. (Of course, the standard doesn't support the USING clause.) There was some key word overlap with ecpg, so the patch looks bulkier than it really is.

diff -ur -x CVS ../cvs-pgsql/doc/src/sgml/ref/alter_table.sgml ./doc/src/sgml/ref/alter_table.sgml
--- ../cvs-pgsql/doc/src/sgml/ref/alter_table.sgml	2008-05-10 01:32:03.0 +0200
+++ ./doc/src/sgml/ref/alter_table.sgml	2008-10-18 10:30:41.0 +0200
@@ -33,7 +33,7 @@
     ADD [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> <replaceable class="PARAMETER">type</replaceable> [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
     DROP [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
-    ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> TYPE <replaceable class="PARAMETER">type</replaceable> [ USING <replaceable class="PARAMETER">expression</replaceable> ]
+    ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable> [ USING <replaceable class="PARAMETER">expression</replaceable> ]
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET DEFAULT <replaceable class="PARAMETER">expression</replaceable>
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> DROP DEFAULT
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
@@ -93,7 +93,7 @@
    </varlistentry>
 
    <varlistentry>
-    <term><literal>ALTER COLUMN TYPE</literal></term>
+    <term><literal>SET DATA TYPE</literal></term>
     <listitem>
      <para>
       This form changes the type of a column of a table.  Indexes and
@@ -760,7 +760,7 @@
    with time zone</type> via a <literal>USING</literal> clause:
<programlisting>
ALTER TABLE foo
-    ALTER COLUMN foo_timestamp TYPE timestamp with time zone
+    ALTER COLUMN foo_timestamp SET DATA TYPE timestamp with time zone
     USING
         timestamp with time zone 'epoch' + foo_timestamp * interval '1 second';
</programlisting>
@@ -868,8 +868,9 @@
   <title>Compatibility</title>
 
   <para>
-   The <literal>ADD</literal>, <literal>DROP</>, and <literal>SET DEFAULT</>
-   forms conform with the SQL standard.  The other forms are
+   The forms <literal>ADD</literal>, <literal>DROP</>, <literal>SET DEFAULT</>,
+   and <literal>SET DATA TYPE</literal> (without <literal>USING</literal>)
+   conform with the SQL standard.  The other forms are
    <productname>PostgreSQL</productname> extensions of the SQL standard.
    Also, the ability to specify more than one manipulation in a single
    <command>ALTER TABLE</> command is an extension.
diff -ur -x CVS ../cvs-pgsql/src/backend/catalog/sql_features.txt ./src/backend/catalog/sql_features.txt
--- ../cvs-pgsql/src/backend/catalog/sql_features.txt	2008-10-18 11:56:26.0 +0200
+++ ./src/backend/catalog/sql_features.txt	2008-10-18 10:23:46.0 +0200
@@ -236,7 +236,7 @@
 F381	Extended schema manipulation	01	ALTER TABLE statement: ALTER COLUMN clause	YES	
 F381	Extended schema manipulation	02	ALTER TABLE statement: ADD CONSTRAINT clause	YES	
 F381	Extended schema manipulation	03	ALTER TABLE statement: DROP CONSTRAINT clause	YES	
-F382	Alter column data type			NO	PostgreSQL syntax differs
+F382	Alter column data type			YES	
 F391	Long identifiers			YES	
 F392	Unicode escapes in identifiers			NO	
 F393	Unicode escapes in literals			NO	
diff -ur -x CVS ../cvs-pgsql/src/backend/parser/gram.y ./src/backend/parser/gram.y
--- ../cvs-pgsql/src/backend/parser/gram.y	2008-10-15 13:19:39.0 +0200
+++ ./src/backend/parser/gram.y	2008-10-18 10:14:05.0 +0200
@@ -304,7 +304,7 @@
 %type <boolean>	copy_from
 
-%type <ival>	opt_column event cursor_options opt_hold
+%type <ival>	opt_column event cursor_options opt_hold opt_set_data
 %type <objtype>	reindex_type drop_type comment_type
 
 %type <node>	fetch_direction select_limit_value select_offset_value
@@ -407,7 +407,7 @@
 	CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
-	DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
+	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
 	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
 	DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
@@ -1534,16 +1534,16 @@
 			}
 /*
 * ALTER TABLE <name> ALTER [COLUMN]
Re: [HACKERS] Hot Standby: First integrated patch
On Sat, Oct 18, 2008 at 4:11 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
>> At first glance it seems like I have to force changes to the standby
>> with pg_switch_xlog().
>
> hmm. You'll have to explain some more. Normally files don't get sent
> until they are full, so yes, you would need to do a pg_switch_xlog().
> This is not streaming replication. Others are working on that.

right...this was expected. less the missing parts, things are working very well.

merlin
Re: [HACKERS] PGDay.it collation discussion notes
Martijn van Oosterhout [EMAIL PROTECTED] writes:
> It's quite straightforward. Every expression has a collation, the
> COLLATE keyword just overrides it. And the collation is a parameter of
> the operators/functions that want to use it. Implementation is also
> straightforward: add "expr: expr COLLATE ident" to the bison code and
> a collation oid to the Expr node.

It's fairly irritating to think that a string-specific option is going to become part of the fundamental type system --- it makes no sense to distinguish different collations for numeric, for instance (and in fact I would want to see the parser throw an error if you tried to attach a collation to a non-string type). As for inserting a collation OID in every single variant of Expr, the answer is no.

Another objection to this design is that it's completely unclear that functions from text to text should necessarily yield the same collation that went into them, but if you treat collation as a hard-wired part of the expression syntax tree you aren't going to be able to do anything else. (What will you do about functions/operators taking more than one text argument?)

I think it would be better to treat the collation indicator as part of string *values* and let it bubble up through expressions that way. The "expr COLLATE ident" syntax would be a simple run-time operation that pokes a new collation into a string value. The notion of a column having a particular collation would then amount to a check constraint on the values going into the column.

			regards, tom lane
[HACKERS] two servers on the same port
I just spent a couple of days trying to figure out why I couldn't start two servers on the same port, even though I was configuring separate listen_address values. I kept getting errors about shmget failing with "could not create shared memory segment: Invalid argument". I finally noticed that the shared memory key mentioned in the error when starting the second server was the same as what the first server was using, and it appeared to be generated based on the port number. Sure enough, when I changed the port, it used a different shared memory key and started right up.

After searching around on the web a bit I found some pages that suggested running under different userids might be necessary. So I tried that, changed the port back to the standard 5432, and it started up.

Anyway, everything seems to be working fine, but I figured this info should be a bit easier to find, so here's a couple of patches to the documentation to mention how this works.

eric

--- doc/src/sgml/config.sgml.orig	2008-10-18 00:08:50.0 -0500
+++ doc/src/sgml/config.sgml	2008-10-18 00:10:58.0 -0500
@@ -337,6 +337,12 @@
         same port number is used for all IP addresses the server listens on.
         This parameter can only be set at server start.
        </para>
+       <para>
+        This setting also determines the key used for the shared memory
+        segment.  Because of that, two servers can not be started on the
+        same port, even if they have different listen_addresses, unless
+        they are also running under two different userids.
+       </para>
      </listitem>
     </varlistentry>
--- doc/src/sgml/runtime.sgml.orig	2008-10-18 00:05:37.0 -0500
+++ doc/src/sgml/runtime.sgml	2008-10-18 00:08:37.0 -0500
@@ -401,6 +401,15 @@
    </para>
 
    <para>
+    You can also get this error if you try to start two servers on the
+    same machine, on the same port, even if you specify different
+    listen_address values.  In order for that configuration to work,
+    you'll need to run the servers under different userids, which will
+    cause <productname>PostgreSQL</productname> to use different
+    shared memory keys.
+   </para>
+
+   <para>
     An error like
<screen>
FATAL:  could not create semaphores: No space left on device
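[Editor's sketch] As I understand the backend code of this era, the SysV shared memory key starts from a value derived from the port (roughly port * 1000) and is incremented on collision, which is why the port, not the listen address, determines the conflict described above. The helper below is purely illustrative -- see src/backend/port/sysv_shmem.c for the real logic:

```python
# Hedged, illustrative model of how candidate shared memory keys are
# derived from the port number: a base of port * 1000, with successive
# candidates tried if a key is already taken by a live segment.
def candidate_shmem_keys(port, attempts=5):
    base = port * 1000
    return [base + i for i in range(attempts)]

# Two servers on the same port start probing from the same base key;
# two different ports never collide on the first candidate.
assert candidate_shmem_keys(5432)[0] == 5432000
assert candidate_shmem_keys(5433)[0] != candidate_shmem_keys(5432)[0]
```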
Re: [HACKERS] PGDay.it collation discussion notes
Tom Lane [EMAIL PROTECTED] writes:
> It's fairly irritating to think that a string-specific option is going
> to become part of the fundamental type system --- it makes no sense to
> distinguish different collations for numeric for instance

Actually I thought of that generality as an advantage. Just because we can't think of any right now doesn't mean there aren't applications of this. The only example I could think of was a comparison operator on numeric which specifies a significant precision. That doesn't sound super useful, but there are a lot of data types out there, and I don't see any reason to think text is the only one in the world that has more than one reasonable ordering.

> I think it would be better to treat the collation indicator as part of
> string *values* and let it bubble up through expressions that way. The
> expr COLLATE ident syntax would be a simple run-time operation that
> pokes a new collation into a string value. The notion of a column
> having a particular collation would then amount to a check constraint
> on the values going into the column.

I'm not super familiar with the spec here, but from what I understood I think this would be very different. For instance, I think you need to be able to set the default collation on a whole column after the fact. Rewriting the whole table to handle a collation change seems like a non-starter.

Also, if the column doesn't have a default collation specified then you need to use the default collation for a more general object -- I'm not sure if it's the table or the schema next.

Thirdly, to handle resolving conflicting default collations you need to track where the default collation came from, i.e., whether it was a default or an explicit choice by the query.

Collation isn't really a property of the text at all. This design would force the sources of text to pick a collation that will be used by other parts of the application that they know nothing about. How is a DBA using COPY to populate a table going to know what collation the web app which eventually uses the data in that table will want to use?

The other side of the coin is that, given the spec-compliant behaviour, you can always emulate the behaviour you're describing by adding another column. It would be more useful too, since you'll have a language column which may be useful independently of the text content.

And of course the scheme you're describing would waste a huge amount of space in every string on disk. For short strings it could triple the amount of space (plus I think the explicit vs implicit collation distinction would make it even worse).

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] two servers on the same port
Eric Haszlakiewicz [EMAIL PROTECTED] writes:
> I just spent a couple of days trying to figure out why I couldn't start
> two servers on the same port, even though I was configuring separate
> listen_address values.

That's already documented not to work, and not for any hidden implementation reason: you'd have a conflict on the Unix-domain socket name.

			regards, tom lane
Re: [HACKERS] two servers on the same port
Tom Lane wrote:
> Eric Haszlakiewicz [EMAIL PROTECTED] writes:
>> I just spent a couple of days trying to figure out why I couldn't
>> start two servers on the same port, even though I was configuring
>> separate listen_address values.
>
> That's already documented not to work, and not for any hidden
> implementation reason: you'd have a conflict on the Unix-domain socket
> name.

unless you use a different socket directory.

cheers

andrew
Re: [HACKERS] two servers on the same port
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> That's already documented not to work, and not for any hidden
>> implementation reason: you'd have a conflict on the Unix-domain socket
>> name.
>
> unless you use a different socket directory.

Hmm ... but the OP didn't mention any such thing. In any case I think he's misdiagnosed his problem, because the shmem code *should* ignore pre-existing shmem segments that are already in use --- see the loop in PGSharedMemoryCreate.

			regards, tom lane
Re: [HACKERS] PGDay.it collation discussion notes
On Sat, Oct 18, 2008 at 11:28:41AM -0400, Tom Lane wrote:
> It's fairly irritating to think that a string-specific option is going
> to become part of the fundamental type system --- it makes no sense to
> distinguish different collations for numeric for instance (and in fact
> I would want to see the parser throw an error if you tried to attach a
> collation to a non-string type). As for inserting a collation OID in
> every single variant of Expr, the answer is no.

It's hardly string-specific. At least the geometric types have the concept of multiple collations. Any datatype can have a set of useful collations; I'd hate to restrict this to just strings.

> Another objection to this design is that it's completely unclear that
> functions from text to text should necessarily yield the same collation
> that went into them, but if you treat collation as a hard-wired part of
> the expression syntax tree you aren't going to be able to do anything
> else. (What will you do about functions/operators taking more than one
> text argument?)

Well, the standard has plenty of discussion about that. If the operator is a comparison, the two collations must be coercible to the same collation. If it's not, and the result type isn't text, then it doesn't matter. And if the result is text, then you mark it as ambiguous and thus require the user to specify explicitly.

> I think it would be better to treat the collation indicator as part of
> string *values* and let it bubble up through expressions that way.

I think the storage space considerations make that a difficult option.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
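[Editor's sketch] The "collation derivation" rules from the standard that this thread keeps referring to can be sketched compactly: an explicit COLLATE wins over an implicit (default) collation, two conflicting implicit collations leave the result with no usable collation, and that state only becomes an error if an operator actually needs a collation. All names below are illustrative, not spec text:

```python
# Derivation states for an expression's collation.
EXPLICIT, IMPLICIT, NONE = "explicit", "implicit", "none"

def combine(d1, c1, d2, c2):
    """Combine two (derivation, collation) pairs for a binary operator."""
    if d1 == EXPLICIT and d2 == EXPLICIT and c1 != c2:
        # Two different explicit COLLATE clauses can never be reconciled.
        raise ValueError("conflicting explicit collations")
    if d1 == EXPLICIT:
        return (EXPLICIT, c1)
    if d2 == EXPLICIT:
        return (EXPLICIT, c2)
    if d1 == IMPLICIT and d2 == IMPLICIT:
        # Matching defaults survive; a mismatch yields "no collation",
        # which errors out later only if an operator cares.
        return (IMPLICIT, c1) if c1 == c2 else (NONE, None)
    if d1 == IMPLICIT:
        return (IMPLICIT, c1)
    if d2 == IMPLICIT:
        return (IMPLICIT, c2)
    return (NONE, None)

assert combine(IMPLICIT, "en", EXPLICIT, "de") == (EXPLICIT, "de")
assert combine(IMPLICIT, "en", IMPLICIT, "de") == (NONE, None)
assert combine(IMPLICIT, "en", IMPLICIT, "en") == (IMPLICIT, "en")
```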
Re: [HACKERS] Cross-column statistics revisited
On Fri, Oct 17, 2008 at 7:54 PM, Nathan Boley [EMAIL PROTECTED] wrote: I'm still working my way around the math, but copulas sound better than anything else I've been playing with. I think the easiest way to think of them is, in 2-D finite spaces, they are just a plot of the order statistics against one another. Feel free to mail me off list if you have any math questions. I'm still working out what a copula is, and how I've got to figure out the stuff below, too! :) /me has reading to do... The treatment of copulae in one of my stats texts is much better than wikipedia, I though, so I'm progressing. They sound like an excellent potential solution the more I figure out what they are, for whatever my opinion is worth, but making sure they work decently for all data types, including those where there's no real notion of distance, may be interesting. I still need to go through backend/utils/adt/selfuncs.c to figure out exactly how we use the one-dimensional values. I've previously thought that, at least in the 2D case, we could use image compression algorithms to compress the copula, but recently I've realized that this is a change point problem. In terms of compression, we want to decompose the copula into regions that are as homogenous as possible. I'm not familiar with change point problems in multiple dimensions, but I'll try and ask someone that is, probably late next week. If you decide to go the copula route, I'd be happy to write the decomposition algorithm - or at least work on the theory. Finally, a couple points that I hadn't seen mentioned earlier that should probably be considered- 1) NULL's need to be treated specially - I suspect the assumption of NULL independence is worse than other independence assumptions. Maybe dealing with NULL dependence could be a half step towards full dependence calculations? Agreed, though how to treat them I have no idea. 2) Do we want to fold the MCV's into the dependence histogram? 
That will cause problems in our copula approach but I'd hate to have to keep an N^d histogram dependence relation in addition to the copula. Yeah, if we're already trying to figure out how to compress copulae, having also to compress MCV matrices seems painful and error-prone. But I'm not sure why it would cause problems to keep them in the copula -- is that just because we are most interested in the copula modeling the parts of the distribution that are most sparsely populated? 3) For equality selectivity estimates, I believe the assumption that the ndistinct value distribution is uniform in the histogram will become worse as the dimension increases. I proposed keeping track of ndistinct per histogram bucket earlier in the marginal case, partially motivated by this exact scenario. Does that proposal make more sense in this case? If so we'd need to store two distributions - one of the counts and one of ndistinct. It's definitely worth investigating (or seeing if someone else has already published an investigation). There are probably several similar stats we could track and make use of, provided it didn't slow the planner too much to make use of them. 4) How will this approach deal with histogram buckets that have scaling count sizes (i.e., -0.4)? I'm not sure what you mean here. - Josh / eggyknap -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
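Nathan's description of a 2-D copula as "just a plot of the order statistics against one another" can be sketched concretely: rank-transform each column to (0, 1] and bin the joint result. This is a minimal illustration only, not anything proposed for the backend; the function name and grid representation are my own:

```python
import numpy as np

def empirical_copula_grid(x, y, bins=4):
    """Bin the empirical copula of two samples into a bins x bins grid.

    The copula is the joint distribution of the rank-transformed
    (order-statistic) values, so it captures dependence independently
    of each column's marginal distribution.
    """
    n = len(x)
    # argsort(argsort(...)) yields ranks 0..n-1; shift/scale into (0, 1].
    u = (np.argsort(np.argsort(x)) + 1) / n
    v = (np.argsort(np.argsort(y)) + 1) / n
    grid, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return grid / n  # each cell holds the fraction of rows landing in it

# Perfectly correlated columns put all the mass on the diagonal;
# independent columns would spread it roughly uniformly.
x = np.arange(100)
grid = empirical_copula_grid(x, x, bins=4)
```

The interesting part for the compression discussion is that `grid` is exactly the object one would want to decompose into homogeneous regions.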
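Point 3 (per-bucket ndistinct) can be illustrated with a toy calculation showing how badly a single global ndistinct can misestimate equality selectivity when distinct values are unevenly spread. All numbers and function names here are invented for illustration:

```python
# Hypothetical per-bucket stats for one column: (fraction_of_rows, ndistinct).
# A few hot values dominate the table; a long tail of rare values follows.
buckets = [
    (0.40, 2),
    (0.30, 10),
    (0.20, 100),
    (0.10, 1000),
]

def eq_selectivity_flat(buckets):
    """Global estimate: assume distinct values are spread uniformly,
    so selectivity for '= const' is 1 / total ndistinct."""
    total_nd = sum(nd for _, nd in buckets)
    return 1.0 / total_nd

def eq_selectivity_per_bucket(buckets, i):
    """Per-bucket estimate: assume uniformity only within bucket i,
    so a constant known to land there selects f_i / nd_i of the rows."""
    f, nd = buckets[i]
    return f / nd

flat = eq_selectivity_flat(buckets)           # ~0.0009 for every constant
hot = eq_selectivity_per_bucket(buckets, 0)   # 0.2: hot values, ~200x the flat guess
rare = eq_selectivity_per_bucket(buckets, 3)  # 0.0001: tail values, ~9x below it
```

Storing the ndistinct distribution alongside the count distribution, as proposed, is what makes the per-bucket form computable; the gap between `hot` and `rare` only widens as dimensions multiply.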
[HACKERS] Lisp as a procedural language?
Someone at the PostgreSQL West conference last weekend expressed an interest in a Lisp procedural language. The only two Lisp environments I've found so far that aren't GPL are Steel Bank Common Lisp (MIT, http://sbcl.sourceforge.net) and XLispStat (BSD, http://www.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html). SBCL is a very active project, but I'm not sure about XLispStat. -- M. Edward (Ed) Borasky ruby-perspectives.blogspot.com A mathematician is a machine for turning coffee into theorems. -- Alfréd Rényi via Paul Erdős
Re: [HACKERS] Lisp as a procedural language?
From what I remember of tinkering with Lisp a while back, SBCL and CMUCL are the big free implementations. I remember something about GCL being non-standard. Either of those should make Lisp hackers happy. 2008/10/18 M. Edward (Ed) Borasky [EMAIL PROTECTED] Someone at the PostgreSQL West conference last weekend expressed an interest in a Lisp procedural language. The only two Lisp environments I've found so far that aren't GPL are Steel Bank Common Lisp (MIT, http://sbcl.sourceforge.net) and XLispStat (BSD, http://www.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html). SBCL is a very active project, but I'm not sure about XLispStat. -- M. Edward (Ed) Borasky ruby-perspectives.blogspot.com A mathematician is a machine for turning coffee into theorems. -- Alfréd Rényi via Paul Erdős
Re: [HACKERS] Lisp as a procedural language?
On Sat, 2008-10-18 at 20:43 -0400, Nikolas Everett wrote: From what I remember with tinkering with Lisp a while back, SBCL and CMUCL are the big free implementations. I remember something about GCL being non-standard. Either of those should make lisp hackers happy. GCL (and Clisp) are both reasonable implementations of Common Lisp. However, they are both GPL, which I think is an issue for PostgreSQL community members. CMUCL development more or less stalled out, and many of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon (University) Common Lisp = Steel Bank Common Lisp. :) In any event, SBCL is MIT-licensed, which is free of some of the more annoying GPL restrictions. BTW, I checked on XLispStat and it seems to be frozen in time -- most of the people who used to use XLispStat (including me) have moved on to R (which is GPL, unfortunately). -- M. Edward (Ed) Borasky ruby-perspectives.blogspot.com A mathematician is a machine for turning coffee into theorems. -- Alfréd Rényi via Paul Erdős
Re: [HACKERS] Lisp as a procedural language?
M. Edward (Ed) Borasky wrote: On Sat, 2008-10-18 at 20:43 -0400, Nikolas Everett wrote: From what I remember with tinkering with Lisp a while back, SBCL and CMUCL are the big free implementations. I remember something about GCL being non-standard. Either of those should make lisp hackers happy. GCL (and Clisp) are both reasonable implementations of Common Lisp. However, they are both GPL, which I think is an issue for PostgreSQL community members. CMUCL development more or less stalled out, and many of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon (University) Common Lisp = Steel Bank Common Lisp. :) In any event, SBCL is MIT-licensed, which is free of some of the more annoying GPL restrictions. BTW, I checked on XLispStat and it seems to be frozen in time -- most of the people who used to use XLispStat (including me) have moved on to R (which is GPL, unfortunately). We're almost certainly not going to include a Lisp PL in the core distribution, so the license shouldn't be an issue (cf. PL/R). cheers andrew
Re: [HACKERS] Lisp as a procedural language?
M. Edward (Ed) Borasky [EMAIL PROTECTED] writes: GCL (and Clisp) are both reasonable implementations of Common Lisp. However, they are both GPL, which I think is an issue for PostgreSQL community members. Well, it would be an issue if we wanted to distribute PL/Lisp as part of the core; but I kinda doubt that there would be enough demand to justify that. As long as it's a separate project I don't see much wrong with depending on a GPL Lisp implementation, if you find that that's the best choice technically. CMUCL development more or less stalled out, and many of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon (University) Common Lisp = Steel Bank Common Lisp. :) Not that I've got anything against CMU software ;-) regards, tom lane