[HACKERS] PGDay.it collation discussion notes

2008-10-18 Thread Gregory Stark

Radek, Zdenek, and Heikki had an extended discussion here at PGDay.it on
collation support. I was volunteered to be the note-taker (!). Here is the
plan we came up with:

Firstly, the ANSI standard makes collations into schema-qualified names which
pretty much forces us to have the collation catalog table.

The ANSI standard syntax where the COLLATION keyword can follow just about any
string value in an expression and then bubbles up the expression until an
operation needs to pick a collation seemed very weird to us. Hooking that into
the parser was pretty intimidating but we thought we could postpone that till
later and solve the meat of the problem first.

Instead, at least for now, we'll have a guc variable which decides the current
collation. Eventually that would just be the default if the query (or column
or whatever) doesn't override it with an explicitly selected collation. (This
would mean we need to treat the default collation like search_path for things
like plpgsql cached plans etc. Perhaps we could have a flag to indicate
whether plans actually need to be invalidated or are collation-independent.)

In a query the current collation would be grabbed by the planner when it's
deciding on operators. The planner would grab that collation oid and track it
in the sort key along with the nulls first/last and asc/desc properties. 

It's important that it do this immediately in the planner rather than just
referencing the guc later, both because eventually it needs to be possible to
override it at this point, and so that changing the guc during query
execution (ie, in a function) does not affect the execution of your
current query.

pg_index would need an additional column to go along with indkey and
indoptions. It would be an oidvector of collation oids.

When planning a query or building an index we need to determine whether the
opclass is collation-aware or not. It seemed best that we not hard-code which
data types are collation-aware, and instead look at either the operator, the
pg_proc entry for the function implementing the sort operator, or perhaps the
btproc for the opclass, to see whether it takes a third argument. Or perhaps
we would have a boolean column in pg_opclass which asserts that they do and
that it should be the collation. If so, then the collation is stored in the
sort key and the indcollation slots and is passed to the operator or the
btproc function as the third argument.
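To make the "does the support function take a third argument" check concrete, here is a minimal Python sketch. The function names and the use of introspection are illustrative assumptions standing in for a pg_proc lookup, not PostgreSQL code:

```python
import inspect

# Two hypothetical btree support functions: one collation-aware, one not.
def btint4cmp(a, b):
    """Integer compare: collation-independent (two arguments)."""
    return (a > b) - (a < b)

def bttextcmp(a, b, collation):
    """Text compare: takes a collation as a third argument."""
    # A stand-in: real code would consult strcoll()/ICU for `collation`.
    return (a > b) - (a < b)

def is_collation_aware(cmp_func):
    """Mimic checking the pg_proc entry: does the support function
    accept a third (collation) argument?"""
    return len(inspect.signature(cmp_func).parameters) >= 3

print(is_collation_aware(btint4cmp))   # False
print(is_collation_aware(bttextcmp))   # True
```

The alternative design mentioned above, a boolean column in pg_opclass, would replace the arity check with an explicit catalog flag.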

We think that even without the parser changes this would be useful enough to
stand on its own. It would allow indexes to be built in different collations
and different sessions to use different collations. You could even hack
together queries which combine collations by defining a function to handle
subparts of the query and attaching the guc to it the same way we attach
search_path to security definer functions.

To finish the feature the catalog needs to add a default collation to every
column and some other database objects. Then the parser needs to check those
objects first before falling back to the session variable. Then it needs to
bubble that value up as it builds expressions so that it's available at the
comparison operator or sort node. We didn't read the part of the spec which
covers the rules for this, but it does have them; they should turn up if you
search for collation derivation.
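For illustration, the spec's collation-derivation bookkeeping can be sketched as a toy Python model (the constants and the `combine` helper are invented here; this is a rough reading of the rules, not the standard's text): each expression carries a collation plus a derivation, explicit derivations win, and two conflicting implicit derivations yield no usable collation.

```python
EXPLICIT, IMPLICIT, NONE = "explicit", "implicit", "none"

def combine(a, b):
    """Combine the (collation, derivation) pairs of two operands,
    loosely following the spec's collation-derivation rules."""
    (ca, da), (cb, db) = a, b
    if da == EXPLICIT and db == EXPLICIT:
        if ca != cb:
            raise ValueError("conflicting explicit collations")
        return (ca, EXPLICIT)
    if da == EXPLICIT:
        return (ca, EXPLICIT)
    if db == EXPLICIT:
        return (cb, EXPLICIT)
    if da == IMPLICIT and db == IMPLICIT:
        return (ca, IMPLICIT) if ca == cb else (None, NONE)
    if da == IMPLICIT:
        return (ca, IMPLICIT)
    if db == IMPLICIT:
        return (cb, IMPLICIT)
    return (None, NONE)

# A column's implicit "de_DE" loses to a literal overridden with COLLATE "C":
print(combine(("de_DE", IMPLICIT), ("C", EXPLICIT)))   # ('C', 'explicit')
# Two columns with different implicit collations: no usable collation.
print(combine(("de_DE", IMPLICIT), ("fr_FR", IMPLICIT)))  # (None, 'none')
```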

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: First integrated patch

2008-10-18 Thread Simon Riggs

On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
 On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  First integrated patch for Hot Standby, allowing queries to be executed
  while in recovery mode.
 
  The patch tests successfully with the enclosed files:
  * primary_setup_test.sql - run it on primary node
  * standby_allowed.sql - run on standby - should all succeed
  * standby_disallowed.sql - run on standby - should all fail
  plus other manual testing.
 
  This is still WIP - it's good enough to release for comments, though I am
  not yet confident enough to claim it bug-free.
 
  What this doesn't do YET:
  * cope fully with subxid cache overflows (some parts still to add)
  * cope with prepared transactions on master
  * work correctly when running queries AND replaying WAL
  * work correctly with regard to AccessExclusiveLocks, which should
  prevent access to tables
 
  These last four points are what I'm working on over the next two weeks,
  plus any other holes people point out along the way. I have worked out
  designs for most of these aspects and will discuss them on -hackers,
  though most design notes are in the Wiki. I'm still looking into
  prepared transactions.
 
  Comments appreciated.
 
 It appears to be working, at least in some fashion.  The supplied
 tests all pass.

Cool

Thanks for testing so far.

 At first glance it seems like I have to force changes to the standby
 with pg_switch_xlog().
 
 hmm.

You'll have to explain some more. Normally files don't get sent until
they are full, so yes, you would need to do a pg_switch_xlog().

This is not streaming replication. Others are working on that.

 This probably isn't right:
 postgres=# \d

 No relations found.
 postgres=# select count(*) from foo;
  count
 -------
    100
 (1 row)

 I created a table, ran pg_switch_xlog(), queried several times; I dropped
 the table, ran pg_switch_xlog(), and the table is 'gone', but still
 returns data.

 exit/enter session, now it's gone.  Sometimes I have to exit/enter
 session to get an up-to-date standby.  These are just first
 impressions...

Replaying and queries don't mix yet, so that is expected. I'm working on
this in phases. This patch is phase 1 - it is not the finished patch.
Phase 2: working on correct block locking to allow concurrent DML
changes to occur while we run queries.
Phase 3: working on correct relation locking/relcache handling to allow
concurrent DDL changes to occur while we run queries.
I have designs for the above and expect to complete them in the next two
weeks.

The reason for the above behaviour is that DDL changes need to fire
relcache invalidation messages so that the query backend sees the
change. The reason the table is still there is because the files haven't
been dropped yet. So everything you have seen is expected, by me.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Reducing some DDL Locks to ShareLock

2008-10-18 Thread Simon Riggs

On Tue, 2008-10-07 at 10:35 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Tue, 2008-10-07 at 10:05 -0400, Robert Haas wrote:
  3. The patch introduces a slight weirdness: if you create two FKs on the
  same column at the same time you end up with two constraints with
  identical names. Drop constraint then removes them both, though in other
  respects they are both valid, just not uniquely. CREATE INDEX avoids
this by way of the unique index on relname. The equivalent index on
pg_constraint is not unique, and *cannot* be made unique without
breaking some corner cases of table inheritance.
  
  Urk... this seems pretty undesirable.
 
  OK, but please say what behaviour you would like in its place. 
 
 I wonder whether this could be helped if we refactored pg_constraint.
 The lack of a useful pkey for it has been annoying me for awhile,
 and I think it stems from a misguided choice to put table and domain
 constraints into the same catalog.  Suppose that
 
 * table constraints go into pg_relation_constraint, with a unique key
 on (conrelid, conname)
 
 * domain constraints go into pg_domain_constraint, with a unique key
 on (contypid, conname)
 
 * pg_constraint can still exist as a union view, for client
 compatibility
 
 Then the unique key would prevent concurrent creation of
 identically-named constraints for the same relation.

I'm also planning to squeeze this refactoring work in now, before freeze,
unless somebody else wants to pick it up?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




[HACKERS] adding collation to a SQL database

2008-10-18 Thread Dave Gudeman
Normally I would lurk on this list for a month or two before I considered
posting, but I saw a note on collation and thought some of you might be
interested in my experience implementing collation for the ANTs Data Server.

In ADS, we treated the collation internally as part of the type for string
types. In other words, char, varchar, and clob expressions and columns had
not only a length and a can-be-null flag, but also a collation. This let us
do a lot less work to add collations because we didn't have to munge the
expression code very much. We just adapted the code that does type
assignment and then the information was carried through the compiler in the
type info with no further work (except where the compiler actually had to
know about collation). We didn't need to mess with index selection, for
example, because it was handled correctly just based on types. And it let us
add collated indexes very easily just by adding the new type information. We
treated those weird collation expressions like type casts, which made them
easier to implement also.

Collated compares are very slow and it's worth quite a bit of effort to
avoid them when possible (for example by putting them last in the list of
predicates to evaluate). ICU has a facility for preprocessing search strings
so they can be byte-compared (which is much faster than normal collated
compares). You can construct a collated index by using preprocessed search
strings as the keys so that index searches are faster. We didn't do that in
ADS for several reasons: first, the preprocessed strings are bigger (about
30% as I recall) so you can fit fewer keys in a node. Second, you don't
really do a lot of comparisons in an index lookup. Third, ADS had an
optimization where it didn't read the data files if the index contained all
of the columns needed for the query, and that optimization would not work
with these sorts of collation indexes. Still, the parameters of PostgreSQL
are quite a bit different so it might be worth considering.
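The sort-key idea described above can be illustrated with a toy collation in pure Python. Here `collate_key` is a made-up stand-in for ICU's sort-key preprocessing (ucol_getSortKey) or C's strxfrm: it produces a key whose plain byte-wise comparison matches the desired ordering, at the cost of a larger key:

```python
def collate_key(s):
    """Preprocess a string into bytes whose byte-wise comparison matches
    a toy case-insensitive collation (lowercase form first, original
    spelling as tiebreaker). Note the key is larger than the original
    string, which is the size penalty mentioned above."""
    return s.lower().encode() + b"\x00" + s.encode()

words = ["banana", "Apple", "apple", "Cherry"]
# Collated order via preprocessed keys:
print(sorted(words, key=collate_key))  # ['Apple', 'apple', 'banana', 'Cherry']
# Raw byte order, where all capitals sort before lowercase:
print(sorted(words))                   # ['Apple', 'Cherry', 'apple', 'banana']
```

The same trick is what makes "a normal byte compare" usable as a fast default collation, as described in the next paragraph.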

Partly for backward compatibility and partly because collated compares are
so slow, ADS had a default collation that was just a normal ascii compare.
Since strings were all encoded in UTF-8, the collation order was incorrect
if there were any multi-byte characters, but in English-only columns, or
columns where they just wanted a reproducible ordering and didn't care that
much about language rules, it was much faster than the ICU compare.

regards,
David Gudeman


Re: [HACKERS] PGDay.it collation discussion notes

2008-10-18 Thread Martijn van Oosterhout
On Sat, Oct 18, 2008 at 07:41:12AM +0100, Gregory Stark wrote:
 The ANSI standard syntax where the COLLATION keyword can follow just about any
 string value in an expression and then bubbles up the expression until an
 operation needs to pick a collation seemed very weird to us. Hooking that into
 the parser was pretty intimidating but we thought we could postpone that till
 later and solve the meat of the problem first.

It's quite straightforward. Every expression has a collation; the COLLATE
keyword just overrides it. And the collation is a parameter of the
operators/functions that want to use it. Implementation is also
straightforward: add "expr : expr COLLATE ident" to the bison grammar and a
collation oid to the Expr node. The code to bubble up the collation ids
would be in the patch I posted a few years back.

 pg_index would need an additional column to go along with indkey and
 indoptions. It would be an oidvector of collation oids.

Have you decided what the collation oid will refer to? If the index is
a descending index, does that affect the collation oid?

 When planning a query or building an index we need to determine whether the
 opclass is collation-aware or not. It seemed best that we not hard code which
 data types are collation aware and instead look at either the operator or the
 pg_proc entry for the function implementing the sort operator or perhaps the
 btproc for the opclass to see if it takes a third argument. Or perhaps we
 would have a boolean column in pg_opclass which asserts they do and that it
 should be the collation. If so then the collation is stored in the sort key
 and the indcollation slots and is passed to the operator or the btproc
 function as the third argument.

The way I approached it was to just mark the opclass (perhaps opfamily
would be better now; it didn't exist at the time). Additionally you
will need to mark the individual operators/functions as to whether
they're sensitive to the collation or not. This is needed because an
Error state in the collation oid is only an error if applied to an
operator that cares. This might not be needed in the first
implementation, though.
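A rough sketch of that Error-state behaviour, with invented function names (this is an illustration of the rule, not PostgreSQL code): an ambiguous collation is harmless until a collation-sensitive operator actually consumes it.

```python
AMBIGUOUS = object()  # "error state" from conflicting implicit collations

def text_length(s, collation):
    # Collation-insensitive: an ambiguous collation is harmless here.
    return len(s)

def text_lt(a, b, collation):
    # Collation-sensitive: must have a real collation to proceed.
    if collation is AMBIGUOUS:
        raise ValueError("could not determine which collation to use")
    return a < b

print(text_length("hello", AMBIGUOUS))  # 5 (no error)
try:
    text_lt("a", "b", AMBIGUOUS)
except ValueError as e:
    print(e)
```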

 We think even without the parser changes this would be useful enough to take
 on its own. It would allow having indexes built in different collations and
 have different sessions use different collations. You could even hack queries
 which combine collations by defining a function handle subparts of the query
 and attach the guc the same way we attach the search_path to security definer
 functions.

Ugh. I think just implementing the parser changes will be less work.

 To finish the feature the catalog needs to add a default collation to every
 column and some other database objects. Then the parser needs to check those
 objects first before falling back to the session variable. Then it needs to
 bubble that value up as it builds expressions so that it's available at the
 comparison operator or sort node. We didn't read the part of the spec which
 covered the rules for this but it does have rules which should turn up if you
 search for collation derivation.

Once you've added the default collation to every object, the session
variable becomes redundant. Even unknown text strings will have a
collation oid as soon as they are cast to text, since the text type
will have a default collation (just like it has a default operator
class).

Nice to see some progress being made here.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




[HACKERS] ALTER TABLE ... SET DATA TYPE (SQL:2008)

2008-10-18 Thread Peter Eisentraut

Here is a patch that allows the new SQL:2008 syntax (also used by IBM)

ALTER TABLE tab ALTER COLUMN col SET DATA TYPE typ

alongside our current syntax

ALTER TABLE tab ALTER COLUMN col TYPE typ

I verified that we implement a superset of what the standard says.  (Of
course, the standard doesn't support the USING clause.)


There was some key word overlap with ecpg, so the patch looks bulkier 
than it really is.
diff -ur -x CVS ../cvs-pgsql/doc/src/sgml/ref/alter_table.sgml ./doc/src/sgml/ref/alter_table.sgml
--- ../cvs-pgsql/doc/src/sgml/ref/alter_table.sgml	2008-05-10 01:32:03.000000000 +0200
+++ ./doc/src/sgml/ref/alter_table.sgml	2008-10-18 10:30:41.000000000 +0200
@@ -33,7 +33,7 @@
 
     ADD [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> <replaceable class="PARAMETER">type</replaceable> [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
     DROP [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
-    ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> TYPE <replaceable class="PARAMETER">type</replaceable> [ USING <replaceable class="PARAMETER">expression</replaceable> ]
+    ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable> [ USING <replaceable class="PARAMETER">expression</replaceable> ]
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET DEFAULT <replaceable class="PARAMETER">expression</replaceable>
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> DROP DEFAULT
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
@@ -93,7 +93,7 @@
    </varlistentry>
 
    <varlistentry>
-    <term><literal>ALTER COLUMN TYPE</literal></term>
+    <term><literal>SET DATA TYPE</literal></term>
     <listitem>
      <para>
       This form changes the type of a column of a table. Indexes and
@@ -760,7 +760,7 @@
    with time zone</type> via a <literal>USING</literal> clause:
<programlisting>
ALTER TABLE foo
-    ALTER COLUMN foo_timestamp TYPE timestamp with time zone
+    ALTER COLUMN foo_timestamp SET DATA TYPE timestamp with time zone
    USING
        timestamp with time zone 'epoch' + foo_timestamp * interval '1 second';
</programlisting>
@@ -868,8 +868,9 @@
   <title>Compatibility</title>
 
   <para>
-   The <literal>ADD</literal>, <literal>DROP</>, and <literal>SET DEFAULT</>
-   forms conform with the SQL standard.  The other forms are
+   The forms <literal>ADD</literal>, <literal>DROP</>, <literal>SET DEFAULT</>,
+   and <literal>SET DATA TYPE</literal> (without <literal>USING</literal>)
+   conform with the SQL standard.  The other forms are
    <productname>PostgreSQL</productname> extensions of the SQL standard.
    Also, the ability to specify more than one manipulation in a single
    <command>ALTER TABLE</> command is an extension.
diff -ur -x CVS ../cvs-pgsql/src/backend/catalog/sql_features.txt ./src/backend/catalog/sql_features.txt
--- ../cvs-pgsql/src/backend/catalog/sql_features.txt	2008-10-18 11:56:26.000000000 +0200
+++ ./src/backend/catalog/sql_features.txt	2008-10-18 10:23:46.000000000 +0200
@@ -236,7 +236,7 @@
 F381	Extended schema manipulation	01	ALTER TABLE statement: ALTER COLUMN clause	YES	
 F381	Extended schema manipulation	02	ALTER TABLE statement: ADD CONSTRAINT clause	YES	
 F381	Extended schema manipulation	03	ALTER TABLE statement: DROP CONSTRAINT clause	YES	
-F382	Alter column data type	NO	PostgreSQL syntax differs
+F382	Alter column data type	YES	
 F391	Long identifiers	YES	
 F392	Unicode escapes in identifiers	NO	
 F393	Unicode escapes in literals	NO	
diff -ur -x CVS ../cvs-pgsql/src/backend/parser/gram.y ./src/backend/parser/gram.y
--- ../cvs-pgsql/src/backend/parser/gram.y	2008-10-15 13:19:39.000000000 +0200
+++ ./src/backend/parser/gram.y	2008-10-18 10:14:05.000000000 +0200
@@ -304,7 +304,7 @@
 
 %type <boolean>	copy_from
 
-%type <ival>	opt_column event cursor_options opt_hold
+%type <ival>	opt_column event cursor_options opt_hold opt_set_data
 %type <objtype>	reindex_type drop_type comment_type
 
 %type <node>	fetch_direction select_limit_value select_offset_value
@@ -407,7 +407,7 @@
 	CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
-	DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
+	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
 	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
 	DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
 
@@ -1534,16 +1534,16 @@
 				$$ = (Node *)n;
 			}
 		/*
-		 * ALTER TABLE <name> ALTER [COLUMN] 

Re: [HACKERS] Hot Standby: First integrated patch

2008-10-18 Thread Merlin Moncure
On Sat, Oct 18, 2008 at 4:11 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
 On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  First integrated patch for Hot Standby, allowing queries to be executed
  while in recovery mode.
 
  The patch tests successfully with the enclosed files:
  * primary_setup_test.sql - run it on primary node
  * standby_allowed.sql - run on standby - should all succeed
  * standby_disallowed.sql - run on standby - should all fail
  plus other manual testing.
 
  This is still WIP - its good enough to release for comments, though I am
  not yet confident enough to claim it bug free.
 
  What this doesn't do YET:
  * cope fully with subxid cache overflows (some parts still to add)
  * cope with prepared transactions on master
  * work correctly when running queries AND replaying WAL
  * work correctly with regard to AccessExclusiveLocks, which should
  prevent access to tables
 
  These last four points are what I'm working on over the next two weeks,
  plus any other holes people point out along the way. I have worked out
  designs for most of these aspects and will discuss them on -hackers,
  though most design notes are in the Wiki. I'm still looking into
  prepared transactions.
 
  Comments appreciated.

 It appears to be working, at least in some fashion.  The supplied
 tests all pass.

 Cool

 Thanks for testing so far.

 At first glance it seems like I have to force changes to the standby
 with pg_switch_xlog().

 hmm.

 You'll have to explain some more. Normally files don't get sent until
 they are full, so yes, you would need to do a pg_switch_xlog().

 This is not streaming replication. Others are working on that.

right...this was expected.  less the missing parts, things are working
very well.

merlin



Re: [HACKERS] PGDay.it collation discussion notes

2008-10-18 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes:
 It's quite straightforward. Every expression has a collation, the COLLATE
 keyword just overrides it. And the collation is a parameter of the
 operators/functions that want to use it. Implementation is also
 straightforward: add expr :: expr COLLATE ident to the bison code and a
 collation oid to the Expr node.

It's fairly irritating to think that a string-specific option is going
to become part of the fundamental type system --- it makes no sense to
distinguish different collations for numeric for instance (and in fact
I would want to see the parser throw an error if you tried to attach a
collation to a non-string type).  As for inserting a collation OID in
every single variant of Expr, the answer is no.

Another objection to this design is that it's completely unclear that
functions from text to text should necessarily yield the same collation
that went into them, but if you treat collation as a hard-wired part of
the expression syntax tree you aren't going to be able to do anything else.
(What will you do about functions/operators taking more than one text
argument?)

I think it would be better to treat the collation indicator as part of
string *values* and let it bubble up through expressions that way.
The expr COLLATE ident syntax would be a simple run-time operation
that pokes a new collation into a string value.  The notion of a column
having a particular collation would then amount to a check constraint on
the values going into the column.
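One possible reading of this proposal, as a hedged Python sketch (the `CollatedText` type and both helpers are invented for illustration; this is an interpretation of the idea, not an implementation): the collation rides along with the value, `COLLATE` re-tags it at run time, and a column collation acts like a check constraint.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CollatedText:
    value: str
    collation: str  # carried with the value itself

def collate(v, name):
    """The `expr COLLATE ident` operation: a run-time re-tagging."""
    return CollatedText(v.value, name)

def column_check(v, required_collation):
    """A column's declared collation as a check constraint on
    incoming values."""
    return v.collation == required_collation

s = CollatedText("straße", "de_DE")
print(column_check(collate(s, "C"), "C"))  # True
print(column_check(s, "C"))                # False
```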

regards, tom lane



[HACKERS] two servers on the same port

2008-10-18 Thread Eric Haszlakiewicz

I just spent a couple of days trying to figure out why I couldn't start
two servers on the same port, even though I was configuring separate
listen_address values.  I kept getting errors about shmget failing with
"could not create shared memory segment: Invalid argument".

I finally noticed that the shared memory key mentioned in the error when
starting the second server was the same as what the first server was
using, which appeared to be generated based on the port number.

Sure enough when I changed the port, it used a different shared memory
key and started right up.  After searching around on the web a bit
I found some pages that suggested running under different userids
might be necessary.  So, I tried that, and changed the port back to the
standard 5432, and it started up.

Anyway, everything seems to be working fine, but I figured this info
should be a bit easier to find, so here are a couple of patches to the
documentation to mention how this works.

eric
--- doc/src/sgml/config.sgml.orig	2008-10-18 00:08:50.000000000 -0500
+++ doc/src/sgml/config.sgml	2008-10-18 00:10:58.000000000 -0500
@@ -337,6 +337,12 @@
         same port number is used for all IP addresses the server listens on.
         This parameter can only be set at server start.
        </para>
+       <para>
+        This setting also determines the key used for the shared memory
+        segment.  Because of that, two servers can not be started on the
+        same port, even if they have different listen_addresses, unless
+        they are also running under two different userids.
+       </para>
       </listitem>
      </varlistentry>
 
--- doc/src/sgml/runtime.sgml.orig	2008-10-18 00:05:37.000000000 -0500
+++ doc/src/sgml/runtime.sgml	2008-10-18 00:08:37.000000000 -0500
@@ -401,6 +401,15 @@
     </para>
 
     <para>
+     You can also get this error if you try to start two servers on the
+     same machine, on the same port, even if you specify different
+     listen_address values.  In order for that configuration to work,
+     you'll need to run the servers under different userids, which will
+     cause <productname>PostgreSQL</productname> to use different
+     shared memory keys.
+    </para>
+
+    <para>
      An error like
 <screen>
 FATAL:  could not create semaphores: No space left on device



Re: [HACKERS] PGDay.it collation discussion notes

2008-10-18 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 It's fairly irritating to think that a string-specific option is going
 to become part of the fundamental type system --- it makes no sense to
 distinguish different collations for numeric for instance

Actually I thought of that generality as an advantage. Just because we can't
think of any right now doesn't mean there aren't applications of this. The
only example I could think of was a comparison operator on numeric which
specifies a significant precision. That doesn't sound super useful but there
are a lot of data types out there and I don't see any reason to think text is
the only one in the world that will have more than one reasonable ordering.

 I think it would be better to treat the collation indicator as part of
 string *values* and let it bubble up through expressions that way.
 The expr COLLATE ident syntax would be a simple run-time operation
 that pokes a new collation into a string value.  The notion of a column
 having a particular collation would then amount to a check constraint on
 the values going into the column.

I'm not super familiar with the spec here but from what I understood I think
this would be very different.

For instance, I think you need to be able to set the default collation on a
whole column after the fact. Rewriting the whole table to handle a collation
change seems like a non-starter.

Also, if the column doesn't have a default collation specified then you need
to use the default collation for a more general object -- I'm not sure if it's
table or schema next.

Thirdly, to handle resolving conflicting default collations you need to track
where the source of the default collation was. Ie, whether it was a default or
an explicit choice by the query.

Collation isn't really a property of the text at all. This design would force
the sources of text to pick a collation that will be used by other parts of
the application that they know nothing about. How is a DBA using COPY to
populate a table going to know what collation the web app which eventually
uses the data in that table will want to use?

The other side of the coin is that given the spec-compliant behaviour you can
always emulate the behaviour you're describing by adding another column. It
would be more useful too since you'll have a language column which may be
useful independently from the text content.

And of course the scheme you're describing would waste a huge amount of space
in every string on disk. For short strings it could triple the amount of space
(plus I think the explicit vs implicit collation would make it even worse).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] two servers on the same port

2008-10-18 Thread Tom Lane
Eric Haszlakiewicz [EMAIL PROTECTED] writes:
 I just spent a couple of days trying to figure out why I couldn't start
 two servers on the same port, even though I was configuring separate
 listen_address values.

That's already documented not to work, and not for any hidden
implementation reason: you'd have a conflict on the Unix-domain socket
name.

regards, tom lane



Re: [HACKERS] two servers on the same port

2008-10-18 Thread Andrew Dunstan



Tom Lane wrote:
 Eric Haszlakiewicz [EMAIL PROTECTED] writes:
  I just spent a couple of days trying to figure out why I couldn't start
  two servers on the same port, even though I was configuring separate
  listen_address values.

 That's already documented not to work, and not for any hidden
 implementation reason: you'd have a conflict on the Unix-domain socket
 name.

... unless you use a different socket directory.

cheers

andrew



Re: [HACKERS] two servers on the same port

2008-10-18 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 That's already documented not to work, and not for any hidden
 implementation reason: you'd have a conflict on the Unix-domain socket
 name.

 unless you use a different socket directory.

Hmm ... but the OP didn't mention any such thing.  In any case I think
he's misdiagnosed his problem, because the shmem code *should* ignore
pre-existing shmem segments that are already in use --- see the loop in
PGSharedMemoryCreate.
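For readers following this thread, the collision-handling loop being referenced can be sketched in Python. This is an illustrative stand-in, not the C code: `choose_shmem_key` and `in_use` are invented names, and the port-derived starting key (port * 1000) reflects how the System V key is believed to be derived from the port in this era's sysv_shmem.c:

```python
def choose_shmem_key(port, in_use):
    """Start from a port-derived shared memory key and keep
    incrementing while the candidate segment exists and is still in
    use by a live postmaster. `in_use` stands in for the checks
    PGSharedMemoryCreate performs against existing segments."""
    key = port * 1000
    while in_use(key):
        key += 1
    return key

# A first server on port 5432 takes key 5432000; a second postmaster
# should be pushed to the next free key rather than failing outright.
taken = {5432000}
print(choose_shmem_key(5432, lambda k: k in taken))  # 5432001
```

This is why Tom suggests the OP's shmget failure has a different root cause: pre-existing in-use segments are supposed to be skipped, not fatal.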

regards, tom lane



Re: [HACKERS] PGDay.it collation discussion notes

2008-10-18 Thread Martijn van Oosterhout
On Sat, Oct 18, 2008 at 11:28:41AM -0400, Tom Lane wrote:
 It's fairly irritating to think that a string-specific option is going
 to become part of the fundamental type system --- it makes no sense to
 distinguish different collations for numeric for instance (and in fact
 I would want to see the parser throw an error if you tried to attach a
 collation to a non-string type).  As for inserting a collation OID in
 every single variant of Expr, the answer is no.

It's hardly string specific. At least geometric types have the concept
of multiple collations. Any datatype can have a set of useful
collations, I'd hate to restrict this to just strings.

 Another objection to this design is that it's completely unclear that
 functions from text to text should necessarily yield the same collation
 that went into them, but if you treat collation as a hard-wired part of
 the expression syntax tree you aren't going to be able to do anything else.
 (What will you do about functions/operators taking more than one text
 argument?)

Well, the standard has plenty of discussion about that. If the
operator is a comparison, the two collations must be coercible to the
same collation. If it's not and the result type isn't text, then it
doesn't matter. And if the result is text, you mark it as ambiguous
and thus require the user to specify one explicitly.
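The derivation rule described here can be sketched in Python. This is
illustrative only: the function name `derive` and the
explicit/implicit/none strength labels are mine, loosely after the SQL
standard's coercibility notion, and this is not PostgreSQL code.

```python
# Sketch (my labels, not PostgreSQL code) of the collation-derivation
# rule described above: an explicit COLLATE wins, conflicting explicit
# collations are an error, and differing implicit collations leave the
# result with no collation ("ambiguous") until the user specifies one.

EXPLICIT, IMPLICIT, NONE = "explicit", "implicit", "none"

def derive(a, b):
    """Combine two (collation, strength) pairs for a two-argument op."""
    (ca, sa), (cb, sb) = a, b
    if sa == EXPLICIT and sb == EXPLICIT:
        if ca != cb:
            raise ValueError("conflicting explicit collations")
        return (ca, EXPLICIT)
    if sa == EXPLICIT:
        return (ca, EXPLICIT)
    if sb == EXPLICIT:
        return (cb, EXPLICIT)
    if sa == NONE:
        return (cb, sb)
    if sb == NONE:
        return (ca, sa)
    if ca == cb:
        return (ca, IMPLICIT)
    return (None, NONE)  # differing implicit collations: ambiguous
```

A comparison operator would then have to error out on an ambiguous
result, while a text-returning function could carry the ambiguity
along, as described above.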

 I think it would be better to treat the collation indicator as part of
 string *values* and let it bubble up through expressions that way.

I think the storage space considerations make that a difficult option.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Cross-column statistics revisited

2008-10-18 Thread Joshua Tolley
On Fri, Oct 17, 2008 at 7:54 PM, Nathan Boley [EMAIL PROTECTED] wrote:
 I'm still working my way around the math, but copulas sound better
 than anything else I've been playing with.

 I think the easiest way to think of them is, in 2-D finite spaces,
 they are just a plot of the order statistics against one another. Feel
 free to mail me off list if you have any math questions.

I'm still working out what a copula is, and I've got to figure out the
stuff below, too! :) /me has reading to do... The treatment of copulae
in one of my stats texts is much better than Wikipedia's, I think, so
I'm progressing. The more I figure out what they are, the more they
sound like an excellent potential solution, for whatever my opinion is
worth, but making sure they work decently for all data types,
including those with no real notion of distance, may be interesting. I
still need to go through backend/utils/adt/selfuncs.c to figure out
exactly how we use the one-dimensional values.
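The "plot of the order statistics against one another" view above can
be sketched with a simple rank transform. This is an illustrative
empirical copula in 2-D, not planner code, and the function names are
mine.

```python
# Illustrative empirical copula in 2-D (my function names, not planner
# code): map each column to its normalized ranks so only the dependence
# structure remains, independent of the marginal distributions.

def ranks(xs):
    """Map each value to its normalized rank in (0, 1]."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank / len(xs)
    return r

def empirical_copula(xs, ys):
    """Pairs of per-column ranks; a uniform scatter means independence."""
    return list(zip(ranks(xs), ranks(ys)))

# Perfectly correlated columns put all the mass on the diagonal:
pts = empirical_copula([1, 2, 3, 4], [10, 20, 30, 40])
```

For independent columns the points would scatter uniformly over the
unit square; any structure in the plot is cross-column dependence.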

 I've previously thought that, at least in the 2D case, we could use
 image compression algorithms to compress the copula, but recently I've
 realized that this is a change point problem. In terms of compression,
 we want to decompose the copula into regions that are as homogeneous as
 possible.  I'm not familiar with change point problems in multiple
 dimensions, but I'll try and ask someone that is, probably late next
 week. If you decide to go the copula route, I'd be happy to write the
 decomposition algorithm - or at least work on the theory.

 Finally, a couple points that I hadn't seen mentioned earlier that
 should probably be considered-

 1) NULLs need to be treated specially - I suspect the assumption of
 NULL independence is worse than other independence assumptions. Maybe
 dealing with NULL dependence could be a half step towards full
 dependence calculations?

Agreed, though how to treat them I have no idea.


 2) Do we want to fold the MCV's into the dependence histogram? That
 will cause problems in our copula approach but I'd hate to have to
 keep an N^d histogram dependence relation in addition to the copula.

Yeah, if we're already trying to figure out how to compress copulae,
also having to compress MCV matrices seems painful and error-prone.
But I'm not sure why it would cause problems to keep them in the
copula -- is that just because we are most interested in the copula
modeling the parts of the distribution that are most sparsely
populated?

 3) For equality selectivity estimates, I believe the assumption that
 the ndistinct value distribution is uniform in the histogram will
 become worse as the dimension increases. I proposed keeping track of
 ndistinct per histogram bucket earlier in the marginal case, partially
 motivated by this exact scenario. Does that proposal make more sense
 in this case? If so we'd need to store two distributions - one of the
 counts and one of ndistinct.

It's definitely worth investigating (or seeing if someone else has
already published an investigation). There are probably several
similar stats we could track and make use of, provided it didn't slow
the planner too much to make use of them.
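The per-bucket ndistinct idea in point 3 can be sketched numerically.
Toy numbers and my own function names here; this is not how
selfuncs.c actually computes equality selectivity.

```python
# Toy sketch (my function names, not selfuncs.c): with only a
# table-wide ndistinct, every non-MCV value gets the same equality
# estimate; per-bucket ndistinct lets a dense bucket produce a very
# different estimate from a sparse one.

def eq_sel_global(ndistinct_total):
    # classic fallback: assume the value is one of ndistinct_total
    # values spread uniformly over the table
    return 1.0 / ndistinct_total

def eq_sel_per_bucket(bucket_fraction, bucket_ndistinct):
    # rows in the value's bucket, split evenly among the distinct
    # values observed in that bucket
    return bucket_fraction / bucket_ndistinct

# A bucket holding 25% of the rows but only 5 distinct values:
dense = eq_sel_per_bucket(0.25, 5)   # 0.05
naive = eq_sel_global(1000)          # 0.001
```

The gap between the two estimates is exactly the error the uniform
ndistinct assumption introduces, and it plausibly grows with dimension.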

 4) How will this approach deal with histogram buckets that have
 scaling count sizes (i.e., -0.4)?

I'm not sure what you mean here.

- Josh / eggyknap



[HACKERS] Lisp as a procedural language?

2008-10-18 Thread M. Edward (Ed) Borasky
Someone at the PostgreSQL West conference last weekend expressed an
interest in a Lisp procedural language. The only two Lisp environments
I've found so far that aren't GPL are Steel Bank Common Lisp (MIT,
http://sbcl.sourceforge.net) and XLispStat (BSD,
http://www.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html). SBCL is a
very active project, but I'm not sure about XLispStat. 
-- 
M. Edward (Ed) Borasky
ruby-perspectives.blogspot.com

A mathematician is a machine for turning coffee into theorems. --
Alfréd Rényi via Paul Erdős





Re: [HACKERS] Lisp as a procedural language?

2008-10-18 Thread Nikolas Everett
From what I remember from tinkering with Lisp a while back, SBCL and CMUCL
are the big free implementations.  I remember something about GCL being
non-standard.  Either of those should make Lisp hackers happy.

2008/10/18 M. Edward (Ed) Borasky [EMAIL PROTECTED]

 Someone at the PostgreSQL West conference last weekend expressed an
 interest in a Lisp procedural language. The only two Lisp environments
 I've found so far that aren't GPL are Steel Bank Common Lisp (MIT,
 http://sbcl.sourceforge.net) and XLispStat (BSD,
 http://www.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html). SBCL is a
 very active project, but I'm not sure about XLispStat.
 --
 M. Edward (Ed) Borasky
 ruby-perspectives.blogspot.com

 A mathematician is a machine for turning coffee into theorems. --
 Alfréd Rényi via Paul Erdős






Re: [HACKERS] Lisp as a procedural language?

2008-10-18 Thread M. Edward (Ed) Borasky
On Sat, 2008-10-18 at 20:43 -0400, Nikolas Everett wrote:
 From what I remember with tinkering with Lisp a while back, SBCL and
 CMUCL are the big free implementations.  I remember something about
 GCL being non-standard.  Either of those should make lisp hackers
 happy.

GCL (and Clisp) are both reasonable implementations of Common Lisp.
However, they are both GPL, which I think is an issue for PostgreSQL
community members. CMUCL development more or less stalled out, and many
of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of
a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon
(University) Common Lisp = Steel Bank Common Lisp. :)

In any event, SBCL is MIT-licensed, which is free of some of the more
annoying GPL restrictions. BTW, I checked on XLispStat and it seems to
be frozen in time -- most of the people who used to use XLispStat
(including me) have moved on to R (which is GPL, unfortunately).
--
M. Edward (Ed) Borasky
ruby-perspectives.blogspot.com

A mathematician is a machine for turning coffee into theorems. --
Alfréd Rényi via Paul Erdős




Re: [HACKERS] Lisp as a procedural language?

2008-10-18 Thread Andrew Dunstan



M. Edward (Ed) Borasky wrote:
 On Sat, 2008-10-18 at 20:43 -0400, Nikolas Everett wrote:
  From what I remember with tinkering with Lisp a while back, SBCL and
  CMUCL are the big free implementations.  I remember something about
  GCL being non-standard.  Either of those should make lisp hackers
  happy.

 GCL (and Clisp) are both reasonable implementations of Common Lisp.
 However, they are both GPL, which I think is an issue for PostgreSQL
 community members. CMUCL development more or less stalled out, and many
 of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of
 a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon
 (University) Common Lisp = Steel Bank Common Lisp. :)

 In any event, SBCL is MIT-licensed, which is free of some of the more
 annoying GPL restrictions. BTW, I checked on XLispStat and it seems to
 be frozen in time -- most of the people who used to use XLispStat
 (including me) have moved on to R (which is GPL, unfortunately).

We're almost certainly not going to include a Lisp PL in the core 
distribution, so the license shouldn't be an issue (cf. PL/R).


cheers

andrew




Re: [HACKERS] Lisp as a procedural language?

2008-10-18 Thread Tom Lane
M. Edward (Ed) Borasky [EMAIL PROTECTED] writes:
 GCL (and Clisp) are both reasonable implementations of Common Lisp.
 However, they are both GPL, which I think is an issue for PostgreSQL
 community members.

Well, it would be an issue if we wanted to distribute PL/Lisp as part of
the core; but I kinda doubt that there would be enough demand to justify
that.  As long as it's a separate project I don't see much wrong with
depending on a GPL Lisp implementation, if you find that that's the best
choice technically.

 CMUCL development more or less stalled out, and many
 of the heavyweights moved to Steel Bank Common Lisp (SBCL). It's kind of
 a joke -- Carnegie = Steel, Mellon = Bank, so Carnegie Mellon
 (University) Common Lisp = Steel Bank Common Lisp. :)

Not that I've got anything against CMU software ;-)

regards, tom lane
