date:20090401

Re: [HACKERS] Sort a column that does not exist

2009-04-01 Thread Hitoshi Harada

2009/4/1 Werner Echezuria werc...@gmail.com:
 As you can see if someone do this: SELECT * FROM table WHERE
 field=some_value ORDER BY grmemb, postgresql creates a new target entry and
 then assigned to the targetlist as a sort node. I know that it creates the
 node on the parser, but it does not work, it seems the executor don't see
 it.

See include/nodes/primnodes.h around line 1075:
boolresjunk;/* set to true to eliminate the 
attribute from
 * final target 
list */

If the TargetEntry is set resjunk = false, the final result is
filtered as junk. So more accurately the executor sees but drops it.


 How could I sort a column like this?, I know i'm missing something, but i
 just don't see it. What is the process to sort a column?


Use makeTargetEntry in makefuncs.c
TargetEntry *
makeTargetEntry(Expr *expr,
AttrNumber resno,
char *resname,
bool resjunk)

by the 4th argument you can set resjunk = false if you don't want it
to be in the result.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Heikki Linnakangas


Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

Tom Lane wrote:

Maybe use a special string Translate Me First that
doesn't actually need to be end-user-visible, just so no one sweats over
getting it right in context.


Yep, something like that. There seems to be a magic empty string 
translation at the beginning of every po file that returns the 
meta-information about the translation, like translation author and 
date. Assuming that works reliably, I'll use that.


At first that sounded like an ideal answer, but I can see a gotcha:
suppose the translation's author's name contains some characters that
don't convert to the database encoding.  I suppose that would result in
failure, when we'd prefer it not to.  A single-purpose string could be
documented as whatever you translate this to should be pure ASCII,
never mind if it's sensible.


I just tried that, and it seems that gettext() does transliteration, so 
any characters that have no counterpart in the database encoding will be 
replaced with something similar, or question marks. Assuming that's 
universal across platforms, and I think it is, using the empty string 
should work.


It also means that you can use lc_messages='ja' with 
server_encoding='latin1', but it will be unreadable because all the 
non-ascii characters are replaced with question marks. For something 
like lc_messages='es_ES' and server_encoding='koi8-r', it will still 
look quite nice.


Attached is a patch I've been testing. Seems to work quite well. It 
would be nice if someone could test it on Windows, which seems to be a 
bit special in this regard.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 118a6fe..390a7cf 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -290,6 +290,7 @@ locale_messages_assign(const char *value, bool doit, GucSource source)
 		if (!pg_perm_setlocale(LC_MESSAGES, value))
 			if (source != PGC_S_DEFAULT)
 return NULL;
+		pg_init_gettext_codeset();
 	}
 #ifndef WIN32
 	else
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 03d86ca..47ebe1b 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -1242,7 +1242,7 @@ pg_bindtextdomain(const char *domain)
 
 		get_locale_path(my_exec_path, locale_path);
 		bindtextdomain(domain, locale_path);
-		pg_bind_textdomain_codeset(domain, GetDatabaseEncoding());
+		pg_register_textdomain(domain);
 	}
 #endif
 }
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index bf66321..970cb83 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -842,46 +842,6 @@ cliplen(const char *str, int len, int limit)
 	return l;
 }
 
-#if defined(ENABLE_NLS)  defined(WIN32)
-static const struct codeset_map {
-	int	encoding;
-	const char *codeset;
-} codeset_map_array[] = {
-	{PG_UTF8, UTF-8},
-	{PG_LATIN1, LATIN1},
-	{PG_LATIN2, LATIN2},
-	{PG_LATIN3, LATIN3},
-	{PG_LATIN4, LATIN4},
-	{PG_ISO_8859_5, ISO-8859-5},
-	{PG_ISO_8859_6, ISO_8859-6},
-	{PG_ISO_8859_7, ISO-8859-7},
-	{PG_ISO_8859_8, ISO-8859-8},
-	{PG_LATIN5, LATIN5},
-	{PG_LATIN6, LATIN6},
-	{PG_LATIN7, LATIN7},
-	{PG_LATIN8, LATIN8},
-	{PG_LATIN9, LATIN-9},
-	{PG_LATIN10, LATIN10},
-	{PG_KOI8R, KOI8-R},
-	{PG_WIN1250, CP1250},
-	{PG_WIN1251, CP1251},
-	{PG_WIN1252, CP1252},
-	{PG_WIN1253, CP1253},
-	{PG_WIN1254, CP1254},
-	{PG_WIN1255, CP1255},
-	{PG_WIN1256, CP1256},
-	{PG_WIN1257, CP1257},
-	{PG_WIN1258, CP1258},
-	{PG_WIN866, CP866},
-	{PG_WIN874, CP874},
-	{PG_EUC_CN, EUC-CN},
-	{PG_EUC_JP, EUC-JP},
-	{PG_EUC_KR, EUC-KR},
-	{PG_EUC_TW, EUC-TW},
-	{PG_EUC_JIS_2004, EUC-JP}
-};
-#endif /* WIN32 */
-
 void
 SetDatabaseEncoding(int encoding)
 {
@@ -892,28 +852,132 @@ SetDatabaseEncoding(int encoding)
 	Assert(DatabaseEncoding-encoding == encoding);
 
 #ifdef ENABLE_NLS
-	pg_bind_textdomain_codeset(textdomain(NULL), encoding);
+	pg_init_gettext_codeset();
+	pg_register_textdomain(textdomain(NULL));
 #endif
 }
 
+static char **registered_textdomains = NULL;
+static const char *system_codeset = invalid;
+
 /*
- * On Windows, we need to explicitly bind gettext to the correct
- * encoding, because gettext() tends to get confused.
+ * Register a gettext textdomain with the backend. We will call
+ * bind_textdomain_codeset() for it to ensure that translated strings
+ * are returned in the right encoding.
  */
 void
-pg_bind_textdomain_codeset(const char *domainname, int encoding)
+pg_register_textdomain(const char *domainname)
 {
-#if defined(ENABLE_NLS)  defined(WIN32)
+#if defined(ENABLE_NLS)
 	int i;
+	MemoryContext old_cxt;
+
+	old_cxt = MemoryContextSwitchTo(TopMemoryContext);
+	if (registered_textdomains == NULL)
+	{
+		registered_textdomains = palloc(sizeof(char *) * 1);
+		registered_textdomains[0] = NULL;
+	}
 
-	for (i = 0; i  lengthof(codeset_map_array);

Re: [HACKERS] WIP: transformation hook modules and JSON support

2009-04-01 Thread Alvaro Herrera

Pavel Stehule escribió:
 Hello
 
 I am sending samples of transformation hook modules. One module is
 JSON support:.
 
 From these modules only JSON support has general usage - so only JSON
 should be integrated to core.

I'm only seeing trivial examples below, where you form the JSON objects
by plastering literals together.  Does this work on a scenario where the
values come from a table?

The question is not at all theoretical -- right now our
archives.postgresql.org site uses a JSON file that's just a dump of a
table in a database.  This file contains a list of lists, and a number
of properties for each (name, group it belongs to, description).
Obviously each one needs its label too.

Right now we generate this with a JSON Perl module.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Alvaro Herrera

Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:

  One problem with this idea is that it may be hard to coerce gettext into
  putting a particular string at the top of the file :-(
 
 I doubt we can, which is why the documentation needs to tell translators
 about it.

I doubt that documenting the issue will be enough (in fact I'm pretty
sure it won't).  Maybe we can just supply the string translated in our
POT files, and add a comment that the translator is not supposed to
touch it.  This doesn't seem all that difficult -- I think it just
requires that we add a msgmerge step to make update-po that uses a
file on which the message has already been translated.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: transformation hook modules and JSON support

2009-04-01 Thread Pavel Stehule

2009/4/1 Alvaro Herrera alvhe...@commandprompt.com:
 Pavel Stehule escribió:
 Hello

 I am sending samples of transformation hook modules. One module is
 JSON support:.

 From these modules only JSON support has general usage - so only JSON
 should be integrated to core.

 I'm only seeing trivial examples below, where you form the JSON objects
 by plastering literals together.  Does this work on a scenario where the
 values come from a table?

what do you thing?

My implementation works similar like SQL/XML functions - so of course,
you can read data from tables. But actually, these functions are not
100% optimised.


 The question is not at all theoretical -- right now our
 archives.postgresql.org site uses a JSON file that's just a dump of a
 table in a database.  This file contains a list of lists, and a number
 of properties for each (name, group it belongs to, description).
 Obviously each one needs its label too.

nested JSON structures are possible.

Pavel Stehule


 Right now we generate this with a JSON Perl module.

 --
 Alvaro Herrera                                http://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Sam Mason

 [ oops, forgot to send this to -hackers before ]

On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
 Both interpretations are clearly consistent but it depends on whether
 you think it's a bunch of text strings concatenated together or if
 it's a list of objects.
 
 The example of string_to_array('',',')::int[] is relevant to this
 point. The whole there's one empty element only makes sense if
 you're thinking in terms of string processing. If it's a list of any
 other kind of object it probably doesn't make sense; you can't say
 there's one empty integer or one empty composite object or one empty
 anything else.

I think this is about the only sensible option, but my reasoning is
somewhat different.

My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other.  Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective.  This means
that the composition of the two functions won't result in an injective
function and my original premise is false.  Note that as far as I can
tell string_to_array is injective.  I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.

Injective means there exists more than one array that encodes to the
same string.  The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{}' is a bad example because it's just
an artifact of how strings are represented.  The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose.  All of this implies that we have to make a compromise
somewhere.

The semantics that most closely match the existing behaviour would be;
for array_to_string:

  1) remove NULL values from input array
  2) call output_function on remaining elements
  3) intersperse[1] the delimiter between the remaining elements
  4) concatenate the resulting array

for string_to_array:

  1) check if input is zero length; return empty array
  2) split array based on delimiter and return

Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL.  Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.

Here are some examples:

  array_to_string('{}'::TEXT[],',')  = ''
  array_to_string('{NULL}'::TEXT[],',')  = ''
  array_to_string('{NULL,NULL}'::TEXT[],',') = ''
  array_to_string('{a,NULL}'::TEXT[],',')= 'a'
  array_to_string('{NULL,a}'::TEXT[],',')= 'a'
  array_to_string('{a,b}'::TEXT[],',')   = 'a,b'
  array_to_string('{a,NULL,b}'::TEXT[],',')  = 'a,b'

  string_to_array('',',') = '{}'
  string_to_array(' ',',')= '{ }'
  string_to_array(',',',')= '{,}'
  string_to_array('a',',')= '{a}'
  string_to_array('a,',',')   = '{a,}'
  string_to_array(',a',',')   = '{,a}'
  string_to_array('a,b',',')  = '{a,b}'

My thinking before was that it should be doing:

  string_to_array('',',') = '{}'

instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.

Hum, that all got a bit more verbose than I was expecting.  Ah well, I
hope it's somewhat useful.

-- 
  Sam  http://samason.me.uk/
 
 [1] as in the intersperse function in Haskell
   http://www.haskell.org/onlinereport/list.html#sect17.3
   intersperse # [a, bar] == [a, #, bar]
 note that here we're working with arrays of string, rather than
 arrays of characters as in the report.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Patch to speed up pg_dump

2009-04-01 Thread Vincze, Tamas


Hi,

We have a database with tens of millions of large objects, none
of them with any comments. When running pg_dump it spends several
hours looking for BLOB comments, finding none at the end but taxing
the server so much that the simplest query takes seconds to complete.

The attached patch fixes this by fetching the description only
of those BLOBs that may have it.

For those interested, some more info:

This query takes about 2 hours to execute:
sw2=# select count(*) from pg_largeobject;
   count
---
 135807552
(1 row)

I'm throttling the transfer rate on pg_dump's stdout so that it
doesn't affect server performance a lot, but obviously it didn't
help the function saving (looking for) BLOB comments.

Regards,
Tamas
--- postgresql-8.3.7/src/bin/pg_dump/pg_dump.c.orig 2009-03-31 
15:47:28.0 -0400
+++ postgresql-8.3.7/src/bin/pg_dump/pg_dump.c  2009-04-01 10:25:39.0 
-0400
@@ -1759,7 +1759,13 @@
 
/* Cursor to get all BLOB comments */
if (AH-remoteVersion = 70200)
-   blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid, 'pg_largeobject') FROM (SELECT DISTINCT loid FROM 
pg_largeobject) ss;
+   /* Get comments for BLOBs that have a matching pg_description 
row. When there are many
+* (millions) of BLOBs without comments this avoids fetching 
and then ignoring them,
+* potentionally saving hours of backup time.
+* Note that it may still select BLOBs that have no comment if 
a pg_description row's objoid
+* matches a BLOB's loid, but references an object contained in 
a different system catalog,
+* thus the PQgetisnull() check below is still needed to ignore 
them. */
+   blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid, 'pg_largeobject') FROM (SELECT DISTINCT loid FROM 
pg_description JOIN pg_largeobject ON (pg_description.objoid = 
pg_largeobject.loid)) ss;
else if (AH-remoteVersion = 70100)
blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) ss;
else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Robert Haas

On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark st...@enterprisedb.com wrote:
 On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason s...@samason.me.uk wrote:

  string_to_array('',',')::INT[]  = invalid input syntax for integer: 

 Oof. That's a good point.

+1.  I find this argument much more compelling than anything else
that's been offered up so far.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch to speed up pg_dump

2009-04-01 Thread Tom Lane

Vincze, Tamas vin...@neb.com writes:
 +  * Note that it may still select BLOBs that have no comment if 
 a pg_description row's objoid
 +  * matches a BLOB's loid, but references an object contained in 
 a different system catalog,

... seems like that would be easy to fix ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark st...@enterprisedb.com wrote:
 On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason s...@samason.me.uk wrote:
 string_to_array('',',')::INT[]  = invalid input syntax for integer: 
 
 Oof. That's a good point.

 +1.  I find this argument much more compelling than anything else
 that's been offered up so far.

Yeah.  It seems to me that if you consider only the case where the array
elements are text, there's a weak preference for considering '' to be a
single empty string; but as soon as you think about any other datatype,
there's a strong preference to consider it a zero-element list.  So I
too have come around to favor the latter interpretation.  Do we have
any remaining holdouts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 8.4 open items list

2009-04-01 Thread Robert Haas

On Sat, Mar 28, 2009 at 12:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 27, 2009 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Both of those things are related to 8.4 feature changes, so we should
 either do them now or decide we won't do them.

 Well, Should we have a LOCALE option in CREATE DATABASE? has to do
 with making:

 CREATE DATABASE foo WITH LOCALE = bar
 equivalent to...
 CREATE DATABASE foo WITH COLLATE = bar, CTYPE = bar

 That might be nice to have, but since it's just syntactic sugar, I
 disagree that it's now or never.

 The reason I wanted it considered now is that part of the discussion
 was about whether to rename the existing options (add or remove LC_,
 I forget which).  Once 8.4 is out it'll be too late to reconsider that.

The current situation is not horribly consistent because createdb uses
--lc-foo and the SQL syntax uses FOO.  I'm not sure which is better,
or whether it's worth making them consistent.  As Dave Page pointed
out, other people have already started designing tools based on CVS
HEAD.  At any rate, I don't think we can make LC-FOO a keyword - it
would have to be LC_FOO or something.

 The second item, Should we reject toast.fillfactor in reloptions?,
 comes with a patch.  I think I agree with ITAGAKI Takahiro that it
 would be better to have reloptions specify a set of RELOPT_KINDs to
 which they pertain rather than a single one.

 +1.  And this is something it'd be better to get right now than later,
 since it's about an API that's meant to be used by add-on modules.

Hearing no objections, I guess we need a committer to pick up the
patch and consider applying.  Maybe Alvaro since he did the previous
reloptions work, but I don't know if he has time.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread David E. Wheeler


On Apr 1, 2009, at 9:02 AM, Tom Lane wrote:


+1.  I find this argument much more compelling than anything else
that's been offered up so far.


Yeah.  It seems to me that if you consider only the case where the  
array
elements are text, there's a weak preference for considering '' to  
be a
single empty string; but as soon as you think about any other  
datatype,

there's a strong preference to consider it a zero-element list.  So I
too have come around to favor the latter interpretation.  Do we have
any remaining holdouts?


Well, I'd just point out that the return value of string_to_array() is  
text[]. Thus, this is not a problem with string_to_array(), but a  
casting problem from text[] to int[]. Making string_to_array() return  
a NULL for this case to make casting simpler is addressing the problem  
in the wrong place, IMHO. If I want to do this in Perl, for example,  
I'd do something like this:


my @ints = grep { defined $_  $_ ne '' } split ',', $string;

So I split the string into an array, and then remove unreasonable  
values. This also allows me to set defaults:


my @ints = map { $_ || 0 } split ',', $string;

This ensures that I get the proper number of records in the example of  
something like '1,2,,4'.


So I still think that string_to_array('', ',') should return '{}',  
and how casting is handled should be left to the user to flexibly  
handle.


That said, I'm not seeing a simple function for modifying an array.  
I'd have to write one for each specific case. :-(


Best,

David




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread justin





Tom Lane wrote:

  Robert Haas robertmh...@gmail.com writes:
  
  
On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark st...@enterprisedb.com wrote:


  On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason s...@samason.me.uk wrote:
  
  
string_to_array('',',')::INT[] = invalid input syntax for integer: ""

  
  Oof. That's a good point.
  

  
  
  
  
+1.  I find this argument much more compelling than anything else
that's been offered up so far.

  
  
Yeah.  It seems to me that if you consider only the case where the array
elements are text, there's a weak preference for considering '' to be a
single empty string; but as soon as you think about any other datatype,
there's a strong preference to consider it a zero-element list.  So I
too have come around to favor the latter interpretation.  Do we have
any remaining holdouts?

			regards, tom lane
  


I'm still a hold out, We are taking a string putting it into a array
based on a delimiter. That is very simple and straight forward. Yet
many argue if we want to cast this into another data type the function
should deal with in limited cases. 

string_to_array('',',')::INT[] works as proposed 

But 
string_to_array(',,,', ',' )::INT[] Fails 
or
string_to_array('1,2,,4', ',' )::INT[] Fails .


I'm trying to understand the difference between a empty string to a
string with many blank entries between the delimiter. 
Consider
',,' = '' once the delimiter is removed . Yet Seven zero length
entries were passed. How is that going to
be handled 

In one case it works and yet other cases it fails. This is
inconsistent behavior. Unless all zero length strings are removed or
are treated as NULLs I can't see how casting to another type is going
to work.

If zero length strings are treated as NULLs this creates idea that zero
length strings are = to NULLs. 

The input is a string and the output is text[], casting to another
data type is error prone and should be handled by the programmer.

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Tom Lane

David E. Wheeler da...@kineticode.com writes:
 Well, I'd just point out that the return value of string_to_array() is  
 text[].

True...

 Thus, this is not a problem with string_to_array(), but a  
 casting problem from text[] to int[].

Nonsense.  The question is whether string_to_array is meant to be useful
for lists of anything except text.  I agree you could argue that it
isn't.  But even in the domain of text it's not all that cut-and-dried
whether string_to_array should return array[] or array[''] for empty
input.  So ISTM we're giving up less than we gain by choosing the former.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread David E. Wheeler


On Apr 1, 2009, at 10:09 AM, Tom Lane wrote:


Thus, this is not a problem with string_to_array(), but a
casting problem from text[] to int[].


Nonsense.  The question is whether string_to_array is meant to be  
useful

for lists of anything except text.  I agree you could argue that it
isn't.  But even in the domain of text it's not all that cut-and-dried
whether string_to_array should return array[] or array[''] for empty
input.  So ISTM we're giving up less than we gain by choosing the  
former.


Yeah. I'm okay with either, as long as it's consistent. I have a mild  
preference for '{}', but I can live with ARRAY[] instead. As long as  
it's not NULL that gets returned.


Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue


Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

Tom Lane wrote:

Maybe use a special string Translate Me First that
doesn't actually need to be end-user-visible, just so no one sweats 
over

getting it right in context.


Yep, something like that. There seems to be a magic empty string 
translation at the beginning of every po file that returns the 
meta-information about the translation, like translation author and 
date. Assuming that works reliably, I'll use that.


At first that sounded like an ideal answer, but I can see a gotcha:
suppose the translation's author's name contains some characters that
don't convert to the database encoding.  I suppose that would result in
failure, when we'd prefer it not to.  A single-purpose string could be
documented as whatever you translate this to should be pure ASCII,
never mind if it's sensible.


I just tried that, and it seems that gettext() does transliteration, so 
any characters that have no counterpart in the database encoding will be 
replaced with something similar, or question marks.

 Assuming that's
universal across platforms, and I think it is, using the empty string 
should work.


It also means that you can use lc_messages='ja' with 
server_encoding='latin1', but it will be unreadable because all the 
non-ascii characters are replaced with question marks.


It doesn't occur in the current Windows environment. As for Windows
gnu gettext which we are using, we would see the original msgid when
iconv can't convert the msgstr to the target codeset.

set client_encoding to utf_8;
SET
show server_encoding;
 server_encoding
-
 LATIN1
(1 row)

show lc_messages;
lc_messages

 Japanese_Japan.932
 (1 row)

1;
ERROR:  syntax error at or near 1
LINE 1: 1;


OTOH when the sever encoding is utf8 then

set client_encoding to utf_8;
SET
show server_encoding;
 server_encoding
-
 UTF8
(1 row)

show lc_messages;
lc_messages

 Japanese_Japan.932
(1 row)

1;
ERROR:  1またはその近辺で構文エラー
LINE 1: 1;  ^

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread David E. Wheeler


On Apr 1, 2009, at 10:05 AM, justin wrote:


string_to_array('',',')::INT[]  works as proposed

But
string_to_array(',,,', ',' )::INT[]   Fails
or
string_to_array('1,2,,4', ',' )::INT[] Fails .


I'm trying to understand the difference between a empty string to a  
string with  many blank entries between  the delimiter.
Consider   ',,'  = ''  once the delimiter is removed .  Yet  
Seven zero length entries were passed.  How is that going to be  
handled


Right, it's making a special case of '', which does seem rather  
inconsistent to me.


Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Tom Lane

Hiroshi Inoue in...@tpf.co.jp writes:
 Heikki Linnakangas wrote:
 I just tried that, and it seems that gettext() does transliteration, so 
 any characters that have no counterpart in the database encoding will be 
 replaced with something similar, or question marks.

 It doesn't occur in the current Windows environment. As for Windows
 gnu gettext which we are using, we would see the original msgid when
 iconv can't convert the msgstr to the target codeset.

Well, if iconv has no conversion to the codeset at all then there is no
point in selecting that particular codeset setting anyway.  The question
was about whether we can distinguish no conversion available from
conversion available, but the test string has some unconvertible
characters.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SSL over Unix-domain sockets

2009-04-01 Thread Martijn van Oosterhout

On Tue, Mar 31, 2009 at 11:33:26PM +0300, Peter Eisentraut wrote:
 On Saturday 28 March 2009 00:42:28 Bruce Momjian wrote:
  I assume directory permissions controlling access to the socket file
  would be enough.  You are going to have to set up SSL certificates
  anyway for this so isn't that just as hard as telling the client where
  the socket file is located?
 
 The permissions on the socket file or the containing directory doesn't tell 
 much by itself, because you also need to consider who owns it.  What that 
 basically comes down to is that the client would need to specify something 
 like, I only want a connection to a server owned by 'postgres'.  But the 
 client currently has no way of saying that, so we'd need to invent something 
 new.

If you're going to get complicated, go the whole way do SO_PEERCRED on
the socket, then you get the UID of the server...

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature

Re: [HACKERS] Patch to speed up pg_dump

2009-04-01 Thread Vincze, Tamas


Tom Lane wrote:

Vincze, Tamas vin...@neb.com writes:

+* Note that it may still select BLOBs that have no comment if 
a pg_description row's objoid
+* matches a BLOB's loid, but references an object contained in 
a different system catalog,


... seems like that would be easy to fix ...


Yes, it wasn't that hard. The revised patch is attached.

Originally I didn't want to add more dependencies on the system
catalogs. Also, I've left the DECLARE statements untouched for
pre-v7.2 backends, so the NULL check on the comment is still
needed for those cases and if the description itself is NULL.

Regards,
Tamas
--- postgresql-8.3.7/src/bin/pg_dump/pg_dump.c.orig 2009-03-31 
15:47:28.0 -0400
+++ postgresql-8.3.7/src/bin/pg_dump/pg_dump.c  2009-04-01 14:07:55.0 
-0400
@@ -1759,7 +1759,18 @@
 
/* Cursor to get all BLOB comments */
if (AH-remoteVersion = 70200)
-   blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid, 'pg_largeobject') FROM (SELECT DISTINCT loid FROM 
pg_largeobject) ss;
+   /* Get comments for BLOBs that have a matching pg_description 
row. When there are many
+* (millions) of BLOBs without comments this avoids fetching 
and then ignoring them,
+* potentionally saving hours of backup time. */
+   blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid, 'pg_largeobject') FROM (
+   SELECT DISTINCT loid FROM pg_description JOIN 
pg_largeobject 
+   ON (pg_description.objoid = 
pg_largeobject.loid) 
+   WHERE classoid = (
+   SELECT oid FROM pg_class WHERE relname 
= 'pg_largeobject' AND relnamespace = (
+   SELECT oid FROM pg_namespace 
WHERE nspname='pg_catalog'
+   )
+   ) AND objsubid = 0
+ ) ss;
else if (AH-remoteVersion = 70100)
blobQry = DECLARE blobcmt CURSOR FOR SELECT loid, 
obj_description(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) ss;
else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Sam Mason

On Wed, Apr 01, 2009 at 10:23:18AM -0700, David E. Wheeler wrote:
 On Apr 1, 2009, at 10:05 AM, justin wrote:
 string_to_array('',',')::INT[]  works as proposed
 
 But
 string_to_array(',,,', ',' )::INT[]   Fails
 or
 string_to_array('1,2,,4', ',' )::INT[] Fails .
 
 
 I'm trying to understand the difference between a empty string to a  
 string with  many blank entries between  the delimiter.
 Consider   ',,'  = ''  once the delimiter is removed .  Yet  
 Seven zero length entries were passed.  How is that going to be  
 handled
 
 Right, it's making a special case of '', which does seem rather  
 inconsistent to me.

Yes it is; but it's a useful special case because it allows:

  string_to_array(array_to_string(col,','),',')

to do the right thing whether it's got zero or more elements in.  With
the current implementation you get a NULL back in the case of zero
elements and the expected array back the rest of the time.

To me, it doesn't really matter whether:

  string_to_array(',', ',' )::INT[]

fails or not; because array_to_string will never generate a string that
looks like this.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Greg Stark

On Wed, Apr 1, 2009 at 6:23 PM, David E. Wheeler da...@kineticode.com wrote:
 Right, it's making a special case of '', which does seem rather inconsistent
 to me.


David E. Wheeler da...@kineticode.com writes:

 On Apr 1, 2009, at 10:05 AM, justin wrote:

 string_to_array('',',')::INT[]  works as proposed

 But
 string_to_array(',,,', ',' )::INT[]   Fails
 or
 string_to_array('1,2,,4', ',' )::INT[] Fails .


 I'm trying to understand the difference between a empty string to a  string
 with  many blank entries between  the delimiter.

Well, uh, in one case it's empty and in the other case it's not?

 Consider   ',,'  = ''  once the delimiter is removed .  Yet  Seven zero
 length entries were passed.  How is that going to be  handled

Well it's pretty clear empty delimiters cannot be handled consistently. Some
languages handle them as a special case (splitting every character into a
separate string, for example -- which I'll point out will result in an empty
array as a result for an empty string input) or make it an error.

 Right, it's making a special case of '', which does seem rather  inconsistent
 to me.

It's not a special case -- or it's a special case whichever we choose,
depending on which way you look at it.

What we're talking about here is replacing the blank values in the following
tables. We can get either the first one right in both cases with {} as the
result, or we can get the second one right in the second table with {}.
Either way there is an inconsistency in at least one case.

The existing behaviour of returning NULL is the only consistent choice since
the correct value is unknown. And one could argue that it's easier to
replace NULL with the correct value if the programmer knows using coalesce
than it is to replace either  or {}. But I'm still leaning to thinking
that using an arbitrary choice that at least gets most users intentions is
better.

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['foo']),(array['foo','bar']),(array['foo','bar','baz']))
as input(input);
 input |output
---+---
 {}|
 {foo} | {foo}
 {foo,bar} | {foo,bar}
 {foo,bar,baz} | {foo,bar,baz}
(4 rows)

postgres=# select input,
string_to_array(array_to_string(input,','),',') as output from (values
(array[]::text[]),(array['']),(array['','']),(array['','',''])) as
input(input);
   input|   output
+
 {} |
 {}   |
 {,}| {,}
 {,,} | {,,}
(4 rows)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch to speed up pg_dump

2009-04-01 Thread Tom Lane

Vincze, Tamas vin...@neb.com writes:
 Tom Lane wrote:
 Vincze, Tamas vin...@neb.com writes:
 +  * Note that it may still select BLOBs that have no comment if 
 a pg_description row's objoid
 +  * matches a BLOB's loid, but references an object contained in 
 a different system catalog,
 
 ... seems like that would be easy to fix ...

 Yes, it wasn't that hard. The revised patch is attached.

Applied to HEAD and 8.3, using regclass cast to simplify ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue


Tom Lane wrote:

Hiroshi Inoue in...@tpf.co.jp writes:

Heikki Linnakangas wrote:
I just tried that, and it seems that gettext() does transliteration, so 
any characters that have no counterpart in the database encoding will be 
replaced with something similar, or question marks.



It doesn't occur in the current Windows environment. As for Windows
gnu gettext which we are using, we would see the original msgid when
iconv can't convert the msgstr to the target codeset.


Well, if iconv has no conversion to the codeset at all then there is no
point in selecting that particular codeset setting anyway.  The question
was about whether we can distinguish no conversion available from
conversion available, but the test string has some unconvertible
characters.


What I meant is we would see no '?' when we use Windows gnu gettext.
Whether conversion available or not depends on individual msgids.
For example, when the Japanese msgstr corresponding to a msgid has
no characters other than ASCII accidentally, Windows gnu gettext will
use the msgstr not the original msgid.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Sam Mason

On Wed, Apr 01, 2009 at 07:40:16PM +0100, Greg Stark wrote:
 The existing behaviour of returning NULL is the only consistent
 choice since the correct value is unknown. And one could argue that
 it's easier to replace NULL with the correct value if the programmer
 knows using coalesce than it is to replace either  or {}.

Couldn't a similar argument be applied for division by zero?  Since it's
not known whether the user wants to get a divide by zero exception or
infinity PG should return NULL and punt the choice to the user.

I think everybody would agree that this would be a bad thing to do!

 But I'm
 still leaning to thinking that using an arbitrary choice that at least
 gets most users intentions is better.

I'd agree; returning NULL and not forcing the user to make a choice
is a bad design decision---the user doesn't need to put a coalesce
in and hence their code will probably break in strange ways when
they're not expecting it.  Nobody suggest adding a third parameter to
string_to_array, please!

The general mantra that seems to apply here is one good option is
better than two bad ones.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Robert Haas

On Wed, Apr 1, 2009 at 12:52 PM, David E. Wheeler da...@kineticode.com wrote:
 Well, I'd just point out that the return value of string_to_array() is
 text[]. Thus, this is not a problem with string_to_array(), but a casting
 problem from text[] to int[]. Making string_to_array() return a NULL for
 this case to make casting simpler is addressing the problem in the wrong
 place, IMHO. If I want to do this in Perl, for example, I'd do something
 like this:

 my @ints = grep { defined $_  $_ ne '' } split ',', $string;

I've written code that looks a whole lot like this myself, but there's
no easy way to do that in SQL.  SQL, in particular, lacks closures, so
grep {} and map {} don't exist.  I really, really wish they did, but I
believe that our type system is too woefully pathetic to be up to the
job.  So it seems to me that arguing that SQL (which lacks those
primitives) should match Perl (which has them) isn't really getting us
anywhere.

 my @ints = map { $_ || 0 } split ',', $string;

 This ensures that I get the proper number of records in the example of 
 something like '1,2,,4'.

I can't see that there's any way to do this in SQL regardless of how
we define this operation.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Robert Haas

On Wed, Apr 1, 2009 at 1:05 PM, justin jus...@emproshunts.com wrote:
 I'm still a hold out,  We are taking a string putting it into a array based
 on a delimiter.  That is very simple and straight forward.  Yet many argue
 if we want to cast this into another data type the function should deal with
 in limited cases.

 string_to_array('',',')::INT[]  works as proposed

 But
 string_to_array(',,,', ',' )::INT[]   Fails
 or
 string_to_array('1,2,,4', ',' )::INT[] Fails .

But... but... those aren't comma-separated lists of integers.  If they
were, it would work.

string_to_array('cow,dog,horse')::INT[] will also fail.

If you take 0 items of any type whatsoever and join them together with
commas, you will get the empty string.  It is also true that if you
join 1 item together with commas, you will get that item back, and if
that item is the empty string, you will now have the empty string.  I
think it's better to worry more about the first case because it
applies to any type at all, whereas the latter case ONLY applies in
situations where the empty string is a potentially legal value.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Sam Mason

On Wed, Apr 01, 2009 at 03:19:23PM -0400, Robert Haas wrote:
 On Wed, Apr 1, 2009 at 12:52 PM, David E. Wheeler da...@kineticode.com 
 wrote:
  Well, I'd just point out that the return value of string_to_array() is
  text[]. Thus, this is not a problem with string_to_array(), but a casting
  problem from text[] to int[]. Making string_to_array() return a NULL for
  this case to make casting simpler is addressing the problem in the wrong
  place, IMHO. If I want to do this in Perl, for example, I'd do something
  like this:
 
  my @ints = grep { defined $_  $_ ne '' } split ',', $string;
 
 I've written code that looks a whole lot like this myself, but there's
 no easy way to do that in SQL.  SQL, in particular, lacks closures, so
 grep {} and map {} don't exist.  I really, really wish they did, but

I don't grok Perl so I'd appreciate an explanation of what the above
does, at a guess it looks a lot like the function I wrote up thread[1]
called array_filter_blanks and using it would look like:

  SELECT array_filter_blanks(string_to_array(arr,',')) AS ints;

 I
 believe that our type system is too woefully pathetic to be up to the
 job.

This has very little to do with PG's type system.  You either want
functions to be first class objects or support for closures, blaming the
type system is not correct.

-- 
  Sam  http://samason.me.uk/
 
 [1] http://archives.postgresql.org/pgsql-hackers/2009-03/msg01373.php

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread justin

If someone can show me a real world example  this logic simplifies the 
code and has more uses I'll bite



I just presently can't see how this works better.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Martin Gainty


Split strings into array elements using provided delimiter



string_to_array('xx~^~yy~^~zz', '~^~') 

output: {xx,yy,zz}


http://www.postgresql.org/docs/8.3/interactive/functions-array.html

?
Martin 
__ 
Disclaimer and confidentiality note 
This message is confidential and may be privileged. If you are not the intended 
recipient, we kindly ask you to  please inform the sender. Any unauthorised 
dissemination or copying hereof is prohibited. This message serves for 
information purposes only and shall not have any legally binding effect. Given 
that e-mails can easily be subject to manipulation, we can not accept any 
liability for the content provided.






 Date: Wed, 1 Apr 2009 15:49:42 -0400
 From: jus...@emproshunts.com
 To: robertmh...@gmail.com
 CC: t...@sss.pgh.pa.us; st...@enterprisedb.com; s...@samason.me.uk; 
 pgsql-gene...@postgresql.org; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [GENERAL] string_to_array with empty input
 
 If someone can show me a real world example  this logic simplifies the 
 code and has more uses I'll bite
 
 
 I just presently can't see how this works better.
 
 -- 
 Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

_
Rediscover Hotmail®: Get e-mail storage that grows with you. 
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Storage1_042009

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Oleg Bartunov


Grant,

I'm originator of this thread and PostgreSQL is also a big user of the
Snowball project, so we also are very interested in the vital 
activity of the project. However, I see issue with license, which is

currently BSD and this allows us to use snowball in PostgreSQL (which
is also BSD licensed). Is't possible to preserve BSD license under
Apache Software Foundation ?

Another question is not about snowball compiler, but about specific
stemmers, which contributed by people. Are you willing also to
support this activity ?

I'm crossposting also to the -hackers discussion list of the PostgreSQL
project.

Oleg
On Wed, 1 Apr 2009, Grant Ingersoll wrote:

Forwarding a private message I sent to Dr. Porter to the list as a whole as a 
possible way of addressing Dr. Porter's desire to code less and also the 
desire for releases, etc.  Dr. Porter's response was that I should bring it 
up within the community to gauge other's interest, so here it is:

snip
Hi Dr. Porter,

My name is Grant Ingersoll and I am a committer on the Apache Lucene project 
(http://lucene.apache.org), which is a big user of the Snowball code.  In 
reading this message and hearing of your desire not to code anymore, it 
occurred to me that maybe the Snowball project would be interested in 
becoming a part of the Apache Software Foundation, whereby it can get 
attention from a community of developers/committers.  For instance, just the 
other day, I was thinking about a performance improvement that we could use 
in Lucene land that would allow passing in a char [] array instead of a 
String to the stemmer, since all of our Token objects are represented that 
way and it now forces us to construct a new String every time.


If you're interested, let me know and we can figure out where to go from 
here.  The donation process for projects outside of the ASF is pretty easy 
and I would propose that it become a sub-project of the Lucene project, so 
you can know it will be in good hands.  Naturally, you and Richard would also 
be committers on the project, so you could still be involved as much as you 
like.


At any rate, thanks for the great piece of software.

Cheers,
Grant Ingersoll
/snip

So, what do others think about donating the Snowball stemmers to the ASF and 
in particular the Lucene project?  Snowball would likely be a subproject of 
Lucene (like Solr, Mahout, etc.) and people could contribute patches, become 
committers, etc. just as any other ASF project, i.e. based on merit.


-Grant


On Apr 1, 2009, at 4:16 AM, Martin Porter wrote:



Oleg,

Hello, nice to hear from you you again, after so long. Richard and I have
discussed the idea of having a proper release policy from time to time, but
have never managed to put one in place. I think that is acceptable, since
snowball changes less and less as the years go by. (I am 65 now, and want 
to

withdraw from software work in the future.)

As well as the snowball compiler, I have no plans to alter the stemmers
themselves (except perhaps the English stemmer -- I have thought of undoing
or modifying an earlier extension.) The version we're currently releasing 
is

just as stable as the one you used in 2007, and probably little different.





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Justin





Martin Gainty wrote:

  
  Split strings into array elements using provided
delimiter
  
string_to_array('xx~^~yy~^~zz', '~^~') 
output: {xx,yy,zz}
  
http://www.postgresql.org/docs/8.3/interactive/functions-array.html
  

Sorry thats not the question i'm asking. 

We are debating if it makes sense to change the output in certain
cases. 

I'm for not returning nulls or returning zero element array. 

I'm asking how is the other better by giving a real world example??? I
don't see the plus side at the moment.

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 If you take 0 items of any type whatsoever and join them together with
 commas, you will get the empty string.  It is also true that if you
 join 1 item together with commas, you will get that item back, and if
 that item is the empty string, you will now have the empty string.  I
 think it's better to worry more about the first case because it
 applies to any type at all, whereas the latter case ONLY applies in
 situations where the empty string is a potentially legal value.

I'm starting to vacillate again.  It's clear that for the purposes
of string_to_array, an empty input string is fundamentally ambiguous:
it could mean a list of no things, or a list of one empty thing.
So the two cases in which an application can safely make use of this
function are
(1) if lists of no things never happen.
(2) if lists never contain empty things.
Either rule allows us to resolve the ambiguity.  We've been discussing
the fact that (2) is an okay assumption for many non-text data types,
but none-the-less string_to_array is in itself a text function and (2)
is not very good for text.  Making this worse, the format *appears*
to work fine for empty strings, so long as you don't have exactly
one of them.  So it seems like applications might be much more likely
to violate (2) than (1).

Another way to state the point is that we can offer people a choice of
two limitations: string_to_array doesn't work for zero-length lists,
or string_to_array doesn't work for empty strings (except most of the
time, it does).  The former is sounding less likely to bite people
unexpectedly.

Or we could stick to the current behavior and say use COALESCE() to
resolve the ambiguity, if you need to.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Richard Boulton

On Wed, Apr 01, 2009 at 05:10:01PM -0400, Grant Ingersoll wrote:
 No, it would have to be ASL 2, but that is pretty similar to BSD, no?  
 (caveat: IANAL)  i.e non-viral, free to use however you want, just  
 don't take credit for it.  Everything I've read says the two are  
 completely compatible

As I understand it, ASL 2 is incompatible with GPL 2, at least according to
the FSF.  This would be a showstopper problem for me.

-- 
Richard

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Robert Haas

On Wed, Apr 1, 2009 at 3:49 PM, Sam Mason s...@samason.me.uk wrote:
 On Wed, Apr 01, 2009 at 03:19:23PM -0400, Robert Haas wrote:
 On Wed, Apr 1, 2009 at 12:52 PM, David E. Wheeler da...@kineticode.com 
 wrote:
  Well, I'd just point out that the return value of string_to_array() is
  text[]. Thus, this is not a problem with string_to_array(), but a casting
  problem from text[] to int[]. Making string_to_array() return a NULL for
  this case to make casting simpler is addressing the problem in the wrong
  place, IMHO. If I want to do this in Perl, for example, I'd do something
  like this:
 
  my @ints = grep { defined $_  $_ ne '' } split ',', $string;

 I've written code that looks a whole lot like this myself, but there's
 no easy way to do that in SQL.  SQL, in particular, lacks closures, so
 grep {} and map {} don't exist.  I really, really wish they did, but

 I don't grok Perl so I'd appreciate an explanation of what the above
 does, at a guess it looks a lot like the function I wrote up thread[1]
 called array_filter_blanks and using it would look like:

  SELECT array_filter_blanks(string_to_array(arr,',')) AS ints;

map { closure } @list applies closure to each element of list and
makes a new list out of the results.
grep { closure } @list applies closure to each element of list and
returns the list elements for which the closure returns true.

 I
 believe that our type system is too woefully pathetic to be up to the
 job.

 This has very little to do with PG's type system.  You either want
 functions to be first class objects or support for closures, blaming the
 type system is not correct.

I'm speaking primarily of functions as first-class objects, though
closures would be nice too.   But consider an operation like

UPDATE rel SET col1 = MAP ( f OVER col2 )

We need to be able to determine whether this is well-typed, just as we
do now for any other SQL query.  Specifically, we need to check that f
is a one argument function whose argument type is that of col2 and
whose return type is that of col1.  My understanding is that right now
types are represented as 32-bit OIDs.  I think they'd need to be some
sort of more complex structure in order to handle cases like this.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Robert Haas

On Wed, Apr 1, 2009 at 5:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Or we could stick to the current behavior and say use COALESCE() to
 resolve the ambiguity, if you need to.

If there's no consensus on changing the behavior, it's probably better
to be backward compatible than not.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Andrew Dunstan




Richard Boulton wrote:

On Wed, Apr 01, 2009 at 05:10:01PM -0400, Grant Ingersoll wrote:
  
No, it would have to be ASL 2, but that is pretty similar to BSD, no?  
(caveat: IANAL)  i.e non-viral, free to use however you want, just  
don't take credit for it.  Everything I've read says the two are  
completely compatible



As I understand it, ASL 2 is incompatible with GPL 2, at least according to
the FSF.  This would be a showstopper problem for me.

  


Er, what does Postgres have that is covered by GPL2?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Kevin Grittner

 Richard Boulton rich...@lemurconsulting.com wrote: 
 On Wed, Apr 01, 2009 at 05:10:01PM -0400, Grant Ingersoll wrote:
 No, it would have to be ASL 2, but that is pretty similar to BSD,
no?  
 (caveat: IANAL)  i.e non-viral, free to use however you want, just 

 don't take credit for it.  Everything I've read says the two are  
 completely compatible
 
 As I understand it, ASL 2 is incompatible with GPL 2, at least
according to
 the FSF.  This would be a showstopper problem for me.
 
Where does GPL come into it?  (I hadn't seen that mentioned before for
either product.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Speeding up ExecProject

2009-04-01 Thread Tom Lane

I got interested in why trivial window functions looked really slow,
when using this example in the regression database:

regression=# explain analyze select *, row_number() over (order by unique2) 
from tenk1;
  QUERY PLAN
   
---
 WindowAgg  (cost=0.00..781.25 rows=1 width=244) (actual time=0.036..28.923 
rows=1 loops=1)
   -  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..631.25 rows=1 
width=244) (actual time=0.023..5.727 rows=1 loops=1)
 Total runtime: 30.423 ms
(3 rows)

(Note: all examples in this post are the median of three tries, since
the timings are a bit noisy.  That means the data is fully cached.
This is CVS HEAD on Fedora 10 x86_64.)

Profiling soon revealed that the bulk of the runtime was going into
ExecProject() and subsidiary expression-evaluation functions.  In fact,
the problem can be illustrated without any window functions at all:

regression=# explain analyze select * from tenk1;
QUERY PLAN  
   
---
 Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.008..2.250 rows=1 loops=1)
 Total runtime: 3.652 ms
(2 rows)

regression=# explain analyze select *,1 from tenk1;
 QUERY PLAN 


 Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.011..13.232 rows=1 loops=1)
 Total runtime: 14.602 ms
(2 rows)

The above does not show that ExecEvalConst is 5x slower than the rest
of the query put together.  Rather, what is happening is that the moment
the query targetlist contains anything but Vars, we stop using the
optimized ExecVariableList() code and fall back on the generic code in
ExecTargetList().  There are rather a lot of columns in tenk1 (16 to be
exact) and so this is a big cost.

It occurred to me that we could refactor ExecProject and associated
logic so that we use ExecVariableList-like code for all simple Vars in a
targetlist, and only invoke the full expression evaluation machinery for
the non-Var members of a tlist (if any).  The attached patch prototypes
this idea (it's code-complete, but the comments suck...).  With the
patch, adding ,1 costs a lot less:

regression=# explain analyze select * from tenk1;
QUERY PLAN  
   
---
 Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.009..2.345 rows=1 loops=1)
 Total runtime: 3.780 ms
(2 rows)

regression=# explain analyze select *,1 from tenk1;
QUERY PLAN  
   
---
 Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244) (actual 
time=0.010..5.949 rows=1 loops=1)
 Total runtime: 7.396 ms
(2 rows)

and the original example looks better too:

regression=# explain analyze select *, row_number() over (order by unique2) 
from tenk1;
  QUERY PLAN
   
---
 WindowAgg  (cost=0.00..781.25 rows=1 width=244) (actual time=0.035..18.727 
rows=1 loops=1)
   -  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..631.25 rows=1 
width=244) (actual time=0.025..5.146 rows=1 loops=1)
 Total runtime: 20.228 ms
(3 rows)

I may be overreacting to the EXPLAIN ANALYZE timings.  In practice,
if we were actually returning so many columns to the client, I/O
conversions and data transmission would exceed the ExecProject effort
by a significant margin.  Still, there are lots of cases where
targetlists contain a lot of simple Vars, so this looks like it's
probably worth doing.  Comments/objections?

regards, tom lane

Index: src/backend/executor/execQual.c
===
RCS file: /cvsroot/pgsql/src/backend/executor/execQual.c,v
retrieving revision 1.243
diff -c -r1.243 execQual.c
*** src/backend/executor/execQual.c 27 Mar 2009 18:30:21 -  1.243
--- src/backend/executor/execQual.c

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Grant Ingersoll



On Apr 1, 2009, at 4:51 PM, Oleg Bartunov wrote:


Grant,

I'm originator of this thread and PostgreSQL is also a big user of the
Snowball project, so we also are very interested in the vital  
activity of the project. However, I see issue with license, which is

currently BSD and this allows us to use snowball in PostgreSQL (which
is also BSD licensed). Is't possible to preserve BSD license under
Apache Software Foundation ?


No, it would have to be ASL 2, but that is pretty similar to BSD, no?  
(caveat: IANAL)  i.e non-viral, free to use however you want, just  
don't take credit for it.  Everything I've read says the two are  
completely compatible





Another question is not about snowball compiler, but about specific
stemmers, which contributed by people. Are you willing also to
support this activity ?


ASF is all about community, so the community would be responsible for  
supporting the activity of the project.  Obviously, Dr. Porter and  
Richard would need to be involved, but one goal of the ASF is that any  
healthy project should be able to survive the loss of a committer.


The way the ASF works, people who contribute a lot would become  
committers, etc., so yeah, I think the answer is yes.  It is a fairly  
well defined process for doing this.  i.e. if you contribute a lot,  
the other committers would nominate you to become a committer.  You  
can read more about how the ASF works at www.apache.org/


HTH,
Grant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Guillaume Smet

On Wed, Apr 1, 2009 at 11:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Where does GPL come into it?  (I hadn't seen that mentioned before for
 either product.)

Richard is one of the developers of Snowball so he might want to keep
its license compatible with GPL2.

Moreover, Snowball is used by a lot of projects, and probably some of
them are GPL2.

Note that ASL2 is however compatible with GPL3.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Tom Lane

Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Where does GPL come into it?  (I hadn't seen that mentioned before for
 either product.)

Keep in mind that you're seeing some fraction of the discussion on
snowball-discuss (whatever is from people also subscribed to
pgsql-hackers; everything else is stuck in the moderation queue).
I would imagine the GPL complaint is from someone who's using the
snowball code in a GPL project.

However, from a Postgres point of view, I'm not sure that we'd be
totally happy with ASL either.  We'd certainly have to stop bundling
the snowball code in the Postgres distribution, as we do now.  Or more
likely, freeze it at the last BSD-licensed release.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Snowball-discuss] Snowball release cycle ?

2009-04-01 Thread Richard Boulton

On Wed, Apr 01, 2009 at 05:41:53PM -0400, Andrew Dunstan wrote:
 Richard Boulton wrote:
 As I understand it, ASL 2 is incompatible with GPL 2, at least according to
 the FSF.  This would be a showstopper problem for me.

 Er, what does Postgres have that is covered by GPL2?

I think cross posting has confused this thread - I was talking about
snowball not postgres.  I don't use postgres at all; in fact, I'm not sure
why this thread was copied to pgsql-hackers at all - sorry for the noise if
there wasn't a good reason.

What I mean is that I use snowball in a project which is GPL-2, and cannot
be relicensed, and snowball changing to ASL 2 would be a showstopper
problem for me for that reason.

-- 
Richard

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 attached patch allows raising exception from _PG_init function as was
 discussed before.

I fooled around with this and came up with the attached improved
version, which allows reporting the full error status.  However,
after thinking some more I feel that this is probably a cure worse
than the disease.  If we simply leave the code as it stands, an
elog(ERROR) in an init function doesn't corrupt dfmgr.c's internal list,
which is what I had been fearing when I complained about the issue.
The worst that happens is that we leave the library loaded and leak
a little bit of memory.  Unloading the library, as the patch does,
could easily make things worse not better.  Consider the not-unlikely
case that the library installs itself in a few callback hooks and
then fails.  If we unload the library, those hooks represent
*guaranteed* core dumps on next use.  If we don't unload, the hook
functions might or might not work too well --- presumably not everything
they need has been initialized --- but it's hard to imagine an outcome
that's worse than a guaranteed core dump.

So I'm thinking this is really unnecessary and we should leave well
enough alone.

regards, tom lane

Index: dfmgr.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/fmgr/dfmgr.c,v
retrieving revision 1.98
diff -c -r1.98 dfmgr.c
*** dfmgr.c 1 Jan 2009 17:23:51 -   1.98
--- dfmgr.c 1 Apr 2009 23:41:37 -
***
*** 178,184 
  static void *
  internal_load_library(const char *libname)
  {
!   DynamicFileList *file_scanner;
PGModuleMagicFunction magic_func;
char   *load_error;
struct stat stat_buf;
--- 178,184 
  static void *
  internal_load_library(const char *libname)
  {
!   DynamicFileList * volatile file_scanner;
PGModuleMagicFunction magic_func;
char   *load_error;
struct stat stat_buf;
***
*** 277,287 
}
  
/*
!* If the library has a _PG_init() function, call it.
 */
PG_init = (PG_init_t) pg_dlsym(file_scanner-handle, 
_PG_init);
if (PG_init)
!   (*PG_init) ();
  
/* OK to link it into list */
if (file_list == NULL)
--- 277,329 
}
  
/*
!* If the library has a _PG_init() function, call it.  Guard 
against
!* the function possibly throwing elog(ERROR).
 */
PG_init = (PG_init_t) pg_dlsym(file_scanner-handle, 
_PG_init);
if (PG_init)
!   {
!   MemoryContext   oldcontext = CurrentMemoryContext;
! 
!   PG_TRY();
!   {
!   (*PG_init) ();
!   }
!   PG_CATCH();
!   {
!   ErrorData  *edata;
! 
!   /* fetch the error status so we can change it */
!   MemoryContextSwitchTo(oldcontext);
!   edata = CopyErrorData();
!   FlushErrorState();
! 
!   /*
!* The const pointers in the error status very 
likely point
!* at constant strings in the library, which we 
are about to
!* unload.  Copy them so we don't dump core 
while reporting
!* the error.  This might leak a bit of memory 
but it
!* beats the alternatives.
!*/
!   if (edata-filename)
!   edata-filename = 
pstrdup(edata-filename);
!   if (edata-funcname)
!   edata-funcname = 
pstrdup(edata-funcname);
!   if (edata-domain)
!   edata-domain = pstrdup(edata-domain);
! 
!   /* library might have already called some of 
its functions */
!   
clear_external_function_hash(file_scanner-handle);
! 
!   /* try to unlink library */
!   pg_dlclose(file_scanner-handle);
!   free((char *) file_scanner);
! 
!   /* complain */
!   ReThrowError(edata);
!   }
!   PG_END_TRY();
!   }
  
/* OK to link it into list */
if (file_list == NULL)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] [GENERAL] string_to_array with empty input

2009-04-01 Thread Steve Crawford


Tom Lane wrote:


I'm starting to vacillate again.  It's clear that for the purposes
of string_to_array, an empty input string is fundamentally ambiguous:
it could mean a list of no things, or a list of one empty thing.
  
Agreed. Of the two, a list of one empty thing makes string_to_array 
closer to an inverse of array_to_string.



Or we could stick to the current behavior and say use COALESCE() to
resolve the ambiguity, if you need to.

  
Currently string_to_array(null, ',') yields a null result - 
indistinguishable from string_to_array('',','). Wrapping in coalesce 
does not help distinguish true null input from empty-string input. I'm 
not sure at the moment what other cases exist where non-null input 
generates null output.


If the decision is to leave the behavior unchanged, it at least cries 
out for a documentation patch.


Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tuplestore API problem

2009-04-01 Thread Tom Lane

Hitoshi Harada umi.tan...@gmail.com writes:
 I don't think advising or documenting such restriction to the future
 programmer is a good idea from the point of encapsulation. I've come
 up with an idea, that the read pointers remember their last slot as
 you suggest and materialize it when another slot comes in
 tuplestore_gettupleslot() and forget the former one. By this, you
 don't need the restriction above, adding minimum penalty that is paid
 if and only if you pass more than one tupleslot to tuplestore, which
 doesn't seem to be occurred currently.

I think that the problem I found a couple days ago
http://archives.postgresql.org/pgsql-hackers/2009-03/msg01247.php
probably blows a hole in all these schemes.  After-the-fact
materialization of a TupleTableSlot won't protect any pass-by-reference
Datums that have already been fetched from that slot.  Perhaps we
could invent a coding rule that would prevent the situation, but
I think it would be awfully easy to mess up in any case where you
actually had a need to keep track of more than one current tuple.

I now think that the CVS-HEAD arrangement is about as good as we
should expect to get for 8.4.  The experiments I've been doing
suggest that the extra tuple copying isn't a major bottleneck
anyway...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Pavel Stehule

2009/4/2 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 attached patch allows raising exception from _PG_init function as was
 discussed before.

 I fooled around with this and came up with the attached improved
 version, which allows reporting the full error status.  However,
 after thinking some more I feel that this is probably a cure worse
 than the disease.  If we simply leave the code as it stands, an
 elog(ERROR) in an init function doesn't corrupt dfmgr.c's internal list,
 which is what I had been fearing when I complained about the issue.
 The worst that happens is that we leave the library loaded and leak
 a little bit of memory.  Unloading the library, as the patch does,
 could easily make things worse not better.  Consider the not-unlikely
 case that the library installs itself in a few callback hooks and
 then fails.  If we unload the library, those hooks represent
 *guaranteed* core dumps on next use.  If we don't unload, the hook
 functions might or might not work too well --- presumably not everything
 they need has been initialized --- but it's hard to imagine an outcome
 that's worse than a guaranteed core dump.

 So I'm thinking this is really unnecessary and we should leave well
 enough alone.


I see it. I thing , an safety of this exception should be solved only
by programmer. It's important to release all hooks, and then raise an
exception. It is in developer responsibility.

regards
Pavel Stehule

                        regards, tom lane


 Index: dfmgr.c
 ===
 RCS file: /cvsroot/pgsql/src/backend/utils/fmgr/dfmgr.c,v
 retrieving revision 1.98
 diff -c -r1.98 dfmgr.c
 *** dfmgr.c     1 Jan 2009 17:23:51 -       1.98
 --- dfmgr.c     1 Apr 2009 23:41:37 -
 ***
 *** 178,184 
  static void *
  internal_load_library(const char *libname)
  {
 !       DynamicFileList *file_scanner;
        PGModuleMagicFunction magic_func;
        char       *load_error;
        struct stat stat_buf;
 --- 178,184 
  static void *
  internal_load_library(const char *libname)
  {
 !       DynamicFileList * volatile file_scanner;
        PGModuleMagicFunction magic_func;
        char       *load_error;
        struct stat stat_buf;
 ***
 *** 277,287 
                }

                /*
 !                * If the library has a _PG_init() function, call it.
                 */
                PG_init = (PG_init_t) pg_dlsym(file_scanner-handle, 
 _PG_init);
                if (PG_init)
 !                       (*PG_init) ();

                /* OK to link it into list */
                if (file_list == NULL)
 --- 277,329 
                }

                /*
 !                * If the library has a _PG_init() function, call it.  Guard 
 against
 !                * the function possibly throwing elog(ERROR).
                 */
                PG_init = (PG_init_t) pg_dlsym(file_scanner-handle, 
 _PG_init);
                if (PG_init)
 !               {
 !                       MemoryContext   oldcontext = CurrentMemoryContext;
 !
 !                       PG_TRY();
 !                       {
 !                               (*PG_init) ();
 !                       }
 !                       PG_CATCH();
 !                       {
 !                               ErrorData  *edata;
 !
 !                               /* fetch the error status so we can change it 
 */
 !                               MemoryContextSwitchTo(oldcontext);
 !                               edata = CopyErrorData();
 !                               FlushErrorState();
 !
 !                               /*
 !                                * The const pointers in the error status 
 very likely point
 !                                * at constant strings in the library, which 
 we are about to
 !                                * unload.  Copy them so we don't dump core 
 while reporting
 !                                * the error.  This might leak a bit of 
 memory but it
 !                                * beats the alternatives.
 !                                */
 !                               if (edata-filename)
 !                                       edata-filename = 
 pstrdup(edata-filename);
 !                               if (edata-funcname)
 !                                       edata-funcname = 
 pstrdup(edata-funcname);
 !                               if (edata-domain)
 !                                       edata-domain = 
 pstrdup(edata-domain);
 !
 !                               /* library might have already called some of 
 its functions */
 !                               
 clear_external_function_hash(file_scanner-handle);
 !
 !                               /* try to unlink library */
 !                               pg_dlclose(file_scanner-handle);
 !                               free((char *) file_scanner);
 !
 !                               /* complain */
 !

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 2009/4/2 Tom Lane t...@sss.pgh.pa.us:
 So I'm thinking this is really unnecessary and we should leave well
 enough alone.

 I see it. I thing , an safety of this exception should be solved only
 by programmer. It's important to release all hooks, and then raise an
 exception. It is in developer responsibility.

Well, if the init function is sufficiently carefully coded to back out
just the changes it's managed to apply, then good for it.  But we still
aren't losing much by leaving dfmgr as-is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] hstore bug and repair method

2009-04-01 Thread maosen.zhang

Hi all:

I’ve found a bug about hstore, example below:

 

CREATE TABLE temp_table (

dcp smallint,

atext hstore

);

COPY temp_table (dcp, atext) FROM stdin;

800 =NULL

\.

 

Then do the sql twice below :

Select * from temp_table;

 

Pg(version 8.3 and above) will coredump, backtrace below:

 

Program received signal SIGABRT, Aborted.

0x002a95a5821d in raise () from /lib64/tls/libc.so.6

(gdb) bt

#0  0x002a95a5821d in raise () from /lib64/tls/libc.so.6

#1  0x002a95a59a1e in abort () from /lib64/tls/libc.so.6

#2  0x00724b50 in ExceptionalCondition (conditionName=Could not find
the frame base for ExceptionalCondition.

) at assert.c:57

#3  0x00747c6f in pfree (pointer=0xb11290) at mcxt.c:591

#4  0x0044a500 in printtup (slot=0xb101d8, self=0xb1bac0) at
printtup.c:344

#5  0x00572de9 in ExecSelect (slot=0xb101d8, dest=0xb1bac0,
estate=0xb10058) at execMain.c:1554

#6  0x00572c70 in ExecutePlan (estate=0xb10058, planstate=0xb10388,
operation=CMD_SELECT, numberTuples=0, 

direction=ForwardScanDirection, dest=0xb1bac0) at execMain.c:1480

#7  0x00570ebe in ExecutorRun (queryDesc=0xb1d910,
direction=ForwardScanDirection, count=0) at execMain.c:270

#8  0x0064b6ad in PortalRunSelect (portal=0xb06fd8, forward=1
'\001', count=0, dest=0xb1bac0) at pquery.c:943

#9  0x0064b34d in PortalRun (portal=0xb06fd8,
count=9223372036854775807, isTopLevel=1 '\001', dest=0xb1bac0, 

altdest=0xb1bac0, completionTag=0x7fbfffee60 ) at pquery.c:769

#10 0x00645a0d in exec_simple_query (query_string=0xac5c88 select *
from temp_table ;) at postgres.c:1004

#11 0x006497a0 in PostgresMain (argc=4, argv=0xa45948,
username=0xa45920 cherokee) at postgres.c:3631

#12 0x0060ba2e in BackendRun (port=0xa468f0) at postmaster.c:3207

#13 0x0060afbd in BackendStartup (port=0xa468f0) at
postmaster.c:2830

#14 0x00608b7c in ServerLoop () at postmaster.c:1274

#15 0x0060855d in PostmasterMain (argc=3, argv=0xa25c30) at
postmaster.c:1029

#16 0x005a1dcb in main (argc=3, argv=0xa25c30) at main.c:188

 

 

The repair method is :

 

Hstore_io.c:

440 buflen = (4 /*  */ + 2 /* = */ + 2 /* , */ ) * in-size +

441 2 /* esc */ * (VARSIZE(in) - CALCDATASIZE(in-size, 0));

442 

443 -- out = ptr = palloc(buflen);

++ out = ptr = palloc(buflen+1);

 

 

此致

敬礼

茂森

 



 

image001.gif

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Greg Stark

Hmm. One case where this logic might not be true would be if the dll  
relies on c++ style static initializers and destructors. In that case  
it may very well leave hooks in place in case of an error and only  
clean them up when you call dlclose().




--
Greg


On 1 Apr 2009, at 22:58, Tom Lane t...@sss.pgh.pa.us wrote:


Pavel Stehule pavel.steh...@gmail.com writes:

2009/4/2 Tom Lane t...@sss.pgh.pa.us:

So I'm thinking this is really unnecessary and we should leave well
enough alone.



I see it. I thing , an safety of this exception should be solved only
by programmer. It's important to release all hooks, and then raise an
exception. It is in developer responsibility.


Well, if the init function is sufficiently carefully coded to back out
just the changes it's managed to apply, then good for it.  But we  
still

aren't losing much by leaving dfmgr as-is.

   regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Pavel Stehule

2009/4/2 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2009/4/2 Tom Lane t...@sss.pgh.pa.us:
 So I'm thinking this is really unnecessary and we should leave well
 enough alone.

 I see it. I thing , an safety of this exception should be solved only
 by programmer. It's important to release all hooks, and then raise an
 exception. It is in developer responsibility.

 Well, if the init function is sufficiently carefully coded to back out
 just the changes it's managed to apply, then good for it.  But we still
 aren't losing much by leaving dfmgr as-is.


Maybe an safe minimum is cleaning symbols table without closing
library. Then the code from lib will be accessible, but functionality
will be disabled (for Postgres)?

regards
Pavel Stehule


                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] failed to commit client_encoding explained

2009-04-01 Thread Tom Lane

I think I see the reason for the recent report of $SUBJECT.
Starting with a client_encoding different from server_encoding,
change it to something else and then roll back, for example

u8=# show server_encoding ;
 server_encoding 
-
 UTF8
(1 row)

u8=# set client_encoding to latin1;
SET
u8=# begin;
BEGIN
u8=# set client_encoding to latin2;
SET
u8=# rollback;
ROLLBACK

and sure enough LOG:  failed to commit client_encoding
pops up in the postmaster log.

The reason is that SetClientEncoding() fails, because it doesn't
want to risk doing catalog lookups, unless IsTransactionState()
is true.  And in the above situation, we try to restore
client_encoding to latin1 in TRANS_ABORT state, for which
IsTransactionState() returns FALSE.

This misbehavior is new in 8.3, because in prior releases
IsTransactionState() would return TRUE for TRANS_ABORT.

I still think that the tightening of IsTransactionState is correct:
it is not a good idea to be trying to do catalog lookups in an
already-failed transaction.  Rather, we need to fix the mbutils
machinery so that it can restore a previously-accepted encoding
combination without doing any fresh catalog lookups.  This is
really pretty analogous to the pushups that assign_role and
assign_session_authorization have done for a long time to ensure
that they can restore state without catalog lookups.

The trick those two functions use (encoding the info they need
into the string saved by GUC) doesn't look like it scales very
well to the fmgr lookup data that SetClientEncoding needs.
What I think we need to do instead is have SetClientEncoding
never throw away lookup data once it's acquired it, but maintain
its own little cache of looked-up conversion functions that it
can use without doing fresh lookups.

A problem with such caching is that it'd fail to respond to changes in
the content of pg_conversion.  Now the code is already pretty
insensitive in that respect, because if you're not doing any fresh SET
client_encoding commands it won't ever notice changes in that catalog
anyway.  But this'd make it worse.  We could ameliorate the issue
somewhat by doing fresh lookups (and updating the cache) whenever doing
SetClientEncoding with IsTransactionState() true, and only relying on
the cache when IsTransactionState() is false.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 Maybe an safe minimum is cleaning symbols table without closing
 library. Then the code from lib will be accessible, but functionality
 will be disabled (for Postgres)?

If the library doesn't get added to the list in dfmgr.c, we'll never
look for symbols within it anyway.  So I don't think there's any
particular cleaning to be done --- even assuming that the platform
supports removing symbols without dlclose'ing the library, which
seems rather unlikely.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] protect dll lib initialisation against any exception, for 8.5

2009-04-01 Thread Tom Lane

Greg Stark greg.st...@enterprisedb.com writes:
 Hmm. One case where this logic might not be true would be if the dll  
 relies on c++ style static initializers and destructors. In that case  
 it may very well leave hooks in place in case of an error and only  
 clean them up when you call dlclose().

Interesting point, but considering that we don't support or encourage
use of C++ anyway, it shouldn't carry much weight in our estimate
of how an init function is likely to behave.

Also, wouldn't C++ initializers most likely get called by the dynamic
loader itself, not during the PG_init function?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] question about deparsing const node and its typmod

2009-04-01 Thread Tao Ma

Thank you for your reply to the question. If it was chosen to reproduce the
actual semantics of the expression in various contexts, I think the bpchar
type of 'abc'::bpchar is surprised me. Is it really important to show the
'bpchar' if there is no any explicit casting for the column default value.

I searched for 'default' and 'bpchar' in CVS history and failed
to figure out the reason why the 'bpchar' need to be displayed. Maybe for
the CHECK expression or the default value including operator or function the
category type can be extremely usefuly to help us understand the type conversion
result.

Is it possible to omit the bpchar or something like it, if it is a const node
for the default value?


Tom Lane t...@sss.pgh.pa.us writes:18655.1238258...@sss.pgh.pa.us...
 Tao Ma feng_e...@163.com writes:
 CREATE TABLE t (c1 CHAR(5) DEFAULT 'abc',
   c2 CHAR(5) DEFAULT 'abc'::CHAR(5));

 SELECT pg_get_expr(adbin, adrelid)
 FROM pg_attrdef
 WHERE adrelid = (SELECT oid FROM pg_class WHERE relname = 't');

  pg_get_expr
 -
  'abc'::bpchar
  'abc'::character(5)
 (2 rows)

 so I am courious about is there any possibility to make the default value
 for c1 look like the default value for c2.

 That behavior is very carefully chosen to reproduce the actual semantics
 of the expression in various contexts.  We can't change it just to make
 it look prettier.

 If you check the CVS history of ruleutils.c to see when that logic got
 changed, you should be able to locate pgsql-hackers discussions that
 worked out what the behavior has to be.  I seem to remember that the
 most recent iteration had to do with making sure that ALTER COLUMN TYPE
 had unsurprising side-effects on the column's default.

 regards, tom lane

 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

55 matches

Mail list logo