[PATCHES] [PATCH] WIP: Create shell-types explicitly

2006-02-20 Thread Martijn van Oosterhout
[Please CC any replies, thanks]

Persuant to the recent discussion, here is a patch to allow users to
create shell types by using the symtax:

CREATE TYPE foo;

It was actually much easier than I thought given that normal type
creation creates a shell type just before creating the real type. This
means it works just by stopping after creating the shell type before
creating the real type. This also means that issuing a CREATE TYPE
foo; on an already existing type succeeds without doing anything, just
like structure declarations in C.

Unfortuntly there are some minor issues not quite considered when this
was first brought up. Primarily this:

postgres=# create type text;
CREATE TYPE
postgres=# select * from pg_type where typname = 'text';
 typname | typnamespace | typowner | typlen | typbyval | typtype | typisdefined 
| typdelim | typrelid | typelem | typinput | typoutput | typreceive | typsend  
| typanalyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | 
typndims | typdefaultbin | typdefault 
-+--+--++--+-+--+--+--+-+--+---++--++--+++-+---+--+---+
 text|   11 |   10 | -1 | f| b   | t
| ,|0 |   0 | textin   | textout   | textrecv   | textsend 
| -  | i| x  | f  |   0 |-1 |   
 0 |   | 
 text| 2200 |   10 |  4 | t| p   | f
| ,|0 |   0 | shell_in | shell_out | -  | -
| -  | i| p  | f  |   0 |-1 |   
 0 |   | 
(2 rows)

postgres=# drop type text;
ERROR:  cannot drop type text because it is required by the database system
postgres=# drop type public.text;
DROP TYPE

The first line creates public.text, but the drop tries to delete
pg_catalog.text. I'm not sure which we should make smarter, the create
or the drop, or whether just the error messages need to be made much
clearer as to what's going on.

Other points:
- Changed the shell create function to create a type with the same
parameters as a pseudotype. This should address Tom's issue with code
not paying attention to the fact the type is not complete yet.

- Created two functions shell_in and shell_out persuant to making shell
types look like pseudo types. I however didn't actually create a
pseudotype shell so shell_in actually returns opaque. Do people
have a problem with this?

- I still think it would be useful to require people to create the
shell type and the complete type within the same transaction, if only
to prevent people filling up catalog with useless entries. Shell types
can be dropped as normal, but still...

- Includes documentation updates. Does not include regression tests,
yet.

Comments?

http://svana.org/kleptog/pgsql/shell.diff
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.
Index: doc/src/sgml/xtypes.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/xtypes.sgml,v
retrieving revision 1.25
diff -c -r1.25 xtypes.sgml
*** doc/src/sgml/xtypes.sgml10 Jan 2005 00:04:38 -  1.25
--- doc/src/sgml/xtypes.sgml20 Feb 2006 11:50:06 -
***
*** 168,175 
   /para
  
   para
!   To define the typecomplex/type type, we need to create the
!   user-defined I/O functions before creating the type:
  
  programlisting
  CREATE FUNCTION complex_in(cstring)
--- 168,180 
   /para
  
   para
!   To define the typecomplex/type type, we first declare it as a shell 
type:
! 
! programlisting
! CREATE TYPE complex;
! /programlisting
! 
!   Then we create the user-defined I/O functions needed to create the type:
  
  programlisting
  CREATE FUNCTION complex_in(cstring)
***
*** 193,206 
 LANGUAGE C IMMUTABLE STRICT;
  /programlisting
  
-   Notice that the declarations of the input and output functions must
-   reference the not-yet-defined type.  This is allowed, but will draw
-   warning messages that may be ignored.  The input function must
-   appear first.
   /para
  
   para
!   Finally, we can declare the data type:
  programlisting
  CREATE TYPE complex (
 internallength = 16, 
--- 198,207 
 LANGUAGE C IMMUTABLE STRICT;
  /programlisting
  
   /para
  
   para
!   Finally, we can declare the data type properly:
  programlisting
  CREATE TYPE complex (
 internallength = 16, 
Index: doc/src/sgml/ref/create_type.sgml
===
RCS file: 

Re: [PATCHES] pgcrypto: fix memory leak in openssl.c

2006-02-20 Thread Marko Kreen
On 2/18/06, Marko Kreen [EMAIL PROTECTED] wrote:
 pgcrypto crypt()/md5 and hmac() leak memory when compiled against
 OpenSSL as openssl.c digest -reset will do two DigestInit calls
 against a context.  This happened to work with OpenSSL 0.9.6
 but not with 0.9.7+.

Ugh, seems I read the old code slightly wrong.  The leak happens
also with regular digest(), although it will leak only 1 context
instance, not the 1000+ as the crypt-md5 does.  And on 8.1 there
is pgp_sym_encrypt that also does lots of resets on one context,
like crypt-md5.  In addition it does regular digest() in several
places.  So if compiled against OpenSSL, its leaking everywhere.

The positive side is that only 8.1 has openssl autoconfiguration,
older versions default to builtin algorithms that can be changed
only by editing Makefile, thus most packages are hopefully safe.

--
marko

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] pgcrypto: fix memory leak in openssl.c

2006-02-20 Thread Tom Lane
Marko Kreen [EMAIL PROTECTED] writes:
 On 2/18/06, Marko Kreen [EMAIL PROTECTED] wrote:
 pgcrypto crypt()/md5 and hmac() leak memory when compiled against
 OpenSSL as openssl.c digest -reset will do two DigestInit calls
 against a context.  This happened to work with OpenSSL 0.9.6
 but not with 0.9.7+.

 Ugh, seems I read the old code slightly wrong.  The leak happens
 also with regular digest(), although it will leak only 1 context
 instance, not the 1000+ as the crypt-md5 does.

I'm confused --- does this mean that the patch you sent recently
needs further work?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [PATCH] WIP: Create shell-types explicitly

2006-02-20 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 The first line creates public.text, but the drop tries to delete
 pg_catalog.text.

This is not particularly specific to (or relevant to) shell types.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] pgcrypto: fix memory leak in openssl.c

2006-02-20 Thread Marko Kreen
On 2/20/06, Tom Lane [EMAIL PROTECTED] wrote:
 Marko Kreen [EMAIL PROTECTED] writes:
  On 2/18/06, Marko Kreen [EMAIL PROTECTED] wrote:
  pgcrypto crypt()/md5 and hmac() leak memory when compiled against
  OpenSSL as openssl.c digest -reset will do two DigestInit calls
  against a context.  This happened to work with OpenSSL 0.9.6
  but not with 0.9.7+.

  Ugh, seems I read the old code slightly wrong.  The leak happens
  also with regular digest(), although it will leak only 1 context
  instance, not the 1000+ as the crypt-md5 does.

 I'm confused --- does this mean that the patch you sent recently
 needs further work?

No, it's fine.  As I did not 'fix' old code but replaced it.

It's just that I gave wrong answer to the question 'who is affected?'

--
marko

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] WIP: further sorting speedup

2006-02-20 Thread Simon Riggs
On Sun, 2006-02-19 at 21:40 -0500, Tom Lane wrote:
 After applying Simon's recent sort patch, I was doing some profiling and
 noticed that sorting spends an unreasonably large fraction of its time
 extracting datums from tuples (heap_getattr or index_getattr).  The
 attached patch does something about this by pulling out the leading sort
 column of a tuple when it is received by the sort code or re-read from a
 tape.  This increases the space needed by 8 or 12 bytes (depending on
 sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
 the on-disk representation goes.  The effort needed to extract the datum
 at this point is well repaid because the tuple will normally undergo
 multiple comparisons while it remains in memory.  In some quick tests
 the patch seemed to make for a significant speedup, on the order of 30%,
 despite increasing the number of runs emitted because of the smaller
 available memory.

Yeh, this is essentially the cache-the-heapgetattr idea. I'd been trying
to get that to perform by caching the fcinfo values, which was a more
complex way and relied on full key extraction. To my chagrin, I had
great difficulty that way, but now I see the benefit of first-key
extraction explains why. The 30% speedup sounds like my original
expectation. Anyway, kudos to you. 

[I'd stopped working on that to give Tim some space]

 The choice to pull out just the leading column, rather than all columns,
 is driven by concerns of (a) code complexity and (b) memory space.
 Having the extra columns pre-extracted wouldn't buy anything anyway
 in the common case where the leading key determines the result of
 a comparison.

I think key extraction is a good idea, for more reasons than just the
heapgetattr. 

For longer heap rows, putting the whole row through the sort is inferior
to extracting all keys plus a pointer to the tuple, according to:

AlphaSort: A Cache-Sensitive Parallel External
Sort, Nyberg et al, VLDB Journal 4(4): 603-627 (1995)

The above paper makes a good case that full key extraction is a great
idea above a tuple length of 16 bytes, i.e. we don't need to do it for
most CREATE INDEX situations, but it would be very helpful for heap
sorts.

I agree that as long as we are swamped by the cost of heapgetattr, then
it does seem likely that first-key extraction (and keeping it with the
tuple itself) will be a win in most cases over full-key extraction.

Nyberg et al also touch on a further point, which Luke has just
mentioned briefly on list (but we have discussed at further length). Now
that we have dropped the restriction of N=6, giving very large numbers
of runs, this also weakens the argument as to why heapsort is a good
candidate for sort algorithm. The reason for choosing heapsort was that
it gave runs ~2*Size(memory), whereas qsort produces runs only
~Size(memory). But if we have as many runs as we like, then using qsort
is not an issue. Which is good because qsort is faster in practice and
much more importantly, performs better with larger memory: heap sort
seems to suffer from a slow down when you give it *too much* memory. 

Which leaves the conclusion: further tuning of the heapsort mechanism is
*probably* not worthwhile in relation to the run forming stage of
sorting. (The variable nature of the qsort algorithm might seem an
issue, but if we execute it N times for N runs, then we'll get a much
less variable performance from it than we saw on those individual tests
earlier, so the predictability of the heapsort isn't as important a
reason to keep it).

But it seems this patch provides a win that is not dependent upon the
sort algorithm used, so its a win whatever we do.

(We still need the heapsort for the final merge, so I think we still
need to look at tuning of the final merge stage when we have a very
large work_mem setting, or simply limiting the size of the heap used
above a certain point.)

 This is still WIP because it leaks memory intra-query (I need to fix it
 to clean up palloc'd space better).  I thought I'd post it now in case
 anyone wants to try some measurements for their own favorite test cases.

 In particular it would be interesting to see what happens for a
 multi-column sort with lots of duplicated keys in the first column,
 which is the case where the least advantage would be gained.

Hmmm, well it seems clear that there will be an optimum number of keys
to be pre-extracted for any particular sort, though we will not be able
to tell what that is until we are mid-way through the sort. Using
heapsort we wouldn't really have much opportunity to decide to change
the number of columns pre-extracted...if we did use qsort, we could
learn from the last run how many keys to pre-extract.

Incidentally, I do think heapgetattr can be tuned further. When a tuple
has nulls we never use cached offset values. However, we could work out
the firstnullableattno for a table/resultset and keep cached offset
values for all attnum  firstnullableattno. If the tuple 

Re: [PATCHES] plpython: fix memory leak

2006-02-20 Thread Neil Conway
On Sun, 2006-02-19 at 20:34 -0500, Neil Conway wrote:
 Attached is a patch that fixes three Python reference leaks in
 PLy_traceback(): the objects returned by PyErr_Fetch() are owned by the
 caller, so their reference count should be decremented.

Applied to HEAD and back branches. I also noticed a minor bug in
PLy_modify_tuple(): we don't own a reference to `platt', so we shouldn't
try to decrement its refcount. I only bothered fixing the latter bug in
8.0+

-Neil



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [PATCH] WIP: Create shell-types explicitly

2006-02-20 Thread Martijn van Oosterhout
On Mon, Feb 20, 2006 at 10:13:39AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  The first line creates public.text, but the drop tries to delete
  pg_catalog.text.
 
 This is not particularly specific to (or relevant to) shell types.

So this is not a show stopper and not something we're particularly
concerned about?

Thanks,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [PATCHES] [PATCH] WIP: Create shell-types explicitly

2006-02-20 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Mon, Feb 20, 2006 at 10:13:39AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 The first line creates public.text, but the drop tries to delete
 pg_catalog.text.

 This is not particularly specific to (or relevant to) shell types.

 So this is not a show stopper and not something we're particularly
 concerned about?

I'm not concerned about it.  If you're using the default search path,
you'd see the same behavior anytime you created any object with the same
name as a pg_catalog object.  It's been like that since 7.3, and I don't
recall seeing any complaints from users (as distinct from people trying
to break things ;-)) so I'm not very worried.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] win codepages 1253, 1254, 1255, 1257 and cleanup

2006-02-20 Thread Kris Jurka



On Sat, 18 Feb 2006, Peter Eisentraut wrote:


Kris Jurka wrote:

The attached patch adds support for windows codepages 1253, 1254,
1255, and 1257 and cleans up a bunch of the support utilities.


I've applied this patch but left out the changes to the Japanese
encoding maps, as you suggested.


The Makefile was invoking perl scripts as ./script.pl.  This fails when 
the script is not executable as UCS_to_most.pl is in CVS.  It also won't 
pick up any custom setting of the perl version/location to use.  This 
patch calls perl scripts like $(PERL) $(srcdir)/script.pl.


Kris JurkaIndex: src/backend/utils/mb/Unicode/Makefile
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/Unicode/Makefile,v
retrieving revision 1.10
diff -c -r1.10 Makefile
*** src/backend/utils/mb/Unicode/Makefile   18 Feb 2006 16:15:22 -  
1.10
--- src/backend/utils/mb/Unicode/Makefile   20 Feb 2006 22:51:37 -
***
*** 69,93 
  all: $(MAPS)
  
  $(GENERICMAPS) : $(GENERICTEXTS)
!   ./UCS_to_most.pl
  
  euc_jp_to_utf8.map utf8_to_euc_jp.map : JIS0201.TXT JIS0208.TXT JIS0212.TXT
!   ./UCS_to_EUC_JP.pl
  
  euc_cn_to_utf8.map utf8_to_euc_cn.map : GB2312.TXT
!   ./UCS_to_EUC_CN.pl
  
  euc_kr_to_utf8.map utf8_to_euc_kr.map : KSX1001.TXT
!   ./UCS_to_EUC_KR.pl
  
  euc_tw_to_utf8.map utf8_to_euc_tw.map : CNS11643.TXT
!   ./UCS_to_EUC_TW.pl
  
  sjis_to_utf8.map utf8_to_sjis.map : CP932.TXT
!   ./UCS_to_SJIS.pl
  
  gb18030_to_utf8.map  utf8_to_gb18030.map : ISO10646-GB18030.TXT
!   ./UCS_to_GB18030.pl
  clean:
rm -f $(MAPS)
  
--- 69,93 
  all: $(MAPS)
  
  $(GENERICMAPS) : $(GENERICTEXTS)
!   $(PERL) $(srcdir)/UCS_to_most.pl
  
  euc_jp_to_utf8.map utf8_to_euc_jp.map : JIS0201.TXT JIS0208.TXT JIS0212.TXT
!   $(PERL) $(srcdir)/UCS_to_EUC_JP.pl
  
  euc_cn_to_utf8.map utf8_to_euc_cn.map : GB2312.TXT
!   $(PERL) $(srcdir)/UCS_to_EUC_CN.pl
  
  euc_kr_to_utf8.map utf8_to_euc_kr.map : KSX1001.TXT
!   $(PERL) $(srcdir)/UCS_to_EUC_KR.pl
  
  euc_tw_to_utf8.map utf8_to_euc_tw.map : CNS11643.TXT
!   $(PERL) $(srcdir)/UCS_to_EUC_TW.pl
  
  sjis_to_utf8.map utf8_to_sjis.map : CP932.TXT
!   $(PERL) $(srcdir)/UCS_to_SJIS.pl
  
  gb18030_to_utf8.map  utf8_to_gb18030.map : ISO10646-GB18030.TXT
!   $(PERL) $(srcdir)/UCS_to_GB18030.pl
  clean:
rm -f $(MAPS)
  

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] WIP: further sorting speedup

2006-02-20 Thread Michael Glaesemann


On Feb 21, 2006, at 3:45 , Simon Riggs wrote:


On Sun, 2006-02-19 at 21:40 -0500, Tom Lane wrote:
After applying Simon's recent sort patch, I was doing some  
profiling and
noticed that sorting spends an unreasonably large fraction of its  
time

extracting datums from tuples (heap_getattr or index_getattr).  The
attached patch does something about this by pulling out the  
leading sort
column of a tuple when it is received by the sort code or re-read  
from a

tape.


snip /

The choice to pull out just the leading column, rather than all  
columns,

is driven by concerns of (a) code complexity and (b) memory space.
Having the extra columns pre-extracted wouldn't buy anything anyway
in the common case where the leading key determines the result of
a comparison.


snip /

I agree that as long as we are swamped by the cost of heapgetattr,  
then

it does seem likely that first-key extraction (and keeping it with the
tuple itself) will be a win in most cases over full-key extraction.


Most of this is way above my head, but I'm trying to follow along:  
when you say first key and full key, are these related to relation  
keys (e.g., primary key) or attributes that are used in sorting  
(regardless of whether they're a key or not)? I notice Tom used the  
term leading [sort] column, which I read to mean the first  
attribute used to sort the relation (for whichever purpose, e.g.,  
mergejoins, order-by clauses). I'll see if I can't find the Nyberg  
paper as well to learn a bit more. (I haven't been sleeping well  
recently.)


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] WIP: further sorting speedup

2006-02-20 Thread Tom Lane
Michael Glaesemann [EMAIL PROTECTED] writes:
 Most of this is way above my head, but I'm trying to follow along:  
 when you say first key and full key, are these related to relation  
 keys (e.g., primary key) or attributes that are used in sorting  
 (regardless of whether they're a key or not)? I notice Tom used the  
 term leading [sort] column, which I read to mean the first  
 attribute used to sort the relation (for whichever purpose, e.g.,  
 mergejoins, order-by clauses).

Right, it's whatever is the sort key for this particular sort.
You could have SELECT ... ORDER BY foo,bar,baz, or you could
have construction of a multi-column btree index.  In either case,
the sort module is given a set of tuples and told to sort by
certain specified column(s) of those tuples.  What I saw in
profiling was that a large fraction of the CPU time was going
into heap_getattr (or index_getattr, for the index-tuple case)
calls to extract Datums for the sort columns.  The Datums are
then passed to the data-type-specific comparison functions,
such as btint4cmp.  In the original code we did this every time
we compared two tuples.  But a tuple is normally compared multiple
times during a sort (about logN times, in fact), so it makes sense
to do the Datum extraction just once and save the value to use
in comparisons.  The question at hand here is whether to pre-extract
Datums for each column when there are multiple sort columns, or
just extract the first column (which is often all you need for
a comparison anyway).  I think the one-column approach wins
because it keeps the sort data associated with a tuple fixed-size.
Extracting all columns would require a more complex data structure
... plus it would take more memory, and memory space is at a premium
here.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] WIP: further sorting speedup

2006-02-20 Thread Michael Glaesemann


On Feb 21, 2006, at 14:24 , Tom Lane wrote:


Michael Glaesemann [EMAIL PROTECTED] writes:

Most of this is way above my head, but I'm trying to follow along:



Right, it's whatever is the sort key for this particular sort.



Thanks, Tom. I think I may actually be starting to understand this a  
bit!


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 6: explain analyze is your friend