Re: [PATCHES] Win32 shmem

2007-03-21 Thread Magnus Hagander
On Tue, Mar 20, 2007 at 12:12:45PM -0400, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  I think you do still need the on_shmem_exit detach callback.
  
  Ok, will look into that. Haven't tested that scenario.
 
  That was indeed so. Added in new version, attached.
 
 If it handles the restart-after-backend-crash scenario and correctly
 locks out starting a fresh postmaster (after postmaster crash) until
 all old backends are gone, then it's probably ready to commit for
 more-widespread testing.

It does, at least in my tests. I have found one thing that needs to be
chagned for terminal server sessions, and then I need to update the build
system to use it on mingw as well. Will do that and then commit.

 I note that sysv_shmem contains some #ifdef WIN32 and #ifdef __CYGWIN__
 kluges; will it now be possible to remove those, or will the Cygwin
 build still be using that code?

I *think* so. I *think* the CYGWIN port does not rely on #ifdef WIN32s
anymore (which is corret given that it's not really win32). If I do a grep
of the sourcecode, I get a bunch of things like
./utils/mmgr/mcxt.c:#if defined(WIN32) || defined(__CYGWIN__)
which would indicate that at least some places know they're different.

I can include removal of those in my change, but I'm not in a position to
test them myself. I guess we could do it and see if the buildfarm breaks,
and if that revert it.

//Magnus


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Gregory Stark

Jeremy Drake [EMAIL PROTECTED] writes:

 BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
 for those also?

Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
constants used throughout the source code. I'm surprised there isn't a macro
for it though.

Do you need the alignment? If so I want to check the code against the packed
varlena patch. Just in case.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Gregory Stark

Gregory Stark [EMAIL PROTECTED] writes:

 Jeremy Drake [EMAIL PROTECTED] writes:

 BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
 for those also?

 Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
 constants used throughout the source code. I'm surprised there isn't a macro
 for it though.

 Do you need the alignment? If so I want to check the code against the packed
 varlena patch. Just in case.

Ah, it's just to construct an array, that's not a concern at all. And you're
detoasting the text data types before using or storing them so that's fine.


The only thing I would say is that this should maybe be a TextGetDatum() just
for code hygiene. It should be exactly equivalent though:

+ PointerGetDatum(matchctx-orig_str),

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 BTW, should I be calling
 get_typlenbyvalalign on TEXTOID or are there macros for those also?

By and large we tend to hard-wire those properties too, eg there are
plenty of places that do things like this:

/* XXX: this hardcodes assumptions about the regtype type */
result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');

Some are better commented than others ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Simon Riggs
On Tue, 2007-03-20 at 18:20 +, Heikki Linnakangas wrote:
 This patch makes CLUSTER MVCC-safe. Visibility information and update 
 chains are preserved like in VACUUM FULL.

Sounds good.

 CLUSTER is currently the only user of the facility, but I'm envisioning 
 we might have other users in the future. For example, a version of 
 VACUUM FULL that rewrites the whole table. 

I would be very much in favour of that, as discussed on -hackers.

There might be some requirement for the older VACUUM FULL behaviour, so
I'd like to suggest the syntax:

VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]];

where VACUUM FULL foo REPLACE STORAGE would be the new default, using
your new functions, and PRESERVE STORAGE would implement the old method.

 One complication in the implementation was the fact that heap_insert 
 overwrites the visibility information, and it doesn't write the full 
 tuple header to WAL. I ended up implementing a special-purpose 
 raw_heap_insert function instead, which is optimized for bulk inserting 
 a lot of tuples, knowing that we have exclusive access to the heap. 
 raw_heap_insert keeps the current buffer locked over calls, until it 
 gets full, and inserts the whole page to WAL as a single record using 
 the existing XLOG_HEAP_NEWPAGE record type.

I submitted Fast CLUSTER patch earlier which avoided writing WAL in the
same way that has been done for COPY, CREATE INDEX and CTAS. Would you
like to update your patch to do this also, or would you like me to
re-write the patch to fit with yours?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
 This patch makes CLUSTER MVCC-safe. Visibility information and update 
 chains are preserved like in VACUUM FULL.
 
 I created a new generic rewriteheap-facility to handle rewriting tables 
 in a visibility-preserving manner. All the update chain tracking is done 
 in rewriteheap.c, the caller is responsible for supplying the stream of 
 tuples.
 
 CLUSTER is currently the only user of the facility, but I'm envisioning 
 we might have other users in the future. For example, a version of 
 VACUUM FULL that rewrites the whole table. We could also use it to make 
 ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
 what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
 
 One complication in the implementation was the fact that heap_insert 
 overwrites the visibility information, and it doesn't write the full 
 tuple header to WAL. I ended up implementing a special-purpose 
 raw_heap_insert function instead, which is optimized for bulk inserting 
 a lot of tuples, knowing that we have exclusive access to the heap. 
 raw_heap_insert keeps the current buffer locked over calls, until it 
 gets full, and inserts the whole page to WAL as a single record using 
 the existing XLOG_HEAP_NEWPAGE record type.
 
 This makes CLUSTER a more viable alternative to VACUUM FULL. One 
 motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
 pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
 will appear to be empty in the dump file.
 
 The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
 I suppose there's no need to touch the docs. I sent a doc patch earlier 
 to add a note about it, that doc patch should still be applied to older 
 release branches, IMO.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Jeremy Drake
On Wed, 21 Mar 2007, Tom Lane wrote:

 Jeremy Drake [EMAIL PROTECTED] writes:
  BTW, should I be calling
  get_typlenbyvalalign on TEXTOID or are there macros for those also?

 By and large we tend to hard-wire those properties too, eg there are
 plenty of places that do things like this:

 /* XXX: this hardcodes assumptions about the regtype type */
 result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');

 Some are better commented than others ...

So, what action (if any) should be taken?  Should all (or some) of these
values be hardcoded, or should the current calls to determine them be left
in place?



   regards, tom lane

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org



-- 
Fortune's Real-Life Courtroom Quote #19:

Q:  Doctor, how many autopsies have you performed on dead people?
A:  All my autopsies have been performed on dead people.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Stats processor not restarting

2007-03-21 Thread Bruce Momjian
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  Bah, sorry about the noise. It was the effect of
  PGSTAT_RESTART_INTERVAL.
  Do we want to add some logging when we don't restart it due to repeated
  failures?
 
 Not really, but maybe it would be sensible to reset last_pgstat_start_time
 when doing a database-wide restart?  The motivation for the timeout was
 to reduce cycle wastage if pgstat crashed by itself, but when you've
 deliberately SIGQUITed it, that hardly seems to apply ...

You mean like this, attached?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/postmaster/pgstat.c
===
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.149
diff -c -c -r1.149 pgstat.c
*** src/backend/postmaster/pgstat.c 16 Mar 2007 17:57:36 -  1.149
--- src/backend/postmaster/pgstat.c 22 Mar 2007 02:03:31 -
***
*** 1929,1934 
--- 1929,1937 
  static void
  pgstat_exit(SIGNAL_ARGS)
  {
+   /* allow stats to restart immediately after SIGQUIT */
+   last_pgstat_start_time = 0;
+ 
need_exit = true;
  }
  

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Stats processor not restarting

2007-03-21 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Not really, but maybe it would be sensible to reset last_pgstat_start_time
 when doing a database-wide restart?

 You mean like this, attached?

I'd be fairly surprised if resetting the variable in the child process
had any effect on the postmaster...

I think a correct fix would involve exposing either the variable or a
routine to zero it from pgstat.c, and having postmaster.c do that at
the points where it intentionally SIGQUITs the stats collector.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 On Wed, 21 Mar 2007, Tom Lane wrote:
 By and large we tend to hard-wire those properties too, eg there are
 plenty of places that do things like this:

 So, what action (if any) should be taken?  Should all (or some) of these
 values be hardcoded, or should the current calls to determine them be left
 in place?

I'd vote for making this new code look like the rest of it, to wit
hardwire the values.  As-is, you're expending code space and cycles
on flexibility that's entirely illusory --- if we ever decided to
change these properties of TEXT, we'd have way more work to do than
just fixing the new regexp functions.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Jeremy Drake
On Wed, 21 Mar 2007, Gregory Stark wrote:

 The only thing I would say is that this should maybe be a TextGetDatum() just
 for code hygiene. It should be exactly equivalent though:

I could not find such a thing using ctags, nor TextPGetDatum().  I looked
at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses
PointerGetDatum.  If there is such a thing, it is well camoflaged...

-- 
If you think the problem is bad now, just wait until we've solved it.
-- Arthur Kasspe

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Jeremy Drake
On Thu, 22 Mar 2007, Tom Lane wrote:

 I'd vote for making this new code look like the rest of it, to wit
 hardwire the values.

Attached please find a patch which does this.



-- 
Although written many years ago, Lady Chatterley's Lover has just been
reissued by the Grove Press, and this pictorial account of the
day-to-day life of an English gamekeeper is full of considerable
interest to outdoor minded readers, as it contains many passages on
pheasant-raising, the apprehending of poachers, ways to control vermin,
and other chores and duties of the professional gamekeeper.
Unfortunately, one is obliged to wade through many pages of extraneous
material in order to discover and savour those sidelights on the
management of a midland shooting estate, and in this reviewer's opinion
the book cannot take the place of J. R. Miller's Practical
Gamekeeping.
-- Ed Zern, Field and Stream (Nov. 1959)Index: src/backend/utils/adt/regexp.c
===
RCS file: 
/home/jeremyd/local/postgres/cvsuproot/pgsql/src/backend/utils/adt/regexp.c,v
retrieving revision 1.70
diff -c -r1.70 regexp.c
*** src/backend/utils/adt/regexp.c  20 Mar 2007 05:44:59 -  1.70
--- src/backend/utils/adt/regexp.c  22 Mar 2007 05:17:15 -
***
*** 30,35 
--- 30,36 
  #include postgres.h
  
  #include access/heapam.h
+ #include catalog/pg_type.h
  #include funcapi.h
  #include regex/regex.h
  #include utils/builtins.h
***
*** 95,106 
size_toffset;
  
re_comp_flags flags;
- 
-   /* text type info */
-   Oid   param_type;
-   int16 typlen;
-   bool  typbyval;
-   char  typalign;
  } regexp_matches_ctx;
  
  typedef struct regexp_split_ctx
--- 96,101 
***
*** 835,845 
matchctx-pmatch = palloc(sizeof(regmatch_t) * 
(matchctx-cpattern-re_nsub + 1));
matchctx-offset = 0;
  
-   /* get text type oid, too lazy to do it some other way */
-   matchctx-param_type = get_fn_expr_argtype(fcinfo-flinfo, 0);
-   get_typlenbyvalalign(matchctx-param_type, matchctx-typlen,
-matchctx-typbyval, 
matchctx-typalign);
- 
matchctx-wide_str = palloc(sizeof(pg_wchar) * (matchctx-orig_len + 
1));
matchctx-wide_len = pg_mb2wchar_with_len(VARDATA(matchctx-orig_str),

  matchctx-wide_str, matchctx-orig_len);
--- 830,835 
***
*** 915,923 
dims[0] = 1;
}
  
return construct_md_array(elems, nulls, ndims, dims, lbs,
! matchctx-param_type, 
matchctx-typlen,
! matchctx-typbyval, 
matchctx-typalign);
  }
  
  Datum
--- 905,913 
dims[0] = 1;
}
  
+   /* XXX: this hardcodes assumptions about the text type */
return construct_md_array(elems, nulls, ndims, dims, lbs,
! TEXTOID, -1, false, 
'i');
  }
  
  Datum
***
*** 976,991 
  {
ArrayBuildState *astate = NULL;
regexp_split_ctx*splitctx;
-   Oid  param_type;
int  nitems;
  
splitctx = setup_regexp_split(PG_GETARG_TEXT_P(0),
  
PG_GETARG_TEXT_P(1),
  
PG_GETARG_TEXT_P_IF_EXISTS(2));
  
-   /* get text type oid, too lazy to do it some other way */
-   param_type = get_fn_expr_argtype(fcinfo-flinfo, 0);
- 
for (nitems = 0; splitctx-offset  splitctx-wide_len; nitems++)
{
if (nitems  splitctx-wide_len)
--- 966,977 
***
*** 995,1001 
astate = accumArrayResult(astate,
  
get_next_split(splitctx),
  false,
! param_type,
  
CurrentMemoryContext);
}
  
--- 981,987 
astate = accumArrayResult(astate,
  
get_next_split(splitctx),
  false,
! TEXTOID,
  
CurrentMemoryContext);
}
  

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 I could not find such a thing using ctags, nor TextPGetDatum().  I looked
 at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses
 PointerGetDatum.  If there is such a thing, it is well camoflaged...

AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
datatypes) is lack of obvious usefulness.  A function dealing with a
text * doesn't normally have reason to convert that to a Datum until
it returns --- and at that point PG_RETURN_TEXT_P is the thing to use.
Do you have a counterexample, or does this just suggest that the regexp
function patch needs some refactoring?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PATCHES] LIKE optimization in UTF-8 and locale-C

2007-03-21 Thread ITAGAKI Takahiro
Hello,

I found LIKE operators are slower on multi-byte encoding databases
than single-byte encoding ones. It comes from difference between
MatchText() and MBMatchText().

We've had an optimization for single-byte encodings using 
pg_database_encoding_max_length() == 1 test. I'll propose to extend it
in UTF-8 with locale-C case. All of trailing bytes are different from first
bytes in UTF-8 multi-byte characters, so we can use functions for single-bytes
and byte-wise comparison in the case. With the attached patch, the performance
of UTF-8 LIKE operators are pushed up to near other single-bytes encodings.

Databases initialized with locale-C are widely used in Japan, because
Japanese locale are broken in almost of platforms. Japanese user can
choose EUC-jp or UTF-8 as a server encoding, but I think UTF-8 will be
more and more used in the future.


 test 

initdb --no-locale --encoding=encoding

[HEAD]
SQL_ASCII :  7171 ms /  7203 ms /  7187 ms
LATIN1:  7172 ms /  7156 ms /  7141 ms
UTF8  : 16235 ms / 16281 ms / 16281 ms
EUC_JP: 17454 ms / 17453 ms / 17438 ms

[with patch]
SQL_ASCII :  7062 ms /  7125 ms /  7125 ms
LATIN1:  7047 ms /  7063 ms /  7047 ms
UTF8  :  7188 ms /  7234 ms /  7235 ms
EUC_JP: 17468 ms / 17453 ms / 17453 ms


CREATE OR REPLACE FUNCTION test() RETURNS INTEGER AS $$
DECLARE
cnt integer;
BEGIN
FOR i IN 1..1000 LOOP
SELECT count(*) INTO cnt FROM item WHERE i_title LIKE 
'%BABABABABARIBA%' LIMIT 50;
END LOOP;
RETURN cnt;
END;
$$ LANGUAGE plpgsql;

SELECT count(*) FROM item; -- borrowed from DBT-1 (TPC-W)
 count 
---
 1
(1 row)

 patch 

Index: src/backend/utils/adt/like.c
===
--- src/backend/utils/adt/like.c(head)
+++ src/backend/utils/adt/like.c(working copy)
@@ -21,6 +21,7 @@
 
 #include mb/pg_wchar.h
 #include utils/builtins.h
+#include utils/pg_locale.h
 
 
 #define LIKE_TRUE  1
@@ -119,6 +120,13 @@
 
 
 /*
+ * true iff match functions for single-byte characters are available.
+ */
+#define sb_match_available() \
+   (pg_database_encoding_max_length() == 1 || \
+(lc_collate_is_c()  GetDatabaseEncoding() == PG_UTF8))
+
+/*
  * interface routines called by the function manager
  */
 
@@ -138,7 +146,7 @@
p = VARDATA(pat);
plen = (VARSIZE(pat) - VARHDRSZ);
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
result = (MatchText(s, slen, p, plen) == LIKE_TRUE);
else
result = (MBMatchText(s, slen, p, plen) == LIKE_TRUE);
@@ -162,7 +170,7 @@
p = VARDATA(pat);
plen = (VARSIZE(pat) - VARHDRSZ);
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
result = (MatchText(s, slen, p, plen) != LIKE_TRUE);
else
result = (MBMatchText(s, slen, p, plen) != LIKE_TRUE);
@@ -186,7 +194,7 @@
p = VARDATA(pat);
plen = (VARSIZE(pat) - VARHDRSZ);
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
result = (MatchText(s, slen, p, plen) == LIKE_TRUE);
else
result = (MBMatchText(s, slen, p, plen) == LIKE_TRUE);
@@ -210,7 +218,7 @@
p = VARDATA(pat);
plen = (VARSIZE(pat) - VARHDRSZ);
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
result = (MatchText(s, slen, p, plen) != LIKE_TRUE);
else
result = (MBMatchText(s, slen, p, plen) != LIKE_TRUE);
@@ -275,7 +283,7 @@
int slen,
plen;
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
{
s = NameStr(*str);
slen = strlen(s);
@@ -316,7 +324,7 @@
int slen,
plen;
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
{
s = NameStr(*str);
slen = strlen(s);
@@ -357,7 +365,7 @@
int slen,
plen;
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
{
s = VARDATA(str);
slen = (VARSIZE(str) - VARHDRSZ);
@@ -393,7 +401,7 @@
int slen,
plen;
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
{
s = VARDATA(str);
slen = (VARSIZE(str) - VARHDRSZ);
@@ -429,7 +437,7 @@
text   *esc = PG_GETARG_TEXT_P(1);
text   *result;
 
-   if (pg_database_encoding_max_length() == 1)
+   if (sb_match_available())
result = do_like_escape(pat, esc);
else
   

Re: [PATCHES] patch adding new regexp functions

2007-03-21 Thread Jeremy Drake
On Thu, 22 Mar 2007, Tom Lane wrote:

 AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
 datatypes) is lack of obvious usefulness.  A function dealing with a
 text * doesn't normally have reason to convert that to a Datum until
 it returns --- and at that point PG_RETURN_TEXT_P is the thing to use.
 Do you have a counterexample, or does this just suggest that the regexp
 function patch needs some refactoring?

If you are asking why I have reason to convert text * to a Datum in cases
other than PG_RETURN_TEXT_P, it is used for calling text_substr functions
using DirectFunctionCallN.  BTW, this usage of text_substr using
PointerGetDatum was copied from the pre-existing textregexsubstr function.


-- 
Malek's Law:
Any simple idea will be worded in the most complicated way.

---(end of broadcast)---
TIP 6: explain analyze is your friend