Re: [HACKERS] Scheduling in SQL

2001-03-15 Thread Hannu Krosing

Tom Lane wrote:

 
 I've built applications that do roughly this sort of thing in Postgres
 (some of the protocol changes in 6.4 were done to make it easier ;-)).

I may misremember, but IIRC some older protocol (or at least libpq) 
returned 0 as the backend pid to the listening client if it was notified by itself.

Currently it returns the actual pid for any backend. Is this what you 
changed?

Anyhow, we need some _documented_ way to get the backend pid (there is one 
actually received and stored with "cookie" for Ctrl-C processing, but 
AFAIK it is neither documented as being the backend id nor is there a 
function to get at it).

For my own use I created a C function pid() but perhaps there should be 
something mainstream for this.
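
A sketch of how such a function would be used from SQL (the name
pg_backend_pid() is used for illustration here; it is the name the
mainstream function eventually got):

LISTEN my_event;          -- register for the event
SELECT pg_backend_pid();  -- remember this session's backend pid
NOTIFY my_event;          -- the pid delivered with the notification
                          -- can be compared against the remembered one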

---
Hannu





Re: [HACKERS] Partial index on date column

2003-03-07 Thread Hannu Krosing
Christopher Kings-Lynne wrote on Fri, 07.03.2003 at 07:28:
 Yeah, it's not really a problem for me, I just put the extra clause in.
 
 Is indexing excluding NULLs a common application of partial indexes? 

For me it is ;)

 It's
 basically all I use it for, when a column has like 90-95% NULLS and I want
 to exclude them from the index.  

 Is it worth hard-coding in the IS NOT NULL case?

I'd vote for it.
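
For reference, a minimal sketch of the pattern under discussion (table
and column names are made up):

CREATE TABLE t (id serial PRIMARY KEY, col integer);  -- col is ~90-95% NULL
CREATE INDEX t_col_idx ON t (col) WHERE col IS NOT NULL;

-- the extra clause mentioned above lets the planner match the partial index:
SELECT * FROM t WHERE col = 42 AND col IS NOT NULL;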


Hannu



Re: [HACKERS] PostgreSQL and SOAP, suggestions?

2003-03-31 Thread Hannu Krosing
mlw wrote on Mon, 31.03.2003 at 03:43:
 Given a HTTP formatted query:
 GET http://localhost:8181/pgmuze?query=select+*+from+zsong+limit+2;
 
 The output is entered below.
 
 Questions:
 Is there a way, without specifying a binary cursor, to get the data types 
 associated with columns? Right now I am just using undefined, as the 
 ODBC version works.
 
 Anyone see any basic improvements needed?
 
 <?xml version = "1.0"?>
 <soap:Envelope xmlns:MWSSQL="http://www.mohawksoft.com/MWSSQL/envelope">
  <soap:Header>
   <!-- Fields in set -->
   <Columns count="9">

The SOAP 1.1 spec specifies (section 4.2) the following about the SOAP Header:

The encoding rules for header entries are as follows: 

 1. A header entry is identified by its fully qualified element
name, which consists of the namespace URI and the local name.
All immediate child elements of the SOAP Header element MUST be
namespace-qualified. 

I'm not sure that SOAP Header is the right place for Query header info,
as the header is meant for:


  SOAP provides a flexible mechanism for extending a message in a 
  decentralized and modular way without prior knowledge between the 
  communicating parties. Typical examples of extensions that can be 
  implemented as header entries are authentication, transaction 
  management, payment etc.

So the definition of the structure should probably be inside SOAP:Body.

---
Hannu




Re: [HACKERS] PostgreSQL and SOAP, suggestions?

2003-03-31 Thread Hannu Krosing
[EMAIL PROTECTED] wrote on Mon, 31.03.2003 at 19:52:
 Actually, as far as I am aware, the header is for metadata, i.e. it is the
 place to describe the data being returned.

Did you read the SOAP spec ?

 The description of the fields
 isn't the actual data retrieved, so it doesn't belong in the body, so it
 should go into the header.

That is logical, but it is not what the spec says.

Also the spec requires immediate child elements of SOAP:Header to have
full namespace URI's.

And another question - why do you have the namespace MWSSQL defined but
never used ?

-
Hannu




Re: [HACKERS] PostgreSQL and SOAP, suggestions?

2003-04-02 Thread Hannu Krosing
mlw wrote on Tue, 01.04.2003 at 15:29:
 Hannu Krosing wrote:
 
 [EMAIL PROTECTED] kirjutas E, 31.03.2003 kell 19:52:
   
 
 Actually, as far as I am aware, the header is for metadata, i.e. it is the
 place to describe the data being returned.
 
 
 
 Did you read the SOAP spec ?
 
 yes

???


What you have come up with _is_not_ a SOAP v1.1 message at all. It does
use some elements with similar names but from a different namespace.

The SOAP Envelope, Header and Body elements must be from the namespace
http://schemas.xmlsoap.org/soap/envelope/

Per section 3 paragraph 2 of the SOAP spec, a conforming SOAP processor MUST
discard a message that has an incorrect namespace.

 <?xml version = "1.0"?>
 <mwssql:Envelope xmlns:mwssql="http://www.mohawksoft.com/mwssql/envelope">
  <mwssql:Header>

The SOAP-ENV:Header is a generic mechanism for adding features to a
SOAP message in a decentralized manner without prior agreement between
the communicating parties. SOAP defines a few attributes that can be
used to indicate who should deal with a feature and whether it is
optional or mandatory (see section 4.2).

The SOAP-ENV:Body is a container for mandatory information intended
for the ultimate recipient of the message (see section 4.3). SOAP
defines one element for the body, which is the Fault element used for
reporting errors.


The Header element is encoded as the first immediate child element of
the SOAP Envelope XML element. All immediate child elements of the
Header element are called header entries.

The encoding rules for header entries are as follows: 

 1. A header entry is identified by its fully qualified element
name, which consists of the namespace URI and the local name.
All immediate child elements of the SOAP Header element MUST be
namespace-qualified.

...

An example is a header with an element identifier of "Transaction", a
mustUnderstand value of "1", and a value of 5. This would be encoded
as follows:

<SOAP-ENV:Header>
   <t:Transaction
  xmlns:t="some-URI" SOAP-ENV:mustUnderstand="1">
  5
   </t:Transaction>
</SOAP-ENV:Header>

   <exec:sql>update cgrpairs set ratio=0 where srcitem=100098670</exec:sql>
   <exec:affected>2657</exec:affected>
   <qry:sql>select * from ztitles limit 2</qry:sql>
   <qry:ROWSET>
<qry:ROW columns="28">

Where are the namespaces exec:, qry: and t: defined?


Hannu




Re: [HACKERS] PostgreSQL and SOAP, suggestions?

2003-04-02 Thread Hannu Krosing
mlw wrote on Wed, 02.04.2003 at 15:56:
 Hannu Krosing wrote:
 
 What you have come up with _is_not_ a SOAP v1.1 message at all. It does
 use some elements with similar names but from a different namespace.
 
 The SOAP Envelope, Header and Body elements must be from the namespace
 http://schemas.xmlsoap.org/soap/envelope/
 
 [snip]
 Hmm, I read SHOULD and MAY in the spec, assuming that it was not
 MUST. Are you saying it is invalid if I do not use the SOAP URIs for
 the namespaces? If so, no big deal, I'll change them.

AFAICS you can _leave_out_ the namespace, but not put in another,
nonconforming namespace.

 As for defining the namespaces, yea that's easy enough, just tack on an 
 attribute.
 
 I still don't see where putting the field definitions in the soap header 
 is an invalid use of that space.

It is not strictly nonconforming, just not the intended use of
transparently adding new info:

  4.2 SOAP Header

  SOAP provides a flexible mechanism for extending a message in a
  decentralized and modular way without prior knowledge between the
  communicating parties. Typical examples of extensions that can be
  implemented as header entries are authentication, transaction
  management, payment etc.

I.e. the intended use of the *SOAP* Header is *not* defining the structure
of the message, but rather something similar to e-mail (RFC 822)
headers.

The XML way of defining a message is using a DTD, XML Schema, Relax NG
schema or somesuch, either embedded (forbidden for DTDs in SOAP) or
referenced.

Also for me the following:

  The Header element is encoded as the first immediate child element of
  the SOAP Envelope XML element. All immediate child elements of the
  Header element are called header entries.

  The encoding rules for header entries are as follows: 

 1. A header entry is identified by its fully qualified element
name, which consists of the namespace URI and the local name.
All immediate child elements of the SOAP Header element MUST be
namespace-qualified.

describes an element with a full embedded URI, not just a
namespace-qualified tagname, but I may be reading it wrong and the
namespace could be defined at an outer level. But defining the namespace
at the outer level is counterintuitive for cases where the header element
is to be processed and removed by some SOAP intermediary.

Also, this seems to support *not* using the Header for essential structure
definitions:

  4.3.1 Relationship between SOAP Header and Body

  While the Header and Body are defined as independent elements, they
  are in fact related. The relationship between a body entry and a
  header entry is as follows: A body entry is semantically equivalent to
  a header entry intended for the default actor and with a SOAP
  mustUnderstand attribute with a value of 1. The default actor is
  indicated by not using the actor attribute (see section 4.2.2).

This suggests that putting the structure definition as the first Body
element and the data as the second would be equivalent to putting the
structure in the Header.

-
Hannu




Re: [HACKERS] PostgreSQL and SOAP, suggestions?

2003-04-03 Thread Hannu Krosing
[EMAIL PROTECTED] wrote on Thu, 03.04.2003 at 02:01:
 mlw wrote:
  I think you are interpreting the spec a bit too restrictively. The 
  syntax is fairly rigid, but the spec has a great degree of flexibility. 
  I agree that, syntactically, it must work through a parser, but there is 
  lots of room to be flexible.
 
 This is /exactly/ the standard problem with SOAP.
 
 There is enough flexibility that there are differing approaches
 associated, generally speaking, with IBM versus Microsoft whereby it's
 easy to generate SOAP requests that work fine with one that break with
 the other.

Do you know of some:

a) standard conformance tests

b) recommended best practices for being compatible with all mainstream
implementations (I'd guess a good approach would be to generate very
strictly conformant code but accept all that you can, even if it goes
against a pedantic reading of the spec)

-
Hannu




Re: [HACKERS] updateable cursors visibility

2003-03-31 Thread Hannu Krosing
Hiroshi Inoue wrote on Mon, 31.03.2003 at 03:40:
 Tom Lane wrote:
  
  Serializable or not, there is a good case for saying that cursors don't
  see changes made after they are opened, period.  The current
  implementation locks down the cursor's snapshot at DECLARE time. 
 
 It's only because PostgreSQL( or other MVCC based DBMS) is
 easy and natural to implement cursors in such a way. However,
 It isn't a requirement of the SQL standard, IIRC.
 
 As for ODBC, ODBC has the following cursor types about the
 visibility of other changes.
 1) static
It never detects other changes.

This seems the clearest ?

 2) dynamic
It can detect any changes made to the membership, order,
and values of the result set after the cursor is opened.

What would it mean in practice, i.e. if you are on the first row in the
cursor and then update the ORDER BY field so that your row becomes the
last one, will the next FETCH move the cursor past the end?

What happens if the row you are on is removed from the keyset, either
by the current or any other backend?

What about the case when you have moved past the last row, and suddenly
a new row appears which is positioned after the last row ?

What about when you are moving over several rows that have the same
ordering position and then one more appears - should it go before or
after the current position ?

 3) keyset-driven
It always detects changes to the values of rows.

What about _new_ rows, or rows that no longer belong to the keyset?

 4) mixed
A combination of a keyset-driven cursor and a dynamic
cursor.   

Combined in what way ?

 It's not clear to me now how we should realize the above
 type of cursors at server side.

From your short description it is not even clear to me how *exactly*
they should behave.

--
Hannu




Re: [HACKERS] updateable cursors visibility

2003-03-31 Thread Hannu Krosing
Hiroshi Inoue wrote on Mon, 31.03.2003 at 19:08:
  -Original Message-
  From: Hannu Krosing [mailto:[EMAIL PROTECTED] 
  
  Hiroshi Inoue wrote on Mon, 31.03.2003 at 03:40:
  
   2) dynamic
  It can detect any changes made to the membership, order,
  and values of the result set after the cursor is opened.
  
  What would it mean in practice, i.e. if you are on the first row in
  the cursor and then update the ORDER BY field so that your row
  becomes the last one, will the next FETCH move the cursor past the
  end?
 
 No. The row next to the old first row would be FETCHed.

In what way would the _changes_made_to_the_order_ be reflected then ?
 
  What happens if the row you are on is removed from the keyset, either
  by the current or any other backend?
 
 The dynamic cursor doesn't see the row any longer.

It seems to be doable with MVCC - just ;) check for visibility of
underlying tuples at each fetch. At least it does not seem any harder
for MVCC than for other CC methods.

  
   3) keyset-driven
  It always detects changes to the values of rows.
  
  What about _new_ rows, 
 
 It never detects new rows. 

Then I must have misunderstood the "can detect any changes made to the
membership, order, and values" part. I assumed that "any changes" would
also include rows that magically become part of the query, either by
changes in values or by being inserted.

  or rows that no more belong to the keyset ?
 
 They are the same as deleted ones. 

So they are no longer visible to the cursor?

  From your short description it is not even clear for me how *exactly*
  should they behave.
 
 I only intended to illustrate the various types of visibility roughly
 because there was no such reference in this thread.
 
 regards,
 Hiroshi Inoue
 
 




Re: tsearch V2 (Was: Re: [HACKERS] Two weeks to feature freeze)

2003-06-20 Thread Hannu Krosing
The Hermit Hacker wrote on Fri, 20.06.2003 at 08:28:
 On Fri, 20 Jun 2003, Tom Lane wrote:
 
   On Fri, 20 Jun 2003, The Hermit Hacker wrote:
   Is there a strong reason why tsearch isn't in gborg?
 
  I think text search is a pretty important facility that should
  eventually be part of the core distribution.  It's more likely to get
  there from contrib than from gborg ...
 
 Why part of the core distribution, and not just left as a loadable module,
 like it is now?

I remember Tom saying that built-in function calls are a lot faster than
loadable C functions.

If that can be fixed, then it *could* stay loadable.

Also, having built-in full text indexing is very desirable. And I don't
see any competing full-text indexing modules anywhere that are even
nearly as good.

If we had to move something *out* of core in order to get tsearch in,
then I personally would not mind if all geometry types go to gborg, but
I'm sure there are some users who would mind ;)

---
Hannu




Re: [HACKERS] PlPython

2003-06-29 Thread Hannu Krosing
Tom Lane wrote on Mon, 23.06.2003 at 01:29:
 Kevin Jacobs [EMAIL PROTECTED] writes:
  Attached is a patch that removes all of the RExec code from plpython from
  the current PostgreSQL CVS.  In addition, plpython needs to be changed to an
  untrusted language in createlang.
 
 I am inclined to rename plpython to plpythonu, by analogy to pltclu.

...

 Comments?

could we not just make sure that plpython uses python ver < 2.x and use
plpythonu for python versions >= 2.x until a secure RExec solution comes
from Guido and folks?

I guess most plpython users would be much happier with plpython with
some minor limitations due to the older version than with being forced
to use an untrusted pl altogether.

IIRC python 1.5.2 has a perfectly good RExec.

Or is there a requirement that only the latest language versions are used
in pg 7.4 ;)

--
Hannu




Re: [HACKERS] PlPython

2003-06-29 Thread Hannu Krosing
Tom Lane wrote on Mon, 30.06.2003 at 00:18:
 Hannu Krosing [EMAIL PROTECTED] writes:
  could we not just make sure that plpython uses python ver < 2.x and use
  plpythonu for python versions >= 2.x until a secure RExec solution comes
  from Guido and folks?
 
 We'd still have to mark it untrusted, so what's the point?

No we don't! The old version of plpython was perfectly OK when used with
python 1.5.x and will remain so. The RExec security holes were only
introduced with the new class mechanisms in python 2.x.

The patched version which removes RExec (as Python 2.x does not support
it) is the right thing to do FOR PYTHON 2.X, but there is no reason to
remove safe execution when using python 1.5.x.

Thus my proposition for using the old version as plpython and the new
version as plpython-u, but allowing the non-u version to be compiled
only for python v < 2.x.

-
Hannu




Re: [HACKERS] PlPython

2003-06-29 Thread Hannu Krosing
Tom Lane wrote on Mon, 30.06.2003 at 01:21:
 Hannu Krosing [EMAIL PROTECTED] writes:
  The patched version which removes RExec (as Python 2.x does not
  support it) is the right thing to do FOR PYTHON 2.X, but there is
  no reason to remove safe execution when using python 1.5.x.
 
 Who's still using 1.5, I guess is the question?  And are they likely
 to be updating their PG installation when they're not updating Python?

Python is designed such that one can install and use different versions
in parallel - for example, the default directories for libraries are
/usr/lib/python1.5/ and /usr/lib/python2.2/ if you have installed python
1.5.x and 2.2.x; the executables are python2, python2.2 and python1.5,
with plain python being a link to one of these.

I guess that anyone who needs safe Restricted Execution will be using
1.5 at least for that purpose until RExec is fixed in 2.x.

--
Hannu




Re: [HACKERS] PlPython

2003-06-30 Thread Hannu Krosing
Tom Lane wrote on Mon, 30.06.2003 at 06:39:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Tom Lane wrote on Mon, 30.06.2003 at 01:21:
  Who's still using 1.5, I guess is the question?  And are they likely
  to be updating their PG installation when they're not updating Python?
 
  I guess that anyone who needs safe Restricted Execution will be using
  1.5 at least for that purpose until RExec is fixed in 2.x.
 
 I don't find that real compelling ...
 
 The bottom line is that this has to get done.  I have the time to
 convert plpython to untrusted status tomorrow.  I do not have the time,
 the infrastructure, nor the interest to build a conditional setup.
 Unless someone else wants to volunteer to make it happen in a timely
 fashion, untrusted is what it will be.

Fine with me.

Just don't put in any hacks to pretend it is trusted.

---
Hannu





Re: [HACKERS] [GENERAL] PlPython

2003-07-01 Thread Hannu Krosing
elein wrote on Tue, 24.06.2003 at 00:42:

There is a relatively clean hack one can use to convert plpython
functions to plpythonu manually - just rename the language for the
duration of loading the functions; as superuser, do:

update pg_language set lanname = 'plpython' where lanname = 'plpythonu';

LOAD YOUR Pl/Python FUNCTIONS ;

update pg_language set lanname = 'plpythonu' where lanname = 'plpython';

 PS: I've built and tested the plpython patch against
 7.3.2 and am happy it does not affect the features I count
 on. 

As it should.

The untrusted language gives you *more* power, not less.

The untrusted status means that the user has to be trusted to use that
much power.


Hannu



Re: [HACKERS] help needed with yacc/bison

2003-07-01 Thread Hannu Krosing
Oleg Bartunov wrote on Tue, 01.07.2003 at 15:49:
 Hi there,
 
 attached archive contains simple parser demonstrating our
 problem. untar it, make, make test
 
 Good test:
 echo -n '12 34.1234 ...' | ./parser
 INTEGER:'12'
 CHAR:   ' '
 VERSION:'34.1234'
 CHAR:   ' '
 DOT:'.'
 DOT:'.'
 DOT:'.'
 Wrong:
 echo -n '12 34.1234. ...' | ./parser
 INTEGER:'12'
 CHAR:   ' '
 yyerror: syntax error, unexpected CHAR, expecting INTEGER
 
 The problem is recognizing VERSION
 (from gram.y)
 
 version:
 INTEGER DOT INTEGER { $$ = strconcat($1, $3, $2); }
 | version DOT INTEGER  { $$ = strconcat($1, $3, $2); }

removing the line above seems to fix your problem ;)

 ;
 
 For last query '34.1234.' we want to print VERSION '34.1234' and
 return DOT.

you can't return DOT as version is <str> and DOT is <opr>

 This is just a test example; actually we know a workaround
 for this case, but we need something simple and universal :)

Please describe the problem with some more samples, as that will make it
easier to see which kind of universal solution you are searching for ;)


Hannu




Re: [HACKERS] help needed with yacc/bison

2003-07-02 Thread Hannu Krosing
Oleg Bartunov wrote on Wed, 02.07.2003 at 11:39:

   The problem is recognizing VERSION
   (from gram.y)
  
   version:
   INTEGER DOT INTEGER { $$ = strconcat($1, $3, $2); }
   | version DOT INTEGER  { $$ = strconcat($1, $3, $2); }
 
  removing the line above seems to fix your problem ;)
 
 No, it's there by intention. VERSION could be not just 7.3 but 7.3.3 :)

Try attached gram.y and lex.l

---
Hannu

%{
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#include "scan.h"

char * strconcat( char *s1, char *s2, char c ) {
    char *s;
    int l1 = strlen(s1), l2=strlen(s2);

    s = (char*)malloc( l1+l2+2 );
    memcpy(s, s1, l1);
    if ( c )
        *(s+l1) = c;
    memcpy( (s+ (l1+((c)?1:0)) ) , s2, l2);
    *( s+(l1+l2+((c)?1:0)) ) = '\0';

    return s;
}

int yyerror(char*);
int yylex(void);
int yyparse(void);

%}
%error-verbose

%union {
    char       *str;
    u_int32_t   opr;
}

%type <str> version
%type <str> integer
%type <opr> symbol
%type <opr> dot

%token <str> VERSION_MINOR
%token <str> INTEGER
%token <opr> DOT
%token <opr> CHAR
%token <str> ENDOF

%%

input:
    | input data
    ;

data:   ENDOF   { return 0; }
    | symbol
    | dot
    | version { printf("VERSION:\t'%s'\n", $1); }
    | integer
    ;

version:
    INTEGER VERSION_MINOR    { $$ = strconcat($1, $2, '-'); }
    | version VERSION_MINOR  { $$ = strconcat($1, $2, '+'); }
    ;

integer:
    INTEGER { printf("INTEGER:\t'%s'\n", $1); }
    ;

symbol:
    CHAR { printf("CHAR:\t'%c'\n", $1); }
    ;

dot:
    DOT   { printf("DOT:\t'%c'\n", $1); }
    ;

%%

int yyerror(char *s) {
    printf("yyerror: %s\n",s);
    return 0;
}

int main(void) {
    yyparse();
    return 0;
}

%{
#include <sys/types.h>
#include <string.h>
#include "scan.h"

%}

%option 8bit
%option never-interactive
%option nounput
%option noyywrap


%%

\.[0-9]+    { yylval.str=strdup(yytext+1); return VERSION_MINOR; }

[0-9]+  { yylval.str=strdup(yytext); return INTEGER; }

\.  { yylval.opr=(u_int32_t)*yytext; return DOT;  }

.   { yylval.opr=(u_int32_t)*yytext; return CHAR; }

<<EOF>> { return ENDOF; }

%%





[HACKERS] postmaster startup failure

2003-07-16 Thread Hannu Krosing
When running PostgreSQL 7.3.3-1 (from rpms) on RedHat 9.0 I got the
following in the logs, and the postmaster will not start up.

Any ideas what I could do to start it up?

This is on a laptop used for development, but I would still like to
avoid an initdb.


postmaster successfully started
LOG:  database system shutdown was interrupted at 2003-07-17 00:42:29
EEST
LOG:  checkpoint record is at 0/304E76A8
LOG:  redo record is at 0/304E76A8; undo record is at 0/0; shutdown
FALSE
LOG:  next transaction id: 3981836; next oid: 4003572
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/304E76E8
LOG:  ReadRecord: unexpected pageaddr 0/2C504000 in log file 0, segment
48, offset 5259264
LOG:  redo done at 0/30503FDC
PANIC:  XLogWrite: write request 0/30504000 is past end of log
0/30504000
LOG:  startup process (pid 2445) was terminated by signal 6
LOG:  aborting startup due to startup process failure

-
Hannu




Re: [HACKERS] postmaster startup failure

2003-07-17 Thread Hannu Krosing
Tom Lane wrote on Thu, 17.07.2003 at 19:49:
 Ugh.  The reason we hadn't seen this happen in the field was that it is
 a bug I introduced in a patch two months ago :-(
 
 7.3.3 will in fact fail to start up, with the above error, any time the
 last record of the WAL file ends exactly at a page boundary.  I think
 we're gonna need a quick 7.3.4 ...
 
 If you want a source patch for 7.3.3, here it is.

Thanks!

---
Hannu




Re: [HACKERS] TODO items

2003-08-14 Thread Hannu Krosing
Tom Lane wrote on Fri, 08.08.2003 at 16:56:
 Bruce Momjian [EMAIL PROTECTED] writes:
  o Add optional textual message to NOTIFY
 
 Not done, but there is room in the FE/BE protocol now for something like
 this.

Were there any other changes to NOTIFY - there was talk about making
NOTIFY use some other structure instead of ordinary PG tables in the
backend.
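
For what it's worth, the TODO item quoted above eventually materialized
as a payload argument to NOTIFY (PostgreSQL 9.0+ syntax, shown for
reference):

NOTIFY my_channel, 'textual message';
-- or, in function form (handy for non-constant channel names/payloads):
SELECT pg_notify('my_channel', 'textual message');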

--
Hannu




Re: [HACKERS] encoding question

2003-08-14 Thread Hannu Krosing
Christopher Kings-Lynne wrote on Thu, 07.08.2003 at 04:33:
 My other question is we play around with bytea fields to escape nulls and
 chars < 32 and stuff so that when someone browses the table, they get
 '\000unknown\000...', etc.

Actually bytea *stores* char(0); you get \000 or \x0 or @ or whatever,
depending on what you use for displaying it.

The escaping is done only to fit the data into an SQL statement when
inserting the data into the database; SELECT returns the bytes straight
from the bytea.

   However, are there other field types for which
 we have to do this?  Can you put nulls and stuff in text/varchar/char
 fields?

No, null bytes are not allowed in text/varchar fields.
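
A minimal sketch of the escaping described above (table and column names
are made up):

CREATE TABLE bt (b bytea);
-- the string parser reduces \\000 to \000, which the bytea input
-- routine then turns into a single zero byte:
INSERT INTO bt VALUES ('ab\\000cd');
SELECT length(b) FROM bt;   -- returns 5: the zero byte really is stored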

-
Hannu




Re: [HACKERS] Speeding up operations

2003-08-14 Thread Hannu Krosing
Rahul_Iyer wrote on Wed, 13.08.2003 at 08:23:
 Hi...
 I'm on a project using Postgres. The project involves, at times, up to
 5,000,000 inserts. I was checking the performance of Postgres for 5M inserts
 into a 2 column table (one col=integer, 2nd col=character). I used the
 Prepare... and execute method, so I basically had 5M execute statements and
 1 prepare statement. Postgres took 144min for this... is there any way to
 improve this performance? If so, how? BTW, I'm using it on a SPARC/Solaris
 2.6.

If you are inserting into an empty table with a primary key (or other
constraints), you can run ANALYZE on that table 1-2 minutes after you
have started the INSERTs, so that the constraint-checking logic will do
the right thing (use the index for the pk).

In my tests I achieved about 9000 inserts/sec by using multiple
inserting frontends and ~100 inserts per transaction (no indexes, 6
columns, 4 processors, 2GB memory, test clients running on the same
computer).
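
A sketch of the batching approach (table and statement names are made up):

PREPARE ins (integer, text) AS INSERT INTO t VALUES ($1, $2);
BEGIN;
EXECUTE ins(1, 'a');
EXECUTE ins(2, 'b');
-- ... ~100 EXECUTEs per transaction keep the per-commit overhead down
COMMIT;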

--
Hannu




Re: [HACKERS] Proper Unicode support

2003-08-14 Thread Hannu Krosing
Oleg Bartunov wrote on Mon, 11.08.2003 at 11:52:
 On Mon, 11 Aug 2003, Peter Eisentraut wrote:
 
  Alexey Mahotkin writes:
 
   AFAIK, currently the codepoints are sorted in their numerical order.  I've
   searched the source code and could not find the actual place where this is
   done.  I've seen executor/nodeSort.c and utils/tuplesort.c.  AFAIU, they
   are generic sorting routines.
 
  PostgreSQL uses the operating system's locale routines for this.  So the
  sort order depends on choosing a locale that can deal with Unicode.
 
 
 sort order works, but upper/lower are broken.

I think that the original MB/Unicode support was made for the Japanese
language/characters, and AFAIK they don't even have the concept
(problem) of upper/lower case.

A question to the core - are there any plans to rectify this for less
fortunate languages/charsets?

Will the ASCII-speaking core tolerate the potential loss of performance
from locale-aware upper/lower ?

Will this be considered a feature or a bugfix (i.e. should we attempt to
fix it for 7.4) ?

---
Hannu




Re: [HACKERS] New array functions

2003-08-28 Thread Hannu Krosing
Joe Conway wrote on Thu, 28.08.2003 at 21:15:
 Greg Stark wrote:
  Specifically I want to know how to replace my int_array_aggregate(int) and
  int_array_enum(_int) calls.
 
 I have no idea what those are -- are they from contrib?
 
 You can create an aggregate to turn arbitrary datatype elements into 
 arrays like this:
 
 CREATE AGGREGATE array_aggregate
 (
BASETYPE = anyelement,
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}'
 );

Any idea of the performance - is this array_aggregate(anyelement)
faster, slower, or about the same as int_array_aggregate(int)?
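
For reference, a usage sketch for the aggregate defined above (table and
column names are made up):

SELECT dept, array_aggregate(emp_id)
FROM employees
GROUP BY dept;   -- one row per dept, with an integer[] of emp_ids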

 If int_array_enum() is supposed to take '{1,2,3}' and produce three 
 rows, that function was proposed but rejected. Subsequently Peter 
 Eisentraut pointed out a SQL99 syntax that does this, but I did not get 
 it done for 7.4. Perhaps for 7.5.

So we've got to keep intagg at least until 7.5 ...

---
Hannu




Re: [HACKERS] ALTER TABLE

2003-08-29 Thread Hannu Krosing
Rod Taylor wrote on Fri, 29.08.2003 at 03:31:
 I've been fiddling away on ALTER TABLE and have a few questions about a
 possible datatype change. Just alter portions.
 
 I presume I'll need to do a table rewrite. What is the best way to
 change a single datum?  heap_modify() takes a single relation type where
 I will need to deal with different types.  Simply build a new tuple with
 old datums (easy enough)? Do we care about OIDs being renumbered?

AFAIK alter table change column should do the equivalent of

alter table x add column temp_name newdatatype;
update x set temp_name = convert(name);
alter table x drop column name;
alter table x rename column temp_name to name;

This should not renumber OIDS.

 ALTER TABLE test ADD CHECK(col  4),
   add column bob integer default 2 not null,
   add column bob2 serial check(bob2 = 255),
   drop column col2 cascade;

or with your combined syntax:

alter table x
add column temp_name newdatatype = convert(current_name),
drop column current_name,
rename column temp_name to current_name;
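
For the record, this is roughly the form the feature eventually took
(PostgreSQL 8.0+ syntax):

alter table x alter column name type newdatatype using convert(name);

which does the rewrite without an explicitly added-and-renamed temporary
column.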

---
Hannu




Re: [HACKERS] Preliminary notes about hash index concurrency (long)

2003-09-01 Thread Hannu Krosing
Tom Lane wrote on Mon, 01.09.2003 at 15:41:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I know someone reported a problem with the hash indexes (data loss,
  serious)--- was that a new 7.4 but or something that has existed for a
  long time?
 
 AFAICT the bug's been there since Berkeley days.

One could check how BSDDB (http://www.sleepycat.com) handles these
issues. It is reported to have started as btree/hash index code
extracted from an early version of postgres, so perhaps one could at
least get some ideas there, though their locking / concurrency control
is probably much different.

--
Hannu




Re: [HACKERS] Win32 native port

2003-09-04 Thread Hannu Krosing
Joerg Hessdoerfer wrote on Thu, 04.09.2003 at 10:22:
 I'm currently in the process of setting up my development environment (how the 
 heck do I get bison/flex to compile under MingW/MSYS? Oh my...)

there is a precompiled bison in the MinGW filelist
http://www.mingw.org/download.shtml#hdr2

dunno about flex ;(

 , and then 
 I'll go adventuring in the code.

--
Hannu




Re: [HACKERS] [PATCHES] Warning for missing createlang

2003-09-07 Thread Hannu Krosing
Andrew Dunstan wrote on Sat, 06.09.2003 at 16:14:
 Peter Eisentraut wrote:
 
 Tom Lane writes:
 
   
 
 There are good security arguments not to have it in the default install,
 no?
 
 
 
 I think last time the only reason we saw was that dump restoring would be
 difficult.  I don't see any security reasons.
 
 
 That could be overcome by doing a 'drop language' before running your 
 restore, couldn't it?

or to have CREATE OR REPLACE LANGUAGE (like we have for FUNCTIONS).

---
Hannu




Re: [HACKERS] PostgreSQL not ACID compliant?

2003-09-22 Thread Hannu Krosing
Heikki Tuuri wrote on Sun, 21.09.2003 at 12:51:
 Tom,
 
 - Original Message - 
 From: Tom Lane [EMAIL PROTECTED]
 To: Heikki Tuuri [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Sunday, September 21, 2003 10:32 AM
 Subject: Re: [HACKERS] PostgreSQL not ACID compliant?
 
 
  Heikki Tuuri [EMAIL PROTECTED] writes:
   if you set the transaction isolation level SERIALIZABLE in MySQL/InnoDB,
   then InnoDB uses next-key locking in every SELECT, and transactions
 really
   are serializable in the mathematical sense.
 
  My understanding is that next-key locking only helps when all your
  predicates are point or range searches against an available b-tree
  index.
 
 all SQL queries are performed through index searches.

Does that mean that an index is always needed for the predicate column,
or is this an abstract index in some relational algebra sense?

  That is why next-key
 locking enforces serializability. IBM researcher C. Mohan has written papers
 about next-key locking. Next-key locking is an approximation of predicate
 locking. We simply lock more to make sure the 'predicates' themselves are
 locked.
 
  While that might cover many practical cases, it can hardly
  be called a complete solution to the problem of serializability.
 
 It is a complete solution.

Is this solution complete only for the MAX() case, or is this a general
solution working for things like AVG or STDDEV and perhaps all
user-defined aggregates as well?

 Another approximation of predicate locking is
 table level locking, a solution which Oracle used some 15 years ago, if you
 switched it on the SERIALIZABLE isolation level.

Table-level locking seems to be a complete solution indeed, just not
concurrent at all. It may be that we have to forget concurrency to get
complete and general serializability ;( 

Or is next-key locking something more than a solution for getting
continuous nextval()'s?

--
Hannu



Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)

2003-09-29 Thread Hannu Krosing
Andreas Pflug wrote on Mon, 29.09.2003 at 12:04:
 Christopher Kings-Lynne wrote:
 
  You could just as easily argue that the lack of integrity testing at 
  data load time was equally a bug.
 
  I think we need someway of telling postgres to suppress a foreign key 
  check.
 
  The main problem is that the foreign key column is often not indexed. 
 
 So a db designer made a bloody mistake.
 The problem is there's no easy way to find out what's missing.
 I'd really like EXPLAIN to display all subsequent triggered queries 
 also, to see the full scans caused by missing indexes.

It could probably be done for EXPLAIN ANALYZE (by actually tracing
execution), but then you would see really _all_ queries, i.e. for a 1000
row update you would see 1 UPDATE query and 1000 fk checks ...

OTOH, you can probably get that already from the logs with the right
logging parameters.

-
Hannu




Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)

2003-09-29 Thread Hannu Krosing
Shridhar Daithankar wrote on Mon, 29.09.2003 at 13:34:
 On Monday 29 September 2003 15:58, Christopher Kings-Lynne wrote:
  So a db designer made a bloody mistake.
  The problem is there's no easy way to find out what's missing.
  I'd really like EXPLAIN to display all subsequent triggered queries
  also, to see the full scans caused by missing indexes.
  
   It could probably be doable for EXPLAIN ANALYZE (by actually tracing
   execution), but then you will see really _all_ queries, i.e. for a 1000
   row update you would see 1 UPDATE query and 1000 fk checks ...
  
   OTOH, you probably can get that already from logs with right logging
   parameters.
 
  Actually - it shouldn't be too hard to write a query that returns all
  unindexed foreign keys, surely?
 
 Correct me if I am wrong, but I remember postgresql throwing an error
 that the foreign key field was not unique in the foreign table.
 Obviously it cannot detect that without an index. Either a primary key
 or a unique constraint would need an index.
 
 What am I missing here?
 
 
 IOW, how do I exactly create foreign keys without an index?

hannu=# create table pkt(i int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
'pkt_pkey' for table 'pkt'
CREATE TABLE
hannu=# create table fkt(j int references pkt);
NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY
check(s)
CREATE TABLE
hannu=#

Now the *foreign key* column (fkt.j) is without an index. As foreign
keys are enforced both ways, this can be a problem when changing table
pkt or bulk-creating FKs on big tables.
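
As for the query Christopher suggests above, a rough sketch against the
7.3+ system catalogs, covering only single-column foreign keys (shown
for illustration, not tested against every catalog corner case):

SELECT c.conrelid::regclass AS fk_table, a.attname AS fk_column
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid
                   AND a.attnum = c.conkey[1]
WHERE c.contype = 'f'
  AND NOT EXISTS (SELECT 1 FROM pg_index i
                  WHERE i.indrelid = c.conrelid
                    AND i.indkey[0] = c.conkey[1]);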


Hannu




Re: [HACKERS] 7.4 status

2003-09-30 Thread Hannu Krosing
Bruce Momjian wrote on Tue, 30.09.2003 at 02:16:
 Tom Lane wrote:
  
   !  Faster regular expression code
  
  We could tout more functionality too, since the new regex package
  has a lot of advanced stuff that wasn't there before.
 
 Added more powerful
 

This wording nicely covers possible incompatibilities too ;)

--
Hannu



Re: [HACKERS] Weird locking situation

2003-10-02 Thread Hannu Krosing
Tom Lane wrote on Thu, 02.10.2003 at 17:30:
 Christopher Kings-Lynne [EMAIL PROTECTED] writes:
  What is going on here?  Surely getting a FOR UPDATE row lock should 
  prevent another process getting an update lock?

 The behavior you describe would certainly be a bug, but you'll have to
 show a reproducible example to convince me it wasn't pilot error.  One
 idea that springs to mind is that maybe additional rows with id=1 were
 inserted (by some other transaction) between the SELECT FOR UPDATE and
 the UPDATE?

Perhaps he was looking for key locking, so that "select ... where
key=1 for update" would also prevent inserts where key=1?


Hannu





Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread Hannu Krosing
James Rogers wrote on Thu, 02.10.2003 at 20:50:

 To give a real world example, a standard query on one of our tables that
 has not been CLUSTER-ed recently (i.e. within the last several days)
 generates an average of ~2,000 cache misses.  Recently CLUSTER-ed, it
 generates ~0 cache misses on average.  Needless to say, one is *much*
 faster than the other. 

So what you really need is the CLUSTER command to leave pages half-empty
and the tuple placement logic on inserts/updates to place new tuples
near the place where they would be placed by CLUSTER. I.e. the code that
does actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even every second
page empty, which is better as it creates fewer dirty buffers) could
also significantly speed up updates on huge numbers of tuples, as the
code could then always select a place near the old one and thus avoid
needless head movements between reading and writing areas.

 In my case, not only does CLUSTER-ing increase the number of concurrent
 queries possible without disk thrashing by an integer factor, but the
 number of buffers touched on a query that generates a cache misses is
 greatly reduced as well.  The problem is that CLUSTER-ing is costly and
 index-organizing some of the tables would reduce the buffer needs, since
 the index tuple in these cases are almost as large as the heap tuples
 they reference.

True, but my above suggestion would be much easier to implement
near-term. It seems to be a nice incremental improvement; it just needs
to touch these places:

1. selecting where new tuples go : 

  * updated ones go near old ones if not clustered and near the place
CLUSTER would place them if clustered. 

  * inserted ones go to the less than half-empty pages if not clustered
and near the place CLUSTER would place them if clustered. 

2. making the reorganization code (CLUSTER and VACUUM FULL) leave space 
in pages for clustered updates/inserts.

The "half" above could of course mean anything from 10% to 95%,
depending on access patterns.
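
For what it's worth, this idea later appeared in PostgreSQL as the
fillfactor storage parameter, which CLUSTER honors when rewriting the
table (modern syntax, shown only to illustrate the knob; names are made
up):

CREATE TABLE t (id int, ts timestamp) WITH (fillfactor = 50);
CREATE INDEX t_id_ts_idx ON t (id, ts);
CLUSTER t USING t_id_ts_idx;  -- leaves each page half-empty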

-
Hannu




Re: [HACKERS] Index/Function organized table layout

2003-10-03 Thread Hannu Krosing
James Rogers wrote on Thu, 02.10.2003 at 23:44:
 On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
  So what you really need is the CLUSTER command to leave pages half-empty
  and the tuple placement logic on inserts/updates to place new tuples
  near the place where they would be placed by CLUSTER. I.e. the code that
  does actual inserting should be aware of CLUSTERing.
 
 
 Not exactly. What you are describing is more akin to partitioning or
 hash-organized tables i.e. sorting insert/update tuples to various pages
 according to some hash function.

What I actually thought I was describing is how CLUSTER should work in a
postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
whole feature.

 B-Tree organized tables basically make the table and the primary index
 the same thing, and tend to be most useful for tables that use a single
 multi-column primary key index for queries.  This has the effect of
 putting all the tuples for a typical query in the same small number of
 heap pages (and therefore buffers), allowing very efficient access in
 the typical case with the caveat that non-indexed queries will be quite
 slow.

AFAICS we could resolve this problem (querying indexes only) by keeping
a copy of visibility info (tmin,tmax,...) in index tuples. This would
make index updates bigger and thus slower, so this should be optional.

If you then put all fields in the primary key, then the main table could
be dropped. If there is no data table then no other indexes would be
allowed, or they must be double-indexes referencing the primary key,
not the tuple, and thus even bigger ...

 B-Tree organized tables are particularly useful when the insert order is
 orthogonal to the typical query order.  As I mentioned before, tables
 that store parallel collections of time-series data are classic
 examples.  Often the data is inserted into the pages in order that can
 roughly be described as (timestamp, id), but is queried using (id,
 timestamp) as the index.  If you have enough ids, you end up with the
 pathological case where you typically have one relevant tuple per page
 for a given query.

But if we had clustered the table on (id, timestamp), then the data
would be in the right order for queries, if CLUSTER worked well.

 The nuisance would be keeping track of which pages are collecting which
 tuples.  Knowing the CLUSTER index doesn't tell you much about which
 pages would currently be a good place to put a new tuple.  You could
 always markup the index that CLUSTER uses to keep track of good
 candidates (plus some additional structures), but the more I think about
 that, the more it looks like a nasty hack.

Yeah, index-organized tables seem an exact fit for your problem, but
then my abstract idea of what clustering should do is exactly that -
keep tuples in roughly the same order as an index ;)

So what is really needed is a smart tuple-placer which can keep tuples
that are close (as defined by the index) together in a small number of
pages. These pages themselves need not be contiguous; they can be
sprinkled around the whole data table, but they need to stay clusters
of index-close tuples.
 

Hannu




Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-04 Thread Hannu Krosing
Mike Mascari wrote on Sat, 04.10.2003 at 06:32:

 
 2) The query language should be computationally complete. The user
 should be able to author complete applications in the language, rather
 than the language being a sublanguage.

To me it seems like requiring that one should be able to author complete
programs in regex.

Yes, when all you have is a hammer everything looks like a nail ;)


Hannu




Re: [HACKERS] Using backslash in query

2003-10-04 Thread Hannu Krosing
Tom Lane wrote on Fri, 03.10.2003 at 18:34:
 Michael Brusser [EMAIL PROTECTED] writes:
  But when I doubled the number of backslashes:
SELECT smth. FROM tbl WHERE situation LIKE '%\\\\%';
  - it actually worked fine.
 
 Backslash is special to both the string-literal parser and the LIKE code.
 So when you write the above, the pattern value that arrives at the LIKE
 processor has one less level of backslashing:
   %\\%
 and the LIKE processor interprets this as percent, a literal backslash,
 and another percent.

Regarding the dollar-quoting discussions -

Will we be able to write the above query as 

SELECT smth. FROM tbl WHERE situation LIKE $$%\\%$$;

in 7.4 or is \ still special there ?

If it is, then one \ in a regex in plpython still needs to be entered
as \\\\ which has some geek coolness but would not be what I'd prefer
to do on a regular basis.
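
For the record, dollar quoting as it eventually shipped (in 8.0, not
7.4) does pass backslashes through untouched, so the answer became yes:

SELECT 'a\\b';    -- the string parser eats one level: the value is a\b
SELECT $$a\b$$;   -- no backslash processing inside dollar quotes: also a\b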


Hannu




Re: [HACKERS] Index/Function organized table layout (from Re:

2003-10-04 Thread Hannu Krosing
Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
 [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
  That's one of the draw back of MVCC.  
  I once suggested that the transaction number and other house keeping
  info be included in the index, but was told to forget it...
  It would solve once and for all the issue of seq_scan vs index_scan.
  It would simplify the aggregate problem.
 
 It would only simplify _one_ case, namely the case where someone cares
 about the cardinality of a relation, and it would do that at
 _considerable_ cost.
 
 A while back I outlined how this would have to be done, and for it to
 be done efficiently, it would be anything BUT simple.  

Could this be made a TODO item, perhaps with your attack plan?
Of course as a strictly optional feature, useful only for special
situations (see below).

I cross-post this to [HACKERS] as it seems relevant to a problem recently
discussed there.

 It would be very hairy to implement it correctly, and all this would
 cover is the single case of SELECT COUNT(*) FROM SOME_TABLE;

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes. 

 If you had a single WHERE clause attached, you would have to revert to
 walking through the tuples looking for the ones that are live and
 committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for walking through the
tuples in the data relation.

The typical use case cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id) order but queries in
(id,timestamp) order. Now if the index included all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on the
index only, touching just a few pages, thus vastly improving
performance. I agree that this is not something everybody needs, but
when it is needed it is needed badly.
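
A sketch of the access pattern being described (names are made up):

CREATE INDEX series_idx ON series (id, ts, data1, data2);
-- with visibility info kept in the index, this could be answered from
-- series_idx alone, in (id, ts) order, touching only a few index pages:
SELECT data1, data2 FROM series WHERE id = 42 ORDER BY ts;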

 And it still begs the same question, of why the result of this query
 would be particularly meaningful to anyone.  I don't see the
 usefulness; I don't see the value of going to the considerable effort
 of fixing this purported problem.

Being able to do fast count(*) is just a side benefit.


Hannu




Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function

2003-10-04 Thread Hannu Krosing
Tom Lane wrote on Sat, 04.10.2003 at 19:07:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
  A while back I outlined how this would have to be done, and for it to
  be done efficiently, it would be anything BUT simple.  
 
  Could this be made a TODO item, perhaps with your attack plan. 
 
 If I recall that discussion correctly, no one including Christopher
 thought the attack plan was actually reasonable.
 
 What this keeps coming down to is that an optimization that helps only
 COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
 development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that faster count(*)'s are just a
side effect. If we could (conditionally) keep visibility info in
indexes, then this would also solve the much trickier problem of
index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to
go to the actual data table for visibility info. The much more needed
case would be the inverse time series type of queries, which would
otherwise thrash cache pages badly.


Hannu




Re: [HACKERS] Index/Function organized table layout

2003-10-05 Thread Hannu Krosing
Greg Stark wrote on Sun, 05.10.2003 at 00:17:

 I've never seen anyone use this feature, and I never seriously considered it
 myself. It sort of has the feel of an antiquated feature that traded too much
 flexibility and abstraction for raw performance on very slow disk hardware. 

Read "A Conversation with Jim Gray", referenced from this Slashdot
article:
http://slashdot.org/article.pl?sid=03/09/17/1246255&mode=thread&tid=126
for info on how disk drives are slower than ever (relatively), and how
one should treat them as such, especially for large data volumes.

 However I wonder if the nested tables feature doesn't use it under the hood
 though. It seems they would both be useful for the same types of tables.
 
 I'm not sure what this means for Postgres. I'm not sure if Postgres should use
 a different name to avoid confusion and possibly to leave room in the future
 for the possibility of supporting something like this. Or perhaps something
 like this would be useful for Postgres now or in the near future? Or perhaps
 the consensus is as I said, that this is an old idea that no longer gets any
 respect and postgres should just pretend it doesn't exist?

We can't pretend CLUSTER does not exist until we have some better
technology to offer instead.


Hannu






Re: [HACKERS] Architecture Roadmap?

2003-10-07 Thread Hannu Krosing
James Rogers wrote on Tue, 07.10.2003 at 01:48:
 Hi,
 
 I guess what I am wondering
 is whether there is any kind of quasi-official mechanism for putting
 certain features in certain future versions depending on the nature and
 significance of implementing those features and the number of things it
 touches.

I don't know of any (except discussing it on [HACKERS] ;), but I very
much like the way Python does it via PEPs (Python Enhancement
Proposals), see: http://www.python.org/peps/ .

-
Hannu



Re: [HACKERS] http://www.pgsql.com/register/submit.php

2003-10-13 Thread Hannu Krosing
Jean-Michel POURE wrote on Mon, 13.10.2003 at 16:20:
 On Monday 13 October 2003 14:45, Robert Treat wrote:
  Someone could have downloaded the package from sourceforge/freshmeat, so
  I don't think it hurts to have a pointer to postgresql.org.  What might
  be better than a link for the submit page would just be a blurb
  mentioning we're always looking for new case studies from folks who are
  using postgresql, and perhaps point them some place for that. What I'd
  like to see added to the message is a reminder to run initdb...
 
 In France, according to the Loi informatique et libertés, users have a right 
 to access their personal data. As a result, every web form must display a 
 warning which reminds users that they can access their personal data.

Even if no personal data is entered or queried ?

 This is not a potential problem for PostgreSQL.inc, but for Linux vendors who 
 release PostgreSQL (RedHat, Mandrake, etc...) in France. The fine can be 
 quite large. Also, a judge may stop immediately the release of products 
 including PostgreSQL.

Do we need a special NOTICE for French users, telling them "In case you
have entered some personal data you can access it by doing a SELECT
query" after each INSERT/UPDATE, to be legally sound in France?

This could then be automatically enabled by LC_ALL=fr.


Hannu




Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Hannu Krosing
James Rogers wrote on Wed, 15.10.2003 at 11:26:
 On 10/14/03 11:31 PM, James Rogers [EMAIL PROTECTED] wrote:
  
  There is some abstraction in Postgres and the database is well-written, but
  it isn't written in a manner that makes it easy to swap out operating system
  or API models.  It is written to be portable at all levels.  A database
  kernel isn't necessarily required to be portable at the very lowest level,
  but it is vastly more optimizable because you aren't forced into a narrow
  set of choices for interfacing with the operating system.
 
 
 Just to clarify, my post wasn't really to say that we should run out and
 make Postgres use a database kernel type internal model tomorrow.  The point
 of all that was that Oracle does things that way for a very good reason and
 that there can be benefits that may not be immediately obvious.

OTOH, what may be a perfectly good reason for Oracle may not be one for
PostgreSQL.

For me the beauty of OS software has always been the possibility to fix
problems at the right level (kernel, library, language) , and not to
just make workarounds at another level (your application).

So getting some APIs into the kernel for optimizing cache usage or
writeback strategies would be much better than using raw writes and
rewriting the whole thing ourselves.

The newer Linux kernels have several schedulers to choose from; why not
push for choice in other areas as well?

The ultimate database kernel could thus be a custom-tuned Linux kernel
;)

 It is really one of those emergent needs when a database engine gets to a
 certain level of sophistication.  For smaller and simpler databases, you
 don't really need it and the effort isn't justified.  At some point, you
 cross a threshold where not only does it become justified but it becomes a
 wise idea or not having it will start to punish you in a number of different
 ways.  I personally think that Postgres is sitting on the cusp of its a
 wise idea, and that it is something worth thinking about in the future.

This thread reminds me of the Linus/Tanenbaum monolithic vs. microkernel
argument - while microkernels are theoretically better, Linux could
outperform them by having the required modularity at the source level,
and being an open-source project this was enough. It also beat the Mach
kernel by being there, whereas the microkernel-based Mach was too hard
to develop/debug and thus has taken way longer to mature.

--
Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Unicode upper() bug still present

2003-10-20 Thread Hannu Krosing
Tom Lane kirjutas E, 20.10.2003 kell 03:35:
 Oliver Elphick [EMAIL PROTECTED] writes:
  There is a bug in Unicode upper() which has been present since 7.2:
 
 We don't support upper/lower in multibyte character sets, and can't as
 long as the functionality is dependent on ctype.h's toupper()/tolower().
 It's been suggested that we could use wctype.h where available.
 However there are a bunch of issues that would have to be solved to make
 that happen.  (How do we convert between the database character encoding 
 and the wctype representation?  

How do we do it for sorting ?

 How do we even find out what
 representation the current locale setting expects to use?)

Why not use the same locale settings as for sorting (i.e. database
encoding) until we have proper multi-locale support in the backend ?

It seems inconsistent that we do use locale-aware sorts but not
upper/lower.

This is for a UNICODE database using locale et_EE.UTF-8:

ucdb=# select t, upper(t) from tt order by 1;
 t | upper
---+---
 a | A
 s | S
  | 
  | 
  | 
  | 
  | 
  | 
(8 rows)

As you see, the sort order is right, but some characters are converted
and some are not; the result is a complete mess ;(

---
Hannu


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Unicode upper() bug still present

2003-10-20 Thread Hannu Krosing
Tatsuo Ishii kirjutas E, 20.10.2003 kell 15:37:
  Tom Lane kirjutas E, 20.10.2003 kell 03:35:
   Oliver Elphick [EMAIL PROTECTED] writes:
There is a bug in Unicode upper() which has been present since 7.2:
   
   We don't support upper/lower in multibyte character sets, and can't as
   long as the functionality is dependent on ctype.h's toupper()/tolower().
   It's been suggested that we could use wctype.h where available.
   However there are a bunch of issues that would have to be solved to make
   that happen.  (How do we convert between the database character encoding 
   and the wctype representation?  
  
  How do we do it for sorting ?
  
   How do we even find out what
   representation the current locale setting expects to use?)
  
  Why not use the same locale settings as for sorting (i.e. databse
  encoding) until we have a proper multi-locale support in the backend ?
 
 There's absolutely no relationship between database encoding and
 locale. 

How does the system then use locale for sorting and not for upper/lower
?

I would have rather expected the opposite, as lower/upper rules are a
little more locale-independent than collation.

 IMO depending on the system locale is a completely wrong
 design decision and we should go toward for having our own collate
 data.  

I agree completely. We could probably lift something from IBM's ICU.

-
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Unicode upper() bug still present

2003-10-20 Thread Hannu Krosing
Peter Eisentraut kirjutas E, 20.10.2003 kell 21:02:
 Tom Lane writes:
 
  I noticed by chance that glibc has a reentrant locale API that seems
  to allow for efficient access to multiple locales concurrently.  Perhaps
  it would be a reasonable solution to support multiple locales only on
  machines that have this library.  If we have to write our own locale
  support it's likely to be a long time coming :-(
 
 Naturally, I cannot promise anything, but this is at the top of my list
 for the next release.  I already have sorted out the specifications and
 algorithms and collected locale data for most corners of the world, so
 it's just the coding left. 

Have you checked ICU ( http://oss.software.ibm.com/icu/ ) ?

It seems to have all the needed data at least.

 Unfortunately, a real, sustainable fix of this
 situations requires us to start at the very bottom, namely the character
 set conversion interface, then the gettext interface, then the new locale
 library, then integrating the per-column granularity into the
 parser/planer/executor.  So you may be looking at a two-release process.

---
Hannu

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Unicode upper() bug still present

2003-10-21 Thread Hannu Krosing
Karel Zak kirjutas T, 21.10.2003 kell 10:50:
 On Mon, Oct 20, 2003 at 10:58:00PM +0200, Peter Eisentraut wrote:
 
  (Note that I say Unicode a lot here because those people do a lot of
  research and standardization in this area, which is available for free,
  but this does not constrain the result to work only with the Unicode
  character set.)
 
  Why  cannot  do PostgreSQL  as  100%  pure  Unicode system? We  can  do
  conversion  from/to  others  encodings as  client/server  communication
  extension, but  internaly in BE  we can  use only pure  Unicode data. I
  think a lot of things will more simple...

I've heard that some far-east languages have had some issues with 16-bit
Unicode, but 32-bit should have fixed it.

I would also support a move to UNICODE (store as SCSU, process as 16 or
32 bit wchars, i/o as UTF-8) for NCHAR/NVARCHAR/NTEXT and pure 7-bit
byte-value ordered ASCII for CHAR/VARCHAR/TEXT.

But this would surely have some issues with backward compatibility.


Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Unicode upper() bug still present

2003-10-21 Thread Hannu Krosing
Tatsuo Ishii kirjutas T, 21.10.2003 kell 12:07:
   Why  cannot  do PostgreSQL  as  100%  pure  Unicode system? We  can  do
   conversion  from/to  others  encodings as  client/server  communication
   extension, but  internaly in BE  we can  use only pure  Unicode data. I
   think a lot of things will more simple...
 
 Please don't do that. There's a known issue of round trip conversion
 between Unicode and other encodings 

Are these unsolvable even in theory ?

-
Hannu


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-21 Thread Hannu Krosing
Anthony W. Youngman kirjutas P, 19.10.2003 kell 21:24:

 
 As soon as a requirement for a database specifies extraction of the
 maximum power from the box, it OUGHT to rule out all the current
 relational databases. MV flattens it for it for performance. As an MV
 programmer, I *KNOW* that I can find any thing I'm looking for (or find
 out it doesn't exist) with just ONE disk seek.

Relational or not, this requires either an in-memory index or a perfect
hash.
BTW, how do you find the oldest red elephant with just one disk seek?

as in SQL:

select * from elephants where colour = 'red' order by age desc limit 1;

 A relational programmer
 has to ask the db does this exist and hope the db is optimised to be
 able to return the result quickly. To quote the Pick FAQ SQL optimises
 the easy task of finding stuff in memory. Pick optimises the hard task
 of getting it into memory in the first place.

SQL by itself optimises nothing: by definition it evaluates full cross
products and then compares all rows with predicates.

Some SQL implementations do optimise a little ;)

 Relational is all about theory and proving things mathematically
 correct. MV is all about engineering and getting the result.

Or perhaps just getting _the_ result ;) 

getting some other result will probably need another MV database ;)

 Unless you can use set theory to predict the future,

Isn't this what PostgreSQL's optimiser does ?

--
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-23 Thread Hannu Krosing
Marshall Spight kirjutas N, 23.10.2003 kell 11:01:
 Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
  Just like the academics were
  brainwashed into thinking that microkernels were the be-all and end-all
  - until Linus showed them by practical example that they were all idiots
...
 Linus set out to build a unix kernel workalike, and he chose
 the easiest path, copying architecture from the 1970s, along
 with all the weaknesses that those idiot academics had identified
 years earlier. Since then, his monolithic kernel has gotten a lot
 of marketshare, due to a number of different reasons, none of
 them being architectural superiority.

Unless you count as architectural superiority the fact that it can be
actually written and debugged in a reasonable time.

Being able to mathematically define something as not having certain
weaknesses does not guarantee that the thing can actually be implemented
and/or is usable.

--
Hannu


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] round() function wrong?

2003-10-24 Thread Hannu Krosing
Peter Eisentraut kirjutas R, 24.10.2003 kell 22:16:
 Jochen Westland [invigo] writes:
 
  In my version
  select round(2.5); returns 2;
  select round(2.501) returns 3;
 
  refering to my math professor thats wrong, at least in germany.
  select round(2.5); should return 3
 
 The convention that .5 values should be rounded up is just that, a
 convention. 

Also, which way is up ?

hannu=# select round(0.5);
 round
---
 1
(1 row)
 
hannu=# select round(-0.5);
 round
---
-1
(1 row)


  On systems with IEEE 754 floating point, the default is
 normally to round to the nearest even number.
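
For illustration, here is what one would expect to see (a sketch: the
float8 results assume a platform where round() maps to the C library's
rint() under IEEE 754 round-to-nearest-even, while PostgreSQL's numeric
round goes away from zero):

select round(0.5::float8);  -- expected: 0  (ties go to the even neighbour)
select round(1.5::float8);  -- expected: 2
select round(2.5::float8);  -- expected: 2
select round(2.5::numeric); -- expected: 3  (numeric rounds halves away from zero)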

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PostgreSQL on Novell Netware 6.5.

2003-10-24 Thread Hannu Krosing
Eduardo D Piovesam kirjutas E, 20.10.2003 kell 16:35:
 They stopped at 7.2.4 because they're finishing some usefull APIs,
 which'll make the port much more easy.

Will this involve using a Linux kernel ;)

 When this part is done, a new port will be made with 7.4. With much
 less NetWare specific code and maybe, it'll be accepted by the
 PostgreSQL community.

Sure!


Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] random access - bytea

2003-10-26 Thread Hannu Krosing
Dennis Bjorklund kirjutas P, 26.10.2003 kell 07:30:
 On Sat, 25 Oct 2003, Joe Conway wrote:
 
  That will modify the bytea column so that it is stored uncompressed in 
  the TOAST table.
  
  Now, simply use substr() to grab any random chunk:
  SELECT substr(foo_bytea, 2, 100) from foo where foo_id = 42;
 
 This would imply that every little read would have to do a scan on a table 
 to find the row and then to perform the substr. An open command can 
 be optimized a lot more, for example to cache entries that have been 
 opened so that it's fast to read the next 1kb or whatever you want.
 
 Also, the above does not solve writes at all which can also be made 
 transaction safe and fast with a better api where you can update a part 
 of a field.

I brought it up once and Tom answered that TOAST tables are not
transaction-aware, so you can't update just some parts of toasted
entities - you must always write the whole field. So this will not be
just an API change.

 The above is not really a replacement of the current lo-objects.

True.
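
For reference, a minimal sketch of the approach Joe describes above,
using the table and column names from the quoted example:

-- store the column uncompressed, so substr() can fetch only the needed
-- TOAST chunks instead of detoasting the whole value
ALTER TABLE foo ALTER COLUMN foo_bytea SET STORAGE EXTERNAL;

-- read 100 bytes starting at offset 2
SELECT substr(foo_bytea, 2, 100) FROM foo WHERE foo_id = 42;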

-
Hannu


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Experimental patch for inter-page delay in VACUUM

2003-11-02 Thread Hannu Krosing
Tom Lane kirjutas P, 02.11.2003 kell 20:00:
 Jan Wieck [EMAIL PROTECTED] writes:
  I am currently looking at implementing ARC as a replacement strategy. I 
  don't have anything that works yet, so I can't really tell what the 
  result would be and it might turn out that we want both features.
 
 It's likely that we would.  As someone (you?) already pointed out,
 VACUUM has bad side-effects both in terms of cache flushing and in
 terms of sheer I/O load.  Those effects require different fixes AFAICS.
 
 One thing that bothers me here is that I don't see how adjusting our
 own buffer replacement strategy is going to do much of anything when
 we cannot control the kernel's buffer replacement strategy.  

At least for open-source/free OSes it would probably be possible to
persuade kernel developers to give the needed control to userspace apps.

So taking over all RAM is not the only option ;)

 To get any
 real traction we'd have to go back to the take over most of RAM for
 shared buffers approach, which we already know to have a bunch of
 severe disadvantages.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] ADD/DROP INHERITS

2006-06-07 Thread Hannu Krosing
Ühel kenal päeval, K, 2006-06-07 kell 15:33, kirjutas Greg Stark:
 Tom Lane [EMAIL PROTECTED] writes:
 
  I thought we had agreed that the semantics of ADD INHERITS would be to
  reject the command if the child wasn't already suitable to be a child
  of the parent.  Not to modify it by adding columns or constraints or
  whatever.  For the proposed uses of ADD INHERITS (in particular,
  linking and unlinking partition tables) an incompatibility in schema
  almost certainly means you made a mistake, and you don't really want
  the system helpfully fixing your table to match the parent.
 
 I didn't see any discussion like that and I find it pretty surprising.

I'm pretty sure that what was discussed was just attaching/detaching
child tables into inheritance chains with no table alterations.

Maybe it was never mentioned explicitly, but that was how I understood
the discussion.

 Personally I would have agreed. For partitioned tables you certainly don't
 want it to create new columns without warning you.

Exactly!

 But that's entirely inconsistent with the way inherited tables work in
 general. It seems to go against the grain of Postgres's general style to
 implement just the use case that's useful for a particular application rather
 than keep the features logically consistent with each other. 

There are too many conflicting definitions of logically consistent, so
doing the bare minimum is the best way to avoid the whole problem.

 Perhaps there should be an option when issuing the ADD INHERITS to indicate
 whether you want it to create new columns or only match existing columns. That
 would also give me a convenient excuse to skip all those NOTICEs about merging
 column definitions.

nonono! the whole pg inheritance/partitioning thing is still quite
low-level and ADD/DEL INHERITS is the wrong place to start fixing it.

 Actually I think in the long term for partitioned tables Postgres will have to
 implement a special syntax just like Oracle and other databases. The user
 doesn't really want to have to manually manage all the partitions as tables.
 That imposes a lot of extra work to have to define the tables with the right
 syntax, maintain the constraints properly, etc.

Yes. Maybe. But this is something that requires much more thought and
planning than adding the simplest possible ADD/DELETE INHERITS.

 For the user it would be better to have a single property of the partitioned
 table that specified the partition key. Then when adding a partition you would
 only have to specify the key range it covers, not write an arbitrary
 constraint from scratch. Nor would you have to create an empty table with the
 proper definition first then add it in.

Don't try to solve too many problems at once. Starting with just the
possibility to move suitable ready-made partitions in and out of an
inheritance chain solves a really big problem. No need to try to
obfuscate it with extra functionality, at least not initially.
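
As a sketch, the intended usage pattern would look something like this
(using the syntax discussed in this thread; the final syntax may differ,
and the table names are hypothetical):

-- detach a partition that is no longer active
ALTER TABLE sales_2006_05 DROP INHERITS sales;

-- attach a ready-made, pre-filled partition
ALTER TABLE sales_2006_06 ADD INHERITS sales;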

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] How to avoid transaction ID wrap

2006-06-07 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-06-06 kell 13:53, kirjutas Christopher
Browne:
  We have triggers that fire is something interesting is found on insert.
  We want this thing to run for a log time.
  From the numbers, you can see the PostgreSQL database is VERY loaded.
  Running VACUUM may not always be possible without losing data.
 
  why ? just run it with very friendly delay settings.
 
 Friendly delay settings can have adverse effects; it is likely to
 make vacuum run on the order of 3x as long, which means that if you
 have a very large table that takes 12h to VACUUM, vacuum delay will
 increase that to 36h, which means you'll have a transaction open for
 36h.
 
 That'll be very evil, to be sure...

Not always. I know that it is evil in a slony1 context, but often it
*is* possible to design your system in a way where a superlong
transaction is almost unnoticeable. 

Long transactions are evil in case they cause some fast-changing table
to grow its storage size by several orders of magnitude, but if that is
not the case then they just run there in the background with no ill
effects, especially do-nothing transactions like vacuum.
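
For example, "friendly delay settings" here means the cost-based vacuum
delay parameters; a sketch with illustrative values (not tuning advice),
where bigtable is a hypothetical name:

-- make vacuum sleep periodically instead of saturating IO
SET vacuum_cost_delay = 20;   -- ms to sleep once the cost limit is reached
SET vacuum_cost_limit = 200;  -- accumulated cost units between sleeps
VACUUM bigtable;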

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] More on inheritance and foreign keys

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-08 kell 08:38, kirjutas Albert Cervera
Areny:
 Hi,
   after starting this thread 
 http://archives.postgresql.org/pgsql-hackers/2006-05/msg00222.php, I thought 
 I'd finally go for making foreign keys my own way instead of trying to patch 
 PostgreSQL. However, I've realized that managing foreign keys with my own 
 PL/SQL or C function isn't possible as I need DEFERRED checks which are 
 currently only available for foreign keys.

Remember that you must manage both ends of a foreign key, and you have
to lock the other table while changing values at either end.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] How to avoid transaction ID wrap

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, K, 2006-06-07 kell 17:45, kirjutas Jim C. Nasby:

 Plus, if the only issue here is in fact the long-running transaction for
 vacuum, there's other ways to address that which would be a lot less
 intrusive than doing something like going to 64 bit XIDs. IIRC, in 8.2
 vacuum will start a new transaction every time it fills up
 maintenance_work_mem, so just setting that low could solve the problem
 (at the expense of a heck of a lot of extra IO).

If the aim is to *only* avoid transaction wraparound, then maybe we
could introduce VACUUM FREEZE ONLY, which never removes any old tuples
but instead just marks them by setting xmin=xmax, in addition to its
freezing of live-and-visible-to-all tuples.

This would avoid touching the indexes at all and may well be what is
desired for tables with only very few updates/deletes.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] ADD/DROP INHERITS

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-08 kell 09:32, kirjutas Greg Stark:
 Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:
 
  Imho the op should only choose that path if he wants to fill the table
  before adding the inheritance. It makes no sense to add columns with default
  values to existing rows of the child table, especially when you inherit the
  defaults from the parent.
 
 We already have ALTER TABLE ADD COLUMN working for columns with defaults, so I
 think that horse has left the barn. 

Do you mean that in newer versions ALTER TABLE ADD COLUMN will change
existing data without asking me ?

That would be evil!

Even worse if ALTER TABLE ALTER COLUMN SET DEFAULT would do the same.

Soon we will be as bad as MS Word !

 It was awfully annoying for users when that feature was missing. 
 Any non-linearities in the user interface like this
 end up being surprises and annoyances for users.

I would be *really*, *really*, *really* annoyed if an op that I expect
to take less than 1 sec takes 5 hours and then forces me to spend
another 10 hours on VACUUM FULL+REINDEX or CLUSTER to get performance
back.

And making such radical changes even between major versions should be
strictly forbidden.

 In any case there's a separate problem with defaults. We want to guarantee
 that you can DROP a partition and then re-ADD it and the result should be a
 noop at least from the user's perspective.

If DROP partition keeps defaults and ADD does not change them, then
DROP+ADD is a NOOP.

 We can't do that unless I
 compromise on my idea that adding a child after the fact should be equivalent
 to creating it with the parent in the definition.
 
 When creating a table with the parent in the definition CREATE TABLE will copy
 the parent's default if the default in the child is NULL:
 
 postgres=# create table b (i integer default null) inherits (a);
 NOTICE:  merging column i with inherited definition
 CREATE TABLE
 postgres=# \d b
Table public.b
  Column |  Type   | Modifiers 
 +-+---
  i  | integer | default 2
 Inherits: a
 
 
 The problem is that it's possible to fiddle with the defaults after the table
 is created, including dropping a default. If you drop the default and then
 DROP-ADD the partition it would be a problem if the default magically
 reappeared.

Sure. It should not magically reappear.

 The only way to allow DROP then ADD to be a noop would be to accept whatever
 the DEFAULT is on the child table without complaint. And I'm not just saying
 that because it's the easiest for me to implement :)

Exactly. That would be the correct behaviour, even for a NULL default.

 This is already a factor for NOT NULL constraints too. When adding a parent
 after the fact your NULLable column can magically become NOT NULL if the
 parent is NOT NULL. But for adding a partition after the fact we can't just
 change the column to NOT NULL because there may already be NULL rows in the
 table.

Constraints should match; that is, a child table should already have all
the constraints of the parent, but may have more.

 We could do a pass-3 check for the NOT NULL constraint but if we're not doing
 other schema changes then it makes more sense to just refuse to add such a
 table.

Nonono! The ADD/DROP INHERITS should not do any data checking, just a
comparison of metainfo. The partitions could be huge and having to check
the data inside them would negate most of the usefulness of ADD/DROP
INHERITS.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] How to avoid transaction ID wrap

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-08 kell 12:09, kirjutas Tom Lane:
 Hannu Krosing [EMAIL PROTECTED] writes:
  If the aim is to *only* avoid transaction wraparound, then maybe we
  could introduce VACUUM FREEZE ONLY; which never removes any old tuples,
  but instead just marks them by setting xmin=xmax for them, in addition
  to its freezing of live-and-visible-to-all tuples.
 
  This would avoid touching indexes at all and may well be what is desired
  for tables with only very little updates/deletes.
 
 Seems like useless complexity.  If there are so few dead tuples that you
 can afford to not reclaim them, then there are so few that reclaiming
 them isn't really going to cost much either ...

It will cost 1 full scan per index, which can be quite a lot of disk
read traffic if the indexes are not otherwise being used, say when most
access is local to some hotspot.

   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] ADD/DROP INHERITS

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-08 kell 11:42, kirjutas Greg Stark:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  Do you mean that in newer versions ALTER TABLE ADD COLUMN will change
  existing data without asking me ?
  
  That would be evil!
  
  Even worse if ALTER TABLE ALTER COLUMN SET DEFAULT would do the same.
 
 postgres=# alter table test add b integer default 1;
 ALTER TABLE
 postgres=# select * from test;
  a | b 
 ---+---
  0 | 1
 (1 row)
 
   It was awfully annoying for users when that feature was missing. 
   Any non-linearities in the user interface like this
   end up being surprises and annoyances for users.
  
  I would be *really*, *really*, *really* annoyed if an op that I expected
  to take less than 1 sec takes 5 hours and then forces me to spend
  another 10 hours on VACUUM FULL+REINDEX or CLUSTER to get performance
  back.
 
 I forget whether the developer managed to get it working without doing any
 table rewriting. In theory the table just needs to know that records that are
 missing that column in the null bitmap should behave as if they have the
 default value. But I seem to recall some headaches with that approach.

I remember that discussion, but I'm surprised that something got
implemented and accepted into core with so many unsolvable
problems/logical inconsistencies/new pitfalls.

For example, to be consistent one should also make ALTER TABLE ALTER
COLUMN col SET DEFAULT x change each default value, no? But how should
one know it for records which have been updated, possibly in columns
newer than the one with the changed DEFAULT? Or was a new default bitmap
introduced in addition to the null bitmap?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] ADD/DROP constraints

2006-06-08 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-08 kell 16:30, kirjutas Greg Stark:
 On a separate note. The one major remaining piece here is in constraints. I'm
 thinking what I have to check is that every constraint present on the parent
 table is present on the child tables. And I'm thinking I should do that by
 looking at the constraint's textual definition (consrc).
 
 This doesn't allow you to get by with a single stronger constraint -- you
 would still need the redundant looser constraint to satisfy the inheritance.

You could find some infrastructure for stronger-constraint recognition
in the constraint exclusion code, if you want to go that way.

 But it does let you get by with constraint names that don't match the
 parent's.
 
 I'm not sure that's such a good thing, since pg_dump would then generate a
 redundant constraint when it generates the table. Maybe that would go if
 constraints got conislocal and coninh.

Currently pg_dump generates all constraints with an ONLY clause anyway.

But I agree that we should get rid of ONLY for ADD CONSTRAINT once we
disallow dropping inherited constraints.
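
For reference, this is the kind of statement pg_dump emits today (the
table and constraint names are hypothetical):

ALTER TABLE ONLY invoices
    ADD CONSTRAINT invoices_amount_check CHECK (amount > 0);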

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] ADD/DROP INHERITS

2006-06-11 Thread Hannu Krosing
Ühel kenal päeval, L, 2006-06-10 kell 13:06, kirjutas Greg Stark:

 But perhaps it's just worth those downsides to keep DROP/ADD a noop in more
 cases.

I don't think that keeping DROP/ADD a noop in more cases is a goal
here.

It may be a kind of semi-useful metric of design goodness, but never a
goal in itself.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Rethinking stats communication mechanisms

2006-06-18 Thread Hannu Krosing
Ühel kenal päeval, P, 2006-06-18 kell 15:09, kirjutas Tom Lane:
 Magnus Hagander [EMAIL PROTECTED] writes:
  Might it not be a win to also store per backend global values in the
  shared memory segment? Things like time of last command, number of
  transactions executed in this backend, backend start time and other
  values that are fixed-size?

One thing that is doable in constant-size memory and would be enormously
useful for us is counting top-level function calls and storing their
total (and possibly also max) duration.

The reason being that our production databases are accessed by clients
using functions only (with some uninteresting exceptions, of course),
that is, calls of the form SELECT x,y,z FROM myfunc(i,j,k).

So reserving N*1.5 slots (N being the number of functions defined at
database startup) would be OK. If more than N*0.5 new functions are
defined during the database's lifetime, then the rest are simply ignored
(not counted).

Or maybe a better approach would be to have a conf variable
number-of-functions-to-track and a special boolean flag track_me in
pg_functions. That way you don't accidentally run out of shared memory
by defining lots of new functions and then restarting the cluster.
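
A sketch of how those knobs might look; both the GUC and the flag are
hypothetical, merely mirroring the proposal above:

-- hypothetical GUC: reserve shared-memory slots for this many functions
SET number_of_functions_to_track = 1000;

-- hypothetical per-function opt-in flag
UPDATE pg_functions SET track_me = true WHERE proname = 'myfunc';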


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Getting rid of extra gettimeofday() calls

2006-06-19 Thread Hannu Krosing
Ühel kenal päeval, E, 2006-06-19 kell 11:17, kirjutas Tom Lane:
 As of CVS tip, PG does up to four separate gettimeofday() calls upon the
 arrival of a new client command.  This is because the statement_timestamp,
 stats_command_string, log_duration, and statement_timeout features each
 independently save an indication of statement start time.  Given what
 we've found out recently about gettimeofday() being unduly expensive on
 some hardware, this cries out to get fixed.  I propose that we do
 SetCurrentStatementStartTimestamp() immediately upon receiving a client
 message, and then make the other features copy that value instead of
 fetching their own.
 
 Another gettimeofday() call that I would like to get rid of is the one
 currently done at the end of statement when stats_command_string is
 enabled: we record current time when resetting the activity_string to
 IDLE. 

Is it just IDLE or also IDLE in transaction ?

If we are going to change things anyway, I'd like the latter to show the
time since the start of the transaction, so that I would at least have
an easy way to write a transaction timeout script :)


I don't really care about what plain IDLE uses.
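
A sketch of such a timeout script, assuming the 8.1-era pg_stat_activity
columns (procpid, current_query, query_start) and assuming the timestamp
shown for IDLE in transaction were changed to the transaction start time
as suggested:

-- list backends whose transaction has been idle for over an hour
SELECT procpid
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
   AND query_start < now() - interval '1 hour';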

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-22 kell 09:59, kirjutas Mark Woodward:
  After a long battle with technology, [EMAIL PROTECTED] (Mark
  Woodward), an earthling, wrote:
  Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into
  It pointed to *ALL* the versions.
 
 Hmm, OK, then the problem is more serious than I suspected.
 This means that every index on a row has to be updated on every
 transaction that modifies that row. Is that correct?

Yes. 

 I am attaching some code that shows the problem with regard to
 applications such as web server session management, when run, each second
 the system can handle fewer and fewer connections. Here is a brief output:
 
 [EMAIL PROTECTED]:~/pgfoo$ ./footest
 1307 sessions per second, elapsed: 1
 1292 sessions per second, elapsed: 2
 1287 sessions per second, elapsed: 3
 
 1216 sessions per second, elapsed: 25
 1213 sessions per second, elapsed: 26
 1208 sessions per second, elapsed: 27
 
 1192 sessions per second, elapsed: 36
 1184 sessions per second, elapsed: 37
 1183 sessions per second, elapsed: 38
 
 1164 sessions per second, elapsed: 58
 1170 sessions per second, elapsed: 59
 1168 sessions per second, elapsed: 60
 
 As you can see, in about a minute at high load, this very simple table
 lost about 10% of its performance, and I've seen worse based on update
 frequency.  Before you say this is an obscure problem, I can tell you it
 isn't. I have worked with more than a few projects that had to switch away
 from PostgreSQL because of this behavior.

You mean systems that are designed so exactly that they can't take a 10%
performance change?

Or just that they did not vacuum for so long that performance was less
than needed in the end?

BTW, what did they switch to ?

 Obviously this is not a problem with small sites, but this is a real
 problem with an enterprise level web site with millions of visitors and
 actions a day. 

On such a site you should design so that DB load stays below 50% and run
vacuum often; that may even mean running vacuum continuously with no
wait between runs. If you run vacuum with the right settings, the extra
load it adds is small.

 Quite frankly it is a classic example of something that
 does not scale. The more and more updates there are, the higher the load
 becomes. You can see it on top as the footest program runs.

Yes, you understood correctly - the more updates, the higher the load :)

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-22 kell 10:20, kirjutas Jonah H. Harris:
 On 6/22/06, Alvaro Herrera [EMAIL PROTECTED] wrote:
   Hmm, OK, then the problem is more serious than I suspected.
   This means that every index on a row has to be updated on every
   transaction that modifies that row. Is that correct?
 
  Add an index entry, yes.
 
 Again, this is a case for update-in-place.  No need to write an extra
 index entry and incur the WAL associated with it. 

I guess that MySQL on its original storage engine does that, but they
allow only one concurrent update per table and no transactions.

 Imagine a table
 with 3 indexes on it... I would estimate that we perform at least 3 to
 6 times more overhead than any commercial database on such an update.

One way to describe what commercial databases do to keep constant
update rates is to say that they either do vacuuming as part of the
update, or they just use locks and force some transactions to wait or
fail/retry.

Depending on exact details and optimisations done, this can be either
slower or faster than PostgreSQL's way, but they still need to do
something to get transactional visibility rules implemented.

   There has to be a more linear way of handling this scenario.
 
  So vacuum the table often.
 
 It's easy to say VACUUM often... but I'd bet that vacuuming is going
 to lessen the throughput in his tests even more; no matter how it's
 tuned.

Running VACUUM often/continuously will likely keep his update rate
fluctuations within a corridor of maybe 5-10%, at the cost of 1-2% extra
load. At least if vacuum is configured right and the server is not
already running at 100% IO saturation, in which case it will be worse.

The max throughput figure is not something you actually need very often
in production. What is interesting is setting up the server so that you
can service your loads comfortably. Running the server at 100% load is
not anything you want to do on a production server. There will be things
you need to do anyway and you need some headroom for that.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-22 kell 12:41, kirjutas Mark Woodward:

  Depending on exact details and optimisations done, this can be either
  slower or faster than postgresql's way, but they still need to do
  something to get transactional visibility rules implemented.
 
 I think they have a different strategy. I think they maintain the notion
 of current version of a row, and hunt for previous versions when needed,
 at least that's how I suspect Oracle does it with redo logs.

Not current but last :)

And one side effect of redo logs is that it is practically impossible to
do large deletes on production databases. So you design around that,
like you have to design around the limitations of MVCC.

There has to be a more linear way of handling this scenario.
  
   So vacuum the table often.
 
  It's easy to say VACUUM often... but I'd bet that vacuuming is going
  to lessen the throughput in his tests even more; no matter how it's
  tuned.
 
  Running VACUUM often/continuously will likely keep his update rate
  fluctuatons within a corridor of maybe 5-10%, at the cost of 1-2% extra
  load. At least if vacuum is configured right and the server is not
  already running at 100% IO saturation, in which case it will be worse.
 
 Assuming the table is a reasonable size, the I/O required for vacuum
 doesn't kill everything else!

I have solved the problem of unnecessary IO by keeping active and
finished rows in separate tables, with a finish() function moving the
row between tables (a sketch follows below).

When the number of actively modified rows is only in the tens or low
hundreds of thousands (i.e. the modified set fits in memory), the
continuous vacuum process shows up as just another backend, not really
taking an order of magnitude more resources. It mainly generates WAL
traffic, as the modified pages are already in memory/cache and are
mostly synced by the background writer and/or checkpoints.

Of course you have to adjust vacuum_cost_* variables so as to not
saturate IO.
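
A minimal sketch of that active/finished split (the table names, columns
and the finish() body are all hypothetical):

CREATE TABLE active_sessions   (id int PRIMARY KEY, data text);
CREATE TABLE finished_sessions (id int PRIMARY KEY, data text);

-- move a row from the small, hot table to the archive table
CREATE FUNCTION finish(int) RETURNS void AS $$
    INSERT INTO finished_sessions
        SELECT * FROM active_sessions WHERE id = $1;
    DELETE FROM active_sessions WHERE id = $1;
$$ LANGUAGE sql;

-- only the small, hot table needs the continuous vacuuming
VACUUM active_sessions;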

  The max throughput figure is not something you actually need very often
  in production.
 
 No, but you need to have some degree of certainty and predictability in
 the system you are developing.

Yup. You have to design it so it has.

  What is interesting is setting up the server so that you
  can service your loads comfortably. Running the server at 100% lead is
  not anything you want to do on production server. There will be things
  you need to do anyway and you need some headroom for that.
 
 Of course, you design it so peaks are easily managed, but unless you run
 vacuum continuously, and that has its own set of problems, you run into
 this problem, and it can get really really bad.

Usually it gets really bad if you *don't* run vacuum continuously, maybe
hoping to do it in slower times at night. For a high-update db you have
to run it continuously, maybe having some 5-15 sec pauses between runs.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-06-22 kell 13:49, kirjutas Mark Woodward:
  Christopher Browne [EMAIL PROTECTED] writes:

  Basically there's no free lunch: if you want the benefits of MVCC it's
  going to cost you somewhere.  In the Postgres design you pay by having
  to do VACUUM pretty often for heavily-updated tables.  I don't think
  that decision is fundamentally wrong --- the attractive thing about it
  is that the overhead is pushed out of the foreground query-processing
  code paths.
 
 Under certain circumstances, it is a very poor design. Think of a single
 row table that keeps a scoreboard or a session table that keeps a limited
 number of rows that are updated very frequently.

A single-row table that every session updates is a really bad design on
any database, as it is a sure point of lock contention and thus removes
any possibility of concurrency.

But except for locking problems, it will perform really well when you
vacuum often enough :)

  We still have lots of work to do in making autovacuum
  smarter, avoiding vacuuming parts of relations that have not changed,
  and so on.  But I have no desire to go over to an Oracle-style solution
  instead.  We can't beat them by trying to be like them, and we run no
  small risk of falling foul of some of their patents if we do.
 
 I proposed having a key row entry for each logical row. The key row
 entry points to the latest version of the row. There, each row entry is a
 linked list, in descending order, of previous row versions. 

Do I understand right that you are proposing a redesign of how indexing
works, by updating indexes in place?

How would older rows be found then by transactions needing to see
them ? 

Do you suggest reverting to seqscan when we see _any_ newer
transactions ?

Or if you want to have the index pointing to the latest row for each
value in the indexed field, how would you find the last time this value
was used ?

Don't tell me that you plan to trace the full update-chain on each
update.

Or would this new indexing mechanism be used only for non-changing key
fields ? How would you check for that ?

 The vast majority of the time, the latest version will be the first version. 

Not in a web scenario. In my experience more complicated web pages tend
to produce lots of concurrent accesses.

 It is
 only when you have a previously started long running or concurrent
 transaction will you ever look at previous versions.

 I'm not saying it is an easy slam dunk, as I can think of a few
 difficulties off the top of my head, but it would solve the steady
 degradation of performance between vacuums and, to a possibly lesser
 extent, the cost of updating a row in a heavily indexed table.

VACUUMing often also solves the problem of steady degradation of
performance between vacuums :)

No need to be afraid of vacuum. Vacuum is your friend! Just learn to use
it right.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-06-23 kell 10:30, kirjutas Mark Woodward:

   What is interesting is setting up the server so that you
   can service your loads comfortably. Running the server at 100% lead is
   not anything you want to do on production server. There will be things
   you need to do anyway and you need some headroom for that.
 
  Of course, you design it so peaks are easily managed, but unless you run
  vacuum continuously, and that has its own set of problems, you run into
  this problem, and it can get really really bad.
 
  Usually it gets really bad if you *don't* run vacuum continuously, maybe
  hopeing to do it in slower times at night. For high-update db you have
  to run it continuously, maybe having some 5-15 sec pauses between runs.
 
 And how much I/O does this take?

Surprisingly, it's mostly WAL traffic; the heap/index pages themselves
are often not yet synced to disk by the time of vacuum, so there is no
additional traffic there. If you had made 5 updates per page and then
vacuum it, you effectively make 1 extra WAL write, meaning a 20%
increase in WAL traffic. 

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-06-23 kell 18:05, kirjutas Csaba Nagy:
Usually it gets really bad if you *don't* run vacuum continuously, maybe
hopeing to do it in slower times at night. For high-update db you have
to run it continuously, maybe having some 5-15 sec pauses between runs.
   
   And how much I/O does this take?
  
  Surprisingly its mostly WAL traffic, the heap/index pages themselves are
  often not yet synced to disk by time of vacuum, so no additional traffic
  there. If you had made 5 updates per page and then vacuum it, then you
  make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 
 
 Is this also holding about read traffic ? I thought vacuum will make a
 full table scan... for big tables a full table scan is always badly
 influencing the performance of the box. If the full table scan would be
 avoided, then I wouldn't mind running vacuum in a loop... 

I was referring to a design that keeps frequently updated tuples in a
separate table.

 In fact I think that it would make sense to replace the whole current
 vacuum stuff with a background thread which does that continuously using
 a dead space map. That could be a heap sorted by tuple deletion time,
 and always cleaned up up to the oldest running transaction's start
 time... there would be no need for any other autovacuum then.

This has been on the TODO list for some time already.

 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-24 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-06-23 kell 13:08, kirjutas Tom Lane:
 Csaba Nagy [EMAIL PROTECTED] writes:
  Surprisingly its mostly WAL traffic, the heap/index pages themselves are
  often not yet synced to disk by time of vacuum, so no additional traffic
  there. If you had made 5 updates per page and then vacuum it, then you
  make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 
 
  Is this also holding about read traffic ? I thought vacuum will make a
  full table scan... for big tables a full table scan is always badly
  influencing the performance of the box. If the full table scan would be
  avoided, then I wouldn't mind running vacuum in a loop... 
 
 If you're doing heavy updates of a big table then it's likely to end up
 visiting most of the table anyway, no?  There is talk of keeping a map
 of dirty pages, but I think it'd be a win for infrequently-updated
 tables, not ones that need constant vacuuming.
 
 I think a lot of our problems in this area could be solved with fairly
 straightforward tuning efforts on the existing autovacuum
 infrastructure.  In particular, someone should be looking into
 recommendable default vacuum-cost-delay settings so that a background
 vacuum doesn't affect performance too much.

One thing that would help updates quite a lot in some scenarios is
keeping the pages only partially filled, so that most updates could keep
the new version in the same page. I think that has also been discussed
as an option to vacuum and maybe as part of initial inserts. Maybe some
of it even ended up as a TODO item.
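
For illustration, this idea later surfaced as the fillfactor storage
parameter; a sketch in the 8.2-era syntax, not something available in
the release discussed here, and the table definition is hypothetical:

-- leave 30% of each heap page free so updated versions can stay on-page
CREATE TABLE sessions (id int PRIMARY KEY, data text)
    WITH (fillfactor = 70);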

 Another problem with the
 current autovac infrastructure is that it doesn't respond very well to
 the case where there are individual tables that need constant attention
 as well as many that don't.  If you have N databases then you can visit
 a particular table at most once every N*autovacuum_naptime seconds, and
 *every* table in the entire cluster gets reconsidered at that same rate.
 I'm not sure if we need the ability to have multiple autovac daemons
 running at the same time, 

My patch enabling effective continuous vacuum of fast-update tables,
while still being able to vacuum huge slowly-changing ones, is still not
applied. Without that patch there is no reason to vacuum the small and
fast-changing tables while vacuum on bigger tables is running, as it
won't clean out dead tuples anyway.

 but we definitely could use something with a
 more flexible table-visiting pattern.  Perhaps it would be enough to
 look through the per-table stats for each database before selecting the
 database to autovacuum in each cycle, instead of going by least
 recently autovacuumed.
 
 Bottom line: there's still lots of low-hanging fruit.  Why are people
 feeling that we need to abandon or massively complicate our basic
 architecture to make progress?

Maybe we could start by reusing the index tuples which point to
invisible tuples ? The index is not MVCC anyway, so maybe it is easier
to do in-place replacement there ?

This probably has the same obstacles which have prevented us from
removing those in the first place (removing instead of marking as
invisible). Does it cause some locking issues ? Or does it go against
some other constraints of our index lookups ?

I think that just setting the invisible bit in an index leaf node causes
nearly as much disk IO as removing the node.

If we could delete/reuse old index tuples, it would solve a sizable
chunk of the index-growth problem, especially for cases where the
referenced key value does not change.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-24 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-06-23 kell 17:27, kirjutas Bruce Momjian:
 Jonah H. Harris wrote:
  On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote:
   What I see in this discussion is a huge amount of the grass must be
   greener on the other side syndrome, and hardly any recognition that
   every technique has its downsides and complications.
  
  I'm being totally objective.  I don't think we should abandon
  PostgreSQL's overall design at all, because we do perform INSERTs and
  DELETEs much better than most systems.  However, I've looked at many
  systems and how they implement UPDATE so that it is a scalable
  operation.  Sure, there are costs and benefits to each implementation,
  but I think we have some pretty brilliant people in this community and
  can come up with an elegant design for scalable UPDATEs.
 
 I think the UPDATE case is similar to the bitmap index scan or perhaps
 bitmap indexes on disk --- there are cases we know can not be handled
 well by our existing code, so we have added (or might add) these
 features to try to address those difficult cases.

Not really. Bitmap index scan and bitmap indexes are both new additions
working well within the existing framework. 

While the problem of slowdown on frequent updates is real, the suggested
fix is just plain wrong, as it is based on someone's faulty assumption
of how index lookup works, and a very much simplified view of how
different parts of the system work to implement MVCC.

The original fix he suggested addressed that imagined behaviour and thus
ignored all the real problems of such a change.

All the later suggestions were variations of the first ones, and failed
to address or even research the problems brought up.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-24 Thread Hannu Krosing
Ühel kenal päeval, L, 2006-06-24 kell 15:44, kirjutas Jan Wieck:

  That fixes the symptom, not the problem. The problem is performance
  steadily degrades over time.
  
  No, you got it backwards.  The performance degradation is the symptom.
  The problem is that there are too many dead tuples in the table.  There
  is one way to solve that problem -- remove them, which is done by
  running vacuum.
 
 Precisely.
 
  There are some problems with vacuum itself, that I agree with.  For
  example it would be good if a long-running vacuum wouldn't affect a
  vacuum running in another table because of the long-running transaction
  effect it has.  It would be good if vacuum could be run partially over a
  table.  It would be good if there was a way to speed up vacuum by using
  a dead space map or something.
 
 It would be good if vacuum wouldn't waste time on blocks that don't have 
 any possible work in them. Vacuum has two main purposes. A) remove dead 
 rows and B) freeze xids. Once a block has zero deleted rows and all xids 
 are frozen, there is nothing to do with this block and vacuum should 
 skip it until a transaction updates that block.
 
 This requires 2 bits per block, which is 32K per 1G segment of a heap. 
 Clearing the bits is done when the block is marked dirty. This way 
 vacuum would not waste any time and IO on huge slow changing tables. 
 That part, sequentially scanning huge tables that didn't change much is 
 what keeps us from running vacuum every couple of seconds.

Seems like a plan. 
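
For what it's worth, the arithmetic checks out - a quick sketch, assuming
the default 8 KB block size:

  -- map size for one 1 GB heap segment, at 2 status bits per block
  SELECT (1024 * 1024 * 1024 / 8192)   -- 131072 blocks per segment
         * 2 / 8 AS map_bytes;         -- 262144 bits / 8 = 32768 bytes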

Still, there is another problem which is not solved by the map approach
alone, at least with the current implementation of vacuum.

This is the fact that we need to do a full scan over the index(es) to
clean up pointers to removed tuples. And huge tables tend to have huge
indexes.

As indexes have no MVCC info inside them, it may be possible to start
reusing index entries pointing to rows that are invisible to all running
transactions. Currently we just mark these index entries as dead, but
maybe there is a way to reuse them. This could solve the index bloat
problem for many cases.

Another possible solution for indexes with mostly dead pointers is doing
a reindex, but this will become possible only after we have implemented
a concurrent, non-blocking CREATE INDEX.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sat, 2006-06-24 at 19:36, Bruce Momjian wrote:
 Hannu Krosing wrote:
  On one fine day, Fri, 2006-06-23 at 13:08, Tom Lane wrote:

   
   Bottom line: there's still lots of low-hanging fruit.  Why are people
   feeling that we need to abandon or massively complicate our basic
   architecture to make progress?
  
  Maybe we could start from reusing the index tuples which point to
  invisible tuples ? The index is not MVCC anyway, so maybe it is easier
  to do in-place replacement there ?
  
  This probably has the same obstacles which have prevented us from
  removing those in the first place (removing instead of marking as
  invisible). Does it cause some locking issues ? Or does it go against
  some other constraints of our index lookups ?
  
  I think that just setting the invisible bit in an index leaf node causes
  nearly as much disk io as removing the node.
  
  If we could delete/reuse old index tuples, it would solve a sizable
  chunk of index-growth problem, especially for cases where referenced key
  value does not change.
 
 I think heap _and_ index reuse is the only useful direction.  Index or
 heap reuse alone seems too marginal for the added complexity.

Sure, but index reuse seems a lot easier, as there is nothing additional
to remember or clean out when doing it.

When reusing a heap tuple you have to clean out all index entries
pointing to it.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sun, 2006-06-25 at 14:24, Bruce Momjian wrote:
 Jan Wieck wrote:
   Sure, but index reuse seems a lot easier, as there is nothing additional
   to remember or clean out when doing it.
   
   Yes, seems so.  TODO added:
   
 * Reuse index tuples that point to heap tuples that are not visible to
   anyone?
   
   When reusing a heap tuple you have to clean out all index entries
   pointing to it.
   
   Well, not for UPDATE for no key changes on the same page, if we do that.
   
  
  An update that results in all the same values of every indexed column of 
  a known deleted invisible tuple. This reused tuple can by definition not 
  be the one currently updated. So unless it is a table without a primary 
  key, this assumes that at least 3 versions of the same row exist within 
  the same block. How likely is that to happen?
 
 Good question.  You take the current tuple, and make another one on the
 same page.  Later, an update can reuse the original tuple if it is no
 longer visible to anyone (by changing the item id), so you only need two
 tuples, not three.  My hope is that a repeated update would eventually
  move to a page that has enough free space for two (or more) versions.

I can confirm that this is exactly what happens when running an
update-heavy load with frequent vacuums. Eventually most rows get their
own db pages or share the same page with 2-3 rows. And there will be
lots of unused (filled up, or cleaned and not yet reused) pages.

The overall performance could be made a little better by tuning the
system to not put more than N new rows on the same page at initial
insert, or when a row moves to a new page during update. Currently
several new rows are initially put on the same page and then move around
during repeated updates until they slow(ish)ly claim their own page.
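
This is easy to watch from psql, by the way. An illustrative sketch (the
table and column names are made up):

  create table t (id int primary key, val int);
  insert into t select g, 0 from generate_series(1, 10) g;

  -- repeat a few times, with vacuums in between:
  update t set val = val + 1 where id = 3;
  select ctid, id from t where id = 3;  -- the row wanders between pages
                                        -- until it settles on its own one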

 Does that help explain it?
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sun, 2006-06-25 at 06:52, Mark Woodward wrote:

 I'm not sure why vacuum can't run similarly to the way it does now.

What do you mean ?

Currently vacuum runs a three-step process:

1) runs a full scan over the heap and collects all dead tuple ctids from
the heap.

2) runs full scans over all indexes of the relation and removes any
pointers pointing to dead tuples.

3) runs another full scan over the heap and removes the tuples in the
list collected at step 1.

There are no modifications done to live tuples (OK, they *may* get frozen
if they are above a certain age, but that is not relevant to the current
discussion).

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-26 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 14:56, Martijn van Oosterhout wrote:
 On Mon, Jun 26, 2006 at 07:17:31AM -0400, Bruce Momjian wrote:
  Correct!  We use the same pointers used by normal UPDATEs, except we set
  a bit on the old tuple indicating it is a single-index tuple, and we
  don't create index entries for the new tuple.  Index scan routines will
  need to be taught about the new chains, but because only one tuple in
  the chain is visible to a single backend, the callers should not need to
  be modified.
 
 I suppose we would also change the index_getmulti() function to return
 a set of ctids plus flags so the caller knows to follow the chains,
 right? 

It is probably better to always return the pointer to the head of the CITC
chain (the one an index points to) and do extra visibility checks and
chain-following on each access. This would keep the change internal to the
tuple-fetching functions.

 And for bitmap index scans you would only remember the page in
 the case of such a tuple, since you can't be sure the exact ctid you've
 got is the one you want.

No, you should only use the pointer to the CITC head outside the tuple
access functions. And this pointer to the CITC head is what is always
passed to those access functions/macros.

The VACUUM would run its passes thus:

pass 1: run over the heap, collect pointers to single dead tuples and to
fully dead CITC chains (fully dead = no live tuples on this page). Clean up
old tuples from CITC chains and move live tuples around so that the CITC
head points to the oldest possibly visible (not vacuumed) tuple. Doing it
this way frees us from the need to collect a separate set of pointers for
those. Or have you planned that old tuples from CITC chains are collected
on the go/as needed? Of course we could do both.

pass 2: clean indexes based on ctid from pass 1

pass 3: clean heap based on ctid from pass 1

If you do it this way, you don't need to invent new data structures to
pass extra info about CITC internals to passes 2 and 3.

One more thing - when should the free space map be notified about free
space in pages with CITC chains?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-26 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 09:10, Mark Woodward wrote:
  On one fine day, Fri, 2006-06-23 at 17:27, Bruce Momjian wrote:
  Jonah H. Harris wrote:
   On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote:
What I see in this discussion is a huge amount of the grass must be
greener on the other side syndrome, and hardly any recognition that
every technique has its downsides and complications.
  
   I'm being totally objective.  I don't think we should abandon
   PostgreSQL's overall design at all, because we do perform INSERTs and
   DELETEs much better than most systems.  However, I've looked at many
   systems and how they implement UPDATE so that it is a scalable
   operation.  Sure, there are costs and benefits to each implementation,
   but I think we have some pretty brilliant people in this community and
   can come up with an elegant design for scalable UPDATEs.
 
  I think the UPDATE case is similar to the bitmap index scan or perhaps
  bitmap indexes on disk --- there are cases we know can not be handled
  well by our existing code, so we have added (or might add) these
  features to try to address those difficult cases.
 
  Not really. Bitmap index scan and bitmap indexes are both new additions
  that work well within the existing framework.
 
  While the problem of slowdown on frequent updates is real, the suggested
  fix is just plain wrong, as it is based on someone's faulty assumptions
  about how index lookups work, and a very much simplified view of how
  different parts of the system work together to implement MVCC.
 
 Yes, the suggestion was based on MVCC concepts, not a particular
 implementation.

On the contrary - afaik, it was loosely based on how Oracle does it with
its rollback segments, only assuming that the rollback segments are kept
in the heap and that indexes point only to the oldest row version :p

  The original fix he suggested was aimed at that imagined behaviour and
  thus ignored all the real problems of such a change.
 
 The original suggestion was nothing more than a hypothetical for the
 purpose of discussion.
 
 The problem was the steady degradation of performance on frequent updates.
 That was the point of discussion.  I brought up one possible way to
 start a brainstorm. The discussion then morphed into criticizing the
 example and not addressing the problem.

The problem is heatedly discussed every 3-4 months.

 Anyway, I think some decent discussion about the problem did happen, and
 that is good.

Agreed. 

Maybe this _was_ the best way to bring up the discussion again.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-26 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 10:50, Bruce Momjian wrote:
 Hannu Krosing wrote:
  On one fine day, Mon, 2006-06-26 at 14:56, Martijn van Oosterhout wrote:
   On Mon, Jun 26, 2006 at 07:17:31AM -0400, Bruce Momjian wrote:
Correct!  We use the same pointers used by normal UPDATEs, except we set
a bit on the old tuple indicating it is a single-index tuple, and we
don't create index entries for the new tuple.  Index scan routines will
need to be taught about the new chains, but because only one tuple in
the chain is visible to a single backend, the callers should not need to
be modified.
   
   I suppose we would also change the index_getmulti() function to return
   a set of ctids plus flags so the caller knows to follow the chains,
   right? 
  
  It is probably better to always return the pointer to the head of CITC
  chain (the one an index points to) and do extra visibility checks and
  chain-following on each access. This would keep the change internal to
  tuple fetching functions.
 
 So index_getnext() traverses the chain and returns one member per call. 
 Makes sense.  Just realize you are in a single index entry returning
 multiple tuples.  We will need some record keeping to track that.

Maybe we need to push visibility checks further down, so that
index_getnext() returns only the one heap row that is visible.

   And for bitmap index scans you would only remember the page in
   the case of such a tuple, since you can't be sure the exact ctid you've
   got is the one you want.
  
  No, you should only use the pointer to the CITC head outside the tuple
  access functions. And this pointer to the CITC head is what is always
  passed to those access functions/macros.
  
  The VACUUM would run its passes thus:
  
  pass 1: run over the heap, collect pointers to single dead tuples and to
  fully dead CITC chains (fully dead = no live tuples on this page). Clean
  up old tuples from CITC chains and move live tuples around so that the
  CITC head points to the oldest possibly visible (not vacuumed) tuple.
  Doing it this way frees us from the need to collect a separate set of
  pointers for those. Or have you planned that old tuples from CITC chains
  are collected on the go/as needed? Of course we could do both.
 
 Non-visible CITC members should be freed during an UPDATE on the same
 page, so vacuum doesn't have to be involved.

Ok.

  pass 2: clean indexes based on ctid from pass 1
  
  pass 3: clean heap based on ctid from pass 1
  
  If you do it this way, you don't need to invent new data structures to
  pass extra info about CITC internals to passes 2 and 3.
  
  One more thing - when should the free space map be notified about free
  space in pages with CITC chains?
 
 Uh, well, I am thinking we only free CITC space when we are going to use
 it for an UPDATE, rather than free things while doing an operation.  It
 is good to keep the cleanup overhead out of the main path as much as
 possible.

So vacuum should only remove dead CITC chains and leave the ones with
live tuples to CITC internal use ?

That would also suggest that pages having live CITC chains and less than
N% of free space should not be reported to FSM.

 Also, seems I can't spell algorithms very well:
 
 Definition:  Single-Index-Tuple Chain (SITC)
  -
 Thinking of vacuum, right now it does these cleanups:
 
   o  non-visible UPDATEs on the same page with no key changes
   o  non-visible UPDATEs on the same page with key changes
   o  non-visible UPDATEs on different pages
   o  DELETEs
   o  aborted transactions
 
 The big question is what percentage of dead space is the first one?  My
 guess is 65%.

It can be anywhere from 0% to 99.9%, very much dependent on the application.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-26 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 16:58, Martijn van Oosterhout wrote:
 On Mon, Jun 26, 2006 at 10:50:26AM -0400, Bruce Momjian wrote:
I suppose we would also change the index_getmulti() function to return
a set of ctids plus flags so the caller knows to follow the chains,
right? 
   
   It is probably better to always return the pointer to the head of CITC
   chain (the one an index points to) and do extra visibility checks and
   chain-following on each access. This would keep the change internal to
   tuple fetching functions.
  
  So index_getnext() traverses the chain and returns one member per call. 
  Makes sense.  Just realize you are in a single index entry returning
  multiple tuples.  We will need some record keeping to track that.
 
 Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
 have to change all the users of that (which aren't many, I suppose).
 It's probably worth making a utility function to expand them.
 
 I'm still confused where bitmap index scan fit into all of this. Is
 preserving the sequential scan aspect of these a goal with this new
 setup?

Bitmap index scan does not have to change much - only the function that
gets a tuple by its ctid must be able to trace forward chains within the
page.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
 Jim C. Nasby wrote:
  On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
   
   It is certainly possible to do what you are suggesting, that is have two
   index entries point to same chain head, and have the index access
   routines figure out if the index qualifications still hold, but that
   seems like a lot of overhead.

I think Jim meant not 2 pointing to the same head, but 2 pointing into
the same chain. Say we have a table with (id serial, ts timestamp) where
ts changes at each update and id does not.

So after 3 updates on one page we have one CITC/ITPC head with pointers
from both indexes and two follow-up tuples with pointers from only the
ts index.

The problem with this setup is that we can't reuse any of those
follow-up tuples without index cleanup.
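
Concretely, the case looks like this (an illustrative sketch):

  create table t (id serial primary key, ts timestamp);
  create index t_ts_idx on t (ts);

  -- each update changes ts but not id: the id index can keep pointing at
  -- the chain head, while the ts index needs an entry per version
  update t set ts = now() where id = 1;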

   Also, once there is only one visible row in the chain, removing old
   index entries seems quite complex because you have to have vacuum keep
   the qualifications of each row to figure out which index tuple is the
   valid one (seems messy).
   
  Perhaps my point got lost... in the case where no index keys change
  during an update, SITC seems superior in every way to my proposal. My
  idea (let's call it Index Tuple Page Consolidation, ITPC) would be
  beneficial to UPDATEs that modify one or more index keys but still put
  the tuple on the same page. Where SITC would be most useful for tables
  that have a very heavy update rate and very few indexes, ITPC would
  benefit tables that have more indexes on them; where presumably it's
  much more likely for UPDATEs to change at least one index key (which
  means SITC goes out the window, if I understand it correctly). If I'm
  missing something and SITC can in fact deal with some index keys
  changing during an UPDATE, then I see no reason for ITPC.
 
 I understood what you had said.  The question is whether we want to get
 that complex with this feature, and if there are enough use cases
 (UPDATE with index keys changing) to warrant it.

I'd like to think that most heavily-updated tables avoid that, but there
may still be cases where this is needed.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
On one fine day, Tue, 2006-06-27 at 10:38, Hannu Krosing wrote:
 On one fine day, Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
  Jim C. Nasby wrote:
   On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:

It is certainly possible to do what you are suggesting, that is have two
index entries point to same chain head, and have the index access
routines figure out if the index qualifications still hold, but that
seems like a lot of overhead.
 
 I think Jim meant not 2 pointing to the same head, but 2 pointing into
 the same chain. Say we have a table with (id serial, ts timestamp) where
 ts changes at each update and id does not.
 
 So after 3 updates on one page we have one CITC/ITPC head with pointers
 from both indexes and two follow-up tuples with pointers from only the
 ts index.
 
 The problem with this setup is that we can't reuse any of those
 follow-up tuples without index cleanup.

But we still have to think about similar cases (index entries pointing
inside CITC chains), unless we plan to disallow adding indexes to
tables.

Perhaps that case simply has to disable heap tuple reuse until some
event. What would that event be?

Or maybe we should have some bitmap of dirty tuple ids inside each page -
that is, tuple ids that have index pointers to them - and then avoid
using these?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
On one fine day, Mon, 2006-06-26 at 11:31, Bruce Momjian wrote:
 Hannu Krosing wrote:
pass 3: clean heap based on ctid from pass 1

If you do it this way, you don't need to invent new data structures to
pass extra info about CITC internals to passes 2 and 3.

One more thing - when should the free space map be notified about free
space in pages with CITC chains?
   
   Uh, well, I am thinking we only free CITC space when we are going to use
   it for an UPDATE, rather than free things while doing an operation.  It
   is good to keep the cleanup overhead out of the main path as much as
   possible.
  
  So vacuum should only remove dead CITC chains and leave the ones with
  live tuples to CITC internal use ?
 
 Yes, it has to.  What else would it do?  Add index entries?

No, clean out the dead part.

But this would probably add the page to the FSM - do we want that?

Also, this cleaning should probably be done in pass 1, so we don't have to
carry the ctids of tuples which have no index entries around to passes 2
and 3. This has the downside of possibly writing the heap page twice, so
maybe we don't want it.

  That would also suggest that pages having live CITC chains and less than
  N% of free space should not be reported to FSM.
 
 Parts of the CITC that are not visible can be used for free space by
 vacuum, but the visible part is left alone.
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Single Index Tuple Chain (SITC) method

2006-06-29 Thread Hannu Krosing
On one fine day, Wed, 2006-06-28 at 18:19, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Here is an overview of the SITC method:
  http://momjian.us/cgi-bin/pgsitc
 
 A pretty fundamental problem is that the method assumes it's OK to
 change the CTID of a live tuple (by swapping its item pointer with some
 expired version).  It is not --- this will break:
   * active UPDATEs and DELETEs that may have fetched the CTID
 but not yet completed processing to decide whether to change
 the tuple;
   * pending AFTER ROW triggers, such as foreign key checks;
   * ODBC as well as other applications that assume CTID is a
 usable unique row identifier within transactions.

We should *always* return the ctid of the CITC head, as this is the one
that does not change.

And anyway, ctid is a usable unique row identifier only within read-only
transactions - or is it not?
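
Today's behaviour is easy to demonstrate (an illustrative sketch, with a
made-up table t):

  select ctid from t where id = 1;       -- say it returns (0,1)
  update t set val = val + 1 where id = 1;
  select ctid from t where id = 1;       -- a different ctid after the update

So anything holding a ctid across updates already has to cope with it
changing.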

 VACUUM FULL can get away with moving tuples to new CTIDs because it takes
 AccessExclusiveLock, so there can be no open transactions with knowledge
 of current CTIDs in the table.  This is not OK for something that's
 supposed to happen in plain UPDATEs, though.

Would it still be a problem if we *always* refer to the whole CITC chain
by its externally visible ctid, and look up the real tuple inside the
tuple fetch op at every access?

(1) If we had some special bits for tuples at the CITC chain head and
inside the CITC but not at the head, then even a seqscan could ignore
non-head CITC chain members in its find-next-tuple op and do the real
tuple lookup in some inner function when it hits a CITC head.

Is it correct to assume that only one row version can be in the process
of being modified at any one time?

 Another problem is you can't recycle tuples, nor item ids, without
 taking a VACUUM-style lock on the page (LockBufferForCleanup).  If
 anyone else is holding a pin on the page they risk getting totally
 confused --- for instance, a seqscan will either miss a tuple or scan it
 twice depending on which direction you're juggling item ids around it.

I think (1) above solves this, at the cost of looking twice at
CITC-internal tuple headers.

 The concurrency loss involved in LockBufferForCleanup is OK for
 background-maintenance operations like VACUUM, but I seriously doubt
 anyone will find it acceptable for UPDATE.  It could easily create
 application-level deadlocks, too.  (VACUUM is safe against that because
 it only holds one lock.)


Tom - what do you think of the other related idea, that of reusing dead
index entries ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-29 Thread Hannu Krosing
On one fine day, Tue, 2006-06-27 at 12:16, Bruce Momjian wrote:
 Hannu Krosing wrote:
  On one fine day, Tue, 2006-06-27 at 10:38, Hannu Krosing wrote:
   On one fine day, Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
Jim C. Nasby wrote:
 On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
  
  It is certainly possible to do what you are suggesting, that is 
  have two
  index entries point to same chain head, and have the index access
  routines figure out if the index qualifications still hold, but that
  seems like a lot of overhead.
   
    I think Jim meant not 2 pointing to the same head, but 2 pointing into
    the same chain. Say we have a table with (id serial, ts timestamp) where
    ts changes at each update and id does not.
    
    So after 3 updates on one page we have one CITC/ITPC head with pointers
    from both indexes and two follow-up tuples with pointers from only the
    ts index.
    
    The problem with this setup is that we can't reuse any of those
    follow-up tuples without index cleanup.
  
  But we still have to think about similar cases (index entries pointing
  inside CITC chains), unless we plan to disallow adding indexes to
  tables.
 
 CREATE INDEX has to undo any chains where the new indexed columns change
 in the chain, and add index entries to remove the chain.

Yes, that would be the most straightforward solution.

It could be better in some cases if we could avoid adding entries to
other indexes. Maybe we can just reset some flags, so that some SITC ops,
like finding tuples by the CITC index pointer, still work while adding
new entries won't.

But it will be tricky to make this work for bitmap index scans.

So yes - index build is a slow operation anyway, so making it even a
little slower is probably not a big problem. And as most CITC chains will
have only one visible row at a time, this will probably not be a big
issue.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Single Index Tuple Chain (SITC) method

2006-06-29 Thread Hannu Krosing
On one fine day, Thu, 2006-06-29 at 12:35, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Tom - what do you think of the other related idea, that of reusing dead
  index entries ?
 
 Possibly workable for btree now that we do page-at-a-time index scans;
 however I'm pretty hesitant to build any large infrastructure atop that
 change until we've got more performance results.  We might yet end up
 reverting it.
 
 Another issue is that this would replace a simple hint-bit setting with
 an index change that requires a WAL entry.  There'll be more WAL traffic
 altogether from backends retail-deleting index tuples than there would
 be from VACUUM cleaning the whole page at once --- and it won't cut the
 I/O demand from VACUUM any, either, since VACUUM still has to scan the
 index.  AFAICS this wouldn't make VACUUM either cheaper or less
 necessary, so I'm not sure I see the point.

How can it generate more traffic?

When you replace a dead index entry with a live one, you just reuse
space - you would have to WAL-log the index in both cases (adding a new
entry or replacing a dead entry).

Especially in the case where you replace an index entry with the same
value.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Index corruption

2006-06-29 Thread Hannu Krosing
On one fine day, Thu, 2006-06-29 at 16:42, Chris Browne wrote:
 [EMAIL PROTECTED] (Marc Munro) writes:
  As you see, slony is attempting to enter one tuple
  ('374520943','22007','0') two times.
 
  Each previous time we have had this problem, rebuilding the indexes on
  slony log table (sl_log_1) has fixed the problem.  I have not reindexed
  the table this time as I do not want to destroy any usable evidence.
 
 We have seen this phenomenon on 7.4.8 several times; pulled dumps of
 sl_log_1 and index files that Jan Wieck looked at, which alas hasn't
 led to a fix.
 
 He did, mind you, find some concurrency pattern that led, if memory
 serves, to 7.4.12's release.  We had experienced cases where there was
 some worse corruption that required that we rebuild replicas from
 scratch :-(.

How well did you check the C-language triggers and special slony
functions for possibly corrupting some backend/shared-mem structures ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Index corruption

2006-06-29 Thread Hannu Krosing
On one fine day, Thu, 2006-06-29 at 17:23, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  Tom,
  we have a newer and much smaller (35M) file showing the same thing:
 
 Thanks.  Looking into this, what I find is that *both* indexes have
 duplicated entries for the same heap tuple:
 
...
 However, the two entries in idx1 contain different data!!
 
 What I speculate right at the moment is that we are not looking at index
 corruption at all, but at heap corruption: somehow, the first insertion
 into ctid (27806,2) got lost and the same ctid got re-used for the next
 inserted row.  We fixed one bug like this before ...

Marc: do you have triggers on some replicated tables?

I remember having some corruption in a database with weird circular
trigger structures, some of them being Slony log triggers.

The thing that seemed to mess something up inside there was when a change
on a parent row fired a trigger that changed child table rows, and those
rows fired another trigger that changed the same parent row again.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Index corruption

2006-06-30 Thread Hannu Krosing
On one fine day, Fri, 2006-06-30 at 12:05, Jan Wieck wrote:
 On 6/30/2006 11:55 AM, Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
  On 6/30/2006 11:17 AM, Marko Kreen wrote:
  If the xxid-s come from different DB-s, then there can still be problems.
  
  How so? They are allways part of a multi-key index having the 
  originating node ID first.
  
  Really?
  
  create table @[EMAIL PROTECTED] (
  log_origin  int4,
  log_xid @[EMAIL PROTECTED],
  log_tableid int4,
  log_actionseq   int8,
  log_cmdtype char,
  log_cmddata text
  );
  create index sl_log_1_idx1 on @[EMAIL PROTECTED]
  (log_origin, log_xid @[EMAIL PROTECTED], log_actionseq);
  
  create index sl_log_1_idx2 on @[EMAIL PROTECTED]
  (log_xid @[EMAIL PROTECTED]);
 
 You're right ... forgot about that one. And yes, there can be 
 transactions originating from multiple origins (masters) in the same 
 log. The thing is, the index is only there because in a single origin 
 situation (most installations are), the log_origin is always the same. 
 The optimizer therefore sometimes didn't think using an index at all 
 would be good.
 
 However, transactions from different origins are NEVER selected together 
 and it wouldn't make sense to compare their xid's anyway. So the index 
 might return index tuples for rows from another origin, but the 
 following qualifications against the log_origin in the heap tuple will 
 filter them out.

The problem was not only with returning too many rows, but just as much
with returning too few. In the case where you return too few rows, some
actions will simply be left out of replication and thus will be missing
from the slaves.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] update/insert, delete/insert efficiency WRT vacuum

2006-07-05 Thread Hannu Krosing
On one fine day, Tue, 2006-07-04 at 14:53, Zeugswetter Andreas DCP SD wrote:
   Is there a difference in PostgreSQL performance between these two 
   different strategies:
   
   
   if(!exec(update foo set bar='blahblah' where name = 'xx'))
   exec(insert into foo(name, bar) values('xx','blahblah'); or
 
 In pg, this strategy is generally more efficient, since a pk failing
 insert would create
 a tx abort and a heap tuple. (so in pg, I would choose the insert first
 strategy only when 
   the insert succeeds most of the time (say > 95%))
 
 Note however that the above error handling is not enough, because two
 different sessions
 can still both end up trying the insert (This is true for all db systems
 when using this strategy).

I think the recommended strategy is to first try the UPDATE; if no row was
found, then INSERT; and on a primary key violation from the INSERT, retry
the UPDATE.
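
As a PL/pgSQL sketch (a hypothetical merge_foo function; it assumes
foo.name has a primary key or unique constraint, and the loop handles the
race where another session wins the INSERT):

  create function merge_foo(k text, v text) returns void as $$
  begin
      loop
          update foo set bar = v where name = k;
          if found then
              return;
          end if;
          begin
              insert into foo (name, bar) values (k, v);
              return;
          exception when unique_violation then
              -- another session inserted concurrently; retry the UPDATE
              null;
          end;
      end loop;
  end;
  $$ language plpgsql;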


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] A couple thoughts about btree fillfactor

2006-07-10 Thread Hannu Krosing
On one fine day, Mon, 2006-07-10 at 12:36, Tom Lane wrote:

 3. What should the minimum fillfactor be?  The patch as submitted
 set the minimum to 50% for all relation types.  I'm inclined to
 think we should allow much lower fillfactors, maybe down to 10%.
 A really low fillfactor could be a good idea in a heavily updated
 table --- at least, I don't think we have any evidence to prove
 that it's not sane to want a fillfactor below 50%.

Sure, 50% is way too big as a lower limit. We may even want to have
pages that hold only 1 tuple in heavy-update cases.

So perhaps we should set the minimum to 1% or even 0.1% and apply the same
logic you suggested for btree pages above, that is, stop adding new tuples
when the threshold is reached.
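
For illustration, using the per-relation fillfactor syntax from the patch
under discussion (hypothetical - the final syntax may well differ):

  -- a heavily-updated table where each row should have room to stay put
  create table counters (id int primary key, n int)
      with (fillfactor = 10);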

 Comments?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Warm-Standby using WAL archiving / Seperate

2006-07-11 Thread Hannu Krosing
On one fine day, Tue, 2006-07-11 at 08:38, Andrew Rawnsley wrote:
 Just having a standby mode that survived shutdown/startup would be a nice
 start...

I think that Simon Riggs did some work on this at the code sprint
yesterday.

 I also do the blocking-restore-command technique, which although workable,
 has a bit of a house-of-cards feel to it sometimes.
 

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Max size of a btree index entry

2006-07-11 Thread Hannu Krosing
On one fine day, Tue, 2006-07-11 at 10:46, Josh Berkus wrote:
 Tom,
 
  Obviously a tree containing many such pages would be awfully inefficient
  to search, but I think a more common case is that there are a few wide
  entries in an index of mostly short entries, and so pushing the hard
  limit up a little would add some flexibility with little performance
  cost in real-world cases.
  
  Have I missed something?  Is this worth changing?
 
 Not sure.  I don't know that the difference between 2.7K and 3.9K would 
 have ever made a difference to me in any real-world case.

One (hopefully) soon-to-be real-world case is index-only queries.

We discussed one approach with Luke, and he expressed interest in getting
it actually done in the not too distant future.

 If we're going to tinker with this code, it would be far more valuable 
 to automatically truncate b-tree entries at, say, 1K so that they could 
 be efficiently indexed.

That would not work if we want to get all the data from indexes.

Maybe compressing the keys (like we do for TOAST) would be a better
solution.

 Of course, a quick archives search of -SQL, -Newbie and -General would 
 indicate how popular of an issue this is.

It may become popular again when we are able to do index-only scans.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Three weeks left until feature freeze

2006-07-11 Thread Hannu Krosing
On one fine day, Tue, 2006-07-11 at 17:34, Thomas Hallgren wrote:
 Josh Berkus wrote:
  Tom,
 

  IOW pljava is (already) bigger than the other four PLs put together.
 
  That is odd.  Thomas?
 
 It's not that odd really:
 
 1. the mapping is strongly typed, i.e. each scalar type in PostgreSQL 
 has a set of functions that maps it to the correct primitive in Java 
 (int4 is a java int, double precision is a double etc.). PL/Java will 
 resort to string coercion only when no other option is left.
 2. a type mapping is provided for *all* types. Scalar, composite, 
 pseudo, array types, and result sets.
 3. new Java mappings can be created on the fly. Both for scalar and 
 composite types.
 4. you can create new scalar types in PostgreSQL that uses IO functions 
 written in Java.

Maybe this functionality could be lifted out of PL/Java and made
available to all PLs?

At least at some API level.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] More nuclear options

2006-07-11 Thread Hannu Krosing
On one fine day, Tue, 2006-07-11 at 14:05, Josh Berkus wrote:
 Robert,
 
  Given the current number of projects that have no code / files / anything 
  associated with them on pgfoundry/gborg right now, this argument rings a 
  little hollow. 
 
 If you're so keen to add to the problem, you can have my spot as 
 pgfoundry admin

Why not just make *one* project, called DeadProjects and keep one
tarball + one README.TXT per directory under it, so that in the unlikely
event that someone (pg_necromancer ?) does want to resurrect a dead
project he/she/it has a place to get the code from.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Three weeks left until feature freeze

2006-07-12 Thread Hannu Krosing
On one fine day, Wed, 2006-07-12 at 09:49, Kaare Rasmussen wrote:
  There should be a Procedural Language section on pgfoundry for all of the
  PLs, IMHO, and a README in contrib within core that points to it
  (README.procedural_languages, if nothing else) ...
 
 I thought that the general consensus was that only plpgsql ought to be in 
 core, the rest should be independent projects.

That would be doable if we had a stable language API.

As I understand it, we still don't. And what's more - most of the changes
to the API come from the needs of those (new) languages.

 It would be nice to have an easy way to retrieve and install the desired PL's 
 but that's more of a packaging issue.
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pgsql-patches considered harmful

2006-07-14 Thread Hannu Krosing
On one fine day, Wed, 2006-07-12 at 23:04, Marc G. Fournier wrote:
 On Wed, 12 Jul 2006, Magnus Hagander wrote:
 
  There are list servers out there capable of simply ripping any 
  attachments to a message (possibly over a certain size) and stick it on 
  a website, replacing it with a link in the email. Is majordomo one of 
  them?
 
 Majordomo2 has a 'hook' for it, but, like most OSS software, nobody has 
 had the requirement to actually code it ... any perl experts here 
 interested in doing it?

Does it have to be Perl?

I can do it in Python in an hour or two.

 
 Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
 Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
 Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Three weeks left until feature freeze

2006-07-14 Thread Hannu Krosing
On one fine day, Wed, 2006-07-12 at 17:48, Thomas Hallgren wrote:
 Andrew Dunstan wrote:
  There is in effect no API at all, other than what is available to all 
  backend modules. If someone wants to create an API which will be both 
  sufficiently stable and sufficiently complete to meet the needs of the 
  various PLs (especially, as Hannu rightly observes, any new PLs that 
  come along) then  we can revisit this question. Until then I suggest 
  that it is at best premature. I am not even sure such a thing is 
  actually possible.
  
 I concur with this. The needs for a module like PL/Java is very different 
 than the needs of 
 PL/Perl so let's get some more PL's in before we do a refactoring effort to 
 create common 
 API's. Personally, I'm not sure what would be included. The call handler 
 API's together with 
 the SPI API's are in essence what you need. The rest is fairly specialized 
 anyway.

http://pgfoundry.org/projects/python seems to do something similar to
what you describe for PL/Java. For example, it is using PostgreSQL's
native types and operators for calculations instead of converting types
to the PL's native types.

And it also has lots of code lines ;)

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Forcing wal rotation

2006-07-15 Thread Hannu Krosing
On one fine day, Fri, 2006-07-14 at 17:39, Simon Riggs wrote:
 On Fri, 2006-07-14 at 12:09 -0400, Tom Lane wrote:
  Florian G. Pflug [EMAIL PROTECTED] writes:
   I've now thought about how to fix that without doing that rather crude 
   rsync-pg_xlog-hack.
   I've read through the code, and learned that wal-segments are expected to 
   have a specific size -
   thus rotating them early is not that easy.
  
  Simon was working on a patch for this at the code sprint; I think it's
  submitted to -patches already.  
 
 Slightly different patch. I'm working on this one still.

What is your approach here?

And by any chance, do you plan to backport the standby WAL playback mode
patches to the 8.0 and 8.1 series?

  Explicitly filling the segment as you
  propose would be really bad for performance.
 
 Yes, the current approach I am taking is better than that.

Another thing that was discussed was adding a function to postgres that
could be called to get the current WAL file and offset, so that an
external process could do async WAL copying while the WAL is being
written, instead of copying each segment only when it is finished.

This could reduce the lag of data availability to only (fractions of)
seconds.

Is anyone working on it?
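
The interface could be as simple as this (a purely hypothetical sketch -
no such functions exist today):

  -- hypothetical: report the current WAL insert position
  select pg_current_wal_file(), pg_current_wal_offset();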

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

