Re: [HACKERS] Scheduling in SQL
Tom Lane wrote: I've built applications that do roughly this sort of thing in Postgres (some of the protocol changes in 6.4 were done to make it easier ;-)). I may misremember, but IIRC some older protocol (or at least libpq) returned 0 as backend pid to listening client if it was notified by itself. Currently it returns the actual pid for any backend. Is this what you changed? Anyhow we need some _documented_ way to get backend pid (there is one actually received and stored with "cookie" for Ctrl-C processing, but AFAIK it is neither documented as being the backend id nor is there a function to get at it). For my own use I created a C function pid() but perhaps there should be something mainstream for this. --- Hannu ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Partial index on date column
Christopher Kings-Lynne kirjutas R, 07.03.2003 kell 07:28: Yeah, it's not really a problem for me, I just put the extra clause in. Is indexing excluding NULLs a common application of partial indexes? For me it is ;) It's basically all I use it for, when a column has like 90-95% NULLS and I want to exclude them from the index. Is it worth hard-coding in the IS NOT NULL case? I'd vote for it. Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
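The partial-index idiom discussed here can be sketched outside PostgreSQL as well; the snippet below uses Python's sqlite3 module (SQLite also supports partial indexes) purely for illustration, and the table, column, and index names are invented.

```python
import sqlite3

# Illustrative sketch (invented names): index only the minority of rows
# where the column is NOT NULL, then show that a query repeating the
# index predicate can use the partial index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tag TEXT)")
# Only ~10% of rows carry a non-NULL tag, mirroring the 90-95% NULLs case.
conn.execute("CREATE INDEX t_tag_idx ON t (tag) WHERE tag IS NOT NULL")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, None if i % 10 else "x%d" % i) for i in range(1000)],
)
conn.commit()

# The query restates the IS NOT NULL predicate, so the planner may pick
# the (much smaller) partial index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM t WHERE tag IS NOT NULL AND tag = 'x0'"
).fetchall()
print(plan)
```

The same shape works in PostgreSQL with `CREATE INDEX ... ON t (tag) WHERE tag IS NOT NULL`, which is exactly the "extra clause" mentioned above.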
Re: [HACKERS] PostgreSQL and SOAP, suggestions?
mlw kirjutas E, 31.03.2003 kell 03:43: Given a HTTP formatted query: GET http://localhost:8181/pgmuze?query=select+*+from+zsong+limit+2; The output is entered below. Questions: Is there a way, without specifying a binary cursor, to get the data types associated with columns? Right now I am just using undefined, as the ODBC version works. Anyone see any basic improvements needed? <?xml version="1.0"?> <soap:Envelope xmlns:MWSSQL="http://www.mohawksoft.com/MWSSQL/envelope"> <soap:Header> <!-- Fields in set --> <Columns count="9"> The SOAP 1.1 spec specifies (section 4.2) the following about SOAP Header: The encoding rules for header entries are as follows: 1. A header entry is identified by its fully qualified element name, which consists of the namespace URI and the local name. All immediate child elements of the SOAP Header element MUST be namespace-qualified. I'm not sure that SOAP Header is the right place for query header info, as the header is meant for: SOAP provides a flexible mechanism for extending a message in a decentralized and modular way without prior knowledge between the communicating parties. Typical examples of extensions that can be implemented as header entries are authentication, transaction management, payment etc. So the definition of structure should probably be inside SOAP:Body. --- Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
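For contrast, here is a hedged sketch (in Python, with invented MWSSQL element names) of what a reply that keeps Envelope/Body in the mandatory SOAP 1.1 namespace and moves the column metadata into the Body might look like:

```python
import xml.etree.ElementTree as ET

# Sketch only: Envelope and Body come from the mandatory SOAP 1.1
# namespace; the Resultset/Columns/Column element names under the
# MWSSQL namespace are invented for illustration.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
MWSSQL_NS = "http://www.mohawksoft.com/MWSSQL/envelope"
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("mwssql", MWSSQL_NS)

env = ET.Element("{%s}Envelope" % SOAP_NS)
body = ET.SubElement(env, "{%s}Body" % SOAP_NS)
result = ET.SubElement(body, "{%s}Resultset" % MWSSQL_NS)
cols = ET.SubElement(result, "{%s}Columns" % MWSSQL_NS, count="2")
for name in ("song", "artist"):           # made-up column names
    ET.SubElement(cols, "{%s}Column" % MWSSQL_NS).text = name

xml_text = ET.tostring(env, encoding="unicode")
print(xml_text)
```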
Re: [HACKERS] PostgreSQL and SOAP, suggestions?
[EMAIL PROTECTED] kirjutas E, 31.03.2003 kell 19:52: Actually, as far as I am aware, the header is for metadata, i.e. it is the place to describe the data being returned. Did you read the SOAP spec? The description of the fields isn't the actual data retrieved, so it doesn't belong in the body, so it should go into the header. That is logical, but it is not what the spec says. Also, the spec requires immediate child elements of SOAP:Header to have full namespace URIs. And another question - why do you have the namespace MWSSQL defined but never used? - Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PostgreSQL and SOAP, suggestions?
mlw kirjutas T, 01.04.2003 kell 15:29: Hannu Krosing wrote: [EMAIL PROTECTED] kirjutas E, 31.03.2003 kell 19:52: Actually, as far as I am aware, the header is for metadata, i.e. it is the place to describe the data being returned. Did you read the SOAP spec? yes ??? What you have come up with _is_not_ a SOAP v1.1 message at all. It does use some elements with similar names but from a different namespace. The SOAP Envelope, Header and Body elements must be from namespace http://schemas.xmlsoap.org/soap/envelope/ Per section 3 paragraph 2 of the SOAP spec a conforming SOAP processor MUST discard a message that has an incorrect namespace. <?xml version="1.0"?> <mwssql:Envelope xmlns:mwssql="http://www.mohawksoft.com/mwssql/envelope"> <mwssql:Header> The SOAP-ENV:Header is a generic mechanism for adding features to a SOAP message in a decentralized manner without prior agreement between the communicating parties. SOAP defines a few attributes that can be used to indicate who should deal with a feature and whether it is optional or mandatory (see section 4.2). The SOAP-ENV:Body is a container for mandatory information intended for the ultimate recipient of the message (see section 4.3). SOAP defines one element for the body, which is the Fault element used for reporting errors. The Header element is encoded as the first immediate child element of the SOAP Envelope XML element. All immediate child elements of the Header element are called header entries. The encoding rules for header entries are as follows: 1. A header entry is identified by its fully qualified element name, which consists of the namespace URI and the local name. All immediate child elements of the SOAP Header element MUST be namespace-qualified. ... An example is a header with an element identifier of Transaction, a mustUnderstand value of 1, and a value of 5.
This would be encoded as follows: <SOAP-ENV:Header> <t:Transaction xmlns:t="some-URI" SOAP-ENV:mustUnderstand="1"> 5 </t:Transaction> </SOAP-ENV:Header> <exec:sql>update cgrpairs set ratio=0 where srcitem=100098670</exec:sql> <exec:affected>2657</exec:affected> <qry:sql>select * from ztitles limit 2</qry:sql> <qry:ROWSET> <qry:ROW columns="28"> Where are the namespaces exec:, qry: and t: defined? Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL and SOAP, suggestions?
mlw kirjutas K, 02.04.2003 kell 15:56: Hannu Krosing wrote: What you have come up with _is_not_ a SOAP v1.1 message at all. It does use some elements with similar names but from a different namespace. The SOAP Envelope, Header and Body elements must be from namespace http://schemas.xmlsoap.org/soap/envelope/ [snip] Hmm, I read SHOULD and MAY in the spec, assuming that it was not MUST. Are you saying it is invalid if I do not use the SOAP URIs for the namespaces? If so, no big deal, I'll change them. AFAICS you can _leave_out_ the namespace, but not put in another, nonconforming namespace. As for defining the namespaces, yeah that's easy enough, just tack on an attribute. I still don't see where putting the field definitions in the soap header is an invalid use of that space. It is not strictly nonconforming, just not the intended use of transparently adding new info: 4.2 SOAP Header SOAP provides a flexible mechanism for extending a message in a decentralized and modular way without prior knowledge between the communicating parties. Typical examples of extensions that can be implemented as header entries are authentication, transaction management, payment etc. I.e. the intended use of *SOAP* Header is *not* defining the structure of the message but is rather something similar to e-mail (rfc822) headers. The XML way of defining a message is using a DTD, XML Schema, Relax NG schema or somesuch, either embedded (forbidden for DTDs in SOAP) or referenced. Also for me the following: The Header element is encoded as the first immediate child element of the SOAP Envelope XML element. All immediate child elements of the Header element are called header entries. The encoding rules for header entries are as follows: 1. A header entry is identified by its fully qualified element name, which consists of the namespace URI and the local name. All immediate child elements of the SOAP Header element MUST be namespace-qualified.
describes an element with a full embedded URI, not just a namespace-qualified tagname, but I may be reading it wrong and the namespace could be defined at an outer level. But defining the namespace at the outer level is counterintuitive for cases where the header element is to be processed and removed by some SOAP intermediary. Also this seems to support *not* using Header for essential structure definitions: 4.3.1 Relationship between SOAP Header and Body While the Header and Body are defined as independent elements, they are in fact related. The relationship between a body entry and a header entry is as follows: A body entry is semantically equivalent to a header entry intended for the default actor and with a SOAP mustUnderstand attribute with a value of 1. The default actor is indicated by not using the actor attribute (see section 4.2.2). This suggests that putting the structure definition as the first Body element and the data as the second would be equivalent to putting the structure in the Header. - Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
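The section 3 rule quoted in this thread (a conforming processor MUST discard a message whose Envelope is not in the SOAP namespace) can be sketched in a few lines of Python; the function name here is ours, not anything from the spec:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def check_soap_envelope(xml_text):
    """Return True only if the root element is a SOAP 1.1 Envelope in
    the mandatory namespace; per section 3 a conforming processor MUST
    discard anything else."""
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}Envelope" % SOAP_NS

# A conforming message vs. the look-alike with a private namespace.
good = ('<e:Envelope xmlns:e="http://schemas.xmlsoap.org/soap/envelope/">'
        '<e:Body/></e:Envelope>')
bad = ('<m:Envelope xmlns:m="http://www.mohawksoft.com/mwssql/envelope">'
       '<m:Body/></m:Envelope>')
print(check_soap_envelope(good), check_soap_envelope(bad))
```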
Re: [HACKERS] PostgreSQL and SOAP, suggestions?
[EMAIL PROTECTED] kirjutas N, 03.04.2003 kell 02:01: mlw wrote: I think you are interpreting the spec a bit too restrictively. The syntax is fairly rigid, but the spec has a great degree of flexibility. I agree that, syntactically, it must work through a parser, but there is lots of room to be flexible. This is /exactly/ the standard problem with SOAP. There is enough flexibility that there are differing approaches associated, generally speaking, with IBM versus Microsoft whereby it's easy to generate SOAP requests that work fine with one that break with the other. Do you know of some: a) standard conformance tests b) recommended best practices for being compatible with all mainstream implementations (I'd guess a good approach would be to generate very strictly conformant code but accept all that you can, even if against pedantic reading of the spec) - Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] updateable cursors visibility
Hiroshi Inoue kirjutas E, 31.03.2003 kell 03:40: Tom Lane wrote: Serializable or not, there is a good case for saying that cursors don't see changes made after they are opened, period. The current implementation locks down the cursor's snapshot at DECLARE time. It's only because in PostgreSQL (or other MVCC-based DBMSs) it is easy and natural to implement cursors in such a way. However, it isn't a requirement of the SQL standard, IIRC. As for ODBC, ODBC has the following cursor types regarding the visibility of other changes. 1) static It never detects other changes. This seems the clearest ? 2) dynamic It can detect any changes made to the membership, order, and values of the result set after the cursor is opened. What would it mean in practice, i.e. if you are on the first row in the cursor and then update the ORDER BY field so that your row becomes the last one, will the next FETCH move the cursor past the end ? What happens if the row you are on is removed from the keyset, either by the current or any other backend ? What about the case when you have moved past the last row, and suddenly a new row appears which is positioned after the last row ? What about when you are moving over several rows that have the same ordering position and then one more appears - should it go before or after the current position ? 3) keyset-driven It always detects changes to the values of rows. What about _new_ rows, or rows that no longer belong to the keyset ? 4) mixed A combination of a keyset-driven cursor and a dynamic cursor. Combined in what way ? It's not clear to me now how we should realize the above types of cursors on the server side. From your short description it is not even clear to me how *exactly* they should behave. -- Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
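A toy Python sketch (nothing to do with the actual server code) of the difference between the first two visibility models may help:

```python
# Toy illustration of two of the ODBC visibility models discussed above:
# a static cursor materializes a snapshot at open time, while a dynamic
# cursor re-reads the live result set on every fetch, so concurrent
# deletes and updates become visible to it.
class StaticCursor:
    def __init__(self, rows):
        self.snapshot = list(rows)   # copy taken at DECLARE/open time
        self.pos = 0

    def fetch(self):
        if self.pos >= len(self.snapshot):
            return None
        row = self.snapshot[self.pos]
        self.pos += 1
        return row

class DynamicCursor:
    def __init__(self, rows):
        self.rows = rows             # live reference, no copy
        self.pos = 0

    def fetch(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

table = [1, 2, 3]
s, d = StaticCursor(table), DynamicCursor(table)
s.fetch(); d.fetch()         # both see row 1
table.remove(2)              # a concurrent DELETE
print(s.fetch(), d.fetch())  # static still sees 2, dynamic sees 3
```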
Re: [HACKERS] updateable cursors visibility
Hiroshi Inoue kirjutas E, 31.03.2003 kell 19:08: -Original Message- From: Hannu Krosing [mailto:[EMAIL PROTECTED] Hiroshi Inoue kirjutas E, 31.03.2003 kell 03:40: 2) dynamic It can detect any changes made to the membership, order, and values of the result set after the cursor is opened. What would it mean in practice, i.e. if you are on the first row in the cursor and then update the ORDER BY field so that your row becomes the last one, will the next FETCH move the cursor past the end ? No. The row next to the old first row would be FETCHed. In what way would the _changes_made_to_the_order_ be reflected then ? What happens if the row you are on is removed from the keyset, either by the current or any other backend ? The dynamic cursor doesn't see the row any longer. It seems to be doable with MVCC - just ;) check for visibility of the underlying tuples at each fetch. At least it does not seem any harder for MVCC than for other CC methods. 3) keyset-driven It always detects changes to the values of rows. What about _new_ rows, It never detects new rows. Then I must have misunderstood the can detect any changes made to the membership, order, and values part. I assumed that any changes would also include rows that magically become part of the query by either changes in values or by being inserted. or rows that no longer belong to the keyset ? They are the same as deleted ones. So they are no longer visible to the cursor ? From your short description it is not even clear to me how *exactly* they should behave. I only intended to illustrate the various types of visibility roughly because there was no such reference in this thread. regards, Hiroshi Inoue ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: tsearch V2 (Was: Re: [HACKERS] Two weeks to feature freeze)
The Hermit Hacker kirjutas R, 20.06.2003 kell 08:28: On Fri, 20 Jun 2003, Tom Lane wrote: On Fri, 20 Jun 2003, The Hermit Hacker wrote: Is there a strong reason why tsearch isn't in gborg? I think text search is a pretty important facility that should eventually be part of the core distribution. It's more likely to get there from contrib than from gborg ... Why part of the core distribution, and not just left as a loadable module, like it is now? I remember Tom saying that built-in function calls are a lot faster than loadable C functions. If that can be fixed, then it *could* stay loadable. Also, having built-in full text indexing is very desirable. And I don't see any even nearly as good competing fulltext indexing modules anywhere. If we had to move something *out* of core in order to get tsearch in, then I personally would not mind if all geometry types go to gborg, but I'm sure there are some users who would mind ;) --- Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PlPython
Tom Lane kirjutas E, 23.06.2003 kell 01:29: Kevin Jacobs [EMAIL PROTECTED] writes: Attached is a patch that removes all of the RExec code from plpython from the current PostgreSQL CVS. In addition, plpython needs to be changed to an untrusted language in createlang. I am inclined to rename plpython to plpythonu, by analogy to pltclu. ... Comments? Could we not just make sure that plpython uses python ver < 2.x and use plpythonu for python versions >= 2.x until a secure RExec solution comes from Guido and folks? I guess most plpython users would be much happier with plpython with some minor limitations due to an older version than with being forced to use an untrusted pl altogether. IIRC python 1.5.2 has a perfectly good RExec. Or is there a requirement that only latest language versions are used in pg 7.4 ;) -- Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PlPython
Tom Lane kirjutas E, 30.06.2003 kell 00:18: Hannu Krosing [EMAIL PROTECTED] writes: could we not just make sure that plpython uses python ver < 2.x and use plpythonu for python versions >= 2.x until a secure RExec solution comes from Guido and folks ? We'd still have to mark it untrusted, so what's the point? No we don't! The old version of plpython was perfectly OK when used with python 1.5.x and will stay so. The RExec security holes were only introduced with the new class mechanisms in python 2.x. The version with the patch which removes RExec (as Python 2.x is not supporting it) is the right thing to do FOR PYTHON 2.X, but there is no reason to remove safe execution when using python 1.5.x. Thus my proposition for using the old version as plpython and the new version as plpythonu, but allowing the non-u version to be compiled only for python v < 2.x. - Hannu ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PlPython
Tom Lane kirjutas E, 30.06.2003 kell 01:21: Hannu Krosing [EMAIL PROTECTED] writes: The version with the patch which removes RExec (as Python 2.x is not supporting it) is the right thing to do FOR PYTHON 2.X, but there is no reason to remove safe execution when using python 1.5.x. Who's still using 1.5, I guess is the question? And are they likely to be updating their PG installation when they're not updating Python? Python is designed such that one can install and use different versions in parallel - for example the default directories for libraries are /usr/lib/python1.5/ and /usr/lib/python2.2/ if you have installed python 1.5.x and 2.2.x; also the executables are python2, python2.2 and python1.5, with plain python being a link to one of these. I guess that anyone who needs safe Restricted Execution will be using 1.5 at least for that purpose until RExec is fixed in 2.x. -- Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PlPython
Tom Lane kirjutas E, 30.06.2003 kell 06:39: Hannu Krosing [EMAIL PROTECTED] writes: Tom Lane kirjutas E, 30.06.2003 kell 01:21: Who's still using 1.5, I guess is the question? And are they likely to be updating their PG installation when they're not updating Python? I guess that anyone who needs safe Restricted Execution will be using 1.5 at least for that purpose until RExec is fixed in 2.x. I don't find that real compelling ... The bottom line is that this has to get done. I have the time to convert plpython to untrusted status tomorrow. I do not have the time, the infrastructure, nor the interest to build a conditional setup. Unless someone else wants to volunteer to make it happen in a timely fashion, untrusted is what it will be. Fine with me. Just don't put in any hacks to pretend it is trusted. --- Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] PlPython
elein kirjutas T, 24.06.2003 kell 00:42: There is a relatively clean hack one can use to convert plpython functions to plpythonu manually - just rename the language for the time of loading the functions - do as superuser: update pg_language set lanname = 'plpython' where lanname = 'plpythonu'; LOAD YOUR PL/Python FUNCTIONS; update pg_language set lanname = 'plpythonu' where lanname = 'plpython'; PS: I've built and tested the plpython patch against 7.3.2 and am happy it does not affect the features I count on. As it should. The untrusted language gives you *more* power, not less. The untrusted status means that the user has to be trusted to use that much power. Hannu ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] help needed with yacc/bison
Oleg Bartunov kirjutas T, 01.07.2003 kell 15:49: Hi there, attached archive contains a simple parser demonstrating our problem. untar it, make, make test Good test: echo -n "12 34.1234 ..." | ./parser INTEGER:'12' CHAR: ' ' VERSION:'34.1234' CHAR: ' ' DOT:'.' DOT:'.' DOT:'.' Wrong: echo -n "12 34.1234. ..." | ./parser INTEGER:'12' CHAR: ' ' yyerror: syntax error, unexpected CHAR, expecting INTEGER The problem is recognizing VERSION (from gram.y): version: INTEGER DOT INTEGER { $$ = strconcat($1, $3, $2); } | version DOT INTEGER { $$ = strconcat($1, $3, $2); } ; removing the line above seems to fix your problem ;) For the last query '34.1234.' we want to print VERSION '34.1234' and return DOT. you can't return DOT as version is str and DOT is opr This is just a test example, actually we know a workaround for this case, but we need something simple and universal :) please describe the problem with some more samples, as it will make it easier to see which kind of universal solution you are searching for ;) Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] help needed with yacc/bison
Oleg Bartunov kirjutas K, 02.07.2003 kell 11:39: The problem is recognizing VERSION (from gram.y): version: INTEGER DOT INTEGER { $$ = strconcat($1, $3, $2); } | version DOT INTEGER { $$ = strconcat($1, $3, $2); } removing the line above seems to fix your problem ;) No, it's there by intention. VERSION could be not just 7.3 but 7.3.3 :) Try attached gram.y and lex.l --- Hannu

gram.y:

%{
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "scan.h"

char *
strconcat(char *s1, char *s2, char c)
{
	char *s;
	int l1 = strlen(s1), l2 = strlen(s2);

	s = (char *) malloc(l1 + l2 + 2);
	memcpy(s, s1, l1);
	if (c)
		*(s + l1) = c;
	memcpy(s + l1 + ((c) ? 1 : 0), s2, l2);
	*(s + l1 + l2 + ((c) ? 1 : 0)) = '\0';
	return s;
}

int yyerror(char *);
int yylex(void);
int yyparse(void);
%}

%error-verbose

%union {
	char *str;
	u_int32_t opr;
}

%type <str> version
%type <str> integer
%type <opr> symbol
%type <opr> dot
%token <str> VERSION_MINOR
%token <str> INTEGER
%token <opr> DOT
%token <opr> CHAR
%token <str> ENDOF

%%

input:
	| input data
	;

data: ENDOF { return 0; }
	| symbol
	| dot
	| version { printf("VERSION:\t'%s'\n", $1); }
	| integer
	;

version: INTEGER VERSION_MINOR { $$ = strconcat($1, $2, '-'); }
	| version VERSION_MINOR { $$ = strconcat($1, $2, '+'); }
	;

integer: INTEGER { printf("INTEGER:\t'%s'\n", $1); }
	;

symbol: CHAR { printf("CHAR:\t'%c'\n", $1); }
	;

dot: DOT { printf("DOT:\t'%c'\n", $1); }
	;

%%

int
yyerror(char *s)
{
	printf("yyerror: %s\n", s);
	return 0;
}

int
main(void)
{
	yyparse();
	return 0;
}

lex.l:

%{
#include <sys/types.h>
#include <string.h>
#include "scan.h"
%}

%option 8bit
%option never-interactive
%option nounput
%option noyywrap

%%

\.[0-9]+	{ yylval.str = strdup(yytext + 1); return VERSION_MINOR; }
[0-9]+		{ yylval.str = strdup(yytext); return INTEGER; }
\.		{ yylval.opr = (u_int32_t) *yytext; return DOT; }
.		{ yylval.opr = (u_int32_t) *yytext; return CHAR; }
<<EOF>>		{ return ENDOF; }

%%

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
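The lexer trick in the attached files - treating ".digits" as a single VERSION_MINOR token so that a trailing bare dot stays a plain DOT - can be mimicked in a few lines of Python regex, shown here only to illustrate the tokenization idea (not the actual flex code):

```python
import re

# Same longest-match idea as the flex rules: ".digits" is one token,
# so "34.1234." splits into INTEGER, VERSION_MINOR, DOT instead of
# driving the parser into a syntax error.
TOKENS = [
    ("VERSION_MINOR", r"\.[0-9]+"),
    ("INTEGER", r"[0-9]+"),
    ("DOT", r"\."),
    ("CHAR", r"."),
]
master = re.compile("|".join("(?P<%s>%s)" % t for t in TOKENS))

def tokenize(text):
    return [(m.lastgroup, m.group()) for m in master.finditer(text)]

print(tokenize("34.1234."))
# -> [('INTEGER', '34'), ('VERSION_MINOR', '.1234'), ('DOT', '.')]
```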
[HACKERS] postmaster startup failure
When running PostgreSQL 7.3.3-1 (from rpm's) on Redhat 9.0 I got the following in the logs and the postmaster will not start up. Any ideas what I could do to start up? This is on a laptop used for development, but I still would like to not initdb. postmaster successfully started LOG: database system shutdown was interrupted at 2003-07-17 00:42:29 EEST LOG: checkpoint record is at 0/304E76A8 LOG: redo record is at 0/304E76A8; undo record is at 0/0; shutdown FALSE LOG: next transaction id: 3981836; next oid: 4003572 LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/304E76E8 LOG: ReadRecord: unexpected pageaddr 0/2C504000 in log file 0, segment 48, offset 5259264 LOG: redo done at 0/30503FDC PANIC: XLogWrite: write request 0/30504000 is past end of log 0/30504000 LOG: startup process (pid 2445) was terminated by signal 6 LOG: aborting startup due to startup process failure - Hannu ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] postmaster startup failure
Tom Lane kirjutas N, 17.07.2003 kell 19:49: Ugh. The reason we hadn't seen this happen in the field was that it is a bug I introduced in a patch two months ago :-( 7.3.3 will in fact fail to start up, with the above error, any time the last record of the WAL file ends exactly at a page boundary. I think we're gonna need a quick 7.3.4 ... If you want a source patch for 7.3.3, here it is. Thanks! --- Hannu ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] TODO items
Tom Lane kirjutas R, 08.08.2003 kell 16:56: Bruce Momjian [EMAIL PROTECTED] writes: o Add optional textual message to NOTIFY Not done, but there is room in the FE/BE protocol now for something like this. Were there any other changes to NOTIFY? There was talk about making NOTIFY use some other structure instead of ordinary PG tables in the backend. -- Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] encoding question
Christopher Kings-Lynne kirjutas N, 07.08.2003 kell 04:33: My other question is we play around with bytea fields to escape nulls and chars < 32 and stuff so that when someone browses the table, they get '\000unknown\000...', etc. Actually bytea *stores* char(0), you get \000 or \x0 or @ or whatever depending on whatever you use for displaying it. The escaping is done only to fit the data into a SQL statement when inserting the data into the database. select returns straight bytes from bytea. However, are there other field types for which we have to do this? Can you put nulls and stuff in text/varchar/char fields? No. Nulls are not allowed in text/varchar fields. - Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
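A rough sketch of the insert-side escaping being described (this mimics the classic bytea octal-escape format; the function name is ours, and real client libraries provide their own quoting routines):

```python
# Illustrative only: the bytes themselves are stored raw; escaping is
# needed solely to represent NULs, control bytes, and high bytes inside
# a SQL string literal.
def escape_bytea(data: bytes) -> str:
    out = []
    for b in data:
        if b == 0x5C:                # backslash itself must be doubled
            out.append("\\\\")
        elif b < 32 or b > 126:      # NULs, control bytes, high bytes
            out.append("\\%03o" % b) # three-digit octal escape
        else:
            out.append(chr(b))
    return "".join(out)

print(escape_bytea(b"\x00unknown\x00"))
# -> \000unknown\000
```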
Re: [HACKERS] Speeding up operations
Rahul_Iyer kirjutas K, 13.08.2003 kell 08:23: hi... im on a project using Postgres. The project involves, at times, up to 5,000,000 inserts. I was checking the performance of Postgres for 5M inserts into a 2 column table (one col=integer, 2nd col=character). I used the Prepare... and Execute method, so i basically had 5M execute statements and 1 prepare statement. Postgres took 144min for this... is there any way to improve this performance? if so, how? btw, im using it on a SPARC/Solaris 2.6. If you are inserting into an empty table with a primary key (or other constraints), you can run ANALYZE on that table 1-2 minutes after you have started the INSERTs, so that the constraint-checking logic will do the right thing (use the index for the pk). In my tests I achieved about 9000 inserts/sec by using multiple inserting frontends and ~100 inserts per transaction (no indexes, 6 columns, 4 processors, 2GB memory, test clients running on the same computer) -- Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
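The batching advice can be illustrated with Python's sqlite3 standing in for a real server (table and column names invented); the point is simply that grouping many rows into one transaction beats one commit per row:

```python
import sqlite3

# Toy demonstration of transaction batching: all 10000 INSERTs go
# through a single implicit transaction and a single commit, rather
# than paying per-row commit overhead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
rows = [(i, "row%d" % i) for i in range(10000)]

conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)
# -> 10000
```

With PostgreSQL the equivalent is wrapping batches of INSERTs in explicit BEGIN/COMMIT (or using COPY for bulk loads).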
Re: [HACKERS] Proper Unicode support
Oleg Bartunov kirjutas E, 11.08.2003 kell 11:52: On Mon, 11 Aug 2003, Peter Eisentraut wrote: Alexey Mahotkin writes: AFAIK, currently the codepoints are sorted in their numerical order. I've searched the source code and could not find the actual place where this is done. I've seen executor/nodeSort.c and utils/tuplesort.c. AFAIU, they are generic sorting routines. PostgreSQL uses the operating system's locale routines for this. So the sort order depends on choosing a locale that can deal with Unicode. sort order works, but upper/lower are broken. I think that the original MB/Unicode support was made for the Japanese language/characters, and AFAIK they don't even have the concept (problem) of upper/lower case. A question to the core - are there any plans to rectify this for less fortunate languages/charsets? Will the ASCII-speaking core tolerate the potential loss of performance from locale-aware upper/lower ? Will this be considered a feature or a bugfix (i.e. should we attempt to fix it for 7.4) ? --- Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] New array functions
Joe Conway kirjutas N, 28.08.2003 kell 21:15: Greg Stark wrote: Specifically I want to know how to replace my int_array_aggregate(int) and int_array_enum(_int) calls. I have no idea what those are -- are they from contrib? You can create an aggregate to turn arbitrary datatype elements into arrays like this: CREATE AGGREGATE array_aggregate ( BASETYPE = anyelement, SFUNC = array_append, STYPE = anyarray, INITCOND = '{}' ); Any idea of performance - is this array_aggregate(anyelement) faster, slower or about the same as int_array_aggregate(int) ? If int_array_enum() is supposed to take '{1,2,3}' and produce three rows, that function was proposed but rejected. Subsequently Peter Eisentraut pointed out a SQL99 syntax that does this, but I did not get it done for 7.4. Perhaps for 7.5. So we have to keep intagg at least until 7.5 ... --- Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] ALTER TABLE
Rod Taylor kirjutas R, 29.08.2003 kell 03:31: I've been fiddling away on ALTER TABLE and have a few questions about a possible datatype change. Just alter portions. I presume I'll need to do a table rewrite. What is the best way to change a single datum? heap_modify() takes a single relation type where I will need to deal with different types. Simply build a new tuple with old datums (easy enough)? Do we care about OIDs being renumbered? AFAIK alter table change column should do the equivalent of: alter table x add column temp_name newdatatype; update table x set temp_name=convert(name); alter table x drop column name; alter table x rename column temp_name to name; This should not renumber OIDs. ALTER TABLE test ADD CHECK(col 4), add column bob integer default 2 not null, add column bob2 serial check(bob2 = 255), drop column col2 cascade; or with your combined syntax: alter table x add column temp_name newdatatype = convert(current_name), drop column current_name, rename column temp_name to current_name; --- Hannu ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Preliminary notes about hash index concurrency (long)
Tom Lane kirjutas E, 01.09.2003 kell 15:41: Bruce Momjian [EMAIL PROTECTED] writes: I know someone reported a problem with the hash indexes (data loss, serious) --- was that a new 7.4 bug or something that has existed for a long time? AFAICT the bug's been there since Berkeley days. One could check how BSDDB (http://www.sleepycat.com) handles these issues. It is reported to have started as btree/hash index code extracted from an early version of postgres, so perhaps there one could at least get some ideas, though their locking / concurrency control are probably much different. -- Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Win32 native port
Joerg Hessdoerfer kirjutas N, 04.09.2003 kell 10:22: I'm currently in the process of setting up my development environment (how the heck do I get bison/flex to compile under MingW/MSYS? Oh my...) there is a precompiled bison in the MinGW filelist http://www.mingw.org/download.shtml#hdr2 dunno about flex ;( , and then I'll go adventuring in the code. -- Hannu ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Warning for missing createlang
Andrew Dunstan kirjutas L, 06.09.2003 kell 16:14: Peter Eisentraut wrote: Tom Lane writes: There are good security arguments not to have it in the default install, no? I think last time the only reason we saw was that dump restoring would be difficult. I don't see any security reasons. That could be overcome by doing a 'drop language' before running your restore, couldn't it? or to have CREATE OR REPLACE LANGUAGE (like we have for FUNCTIONS). --- Hannu ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL not ACID compliant?
Heikki Tuuri kirjutas P, 21.09.2003 kell 12:51: Tom, - Original Message - From: Tom Lane [EMAIL PROTECTED] To: Heikki Tuuri [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Sunday, September 21, 2003 10:32 AM Subject: Re: [HACKERS] PostgreSQL not ACID compliant? Heikki Tuuri [EMAIL PROTECTED] writes: if you set the transaction isolation level SERIALIZABLE in MySQL/InnoDB, then InnoDB uses next-key locking in every SELECT, and transactions really are serializable in the mathematical sense. My understanding is that next-key locking only helps when all your predicates are point or range searches against an available b-tree index. all SQL queries are performed through index searches. Does that mean that an index is always needed for a predicate column, or is this an abstract index in some relational algebra sense? That is why next-key locking enforces serializability. IBM researcher C. Mohan has written papers about next-key locking. Next-key locking is an approximation of predicate locking. We simply lock more to make sure the 'predicates' themselves are locked. While that might cover many practical cases, it can hardly be called a complete solution to the problem of serializability. It is a complete solution. Is this solution complete only for the MAX() case, or is this a general solution working for things like AVG or STDDEV and perhaps all user-defined aggregates as well? Another approximation of predicate locking is table level locking, a solution which Oracle used some 15 years ago, if you switched on the SERIALIZABLE isolation level. Table level locking seems to be a complete solution indeed, just not concurrent at all. It may be that we have to forget concurrency to get complete and general serializability ;( Or is next-key locking something more than a solution for getting continuous nextval()'s? -- Hannu ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
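For readers unfamiliar with next-key locking, the idea the thread keeps circling can be shown with a toy model: a range scan locks not only the keys it finds but also the gaps between them, so a later insert that would have matched the scan's predicate (a phantom) is blocked. Everything here (the key list, the function names) is invented for illustration; real InnoDB locks index records, not Python tuples:

```python
import bisect

keys = [10, 20, 30, 40]   # index keys currently in the table
locked_gaps = set()       # half-open (lo, hi) intervals locked by scans

def range_scan(lo, hi):
    """Scan keys in [lo, hi], locking every key gap the scan passes over."""
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    edges = [lo] + keys[i:j] + [hi]
    for a, b in zip(edges, edges[1:]):
        locked_gaps.add((a, b))

def can_insert(k):
    """An insert is blocked if k falls inside any locked gap."""
    return not any(a <= k < b for a, b in locked_gaps)

range_scan(15, 35)          # e.g. SELECT ... WHERE k BETWEEN 15 AND 35
assert not can_insert(25)   # would be a phantom inside the scanned range: blocked
assert can_insert(50)       # outside the scanned range: allowed
```

This is why the approach needs the predicate to map onto a b-tree range: the "gaps" only exist along an ordered index.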
Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
Andreas Pflug kirjutas E, 29.09.2003 kell 12:04: Christopher Kings-Lynne wrote: You could just as easily argue that the lack of integrity testing at data load time was equally a bug. I think we need some way of telling postgres to suppress a foreign key check. The main problem is that the foreign key column is often not indexed. So a db designer made a bloody mistake. The problem is there's no easy way to find out what's missing. I'd really like EXPLAIN to display all subsequent triggered queries also, to see the full scans caused by missing indexes. It could probably be doable for EXPLAIN ANALYZE (by actually tracing execution), but then you will see really _all_ queries, i.e. for a 1000 row update you would see 1 UPDATE query and 1000 fk checks ... OTOH, you probably can get that already from logs with the right logging parameters. - Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
Shridhar Daithankar kirjutas E, 29.09.2003 kell 13:34: On Monday 29 September 2003 15:58, Christopher Kings-Lynne wrote: So a db designer made a bloody mistake. The problem is there's no easy way to find out what's missing. I'd really like EXPLAIN to display all subsequent triggered queries also, to see the full scans caused by missing indexes. It could probably be doable for EXPLAIN ANALYZE (by actually tracing execution), but then you will see really _all_ queries, i.e. for a 1000 row update you would see 1 UPDATE query and 1000 fk checks ... OTOH, you probably can get that already from logs with the right logging parameters. Actually - it shouldn't be too hard to write a query that returns all unindexed foreign keys, surely? Correct me if I am wrong, but I remember postgresql throwing an error that the foreign key field was not unique in the foreign table. Obviously it can not detect that without an index. Either a primary key or a unique constraint would need an index. What am I missing here? IOW, how do I exactly create foreign keys without an index?

hannu=# create table pkt(i int primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'pkt_pkey' for table 'pkt'
CREATE TABLE
hannu=# create table fkt(j int references pkt);
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
CREATE TABLE

Now the *foreign key* column (fkt.j) is without an index. As foreign keys are enforced both ways, this can be a problem when changing table pkt or bulk creating FK's on big tables. Hannu ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 7.4 status
Bruce Momjian kirjutas T, 30.09.2003 kell 02:16: Tom Lane wrote: ! Faster regular expression code We could tout more functionality too, since the new regex package has a lot of advanced stuff that wasn't there before. Added more powerful This wording nicely covers possible incompatibilities too ;) -- Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Weird locking situation
Tom Lane kirjutas N, 02.10.2003 kell 17:30: Christopher Kings-Lynne [EMAIL PROTECTED] writes: What is going on here? Surely getting a FOR UPDATE row lock should prevent another process getting an update lock? The behavior you describe would certainly be a bug, but you'll have to show a reproducible example to convince me it wasn't pilot error. One idea that springs to mind is that maybe additional rows with id=1 were inserted (by some other transaction) between the SELECT FOR UPDATE and the UPDATE? Perhaps he was looking for key locking, so that select ... where key=1 for update would also prevent inserts where key=1 ? Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Index/Function organized table layout
James Rogers kirjutas N, 02.10.2003 kell 20:50: To give a real world example, a standard query on one of our tables that has not been CLUSTER-ed recently (i.e. within the last several days) generates an average of ~2,000 cache misses. Recently CLUSTER-ed, it generates ~0 cache misses on average. Needless to say, one is *much* faster than the other. So what you really need is the CLUSTER command to leave pages half-empty and the tuple placement logic on inserts/updates to place new tuples near the place where they would be placed by CLUSTER. I.e. the code that does the actual inserting should be aware of CLUSTERing. I guess that similar behaviour (half-empty pages, or even every second page empty, which is better as it creates fewer dirty buffers) could also significantly speed up updates on a huge number of tuples, as then the code could always select a place near the old one and thus avoid needless head movements between reading and writing areas. In my case, not only does CLUSTER-ing increase the number of concurrent queries possible without disk thrashing by an integer factor, but the number of buffers touched on a query that generates cache misses is greatly reduced as well. The problem is that CLUSTER-ing is costly and index-organizing some of the tables would reduce the buffer needs, since the index tuples in these cases are almost as large as the heap tuples they reference. True, but my above suggestion would be much easier to implement near-term. It seems to be a nice incremental improvement; it just needs to touch two places: 1. selecting where new tuples go: * updated ones go near the old ones if not clustered, and near the place CLUSTER would place them if clustered. * inserted ones go to the less-than-half-empty pages if not clustered, and near the place CLUSTER would place them if clustered. 2. making the reorganization code (CLUSTER and VACUUM FULL) leave space in pages for clustered updates/inserts.
The "half" above could of course mean anything from 10% to 95% depending on access patterns. - Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
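The placement idea described above can be sketched as a toy model: "CLUSTER" rebuilds pages only partly full, and later inserts go to the page whose key range covers the new key, so index order and physical order stay roughly aligned. The page size, fill factor, and overflow rule are all invented for illustration:

```python
import bisect

PAGE_CAP = 4   # tuples that fit on a page
FILL = 2       # tuples left per page after "CLUSTER" (50% fill)

def cluster(tuples):
    """Rebuild the table in key order, each page only half full."""
    s = sorted(tuples)
    return [s[i:i + FILL] for i in range(0, len(s), FILL)]

def insert(pages, key):
    """Place key on the page whose key range covers it, if there is room."""
    lows = [p[0] for p in pages]
    i = max(0, bisect.bisect_right(lows, key) - 1)
    if len(pages[i]) < PAGE_CAP:
        bisect.insort(pages[i], key)   # lands near its index neighbours
    else:
        pages.append([key])            # overflow: fall back to a fresh page

pages = cluster([10, 40, 20, 30])   # -> [[10, 20], [30, 40]]
insert(pages, 25)                   # lands next to 20, keeping index order
```

The reserved slack per page is what lets the insert land "near the place CLUSTER would place it" without an immediate overflow.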
Re: [HACKERS] Index/Function organized table layout
James Rogers kirjutas N, 02.10.2003 kell 23:44: On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote: So what you really need is the CLUSTER command to leave pages half-empty and the tuple placement logic on inserts/updates to place new tuples near the place where they would be placed by CLUSTER. I.e. the code that does actual inserting should be aware of CLUSTERing. Not exactly. What you are describing is more akin to partitioning or hash-organized tables, i.e. sorting insert/update tuples to various pages according to some hash function. What I actually thought I was describing is how CLUSTER should work in a postgres flavour of MVCC storage ;). Not the CLUSTER command, but the whole feature. B-Tree organized tables basically make the table and the primary index the same thing, and tend to be most useful for tables that use a single multi-column primary key index for queries. This has the effect of putting all the tuples for a typical query in the same small number of heap pages (and therefore buffers), allowing very efficient access in the typical case, with the caveat that non-indexed queries will be quite slow. AFAICS we could resolve this problem (querying indexes only) by keeping a copy of the visibility info (tmin, tmax, ...) in index tuples. This would make index updates bigger and thus slower, so this should be optional. If you then put all fields in the primary key, then the main table could be dropped. If there is no data table then no other indexes would be allowed, or they must be double-indexes referencing the primary key, not the tuple, and thus even bigger ... B-Tree organized tables are particularly useful when the insert order is orthogonal to the typical query order. As I mentioned before, tables that store parallel collections of time-series data are classic examples. Often the data is inserted into the pages in an order that can roughly be described as (timestamp, id), but is queried using (id, timestamp) as the index.
If you have enough ids, you end up with the pathological case where you typically have one relevant tuple per page for a given query. But if we had clustered the table on (id, timestamp), then the data would be in the right order for queries, if cluster worked well. The nuisance would be keeping track of which pages are collecting which tuples. Knowing the CLUSTER index doesn't tell you much about which pages would currently be a good place to put a new tuple. You could always mark up the index that CLUSTER uses to keep track of good candidates (plus some additional structures), but the more I think about that, the more it looks like a nasty hack. Yeah, index-organized tables seem an exact fit for your problem, but then my abstract idea of what clustering should do is exactly that: keep tuples in roughly the same order as an index ;) So what really is needed is a smart tuple-placer which can keep tuples that are close (as defined by the index) together in a small number of pages. These pages themselves need not be contiguous, they can be sprinkled around in the whole data table, but they need to stay clusters of index-close tuples. Hannu ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Dreaming About Redesigning SQL
Mike Mascari kirjutas L, 04.10.2003 kell 06:32: 2) The query language should be computationally complete. The user should be able to author complete applications in the language, rather than the language being a sublanguage. To me it seems like requiring that one should be able to author complete programs in regex. Yes, when all you have is a hammer everything looks like a nail ;) Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Using backslash in query
Tom Lane kirjutas R, 03.10.2003 kell 18:34: Michael Brusser [EMAIL PROTECTED] writes: But when I doubled the number of backslashes: SELECT smth. FROM tbl WHERE situation LIKE '%\\\\%'; - it actually worked fine. Backslash is special to both the string-literal parser and the LIKE code. So when you write the above, the pattern value that arrives at the LIKE processor has one less level of backslashing: %\\% and the LIKE processor interprets this as percent, a literal backslash, and another percent. Regarding the dollar-quoting discussions - will we be able to write the above query as SELECT smth. FROM tbl WHERE situation LIKE $$%\\%$$; in 7.4, or is \ still special there? If it is, then one \ in a regex in plpython still needs to be entered as \\\\, which has some geek coolness but would not be what I'd prefer to do on a regular basis. Hannu ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
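The same two-level escaping exists in any language where a pattern passes through a string-literal parser before reaching the matcher; Python regexes make a convenient standalone demonstration, with raw strings playing roughly the role dollar quoting would play in SQL:

```python
# To match ONE literal backslash, the regex engine needs the pattern \\ ,
# and a non-raw string literal needs \\\\ to produce those two characters.
import re

text = 'C:\\temp'               # the string actually holds: C:\temp
assert re.search('\\\\', text)  # 4 source backslashes -> pattern \\ -> one literal \
assert re.search(r'\\', text)   # raw string removes one escaping level
```

Each layer that treats backslash as special halves the count, which is exactly why the LIKE pattern above needed four backslashes in the source to match one.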
Re: [HACKERS] Index/Function organized table layout (from Re:
Christopher Browne kirjutas R, 03.10.2003 kell 00:57: [EMAIL PROTECTED] (Jean-Luc Lachance) writes: That's one of the drawbacks of MVCC. I once suggested that the transaction number and other housekeeping info be included in the index, but was told to forget it... It would solve once and for all the issue of seq_scan vs index_scan. It would simplify the aggregate problem. It would only simplify _one_ case, namely the case where someone cares about the cardinality of a relation, and it would do that at _considerable_ cost. A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. Could this be made a TODO item, perhaps with your attack plan? Of course, as a strictly optional feature useful only for special situations (see below). I cross-post this to [HACKERS] as it seems relevant to a problem recently discussed there. It would be very hairy to implement it correctly, and all this would cover is the single case of SELECT COUNT(*) FROM SOME_TABLE; Not really. Just yesterday there was a discussion on [HACKERS] about implementing btree-organized tables, which would be much less needed if the visibility info were kept in indexes. If you had a single WHERE clause attached, you would have to revert to walking through the tuples looking for the ones that are live and committed, which is true for any DBMS. If the WHERE clause could use the same index (or any index with visibility info) then there would be no need for walking through the tuples in the data relation. The typical use case cited on [HACKERS] was time-series data, where inserts are roughly in (timestamp, id) order but queries in (id, timestamp) order. Now if the index would include all relevant fields (id, timestamp, data1, data2, ..., dataN) then the query could run on the index only, touching just a few pages and thus vastly improving performance. I agree that this is not something everybody needs, but when it is needed, it is needed badly.
And it still begs the same question, of why the result of this query would be particularly meaningful to anyone. I don't see the usefulness; I don't see the value of going to the considerable effort of fixing this purported problem. Being able to do fast count(*) is just a side benefit. Hannu ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function
Tom Lane kirjutas L, 04.10.2003 kell 19:07: Hannu Krosing [EMAIL PROTECTED] writes: Christopher Browne kirjutas R, 03.10.2003 kell 00:57: A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. Could this be made a TODO item, perhaps with your attack plan. If I recall that discussion correctly, no one including Christopher thought the attack plan was actually reasonable. What this keeps coming down to is that an optimization that helps only COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in development and maintenance effort to justify its existence. Please read further in my email ;) The point I was trying to make was that a faster count(*) is just a side effect. If we could (conditionally) keep visibility info in indexes, then this would also solve the much more tricky problem of index-structured tables. Count(*) is *not* the only query that could benefit from not needing to go to the actual data table for visibility info. The much more needed case would be the inverse time-series type of queries, which would otherwise thrash cache pages badly. Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Index/Function organized table layout
Greg Stark kirjutas P, 05.10.2003 kell 00:17: I've never seen anyone use this feature, and I never seriously considered it myself. It sort of has the feel of an antiquated feature that traded too much flexibility and abstraction for raw performance on very slow disk hardware. Read 'A Conversation with Jim Gray', referenced from this slashdot article: http://slashdot.org/article.pl?sid=03/09/17/1246255&mode=thread&tid=126 for info on how disk drives are slower than ever (relatively), and how one should treat them as such, especially for large data volumes. However I wonder if the nested tables feature doesn't use it under the hood though. It seems they would both be useful for the same types of tables. I'm not sure what this means for Postgres. I'm not sure if Postgres should use a different name to avoid confusion and possibly to leave room in the future for the possibility of supporting something like this. Or perhaps something like this would be useful for Postgres now or in the near future? Or perhaps the consensus is as I said, that this is an old idea that no longer gets any respect and postgres should just pretend it doesn't exist? We can't pretend CLUSTER does not exist until we have some better technology to offer instead. Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Architecture Roadmap?
James Rogers kirjutas T, 07.10.2003 kell 01:48: Hi, I guess what I am wondering is whether there is any kind of quasi-official mechanism for putting certain features in certain future versions depending on the nature and significance of implementing those features and the number of things it touches. I don't know of any (except discussing it on [HACKERS] ;), but I very much like the way Python does it via PEPs (Python Enhancement Proposals), see: http://www.python.org/peps/ . - Hannu ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] http://www.pgsql.com/register/submit.php
Jean-Michel POURE kirjutas E, 13.10.2003 kell 16:20: Le Lundi 13 Octobre 2003 14:45, Robert Treat a écrit : Someone could have downloaded the package from sourceforge/freshmeat, so I don't think it hurts to have a pointer to postgresql.org. What might be better than a link for the submit page would just be a blurb mentioning we're always looking for new case studies from folks who are using postgresql, and perhaps point them some place for that. What I'd like to see added to the message is a reminder to run initdb... In France, according to the Loi informatique et libertés, users have a right to access their personal data. As a result, every web form must display a warning which reminds users that they can access their personal data. Even if no personal data is entered or queried ? This is not a potential problem for PostgreSQL.inc, but for Linux vendors who release PostgreSQL (RedHat, Mandrake, etc...) in France. The fine can be quite large. Also, a judge may stop immediately the release of products including PostgreSQL. Do we need special NOTICE for French users telling them In case you have entered some personal data you can access it by doing a SELECT query after each INSERT/UPDATE to be legally sound in France ? This could then be automatically enabled by LC_ALL=fr . Hannu ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Database Kernels and O_DIRECT
James Rogers kirjutas K, 15.10.2003 kell 11:26: On 10/14/03 11:31 PM, James Rogers [EMAIL PROTECTED] wrote: There is some abstraction in Postgres and the database is well-written, but it isn't written in a manner that makes it easy to swap out operating system or API models. It is written to be portable at all levels. A database kernel isn't necessarily required to be portable at the very lowest level, but it is vastly more optimizable because you aren't forced into a narrow set of choices for interfacing with the operating system. Just to clarify, my post wasn't really to say that we should run out and make Postgres use a database kernel type internal model tomorrow. The point of all that was that Oracle does things that way for a very good reason and that there can be benefits that may not be immediately obvious. OTOH, what may be a perfectly good reason for Oracle may not be one for PostgreSQL. For me the beauty of OS software has always been the possibility to fix problems at the right level (kernel, library, language), and not to just make workarounds at another level (your application). So getting some APIs into the kernel for optimizing cache usage or writeback strategies would be much better than using raw writes and rewriting the whole thing ourselves. The newer Linux kernels have several schedulers to choose from; why not push for choice in other areas as well. The ultimate database kernel could thus be a custom-tuned Linux kernel ;) It is really one of those emergent needs when a database engine gets to a certain level of sophistication. For smaller and simpler databases, you don't really need it and the effort isn't justified. At some point, you cross a threshold where not only does it become justified, but it becomes a wise idea, or not having it will start to punish you in a number of different ways. I personally think that Postgres is sitting on the cusp of 'it's a wise idea', and that it is something worth thinking about in the future.
This thread reminds me of the Linus/Tanenbaum monolithic vs. microkernel argument: while microkernels are theoretically better, Linux could outperform them by having the required modularity at the source level, and being an open-source project this was enough. It also beat the Mach kernel by being there, whereas the microkernel-based Mach was too hard to develop/debug and thus took way longer to mature. -- Hannu ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Unicode upper() bug still present
Tom Lane kirjutas E, 20.10.2003 kell 03:35: Oliver Elphick [EMAIL PROTECTED] writes: There is a bug in Unicode upper() which has been present since 7.2: We don't support upper/lower in multibyte character sets, and can't as long as the functionality is dependent on ctype.h's toupper()/tolower(). It's been suggested that we could use wctype.h where available. However there are a bunch of issues that would have to be solved to make that happen. (How do we convert between the database character encoding and the wctype representation? How do we do it for sorting? How do we even find out what representation the current locale setting expects to use?) Why not use the same locale settings as for sorting (i.e. database encoding) until we have proper multi-locale support in the backend? It seems inconsistent that we do use locale-aware sorts but not upper/lower. This is for a UNICODE database using locale et_EE.UTF-8:

ucdb=# select t, upper(t) from tt order by 1;
 t | upper
---+-------
 a | A
 s | S
   |
   |
   |
   |
   |
   |
(8 rows)

As you see, the sort order is right, but some characters are converted and some are not; the result is a complete mess ;( --- Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
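The failure mode is easy to reproduce anywhere case mapping is done byte-at-a-time: an ASCII-only toupper() applied to individual UTF-8 bytes cannot touch multibyte characters, while a codepoint-aware mapping handles them fine. A small Python illustration (the Estonian letters are just example data):

```python
s = 'õäöü'
raw = s.encode('utf-8')   # two bytes per character here

# bytes.upper() upcases only ASCII a-z, like ctype.h toupper() per byte:
assert raw.upper().decode('utf-8') == 'õäöü'   # nothing happens
# str.upper() works on codepoints:
assert s.upper() == 'ÕÄÖÜ'
```

A string mixing ASCII and non-ASCII letters gives exactly the half-converted output shown in the psql session above.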
Re: [HACKERS] Unicode upper() bug still present
Tatsuo Ishii kirjutas E, 20.10.2003 kell 15:37: Tom Lane kirjutas E, 20.10.2003 kell 03:35: Oliver Elphick [EMAIL PROTECTED] writes: There is a bug in Unicode upper() which has been present since 7.2: We don't support upper/lower in multibyte character sets, and can't as long as the functionality is dependent on ctype.h's toupper()/tolower(). It's been suggested that we could use wctype.h where available. However there are a bunch of issues that would have to be solved to make that happen. (How do we convert between the database character encoding and the wctype representation? How do we do it for sorting? How do we even find out what representation the current locale setting expects to use?) Why not use the same locale settings as for sorting (i.e. database encoding) until we have proper multi-locale support in the backend? There's absolutely no relationship between database encoding and locale. How does the system then use the locale for sorting and not for upper/lower? I would have rather expected the opposite, as upper/lower rules are a little more locale-independent than collation. IMO depending on the system locale is a completely wrong design decision and we should go toward having our own collation data. I agree completely. We could probably lift something from IBM's ICU. - Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Unicode upper() bug still present
Peter Eisentraut kirjutas E, 20.10.2003 kell 21:02: Tom Lane writes: I noticed by chance that glibc has a reentrant locale API that seems to allow for efficient access to multiple locales concurrently. Perhaps it would be a reasonable solution to support multiple locales only on machines that have this library. If we have to write our own locale support it's likely to be a long time coming :-( Naturally, I cannot promise anything, but this is at the top of my list for the next release. I already have sorted out the specifications and algorithms and collected locale data for most corners of the world, so it's just the coding left. Have you checked ICU ( http://oss.software.ibm.com/icu/ )? It seems to have all the needed data at least. Unfortunately, a real, sustainable fix of this situation requires us to start at the very bottom, namely the character set conversion interface, then the gettext interface, then the new locale library, then integrating the per-column granularity into the parser/planner/executor. So you may be looking at a two-release process. --- Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Unicode upper() bug still present
Karel Zak kirjutas T, 21.10.2003 kell 10:50: On Mon, Oct 20, 2003 at 10:58:00PM +0200, Peter Eisentraut wrote: (Note that I say Unicode a lot here because those people do a lot of research and standardization in this area, which is available for free, but this does not constrain the result to work only with the Unicode character set.) Why can't PostgreSQL be a 100% pure Unicode system? We can do conversion from/to other encodings as a client/server communication extension, but internally in the BE we can use only pure Unicode data. I think a lot of things would be simpler... I've heard that some far-east languages have had some issues with 16-bit UNICODE, but the 32-bit should have fixed it. I would also support a move to UNICODE (store as SCSU, process as 16- or 32-bit wchars, I/O as UTF-8) for NCHAR/NVARCHAR/NTEXT and pure 7-bit byte-value ordered ASCII for CHAR/VARCHAR/TEXT. But this would surely have some issues with backward compatibility. Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
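The "16-bit Unicode" issue mentioned above is concrete: characters outside the Basic Multilingual Plane do not fit in one 16-bit unit, so UTF-16 must spend a surrogate pair on them, which is exactly what tripped up implementations that assumed one unit per character. A quick check in Python:

```python
clef = '\U0001D11E'   # MUSICAL SYMBOL G CLEF, a codepoint outside the BMP

assert len(clef.encode('utf-16-le')) == 4   # two 16-bit units: a surrogate pair
assert len(clef.encode('utf-32-le')) == 4   # one fixed-width 32-bit unit
assert len(clef.encode('utf-8')) == 4       # four bytes in UTF-8
```

This is why "the 32-bit should have fixed it": UTF-32 restores the one-unit-per-character property at the cost of space.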
Re: [HACKERS] Unicode upper() bug still present
Tatsuo Ishii kirjutas T, 21.10.2003 kell 12:07: Why can't PostgreSQL be a 100% pure Unicode system? We can do conversion from/to other encodings as a client/server communication extension, but internally in the BE we can use only pure Unicode data. I think a lot of things would be simpler... Please don't do that. There's a known issue of round-trip conversion between Unicode and other encodings. Are these unsolvable even in theory? - Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman kirjutas P, 19.10.2003 kell 21:24: As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. Relational or not, this requires either an in-memory index or a perfect hash. BTW, how do you find the oldest red elephant with just one disk seek? As in SQL: select * from elephants where colour=red order by age desc limit 1; A relational programmer has to ask the db does this exist and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ: SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place. SQL by itself optimises nothing: by definition it evaluates full cross products and then compares all rows with predicates. Some SQL implementations do optimise a little ;) Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. Or perhaps just getting _the_ result ;) Getting some other result will probably need another MV database ;) Unless you can use set theory to predict the future, Isn't this what PostgreSQL's optimiser does ? -- Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
Marshall Spight kirjutas N, 23.10.2003 kell 11:01: Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots ... Linus set out to build a unix kernel workalike, and he chose the easiest path, copying architecture from the 1970s, along with all the weaknesses that those idiot academics had identified years earlier. Since then, his monolithic kernel has gotten a lot of market share, due to a number of different reasons, none of them being architectural superiority. Unless you count as architectural superiority the fact that it can actually be written and debugged in a reasonable time. Being able to mathematically define something as not having certain weaknesses does not guarantee that the thing can actually be implemented and/or is usable. -- Hannu ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] round() function wrong?
Peter Eisentraut kirjutas R, 24.10.2003 kell 22:16: Jochen Westland [invigo] writes: In my version select round(2.5); returns 2; select round(2.501) returns 3; referring to my math professor that's wrong, at least in Germany. select round(2.5); should return 3 The convention that .5 values should be rounded up is just that, a convention. Also, which way is up ?

hannu=# select round(0.5);
 round
-------
     1
(1 row)

hannu=# select round(-0.5);
 round
-------
    -1
(1 row)

On systems with IEEE 754 floating point, the default is normally to round to the nearest even number. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
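Both conventions are easy to see side by side in Python: the built-in round() follows the IEEE 754 round-half-to-even default, while the decimal module can reproduce the half-away-from-zero behaviour shown in the psql session above:

```python
from decimal import Decimal, ROUND_HALF_UP

# IEEE 754 default: ties go to the nearest even value.
assert round(0.5) == 0
assert round(2.5) == 2
assert round(2.501) == 3

# Half away from zero, matching the psql session's round(0.5)=1, round(-0.5)=-1:
assert Decimal('0.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP) == 1
assert Decimal('-0.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP) == -1
```

Neither behaviour is "wrong"; they are just different tie-breaking conventions, which is the point of the reply.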
Re: [HACKERS] PostgreSQL on Novell Netware 6.5.
Eduardo D Piovesam kirjutas E, 20.10.2003 kell 16:35: They stopped at 7.2.4 because they're finishing some useful APIs, which will make the port much easier. Will this involve using a Linux kernel ;) When this part is done, a new port will be made with 7.4. With much less NetWare-specific code and maybe, it'll be accepted by the PostgreSQL community. Sure! Hannu ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] random access - bytea
Dennis Bjorklund kirjutas P, 26.10.2003 kell 07:30: On Sat, 25 Oct 2003, Joe Conway wrote: That will modify the bytea column so that it is stored uncompressed in the TOAST table. Now, simply use substr() to grab any random chunk: SELECT substr(foo_bytea, 2, 100) from foo where foo_id = 42; This would imply that every little read would have to do a scan on a table to find the row and then perform the substr. An open command can be optimized a lot more, for example to cache entries that have been opened so that it's fast to read the next 1kb or whatever you want. Also, the above does not solve writes at all, which can also be made transaction-safe and fast with a better API where you can update a part of a field. I brought it up once and Tom answered that TOAST tables are not transaction-aware, so you can't update just some parts of toasted entities - you must always write the whole field. So this will not be just an API change. The above is not really a replacement of the current lo-objects. True. - Hannu ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
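For reference, Joe Conway's suggestion spelled out; foo/foo_bytea/foo_id are the example names from the quoted mail, not a real schema:

```sql
-- Store the column uncompressed, so TOAST can fetch arbitrary slices
-- without decompressing the whole value:
ALTER TABLE foo ALTER COLUMN foo_bytea SET STORAGE EXTERNAL;

-- A random read then touches only the TOAST chunks covering the slice:
SELECT substr(foo_bytea, 2, 100) FROM foo WHERE foo_id = 42;
```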
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane kirjutas P, 02.11.2003 kell 20:00: Jan Wieck [EMAIL PROTECTED] writes: I am currently looking at implementing ARC as a replacement strategy. I don't have anything that works yet, so I can't really tell what the result would be and it might turn out that we want both features. It's likely that we would. As someone (you?) already pointed out, VACUUM has bad side-effects both in terms of cache flushing and in terms of sheer I/O load. Those effects require different fixes AFAICS. One thing that bothers me here is that I don't see how adjusting our own buffer replacement strategy is going to do much of anything when we cannot control the kernel's buffer replacement strategy. At least for open-source/free OSes it would probably be possible to persuade kernel developers to give the needed control to userspace apps. So the "take over all RAM" approach is not the only option ;) To get any real traction we'd have to go back to the "take over most of RAM for shared buffers" approach, which we already know to have a bunch of severe disadvantages. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] ADD/DROP INHERITS
Ühel kenal päeval, K, 2006-06-07 kell 15:33, kirjutas Greg Stark: Tom Lane [EMAIL PROTECTED] writes: I thought we had agreed that the semantics of ADD INHERITS would be to reject the command if the child wasn't already suitable to be a child of the parent. Not to modify it by adding columns or constraints or whatever. For the proposed uses of ADD INHERITS (in particular, linking and unlinking partition tables) an incompatibility in schema almost certainly means you made a mistake, and you don't really want the system helpfully fixing your table to match the parent. I didn't see any discussion like that and I find it pretty surprising. I'm pretty sure that what was discussed was just attaching/detaching child tables into inheritance chains with no table alterations. Maybe it was never mentioned explicitly, but that was how I understood the discussion. Personally I would have agreed. For partitioned tables you certainly don't want it to create new columns without warning you. Exactly! But that's entirely inconsistent with the way inherited tables work in general. It seems to go against the grain of Postgres's general style to implement just the use case that's useful for a particular application rather than keep the features logically consistent with each other. There are too many conflicting definitions of logically consistent, so doing the bare minimum is the best way to avoid the whole problem. Perhaps there should be an option when issuing the ADD INHERITS to indicate whether you want it to create new columns or only match existing columns. That would also give me a convenient excuse to skip all those NOTICEs about merging column definitions. nonono! the whole pg inheritance/partitioning thing is still quite low-level and ADD/DEL INHERITS is the wrong place to start fixing it. Actually I think in the long term for partitioned tables Postgres will have to implement a special syntax just like Oracle and other databases. 
The user doesn't really want to have to manually manage all the partitions as tables. That imposes a lot of extra work to have to define the tables with the right syntax, maintain the constraints properly, etc. Yes. Maybe. But this is something that requires much more thought and planning than adding the simplest possible ADD/DELETE INHERITS. For the user it would be better to have a single property of the partitioned table that specified the partition key. Then when adding a partition you would only have to specify the key range it covers, not write an arbitrary constraint from scratch. Nor would you have to create an empty table with the proper definition first and then add it in. Don't try to solve too many problems at once. Starting with just the possibility to move suitable ready-made partitions in and out of an inheritance chain solves a really big problem. No need to obfuscate it with extra functionality, at least not initially. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
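The minimal attach/detach being argued for here is essentially what later shipped; a sketch with illustrative names, assuming the ALTER TABLE ... INHERIT / NO INHERIT syntax that eventually landed in 8.2:

```sql
-- Parent and a ready-made partition with a matching schema and
-- a CHECK constraint describing the range it covers:
CREATE TABLE measurements (ts timestamp, value numeric);
CREATE TABLE measurements_2006 (
    ts timestamp CHECK (ts >= '2006-01-01' AND ts < '2007-01-01'),
    value numeric
);

-- Attach: only catalog checks and updates, no data rewriting.
ALTER TABLE measurements_2006 INHERIT measurements;

-- Detach is just as cheap:
ALTER TABLE measurements_2006 NO INHERIT measurements;
```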
Re: [HACKERS] How to avoid transaction ID wrap
Ühel kenal päeval, T, 2006-06-06 kell 13:53, kirjutas Christopher Browne: We have triggers that fire if something interesting is found on insert. We want this thing to run for a long time. From the numbers, you can see the PostgreSQL database is VERY loaded. Running VACUUM may not always be possible without losing data. Why? Just run it with very friendly delay settings. Friendly delay settings can have adverse effects; it is likely to make vacuum run on the order of 3x as long, which means that if you have a very large table that takes 12h to VACUUM, vacuum delay will increase that to 36h, which means you'll have a transaction open for 36h. That'll be very evil, to be sure... Not always. I know that it is evil in a slony1 context, but often it *is* possible to design your system in a way where a super-long transaction is almost unnoticeable. Long transactions are evil in case they cause some fast-changing table to grow its storage size by several orders of magnitude, but if that is not the case then they just run there in the background with no ill effects, especially do-nothing transactions like vacuum. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
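"Very friendly delay settings" refers to the cost-based vacuum delay parameters; a sketch (the table name is illustrative and the values are only a starting point to tune against your I/O capacity):

```sql
-- Make this session's VACUUM sleep periodically instead of
-- saturating the disks:
SET vacuum_cost_delay = 20;   -- sleep 20 ms each time the limit is hit
SET vacuum_cost_limit = 200;  -- accumulated page cost before sleeping

VACUUM sometable;
```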
Re: [HACKERS] More on inheritance and foreign keys
Ühel kenal päeval, N, 2006-06-08 kell 08:38, kirjutas Albert Cervera Areny: Hi, after starting this thread http://archives.postgresql.org/pgsql-hackers/2006-05/msg00222.php, I thought I'd finally go for making foreign keys my own way instead of trying to patch PostgreSQL. However, I've realized that managing foreign keys with my own PL/SQL or C function isn't possible as I need DEFERRED checks which are currently only available for foreign keys. Remember that you must manage both ends of a foreign key, and you have to lock the other table while changing values at either end. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
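The DEFERRED checking being referred to, which built-in foreign keys get for free (illustrative tables):

```sql
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child (
    parent_id integer REFERENCES parent (id)
        DEFERRABLE INITIALLY DEFERRED   -- checked at COMMIT, not per row
);

BEGIN;
INSERT INTO child VALUES (1);   -- allowed: no parent row yet
INSERT INTO parent VALUES (1);  -- constraint satisfied before commit
COMMIT;
```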
Re: [HACKERS] How to avoid transaction ID wrap
Ühel kenal päeval, K, 2006-06-07 kell 17:45, kirjutas Jim C. Nasby: Plus, if the only issue here is in fact the long-running transaction for vacuum, there are other ways to address that which would be a lot less intrusive than doing something like going to 64-bit XIDs. IIRC, in 8.2 vacuum will start a new transaction every time it fills up maintenance_work_mem, so just setting that low could solve the problem (at the expense of a heck of a lot of extra IO). If the aim is to *only* avoid transaction wraparound, then maybe we could introduce VACUUM FREEZE ONLY, which never removes any old tuples, but instead just marks them by setting xmin=xmax for them, in addition to its freezing of live-and-visible-to-all tuples. This would avoid touching indexes at all and may well be what is desired for tables with only very few updates/deletes. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] ADD/DROP INHERITS
Ühel kenal päeval, N, 2006-06-08 kell 09:32, kirjutas Greg Stark: Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes: Imho the op should only choose that path if he wants to fill the table before adding the inheritance. It makes no sense to add columns with default values to existing rows of the child table, especially when you inherit the defaults from the parent. We already have ALTER TABLE ADD COLUMN working for columns with defaults, so I think that horse has left the barn. Do you mean that in newer versions ALTER TABLE ADD COLUMN will change existing data without asking me? That would be evil! Even worse if ALTER TABLE ALTER COLUMN SET DEFAULT would do the same. Soon we will be as bad as MS Word! It was awfully annoying for users when that feature was missing. Any non-linearities in the user interface like this end up being surprises and annoyances for users. I would be *really*, *really*, *really* annoyed if an op that I expected to take less than 1 sec takes 5 hours and then forces me to spend another 10 hours on VACUUM FULL+REINDEX or CLUSTER to get performance back. And making such radical changes even between major versions should be strictly forbidden. In any case there's a separate problem with defaults. We want to guarantee that you can DROP a partition and then re-ADD it and the result should be a noop at least from the user's perspective. If DROP partition keeps defaults, and ADD does not change them, then DROP+ADD is a NOOP. We can't do that unless I compromise on my idea that adding a child after the fact should be equivalent to creating it with the parent in the definition. 
When creating a table with the parent in the definition, CREATE TABLE will copy the parent's default if the default in the child is NULL:

postgres=# create table b (i integer default null) inherits (a);
NOTICE:  merging column i with inherited definition
CREATE TABLE
postgres=# \d b
      Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | default 2
Inherits: a

The problem is that it's possible to fiddle with the defaults after the table is created, including dropping a default. If you drop the default and then DROP-ADD the partition it would be a problem if the default magically reappeared. Sure, it should not magically appear. The only way to allow DROP then ADD to be a noop would be to accept whatever the DEFAULT is on the child table without complaint. And I'm not just saying that because it's the easiest for me to implement :) Exactly, that would be the correct behaviour, even for a NULL default. This is already a factor for NOT NULL constraints too. When adding a parent after the fact your NULLable column can magically become NOT NULL if the parent is NOT NULL. But for adding a partition after the fact we can't just change the column to NOT NULL because there may already be NULL rows in the table. Constraints should match, that is, a child table should already have all the constraints of the parent, but may have more. We could do a pass-3 check for the NOT NULL constraint but if we're not doing other schema changes then it makes more sense to just refuse to add such a table. Nono, ADD/DROP INHERITS should not do any data checking, just a comparison of metainfo. The partitions could be huge and having to check data inside them would negate most of the usefulness of ADD/DROP INHERITS. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] How to avoid transaction ID wrap
Ühel kenal päeval, N, 2006-06-08 kell 12:09, kirjutas Tom Lane: Hannu Krosing [EMAIL PROTECTED] writes: If the aim is to *only* avoid transaction wraparound, then maybe we could introduce VACUUM FREEZE ONLY, which never removes any old tuples, but instead just marks them by setting xmin=xmax for them, in addition to its freezing of live-and-visible-to-all tuples. This would avoid touching indexes at all and may well be what is desired for tables with only very few updates/deletes. Seems like useless complexity. If there are so few dead tuples that you can afford to not reclaim them, then there are so few that reclaiming them isn't really going to cost much either ... It will cost 1 full scan per index, which can be quite a lot of disk read traffic, if indexes are not used, say when most access is local to some hotspot. regards, tom lane -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] ADD/DROP INHERITS
Ühel kenal päeval, N, 2006-06-08 kell 11:42, kirjutas Greg Stark: Hannu Krosing [EMAIL PROTECTED] writes: Do you mean that in newer versions ALTER TABLE ADD COLUMN will change existing data without asking me? That would be evil! Even worse if ALTER TABLE ALTER COLUMN SET DEFAULT would do the same.

postgres=# alter table test add b integer default 1;
ALTER TABLE
postgres=# select * from test;
 a | b
---+---
 0 | 1
(1 row)

It was awfully annoying for users when that feature was missing. Any non-linearities in the user interface like this end up being surprises and annoyances for users. I would be *really*, *really*, *really* annoyed if an op that I expected to take less than 1 sec takes 5 hours and then forces me to spend another 10 hours on VACUUM FULL+REINDEX or CLUSTER to get performance back. I forget whether the developer managed to get it working without doing any table rewriting. In theory the table just needs to know that records that are missing that column in the null bitmap should behave as if they have the default value. But I seem to recall some headaches with that approach. I remember that discussion, but I'm surprised that something got implemented and accepted into core with so many unsolvable problems/logical inconsistencies/new pitfalls. For example - to be consistent, one should also make ALTER TABLE ALTER COLUMN col SET DEFAULT x change each default value, no? But how should one know it for records which are updated, possibly in columns newer than the one with the changed DEFAULT? Or was a new default bitmap introduced in addition to the null bitmap? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] ADD/DROP constraints
Ühel kenal päeval, N, 2006-06-08 kell 16:30, kirjutas Greg Stark: On a separate note. The one major remaining piece here is in constraints. I'm thinking what I have to check is that every constraint present on the parent table is present on the child tables. And I'm thinking I should do that by looking at the constraint's textual definition (consrc). This doesn't allow you to get by with a single stronger constraint -- you would still need the redundant looser constraint to satisfy the inheritance. You could find some infrastructure for stronger constraint recognition in constraint exclusion code, if you want to go that way. But it does let you get by with constraint names that don't match the parent's. I'm not sure that's such a good thing, since pg_dump would then generate a redundant constraint when it generates the table. Maybe that would go if constraints got conislocal and coninh. Currently pg_dump generates all constraints with ONLY clause anyway. But I agree that we should get rid of ONLY for ADD CONSTRAINT once we disallow dropping inherited constraints. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
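Comparing constraints by their textual definition might look roughly like this (a sketch; it matches only textually identical definitions, so a logically stronger but differently written constraint would not match, as noted above):

```sql
-- CHECK constraints on the parent with no textually identical
-- counterpart on the child (table names are illustrative):
SELECT pg_get_constraintdef(p.oid) AS missing_on_child
FROM pg_constraint p
WHERE p.conrelid = 'parent_tbl'::regclass
  AND p.contype = 'c'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_constraint c
      WHERE c.conrelid = 'child_tbl'::regclass
        AND pg_get_constraintdef(c.oid) = pg_get_constraintdef(p.oid)
  );
```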
Re: [HACKERS] [PATCHES] ADD/DROP INHERITS
Ühel kenal päeval, L, 2006-06-10 kell 13:06, kirjutas Greg Stark: But perhaps it's just worth those downsides to keep DROP/ADD a noop in more cases. I don't think that keeping "DROP/ADD a noop in more cases" is a goal here. It may be a kind of semi-useful metric of design goodness, but never a goal in itself. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Rethinking stats communication mechanisms
Ühel kenal päeval, P, 2006-06-18 kell 15:09, kirjutas Tom Lane: Magnus Hagander [EMAIL PROTECTED] writes: Might it not be a win to also store per-backend global values in the shared memory segment? Things like time of last command, number of transactions executed in this backend, backend start time and other values that are fixed-size? One thing that is doable in constant-size memory and would be enormously usable for us is counting top-level function calls and storing their total (and possibly also max) duration. The reason being that our production databases are accessed by clients using functions only (with some uninteresting exceptions of course), that is, calls in the form of SELECT x,y,z FROM myfunc(i,j,k) So reserving N*1.5 slots (N being the number of functions defined at database startup) would be ok. If more than N*0.5 functions are defined in the database lifetime, then the rest are simply ignored (not counted). Or maybe a better approach would be to have a conf variable number-of-functions-to-track and a special boolean flag track_me in pg_functions. In this way you don't accidentally run out of shared mem by defining lots of new functions and then restarting the cluster. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Getting rid of extra gettimeofday() calls
Ühel kenal päeval, E, 2006-06-19 kell 11:17, kirjutas Tom Lane: As of CVS tip, PG does up to four separate gettimeofday() calls upon the arrival of a new client command. This is because the statement_timestamp, stats_command_string, log_duration, and statement_timeout features each independently save an indication of statement start time. Given what we've found out recently about gettimeofday() being unduly expensive on some hardware, this cries out to get fixed. I propose that we do SetCurrentStatementStartTimestamp() immediately upon receiving a client message, and then make the other features copy that value instead of fetching their own. Another gettimeofday() call that I would like to get rid of is the one currently done at the end of statement when stats_command_string is enabled: we record current time when resetting the activity_string to IDLE. Is it just IDLE or also IDLE in transaction? If we are going to change things anyway, I'd like the latter to show the time since the start of the transaction, so that I would at least have an easy way to write a transaction timeout script :) I don't really care about what plain IDLE uses. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
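A transaction-timeout script of that kind can be built on the statistics view; a sketch using the pg_stat_activity columns of current releases (at the time of this thread the information was folded into the activity string as '<IDLE> in transaction'):

```sql
-- Backends that have sat idle inside a transaction for over an hour;
-- a cron job could feed the pids to pg_terminate_backend():
SELECT pid, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '1 hour';
```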
Re: [HACKERS] vacuum, performance, and MVCC
Ühel kenal päeval, N, 2006-06-22 kell 09:59, kirjutas Mark Woodward: After a long battle with technology, [EMAIL PROTECTED] (Mark Woodward), an earthling, wrote: Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into It pointed to *ALL* the versions. Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? Yes. I am attaching some code that shows the problem with regard to applications such as web server session management; when run, each second the system can handle fewer and fewer connections. Here is a brief output:

[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3
1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27
1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38
1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. You mean systems that are designed so exactly that they can't take a 10% performance change? Or just that they did not vacuum for so long that performance was less than needed in the end? btw, what did they switch to? Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day. 
On such a site you should design so that db load stays below 50% and run vacuum often, with the right settings; that may even mean that you run vacuum continuously with no wait between runs. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on top as the footest program runs. Yes, you understood correctly - the more updates, the higher the load :) -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
Ühel kenal päeval, N, 2006-06-22 kell 10:20, kirjutas Jonah H. Harris: On 6/22/06, Alvaro Herrera [EMAIL PROTECTED] wrote: Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? Add an index entry, yes. Again, this is a case for update-in-place. No need to write an extra index entry and incur the WAL associated with it. I guess that MySQL on its original storage does that, but they allow only one concurrent update per table and no transactions. Imagine a table with 3 indexes on it... I would estimate that we perform at least 3 to 6 times more overhead than any commercial database on such an update. One way to describe what commercial databases do to keep constant update rates is saying that they do either vacuuming as part of update, or they just use locks and force some transactions to wait or fail/retry. Depending on exact details and optimisations done, this can be either slower or faster than PostgreSQL's way, but they still need to do something to get transactional visibility rules implemented. There has to be a more linear way of handling this scenario. So vacuum the table often. It's easy to say VACUUM often... but I'd bet that vacuuming is going to lessen the throughput in his tests even more; no matter how it's tuned. Running VACUUM often/continuously will likely keep his update rate fluctuations within a corridor of maybe 5-10%, at the cost of 1-2% extra load. At least if vacuum is configured right and the server is not already running at 100% IO saturation, in which case it will be worse. The max throughput figure is not something you actually need very often in production. What is interesting is setting up the server so that you can service your loads comfortably. Running the server at 100% load is not anything you want to do on a production server. There will be things you need to do anyway and you need some headroom for that. 
-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] vacuum, performance, and MVCC
Ühel kenal päeval, N, 2006-06-22 kell 12:41, kirjutas Mark Woodward: Depending on exact details and optimisations done, this can be either slower or faster than PostgreSQL's way, but they still need to do something to get transactional visibility rules implemented. I think they have a different strategy. I think they maintain the notion of a current version of a row, and hunt for previous versions when needed, at least that's how I suspect Oracle does it with redo logs. Not current but last :) And one side effect of redo logs is that it is practically impossible to do large deletes on production databases. So you design around that, like you have to design around the limitations of MVCC. There has to be a more linear way of handling this scenario. So vacuum the table often. It's easy to say VACUUM often... but I'd bet that vacuuming is going to lessen the throughput in his tests even more; no matter how it's tuned. Running VACUUM often/continuously will likely keep his update rate fluctuations within a corridor of maybe 5-10%, at the cost of 1-2% extra load. At least if vacuum is configured right and the server is not already running at 100% IO saturation, in which case it will be worse. Assuming the table is a reasonable size, the I/O required for vacuum doesn't kill everything else! I have solved the problem of unnecessary IO by keeping active and finished rows in separate tables, with the finish() function moving the row between tables. When the number of actively modified rows is only in the tens or low hundreds of thousands (i.e. the modified set fits in memory), the continuous vacuum process shows up as just another backend, not really taking an order of magnitude more resources. It mainly generates WAL traffic, as modified pages are already in memory/cache and are mostly synced by the background writer and/or checkpoint. Of course you have to adjust the vacuum_cost_* variables so as to not saturate IO. 
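The active/finished split mentioned above might be sketched like this; the schema and the finish() function body are illustrative, not the actual production code:

```sql
CREATE TABLE sessions_active   (id bigint PRIMARY KEY, data text);
CREATE TABLE sessions_finished (id bigint PRIMARY KEY, data text);

-- Moving completed rows out keeps the heavily-updated table small,
-- so continuous vacuum only ever covers the hot working set:
CREATE FUNCTION finish(p_id bigint) RETURNS void AS $$
    INSERT INTO sessions_finished
        SELECT * FROM sessions_active WHERE id = p_id;
    DELETE FROM sessions_active WHERE id = p_id;
$$ LANGUAGE sql;
```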
The max throughput figure is not something you actually need very often in production. No, but you need to have some degree of certainty and predictability in the system you are developing. Yup. You have to design it so it has. What is interesting is setting up the server so that you can service your loads comfortably. Running the server at 100% load is not anything you want to do on a production server. There will be things you need to do anyway and you need some headroom for that. Of course, you design it so peaks are easily managed, but unless you run vacuum continuously, and that has its own set of problems, you run into this problem, and it can get really really bad. Usually it gets really bad if you *don't* run vacuum continuously, maybe hoping to do it during slower times at night. For a high-update db you have to run it continuously, maybe having some 5-15 sec pauses between runs. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
Ühel kenal päeval, N, 2006-06-22 kell 13:49, kirjutas Mark Woodward: Christopher Browne [EMAIL PROTECTED] writes: Basically there's no free lunch: if you want the benefits of MVCC it's going to cost you somewhere. In the Postgres design you pay by having to do VACUUM pretty often for heavily-updated tables. I don't think that decision is fundamentally wrong --- the attractive thing about it is that the overhead is pushed out of the foreground query-processing code paths. Under certain circumstances, it is a very poor design. Think of a single row table that keeps a scoreboard or a session table that keeps a limited number of rows that are updated very frequently. A single row table that every session updates is a really bad design on any database, as it is a sure point of lock contention and thus removes any possibility of concurrency. But except for locking problems, it will perform really well when you vacuum often enough :) We still have lots of work to do in making autovacuum smarter, avoiding vacuuming parts of relations that have not changed, and so on. But I have no desire to go over to an Oracle-style solution instead. We can't beat them by trying to be like them, and we run no small risk of falling foul of some of their patents if we do. I proposed having a key row entry for each logical row. The key row entry points to the latest version of the row. There, each row entry is a linked list, in descending order, of previous row versions. Do I understand right that you are proposing a redesign of how indexing works, by updating indexes in-place? How would older rows then be found by transactions needing to see them? Do you suggest reverting to seqscan when we see _any_ newer transactions? Or if you want to have an index pointing to the latest row with each value in the indexed field, how would you find the last time this value was used? Don't tell me that you plan to trace the full update-chain on each update. 
Or would this new indexing mechanism be used only for non-changing key fields? How would you check for that? The vast majority of the time, the latest version will be the first version. Not in a web scenario. In my experience more complicated web pages tend to produce lots of concurrent accesses. It is only when you have a previously started long-running or concurrent transaction that you will ever look at previous versions. I'm not saying it is an easy slam dunk, as I can think of a few difficulties off the top of my head, but it would solve the steady degradation of performance between vacuums and, to a possibly lesser extent, the cost of updating a row in a heavily indexed table. VACUUMing often also solves the problem of steady degradation of performance between vacuums :) No need to be afraid of vacuum. Vacuum is your friend! Just learn to use it right. -- Hannu Krosing ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
On Friday, 2006-06-23 at 10:30, Mark Woodward wrote: What is interesting is setting up the server so that you can service your loads comfortably. Running the server at 100% load is not anything you want to do on a production server. There will be things you need to do anyway and you need some headroom for that. Of course, you design it so peaks are easily managed, but unless you run vacuum continuously, and that has its own set of problems, you run into this problem, and it can get really really bad. Usually it gets really bad if you *don't* run vacuum continuously, maybe hoping to do it in slower times at night. For a high-update db you have to run it continuously, maybe having some 5-15 sec pauses between runs. And how much I/O does this take? Surprisingly, it's mostly WAL traffic; the heap/index pages themselves are often not yet synced to disk by the time of vacuum, so no additional traffic there. If you had made 5 updates per page and then vacuumed it, you would make effectively 1 extra WAL write, meaning a 20% increase in WAL traffic. -- Hannu Krosing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
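The 20% figure above is simple arithmetic; a minimal sketch (the function name is mine, not from the thread, and the "one WAL record per update / one per page cleanup" model is the simplifying assumption stated in the mail):

```python
# Back-of-envelope check of the WAL overhead claimed above: one WAL
# record per row UPDATE, plus one extra WAL record when vacuum later
# cleans the page. (Function name is hypothetical, for illustration.)
def vacuum_wal_overhead(updates_per_page: int) -> float:
    update_records = updates_per_page  # one WAL record per UPDATE on the page
    vacuum_records = 1                 # one extra record logging the page cleanup
    return vacuum_records / update_records

print(vacuum_wal_overhead(5))  # 5 updates per page -> 0.2, i.e. 20% more WAL
```

The same model says the relative overhead shrinks as more updates accumulate per page between vacuum runs.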
Re: [HACKERS] vacuum, performance, and MVCC
On Friday, 2006-06-23 at 18:05, Csaba Nagy wrote: Usually it gets really bad if you *don't* run vacuum continuously, maybe hoping to do it in slower times at night. For a high-update db you have to run it continuously, maybe having some 5-15 sec pauses between runs. And how much I/O does this take? Surprisingly, it's mostly WAL traffic; the heap/index pages themselves are often not yet synced to disk by the time of vacuum, so no additional traffic there. If you had made 5 updates per page and then vacuumed it, you would make effectively 1 extra WAL write, meaning a 20% increase in WAL traffic. Is this also true of read traffic? I thought vacuum will make a full table scan... for big tables a full table scan always badly influences the performance of the box. If the full table scan could be avoided, then I wouldn't mind running vacuum in a loop... I was referring to a design that keeps frequently updated tuples in a separate table. In fact I think that it would make sense to replace the whole current vacuum stuff with a background thread which does that continuously using a dead space map. That could be a heap sorted by tuple deletion time, and always cleaned up, up to the oldest running transaction's start time... there would be no need for any other autovacuum then. This has been on the todo list for some time already. -- Hannu Krosing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
On Friday, 2006-06-23 at 13:08, Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: Surprisingly, it's mostly WAL traffic; the heap/index pages themselves are often not yet synced to disk by the time of vacuum, so no additional traffic there. If you had made 5 updates per page and then vacuumed it, you would make effectively 1 extra WAL write, meaning a 20% increase in WAL traffic. Is this also true of read traffic? I thought vacuum will make a full table scan... for big tables a full table scan always badly influences the performance of the box. If the full table scan could be avoided, then I wouldn't mind running vacuum in a loop... If you're doing heavy updates of a big table then it's likely to end up visiting most of the table anyway, no? There is talk of keeping a map of dirty pages, but I think it'd be a win for infrequently-updated tables, not ones that need constant vacuuming. I think a lot of our problems in this area could be solved with fairly straightforward tuning efforts on the existing autovacuum infrastructure. In particular, someone should be looking into recommendable default vacuum-cost-delay settings so that a background vacuum doesn't affect performance too much. One thing that would help updates quite a lot in some scenarios is keeping the pages only partially filled, so that most updates could keep the new version in the same page. I think that has also been discussed as an option to vacuum and maybe as part of initial inserts. Maybe some of it even ended up as a todo item. Another problem with the current autovac infrastructure is that it doesn't respond very well to the case where there are individual tables that need constant attention as well as many that don't. If you have N databases then you can visit a particular table at most once every N*autovacuum_naptime seconds, and *every* table in the entire cluster gets reconsidered at that same rate. 
I'm not sure if we need the ability to have multiple autovac daemons running at the same time, My patch enabling effective continuous vacuum of fast-update tables, while still being able to vacuum huge slowly changing ones, is still not applied. Without that patch there is no reason to vacuum the small and fast-changing tables while vacuum on bigger tables is running, as it won't clean out dead tuples anyway. but we definitely could use something with a more flexible table-visiting pattern. Perhaps it would be enough to look through the per-table stats for each database before selecting the database to autovacuum in each cycle, instead of going by least recently autovacuumed. Bottom line: there's still lots of low-hanging fruit. Why are people feeling that we need to abandon or massively complicate our basic architecture to make progress? Maybe we could start from reusing the index tuples which point to invisible tuples? The index is not MVCC anyway, so maybe it is easier to do in-place replacement there? This probably has the same obstacles which have prevented us from removing those in the first place (removing instead of marking as invisible). Does it cause some locking issues? Or does it go against some other constraints of our index lookups? I think that just setting the invisible bit in an index leaf node causes nearly as much disk I/O as removing the node. If we could delete/reuse old index tuples, it would solve a sizable chunk of the index-growth problem, especially for cases where the referenced key value does not change. -- Hannu Krosing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] vacuum, performance, and MVCC
On Friday, 2006-06-23 at 17:27, Bruce Momjian wrote: Jonah H. Harris wrote: On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote: What I see in this discussion is a huge amount of the grass must be greener on the other side syndrome, and hardly any recognition that every technique has its downsides and complications. I'm being totally objective. I don't think we should abandon PostgreSQL's overall design at all, because we do perform INSERTs and DELETEs much better than most systems. However, I've looked at many systems and how they implement UPDATE so that it is a scalable operation. Sure, there are costs and benefits to each implementation, but I think we have some pretty brilliant people in this community and can come up with an elegant design for scalable UPDATEs. I think the UPDATE case is similar to the bitmap index scan or perhaps bitmap indexes on disk --- there are cases we know can not be handled well by our existing code, so we have added (or might add) these features to try to address those difficult cases. Not really. Bitmap index scan and bitmap index are both new additions working well within the existing framework. While the problem of slowdown on frequent updates is real, the suggested fix is just plain wrong, as it is based on someone's faulty assumption about how index lookup works, and a very much simplified view of how different parts of the system work together to implement MVCC. The original fix he suggested was for that imagined behaviour and thus ignored all the real problems of such a change. All the later suggestions were variations of the first ones, and failed to address or even research any problems brought up. 
-- Hannu Krosing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] vacuum, performance, and MVCC
On Saturday, 2006-06-24 at 15:44, Jan Wieck wrote: That fixes the symptom, not the problem. The problem is that performance steadily degrades over time. No, you got it backwards. The performance degradation is the symptom. The problem is that there are too many dead tuples in the table. There is one way to solve that problem -- remove them, which is done by running vacuum. Precisely. There are some problems with vacuum itself, that I agree with. For example it would be good if a long-running vacuum wouldn't affect a vacuum running in another table because of the long-running transaction effect it has. It would be good if vacuum could be run partially over a table. It would be good if there was a way to speed up vacuum by using a dead space map or something. It would be good if vacuum wouldn't waste time on blocks that don't have any possible work in them. Vacuum has two main purposes: A) remove dead rows and B) freeze xids. Once a block has zero deleted rows and all xids are frozen, there is nothing to do with this block and vacuum should skip it until a transaction updates that block. This requires 2 bits per block, which is 32K per 1G segment of a heap. Clearing the bits is done when the block is marked dirty. This way vacuum would not waste any time and I/O on huge slow-changing tables. That part, sequentially scanning huge tables that didn't change much, is what keeps us from running vacuum every couple of seconds. Seems like a plan. Still, there is another problem which is not solved by the map approach alone, at least with the current implementation of vacuum. This is the fact that we need to do a full scan over the index(es) to clean up pointers to removed tuples. And huge tables tend to have huge indexes. As indexes have no MVCC info inside them, it may be possible to start reusing index entries pointing to rows that are invisible to all running transactions. Currently we just mark these index entries as dead, but maybe there is a way to reuse them. 
This could solve the index bloat problem for many cases. Another possible solution for indexes with mostly dead pointers is doing a reindex, but this will become possible only after we have implemented a concurrent, non-blocking CREATE INDEX. -- Hannu Krosing ---(end of broadcast)--- TIP 6: explain analyze is your friend
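The space cost claimed above (2 bits per block, 32K per 1G segment) checks out, assuming PostgreSQL's default 8 kB block size; a quick sanity check:

```python
# Sizing the per-block vacuum bitmap sketched above, assuming the
# default 8 kB PostgreSQL block size and 1 GB heap segment files.
BLOCK_SIZE = 8 * 1024          # bytes per heap block
SEGMENT_SIZE = 1024 ** 3       # bytes per heap segment file
BITS_PER_BLOCK = 2             # "has dead rows" bit + "has unfrozen xids" bit

blocks_per_segment = SEGMENT_SIZE // BLOCK_SIZE           # 131072 blocks
bitmap_bytes = blocks_per_segment * BITS_PER_BLOCK // 8   # 32768 bytes = 32 kB
print(blocks_per_segment, bitmap_bytes)
```

So a whole 1 GB segment can be summarized in a bitmap that easily fits in memory, which is what makes skipping unchanged blocks cheap.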
Re: [HACKERS] vacuum, performance, and MVCC
On Saturday, 2006-06-24 at 19:36, Bruce Momjian wrote: Hannu Krosing wrote: On Friday, 2006-06-23 at 13:08, Tom Lane wrote: Bottom line: there's still lots of low-hanging fruit. Why are people feeling that we need to abandon or massively complicate our basic architecture to make progress? Maybe we could start from reusing the index tuples which point to invisible tuples? The index is not MVCC anyway, so maybe it is easier to do in-place replacement there? This probably has the same obstacles which have prevented us from removing those in the first place (removing instead of marking as invisible). Does it cause some locking issues? Or does it go against some other constraints of our index lookups? I think that just setting the invisible bit in an index leaf node causes nearly as much disk I/O as removing the node. If we could delete/reuse old index tuples, it would solve a sizable chunk of the index-growth problem, especially for cases where the referenced key value does not change. I think heap _and_ index reuse is the only useful direction. Index or heap reuse alone seems too marginal for the added complexity. Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. When reusing a heap tuple you have to clean out all index entries pointing to it. -- Hannu Krosing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
On Sunday, 2006-06-25 at 14:24, Bruce Momjian wrote: Jan Wieck wrote: Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that. An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple, and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions. I can confirm that this is exactly what happens when running an update-heavy load with frequent vacuums. Eventually most rows get their own db pages or share the same page with 2-3 rows. And there will be lots of unused (filled up, or cleaned and not yet reused) pages. The overall performance could be made a little better by tuning the system to not put more than N new rows on the same page at initial insert or when rows move to a new page during update. Currently several new rows are initially put on the same page and then move around during repeated updates until they slow(ish)ly claim their own page. Does that help explain it? 
-- Hannu Krosing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] vacuum, performance, and MVCC
On Sunday, 2006-06-25 at 06:52, Mark Woodward wrote: I'm not sure why vacuum can't run similarly to the way it does now. What do you mean? Currently vacuum runs a three-step process: 1) runs a full scan over the heap and collects all dead tuple ctids from the heap 2) runs full scans over all indexes of the relation and removes any pointers pointing to dead tuples 3) runs another full scan over the heap and removes the tuples in the list collected at step 1. No modifications are done to live tuples (ok, they *may* get frozen if they are above a certain age, but this is not relevant to the current discussion). -- Hannu Krosing ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
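The three steps above can be sketched with toy data structures (a dict heap and a dict index, both gross simplifications I am introducing for illustration; real vacuum works on pages, line pointers, and per-index scans):

```python
# Toy model of the three-pass vacuum described above. The heap maps a
# ctid to (value, is_dead); the index maps a key to a ctid.
def vacuum(heap, index):
    # pass 1: full heap scan, collecting ctids of dead tuples
    dead = {ctid for ctid, (_, is_dead) in heap.items() if is_dead}
    # pass 2: full index scan, removing pointers to the dead tuples
    for key in [k for k, ctid in index.items() if ctid in dead]:
        del index[key]
    # pass 3: second heap pass, reclaiming the dead tuples themselves
    for ctid in dead:
        del heap[ctid]
    return dead

heap = {1: ("a", False), 2: ("b", True), 3: ("c", True)}
index = {"a": 1, "b": 2, "c": 3}
vacuum(heap, index)
print(heap, index)  # only the live tuple and its index entry remain
```

The ordering matters: index pointers must be removed (pass 2) before the heap slots are reclaimed (pass 3), or an index entry could briefly point at a reused slot.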
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 14:56, Martijn van Oosterhout wrote: On Mon, Jun 26, 2006 at 07:17:31AM -0400, Bruce Momjian wrote: Correct! We use the same pointers used by normal UPDATEs, except we set a bit on the old tuple indicating it is a single-index tuple, and we don't create index entries for the new tuple. Index scan routines will need to be taught about the new chains, but because only one tuple in the chain is visible to a single backend, the callers should not need to be modified. I suppose we would also change the index_getmulti() function to return a set of ctids plus flags so the caller knows to follow the chains, right? It is probably better to always return the pointer to the head of the CITC chain (the one an index points to) and do extra visibility checks and chain-following on each access. This would keep the change internal to the tuple fetching functions. And for bitmap index scans you would only remember the page in the case of such a tuple, since you can't be sure the exact ctid you've got is the one you want. No, you should only use the pointer to the CITC head outside the tuple access functions. And this pointer to the CITC head is what is always passed to those access functions/macros. The VACUUM would run its passes thus: pass 1: run over heap, collect pointers to single dead tuples, and fully dead CITC chains (fully dead = no live tuples on this page). Clean up old tuples from CITC chains and move live tuples around so that CITC points to the oldest possibly visible (not vacuumed) tuple. Doing it there frees us from the need to collect a separate set of pointers for those. Or have you planned that old tuples from CITC chains are collected on the go/as needed? Of course we could do both. 
pass 2: clean indexes based on ctids from pass 1 pass 3: clean heap based on ctids from pass 1 If you do it this way, you don't need to invent new data structures to pass extra info about CITC internals to passes 2 and 3. One more thing - when should the free space map be notified about free space in pages with CITC chains? -- Hannu Krosing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 09:10, Mark Woodward wrote: On Friday, 2006-06-23 at 17:27, Bruce Momjian wrote: Jonah H. Harris wrote: On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote: What I see in this discussion is a huge amount of the grass must be greener on the other side syndrome, and hardly any recognition that every technique has its downsides and complications. I'm being totally objective. I don't think we should abandon PostgreSQL's overall design at all, because we do perform INSERTs and DELETEs much better than most systems. However, I've looked at many systems and how they implement UPDATE so that it is a scalable operation. Sure, there are costs and benefits to each implementation, but I think we have some pretty brilliant people in this community and can come up with an elegant design for scalable UPDATEs. I think the UPDATE case is similar to the bitmap index scan or perhaps bitmap indexes on disk --- there are cases we know can not be handled well by our existing code, so we have added (or might add) these features to try to address those difficult cases. Not really. Bitmap index scan and bitmap index are both new additions working well within the existing framework. While the problem of slowdown on frequent updates is real, the suggested fix is just plain wrong, as it is based on someone's faulty assumption about how index lookup works, and a very much simplified view of how different parts of the system work together to implement MVCC. Yes, the suggestion was based on MVCC concepts, not a particular implementation. On the contrary - afaik, it was loosely based on how Oracle does it with its rollback segments, only assuming that rollback segments are kept in the heap and that indexes point only to the oldest row version :p The original fix he suggested was for that imagined behaviour and thus ignored all the real problems of such a change. The original suggestion was nothing more than a hypothetical for the purpose of discussion. 
The problem was the steady degradation of performance on frequent updates. That was the point of discussion. I brought up one possible way to start a brainstorm. The discussion then morphed into criticizing the example and not addressing the problem. The problem is heatedly discussed every 3-4 months. Anyway, I think some decent discussion about the problem did happen, and that is good. Agreed. Maybe this _was_ the best way to bring up the discussion again. -- Hannu Krosing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 10:50, Bruce Momjian wrote: Hannu Krosing wrote: On Monday, 2006-06-26 at 14:56, Martijn van Oosterhout wrote: On Mon, Jun 26, 2006 at 07:17:31AM -0400, Bruce Momjian wrote: Correct! We use the same pointers used by normal UPDATEs, except we set a bit on the old tuple indicating it is a single-index tuple, and we don't create index entries for the new tuple. Index scan routines will need to be taught about the new chains, but because only one tuple in the chain is visible to a single backend, the callers should not need to be modified. I suppose we would also change the index_getmulti() function to return a set of ctids plus flags so the caller knows to follow the chains, right? It is probably better to always return the pointer to the head of the CITC chain (the one an index points to) and do extra visibility checks and chain-following on each access. This would keep the change internal to the tuple fetching functions. So index_getnext() traverses the chain and returns one member per call. Makes sense. Just realize you are in a single index entry returning multiple tuples. We will need some record keeping to track that. Maybe we need to push visibility checks further down, so that index_getnext() returns only the one heap row that is visible. And for bitmap index scans you would only remember the page in the case of such a tuple, since you can't be sure the exact ctid you've got is the one you want. No, you should only use the pointer to the CITC head outside the tuple access functions. And this pointer to the CITC head is what is always passed to those access functions/macros. The VACUUM would run its passes thus: pass 1: run over heap, collect pointers to single dead tuples, and fully dead CITC chains (fully dead = no live tuples on this page). Clean up old tuples from CITC chains and move live tuples around so that CITC points to the oldest possibly visible (not vacuumed) tuple. 
Doing it there frees us from the need to collect a separate set of pointers for those. Or have you planned that old tuples from CITC chains are collected on the go/as needed? Of course we could do both. Non-visible CITC members should be freed during an UPDATE on the same page, so vacuum doesn't have to be involved. Ok. pass 2: clean indexes based on ctids from pass 1 pass 3: clean heap based on ctids from pass 1 If you do it this way, you don't need to invent new data structures to pass extra info about CITC internals to passes 2 and 3. One more thing - when should the free space map be notified about free space in pages with CITC chains? Uh, well, I am thinking we only free CITC space when we are going to use it for an UPDATE, rather than free things while doing an operation. It is good to keep the cleanup overhead out of the main path as much as possible. So vacuum should only remove dead CITC chains and leave the ones with live tuples to CITC internal use? That would also suggest that pages having live CITC chains and less than N% of free space should not be reported to the FSM. Also, it seems I can't spell algorithms very well: Definition: Single-Index-Tuple Chain (SITC) - Thinking of vacuum, right now it does these cleanups: o non-visible UPDATEs on the same page with no key changes o non-visible UPDATEs on the same page with key changes o non-visible UPDATEs on different pages o DELETEs o aborted transactions The big question is what percentage of dead space is the first one? My guess is 65%. Can be from 0% to 99.9%, very much dependent on the application. -- Hannu Krosing 
---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 16:58, Martijn van Oosterhout wrote: On Mon, Jun 26, 2006 at 10:50:26AM -0400, Bruce Momjian wrote: I suppose we would also change the index_getmulti() function to return a set of ctids plus flags so the caller knows to follow the chains, right? It is probably better to always return the pointer to the head of the CITC chain (the one an index points to) and do extra visibility checks and chain-following on each access. This would keep the change internal to the tuple fetching functions. So index_getnext() traverses the chain and returns one member per call. Makes sense. Just realize you are in a single index entry returning multiple tuples. We will need some record keeping to track that. Yes, and for index_getmulti (which doesn't visit the heap at all) we'll have to change all the users of that (which aren't many, I suppose). It's probably worth making a utility function to expand them. I'm still confused where bitmap index scans fit into all of this. Is preserving the sequential scan aspect of these a goal with this new setup? Bitmap index scan does not have to change much - only the function that fetches a tuple by its ctid must be able to trace forward chains within the page. -- Hannu Krosing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
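The chain-following fetch described above might look like this (the slot layout is a hypothetical simplification of mine: each slot stores only its successor and a visibility flag, whereas a real implementation would consult xmin/xmax against the caller's snapshot):

```python
# Sketch of fetching the visible tuple via a CITC chain: the index (or a
# bitmap scan) always hands us the chain head; the fetch routine walks
# the in-page forward chain until it finds the version visible to us.
def fetch_visible(page, head_slot):
    slot = head_slot
    while slot is not None:
        next_slot, visible = page[slot]
        if visible:
            return slot
        slot = next_slot
    return None  # every version in the chain is invisible to this snapshot

# head at slot 0, with two newer versions chained behind it in the page
page = {0: (1, False), 1: (2, False), 2: (None, True)}
print(fetch_visible(page, 0))
```

This is why the external ctid can stay stable: callers only ever hold the head slot, and the walk happens inside the fetch routine.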
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 23:08, Bruce Momjian wrote: Jim C. Nasby wrote: On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote: It is certainly possible to do what you are suggesting, that is have two index entries point to the same chain head, and have the index access routines figure out if the index qualifications still hold, but that seems like a lot of overhead. I think Jim meant not 2 pointing to the same head, but 2 pointing into the same chain. Say we have a table with (id serial, ts timestamp) where ts changes at each update and id does not. So after 3 updates on one page we have one CITC/ITPC head with pointers from both indexes and two follow-up tuples with pointers from only the ts index. The problem with this setup is that we can't reuse any of those follow-up tuples without index cleanup. Also, once there is only one visible row in the chain, removing old index entries seems quite complex, because you have to have vacuum keep the qualifications of each row to figure out which index tuple is the valid one (seems messy). Perhaps my point got lost... in the case where no index keys change during an update, SITC seems superior in every way to my proposal. My idea (let's call it Index Tuple Page Consolidation, ITPC) would be beneficial to UPDATEs that modify one or more index keys but still put the tuple on the same page. Where SITC would be most useful for tables that have a very heavy update rate and very few indexes, ITPC would benefit tables that have more indexes on them, where presumably it's much more likely for UPDATEs to change at least one index key (which means SITC goes out the window, if I understand it correctly). If I'm missing something and SITC can in fact deal with some index keys changing during an UPDATE, then I see no reason for ITPC. I understood what you had said. The question is whether we want to get that complex with this feature, and if there are enough use cases (UPDATE with index keys changing) to warrant it. 
I'd like to think that most heavily-updated tables avoid that, but there may still be cases where this is needed. -- Hannu Krosing ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
On Tuesday, 2006-06-27 at 10:38, Hannu Krosing wrote: On Monday, 2006-06-26 at 23:08, Bruce Momjian wrote: Jim C. Nasby wrote: On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote: It is certainly possible to do what you are suggesting, that is have two index entries point to the same chain head, and have the index access routines figure out if the index qualifications still hold, but that seems like a lot of overhead. I think Jim meant not 2 pointing to the same head, but 2 pointing into the same chain. Say we have a table with (id serial, ts timestamp) where ts changes at each update and id does not. So after 3 updates on one page we have one CITC/ITPC head with pointers from both indexes and two follow-up tuples with pointers from only the ts index. The problem with this setup is that we can't reuse any of those follow-up tuples without index cleanup. But we still have to think about similar cases (index entries pointing inside CITC chains), unless we plan to disallow adding indexes to tables. Perhaps that case has to simply disable heap tuple reuse until some event. What would that event be? Or maybe we should have some bitmap of dirty tuple ids inside each page, that is, tuple ids that have index pointers to them, and then avoid reusing these? -- Hannu Krosing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
On Mon, 2006-06-26 at 11:31, Bruce Momjian wrote: Hannu Krosing wrote: pass 3: clean heap based on ctid from pass 1 If you do it this way, you don't need to invent new data structures to pass extra info about CITC internals to passes 2 and 3. One more thing: when should the free space map be notified about free space in pages with CITC chains? Uh, well, I am thinking we only free CITC space when we are going to use it for an UPDATE, rather than free things while doing an operation. It is good to keep the cleanup overhead out of the main path as much as possible. So vacuum should only remove dead CITC chains and leave the ones with live tuples to CITC internal use? Yes, it has to. What else would it do? Add index entries? No, clean out the dead part. But this would probably add the page to the FSM; do we want that? Also, this cleaning should probably be done at pass 1, so we don't have to carry the ctids of tuples which have no index entries around to passes 2 and 3. This has the downside of possibly writing the heap page twice, so maybe we don't want it. That would also suggest that pages having live CITC chains and less than N% of free space should not be reported to the FSM. Parts of the CITC that are not visible can be used for free space by vacuum, but the visible part is left alone. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Single Index Tuple Chain (SITC) method
On Wed, 2006-06-28 at 18:19, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Here is an overview of the SITC method: http://momjian.us/cgi-bin/pgsitc A pretty fundamental problem is that the method assumes it's OK to change the CTID of a live tuple (by swapping its item pointer with some expired version). It is not --- this will break: * active UPDATEs and DELETEs that may have fetched the CTID but not yet completed processing to decide whether to change the tuple; * pending AFTER ROW triggers, such as foreign key checks; * ODBC as well as other applications that assume CTID is a usable unique row identifier within transactions. We should *always* return the ctid of the CITC head, as this is the one that does not change. And anyway, ctid is a usable unique row identifier only within read-only transactions, or not? VACUUM FULL can get away with moving tuples to new CTIDs because it takes AccessExclusiveLock, so there can be no open transactions with knowledge of current CTIDs in the table. This is not OK for something that's supposed to happen in plain UPDATEs, though. Would it still be a problem if we *always* refer to the whole CITC chain by its externally visible ctid, and look up the real tuple inside the tuple fetch op at every access? (1) If we had some special bits for tuples at the CITC chain head and inside the CITC but not at the head, then even a seqscan could ignore non-head CITC chain members in its find-next-tuple op and do the real tuple lookup in some inner function when it hits a CITC head. Is it correct to assume that only one row version can be in the process of being modified at any one time? Another problem is you can't recycle tuples, nor item ids, without taking a VACUUM-style lock on the page (LockBufferForCleanup). If anyone else is holding a pin on the page they risk getting totally confused --- for instance, a seqscan will either miss a tuple or scan it twice depending on which direction you're juggling item ids around it. 
I think (1) above solves this, at the cost of looking twice at CITC internal tuple headers. The concurrency loss involved in LockBufferForCleanup is OK for background-maintenance operations like VACUUM, but I seriously doubt anyone will find it acceptable for UPDATE. It could easily create application-level deadlocks, too. (VACUUM is safe against that because it only holds one lock.) Tom, what do you think of the other related idea, that of reusing dead index entries? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
On Tue, 2006-06-27 at 12:16, Bruce Momjian wrote: Hannu Krosing wrote: On Tue, 2006-06-27 at 10:38, Hannu Krosing wrote: On Mon, 2006-06-26 at 23:08, Bruce Momjian wrote: Jim C. Nasby wrote: On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote: It is certainly possible to do what you are suggesting, that is have two index entries point to the same chain head, and have the index access routines figure out if the index qualifications still hold, but that seems like a lot of overhead. I think Jim meant not 2 pointing to the same head, but 2 pointing into the same chain. Say we have a table with (id serial, ts timestamp) where ts changes at each update and id does not. So after 3 updates on one page we have one CITC/ITPC head with pointers from both indexes, and two follow-up tuples with pointers from only the ts index. The problem with this setup is that we can't reuse any of those follow-up tuples without index cleanup. But we still have to think about similar cases (index entries pointing inside CITC chains), unless we plan to disallow adding indexes to tables. CREATE INDEX has to undo any chains where the new indexed columns change in the chain, and add index entries to remove the chain. Yes, that would be the most straightforward solution. It could be better in some cases if we could avoid adding entries to other indexes. Maybe we can just reset some flags, so that some SITC ops, like finding tuples by the CITC index pointer, still work while adding new entries won't. But it will be tricky to make this work for bitmap index scans. So yes, index build is a slow operation anyway, so making it even a little slower is probably not a big problem. And since most CITC chains will have only one visible row at a time, this will probably not be a big issue. 
-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Single Index Tuple Chain (SITC) method
On Thu, 2006-06-29 at 12:35, Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: Tom - what do you think of the other related idea, that of reusing dead index entries? Possibly workable for btree now that we do page-at-a-time index scans; however I'm pretty hesitant to build any large infrastructure atop that change until we've got more performance results. We might yet end up reverting it. Another issue is that this would replace a simple hint-bit setting with an index change that requires a WAL entry. There'll be more WAL traffic altogether from backends retail-deleting index tuples than there would be from VACUUM cleaning the whole page at once --- and it won't cut the I/O demand from VACUUM any, either, since VACUUM still has to scan the index. AFAICS this wouldn't make VACUUM either cheaper or less necessary, so I'm not sure I see the point. How can it generate more traffic? When you replace a dead index entry with a live one, you just reuse space; you would have to WAL-log the index in both cases (adding a new entry or replacing a dead entry). Especially in the case where you replace an index entry with the same value. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Index corruption
On Thu, 2006-06-29 at 16:42, Chris Browne wrote: [EMAIL PROTECTED] (Marc Munro) writes: As you see, slony is attempting to enter one tuple ('374520943','22007','0') two times. Each previous time we have had this problem, rebuilding the indexes on the slony log table (sl_log_1) has fixed the problem. I have not reindexed the table this time as I do not want to destroy any usable evidence. We have seen this phenomenon on 7.4.8 several times; pulled dumps of sl_log_1 and index files that Jan Wieck looked at, which alas hasn't led to a fix. He did, mind you, find some concurrency pattern that led, if memory serves, to 7.4.12's release. We had experienced cases where there was some worse corruption that required that we rebuild replicas from scratch :-(. How well did you check the C-language triggers and special Slony functions for possibly corrupting some backend/shared-mem structures? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Index corruption
On Thu, 2006-06-29 at 17:23, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: Tom, we have a newer and much smaller (35M) file showing the same thing: Thanks. Looking into this, what I find is that *both* indexes have duplicated entries for the same heap tuple: ... However, the two entries in idx1 contain different data!! What I speculate right at the moment is that we are not looking at index corruption at all, but at heap corruption: somehow, the first insertion into ctid (27806,2) got lost and the same ctid got re-used for the next inserted row. We fixed one bug like this before ... Marc: do you have triggers on some replicated tables? I remember having some corruption in a database with weird circular trigger structures, some of them being Slony log triggers. The thing that seemed to mess something up inside there was when a change on a parent row fired a trigger that changed child table rows, and those rows fired another trigger that changed the same parent row again. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Index corruption
On Fri, 2006-06-30 at 12:05, Jan Wieck wrote: On 6/30/2006 11:55 AM, Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: On 6/30/2006 11:17 AM, Marko Kreen wrote: If the xxid-s come from different DB-s, then there can still be problems. How so? They are always part of a multi-key index having the originating node ID first. Really?

create table @[EMAIL PROTECTED] (
    log_origin    int4,
    log_xid       @[EMAIL PROTECTED],
    log_tableid   int4,
    log_actionseq int8,
    log_cmdtype   char,
    log_cmddata   text
);
create index sl_log_1_idx1 on @[EMAIL PROTECTED] (log_origin, log_xid @[EMAIL PROTECTED], log_actionseq);
create index sl_log_1_idx2 on @[EMAIL PROTECTED] (log_xid @[EMAIL PROTECTED]);

You're right ... forgot about that one. And yes, there can be transactions originating from multiple origins (masters) in the same log. The thing is, the index is only there because in a single-origin situation (most installations are), the log_origin is always the same. The optimizer therefore sometimes didn't think using an index at all would be good. However, transactions from different origins are NEVER selected together, and it wouldn't make sense to compare their xid's anyway. So the index might return index tuples for rows from another origin, but the following qualifications against the log_origin in the heap tuple will filter them out. The problem was not only with returning too many rows, but just as much with returning too few. In the case where too few rows are returned, some actions will simply be left out of replication and thus will be missing from the slaves. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
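The filtering behaviour described above (an index on log_xid alone returning rows from another origin, with the heap qualification on log_origin discarding them afterwards) can be sketched as a toy model; the data and function names are made up for illustration:

```python
# Toy model: sl_log_1 rows from two origins sharing the same xid value.
rows = [
    {"log_origin": 1, "log_xid": 100, "log_actionseq": 1},
    {"log_origin": 2, "log_xid": 100, "log_actionseq": 7},
    {"log_origin": 1, "log_xid": 101, "log_actionseq": 2},
]

def index_scan_by_xid(xid):
    # sl_log_1_idx2 equivalent: matches on xid regardless of origin
    return [r for r in rows if r["log_xid"] == xid]

def select_log(origin, xid):
    # the heap recheck applies the log_origin qualification
    return [r for r in index_scan_by_xid(xid) if r["log_origin"] == origin]

assert len(index_scan_by_xid(100)) == 2   # index returns both origins
assert len(select_log(1, 100)) == 1       # heap qual filters the other out
```

Returning too *few* rows has no analogue in this model: that failure mode requires actual index corruption, which is exactly what made it dangerous for replication.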
Re: [HACKERS] update/insert, delete/insert efficiency WRT vacuum
On Tue, 2006-07-04 at 14:53, Zeugswetter Andreas DCP SD wrote: Is there a difference in PostgreSQL performance between these two different strategies: if(!exec(update foo set bar='blahblah' where name = 'xx')) exec(insert into foo(name, bar) values('xx','blahblah')); or In pg, this strategy is generally more efficient, since a failing pk insert would create a tx abort and a heap tuple. (So in pg, I would choose the insert-first strategy only when the insert succeeds most of the time (say 95%).) Note however that the above error handling is not enough, because two different sessions can still both end up trying the insert (this is true for all db systems when using this strategy). I think the recommended strategy is: first try the UPDATE; if no row was found, then INSERT; if the INSERT fails with a primary key violation, then UPDATE again. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
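The update-first strategy described above can be sketched in Python. SQLite is used here purely to keep the demo self-contained; against PostgreSQL of this era the same loop would be run over a database driver, with the unique-violation error caught instead of sqlite3.IntegrityError:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT PRIMARY KEY, bar TEXT)")

def upsert(name, bar):
    # 1. try the UPDATE first
    cur = conn.execute("UPDATE foo SET bar = ? WHERE name = ?", (bar, name))
    if cur.rowcount > 0:
        return "updated"
    # 2. no row found: try the INSERT
    try:
        conn.execute("INSERT INTO foo (name, bar) VALUES (?, ?)", (name, bar))
        return "inserted"
    except sqlite3.IntegrityError:
        # 3. another session inserted concurrently: fall back to UPDATE
        conn.execute("UPDATE foo SET bar = ? WHERE name = ?", (bar, name))
        return "updated-after-conflict"

assert upsert("xx", "blahblah") == "inserted"
assert upsert("xx", "new") == "updated"
```

Step 3 is what closes the race the mail points out: between the failed UPDATE and the INSERT, another session may have inserted the same key.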
Re: [HACKERS] A couple thoughts about btree fillfactor
On Mon, 2006-07-10 at 12:36, Tom Lane wrote: 3. What should the minimum fillfactor be? The patch as submitted set the minimum to 50% for all relation types. I'm inclined to think we should allow much lower fillfactors, maybe down to 10%. A really low fillfactor could be a good idea in a heavily updated table --- at least, I don't think we have any evidence to prove that it's not sane to want a fillfactor below 50%. Sure, 50% is way too big as a lower limit. We may even want to have pages that have only 1 tuple in heavy-update cases. So perhaps we should set the minimum to 1% or even 0.1%, and apply logic similar to what you suggested for btree pages above, that is, stop adding new tuples when the threshold is reached. Comments? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Warm-Standby using WAL archiving / Seperate
On Tue, 2006-07-11 at 08:38, Andrew Rawnsley wrote: Just having a standby mode that survived shutdown/startup would be a nice start... I think that Simon Riggs did some work on this at the code sprint yesterday. I also do the blocking-restore-command technique, which although workable, has a bit of a house-of-cards feel to it sometimes. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Max size of a btree index entry
On Tue, 2006-07-11 at 10:46, Josh Berkus wrote: Tom, Obviously a tree containing many such pages would be awfully inefficient to search, but I think a more common case is that there are a few wide entries in an index of mostly short entries, and so pushing the hard limit up a little would add some flexibility with little performance cost in real-world cases. Have I missed something? Is this worth changing? Not sure. I don't know that the difference between 2.7K and 3.9K would have ever made a difference to me in any real-world case. One (hopefully) soon-to-be real-world case is index-only queries. We discussed one approach with Luke, and he expressed interest in getting it actually done in the not-too-distant future. If we're going to tinker with this code, it would be far more valuable to automatically truncate b-tree entries at, say, 1K so that they could be efficiently indexed. That would not work if we want to get all data from indexes. Maybe compressing the keys (like we do for TOAST) would be a better solution. Of course, a quick archives search of -SQL, -Newbie and -General would indicate how popular of an issue this is. It may become popular again once we are able to do index-only scans. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Three weeks left until feature freeze
On Tue, 2006-07-11 at 17:34, Thomas Hallgren wrote: Josh Berkus wrote: Tom, IOW pljava is (already) bigger than the other four PLs put together. That is odd. Thomas? It's not that odd really: 1. the mapping is strongly typed, i.e. each scalar type in PostgreSQL has a set of functions that maps it to the correct primitive in Java (int4 is a java int, double precision is a double etc.). PL/Java will resort to string coercion only when no other option is left. 2. a type mapping is provided for *all* types. Scalar, composite, pseudo, array types, and result sets. 3. new Java mappings can be created on the fly. Both for scalar and composite types. 4. you can create new scalar types in PostgreSQL that use I/O functions written in Java. Maybe this functionality could be lifted out of PL/Java and made available to all PLs? At least at some API level. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] More nuclear options
On Tue, 2006-07-11 at 14:05, Josh Berkus wrote: Robert, Given the current number of projects that have no code / files / anything associated with them on pgfoundry/gborg right now, this argument rings a little hollow. If you're so keen to add to the problem, you can have my spot as pgfoundry admin Why not just make *one* project, called DeadProjects, and keep one tarball + one README.TXT per directory under it, so that in the unlikely event that someone (pg_necromancer?) does want to resurrect a dead project, they have a place to get the code from. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Three weeks left until feature freeze
On Wed, 2006-07-12 at 09:49, Kaare Rasmussen wrote: There should be a Procedural Language section on pgfoundry for all of the PLs, IMHO, and a README in contrib within core that points to it (README.procedural_languages, if nothing else) ... I thought that the general consensus was that only plpgsql ought to be in core, and the rest should be independent projects. That would be doable if we had a stable language API. As I understand it, we still don't. What's more, most of the changes to the API come from the needs of those (new) languages. It would be nice to have an easy way to retrieve and install the desired PLs, but that's more of a packaging issue. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgsql-patches considered harmful
On Wed, 2006-07-12 at 23:04, Marc G. Fournier wrote: On Wed, 12 Jul 2006, Magnus Hagander wrote: There are list servers out there capable of simply ripping any attachments from a message (possibly over a certain size) and sticking them on a website, replacing them with a link in the email. Is majordomo one of them? Majordomo2 has a 'hook' for it, but, like most OSS software, nobody has had the requirement to actually code it ... any Perl experts here interested in doing it? Does it have to be Perl? I can do it in Python in an hour or two. Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
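A minimal sketch of the attachment-ripping idea in Python, using the standard library email package. The URL scheme is made up for illustration, and a real hook would also have to store the payload on the web server:

```python
import email
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication

def strip_attachments(raw_msg, base_url):
    """Replace non-text parts with a text placeholder pointing at a URL."""
    msg = email.message_from_string(raw_msg)
    if not msg.is_multipart():
        return msg
    kept = []
    for i, part in enumerate(msg.get_payload()):
        if part.get_content_maintype() == "text":
            kept.append(part)
        else:
            # a real implementation would write part.get_payload(decode=True)
            # to the web server's document root here
            fname = part.get_filename() or "attachment-%d" % i
            kept.append(MIMEText("[attachment %s moved to %s/%s]" %
                                 (fname, base_url, fname)))
    msg.set_payload(kept)
    return msg

# demo message: one text part plus one binary attachment
m = MIMEMultipart()
m.attach(MIMEText("see attached patch"))
m.attach(MIMEApplication(b"\x00\x01", Name="patch.bin"))

out = strip_attachments(m.as_string(), "http://example.org/attachments")
```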
Re: [HACKERS] Three weeks left until feature freeze
On Wed, 2006-07-12 at 17:48, Thomas Hallgren wrote: Andrew Dunstan wrote: There is in effect no API at all, other than what is available to all backend modules. If someone wants to create an API which will be both sufficiently stable and sufficiently complete to meet the needs of the various PLs (especially, as Hannu rightly observes, any new PLs that come along) then we can revisit this question. Until then I suggest that it is at best premature. I am not even sure such a thing is actually possible. I concur with this. The needs of a module like PL/Java are very different from the needs of PL/Perl, so let's get some more PLs in before we do a refactoring effort to create common APIs. Personally, I'm not sure what would be included. The call handler APIs together with the SPI APIs are in essence what you need. The rest is fairly specialized anyway. http://pgfoundry.org/projects/python seems to do something similar to what you describe for PL/Java. For example, it uses PostgreSQL's native types and operators for calculations instead of converting them to the PL's native types. And it also has lots of code lines ;) -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Forcing wal rotation
On Fri, 2006-07-14 at 17:39, Simon Riggs wrote: On Fri, 2006-07-14 at 12:09 -0400, Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: I've now thought about how to fix that without doing that rather crude rsync-pg_xlog hack. I've read through the code, and learned that WAL segments are expected to have a specific size - thus rotating them early is not that easy. Simon was working on a patch for this at the code sprint; I think it's submitted to -patches already. Slightly different patch. I'm working on this one still. What is your approach here? And by any chance, do you plan to backport the standby WAL playback mode patches to the 8.0 and 8.1 series? Explicitly filling the segment as you propose would be really bad for performance. Yes, the current approach I am taking is better than that. Another thing that was discussed was adding a function to postgres that could be called to get the current WAL file and offset, so an external process could do async WAL copying while the WAL is being written, instead of copying it all when the segment is finished. This could reduce the lag of data availability to only (fractions of) seconds. Is anyone working on it? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
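For an external copier following the current WAL position, the mapping from a write position to a segment file name can be sketched as below. This is a rough model of the 8.x naming scheme (timeline, log id and segment number as 8 hex digits each, 16MB segments by default), written from memory rather than taken from the source tree:

```python
# Sketch of WAL segment file naming: 24 hex characters made up of the
# timeline ID, the xlog "logical file" id, and the 16MB segment number
# within that logical file, each zero-padded to 8 hex digits.
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default --with-wal-segsize

def wal_segment_name(timeline, log_id, byte_offset):
    """Name of the segment file containing byte_offset of logical file log_id."""
    seg = byte_offset // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, log_id, seg)

assert wal_segment_name(1, 0, 0) == "000000010000000000000000"
assert wal_segment_name(1, 2, 3 * XLOG_SEG_SIZE) == "000000010000000200000003"
```

An external process polling the current WAL position could use this to decide which file in pg_xlog to keep copying from; later PostgreSQL releases exposed server-side functions for this very purpose.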