Re: [HACKERS] GROUPING SETS revisited
Hello 2010/8/3 Joshua Tolley : > In case anyone's interested, I've taken the CTE-based grouping sets patch from > [1] and made it apply to 9.1, attached. I haven't yet done things like checked > it for whitespace consistency, style conformity, or anything else, but (tuits > permitting) hope to figure out how it works and get it closer to commitability > in some upcoming commitfest. > > I mention it here so that if someone else is working on it, we can avoid > duplicated effort, and to see if a CTE-based grouping sets implementation is > really the way we think we want to go. > I am plaing with it now :). I have a plan to replace CTE with similar but explicit executor node. The main issue of existing patch was using just transformation to CTE. I agree, so it isn't too much extensiable in future. Now I am cleaning identifiers little bit. Any colaboration is welcome. My plan: 1) clean CTE patch 2) replace CTE with explicit executor node, but still based on tuplestore 3) when will be possible parallel processing based on hash agg - then we don't need to use tuplestore comments?? Regards Pavel > [1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.9 (GNU/Linux) > > iEYEARECAAYFAkxXrggACgkQRiRfCGf1UMMlCQCglaIdtPj8Qe6G60V2LHn5pFNn > kgIAniXRgIQEbVrK/eDVZnmKCzw33lT9 > =XVVV > -END PGP SIGNATURE- > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GROUPING SETS revisited
In case anyone's interested, I've taken the CTE-based grouping sets patch from [1] and made it apply to 9.1, attached. I haven't yet done things like checked it for whitespace consistency, style conformity, or anything else, but (tuits permitting) hope to figure out how it works and get it closer to commitability in some upcoming commitfest. I mention it here so that if someone else is working on it, we can avoid duplicated effort, and to see if a CTE-based grouping sets implementation is really the way we think we want to go. [1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg00700.php -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile index a8f4c07..fb248a6 100644 *** a/src/backend/parser/Makefile --- b/src/backend/parser/Makefile *** override CPPFLAGS := -I. -I$(srcdir) $(C *** 15,21 OBJS= analyze.o gram.o keywords.o kwlookup.o parser.o \ parse_agg.o parse_clause.o parse_coerce.o parse_cte.o parse_expr.o \ parse_func.o parse_node.o parse_oper.o parse_param.o parse_relation.o \ ! parse_target.o parse_type.o parse_utilcmd.o scansup.o FLEXFLAGS = -CF --- 15,21 OBJS= analyze.o gram.o keywords.o kwlookup.o parser.o \ parse_agg.o parse_clause.o parse_coerce.o parse_cte.o parse_expr.o \ parse_func.o parse_node.o parse_oper.o parse_param.o parse_relation.o \ ! parse_target.o parse_type.o parse_utilcmd.o scansup.o parse_gsets.o FLEXFLAGS = -CF diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 6b99a10..1b579a8 100644 *** a/src/backend/parser/analyze.c --- b/src/backend/parser/analyze.c *** *** 34,39 --- 34,40 #include "parser/parse_clause.h" #include "parser/parse_coerce.h" #include "parser/parse_cte.h" + #include "parser/parse_gsets.h" #include "parser/parse_oper.h" #include "parser/parse_param.h" #include "parser/parse_relation.h" *** parse_sub_analyze(Node *parseTree, Parse *** 150,155 --- 151,313 } /* + * process GROUPING SETS + */ + static SelectStmt * + makeSelectStmt(List *targetList, List *fromClause) + { + SelectStmt *n = makeNode(SelectStmt); + n->distinctClause = NULL; + n->intoClause = NULL; + n->targetList = targetList; + n->fromClause = fromClause; + n->whereClause = NULL; + n->groupClause = NULL; + n->havingClause = NULL; + n->windowClause = NIL; + n->withClause = NULL; + n->valuesLists = NIL; + n->sortClause = NIL; + n->limitOffset = NULL; + n->limitCount = NULL; + n->lockingClause = NIL; + n->op = SETOP_NONE; + n->all = false; + n->larg = NULL; + n->rarg = NULL; + return n; + } + + static List * + makeStarTargetList(void) + { + ResTarget *rt = makeNode(ResTarget); + + rt->name = NULL; + rt->indirection = NIL; + rt->val = (Node *) makeNode(ColumnRef); + ((ColumnRef *) rt->val)->fields = list_make1(makeNode(A_Star)); + rt->location = -1; + + return list_make1(rt); + } + + static SelectStmt * + transformGroupingSets(ParseState *pstate, SelectStmt *stmt) + { + if (stmt->groupClause && IsA(stmt->groupClause, GroupByClause)) + { + GroupingSetsSpec *gss = (GroupingSetsSpec *) expandGroupingSets(pstate, + (List *)((GroupByClause *)stmt->groupClause)->fields); + + if (pstate->p_hasGroupingSets) + { + CommonTableExpr *cte = makeNode(CommonTableExpr); + SelectStmt *cteedstmt; + int ngroupingsets = list_length(gss->set_list) + (gss->has_empty_set ? 1 : 0); + bool all = ((GroupByClause *) stmt->groupClause)->all; + + cteedstmt = makeSelectStmt(NIL, NIL); + cteedstmt->intoClause = stmt->intoClause; + cteedstmt->sortClause = stmt->sortClause; + cteedstmt->limitOffset = stmt->limitOffset; + cteedstmt->limitCount = stmt->limitCount; + cteedstmt->lockingClause = stmt->lockingClause; + + cte->ctename = "**g**"; + cte->ctequery = (Node *) stmt; + cte->location = -1; + + cteedstmt->withClause = makeNode(WithClause); + cteedstmt->withClause->ctes = list_make1(cte); + cteedstmt->withClause->recursive = false; + cteedstmt->withClause->location = -1; + + /* when is more than one grouping set, then we should generate setop node */ + if (ngroupingsets > 1) + { + /* add quuery under union all for every grouping set */ + SelectStmt *larg = NULL; + SelectStmt *rarg; + ListCell*lc; + + foreach(lc, gss->set_list) + { + List *groupClause; + + Assert(IsA(lfirst(lc), List)); + groupClause = (List *) lfirst(lc); + + if (larg == NULL) + { + larg = makeSelectStmt(copyObject(stmt->targetList), + list_make1(makeRangeVar(NULL, "**g**", -1))); + larg->groupClause = (Node *) groupClause; + larg->havingClause = copyObject(stmt->havingClause); + } + else + { + SelectStmt *setop = makeSelectStmt(NIL, NIL); + + rarg
Re: [HACKERS] review: psql: edit function, show function commands patch
2010/8/3 Robert Haas : > On Mon, Aug 2, 2010 at 11:17 PM, Tom Lane wrote: >> Robert Haas writes: >>> On Mon, Aug 2, 2010 at 10:49 PM, Tom Lane wrote: I'm tempted to suggest forgetting about any user-configurable parameter and just provide code that strcmp's the $EDITOR value to see if it recognizes the editor name, otherwise do nothing. >> >>> With all due respect, that sounds like an amazingly bad idea. Surely, >>> we'll be forever getting patches to add $MYFAVORITEEDITOR to the list, >>> or complaints that it's not already included. >> >> Well, yeah, that's the idea. I say that beats a constant stream of >> complaints that $MYFAVORITEEDITOR no longer works at all --- which >> is what your concern was, no? > > Well, it'd still work fine for \e foo. It'll just blow up for \e foo > 3. My concern isn't really that things that which work now will break > so much as that this new feature will fail to work for a large > percentage of the people who try to use it, including virtually > everyone who tries to use it on Win32. > >>> While this is >>> superficially a Nice Thing to Have and I would certainly support it if >>> +linenumber were relatively universal, it's really a pretty minor >>> convenience when you come right down to it, and I am not at all >>> convinced it is worth the hassle of trying to divine what piece of >>> syntax will equip the user's choice of editor with the necessary >>> amount of clue. >> >> The other approach we could take is that this whole thing is disabled by >> default, and you have to set a psql variable EDITOR_LINENUMBER_SWITCH >> to turn it on. If you haven't read the documentation enough to find >> out that variable exists, well, no harm no foul. > > That might be reasonable. Right now the default behavior is to do > +line on Linux and /line on Windows. But maybe a more sensible > default would be to fail with an error message that says "you have to > set thus-and-so variable if you want to use this feature". That seems > better than sometimes working and sometimes failing depending on what > editor the user has configured. > I like this idea > A side question is whether this should be an environment variable or a > psql variable. > it can be just psql variable. Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
2010/8/3 Robert Haas : > On Sun, Aug 1, 2010 at 11:48 PM, Robert Haas wrote: >>> b) more robust algorithm for header rows identification >> >> Have not gotten to this one yet. > > I notIce that on WIN32 the default editor is notepad.exe and the > default editor navigation option is "/". Does "notepad.exe /lineno > filename" actually work on Windows? A quick Google search suggests > that the answer is "no", which seems somewhat unfortunate: it means > we'd be shipping "broken on Win32 out of the box". > > http://www.robvanderwoude.com/commandlineswitches.php#Notepad notapad supports nothing :( Microsoft doesn't use command line. I found this syntax in pspad. Next other editor is notepad++. It use a syntax -n. Wide used KDE editors use --line syntax > > This is actually my biggest concern about this patch - that it may be > just too much of a hassle to actually make it work for people. I just > tried setting $EDITOR to MacOS's TextEdit program, and it turns out > that TextEdit doesn't understand +. I'm afraid that we're going to > end up with a situation where it only works for people using emacs or > vi, and everyone else ends up with a broken install (and, possibly, no > clear idea how to fix it). Perhaps it would be better to accept \sf > and reject \sf+ and \ef . > \sf+ is base - we really need a source output with line numbers. \ef foo func is just user very friendly function. I agree, so this topic have to be better documented > Assuming we get past that threshold issue, I'm also wondering whether > the "navigation option" terminology is best; e.g. set > PSQL_EDITOR_NAVIGATION_OPTION to configure it. It doesn't seem > terrible, but it doesn't seem great, either. Anyone have a better > idea? > > The docs are a little rough; they could benefit from some editing by a > fluent English speaker. If anyone has time to work on this, it would > be much appreciated. > > Instead of "line number is unacceptable", I think you should write > "invalid line number". > > "dollar" should not be spelled "dolar". "function" should not be > spelled "finction". > > This change looks suspiciously like magic. If it's actually safe, it > needs a comment explaining why: > > - sys = pg_malloc(strlen(editorName) + strlen(fname) + 10 + 1); > + sys = pg_malloc(strlen(editorName) + strlen(fname) + 20 + 1); > > Attempting to compile with this patch applied emits a warning on my > machine; whether or not the warning is valid, you need to make it go > away: > > command.c: In function ‘HandleSlashCmds’: > command.c:1055: warning: ‘bsep’ may be used uninitialized in this function > command.c:1055: note: ‘bsep’ was declared here > > Why does the \sf output have a trailing blank line? This seems weird, > especially because \ef puts no such trailing blank line in the editor. > > Instead of: > > + /* skip useles whitespaces */ > + while (c >= func) > + if (isblank(*c)) > + c--; > + else > + break; > > ...wouldn't it be just as good to write: > > while (c >= func && isblank(*c)) > c--; > > (and similarly elsewhere) > > In extract_separator, if you invert the sense of the first if-test, > then you can avoid having to indent the entire function contents. > > -- I'' recheck these issue Pavel > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
On Mon, Aug 2, 2010 at 11:17 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Aug 2, 2010 at 10:49 PM, Tom Lane wrote: >>> I'm tempted >>> to suggest forgetting about any user-configurable parameter and just >>> provide code that strcmp's the $EDITOR value to see if it recognizes the >>> editor name, otherwise do nothing. > >> With all due respect, that sounds like an amazingly bad idea. Surely, >> we'll be forever getting patches to add $MYFAVORITEEDITOR to the list, >> or complaints that it's not already included. > > Well, yeah, that's the idea. I say that beats a constant stream of > complaints that $MYFAVORITEEDITOR no longer works at all --- which > is what your concern was, no? Well, it'd still work fine for \e foo. It'll just blow up for \e foo 3. My concern isn't really that things that which work now will break so much as that this new feature will fail to work for a large percentage of the people who try to use it, including virtually everyone who tries to use it on Win32. >> While this is >> superficially a Nice Thing to Have and I would certainly support it if >> +linenumber were relatively universal, it's really a pretty minor >> convenience when you come right down to it, and I am not at all >> convinced it is worth the hassle of trying to divine what piece of >> syntax will equip the user's choice of editor with the necessary >> amount of clue. > > The other approach we could take is that this whole thing is disabled by > default, and you have to set a psql variable EDITOR_LINENUMBER_SWITCH > to turn it on. If you haven't read the documentation enough to find > out that variable exists, well, no harm no foul. That might be reasonable. Right now the default behavior is to do +line on Linux and /line on Windows. But maybe a more sensible default would be to fail with an error message that says "you have to set thus-and-so variable if you want to use this feature". That seems better than sometimes working and sometimes failing depending on what editor the user has configured. A side question is whether this should be an environment variable or a psql variable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (9.1) btree_gist support for searching on "not equals"
On Mon, 2010-08-02 at 12:27 -0400, Robert Haas wrote: > I was also wondering if it would be worth adding some additional > regression testing to contrib/btree_gist exercising this new > functionality. Thoughts? Sure. I attached two tests. Regards, Jeff Davis *** a/contrib/btree_gist/Makefile --- b/contrib/btree_gist/Makefile *** *** 11,17 DATA_built = btree_gist.sql DATA= uninstall_btree_gist.sql REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz time timetz \ ! date interval macaddr inet cidr text varchar char bytea bit varbit numeric ifdef USE_PGXS PG_CONFIG = pg_config --- 11,17 DATA= uninstall_btree_gist.sql REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz time timetz \ ! date interval macaddr inet cidr text varchar char bytea bit varbit numeric mixed ifdef USE_PGXS PG_CONFIG = pg_config *** /dev/null --- b/contrib/btree_gist/expected/mixed.out *** *** 0 --- 1,31 + SET enable_seqscan = 'false'; + -- test search for "not equals" + CREATE TABLE test_ne ( +a TIMESTAMP, +b NUMERIC + ); + CREATE INDEX test_ne_idx ON test_ne USING gist (a, b); + INSERT INTO test_ne SELECT '2009-01-01', 10.7 FROM generate_series(1,1000); + INSERT INTO test_ne VALUES('2007-02-03', -91.3); + INSERT INTO test_ne VALUES('2011-09-01', 43.7); + INSERT INTO test_ne SELECT '2009-01-01', 10.7 FROM generate_series(1,1000); + SELECT * FROM test_ne WHERE a <> '2009-01-01' AND b <> 10.7; + a | b + --+--- + Sat Feb 03 00:00:00 2007 | -91.3 + Thu Sep 01 00:00:00 2011 | 43.7 + (2 rows) + + -- test search for "not equals" using an exclusion constraint + CREATE TABLE zoo ( +cage INTEGER, +animal TEXT, +EXCLUDE USING gist (cage WITH =, animal WITH <>) + ); + NOTICE: CREATE TABLE / EXCLUDE will create implicit index "zoo_cage_animal_excl" for table "zoo" + INSERT INTO zoo VALUES(123, 'zebra'); + INSERT INTO zoo VALUES(123, 'zebra'); + INSERT INTO zoo VALUES(123, 'lion'); + ERROR: conflicting key value violates exclusion constraint "zoo_cage_animal_excl" + DETAIL: Key (cage, animal)=(123, lion) conflicts with existing key (cage, animal)=(123, zebra). + INSERT INTO zoo VALUES(124, 'lion'); *** /dev/null --- b/contrib/btree_gist/sql/mixed.sql *** *** 0 --- 1,30 + + SET enable_seqscan = 'false'; + + -- test search for "not equals" + + CREATE TABLE test_ne ( +a TIMESTAMP, +b NUMERIC + ); + CREATE INDEX test_ne_idx ON test_ne USING gist (a, b); + + INSERT INTO test_ne SELECT '2009-01-01', 10.7 FROM generate_series(1,1000); + INSERT INTO test_ne VALUES('2007-02-03', -91.3); + INSERT INTO test_ne VALUES('2011-09-01', 43.7); + INSERT INTO test_ne SELECT '2009-01-01', 10.7 FROM generate_series(1,1000); + + SELECT * FROM test_ne WHERE a <> '2009-01-01' AND b <> 10.7; + + -- test search for "not equals" using an exclusion constraint + + CREATE TABLE zoo ( +cage INTEGER, +animal TEXT, +EXCLUDE USING gist (cage WITH =, animal WITH <>) + ); + + INSERT INTO zoo VALUES(123, 'zebra'); + INSERT INTO zoo VALUES(123, 'zebra'); + INSERT INTO zoo VALUES(123, 'lion'); + INSERT INTO zoo VALUES(124, 'lion'); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
Robert Haas writes: > On Mon, Aug 2, 2010 at 10:49 PM, Tom Lane wrote: >> I'm tempted >> to suggest forgetting about any user-configurable parameter and just >> provide code that strcmp's the $EDITOR value to see if it recognizes the >> editor name, otherwise do nothing. > With all due respect, that sounds like an amazingly bad idea. Surely, > we'll be forever getting patches to add $MYFAVORITEEDITOR to the list, > or complaints that it's not already included. Well, yeah, that's the idea. I say that beats a constant stream of complaints that $MYFAVORITEEDITOR no longer works at all --- which is what your concern was, no? > While this is > superficially a Nice Thing to Have and I would certainly support it if > +linenumber were relatively universal, it's really a pretty minor > convenience when you come right down to it, and I am not at all > convinced it is worth the hassle of trying to divine what piece of > syntax will equip the user's choice of editor with the necessary > amount of clue. The other approach we could take is that this whole thing is disabled by default, and you have to set a psql variable EDITOR_LINENUMBER_SWITCH to turn it on. If you haven't read the documentation enough to find out that variable exists, well, no harm no foul. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
On Mon, Aug 2, 2010 at 10:49 PM, Tom Lane wrote: > Robert Haas writes: >> This is actually my biggest concern about this patch - that it may be >> just too much of a hassle to actually make it work for people. I just >> tried setting $EDITOR to MacOS's TextEdit program, and it turns out >> that TextEdit doesn't understand +. I'm afraid that we're going to >> end up with a situation where it only works for people using emacs or >> vi, and everyone else ends up with a broken install (and, possibly, no >> clear idea how to fix it). > > [ disclaimer: I've not looked at the proposed patch yet ] > > It seems like this ought to be fairly easily surmountable as long as > the patch is designed for failure. It isn't. > The fallback position is just that > the line number does nothing, ie \ef foo just opens the editor and > doesn't try to position the cursor anywhere special; nobody can complain > about that because it's no worse than before. What we need is to not > try to force positioning if we don't recognize the editor. Supposing for the moment that we could make it work that way, that would be reasonable. > I'm tempted > to suggest forgetting about any user-configurable parameter and just > provide code that strcmp's the $EDITOR value to see if it recognizes the > editor name, otherwise do nothing. With all due respect, that sounds like an amazingly bad idea. Surely, we'll be forever getting patches to add $MYFAVORITEEDITOR to the list, or complaints that it's not already included. While this is superficially a Nice Thing to Have and I would certainly support it if +linenumber were relatively universal, it's really a pretty minor convenience when you come right down to it, and I am not at all convinced it is worth the hassle of trying to divine what piece of syntax will equip the user's choice of editor with the necessary amount of clue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
Robert Haas writes: > This is actually my biggest concern about this patch - that it may be > just too much of a hassle to actually make it work for people. I just > tried setting $EDITOR to MacOS's TextEdit program, and it turns out > that TextEdit doesn't understand +. I'm afraid that we're going to > end up with a situation where it only works for people using emacs or > vi, and everyone else ends up with a broken install (and, possibly, no > clear idea how to fix it). [ disclaimer: I've not looked at the proposed patch yet ] It seems like this ought to be fairly easily surmountable as long as the patch is designed for failure. The fallback position is just that the line number does nothing, ie \ef foo just opens the editor and doesn't try to position the cursor anywhere special; nobody can complain about that because it's no worse than before. What we need is to not try to force positioning if we don't recognize the editor. I'm tempted to suggest forgetting about any user-configurable parameter and just provide code that strcmp's the $EDITOR value to see if it recognizes the editor name, otherwise do nothing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql: edit function, show function commands patch
On Sun, Aug 1, 2010 at 11:48 PM, Robert Haas wrote: >> b) more robust algorithm for header rows identification > > Have not gotten to this one yet. I notIce that on WIN32 the default editor is notepad.exe and the default editor navigation option is "/". Does "notepad.exe /lineno filename" actually work on Windows? A quick Google search suggests that the answer is "no", which seems somewhat unfortunate: it means we'd be shipping "broken on Win32 out of the box". http://www.robvanderwoude.com/commandlineswitches.php#Notepad This is actually my biggest concern about this patch - that it may be just too much of a hassle to actually make it work for people. I just tried setting $EDITOR to MacOS's TextEdit program, and it turns out that TextEdit doesn't understand +. I'm afraid that we're going to end up with a situation where it only works for people using emacs or vi, and everyone else ends up with a broken install (and, possibly, no clear idea how to fix it). Perhaps it would be better to accept \sf and reject \sf+ and \ef . Assuming we get past that threshold issue, I'm also wondering whether the "navigation option" terminology is best; e.g. set PSQL_EDITOR_NAVIGATION_OPTION to configure it. It doesn't seem terrible, but it doesn't seem great, either. Anyone have a better idea? The docs are a little rough; they could benefit from some editing by a fluent English speaker. If anyone has time to work on this, it would be much appreciated. Instead of "line number is unacceptable", I think you should write "invalid line number". "dollar" should not be spelled "dolar". "function" should not be spelled "finction". This change looks suspiciously like magic. If it's actually safe, it needs a comment explaining why: - sys = pg_malloc(strlen(editorName) + strlen(fname) + 10 + 1); + sys = pg_malloc(strlen(editorName) + strlen(fname) + 20 + 1); Attempting to compile with this patch applied emits a warning on my machine; whether or not the warning is valid, you need to make it go away: command.c: In function ‘HandleSlashCmds’: command.c:1055: warning: ‘bsep’ may be used uninitialized in this function command.c:1055: note: ‘bsep’ was declared here Why does the \sf output have a trailing blank line? This seems weird, especially because \ef puts no such trailing blank line in the editor. Instead of: + /* skip useles whitespaces */ + while (c >= func) + if (isblank(*c)) + c--; + else + break; ...wouldn't it be just as good to write: while (c >= func && isblank(*c)) c--; (and similarly elsewhere) In extract_separator, if you invert the sense of the first if-test, then you can avoid having to indent the entire function contents. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte charater set in levenshtein function
On Mon, Aug 2, 2010 at 5:07 PM, Alexander Korotkov wrote: > Now I think patch is as good as can be. :) OK, committed. > I'm going to prepare less-or-equal function in same manner as this patch. Sounds good. Since we're now more than half-way through this CommitFest and this patch has undergone quite a bit of change from what we started out with, I'd like to ask that you submit the new patch for the next CommitFest, to begin September 15th. https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where in the world is Itagaki Takahiro?
On 8/2/10 3:42 PM, Itagaki Takahiro wrote: > Sorry for delayed reply. I moved to a new job, > and was very busy for it. Congratulations! Are you still at NTT Open Source? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where in the world is Itagaki Takahiro?
2010/8/3 Kevin Grittner : > I haven't seen any posts or CF activity from Itagaki in over a week, > so I'm wondering how to handle some patches in the CF. Does anyone > know whether he's on vacation? Expected return? Sorry for delayed reply. I moved to a new job, and was very busy for it. My proposed patches should be returned or postponed to the next commitfest. I will restart to check patches I reviewed before in one or two weeks. I apologize to patch's authors for the delay. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle
Hi I've updated mvcc.sgml to explain the new serialization conflict rules for row-level locks, and added a paragraph to backend/executor/README that explains the implementation of those. I've chosen backend/executor/README because it already contains a description of UPDATE handling in READ COMMITTED mode, which seemed at least somewhat related. The patch now removes all code dealing with the crosscheck_snapshot, since the RI triggers should now be safe without that. I've also fixed the formatting of multi-line comments to match the coding style. I kept the braces around single-line "if" or "else" statements wherever both "if" and "else" blocks were present and one of them wasn't a single-line block. I think, in addition to the documentation changes this patch contains, that a section on how to write RI triggers in pl/pgsql would be nice to have. It's not strictly part of the documentation of this feature though, more a potential use-case, so I didn't tackle it for the moment. I do hope to find the time to write such a section, though, and if that happens I'll post it as a separate documentation-only patch. I've pushed the changes to the branch serializable_row_locks on git://github.com/fgp/postgres.git and also attached a patch. best regards, Florian Pflug serializable_row_locks.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Where in the world is Itagaki Takahiro?
I haven't seen any posts or CF activity from Itagaki in over a week, so I'm wondering how to handle some patches in the CF. Does anyone know whether he's on vacation? Expected return? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 02/08/10 07:46, Pavel Stehule wrote: I have not any suggestions now - so I'll change flag to "ready to commit" sorry - contrib module should be a fixed patch attached Thanks Pavel, you saved me some time! Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte charater set in levenshtein function
2010/8/2 Alexander Korotkov : > The dump of the table with russian dictionary is in attachment. > > I use following tests: > SELECT SUM(levenshtein(a, 'foo')) from words; > SELECT SUM(levenshtein(a, 'Urbański')) FROM words; > SELECT SUM(levenshtein(a, 'ańs')) FROM words; > SELECT SUM(levenshtein(a, 'foo')) from words2; > SELECT SUM(levenshtein(a, 'дом')) FROM words2; > SELECT SUM(levenshtein(a, 'компьютер')) FROM words2; > > With your last version of patch results are: > 63ms 94ms 61ms 100ms 121ms 217ms. > > But I found some other optimization. With this condition: > if (x[x_char_len-1] == y[y_char_len-1] && x_char_len == y_char_len && > (x_char_len == 1 || char_same(x, y, x_char_len - 1))) > results become: > 58ms 96ms 63ms 102ms 107ms 171ms > > On single-byte characters results almost didn't change, but they come better > with multi-byte characters. Generally, this improvement is because first > bytes of different characters are frequently same (for example, almost all > Cyrillic characters start from same byte, and I think that there is similar > situation in other alphabets), but match of last bytes is just a > coincidence. Hey, that's really cool. It helps a lot here, too: My previous patch, with your 5 test cases: Time: 100.556 ms Time: 205.254 ms Time: 127.070 ms Time: 77.734 ms Time: 90.345 ms Time: 166.954 ms Your original patch, same 5 test cases: Time: 131.489 ms Time: 215.048 ms Time: 125.471 ms Time: 80.068 ms Time: 87.110 ms Time: 166.918 ms My patch, with your proposed change to compare the last character first, same 5 test cases: Time: 96.489 ms Time: 201.497 ms Time: 121.876 ms Time: 75.005 ms Time: 80.219 ms Time: 142.844 ms Updated patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company mb_levenshtein_rmh-v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
On Mon, Aug 2, 2010 at 2:56 PM, Yeb Havinga wrote: > I do not completely understand what you mean with the destruction of > reusability of ATOneLevelRecursion, currently only called by ATPrepAddColumn > - in the patch it is documented in the definition of relVisited that is it > visit info for each subcommand. The loop over subcommands is in > ATController, where the value is properly reset for each all current and > future subcommands. Hence the ATOneLevelRecursion routing is usable in its > current form for all callers during the prep stage, and not only > ATPrepAddColumn. Well, only if they happen to want the "visit each table only once" behavior, which might not be true for every command type. >> I am of the opinion that the chances of a unifying solution popping up >> are pretty much zero. :-) > > Me too, now I understand the fixes must be in the execution stages. OK. I'll go ahead and commit the patch upthread, so that the constraint bug is fixed, and then we can keep arguing about what do about the column bug, perhaps on a new thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Robert Haas wrote: >> I wonder whether this patch shouldn't be rejected with a request >> that the timeout framework be submitted to the next CF. > I think "Returned with Feedback" would be more appropriate than > "Rejected", since we're asking for a rework, rather than saying - > we just don't want this. But otherwise, +1. Done. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
On Mon, Aug 2, 2010 at 3:09 PM, Kevin Grittner wrote: > Marc Cousin wrote: > >> This time, it's this case that doesn't work : > >> I really feel that the timeout framework is the way to go here. > > Since Zoltán also seems to feel this way: > > http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at > > I wonder whether this patch shouldn't be rejected with a request > that the timeout framework be submitted to the next CF. Does anyone > feel this approach (without the framework) should be pursued > further? I think "Returned with Feedback" would be more appropriate than "Rejected", since we're asking for a rework, rather than saying - we just don't want this. But otherwise, +1. Generally, I'm of the opinion that patches needing significant rework should be set to "Returned with Feedback" and resubmitted for the next CF; otherwise, it just gets unmanageable. Our goal for a CF should be to review everything thoroughly, not to get everything committed. Otherwise, we end up with a never-ending train of what are effectively new patches popping up, and it becomes impossible to close out the CommitFest on time. Reviewers and committers end up getting slammed, and it's not very much fun for patch authors (who are trying to frantically do last-minute rewrites) either; nor is it good for the overall quality of code the ends up in our tree. Better to take a breather and then start fresh. (If you don't believe in committer fatigue, look at the review I gave Itagaki Takahiro on the partitioning patch in January vs. the review I gave in July. One of those reviews is a whole lot more specific, detailed, and accurate than the other one...) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Boszormenyi Zoltan wrote: > Kevin Grittner írta: >> I wonder whether this patch shouldn't be rejected with a request >> that the timeout framework be submitted to the next CF. Does >> anyone feel this approach (without the framework) should be >> pursued further? > > I certainly think so, the current scheme seems to be very fragile > and doesn't want to be extended. Sorry, I phrased that question in a rather confusing way; I'm not sure, but it sounds like you're in favor of dropping this approach and pursuing the timeout framework in the next CF -- is that right? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Hi, Kevin Grittner írta: > Marc Cousin wrote: > > >> This time, it's this case that doesn't work : >> > > >> I really feel that the timeout framework is the way to go here. >> > > Since Zoltán also seems to feel this way: > > http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at > > I wonder whether this patch shouldn't be rejected with a request > that the timeout framework be submitted to the next CF. Does anyone > feel this approach (without the framework) should be pursued > further? > I certainly think so, the current scheme seems to be very fragile and doesn't want to be extended. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Marc Cousin wrote: > This time, it's this case that doesn't work : > I really feel that the timeout framework is the way to go here. Since Zoltán also seems to feel this way: http://archives.postgresql.org/message-id/4c516c3a.6090...@cybertec.at I wonder whether this patch shouldn't be rejected with a request that the timeout framework be submitted to the next CF. Does anyone feel this approach (without the framework) should be pursued further? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
Robert Haas wrote: I don't think that this is much cleaner than the global variable solution; you haven't really localized that need to know about the new flag in any meaningful way, the hacks in ATOneLevelRecusion() basically destroy any pretense of that code possibly being reusable for some other caller. However, there's a more serious problem, which is that it doesn't in general fix the bug: try it with the top1/top2/bottom/basement example I posted upthread. If you add the same column to both top1 and top2 and then drop it in both top1 and top2, basement ends up with a leftover copy. The problem is that "only visit each child once" is not the right algorithm; what you need to do is "only visit the descendents of each child if no merge happened at the parent". I believe that the only way to do this correct is to merge the prep stage into the execution stage, as the code for adding constraints already does. At the prep stage, you don't have enough information to determine which relations you'll ultimately need to update, since you don't know where the merges will happen. Hello Robert, Again thanks for looking at the patch. Unfortunately I missed the top1/top2 example earlier, but now I've seen it I understand that it is impossible to fix this problem during the prep stage, without looking at actual existing columns, i.e. without the merge code. Running the top1/top2 example I'd also have expected an error while adding the column to the second top, since the columns added to top1 and top2 are from a different origin. It is apparently considered good behaviour, however troubles emerge when e.g. trying to rename a_table_column in the top1/top2 example, where that is no problem in the 'lollipop' structure, that has a single origin. ALTER TABLE top RENAME COLUMN a_table_column TO another_table_column; SELECT t.oid, t.relname, a.attname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE '%table_column%' ORDER BY oid; I do not completely understand what you mean with the destruction of reusability of ATOneLevelRecursion, currently only called by ATPrepAddColumn - in the patch it is documented in the definition of relVisited that is it visit info for each subcommand. The loop over subcommands is in ATController, where the value is properly reset for each all current and future subcommands. Hence the ATOneLevelRecursion routing is usable in its current form for all callers during the prep stage, and not only ATPrepAddColumn. I am of the opinion that the chances of a unifying solution popping up are pretty much zero. :-) Me too, now I understand the fixes must be in the execution stages. regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiling CVS HEAD with clang under OSX
On Mon, Aug 2, 2010 at 4:27 PM, Tom Lane wrote: > Here's the problem: if the compiler is allowed to assume that overflow > cannot happen, it is always going to be able to "prove" that the > if-test is constant false. This is inherent. Anybody claiming to > exhibit a safe way to code the overflow test is really only telling > you that today's version of their compiler isn't smart enough to make > the proof. So I'll do the next two parts of the dialogue myself: Questioner: So why not write that as: if ((arg1 > 0 && arg2 > 0 && arg1 > MAXINT - arg2) || (arg1 < 0 && arg2 < 0 && arg1 < MININT + arg2)) elog("overflow") else return arg1+arg2; Tom: Because that code is much more complex and prone to errors especially when you start getting into multiplication and other operations and it's also much slower than the code which allows overflow to happen and then checks that the result makes sense. I'm not entirely sure I agree. At least I haven't actually gone through all the arithmetic operations and I'm not sure how much more complex they get. If they were all at that level of complexity I think I would say we should go ahead and bite the performance bullet and do it the ultra-safe way. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (9.1) btree_gist support for searching on "not equals"
On Mon, Aug 2, 2010 at 2:39 AM, Jeff Davis wrote: > On Sun, 2010-08-01 at 21:57 -0400, Robert Haas wrote: >> On Fri, Jul 16, 2010 at 1:19 AM, Jeff Davis wrote: >> > Thank you for the review. >> > >> > On Mon, 2010-07-12 at 17:17 +0900, Itagaki Takahiro wrote: >> >> (1) Exclusion constraints support for operators where "x x" >> >> is false (tiny patch) >> >> https://commitfest.postgresql.org/action/patch_view?id=307 >> >> (2) btree_gist support for searching on <> ("not equals") >> >> https://commitfest.postgresql.org/action/patch_view?id=308 >> >> >> >> Those patches should be committed at once because (2) requires (1) to work >> >> with EXCLUDE constraints. Also, (1) has no benefits without (2) because we >> >> have no use cases for <> as an index-able operator. Both patches are very >> >> simple and small, and worked as expected both "WHERE <>" and EXCLUDE >> >> constraints cases. >> > >> > It appears that Tom already committed (1). >> > >> >> I'd like to ask you to write additional documentation about btree_gist [1] >> >> that the module will be more useful when it is used with exclusion >> >> constraints together. Without documentation, no users find the usages. >> > >> > Good idea, new patch attached. >> >> It seems pretty odd to define a constant called >> BTNotEqualStrategyNumber in contrib/btree_gist. Shouldn't we either >> call this something else, or define it in access/skey.h? Considering >> that there seem to be some interesting gymnastics being done with >> BTMaxStrategyNumber, I'd vote for the former. Maybe just >> BtreeGistNotEqualStrategyNumber? > > Sounds good to me. OK, committed that way. > At some point we may be interested to add this to BTree, as well. But we > can cross that bridge when we come to it. Yeah. I was also wondering if it would be worth adding some additional regression testing to contrib/btree_gist exercising this new functionality. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
2010/7/29 Alexander Korotkov : > But, in pg_trgm it makes it possible to combine different similarity levels > in one query. For example: > select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' < 0.4; > Is there any chance to handle this syntax also? Maybe I'm missing something, but I don't think that ORDER BY clause makes much sense. OR is going to reduce a true or false value - and it's usually not that interesting to order by a column that can only take one of two values. Am I confused? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
On Mon, Aug 2, 2010 at 10:47 AM, Yeb Havinga wrote: >>> The attached patch uses the globally defined list. >> >> I can't speak for any other committer, but personally I'm prepared to >> reject out of hand any solution involving a global variable. This >> code is none to easy to follow as it is and adding global variables to >> the mix is not going to make it better, even if we can convince >> ourselves that it's safe and correct. This is something of a corner >> case, so I won't be crushed if a cleaner fix is too invasive to >> back-patch. > > Thanks for looking at the patch. I've attached a bit more wordy version, > that adds a boolean to AlteredTableInfo and a function to wipe that boolean > between processing of subcommands. I don't think that this is much cleaner than the global variable solution; you haven't really localized that need to know about the new flag in any meaningful way, the hacks in ATOneLevelRecusion() basically destroy any pretense of that code possibly being reusable for some other caller. However, there's a more serious problem, which is that it doesn't in general fix the bug: try it with the top1/top2/bottom/basement example I posted upthread. If you add the same column to both top1 and top2 and then drop it in both top1 and top2, basement ends up with a leftover copy. The problem is that "only visit each child once" is not the right algorithm; what you need to do is "only visit the descendents of each child if no merge happened at the parent". I believe that the only way to do this correct is to merge the prep stage into the execution stage, as the code for adding constraints already does. At the prep stage, you don't have enough information to determine which relations you'll ultimately need to update, since you don't know where the merges will happen. >> Incidentally, we need to shift this discussion to a new >> subject line; we have a working patch for the original subject of this >> thread, and are now discussing the related problem I found with >> attributes. >> > > Both solutions have changes in callers of 'find_inheritance_children'. I was > working in the hope a unifiying solution would pop up naturally, but so far > it has not. Checking of the new AlteredTableInfo->relVisited boolean could > perhaps be pushed down into find_inheritance_children, if multiple-'doing > things' for the childs under a multiple-inheriting relation is unwanted for > every kind of action. It seems to me that the question whether that is the > case must be answered, before the current working patch for coninhcount is > 'ready for committer'. I am of the opinion that the chances of a unifying solution popping up are pretty much zero. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiling CVS HEAD with clang under OSX
Neil Conway writes: > FWIW, I think we should aim to eventually remove the dependency on > -fwrapv, and instead make the code correct under the semantics > guaranteed by the C spec. [ shrug... ] We've gone around on that before. I can't get excited about it, and the reason is that 99% of the places where we actually need -fwrapv behavior is in places where we are trying to test for overflow. The need for that can't be legislated away. As an example, in int4pl we have code like result = arg1 + arg2; if (some-test-on-result-and-arg1-and-arg2) elog(ERROR, "overflow"); Here's the problem: if the compiler is allowed to assume that overflow cannot happen, it is always going to be able to "prove" that the if-test is constant false. This is inherent. Anybody claiming to exhibit a safe way to code the overflow test is really only telling you that today's version of their compiler isn't smart enough to make the proof. Next year, who knows? I'm uninterested in getting into that kind of arms race with the compiler authors. I prefer to keep the goalposts exactly where they've been for the past forty years, namely -fwrapv semantics. If the compiler guys actually were serious about helping people write code that didn't need -fwrapv, they would provide primitives for testing for arithmetic overflow. I would be ecstatic if we could write int4pl like this: if (sum_overflows(arg1, arg2)) elog(ERROR, "overflow"); else return arg1 + arg2; especially since with a halfway decent compiler this sort of construction wouldn't even require doing the addition twice --- presumably the compiler could see that it had a common subexpression there, and generate code that consists of one addition plus a branch-on-overflow test. This'd be clean, readable, and far more efficient than the hacks we have to resort to now. Short of that, though, -fwrapv is what it's gonna be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
The Monday 02 August 2010 13:59:59, Boszormenyi Zoltan wrote : > > > > Also, I made sure that only one or two timeout causes (one of > > deadlock_timeout > > and lock_timeout in the first case or statement_timeout plus one of the > > other two) > > can be active at a time. > > A little clarification is needed. The above statement is not entirely true. > There can be a case when all three timeout causes can be active, but only > when deadlock_timeout is the smallest of the three. If the fin_time value > for the another timeout cause is smaller than for deadlock_timeout then > there's no point to make deadlock_timeout active. And in case > deadlock_timeout > triggers and CheckDeadLock() finds that there really is a deadlock then the > possibly active lock_timeout gets disabled to avoid calling > RemoveFromWaitQueue() twice. I hope the comments in the code explain it > well. > > > Previously I was able to trigger a segfault > > with the default > > 1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the > > system's > > clock resolution makes the lock_timeout and deadlock_timeout equal and > > RemoveFromWaitQueue() was called twice. This way it's a lot more robust. > > > Ok, I've tested this new version: This time, it's this case that doesn't work : Session 1 : lock a table Session 2 : set lock_timeout to a large value, just to make it obvious (10 seconds). It times out after 1 s (the deadlock timeout), returning 'could not obtain lock'. Of course, it should wait 10 seconds. I really feel that the timeout framework is the way to go here. I know what you said about it before, but I think that the current code is getting too complicated, with too many special cases all over. As this is only my second review and I'm sure I am missing things here, could someone with more experience have a look and give advice ? Cheers Marc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte charater set in levenshtein function
2010/8/2 Alexander Korotkov : > On Mon, Aug 2, 2010 at 5:20 AM, Robert Haas wrote: >> I reviewed this code in a fair amount of detail today and ended up >> rewriting it. In general terms, it's best to avoid changing things >> that are not relevant to the central purpose of the patch. This patch >> randomly adds a whole bunch of whitespace and removes a whole bunch of >> comments which I find useful and do not want to see removed. It took >> me about an hour to reverse out all of those changes, which then let >> me get down to the business of actually analyzing the code. > I'm sorry. This is my first patch, I will be more accurate next time. But, I > think there is no unified opinion about some changes like replacement "!m" > with "m==0". Sure; if that were the only such change, I wouldn't have mentioned it. > I think this line is not correct: > "if (m != s_bytes && n != t_bytes)" Good catch, fixed in the attached version. >> The attached patch takes the approach of using a fast-path for just >> the innermost loop when neither string contains any multi-byte >> characters (regardless of whether or not the encoding allows >> multi-byte characters). It's still slower than CVS HEAD, but for >> strings containing no multi-byte characters it's only marginally, if >> at all, slower than previous releases, because 9.0 doesn't contain >> your fix to avoid the extra string copy; and it's faster than your >> implementation even when multi-byte characters ARE present. It is >> slightly slower on single-byte encoding databases (I tested >> SQL_ASCII), but still faster than previous releases. It's also quite >> a bit less code duplication. > > Can I look at the test which shows that your implementation is faster than > my when multi-byte characters are present? My tests show reverse. For > example, with russian dictionary of openoffice: Hmm, with the correction you mention above, I'm getting about the same results with multi-byte characters (but faster with only single-byte characters). I did this: CREATE TABLE words (a text not null primary key); COPY words from '/usr/share/dict/words'; SELECT SUM(levenshtein(a, 'foo')) from words; SELECT SUM(levenshtein(a, 'Urbański')) FROM words; SELECT SUM(levenshtein(a, 'ańs')) FROM words; With the updated patch attached below, these ran about 102 ms, 222 ms, 129 ms. With your patch, I get about 134 ms, 222 ms, 130 ms. Perhaps this is because I only have multi-byte characters in one of the inputs, not both. I don't have a dictionary handy with multi-byte words in it. Can you send me a file? > I think that the cause of slow down slowdown is memcmp function. This > function is very fast for long parts of memory, but of few bytes inline > function is faster, because compiler have more freedom for optimization. > After replacing memcmp with inline function like following in your > implementation: Yeah, that does seem to be slightly better. I've done a version of this in the attached patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company mb_levenshtein_rmh-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
Robert Haas wrote: On Mon, Aug 2, 2010 at 9:20 AM, Yeb Havinga wrote: The attached patch uses the globally defined list. I can't speak for any other committer, but personally I'm prepared to reject out of hand any solution involving a global variable. This code is none to easy to follow as it is and adding global variables to the mix is not going to make it better, even if we can convince ourselves that it's safe and correct. This is something of a corner case, so I won't be crushed if a cleaner fix is too invasive to back-patch. Hello Robert, Thanks for looking at the patch. I've attached a bit more wordy version, that adds a boolean to AlteredTableInfo and a function to wipe that boolean between processing of subcommands. Incidentally, we need to shift this discussion to a new subject line; we have a working patch for the original subject of this thread, and are now discussing the related problem I found with attributes. Both solutions have changes in callers of 'find_inheritance_children'. I was working in the hope a unifiying solution would pop up naturally, but so far it has not. Checking of the new AlteredTableInfo->relVisited boolean could perhaps be pushed down into find_inheritance_children, if multiple-'doing things' for the childs under a multiple-inheriting relation is unwanted for every kind of action. It seems to me that the question whether that is the case must be answered, before the current working patch for coninhcount is 'ready for committer'. regards, Yeb Havinga diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 49a6f73..d43efc3 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -99,7 +99,6 @@ typedef struct OnCommitItem static List *on_commits = NIL; - /* * State information for ALTER TABLE * @@ -132,6 +131,7 @@ typedef struct AlteredTableInfo Oid relid; /* Relation to work on */ charrelkind;/* Its relkind */ TupleDesc oldDesc;/* Pre-modification tuple descriptor */ + boolrelVisited; /* Was the rel visited before, for the current subcommand */ /* Information saved by Phase 1 for Phase 2: */ List *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */ /* Information saved by Phases 1/2 for Phase 3: */ @@ -335,6 +335,7 @@ static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode) static void copy_relation_data(SMgrRelation rel, SMgrRelation dst, ForkNumber forkNum, bool istemp); static const char *storage_name(char c); +static void ATResetRelVisited(List **wqueue); /* @@ -2583,6 +2584,7 @@ ATController(Relation rel, List *cmds, bool recurse, LOCKMODE lockmode) { AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd); + ATResetRelVisited(&wqueue); ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode); } @@ -3503,6 +3505,23 @@ ATGetQueueEntry(List **wqueue, Relation rel) } /* + * ATResetRelVisited: reset the visitation info for each rel in the working + * queue, so it can be used for the next subcommand. + */ +static void +ATResetRelVisited(List **wqueue) +{ + AlteredTableInfo *tab; + ListCell *ltab; + + foreach(ltab, *wqueue) + { + tab = (AlteredTableInfo *) lfirst(ltab); + tab->relVisited = false; + } +} + +/* * ATSimplePermissions * * - Ensure that it is a relation (or possibly a view) @@ -3618,19 +3637,29 @@ ATSimpleRecursion(List **wqueue, Relation rel, /* * ATOneLevelRecursion * - * Here, we visit only direct inheritance children. It is expected that - * the command's prep routine will recurse again to find indirect children. - * When using this technique, a multiply-inheriting child will be visited - * multiple times. + * Here, we visit only direct inheritance children. It is expected that the + * command's prep routine will recurse again to find indirect children. When + * using this technique, a multiple-inheriting child will be visited multiple + * times. Childs of multiple-inheriting childs however are only visited once + * for each parent. */ static void ATOneLevelRecursion(List **wqueue, Relation rel, AlterTableCmd *cmd, LOCKMODE lockmode) { Oid relid = RelationGetRelid(rel); + AlteredTableInfo *tab = ATGetQueueEntry(wqueue, rel); ListCell *child; List *children; + /* If we already visited the current multiple-inheriting relation, we +* mustn't recurse to it's child tables, because they've already been +* visited. Visiting them would lead to an incorrect value for +* attinhcount. */ + if (tab->relVisited
Re: [HACKERS] english parser in text search: support for multiple words in the same position
On Mon, Aug 2, 2010 at 10:21 AM, Kevin Grittner wrote: > Sushant Sinha wrote: > >> Yes thats what I am planning to do. I just wanted to see if anyone >> can help me in estimating whether this is doable in the current >> parser or I need to write a new one. If possible, then some idea >> on how to go about implementing? > > The current tsearch parser is a state machine which does clunky mode > switches to handle special cases like you describe. If you're > looking at doing very much in there, you might want to consider a > rewrite to something based on regular expressions. See discussion > in these threads: > > http://archives.postgresql.org/message-id/200912102005.16560.and...@anarazel.de > > http://archives.postgresql.org/message-id/4b210d9e02250002d...@gw.wicourts.gov > > That was actually at the top of my personal PostgreSQL TODO list > (after my current project is wrapped up), but I wouldn't complain if > someone else wanted to take it. :-) If you end up rewriting it, it may be a good idea, in the initial rewrite, to mimic the current results as closely as possible - and then submit a separate patch to change the results. Changing two things at the same time exponentially increases the chance of your patch getting rejected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
Sushant Sinha wrote: > Yes thats what I am planning to do. I just wanted to see if anyone > can help me in estimating whether this is doable in the current > parser or I need to write a new one. If possible, then some idea > on how to go about implementing? The current tsearch parser is a state machine which does clunky mode switches to handle special cases like you describe. If you're looking at doing very much in there, you might want to consider a rewrite to something based on regular expressions. See discussion in these threads: http://archives.postgresql.org/message-id/200912102005.16560.and...@anarazel.de http://archives.postgresql.org/message-id/4b210d9e02250002d...@gw.wicourts.gov That was actually at the top of my personal PostgreSQL TODO list (after my current project is wrapped up), but I wouldn't complain if someone else wanted to take it. :-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
Sushant Sinha writes: >> This would needlessly increase the number of tokens. Instead you'd >> better make it work like compound word support, having just "wikipedia" >> and "org" as tokens. > The current text parser already returns url and url_path. That already > increases the number of unique tokens. I am only asking for adding of > normal english words as well so that if someone types only "wikipedia" > he gets a match. The suggestion to make it work like compound words is still a good one, ie given wikipedia.org you'd get back hostwikipedia.org host-part wikipedia host-part org not just the "host" token as at present. Then the user could decide whether he needed to index hostname components or not, by choosing whether to forward hostname-part tokens to a dictionary or just discard them. If you submit a patch that tries to force the issue by classifying hostname parts as plain words, it'll probably get rejected out of hand on backwards-compatibility grounds. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
On Mon, 2010-08-02 at 09:32 -0400, Robert Haas wrote: > On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha wrote: > > The current text parser already returns url and url_path. That already > > increases the number of unique tokens. I am only asking for adding of > > normal english words as well so that if someone types only "wikipedia" > > he gets a match. > [...] > > Postgres english parser already emits urls as tokens. Only thing I am > > asking is on improving the tokenization and positioning. > > Can you write a patch to implement your idea? > Yes thats what I am planning to do. I just wanted to see if anyone can help me in estimating whether this is doable in the current parser or I need to write a new one. If possible, then some idea on how to go about implementing? -Sushant. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
On Mon, Aug 2, 2010 at 9:20 AM, Yeb Havinga wrote: > Robert Haas wrote: >> >> I agree that's the crux of the problem, but I can't see solving it >> with a global variable. I realize you were just testing... > > Yes it was just a test. However, somewhere information must be kept or > altered so it can be detected that a relation has already been visited, i.e. > it is a multiple inheriting child. The other solutions I could think of are > more intrusive (i.e. definitionin ATController and passing as parameter). > > The attached patch uses the globally defined list. After ATPrepCmd the list > pointer is reset to NIL, the list not freed since the allocs are done in a > memory context soon to be deleted (PortalHeapMemory). It passes regression > as well as the script below. I can't speak for any other committer, but personally I'm prepared to reject out of hand any solution involving a global variable. This code is none to easy to follow as it is and adding global variables to the mix is not going to make it better, even if we can convince ourselves that it's safe and correct. This is something of a corner case, so I won't be crushed if a cleaner fix is too invasive to back-patch. Incidentally, we need to shift this discussion to a new subject line; we have a working patch for the original subject of this thread, and are now discussing the related problem I found with attributes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha wrote: > The current text parser already returns url and url_path. That already > increases the number of unique tokens. I am only asking for adding of > normal english words as well so that if someone types only "wikipedia" > he gets a match. [...] > Postgres english parser already emits urls as tokens. Only thing I am > asking is on improving the tokenization and positioning. Can you write a patch to implement your idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
Hi, On 08/02/2010 03:12 PM, Sushant Sinha wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. Well, I think I simply turned that off to be able to search for plain words. It still works for complete URLs, those are just treated like text, then. Earlier people have expressed the need to index urls/emails and currently the text parser already does so. Reverting that would be a regression of functionality. Further, a ranking function can take advantage of direct match of a token. That's a point, yes. However, simply making the same string turn up twice in the tokenizer's output doesn't sound like the right solution to me. Especially considering that the query parser uses the very same tokenizer. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres as Historian
On 02/08/2010 3:20 AM, Hardik Belani wrote: We are using postgres as RDBMS for our product. There is a requirement coming for a feature which will require me to store data about various data points (mainly numbers) on a time scale. Data measurement is being taken every few secs/mins based and it is needed to be stored for statistical analysis. Now this requires numbers (integers/floats) to be stored at every mins. For this i can create a table with number and time (may be time offset instead of timestamp) as columns. But still it will require me to store huge number of rows in the order of few millions. Data is read only and only inserts can happen. But I need to perform all kinds of aggregation to get various statistics. for example: daily avg, monthly avg etc.. We already are using postgres for our product so using postgres does not add any additional installation requirement and hence it is a bit easier. Would you recommand postgres for this kind of requirement and will be provide the performance. OR do you recommand any other database meant for such requirements. I am also searching for a good historian database if postgres doesn't suppport. Thanks, Hardik Hi Hardik, Data warehousing techniques could help you with your requirements of aggregating large amounts of data. Have a look at "The Data Warehouse Toolkit" by R. Kimball on how to design a star schema with aggregate tables (these can be done as materialized views using PL/pgSQL and triggers under postgres). You could also use an OLAP server (e.g. Mondrian, which pretty nice and open source as well) on top of your postgres DB, as it can use aggregate tables transparently when needed. Etienne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for check constraints using multiple inheritance
Robert Haas wrote: I agree that's the crux of the problem, but I can't see solving it with a global variable. I realize you were just testing... Yes it was just a test. However, somewhere information must be kept or altered so it can be detected that a relation has already been visited, i.e. it is a multiple inheriting child. The other solutions I could think of are more intrusive (i.e. definitionin ATController and passing as parameter). The attached patch uses the globally defined list. After ATPrepCmd the list pointer is reset to NIL, the list not freed since the allocs are done in a memory context soon to be deleted (PortalHeapMemory). It passes regression as well as the script below. regards, Yeb Havinga DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top (i int); CREATE TABLE mid1 () INHERITS (top); CREATE TABLE mid2 () INHERITS (top); CREATE TABLE bottom () INHERITS (mid1, mid2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top ADD COLUMN a_table_column integer, ADD COLUMN a_table_column2 integer; ALTER TABLE top ADD COLUMN a_table_column3 integer; ALTER TABLE top ADD CONSTRAINT a_check_constraint CHECK (i IN (0,1)), ADD CONSTRAINT a_check_constraint2 CHECK (i IN (0,1)); ALTER TABLE top ADD CONSTRAINT a_check_constraint3 CHECK (i IN (0,1)); SELECT t.oid, t.relname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE 'a_table_column%' ORDER BY oid; SELECT t.oid, t.relname, c.coninhcount FROM pg_class t JOIN pg_constraint c ON (c.conrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' ORDER BY oid; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 49a6f73..08efffc 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -99,6 +99,10 @@ typedef struct OnCommitItem static List *on_commits = NIL; +/* + * Per subcommand history of relids visited in an inheritance hierarchy. + */ +static List *visited_relids = NIL; /* * State information for ALTER TABLE @@ -2584,6 +2588,7 @@ ATController(Relation rel, List *cmds, bool recurse, LOCKMODE lockmode) AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd); ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode); + visited_relids = NIL; } /* Close the relation, but keep lock until commit */ @@ -3618,10 +3623,11 @@ ATSimpleRecursion(List **wqueue, Relation rel, /* * ATOneLevelRecursion * - * Here, we visit only direct inheritance children. It is expected that - * the command's prep routine will recurse again to find indirect children. - * When using this technique, a multiply-inheriting child will be visited - * multiple times. + * Here, we visit only direct inheritance children. It is expected that the + * command's prep routine will recurse again to find indirect children. When + * using this technique, a multiple-inheriting child will be visited multiple + * times. Childs of multiple-inheriting childs however are only visited once + * for each parent. */ static void ATOneLevelRecursion(List **wqueue, Relation rel, @@ -3631,6 +3637,14 @@ ATOneLevelRecursion(List **wqueue, Relation rel, ListCell *child; List *children; + /* If we already visited the current multiple-inheriting relation, we +* mustn't recurse to it's child tables, because they've already been +* visited. Visiting them would lead to an incorrect value for +* attinhcount. */ + if (list_member_oid(visited_relids, relid)) + return; + visited_relids = lappend_oid(visited_relids, relid); + children = find_inheritance_children(relid, lockmode); foreach(child, children) @@ -4891,6 +4905,15 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, CommandCounterIncrement(); /* +* If the constraint got merged with an existing constraint, we're done. +* We mustn't recurse to child tables in this case, because they've already +* got the constraint, and visiting them again would lead to an incorrect +* value for coninhcount. +*/ + if (newcons == NIL) + return; + + /* * Propagate to children as appropriate. Unlike most other ALTER * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
> On 08/01/2010 08:04 PM, Sushant Sinha wrote: > > 1. We do not have separate tokens "wikipedia" and "org" > > 2. If we have the two tokens we should have them at adjacent position so > > that a phrase search for "wikipedia org" should work. > > This would needlessly increase the number of tokens. Instead you'd > better make it work like compound word support, having just "wikipedia" > and "org" as tokens. The current text parser already returns url and url_path. That already increases the number of unique tokens. I am only asking for adding of normal english words as well so that if someone types only "wikipedia" he gets a match. > > Searching for "wikipedia.org" or "wikipedia org" should then result in > the same search query with the two tokens: "wikipedia" and "org". Earlier people have expressed the need to index urls/emails and currently the text parser already does so. Reverting that would be a regression of functionality. Further, a ranking function can take advantage of direct match of a token. > > position 0: WORD(wikipedia), URL(wikipedia.org/search?q=sushant) > > IMO the differentiation between WORDs and URLs is not something the text > search engine should have to take care a lot. Let it just do the > searching and make it do that well. Postgres english parser already emits urls as tokens. Only thing I am asking is on improving the tokenization and positioning. > What does a token "wikipedia.org/search?q=sushant" buy you in terms of > text searching? Or even result highlighting? I wouldn't expect anybody > to want to search for a full URL, do you? There have been need expressed in past. And an exact token match can result in better ranking functions. For example, a tf-idf ranking will rank matching of such unique tokens significantly higher. -Sushant. > Regards > > Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Mon, Aug 2, 2010 at 8:57 AM, Yeb Havinga wrote: > Fujii Masao wrote: >> >> On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas wrote: >> >>> >>> Let's not get *the manner of specifying the policy* confused with *the >>> need to update the policy when the master changes*. It doesn't seem >>> likely you would want the same value for synchronous_standbys on all >>> your machines. In the most common configuration, you'd probably have: >>> >>> on A: synchronous_standbys=B >>> on B: synchronous_standbys=A >>> >> >> Oh, true. But, what if we have another synchronous standby called C? >> We specify the policy as follows?: >> >> on A: synchronous_standbys=B,C >> on B: synchronous_standbys=A,C >> on C: synchronous_standbys=A,B >> >> We would need to change the setting on both A and B when we want to >> change the name of the third standby from C to D, for example. No? >> > > What if the master is named as well in the 'pool of servers that are in > sync'? In the scenario above this pool would be A,B,C. Working with this > concept has as benefit that the setting can be copied to all other servers > as well, and is invariant under any number of failures or switchovers. The > same could also hold for quorum expressions like A && (B || C), if A,B,C are > either master or standby. > > I initially though that once the definitions could be the same on all > servers, having them in a system catalog would be a good thing. However > that'd propably hard to setup, and also in the case of failures during > change of the parameters it could become very messy. Yeah, I think this information has to be stored either in GUCs or in a flat-file somewhere. Putting it in a system catalog will cause major problems when trying to get a down system back up, I think. I suspect that for complex setups, people will need to use some kind of cluster-ware to update the settings as nodes go up and down. But I think it will still be simpler if the nodes are named. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
Fujii Masao wrote: On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas wrote: Let's not get *the manner of specifying the policy* confused with *the need to update the policy when the master changes*. It doesn't seem likely you would want the same value for synchronous_standbys on all your machines. In the most common configuration, you'd probably have: on A: synchronous_standbys=B on B: synchronous_standbys=A Oh, true. But, what if we have another synchronous standby called C? We specify the policy as follows?: on A: synchronous_standbys=B,C on B: synchronous_standbys=A,C on C: synchronous_standbys=A,B We would need to change the setting on both A and B when we want to change the name of the third standby from C to D, for example. No? What if the master is named as well in the 'pool of servers that are in sync'? In the scenario above this pool would be A,B,C. Working with this concept has as benefit that the setting can be copied to all other servers as well, and is invariant under any number of failures or switchovers. The same could also hold for quorum expressions like A && (B || C), if A,B,C are either master or standby. I initially though that once the definitions could be the same on all servers, having them in a system catalog would be a good thing. However that'd propably hard to setup, and also in the case of failures during change of the parameters it could become very messy. regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Mon, Aug 2, 2010 at 8:32 PM, Robert Haas wrote: > Sure. If you give the standbys names, then if people change the > names, they'll have to update their configuration. But I can't see > that as an argument against doing it. You can remove the possibility > that someone will have a hassle if they rename a server by not > allowing them to give it a name in the first place, but that doesn't > seem like a win from a usability perspective. I'm just comparing your idea (i.e., set synchronous_standbys on each possible master) with my idea (i.e., set replication_mode on each standby). Though your idea has the advantage described in the following post, it seems to make the setup of the standbys more complicated, as I described. So I'm trying to generate better idea. http://archives.postgresql.org/pgsql-hackers/2010-08/msg7.php Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Boszormenyi Zoltan írta: > Marc Cousin írta: > >> The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote : >> >> >>> I fixed this by adding CheckLockTimeout() function that works like >>> CheckStatementTimeout() and ensuring that the same start time is >>> used for both deadlock_timeout and lock_timeout if both are active. >>> The preference of errors if their timeout values are equal is: >>> statement_timeout > lock_timeout > deadlock_timeout >>> >>> >> As soon as lock_timeout is bigger than deadlock_timeout, it doesn't >> work, with this new version. >> >> Keeping the deadlock_timeout to 1s, when lock_timeout >= 1001, >> lock_timeout doesn't trigger anymore. >> >> > > I missed one case when the lock_timeout_active should have been set > but the timer must not have been re-set, this caused the problem. > I blame the hot weather and having no air conditioning. The second is > now fixed. :-) > > I also added one line in autovacuum.c to disable lock_timeout, > in case it's globally set in postgresq.conf as per Alvaro's comment. > > Also, I made sure that only one or two timeout causes (one of > deadlock_timeout > and lock_timeout in the first case or statement_timeout plus one of the > other two) > can be active at a time. A little clarification is needed. The above statement is not entirely true. There can be a case when all three timeout causes can be active, but only when deadlock_timeout is the smallest of the three. If the fin_time value for the another timeout cause is smaller than for deadlock_timeout then there's no point to make deadlock_timeout active. And in case deadlock_timeout triggers and CheckDeadLock() finds that there really is a deadlock then the possibly active lock_timeout gets disabled to avoid calling RemoveFromWaitQueue() twice. I hope the comments in the code explain it well. > Previously I was able to trigger a segfault > with the default > 1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the > system's > clock resolution makes the lock_timeout and deadlock_timeout equal and > RemoveFromWaitQueue() was called twice. This way it's a lot more robust. > Best regards, Zoltán Böszörményi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Mon, Aug 2, 2010 at 7:06 AM, Fujii Masao wrote: > On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas wrote: >> Let's not get *the manner of specifying the policy* confused with *the >> need to update the policy when the master changes*. It doesn't seem >> likely you would want the same value for synchronous_standbys on all >> your machines. In the most common configuration, you'd probably have: >> >> on A: synchronous_standbys=B >> on B: synchronous_standbys=A > > Oh, true. But, what if we have another synchronous standby called C? > We specify the policy as follows?: > > on A: synchronous_standbys=B,C > on B: synchronous_standbys=A,C > on C: synchronous_standbys=A,B > > We would need to change the setting on both A and B when we want to > change the name of the third standby from C to D, for example. No? Sure. If you give the standbys names, then if people change the names, they'll have to update their configuration. But I can't see that as an argument against doing it. You can remove the possibility that someone will have a hassle if they rename a server by not allowing them to give it a name in the first place, but that doesn't seem like a win from a usability perspective. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lock_timeout GUC patch - Review
Marc Cousin írta: > The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote : > >> I fixed this by adding CheckLockTimeout() function that works like >> CheckStatementTimeout() and ensuring that the same start time is >> used for both deadlock_timeout and lock_timeout if both are active. >> The preference of errors if their timeout values are equal is: >> statement_timeout > lock_timeout > deadlock_timeout >> > > As soon as lock_timeout is bigger than deadlock_timeout, it doesn't > work, with this new version. > > Keeping the deadlock_timeout to 1s, when lock_timeout >= 1001, > lock_timeout doesn't trigger anymore. > I missed one case when the lock_timeout_active should have been set but the timer must not have been re-set, this caused the problem. I blame the hot weather and having no air conditioning. The second is now fixed. :-) I also added one line in autovacuum.c to disable lock_timeout, in case it's globally set in postgresq.conf as per Alvaro's comment. Also, I made sure that only one or two timeout causes (one of deadlock_timeout and lock_timeout in the first case or statement_timeout plus one of the other two) can be active at a time. Previously I was able to trigger a segfault with the default 1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the system's clock resolution makes the lock_timeout and deadlock_timeout equal and RemoveFromWaitQueue() was called twice. This way it's a lot more robust. Best regards, Zoltán Böszörményi diff -dcrpN pgsql.orig/doc/src/sgml/config.sgml pgsql/doc/src/sgml/config.sgml *** pgsql.orig/doc/src/sgml/config.sgml 2010-07-26 10:05:37.0 +0200 --- pgsql/doc/src/sgml/config.sgml 2010-07-29 11:58:56.0 +0200 *** COPY postgres_log FROM '/full/path/to/lo *** 4479,4484 --- 4479,4508 + + lock_timeout (integer) + +lock_timeout configuration parameter + + + + Abort any statement that tries to acquire a heavy-weight lock (e.g. rows, + pages, tables, indices or other objects) and the lock has to wait more + than the specified number of milliseconds, starting from the time the + command arrives at the server from the client. + If log_min_error_statement is set to ERROR or lower, + the statement that timed out will also be logged. A value of zero + (the default) turns off the limitation. + + + + Setting lock_timeout in + postgresql.conf is not recommended because it + affects all sessions. + + + + vacuum_freeze_table_age (integer) diff -dcrpN pgsql.orig/doc/src/sgml/ref/lock.sgml pgsql/doc/src/sgml/ref/lock.sgml *** pgsql.orig/doc/src/sgml/ref/lock.sgml 2010-04-03 09:23:01.0 +0200 --- pgsql/doc/src/sgml/ref/lock.sgml 2010-07-29 11:58:56.0 +0200 *** LOCK [ TABLE ] [ ONLY ] NOWAIT is specified, LOCK TABLE does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an !error is emitted. Once obtained, the lock is held for the !remainder of the current transaction. (There is no UNLOCK TABLE command; locks are always released at transaction end.) --- 39,49 NOWAIT is specified, LOCK TABLE does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an !error is emitted. If lock_timeout is set to a value !higher than 0, and the lock cannot be acquired under the specified !timeout value in milliseconds, the command is aborted and an error !is emitted. Once obtained, the lock is held for the remainder of !the current transaction. (There is no UNLOCK TABLE command; locks are always released at transaction end.) diff -dcrpN pgsql.orig/doc/src/sgml/ref/select.sgml pgsql/doc/src/sgml/ref/select.sgml *** pgsql.orig/doc/src/sgml/ref/select.sgml 2010-06-20 13:59:13.0 +0200 --- pgsql/doc/src/sgml/ref/select.sgml 2010-07-29 11:58:56.0 +0200 *** FOR SHARE [ OF semNum; + + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + errStatus = semop(sema->semId, &sops, 1); + ImmediateInterruptOK = false; + } while (errStatus < 0 && errno == EINTR && !lock_timeout_detected); + + if (lock_timeout_detected) + return; + if (errStatus < 0) + elog(FATAL, "semop(id=%d) failed: %m", sema->semId); + } diff -dcrpN pgsql.orig/src/backend/port/win32_sema.c pgsql/src/backend/port/win32_sema.c *** pgsql.orig/src/backend/port/win32_sema.c 2010-01-02 17:57:50.0 +0100 --- pgsql/src/backend/port/win32_sema.c 2010-07-29 11:58:56.0 +0200 *** *** 16,21 --- 16,22 #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" static HANDLE *mySemSet; /* IDs of sema s
Re: [HACKERS] Synchronous replication
On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas wrote: > Let's not get *the manner of specifying the policy* confused with *the > need to update the policy when the master changes*. It doesn't seem > likely you would want the same value for synchronous_standbys on all > your machines. In the most common configuration, you'd probably have: > > on A: synchronous_standbys=B > on B: synchronous_standbys=A Oh, true. But, what if we have another synchronous standby called C? We specify the policy as follows?: on A: synchronous_standbys=B,C on B: synchronous_standbys=A,C on C: synchronous_standbys=A,B We would need to change the setting on both A and B when we want to change the name of the third standby from C to D, for example. No? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres as Historian
On Mon, Aug 2, 2010 at 3:20 AM, Hardik Belani wrote: > We are using postgres as RDBMS for our product. There is a requirement > coming for a feature which will require me to store data about various data > points (mainly numbers) on a time scale. Data measurement is being taken > every few secs/mins based and it is needed to be stored for statistical > analysis. Now this requires numbers (integers/floats) to be stored at every > mins. > > For this i can create a table with number and time (may be time offset > instead of timestamp) as columns. But still it will require me to store huge > number of rows in the order of few millions. Data is read only and only > inserts can happen. But I need to perform all kinds of aggregation to get > various statistics. for example: daily avg, monthly avg etc.. > > We already are using postgres for our product so using postgres does not add > any additional installation requirement and hence it is a bit easier. > > Would you recommand postgres for this kind of requirement and will be > provide the performance. OR do you recommand any other database meant > for such requirements. I am also searching for a good historian database if > postgres doesn't suppport. You can certainly use Postgres in this kind of environment, and I have. Of course, if the volume of data is higher than your hardware can keep up with, then you're going to have problems. Disabling synchronous_commit may help, if losing the last few transactions is acceptable in the event of a system crash; appropriate use of table partitioning may help, too. There are certainly databases out there that are better optimized for this case (consider the rrd stuff built into mrtg, for example), but they're also not as feature-rich, so it's a trade-off. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Mon, Aug 2, 2010 at 5:02 AM, Fujii Masao wrote: > On Sun, Aug 1, 2010 at 9:51 PM, Robert Haas wrote: >> Perhaps someone will claim that nobody wants to do that anyway (which >> I don't believe, BTW), but even in simpler cases it would be nicer to >> have an explicit policy rather than - in effect - inferring a policy >> from a soup of GUC settings. For example, if you want one synchronous >> standby (A) and two asynchronous standbys (B and C). You can say >> quorum=1 on the master and then configure vote=1 on A and vote=0 on B >> and C, but now you have to look at four machines to figure out what >> the policy is, and a change on any one of those machines can break it. >> ISTM that if you can just write synchronous_standbys=A on the master, >> that's a whole lot more clear and less error-prone. > > Some standbys may become master later by failover. So we would > need to write something like synchronous_standbys=A on not only > current one master but also those standbys. Changing > synchronous_standbys would require change on all those servers. > Or the master should replicate even that change to the standbys? Let's not get *the manner of specifying the policy* confused with *the need to update the policy when the master changes*. It doesn't seem likely you would want the same value for synchronous_standbys on all your machines. In the most common configuration, you'd probably have: on A: synchronous_standbys=B on B: synchronous_standbys=A -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Sun, Aug 1, 2010 at 9:51 PM, Robert Haas wrote: > Perhaps someone will claim that nobody wants to do that anyway (which > I don't believe, BTW), but even in simpler cases it would be nicer to > have an explicit policy rather than - in effect - inferring a policy > from a soup of GUC settings. For example, if you want one synchronous > standby (A) and two asynchronous standbys (B and C). You can say > quorum=1 on the master and then configure vote=1 on A and vote=0 on B > and C, but now you have to look at four machines to figure out what > the policy is, and a change on any one of those machines can break it. > ISTM that if you can just write synchronous_standbys=A on the master, > that's a whole lot more clear and less error-prone. Some standbys may become master later by failover. So we would need to write something like synchronous_standbys=A on not only current one master but also those standbys. Changing synchronous_standbys would require change on all those servers. Or the master should replicate even that change to the standbys? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous replication
On Sun, Aug 1, 2010 at 3:11 PM, Heikki Linnakangas wrote: > I don't think any of this quorum stuff makes much sense without explicitly > registering standbys in the master. I'm not sure if this is a good idea. This requires users to do more manual operations than ever when setting up the replication; assign unique name (or ID) to each standby, register them in the master, specify the names in each recovery.conf (or elsewhere), and remove the registration from the master when getting rid of the standby. But this is similar to the way of MySQL replication setup, so some people (excluding me) may be familiar with it. > That would also solve the fuzziness with wal_keep_segments - if the master > knew what standbys exist, it could keep track of how far each standby has > received WAL, and keep just enough WAL for each standby to catch up. What if the registered standby stays down for a long time? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Initial review of xslt with no limits patch
Hello 2010/8/2 Mike Fowler : > Hi Pavel, > > Currently your patch isn't applying to head, from the looks of things a > function signature has changed. Can you update your patch please? > yes - see attachment > Also, having had a read through the patch itself I note that there are no > tests and no changes to documentation. Shouldn't the documentation advertise > that the there are no limits on the numbers of parameters? A couple of tests > will also help me to test your patch, > the limit of 10 parameters was not documented. With this patch, there are not limit - or limit comes from libxml2. Test are a problem - for me - I don't understand to xslt too much - so I am not able to write xslt template with more than 10 params. I looked there and I the params are not tested now - so it is necessary to write a new set of regress tests. But I am not able to do it :(. > Below is the results of running patch: > > patch -p0 < ../nolimits.diff > patching file ./contrib/xml2/xslt_proc.c > Hunk #1 FAILED at 42. > Hunk #2 succeeded at 57 (offset -2 lines). > Hunk #3 succeeded at 69 (offset -2 lines). > Hunk #4 succeeded at 142 (offset -4 lines). > Hunk #5 succeeded at 179 (offset -4 lines). > Hunk #6 succeeded at 192 with fuzz 1 (offset -4 lines). > 1 out of 6 hunks FAILED -- saving rejects to file > ./contrib/xml2/xslt_proc.c.rej > > The rejects were: > > > *** ./contrib/xml2/xslt_proc.c.orig 2010-03-03 20:10:22.0 +0100 > --- ./contrib/xml2/xslt_proc.c 2010-05-03 15:07:17.010918303 +0200 > *** > *** 42,50 > extern void pgxml_parser_init(void); > > /* local defs */ > ! static void parse_params(const char **params, text *paramstr); > > ! #define MAXPARAMS 20 /* must be even, see parse_params() > */ > > #endif /* USE_LIBXSLT */ > > --- 42,51 > extern void pgxml_parser_init(void); > > /* local defs */ > ! const char **parse_params(text *paramstr); > > ! #define INIT_PARAMS 20 /* must be even, see > parse_params() */ > ! #define EXTEND_PARAMS 20 /* must be even, see parse_params() > */ > > #endif /* USE_LIBXSLT */ > > > Regards, > -- > Mike Fowler > Registered Linux user: 379787 > Regards Pavel Stehule *** ./contrib/xml2/xslt_proc.c.orig 2010-07-06 21:18:55.0 +0200 --- ./contrib/xml2/xslt_proc.c 2010-08-02 09:31:32.410911283 +0200 *** *** 41,49 extern void pgxml_parser_init(void); /* local defs */ ! static void parse_params(const char **params, text *paramstr); ! #define MAXPARAMS 20 /* must be even, see parse_params() */ #endif /* USE_LIBXSLT */ --- 41,50 extern void pgxml_parser_init(void); /* local defs */ ! const char **parse_params(text *paramstr); ! #define INIT_PARAMS 20 /* must be even, see parse_params() */ ! #define EXTEND_PARAMS 20 /* must be even, see parse_params() */ #endif /* USE_LIBXSLT */ *** *** 57,63 text *doct = PG_GETARG_TEXT_P(0); text *ssheet = PG_GETARG_TEXT_P(1); text *paramstr; ! const char *params[MAXPARAMS + 1]; /* +1 for the terminator */ xsltStylesheetPtr stylesheet = NULL; xmlDocPtr doctree; xmlDocPtr restree; --- 58,64 text *doct = PG_GETARG_TEXT_P(0); text *ssheet = PG_GETARG_TEXT_P(1); text *paramstr; ! const char **params; xsltStylesheetPtr stylesheet = NULL; xmlDocPtr doctree; xmlDocPtr restree; *** *** 69,79 if (fcinfo->nargs == 3) { paramstr = PG_GETARG_TEXT_P(2); ! parse_params(params, paramstr); } else /* No parameters */ params[0] = NULL; /* Setup parser */ pgxml_parser_init(); --- 70,83 if (fcinfo->nargs == 3) { paramstr = PG_GETARG_TEXT_P(2); ! params = parse_params(paramstr); } else + { /* No parameters */ + params = palloc(sizeof(char *)); params[0] = NULL; + } /* Setup parser */ pgxml_parser_init(); *** *** 139,160 #ifdef USE_LIBXSLT ! static void ! parse_params(const char **params, text *paramstr) { char *pos; char *pstr; - int i; char *nvsep = "="; char *itsep = ","; ! pstr = text_to_cstring(paramstr); pos = pstr; ! ! for (i = 0; i < MAXPARAMS; i++) { ! params[i] = pos; pos = strstr(pos, nvsep); if (pos != NULL) { --- 143,175 #ifdef USE_LIBXSLT ! const char ** ! parse_params(text *paramstr) { char *pos; char *pstr; char *nvsep = "="; char *itsep = ","; ! const char **params; ! int nparams; ! int mparams; /* max params */ ! pstr = text_to_cstring(paramstr); + + mparams = INIT_PARAMS; + params = (const char **) palloc(INIT_PARAMS * sizeof(char *) + 1); pos = pstr; ! nparams = 0; ! while (*pos != '\0') { ! if (nparams >= mparams) ! { ! /* extend params params */ ! mparams += EXTEND_PARAMS; ! params = (const char **) repalloc(params, mparams * sizeof(char *) + 1); ! } ! para
Re: [HACKERS] english parser in text search: support for multiple words in the same position
Hi, On 08/01/2010 08:04 PM, Sushant Sinha wrote: 1. We do not have separate tokens "wikipedia" and "org" 2. If we have the two tokens we should have them at adjacent position so that a phrase search for "wikipedia org" should work. This would needlessly increase the number of tokens. Instead you'd better make it work like compound word support, having just "wikipedia" and "org" as tokens. Searching for "wikipedia.org" or "wikipedia org" should then result in the same search query with the two tokens: "wikipedia" and "org". position 0: WORD(wikipedia), URL(wikipedia.org/search?q=sushant) IMO the differentiation between WORDs and URLs is not something the text search engine should have to take care a lot. Let it just do the searching and make it do that well. What does a token "wikipedia.org/search?q=sushant" buy you in terms of text searching? Or even result highlighting? I wouldn't expect anybody to want to search for a full URL, do you? Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Postgres as Historian
We are using postgres as RDBMS for our product. There is a requirement coming for a feature which will require me to store data about various data points (mainly numbers) on a time scale. Data measurement is being taken every few secs/mins based and it is needed to be stored for statistical analysis. Now this requires numbers (integers/floats) to be stored at every mins. For this i can create a table with number and time (may be time offset instead of timestamp) as columns. But still it will require me to store huge number of rows in the order of few millions. Data is read only and only inserts can happen. But I need to perform all kinds of aggregation to get various statistics. for example: daily avg, monthly avg etc.. We already are using postgres for our product so using postgres does not add any additional installation requirement and hence it is a bit easier. Would you recommand postgres for this kind of requirement and will be provide the performance. OR do you recommand any other database meant for such requirements. I am also searching for a good historian database if postgres doesn't suppport. Thanks, Hardik