Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On 14.11.2010 22:55, Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 13.11.2010 17:07, Tom Lane wrote:
>>> Robert Haas writes:
>>>> Come to think of it, I'm not really sure I understand what protects
>>>> SetLatch() against memory ordering hazards. Is that actually safe?
>>> Hmm ... that's a good question. It certainly *looks* like it could
>>> malfunction on machines with weak memory ordering.
>> Can you elaborate?
> Weak memory ordering means that stores into shared memory initiated by
> one processor are not guaranteed to be observed to occur in the same
> sequence by another processor. This implies first that the latch code
> could malfunction all by itself, if two processes manipulate a latch at
> about the same time, and second (probably much less likely) that there
> could be a malfunction involving a process that's waited on a latch not
> seeing the shared-memory status updates that another process did
> "before" setting the latch. This is not at all hypothetical --- my first
> attempt at rewriting the sinval signaling code, a couple years back,
> failed on PPC machines in the buildfarm because of exactly this type of
> issue.

Hmm, SetLatch only sets one flag, so I don't see how it could malfunction all by itself. And I would've thought that declaring the Latch variable "volatile" prevents rearrangements.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
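To make the hazard concrete, here is a minimal sketch of the issue under discussion. The Latch struct is simplified, and pg_write_barrier() is an assumed primitive (PostgreSQL had no portable barrier macros at this point; GCC's __sync_synchronize() stands in for one): "volatile" only constrains the compiler, not the CPU's store ordering.

/*
 * Minimal sketch, not the actual latch code: why SetLatch() may need an
 * explicit memory barrier on weak-memory machines.  The Latch struct is
 * simplified, and pg_write_barrier() is an assumed primitive; GCC's
 * __sync_synchronize() is used here as a stand-in full barrier.
 */
#include <signal.h>

#define pg_write_barrier()  __sync_synchronize()

typedef struct Latch
{
    volatile sig_atomic_t is_set;
    int         owner_pid;
} Latch;

void
SetLatch(volatile Latch *latch)
{
    /*
     * Make sure all shared-memory updates performed before SetLatch() are
     * visible to the waiter before it can observe is_set.  "volatile"
     * stops the compiler from reordering these stores, but on PPC and
     * similar architectures the CPU may still reorder them; the barrier
     * prevents that.
     */
    pg_write_barrier();

    latch->is_set = 1;

    /* ... then signal the owner, e.g. kill(latch->owner_pid, SIGUSR1) ... */
}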
Re: [HACKERS] changing MyDatabaseId
Robert Haas writes: > Looking through the code, it appears to me that we'd need to do the > following (not necessarily in this order): Don't forget 9. Unload loadable modules that do not exist according to the new database's catalogs; eg we don't want postgis trying to run when its supporting tables don't exist in the selected database. 10. Somehow persuade remaining loadable modules to discard whatever state they have that might be database-local. We don't have workable APIs for either of those operations ATM. I believe also that there are probably race conditions in several of the steps you listed; in particular there is certainly a risk involved in changing the database-we-advertise-being-connected-to versus a concurrent DROP DATABASE. Maybe that's easily soluble, but I'm not sure about it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MULTISET and additional functions for ARRAY
Hello

2010/11/15 Itagaki Takahiro :
> On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro wrote:
>> Postgres supports ARRAY data types well, but there are some
>> more array functions in the SQL standard. Also, the standard
>> has MULTISET data type, that is an unordered array.
>
> Here is a WIP patch for multiset function support. Note that multiset
> types are not added in the patch; it just adds functions and syntax.
> Arguments or result types of those functions are anyarray rather than
> anymultiset. The result type is always flattened into a one-dimensional
> array because some functions require per-element operations; I'm not
> sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
> it is treated as trim_array( 9 elements array, 2 elements ) in the patch.
>
> The SQL standard defines special syntax for multiset. I added four
> unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
> (I don't like such ad-hoc syntax, but it is the standard...)
> Some of the new expressions are just syntactic sugar for other existing
> expressions or new functions. For example, "$1 MEMBER OF $2" is
> expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".
>
> I have not researched the spec enough yet, especially NULLs in collections.
> I'll continue to check the details.
>
> BTW, some of the intermediate products to implement those features might
> be useful if exported, like array_sort() and array_unique(). If there is
> demand, I'll add those functions, too.
>
> Any comments for the approach or detailed features?

I don't have a copy of the standard, so I can't speak to conformance, but I must say that this functionality would be extremely useful for plpgsql programming. You can find sample functions on the net for sorting arrays and removing redundant values via the SQL language; I am sure an implementation in C will be much faster. It might also be useful to implement searching on a sorted array: you could keep a flag recording whether a multiset is sorted or not.

Regards
Pavel Stehule

> === New features ===
> - [FUNCTION] cardinality(anyarray) => integer
> - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
> - [FUNCTION] element(anyarray) => anyelement
> - [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
> - [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
> - [SYNTAX] $1 IS A SET --> multiset_is_a_set($1)
> - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 --> multiset_union($1, $2, all?)
> - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 --> multiset_intersect($1, $2, all?)
> - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 --> multiset_except($1, $2, all?)
> - [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
> - [AGGREGATE] fusion(anyarray) => anyarray
> - [AGGREGATE] intersection(anyarray) => anyarray
>
> --
> Itagaki Takahiro

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
Greg Stark writes:
> Fwiw I think he's right that sum(int2) should perhaps be redefined to
> return int8. As it stands all it would take is 64K rows to
> potentially overflow. It's not super likely but it is plausible and
> the performance penalty to use int8 wouldn't be super big either.

It's not unreasonable. I think the performance penalty for int8 was higher when that choice was made than it is now --- and in particular, on a 64-bit machine it's now pretty much negligible. On the other hand, you can always execute sum(foo::int4) if you need a wider sum, just as with the escape hatches available when any of the other datatype choices aren't working for you. It's not clear that we should force a performance loss on people who don't need it (and I can't offhand recall *ever* hearing a complaint about sum(int2) overflowing...)

I believe what the OP was arguing for was not so much this sort of marginal tinkering as inventing a more general notion of "integer type" that would avoid the whole issue. The problem with that is that we have to work within the set of types specified by the SQL standard.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] changing MyDatabaseId
I've spent a few hours poring over the source code with a coarse-toothed comb, trying to figure out just exactly what might break if we changed MyDatabaseId after backend startup time, or in other words, allowed a backend to unbind from the database to which it was originally bound and rebind to a new one. This is related to the periodic conversations we've had about a built-in connection pooler, and/or maintaining a pool of worker threads that could be used to service parallel query, replication sets, etc. What follows is not meant to be a concrete design proposal; it's basic research that may lead to a proposal at some time in the future. Still, comments are welcome.

For the use cases that seem interesting, it is sufficient to think about changing MyDatabaseId when there is no transaction in progress, or perhaps, when there is a transaction in progress which does that and nothing else. This means that code that simply uses MyDatabaseId in the course of going about its duties is not going to blow up. However, anything that stores the current database ID someplace is a hazard, as is any code that stores information which was obtained using the current database ID. Looking through the code, it appears to me that we'd need to do the following (not necessarily in this order):

1. Blow away all of our catalog caches as if we've received a sinval reset. Since most of our caches are hooked into the catalog cache via invalidation hooks, this should trigger cascading invalidations of other caches that need it, at least in most cases. (Further looking is probably needed to identify any cases where a sinval-reset-equivalent is insufficient.)

2. Blow away absolutely everything from the relcache. It appears to me that we'd need to repeat the phase 2 and 3 initialization steps. We can't even keep the entries for shared catalogs, because the descriptors that formrdesc() coughs up may not exactly match what we read from the pg_class entry. It's tempting to propose adding enough additional shared catalogs to make the relcache entries for shared catalogs independent of any particular database's contents, but right now they are not.

3. Reinitialize the relmapper.

4. Release all locks on objects in the old database, and our shared lock on the database itself. Take a new shared lock on the new database.

5. Flush out any backend-local statistics that have been gathered but not yet sent to the statistics collector.

6. Update our entry in the PgBackendStatus array so pg_stat_activity sees the new database assignment.

7. Change MyProc->databaseId while holding ProcArrayLock in LW_EXCLUSIVE mode.

8. Update MyDatabaseTableSpace.

I haven't benchmarked this (so perhaps I should STFU) but #2 and maybe #1 and #3 sound like the most painful parts of this activity. Still, there's not much help for it that I can see. The best you can hope for is to keep around the entries for the shared catalogs, and that's not going to get you terribly far, and at least ATM even that is unsafe. I think maybe the thing to do is try to quantify how much time is being spent in each part of the backend startup process - like connect to the database lots and lots of times with individual connections, run a trivial SELECT against a table, and then disconnect and repeat (see the sketch after this message). The problem is that (unless someone knows how to do magic tricks with oprofile or dtrace) you're only going to find out the amount of time actually spent executing each portion of the code. 
It won't tell you how much overhead you have from fork() itself, let alone how much faster the initialization steps would have run if they had been running on a warmed-up process address space rather than one that might well need to fault in a lot of page-table entries in its first few moments of life. But it might still provide some useful data. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
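A minimal harness for the timing experiment Robert describes might look like the following. This is an illustrative sketch, not part of any patch; the connection string and iteration count are assumptions, and it measures only wall-clock time per connect/query/disconnect cycle (compile with -lpq):

/* Time many connect / trivial-SELECT / disconnect cycles via libpq. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <libpq-fe.h>

int
main(void)
{
    const int iters = 1000;         /* assumed iteration count */
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < iters; i++)
    {
        /* assumed connection string */
        PGconn *conn = PQconnectdb("dbname=postgres");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* run one trivial query against a real table, then disconnect */
        PGresult *res = PQexec(conn, "SELECT 1 FROM pg_class LIMIT 1");
        PQclear(res);
        PQfinish(conn);
    }
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.3f ms per connect/query/disconnect cycle\n",
           secs * 1000.0 / iters);
    return 0;
}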
Re: [HACKERS] SQL/MED estimated time of arrival?
On Mon, Nov 15, 2010 at 12:41, Shigeru HANADA wrote:
> No, SQL/MED would not support indexing foreign tables, at least in the
> first version, because it would be difficult to use a common row id for
> various FDWs.

I think the reason is that the SQL standard never mentions indexes; it is not an issue specific to SQL/MED.

> Supporting indexes on foreign tables might require changing the common
> index tuple structure to hold a virtual row-id rather than an
> ItemPointerData.

I'm not sure we actually need foreign indexes, because the query text sent to another server is the same whether or not the foreign table has indexes. Of course, foreign indexes might be useful for calculating the cost of scanning foreign tables, but the cost also comes from non-index conditions. Foreign tables and foreign indexes are a model for row-based databases, including Postgres, but other DBs might have different cost models. So it would be better to encapsulate such operations in the FDW.

-- Itagaki Takahiro

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED estimated time of arrival?
On Fri, 12 Nov 2010 08:27:54 -0800 Eric Davies wrote:
> Thank you for the time estimate and the interface discussion. It
> sounds like the PostgreSQL SQL/MED code will be very useful when it
> is done. Our product provides read-only access to files, so
> updates/inserts/deletes aren't an issue for us.
>
> One thing that is not clear to me is indexing support. Will it be
> possible to index a SQL/MED table as if it were a regular table?

No, SQL/MED would not support indexing foreign tables, at least in the first version, because it would be difficult to use a common row id for various FDWs. Supporting indexes on foreign tables might require changing the common index tuple structure to hold a virtual row-id rather than an ItemPointerData. Instead, an FDW can handle expressions parsed from the WHERE clause and JOIN conditions of the original SQL, and use them to optimize scanning. For example, an FDW for PostgreSQL could push some conditions down to the remote side to decrease the number of result tuples to be transferred. I hope this idea helps you.

> What
> would be the equivalent of Informix's row ids?

The answer to the second question would be "ItemPointerData". It consists of a block number and an offset within the block, and consumes 6 bytes per tuple. With this information, PostgreSQL can access a data tuple directly. The actual definition is:

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;

typedef ItemPointerData *ItemPointer;

Does Informix use a common row-id (AFAIK it's a 4-byte integer) for both virtual tables and normal tables?

Regards,
-- Shigeru Hanada

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MULTISET and additional functions for ARRAY
On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro wrote:
> Postgres supports ARRAY data types well, but there are some
> more array functions in the SQL standard. Also, the standard
> has MULTISET data type, that is an unordered array.

Here is a WIP patch for multiset function support. Note that multiset types are not added in the patch; it just adds functions and syntax. Arguments or result types of those functions are anyarray rather than anymultiset. The result type is always flattened into a one-dimensional array because some functions require per-element operations; I'm not sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So, it is treated as trim_array( 9 elements array, 2 elements ) in the patch.

The SQL standard defines special syntax for multiset. I added four unreserved keywords for them: A, MEMBER, MULTISET, and SUB. (I don't like such ad-hoc syntax, but it is the standard...) Some of the new expressions are just syntactic sugar for other existing expressions or new functions. For example, "$1 MEMBER OF $2" is expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".

I have not researched the spec enough yet, especially NULLs in collections. I'll continue to check the details.

BTW, some of the intermediate products to implement those features might be useful if exported, like array_sort() and array_unique(). If there is demand, I'll add those functions, too.

Any comments for the approach or detailed features?

=== New features ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
- [FUNCTION] element(anyarray) => anyelement
- [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
- [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
- [SYNTAX] $1 IS A SET --> multiset_is_a_set($1)
- [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 --> multiset_union($1, $2, all?)
- [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 --> multiset_intersect($1, $2, all?)
- [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 --> multiset_except($1, $2, all?)
- [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
- [AGGREGATE] fusion(anyarray) => anyarray
- [AGGREGATE] intersection(anyarray) => anyarray

-- Itagaki Takahiro

multiset-20101115.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
On Thu, Nov 4, 2010 at 10:04 AM, Robert Haas wrote:
> On Thu, Nov 4, 2010 at 6:35 AM, Stephen Frost wrote:
>> * Jan Urbański (wulc...@wulczer.org) wrote:
>>> On 04/11/10 14:09, Robert Haas wrote:
>>>> Hmm, I wonder how useful this is given that restriction.
>>>
>>> As KaiGai mentioned, it's more to make bruteforcing difficult (read:
>>> time consuming), right?
>>
>> Which it would still do, since the attacker would be bumping up against
>> max_connections. max_connections would be a DOS point, but that's no
>> different from today. Other things could be put in place to address
>> that (max # of connections from a given IP or range could be implemented
>> using iptables, as an example).
>>
>> A 5 second delay w/ max_connections at 100 would mean a max of 20
>> attempts per second, no? That's a lot fewer than 100*(however many
>> attempts can be done in a second). Doing a stupid while true; psql -d
>> blah; done managed to get 50 successful ident auths+no-db-found errors
>> done in a second on one box here. 5000 >> 20, and I wasn't even trying.
>
> OK. I was just asking. I don't object to it if people think it's
> useful, especially if they are looking at it as "I would actually use
> this on my system" rather than "I can imagine a hypothetical person
> using this".

I haven't heard anyone say "yes, I would actually use this on my system". Any takers?

If we're to commit this, then the patch needs to add a new file auth_delay.sgml, fill it in with appropriate contents, and update contrib.sgml and filelist.sgml accordingly.

I also note that the patch offers the ability to log superuser logins. Since that seems a bit off-topic for a contrib module called auth_delay, and since we already have a GUC called log_connections, I'm inclined to propose removing that part.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
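For readers unfamiliar with the module under discussion, the core of the auth_delay idea is quite small. The following is a hedged sketch, not the submitted patch, using the ClientAuthentication_hook mechanism that such a module installs; the hardcoded 5-second delay stands in for what a real module would expose as a custom GUC:

/*
 * Sketch of the auth_delay idea (not the submitted patch): sleep after
 * every failed authentication attempt so that brute-force password
 * guessing is throttled to roughly max_connections/delay attempts/sec.
 */
#include "postgres.h"

#include "fmgr.h"
#include "libpq/auth.h"
#include "miscadmin.h"

PG_MODULE_MAGIC;

/* A real module would expose this as a custom GUC. */
static int  auth_delay_milliseconds = 5000;

static ClientAuthentication_hook_type original_client_auth_hook = NULL;

static void
auth_delay_checks(Port *port, int status)
{
    /* Let any previously installed hook run first. */
    if (original_client_auth_hook)
        original_client_auth_hook(port, status);

    /* Inject the delay only when authentication failed. */
    if (status != STATUS_OK)
        pg_usleep(1000L * auth_delay_milliseconds);
}

void
_PG_init(void)
{
    original_client_auth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = auth_delay_checks;
}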
Re: [HACKERS] Refactoring the Type System
On Sun, Nov 14, 2010 at 9:16 PM, Greg Stark wrote:
> On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina wrote:
>> SUM(int2) => int4
>> SUM(int4) => int8
>> SUM(int8) => numeric
>>
>> Some weaknesses:
>>
>> SUM, of any precision, assumes that the precision being accumulated
>> into (which is also the return-precision) is enough to avoid overflow.
>> This is generally the case, but there's no reason why it *must* be
>> true. I'm especially looking at the int2 to int4 conversion. One could
>> imagine a more interesting scenario where overflow behavior could
>> occur much more easily.
>
> Fwiw I think he's right that sum(int2) should perhaps be redefined to
> return int8. As it stands all it would take is 64K rows to
> potentially overflow. It's not super likely but it is plausible and
> the performance penalty to use int8 wouldn't be super big either.
>
> int4 doesn't seem like as realistic a problem since it would take 4
> billion rows and the performance penalty for using numeric would be
> much higher.

I think you could get the same effect by removing sum(int2) altogether, which I wouldn't lose any sleep over.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix for cube picksplit function
On Fri, Oct 29, 2010 at 2:59 PM, Alexander Korotkov wrote:
> The GiST picksplit method implementation in the cube contrib module
> contains a bug in its implementation of Guttman's split algorithm. It
> was mentioned before but is still not fixed. The current implementation
> doesn't cause incorrect behavior, but the index becomes very bad:
> each index scan has to read a significant part of the index.

It looks to me like this can be safely back-patched, because it doesn't affect the contents of index entries already on disk, just the manner in which we construct new ones. So I've committed this and back-patched it all the way back to 8.1.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Explain analyze getrusage tracking
This is an update to my earlier patch to add getrusage resource tracking to EXPLAIN ANALYZE. With this patch you get something like:

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual time=17.240..1123.751 rows=262144 loops=1)
   Resources: sys=210.000ms user=430.000ms read=33.6MB
   Buffers: shared read=4298
 Total runtime: 1548.651 ms
(4 rows)

The main change is to make it work under Windows. At least I think the changes should make it work under Windows; I haven't been able to test it. Actually I'm not too happy with the way I did it. I would be more inclined to hack the getrusagestub.h definition of struct rusage to have an instr_time in it so that we can use the same macros directly. But that's more changes than I would be happy making without being able to compile them to test them.

-- greg

*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
***************
*** 110,115 ****
--- 110,116 ----
  static void ExplainJSONLineEnding(ExplainState *es);
  static void ExplainYAMLLineStarting(ExplainState *es);
  static void escape_json(StringInfo buf, const char *str);
  static void escape_yaml(StringInfo buf, const char *str);
+ static double normalize_memory(double amount, char **unit, int *precision);
*************** ExplainQuery(ExplainStmt *stmt, const char *queryString,
*** 142,147 ****
--- 143,150 ----
  			es.costs = defGetBoolean(opt);
  		else if (strcmp(opt->defname, "buffers") == 0)
  			es.buffers = defGetBoolean(opt);
+ 		else if (strcmp(opt->defname, "resource") == 0)
+ 			es.rusage = defGetBoolean(opt);
  		else if (strcmp(opt->defname, "format") == 0)
  		{
  			char	   *p = defGetString(opt);
*************** ExplainOnePlan(PlannedStmt *plannedstmt, ExplainState *es,
*** 368,373 ****
--- 371,378 ----
  		instrument_option |= INSTRUMENT_TIMER;
  	if (es->buffers)
  		instrument_option |= INSTRUMENT_BUFFERS;
+ 	if (es->rusage)
+ 		instrument_option |= INSTRUMENT_RUSAGE;

  	/*
  	 * Use a snapshot with an updated command ID to ensure this query sees
*************** ExplainNode(PlanState *planstate, List *ancestors,
*** 1098,1103 ****
--- 1103,1199 ----
  			break;
  	}

+ 	/* Show resource usage from getrusage */
+ 	if (es->rusage && es->format == EXPLAIN_FORMAT_TEXT)
+ 	{
+ 		const struct rusage *usage = &planstate->instrument->rusage;
+
+ 		bool has_rusage = (!INSTR_TIME_IS_ZERO_TV(usage->ru_stime) ||
+ 						   !INSTR_TIME_IS_ZERO_TV(usage->ru_utime) ||
+ 						   usage->ru_inblock > 0 ||
+ 						   usage->ru_oublock > 0);
+ 		bool has_verbose_rusage = (usage->ru_minflt > 0 ||
+ 								   usage->ru_majflt > 0 ||
+ 								   usage->ru_nswap > 0 ||
+ 								   usage->ru_msgsnd > 0 ||
+ 								   usage->ru_msgrcv > 0 ||
+ 								   usage->ru_nsignals > 0 ||
+ 								   usage->ru_nvcsw > 0 ||
+ 								   usage->ru_nivcsw > 0);
+
+ 		if (has_rusage || (es->verbose && has_verbose_rusage))
+ 		{
+ 			appendStringInfoSpaces(es->str, es->indent * 2);
+ 			appendStringInfoString(es->str, "Resources:");
+
+ 			if (!INSTR_TIME_IS_ZERO_TV(usage->ru_stime))
+ 			{
+ 				double stime = INSTR_TIME_GET_DOUBLE_TV(usage->ru_stime);
+ 				appendStringInfo(es->str, " sys=%.3fms", stime * 1000);
+ 			}
+
+ 			if (!INSTR_TIME_IS_ZERO_TV(usage->ru_utime))
+ 			{
+ 				double utime = INSTR_TIME_GET_DOUBLE_TV(usage->ru_utime);
+ 				appendStringInfo(es->str, " user=%.3fms", utime * 1000);
+ 			}
+
+ 			if (usage->ru_inblock > 0)
+ 			{
+ 				double inblock;
+ 				char *units;
+ 				int prec;
+
+ 				inblock = normalize_memory((double) usage->ru_inblock * 512, &units, &prec);
+ 				appendStringInfo(es->str, " read=%.*f%s", prec, inblock, units);
+ 			}
+ 			if (usage->ru_oublock > 0)
+ 			{
+ 				double oublock;
+ 				char *units;
+ 				int prec;
+
+ 				oublock = normalize_memory((double) usage->ru_oublock * 512, &units, &prec);
+ 				appendStringInfo(es->str, " written=%.*f%s", prec, oublock, units);
+ 			}
+ 			if (es->verbose)
+ 			{
+ 				if (usage->ru_minflt > 0)
+ 					appendStringInfo(es->str, " minflt=%ld", usage->ru_minflt);
+ 				if (usage->ru_majflt > 0)
+ 					appendStringInfo(es->str, " majflt=%ld", usage->ru_majflt);
+ 				if (usage->ru_nswap > 0)
+ 					appendStringInfo(es->str, " nswap=%ld", usage->ru_nswap);
+ 				if (usage->ru_msgsnd > 0)
+ 					appendStringInfo(es->str, " msgsnd=%ld", usage->ru_msgsnd);
+ 				if (usage->ru_msgrcv > 0)
+ 					appendStringInfo(es->str, " msgrcv=%ld", usage->ru_msgrcv);
+ 				if (usage->ru_nsignals > 0)
+ 					appendStringInfo(es->str, " nsignals=%ld", usage->ru_nsignals);
+ 				if (usage->ru_nvcsw > 0)
+ 					appendStringInfo(es->str, " nvcsw=%ld", usage->ru_nvcsw);
+ 				if (usage->ru_nivcsw > 0)
+ 					appendStringInfo(es->str, " nivcsw=%ld", usage->ru_nivcsw);
+ 			}
+ 			appendStringInfoChar(es->str, '
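The quoted patch is cut off above before the body of normalize_memory() appears. Purely for illustration, a plausible implementation matching the declaration, the call sites, and the "read=33.6MB" sample output might look like the following; this is an assumption, not the patch's actual code:

/*
 * Hypothetical sketch of normalize_memory(), whose body is truncated
 * from the quoted patch: scale a byte count to a human-friendly unit
 * and pick a display precision for it.
 */
static double
normalize_memory(double amount, char **unit, int *precision)
{
	static char *units[] = {"B", "kB", "MB", "GB", "TB"};
	int			i = 0;

	while (amount >= 1024.0 && i < 4)
	{
		amount /= 1024.0;
		i++;
	}

	*unit = units[i];
	*precision = (i == 0) ? 0 : 1;	/* whole bytes, one decimal otherwise */
	return amount;
}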
Re: [HACKERS] Comparison with "true" in source code
On Mon, Nov 15, 2010 at 11:13, Robert Haas wrote: >> I added an additional cleanup to 'header_mode' in ecpg; I changed the type >> from bool to char to hold 'h' or 'c'. Do you think it is reasonable? > > I looked at this but found that part a bit too clever for its own good. > > So committed the rest, plus an additional one-line change to psql's > print.c to avoid making the two accesses to format->wrap_right_pointer > inconsistent with each other. Thanks! -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina wrote:
> SUM(int2) => int4
> SUM(int4) => int8
> SUM(int8) => numeric
>
> Some weaknesses:
>
> SUM, of any precision, assumes that the precision being accumulated
> into (which is also the return-precision) is enough to avoid overflow.
> This is generally the case, but there's no reason why it *must* be
> true. I'm especially looking at the int2 to int4 conversion. One could
> imagine a more interesting scenario where overflow behavior could
> occur much more easily.

Fwiw I think he's right that sum(int2) should perhaps be redefined to return int8. As it stands all it would take is 64K rows to potentially overflow. It's not super likely but it is plausible, and the performance penalty of using int8 wouldn't be super big either.

int4 doesn't seem like as realistic a problem since it would take 4 billion rows, and the performance penalty for using numeric would be much higher.

-- greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
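The "64K rows" figure is easy to verify with a few lines of arithmetic. The sketch below (illustrative only) counts how many rows of the maximum int2 value fit in an int4 accumulator before it would overflow, using a 64-bit accumulator so the demonstration itself stays well-defined:

/* How many rows of the max int2 value (32767) can an int4 accumulator
 * absorb before exceeding INT32_MAX?  Uses int64 arithmetic so the
 * demonstration itself never overflows. */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	int64_t sum = 0;
	long	rows = 0;

	while (sum + INT16_MAX <= INT32_MAX)
	{
		sum += INT16_MAX;
		rows++;
	}
	/* Prints 65538: one more max-valued row would overflow int4. */
	printf("int4 accumulator holds %ld rows of %d; row %ld overflows\n",
		   rows, INT16_MAX, rows + 1);
	return 0;
}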
Re: [HACKERS] Comparison with "true" in source code
On Wed, Nov 3, 2010 at 9:45 PM, Itagaki Takahiro wrote:
> On Wed, Nov 3, 2010 at 2:19 AM, Michael Meskes wrote:
>> On Mon, Nov 01, 2010 at 12:17:02PM +0900, Itagaki Takahiro wrote:
>>> There are some "== true" in the codes, but they might not be safe
>>> because all non-zero values are true in C. Is it worth cleaning them up?
>
> Here is a proposed cleanup that replaces "boolean == true" with "boolean".
> I didn't touch "== false" unless it was paired with a comparison with
> true, because comparison with false is valid C code.
>
> Note that I also changed "boolean != true" in pg_upgrade,
> but I didn't change the ones in xlog.c because they might be checking
> corrupted fields in control files.
>
>>> src/interfaces/ecpg/preproc/ecpg.c(310):
>>> ptr2ext[3] = (header_mode == true) ? 'h' : 'c';
>> I actually see no reason why these variables are not defined as bool
>> instead of int, so I changed this. Hopefully I found all of them.
>
> I added an additional cleanup to 'header_mode' in ecpg; I changed the type
> from bool to char to hold 'h' or 'c'. Do you think it is reasonable?

I looked at this but found that part a bit too clever for its own good.

So committed the rest, plus an additional one-line change to psql's print.c to avoid making the two accesses to format->wrap_right_pointer inconsistent with each other.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
On Sun, Nov 14, 2010 at 12:13 PM, Tom Lane wrote:
> I wrote:
>> I still think it's worth looking into whether the bug can be dodged
>> by shortening the eval calls.
>
> In fact, that does seem to work; I'll commit a patch after testing a
> bit more.
>
> We still need someone to add the missing build dependencies so that
> make -j is trustworthy again.

Yes, please. This is currently failing for me:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -Werror -bundle -multiply_defined suppress -o ascii_and_mic.so ascii_and_mic.o -L../../../../../../src/port -L/opt/local/lib -Wl,-dead_strip_dylibs -Werror -bundle_loader ../../../../../../src/backend/postgres
ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recurse] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
make[1]: *** Waiting for unfinished jobs

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
On Sun, Nov 14, 2010 at 7:27 PM, Tom Lane wrote: > Robert Haas writes: >> It might be even better to mention that the reason why we couldn't >> forward the fsync request is that the fsync request queue is full. >> I'm not sure exactly how to phrase that. I thought about: > >> fsync request queue is full > >> But that seems not to answer the "so what" question. There is an >> example like this in the docs: > >> could not forward fsync request (fsync request queue is full) > >> ...but I'm not sure I like that. > > Well, that example is meant to cover cases where you have to assemble a > couple of independently created phrases. In this case I'd suggest > could not forward fsync request because request queue is full Sounds good to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith wrote: >> But if this is generating a lot of log data or adding a lot of >> overhead, then you have bigger problems anyway: >> >> + elog(DEBUG1, "Unable to forward fsync request, executing >> directly"); >> > > The argument against this log line even existing is that if the field is > added to pg_stat_bgwriter, that's probably how you want to monitor this data > anyway. I'll remove it if you really want it gone, but personally I think it's useful to have. I've more than once had to debug a problem given a PostgreSQL log file with the debug level cranked up and not a whole lot else. Rare events that cause performance to tank are worth logging, IMHO. > I started out touching code that called it just "sync", but then crossed to > other code that called it "fsync", and made the external UI use that name. > No objections to sorting that out within my patch so it's consistent. OK, I'll do that before I commit it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Marti Raudsepp wrote:
> PostgreSQL's default settings change when built with Linux kernel
> headers 2.6.33 or newer. As discussed on the pgsql-performance list,
> this causes a significant performance regression:
> http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php
>
> NB! I am not proposing to change the default -- to the contrary -- this
> patch restores old behavior.

Following our standard community development model, I've put this patch onto our CommitFest list:

https://commitfest.postgresql.org/action/patch_view?id=432

and assigned myself as the reviewer. I didn't look at this until now because I already had some patch development and review work to finish before the CommitFest deadline we just crossed. Now I can go back to reviewing other people's work.

P.S. There is no pgsql-patches list anymore; everything goes through the hackers list now.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest 2010-11: Call for Reviewers
Despite some jerk submitting three brand new patches at the last minute by himself, the November CommitFest is now closed for new submissions and ready for review! Only 7 of the 40 patches that need review have a reviewer assigned so far. Reviewers, mark down a patch you want to look at in the CommitFest app:

https://commitfest.postgresql.org/action/commitfest_view?id=8

or ask to be assigned one on the pgsql-rrreviewers list, if you want to work on something but don't have a preference for which patch. There are a few larger patches in this round, but the majority of those already have a reviewer attached to them. Many of the remaining patches are pretty approachable even if you're relatively new to reviewing PostgreSQL code. See the usual pages for details:

http://wiki.postgresql.org/wiki/Reviewing_a_Patch
http://wiki.postgresql.org/wiki/RRReviewers

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_bgwriter broken?
Greg Smith writes: > The way you've explained it now, I see why it's not worth adding a > specific regression test for this in the future, and what else I should > have checked when I ran into my problem (that other system stats views > were also working or not). FWIW, I wouldn't object to a test that somehow exercised the specific functionality of the view; I was just dubious that there was any value in the sort of test you were suggesting. We found ways to exercise the table-counter stuff in a sufficiently reproducible way for the stats test --- can you do something similar here? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
Robert Haas writes:
> It might be even better to mention that the reason why we couldn't
> forward the fsync request is that the fsync request queue is full.
> I'm not sure exactly how to phrase that. I thought about:
>
> 	fsync request queue is full
>
> But that seems not to answer the "so what" question. There is an
> example like this in the docs:
>
> 	could not forward fsync request (fsync request queue is full)
>
> ...but I'm not sure I like that.

Well, that example is meant to cover cases where you have to assemble a couple of independently created phrases. In this case I'd suggest

	could not forward fsync request because request queue is full

or, if you think there might sometime be a need to have a strerror variant, ie

	could not forward fsync request: %m

then maybe this would make the most sense:

	could not forward fsync request: request queue is full

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
Robert Haas wrote:
> I think this one could be removed:
>
> +	if (n > 0)
> +		elog(DEBUG1, "Absorbing %d fsync requests", n);

Right; that one is just there to let you know the patch is working, and how much the background writer does for you per pass, mainly for the purpose of reviewing the patch.

> But if this is generating a lot of log data or adding a lot of
> overhead, then you have bigger problems anyway:
>
> +	elog(DEBUG1, "Unable to forward fsync request, executing directly");

The argument against this log line even existing is that if the field is added to pg_stat_bgwriter, that's probably how you want to monitor this data anyway. I don't think there's actually much value to it, which is one reason I didn't worry too much about matching style guidelines, translation, etc. You've found the two things that I think both need to go away before commit, but I left them in because I think they're valuable for testing that the patch itself does what it's supposed to.

> Also, how about referring to this as buffers_backend_fsync
> consistently throughout, instead of dropping the "f" in some places?

I started out touching code that called it just "sync", but then crossed to other code that called it "fsync", and made the external UI use that name. No objections to sorting that out within my patch so it's consistent.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Instrument checkpoint sync calls
Magnus Hagander wrote:
>> I think that it's reasonable for the sort of people who turn
>> log_checkpoints on to also get the sync summary line, thus it being
>> logged at LOG level.
>
> In that case, wouldn't it make more sense to add a couple of more
> fields to the existing line? Easier to post-process the logfile that
> way...

It might. One trade-off is that if you're looking at the sync write detail, the summary comes out in a similar form. And it was easy to put in here--I'd have to return some new data out of the sync phase call in order for that to show up in the main log. If there's general buy-in on the idea, I could do all of that.

>> I personally think that if you're already making an fsync call and
>> have log_checkpoints on, the additional overhead of also timing that
>> fsync is minimal even on platforms where timing is slow (I don't have
>> such a system to test that assumption however)...
>
> It sounds like it should be - but that should be possible to measure, no?

What I was alluding to is that I know gettimeofday executes fast on my Linux system here, so even if I did measure the overhead and showed it's near zero, that doesn't mean it will be so on every platform. The "how long does it take to find out the current time on every supported PostgreSQL platform?" question is one I'd like to have an answer to, but it's hard to collect properly. All I know is that I don't have any system where it's slow to properly test against here.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
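A rough way to put a number on that question on whatever platform is at hand: time a large batch of back-to-back gettimeofday() calls. This standalone sketch (an illustration, not part of the patch) reports the average cost per call; on Linux with a fast clocksource it is typically tens of nanoseconds, while platforms that trap into the kernel for each reading can be orders of magnitude slower:

/* Micro-benchmark the cost of reading the clock via gettimeofday(). */
#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	const long n = 10 * 1000 * 1000;	/* assumed iteration count */
	struct timeval start, now, end;

	gettimeofday(&start, NULL);
	for (long i = 0; i < n; i++)
		gettimeofday(&now, NULL);		/* the call being measured */
	gettimeofday(&end, NULL);

	double secs = (end.tv_sec - start.tv_sec) +
				  (end.tv_usec - start.tv_usec) / 1e6;
	printf("%.1f ns per gettimeofday() call\n", secs * 1e9 / n);
	return 0;
}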
Re: [HACKERS] pg_stat_bgwriter broken?
Tom Lane wrote:
> If the cause was what I suggested, most likely the other stats views
> were broken too --- were you able to pass the regression tests in that
> state?

I was having a lot of problems with the system that included regression test issues, and yelped about this one earlier than I normally would because it was blocking work I wanted to get submitted today. Sorry about the noise, and thanks for the info about why this was happening. The way you've explained it now, I see why it's not worth adding a specific regression test for this in the future, and what else I should have checked when I ran into my problem (that other system stats views were also working or not).

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Spread checkpoint sync
Final patch in this series for today spreads out the individual checkpoint fsync calls over time, and was written by myself and Simon Riggs. The patch is based against a system that's already had the two patches I sent over earlier today applied, rather than HEAD, as both are useful for measuring how well this one works. You can grab a tree with all three from my Github repo, via the "checkpoint" branch:

https://github.com/greg2ndQuadrant/postgres/tree/checkpoint

This is a work in progress. While I've seen this reduce checkpoint spike latency significantly on a large system, I don't have any referencable performance numbers I can share yet. There are also a couple of problems I know about, and I'm sure others I haven't thought of yet.

The first known issue is that it delays manual or other "forced" checkpoints, which is not necessarily wrong if you really are serious about spreading syncs out, but it is certainly surprising when you run into it. I notice this most when running createdb on a busy system. There's no real reason for this to happen; the code passes down the fact that it's a forced checkpoint but just doesn't act on it yet.

The second issue is that the delay between sync calls is currently hard-coded, at 3 seconds. I believe the right path here is to consider the current checkpoint_completion_target to still be valid, then work back from there. That raises the question of what percentage of the time writes should now be compressed into relative to that, to leave some time to spread the sync calls. If we're willing to say "writes finish in the first 1/2 of the target, syncs execute in the second 1/2", I could implement that here. Maybe that ratio needs to be another tunable. I'm still thinking about that part, and it's certainly open to community debate. The thing to realize that complicates the design is that the actual sync execution may take a considerable period of time. That's much more likely to happen than in the case of an individual write, as the current spread checkpoint does, because writes are usually cached. In the spread sync case, it's easy for one slow sync to make the rest turn into ones that fire in quick succession, to make up for lost time.

There's some history behind this design that impacts review. Circa 8.3 development in 2007, I had experimented with putting some delay between each of the fsync calls that the background writer executes during a checkpoint. It didn't help smooth things out at all at the time. It turns out that's mainly because all my tests were on Linux using ext3. On that filesystem, fsync is not very granular. It's quite likely it will push out data you haven't asked to sync yet, which means one giant sync is almost impossible to avoid no matter how you space the fsync calls. If you try and review this on ext3, I expect you'll find a big spike early in each checkpoint (where it flushes just about everything out) and then quick response for the later files involved. The system this patch originated to help fix was running XFS. There, I've confirmed that problem doesn't exist, and that individual syncs only seem to push out the data related to one file. The same should be true on ext4, but I haven't tested that myself. Not sure how granular the fsync calls are on Solaris, FreeBSD, Darwin, etc. yet. Note that it's still possible to get hung on one sync call for a while, even on XFS. The worst case seems to be if you've created a new 1GB database table chunk and fully populated it since the last checkpoint, on a system that's just cached the whole thing so far.

One change that turned out to be necessary rather than optional--to get good performance from the system under tuning--was to make regular background writer activity, including fsync absorb checks, happen during these sync pauses (see the sketch after this message). The existing code ran the checkpoint sync work in a pretty tight loop, which as I alluded to in an earlier patch today can lead to the backends competing with the background writer to get their sync calls executed. This squashes that problem if the background writer is set up properly.

What does properly mean? Well, it can't do that cleanup if the background writer is sleeping. This whole area was refactored. The current sync absorb code uses the constant WRITES_PER_ABSORB to make decisions. This new version replaces that hard-coded value with something that scales to the system size. It now ignores doing work until the number of pending absorb requests has reached 10% of the number possible to store (BgWriterShmem->max_requests, which is set to the size of shared_buffers in 8K pages, AKA NBuffers). This may actually postpone this work for too long on systems with large shared_buffers settings; that's one area I'm still investigating. As far as concerns about this 10% setting not doing enough work, which is something I do see, you can always increase how often absorbing happens by decreasing bgwr
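To make the shape of the change concrete, here is a heavily simplified sketch of the spread-sync loop being described. The PendingSync structure and the function name are illustrative assumptions, not the patch's actual code; AbsorbFsyncRequests() and pg_fsync() are the real routines the description refers to:

/*
 * Simplified sketch of the spread-sync idea: fsync one pending file,
 * then absorb backend fsync requests and nap before the next one.
 * PendingSync and SpreadCheckpointSyncs are illustrative assumptions.
 */
typedef struct PendingSync
{
	int			fd;				/* open file to sync */
	const char *path;			/* for error reporting */
} PendingSync;

static void
SpreadCheckpointSyncs(PendingSync *syncs, int nsyncs, bool forced)
{
	for (int i = 0; i < nsyncs; i++)
	{
		if (pg_fsync(syncs[i].fd) != 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not fsync file \"%s\": %m",
							syncs[i].path)));

		/*
		 * Unless this is a forced (e.g. manual) checkpoint, pause between
		 * syncs and keep absorbing backend fsync requests so backends
		 * don't end up executing their own syncs in the meantime.
		 */
		if (!forced && i < nsyncs - 1)
		{
			AbsorbFsyncRequests();
			pg_usleep(3 * 1000000L);	/* the patch's hard-coded 3 s */
		}
	}
}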
Re: [HACKERS] Count backend self-sync calls
On Sun, Nov 14, 2010 at 6:10 PM, Tom Lane wrote:
> Robert Haas writes:
>> With those changes, I think this is committable, and will do so,
>> barring objections.
>
> Obey message style guidelines, please, especially if you're going
> to promote any of those to ereports.

The only new message would be the one Greg has as:

	Unable to forward fsync request, executing directly

For that, we could just go with:

	could not forward fsync request

(Lower case, avoid use of "unable", telegram style with the program as implicit subject.)

It might be even better to mention that the reason why we couldn't forward the fsync request is that the fsync request queue is full. I'm not sure exactly how to phrase that. I thought about:

	fsync request queue is full

But that seems not to answer the "so what" question. There is an example like this in the docs:

	could not forward fsync request (fsync request queue is full)

...but I'm not sure I like that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI update
That it's not ready for commit this minute does not mean that it shouldn't be in the CF this month. Delaying the first review of the patch until the next CF pretty much ensures that we'll miss 9.1 with it, so please add it to the current CF :)

Cheers, David.

On Sat, Nov 13, 2010 at 02:15:40PM -0600, Kevin Grittner wrote:
> Circumstances have conspired to leave me with very little time to
> work on the SSI patch during the last few weeks. I'm still convinced
> that the work mentioned in this post is necessary to have a
> commit-quality patch:
>
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php
>
> I also think it's going to be very desirable to convert the conflict
> pointers used in the papers (and in the patch so far) to lists, to
> eliminate one source of false positives and allow more aggressive
> clean-up of transactions. If we implement the logic from the above
> post and *then* convert the pointers to lists, it seems like more
> work than implementing the lists first. Therefore, I'm planning on
> doing the lists first.
>
> Anyone who thinks that's a bad idea, please speak up soon, as I'm
> starting coding on that today.
>
> There's no way there will be a patch implementing this in time for
> the 2010-11 CF. A couple months ago there were a couple people who
> said they'd be willing to look at this between CFs, so I'm hoping
> that their schedules still permit them to do that after this
> unfortunate delay.
>
> -Kevin

-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On Sun, Nov 14, 2010 at 11:02:08PM +0100, Yeb Havinga wrote: > On 2010-11-14 21:06, Marko Tiikkaja wrote: > >On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: > >>On 2010-11-14 19:35, Robert Haas wrote: > >>>On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja > >>> wrote: > In my opinion, all of these should have the same effect: > DELETE all rows > from "foo". Any other option means we're going to have > trouble predicting > how a query is going to behave. > >>>I think it's clear that's the only sensible behavior. > >>What if CTE's ever get input parameters? > > > >What about input parameters? > With input parameters there is a clear link between a CTE and a > caller. If a CTE is called more than once, it must be executed more > than once, e.g. (notation t:x means cte has parameter x) > > WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *) > SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2)); > runs the cte two times, hence two new rows in foo. I think we can worry about that if we ever have run-time functions done as WITH, but I think they'd be a *much* better fit for DO. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal
On 11/14/2010 05:15 PM, Tom Lane wrote: Josh is ignoring the proposal that is on the table and seems actually workable, which is to consult the visibility map during index-only scans. For mostly-static tables this would save trips to the heap for very little extra I/O. The hard part is to make the VM reliable, but that is not obviously harder than making separately-stored hint bits reliable. I thought we had agreement in the past that this was the way we should proceed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
Robert Haas writes: > With those changes, I think this is committable, and will do so, > barring objections. Obey message style guidelines, please, especially if you're going to promote any of those to ereports. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Count backend self-sync calls
On Sun, Nov 14, 2010 at 5:07 PM, Greg Smith wrote:
> The patch also adds some logging to the innards involved here, to help
> with understanding problems in this area. I don't think that should be
> in the version committed as is. May want to drop the logging level or
> make it disabled in regular builds, since it is sitting somewhere it
> generates a lot of log data and adds overhead.

I think this one could be removed:

+	if (n > 0)
+		elog(DEBUG1, "Absorbing %d fsync requests", n);

But if this is generating a lot of log data or adding a lot of overhead, then you have bigger problems anyway:

+	elog(DEBUG1, "Unable to forward fsync request, executing directly");

I'm inclined to change that to an ereport(), but otherwise it seems OK. Also, how about referring to this as buffers_backend_fsync consistently throughout, instead of dropping the "f" in some places?

With those changes, I think this is committable, and will do so, barring objections.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
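For context, the code path under discussion sits in the backend's fsync-request registration logic. A simplified sketch of it, with argument lists abbreviated and the message wording Tom suggests elsewhere in this thread, might look like this; it is illustrative, not the committed code:

/*
 * Simplified sketch of the path the new counter instruments (the real
 * code lives in md.c's register_dirty_segment()): if the background
 * writer's fsync request queue is full, the backend has to fsync the
 * file itself, and that event is what buffers_backend_fsync counts.
 */
if (!ForwardFsyncRequest(rnode, forknum, segno))
{
	/* Queue full: fall back to syncing the segment ourselves. */
	ereport(DEBUG1,
			(errmsg("could not forward fsync request because request queue is full")));

	if (FileSync(seg->mdfd_vfd) < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not fsync segment of relation: %m")));

	/* ...and bump the counter reported via pg_stat_bgwriter. */
}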
Re: [HACKERS] Refactoring the Type System
Daniel Farina writes: > There are other ways one might be able to attack the performance part > of the problem, but consider the loss of information about the type > from int(2|4|8) to numeric when composing a series of sums: we know > the value produced fits the abstract notion of an Integer, but we lose > that information. The same could be said of SUM(x::numeric(1000,0)) > yielding an unrestricted numeric, rather than one of scale 0. Not only > is there more information available to the user, but the optimizer > should be able to benefit from that information as well. However, for > an arbitrary user-defined operator to take advantage it would seem to > me that there needs to be a hook where some reasoning can take place > over its input types and subsequently determine the return prototype > at that call site. Yeah, this has been discussed before. The problem is that it's too much work for too little reward. There are too many different possible behaviors for functions, and nobody is going to go through and write a subtype-inference helper function for every function in the system, or even any usefully large fraction of the functions in the system. What's more, the subtype definitions have largely been predetermined for us by SQL, and predetermined in such a way that knowing the subtype isn't really all that exciting. Is anybody willing to put in months of work to teach the system that sum(numeric(7,0)) yields numeric(something,0) rather than numeric(something,something)? I doubt it. The return on that knowledge would be too small, and there are too many cases where you couldn't deduce anything anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal
Greg Stark writes: > On Sun, Nov 14, 2010 at 8:52 PM, Josh Berkus wrote: >> For example, imagine if the hint bits were moved to a separate per-table >> bitmap outside the table instead of being stored with each row, as the >> current FSM is. > How many times do we have to keep going around the same block? > We *already* have separate bitmap outside the table for transaction > commit bits. It's the clog. > The only reason the hint bits exist is to cache that so we don't need > to do extra I/O to check tuple visibility. If the hint bits are moved > outside the table then they serve no purpose whatsover. Then you have > an additional I/O to attempt to save an additional I/O. Well, not quite. The case this could improve is index-only scans: you could go (or so he hopes) directly from the index to the hint bits given the TID stored by the index. A clog lookup is not possible without XMIN & XMAX, which we do not keep in index entries. But I'm just as skeptical as you are about this being a net win. It'll pessimize too much other stuff. Josh is ignoring the proposal that is on the table and seems actually workable, which is to consult the visibility map during index-only scans. For mostly-static tables this would save trips to the heap for very little extra I/O. The hard part is to make the VM reliable, but that is not obviously harder than making separately-stored hint bits reliable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
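To spell out the mechanism Tom describes: an index-only scan would check one bit in the visibility map before deciding whether the heap fetch can be skipped. A hedged sketch using the existing visibilitymap_test() API (the wrapper function itself is an assumption):

/*
 * Illustrative sketch, not a proposal's actual code: decide whether an
 * index-only scan can skip the heap fetch for a given index TID by
 * consulting the visibility map.
 */
static bool
heap_fetch_can_be_skipped(Relation heapRel, ItemPointer tid,
						  Buffer *vmbuffer)
{
	BlockNumber blkno = ItemPointerGetBlockNumber(tid);

	/*
	 * One (usually cached) visibility-map page read saves the heap fetch
	 * when every tuple on the target page is visible to all transactions.
	 * If the bit is clear, we must still visit the heap and check
	 * XMIN/XMAX the usual way.
	 */
	return visibilitymap_test(heapRel, blkno, vmbuffer);
}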
Re: [HACKERS] Instrument checkpoint sync calls
On Sun, Nov 14, 2010 at 22:37, Greg Smith wrote:
> Attached patch adds some logging for each individual fsync call made during
> a checkpoint, along with a summary at the end. You need to have the
> following to see all of the detail:
>
> log_checkpoints=on
> log_min_messages=debug1
>
> And here's a sample:
>
> LOG: checkpoint starting: immediate force wait
> DEBUG: checkpoint sync: file=1 time=1.946000 msec
> DEBUG: checkpoint sync: file=2 time=0.666000 msec
> DEBUG: checkpoint sync: file=3 time=0.004000 msec
> LOG: checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
> LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s)
> added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s
>
> I think that it's reasonable for the sort of people who turn log_checkpoints
> on to also get the sync summary line, thus it being logged at LOG level.

In that case, wouldn't it make more sense to add a couple of more fields to the existing line? Easier to post-process the logfile that way...

> The detail on individual syncs might go to DEBUG2 or lower instead of
> DEBUG1 where I put it, that part I don't have a strong opinion on. It's at
> DEBUG1 to make testing the patch without a gigantic amount of log data also
> coming in easier.
>
> Right now the code is written such that all the calls that grab timing
> information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set
> to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow
> turning this code path off at build time. I personally think that if you're
> already making an fsync call and have log_checkpoints on, the additional
> overhead of also timing that fsync is minimal even on platforms where timing
> is slow (I don't have such a system to test that assumption however). And
> I've seen enough value in troubleshooting nasty checkpoint sync problems
> using this patch to feel it's worth having even if it does add some
> overhead.

It sounds like it should be - but that should be possible to measure, no?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
On Sun, Nov 14, 2010 at 1:25 PM, Robert Haas wrote: > Like Tom, I'm not sure this is really a type-system problem. This > sounds like a complaint that operations on "numeric" are much slower > than operations on "int4" and "int8", even for values that could be > represented by either type. I think that's a valid complaint, but I > don't see how changing the type system would help. I think what you'd > need to is optimize the existing numeric type, or provide a new > numeric-ish type with optimizations for dealing with > small-to-medium-sized integers. There are other ways one might be able to attack the performance part of the problem, but consider the loss of information about the type from int(2|4|8) to numeric when composing a series of sums: we know the value produced fits the abstract notion of an Integer, but we lose that information. The same could be said of SUM(x::numeric(1000,0)) yielding an unrestricted numeric, rather than one of scale 0. Not only is there more information available to the user, but the optimizer should be able to benefit from that information as well. However, for an arbitrary user-defined operator to take advantage it would seem to me that there needs to be a hook where some reasoning can take place over its input types and subsequently determine the return prototype at that call site. I am not trying to suggest that there is no way that one could attack the details around SUM I have mentioned without changing the type system, only that they are limitations I encountered and worked around where a more powerful type system came to mind as an avenue for a simpler/more proper solution. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On tor, 2010-11-11 at 19:35 +0200, Marko Tiikkaja wrote: > I apologize, I had misunderstood what you are suggesting. But now > that I do, it seems to be an even worse idea to go your way. Based on > my research, I'm almost certain that the SQL standard says that the > execution order is deterministic if there is at least one DML > statement in the WITH list. > > Can anyone confirm this? SQL:2008 doesn't allow any DML in the WITH list. SQL:2011 has the "combined data store and retrieval" feature that was discussed in another thread which basically implements the same thing. They apparently avoid the whole issue by allowing only one data change delta table per query. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
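For reference, the standard's feature looks roughly like this (a sketch based on the draft and on DB2's implementation of the same idea; this is not valid PostgreSQL syntax):

-- The rows produced by the INSERT become the (single allowed)
-- "data change delta table" of the query:
SELECT * FROM NEW TABLE (INSERT INTO foo VALUES (42));

With only one data change delta table permitted per query, there is never more than one DML statement whose ordering could matter.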
[HACKERS] Count backend self-sync calls
The attached patch adds a new field to pg_stat_bgwriter, counting the number of times backends execute their own fsync calls. Normally, when a backend needs to fsync data, it passes a request to the background writer, which then absorbs the call into its own queue of work to do. However, under some types of heavy system load, the associated queue can fill. When this happens, backends are forced to do their own fsync call. This is potentially much worse than when they do a regular write. The really nasty situation is when the background writer is busy because it's executing a checkpoint. In that case, it's possible for the backend fsync calls to start competing with the ones the background writer is trying to get done, causing the checkpoint sync phase to execute slower than it should. I've seen the sync phase take over 45 minutes on a really busy server once it got into this condition, where hundreds of clients doing their own backend fsync calls were fighting against the checkpoint fsync work. With this patch, you can observe that happening as an upwards spike in pg_stat_bgwriter.buffers_backend_sync, which as documented is an inclusive subset of the total shown in buffers_backend.

While it takes a busier system than I can usefully simulate here to show a really bad situation, I'm able to see some of these unabsorbed backend fsync calls when initializing a pgbench database, to prove they happen in the lab. The attached test program takes as its input a pgbench scale counter. It then creates a pgbench database (deleting any existing pgbench database, so watch out for that) and shows the values accumulated in pg_stat_bgwriter during that period. Here's an example, using the script's default scale of 100 on a server with 8GB of RAM and fake fsync (the hard drives are lying about it):

-[ RECORD 1 ]--------+------------------------------
now                  | 2010-11-14 16:08:41.36421-05
...

Initializing pgbench

-[ RECORD 1 ]--------+------------------------------
now                  | 2010-11-14 16:09:46.713693-05
checkpoints_timed    | 0
checkpoints_req      | 0
buffers_checkpoint   | 0
buffers_clean        | 0
maxwritten_clean     | 0
buffers_backend      | 654716
buffers_backend_sync | 90
buffers_alloc        | 803

This is with default sizing for memory structures. As you increase shared_buffers, one of the queues involved here increases proportionately, making it less likely to run into this problem. That just changes it to the kind of problem I've only seen on a larger system with a difficult-to-simulate workload. The production system getting hammered with this problem (running a web application) that prompted writing the patch had shared_buffers=4GB at the time.

The patch also adds some logging to the innards involved here, to help with understanding problems in this area. I don't think that should be in the version committed as is. May want to drop the logging level or make it disabled in regular builds, since it is sitting somewhere it generates a lot of log data and adds overhead. It is nice for now, as it lets you get an idea how much fsync work *is* being absorbed by the BGW, as well as showing what relation is suffering from this issue. Example of both those things, with the default config for everything except log_checkpoints (on) and log_min_messages (debug1):

DEBUG: Absorbing 4096 fsync requests
DEBUG: Absorbing 150 fsync requests
DEBUG: Unable to forward fsync request, executing directly
CONTEXT: writing block 158638 of relation base/16385/16398

Here 4096 is the most entries the BGW will ever absorb at once, and all 90 of the missed sync calls are logged so you can see what files they came from.
As a high-level commentary about this patch, I'm not sure what most end users will ever do with this data. At the same time, I wasn't sure what a typical end user would do with anything else in pg_stat_bgwriter either when it was added, and it turns out the answer is "wait for people who know the internals to write things that monitor it". For example, Magnus has updated recent versions of the Munin plug-in for PostgreSQL to usefully graph pg_stat_bgwriter data over time. As I have some data to suggest checkpoint problems on Linux in particular are getting worse as total system memory increases, I expect that having a way to easily instrument for this particular problem will be correspondingly more important in the future too.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

[Attached patch, truncated in the archive: diffs to doc/src/sgml/monitoring.sgml and src/backend/catalog/system_views.sql]
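For monitoring, the new counter lends itself to a query along these lines (a sketch; the column name follows this patch version, buffers_backend_sync, though per the review upthread it may end up committed as buffers_backend_fsync):

SELECT buffers_backend,
       buffers_backend_sync,
       round(100.0 * buffers_backend_sync / nullif(buffers_backend, 0), 2)
           AS pct_unabsorbed
FROM pg_stat_bgwriter;

-- A rising pct_unabsorbed suggests the background writer's fsync request
-- queue is filling and backends are syncing for themselves.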
Re: [HACKERS] wCTE behaviour
On 2010-11-14 21:06, Marko Tiikkaja wrote: On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: On 2010-11-14 19:35, Robert Haas wrote: On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja wrote: In my opinion, all of these should have the same effect: DELETE all rows from "foo". Any other option means we're going to have trouble predicting how a query is going to behave. I think it's clear that's the only sensible behavior. What if CTE's ever get input parameters? What about input parameters?

With input parameters there is a clear link between a CTE and a caller. If a CTE is called more than once, it must be executed more than once, e.g. (the notation t:x means the CTE has parameter x)

WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *)
SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2));

runs the CTE two times, hence two new rows in foo. But what about

WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *)
SELECT (SELECT t(1)), (SELECT t(1));

It would be strange to expect a single row in foo here, since the only thing different from the previous query is a constant value.

Though I like the ease of "run exactly once" for uncorrelated CTEs, I still have the feeling that it somehow mixes the expression and operational realms. In logic there's a difference between a proposition and an assertion. With "run exactly once", stating a proposition is made synonymous with asserting it. That makes syntactic operations or rewriting of writable CTEs hard, if not impossible. For instance, variable substitution in the second example makes a CTE without parameters:

WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *),
     t'' AS (INSERT INTO foo VALUES(1) RETURNING *)
SELECT (SELECT t'), (SELECT t'');

Since t' and t'' are equal,

WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *)
SELECT (SELECT t'), (SELECT t');

A syntactic operation like this on the query should not result in a different operation when it's run. Hence two new rows in foo are still expected, but "run exactly once" dictates one new row for that query.

regards, Yeb Havinga

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Robert Haas writes: > On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga wrote: >> What if CTE's ever get input parameters? > Then they'd be functions, which we already have.

If you mean something like

prepare foo(int) as
  with x as (delete from tab where id = $1 returning *)
  insert into log_table select * from x;

I don't see that the parameter makes things any less well-defined. If you mean a parameter in the sense of an executor parameter passed in from a surrounding nestloop, that'd scare me too --- but I thought we were going to disallow wCTEs except at the top level of a query, so the case wouldn't arise.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal
On Sun, Nov 14, 2010 at 8:52 PM, Josh Berkus wrote:
> For example, imagine if the hint bits were moved to a separate per-table
> bitmap outside the table instead of being stored with each row, as the
> current FSM is.

How many times do we have to keep going around the same block? We *already* have a separate bitmap outside the table for transaction commit bits. It's the clog. The only reason the hint bits exist is to cache that so we don't need to do extra I/O to check tuple visibility. If the hint bits are moved outside the table then they serve no purpose whatsoever. Then you have an additional I/O to attempt to save an additional I/O.

The only difference between the clog and your proposal is that the clog is two bits per transaction and your proposal is 4 bits per tuple. The per-tuple idea guarantees that the extra I/O will be very localized, which isn't necessarily true for the clog, but the clog is small enough that it probably is true anyways. And even if there's no I/O, the overhead of consulting the clog/per-table fork in memory is probably significant.

-- greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On Sun, Nov 14, 2010 at 08:07:22PM +0200, Marko Tiikkaja wrote: > On 2010-11-14 8:01 PM +0200, I wrote: > >In my opinion, all of these should have the same effect: DELETE all rows > >from "foo". > > Since the example wasn't entirely clear on this one: in my opinion > the DML should also only be executed once. So: > > WITH t AS (INSERT INTO foo VALUES (0) RETURNING *) > SELECT 1 FROM t t1, t t2; > > would only insert one row in any case. Right :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Instrument checkpoint sync calls
Attached patch adds some logging for each individual fsync call made during a checkpoint, along with a summary at the end. You need to have the following to see all of the detail:

log_checkpoints=on
log_min_messages=debug1

And here's a sample:

LOG: checkpoint starting: immediate force wait
DEBUG: checkpoint sync: file=1 time=1.946000 msec
DEBUG: checkpoint sync: file=2 time=0.666000 msec
DEBUG: checkpoint sync: file=3 time=0.004000 msec
LOG: checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s

I think that it's reasonable for the sort of people who turn log_checkpoints on to also get the sync summary line, thus it being logged at LOG level. The detail on individual syncs might go to DEBUG2 or lower instead of DEBUG1 where I put it, that part I don't have a strong opinion on. It's at DEBUG1 to make testing the patch without a gigantic amount of log data also coming in easier.

Right now the code is written such that all the calls that grab timing information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow turning this code path off at build time. I personally think that if you're already making an fsync call and have log_checkpoints on, the additional overhead of also timing that fsync is minimal even on platforms where timing is slow (I don't have such a system to test that assumption however). And I've seen enough value in troubleshooting nasty checkpoint sync problems using this patch to feel it's worth having even if it does add some overhead.

I'm a little concerned about log_checkpoints changing on me in the middle of the execution of a checkpoint, which would cause some problems here. Not sure if that's something I should explicitly code for, given that all I think it will do is screw up one of the timing results. It does seem a risk from the last minute self-review I just did of the code.

I'll give a sample program that stresses the system, generating slow timing results and other types of bad behavior, along with the next patch I submit here shortly.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"

+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC 1

 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB 10
@@ -927,6 +933,15 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;

+#ifdef DEBUG_FSYNC
+	/* Statistics on sync times */
+	int			processed = 0;
+	instr_time	sync_start, sync_end, diff;
+	double		elapsed;
+	double		longest = 0;
+	double		total_elapsed = 0;
+#endif
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
 				seg = _mdfd_getseg(reln, entry->tag.forknum,
 								   entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
 								   false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+				if (log_checkpoints)
+					INSTR_TIME_SET_CURRENT(sync_start);
+#endif
 				if (seg != NULL &&
 					FileSync(seg->mdfd_vfd) >= 0)
+				{
+
+#ifdef DEBUG_FSYNC
+					if (log_checkpoints)
+					{
+						INSTR_TIME_SET_CURRENT(sync_end);
+						diff = sync_end;
+						INSTR_TIME_SUBTRACT(diff, sync_start);
+						elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+						if (elapsed > longest)
+							longest = elapsed;
+						total_elapsed += elapsed;
+						processed++;
+						/* Reduce the log level here to DEBUG2 in final patch? */
+						elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed);
+					}
+#endif
 					break;		/* success; break out of retry loop */
+				}

 				/*
 				 * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}							/* end loop over hashtable entries */

+#ifdef DEBUG_FSYNC
+	/* Report on sync performance metrics */
+	if (log_checkpoints && (processed > 0))
+		elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+			 processed, longest, total_elapsed / processed);
+#endif
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_bgwriter broken?
Greg Smith writes: > Tom Lane wrote: >> Worksforme. You probably need a full recompile and/or initdb > Yeah, sorry about the noise. This went away after some more intensive > rebuilding. I think I still want to add some regression testing of this > view as suggested. If that had been there, I'd have been a lot more > confident it was my mistake, and not something that just slipped through > without being tested. [ shrug... ] That's one of the simplest views in the entire system. It's very hard to conceive of a bug that would affect it and not other views, *especially* such a bug that would be exposed by a test that doesn't actually exercise the functionality, such as you're suggesting here. If the cause was what I suggested, most likely the other stats views were broken too --- were you able to pass the regression tests in that state? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga wrote: > On 2010-11-14 19:35, Robert Haas wrote: >> >> On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja >> wrote: >>> >>> In my opinion, all of these should have the same effect: DELETE all rows >>> from "foo". Any other option means we're going to have trouble >>> predicting >>> how a query is going to behave. >> >> I think it's clear that's the only sensible behavior. > > What if CTE's ever get input parameters? Then they'd be functions, which we already have. As Tom recently pointed out, you can even make them temporary with an explicit pg_temp schema qualification. Perhaps someday we'll have lambda-expressions, but I have no reason to believe that they'll use any of the wCTE syntax. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
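To make the point concrete, here is what the function equivalent of a "parameterized CTE" looks like today (a sketch; table and function names are invented):

CREATE TABLE foo (x int);

CREATE FUNCTION insert_foo(int) RETURNS SETOF foo AS
$$ INSERT INTO foo VALUES ($1) RETURNING * $$ LANGUAGE sql;

-- Each call runs its INSERT once, so this inserts two rows:
SELECT * FROM insert_foo(1) AS a, insert_foo(2) AS b;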
Re: [HACKERS] Refactoring the Type System
On Sun, Nov 14, 2010 at 2:27 PM, Daniel Farina wrote:
> On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane wrote:
>> Daniel Farina writes:
>>> Here are some weaknesses in the SUM aggregate that run up against the
>>> type system. Maybe they'll help crystallize some discussion:
>>
>>> SUM(int2) => int4
>>> SUM(int4) => int8
>>> SUM(int8) => numeric
>>
>>> Some weaknesses:
>>
>>> SUM, of any precision, assumes that the precision being accumulated
>>> into (which is also the return-precision) is enough to avoid overflow.
>>
>> This is not a flaw of the type system, it's just an implementation
>> choice in the SUM() aggregates. We could easily have chosen wider
>> accumulation and/or result types.
>
> That's true, but there are downsides to escalating the precision so
> aggressively.
>
> The case I was thinking about in particular involves composition of
> SUM. If one can assume that a relation has int4s and that will never
> overflow an int8 (as is done now), I don't see a great way to optimize
> the following case without special exceptions in the optimizer for
> particular aggregates known a-priori. Here's what would happen now:
>
> SELECT SUM(x::int8)::numeric
> FROM (SELECT SUM(x::int4)::int8 AS x
>       FROM rel
>       GROUP BY y) some_name;
>
> Could be rendered, by this assumption, as:
>
> SELECT SUM(x::int8)::int8
> (same FROM clause)
>
> (Why would anyone write a query like this? Views. Possibly inlined SQL
> UDFs, too.)
>
> This can be measurably faster. It also more properly constrains the
> result type, as numeric can also handle non-integer quantities.
>
> I should have underscored that a positive aspect of having a
> type-class-like facility is that it allows declaring things like this
> hypothetical Integer when backed by concrete types that might support
> a superset of functionality.

Like Tom, I'm not sure this is really a type-system problem. This sounds like a complaint that operations on "numeric" are much slower than operations on "int4" and "int8", even for values that could be represented by either type. I think that's a valid complaint, but I don't see how changing the type system would help. I think what you'd need to do is optimize the existing numeric type, or provide a new numeric-ish type with optimizations for dealing with small-to-medium-sized integers.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
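The size of that gap is easy to check on any given box (a sketch; timings are machine-dependent):

EXPLAIN ANALYZE
SELECT sum(x) FROM generate_series(1, 1000000) AS g(x);            -- int4 input, int8 path

EXPLAIN ANALYZE
SELECT sum(x::numeric) FROM generate_series(1, 1000000) AS g(x);   -- numeric path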
Re: [HACKERS] pg_stat_bgwriter broken?
Tom Lane wrote: Worksforme. You probably need a full recompile and/or initdb Yeah, sorry about the noise. This went away after some more intensive rebuilding. I think I still want to add some regression testing of this view as suggested. If that had been there, I'd have been a lot more confident it was my mistake, and not something that just slipped through without being tested. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_bgwriter broken?
Greg Smith writes: > I'm behind on my list mail so maybe this has been mentioned already, but > when I just tried pg_stat_bgwriter from a build against today's HEAD I > got this: > pgbench=# select count(*) FROM pg_stat_bgwriter; > ERROR: did not find '}' at end of input node Worksforme. You probably need a full recompile and/or initdb. This is a typical symptom when someone adds a field to a parse node type and you have code or stored rules that haven't been updated. (Usually though we try to bump catversion when an initdb is needed because of such a change.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in plpython's Python Generators
Thanks, Jan. I tested the patch and it looks fine. Oleg

On Sun, 14 Nov 2010, Jan Urbański wrote: On 24/10/10 00:32, Jan Urbański wrote: On 21/10/10 20:48, Alvaro Herrera wrote: ... and presumably somebody can fix the real bug that Jean-Baptiste hit, too. AFAICS the error comes from PLy_function_handler disconnecting from SPI after calling into the Python code and then going ahead and reading the result from the iterator. Here's a patch with a fix for that bug. Cheers, Jan

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: extensible enums
>> I'd say put it on and mark it with an [E]. We could use some more
>> [E]asy items for that list.
>
> We don't need to add marginally-useful features just because they're
> easy. If it doesn't have a real use-case, the incremental maintenance
> cost of more code is a good reason to reject it.

I'll bite. Use-case:

1) DBA adds "Department Role" enum, with set {'Director','Secretary','Staff','Support','Temporary','Liason'}.
2) 3-person data entry team updates all employee records with those roles.
3) First summary report is run.
4) Manager points out that "Liason" is misspelled.

-- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
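In SQL terms the scenario runs roughly like this (a sketch; the ALTER TYPE syntax is left commented out as a placeholder, since the exact keyword is what this thread is still settling):

CREATE TYPE dept_role AS ENUM
    ('Director', 'Secretary', 'Staff', 'Support', 'Temporary', 'Liason');

CREATE TABLE employee (name text, role dept_role);
-- ... the data entry team keys in hundreds of rows ...

-- Fixing the typo after the fact needs in-place surgery on the type,
-- something along the lines of (placeholder syntax, not implemented here):
-- ALTER TYPE dept_role ADD 'Liaison';
-- UPDATE employee SET role = 'Liaison' WHERE role = 'Liason';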
Re: [HACKERS] max_wal_senders must die
Heikki Linnakangas writes: > On 13.11.2010 17:07, Tom Lane wrote: >> Robert Haas writes: >>> Come to think of it, I'm not really sure I understand what protects >>> SetLatch() against memory ordering hazards. Is that actually safe? >> >> Hmm ... that's a good question. It certainly *looks* like it could >> malfunction on machines with weak memory ordering. > Can you elaborate? Weak memory ordering means that stores into shared memory initiated by one processor are not guaranteed to be observed to occur in the same sequence by another processor. This implies first that the latch code could malfunction all by itself, if two processes manipulate a latch at about the same time, and second (probably much less likely) that there could be a malfunction involving a process that's waited on a latch not seeing the shared-memory status updates that another process did "before" setting the latch. This is not at all hypothetical --- my first attempt at rewriting the sinval signaling code, a couple years back, failed on PPC machines in the buildfarm because of exactly this type of issue. The quick-and-dirty way to fix this is to attach a spinlock to each latch, because we already have memory ordering sync instructions in the spinlock primitives. Doing better would probably involve developing a new set of processor-specific primitives --- which would be pretty easy for the processors where we have gcc inline asm, but it would take some research for the platforms where we're relying on magic OS-provided subroutines. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
Daniel Farina wrote:
> On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan wrote:
>> You don't have to kludge things by implementing arrays as blobs for
>> example; you can implement them as relations instead. Geospatial types
>> can just be tuples. Arrays of structured types can just be relations
>> with an attribute per type attribute. Arrays of simple types can just
>> be unary relations.
>
> In practice, my guess is the performance of these approaches would be
> terrible for a number of workloads. I don't agree that arrays having
> their own storage implementation is a kludge: there are even operators
> like unnest that can be used to turn them back into relations.

I'm not discounting this at all. The data model can formally define types in one way and a DBMS can implement various well-known cases in specialized ways that are more efficient. Arrays are a good example. But in other cases they still get a default implementation. If arrays are flexible enough, then different arrays could be implemented differently, eg some as blobs and some as relations; the former could be better for small or simple arrays and the latter for large or complex arrays.

> I have long thought (but never really gave voice to) there being value
> having first-class relation values, but I would also say that's another
> kettle of fish. I also want closures, and don't think that's completely
> nuts.

I think that first-class functions are important at least, if not full-blown closures. You can define generic relational restrictions or summaries or whatever with such; eg, the general case of a WHERE is a function that takes a relation and a function as input, and results in a relation; more restricted cases of WHERE can be defined with simpler functions that take 2 relation inputs instead.

> You may want to learn more about this, first. Postgres's type system,
> while relatively simple, is not as ill-conceived or primitive as you
> seem to assume it is. There are also a lot of important system details,
> like expanding/replacing the typmod facility that only allows Postgres
> 32 bits of type-augmenting data (such as the (1) in the type numeric(1)).

I'm not saying that Pg's system is primitive or anything like that. The thread starter said it needed an overhaul, implying there are deficiencies, and I'm suggesting some effective ways to do that.

-- Darren Duncan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
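The unnest round trip mentioned above is already expressible (a sketch; unnest requires 8.4, and the ordered array_agg form requires 9.0):

-- Array to relation:
SELECT x FROM unnest(ARRAY[3,1,2]) AS u(x) ORDER BY x;

-- And back to an array again:
SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[3,1,2]) AS u(x);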
Re: [HACKERS] Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal
> If we got rid of hint bits, we'd need workarounds for the ensuing
> massive performance loss. There is no reason whatsoever to imagine
> that we'd come out ahead in the end.

Oh, there's no question that we need something which serves the same purpose as the existing hint bits. But there's a lot of question about whether our existing implementation is optimal.

For example, imagine if the hint bits were moved to a separate per-table bitmap outside the table instead of being stored with each row, as the current FSM is. Leaving aside the engineering required for this (which would be considerable, especially when it comes to consistency and durability), this would potentially allow solutions to the following issues:

* Index-only access
* I/O associated with hint bit setting
* Vacuum freezing old-cold data
* Page-level CRCs
* Rsyncing tables for replication

Alternately, we could attack this by hint bit purpose. For example, if we restructured the CLOG so that it was an efficient in-memory index (yes, I'm being handwavy), then having the XID-is-visible hint bits might become completely unnecessary. We could then also improve the visibility map to be reliable and include "frozen" bits as well.

Overall, what I'm pointing out is that our current implementation of hint bits is blocking not one but several major features and causing our users performance pain. It's time to look for an implementation which doesn't have the same problems we're familiar with.

-- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_stat_bgwriter broken?
I'm behind on my list mail so maybe this has been mentioned already, but when I just tried pg_stat_bgwriter from a build against today's HEAD I got this:

pgbench=# select count(*) FROM pg_stat_bgwriter;
ERROR: did not find '}' at end of input node

Can someone confirm if this broke recently, or is it just me? Last time I would have tested this myself was a few weeks ago.

Regardless, I was thinking of adding some basic sanity checking on this view, that runs from the regression tests to catch this class of problem in the future. It's kind of sloppy that this and the bgwriter counter reset aren't tested as part of "make check". I think these two would always produce stable results:

postgres=# SELECT count(*) FROM pg_stat_bgwriter;
 count
-------
     1

postgres=# SELECT pg_stat_reset_shared('bgwriter');
 pg_stat_reset_shared
----------------------

(1 row)

And the first one of those is similarly broken on my install. Thoughts on whether adding those to the regression tests would be a worthwhile patch? I'll do the work, just thinking out loud about the concept.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On 13.11.2010 17:07, Tom Lane wrote: Robert Haas writes: Come to think of it, I'm not really sure I understand what protects SetLatch() against memory ordering hazards. Is that actually safe? Hmm ... that's a good question. It certainly *looks* like it could malfunction on machines with weak memory ordering. Can you elaborate? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote: On 2010-11-14 19:35, Robert Haas wrote: On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja wrote: In my opinion, all of these should have the same effect: DELETE all rows from "foo". Any other option means we're going to have trouble predicting how a query is going to behave. I think it's clear that's the only sensible behavior. What if CTE's ever get input parameters? What about input parameters? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a new problem in MERGE
Boxuan Zhai wrote: I have a plan to fix the above two bugs together (in fact, I have already started coding in the merge_v202 edition). My question is how should I make my update be consistent with yours. Is it possible for you to give me an edition that I can work on?

I just got this reconciled with HEAD again. There have been two changes I made in the code you'll eventually want in your working copy:

1) Fix NIL/NULL confusion: https://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

2) Update ExecMerge to accept and pass through an oldtuple value. This is needed to make the code compatible with the PostgreSQL git HEAD after the changes made in the 2009-09 CommitFest. Bit rot updates made: https://github.com/greg2ndQuadrant/postgres/commit/be03bd201720f42a666f7e356ec8507c1357f502

I'm not sure whether the way I did (2) is correct for all cases, but at least the code compiles again now and the server will start. Attached is an updated patch that applies to HEAD as of right now, and that code has been pushed to https://github.com/greg2ndQuadrant/postgres/tree/merge-unstable with the changes rebased to be the last two commits.

It fails "make installcheck" on my system. But as the initial diffs I looked at relate to enums and such, I don't think that's a problem with your patch. Will investigate further here with some of my own patches I'm working on today. Hopefully this is enough to unblock the work you were waiting on more details from me about.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD

merge-v203-20101114.gz Description: GNU Zip compressed data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane wrote:
> Daniel Farina writes:
>> Here are some weaknesses in the SUM aggregate that run up against the
>> type system. Maybe they'll help crystallize some discussion:
>
>> SUM(int2) => int4
>> SUM(int4) => int8
>> SUM(int8) => numeric
>
>> Some weaknesses:
>
>> SUM, of any precision, assumes that the precision being accumulated
>> into (which is also the return-precision) is enough to avoid overflow.
>
> This is not a flaw of the type system, it's just an implementation
> choice in the SUM() aggregates. We could easily have chosen wider
> accumulation and/or result types.

That's true, but there are downsides to escalating the precision so aggressively.

The case I was thinking about in particular involves composition of SUM. If one can assume that a relation has int4s and that will never overflow an int8 (as is done now), I don't see a great way to optimize the following case without special exceptions in the optimizer for particular aggregates known a-priori. Here's what would happen now:

SELECT SUM(x::int8)::numeric
FROM (SELECT SUM(x::int4)::int8 AS x
      FROM rel
      GROUP BY y) some_name;

Could be rendered, by this assumption, as:

SELECT SUM(x::int8)::int8
(same FROM clause)

(Why would anyone write a query like this? Views. Possibly inlined SQL UDFs, too.)

This can be measurably faster. It also more properly constrains the result type, as numeric can also handle non-integer quantities.

I should have underscored that a positive aspect of having a type-class-like facility is that it allows declaring things like this hypothetical Integer when backed by concrete types that might support a superset of functionality.

fdr

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
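The view scenario alluded to above would look something like this (a sketch; table and view names are invented):

CREATE TABLE rel (x int4, y int4);

CREATE VIEW sums_per_y AS
    SELECT y, sum(x) AS x FROM rel GROUP BY y;   -- sum(int4) yields int8

-- Summing the view's column goes through sum(int8), which yields numeric,
-- even though the true total would still fit comfortably in an int8:
SELECT sum(x) FROM sums_per_y;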
Re: [HACKERS] wCTE behaviour
On 2010-11-14 19:35, Robert Haas wrote: On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja wrote: In my opinion, all of these should have the same effect: DELETE all rows from "foo". Any other option means we're going to have trouble predicting how a query is going to behave. I think it's clear that's the only sensible behavior. What if CTE's ever get input parameters? regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POSIX shared memory redux
On Sun, Nov 14, 2010 at 11:06 AM, Tom Lane wrote: > Martijn van Oosterhout writes: >> The only real solution seems to me to be to keep a small SysV shared >> memory segment for the locking and allocate the rest of the shared >> memory some other way. > > Yeah, that's been discussed. It throws all the portability gains out > the window. It might get you out from under the need to readjust a > machine's SHMMAX setting before you can use a large amount of shared > memory, but it's not clear that's enough of a win to be worth the > trouble. One of the things that would be really nice to be able to do is resize our shm after startup, in response to changes in configuration parameters. That's not so easy to make work, of course, but I feel like this might be going in the right direction, since POSIX shms can be resized using ftruncate(). > The other direction that we could possibly go is to find some other way > entirely of interlocking access to the data directory. If for example > we could rely on a file lock held by the postmaster and all backends, > we could check that instead of having to rely on a shmem behavior. > The killer objection to that so far is that file locking is unreliable > in some environments, particularly NFS. But it'd have some advantages > too --- in particular, in the NFS context, the fact that the lock is > visible to would-be postmasters on different machines might be thought > a huge safety improvement over what we do now. I've never had a lot of luck making filesystem locks work reliably, but I don't discount the possibility that I was doing it wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja wrote: > In my opinion, all of these should have the same effect: DELETE all rows > from "foo". Any other option means we're going to have trouble predicting > how a query is going to behave. I think it's clear that's the only sensible behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HOT updates in index-less tables
On 14.11.2010 00:29, Robert Haas wrote: On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane wrote: Hannu Krosing writes: On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: If a table has no indexes, we will always decide that any same-page update operation is a HOT update, since obviously it isn't modifying any indexed columns. But is there any benefit to doing so? If we do the in-page "mini vacuum" even without HOT, then there should be no benefit from index-less HOT updates. AFAICS we do: heap_update marks the page as prunable whether it's a HOT update or not. The only difference between treating the update as HOT vs not-HOT is that if there was more than one HOT update, the intermediate tuples could be completely reclaimed by page pruning (ie, their line pointers go away too). With not-HOT updates, the intermediate line pointers would have to remain in DEAD state until vacuum, since page pruning wouldn't know if there were index entries pointing at them. But that seems like a pretty tiny penalty. I'm not at all convinced that's a tiny penalty. Me neither. It's a tiny penalty when you consider one update, but if you repeatedly update the same tuple, you accumulate dead line pointers until the next real vacuum runs. With HOT updates, you reach a steady state where page pruning is all you need. Then again, if you're repeatedly updating a row in a table with no indexes, presumably it's a very small table or you would create an index on it. And frequently autovacuuming a small index is quite cheap too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
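For anyone wanting to observe this, the statistics collector already counts HOT updates (a sketch; the stats are reported asynchronously, so the counts may lag briefly):

CREATE TABLE counter (n int);    -- no indexes
INSERT INTO counter VALUES (0);
UPDATE counter SET n = n + 1;    -- same-page update, counted as HOT

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'counter';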
Re: [HACKERS] Bug in plpython's Python Generators
On Sun, Oct 24, 2010 at 01:32, Jan Urbański wrote: > The error handling in plpython is well-known to be a mess, hence the > confusing error message that OP got. Another annoying thing is that SPI > calls are not done in a subtransaction, which means you can't trap > errors with a try/catch Python construct. I'm currently trying to get > try/catch to work and might end up cleaning up error handling as well... > but this can take me some time, so maybe someone should plug this > particular hole right away. While on this topic, maybe you can chime in on the pgsql-docs list for my documentation patches to document the current (broken) behavior? Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On 2010-11-14 8:01 PM +0200, I wrote: In my opinion, all of these should have the same effect: DELETE all rows from "foo". Since the example wasn't entirely clear on this one: in my opinion the DML should also only be executed once. So: WITH t AS (INSERT INTO foo VALUES (0) RETURNING *) SELECT 1 FROM t t1, t t2; would only insert one row in any case. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On 2010-11-14 5:28 PM +0200, Hitoshi Harada wrote:
> 2010/11/14 Marko Tiikkaja: .. and a wild patch appears.
> Could you update the wiki on this feature if you think we've reached a
> consensus?

You're probably referring to http://archives.postgresql.org/pgsql-hackers/2010-11/msg00660.php which was unfortunately just me talking too soon. There still doesn't appear to be a consensus on the difference (if any) between these queries:

WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 LIMIT 0;          -- unreferenced CTE
WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 0;   -- referenced, but not read
WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t LIMIT 1;   -- referenced, but only partly read
WITH t AS (DELETE FROM foo RETURNING *) SELECT 1 FROM t t1, t t2;  -- referenced, read multiple times

In my opinion, all of these should have the same effect: DELETE all rows from "foo". Any other option means we're going to have trouble predicting how a query is going to behave. As far as I know, we do have a consensus that the order of execution should be an implementation detail, and that the statements should always be executed in the exact same snapshot (i.e. no CID bump between).

> Also, summarizing the discussion, such as the pros & cons of the
> different execution models, would help not only the discussion but also
> reviews of this patch.

Do you mean between the "execute in order, bump CID" and "execute in whatever order but to completion" behaviours?

Regards, Marko Tiikkaja

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
I wrote: > I still think it's worth looking into whether the bug can be dodged > by shortening the eval calls. In fact, that does seem to work; I'll commit a patch after testing a bit more. We still need someone to add the missing build dependencies so that make -j is trustworthy again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add label to enum syntax
On Nov 14, 2010, at 7:42 AM, Andrew Dunstan wrote: > It's fairly unscientific and inconclusive, and the discussion seems to have > died. I think since Tom and I did most of the work on this our voices should > count a little louder :-) , so I'm going to go with his suggestion of VALUE, > unless there are loud squawks. +1 Maybe update the doc mentions of "label," then. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in plpython's Python Generators
On 24/10/10 00:32, Jan Urbański wrote:
> On 21/10/10 20:48, Alvaro Herrera wrote:
>> ... and presumably somebody can fix the real bug that Jean-Baptiste hit,
>> too.
>
> AFAICS the error comes from PLy_function_handler disconnecting from SPI
> after calling into the Python code and then going ahead and reading the
> result from the iterator.

Here's a patch with a fix for that bug.

Cheers, Jan

*** src/pl/plpython/expected/plpython_setof.out
--- src/pl/plpython/expected/plpython_setof.out	2010-11-14 17:38:39.0 +0100
***************
*** 31,36 ****
--- 31,44 ----
      return self.icontent
    return producer(count, content)
  $$ LANGUAGE plpythonu;
+ CREATE FUNCTION test_setof_spi_in_iterator() RETURNS SETOF text AS
+ $$
+ for s in ('Hello', 'Brave', 'New', 'World'):
+     plpy.execute('select 1')
+     yield s
+ plpy.execute('select 2')
+ $$
+ LANGUAGE plpythonu;
  -- Test set returning functions
  SELECT test_setof_as_list(0, 'list');
   test_setof_as_list
***************
*** 107,109 ****
--- 115,126 ----
  (2 rows)
  
+ SELECT test_setof_spi_in_iterator();
+  test_setof_spi_in_iterator
+ ----------------------------
+  Hello
+  Brave
+  New
+  World
+ (4 rows)
+ 
*** src/pl/plpython/plpython.c
--- src/pl/plpython/plpython.c	2010-11-14 17:38:39.0 +0100
***************
*** 1000,1012 ****
  	}
  
  	/*
! 	 * Disconnect from SPI manager and then create the return values datum
! 	 * (if the input function does a palloc for it this must not be
! 	 * allocated in the SPI memory context because SPI_finish would free
! 	 * it).
  	 */
- 	if (SPI_finish() != SPI_OK_FINISH)
- 		elog(ERROR, "SPI_finish failed");
  
  	if (proc->is_setof)
  	{
--- 1000,1010 ----
  	}
  
  	/*
! 	 * Cannot disconnect from SPI yet, because in case of a SRF returning a
! 	 * Python iterator we will use PyIter_Next() which calls back into
! 	 * Python code, which can contain calls to SPI functions. In that case
! 	 * we will disconnect from SPI after the iterator is exhausted.
  	 */
  
  	if (proc->is_setof)
  	{
***************
*** 1064,1074 ****
--- 1062,1085 ----
  					(errcode(ERRCODE_DATA_EXCEPTION),
  					 errmsg("error fetching next item from iterator")));
  
+ 			/* Disconnect from the SPI manager before returning. */
+ 			if (SPI_finish() != SPI_OK_FINISH)
+ 				elog(ERROR, "SPI_finish failed");
+ 
  			fcinfo->isnull = true;
  			return (Datum) NULL;
  		}
  	}
  
+ 	/*
+ 	 * Disconnect from SPI manager and then create the return values datum
+ 	 * (if the input function does a palloc for it this must not be
+ 	 * allocated in the SPI memory context because SPI_finish would free
+ 	 * it).
+ 	 */
+ 	if (SPI_finish() != SPI_OK_FINISH)
+ 		elog(ERROR, "SPI_finish failed");
+ 
  	plerrcontext.callback = plpython_return_error_callback;
  	plerrcontext.previous = error_context_stack;
  	error_context_stack = &plerrcontext;
*** src/pl/plpython/sql/plpython_setof.sql
--- src/pl/plpython/sql/plpython_setof.sql	2010-11-14 17:38:39.0 +0100
***************
*** 35,40 ****
--- 35,50 ----
    return producer(count, content)
  $$ LANGUAGE plpythonu;
  
+ CREATE FUNCTION test_setof_spi_in_iterator() RETURNS SETOF text AS
+ $$
+ for s in ('Hello', 'Brave', 'New', 'World'):
+     plpy.execute('select 1')
+     yield s
+ plpy.execute('select 2')
+ $$
+ LANGUAGE plpythonu;
+ 
+ 
  -- Test set returning functions
  SELECT test_setof_as_list(0, 'list');
***************
*** 51,53 ****
--- 61,65 ----
  SELECT test_setof_as_iterator(1, 'list');
  SELECT test_setof_as_iterator(2, 'list');
  SELECT test_setof_as_iterator(2, null);
+ 
+ SELECT test_setof_spi_in_iterator();

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
Robert Haas writes: > On Nov 14, 2010, at 10:44 AM, Tom Lane wrote: >> I still think it's worth looking into whether the bug can be dodged >> by shortening the eval calls. But if not, I think I'd vote for >> reverting. Maybe we could revisit this in a couple of years. > +1. The current master branch fails to build on my (rather new) Mac > with make -j2. I complained of the same thing, but AFAICS that's not a make bug; it's a missing build dependency, which could be fixed if we choose to keep this infrastructure. It probably ought to be fixed even if we don't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
On Nov 14, 2010, at 10:44 AM, Tom Lane wrote: > I still think it's worth looking into whether the bug can be dodged > by shortening the eval calls. But if not, I think I'd vote for > reverting. Maybe we could revisit this in a couple of years. +1. The current master branch fails to build on my (rather new) Mac with make -j2. I could upgrade my toolchain but it seems like more trouble than it's worth, not to mention a possible obstacle to new users and developers. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
On 11/14/2010 10:44 AM, Tom Lane wrote: And on the fourth hand, what we're buying here is pretty marginal for developers and of no interest whatever for users. I still think it's worth looking into whether the bug can be dodged by shortening the eval calls. But if not, I think I'd vote for reverting. Maybe we could revisit this in a couple of years. +1 cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POSIX shared memory redux
Martijn van Oosterhout writes: > The only real solution seems to me to be to keep a small SysV shared > memory segment for the locking and allocate the rest of the shared > memory some other way. Yeah, that's been discussed. It throws all the portability gains out the window. It might get you out from under the need to readjust a machine's SHMMAX setting before you can use a large amount of shared memory, but it's not clear that's enough of a win to be worth the trouble. The other direction that we could possibly go is to find some other way entirely of interlocking access to the data directory. If for example we could rely on a file lock held by the postmaster and all backends, we could check that instead of having to rely on a shmem behavior. The killer objection to that so far is that file locking is unreliable in some environments, particularly NFS. But it'd have some advantages too --- in particular, in the NFS context, the fact that the lock is visible to would-be postmasters on different machines might be thought a huge safety improvement over what we do now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring the Type System
Daniel Farina writes: > Here are some weaknesses in the SUM aggregate that run up against the > type system. Maybe they'll help crystallize some discussion: > SUM(int2) => int4 > SUM(int4) => int8 > SUM(int8) => numeric > Some weaknesses: > SUM, of any precision, assumes that the precision being accumulated > into (which is also the return-precision) is enough to avoid overflow. This is not a flaw of the type system, it's just an implementation choice in the SUM() aggregates. We could easily have chosen wider accumulation and/or result types. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
Dave Page writes: > On Sat, Nov 13, 2010 at 8:13 PM, Peter Eisentraut wrote: >> Well, it looks like $(eval) is pretty broken in 3.80, so either we >> require 3.81 or we abandon this line of thought. > 3.81 might be a problem for Solaris - unless I pay for a support > contract from Oracle, I'm not going to get any updates from them, > which means I'll have to install a custom build. Now that's no biggie > for me, but it does see to raise the bar somewhat for users that might > want to build from source. For another data point, I find make 3.80 in OS X 10.4, while 10.5 and 10.6 have 3.81. 10.4 is certainly behind the curve, but Apple still seem to be releasing security updates for it. I was about to draw an analogy to flex -- we are now requiring a version of flex that's roughly contemporaneous with make 3.81. However, we don't require flex to build from a tarball, so on second thought that situation isn't very comparable. Moving the goalposts for make would definitely affect more people. On the third hand, gmake is very very easy to install: if you're capable of building Postgres from source, it's hard to believe that gmake should scare you off. (I've installed multiple versions on my ancient HPUX dinosaur, and it's never been any harder than ./configure, make, make check, make install.) And on the fourth hand, what we're buying here is pretty marginal for developers and of no interest whatever for users. I still think it's worth looking into whether the bug can be dodged by shortening the eval calls. But if not, I think I'd vote for reverting. Maybe we could revisit this in a couple of years. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add label to enum syntax
On 10/29/2010 01:47 AM, Pavel Golub wrote:
> Hello, Alvaro. You wrote:
> AH> Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010:
> >> Forgot link to poll: http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/
> AH> Hah, there are 17 votes as of right now, no option is below 23% and no
> AH> option is above 29%.
> Yeah, right now 42 votes:
>   VALUE 26%
>   LABEL 26%
>   Just ADD 'newlabel' 24%
>   ELEMENT 21%
>   MEMBER 2%

It's fairly unscientific and inconclusive, and the discussion seems to have died. I think since Tom and I did most of the work on this our voices should count a little louder :-), so I'm going to go with his suggestion of VALUE, unless there are loud squawks.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
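For the archives: assuming VALUE does win out as suggested above, the competing spellings from the poll would come out looking roughly like the sketch below. The type name and labels are invented for illustration, and of course only whichever form is actually committed would be accepted by the parser.

  -- 'mood' and its labels are hypothetical, for illustration only
  CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

  -- the spelling being leaned toward in this thread:
  ALTER TYPE mood ADD VALUE 'ecstatic';

  -- the other poll options would have read:
  --   ALTER TYPE mood ADD LABEL 'ecstatic';
  --   ALTER TYPE mood ADD 'ecstatic';
  --   ALTER TYPE mood ADD ELEMENT 'ecstatic';
  --   ALTER TYPE mood ADD MEMBER 'ecstatic';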
Re: [HACKERS] wCTE behaviour
2010/11/14 Marko Tiikkaja : > On 2010-11-12 8:25 PM +0200, I wrote: >> >> I'm going to take some time off this weekend to get a patch with this >> behaviour to the next commitfest. > > .. and a wild patch appears. > > This is almost exactly the patch from 2010-02 without > CommandCounterIncrement()s. It's still a bit rough around the edges and > needs some more comments, but I'm posting it here anyway. > > This patch passes all regression tests, but feel free to try to break it, > there are probably ways to do that. This one also has the "always run DMLs > to completion, and exactly once" behaviour. > Could you update the wiki on this feature if you think we've reached a consensus? http://wiki.postgresql.org/wiki/WriteableCTEs Also, wrapping up the discussion, such as the pros & cons of the different execution models, would help not only the ongoing discussion but also reviews of this patch. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
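For anyone catching up on the thread: the class of statement whose behaviour is being pinned down looks roughly like the sketch below (the tables are invented for illustration, and this of course only runs on a server with the patch applied). The "always run DMLs to completion, and exactly once" rule means the DELETE fires exactly once even if the outer query stops reading from the CTE early, or never reads it at all.

  -- pending_jobs / archived_jobs are hypothetical tables
  CREATE TEMP TABLE pending_jobs  (id int4, created timestamptz);
  CREATE TEMP TABLE archived_jobs (id int4, created timestamptz);

  -- move qualifying rows from one table to the other in a single statement
  WITH moved AS (
      DELETE FROM pending_jobs
       WHERE created < now() - interval '1 day'
   RETURNING *
  )
  INSERT INTO archived_jobs
  SELECT * FROM moved;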
Re: [HACKERS] POSIX shared memory redux
On Sat, Nov 13, 2010 at 08:07:52PM -0500, Tom Lane wrote: > "A.M." writes: > > The goal of this work is to address all of the shortcomings of previous > > POSIX shared memory patches as pointed out mostly by Tom Lane. > It seems like you've failed to understand the main shortcoming of this > whole idea, which is the loss of ability to detect pre-existing backends > still running in a cluster whose postmaster has crashed. The nattch > variable of SysV shmem segments is really pretty critical to us, and > AFAIK there simply is no substitute for it in POSIX-land. I've been looking and there really doesn't appear to be. This is consistent: there is nothing else in POSIX that lets you determine how many other processes have the same file, pipe, tty, etc. open. I asked a few people for ideas and got answers like: just walk through /proc and check. Apart from the portability issues, this won't work if there are different user-IDs in play. The only real solution seems to me to be to keep a small SysV shared memory segment for the locking and allocate the rest of the shared memory some other way. If all backends map the SysV segment before mapping the other one, then you can use the non-existence of the SysV SHM to determine the non-existence of the other segment. Quite a bit more work, ISTM. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
Re: [HACKERS] Refactoring the Type System
On Fri, Nov 12, 2010 at 4:07 PM, Jeff Davis wrote: > I think the best we'll do is be able to hack on some of the things that > we actively want and have clear use cases for, such as type interfaces. > We might have to give up on some of the more ambitious ideas that > involve propagating interesting information through the type inference > system; or having any real type that wasn't declared with CREATE TYPE. > Consider that right now we bundle the element type information along > with the array _value_.

Here are some weaknesses in the SUM aggregate that run up against the type system. Maybe they'll help crystallize some discussion:

SUM(int2) => int4
SUM(int4) => int8
SUM(int8) => numeric

Some weaknesses:

SUM, of any precision, assumes that the precision being accumulated into (which is also the return-precision) is enough to avoid overflow. This is generally the case, but there's no reason why it *must* be true. I'm especially looking at the int2 to int4 conversion. One could imagine a more interesting scenario where overflow behavior could occur much more easily.

SUM always promotes types upwards in precision, and is unable to keep types of the smallest possible precision should SUM expressions be composed.

SUM is unable to maintain any supplementary information about precision, i.e. say anything interesting about the typmod, which defeats or makes impossible many interesting optimizations.

I think that a type-interface system moves towards solving the first couple of problems, since SUM can return some abstract type such as "Integer" and use that to promote more aggressively (avoiding overflow) or keep representations small (avoiding unnecessary promotion) at run-time. It might require Integer to be an abstract, non-storable data type, though, so the current CREATE TYPE is not going to make life easy.

The third problem is slightly different: it might require some user-pluggable code to be called as part of semantic analysis. I have had the idea rolling around in my head of making a Postgres Type a more robust first-class data type, and of somehow attaching to a function/aggregate another function that is called during semantic analysis and returns the proper signature; but it might be the whispers of Cthulhu, too.

fdr

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
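The second weakness, promotion under composition, is easy to demonstrate from SQL alone. A minimal sketch, using a throwaway table t; the exact types reported will depend on the server version, but the point is that each additional level of SUM climbs the promotion ladder even when the underlying data would have fit in the original type:

  -- t is a hypothetical table, for illustration only
  CREATE TEMP TABLE t (grp int4, v int2);

  -- one level of SUM: already promoted past int2
  SELECT pg_typeof(sum(v)) FROM t;

  -- composing SUM over SUM promotes again
  SELECT pg_typeof(sum(s))
    FROM (SELECT sum(v) AS s FROM t GROUP BY grp) AS sub;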
Re: [HACKERS] Refactoring the Type System
On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan wrote: > A key component of a good type system is that users can define data types, > and moreover where possible, system-defined types are defined in the same > ways as users define types. For example, stuff like temporal types or > geospatial types are prime candidates for being defined like user-defined > types. They are user-defined types, and even separately compiled, distributed, and dynamically linked. > You don't have to kludge things by implementing arrays as blobs for example; > you can implement them as relations instead. Geospatial types can just be > tuples. Arrays of structured types can just be relations with an attribute > per type attribute. Arrays of simple types can just be unary relations. In practice, my guess is the performance of these approaches would be terrible for a number of workloads. I don't agree that arrays having their own storage implementation is a kludge: there are even operators like unnest that can be used to turn them back into relations. I have long thought (but never really gave voice to) that there is value in having first-class relation values, but I would also say that's another kettle of fish. I also want closures, and don't think that's completely nuts. > I also want to emphasize that, while I drew inspiration from many sources > when defining Muldis D, and there was/is a lot I still didn't/don't know > about Postgres, I have found that as I use and learn Postgres, I'm finding > frequently that how Postgres does things is similar and compatible to how I > independently came up with Muldis D's design; I'm finding more similarities > all the time. You may want to learn more about this first. Postgres's type system, while relatively simple, is not as ill-conceived or primitive as you seem to assume it is. There are also a lot of important system details, like expanding/replacing the typmod facility, which allows Postgres only 32 bits of type-augmenting data (such as the (1) in the type numeric(1)). fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
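To make the typmod limitation concrete: everything a type modifier can carry, such as numeric's precision and scale, has to be packed into the single int4 that the catalogs store per attribute. A minimal illustration (the table is invented, and the packed encoding is an internal detail that applications should not rely on):

  CREATE TEMP TABLE tm (n numeric(5,1));

  -- all that survives of "(5,1)" is one 32-bit integer
  SELECT atttypmod
    FROM pg_attribute
   WHERE attrelid = 'tm'::regclass AND attname = 'n';

  -- format_type() knows how to decode it back for display
  SELECT format_type(atttypid, atttypmod)
    FROM pg_attribute
   WHERE attrelid = 'tm'::regclass AND attname = 'n';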
Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support
On Sat, Nov 13, 2010 at 8:13 PM, Peter Eisentraut wrote: > On lör, 2010-11-13 at 20:07 +0200, Peter Eisentraut wrote: >> On lör, 2010-11-13 at 12:20 -0500, Tom Lane wrote: >> > Peter Eisentraut writes: >> > > On lör, 2010-11-13 at 11:12 -0500, Tom Lane wrote: >> > >> It looks like all the unhappy critters are getting the same "virtual >> > >> memory exhausted" error. I wonder whether they are all using make >> > >> 3.80 ... >> > >> > > It turns out that there is an unrelated bug in 3.80 that some Linux >> > > distributions have patched around. 3.81 or 3.82 are OK. >> > >> > So what do you mean by "unrelated bug"? Can we work around it? >> >> The information is fuzzy, but the problem has been reported around the >> internet, and it appears to be related to the foreach function. I think >> I have an idea how to work around it, but I'll need some time. > Well, it looks like $(eval) is pretty broken in 3.80, so either we > require 3.81 or we abandon this line of thought. 3.81 might be a problem for Solaris - unless I pay for a support contract from Oracle, I'm not going to get any updates from them, which means I'll have to install a custom build. Now that's no biggie for me, but it does seem to raise the bar somewhat for users that might want to build from source. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers